pocket_coffea.utils package

Submodules#

pocket_coffea.utils.benchmarking module#

pocket_coffea.utils.benchmarking.print_processing_stats(output, start_time, workers)#

Prints processing statistics using rich.Table.

pocket_coffea.utils.build_jets_calibrator module#

Code to build the JEC/JER and JES uncertainties, taken from andrzejnovak/boostedhiggs.

pocket_coffea.utils.build_jets_calibrator.build(params, filter_years=None)#

Build the factory objects from the list of JEC files for each era, for AK4 and AK8 jets, and save them on disk in cloudpickle format.

pocket_coffea.utils.configurator module#

class pocket_coffea.utils.configurator.Configurator(workflow, parameters, datasets, skim, preselections, categories, weights, variations, variables, weights_classes=None, columns=None, workflow_options=None, save_skimmed_files=None, do_postprocessing=True)#

Bases: object

Main class driving the configuration of a PocketCoffea analysis. The Configurator groups the several aspects that define an analysis run:
  • skims, preselections, categorization
  • output: variables and columns
  • datasets
  • weights and variations, and the objects providing them
  • workflow
  • analysis parameters

The running environment configuration is not part of the Configurator class.

The available Weights are taken from the list of weights classes passed to the Configurator.

clone()#

Create a copy of the configurator in the loaded=False state

filter_dataset(nfiles)#
load()#

This function loads the configuration for samples, weights and variations, and creates the objects that the processor needs. It also loads the workflow.

load_columns_config(wcfg)#
load_cuts_and_categories(skim: list, preselections: list, categories)#

This function loads the list of cuts and groups them in categories. Each cut is identified by a unique id (see Cut class definition)

load_datasets()#
load_subsamples()#
load_variations_config(wcfg, variation_type)#

This function loads the variations definition and prepares a list of weights to be applied for each sample and category

load_weights_config(wcfg)#

This function loads the weights definition and prepares a list of weights to be applied for each sample and category

load_workflow()#
perform_checks()#
save_config(output)#
set_filesets_manually(filesets)#

This function sets the filesets directly, usually before the configuration is loaded. This is useful to pickle an unloaded version of the configuration restricting the filesets a priori. It is used in the condor submission script. The filesets_loaded attribute is set to True to avoid reloading the datasets.

pocket_coffea.utils.configurator.format(data, indent=0, width=80, depth=None, compact=True, sort_dicts=True)#

pocket_coffea.utils.dataset module#

class pocket_coffea.utils.dataset.Dataset(name, cfg, sites_cfg=None, append_parents=False)#

Bases: object

check_samples()#
down_file = <parsl.app.python.PythonApp object>#
download()#
get_samples(files)#
save(append=True, overwrite=False, split=False)#
class pocket_coffea.utils.dataset.Sample(name, das_names, sample, metadata, sites_cfg, **kwargs)#

Bases: object

check_files(prefix)#
get_filelist()#

Function to get the dataset filelist from DAS and from Rucio. DAS provides the general information about the dataset (event count, file size), whereas Rucio provides the specific file paths at the sites, without the redirector, which helps with xrootd access in coffea.

get_parentlist(inplace=False)#

Function to get the parent dataset filelist from DAS. The parent list is included as additional metadata in the sample’s dict.

get_sample_dict(redirector=True, prefix='root://xrootd-cms.infn.it//')#
pocket_coffea.utils.dataset.build_datasets(cfg, keys=None, overwrite=False, download=False, check=False, split_by_year=False, local_prefix=None, allowlist_sites=None, include_redirector=False, blocklist_sites=None, regex_sites=None, parallelize=4)#
pocket_coffea.utils.dataset.do_dataset(key, config, local_prefix, allowlist_sites, include_redirector, blocklist_sites, regex_sites, **kwargs)#

pocket_coffea.utils.load_output module#

pocket_coffea.utils.load_output.load_output(file)#

pocket_coffea.utils.logging module#

class pocket_coffea.utils.logging.LogFormatter(color, *args, **kwargs)#

Bases: Formatter

COLOR_CODES = {10: '\x1b[1;30m', 20: '\x1b[0;37m', 30: '\x1b[1;33m', 40: '\x1b[1;31m', 50: '\x1b[1;35m'}#
RESET_CODE = '\x1b[0m'#
format(record, *args, **kwargs)#

Format the specified record as text.

The record’s attribute dictionary is used as the operand to a string formatting operation which yields the returned string. Before formatting the dictionary, a couple of preparatory steps are carried out. The message attribute of the record is computed using LogRecord.getMessage(). If the formatting string uses the time (as determined by a call to usesTime()), formatTime() is called to format the event time. If there is exception information, it is formatted using formatException() and appended to the message.
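As a minimal sketch of how a color-aware formatter of this kind can work (not the actual PocketCoffea implementation), the standalone class below maps each logging level to an ANSI code, as in COLOR_CODES above, and wraps the formatted record when color is enabled. The names ColorFormatter and format_warning, and the choice to color the whole line, are assumptions made for illustration.

```python
import logging


class ColorFormatter(logging.Formatter):
    """Hypothetical formatter, for illustration only."""

    # Same level-to-ANSI mapping as COLOR_CODES above (keys are logging levels).
    COLOR_CODES = {
        logging.DEBUG: "\x1b[1;30m",
        logging.INFO: "\x1b[0;37m",
        logging.WARNING: "\x1b[1;33m",
        logging.ERROR: "\x1b[1;31m",
        logging.CRITICAL: "\x1b[1;35m",
    }
    RESET_CODE = "\x1b[0m"

    def __init__(self, color, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.color = color

    def format(self, record):
        # Let the standard Formatter build the text, then add the color codes.
        text = super().format(record)
        if self.color and record.levelno in self.COLOR_CODES:
            return f"{self.COLOR_CODES[record.levelno]}{text}{self.RESET_CODE}"
        return text


def format_warning(message, color):
    """Format a WARNING record with the sketch formatter and return the text."""
    formatter = ColorFormatter(color, "%(levelname)s: %(message)s")
    record = logging.LogRecord(
        name="demo",
        level=logging.WARNING,
        pathname="demo.py",
        lineno=1,
        msg=message,
        args=None,
        exc_info=None,
    )
    return formatter.format(record)
```

With color=False the record is returned exactly as the standard logging.Formatter would produce it, which is convenient for log files.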

pocket_coffea.utils.logging.setup_logging(console_log_output, console_log_level, console_log_color, logfile_file, logfile_log_level, logfile_log_color, log_line_template)#

pocket_coffea.utils.network module#

pocket_coffea.utils.network.check_port(port)#
pocket_coffea.utils.network.get_proxy_path() str#

Checks whether the VOMS proxy exists and is valid for at least 1 hour. If so, returns its path.
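A check of this kind can be sketched as follows. This is a hypothetical re-implementation, not the code of get_proxy_path(): it assumes that the voms-proxy-info command is available and that its -timeleft and -path options print the remaining lifetime in seconds and the proxy file path. The function name, the injectable run argument and the choice to raise RuntimeError on failure are illustrative assumptions.

```python
import subprocess


def get_proxy_path_sketch(min_seconds=3600, run=subprocess.run):
    """Return the proxy path if the proxy is valid for at least min_seconds.

    `run` is injectable so that the logic can be exercised without a grid
    environment; by default it is subprocess.run.
    """
    try:
        # Assumption: `voms-proxy-info -timeleft` prints the lifetime in seconds.
        timeleft = int(
            run(
                ["voms-proxy-info", "-timeleft"],
                capture_output=True,
                text=True,
                check=True,
            ).stdout.strip()
        )
        # Assumption: `voms-proxy-info -path` prints the proxy file path.
        path = run(
            ["voms-proxy-info", "-path"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
    except (OSError, ValueError, subprocess.CalledProcessError) as exc:
        raise RuntimeError("No VOMS proxy found") from exc

    if timeleft < min_seconds:
        raise RuntimeError(f"VOMS proxy expires in {timeleft} s, renew it first")
    return path
```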

pocket_coffea.utils.plot_efficiency module#

class pocket_coffea.utils.plot_efficiency.EfficiencyMap(shape, config, year, outputdir, mode='standard')#

Bases: object

compute_efficiency(cat, var, era=None)#

Compute the data and MC efficiencies, the scale factor and the corresponding uncertainties for a given category cat and variation var. To perform the computation for a specific data-taking era, the argument era must also be specified.

define_1d_figures(cat, syst, save_plots=True)#

Define the figures of the 1D histogram plots.

define_datamc(cat, var, era=None)#

Define the data and MC dictionaries used for slicing the histograms self.h_data and self.h_mc.

define_systematics()#

Define the list of systematics, given the variations.

define_variations(syst)#

Define the variations, given a systematic uncertainty syst.

initialize_stack()#

Initialize the lists and dictionaries to save the scale factor corrections in a stack.

plot1d(cat, syst, var, save_plots=True, era=None)#

Function to plot a 1D efficiency or scale factor for a given category cat and variation var. The output plots are saved only if the flag save_plots is set to True. To perform the computation for a specific data-taking era, the argument era must also be specified.

plot2d(cat, syst, var, save_plots=True, era=None)#

Function to plot a 2D efficiency or scale factor for a given category cat and variation var. The output plots are saved only if the flag save_plots is set to True. To perform the computation for a specific data-taking era, the argument era must also be specified.

save1d(save_plots)#

Function to save the 1D plots as png files if save_plots is set to True.

save2d(cat, syst, var, label, save_plots, era=None)#

Function that saves the 2D plots as png files if save_plots is set to True. The category cat, the systematic uncertainty syst and the variation var have to be specified. The argument label is required to specify the map that needs to be plotted.

save_corrections()#

Function to save the dictionary of corrections containing the scale factor value, the x-axis (and y-axis for 2D maps) and the data-taking year.

pocket_coffea.utils.plot_efficiency.plot_efficiency_maps(shape, config, year, outputdir, save_plots=False)#

Function to plot 1D and 2D efficiencies and scale factors and save the corrections dictionaries for the systematic variations included in the input histograms.

pocket_coffea.utils.plot_efficiency.plot_efficiency_maps_splitHT(shape, config, year, outputdir, save_plots=False)#

Function to plot 1D and 2D efficiencies and scale factors and save the corrections dictionaries for the HT systematic variation.

pocket_coffea.utils.plot_efficiency.plot_efficiency_maps_spliteras(shape, config, year, outputdir, save_plots=False)#

Function to plot 1D and 2D efficiencies and scale factors and save the corrections dictionaries for the data-taking era systematic variation.

pocket_coffea.utils.plot_efficiency.plot_ratio(x, y, ynom, yerrnom, xerr, edges, xlabel, ylabel, syst, var, opts, ax, data=False, sf=False, **kwargs)#

Function to plot the uncertainty band corresponding to the variation of an efficiency or scale factor in the ratio plot on an axis ax. To plot the data efficiency variation, the flag data has to be set to True. To plot the scale factor variation, the flag sf has to be set to True.

pocket_coffea.utils.plot_efficiency.plot_residue(x, y, ynom, yerrnom, xerr, edges, xlabel, ylabel, syst, var, opts, ax, data=False, sf=False, **kwargs)#

Function to plot the uncertainty band corresponding to the variation of an efficiency or scale factor in the residue plot on an axis ax. To plot the data efficiency variation, the flag data has to be set to True. To plot the scale factor variation, the flag sf has to be set to True.

pocket_coffea.utils.plot_efficiency.plot_variation(x, y, yerr, xerr, xlabel, ylabel, syst, var, opts, ax, data=False, sf=False, **kwargs)#

Function to plot a variation of an efficiency or scale factor on an axis ax. To plot the data efficiency variation, the flag data has to be set to True. To plot the scale factor variation, the flag sf has to be set to True.

pocket_coffea.utils.plot_efficiency.stack_sum(stack)#

Returns the sum histogram of a stack (hist.stack.Stack) of histograms.

pocket_coffea.utils.plot_efficiency.uncertainty_efficiency(eff, den, sumw2_num=None, sumw2_den=None, mc=False)#

Returns the uncertainty on an efficiency eff=num/den, given the efficiency eff and the denominator den. For an MC efficiency, the sums of the squared weights of the numerator and denominator (sumw2_num, sumw2_den) must also be passed as arguments and the flag mc must be set to True.
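The formula used by the function is not stated here. As an illustration only, the sketch below assumes a binomial uncertainty for unweighted events and standard error propagation on the weighted passing and failing yields for MC. The function names are hypothetical.

```python
import math


def binomial_eff_uncertainty(eff, den):
    """Binomial uncertainty on eff = num/den for unweighted events (assumed)."""
    return math.sqrt(eff * (1.0 - eff) / den)


def weighted_eff_uncertainty(eff, den, sumw2_num, sumw2_den):
    """Uncertainty on a weighted efficiency eff = P / (P + F).

    Error propagation on the passing (P) and failing (F) yields, with
    var(P) = sumw2_num and var(F) = sumw2_den - sumw2_num, gives
    var(eff) = ((1 - 2*eff) * sumw2_num + eff**2 * sumw2_den) / den**2.
    """
    variance = ((1.0 - 2.0 * eff) * sumw2_num + eff**2 * sumw2_den) / den**2
    return math.sqrt(variance)
```

For unit weights (sumw2_num = num, sumw2_den = den) the weighted expression reduces to the binomial one, which is a quick consistency check of the two formulas.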

pocket_coffea.utils.plot_efficiency.uncertainty_sf(eff_data, eff_mc, unc_eff_data, unc_eff_mc)#

Returns the uncertainty on a scale factor given the data and MC efficiency (eff_data, eff_mc) and the corresponding uncertainties (unc_eff_data, unc_eff_mc).
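Assuming that the scale factor is the ratio eff_data/eff_mc and that the two efficiencies are uncorrelated, the uncertainty can be obtained by adding the relative uncertainties in quadrature. The sketch below shows this standard propagation; it is not confirmed to be the exact expression used by uncertainty_sf, and the function name is hypothetical.

```python
import math


def sf_uncertainty_sketch(eff_data, eff_mc, unc_eff_data, unc_eff_mc):
    """Uncertainty on sf = eff_data / eff_mc for uncorrelated inputs (assumed)."""
    sf = eff_data / eff_mc
    # Relative uncertainties of numerator and denominator, summed in quadrature.
    relative = math.hypot(unc_eff_data / eff_data, unc_eff_mc / eff_mc)
    return sf * relative
```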

pocket_coffea.utils.plot_functions module#

pocket_coffea.utils.plot_functions.plot_shapes_comparison(df, var, shapes, title=None, ylog=False, output_folder=None, figsize=(8, 9), dpi=100, lumi_label='$137/fb$ (13 TeV)', outputfile=None)#

This function plots the comparison between different shapes, specified in the format shapes = [(sample, cat, year, variation, label), ...]

The sample, cat and year are used to retrieve the shape from the df; the label is used in the plotting. The ratios of all the shapes with respect to the first one in the list are printed.

The plot is saved if outputfile is not None.

pocket_coffea.utils.plot_sf module#

pocket_coffea.utils.plot_sf.plot_variation_correctionlib(file, axis_x, systematics, plot_dir, **kwargs)#

pocket_coffea.utils.plot_utils module#

class pocket_coffea.utils.plot_utils.PlotManager(variables, hist_objs, datasets_metadata, plot_dir, style_cfg, has_mcstat=True, toplabel=None, only_cat=None, only_year=None, workers=8, log=False, density=False, verbose=1, save=True, index_file=None, cache=True)#

Bases: object

This class manages multiple Shape objects and their plotting.

copy_index_file()#

Copy the specified index file to the plot directory and each of the subdirectories.

make_dirs()#

Create directories recursively before saving plots with multiprocessing to avoid conflicts between different processes.
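The idea of creating all the output directories up front, before any worker process starts saving plots, can be sketched as follows. The directory layout plot_dir/year/category and the function name are assumptions made for illustration, not the layout used by PlotManager.

```python
import os


def make_plot_dirs(plot_dir, years, categories):
    """Create plot_dir/<year>/<category> for every combination.

    exist_ok=True makes the call idempotent, so running it once in the parent
    process means the workers never have to create a directory themselves.
    """
    created = []
    for year in years:
        for cat in categories:
            path = os.path.join(plot_dir, str(year), cat)
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created
```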

plot_comparison(name, ratio=True, format='png')#

Plots one histogram, for all years and categories.

plot_comparison_all(ratio=True, format=<built-in function format>)#

Plots all the histograms contained in the dictionary, for all years and categories.

plot_datamc(name, ratio=True, syst=True, spliteras=False, format='png')#

Plots one histogram, for all years and categories.

plot_datamc_all(ratio=True, syst=True, spliteras=False, format='png')#

Plots all the histograms contained in the dictionary, for all years and categories.

plot_systematic_shifts(shape, format='png', ratio=True)#

Plots the systematic shifts for all the variations.

plot_systematic_shifts_all(format='png', ratio=True)#

Plots the systematic shifts for all the shape objects.

class pocket_coffea.utils.plot_utils.Shape(h_dict, datasets_metadata, name, plot_dir, style_cfg, has_mcstat=True, toplabel=None, only_cat=None, log=False, density=False, year=None, verbose=1, cache=True)#

Bases: object

This class handles the plotting of 1D data/MC histograms. The constructor requires as arguments:
  • h_dict: dictionary of histograms, with the following structure {}
  • name: name that identifies the Shape object.
  • style_cfg: dictionary with style and plotting options.

blind_hist(cat, hist)#
property categorical_axes_data#

Returns the list of categorical axes of a data histogram.

property categorical_axes_mc#

Returns the list of categorical axes of a MC histogram.

define_figure(ratio=True)#

Defines the figure for the Data/MC plot. If ratio is True, a subplot is defined to include the Data/MC ratio plot.

property dense_axes#

Returns the list of dense axes of a histogram, defined as the axes that are not categorical axes.

property dense_dim#

Returns the number of dense axes of a histogram.

filter_samples()#

Filters samples according to the list of samples in the style options. If the option only_samples is specified, only the samples in the list are kept. If the option exclude_samples is specified, the samples in the list are removed. If both options are specified, the samples in the list only_samples are kept, provided they are not in the list exclude_samples.
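The selection logic described above can be sketched with plain lists. This is an illustration that assumes exact name matching; the actual method reads the options from the style configuration and may match names differently.

```python
def filter_samples_sketch(samples, only_samples=None, exclude_samples=None):
    """Keep the samples in only_samples, then drop those in exclude_samples."""
    kept = list(samples)
    if only_samples is not None:
        kept = [s for s in kept if s in only_samples]
    if exclude_samples is not None:
        kept = [s for s in kept if s not in exclude_samples]
    return kept
```

When both options are given, a sample survives only if it is listed in only_samples and not listed in exclude_samples.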

format_figure(cat, ratio=True, ref=None)#

Formats the figure’s axes, labels, ticks, xlim and ylim.

get_axis_items(axis_name, is_mc)#

Returns the list of values contained in a Hist axis.

get_datamc_ratio(cat)#

Computes the data/MC ratio and the corresponding uncertainty.

get_ref_ratios(cat, ref=None)#

Computes the ratios and the corresponding uncertainty between two processes.

group_samples()#

Groups samples according to the dictionary self.style.samples_map

load_attributes()#

Loads the attributes from the dictionary of histograms.

load_syst_manager()#

Loads the attributes from the dictionary of histograms.

plot_compare_ratios(cat, ref, ax=None)#

Plots the ratios as an errorbar plot.

plot_comparison(cat, ratio=True, ax=None, rax=None)#

Plots the comparison of the histograms.

plot_comparison_all(ratio=True, save=True, format='png')#
plot_data(cat, ax=None)#

Plots the data histogram as an errorbar plot.

plot_datamc(cat, ratio=True, syst=True, ax=None, rax=None)#

Plots the data histogram as an errorbar plot on top of the MC stacked histograms. If ratio is True, the Data/MC ratio is also plotted. If syst is True, the total systematic uncertainty is also plotted.

plot_datamc_all(ratio=True, syst=True, spliteras=False, save=True, format='png')#

Plots the data and MC histograms for each year and category contained in the histograms. If ratio is True, the Data/MC ratio is also plotted. If syst is True, the total systematic uncertainty is also plotted.

plot_datamc_ratio(cat, ax=None)#

Plots the Data/MC ratio as an errorbar plot.

plot_mc(cat, ax=None)#

Plots the MC histograms as a stacked plot.

plot_systematic_shifts(cat, syst_name, ratio=True, format='png', save=True)#

Plots the systematic shifts (up/down) of a given systematic uncertainty. The systematic shifts are plotted as a ratio plot if ratio is set to True.

plot_systematic_shifts_all(ratio=True, format='png', save=True)#

Plots the systematic shifts (up/down) of all the systematic uncertainties for a given category.

plot_systematic_uncertainty(cat, ratio=False, ax=None)#

Plots the asymmetric systematic uncertainty band on top of the MC stack, if ratio is set to False. To plot the systematic uncertainty in a ratio plot, ratio has to be set to True and the uncertainty band will be plotted around 1 in the ratio plot.

replace_missing_variations()#

Replaces the missing categories in the MC histograms with the nominal values.

rescale_samples()#
property samples#
property samples_data#
property samples_mc#
property stack_sum_data#

Returns the sum histogram of a stack (hist.stack.Stack) of data histograms.

property stack_sum_mc_nominal#

Returns the sum histogram of a stack (hist.stack.Stack) of MC histograms.

class pocket_coffea.utils.plot_utils.Style(style_cfg)#

Bases: object

This class manages all the style options for Data/MC plots.

set_defaults()#
update(style_cfg)#

Updates the style options with a new dictionary.

class pocket_coffea.utils.plot_utils.SystManager(shape: Shape, style: Style)#

Bases: object

This class handles the systematic uncertainties of 1D MC histograms.

get_syst(syst_name: str, cat: str)#

Returns the SystUnc object corresponding to a given systematic uncertainty, passed as the argument syst_name and for a given category, passed as the argument cat.

mcstat(cat)#
total(cat)#
update(cat, stacks)#

Updates the dictionary of systematic uncertainties with the new cached stacks.

class pocket_coffea.utils.plot_utils.SystUnc(shape: Shape, stacks: dict | None = None, name: str | None = None, syst_list: list | None = None)#

Bases: object

This class stores the information of a single systematic uncertainty of a 1D MC histogram. The built-in __add__() method implements the sum in quadrature of two systematic uncertainties, returning a SystUnc instance corresponding to their sum in quadrature.
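The sum in quadrature implemented through __add__() can be illustrated with a toy class. ToySyst below is hypothetical: it stores per-bin absolute up and down shifts as plain lists, whereas SystUnc works on histograms.

```python
import math
from dataclasses import dataclass


@dataclass
class ToySyst:
    """Toy stand-in for a systematic uncertainty (illustration only)."""

    name: str
    up: list    # per-bin absolute upward shift
    down: list  # per-bin absolute downward shift

    def __add__(self, other):
        if len(self.up) != len(other.up) or len(self.down) != len(other.down):
            raise ValueError("Cannot combine uncertainties with different binning")
        # Bin-by-bin sum in quadrature: sqrt(a**2 + b**2).
        return ToySyst(
            name=f"{self.name}+{other.name}",
            up=[math.hypot(a, b) for a, b in zip(self.up, other.up)],
            down=[math.hypot(a, b) for a, b in zip(self.down, other.down)],
        )
```

Because the operator returns a new instance of the same type, several uncertainties can be chained as a + b + c to build a total.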

check_empty_variations(stacks)#

Method used in the constructor to check if any of the systematic variations is empty.

define_figure(ratio=True, toplabel=None)#

Defines the figure for the systematic shifts plot.

Parameters:

ratio (bool, optional) – plot ratio of shifts and nominal, defaults to True

property down#
format_figure(ratio=True)#

Formats the figure’s axes, labels, ticks, xlim and ylim.

property nsyst#
plot(ratio=True, toplabel=None, log=False)#

Plots the nominal, up and down systematic variations on the same plot.

property ratio_down#
property ratio_up#
property up#
property yaxis_ratio_limit: tuple#

Calculate the limits for the y-axis in the ratio plot.

Returns:

A tuple containing the lower and upper limits for the y-axis.

Return type:

tuple

pocket_coffea.utils.rucio module#

pocket_coffea.utils.rucio.get_dataset_files_from_dbs(dataset_name: str, dbs_instance: str = 'prod/global')#

This function queries the DBS server to get information about the location of each block in a CMS dataset. It is used instead of the rucio replica query when the dataset is not available in rucio.

pocket_coffea.utils.rucio.get_dataset_files_replicas(dataset, allowlist_sites=None, include_redirector=False, blocklist_sites=None, regex_sites=None, mode='full', partial_allowed=False, client=None, scope='cms')#

This function queries the Rucio server to get information about the location of all the replicas of the files in a CMS dataset.

The sites can be filtered in 3 different ways:
  • allowlist_sites: list of sites to select from. If a file is not found there, an Exception is raised.
  • blocklist_sites: list of sites to avoid. If no site is left for a file, an Exception is raised.
  • regex_sites: regular expression restricting the list of sites.

The fileset returned by the function is controlled by the mode parameter:
  • "full": returns the full set of replicas and sites (passing the filtering parameters)
  • "first": returns the first replica found for each file
  • "best": to be implemented (ServiceX..)
  • "roundrobin": tries to distribute the replicas over different sites

Parameters:
  • dataset (str)

  • allowlist_sites (list)

  • blocklist_sites (list)

  • regex_sites (list)

  • mode (str, default "full")

  • client (rucio Client, optional)

  • partial_allowed (bool, default False)

  • scope (rucio scope, "cms")

Returns:

  • files (list) – depending on the mode option. If mode=="full", returns the complete list of replicas for each file in the dataset. If mode=="first", returns only the first replica for each file.

  • sites (list) – depending on the mode option. If mode=="full", returns the list of sites where the file replica is available for each file in the dataset. If mode=="first", returns a list of sites for the first replica of each file.

  • sites_counts (dict) – Metadata counting the coverage of the dataset by site
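The site filtering and the "full" and "first" modes can be sketched without a Rucio connection. The functions below operate on a plain dictionary mapping each file to the sites that host it. They are an illustration of the selection rules only: the function names are hypothetical, regex_sites is assumed to be a single pattern string, and the exception type is an assumption.

```python
import re


def filter_sites(sites, allowlist_sites=None, blocklist_sites=None, regex_sites=None):
    """Apply the three site filters and fail if no site is left."""
    if allowlist_sites is not None:
        sites = [s for s in sites if s in allowlist_sites]
    if blocklist_sites is not None:
        sites = [s for s in sites if s not in blocklist_sites]
    if regex_sites is not None:
        pattern = re.compile(regex_sites)
        sites = [s for s in sites if pattern.search(s)]
    if not sites:
        raise RuntimeError("No site left for this file after filtering")
    return sites


def select_replicas(replicas, mode="full", **filters):
    """Return {file: sites} keeping all sites ("full") or only the first one."""
    selected = {}
    for fname, sites in replicas.items():
        available = filter_sites(sites, **filters)
        if mode == "full":
            selected[fname] = available
        elif mode == "first":
            selected[fname] = available[:1]
        else:
            raise ValueError(f"Mode {mode!r} is not covered by this sketch")
    return selected
```

For example, with replicas = {"a.root": ["T2_IT_Pisa", "T2_DE_DESY"]}, calling select_replicas(replicas, mode="first", blocklist_sites=["T2_IT_Pisa"]) keeps only T2_DE_DESY for a.root.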

pocket_coffea.utils.rucio.get_rucio_client(proxy=None) Client#

Open a client to the CMS rucio server using x509 proxy.

Parameters:

proxy (str, optional) – Use the provided proxy file if given; otherwise use voms-proxy-info to get the currently active one.

Returns:

nativeClient – Rucio client

Return type:

rucio.Client

pocket_coffea.utils.rucio.get_xrootd_sites_map()#

The mapping between RSE (sites) and the xrootd prefix rules is read from /cvmfs/cms/cern.ch/SITECONF/*site*/storage.json.

This function returns the list of xrootd prefix rules for each site.

pocket_coffea.utils.rucio.query_dataset(query: str, client=None, tree: bool = False, datatype='container', scope='cms')#

This function uses the rucio client to query for containers or datasets.

Parameters:
  • query (str) – query to filter datasets / containers with the rucio list_dids functions

  • client (rucio client)

  • tree (bool) – if True, return the results splitting the dataset name in parts

  • datatype (str) – "container" or "dataset", in Rucio terminology: a "container" is a CMS dataset, a "dataset" is a CMS block

  • scope (str) – Rucio scope, "cms" by default

Returns:

  • list of containers/datasets

  • if tree==True, returns the list of datasets and also a dictionary decomposing the dataset names in the 1st command part and a list of available 2nd parts.

pocket_coffea.utils.run module#

pocket_coffea.utils.skim module#

pocket_coffea.utils.skim.copy_file(fname: str, localdir: str, location: str, subdirs: List[str] | None = None)#
pocket_coffea.utils.skim.is_rootcompat(a)#

Is it a flat or 1-d jagged array?

pocket_coffea.utils.skim.save_skimed_dataset_definition(processing_out, fileout, check_initial_events=True)#
pocket_coffea.utils.skim.uproot_writeable(events)#

Restrict to columns that uproot can write compactly

pocket_coffea.utils.utils module#

pocket_coffea.utils.utils.adapt_chunksize(nevents, run_options)#

Helper function to adjust the chunksize so that each worker has at least a chunk to process. If the number of available workers exceeds the maximum number of workers for a given dataset, the chunksize is reduced so that all the available workers are used to process the given dataset.
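The adjustment can be sketched as follows, assuming that the maximum number of workers usable for a dataset equals its number of chunks. The function name and the plain chunksize and workers arguments are hypothetical; the real helper reads its settings from run_options.

```python
import math


def adapt_chunksize_sketch(nevents, chunksize, workers):
    """Shrink chunksize when the dataset has fewer chunks than workers."""
    nchunks = math.ceil(nevents / chunksize)
    if nchunks < workers:
        # Split the events evenly so that every worker gets one chunk.
        return max(1, math.ceil(nevents / workers))
    return chunksize
```

A dataset of 1000 events with a chunksize of 500 would occupy only 2 of 8 workers; reducing the chunksize to 125 yields 8 chunks.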

pocket_coffea.utils.utils.add_to_path(p)#
pocket_coffea.utils.utils.dump_ak_array(akarr: Array, fname: str, location: str, subdirs: List[str] | None = None) None#

Dump an awkward array to disk at location/'/'.join(subdirs)/fname.
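How the destination path is composed from the three arguments can be shown in isolation. The helper below is hypothetical and only builds the string; it assumes forward-slash separators and does not write anything to disk.

```python
def build_dump_path(fname, location, subdirs=None):
    """Return location/<subdir1>/<subdir2>/.../fname as a single string."""
    parts = [location.rstrip("/")]
    parts.extend(subdirs or [])
    parts.append(fname)
    return "/".join(parts)
```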

pocket_coffea.utils.utils.load_config(cfg, do_load=True, save_config=True, outputdir=None)#

Helper function to load a Configurator instance from a user-defined Python module.

pocket_coffea.utils.utils.path_import(absolute_path)#

Module contents#