pocket_coffea.utils package#

Submodules#

pocket_coffea.utils.build_jets_calibrator module#

Code to build the JEC/JER and JES uncertainties, taken from andrzejnovak/boostedhiggs.

pocket_coffea.utils.build_jets_calibrator.build(params)#

Build the factory objects from the list of JEC files for each era for ak4 and ak8 jets and save them on disk in cloudpickle format.

pocket_coffea.utils.configurator module#

class pocket_coffea.utils.configurator.Configurator(workflow, parameters, datasets, skim, preselections, categories, weights, variations, variables, columns=None, workflow_options=None, save_skimmed_files=None)#

Bases: object

Main class driving the configuration of a PocketCoffea analysis. The Configurator groups the several aspects that define an analysis run:

- skims, preselections, categorization
- output: variables and columns
- datasets
- weights and variations
- workflow
- analysis parameters

The running environment configuration is not part of the Configurator class.

filter_dataset(nfiles)#
load_columns_config(wcfg)#
load_cuts_and_categories(skim: list, preselections: list, categories)#

This function loads the list of cuts and groups them in categories. Each cut is identified by a unique id (see Cut class definition)

load_datasets()#
load_subsamples()#
load_variations_config(wcfg, variation_type)#

This function loads the variations definition and prepares a list of weights to be applied for each sample and category

load_weights_config(wcfg)#

This function loads the weights definition and prepares a list of weights to be applied for each sample and category

load_workflow()#
save_config(output)#
pocket_coffea.utils.configurator.format(data, indent=0, width=80, depth=None, compact=True, sort_dicts=True)#

pocket_coffea.utils.dataset module#

class pocket_coffea.utils.dataset.Dataset(name, cfg, sites_cfg=None, append_parents=False)#

Bases: object

check_samples()#
down_file = <parsl.app.python.PythonApp object>#
download()#
get_samples(files)#
save(append=True, overwrite=False, split=False)#
class pocket_coffea.utils.dataset.Sample(name, das_names, sample, metadata, sites_cfg, **kwargs)#

Bases: object

check_files(prefix)#
get_filelist()#

Function to get the dataset filelist from DAS and from Rucio. From DAS we get the general info about the dataset (event count, file size), whereas from rucio we get the specific path at the sites without the redirector (it helps with xrootd access in coffea).

get_parentlist(inplace=False)#

Function to get the parent dataset filelist from DAS. The parent list is included as an additional metadata in the sample’s dict.

get_sample_dict(redirector=True, prefix='root://xrootd-cms.infn.it//')#
pocket_coffea.utils.dataset.build_datasets(cfg, keys=None, overwrite=False, download=False, check=False, split_by_year=False, local_prefix=None, whitelist_sites=None, blacklist_sites=None, regex_sites=None, parallelize=4)#
pocket_coffea.utils.dataset.do_dataset(key, config, local_prefix, whitelist_sites, blacklist_sites, regex_sites, **kwargs)#

pocket_coffea.utils.load_output module#

pocket_coffea.utils.load_output.load_output(file)#

pocket_coffea.utils.logging module#

class pocket_coffea.utils.logging.LogFormatter(color, *args, **kwargs)#

Bases: Formatter

COLOR_CODES = {10: '\x1b[1;30m', 20: '\x1b[0;37m', 30: '\x1b[1;33m', 40: '\x1b[1;31m', 50: '\x1b[1;35m'}#
RESET_CODE = '\x1b[0m'#
format(record, *args, **kwargs)#

Format the specified record as text.

The record’s attribute dictionary is used as the operand to a string formatting operation which yields the returned string. Before formatting the dictionary, a couple of preparatory steps are carried out. The message attribute of the record is computed using LogRecord.getMessage(). If the formatting string uses the time (as determined by a call to usesTime()), formatTime() is called to format the event time. If there is exception information, it is formatted using formatException() and appended to the message.
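The formatting step described above can be combined with the COLOR_CODES and RESET_CODE tables in a few lines. The following is a minimal sketch, not the LogFormatter source: the class name ColorFormatter and the demo record are assumptions made for illustration, and only the standard library logging module is used.

```python
import logging

# Color table copied from the COLOR_CODES / RESET_CODE attributes listed above.
COLOR_CODES = {
    logging.DEBUG: "\x1b[1;30m",
    logging.INFO: "\x1b[0;37m",
    logging.WARNING: "\x1b[1;33m",
    logging.ERROR: "\x1b[1;31m",
    logging.CRITICAL: "\x1b[1;35m",
}
RESET_CODE = "\x1b[0m"


class ColorFormatter(logging.Formatter):
    """Hypothetical stand-in for LogFormatter (illustration only)."""

    def __init__(self, color, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.color = color  # True to wrap messages in ANSI color codes

    def format(self, record):
        # Let logging.Formatter build the text, then add the level's color.
        text = super().format(record)
        if self.color and record.levelno in COLOR_CODES:
            return COLOR_CODES[record.levelno] + text + RESET_CODE
        return text


# Demo record built by hand so the example does not depend on handlers.
record = logging.LogRecord(
    "demo", logging.WARNING, "demo.py", 0, "disk %s", ("full",), None
)
colored = ColorFormatter(True, "%(levelname)s %(message)s").format(record)
plain = ColorFormatter(False, "%(levelname)s %(message)s").format(record)
```

With color disabled the formatter behaves like a plain logging.Formatter, which is the usual choice for log files.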

pocket_coffea.utils.logging.setup_logging(console_log_output, console_log_level, console_log_color, logfile_file, logfile_log_level, logfile_log_color, log_line_template)#

pocket_coffea.utils.network module#

pocket_coffea.utils.network.check_port(port)#
pocket_coffea.utils.network.get_proxy_path() str#

Checks whether the VOMS proxy exists and is valid for at least 1 hour. If it exists, returns its path.

pocket_coffea.utils.plot_efficiency module#

class pocket_coffea.utils.plot_efficiency.EfficiencyMap(shape, config, year, outputdir, mode='standard')#

Bases: object

compute_efficiency(cat, var, era=None)#

Compute the data and MC efficiency, the scale factor and the corresponding uncertainties for a given category cat and a variation var. If the computation has to be performed for a specific data-taking era, also the argument era has to be specified.

define_1d_figures(cat, syst, save_plots=True)#

Define the figures of the 1D histogram plots.

define_datamc(cat, var, era=None)#

Define the data and MC dictionaries used for slicing the histograms self.h_data and self.h_mc.

define_systematics()#

Define the list of systematics, given the variations.

define_variations(syst)#

Define the variations, given a systematic uncertainty syst.

initialize_stack()#

Initialize the lists and dictionaries to save the scale factor corrections in a stack.

plot1d(cat, syst, var, save_plots=True, era=None)#

Function to plot a 1D efficiency or scale factor for a given category cat and a variation var. To save the output plots, the flag save_plots has to be set to True. If the computation has to be performed for a specific data-taking era, also the argument era has to be specified.

plot2d(cat, syst, var, save_plots=True, era=None)#

Function to plot a 2D efficiency or scale factor for a given category cat and a variation var. To save the output plots, the flag save_plots has to be set to True. If the computation has to be performed for a specific data-taking era, also the argument era has to be specified.

save1d(save_plots)#

Function to save the 1D plots as png files if save_plots is set to True.

save2d(cat, syst, var, label, save_plots, era=None)#

Function that saves the 2D plots as png files if save_plots is set to True. The category cat, the systematic uncertainty syst and the variation var have to be specified. The argument label is required to specify the map that needs to be plotted.

save_corrections()#

Function to save the dictionary of corrections containing the scale factor value, the x-axis (and y-axis for 2D maps) and the data-taking year.

pocket_coffea.utils.plot_efficiency.plot_efficiency_maps(shape, config, year, outputdir, save_plots=False)#

Function to plot 1D and 2D efficiencies and scale factors and save the corrections dictionaries for the systematic variations included in the input histograms.

pocket_coffea.utils.plot_efficiency.plot_efficiency_maps_splitHT(shape, config, year, outputdir, save_plots=False)#

Function to plot 1D and 2D efficiencies and scale factors and save the corrections dictionaries for the HT systematic variation.

pocket_coffea.utils.plot_efficiency.plot_efficiency_maps_spliteras(shape, config, year, outputdir, save_plots=False)#

Function to plot 1D and 2D efficiencies and scale factors and save the corrections dictionaries for the data-taking era systematic variation.

pocket_coffea.utils.plot_efficiency.plot_ratio(x, y, ynom, yerrnom, xerr, edges, xlabel, ylabel, syst, var, opts, ax, data=False, sf=False, **kwargs)#

Function to plot the uncertainty band corresponding to the variation of an efficiency or scale factor in the ratio plot on an axis ax. To plot the data efficiency variation, the flag data has to be set to True. To plot the scale factor variation, the flag sf has to be set to True.

pocket_coffea.utils.plot_efficiency.plot_residue(x, y, ynom, yerrnom, xerr, edges, xlabel, ylabel, syst, var, opts, ax, data=False, sf=False, **kwargs)#

Function to plot the uncertainty band corresponding to the variation of an efficiency or scale factor in the residue plot on an axis ax. To plot the data efficiency variation, the flag data has to be set to True. To plot the scale factor variation, the flag sf has to be set to True.

pocket_coffea.utils.plot_efficiency.plot_variation(x, y, yerr, xerr, xlabel, ylabel, syst, var, opts, ax, data=False, sf=False, **kwargs)#

Function to plot a variation of an efficiency or scale factor on an axis ax. To plot the data efficiency variation, the flag data has to be set to True. To plot the scale factor variation, the flag sf has to be set to True.

pocket_coffea.utils.plot_efficiency.stack_sum(stack)#

Returns the sum histogram of a stack (hist.stack.Stack) of histograms.

pocket_coffea.utils.plot_efficiency.uncertainty_efficiency(eff, den, sumw2_num=None, sumw2_den=None, mc=False)#

Returns the uncertainty on an efficiency eff=num/den given the efficiency eff and the denominator den. For the MC efficiency, the sums of the squared weights of the numerator and denominator (sumw2_num, sumw2_den) also have to be passed as arguments and the flag mc has to be set to True.
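As an illustration of how such an uncertainty can be computed, the sketch below uses the binomial formula for unweighted data and the standard weighted-event variance for MC. These formulas are an assumption: the docstring does not state which estimator the library uses, and the function name efficiency_uncertainty is hypothetical.

```python
import math


def efficiency_uncertainty(eff, den, sumw2_num=None, sumw2_den=None, mc=False):
    """Toy uncertainty on eff = num/den (assumed formulas, see lead-in)."""
    if not mc:
        # Unweighted data: binomial variance eff * (1 - eff) / den.
        return math.sqrt(eff * (1.0 - eff) / den)
    # Weighted MC: variance of a ratio of weighted sums, where the passing
    # events are a subset of all events.
    var = ((1.0 - 2.0 * eff) * sumw2_num + eff**2 * sumw2_den) / den**2
    return math.sqrt(var)


# 80 of 100 events pass; with unit weights the two formulas agree.
unc_data = efficiency_uncertainty(0.8, 100.0)
unc_mc = efficiency_uncertainty(
    0.8, 100.0, sumw2_num=80.0, sumw2_den=100.0, mc=True
)
```

With unit weights sumw2_num equals the numerator and sumw2_den the denominator, so the weighted expression reduces to the binomial one, which is a quick consistency check on either formula.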

pocket_coffea.utils.plot_efficiency.uncertainty_sf(eff_data, eff_mc, unc_eff_data, unc_eff_mc)#

Returns the uncertainty on a scale factor given the data and MC efficiency (eff_data, eff_mc) and the corresponding uncertainties (unc_eff_data, unc_eff_mc).
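A common way to obtain such an uncertainty is first-order error propagation on the ratio eff_data/eff_mc, treating the two efficiencies as uncorrelated. The sketch below shows that calculation under this assumption; it is not taken from the library source, and the function name is hypothetical.

```python
import math


def scale_factor_uncertainty(eff_data, eff_mc, unc_eff_data, unc_eff_mc):
    """Toy propagation for sf = eff_data / eff_mc, uncorrelated inputs."""
    sf = eff_data / eff_mc
    # Relative uncertainties add in quadrature for a ratio.
    rel = math.hypot(unc_eff_data / eff_data, unc_eff_mc / eff_mc)
    return sf * rel


# Relative uncertainties of 3% and 4% combine to 5% on a scale factor of 1.
sf_unc = scale_factor_uncertainty(0.5, 0.5, 0.015, 0.02)
```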

pocket_coffea.utils.plot_sf module#

pocket_coffea.utils.plot_sf.plot_variation_correctionlib(file, axis_x, systematics, plot_dir, **kwargs)#

pocket_coffea.utils.plot_utils module#

class pocket_coffea.utils.plot_utils.PlotManager(variables, hist_objs, datasets_metadata, plot_dir, style_cfg, toplabel=None, only_cat=None, workers=8, log=False, density=False, verbose=1, save=True)#

Bases: object

This class manages multiple Shape objects and their plotting.

make_dirs()#

Create directories recursively before saving plots with multiprocessing to avoid conflicts between different processes.
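The idea of creating every output directory up front, before worker processes start writing, can be sketched as follows. The function signature and the directory layout (one subdirectory per category) are assumptions for illustration and do not reproduce PlotManager.make_dirs().

```python
import os
import tempfile


def make_dirs(plot_dir, categories):
    """Create one output directory per category (hypothetical layout)."""
    paths = []
    for cat in categories:
        path = os.path.join(plot_dir, cat)
        # exist_ok=True makes the call safe to repeat and creates parents.
        os.makedirs(path, exist_ok=True)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    created = make_dirs(tmp, ["baseline", "signal/tight"])
    all_exist = all(os.path.isdir(p) for p in created)
    make_dirs(tmp, ["baseline"])  # a second call does not raise
```

Doing this once in the parent process means the workers only ever write files into directories that already exist.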

plot_datamc(name, syst=True, spliteras=False)#

Plots one histogram, for all years and categories.

plot_datamc_all(syst=True, spliteras=False)#

Plots all the histograms contained in the dictionary, for all years and categories.

class pocket_coffea.utils.plot_utils.Shape(h_dict, datasets_metadata, name, plot_dir, style_cfg, toplabel=None, only_cat=None, log=False, density=False, verbose=1)#

Bases: object

This class handles the plotting of 1D data/MC histograms. The constructor requires as arguments:

- h_dict: dictionary of histograms, with the following structure {}
- name: name that identifies the Shape object.
- style_cfg: dictionary with style and plotting options.

property categorical_axes_data#

Returns the list of categorical axes of a data histogram.

property categorical_axes_mc#

Returns the list of categorical axes of a MC histogram.

define_figure(ratio=True)#

Defines the figure for the Data/MC plot. If ratio is True, a subplot is defined to include the Data/MC ratio plot.

property dense_axes#

Returns the list of dense axes of a histogram, defined as the axes that are not categorical axes.

property dense_dim#

Returns the number of dense axes of a histogram.

exclude_samples()#
format_figure(cat, ratio=True)#

Formats the figure’s axes, labels, ticks, xlim and ylim.

get_axis_items(axis_name, is_mc)#

Returns the list of values contained in a Hist axis.

get_datamc_ratio(cat)#

Computes the data/MC ratio and the corresponding uncertainty.

group_samples()#

Groups samples according to the dictionary self.style.samples_map

load_attributes()#

Loads the attributes from the dictionary of histograms.

plot_data(cat, ax=None)#

Plots the data histogram as an errorbar plot.

plot_datamc(cat, ratio=True, syst=True, ax=None, rax=None)#

Plots the data histogram as an errorbar plot on top of the MC stacked histograms. If ratio is True, also the Data/MC ratio plot is plotted. If syst is True, also the total systematic uncertainty is plotted.

plot_datamc_all(ratio=True, syst=True, spliteras=False, save=True)#

Plots the data and MC histograms for each year and category contained in the histograms. If ratio is True, also the Data/MC ratio plot is plotted. If syst is True, also the total systematic uncertainty is plotted.

plot_datamc_ratio(cat, ax=None)#

Plots the Data/MC ratio as an errorbar plot.

plot_mc(cat, ax=None)#

Plots the MC histograms as a stacked plot.

plot_systematic_uncertainty(cat, ratio=False, ax=None)#

Plots the asymmetric systematic uncertainty band on top of the MC stack, if ratio is set to False. To plot the systematic uncertainty in a ratio plot, ratio has to be set to True and the uncertainty band will be plotted around 1 in the ratio plot.

rescale_samples()#
property samples#
property samples_data#
property samples_mc#
property stack_sum_data#

Returns the sum histogram of a stack (hist.stack.Stack) of data histograms.

property stack_sum_mc_nominal#

Returns the sum histogram of a stack (hist.stack.Stack) of MC histograms.

class pocket_coffea.utils.plot_utils.Style(style_cfg)#

Bases: object

This class manages all the style options for Data/MC plots.

set_defaults()#
update(style_cfg)#

Updates the style options with a new dictionary.

class pocket_coffea.utils.plot_utils.SystManager(shape: Shape, style: Style, has_mcstat=True)#

Bases: object

This class handles the systematic uncertainties of 1D MC histograms.

get_syst(syst_name: str, cat: str)#

Returns the SystUnc object corresponding to a given systematic uncertainty, passed as the argument syst_name and for a given category, passed as the argument cat.

mcstat(cat)#
total(cat)#
update()#

Updates the dictionary of systematic uncertainties with the new cached stacks.

class pocket_coffea.utils.plot_utils.SystUnc(style: Style, stacks: dict | None = None, name: str | None = None, syst_list: list | None = None)#

Bases: object

This class stores the information of a single systematic uncertainty of a 1D MC histogram. The built-in __add__() method implements the sum in quadrature of two systematic uncertainties, returning a SystUnc instance corresponding to their sum in quadrature.
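The sum in quadrature via __add__() can be illustrated with a small stand-in class. This is a sketch only: ToyUnc and its fields are hypothetical, and storing the up and down variations as per-bin absolute deviations from the nominal is an assumption, not the SystUnc data model.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyUnc:
    """Hypothetical stand-in for SystUnc holding per-bin deviations."""

    up: list
    down: list

    def __add__(self, other):
        # Combine bin by bin: sqrt(a**2 + b**2) for each pair of deviations.
        return ToyUnc(
            up=[math.hypot(a, b) for a, b in zip(self.up, other.up)],
            down=[math.hypot(a, b) for a, b in zip(self.down, other.down)],
        )


# Two uncertainties over two bins, combined with the + operator.
total = ToyUnc([3.0, 6.0], [3.0, 8.0]) + ToyUnc([4.0, 8.0], [4.0, 6.0])
```

Because the result is again a ToyUnc, several uncertainties can be chained with + or reduced with sum-like loops.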

property down#
property nsyst#
plot(ax=None)#

Plots the nominal, up and down systematic variations on the same plot.

property ratio_down#
property ratio_up#
property up#

pocket_coffea.utils.rucio module#

pocket_coffea.utils.rucio.get_dataset_files(dataset, whitelist_sites=None, blacklist_sites=None, regex_sites=None, output='first')#

This function queries the Rucio server to get information about the location of all the replicas of the files in a CMS dataset.

The sites can be filtered in 3 different ways:

- whitelist_sites: list of sites to select from. If the file is not found there, raise an Exception.
- blacklist_sites: list of sites to avoid. If no site is left for a file, raise an Exception.
- regex_sites: regular expression to restrict the list of sites.

The function can return all the possible sites for each file (output="all") or the first site found for each file (output="first", the default).
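The filtering rules above can be sketched on a plain dictionary of replicas, without contacting a Rucio server. Everything in this example is an assumption for illustration: the function filter_sites, the replica dictionary and its file names are hypothetical, and the real function may combine the three filters differently.

```python
import re


def filter_sites(replicas, whitelist_sites=None, blacklist_sites=None,
                 regex_sites=None, output="first"):
    """Toy site filter over {file name: [site, ...]} (not the Rucio query)."""
    selected = {}
    for fname, sites in replicas.items():
        keep = list(sites)
        if whitelist_sites is not None:
            keep = [s for s in keep if s in whitelist_sites]
        if blacklist_sites is not None:
            keep = [s for s in keep if s not in blacklist_sites]
        if regex_sites is not None:
            keep = [s for s in keep if re.search(regex_sites, s)]
        if not keep:
            raise RuntimeError(f"No site left for file {fname}")
        # "all" keeps every surviving site, "first" only the first one.
        selected[fname] = keep if output == "all" else keep[0]
    return selected


# Hand-written toy replica map.
replicas = {
    "/store/a.root": ["T2_IT_Pisa", "T2_US_MIT"],
    "/store/b.root": ["T2_US_MIT", "T1_US_FNAL_Disk"],
}
first = filter_sites(replicas, blacklist_sites=["T2_IT_Pisa"])
all_us = filter_sites(replicas, regex_sites=r"^T\d_US_", output="all")
```

Restricting the whitelist to a site that does not host every file, for example whitelist_sites=["T2_IT_Pisa"] here, raises the error described in the list above.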

pocket_coffea.utils.rucio.get_dataset_files_from_dbs(dataset_name: str, dbs_instance: str = 'prod/global')#

This function queries the DBS server to get information about the location of each block in a CMS dataset. It is used instead of the rucio replica query when the dataset is not available in rucio.

pocket_coffea.utils.rucio.get_rucio_client()#
pocket_coffea.utils.rucio.get_xrootd_sites_map()#

pocket_coffea.utils.run module#

pocket_coffea.utils.skim module#

pocket_coffea.utils.skim.copy_file(fname: str, localdir: str, location: str, subdirs: List[str] | None = None)#
pocket_coffea.utils.skim.is_rootcompat(a)#

Is it a flat or 1-d jagged array?

pocket_coffea.utils.skim.uproot_writeable(events)#

Restrict to columns that uproot can write compactly

pocket_coffea.utils.utils module#

pocket_coffea.utils.utils.add_to_path(p)#
pocket_coffea.utils.utils.dump_ak_array(akarr: Array, fname: str, location: str, subdirs: List[str] | None = None) None#

Dump an awkward array to disk at location/'/'.join(subdirs)/fname.
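The destination path in the docstring can be spelled out as follows. This sketch only builds the path string; the helper name target_path and the example file name are hypothetical, and no array is written.

```python
import posixpath


def target_path(fname, location, subdirs=None):
    """Build location/'/'.join(subdirs)/fname with forward slashes."""
    subdirs = subdirs or []  # treat None as no intermediate directories
    return posixpath.join(location, *subdirs, fname)


nested = target_path("array_0.out", "out", ["2018", "ttbar"])
flat = target_path("array_0.out", "out")
```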

pocket_coffea.utils.utils.path_import(absolute_path)#

Module contents#