pocket_coffea.lib package#

Submodules#

pocket_coffea.lib.categorization module#

This module defines several helper classes to handle Masks and Selections.

  • The MaskStorage is a generalization of the PackedSelection utility, able to effectively store dim=2 masks.

  • MultiCut: objects which split the events into a defined number of subcategories.

  • StandardSelection: handles the definition of categories from a dictionary of Cut objects. All the cuts are handled by a single MaskStorage. The StandardSelection object stores which cuts are applied in each category.

  • CartesianSelection: handles the definition of the cartesian product of categories. The class keeps a list of MultiCut objects, each defining a set of subcategories. Then, it automatically defines categories which are the cartesian products of the categories defined by each MultiCut. A StandardSelection object can be embedded in the CartesianSelection to define categories not used in the cartesian product.

class pocket_coffea.lib.categorization.CartesianSelection(multicuts: List[MultiCut], common_cats: StandardSelection | None = None)#

Bases: object

The CartesianSelection class is needed for a special type of categorization:

  • you need to fully loop over the combinations of different binnings;

  • you can express each binning with a MultiCut object.

The cartesian product of a list of MultiCut objects is built automatically: the class provides a generator computing the AND of the masks on the fly. Computed masks are cached for reuse between calls to prepare().

Common categories to be applied outside of the cartesian product can be defined with a StandardSelection object.

The Cuts can be on events or on collections (dim=2). The StandardSelection is independent from the MultiCut cartesian product. If one of the MultiCuts is dim=2, the whole cartesian product will be multidimensional.
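The cartesian-product construction can be sketched in plain Python (the bin labels below are invented for illustration; the real class derives the categories from its MultiCut objects):

```python
# Illustrative sketch: how two binnings combine into the final categories,
# analogous to what CartesianSelection builds from its MultiCut objects.
from itertools import product

njet_bins = ["2j", "3j", "4j"]   # subcategories of a first, hypothetical MultiCut
nbjet_bins = ["1b", "2b"]        # subcategories of a second MultiCut

categories = ["_".join(combo) for combo in product(njet_bins, nbjet_bins)]
# categories -> ['2j_1b', '2j_2b', '3j_1b', '3j_2b', '4j_1b', '4j_2b']
```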

get_mask(category)#
get_masks()#
items()#
keys()#
prepare(events, processor_params, **kwargs)#
serialize()#
property template_mask#
class pocket_coffea.lib.categorization.MaskStorage(dim=1, counts=None)#

Bases: object

The MaskStorage class stores multidimensional cuts in a PackedSelection: it does so by flattening the mask and keeping track of the element counts for the unflattening. Dim=1 and dim=2 cuts can be stored together if the storage is initialized with dim=2.
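A minimal pure-Python sketch of this bookkeeping (not the real MaskStorage API, which works on awkward arrays) may help:

```python
# Sketch of the flatten/unflatten bookkeeping described above: a dim=2 mask
# is stored flat together with the per-event counts, and rebuilt on demand.
jet_mask = [[True, False], [True], [True, True, False]]  # dim=2 mask (per jet)
counts = [len(row) for row in jet_mask]                  # [2, 1, 3]
flat = [value for row in jet_mask for value in row]      # what gets stored

# unflatten using the stored counts
rebuilt, start = [], 0
for n in counts:
    rebuilt.append(flat[start:start + n])
    start += n
assert rebuilt == jet_mask
```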

add(id, mask)#
all(cut_ids, unflatten=True)#
property masks#
property names#
class pocket_coffea.lib.categorization.MultiCut(name: str, cuts: List[Cut], cuts_names: List[str] | None = None)#

Bases: object

Class for keeping track of a list of cuts and their masks by index. The class is built from a list of Cut objects and cut names.

This class just wraps a list of Cuts to be used along with the CartesianSelection class. This class is useful to build a 1D binning to be combined with other binnings with a CartesianSelection.

The prepare() method instantiates a PackedSelection object to keep track of the masks. The current implementation fails if there are more than 64 cuts.

The single masks can be retrieved by their index in the list with the function get_mask(cut_index).

get_mask(cut_index)#
property ncuts#
prepare(events, processor_params, **kwargs)#
serialize()#
class pocket_coffea.lib.categorization.StandardSelection(categories)#

Bases: object

The StandardSelection class defines the simplest categorization.

Each category is identified by a label string and has a list of Cut objects: the AND of the Cut objects defines the category mask.

The class stores the masks of all the Cut objects in a single MaskStorage to reduce the memory overhead. Then, when a category mask is requested, the Cut objects associated with the category are used.

The object can handle a mixture of dim=1 and dim=2 Cut objects. The MaskStorage is instantiated automatically with dim=2 if at least one Cut is defined on a collection other than “events”.
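The category logic can be sketched as follows (stand-in lists and hypothetical cut names; the real class operates on awkward-array masks):

```python
# Each category lists its cuts; all cut masks live in one shared store,
# and the category mask is the AND of the listed cuts.
cut_masks = {
    "one_lep":  [True, True, False, True],
    "four_jet": [True, False, False, True],
}
categories = {"baseline": ["one_lep"], "signal": ["one_lep", "four_jet"]}

def get_mask(category):
    masks = [cut_masks[cut] for cut in categories[category]]
    return [all(values) for values in zip(*masks)]

assert get_mask("signal") == [True, False, False, True]
```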

get_mask(category)#
get_masks()#
items()#
keys()#
prepare(events, processor_params, **kwargs)#
serialize()#
property template_mask#

pocket_coffea.lib.columns_manager module#

class pocket_coffea.lib.columns_manager.ColOut(collection: str, columns: List[str], flatten: bool = True, store_size: bool = True, fill_none: bool = True, fill_value: float = -999.0, pos_start: int = None, pos_end: int = None)#

Bases: object

collection: str#
columns: List[str]#
fill_none: bool = True#
fill_value: float = -999.0#
flatten: bool = True#
pos_end: int = None#
pos_start: int = None#
store_size: bool = True#
class pocket_coffea.lib.columns_manager.ColumnsManager(cfg, categories_config)#

Bases: object

add_column(cfg: ColOut, categories=None)#
fill_ak_arrays(events, cuts_masks, subsample_mask=None, weights_manager=None)#
fill_columns_accumulators(events, cuts_masks, subsample_mask=None, weights_manager=None)#
property ncols#

pocket_coffea.lib.cut_definition module#

class pocket_coffea.lib.cut_definition.Cut(name: str, params: dict, function: Callable, collection: str = 'events')#

Bases: object

Class for keeping track of a cut function and its parameters.

Parameters:
  • name – name of the cut

  • params – dictionary of parameters passed to the cut function.

  • collection – collection that the cut is applied on. If “events”, the mask will be 1-D. If e.g. “Jet”, the mask will be dim=2, to be applied on the Jet collection.

  • function – function defining the cut code. Signature fn(events, params, **kwargs)
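A hypothetical cut function with this signature, exercised on a tiny stand-in events record (the real framework passes awkward arrays; the selection itself is invented for illustration):

```python
# Hypothetical cut function following the fn(events, params, **kwargs)
# signature documented above.
def min_jets(events, params, **kwargs):
    # collection="events" -> dim=1 mask: one boolean per event
    return [njet >= params["njet"] for njet in events["nJet"]]

events = {"nJet": [2, 4, 5, 3]}
mask = min_jets(events, {"njet": 4})
assert mask == [False, True, True, False]
```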

collection: str = 'events'#
function: Callable#
get_mask(events, processor_params, **kwargs)#

This function is called by the processor, and the params are passed by default as the second argument. Additional parameters, such as the year or the sample name, can be included by the processor and are passed to the function.

property id#

The id property must be used inside the framework to identify the cut instead of the name. It represents the cut in a human-readable way, while also taking into account the hash value for uniqueness.
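The idea can be sketched like this (the exact hashing scheme used by PocketCoffea may differ; this is an illustrative stand-in):

```python
# Sketch: combine the human-readable name with a hash of the parameters so
# that two cuts with the same name but different params remain distinct.
import hashlib
import json

def cut_id(name, params):
    digest = hashlib.md5(json.dumps(params, sort_keys=True).encode()).hexdigest()[:8]
    return f"{name}__{digest}"

a = cut_id("njet_min", {"N": 4})
b = cut_id("njet_min", {"N": 6})
assert a != b and a.startswith("njet_min__")
```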

name: str#
params: dict#
serialize(src_code=False)#

pocket_coffea.lib.cut_functions module#

pocket_coffea.lib.cut_functions.count_objects_eq(events, params, year, sample)#

Count the number of objects in params[“object”] and keep only events where the count is equal to (==) params[“value”].

pocket_coffea.lib.cut_functions.count_objects_gt(events, params, **kwargs)#

Count the number of objects in params[“object”] and keep only events where the count is greater than (>) params[“value”].

pocket_coffea.lib.cut_functions.count_objects_lt(events, params, year, sample)#

Count the number of objects in params[“object”] and keep only events where the count is smaller than (<) params[“value”].

pocket_coffea.lib.cut_functions.eq_nObj(events, params, **kwargs)#
pocket_coffea.lib.cut_functions.eq_nObj_minPt(events, params, **kwargs)#
pocket_coffea.lib.cut_functions.get_HLTsel(primaryDatasets=None, invert=False)#

Create the HLT trigger mask

The Cut function reads the trigger configuration and creates the mask. For MC, the OR of all the triggers in the specific configuration key is performed. For DATA, only the corresponding primary-dataset triggers are applied. If the primaryDatasets param is passed, the corresponding triggers are applied on both DATA and MC, overwriting any other configuration.

This is useful to remove the overlap of primary datasets in data.

Parameters:
  • primaryDatasets – (optional) list of primaryDatasets to use. Overwrites any other config both for Data and MC

  • invert – invert the mask, if True the function returns events failing the HLT selection

Returns:

events mask
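The OR-of-triggers logic can be sketched with stand-in per-event trigger bits (hypothetical HLT path names):

```python
# Each HLT path contributes a per-event boolean; the event mask is their OR.
hlt_bits = {
    "HLT_IsoMu24": [True, False, False, True],
    "HLT_Ele32_WPTight_Gsf": [False, False, True, False],
}
mask = [any(bits) for bits in zip(*hlt_bits.values())]
assert mask == [True, False, True, True]

# invert=True would instead select events failing every trigger
inverted = [not m for m in mask]
assert inverted == [False, True, False, False]
```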

pocket_coffea.lib.cut_functions.get_JetVetoMap(name='JetVetoMaps')#
pocket_coffea.lib.cut_functions.get_JetVetoMap_Mask(events, params, year, processor_params, sample, isMC, **kwargs)#
pocket_coffea.lib.cut_functions.get_nBtag(*args, **kwargs)#
pocket_coffea.lib.cut_functions.get_nBtagEq(N, minpt=0, coll='BJetGood', wp='M', name=None)#
pocket_coffea.lib.cut_functions.get_nBtagMin(N, minpt=0, coll='BJetGood', wp='M', name=None)#
pocket_coffea.lib.cut_functions.get_nElectron(N, minpt=0, coll='ElectronGood', name=None)#
pocket_coffea.lib.cut_functions.get_nMuon(N, minpt=0, coll='MuonGood', name=None)#
pocket_coffea.lib.cut_functions.get_nObj_eq(N, minpt=None, coll='JetGood', name=None)#

Factory function which creates a cut for == number of objects. Optionally a minimum pT can be requested.

Parameters:
  • N – request == N objects

  • coll – collection to use

  • minpt – minimum pT

  • name – name for the cut; by default it is built as n{coll}_eq{N}_pt{minpt}

Returns:

a Cut object

pocket_coffea.lib.cut_functions.get_nObj_less(N, coll='JetGood', name=None)#

Factory function which creates a cut for < number of objects.

Parameters:
  • N – request < N objects

  • coll – collection to use

  • name – name for the cut; by default it is built as n{coll}_less{N}

Returns:

a Cut object

pocket_coffea.lib.cut_functions.get_nObj_min(N, minpt=None, coll='JetGood', name=None)#

Factory function which creates a cut for a minimum number of objects. Optionally a minimum pT can be requested.

Parameters:
  • N – request >= N objects

  • coll – collection to use

  • minpt – minimum pT

  • name – name for the cut; by default it is built as n{coll}_min{N}_pt{minpt}

Returns:

a Cut object
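The factory pattern behind these helpers can be sketched as follows (stand-in code returning a plain callable rather than a real Cut object):

```python
# The factory bakes N, minpt and coll into the cut, and builds the default
# name in the documented n{coll}_min{N}_pt{minpt} form.
def make_nobj_min(N, minpt=None, coll="JetGood", name=None):
    name = name or f"n{coll}_min{N}_pt{minpt}"

    def cut(events):
        objs = events[coll]
        if minpt is not None:
            objs = [[pt for pt in row if pt >= minpt] for row in objs]
        return [len(row) >= N for row in objs]

    return name, cut

# per-event jet pT lists (invented numbers)
events = {"JetGood": [[45.0, 32.0], [28.0], [60.0, 41.0, 25.0]]}
name, cut = make_nobj_min(2, minpt=30, coll="JetGood")
assert name == "nJetGood_min2_pt30"
assert cut(events) == [True, False, True]
```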

pocket_coffea.lib.cut_functions.less_nObj(events, params, **kwargs)#
pocket_coffea.lib.cut_functions.min_nObj(events, params, **kwargs)#
pocket_coffea.lib.cut_functions.min_nObj_minPt(events, params, **kwargs)#
pocket_coffea.lib.cut_functions.nBtagEq(events, params, year, processor_params, **kwargs)#

Mask for == N jets with minpt and passing b-tagging. The btag params come from the processor, not from the parameters.

pocket_coffea.lib.cut_functions.nBtagMin(events, params, year, processor_params, **kwargs)#

Mask for min N jets with minpt and passing b-tagging. The btag params come from the processor, not from the parameters.

pocket_coffea.lib.cut_functions.nElectron(events, params, year, **kwargs)#

Mask for min N electrons with minpt.

pocket_coffea.lib.cut_functions.nMuon(events, params, year, **kwargs)#

Mask for min N muons with minpt.

pocket_coffea.lib.cut_functions.passthrough_f(events, **kargs)#

Identity cut: passthrough of all events.

pocket_coffea.lib.deltaR_matching module#

pocket_coffea.lib.deltaR_matching.deltaR_matching_nonunique(obj1, obj2, radius=0.4)#

This keeps the assignment to the obj2 collection unique, but the uniqueness of the matching to the first collection is not checked.
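The ΔR metric underlying the matching can be sketched in plain Python (the module itself works on awkward arrays):

```python
import math

# deltaR = sqrt(deta^2 + dphi^2), with dphi wrapped into [-pi, pi]
def delta_r(eta1, phi1, eta2, phi2):
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

# two objects close in (eta, phi) match within the default radius 0.4
assert delta_r(0.10, 0.20, 0.15, 0.25) < 0.4
# the phi wrap-around near +-pi is handled by the modulo
assert delta_r(0.10, 3.10, 0.10, -3.10) < 0.4
```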

pocket_coffea.lib.deltaR_matching.get_matching_objects_indices_padnone(idx_matched_obj, idx_matched_obj2, maxdim_obj2, deltaR, builder, builder2, builder3)#
pocket_coffea.lib.deltaR_matching.get_matching_pairs_indices(idx_1, idx_2, builder, builder2)#
pocket_coffea.lib.deltaR_matching.metric_eta(obj, obj2)#
pocket_coffea.lib.deltaR_matching.metric_phi(obj, obj2)#
pocket_coffea.lib.deltaR_matching.metric_pt(obj, obj2)#
pocket_coffea.lib.deltaR_matching.object_matching(obj, obj2, dr_min, dpt_max=None, return_indices=False)#

pocket_coffea.lib.hist_manager module#

class pocket_coffea.lib.hist_manager.Axis(field: str, label: str, bins: int = None, start: float = None, stop: float = None, coll: str = 'events', name: str = None, pos: int = None, type: str = 'regular', transform: str = None, lim: Tuple[float] = (0, 0), underflow: bool = True, overflow: bool = True, growth: bool = False)#

Bases: object

bins: int = None#
coll: str = 'events'#
field: str#
growth: bool = False#
label: str#
lim: Tuple[float] = (0, 0)#
name: str = None#
overflow: bool = True#
pos: int = None#
start: float = None#
stop: float = None#
transform: str = None#
type: str = 'regular'#
underflow: bool = True#
class pocket_coffea.lib.hist_manager.HistConf(axes: List[pocket_coffea.lib.hist_manager.Axis], storage: str = 'weight', autofill: bool = True, variations: bool = True, only_variations: List[str] = None, exclude_samples: List[str] = None, only_samples: List[str] = None, exclude_categories: List[str] = None, only_categories: List[str] = None, no_weights: bool = False, metadata_hist: bool = False)#

Bases: object

autofill: bool = True#
axes: List[Axis]#
collapse_2D_masks = False#
collapse_2D_masks_mode = 'OR'#
exclude_categories: List[str] = None#
exclude_samples: List[str] = None#
hist_obj = None#
metadata_hist: bool = False#
no_weights: bool = False#
only_categories: List[str] = None#
only_samples: List[str] = None#
only_variations: List[str] = None#
serialize()#
storage: str = 'weight'#
variations: bool = True#
class pocket_coffea.lib.hist_manager.HistManager(hist_config, year, sample, subsamples, categories_config, variations_config, processor_params, custom_axes=None, isMC=True)#

Bases: object

fill_histograms(events, weights_manager, categories, shape_variation='nominal', subsamples=None, custom_fields=None, custom_weight=None)#

We loop only over the configured histograms. Doing so, the category, sample, and variation selections are handled correctly (by the constructor).

custom_fields is a dict of additional arrays. The expected length of the first dimension is the number of events. The category masks will be applied.

get_histogram(subsample, name)#
get_histograms(subsample)#
get_metadata_histograms(subsample)#
pocket_coffea.lib.hist_manager.get_hist_axis_from_config(ax: Axis)#
pocket_coffea.lib.hist_manager.mask_and_broadcast_weight(category, subsample, variation, weight, mask, data_structure)#
pocket_coffea.lib.hist_manager.weights_cache(fun)#

Function decorator to cache the weight calculation when the weights are ndim=1 on a data_structure of ndim=1. The weight is cached by (category, subsample, variation).
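The caching idea can be sketched with a stand-in decorator (hypothetical and simplified relative to the real implementation):

```python
# Memoize a weight computation on its (category, subsample, variation) key,
# so repeated fills within the same chunk reuse the broadcast weight.
def weights_cache(fun):
    cache = {}

    def wrapper(category, subsample, variation, weight):
        key = (category, subsample, variation)
        if key not in cache:
            cache[key] = fun(category, subsample, variation, weight)
        return cache[key]

    return wrapper

calls = []

@weights_cache
def broadcast_weight(category, subsample, variation, weight):
    calls.append(1)          # count actual computations
    return [weight] * 3      # stand-in for the broadcast

broadcast_weight("cat1", "ttbar", "nominal", 1.5)
broadcast_weight("cat1", "ttbar", "nominal", 1.5)  # served from the cache
assert len(calls) == 1
```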

pocket_coffea.lib.jets module#

pocket_coffea.lib.jets.CvsLsorted(jets, ctag)#
pocket_coffea.lib.jets.add_jec_variables(jets, event_rho)#
pocket_coffea.lib.jets.btagging(Jet, btag, wp)#
pocket_coffea.lib.jets.get_dijet(jets)#
pocket_coffea.lib.jets.jet_correction(params, events, jets, factory, jet_type, year, cache)#
pocket_coffea.lib.jets.jet_correction_correctionlib(events, Jet, typeJet, year, JECversion, JERversion=None, verbose=False)#

This function implements the jet energy corrections and jet energy smearing using factors from the correctionlib common-POG JSON file; see the example here: https://gitlab.cern.ch/cms-nanoAOD/jsonpog-integration/-/blob/master/examples/jercExample.py

pocket_coffea.lib.jets.jet_selection(events, jet_type, params, leptons_collection='')#
pocket_coffea.lib.jets.load_jet_factory(params)#
pocket_coffea.lib.jets.met_correction(params, MET, jets)#

pocket_coffea.lib.leptons module#

pocket_coffea.lib.leptons.get_charged_leptons(electrons, muons, charge, mask)#
pocket_coffea.lib.leptons.get_diboson(dileptons, MET, transverse=False)#
pocket_coffea.lib.leptons.get_dilepton(electrons, muons, transverse=False)#
pocket_coffea.lib.leptons.lepton_selection(events, lepton_flavour, params)#

pocket_coffea.lib.objects module#

pocket_coffea.lib.parton_provenance module#

pocket_coffea.lib.parton_provenance.get_partons_provenance_tt5F(pdgIds, array_builder)#

2=hadronic top bquark, 3=leptonic top bquark, 4=additional radiation 5=hadronic W (from top) decay quarks

pocket_coffea.lib.parton_provenance.get_partons_provenance_ttHbb(pdgIds, array_builder)#

1=higgs, 2=hadronic top bquark, 3=leptonic top bquark, 4=additional radiation 5=hadronic W (from top) decay quarks

pocket_coffea.lib.parton_provenance.get_partons_provenance_ttbb4F(pdgIds, array_builder)#

1=g->bb, 2=hadronic top bquark, 3=leptonic top bquark, 4=additional radiation 5=hadronic W (from top) decay quarks

pocket_coffea.lib.reconstruction module#

pocket_coffea.lib.scale_factors module#

pocket_coffea.lib.scale_factors.get_ele_sf(params, year, pt, eta, counts=None, key='', pt_region=None, variations=['nominal'])#

This function computes the per-electron reco or id SF. If ‘reco’, the appropriate corrections are chosen by using the argument pt_region.

pocket_coffea.lib.scale_factors.get_mu_sf(params, year, pt, eta, counts, key='')#

This function computes the per-muon id or iso SF.

pocket_coffea.lib.scale_factors.sf_L1prefiring(events)#

Correction due to the wrong association of L1 trigger primitives (TP) in ECAL to the previous bunch crossing, also known as “L1 prefiring”. The event weights produced by the latest version of the producer are included in nanoAOD starting from version V9. The function returns the nominal, up and down L1 prefiring weights.

pocket_coffea.lib.scale_factors.sf_btag(params, jets, year, njets, variations=['central'])#

DeepJet AK4 b-tagging SF. See https://cms-nanoaod-integration.web.cern.ch/commonJSONSFs/summaries/BTV_2018_UL_btagging.html The scale factors have 8 default uncertainty sources (hf, lf, hfstats1/2, lfstats1/2, cferr1/2), each with up_<var> and down_<var> variations. All except the cferr1/2 uncertainties are to be applied to light and b jets; the cferr1/2 uncertainties are to be applied to c jets. The hf/lfstats1/2 uncertainties are decorrelated between years, the others correlated. Additional JES-varied scale factors are supplied, to be applied for the JES variations.

If the variation is not one of the JES ones, both the up and down SF are returned. If the variation is a JES variation, the argument must be up_jes* or down_jes*, since it is applied on the specified JES-varied jets.

pocket_coffea.lib.scale_factors.sf_btag_calib(params, sample, year, njets, jetsHt)#

Correction to the btagSF, computed by comparing the inclusive shape without btagSF and with btagSF in 2D: njets-JetsHT bins. Each sample/year has a different correction stored in the correctionlib format.

pocket_coffea.lib.scale_factors.sf_ctag(params, jets, year, njets, variations=['central'])#

Shape-correction scale factors (SF) for the DeepJet charm tagger, taken from: https://cms-nanoaod-integration.web.cern.ch/commonJSONSFs/summaries/BTV_201X_UL_ctagging.html SFs are obtained per jet, then multiplied to obtain an overall weight per event. Note: this SF does not preserve the normalization of the MC samples!

One has to re-normalize the MC samples using the sf_ctag_calib() method. The normalization calibration corrections are phase-space/analysis dependent; therefore one has to derive them for each analysis separately.

pocket_coffea.lib.scale_factors.sf_ctag_calib(params, dataset, year, njets, jetsHt)#

These are corrections to the normalization of every dataset after application of the ctag_sf shape correction. They were computed in a V+2J selection by comparing the inclusive shape with and without ctagSF in 2D: nJets-JetsHT bins. Each sample/year has a different correction stored in the correctionlib format. Note: the correction file in parameters/ctagSF_calibrationSF.json is used by default here, which was derived for the V+2J phase space. It may not be suitable for other analyses.

pocket_coffea.lib.scale_factors.sf_ele_id(params, events, year)#

This function computes the per-electron id SF and returns the corresponding per-event SF, obtained by multiplying the per-electron SF in each event. Additionally, also the up and down variations of the SF are returned.

pocket_coffea.lib.scale_factors.sf_ele_reco(params, events, year)#

This function computes the per-electron reco SF and returns the corresponding per-event SF, obtained by multiplying the per-electron SF in each event. Additionally, also the up and down variations of the SF are returned. Electrons are split into two categories based on a pt cut depending on the Run period, so that the proper SF is applied.

pocket_coffea.lib.scale_factors.sf_ele_trigger(params, events, year, variations=['nominal'])#

This function computes the semileptonic electron trigger SF by considering the leading electron in the event. This computation is valid only in the case of the semileptonic final state. Additionally, also the up and down variations of the SF for a set of systematic uncertainties are returned.

pocket_coffea.lib.scale_factors.sf_jet_puId(params, jets, year, njets)#
pocket_coffea.lib.scale_factors.sf_mu(params, events, year, key='')#

This function computes the per-muon id SF and returns the corresponding per-event SF, obtained by multiplying the per-muon SF in each event. Additionally, also the up and down variations of the SF are returned.

pocket_coffea.lib.scale_factors.sf_pileup_reweight(params, events, year)#

pocket_coffea.lib.triggers module#

pocket_coffea.lib.triggers.get_trigger_mask(events, trigger_dict, year, isMC, primaryDatasets=None, invert=False)#

Computes the HLT trigger mask

The function reads the trigger configuration and creates the mask. For MC, the OR of all the triggers is performed. For DATA, only the corresponding primary-dataset triggers are applied. If the primaryDatasets param is passed, the corresponding triggers are applied on both DATA and MC.

Parameters:
  • events – Awkward arrays

  • trigger_dict – the triggers configuration dictionary

  • year – year of the dataset

  • isMC – MC/data

  • primaryDatasets – default None. Overwrites the configuration and applies the specified list of primary-dataset triggers on both MC and data

  • invert – Invert the mask, returning the events that do not pass ANY of the triggers

Returns:

the events mask.

pocket_coffea.lib.weights_manager module#

class pocket_coffea.lib.weights_manager.WeightCustom(name: str, function: Callable)#

Bases: object

User-defined weight

Custom Weights can be created by the user in the configuration by using a WeightCustom object.

  • name: name of the weight

  • function: function defining the weights of the events chunk.

    signature: fn(params, events, size, metadata: dict, shape_variation: str)

  • variations: list of variations

The function must return the weights in the following format:

[(name:str, nominal, up, down),
 (name:str, nominal, up, down)]

Variation modifiers will have the format nameUp/nameDown.

Multiple weights can be produced by a single WeightCustom object.
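A hypothetical custom weight function following the format above (the weight values are invented for illustration):

```python
# Invented example: a function with the documented signature returning one
# (name, nominal, up, down) tuple for an events chunk of the given size.
def my_custom_weight(params, events, size, metadata, shape_variation):
    nominal = [1.0] * size
    up = [1.1] * size
    down = [0.9] * size
    return [("my_weight", nominal, up, down)]

out = my_custom_weight({}, None, 2, {}, "nominal")
name, nominal, up, down = out[0]
assert name == "my_weight" and len(nominal) == 2
```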

function: Callable#
name: str#
serialize(src_code=False)#
class pocket_coffea.lib.weights_manager.WeightsManager(params, weightsConf, size, events, shape_variation, metadata, storeIndividual=False)#

Bases: object

The WeightsManager class handles the weights defined by the framework and custom weights created by the user in the processor or by configuration.

It handles inclusive or by-category weights for each sample. Moreover, different samples can have different weights, as defined in the weights configuration. The names of the weights available in the current workflow are defined in the class method available_weights().

add_weight(name, nominal, up=None, down=None, category=None)#

Manually add a weight to a specific category.

classmethod available_variations()#

Predefined weight variations for CMS Run2 UL analyses.

classmethod available_weights()#

Predefined weights for CMS Run2 UL analyses.

get_weight(category=None, modifier=None)#

The function returns the total weights stored in the processor for the current sample. If category==None, the inclusive weight is returned. If category!=None but weights_split_bycat=False, the inclusive weight is returned. Otherwise the inclusive * category-specific weight is returned. The requested variation (modifier) must be available either in the inclusive weights or in the by-category weights.
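The combination rule described above can be sketched with stand-in lists (not the real WeightsManager internals):

```python
# The returned weight is the inclusive weight, multiplied by the
# category-specific weight when the by-category split is enabled.
inclusive = [1.0, 1.2, 0.8]
bycategory = {"signal": [1.1, 1.0, 1.3]}

def get_weight(category=None, weights_split_bycat=True):
    if category is None or not weights_split_bycat:
        return inclusive
    return [i * c for i, c in zip(inclusive, bycategory[category])]

assert get_weight() == inclusive
assert get_weight("signal") == [1.0 * 1.1, 1.2 * 1.0, 0.8 * 1.3]
```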

Module contents#