HOW-TOs for common tasks#
HLT trigger selection#
HLT triggers are applied as a skim cut, before any object correction. The list of
triggers is declared once in the parameters under the HLT_triggers key, organized by
data-taking period and primary dataset:
HLT_triggers:
"2018":
SingleEle:
- Ele32_WPTight_Gsf
- Ele28_eta2p1_WPTight_Gsf_HT150
SingleMuon:
- IsoMu24
The selection itself is built with the get_HLTsel factory from
pocket_coffea.lib.cut_functions and added to the skim list of the Configurator:
1from pocket_coffea.lib.cut_functions import get_HLTsel
2
3skim = [
4 get_nPVgood(1), eventFlags, goldenJson,
5 get_HLTsel(), # OR of all primary datasets
6],
Behavior of get_HLTsel(primaryDatasets=None, invert=False):
MC: the OR of all triggers in the configuration is applied.
Data: only the triggers of the primary dataset matching the sample are applied.
primaryDatasets=[...]: restrict to the given primary datasets, both for Data and MC, overriding the automatic behavior. This is the way to remove the primary-dataset overlap in data:1skim = [get_HLTsel(primaryDatasets=["SingleMuon", "SingleEle"])]
invert=True: keep events failing the selection. Useful for cross-cleaning one primary dataset against another (see Primary dataset cross-cleaning).
To apply a fixed list of trigger paths irrespective of the parameters file, use
get_HLTsel_custom(["HLT_IsoMu24", ...]).
Warning
Trigger cuts read raw NanoAOD branches and therefore belong in skim, never in
preselections — object corrections happen after the skim, inside the calibration loop.
Define a new cut function#
A selection is a Cut object (pocket_coffea.lib.cut_definition.Cut) wrapping a function
with the signature function(events, params, year, sample, **kwargs) that returns a
boolean awkward array (one entry per event). The params dictionary is attached to the
Cut and passed through at evaluation time, so the same function can be reused with
different thresholds.
1import awkward as ak
2from pocket_coffea.lib.cut_definition import Cut
3
4def min_jet_function(events, params, year, sample, **kwargs):
5 mask = ak.num(events.JetGood) >= params["njet"]
6 return mask
7
8min_2jets = Cut(
9 name="min_2jets",
10 params={"njet": 2},
11 function=min_jet_function,
12)
For simple, one-off selections an inline lambda is enough:
1even_event = Cut(
2 name="even_event",
3 function=lambda events, params, **kwargs: events.event % 2 == 0,
4 params={},
5)
Parametrized factory cuts#
The recommended pattern for reusable cuts is a factory that returns a configured Cut,
so the threshold becomes part of the cut name (and therefore of the output keys). Several
ready-made factories live in pocket_coffea.lib.cut_functions:
1from pocket_coffea.lib.cut_functions import get_nObj_min, get_nObj_eq, get_nObj_less
2
3get_nObj_min(4, minpt=30., coll="JetGood") # >= 4 jets with pt > 30
4get_nObj_eq(2, coll="BJetGood") # exactly 2 b-jets
5get_nObj_less(3, coll="MuonGood") # < 3 muons
Your own factories follow the same shape:
1def get_min_jets(njet, name=None):
2 if name is None:
3 name = f"min_{njet}_jets"
4 return Cut(name=name, params={"njet": njet}, function=min_jet_function)
Note
If your cut needs a variable that is not in NanoAOD, compute it in
define_common_variables_before_presel in the workflow so it is available when the
preselection runs.
Categories: standard and Cartesian selections#
Categories are the orthogonal regions in which histograms, columns and weights are filled.
The simplest form is a dictionary mapping a category name to a list of Cuts (ANDed
together):
1from pocket_coffea.parameters.cuts import passthrough
2
3categories = {
4 "baseline": [passthrough],
5 "1btag": [get_nObj_min(1, coll="BJetGood")],
6 "2btag": [get_nObj_min(2, coll="BJetGood")],
7}
When you need the full product of several independent cut sets (e.g. jet multiplicity ×
b-jet multiplicity), use CartesianSelection with one MultiCut per axis. It avoids
spelling out every combination by hand:
1from pocket_coffea.lib.categorization import CartesianSelection, MultiCut
2
3categories = CartesianSelection(
4 multicuts = [
5 MultiCut(name="Njets",
6 cuts=[get_nObj_eq(1, 30., "JetGood"),
7 get_nObj_eq(2, 30., "JetGood"),
8 get_nObj_min(3, 30., "JetGood")],
9 cuts_names=["1j", "2j", "3j"]),
10 MultiCut(name="Nbjet",
11 cuts=[get_nObj_eq(0, 15., "BJetGood"),
12 get_nObj_eq(1, 15., "BJetGood"),
13 get_nObj_min(2, coll="BJetGood")],
14 cuts_names=["0b", "1b", "2b"]),
15 ],
16 # categories that are NOT part of the cartesian product
17 common_cats = {
18 "inclusive": [passthrough],
19 "4jets_40pt" : [get_nObj_min(4, 40., "JetGood")],
20 }
21)
This produces the cross-product categories 1j_0b, 1j_1b, …, 3j_2b (the names are the
cuts_names joined with _), plus the common_cats entries inclusive and 4jets_40pt.
CartesianSelection is far more memory-efficient than declaring every combination as a
StandardSelection, because the per-axis masks are computed once and combined on the fly.
Skimming events#
Skimming NanoAOD events and save the reduced files on disk can speedup a lot the processing of the analysis. The recommended executor for the skimming process is the direct condor-job executor, which splits the workload in condor jobs without using the dask scheduler. This makes the resubmission of failed skim jobs easier.
Follow these instructions to skim the files on EOS:
Add the
save_skimmed_filesargument to the configurator with a suitable folder name: e.g.save_skimmed_files = "root://eoscms.cern.ch//eos/cms/store/group/phys_higgs/ttHbb/Run3_semileptonic_skim/"It is recommended to run the processing on HTCondor at CERN using the new direct condor executor. That will send out standard jobs instead of using dask. Please make sure your dataset list is up-to-date before sending the jobs.
pocket-coffea run --cfg config_skim.py -o output_skim_config -e condor@lxplus --scaleout NUMBEROFJOBS --chunksize 200000 --job-dir jobs --job-name skim --queue workday --dry-run. Use the--dry-runoption to check the job splitting configuration and remove it when you are happy to submit the jobs.Check the status of the jobs with
pocket-coffea check-jobs -j jobs-dir/skim. Optionally activate the automatic resubmitting option to resubmit failed jobs.Merge the per-job
.coffeaoutputs into a single file withmerge-outputs:merge-outputs output_skim_config/output_job_*.coffea -o output_total.coffea
This step is mandatory, not optional: each skim job writes a
.coffeafile carrying the cutflow and, crucially, thesum_genweightsof the chunks it processed. Merging accumulates these correctly across all jobs — including the chunks that skimmed 0 events (an event-less chunk still contributes to the generator-weight sum used for the cross-section normalization downstream). The skimmed ROOT files alone do not carry this information, so without the merge the normalization of the skimmed dataset would be wrong. The resultingoutput_total.coffeais the input to the next step.Warning
Merge the raw
output_job_*.coffeafiles in a singlemerge-outputscall. Re-merging an already-merged file double-rescales the histograms (thesum_genweightsis not reset between passes), so never feed a merged output back intomerge-outputs.Once done, we usually do an hadd to sum all the small files produced by each saved chunk. An utility script to compute the groups and correctly hadd them is available
pocket-coffea hadd-skimmed-files -fl ../output_total.coffea -o root://eoscms.cern.ch//eos/cms/store/group/phys_higgs/ttHbb/Run3_dileptonic_skim_hadd -e 400000 --dry -s 6this script creates some files to be able to send out jobs that runs the hadd for each group of files. Note that it reads the mergedoutput_total.coffeaproduced in the previous step, so the per-chunk event counts andsum_genweightsare taken from there.From all this process you will get out at the end an updated
dataset_definition_file.jsonto be used in your analysis config.
Minimal skimming configuration#
Skimming is enabled simply by setting save_skimmed_files on the Configurator to an
output folder (local path or root://… EOS URL). The skim cuts in the skim list define
which events are written; everything downstream (preselections, categories, …) is still
declared but is only used if you keep processing after the skim:
1cfg = Configurator(
2 ...
3 skim = [
4 get_nPVgood(1), eventFlags, goldenJson,
5 get_HLTsel(),
6 get_nObj_min(2, minpt=20., coll="Jet"), # at least 2 raw jets
7 ],
8 save_skimmed_files = "./skim/",
9 ...
10)
Each processed chunk produces one ROOT file under save_skimmed_files. The generator
weight sum is rescaled (skimRescaleGenWeight) so cross sections still match once the
skimmed files are used as inputs downstream. In addition each job writes a .coffea
output holding the cutflow and the sum_genweights of every chunk it processed (including
chunks that skimmed 0 events); these must be combined with merge-outputs — see step 4 of
the workflow above — to get the correct normalization of the skimmed
dataset.
Skim modes#
When save_skimmed_files is set, which events get written is governed by the
skim_mode entry of workflow_options. Two modes are available; both short-circuit the
processing — once the skimmed chunk is exported the processor returns immediately and does
not fill histograms or columns.
skim_mode |
events written | calibration / preselection run? |
|---|---|---|
"skim" (default) |
events passing the skim cuts |
no — cuts evaluated on raw NanoAOD only |
"presel_any_variation" |
events passing the preselections in at least one calibration variation |
yes — but only to build the selection mask |
skim_mode: "skim" (default)#
This is the standard skim: the cuts in the skim list are evaluated on the raw NanoAOD
(before any object correction) and every event that passes is exported. No calibrators and
no preselections are run, so the skim is fast and depends only on uncorrected quantities.
This is the mode used by the minimal configuration above
— you do not need to set skim_mode explicitly.
Use it when your skim selection can be expressed purely on raw branches (trigger bits, golden JSON, event flags, raw-jet/lepton multiplicities, …). It is the right choice for the first-pass reduction of a large dataset.
skim_mode: "presel_any_variation"#
In this mode the skim is systematic-aware: an event is kept if it passes the
preselections in any active calibration variation. For example an event that fails the
nominal jet selection but enters it after a JES Up shift is still written, so the skimmed
files remain valid inputs for a later systematics analysis.
1cfg = Configurator(
2 ...
3 workflow_options = {"skim_mode": "presel_any_variation"},
4
5 skim = [get_nPVgood(1), eventFlags, get_HLTsel()], # loose, on raw NanoAOD
6 save_skimmed_files = "./skim_presel_any/",
7
8 preselections = [get_nObj_min(5, minpt=30., coll="JetGood")], # tighter, on calibrated objects
9
10 calibrators = default_calibrators_sequence,
11 variations = {
12 "weights": {"common": {"inclusive": [], "bycategory": {}}, "bysample": {}},
13 "shape": {"common": {"inclusive": ["jet_calibration"]}, "bysample": {}},
14 },
15 ...
16)
Internally the processor first applies the skim cuts, then runs the calibration loop as a
dry run: for every variation it applies the object preselection and computes the
preselection mask without filtering events, accumulating the logical OR of all the
per-variation masks. The combined mask is finally applied to the (uncalibrated) post-skim
events, which are then exported. The set of variations considered is exactly the one
declared in variations["shape"] together with the nominal pass.
Use this mode when the events you ultimately want are defined by a selection on calibrated objects (jets after JEC/JER, etc.) and you must not lose events that only enter the selection under a systematic shift.
Warning
skim_mode only has an effect when save_skimmed_files is configured. Setting it without
save_skimmed_files triggers a warning and is ignored — the processor then runs the normal
analysis (calibration, preselection, histograms) rather than skimming.
hadd the skimmed files#
The skim produces many small per-chunk files. hadd-skimmed-files computes sensible
groups and writes out the jobs to merge them:
pocket-coffea hadd-skimmed-files -fl output_total.coffea \
-o root://eoscms.cern.ch//eos/cms/store/group/.../skim_hadd \
-e 400000 -s 6 --dry
After the hadd jobs have run, validate the result with --check. This does not run hadd
again — it rebuilds the expected workload and verifies, for each output file, that it
exists, opens, has an Events tree, and that GetEntries() matches the expected sum of
nskimmed_events. Any failures are written to hadd_failed.json / .txt together with a
resubmission .sub file:
pocket-coffea hadd-skimmed-files -fl output_total.coffea \
-o root://eoscms.cern.ch//eos/cms/store/group/.../skim_hadd --check
Subsamples#
A subsample splits one declared sample into several mutually-defined sub-categories at
the dataset level, each selected by a list of Cuts. Subsamples are declared in the
datasets block and produce independent output keys named <sample>__<subsample>:
1datasets = {
2 "jsons": ['datasets/datasets_cern.json'],
3 "filter": {
4 "samples": ['TTTo2L2Nu', "DATA_SingleMuon"],
5 "year": ['2018'],
6 },
7 "subsamples": {
8 "TTTo2L2Nu": {
9 "ele": [get_nObj_min(1, coll="ElectronGood"), get_nObj_eq(0, coll="MuonGood")],
10 "mu": [get_nObj_eq(0, coll="ElectronGood"), get_nObj_min(1, coll="MuonGood")],
11 }
12 }
13}
This yields TTTo2L2Nu__ele and TTTo2L2Nu__mu in the output. Subsamples need not be
orthogonal or exhaustive — an event can fall into several subsamples (it is duplicated) or
into none (it is dropped). Subsample-specific weights and variations are addressed by the
full <sample>__<subsample> key in the bysample block:
1weights = {
2 "common": {"inclusive": ["genWeight", "lumi", "XS", "pileup"]},
3 "bysample": {
4 "TTTo2L2Nu__ele": {
5 "inclusive": ["sf_custom_C"],
6 "bycategory": {"B": ["sf_custom_D"]},
7 }
8 }
9}
Primary dataset cross-cleaning#
In data the same physics event can be stored in more than one primary dataset (PD). A
di-leptonic event, for instance, may fire both a single-electron and a single-muon
trigger, and would then be present in both the EGamma and SingleMuon PDs. If you
simply OR all triggers on each PD you will count such events twice. Cross-cleaning removes
the overlap by assigning each event to exactly one PD.
The strategy is an ordered priority among the primary datasets: the first PD keeps every
event that fires its own triggers; each subsequent PD keeps events that fire its own
triggers and do not fire the triggers of any higher-priority PD. The veto is built
with get_HLTsel(primaryDatasets=[...], invert=True).
This is done at the subsample level, one subsample per PD, so that the trigger logic is isolated per data sample. A common idiom is to give the single subsample the same name as the sample, so the cleaned output keeps a clean, self-describing name:
1datasets = {
2 "filter": {
3 "samples": ["DATA_EGamma", "DATA_SingleMuon", ...],
4 "year": ['2022_preEE', '2022_postEE', '2023_preBPix', '2023_postBPix', '2024'],
5 },
6 "subsamples": {
7 # Priority 1: EGamma keeps all events firing the SingleEle triggers
8 "DATA_EGamma": {
9 "DATA_EGamma": [get_HLTsel(primaryDatasets=["SingleEle"])],
10 },
11 # Priority 2: SingleMuon keeps events firing SingleMuon triggers
12 # but NOT the (higher-priority) SingleEle triggers
13 "DATA_SingleMuon": {
14 "DATA_SingleMuon": [
15 get_HLTsel(primaryDatasets=["SingleMuon"]),
16 get_HLTsel(primaryDatasets=["SingleEle"], invert=True),
17 ],
18 },
19 },
20}
With this ordering:
an event firing only SingleEle triggers → kept by
DATA_EGammaonly;an event firing only SingleMuon triggers → kept by
DATA_SingleMuononly;an event firing both → kept by
DATA_EGamma(priority 1) and explicitly vetoed inDATA_SingleMuonby the inverted SingleEle selection.
so every data event is counted exactly once.
Note
The cross-cleaning cuts go in the subsample definitions, not in the skim. The skim
typically applies the inclusive OR of all the primary-dataset triggers so that no useful
event is dropped before the cleaning step:
1skim = [
2 get_nPVgood(1), eventFlags, goldenJson,
3 ...
4 get_HLTsel(["SingleEle", "SingleMuon"]), # inclusive OR; cleaning happens per-subsample
5]
Warning
Apply the cross-cleaning only to data. The MC samples should keep the inclusive
trigger OR (the default get_HLTsel() behavior), because there is no double-counting to
remove in simulation — each MC event exists only once regardless of which triggers it fires.
The same mechanism extends to more than two primary datasets: order them by priority and,
for the i-th PD, add one inverted get_HLTsel per higher-priority PD (or a single
inverted selection listing all the higher-priority primary datasets).
Define a custom weight#
Custom event weights are created with WeightLambda.wrap_func, then registered by passing
them in the weights_classes list (concatenated with the built-in common_weights).
The wrapped function has the signature
function(params, metadata, events, size, shape_variations) and must return an array of
length size.
A constant or nominal-only weight sets has_variations=False and returns a single array:
1import numpy as np
2from pocket_coffea.lib.weights.weights import WeightLambda
3from pocket_coffea.lib.weights.common import common_weights
4
5my_custom_weight = WeightLambda.wrap_func(
6 name="my_custom_weight",
7 function=lambda params, metadata, events, size, shape_variations: np.ones(size) * 2.0,
8 has_variations=False,
9)
Once defined, register the wrapper and reference it by name in the weights block. Weights
can be applied to all samples (common) or to specific ones (bysample), and either
inclusively or only in specific categories (bycategory):
1cfg = Configurator(
2 ...
3 weights = {
4 "common": {
5 "inclusive": ["genWeight", "lumi", "XS", "pileup"],
6 "bycategory": {"2jets_B": ["my_custom_weight"]},
7 },
8 "bysample": {
9 "TTTo2L2Nu": {"bycategory": {"2jets_D": ["sf_btag"]}},
10 },
11 },
12 weights_classes = common_weights + [my_custom_weight],
13 ...
14)
Define a custom weight with custom variations#
To attach a single Up/Down systematic to a weight, set has_variations=True and return
the triple (nominal, up, down):
1my_custom_weight_var = WeightLambda.wrap_func(
2 name="my_custom_weight_withvar",
3 function=lambda params, metadata, events, size, shape_variations: (
4 np.ones(size) * 2.0, # nominal
5 np.ones(size) * 3.0, # up
6 np.ones(size) * 0.5, # down
7 ),
8 has_variations=True,
9)
For several named variations on the same weight, declare them in variations=[...] and
return (nominal, [names], [ups], [downs]):
1my_custom_weight_multivar = WeightLambda.wrap_func(
2 name="my_custom_weight_multivar",
3 function=lambda params, metadata, events, size, shape_variations: (
4 np.ones(size) * 2.0, # nominal
5 ["stat", "syst"], # variation names
6 [np.ones(size) * 3.0, np.ones(size) * 4.0], # up for each
7 [np.ones(size) * 0.5, np.ones(size) * 0.25], # down for each
8 ),
9 has_variations=True,
10 variations=["stat", "syst"],
11)
A weight only produces systematic templates if it is also listed in the variations
block of the Configurator. The structure mirrors the weights block, so a variation can
be enabled for all samples or only for some, inclusively or per-category:
1variations = {
2 "weights": {
3 "common": {
4 "inclusive": ["pileup", "sf_ele_id", "sf_ele_reco"],
5 "bycategory": {"1btag": ["sf_btag"]},
6 },
7 "bysample": {
8 "TTTo2L2Nu": {"bycategory": {"2jets_D": ["my_custom_weight_multivar"]}},
9 },
10 },
11}
The resulting histogram variation axis will contain one <name>Up/<name>Down pair per
declared variation (e.g. my_custom_weight_multivar_statUp, ..._systDown).
Note
Weights can also be applied to data (e.g. data-driven correction factors): list them in
the weights block exactly as for MC. See
tests/test_full_configs/test_custom_weights/config_weights_data.py.
promptMVA lepton selection and on-the-fly re-evaluation#
The promptMVA (mvaTTH) is a BDT-based lepton ID used in ttH analyses to suppress
non-prompt leptons. For Run3 2022–2023 datasets produced with NanoAODv12/v13, the
mvaTTH branch stored in NanoAOD was trained on a different data-taking period and
must be re-evaluated on the fly before applying the working-point cut. For 2024
datasets (NanoAODv15) the value is correct out of the box.
PocketCoffea provides:
ElectronMVAEvaluator/MuonMVAEvaluator— XGBoost wrappers inpocket_coffea.lib.xgboost_evaluatorthat reproduce the TMVA score in[-1, 1].lepton_selection_promptMVA— drop-in replacement forlepton_selectioninpocket_coffea.lib.leptonsthat applies the full preselection and optional MVA WP cut.Pre-configured model weights and scale factors in
pocket_coffea/parameters/lepton_scale_factors.yamlunderlepton_scale_factors.{electron,muon}_sf.promptMVA_jsons.
Object preselection parameters#
The lepton_selection_promptMVA function requires several additional keys under
object_preselection compared to the standard lepton_selection. Add these to your
params/object_preselection.yaml:
object_preselection:
Electron:
pt: 15.0
eta: 2.5
sip3d: 8.0 # 3D impact-parameter significance
lostHits: 1 # max number of lost tracker hits
dxy: 0.05 # |dxy| < dxy [cm]
dz: 0.1 # |dz| < dz [cm]
id: # NanoAOD field used as MVA score (per year)
"2022_preEE": mvaTTH_redo
"2022_postEE": mvaTTH_redo
"2023_preBPix": mvaTTH_redo
"2023_postBPix": mvaTTH_redo
"2024": mvaTTH # read directly from NanoAOD in v15
mva_wp: # MVA working-point threshold (per year)
"2022_preEE": 0.90
"2022_postEE": 0.90
"2023_preBPix": 0.90
"2023_postBPix": 0.90
"2024": 0.90
btag_cut: # DeepFlavB score veto on the closest jet (per year)
2022_preEE: 0.3086
2022_postEE: 0.3196
2023_preBPix: 0.2431
2023_postBPix: 0.2435
"2024": 999.
Muon:
pt: 10.0
eta: 2.4
iso: 0.4
sip3d: 8.0
dxy: 0.05
dz: 0.1
base_id: looseId # NanoAOD field for baseline muon ID
id:
"2022_preEE": mvaTTH_redo
"2022_postEE": mvaTTH_redo
"2023_preBPix": mvaTTH_redo
"2023_postBPix": mvaTTH_redo
"2024": mvaTTH
mva_wp:
"2022_preEE": 0.64
"2022_postEE": 0.64
"2023_preBPix": 0.64
"2023_postBPix": 0.64
"2024": 0.64
btag_cut:
2022_preEE: 0.3086
2022_postEE: 0.3196
2023_preBPix: 0.2431
2023_postBPix: 0.2435
"2024": 999.
Workflow implementation#
In apply_object_preselection, first compute scores for 2022–2023 years and attach
them as mvaTTH_redo, then call lepton_selection_promptMVA with apply_mva_cut=True:
1import awkward as ak
2from pocket_coffea.workflows.base import BaseProcessorABC
3from pocket_coffea.lib.objects import jet_selection, btagging
4from pocket_coffea.lib.leptons import lepton_selection_promptMVA
5from pocket_coffea.lib.xgboost_evaluator import ElectronMVAEvaluator, MuonMVAEvaluator
6
7REDO_MVA_YEARS = ["2022_preEE", "2022_postEE", "2023_preBPix", "2023_postBPix"]
8
9
10class MyProcessor(BaseProcessorABC):
11
12 def apply_object_preselection(self, variation):
13 # Re-evaluate promptMVA on the fly for 2022/2023 datasets
14 if self._year in REDO_MVA_YEARS:
15 ele_evaluator = ElectronMVAEvaluator(
16 self.params.lepton_scale_factors.electron_sf.promptMVA_jsons[self._year].model_weights
17 )
18 muon_evaluator = MuonMVAEvaluator(
19 self.params.lepton_scale_factors.muon_sf.promptMVA_jsons[self._year].model_weights
20 )
21
22 _, presel_ele_mask = lepton_selection_promptMVA(
23 self.events, "Electron", self.params, self._year, apply_mva_cut=False
24 )
25 self.events["Electron"] = ak.with_field(
26 self.events["Electron"],
27 ele_evaluator.evaluate(self.events["Electron"], self.events["Jet"], mask=presel_ele_mask),
28 "mvaTTH_redo",
29 )
30
31 _, presel_muon_mask = lepton_selection_promptMVA(
32 self.events, "Muon", self.params, self._year, apply_mva_cut=False
33 )
34 self.events["Muon"] = ak.with_field(
35 self.events["Muon"],
36 muon_evaluator.evaluate(self.events["Muon"], self.events["Jet"], mask=presel_muon_mask),
37 "mvaTTH_redo",
38 )
39
40 # Apply final selection using the (re-)computed MVA score
41 self.events["ElectronGood"], _ = lepton_selection_promptMVA(
42 self.events, "Electron", self.params, self._year,
43 apply_mva_cut=True,
44 mva_var=self.params.object_preselection.Electron.id[self._year],
45 )
46 self.events["MuonGood"], _ = lepton_selection_promptMVA(
47 self.events, "Muon", self.params, self._year,
48 apply_mva_cut=True,
49 mva_var=self.params.object_preselection.Muon.id[self._year],
50 )
51
52 leptons = ak.with_name(
53 ak.concatenate((self.events.MuonGood, self.events.ElectronGood), axis=1),
54 name="PtEtaPhiMCandidate",
55 )
56 self.events["LeptonGood"] = leptons[ak.argsort(leptons.pt, ascending=False)]
57 self.events["JetGood"], self.jetGoodMask = jet_selection(
58 self.events, "Jet", self.params, self._year, leptons_collection="LeptonGood"
59 )
60 self.events["BJetGood"] = btagging(
61 self.events["JetGood"],
62 self.params.btagging.working_point[self._year],
63 wp=self.params.object_preselection.Jet["btag"]["wp"],
64 )
Scale factors#
The corresponding promptMVA lepton ID scale factors are provided as the
sf_ele_promptMVA and sf_mu_promptMVA weight wrappers, already included in
common_weights. Add them to the weights section of your configurator:
1weights = {
2 "common": {
3 "inclusive": [
4 "genWeight", "lumi", "XS",
5 "sf_ele_promptMVA",
6 "sf_mu_promptMVA",
7 ],
8 },
9}
Note
For 2022–2023 the scale factors are derived from the TTH multilepton team and stored
under pocket_coffea/parameters/custom_SFs/{electron,muon}_promptMVA/. For 2024
(NanoAODv15) central POG scale factors are used automatically.
Systematic shape variations#
Shape (a.k.a. “up/down template”) systematics come from object calibrations — JEC/JER,
MET, electron scale & smearing, etc. They are enabled by listing the relevant calibrator
in the shape block of variations, while the calibrators themselves are activated
through the calibrators argument of the Configurator:
1from pocket_coffea.lib.calibrators.common.common import default_calibrators_sequence
2
3cfg = Configurator(
4 ...
5 calibrators = default_calibrators_sequence, # JetsCalibrator, METCalibrator, ElectronsScaleCalibrator, ...
6 variations = {
7 "weights": {"common": {"inclusive": [...]}},
8 "shape": {"common": {"inclusive": ["jet_calibration"]}},
9 },
10 ...
11)
During processing the calibration loop re-runs the full per-event selection once per
variation, so each shape systematic produces its own entry on the histogram variation
axis. For the jet calibration the axis labels are built as <jet_type>_<variation>Up/Down,
e.g. AK4PFchs_JES_TotalUp, AK4PFchs_JERDown for Run2, or AK4PFPuppi_JES_TotalUp for
Run3. Which individual JES/JER sources are produced is controlled by the
jets_calibration.variations parameter (see
Systematic Variations under Jet calibration).
Subsample-specific shape variations#
Shape variations declared as common apply to every (sub)sample. To restrict an
additional shape variation to a single subsample, address it by its <sample>__<subsample>
key in the bysample block, exactly as for weights. A subsample that is not listed gets
only the common variations, so its variation axis will not contain the subsample-specific
labels. Two subsamples with identical selection cuts will produce identical nominal
histograms, while only the configured one carries the extra variation templates (see
tests/test_full_configs/test_shape_variations/config_shape_var_bysubsample.py).
When shape and weight variations are both active, the weights applied during a shape pass are the nominal subsample weights — so a subsample-specific normalization factor multiplies both the nominal and the JES-shifted templates consistently.
Electron scale & smearing#
The ElectronsScaleCalibrator (part of default_calibrators_sequence) applies the
electron scale & smearing corrections and their variations. When the corrected
ElectronGood_pt is exported alongside the uncorrected ElectronGood_pt_original, the two
differ for every event, confirming the correction was applied (see
config_eleSS_Run3.py).
Apply corrections#
Here we describe how to apply certain corrections recommended by CMS POGs.
MET-xy#
From a purely physical point of view, the distribution of the \(\phi\)-component of the missing transverse momentum (a.k.a. MET) should be uniform due to rotational symmetry. However, for a variety of detector-related reasons, the distribution is not uniform in practice, but shows a sinus-like behavior. To correct this behavior, the x- and y-component of the MET can be altered in accordance to the recommendation of JME. In the PocketCoffea workflow, these corrections can be applied using the met_xy_correction() function:
from pocket_coffea.lib.jets import met_xy_correction
met_pt_corr, met_phi_corr = met_xy_correction(self.params, self.events, self._year, self._era)
Note, that this shift also alters the \(p_\mathrm{T}\) component! Also, the corrections are only implemented for Run2 UL (thus far).
Jet calibration configuration#
PocketCoffea provides a flexible jet calibration system that handles Jet Energy Corrections (JEC), Jet Energy Resolution (JER), and systematic variations. The calibration is configured through the jets_calibration.yaml parameters file and applied using specialized calibrator classes.
Core Concepts#
The jet calibration system is built around three main concepts:
Jet Types: Abstract jet algorithm/configuration identifiers (e.g.,
AK4PFchs,AK4PFPuppi,AK8PFPuppi)Jet Collections: Actual NanoAOD collections that store jet objects (e.g.,
Jet,FatJet)Factory Configurations: JEC/JER correction files and settings associated with each jet type
The system maps jet types to collections and applies the appropriate corrections based on data-taking period, MC/Data status, and systematic variation requests.
Available Calibrators#
JetsCalibrator: Standard JEC/JER calibrator for regular jet collections
Applies JEC (L1FastJet, L2Relative, etc.) and JER corrections
Handles systematic variations (JES uncertainties, JER variations)
Works with both MC and Data
Can apply the pT regression to some jets collections before calibration.
Configuration Structure#
The configuration is organized in several sections:
Factory Configuration#
Defines the correction files for each jet type, data-taking period, and correction level:
default_jets_calibration:
factory_config_clib:
AK4PFchs:
2016_PreVFP:
json_path: ${cvmfs:Run2-2016preVFP-UL-NanoAODv9,JME,jet_jerc.json.gz}
jec_mc: Summer19UL16APV_V7_MC
jec_data:
B: Summer19UL16APV_RunBCD_V7_DATA
C: Summer19UL16APV_RunBCD_V7_DATA
D: Summer19UL16APV_RunBCD_V7_DATA
E: Summer19UL16APV_RunEF_V7_DATA
F: Summer19UL16APV_RunEF_V7_DATA
jer: Summer20UL16APV_JRV3_MC
level: L1L2L3Res
2016_PostVFP:
json_path: ${cvmfs:Run2-2016postVFP-UL-NanoAODv9,JME,jet_jerc.json.gz}
jec_mc: Summer19UL16_V7_MC
jec_data: Summer19UL16_RunFGH_V7_DATA
jer: Summer20UL16_JRV3_MC
level: L1L2L3Res
....
AK4PFPuppi:
2022_preEE:
json_path: ${cvmfs:Run3-22CDSep23-Summer22-NanoAODv12,JME,jet_jerc.json.gz}
jec_mc: Summer22_22Sep2023_V3_MC
jec_data: Summer22_22Sep2023_RunCD_V3_DATA
jer: Summer22_22Sep2023_JRV1_MC
level: L1L2L3Res
2022_postEE:
json_path: ${cvmfs:Run3-22EFGSep23-Summer22EE-NanoAODv12,JME,jet_jerc.json.gz}
jec_mc: Summer22EE_22Sep2023_V3_MC
jec_data:
E: Summer22EE_22Sep2023_RunE_V3_DATA
F: Summer22EE_22Sep2023_RunF_V3_DATA
G: Summer22EE_22Sep2023_RunG_V3_DATA
jer: Summer22EE_22Sep2023_JRV1_MC
level: L1L2L3Res
Collection Mapping#
Maps jet types to actual NanoAOD collections for each data-taking period:
jets_calibration:
collection:
2022_preEE:
AK4PFPuppi: "Jet"
AK8PFPuppi: "FatJet"
# AK4PFPuppiPNetRegression: "Jet"
Here, the AK4PFPuppi, AK8PFPuppi and AK4PFPuppiPNetRegression are the jet types used as internal labels in the PocketCoffea configuration to link various pieces of the jets configuration together. The Jet and FatJet are the names of the branches in NanoAOD.
Warning
All the collections defined in the jets_calibration.collection entry will be calibrated by the configured JetsCalibrator if included in the calibrators sequence. It is not allowed to match the same jet collection to multiple jet types: an error will be raised.
Collection Name Aliases#
In order to use the same calibration settings on different jet collections, it is possible to define aliases for the collection names with the collection_name_alias key. For example, if you have a jet collection called JetCustom that you want to calibrate in the same way as you do for the Jet collection, which is mapped to the AK4PFPuppi jet type, your configuration would look like this:
jets_calibration:
collection:
2022_preEE:
AK4PFPuppi: "Jet"
AK4PFPuppiCustom: "JetCustom"
collection_name_alias:
2022_preEE:
AK4PFPuppiCustom : "AK4PFPuppi"
Warning
In order to merge the variations of the JetCustom collection, you need to define a merge_collections_for_variations entry as specified in section Merging Systematic Variations.
Calibration Control Flags#
Here is how one can enable/disable different correction types per jet type and period:
# Apply JEC to MC
apply_jec_MC:
2022_preEE:
AK4PFPuppi: True
AK4PFPuppiPNetRegression: True
# Apply JER to MC
apply_jer_MC:
2022_preEE:
AK4PFPuppi: True
AK4PFPuppiPNetRegression: True
# Apply JEC to Data
apply_jec_Data:
2022_preEE:
AK4PFPuppi: True
# Apply pT regression to MC
apply_pt_regr_MC:
2022_preEE:
AK4PFPuppi: False
AK4PFPuppiPNetRegression: True
sort_by_pt:
2016_PreVFP:
AK4PFchs: True
AK8PFPuppi: True
2016_PostVFP:
AK4PFchs: True
AK8PFPuppi: True
Systematic Variations#
Different sets of JEC systematic variations are available in the default parameter set.
default_jets_calibration:
variations:
full_variations: # All individual JES sources
AK4PFPuppi:
2022_preEE:
- JES_Regrouped_Absolute
- JES_Regrouped_FlavorQCD
- JER
total_variation: # Total JES uncertainty
AK4PFPuppi:
2022_preEE:
- JES_Total
- JER
The set of variations to be used has to be setup in the jets_calibration.variation key. For example:
jets_calibration:
variations: "${default_jets_calibration.variations.total_variation}"
The user can also enable only some of the variations for different data taking periods
jets_calibration:
variations:
2022_preEE:
AK4PFPuppi:
- JES_Total # Only total JES uncertainty
- JER # JER variations
Merging Systematic Variations#
By default, each systematic variation produces a separate collection of calibrated jets. To reduce the number of collections, variations can be merged into a single collection per jet type. This is configured using the merge_collections_for_variations key, which specifies which jet types should have their variations merged and under which new name:, e.g.:
jets_calibration:
merge_collections_for_variations:
2022_preEE:
AK4Jet:
- AK4PFPuppi
- AK4PFPuppiPNetRegression
- AK4PFPuppiCustom
This will create variation with the name AK4Jet_{variation}_[up|down] that merges the variations from the specified jet types.
ToDo: better describe how this merging is done. What if AK4PFPuppi and AK4PFPuppiCustom have different up_variation for example.
MET Recalibration#
Configure MET corrections that propagate jet calibration changes:
rescale_MET_config:
2022_preEE:
apply: True
MET_collection: "PuppiMET"
Jet_collection: "Jet"
Usage in Analysis#
The jet calibration is typically applied automatically when using the standard calibrator sequence:
1from pocket_coffea.lib.calibrators.common import default_calibrators_sequence
2
3# Default sequence includes: JetsCalibrator, METCalibrator, ElectronsScaleCalibrator
4calibrators = default_calibrators_sequence
For custom configurations, modify the jets_calibration section in your parameters, for example:
jets_calibration:
# Use a different variation set
variations: "${default_jets_calibration.variations.full_variations}"
# Enable specific jet types
collection:
2022_preEE:
AK4PFPuppiCustomSetOfCorrections: "Jet"
Jet energy regression#
Starting from Run3 datasetes the ParticleNet jet energy regression corrections are part of the Jet object in NanoAOD. But they are not applied by default. In PocketCoffea the regression is implemented in JetPtRegressionCalibrator in the calibration sequence and can be turned On/Off via configuration parameters as show in the example below:
jets_calibration:
collection:
2022_preEE:
AK4PFPuppi: null
AK4PFPuppiPNetRegression: "Jet"
#AK4PFPuppiPNetRegressionPlusNeutrino: "Jet"
apply_pt_regr_MC:
2022_preEE:
AK4PFPuppiPNetRegression: True
#AK4PFPuppiPNetRegressionPlusNeutrino: True
apply_pt_regr_Data:
2022_preEE:
AK4PFPuppiPNetRegression: True
#AK4PFPuppiPNetRegressionPlusNeutrino: True
Note that there are two versions of regression are available in PNet: with and without neutrinos used in the training.
In the example above, the default jets configuration is overwritten to assign the Jet collection to the AK4PFPuppiPNetRegression tag, and to activate the pt regression for data and MC for that tag. Note the line AK4PFPuppi: null – it is needed to remove the association of the AK4PFPuppi to the Jet, which is default pocket-coffea setting.
However, this is not all. We also need to apply JEC and JER on the regressed jets. The dedicated corrections, derived for PNet regressed jets, have been recently releasesd by JME, see https://cms-jerc.web.cern.ch/ExpJEC/ (as of 19 May 2026). The corrections are located at CERN EOS (not CVMFS!). One has to specify a path to them and set the tags, like so for 2022_preEE:
jets_calibration:
...
# set up collection, apply_pt_regr_MC and apply_pt_regr_Data fields
...
# Path to JERCS at CERN EOS:
path_to_JERC_PNetReg: '/eos/cms/store/group/phys_jetmet/ExpJEC/json_files_Reg/'
# If running outside CERN without access to EOS: copy the files to your cluster and use the corresponding path.
jet_types:
AK4PFPuppiPNetRegression:
2022_preEE:
json_path: "${jets_calibration.path_to_JERC_PNetReg}/Run3Summer22_22Sep2023/regJet_jerc.json.gz"
jec_mc: Summer22_22Sep2023_V4_MC
jec_data: Summer22_22Sep2023_V4_DATA
jer: Summer22_22Sep2023_JRV1_MC
level: L1L2L3Res
variations:
AK4PFPuppiPNetRegression:
2022_preEE:
- JES_Total
- JER
An example of a full config can be found here.
If the user needs to apply regression only to a subset of jets, then the best strategy is to define a copy of the Jet collection and calibrate that.
An example configuration for this:
jets_calibration:
collection:
2022_preEE:
AK4PFPuppiPNetRegression: "JetPtReg"
#AK4PFPuppiPNetRegressionPlusNeutrino: "JetPtReg"
object_preselections:
Jet:
pt: 20
eta..
JetPtReg:
pt: 20
eta: ...
btag:
wp:
M
For this to work a clone of the Jet collection (called JetPtReg) needs to be defined in the process_extra_after_skim function of the user’s workflow:
1from pocket_coffea.workflows.base import BaseProcessorABC
2
3
4class PtRegrProcessor(BaseProcessorABC):
5 def __init__(self, cfg) -> None:
6 super().__init__(cfg=cfg)
7
8 def process_extra_after_skim(self):
9 # Create extra Jet collections for testing
10 self.events["JetPtReg"] = ak.copy(self.events["Jet"])
11 # self.events["JetPtRegPlusNeutrino"] = ak.copy(self.events["Jet"])
Now it is up to the user to deal with two collections in their workflow: Jet and JetPtReg.
Further references:
The analysis note: AN-2022/094
Measuring response in Z+b events: presentation
JME page with experimental JECs: https://cms-jerc.web.cern.ch/ExpJEC/
Merge regressed and standard jet pT#
In some cases, e.g. PNet regression in NanoAODv12, the regression can be applied only to a subset of jets (e.g. cutting on pT and eta of the jet). In this case, one may want to merge the regressed pT values with the standard pT values for jets failing the regression criteria. In order to do this, your configuration would look like this:
jets_calibration:
collection:
2022_preEE:
AK4PFPuppi: "Jet"
AK4PFPuppiPNetRegression: "JetPtReg"
#AK4PFPuppiPNetRegressionPlusNeutrino: "JetPtReg"
The merging of the two jet collections should be done in the user’s workflow, e.g. in the apply_object_preselection section:
1from pocket_coffea.workflows.base import BaseProcessorABC
2
3
4class PtRegrProcessor(BaseProcessorABC):
5 def __init__(self, cfg) -> None:
6 super().__init__(cfg=cfg)
7
8 def process_extra_after_skim(self):
9 # Create extra Jet collections for testing
10 self.events["JetPtReg"] = ak.copy(self.events["Jet"])
11 #self.events["JetPtRegPlusNeutrino"] = ak.copy(self.events["Jet"])
12
13 def apply_object_preselection(self, variation):
14 # Use the regressed jet from PNet collection if available,
15 # otherwise use the standard, JEC corrected collection.
16 # This way we consider correctly all fields which change depending on
17 # the pt definition, namely the pt, mass and the associated systematic variations:
18 self.events["Jet"] = ak.where(
19 self.events["JetPtReg"].pt > 0,
20 self.events["JetPtReg"],
21 self.events.Jet,
22 )
Warning
In order to merge the variations of the Jet and JetPtReg collections, you need to define a merge_collections_for_variations entry as specified in section Merging Systematic Variations.
Warning
When merging the collections like this, make sure to set the sort_by_pt option to False for the jet type in the jets calibration configuration, otherwise the jet ordering will be changed and the merging will fail.
Create a custom executor to use onnxruntime#
This example shows running on CERN lxplus and assumes a prior understanding of how to load and use an ML model with onnxruntime. For more examples see the executors in the ttHbb analysis here
At the time of writing, onnxruntime is not installed in the singularity container, which means that you will need to run with a custom environment. Instructions for this are given in Running the analysis
The following code is a custom executor which is meant to be filled in with details such as the path to the model.onnx file and options used in the InferenceSession.
1from pocket_coffea.executors.executors_lxplus import DaskExecutorFactory
2from dask.distributed import WorkerPlugin, Worker, Client
3
4
5class WorkerInferenceSessionPlugin(WorkerPlugin):
6 def __init__(self, model_path, session_name):
7 super().__init__()
8 self.model_path = model_path
9 self.session_name = session_name
10
11 async def setup(self, worker: Worker):
12 import onnxruntime as ort
13
14 session = ort.InferenceSession(
15 self.model_path,
16 # Whatever other options you use
17 providers=["CPUExecutionProvider"],
18 )
19 worker.data["model_session_" + self.session_name] = session
20
21
22class OnnxExecutorFactory(DaskExecutorFactory):
23 def __init__(self, **kwargs):
24 super().__init__(**kwargs)
25
26 def setup(self):
27 super().setup()
28 self.Model = "absolute/path/to/your/model.onnx"
29 inference_session_plugin = WorkerInferenceSessionPlugin(self.Model, "ModelName")
30 self.dask_client.register_plugin(inference_session_plugin)
31
32 def close(self):
33 self.dask_client.close()
34 self.dask_cluster.close()
35
36
37def get_executor_factory(executor_name, **kwargs):
38 return OnnxExecutorFactory(**kwargs)
To use the model to process events in the workflow.py file, one would do something like this. See here for another example.
1# import the get_worker function
2from dask.distributed import get_worker
3
4# Rest of workflow
5
6
7# Suppose you want to apply your model after preselection. You would do e.g.
8def process_extra_after_presel():
9 try:
10 worker = get_worker()
11 except ValueError:
12 worker = None
13
14 # Whatever needs to be done to prepare the inputs to the model
15
16 if worker is None:
17 # make it work running locally too
18 import onnxruntime as ort
19
20 session = ort.InferenceSession(
21 self.model_path,
22 # Whatever other options you use
23 providers=["CPUExecutionProvider"],
24 )
25 else:
26 session = worker.data["model_session_ModelName"]
27
28 # Continue as you normally would when using an ML model with onnxruntime, e.g.
29 model_output = session.run(
30 # inputs and options
31 )
To run with the custom executor, assuming the file is called custom_executor.py, one replaces --executor dask@lxplus with --executor-custom-setup custom_executor.py for example:
pocket-coffea run --cfg myConfig.py -o outputDir -ro run_options.yaml --executor-custom-setup custom_executor.py
Lastly, the custom executor will print a lot of INFO level log messages which are suppressed when running with the built-in pocket-coffea executors. To suppress these for all executors, create a file ~/.config/dask/logging.yaml and add the following:
logging:
distributed: warning
distributed.client: warning
distributed.worker: warning
distributed.nanny: warning
distributed.scheduler: warning
Split large outputs into categories#
If your configuration contains a large number of categories, variables, and systematics, the number of histograms can grow very large. Although on disk, the size of the merged output_all.coffea is usually less than O(10 GB), the full accumulation process can consume around O(100 GB) of RAM. This seems unavoidable as coffea accumulation necessarily happens on memory. This can be addressed in one of two ways:
The
merge-outputscript dumps partial.coffeaoutputs whenever memory usage exceeds 50% of the available RAM on the machine. However, this means one still has to use a different large-memory machine to merge them into one.coffeafile and/or read them all into memory during plotting. The fragmented.coffeadumps consume less space on disk and are fewer in number, so it is easier toscpthem to other machines using this approach.A more efficient solution is to split outputs into “category groups” (i.e. channels or regions of the analysis) and merge/process only one group of categories at one time. Since plots are typically made per channel, this lets one do everything without loading multiple category-grouped files into the memory.
Currently, the second solution is implemented only for the condor@lxplus executor. It can be utilized as follows:
runner: Pass--split-by-categorytorunner(actually gets passed to the executor parameters). The output from each job is then further split to contain 8 categories per output file, so each job producesn_groups = n_categories/8output files. This is handled through thesplit-outputcommand, which in turn callsutils.filter_output.filter_output_by_category.merge-outputs: Handles the merging per category automatically (no extra flag needed) if--split-by-categorywas passed torunner. This will producen_groupsmerged outputs, containing mutually exclusive groups of categories.make-plots: Pass--split-by-categoryflag to handle only the category-wise merged outputs.
Export columns (arrays)#
Besides histograms, PocketCoffea can export selected object fields as flat arrays
(“columns”), stored in the output under output["columns"]. Columns are declared with
ColOut objects in the columns block of the Configurator, using the same
common / bysample / bycategory structure as weights:
1from pocket_coffea.lib.columns_manager import ColOut
2
3columns = {
4 "common": {
5 "inclusive": [
6 ColOut("JetGood", ["pt", "eta"]),
7 ],
8 },
9 "bysample": {
10 "TTTo2L2Nu": {
11 "bycategory": {
12 "2btag": [ColOut("BJetGood", ["pt", "eta", "phi"])],
13 "4jets": [ColOut("JetGood", ["pt", "eta", "phi"])],
14 }
15 }
16 },
17}
ColOut(collection, columns, flatten=True, store_size=True, fill_none=True, fill_value=-999.0, pos_start=None, pos_end=None):
flatten=True(default) flattens the jagged per-event arrays into a single flat array; the per-event multiplicity is recoverable from the companion size column written whenstore_size=True. Setflatten=Falseto keep the jagged (per-event) structure — needed if you intend to dump to parquet (see below).fill_none/fill_valuecontrol how missing entries are padded.pos_start/pos_endexport only a slice of the collection (e.g. the leading jet).
Dump columns to parquet per chunk#
By default the columns accumulate into the .coffea output. For large column dumps it is
more efficient to write one parquet file per processed chunk, keeping the jagged structure
(flatten=False). Enable this with the dump_columns_as_arrays_per_chunk workflow option:
1cfg = Configurator(
2 ...
3 workflow_options = {"dump_columns_as_arrays_per_chunk": "./columns"},
4 columns = {
5 "common": {"inclusive": [ColOut("JetGood", ["pt", "eta"], flatten=False)]},
6 },
7 ...
8)
Each chunk writes its arrays under ./columns/..., organized by sample and category,
instead of being kept in memory until the end of the job.
Delayed branches (lazy event variables)#
Sometimes a derived per-event quantity is needed in cuts, weights or histograms but you
want it computed once per nominal event and reused across all calibrator variations,
without it being recomputed in every pass. Register such a quantity as a delayed branch
in the workflow __init__ via self.delayed_branches.register:
1from pocket_coffea.workflows.base import BaseProcessorABC
2
3class BasicProcessor(BaseProcessorABC):
4 def __init__(self, cfg):
5 super().__init__(cfg)
6 # compute event parity once; events that are masked out get the default value
7 self.delayed_branches.register(
8 "parity",
9 lambda events: events.event % 2,
10 default_value=-1.0,
11 )
The registered branch then behaves like any other event field: it can be referenced by name in cut functions, in custom weights, and as a histogram axis:
1variables = {
2 "Parity": HistConf([Axis(field="parity", bins=3, start=-1, stop=2, label="Parity")],
3 variations=True),
4}
register(name, compute_fn, default_value=1.0):
compute_fn(events)returns an array with a leading event axis. The value is evaluated on the nominal events and mapped to each variation, so it is computed only once.default_valuefills events that are not part of the current selection (the example uses-1.0so unselected events are distinguishable from real parities0/1).To register several branches at once, pass a list of names and a
compute_fnreturning a dict{name: array};default_valuemay then be a dict mapping each name to its default.Dense multi-dimensional outputs are supported as long as the trailing shape matches the shape of the default value.