Full analysis examples#

A few analysis examples, with increasing levels of complexity, are documented here and in the configuration repository: AnalysisConfigs

A full example of all the steps needed to run an analysis with PocketCoffea is reported here, starting from the creation of the datasets list, through the customization of parameters, to the production of the final shapes and plots. As an example, a toy version of the Drell-Yan analysis targeting the \(Z\rightarrow \mu\mu\) channel is implemented.

The main steps that need to be performed are the following:

  • Build the json datasets

  • Compute the sum of genweights for Monte Carlo datasets and rebuild json datasets

  • Define selections: trigger, skim, object preselection, event preselection and categorization

  • Define weights and variations

  • Define histograms

  • Run the processor

  • Make the plots from processor output

Installation#

Install PocketCoffea#

PocketCoffea can be installed from pip or directly from the sources. The latter is needed if you wish to contribute to the core of the framework code or to get the latest under-development features.

Tip

The environment needs to be set up differently depending on the cluster where you want to run. Please refer to the detailed Installation guide.

If you want to test it on lxplus just use the singularity image:

apptainer shell  -B /afs -B /cvmfs/cms.cern.ch -B /tmp  -B /eos/cms/  \
                 -B /etc/sysconfig/ngbauth-submit  \
                 -B ${XDG_RUNTIME_DIR}  --env KRB5CCNAME=${XDG_RUNTIME_DIR}/krb5cc  \
                 /cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/cms-analysis/general/pocketcoffea:lxplus-cc7-stable

Tip

If you just want to test the analysis example or work on an analysis configuration without modifying the central code, you don’t need any other setup step!

If instead you want to install it from source, in order to modify and contribute to the core framework code, you need to create a virtual environment. The recommended way of doing it is inside the singularity image as described in Installation.

git clone https://github.com/PocketCoffea/PocketCoffea.git
cd PocketCoffea
python -m venv myenv
source myenv/bin/activate

pip install -e .  
#-e installs it in "editable" mode so that the modified files are included dynamically in the package.

Setup the job submission with local core changes

N.B.: In order to properly propagate the local environment and local code changes to jobs running on condor through Dask, the user needs to set up the executor options with the local-virtualenv: true option. Check out the running instructions for more details: Running.
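As an example, a minimal custom run options file enabling this could look like the sketch below (only the local-virtualenv key is needed to propagate local changes; the other keys are illustrative and are discussed later in the tutorial):

# custom_run_options.yaml (sketch)
local-virtualenv: true   # ship the local editable install to the Dask workers
queue: espresso          # illustrative HTCondor queue
scaleout: 10             # illustrative number of workers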

Configuration files#

The parameters specific to the analysis have to be specified in the configuration file. This file defines a python object named cfg, an instance of the Configurator class, that is read and used by the framework.

Tip

Have a look at the Configuration page for more detailed explanations.

A dedicated repository AnalysisConfigs collects the config files for different analyses. Some analysis configs have been included in the main PocketCoffea repository for reference.

# Now downloading the example configuration
git clone https://github.com/PocketCoffea/AnalysisConfigs.git
cd AnalysisConfigs/configs/zmumu

# Ready to start!

A dedicated folder zmumu under configs contains all the config files, the datasets definition json file, the workflow files, and possibly extra files with parameters that are needed for this tutorial analysis.

Dataset preparation#

Build datasets metadata#

The datasets of this analysis include a Drell-Yan Monte Carlo dataset and a SingleMuon dataset. We have to look for the corresponding DAS keys:

  • /DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v2/NANOAODSIM

  • /SingleMuon/Run2018C-UL2018_MiniAODv2_NanoAODv9-v2/NANOAOD

The list of datasets has to be written in a structured dictionary together with the corresponding metadata in a json file. This json file is then read by the build_datasets.py script to produce the actual json datasets that are passed as input to the Coffea processor.

Tip

For a more detailed discussion about datasets, samples and their meaning in PocketCoffea, see Datasets handling

The steps are the following:

  1. Create a json file that contains the required datasets, datasets_definitions.json. Each entry of the dictionary corresponds to a dataset. Datasets include a list of DAS keys, the output json dataset path and the metadata. In addition, a sample label is specified to assign the “type” of events contained in the dataset (e.g. to group QCD datasets from different HT bins in a single sample).

The general idea is the following:

  • The dataset key uniquely identifies a dataset.

  • The sample key is the one used internally in the framework to categorize the type of the events contained in the dataset.

  • The json_output key defines the path of the destination json file which will store the list of files.

  • Several files entries can be defined for each dataset, including a list of DAS names and a dedicated metadata dictionary.

  • The metadata keys should include:

    • For Monte Carlo: year, isMC and xsec.

    • For Data: year, isMC, era and primaryDataset.

When the json datasets are built, the metadata parameters are linked to the files list, defining a unique dataset entry with the corresponding files. The primaryDataset key for Data datasets is needed in order to apply a trigger selection only to the corresponding dataset (e.g. apply the SingleMuon trigger only to datasets having primaryDataset=SingleMuon).

The structure of the datasets_definitions.json file after filling in the dictionary with the parameters relevant to our Drell-Yan and SingleMuon datasets should be the following:

{
    "DYJetsToLL_M-50": {
        "sample": "DYJetsToLL",
        "json_output": "datasets/DYJetsToLL_M-50.json",
        "files": [
            {
                "das_names": ["/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v2/NANOAODSIM"],
                "metadata": {
                    "year": "2018",
                    "isMC": true,
                    "xsec": 6077.22,
                    "part": ""
                }
            }
        ]
    },
    "DATA_SingleMuon": {
        "sample": "DATA_SingleMuon",
        "json_output": "datasets/DATA_SingleMuon.json",
        "files": [
             {
                "das_names": [
                    "/SingleMuon/Run2018A-UL2018_MiniAODv2_NanoAODv9-v2/NANOAOD"
                ],
                "metadata": {
                    "year": "2018",
                    "isMC": false,
                    "primaryDataset": "SingleMuon",
                    "era": "A"
                },
                "das_parents_names": [
                    "/SingleMuon/Run2018A-UL2018_MiniAODv2-v3/MINIAOD"
                ]
            },
            {
                "das_names": [
                    "/SingleMuon/Run2018B-UL2018_MiniAODv2_NanoAODv9-v2/NANOAOD"
                ],
                "metadata": {
                    "year": "2018",
                    "isMC": false,
                    "primaryDataset": "SingleMuon",
                    "era": "B"
                },
                "das_parents_names": [
                    "/SingleMuon/Run2018B-UL2018_MiniAODv2-v2/MINIAOD"
                ]
            },
            {
                "das_names": [
                    "/SingleMuon/Run2018C-UL2018_MiniAODv2_NanoAODv9-v2/NANOAOD"
                ],
                "metadata": {
                    "year": "2018",
                    "isMC": false,
                    "primaryDataset": "SingleMuon",
                    "era": "C"
                },
                "das_parents_names": [
                    "/SingleMuon/Run2018C-UL2018_MiniAODv2-v2/MINIAOD"
                ]
            },
            {....}
        ]
    }
}
  2. To produce the json files containing the file lists, run the following command:

# if you are using singularity, create the voms proxy outside of it.
voms-proxy-init -voms cms -rfc --valid 168:0

build_datasets.py --cfg datasets/datasets_definitions.json -o

# if you are running at CERN it is useful to restrict the data sources to Tiers closer to CERN
build_datasets.py --cfg datasets/datasets_definitions.json -o -rs 'T[123]_(FR|IT|BE|CH|DE)_\w+'

Four json files are produced as output, two for each dataset: one version includes file paths with a specific prefix corresponding to a site (obtained by querying Rucio for the availability of the files at the moment of running the script), while the other has a global redirector prefix (e.g. xrootd-cms.infn.it) and is named with the suffix _redirector.json. If one has to rebuild the dataset to include more datasets, the extra argument --overwrite can be provided to the script.

ls -lrt datasets/
total 129K
-rw-r--r-- 1   49K May  8 13:38 DYJetsToLL_M-50_2018.json
-rw-r--r-- 1   47K May  8 13:38 DYJetsToLL_M-50_redirector_2018.json
-rw-r--r-- 1   46K May  8 13:38 DYJetsToLL_M-50_local_2018.json
-rw-r--r-- 1  5.6K May 11 19:19 datasets_definitions.json~
-rw-r--r-- 1   49K May 11 19:20 DYJetsToLL_M-50.json
-rw-r--r-- 1   47K May 11 19:20 DYJetsToLL_M-50_redirector.json
-rw-r--r-- 1  144K May 11 19:21 DATA_SingleMuon.json
-rw-r--r-- 1  135K May 11 19:21 DATA_SingleMuon_redirector.json
-rw-r--r-- 1  3.0K May 11 19:22 datasets_definitions.json

There are more options to specify a regex to filter CMS Tiers, or options to whitelist or blacklist sites. Moreover, the output jsons can be split automatically by year or kept together.
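For instance, to restrict the file lookup to specific sites (the site names below are purely illustrative), the whitelist option can be used:

# Sketch: only consider replicas hosted at the whitelisted sites
build_datasets.py --cfg datasets/datasets_definitions.json -o -ws T2_IT_Legnaro T2_DE_DESY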

(pocket-coffea)   zmumu git:(main)  build_datasets.py -h

   ___       _ __   _____       __               __ 
  / _ )__ __(_) /__/ / _ \___ _/ /____ ____ ___ / /_
 / _  / // / / / _  / // / _ `/ __/ _ `(_-</ -_) __/
/____/\_,_/_/_/\_,_/____/\_,_/\__/\_,_/___/\__/\__/ 
                                                   

usage: build_datasets.py [-h] [--cfg CFG] [-k KEYS [KEYS ...]] [-d] [-o] [-c] [-s] [-l LOCAL_PREFIX] [-ws WHITELIST_SITES [WHITELIST_SITES ...]] [-bs BLACKLIST_SITES [BLACKLIST_SITES ...]] [-rs REGEX_SITES]

Build dataset fileset in json format

optional arguments:
  -h, --help            show this help message and exit
  --cfg CFG             Config file with parameters specific to the current run
  -k KEYS [KEYS ...], --keys KEYS [KEYS ...]
                        Dataset keys to select
  -d, --download        Download dataset files on local machine
  -o, --overwrite       Overwrite existing file definition json
  -c, --check           Check file existance in the local prefix
  -s, --split-by-year   Split output files by year
  -l LOCAL_PREFIX, --local-prefix LOCAL_PREFIX
                        Local prefix
  -ws WHITELIST_SITES [WHITELIST_SITES ...], --whitelist-sites WHITELIST_SITES [WHITELIST_SITES ...]
                        List of sites in the whitelist
  -bs BLACKLIST_SITES [BLACKLIST_SITES ...], --blacklist-sites BLACKLIST_SITES [BLACKLIST_SITES ...]
                        List of sites in the blacklist
  -rs REGEX_SITES, --regex-sites REGEX_SITES
                        Regex to filter sites

Analysis configuration#

Parameters#

PocketCoffea distinguishes between parameters for the analysis and the configuration of a specific run.

  • A parameter defines an analysis in general and is usually common between different runs: e.g. HLT triggers, scale factor working points, JES/JER configuration.

  • A configuration is specific to a single run of the analysis, where the user applies selections, cleans objects and defines output histograms or arrays.

Tip

Parameters are handled as yaml files with the OmegaConf utility library. A set of common defaults is defined centrally by PocketCoffea, which the user can then overwrite, replace or extend with analysis-specific configurations. Have a look at the Parameters page.

This is done usually in the preamble of the analysis config file:

import os
localdir = os.path.dirname(os.path.abspath(__file__))

# Loading default parameters
from pocket_coffea.parameters import defaults
default_parameters = defaults.get_default_parameters()
defaults.register_configuration_dir("config_dir", localdir+"/params")

parameters = defaults.merge_parameters_from_files(default_parameters,
                                                  f"{localdir}/params/object_preselection.yaml",
                                                  f"{localdir}/params/triggers.yaml",
                                                  update=True)

The parameters object can be manipulated freely by the user and then passed to the Configurator class to be used in the analysis. The parameters are then dumped along with the analysis output to preserve them.
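Since the parameters object is an OmegaConf structure, it can also be modified programmatically before being passed to the Configurator. A minimal sketch, using the muon preselection key shown later in this tutorial:

# Override a single default value before building the configuration
parameters["object_preselection"]["Muon"]["pt"] = 20  # tighten the muon pT cut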

Configuration#

A specific analysis run is defined in PocketCoffea using an instance of the Configurator class. This class groups all the information about skimming, categorization, datasets, and outputs. The next sections of the tutorial briefly describe how to configure it for the Zmumu analysis.

The Configurator instance is created inside the main configuration file example_config.py and assigned to a variable called cfg. This special name is used by the framework when the file is passed to the runner.py script to be executed.

from pocket_coffea.utils.configurator import Configurator
from pocket_coffea.lib.cut_definition import Cut
from pocket_coffea.lib.cut_functions import get_nObj_min, get_HLTsel
from pocket_coffea.parameters.cuts import passthrough
from pocket_coffea.parameters.histograms import *
from workflow import ZmumuBaseProcessor
from custom_cut_functions import *


cfg = Configurator(
    parameters = parameters,
    datasets = {
        "jsons": [f"{localdir}/datasets/DATA_SingleMuon.json",
                  f"{localdir}/datasets/DYJetsToLL_M-50.json"
                    ],
        "filter" : {
            "samples": ["DATA_SingleMuon",
                        "DYJetsToLL"],
            "samples_exclude" : [],
            "year": ['2018']
        }
    },
    #.....continues
    workflow = ZmumuBaseProcessor,

Datasets are specified by passing the list of json files to be used; they can be filtered by year or sample type.

Parameters are passed to the Configurator along with the workflow class to be used in the processing. This class is defined by the user in the workflow.py file, located in the same folder.
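As a rough sketch, the workflow file subclasses the framework’s base processor and implements the object preselection and counting hooks (the method names below follow PocketCoffea’s BaseProcessorABC; see the actual workflow.py in the example for the real implementation):

from pocket_coffea.workflows.base import BaseProcessorABC

class ZmumuBaseProcessor(BaseProcessorABC):
    def apply_object_preselection(self, variation):
        # build the cleaned collections, e.g. MuonGood, ElectronGood, JetGood,
        # and the dilepton candidate collection "ll" used by the cuts below
        ...

    def count_objects(self, variation):
        # fill the object counters, e.g. nMuonGood, nElectronGood, nJetGood
        ...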

Define selections#

The selections are performed at two levels:

  • Object preselection: selecting the “good” objects that will be used in the final analysis (e.g. JetGood, MuonGood, ElectronGood…).

  • Event selection: selections on the events that enter the final analysis, done in three steps:

    1. Skim and trigger: loose cut on the events and trigger requirements.

    2. Preselection: baseline event selection for the analysis.

    3. Categorization: selection to split the events passing the event preselection into different categories (e.g. signal region, control region).

Object preselection#

To select the objects entering the final analysis, we need to specify a series of cut parameters for the leptons and jets in the file params/object_preselection.yaml. These selections include the \(p_T\) and \(\eta\) acceptance cuts, the object identification working points, the muon isolation, the b-tagging working point, etc.

For the \(Z\rightarrow \mu\mu\) analysis, we just use the standard definitions for the muon, electron and jet objects:

object_preselection:
  Muon:
    pt: 15
    eta: 2.4
    iso: 0.25  #PFIsoLoose
    id: tightId

  Electron:
    pt: 15
    eta: 2.4
    iso: 0.06
    id: mvaFall17V2Iso_WP80

  Jet:
    dr_lepton: 0.4
    pt: 30
    eta: 2.4
    jetId: 2
    puId:
      wp: L
      value: 4
      maxpt: 50.0
    btag:
      wp: M

These parameters are used by the functions which filter the object collections in the workflow.py file.
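For intuition, here is a minimal sketch (not the framework’s actual helper) of how the Muon block above could be turned into a selection mask inside workflow.py:

# "params" is the parameters object; "events" is the NanoAOD events chunk
cuts = params["object_preselection"]["Muon"]
muons = events.Muon
mask = (
    (muons.pt > cuts["pt"])
    & (abs(muons.eta) < cuts["eta"])
    & (muons.pfRelIso04_all < cuts["iso"])  # NanoAOD PF relative isolation
    & (muons[cuts["id"]])                   # e.g. the tightId flag
)
events["MuonGood"] = muons[mask]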

Event selection#

In PocketCoffea, the event selections are implemented with a dedicated Cut object, which stores both the cut function and its input parameters. Several factory Cut objects are available in pocket_coffea.lib.cut_functions; otherwise, the user can define their own custom Cut objects.

Skim#

The skim selection of the events is performed “on the fly” to reduce the number of processed events. At this stage we also apply the HLT trigger requirements of the analysis. The following steps of the analysis are performed only on the events passing the skim selection, while the others are dropped from the events branch, therefore reducing the computational load on the processor. In the config file, we specify two skim cuts: one selects events with at least one 18 GeV muon and the second requires the HLT SingleMuon path.

Triggers are specified in a parameter yaml file under the params dir (but the location is up to the user). These parameters are loaded and added to the default parameters in the preamble of the config file.

The skim cuts are then defined in example_config.py, where the primary dataset name is passed as an argument to the factory function get_HLTsel():

cfg = Configurator(
    # .....

    skim = [get_nObj_min(1, 18., "Muon"),
            # Asking only SingleMuon triggers since we are only using SingleMuon PD data
            get_HLTsel(primaryDatasets=["SingleMuon"])],

Event preselection#

In the \(Z\rightarrow \mu\mu\) analysis, we want to select events with exactly two muons with opposite charge. In addition, we require a cut on the leading muon pT and on the dilepton invariant mass, to select the Z boson mass window. The parameters are directly passed to the constructor of the Cut object as the dictionary params. We can define the function dimuon and the Cut object dimuon_presel in the preamble of the config script:

import awkward as ak

def dimuon(events, params, year, sample, **kwargs):
    # Masks for same-flavor (SF) and opposite-sign (OS)
    SF = ((events.nMuonGood == 2) & (events.nElectronGood == 0))
    OS = events.ll.charge == 0
    mask = (
        (events.nLeptonGood == 2)
        & (ak.firsts(events.MuonGood.pt) > params["pt_leading_muon"])
        & OS & SF
        & (events.ll.mass > params["mll"]["low"])
        & (events.ll.mass < params["mll"]["high"])
    )

    # Pad None values with False
    return ak.where(ak.is_none(mask), False, mask)

dimuon_presel = Cut(
    name="dilepton",
    params={
        "pt_leading_muon": 25,
        "mll": {'low': 25, 'high': 2000},
    },
    function=dimuon,
)

In an analysis requiring several different cuts, a dedicated library of cuts and functions can be defined in a separate file and imported in the config file. This is the strategy implemented in the example (and recommended, to keep the config file minimal). Have a look at the custom_cut_functions.py file.

The preselections field in the config file is updated accordingly:

cfg = Configurator(
    # .....

    preselections = [dimuon_presel],

Categorization#

In the toy \(Z\rightarrow \mu\mu\) analysis, no further categorization of the events is performed. Only a baseline category is defined with the passthrough factory cut that is just passing the events through without any further selection:

cfg = Configurator(
    # .....

    categories = {
        "baseline": [passthrough],
    },

If, for example, \(Z\rightarrow ee\) events were also included in the analysis, one could define a more general “dilepton” preselection and categorize the events as 2e or 2mu, depending on whether they contain two electrons or two muons, respectively, as sketched below.
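A minimal sketch of such a categorization, with hypothetical Cut objects built like dimuon_presel above:

def two_muons(events, params, **kwargs):
    mask = (events.nMuonGood == 2) & (events.nElectronGood == 0)
    return ak.where(ak.is_none(mask), False, mask)

def two_electrons(events, params, **kwargs):
    mask = (events.nElectronGood == 2) & (events.nMuonGood == 0)
    return ak.where(ak.is_none(mask), False, mask)

mu_channel  = Cut(name="2mu", params={}, function=two_muons)
ele_channel = Cut(name="2e",  params={}, function=two_electrons)

cfg = Configurator(
    # .....
    categories = {
        "baseline": [passthrough],
        "2mu": [mu_channel],
        "2e":  [ele_channel],
    },
)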

Define weights and variations#

The application of the nominal value of scale factors and weights is switched on and off just by adding the corresponding key in the weights dictionary:

cfg = Configurator(
    # .....

    weights = {
        "common": {
            "inclusive": ["genWeight","lumi","XS",
                          "pileup",
                          "sf_mu_id","sf_mu_iso",
                          ],
            "bycategory" : {
            }
        },
        "bysample": {
        }
    },

In our case, we are applying the nominal Monte Carlo normalization, built from the generator weight, the cross section and the luminosity, together with the pileup reweighting and the muon ID and isolation scale factors. The reweighting of the events is managed internally by the WeightsManager module.
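Concretely, the nominal weight of an MC event corresponds to \(w = \mathrm{genWeight} \times \sigma \times \mathcal{L} \,/ \sum_i \mathrm{genWeight}_i\), where the sum of genweights is the one computed during the dataset preparation step, multiplied by the pileup and muon scale factors.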

To store also the up and down systematic variations corresponding to a given weight, one can specify it in the variations dictionary:

cfg = Configurator(
    # .....
    variations = {
        "weights": {
            "common": {
                "inclusive": [ "pileup",
                               "sf_mu_id", "sf_mu_iso"
                             ],
                "bycategory" : {
                }
            },
            "bysample": {
            }
        },
    },

In this case we will store the variations corresponding to the systematic variation of pileup and the muon ID and isolation scale factors. These systematic uncertainties will be included in the final plots.

Define histograms#

In PocketCoffea histograms can be defined directly from the configuration: look at the variables dictionary in example_config.py. Histograms are defined with the options specified in hist_manager.py. A histogram is a collection of Axis objects with additional options for excluding/including variables, samples, and systematic variations.

Tip

Histogram configuration is described in detail here

In order to create a user-defined histogram, add the Axis objects as a list (1 element for a 1D histogram, 2 elements for a 2D histogram):

cfg = Configurator(
    # .....

    variables = {

        # 1D plots
        "mll" : HistConf( [Axis(coll="ll", field="mass", bins=100, start=50, stop=150, label=r"$M_{\ell\ell}$ [GeV]")] ),
    },

    # coll : collection/objects under events
    # field: fields under collections
    # bins, start, stop: number of bins, axis minimum, axis maximum
    # label: axis label name

The collection is the name used to access the fields of the events main dataset (the NanoAOD Events tree). The field specifies the specific array to use. If a field is global in the events, e.g. a user-defined array stored directly in events, just use coll="events". Please refer to the Axis code for a complete description of the options.
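For instance, a sketch of a histogram for a per-event (global) variable, assuming a hypothetical array events["dimuon_deltaR"] filled by the user in the workflow:

"deltaR_ll" : HistConf( [Axis(coll="events", field="dimuon_deltaR", bins=50, start=0, stop=5, label=r"$\Delta R(\mu\mu)$")] ),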

  • There are also some predefined histogram dictionaries ready to be used:

cfg = Configurator(
    # .....

    variables = {
        **count_hist(name="nJets", coll="JetGood", bins=8, start=0, stop=8),
        # Muon kinematics
        **muon_hists(coll="MuonGood", pos=0),
        # Jet kinematics
        **jet_hists(coll="JetGood", pos=0),
    },

Run the processor#

Run the coffea processor to get the .coffea output files. The coffea processor can be run locally or scaled out to clusters:

  • iterative execution runs on a single thread locally and is useful for debugging;

  • futures execution runs the processor on multiple threads locally and can be useful for fast processing of a small number of files;

  • dask@lxplus uses the Dask scheduler with the lxplus configuration to send out workers as HTCondor jobs.

The executors are configured in the pocket_coffea/executors module for a set of predefined grid sites. Please get in contact with the developers and open a PR if you want to include your specific site in the supported list.

Tip

Have a look at Running for more details about the configuration of the different executors.

In this tutorial we assume the use of lxplus, but the example should work fine also on other sites with the iterative or futures setup. The Dask scheduler execution needs to be configured properly to send out jobs on your facility.

## First let's test the analysis locally: --test enables the iterative processor with ``--limit-chunks/-lc`` (default: 2) and ``--limit-files/-lf`` (default: 1)
runner.py --cfg example_config.py --test  -o output_test

We can now submit the full processing on the HTCondor cluster with Dask:

## change the --executor and the number of jobs with -s/--scaleout

runner.py --cfg example_config.py  --full --executor dask@lxplus  --scaleout 10  -o output_dask

The scaleout configuration really depends on the cluster and scheduler of each site (lxplus, LPC, naf-desy). The default options for the dask@lxplus executor are contained here, but the user can add custom run options with a .yaml file:

$> cat custom_run_options.yaml

queue: espresso
chunksize: 50000
scaleout: 10

Try now to run with custom options:

runner.py --cfg example_config.py  --full --executor dask@lxplus --run-options custom_run_options.yaml  -o output_dask

The output of the script will be similar to:

$ runner.py --cfg example_config.py  --executor dask@lxplus --run-options custom_run_options.yaml  -o output_dask

    ____             __        __  ______      ________          
   / __ \____  _____/ /_____  / /_/ ____/___  / __/ __/__  ____ _
  / /_/ / __ \/ ___/ //_/ _ \/ __/ /   / __ \/ /_/ /_/ _ \/ __ `/
 / ____/ /_/ / /__/ ,< /  __/ /_/ /___/ /_/ / __/ __/  __/ /_/ / 
/_/    \____/\___/_/|_|\___/\__/\____/\____/_/ /_/  \___/\__,_/  
                                                                 

Loading the configuration file...
[INFO    ] Configurator instance:
  - Workflow: <class 'workflow.ZmumuBaseProcessor'>
  - N. datasets: 5 
   -- Dataset: DATA_SingleMuon_2018_EraA,  Sample: DATA_SingleMuon, N. files: 92, N. events: 241608232
   -- Dataset: DATA_SingleMuon_2018_EraB,  Sample: DATA_SingleMuon, N. files: 51, N. events: 119918017
   -- Dataset: DATA_SingleMuon_2018_EraC,  Sample: DATA_SingleMuon, N. files: 56, N. events: 109986009
   -- Dataset: DATA_SingleMuon_2018_EraD,  Sample: DATA_SingleMuon, N. files: 194, N. events: 513909894
   -- Dataset: DYJetsToLL_M-50_2018,  Sample: DYJetsToLL, N. files: 204, N. events: 195510810
  - Subsamples:
   -- Sample DATA_SingleMuon: StandardSelection ['DATA_SingleMuon'], (1 categories)
   -- Sample DYJetsToLL: StandardSelection ['DYJetsToLL'], (1 categories)
  - Skim: ['nMuon_min1_pt18.0', 'HLT_trigger_SingleMuon']
  - Preselection: ['dilepton']
  - Categories: StandardSelection ['baseline'], (1 categories)
  - Variables:  ['MuonGood_eta_1', 'MuonGood_pt_1', 'MuonGood_phi_1', 'nElectronGood', 'nMuonGood', 'nJets', 'nBJets', 'JetGood_eta_1', 'JetGood_pt_1', 'JetGood_phi_1', 'JetGood_btagDeepFlavB_1', 'JetGood_eta_2', 'JetGood_pt_2', 'JetGood_phi_2', 'JetGood_btagDeepFlavB_2', 'mll']
  - available weights variations: {'DATA_SingleMuon': ['nominal', 'sf_mu_iso', 'pileup', 'sf_mu_id'], 'DYJetsToLL': ['nominal', 'sf_mu_iso', 'pileup', 'sf_mu_id']} 
  - available shape variations: {'DATA_SingleMuon': [], 'DYJetsToLL': []}
Saving config file to output_all/config.json

# DASK output starts here

Produce plots#

Once the processing is done, the output folder looks like:

(pocket-coffea)   output_folder git:(main)  ls -lrt
total 146M
-rw-r--r-- 1 user group  31K Jun 19 11:34 parameters_dump.yaml
-rw-r--r-- 1 user group 126K Jun 19 11:34 config.json
-rw-r--r-- 1 user group  20M Jun 19 11:34 configurator.pkl
-rw-r--r-- 1 user group 144M Jun 19 11:52 output_all.coffea

The parameters used in the processing are dumped to parameters_dump.yaml, along with a human-readable version of the config in json format and a machine-readable Configurator object in cloudpickle format.
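The .coffea file itself can be inspected interactively with coffea's load utility, for example:

# quick sketch: open the output in a python shell
from coffea.util import load
out = load("output_all.coffea")
print(out.keys())  # e.g. the histogram, cutflow and metadata dictionaries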

To produce plots for each category and variable defined in the configuration, a make_plots.py script is available:

$ make_plots.py --help

usage: make_plots.py [-h] [--input-dir INPUT_DIR] [--cfg CFG] [-op OVERWRITE_PARAMETERS [OVERWRITE_PARAMETERS ...]] [-o OUTPUTDIR] [-i INPUTFILE] [-j WORKERS] [-oc ONLY_CAT [ONLY_CAT ...]] [-os ONLY_SYST [ONLY_SYST ...]] [-e EXCLUDE_HIST [EXCLUDE_HIST ...]] [-oh ONLY_HIST [ONLY_HIST ...]] [--split-systematics] [--partial-unc-band]
                     [--overwrite] [--log] [--density] [-v VERBOSE]

Plot histograms from coffea file

optional arguments:
  -h, --help            show this help message and exit
  --input-dir INPUT_DIR
                        Directory with cofea files and parameters
  --cfg CFG             YAML file with all the analysis parameters
  -op OVERWRITE_PARAMETERS [OVERWRITE_PARAMETERS ...], --overwrite-parameters OVERWRITE_PARAMETERS [OVERWRITE_PARAMETERS ...]
                        YAML file with plotting parameters to overwrite default parameters
  -o OUTPUTDIR, --outputdir OUTPUTDIR
                        Output folder
  -i INPUTFILE, --inputfile INPUTFILE
                        Input file
  -j WORKERS, --workers WORKERS
                        Number of parallel workers to use for plotting
  -oc ONLY_CAT [ONLY_CAT ...], --only-cat ONLY_CAT [ONLY_CAT ...]
                        Filter categories with string
  -os ONLY_SYST [ONLY_SYST ...], --only-syst ONLY_SYST [ONLY_SYST ...]
                        Filter systematics with a list of strings
  -e EXCLUDE_HIST [EXCLUDE_HIST ...], --exclude-hist EXCLUDE_HIST [EXCLUDE_HIST ...]
                        Exclude histograms with a list of regular expression strings
  -oh ONLY_HIST [ONLY_HIST ...], --only-hist ONLY_HIST [ONLY_HIST ...]
                        Filter histograms with a list of regular expression strings
  --split-systematics   Split systematic uncertainties in the ratio plot
  --partial-unc-band    Plot only the partial uncertainty band corresponding to the systematics specified as the argument `only_syst`
  --overwrite, --over   Overwrite plots in output folder
  --log                 Set y-axis scale to log
  --density             Set density parameter to have a normalized plot
  -v VERBOSE, --verbose VERBOSE
                        Verbose level for debugging. Higher the number more stuff is printed.

By running:

$ cd output
$ make_plots.py -i output_all.coffea --cfg parameters_dump.yaml -o plots

all the plots are created in the plots folder. The structure of the output and the dataset metadata dictionary contained in the output file are used to compose the list of samples forming the stack of histograms.

Additionally, one can customize the plotting style by passing a custom yaml file with the desired plotting options as an additional argument:

$ make_plots.py -i output_all.coffea --cfg parameters_dump.yaml -o plots -op plotting_style.yaml

More instructions on how to customize the plotting parameters in plotting_style.yaml can be found in the Plotting section.