pocket_coffea.utils.stat package#
Submodules#
pocket_coffea.utils.stat.combine module#
Datacard Class and Utilities for CMS Combine Tool
- class pocket_coffea.utils.stat.combine.Datacard(histograms: dict[str, dict[str, Hist]], datasets_metadata: dict[str, dict[str, dict]], cutflow: dict[str, dict[str, float]], years: list[str], mc_processes: MCProcesses, systematics: Systematics, category: str, data_processes: DataProcesses | None = None, mcstat: bool | dict = True, bins_edges: list[float] | None = None, bin_prefix: str | None = None, bin_suffix: str | None = None, verbose: bool = True)#
Bases: object
Datacard containing processes, systematics and write utilities.
- Parameters:
histograms (dict[str, dict[str, hist.Hist]]) – Dict with histograms for each sample
datasets_metadata (dict[str, dict[str, dict]]) – Metadata for datasets
cutflow (dict[str, dict[str, float]]) – Cutflow information for datasets
years (list[str]) – Years of data taking
mc_processes (MCProcesses) – Monte Carlo processes to include in the datacard
systematics (Systematics) – Systematic uncertainties to include in the datacard
category (str) – Category in datacard
data_processes (DataProcesses, optional) – Data processes, defaults to None
mcstat (bool | dict, optional) – Whether to include MC statistical uncertainties. A dict with the options accepted by combine can be passed instead of a bool. Defaults to True
bins_edges (list[float], optional) – Bin edges for rebinning histograms, defaults to None
bin_prefix (str, optional) – prefix for the bin name, defaults to None
bin_suffix (str, optional) – suffix for the bin name, defaults to None
- property adjust_columns#
- property adjust_first_column#
- property adjust_syst_colum#
- property bin: str#
Name of the bin in the datacard
- content(shapes_filename: str) str#
Generate the content of the datacard.
- Parameters:
shapes_filename (str) – The filename of the root file containing the shape histograms.
- Returns:
Content of the datacard as a string.
- Return type:
str
- create_shape_histogram_dict(is_data: bool = False) dict[str, Hist]#
Create a dictionary of histograms for each process and systematic.
- Parameters:
is_data (bool, optional) – Flag to indicate if the datacard is for data, defaults to False
- Returns:
dictionary of histograms, keys are process_systematic
- Return type:
dict[str, hist.Hist]
- dump(directory: PathLike, card_name: str = 'datacard.txt', shapes_name: str = 'shapes.root') None#
Dump datacard and shapes to a directory.
- Parameters:
directory (os.PathLike) – Directory to dump the datacard and shapes
card_name (str, optional) – name of the datacard file, defaults to “datacard.txt”
shapes_name (str, optional) – name of the shapes file, defaults to “shapes.root”
- expectation_section() str#
- get_datasets_by_sample(sample: str, year: str | None = None) list[str]#
Retrieve the list of dataset names for a given sample and optionally a specific year.
- Parameters:
sample (str) – The sample name for which to retrieve datasets.
year (str, optional, default=None) – The year (data-taking period) to filter datasets. If None (default), datasets from all years in self.years are returned.
- Returns:
List of dataset names corresponding to the sample (and year, if specified).
- Return type:
list[str]
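The filtering described above can be sketched without the library. This is a minimal illustration, not the actual implementation: it assumes each dataset's metadata entry is a dict with `sample` and `year` keys, and the helper name `datasets_by_sample` and the dataset names are hypothetical.

```python
from __future__ import annotations


def datasets_by_sample(
    metadata: dict[str, dict[str, str]],
    years: list[str],
    sample: str,
    year: str | None = None,
) -> list[str]:
    """Return dataset names for `sample`, optionally restricted to one year."""
    # With year=None, accept every year the datacard was configured with.
    allowed_years = set(years) if year is None else {year}
    return [
        name
        for name, meta in metadata.items()
        if meta["sample"] == sample and meta["year"] in allowed_years
    ]


# Hypothetical metadata, keyed by dataset name (assumed layout).
metadata = {
    "TTTo2L2Nu_2017": {"sample": "TTTo2L2Nu", "year": "2017"},
    "TTTo2L2Nu_2018": {"sample": "TTTo2L2Nu", "year": "2018"},
    "DYJets_2018": {"sample": "DYJets", "year": "2018"},
}
print(datasets_by_sample(metadata, ["2017", "2018"], "TTTo2L2Nu"))
print(datasets_by_sample(metadata, ["2017", "2018"], "TTTo2L2Nu", year="2018"))
```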
- property imax#
Number of bins in the datacard
- is_empty_dataset(dataset: str) bool#
Check if dataset is empty
- property jmax#
Number of background processes + number of signal processes - 1
- property kmax#
Number of nuisance parameters in the datacard
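As a rough illustration of how the three counters relate to the datacard contents, the sketch below computes them from plain lists. It is not the library's code: the function name `header_counts` and the process and nuisance names are hypothetical, and it assumes a single bin per datacard. How the real property counts nuisances (for example, whether rate parameters are included) is not stated here.

```python
def header_counts(signal, background, nuisances, n_bins=1):
    """Compute the imax/jmax/kmax counters of a datacard preamble."""
    n_processes = len(signal) + len(background)
    return {
        "imax": n_bins,           # number of bins (channels)
        "jmax": n_processes - 1,  # signal + background processes, minus one
        "kmax": len(nuisances),   # number of nuisance parameters
    }


# Hypothetical process and nuisance names.
counts = header_counts(
    signal=["ttH"],
    background=["ttbar", "DY"],
    nuisances=["lumi", "JES"],
)
for key, value in counts.items():
    print(f"{key} {value}")
```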
- property mcstat_config: dict#
Return the configuration for MC statistics.
- mcstat_section() str#
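For orientation, Combine enables per-bin MC statistical uncertainties with a line of the form `<bin> autoMCStats <threshold> [include-signal] [hist-mode]`. The sketch below shows one way a `bool | dict` option could be turned into such a line. The dict keys (`threshold`, `include_signal`, `hist_mode`), the default threshold of 0 and the function name are assumptions made for illustration; the keys actually accepted by `mcstat` are not listed in this page.

```python
def automcstats_line(bin_name, mcstat=True):
    """Format an autoMCStats line, or return '' when MC stats are disabled."""
    if mcstat is False:
        return ""
    # Assumed defaults; include-signal=0 and hist-mode=1 mirror Combine's own.
    options = {"threshold": 0, "include_signal": 0, "hist_mode": 1}
    if isinstance(mcstat, dict):
        options.update(mcstat)  # user-supplied values override the defaults
    return (
        f"{bin_name} autoMCStats "
        f"{options['threshold']} {options['include_signal']} {options['hist_mode']}"
    )


print(automcstats_line("SR"))
print(automcstats_line("SR", {"threshold": 10}))
```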
- property observation#
Number of observed events in the datacard
- observation_section() str#
- preamble() str#
- rate(process: str, systematic='nominal') float#
Rate of a process in the datacard
- rate_parameters_section() str#
- rearrange_histograms(is_data: bool = False) Hist#
Rearrange histograms from pocket_coffea output format to match processes and systematics in one histogram.
- Parameters:
is_data (bool, optional) – Flag to indicate if the datacard is for data, defaults to False
- Returns:
Rearranged histogram
- Return type:
hist.Hist
- shape_section(shapes_name: str) str#
Generate the shapes section of the datacard. Each line has the form: shapes process channel file histogram [histogram_with_systematics]
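A shapes line in this format can be assembled as in the sketch below. The helper `shapes_line` is hypothetical, and the histogram name patterns are an assumption based on the `process_systematic` key convention mentioned for `create_shape_histogram_dict`; `$PROCESS` and `$SYSTEMATIC` are placeholders that Combine substitutes when reading the card.

```python
def shapes_line(process, channel, filename, histogram, histogram_with_systematics=None):
    """Join the fields of one 'shapes' line; the last field is optional."""
    fields = ["shapes", process, channel, filename, histogram]
    if histogram_with_systematics is not None:
        fields.append(histogram_with_systematics)
    return " ".join(fields)


# '*' applies the rule to every process in the channel.
line = shapes_line(
    "*", "SR", "shapes.root", "$PROCESS_nominal", "$PROCESS_$SYSTEMATIC"
)
print(line)
```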
- property shape_variations: list[str]#
- systematics_section() str#
- pocket_coffea.utils.stat.combine.combine_datacards(datacards: dict[Datacard], directory: str, path: str = 'combine_cards.sh', card_name: str = 'datacard_combined.txt', workspace_name: str = 'workspace.root', channel_masks: bool = False) None#
Write the bash script to combine datacards from different categories.
- Parameters:
datacards (dict[Datacard]) – Dictionary mapping output filenames to Datacard objects to combine.
directory (str) – Directory to save the bash script and combined datacard.
path (str) – Path (relative to directory) for the bash script file. Must end with .sh.
card_name (str) – Name of the combined datacard file.
workspace_name (str) – Name of the output workspace file.
channel_masks (bool) – Whether to add the --channel-masks option to text2workspace.py.
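To give an idea of what such a script might contain, the sketch below builds one as a string using the standard `combineCards.py` and `text2workspace.py` command lines. It is an assumption-laden stand-in, not the function's real output: the helper `combine_script` is hypothetical, and it takes a dict mapping datacard filenames to bin names (plain strings) rather than to `Datacard` objects.

```python
def combine_script(
    cards,
    card_name="datacard_combined.txt",
    workspace_name="workspace.root",
    channel_masks=False,
):
    """Build a bash script that merges datacards and creates a workspace."""
    # combineCards.py takes arguments of the form <channel>=<datacard file>.
    args = " ".join(f"{bin_name}={filename}" for filename, bin_name in cards.items())
    workspace_cmd = f"text2workspace.py {card_name} -o {workspace_name}"
    if channel_masks:
        workspace_cmd += " --channel-masks"
    lines = [
        "#!/bin/bash",
        f"combineCards.py {args} > {card_name}",
        workspace_cmd,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical filenames and bin names.
script = combine_script(
    {"SR/datacard.txt": "SR", "CR/datacard.txt": "CR"}, channel_masks=True
)
print(script)
```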
pocket_coffea.utils.stat.processes module#
Physical Processes as Dataclasses and Utilities
- class pocket_coffea.utils.stat.processes.DataProcess(name: str, samples: Iterable, label: str | None = None, *, years: Iterable)#
Bases: Process
Class to store information of a Data process
- Parameters:
name – Name of the process
samples – Iterable of sample names associated with the process
years – Iterable of years the process is relevant for
label – Label for the process, defaults to name if not specified
Inherits from Process and sets is_data to True by default.
- years: Iterable#
- class pocket_coffea.utils.stat.processes.DataProcesses(processes: list[DataProcess])#
Bases: dict[str, DataProcess]
Custom dict to store information of multiple data processes.
- Parameters:
processes (list[DataProcess]) – List of data processes
- class pocket_coffea.utils.stat.processes.MCProcess(name: str, samples: Iterable, label: str | None = None, *, is_signal: bool, years: Iterable, has_rateParam: bool = False)#
Bases: Process
Class to store information of a Monte Carlo process
- Parameters:
name – Name of the process
samples – Iterable of sample names associated with the process
years – Iterable of years the process is relevant for
is_signal – Whether the process is a signal process
has_rateParam – Whether the process has a rate parameter, defaults to False
label – Label for the process, defaults to name if not specified
Inherits from Process and sets is_data to False by default.
- has_rateParam: bool = False#
- is_signal: bool#
- years: Iterable#
- class pocket_coffea.utils.stat.processes.MCProcesses(processes: list[MCProcess])#
Bases: dict[str, MCProcess]
Custom dict to store information of multiple MC processes.
- Parameters:
processes (list[MCProcess]) – List of MC processes
- property background_processes: list[str]#
Names of all Background Processes.
- property n_processes: int#
Number of Processes
- property signal_processes: list[str]#
Names of all Signal MC Processes.
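The pattern of a dict keyed by process name with signal and background views can be sketched with stand-in classes. `SketchProcess` and `SketchProcesses` are hypothetical names used only for this illustration; they mimic the documented properties but are not the library classes.

```python
from dataclasses import dataclass


@dataclass
class SketchProcess:
    """Stand-in for an MC process: only the fields needed here."""
    name: str
    is_signal: bool


class SketchProcesses(dict):
    """Dict of processes keyed by name, built from a list."""

    def __init__(self, processes):
        super().__init__((p.name, p) for p in processes)

    @property
    def signal_processes(self):
        return [name for name, p in self.items() if p.is_signal]

    @property
    def background_processes(self):
        return [name for name, p in self.items() if not p.is_signal]

    @property
    def n_processes(self):
        return len(self)


# Hypothetical process names.
procs = SketchProcesses(
    [
        SketchProcess("ttH", is_signal=True),
        SketchProcess("ttbar", is_signal=False),
        SketchProcess("DY", is_signal=False),
    ]
)
print(procs.signal_processes, procs.background_processes, procs.n_processes)
```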
- class pocket_coffea.utils.stat.processes.Process(name: str, samples: Iterable, label: str | None = None)#
Bases: object
Class to store information of a physical process
- Parameters:
name – Name of the process
samples – Iterable of sample names associated with the process
label – Label for the process, defaults to name if not specified
is_data – Whether the process is data (needs to be set by subclasses)
Note
It is recommended to use the MCProcess or DataProcess subclasses directly. This base class is primarily for shared attributes and methods.
- is_data: bool#
- label: str = None#
- name: str#
- samples: Iterable#
pocket_coffea.utils.stat.systematics module#
Systematic Uncertainties and Utilities for Statistical Analysis
- class pocket_coffea.utils.stat.systematics.SystematicUncertainty(name: str, typ: str, processes: list[str] | tuple[str] | dict[str, float], years: list[str] | tuple[str], value: float | tuple[float] | None = None, datacard_name: str | None = None, coffea_name_alias: str | dict[str, str] | None = None)#
Bases: object
Store information about one systematic uncertainty.
- Parameters:
name – Name of the systematic uncertainty.
typ – Type of the systematic uncertainty (e.g. ‘shape’, ‘lnN’).
processes – List or tuple of process names affected, or a dict mapping process names to values.
years – List or tuple of years the uncertainty applies to.
value – Value (float or tuple of floats) of the uncertainty for all processes, or None if using a dict for processes.
datacard_name – Name of the systematic uncertainty in the datacard. Defaults to name if not specified.
coffea_name_alias –
Name of the shape variation as stored in the coffea output histograms. Use this when the coffea variation name differs from the canonical name — most commonly when one logical systematic is recorded under different names per process (e.g. parton-shower weights named differently for different generators). Can be a single string applied to all processes, or a dict mapping process names to per-process alias strings. Processes missing from the dict fall back to name. Defaults to name if not specified.
Note: as a plain string this field is largely redundant with name — if you only need a global rename, just set name to the coffea variation name and let datacard_name carry the datacard-side label. coffea_name_alias earns its keep in the dict form, where the alias varies by process.
- coffea_name_alias: str | dict[str, str] = None#
- datacard_name: str = None#
- get_coffea_name(process: str) str#
Return the coffea variation alias for a given process.
Falls back to name when a dict alias does not list process.
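The fallback rules described for `coffea_name_alias` (no alias, a single string, or a per-process dict) can be written out as a small stand-in class. `SketchSystematic` and the variation names are hypothetical; this is a sketch of the documented behaviour, not the library source.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SketchSystematic:
    """Stand-in holding only the fields involved in alias resolution."""
    name: str
    coffea_name_alias: str | dict[str, str] | None = None

    def get_coffea_name(self, process: str) -> str:
        alias = self.coffea_name_alias
        if alias is None:
            return self.name  # no alias given: use the canonical name
        if isinstance(alias, str):
            return alias  # one alias shared by all processes
        # Per-process aliases; unlisted processes fall back to the name.
        return alias.get(process, self.name)


# Hypothetical variation names for two generators.
ps_isr = SketchSystematic(
    name="ISR",
    coffea_name_alias={"ttbar": "sf_partonshower_isr", "ttH": "PS_ISR"},
)
print(ps_isr.get_coffea_name("ttbar"))
print(ps_isr.get_coffea_name("DY"))
```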
- name: str#
- processes: list[str] | tuple[str] | dict[str, float]#
- typ: str#
- value: float | tuple[float] = None#
- years: list[str] | tuple[str]#
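The two ways of giving values, a shared `value` with a list or tuple of processes, or a dict in `processes` with one value per process, can be illustrated with a small lookup. The helper `value_for` and the numbers are hypothetical; returning `None` for an unaffected process is a choice made for this sketch.

```python
def value_for(processes, value, process):
    """Return the uncertainty value applied to `process`, or None if unaffected."""
    if process not in processes:  # works for list, tuple and dict alike
        return None
    if isinstance(processes, dict):
        return processes[process]  # per-process value
    return value  # one value shared by all listed processes


# Shared value for every listed process.
print(value_for(["ttbar", "DY"], 1.025, "ttbar"))
# Per-process values; the separate `value` argument is left as None.
print(value_for({"ttbar": 1.05, "DY": 1.02}, None, "DY"))
```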
- class pocket_coffea.utils.stat.systematics.Systematics(systematics: list[SystematicUncertainty])#
Bases: dict[str, SystematicUncertainty]
Store information of a list of systematic uncertainties
- get_systematics_by_process(process: Process) list[SystematicUncertainty]#
List of Systematics that affect a specific process.
- get_systematics_by_type(syst_type: str) dict[SystematicUncertainty]#
Dict of Systematics of a specific type.
- list_type(syst_type: str) list[str]#
List of Names of Systematics of a specific type.
- n_systematics() int#
Number of Systematics
- property variations_names: list[str]#
List of Names of Shape Variations.
Module contents#
- class pocket_coffea.utils.stat.DataProcess(name: str, samples: Iterable, label: str | None = None, *, years: Iterable)#
Bases: Process
Class to store information of a Data process
- Parameters:
name – Name of the process
samples – Iterable of sample names associated with the process
years – Iterable of years the process is relevant for
label – Label for the process, defaults to name if not specified
Inherits from Process and sets is_data to True by default.
- is_data: bool#
- name: str#
- samples: Iterable#
- years: Iterable#
- class pocket_coffea.utils.stat.DataProcesses(processes: list[DataProcess])#
Bases: dict[str, DataProcess]
Custom dict to store information of multiple data processes.
- Parameters:
processes (list[DataProcess]) – List of data processes
- class pocket_coffea.utils.stat.Datacard(histograms: dict[str, dict[str, Hist]], datasets_metadata: dict[str, dict[str, dict]], cutflow: dict[str, dict[str, float]], years: list[str], mc_processes: MCProcesses, systematics: Systematics, category: str, data_processes: DataProcesses | None = None, mcstat: bool | dict = True, bins_edges: list[float] | None = None, bin_prefix: str | None = None, bin_suffix: str | None = None, verbose: bool = True)#
Bases: object
Datacard containing processes, systematics and write utilities.
- Parameters:
histograms (dict[str, dict[str, hist.Hist]]) – Dict with histograms for each sample
datasets_metadata (dict[str, dict[str, dict]]) – Metadata for datasets
cutflow (dict[str, dict[str, float]]) – Cutflow information for datasets
years (list[str]) – Years of data taking
mc_processes (MCProcesses) – Monte Carlo processes to include in the datacard
systematics (Systematics) – Systematic uncertainties to include in the datacard
category (str) – Category in datacard
data_processes (DataProcesses, optional) – Data processes, defaults to None
mcstat (bool | dict, optional) – Whether to include MC statistical uncertainties. A dict with the options accepted by combine can be passed instead of a bool. Defaults to True
bins_edges (list[float], optional) – Bin edges for rebinning histograms, defaults to None
bin_prefix (str, optional) – prefix for the bin name, defaults to None
bin_suffix (str, optional) – suffix for the bin name, defaults to None
- property adjust_columns#
- property adjust_first_column#
- property adjust_syst_colum#
- property bin: str#
Name of the bin in the datacard
- content(shapes_filename: str) str#
Generate the content of the datacard.
- Parameters:
shapes_filename (str) – The filename of the root file containing the shape histograms.
- Returns:
Content of the datacard as a string.
- Return type:
str
- create_shape_histogram_dict(is_data: bool = False) dict[str, Hist]#
Create a dictionary of histograms for each process and systematic.
- Parameters:
is_data (bool, optional) – Flag to indicate if the datacard is for data, defaults to False
- Returns:
dictionary of histograms, keys are process_systematic
- Return type:
dict[str, hist.Hist]
- dump(directory: PathLike, card_name: str = 'datacard.txt', shapes_name: str = 'shapes.root') None#
Dump datacard and shapes to a directory.
- Parameters:
directory (os.PathLike) – Directory to dump the datacard and shapes
card_name (str, optional) – name of the datacard file, defaults to “datacard.txt”
shapes_name (str, optional) – name of the shapes file, defaults to “shapes.root”
- expectation_section() str#
- get_datasets_by_sample(sample: str, year: str | None = None) list[str]#
Retrieve the list of dataset names for a given sample and optionally a specific year.
- Parameters:
sample (str) – The sample name for which to retrieve datasets.
year (str, optional, default=None) – The year (data-taking period) to filter datasets. If None (default), datasets from all years in self.years are returned.
- Returns:
List of dataset names corresponding to the sample (and year, if specified).
- Return type:
list[str]
- property imax#
Number of bins in the datacard
- is_empty_dataset(dataset: str) bool#
Check if dataset is empty
- property jmax#
Number of background processes + number of signal processes - 1
- property kmax#
Number of nuisance parameters in the datacard
- property mcstat_config: dict#
Return the configuration for MC statistics.
- mcstat_section() str#
- property observation#
Number of observed events in the datacard
- observation_section() str#
- preamble() str#
- rate(process: str, systematic='nominal') float#
Rate of a process in the datacard
- rate_parameters_section() str#
- rearrange_histograms(is_data: bool = False) Hist#
Rearrange histograms from pocket_coffea output format to match processes and systematics in one histogram.
- Parameters:
is_data (bool, optional) – Flag to indicate if the datacard is for data, defaults to False
- Returns:
Rearranged histogram
- Return type:
hist.Hist
- shape_section(shapes_name: str) str#
Generate the shapes section of the datacard. Each line has the form: shapes process channel file histogram [histogram_with_systematics]
- property shape_variations: list[str]#
- systematics_section() str#
- class pocket_coffea.utils.stat.MCProcess(name: str, samples: Iterable, label: str | None = None, *, is_signal: bool, years: Iterable, has_rateParam: bool = False)#
Bases: Process
Class to store information of a Monte Carlo process
- Parameters:
name – Name of the process
samples – Iterable of sample names associated with the process
years – Iterable of years the process is relevant for
is_signal – Whether the process is a signal process
has_rateParam – Whether the process has a rate parameter, defaults to False
label – Label for the process, defaults to name if not specified
Inherits from Process and sets is_data to False by default.
- has_rateParam: bool = False#
- is_data: bool#
- is_signal: bool#
- name: str#
- samples: Iterable#
- years: Iterable#
- class pocket_coffea.utils.stat.MCProcesses(processes: list[MCProcess])#
Bases: dict[str, MCProcess]
Custom dict to store information of multiple MC processes.
- Parameters:
processes (list[MCProcess]) – List of MC processes
- property background_processes: list[str]#
Names of all Background Processes.
- property n_processes: int#
Number of Processes
- property signal_processes: list[str]#
Names of all Signal MC Processes.
- class pocket_coffea.utils.stat.SystematicUncertainty(name: str, typ: str, processes: list[str] | tuple[str] | dict[str, float], years: list[str] | tuple[str], value: float | tuple[float] | None = None, datacard_name: str | None = None, coffea_name_alias: str | dict[str, str] | None = None)#
Bases: object
Store information about one systematic uncertainty.
- Parameters:
name – Name of the systematic uncertainty.
typ – Type of the systematic uncertainty (e.g. ‘shape’, ‘lnN’).
processes – List or tuple of process names affected, or a dict mapping process names to values.
years – List or tuple of years the uncertainty applies to.
value – Value (float or tuple of floats) of the uncertainty for all processes, or None if using a dict for processes.
datacard_name – Name of the systematic uncertainty in the datacard. Defaults to name if not specified.
coffea_name_alias –
Name of the shape variation as stored in the coffea output histograms. Use this when the coffea variation name differs from the canonical name — most commonly when one logical systematic is recorded under different names per process (e.g. parton-shower weights named differently for different generators). Can be a single string applied to all processes, or a dict mapping process names to per-process alias strings. Processes missing from the dict fall back to name. Defaults to name if not specified.
Note: as a plain string this field is largely redundant with name — if you only need a global rename, just set name to the coffea variation name and let datacard_name carry the datacard-side label. coffea_name_alias earns its keep in the dict form, where the alias varies by process.
- coffea_name_alias: str | dict[str, str] = None#
- datacard_name: str = None#
- get_coffea_name(process: str) str#
Return the coffea variation alias for a given process.
Falls back to name when a dict alias does not list process.
- name: str#
- processes: list[str] | tuple[str] | dict[str, float]#
- typ: str#
- value: float | tuple[float] = None#
- years: list[str] | tuple[str]#
- class pocket_coffea.utils.stat.Systematics(systematics: list[SystematicUncertainty])#
Bases: dict[str, SystematicUncertainty]
Store information of a list of systematic uncertainties
- get_systematics_by_process(process: Process) list[SystematicUncertainty]#
List of Systematics that affect a specific process.
- get_systematics_by_type(syst_type: str) dict[SystematicUncertainty]#
Dict of Systematics of a specific type.
- list_type(syst_type: str) list[str]#
List of Names of Systematics of a specific type.
- n_systematics() int#
Number of Systematics
- property variations_names: list[str]#
List of Names of Shape Variations.