pocket_coffea.utils.stat package


Submodules#

pocket_coffea.utils.stat.combine module#

Datacard Class and Utilities for CMS Combine Tool

class pocket_coffea.utils.stat.combine.Datacard(histograms: dict[str, dict[str, Hist]], datasets_metadata: dict[str, dict[str, dict]], cutflow: dict[str, dict[str, float]], years: list[str], mc_processes: MCProcesses, systematics: Systematics, category: str, data_processes: DataProcesses | None = None, mcstat: bool | dict = True, bins_edges: list[float] | None = None, bin_prefix: str | None = None, bin_suffix: str | None = None, verbose: bool = True)#

Bases: object

Datacard containing processes, systematics and write utilities.

Parameters:
  • histograms (dict[str, dict[str, hist.Hist]]) – Dict with histograms for each sample

  • datasets_metadata (dict[str, dict[str, dict]]) – Metadata for datasets

  • cutflow (dict[str, dict[str, float]]) – Cutflow information for datasets

  • years (list[str]) – Years of data taking

  • mc_processes (MCProcesses) – Monte Carlo processes (signal and background) to include in the datacard

  • systematics (Systematics) – systematic uncertainties

  • category (str) – Category in datacard

  • data_processes (DataProcesses, optional) – Data processes, defaults to None

  • mcstat (bool | dict, optional) – Whether to include MC statistical uncertainties. A dict with the options accepted by combine can be passed instead of a bool. Defaults to True

  • bins_edges (list[float], optional) – Bin edges for rebinning histograms, defaults to None

  • bin_prefix (str, optional) – prefix for the bin name, defaults to None

  • bin_suffix (str, optional) – suffix for the bin name, defaults to None

property adjust_columns#
property adjust_first_column#
property adjust_syst_colum#
property bin: str#

Name of the bin in the datacard

content(shapes_filename: str) str#

Generate the content of the datacard.

Parameters:

shapes_filename (str) – The filename of the root file containing the shape histograms.

Returns:

Content of the datacard as a string.

Return type:

str

create_shape_histogram_dict(is_data: bool = False) dict[str, Hist]#

Create a dictionary of histograms for each process and systematic.

Parameters:

is_data (bool, optional) – Flag to indicate if the datacard is for data, defaults to False

Returns:

Dictionary of histograms, keyed by process_systematic.

Return type:

dict[str, hist.Hist]

dump(directory: PathLike, card_name: str = 'datacard.txt', shapes_name: str = 'shapes.root') None#

Dump datacard and shapes to a directory.

Parameters:
  • directory (os.PathLike) – Directory to dump the datacard and shapes

  • card_name (str, optional) – name of the datacard file, defaults to “datacard.txt”

  • shapes_name (str, optional) – name of the shapes file, defaults to “shapes.root”

expectation_section() str#
get_datasets_by_sample(sample: str, year: str | None = None) list[str]#

Retrieve the list of dataset names for a given sample and optionally a specific year.

Parameters:
  • sample (str) – The sample name for which to retrieve datasets.

  • year (str, optional, default=None) – The year (data-taking period) to filter datasets. If None (default), datasets from all years in self.years are returned.

Returns:

List of dataset names corresponding to the sample (and year, if specified).

Return type:

list[str]
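The lookup described above can be pictured with a minimal stand-alone sketch. The flat metadata layout (dataset name mapped to a dict with "sample" and "year" keys), the dataset names, and the function name datasets_by_sample are assumptions made for illustration; they are not taken from pocket_coffea.

```python
def datasets_by_sample(metadata, years, sample, year=None):
    """Return dataset names for `sample`, optionally restricted to one year."""
    # With year=None, use every configured year, mirroring the documented default.
    selected_years = list(years) if year is None else [year]
    return [
        name
        for name, info in metadata.items()
        if info["sample"] == sample and info["year"] in selected_years
    ]


# Hypothetical metadata, for illustration only.
metadata = {
    "TTTo2L2Nu_2017": {"sample": "TTTo2L2Nu", "year": "2017"},
    "TTTo2L2Nu_2018": {"sample": "TTTo2L2Nu", "year": "2018"},
    "DYJets_2018": {"sample": "DYJets", "year": "2018"},
}
all_tt = datasets_by_sample(metadata, ["2017", "2018"], "TTTo2L2Nu")
tt_2018 = datasets_by_sample(metadata, ["2017", "2018"], "TTTo2L2Nu", year="2018")
```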

property imax#

Number of bins in the datacard

is_empty_dataset(dataset: str) bool#

Check if dataset is empty

property jmax#

Number of background processes + number of signal processes - 1

property kmax#

Number of nuisance parameters in the datacard
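As a worked example of the three counters, the sketch below computes them for a made-up set of processes and nuisances and formats the usual combine header lines. The process and nuisance names are hypothetical, and treating one datacard as a single bin (imax = 1) is an assumption based on the class exposing a single bin name.

```python
# Hypothetical inputs, for illustration only.
signal_processes = ["ttH"]
background_processes = ["ttbar", "DY", "WJets"]
nuisances = ["lumi", "JES", "btag"]

imax = 1  # assumption: one bin (category) per datacard
jmax = len(background_processes) + len(signal_processes) - 1
kmax = len(nuisances)

header = "\n".join(
    [
        f"imax {imax} number of bins",
        f"jmax {jmax} number of processes minus 1",
        f"kmax {kmax} number of nuisance parameters",
    ]
)
```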

property mcstat_config: dict#

Return the configuration for MC statistics.
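One way to picture how a bool-or-dict mcstat setting could be resolved into a configuration is sketched below. The option names (threshold, include_signal, hist_mode) and their placeholder defaults are hypothetical and may not match the keys pocket_coffea expects; only the order of the fields in the autoMCStats line follows combine's datacard syntax.

```python
# Placeholder defaults; the option names are hypothetical, not pocket_coffea's.
DEFAULT_MCSTAT = {"threshold": 0, "include_signal": 0, "hist_mode": 1}


def resolve_mcstat(mcstat):
    """Turn a bool-or-dict setting into a full options dict, or None if disabled."""
    if isinstance(mcstat, dict):
        # User-supplied options override the placeholder defaults.
        return {**DEFAULT_MCSTAT, **mcstat}
    return dict(DEFAULT_MCSTAT) if mcstat else None


def automcstats_line(bin_name, config):
    """Format a combine-style autoMCStats line for one bin."""
    return (
        f"{bin_name} autoMCStats "
        f"{config['threshold']} {config['include_signal']} {config['hist_mode']}"
    )


line = automcstats_line("SR_2018", resolve_mcstat({"threshold": 10}))
```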

mcstat_section() str#
property observation#

Number of observed events in the datacard

observation_section() str#
preamble() str#
rate(process: str, systematic='nominal') float#

Rate of a process in the datacard

rate_parameters_section() str#
rearrange_histograms(is_data: bool = False) Hist#

Rearrange histograms from pocket_coffea output format to match processes and systematics in one histogram.

Parameters:

is_data (bool, optional) – Flag to indicate if the datacard is for data, defaults to False

Returns:

Rearranged histogram

Return type:

hist.Hist

shape_section(shapes_name: str) str#

Generate the shapes line of the datacard, which follows the combine format:

shapes process channel file histogram [histogram_with_systematics]
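The shapes line format above can be illustrated with a small formatting helper. The helper name shapes_line, the bin name, and the histogram naming patterns ($PROCESS_nominal, $PROCESS_$SYSTEMATIC) are assumptions chosen for this sketch, not values confirmed by the pocket_coffea source.

```python
def shapes_line(process, channel, file, histogram, histogram_with_systematics=None):
    """Join the fields of a combine 'shapes' line; the last field is optional."""
    fields = ["shapes", process, channel, file, histogram]
    if histogram_with_systematics is not None:
        fields.append(histogram_with_systematics)
    return " ".join(fields)


# Hypothetical bin name and histogram patterns, for illustration only.
line = shapes_line(
    "*", "SR_2018", "shapes.root", "$PROCESS_nominal", "$PROCESS_$SYSTEMATIC"
)
```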

property shape_variations: list[str]#
systematics_section() str#
pocket_coffea.utils.stat.combine.combine_datacards(datacards: dict[Datacard], directory: str, path: str = 'combine_cards.sh', card_name: str = 'datacard_combined.txt', workspace_name: str = 'workspace.root', channel_masks: bool = False) None#

Write the bash script to combine datacards from different categories.

Parameters:
  • datacards (dict[str, Datacard]) – Dictionary mapping output filenames to Datacard objects to combine.

  • directory (str) – Directory to save the bash script and combined datacard.

  • path (str) – Path (relative to directory) for the bash script file. Must end with .sh.

  • card_name (str) – Name of the combined datacard file.

  • workspace_name (str) – Name of the output workspace file.

  • channel_masks (bool) – Whether to add the --channel-masks option to text2workspace.py.
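To give a rough idea of what such a script might contain, the sketch below assembles a bash script that calls combineCards.py and text2workspace.py. The helper build_combine_script, its input (datacard filename mapped to a bin name rather than to a Datacard object), and the exact layout of the script are assumptions; the script actually written by combine_datacards may differ.

```python
def build_combine_script(
    card_files,
    card_name="datacard_combined.txt",
    workspace_name="workspace.root",
    channel_masks=False,
):
    """Build the text of a bash script that combines datacards into a workspace."""
    # card_files maps a datacard filename to its bin name (hypothetical input shape).
    args = " ".join(f"{bin_name}={filename}" for filename, bin_name in card_files.items())
    workspace_cmd = f"text2workspace.py {card_name} -o {workspace_name}"
    if channel_masks:
        workspace_cmd += " --channel-masks"
    lines = ["#!/bin/bash", f"combineCards.py {args} > {card_name}", workspace_cmd]
    return "\n".join(lines) + "\n"


script = build_combine_script(
    {"datacard_SR.txt": "SR", "datacard_CR.txt": "CR"}, channel_masks=True
)
```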

pocket_coffea.utils.stat.processes module#

Physical Processes as Dataclasses and Utilities

class pocket_coffea.utils.stat.processes.DataProcess(name: str, samples: Iterable, label: str | None = None, *, years: Iterable)#

Bases: Process

Class to store information of a Data process

Parameters:
  • name – Name of the process

  • samples – Iterable of sample names associated with the process

  • years – Iterable of years the process is relevant for

  • label – Label for the process, defaults to name if not specified

Inherits from Process and sets is_data to True by default.

years: Iterable#
class pocket_coffea.utils.stat.processes.DataProcesses(processes: list[DataProcess])#

Bases: dict[str, DataProcess]

Custom dict to store information of multiple data processes.

Parameters:

processes (list[DataProcess]) – List of data processes

class pocket_coffea.utils.stat.processes.MCProcess(name: str, samples: Iterable, label: str | None = None, *, is_signal: bool, years: Iterable, has_rateParam: bool = False)#

Bases: Process

Class to store information of a Monte Carlo process

Parameters:
  • name – Name of the process

  • samples – Iterable of sample names associated with the process

  • years – Iterable of years the process is relevant for

  • is_signal – Whether the process is a signal process

  • has_rateParam – Whether the process has a rate parameter, defaults to False

  • label – Label for the process, defaults to name if not specified

Inherits from Process and sets is_data to False by default.

has_rateParam: bool = False#
is_signal: bool#
years: Iterable#
class pocket_coffea.utils.stat.processes.MCProcesses(processes: list[MCProcess])#

Bases: dict[str, MCProcess]

Custom dict to store information of multiple MC processes.

Parameters:

processes (list[MCProcess]) – List of MC processes

property background_processes: list[str]#

Names of all Background Processes.

property n_processes: int#

Number of Processes

property signal_processes: list[str]#

Names of all Signal MC Processes.
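The split into signal and background names can be sketched with stand-in classes. SketchProcess and SketchProcesses are hypothetical stand-ins written for this example, and keying the dict by process name is an assumption; the real MCProcess and MCProcesses classes carry more fields.

```python
from dataclasses import dataclass


@dataclass
class SketchProcess:
    """Stand-in for a process with only the fields needed here."""

    name: str
    is_signal: bool


class SketchProcesses(dict):
    """Stand-in dict of processes, keyed by process name (an assumption)."""

    def __init__(self, processes):
        super().__init__({p.name: p for p in processes})

    @property
    def signal_processes(self):
        return [name for name, p in self.items() if p.is_signal]

    @property
    def background_processes(self):
        return [name for name, p in self.items() if not p.is_signal]

    @property
    def n_processes(self):
        return len(self)


processes = SketchProcesses(
    [
        SketchProcess("ttH", is_signal=True),
        SketchProcess("ttbar", is_signal=False),
        SketchProcess("DY", is_signal=False),
    ]
)
```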

class pocket_coffea.utils.stat.processes.Process(name: str, samples: Iterable, label: str | None = None)#

Bases: object

Class to store information of a physical process

Parameters:
  • name – Name of the process

  • samples – Iterable of sample names associated with the process

  • label – Label for the process, defaults to name if not specified

  • is_data – Whether the process is data (needs to be set by subclasses)

Note

It is recommended to use the MCProcess or DataProcess subclasses directly. This base class is primarily for shared attributes and methods.

is_data: bool#
label: str = None#
name: str#
samples: Iterable#

pocket_coffea.utils.stat.systematics module#

Systematic Uncertainties and Utilities for Statistical Analysis

class pocket_coffea.utils.stat.systematics.SystematicUncertainty(name: str, typ: str, processes: list[str] | tuple[str] | dict[str, float], years: list[str] | tuple[str], value: float | tuple[float] | None = None, datacard_name: str | None = None, coffea_name_alias: str | dict[str, str] | None = None)#

Bases: object

Store information about one systematic uncertainty.

Parameters:
  • name – Name of the systematic uncertainty.

  • typ – Type of the systematic uncertainty (e.g. ‘shape’, ‘lnN’).

  • processes – List or tuple of process names affected, or a dict mapping process names to values.

  • years – List or tuple of years the uncertainty applies to.

  • value – Value (float or tuple of floats) of the uncertainty for all processes, or None if using a dict for processes.

  • datacard_name – Name of the systematic uncertainty in the datacard. Defaults to name if not specified.

  • coffea_name_alias

    Name of the shape variation as stored in the coffea output histograms. Use this when the coffea variation name differs from the canonical name — most commonly when one logical systematic is recorded under different names per process (e.g. parton-shower weights named differently for different generators). Can be a single string applied to all processes, or a dict mapping process names to per-process alias strings. Processes missing from the dict fall back to name. Defaults to name if not specified.

    Note: as a plain string this field is largely redundant with name — if you only need a global rename, just set name to the coffea variation name and let datacard_name carry the datacard-side label. coffea_name_alias earns its keep in the dict form, where the alias varies by process.
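The two ways of giving values (one shared value with a list of processes, or a dict mapping each process to its own value) can be illustrated as follows. The helper uncertainty_value and the numbers are hypothetical; this is a sketch of the documented convention, not the library's own lookup code.

```python
def uncertainty_value(processes, value, process):
    """Look up the uncertainty value that applies to `process`, or None."""
    if isinstance(processes, dict):
        # Dict form: each process carries its own value.
        return processes.get(process)
    # List/tuple form: one shared value for every listed process.
    return value if process in processes else None


# Hypothetical values, for illustration only.
shared = uncertainty_value(["ttbar", "DY"], 1.025, "DY")
per_process = uncertainty_value({"ttbar": 1.05, "DY": 1.02}, None, "ttbar")
unaffected = uncertainty_value(["ttbar", "DY"], 1.025, "WJets")
```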

coffea_name_alias: str | dict[str, str] = None#
datacard_name: str = None#
get_coffea_name(process: str) str#

Return the coffea variation alias for a given process.

Falls back to name when a dict alias does not list process.
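The fallback rule can be written out as a small stand-alone function. The function coffea_name and the variation names used below are hypothetical; the sketch only mirrors the behaviour described for the string form, the dict form, and the unset case.

```python
def coffea_name(name, alias, process):
    """Resolve the coffea variation name for `process` following the documented rules."""
    if alias is None:
        return name  # no alias given: use the canonical name
    if isinstance(alias, str):
        return alias  # single string: applies to every process
    return alias.get(process, name)  # dict: per-process alias, else the canonical name


# Hypothetical names, for illustration only.
alias = {"ttbar": "PS_powheg", "DY": "PS_madgraph"}
for_ttbar = coffea_name("parton_shower", alias, "ttbar")
for_wjets = coffea_name("parton_shower", alias, "WJets")
```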

name: str#
processes: list[str] | tuple[str] | dict[str, float]#
typ: str#
value: float | tuple[float] = None#
years: list[str] | tuple[str]#
class pocket_coffea.utils.stat.systematics.Systematics(systematics: list[SystematicUncertainty])#

Bases: dict[str, SystematicUncertainty]

Store information of a list of systematic uncertainties

get_systematics_by_process(process: Process) list[SystematicUncertainty]#

List of Systematics that affect a specific process.

get_systematics_by_type(syst_type: str) dict[SystematicUncertainty]#

Dict of Systematics of a specific type.

list_type(syst_type: str) list[str]#

List of Names of Systematics of a specific type.
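Filtering by type amounts to selecting the entries whose typ matches. In the sketch below, plain dicts stand in for SystematicUncertainty objects, and the helper names and systematic names are hypothetical.

```python
# Plain dicts stand in for SystematicUncertainty objects (illustration only).
systematics = {
    "lumi": {"typ": "lnN"},
    "JES": {"typ": "shape"},
    "pileup": {"typ": "shape"},
}


def systematics_by_type(systs, syst_type):
    """Return the sub-dict of systematics whose type equals `syst_type`."""
    return {name: s for name, s in systs.items() if s["typ"] == syst_type}


def names_by_type(systs, syst_type):
    """Return only the names of systematics of the given type."""
    return list(systematics_by_type(systs, syst_type))


shape_names = names_by_type(systematics, "shape")
```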

n_systematics() int#

Number of Systematics

property variations_names: list[str]#

List of Names of Shape Variations.

Module contents#

class pocket_coffea.utils.stat.DataProcess(name: str, samples: Iterable, label: str | None = None, *, years: Iterable)#

Bases: Process

Class to store information of a Data process

Parameters:
  • name – Name of the process

  • samples – Iterable of sample names associated with the process

  • years – Iterable of years the process is relevant for

  • label – Label for the process, defaults to name if not specified

Inherits from Process and sets is_data to True by default.

is_data: bool#
name: str#
samples: Iterable#
years: Iterable#
class pocket_coffea.utils.stat.DataProcesses(processes: list[DataProcess])#

Bases: dict[str, DataProcess]

Custom dict to store information of multiple data processes.

Parameters:

processes (list[DataProcess]) – List of data processes

class pocket_coffea.utils.stat.Datacard(histograms: dict[str, dict[str, Hist]], datasets_metadata: dict[str, dict[str, dict]], cutflow: dict[str, dict[str, float]], years: list[str], mc_processes: MCProcesses, systematics: Systematics, category: str, data_processes: DataProcesses | None = None, mcstat: bool | dict = True, bins_edges: list[float] | None = None, bin_prefix: str | None = None, bin_suffix: str | None = None, verbose: bool = True)#

Bases: object

Datacard containing processes, systematics and write utilities.

Parameters:
  • histograms (dict[str, dict[str, hist.Hist]]) – Dict with histograms for each sample

  • datasets_metadata (dict[str, dict[str, dict]]) – Metadata for datasets

  • cutflow (dict[str, dict[str, float]]) – Cutflow information for datasets

  • years (list[str]) – Years of data taking

  • mc_processes (MCProcesses) – Monte Carlo processes (signal and background) to include in the datacard

  • systematics (Systematics) – systematic uncertainties

  • category (str) – Category in datacard

  • data_processes (DataProcesses, optional) – Data processes, defaults to None

  • mcstat (bool | dict, optional) – Whether to include MC statistical uncertainties. A dict with the options accepted by combine can be passed instead of a bool. Defaults to True

  • bins_edges (list[float], optional) – Bin edges for rebinning histograms, defaults to None

  • bin_prefix (str, optional) – prefix for the bin name, defaults to None

  • bin_suffix (str, optional) – suffix for the bin name, defaults to None

property adjust_columns#
property adjust_first_column#
property adjust_syst_colum#
property bin: str#

Name of the bin in the datacard

content(shapes_filename: str) str#

Generate the content of the datacard.

Parameters:

shapes_filename (str) – The filename of the root file containing the shape histograms.

Returns:

Content of the datacard as a string.

Return type:

str

create_shape_histogram_dict(is_data: bool = False) dict[str, Hist]#

Create a dictionary of histograms for each process and systematic.

Parameters:

is_data (bool, optional) – Flag to indicate if the datacard is for data, defaults to False

Returns:

Dictionary of histograms, keyed by process_systematic.

Return type:

dict[str, hist.Hist]

dump(directory: PathLike, card_name: str = 'datacard.txt', shapes_name: str = 'shapes.root') None#

Dump datacard and shapes to a directory.

Parameters:
  • directory (os.PathLike) – Directory to dump the datacard and shapes

  • card_name (str, optional) – name of the datacard file, defaults to “datacard.txt”

  • shapes_name (str, optional) – name of the shapes file, defaults to “shapes.root”

expectation_section() str#
get_datasets_by_sample(sample: str, year: str | None = None) list[str]#

Retrieve the list of dataset names for a given sample and optionally a specific year.

Parameters:
  • sample (str) – The sample name for which to retrieve datasets.

  • year (str, optional, default=None) – The year (data-taking period) to filter datasets. If None (default), datasets from all years in self.years are returned.

Returns:

List of dataset names corresponding to the sample (and year, if specified).

Return type:

list[str]

property imax#

Number of bins in the datacard

is_empty_dataset(dataset: str) bool#

Check if dataset is empty

property jmax#

Number of background processes + number of signal processes - 1

property kmax#

Number of nuisance parameters in the datacard

property mcstat_config: dict#

Return the configuration for MC statistics.

mcstat_section() str#
property observation#

Number of observed events in the datacard

observation_section() str#
preamble() str#
rate(process: str, systematic='nominal') float#

Rate of a process in the datacard

rate_parameters_section() str#
rearrange_histograms(is_data: bool = False) Hist#

Rearrange histograms from pocket_coffea output format to match processes and systematics in one histogram.

Parameters:

is_data (bool, optional) – Flag to indicate if the datacard is for data, defaults to False

Returns:

Rearranged histogram

Return type:

hist.Hist

shape_section(shapes_name: str) str#

Generate the shapes line of the datacard, which follows the combine format:

shapes process channel file histogram [histogram_with_systematics]

property shape_variations: list[str]#
systematics_section() str#
class pocket_coffea.utils.stat.MCProcess(name: str, samples: Iterable, label: str | None = None, *, is_signal: bool, years: Iterable, has_rateParam: bool = False)#

Bases: Process

Class to store information of a Monte Carlo process

Parameters:
  • name – Name of the process

  • samples – Iterable of sample names associated with the process

  • years – Iterable of years the process is relevant for

  • is_signal – Whether the process is a signal process

  • has_rateParam – Whether the process has a rate parameter, defaults to False

  • label – Label for the process, defaults to name if not specified

Inherits from Process and sets is_data to False by default.

has_rateParam: bool = False#
is_data: bool#
is_signal: bool#
name: str#
samples: Iterable#
years: Iterable#
class pocket_coffea.utils.stat.MCProcesses(processes: list[MCProcess])#

Bases: dict[str, MCProcess]

Custom dict to store information of multiple MC processes.

Parameters:

processes (list[MCProcess]) – List of MC processes

property background_processes: list[str]#

Names of all Background Processes.

property n_processes: int#

Number of Processes

property signal_processes: list[str]#

Names of all Signal MC Processes.

class pocket_coffea.utils.stat.SystematicUncertainty(name: str, typ: str, processes: list[str] | tuple[str] | dict[str, float], years: list[str] | tuple[str], value: float | tuple[float] | None = None, datacard_name: str | None = None, coffea_name_alias: str | dict[str, str] | None = None)#

Bases: object

Store information about one systematic uncertainty.

Parameters:
  • name – Name of the systematic uncertainty.

  • typ – Type of the systematic uncertainty (e.g. ‘shape’, ‘lnN’).

  • processes – List or tuple of process names affected, or a dict mapping process names to values.

  • years – List or tuple of years the uncertainty applies to.

  • value – Value (float or tuple of floats) of the uncertainty for all processes, or None if using a dict for processes.

  • datacard_name – Name of the systematic uncertainty in the datacard. Defaults to name if not specified.

  • coffea_name_alias

    Name of the shape variation as stored in the coffea output histograms. Use this when the coffea variation name differs from the canonical name — most commonly when one logical systematic is recorded under different names per process (e.g. parton-shower weights named differently for different generators). Can be a single string applied to all processes, or a dict mapping process names to per-process alias strings. Processes missing from the dict fall back to name. Defaults to name if not specified.

    Note: as a plain string this field is largely redundant with name — if you only need a global rename, just set name to the coffea variation name and let datacard_name carry the datacard-side label. coffea_name_alias earns its keep in the dict form, where the alias varies by process.

coffea_name_alias: str | dict[str, str] = None#
datacard_name: str = None#
get_coffea_name(process: str) str#

Return the coffea variation alias for a given process.

Falls back to name when a dict alias does not list process.

name: str#
processes: list[str] | tuple[str] | dict[str, float]#
typ: str#
value: float | tuple[float] = None#
years: list[str] | tuple[str]#
class pocket_coffea.utils.stat.Systematics(systematics: list[SystematicUncertainty])#

Bases: dict[str, SystematicUncertainty]

Store information of a list of systematic uncertainties

get_systematics_by_process(process: Process) list[SystematicUncertainty]#

List of Systematics that affect a specific process.

get_systematics_by_type(syst_type: str) dict[SystematicUncertainty]#

Dict of Systematics of a specific type.

list_type(syst_type: str) list[str]#

List of Names of Systematics of a specific type.

n_systematics() int#

Number of Systematics

property variations_names: list[str]#

List of Names of Shape Variations.