pocket_coffea.law_tasks package

Subpackages

Submodules

pocket_coffea.law_tasks.utils module

Utility functions for law tasks in pocket_coffea.

pocket_coffea.law_tasks.utils.create_datasets_paths(datasets: dict, split_by_year: bool = False, output_dir: str | PathLike | None = None) → list

Create a list of dataset paths from the given datasets dictionary. If the split_by_year flag is set, the datasets are split by year. The input datasets definition dictionary must follow the pocket_coffea format.

Parameters:

datasets (dict) – A dictionary containing dataset information.
split_by_year (bool, optional) – A flag indicating whether to split the datasets by year. Default is False.

Returns:

A list of dataset paths.

Return type:

list

pocket_coffea.law_tasks.utils.exclude_samples_from_plotting(plotting_style: dict, exclude_samples: list)

Exclude specified samples from plotting.

Parameters:

plotting_style (dict) – The plotting style configuration.
exclude_samples (list[str]) – The list of sample names to exclude from plotting.

Returns:

The updated plotting style configuration.

Return type:

dict

Example:

>>> plotting_style = {"exclude_samples": []}
>>> exclude_samples = ["sample1", "sample2"]
>>> exclude_samples_from_plotting(plotting_style, exclude_samples)
{'exclude_samples': ['sample1', 'sample2']}

pocket_coffea.law_tasks.utils.extract_executor_and_site(executor: str) → tuple

Extract the executor name and the site from an executor string.

Parameters:

executor (str) – The executor string, optionally including a site (e.g. dask@lxplus).

Returns:

A tuple containing the executor name and the site.

Return type:

tuple
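A minimal sketch of the splitting step, assuming the executor@site form shown elsewhere on this page (e.g. dask@lxplus) and assuming the site is None when no "@" is present. The name split_executor_string is hypothetical and is not part of pocket_coffea; the real function may handle missing sites differently.

```python
from __future__ import annotations


def split_executor_string(executor: str) -> tuple[str, str | None]:
    """Split an executor string such as "dask@lxplus" into (executor, site).

    Sketch only: assumes "@" separates the executor from the site and that
    the site is None when no "@" is present.
    """
    name, separator, site = executor.partition("@")
    return name, (site if separator else None)


print(split_executor_string("dask@lxplus"))  # ('dask', 'lxplus')
print(split_executor_string("iterative"))    # ('iterative', None)
```

str.partition is used rather than str.split so that a string without "@" needs no special case.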

pocket_coffea.law_tasks.utils.filter_items_by_regex(regex: str, items: Iterable[str], match: bool = True) → list

Filter an iterable of strings with a regular expression.

Parameters:

regex (str) – The regular expression to match.
items (Iterable[str]) – The iterable of items to match the regular expression on.
match (bool, optional) – Flag indicating whether to match the regular expression (True) or not (False). Default is True.

Returns:

The list of items that match (or do not match) the regular expression.

Return type:

list[str]
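The filtering behaviour described above can be sketched as follows. This assumes re.search semantics (the pattern may match anywhere in the item); the actual function might use re.match or re.fullmatch instead. The name filter_by_regex and the sample names are illustrative only.

```python
from __future__ import annotations

import re
from typing import Iterable


def filter_by_regex(regex: str, items: Iterable[str], match: bool = True) -> list[str]:
    """Keep items matching `regex` (match=True) or not matching it (match=False).

    Sketch only: assumes re.search semantics, so the pattern may match
    anywhere in the item unless it is anchored.
    """
    pattern = re.compile(regex)
    return [item for item in items if bool(pattern.search(item)) == match]


samples = ["TTToSemiLeptonic", "TTTo2L2Nu", "DYJetsToLL"]
print(filter_by_regex(r"^TTTo", samples))               # ['TTToSemiLeptonic', 'TTTo2L2Nu']
print(filter_by_regex(r"^TTTo", samples, match=False))  # ['DYJetsToLL']
```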

pocket_coffea.law_tasks.utils.get_executor(executor: str, run_options: dict, output_dir: str | PathLike)

Get the executor factory for the given executor, run options, and output directory. The factory is loaded from the corresponding module in the pocket_coffea executors package.

Parameters:

executor (str) – The name of the executor and possible site (e.g. dask@lxplus).
run_options (dict) – The run options for the executor.
output_dir (FileName) – The output directory for the executor.

Returns:

The executor factory.

Return type:

executors_base.ExecutorFactoryABC

Raises:

TypeError – If the executor factory is not of type executors_base.ExecutorFactoryABC.

pocket_coffea.law_tasks.utils.import_analysis_config(cfg: str | PathLike) → tuple[Configurator, ModuleType]

Import the analysis configuration module and return the Configurator object together with the imported module.

Parameters:

cfg (FileName) – Path to the config.py file.

Raises:

AttributeError – If config.py has no attribute cfg.
TypeError – If cfg is not of type Configurator (pocket_coffea).

Returns:

Configurator object and the imported module

Return type:

tuple[Configurator, ModuleType]
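A sketch of how a config.py file can be imported from a path and checked for a cfg attribute, using only the standard importlib machinery. This is an illustration under assumptions, not the pocket_coffea implementation: the name import_config_module is hypothetical, and the expected_type parameter stands in for the Configurator class so the example runs without pocket_coffea installed.

```python
from pathlib import Path
from types import ModuleType
from typing import Any, Tuple
import importlib.util


def import_config_module(cfg_path, expected_type: type = object) -> Tuple[Any, ModuleType]:
    """Import a Python config file and return (module.cfg, module).

    Sketch only: `expected_type` stands in for Configurator.
    """
    path = Path(cfg_path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot import {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the config file's top-level code

    if not hasattr(module, "cfg"):
        raise AttributeError(f"{path} has no attribute 'cfg'")
    cfg = module.cfg
    if not isinstance(cfg, expected_type):
        raise TypeError(f"cfg is {type(cfg).__name__}, expected {expected_type.__name__}")
    return cfg, module
```

With pocket_coffea available, a call might look like import_config_module("config.py", expected_type=Configurator), where Configurator comes from pocket_coffea.utils.configurator.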

pocket_coffea.law_tasks.utils.load_analysis_config(cfg: str | PathLike, output_dir: str | PathLike | None = None, save: bool = True) → tuple[Configurator, dict]

Load the analysis configuration from a config file and, if save is True, save the configuration and parameters to output_dir.

Parameters:

cfg (FileName) – Path to the config file.
output_dir (FileName, optional) – The output directory to save the configuration and parameters. Default is None.
save (bool, optional) – Flag indicating whether to save the configuration and parameters. Default is True.

Raises:

AttributeError – If the config file does not have the attribute cfg.
TypeError – If cfg is not of type Configurator (pocket_coffea).

Returns:

A tuple containing the Configurator and run_options (if defined in config).

Return type:

tuple

pocket_coffea.law_tasks.utils.load_plotting_style(params_file: str | PathLike, custom_plot_style: str | PathLike | None = None)

Load the plotting style parameters from a configuration file. If custom_plot_style is provided, its parameters are merged with the default ones.

Parameters:

params_file (str or os.PathLike) – The path to the configuration file containing the plotting style parameters.
custom_plot_style (str or os.PathLike, optional) – The path to a custom plotting style file. If provided, the parameters from this file are merged with the default parameters. Default is None.

Returns:

The plotting style parameters.
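The merge of a custom plotting style into the defaults can be pictured as a recursive dictionary merge. This sketch works on already-loaded plain dicts and assumes custom values override defaults key by key while nested dicts are merged; the real function reads files and may rely on a configuration library with different merge rules. The name deep_merge and the keys other than exclude_samples are hypothetical.

```python
import copy


def deep_merge(defaults: dict, custom: dict) -> dict:
    """Return a copy of `defaults` with `custom` merged in recursively.

    Sketch only: custom values override defaults; nested dicts are merged
    instead of replaced. Neither input is modified.
    """
    merged = copy.deepcopy(defaults)
    for key, value in custom.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"plotting_style": {"figsize": [12, 9], "exclude_samples": []}}  # hypothetical keys
custom = {"plotting_style": {"exclude_samples": ["QCD"]}}
print(deep_merge(defaults, custom))
# {'plotting_style': {'figsize': [12, 9], 'exclude_samples': ['QCD']}}
```

Deep-copying keeps the default style untouched, so the same defaults can be reused for several custom styles.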

pocket_coffea.law_tasks.utils.load_run_options(run_options: dict, executor: str, config: Configurator, test: bool = False, scaleout: int | None = None, limit_files: int | None = None, limit_chunks: int | None = None) → tuple

Load the run options for a given executor and scaleout value. Update the configuration based on the provided run options.

Parameters:

run_options (dict) – A dictionary containing the run options.
executor (str) – The executor to use (and possible site, e.g. dask@lxplus).
config (Configurator) – The Configurator object.
test (bool, optional) – Flag indicating whether to run in test mode. Defaults to False.
scaleout (int, optional) – The scaleout value. Defaults to None.
limit_files (int, optional) – The maximum number of files to process. Defaults to None.
limit_chunks (int, optional) – The maximum number of chunks to process. Defaults to None.

Returns:

A tuple containing the updated run options dictionary and the Configurator object.

Return type:

tuple

The default run options are merged with the provided run options, and the scaleout value is overridden if provided. The updated run options dictionary is returned together with the Configurator object.
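A sketch of only the merge step of load_run_options, assuming flat dictionaries where user-provided keys take precedence over defaults. The executor, config, test, and limit handling are not reproduced here. The name merge_run_options and the default values are hypothetical.

```python
from __future__ import annotations


def merge_run_options(defaults: dict, run_options: dict, scaleout: int | None = None) -> dict:
    """Merge user run options over defaults and optionally override scaleout.

    Sketch only: assumes flat dicts; keys in `run_options` win over `defaults`.
    Neither input dict is modified.
    """
    merged = {**defaults, **run_options}
    if scaleout is not None:
        merged["scaleout"] = scaleout  # explicit argument wins over both dicts
    return merged


defaults = {"scaleout": 10, "chunksize": 100_000}  # hypothetical defaults
print(merge_run_options(defaults, {"chunksize": 50_000}, scaleout=20))
# {'scaleout': 20, 'chunksize': 50000}
```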

pocket_coffea.law_tasks.utils.load_sample_names(sample_config: str | PathLike, prefix: str | None = None)

Load sample names from a sample configuration file.

Parameters:

sample_config (str or os.PathLike) – The path to the sample configuration file.
prefix (str, optional) – Optional prefix to filter sample names. Only sample names that start with the specified prefix will be included. Defaults to None.

Returns:

List of sample names.

Return type:

list[str]
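The prefix filter can be sketched as below. The file format is an assumption made only for this example: a JSON file whose top-level keys are the sample names. The actual sample configuration format may differ. The name read_sample_names is hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def read_sample_names(sample_config: str | Path, prefix: str | None = None) -> list[str]:
    """Read sample names and optionally keep only those starting with `prefix`.

    Sketch only: assumes a JSON file whose top-level keys are sample names.
    """
    with open(sample_config) as f:
        names = list(json.load(f))  # keys of the top-level JSON object
    if prefix is not None:
        names = [name for name in names if name.startswith(prefix)]
    return names
```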

pocket_coffea.law_tasks.utils.merge_datasets_definition(definition_files: list) → dict[str, dict]

Merge multiple dataset definition files into one.

Parameters:

definition_files (list) – List of dataset definition files.

Returns:

The merged dataset definition.

Return type:

dict

Warning

If duplicate keys are found in the datasets definition, a warning will be raised.
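A sketch of merging several definition files with a warning on duplicate keys. It assumes each file holds a JSON object and that, on a duplicate key, the entry from the later file wins; the actual precedence rule in pocket_coffea is not stated above. The name merge_definition_files is hypothetical.

```python
from __future__ import annotations

import json
import warnings


def merge_definition_files(definition_files: list) -> dict[str, dict]:
    """Merge JSON dataset definition files into one dictionary.

    Sketch only: on duplicate keys a warning is issued and the entry from
    the later file replaces the earlier one.
    """
    merged: dict[str, dict] = {}
    for path in definition_files:
        with open(path) as f:
            definition = json.load(f)
        for key, value in definition.items():
            if key in merged:
                warnings.warn(f"Duplicate dataset key '{key}' found in {path}")
            merged[key] = value
    return merged
```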

pocket_coffea.law_tasks.utils.modify_dataset_output_path(dataset_definition: str | PathLike | dict, dataset_configuration: dict, output_file: str | PathLike | None = None) → dict

Modify the dataset definition so that its JSON output field contains the full output path.

Parameters:

dataset_definition (Union[FileName, dict]) – The path to the dataset definition file or the dataset definition as a dictionary.
dataset_configuration (dict) – The configuration for the datasets from the configurator.
output_file (str or os.PathLike, optional) – The name of the output file. If provided, the modified dataset definition is saved under this filename in the output directory; otherwise it is not saved. Default is None.

Returns:

The modified dataset definition as a dictionary.

Return type:

dict

pocket_coffea.law_tasks.utils.process_datasets(coffea_executor: coffea.processor.executor.ExecutorBase, config: pocket_coffea.utils.configurator.Configurator, run_options: dict, processor_instance, output_path: str | os.PathLike | None = None, process_separately: bool = False, schema: coffea.nanoevents.schemas.base.BaseSchema = coffea.nanoevents.schemas.nanoaod.NanoAODSchema, file_format: str = 'root')

pocket_coffea.law_tasks.utils.read_datasets_definition(dataset_definition: str | PathLike) → dict

Read the datasets definition file and return it as a dictionary.

Parameters:

dataset_definition (str or os.PathLike) – The path to the dataset definition file.

Returns:

The datasets definition as a dictionary.

Return type:

dict
