pocket_coffea.scripts.dataset package

Submodules

pocket_coffea.scripts.dataset.append_genweights module

pocket_coffea.scripts.dataset.append_parents module

pocket_coffea.scripts.dataset.build_datasets module

pocket_coffea.scripts.dataset.dataset_query module

class pocket_coffea.scripts.dataset.dataset_query.DataDiscoveryCLI

Bases: object

property as_dict
do_allowlist_sites(sites=None)
do_blocklist_sites(sites=None)
do_clear()
do_list_replicas()
do_list_selected()
do_login(proxy=None)

Log in to the Rucio client. Optionally, a specific proxy file can be passed to the command; if no proxy file is specified, voms-proxy-info is used to locate the current proxy.
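The proxy resolution described above can be sketched as follows. This is a minimal sketch, not pocket_coffea's implementation: it assumes the proxy path is read from the output of `voms-proxy-info -path` when no proxy file is passed, and the helper name `resolve_proxy_path` and its injectable `run` parameter are hypothetical (added so the logic can be exercised without a grid certificate). Building the actual Rucio client is omitted.

```python
import subprocess
from typing import Callable, Optional


def resolve_proxy_path(
    proxy: Optional[str] = None,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Return the proxy file to use for the Rucio login (hypothetical helper)."""
    if proxy:
        # An explicitly passed proxy file takes precedence.
        return proxy
    # Otherwise ask voms-proxy-info where the current proxy lives
    # (assumption: the `-path` option prints the proxy file path).
    result = run(
        ["voms-proxy-info", "-path"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Passing `run` explicitly keeps the external command out of unit tests; in normal use the default `subprocess.run` is called and a missing or failing `voms-proxy-info` raises an exception.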

do_prioritylist_sites(sites=None)

Choose prioritised sites by which to order replicas, independently of location and availability.
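One way to picture ordering replicas by a priority list is the sketch below. It is an illustration under stated assumptions, not the method's actual code: the replica structure (a mapping from file name to the list of sites holding it) and the helper name `order_by_priority` are hypothetical. Prioritised sites move to the front in priority order, regardless of where they are or how the query ranked them; all other sites keep their original relative order.

```python
from typing import Dict, List


def order_by_priority(
    replicas: Dict[str, List[str]], priority: List[str]
) -> Dict[str, List[str]]:
    """Reorder each file's replica sites so prioritised sites come first (hypothetical helper)."""
    # Lower rank means higher priority; unlisted sites all share the lowest priority.
    rank = {site: i for i, site in enumerate(priority)}
    lowest = len(priority)
    ordered = {}
    for fname, sites in replicas.items():
        # sorted() is stable, so sites with equal rank keep their query order.
        ordered[fname] = sorted(sites, key=lambda s: rank.get(s, lowest))
    return ordered
```

For example, with `priority=["T2_CH_CERN", "T1_DE_KIT"]`, a file available at `["T2_US_MIT", "T1_DE_KIT", "T2_CH_CERN"]` is reordered to `["T2_CH_CERN", "T1_DE_KIT", "T2_US_MIT"]`.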

do_query(query=None)
do_query_results()
do_regex_sites(regex=None)
do_replicas(mode=None, selection=None)

Query Rucio for replicas.

mode:

  • None: ask the user for the mode

  • round-robin: take files randomly from the available sites

  • choose: ask the user to choose from a list of sites

  • first: take the first site returned by the Rucio query

selection: list of indices, or ‘all’ to use all the selected datasets in the replicas query
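The three non-interactive modes can be sketched as a per-file site choice. This is a hypothetical helper, not pocket_coffea's code: `pick_sites`, the replica mapping (file name to the sites returned by Rucio, in query order), and passing the chosen sites as an argument instead of prompting the user are all assumptions. Note that, following the description above, "round-robin" picks a random available site for each file rather than cycling through sites.

```python
import random
from typing import Dict, List, Optional, Sequence


def pick_sites(
    replicas: Dict[str, List[str]],
    mode: str,
    chosen_sites: Optional[Sequence[str]] = None,
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Pick one site per file according to `mode` (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    chosen = set(chosen_sites or ())
    picked = {}
    for fname, sites in replicas.items():
        if not sites:
            raise ValueError(f"No replicas available for {fname}")
        if mode == "round-robin":
            # Take the file from a randomly chosen available site.
            picked[fname] = rng.choice(sites)
        elif mode == "first":
            # Keep the first site returned by the query.
            picked[fname] = sites[0]
        elif mode == "choose":
            # Restrict to the user's sites (passed in here instead of prompted).
            allowed = [s for s in sites if s in chosen]
            if not allowed:
                raise ValueError(f"None of the chosen sites hosts {fname}")
            picked[fname] = allowed[0]
        else:
            raise ValueError(f"Unknown mode: {mode!r}")
    return picked
```

Passing a seeded `random.Random` makes the random choice reproducible, which is convenient when comparing runs.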

do_save(filename=None)

Save the replica information in YAML format.

do_select(selection=None, metadata=None)

Select datasets from the list of query results. Input a list of indices, which may include ranges such as 4-6, or “all”.
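A selection string such as "0 2 4-6" or "all" can be turned into indices along the lines of the sketch below. The module also lists `get_indices_query(input_str, maxN)` with a similar signature, but its exact rules are not documented here, so this is an assumption-based sketch with a hypothetical name, `parse_selection`: tokens are whitespace-separated, indices are 0-based, ranges include both ends, and duplicates are dropped.

```python
from typing import List


def parse_selection(input_str: str, max_n: int) -> List[int]:
    """Turn a selection like "0 2 4-6" or "all" into indices (hypothetical helper).

    Assumes whitespace-separated tokens, 0-based indices and inclusive ranges.
    """
    indices: List[int] = []
    for token in input_str.split():
        if token == "all":
            indices.extend(range(max_n))
        elif "-" in token:
            start_str, end_str = token.split("-", 1)
            start, end = int(start_str), int(end_str)
            if start > end:
                raise ValueError(f"Empty range: {token}")
            indices.extend(range(start, end + 1))  # the range end is included
        else:
            indices.append(int(token))
    for i in indices:
        if not 0 <= i < max_n:
            raise IndexError(f"Index {i} out of range for {max_n} results")
    # Drop duplicates while keeping the order in which indices were requested.
    return list(dict.fromkeys(indices))
```

Validating every index against the number of query results up front means a typo fails loudly instead of silently selecting the wrong dataset.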

do_set_replicas_sorting(sort: str | None = None)

Set the sorting mode for the replicas.

If sort is None, the user is asked for the sorting mode. If the user input is empty, the sorting mode is not changed.

Parameters:

sort (str, optional) – how to sort the replicas. Defaults to None, in which case the user is asked for the sorting mode.

do_sites_filters(ask_clear=True)
do_whoami()
extract_era_from_dataset_name(dataset_name)
extract_xsec_from_dataset_name(dataset_name)
extract_year_from_dataset_name(dataset_name)
generate_default_metadata(dataset)
is_mc_dataset(dataset_name)
load_dataset_definition(dataset_definition, query_results_strategy='all', replicas_strategy='round-robin')

Initialize the DataDiscoveryCLI by querying the set of datasets defined in dataset_definition, then selecting the query results and replicas according to the following options.

  • query_results_strategy: “all”, or “manual” to be prompted for the selection

  • replicas_strategy:
    • “round-robin”: select randomly from the available sites for each file

    • “choose”: filter the sites with a list of indices for all the files

    • “first”: take the first result returned by Rucio

    • “manual”: to be prompted for a manual decision, dataset by dataset

start_cli()
pocket_coffea.scripts.dataset.dataset_query.get_indices_query(input_str: str, maxN: int) → List[int]
pocket_coffea.scripts.dataset.dataset_query.print_dataset_query(query, dataset_list, console, selected=[])

pocket_coffea.scripts.dataset.download module

Module contents