pocket_coffea.scripts.dataset package#

Submodules#

pocket_coffea.scripts.dataset.append_genweights module#

pocket_coffea.scripts.dataset.append_parents module#

pocket_coffea.scripts.dataset.build_datasets module#

pocket_coffea.scripts.dataset.dataset_query module#

class pocket_coffea.scripts.dataset.dataset_query.DataDiscoveryCLI#

Bases: object

property as_dict#
do_allowlist_sites(sites=None)#
do_blocklist_sites(sites=None)#
do_clear()#
do_list_replicas()#
do_list_selected()#
do_login(proxy=None)#

Log in to the Rucio client. Optionally, a specific proxy file can be passed to the command. If no proxy file is specified, voms-proxy-info is used.

do_query(query=None)#
do_query_results()#
do_regex_sites(regex=None)#
do_replicas(mode=None, selection=None)#

Query Rucio for replicas.

mode:

  • None: ask the user about the mode

  • round-robin: take files randomly from the available sites

  • choose: ask the user to choose from a list of sites

  • first: take the first site from the Rucio query

selection: a list of indices, or ‘all’ to query replicas for all the selected datasets
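
The non-interactive modes described above can be illustrated with a minimal, self-contained sketch. It does not call pocket_coffea or Rucio; the function name `pick_replicas`, its arguments, and the site names are hypothetical and only mimic the documented behaviour (random choice per file for round-robin, first entry for first, site filtering for choose).

```python
import random
from typing import Dict, List, Optional, Sequence


def pick_replicas(
    replicas: Dict[str, List[str]],
    mode: str,
    chosen_sites: Optional[Sequence[str]] = None,
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Pick one site per file (hypothetical helper, not the library's code)."""
    rng = rng or random.Random()
    picked: Dict[str, str] = {}
    for filename, sites in replicas.items():
        if mode == "first":
            # Take the first site in the order the query returned it.
            picked[filename] = sites[0]
        elif mode == "round-robin":
            # As documented: take a site at random among those available.
            picked[filename] = rng.choice(sites)
        elif mode == "choose":
            # Keep only the sites the user selected, then take the first match.
            allowed = [s for s in sites if s in (chosen_sites or [])]
            if not allowed:
                raise ValueError(f"no chosen site hosts {filename}")
            picked[filename] = allowed[0]
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return picked


# Made-up replica map: file name -> sites that host it.
example = {"a.root": ["T2_A", "T2_B"], "b.root": ["T2_B", "T2_C"]}
print(pick_replicas(example, "first"))
print(pick_replicas(example, "choose", chosen_sites=["T2_B"]))
```

In this sketch the "choose" mode takes the site list as an argument, whereas the real command asks the user for it interactively.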

do_save(filename=None)#

Save the replica information in YAML format.

do_select(selection=None, metadata=None)#

Select datasets from the list of query results. Input a list of indices, which may include ranges such as 4-6, or “all”.
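
As an illustration of this input format, the sketch below parses a selection string into a list of indices. It is not the library's `get_indices_query`; the name `parse_selection`, the zero-based indexing, and the inclusive treatment of ranges are assumptions made for the example.

```python
from typing import List


def parse_selection(text: str, n_items: int) -> List[int]:
    """Parse input such as "1 4-6" or "all" (hypothetical helper).

    Assumptions: indices are zero-based and ranges are inclusive.
    """
    text = text.strip()
    if text.lower() == "all":
        return list(range(n_items))

    indices: List[int] = []
    for token in text.replace(",", " ").split():
        if "-" in token:
            start_s, end_s = token.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError(f"invalid range: {token}")
            candidates = list(range(start, end + 1))
        else:
            candidates = [int(token)]
        for i in candidates:
            if not 0 <= i < n_items:
                raise ValueError(f"index {i} out of range (0..{n_items - 1})")
            if i not in indices:  # ignore duplicates, keep input order
                indices.append(i)
    return indices


print(parse_selection("1 4-6", 8))
print(parse_selection("all", 3))
```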

do_sites_filters(ask_clear=True)#
do_whoami()#
extract_era_from_dataset_name(dataset_name)#
extract_xsec_from_dataset_name(dataset_name)#
extract_year_from_dataset_name(dataset_name)#
generate_default_metadata(dataset)#
is_mc_dataset(dataset_name)#
load_dataset_definition(dataset_definition, query_results_strategy='all', replicas_strategy='round-robin')#

Initialize the DataDiscoveryCLI by querying the set of datasets defined in dataset_definition, then selecting query results and replicas according to the options.

  • query_results_strategy: “all”, or “manual” to be prompted for the selection

  • replicas_strategy:
    • “round-robin”: select randomly from the available sites for each file

    • “choose”: filter the sites with a list of indices for all the files

    • “first”: take the first result returned by rucio

    • “manual”: be prompted for a manual decision, dataset by dataset

start_cli()#
pocket_coffea.scripts.dataset.dataset_query.get_indices_query(input_str: str, maxN: int) List[int]#
pocket_coffea.scripts.dataset.dataset_query.print_dataset_query(query, dataset_list, console, selected=[])#

pocket_coffea.scripts.dataset.download module#

Module contents#