pocket_coffea.scripts.dataset package#
Submodules#
pocket_coffea.scripts.dataset.append_genweights module#
pocket_coffea.scripts.dataset.append_parents module#
pocket_coffea.scripts.dataset.build_datasets module#
pocket_coffea.scripts.dataset.dataset_query module#
- class pocket_coffea.scripts.dataset.dataset_query.DataDiscoveryCLI#
Bases: object
- property as_dict#
- do_allowlist_sites(sites=None)#
- do_blocklist_sites(sites=None)#
- do_clear()#
- do_list_replicas()#
- do_list_selected()#
- do_login(proxy=None)#
Log in to the Rucio client. Optionally, a specific proxy file can be passed to the command. If no proxy file is specified, voms-proxy-info is used instead.
- do_prioritylist_sites(sites=None)#
Choose prioritised sites by which to order replicas independent of location and availability.
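As an illustration of ordering replicas by a priority list, here is a minimal sketch. It is not the implementation used by DataDiscoveryCLI: the helper name order_by_priority is hypothetical, the site names are example values, and the rule that non-prioritised sites keep their original relative order is an assumption.

```python
from typing import List


def order_by_priority(sites: List[str], priority: List[str]) -> List[str]:
    """Hypothetical helper: put prioritised sites first, in priority order."""
    # Map each prioritised site to its rank; unlisted sites share the last rank.
    rank = {site: i for i, site in enumerate(priority)}
    # sorted() is stable, so unlisted sites keep their original relative order
    # (an assumption of this sketch, not documented library behaviour).
    return sorted(sites, key=lambda site: rank.get(site, len(priority)))


# Example site names, chosen only for illustration.
replica_sites = ["T2_US_Nebraska", "T2_CH_CSCS", "T1_DE_KIT_Disk", "T2_DE_DESY"]
ordered = order_by_priority(replica_sites, ["T2_DE_DESY", "T2_CH_CSCS"])
print(ordered)
```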
- do_query(query=None)#
- do_query_results()#
- do_regex_sites(regex=None)#
- do_replicas(mode=None, selection=None)#
Query Rucio for replicas.
mode:
None: ask the user about the mode
“round-robin”: take files randomly from available sites
“choose”: ask the user to choose from a list of sites
“first”: take the first site from the Rucio query
selection: list of indices, or “all” to query replicas for all the selected datasets
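The three non-interactive modes can be pictured with a small, self-contained sketch. This is only an illustration under stated assumptions, not the code behind do_replicas: the function pick_sites, its arguments, and the file and site names are all hypothetical, and "choose" is modelled as a pre-supplied list of allowed sites rather than an interactive prompt.

```python
import random
from typing import Dict, List, Optional


def pick_sites(
    replicas: Dict[str, List[str]],
    mode: str,
    chosen: Optional[List[str]] = None,
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Hypothetical helper: pick one site per file according to `mode`."""
    rng = rng or random.Random()
    picked: Dict[str, str] = {}
    for fname, sites in replicas.items():
        if mode == "first":
            # Take the first site in the order the query returned them.
            picked[fname] = sites[0]
        elif mode == "round-robin":
            # Take a random site among those holding the file.
            picked[fname] = rng.choice(sites)
        elif mode == "choose":
            # Keep only sites the user selected (assumed to be given up front).
            allowed = [s for s in sites if s in (chosen or [])]
            if not allowed:
                raise ValueError(f"no chosen site holds {fname}")
            picked[fname] = allowed[0]
        else:
            raise ValueError(f"unknown mode: {mode}")
    return picked


# Hypothetical replica map: file name -> sites that hold a copy.
example = {
    "file_a.root": ["T2_DE_DESY", "T2_CH_CSCS"],
    "file_b.root": ["T2_CH_CSCS", "T1_DE_KIT_Disk"],
}
first = pick_sites(example, "first")
chosen = pick_sites(example, "choose", chosen=["T2_CH_CSCS"])
spread = pick_sites(example, "round-robin", rng=random.Random(0))
```

Passing a seeded random.Random makes the random mode reproducible, which is convenient when comparing file lists between runs.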
- do_save(filename=None)#
Save the replica information in yaml format
- do_select(selection=None, metadata=None)#
Select datasets from the list of query results. Input a list of indices, optionally including ranges such as 4-6, or “all”.
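A selection string of this kind could be parsed along the following lines. This is a sketch under assumptions rather than the package's own parser: the name parse_selection is hypothetical, and the choices of whitespace-separated tokens, 1-based indices, inclusive ranges, and silently dropping out-of-range values are all assumed for illustration.

```python
from typing import List


def parse_selection(text: str, max_n: int) -> List[int]:
    """Hypothetical parser for inputs such as "1 4-6" or "all"."""
    text = text.strip()
    if text == "all":
        # Assumed 1-based numbering of the query results.
        return list(range(1, max_n + 1))
    indices: List[int] = []
    for token in text.split():
        if "-" in token:
            # A token like "4-6" is treated as an inclusive range.
            start, end = (int(part) for part in token.split("-", 1))
            indices.extend(range(start, end + 1))
        else:
            indices.append(int(token))
    # Drop anything outside the valid range (an assumption of this sketch).
    return [i for i in indices if 1 <= i <= max_n]


print(parse_selection("1 4-6 9", 8))
print(parse_selection("all", 3))
```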
- do_set_replicas_sorting(sort: str | None = None)#
Set the sorting mode for the replicas.
If sort is None, it will ask the user for the sorting mode. If user input is empty, the sorting mode will not be changed.
- Parameters:
sort (str, optional) – How to sort replicas. Defaults to None, in which case the user is asked for the sorting mode.
- do_sites_filters(ask_clear=True)#
- do_whoami()#
- extract_era_from_dataset_name(dataset_name)#
- extract_xsec_from_dataset_name(dataset_name)#
- extract_year_from_dataset_name(dataset_name)#
- generate_default_metadata(dataset)#
- is_mc_dataset(dataset_name)#
- load_dataset_definition(dataset_definition, query_results_strategy='all', replicas_strategy='round-robin')#
Initialize the DataDiscoveryCLI by querying the set of datasets defined in dataset_definition, then selecting results and replicas according to the given options.
query_results_strategy: “all”, or “manual” to be prompted for the selection
replicas_strategy:
“round-robin”: select randomly from the available sites for each file
“choose”: filter the sites with a list of indices for all the files
“first”: take the first result returned by Rucio
“manual”: be prompted for a manual decision dataset by dataset
- start_cli()#
- pocket_coffea.scripts.dataset.dataset_query.get_indices_query(input_str: str, maxN: int) → List[int]#
- pocket_coffea.scripts.dataset.dataset_query.print_dataset_query(query, dataset_list, console, selected=[])#