pocket_coffea.law_tasks.tasks package#
Submodules#
pocket_coffea.law_tasks.tasks.base module#
- class pocket_coffea.law_tasks.tasks.base.BaseTask(*args, **kwargs)#
Bases: Task
- property base_store: Path#
The base path where all output files of tasks are stored.
- Returns:
Environment variable ANALYSIS_STORE
- Return type:
Path
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- local_directory_target(*path: str) LocalDirectoryTarget#
Return a LocalDirectoryTarget for the given path(s). Pass multiple path parts as separate arguments.
- Returns:
LocalDirectoryTarget for the given path
- Return type:
law.LocalDirectoryTarget
- local_file_target(*path: str) LocalFileTarget#
Return a LocalFileTarget for the given path(s). Pass multiple path parts as separate arguments.
- Returns:
LocalFileTarget for the given path
- Return type:
law.LocalFileTarget
- local_path(*path: str) Path#
Return the path to a location in the local store. The path is always prefixed with the value of the environment variable $ANALYSIS_STORE. Pass multiple path parts as separate arguments.
- Returns:
joined ANALYSIS_STORE with store_parts and arguments
- Return type:
str
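The path composition described above can be sketched roughly as follows. This is not the package's implementation: the helper name `join_store_path` and the example values are hypothetical, and the fallback directory exists only so the sketch runs outside an analysis environment.

```python
import os
from pathlib import Path


def join_store_path(store_parts: tuple[str, ...], *path: str) -> Path:
    """Hypothetical helper: $ANALYSIS_STORE / store_parts / path parts."""
    # Assumption: fall back to "store" when the variable is unset.
    base = Path(os.environ.get("ANALYSIS_STORE", "store"))
    return base.joinpath(*store_parts, *path)


# Illustrative values only; a real setup defines ANALYSIS_STORE itself.
os.environ["ANALYSIS_STORE"] = "/tmp/analysis_store"
result = join_store_path(("MyTask", "v1"), "plots", "mjj.pdf")
print(result)
```

Passing the parts as separate arguments leaves the choice of path separator to `pathlib`.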
- store_parts() tuple[str]#
Tuple of parts that get added to the store path (local/wlcg). Can be overridden in subclasses to add more parts.
- Returns:
Task class name and version
- Return type:
tuple[str]
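Overriding store_parts in a subclass might look like the sketch below. It uses plain Python classes as stand-ins for law tasks, so `BaseSketch`, `PlotSketch` and the `category` attribute are hypothetical names chosen for illustration.

```python
class BaseSketch:
    """Stand-in for a task base class; not the real BaseTask."""

    version = "v1"

    def store_parts(self) -> tuple[str, ...]:
        # Task class name and version, as described above.
        return (self.__class__.__name__, self.version)


class PlotSketch(BaseSketch):
    """Hypothetical subclass that adds one more directory level."""

    category = "signal_region"

    def store_parts(self) -> tuple[str, ...]:
        # Extend the parent's parts instead of replacing them.
        return super().store_parts() + (self.category,)


parts = PlotSketch().store_parts()
print(parts)
```

Calling `super().store_parts()` keeps the class name and version at the front, so outputs of different tasks and versions stay in separate directories.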
- version#
Parameter whose value is a str, and a base class for other parameter types.
Parameters are objects set on the Task class level to make it possible to parameterize tasks. For instance:
    class MyTask(luigi.Task):
        foo = luigi.Parameter()

    class RequiringTask(luigi.Task):
        def requires(self):
            return MyTask(foo="hello")

        def run(self):
            print(self.requires().foo)  # prints "hello"
This makes it possible to instantiate multiple tasks, e.g. MyTask(foo='bar') and MyTask(foo='baz'). The task will then have the foo attribute set appropriately.
When a task is instantiated, it will first use any argument as the value of the parameter, e.g. if you instantiate a = TaskA(x=44) then a.x == 44. When the value is not provided, the value will be resolved in this order of falling priority:
- Any value provided on the command line:
  - To the root task (e.g. --param xyz)
  - Then to the class, using the qualified task name syntax (e.g. --TaskA-param xyz).
- With [TASK_NAME]>PARAM_NAME: <serialized value> syntax. See ParamConfigIngestion
- Any default value set using the default flag.
Parameter objects may be reused, but you must then set the positional=False flag.
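The resolution order listed above can be mimicked with a small lookup function. This is a simplified sketch and not luigi's code: `resolve_parameter`, the dictionaries standing in for the command line and the configuration file, and the sample values are all assumptions made for illustration.

```python
_UNSET = object()


def resolve_parameter(name, task_family, cli_args, config, default=_UNSET):
    """Return the first value found, in falling priority."""
    # 1a. Command line, addressed to the root task: --param xyz
    if name in cli_args:
        return cli_args[name]
    # 1b. Command line, qualified with the task name: --TaskA-param xyz
    qualified = f"{task_family}-{name}"
    if qualified in cli_args:
        return cli_args[qualified]
    # 2. Configuration section [TASK_NAME] with key PARAM_NAME
    if name in config.get(task_family, {}):
        return config[task_family][name]
    # 3. Default given when the parameter was declared
    if default is not _UNSET:
        return default
    raise ValueError(f"no value for parameter {name!r}")


config = {"TaskA": {"param": "from_config"}}
from_cli = resolve_parameter("param", "TaskA", {"param": "from_cli"}, config, "fallback")
from_cfg = resolve_parameter("param", "TaskA", {}, config, "fallback")
from_default = resolve_parameter("param", "TaskA", {}, {}, "fallback")
print(from_cli, from_cfg, from_default)
```

A sentinel object marks a missing default, so None remains usable as a legitimate default value.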
- property version_store: Path#
The base path where all output files of tasks are stored for a specific version.
- Returns:
base_store for specific version
- Return type:
Path
- wlcg_directory_target(*path: str, **kwargs) str#
Return a WLCGDirectoryTarget for the given path(s). Pass multiple path parts as separate arguments. The path is prefixed with the store’s base path set in law.cfg.
- Returns:
WLCGDirectoryTarget for the given path
- Return type:
str
- wlcg_file_target(*path: str, **kwargs) str#
Return a WLCGFileTarget for the given path(s). Pass multiple path parts as separate arguments. The path is prefixed with the store’s base path set in law.cfg.
- Returns:
WLCGFileTarget for the given path
- Return type:
str
- wlcg_path(*path: str) Path#
Return path to a location in the WLCG store. Pass multiple path parts as separate arguments.
- Returns:
joined store_parts and arguments
- Return type:
str
- class pocket_coffea.law_tasks.tasks.base.BaseTaskWithTest(*args, **kwargs)#
Bases: BaseTask
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- store_parts() tuple[str]#
Tuple of parts that get added to the store path (local/wlcg). Can be overridden in subclasses to add more parts.
- Returns:
Task class name and version
- Return type:
tuple[str]
- test#
A Parameter whose value is a bool. This parameter has an implicit default value of False. For the command line interface this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. When omitting the parameter value, it is still considered True, but to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.
You can toggle between the two parsing modes on a per-parameter basis via
    class MyTask(luigi.Task):
        implicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.IMPLICIT_PARSING)
        explicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.EXPLICIT_PARSING)
or globally by
    luigi.BoolParameter.parsing = luigi.BoolParameter.EXPLICIT_PARSING
for all bool parameters instantiated after this line.
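The difference between the two parsing modes can be reproduced with the standard library's argparse. The sketch below only mirrors the behaviour described above and is not luigi's parser. The option names and the `parse_bool` helper are hypothetical.

```python
import argparse


def parse_bool(text: str) -> bool:
    """Convert "true"/"false" (any case) to a bool."""
    lowered = text.lower()
    if lowered not in ("true", "false"):
        raise argparse.ArgumentTypeError(f"not a bool: {text!r}")
    return lowered == "true"


parser = argparse.ArgumentParser()
# Implicit parsing: the flag alone switches the value to True.
parser.add_argument("--implicit-bool", action="store_true")
# Explicit parsing: a value may follow; the bare flag still means True.
parser.add_argument(
    "--explicit-bool", nargs="?", const=True, default=True, type=parse_bool
)

print(parser.parse_args([]))
print(parser.parse_args(["--implicit-bool", "--explicit-bool", "false"]))
print(parser.parse_args(["--explicit-bool"]))
```

With a default of True, only the explicit form can switch the value off, which is the situation the text above describes.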
pocket_coffea.law_tasks.tasks.datacard module#
- class pocket_coffea.law_tasks.tasks.datacard.DatacardProducer(*args, **kwargs)#
Bases: BaseTask
- category#
Parameter whose value is a str, and a base class for other parameter types.
- cfg#
Parameter whose value is a str, and a base class for other parameter types.
- clone_parent(**kwargs)#
- clone_parents(**kwargs)#
- datacard_name#
Parameter whose value is a str, and a base class for other parameter types.
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- output() dict[str, LocalFileTarget]#
The output that this Task produces.
The output of the Task determines if the Task needs to be run: the task is considered finished if and only if all outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
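The completeness rule stated above (a task is finished only when all of its outputs exist) can be illustrated with plain files. This is a minimal sketch under the assumption that every output is a local path. The function `is_complete` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def is_complete(outputs: list[Path]) -> bool:
    """A task counts as finished only if every output exists."""
    return all(path.exists() for path in outputs)


with tempfile.TemporaryDirectory() as tmp:
    datacard = Path(tmp) / "datacard.txt"
    shapes = Path(tmp) / "shapes.root"
    outputs = [datacard, shapes]

    before = is_complete(outputs)   # nothing written yet
    datacard.write_text("stub")
    partial = is_complete(outputs)  # one of two outputs exists
    shapes.write_bytes(b"")
    after = is_complete(outputs)    # both outputs exist

print(before, partial, after)
```

Because a partially written set of outputs counts as incomplete, a task that fails midway is run again on the next invocation.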
- requires() Runner#
The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
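The rule that a task runs only after everything it requires has completed amounts to a depth-first walk over the dependency graph. The sketch below assumes an acyclic graph given as a plain dictionary. The function `run_order` is hypothetical, and the edge from Runner to CreateDatasets is an assumption made for the example.

```python
def run_order(task: str, requires: dict[str, list[str]]) -> list[str]:
    """Return an order in which each task follows its requirements."""
    order: list[str] = []
    seen: set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        # Visit dependencies first so they land earlier in the list.
        for dep in requires.get(name, []):
            visit(dep)
        order.append(name)

    visit(task)
    return order


# Hypothetical dependency graph; only DatacardProducer -> Runner
# is stated in this reference.
graph = {
    "DatacardProducer": ["Runner"],
    "Runner": ["CreateDatasets"],
    "CreateDatasets": [],
}
order = run_order("DatacardProducer", graph)
print(order)
```

The sketch does not detect cycles and does not skip tasks whose outputs already exist, both of which a real scheduler has to handle.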
- run()#
The task run method, to be overridden in a subclass.
See Task.run
- shapes_name#
Parameter whose value is a str, and a base class for other parameter types.
- stat_config#
Parameter whose value is a str, and a base class for other parameter types.
- store_parts() tuple[str]#
Tuple of parts that get added to the store path (local/wlcg). Can be overridden in subclasses to add more parts.
- Returns:
Task class name and version
- Return type:
tuple[str]
- transfer#
A Parameter whose value is a bool. This parameter has an implicit default value of False.
- variable#
Parameter whose value is a str, and a base class for other parameter types.
- years#
- __init__(*args, cls=luigi.Parameter, inst=None, unique=False, sort=False, min_len=None, max_len=None, choices=None, brace_expand=False, escape_sep=True, force_tuple=True, **kwargs)
Parameter that parses a comma-separated value (CSV) and produces a tuple. cls (inst) can refer to another parameter class (instance) that will be used to parse and serialize the particular items.
When unique is True, both parsing and serialization methods make sure that values are unique. sort can be a boolean or a function for sorting parameter values.
When min_len (max_len) is set to an integer, an error is raised in case the number of elements to serialize or parse (evaluated after potentially ensuring uniqueness) falls below (exceeds) that value. Just like in luigi’s ChoiceParameter, choices can be a sequence of accepted values.
When brace_expand is True, brace expansion is applied, potentially extending the list of values. However, note that in this case commas that are not meant to act as a delimiter cannot be quoted in CSV style with double quotes; they should be backslash-escaped instead. Unless escape_sep is False, escaped separators (commas) are not split when parsing strings and, likewise, separators contained in values to serialize are escaped.
By default, single values are parsed such that they result in a tuple containing a single item. However, when force_tuple is False, single values that do not end with a comma are not wrapped by a tuple. Likewise, during serialization they are converted to a string as is, whereas a tuple containing only a single item will end with a trailing comma.
Example:
    p = CSVParameter(cls=luigi.IntParameter)
    p.parse("4,5,6,6")  # => (4, 5, 6, 6)
    p.serialize((7, 8, 9))  # => "7,8,9"

    # "," that should not be used as delimiter
    p = CSVParameter()
    p.parse("a,b,\"c,d\"")  # -> ("a", "b", "c,d")
    # same as
    p.parse("a,b,c\\,d")  # -> ("a", "b", "c,d")

    # uniqueness check
    p = CSVParameter(cls=luigi.IntParameter, unique=True)
    p.parse("4,5,6,6")  # => (4, 5, 6)

    # length check
    p = CSVParameter(cls=luigi.IntParameter, max_len=2)
    p.parse("4,5,6")  # => ValueError

    # choices
    p = CSVParameter(cls=luigi.IntParameter, choices=(1, 2))
    p.parse("2,3")  # => ValueError

    # brace expansion
    p = CSVParameter(cls=luigi.IntParameter, brace_expand=True)
    # (note that with brace_expand enabled, the quoting of "," only works with backslashes)
    p.parse("1{2,3,4}9")  # => (129, 139, 149)

    # do not force tuples to wrap single values
    p = CSVParameter(cls=luigi.IntParameter, force_tuple=False)
    p.parse("1")  # => 1
    # note: the result would be (1,) with force_tuple left at True (default)
    p.parse("1,")  # => (1,)
    p.serialize(1)  # => "1"
    p.serialize((1,))  # => "1,"
    p.serialize((1, 2))  # => "1,2"
Note
Due to the way instance caching is implemented in luigi, parameters should always have hashable, immutable values. Therefore, this parameter produces a tuple and, in particular, not a list. To avoid undesired side effects, the default value given to the constructor is also converted to a tuple.
- _inst#
type: cls
Instance of the luigi parameter class cls or inst directly, that is used internally for parameter parsing and serialization.
pocket_coffea.law_tasks.tasks.datasets module#
law tasks for a HEP analysis with pocket_coffea
- class pocket_coffea.law_tasks.tasks.datasets.CreateDatasets(*args, **kwargs)#
Bases: BaseTask
Create dataset JSON files.
- allowlist_sites#
Parameter that parses a comma-separated value (CSV) and produces a tuple.
- blocklist_sites#
Parameter that parses a comma-separated value (CSV) and produces a tuple.
- cfg#
Parameter whose value is a str, and a base class for other parameter types.
- check#
A Parameter whose value is a bool. This parameter has an implicit default value of False.
- clone_parent(**kwargs)#
- clone_parents(**kwargs)#
- dataset_definition#
Parameter that parses a comma-separated value (CSV) and produces a tuple.
- dataset_dir#
Parameter whose value is a str, and a base class for other parameter types.
- download#
A Parameter whose value is a bool. This parameter has an implicit default value of False. For the command line interface this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. When omitting the parameter value, it is still considered True but to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.
You can toggle between the two parsing modes on a per-parameter basis via
    class MyTask(luigi.Task):
        implicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.IMPLICIT_PARSING)
        explicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.EXPLICIT_PARSING)
or globally by
    luigi.BoolParameter.parsing = luigi.BoolParameter.EXPLICIT_PARSING
for all bool parameters instantiated after this line.
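To make the difference between the two parsing modes concrete, the following sketch models them on a plain argument list. It is a simplified stand-in written for illustration and does not use luigi's own parser; parse_bool_flag and its arguments are hypothetical names.

```python
# Hypothetical, simplified model of implicit vs. explicit bool parsing.
def parse_bool_flag(argv, flag, explicit=False, default=False):
    """Return the bool value of `flag` in `argv`.

    Implicit mode: the presence of the flag alone means True.
    Explicit mode: an optional following "true"/"false" token sets the value;
    when that token is omitted, the flag still counts as True.
    """
    if flag not in argv:
        return default
    index = argv.index(flag)
    if explicit and index + 1 < len(argv):
        token = argv[index + 1].lower()
        if token in ("true", "false"):
            return token == "true"
    # This sketch ignores any following token in implicit mode.
    return True


if __name__ == "__main__":
    print(parse_bool_flag(["--overwrite"], "--overwrite"))
    print(parse_bool_flag(["--overwrite", "false"], "--overwrite", explicit=True))
```

Explicit mode is mainly useful when the configured default is True, since implicit parsing offers no way to switch such a parameter off from the command line.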
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- include_redirector#
A Parameter whose value is a bool. This parameter has an implicit default value of False.
- keys#
Parameter whose value is a tuple or tuple of tuples.
In the task definition, use
    class MyTask(luigi.Task):
        book_locations = luigi.TupleParameter()

        def run(self):
            for location in self.book_locations:
                print("Go to page %d, line %d" % (location[0], location[1]))
At the command line, use
    $ luigi --module my_tasks MyTask --book_locations <JSON string>
Simple example with three book locations:
    $ luigi --module my_tasks MyTask --book_locations '((12,3),(4,15),(52,1))'
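As a rough illustration of how such a command line string can be turned into a hashable tuple of tuples, the sketch below tries JSON first and falls back to Python literal syntax. It is an assumption-laden stand-in for the parameter's parsing step, not the library code; parse_nested_tuple is a hypothetical helper.

```python
import ast
import json


# Hypothetical helper: turns a serialized value into a tuple of tuples.
def parse_nested_tuple(text):
    try:
        # JSON input such as "[[12, 3], [4, 15]]"
        value = json.loads(text)
    except ValueError:
        # Python literal input such as "((12,3),(4,15),(52,1))"
        value = ast.literal_eval(text)
    # Convert inner lists to tuples so the result is immutable and hashable.
    return tuple(
        tuple(item) if isinstance(item, (list, tuple)) else item
        for item in value
    )


if __name__ == "__main__":
    for page, line in parse_nested_tuple("((12,3),(4,15),(52,1))"):
        print("Go to page %d, line %d" % (page, line))
```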
- local_prefix#
Parameter whose value is a str.
- output()#
The JSON files for the datasets.
- overwrite#
A Parameter whose value is a bool. This parameter has an implicit default value of False.
- parallelize#
Parameter whose value is an
int.
- prioritylist_sites#
- __init__(*args, cls=luigi.Parameter, inst=None, unique=False, sort=False, min_len=None, max_len=None, choices=None, brace_expand=False, escape_sep=True, force_tuple=True, **kwargs)
Parameter that parses a comma-separated value (CSV) and produces a tuple. cls (inst) can refer to another parameter class (instance) that will be used to parse and serialize the particular items.
When unique is True, both parsing and serialization methods make sure that values are unique. sort can be a boolean or a function for sorting parameter values.
When min_len (max_len) is set to an integer, an error is raised in case the number of elements to serialize or parse (evaluated after potentially ensuring uniqueness) falls below (exceeds) that value. Just like in luigi's ChoiceParameter, choices can be a sequence of accepted values.
When brace_expand is True, brace expansion is applied, potentially extending the list of values. However, note that in this case commas that are not meant to act as a delimiter cannot be quoted in CSV style with double quotes, but should be backslash-escaped instead. Unless escape_sep is False, escaped separators (commas) are not split when parsing strings and, likewise, separators contained in values to serialize are escaped.
By default, single values are parsed such that they result in a tuple containing a single item. However, when force_tuple is False, single values that do not end with a comma are not wrapped by a tuple. Likewise, during serialization they are converted to a string as is, whereas a tuple containing only a single item will end with a trailing comma.
Example:
    import luigi
    from law import CSVParameter

    p = CSVParameter(cls=luigi.IntParameter)
    p.parse("4,5,6,6")  # => (4, 5, 6, 6)
    p.serialize((7, 8, 9))  # => "7,8,9"

    # "," that should not be used as delimiter
    p = CSVParameter()
    p.parse("a,b,\"c,d\"")  # => ("a", "b", "c,d")
    # same as
    p.parse("a,b,c\\,d")  # => ("a", "b", "c,d")

    # uniqueness check
    p = CSVParameter(cls=luigi.IntParameter, unique=True)
    p.parse("4,5,6,6")  # => (4, 5, 6)

    # length check
    p = CSVParameter(cls=luigi.IntParameter, max_len=2)
    p.parse("4,5,6")  # => ValueError

    # choices
    p = CSVParameter(cls=luigi.IntParameter, choices=(1, 2))
    p.parse("2,3")  # => ValueError

    # brace expansion
    p = CSVParameter(cls=luigi.IntParameter, brace_expand=True)
    # (note that with brace_expand enabled, the quoting of "," only works with backslashes)
    p.parse("1{2,3,4}9")  # => (129, 139, 149)

    # do not force tuples to wrap single values
    p = CSVParameter(cls=luigi.IntParameter, force_tuple=False)
    p.parse("1")  # => 1
    # note: the result would be (1,) with force_tuple left at True (default)
    p.parse("1,")  # => (1,)
    p.serialize(1)  # => "1"
    p.serialize((1,))  # => "1,"
    p.serialize((1, 2))  # => "1,2"
Note
Due to the way instance caching is implemented in luigi, parameters should always have hashable, immutable values. Therefore, this parameter produces a tuple and, in particular, not a list. To avoid undesired side effects, the default value given to the constructor is also converted to a tuple.
- _inst#
type: cls
Instance of the luigi parameter class cls or inst directly, that is used internally for parameter parsing and serialization.
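For readers who want to see the core of the parsing rules without installing law, here is a reduced standard-library sketch covering only backslash-escaped separators, the uniqueness option and the maximum-length check. It is not the CSVParameter implementation; parse_csv and its keyword arguments are hypothetical, and brace expansion, double-quote handling, sorting and choices are deliberately left out.

```python
import re


# Hypothetical, reduced model of CSV parsing into a tuple.
def parse_csv(text, cast=str, unique=False, max_len=None):
    # Split on commas that are not preceded by a backslash, then unescape.
    parts = [part.replace("\\,", ",") for part in re.split(r"(?<!\\),", text)]
    # Drop empty fragments, e.g. the one produced by a trailing comma.
    values = [cast(part) for part in parts if part != ""]
    if unique:
        # dict.fromkeys removes duplicates while keeping first-seen order.
        values = list(dict.fromkeys(values))
    if max_len is not None and len(values) > max_len:
        raise ValueError("expected at most %d values, got %d" % (max_len, len(values)))
    # Return a tuple so the parsed value is hashable and immutable.
    return tuple(values)


if __name__ == "__main__":
    print(parse_csv("4,5,6,6", cast=int, unique=True))
    print(parse_csv("a,b,c\\,d"))
```

As in the note above, the result is a tuple rather than a list because parameter values need to be hashable.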
- regex_sites#
Parameter whose value is a str.
- run()#
The task run method, to be overridden in a subclass.
See Task.run
- sort_replicas#
Parameter whose value is a str.
- split_by_year#
A Parameter whose value is a bool. This parameter has an implicit default value of False.
pocket_coffea.law_tasks.tasks.plotting module#
pocket_coffea.law_tasks.tasks.runner module#
- class pocket_coffea.law_tasks.tasks.runner.Runner(*args, **kwargs)#
Bases:
BaseTask
Run the analysis with pocket_coffea. Requires the CreateDatasets task.
- cfg#
Parameter whose value is a str.
- clone_parent(**kwargs)#
- clone_parents(**kwargs)#
- coffea_output#
Parameter whose value is a str.
- config = None#
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- executor#
Parameter whose value is a str.
- limit_chunks#
Parameter whose value is an
int.
- limit_files#
Parameter whose value is an
int.
- output() dict[str, LocalFileTarget]#
The output that this Task produces.
The output of the Task determines if the Task needs to be run: the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single Target or a list of Target instances.
- Implementation note:
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don't see the work done by other workers.
See Task.output
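The "finished iff the outputs all exist" rule can be sketched with plain file paths in place of Target objects. This is an illustrative model under that simplification, not the library's completeness check; is_complete is a hypothetical helper, and treating a task without outputs as not complete is a choice made for this sketch.

```python
from pathlib import Path


# Hypothetical model: outputs are given as paths instead of Target instances.
def is_complete(outputs):
    """Return True only if at least one output is declared and all exist."""
    # Accept either a dict of named outputs or a plain sequence of paths.
    targets = list(outputs.values()) if isinstance(outputs, dict) else list(outputs)
    # Sketch-level choice: a task without outputs is never considered complete.
    return bool(targets) and all(Path(target).exists() for target in targets)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as store:
        out = Path(store) / "output.coffea"
        print(is_complete({"coffea": out}))  # file not written yet
        out.touch()
        print(is_complete({"coffea": out}))  # file now exists
```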
- process_separately#
A Parameter whose value is a bool. This parameter has an implicit default value of False.
- requires() dict[str, Task]#
The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
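The statement that a task only runs once everything it requires is complete implies a dependency-first execution order. The sketch below derives such an order from a plain dict that maps task names to their requirements. It is a toy model that assumes an acyclic graph and says nothing about how the scheduler works internally; run_order is a hypothetical helper, while the Runner and CreateDatasets names are taken from this page.

```python
# Hypothetical helper: depth-first ordering so dependencies come first.
def run_order(task, requires, done=None):
    """Return the task names in an order where requirements precede dependents.

    `requires` maps a task name to the names it depends on (assumed acyclic).
    """
    if done is None:
        done = []
    for dependency in requires.get(task, ()):
        run_order(dependency, requires, done)
    if task not in done:
        done.append(task)
    return done


if __name__ == "__main__":
    graph = {"Runner": ["CreateDatasets"], "CreateDatasets": []}
    print(run_order("Runner", graph))
```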
- run()#
The task run method, to be overridden in a subclass.
See Task.run
- scaleout#
Parameter whose value is an
int.
- skip_bad_files#
A Parameter whose value is a bool. This parameter has an implicit default value of False.
- property skip_output_removal: bool#
- store_parts() tuple[str]#
Tuple of parts that get added to the store path (local/wlcg). Can be overridden in subclasses to add more parts.
- Returns:
Task class name and version
- Return type:
tuple[str]
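How the store parts feed into an output location can be sketched as follows, based on the description of local_path as ANALYSIS_STORE joined with the store parts and any further path arguments. The function below is a stand-alone approximation rather than the BaseTask method; the env argument, the version string "v1" and the file name are hypothetical and exist only to keep the example self-contained.

```python
import os
from pathlib import Path


# Hypothetical stand-alone approximation of the described path composition.
def local_path(store_parts, *path, env=None):
    """Join $ANALYSIS_STORE, the store parts and extra path parts."""
    env = os.environ if env is None else env
    return Path(env["ANALYSIS_STORE"]).joinpath(*store_parts, *path)


if __name__ == "__main__":
    # Store parts are described as the task class name and the version.
    parts = ("Runner", "v1")
    print(local_path(parts, "output.coffea", env={"ANALYSIS_STORE": "/tmp/store"}))
```

A subclass that overrides store_parts to append more elements would, in this model, simply produce a deeper directory below the same base store.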
- test#
A Parameter whose value is a bool. This parameter has an implicit default value of False.