pocket_coffea.law_tasks.tasks package

Submodules#

pocket_coffea.law_tasks.tasks.base module#

class pocket_coffea.law_tasks.tasks.base.BaseTask(*args, **kwargs)#

Bases: Task

property base_store: Path#

The base path where all output files of tasks are stored.

Returns:

Value of the environment variable ANALYSIS_STORE

Return type:

Path

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
local_directory_target(*path: str) LocalDirectoryTarget#

Return a LocalDirectoryTarget for the given path(s). Pass multiple path parts as separate arguments.

Returns:

LocalDirectoryTarget for the given path

Return type:

law.LocalDirectoryTarget

local_file_target(*path: str) LocalFileTarget#

Return a LocalFileTarget for the given path(s). Pass multiple path parts as separate arguments.

Returns:

LocalFileTarget for the given path

Return type:

law.LocalFileTarget

local_path(*path: str) Path#

Return the path to a location in the local store. The path is always prefixed with the value of the environment variable $ANALYSIS_STORE. Pass multiple path parts as separate arguments.

Returns:

ANALYSIS_STORE joined with store_parts and the given arguments

Return type:

Path
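
The path assembly described for local_path can be pictured with plain pathlib. This is only a sketch under stated assumptions, not the package's implementation: StoreSketch is a hypothetical stand-in that does not derive from law.Task, and the fallback directory, the example file names, and the exact ordering of the joined parts are illustrative.

```python
import os
from pathlib import Path


class StoreSketch:
    """Hypothetical stand-in for BaseTask; it only mimics the documented path logic."""

    def __init__(self, version: str):
        self.version = version

    @property
    def base_store(self) -> Path:
        # Documented behaviour: the base store is taken from $ANALYSIS_STORE.
        # The fallback value is an assumption made for this sketch only.
        return Path(os.environ.get("ANALYSIS_STORE", "/tmp/analysis_store"))

    def store_parts(self) -> tuple[str, ...]:
        # Documented return value: task class name and version.
        return (type(self).__name__, self.version)

    def local_path(self, *path: str) -> Path:
        # Join the base store, the store parts, and the caller's path parts.
        return self.base_store.joinpath(*self.store_parts(), *path)


os.environ["ANALYSIS_STORE"] = "/data/store"  # illustrative value
task = StoreSketch(version="v1")
result = task.local_path("plots", "mjj.pdf")
print(result)
```

Passing the parts as separate arguments, rather than as one pre-joined string, leaves the choice of separator to pathlib.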

store_parts() tuple[str]#

Tuple of parts that get added to the store path (local/wlcg). Can be overridden in subclasses to add more parts.

Returns:

Task class name and version

Return type:

tuple[str]
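
Overriding store_parts in a subclass might look like the sketch below. BaseSketch and PlotSketch are hypothetical stand-ins (not classes of this package), and the category attribute is an assumption chosen for illustration.

```python
class BaseSketch:
    """Hypothetical stand-in for BaseTask (not the real class)."""

    version = "v1"

    def store_parts(self) -> tuple[str, ...]:
        # Mirrors the documented return value: class name and version.
        return (type(self).__name__, self.version)


class PlotSketch(BaseSketch):
    """Hypothetical subclass that appends one more directory level."""

    category = "signal_region"

    def store_parts(self) -> tuple[str, ...]:
        # Extend, rather than replace, the parts provided by the parent class.
        return super().store_parts() + (self.category,)


parts = PlotSketch().store_parts()
print(parts)
```

Calling super().store_parts() keeps the class name and version in the path, so outputs of different subclasses and versions do not collide.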

version#

Parameter whose value is a str, and a base class for other parameter types.

Parameters are objects set on the Task class level to make it possible to parameterize tasks. For instance:

import luigi

class MyTask(luigi.Task):
    foo = luigi.Parameter()

class RequiringTask(luigi.Task):
    def requires(self):
        return MyTask(foo="hello")

    def run(self):
        print(self.requires().foo)  # prints "hello"

This makes it possible to instantiate multiple tasks, e.g. MyTask(foo='bar') and MyTask(foo='baz'). The task will then have the foo attribute set appropriately.

When a task is instantiated, it will first use any argument as the value of the parameter, e.g. if you instantiate a = TaskA(x=44) then a.x == 44. When the value is not provided, it is resolved in this order of falling priority:

  • Any value provided on the command line:

    • To the root task (e.g. --param xyz)

    • Then to the class, using the qualified task name syntax (e.g. --TaskA-param xyz).

  • Any value set in the configuration file, using the [TASK_NAME]>PARAM_NAME: <serialized value> syntax. See ParamConfigIngestion.

  • Any default value set using the default flag.

Parameter objects may be reused, but you must then set the positional=False flag.
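
The resolution order listed above can be modelled as a simple first-match lookup. The sketch below is a simplified model, not luigi's code: resolve_parameter and its argument names are hypothetical, and the parsing of command-line strings into typed values is left out.

```python
from typing import Optional

_UNSET = object()  # sentinel, so that None can still be passed as a real value


def resolve_parameter(
    explicit=_UNSET,
    cli_root: Optional[str] = None,
    cli_qualified: Optional[str] = None,
    config_value: Optional[str] = None,
    default=_UNSET,
):
    """Return the first available value, in order of falling priority."""
    if explicit is not _UNSET:
        return explicit  # e.g. TaskA(x=44)
    # Command line (root task, then qualified name), then the config entry.
    for candidate in (cli_root, cli_qualified, config_value):
        if candidate is not None:
            return candidate
    if default is not _UNSET:
        return default
    raise ValueError("no value available for parameter")


print(resolve_parameter(explicit=44, cli_root="1"))
print(resolve_parameter(cli_qualified="b", config_value="c", default="d"))
print(resolve_parameter(default="d"))
```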

property version_store: Path#

The base path where all output files of tasks are stored for a specific version.

Returns:

base_store for specific version

Return type:

Path

wlcg_directory_target(*path: str, **kwargs) str#

Return a WLCGDirectoryTarget for the given path(s). Pass multiple path parts as separate arguments. Will be prepended with the store’s base path set in law.cfg.

Returns:

WLCGDirectoryTarget for the given path

Return type:

str

wlcg_file_target(*path: str, **kwargs) str#

Return a WLCGFileTarget for the given path(s). Pass multiple path parts as separate arguments. Will be prepended with the store’s base path set in law.cfg.

Returns:

WLCGFileTarget for the given path

Return type:

str

wlcg_path(*path: str) Path#

Return the path to a location in the WLCG store. Pass multiple path parts as separate arguments.

Returns:

store_parts joined with the given arguments

Return type:

Path

class pocket_coffea.law_tasks.tasks.base.BaseTaskWithTest(*args, **kwargs)#

Bases: BaseTask

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
store_parts() tuple[str]#

Tuple of parts that get added to the store path (local/wlcg). Can be overridden in subclasses to add more parts.

Returns:

Task class name and version

Return type:

tuple[str]

test#

A Parameter whose value is a bool. This parameter has an implicit default value of False. For the command line interface this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. When omitting the parameter value, it is still considered True but to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.

You can toggle between the two parsing modes on a per-parameter basis via

import luigi

class MyTask(luigi.Task):
    implicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.IMPLICIT_PARSING)
    explicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.EXPLICIT_PARSING)

or globally by

luigi.BoolParameter.parsing = luigi.BoolParameter.EXPLICIT_PARSING

for all bool parameters instantiated after this line.
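
The difference between the two parsing modes can be reproduced with the standard library's argparse. This is an analogy under assumptions, not luigi's parser: the flag names and the parse_bool helper are hypothetical.

```python
import argparse


def parse_bool(text: str) -> bool:
    """Convert "true"/"false" (any casing) to a bool."""
    lowered = text.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {text!r}")


parser = argparse.ArgumentParser()
# Implicit style: the flag alone switches the value on.
parser.add_argument("--implicit-bool", action="store_true")
# Explicit style: an optional true/false value; a bare flag still means True.
parser.add_argument(
    "--explicit-bool", nargs="?", const=True, default=False, type=parse_bool
)

no_flags = parser.parse_args([])
bare = parser.parse_args(["--implicit-bool", "--explicit-bool"])
explicit_false = parser.parse_args(["--explicit-bool", "false"])
print(no_flags, bare, explicit_false)
```

Because the explicit style accepts an optional value, a token that follows the flag may be consumed as that value. This is the kind of ambiguity the advice about placing bool parameters behind the task family is meant to avoid.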

pocket_coffea.law_tasks.tasks.datacard module#

class pocket_coffea.law_tasks.tasks.datacard.DatacardProducer(*args, **kwargs)#

Bases: BaseTask

category#

Parameter whose value is a str, and a base class for other parameter types.

cfg#

Parameter whose value is a str, and a base class for other parameter types.

clone_parent(**kwargs)#
clone_parents(**kwargs)#
datacard_name#

Parameter whose value is a str, and a base class for other parameter types.

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
output() dict[str, LocalFileTarget]#

The output that this Task produces.

The output of the Task determines if the Task needs to be run: the task is considered finished if and only if all outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note

If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output
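
The completeness rule above (finished only when every output exists) can be illustrated with plain files. This sketch uses pathlib paths in place of law targets; is_complete and the dictionary keys "datacard" and "shapes" are hypothetical names chosen for illustration.

```python
import tempfile
from pathlib import Path


def is_complete(outputs: dict[str, Path]) -> bool:
    """A task counts as finished only if every output exists."""
    return all(path.exists() for path in outputs.values())


with tempfile.TemporaryDirectory() as tmp:
    outputs = {
        "datacard": Path(tmp, "datacard.txt"),
        "shapes": Path(tmp, "shapes.root"),
    }
    before = is_complete(outputs)    # nothing written yet
    outputs["datacard"].write_text("card")
    partial = is_complete(outputs)   # only one of the two outputs exists
    outputs["shapes"].write_bytes(b"")
    after = is_complete(outputs)     # both outputs exist

print(before, partial, after)
```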

requires() Runner#

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires
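
Since requires may return a single task, a list, or a dict, a scheduler first has to normalise the result before checking it. The sketch below shows one way to do that; flatten_requirements and ready_to_run are hypothetical helpers, and strings stand in for task instances.

```python
def flatten_requirements(required) -> list:
    """Normalise a single task, a list of tasks, or a dict of tasks to a list."""
    if required is None:
        return []
    if isinstance(required, dict):
        return list(required.values())
    if isinstance(required, (list, tuple)):
        return list(required)
    return [required]


def ready_to_run(required, is_done) -> bool:
    """A task may run only if all of its requirements are completed."""
    return all(is_done(task) for task in flatten_requirements(required))


done = {"Runner"}  # names of stand-in tasks that have already finished
print(ready_to_run("Runner", done.__contains__))
print(ready_to_run({"a": "Runner", "b": "Plotter"}, done.__contains__))
print(ready_to_run(None, done.__contains__))
```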

run()#

The task run method, to be overridden in a subclass.

See Task.run

shapes_name#

Parameter whose value is a str, and a base class for other parameter types.

stat_config#

Parameter whose value is a str, and a base class for other parameter types.

store_parts() tuple[str]#

Tuple of parts that get added to the store path (local/wlcg). Can be overridden in subclasses to add more parts.

Returns:

Task class name and version

Return type:

tuple[str]

transfer#

A Parameter whose value is a bool. This parameter has an implicit default value of False. For the command line interface this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. When omitting the parameter value, it is still considered True but to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.

variable#

Parameter whose value is a str, and a base class for other parameter types.

years#
__init__(*args, cls=luigi.Parameter, inst=None, unique=False, sort=False, min_len=None, max_len=None, choices=None, brace_expand=False, escape_sep=True, force_tuple=True, **kwargs)

Parameter that parses a comma-separated value (CSV) and produces a tuple. cls (inst) can refer to another parameter class (instance) that will be used to parse and serialize the individual items.

When unique is True, both parsing and serialization methods make sure that values are unique. sort can be a boolean or a function for sorting parameter values.

When min_len (max_len) is set to an integer, an error is raised if the number of elements to serialize or parse (evaluated after potentially ensuring uniqueness) falls below (exceeds) that value. Just like in luigi’s ChoiceParameter, choices can be a sequence of accepted values.

When brace_expand is True, brace expansion is applied, potentially extending the list of values. Note that in this case commas that are not meant to act as a delimiter cannot be quoted CSV-style with double quotes; they must be backslash-escaped instead. Unless escape_sep is False, escaped separators (commas) are not split when parsing strings and, likewise, separators contained in values to serialize are escaped.

By default, single values are parsed such that they result in a tuple containing a single item. However, when force_tuple is False, single values that do not end with a comma are not wrapped in a tuple. Likewise, during serialization they are converted to a string as is, whereas a tuple containing only a single item will end with a trailing comma.

Example:

import luigi
from law import CSVParameter

p = CSVParameter(cls=luigi.IntParameter)
p.parse("4,5,6,6")
# => (4, 5, 6, 6)
p.serialize((7, 8, 9))
# => "7,8,9"

# "," that should not be used as delimiter
p = CSVParameter()
p.parse("a,b,\"c,d\"")
# => ("a", "b", "c,d")
# same as (raw string, so the backslash reaches the parser unchanged)
p.parse(r"a,b,c\,d")
# => ("a", "b", "c,d")

# uniqueness check
p = CSVParameter(cls=luigi.IntParameter, unique=True)
p.parse("4,5,6,6")
# => (4, 5, 6)

# length check
p = CSVParameter(cls=luigi.IntParameter, max_len=2)
p.parse("4,5,6")
# => ValueError

# choices
p = CSVParameter(cls=luigi.IntParameter, choices=(1, 2))
p.parse("2,3")
# => ValueError

# brace expansion
p = CSVParameter(cls=luigi.IntParameter, brace_expand=True)
# (note that with brace_expand enabled, the quoting of "," only works with backslashes)
p.parse("1{2,3,4}9")
# => (129, 139, 149)

# do not force tuples to wrap single values
p = CSVParameter(cls=luigi.IntParameter, force_tuple=False)
p.parse("1")
# => 1
# note: the result would be (1,) with force_tuple left at True (default)
p.parse("1,")
# => (1,)
p.serialize(1)
# => "1"
p.serialize((1,))
# => "1,"
p.serialize((1, 2))
# => "1,2"

Note

Due to the way instance caching is implemented in luigi, parameters should always have hashable, immutable values. Therefore, this parameter produces a tuple and, in particular, not a list. To avoid undesired side effects, the default value given to the constructor is also converted to a tuple.
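
The reason for producing a tuple can be checked directly: a tuple can serve as a dictionary key, a list cannot. In the sketch below a plain dict stands in for an instance cache keyed by parameter values; this is an illustration, not luigi's caching code, and the year values are arbitrary.

```python
values_as_tuple = (2016, 2017, 2018)
values_as_list = [2016, 2017, 2018]

cache = {}
cache[values_as_tuple] = "task instance"  # works: tuples are hashable

try:
    cache[values_as_list] = "task instance"
    list_is_hashable = True
except TypeError:
    # Lists are mutable and therefore unhashable.
    list_is_hashable = False

print(list_is_hashable)
```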

_inst#

type: cls

Instance of the luigi parameter class cls, or inst directly, that is used internally for parameter parsing and serialization.

pocket_coffea.law_tasks.tasks.datasets module#

law tasks for a HEP analysis with pocket_coffea

class pocket_coffea.law_tasks.tasks.datasets.CreateDatasets(*args, **kwargs)#

Bases: BaseTask

Create dataset JSON files.

allowlist_sites#
__init__(*args, cls=luigi.Parameter, inst=None, unique=False, sort=False, min_len=None, max_len=None, choices=None, brace_expand=False, escape_sep=True, force_tuple=True, **kwargs)

Parameter that parses a comma-separated value (CSV) and produces a tuple. cls (inst) can refer to another parameter class (instance) that will be used to parse and serialize the individual items.

blocklist_sites#
__init__(*args, cls=luigi.Parameter, inst=None, unique=False, sort=False, min_len=None, max_len=None, choices=None, brace_expand=False, escape_sep=True, force_tuple=True, **kwargs)

Parameter that parses a comma-separated value (CSV) and produces a tuple. cls (inst) can refer to another parameter class (instance) that will be used to parse and serialize the individual items.

cfg#

Parameter whose value is a str, and a base class for other parameter types.

check#

A Parameter whose value is a bool. This parameter has an implicit default value of False. For the command line interface this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. When omitting the parameter value, it is still considered True but to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.

clone_parent(**kwargs)#
clone_parents(**kwargs)#
dataset_definition#
__init__(*args, cls=luigi.Parameter, inst=None, unique=False, sort=False, min_len=None, max_len=None, choices=None, brace_expand=False, escape_sep=True, force_tuple=True, **kwargs)

Parameter that parses a comma-separated value (CSV) and produces a tuple. cls (inst) can refer to another parameter class (instance) that will be used to parse and serialize the individual items.

dataset_dir#

Parameter whose value is a str, and a base class for other parameter types.

download#

A Parameter whose value is a bool. This parameter has an implicit default value of False. For the command line interface this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. When omitting the parameter value, it is still considered True but to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.

You can toggle between the two parsing modes on a per-parameter basis via

class MyTask(luigi.Task):
    implicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.IMPLICIT_PARSING)
    explicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.EXPLICIT_PARSING)

or globally by

luigi.BoolParameter.parsing = luigi.BoolParameter.EXPLICIT_PARSING

for all bool parameters instantiated after this line.
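The difference between the two parsing modes can be illustrated with plain `argparse`. This is a sketch under the assumption that implicit parsing behaves like a value-less flag and explicit parsing like an option with an optional value; the option names and the helper `str_to_bool` are hypothetical, and no luigi code is involved.

```python
import argparse


def str_to_bool(text):
    """Convert "true"/"false" (any case) to a bool, rejecting anything else."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise argparse.ArgumentTypeError(f"expected true or false, got {text!r}")


def build_parser():
    parser = argparse.ArgumentParser()
    # Implicit parsing: the flag takes no value; its presence means True.
    parser.add_argument("--implicit-bool", action="store_true")
    # Explicit parsing: the value is optional; a bare flag still means True.
    # The default is set to True here, the case where explicit parsing helps.
    parser.add_argument("--explicit-bool", nargs="?", const=True,
                        default=True, type=str_to_bool)
    return parser


parser = build_parser()
nothing_given = parser.parse_args([])
both_given = parser.parse_args(["--implicit-bool", "--explicit-bool", "false"])
bare_explicit = parser.parse_args(["--explicit-bool"])
```

In this sketch, an explicit flag with an optional value would try to consume a word that follows it, which is one way to picture the ambiguity the text warns about when bool parameters are placed before the task family.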

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
include_redirector#

A Parameter whose value is a bool. This parameter has an implicit default value of False. For the command line interface this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. When omitting the parameter value, it is still considered True but to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.

You can toggle between the two parsing modes on a per-parameter basis via

class MyTask(luigi.Task):
    implicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.IMPLICIT_PARSING)
    explicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.EXPLICIT_PARSING)

or globally by

luigi.BoolParameter.parsing = luigi.BoolParameter.EXPLICIT_PARSING

for all bool parameters instantiated after this line.

keys#

Parameter whose value is a tuple or tuple of tuples.

In the task definition, use

class MyTask(luigi.Task):
    book_locations = luigi.TupleParameter()

    def run(self):
        for location in self.book_locations:
            print("Go to page %d, line %d" % (location[0], location[1]))

At the command line, use

$ luigi --module my_tasks MyTask --book_locations <JSON string>

Simple example with three locations:

$ luigi --module my_tasks MyTask --book_locations '((12,3),(4,15),(52,1))'
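The command line value above is written as a Python-style tuple rather than strict JSON. A minimal sketch of how such a string could be turned into nested tuples, assuming a JSON-first parse with a Python-literal fallback (the helper names are hypothetical and this is not taken from the library source):

```python
import ast
import json


def to_nested_tuple(value):
    """Recursively convert lists and tuples into tuples, so the result is hashable."""
    if isinstance(value, (list, tuple)):
        return tuple(to_nested_tuple(item) for item in value)
    return value


def parse_tuple_string(text):
    """Parse a serialized tuple given either as JSON or as a Python literal."""
    try:
        parsed = json.loads(text)  # handles '[[12, 3], [4, 15]]'
    except ValueError:
        # json.JSONDecodeError is a ValueError; fall back to Python syntax,
        # which handles '((12,3),(4,15),(52,1))'
        parsed = ast.literal_eval(text)
    return to_nested_tuple(parsed)


locations = parse_tuple_string("((12,3),(4,15),(52,1))")
for page, line in locations:
    print("Go to page %d, line %d" % (page, line))
```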
local_prefix#

Parameter whose value is a str, and a base class for other parameter types.

Parameters are objects set on the Task class level to make it possible to parameterize tasks. For instance:

class MyTask(luigi.Task):
    foo = luigi.Parameter()

class RequiringTask(luigi.Task):
    def requires(self):
        return MyTask(foo="hello")

    def run(self):
        print(self.requires().foo)  # prints "hello"

This makes it possible to instantiate multiple tasks, eg MyTask(foo='bar') and MyTask(foo='baz'). The task will then have the foo attribute set appropriately.

When a task is instantiated, it will first use any argument as the value of the parameter, eg. if you instantiate a = TaskA(x=44) then a.x == 44. When the value is not provided, the value will be resolved in this order of falling priority:

  • Any value provided on the command line:

    • To the root task (eg. --param xyz)

    • Then to the class, using the qualified task name syntax (eg. --TaskA-param xyz).

  • With [TASK_NAME]>PARAM_NAME: <serialized value> syntax. See ParamConfigIngestion

  • Any default value set using the default flag.

Parameter objects may be reused, but you must then set the positional=False flag.

output()#

JSON files for datasets

overwrite#

A Parameter whose value is a bool. This parameter has an implicit default value of False. For the command line interface this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. When omitting the parameter value, it is still considered True but to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.

You can toggle between the two parsing modes on a per-parameter basis via

class MyTask(luigi.Task):
    implicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.IMPLICIT_PARSING)
    explicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.EXPLICIT_PARSING)

or globally by

luigi.BoolParameter.parsing = luigi.BoolParameter.EXPLICIT_PARSING

for all bool parameters instantiated after this line.

parallelize#

Parameter whose value is an int.

prioritylist_sites#
__init__(*args, cls=luigi.Parameter, inst=None, unique=False, sort=False, min_len=None, max_len=None, choices=None, brace_expand=False, escape_sep=True, force_tuple=True, **kwargs)

Parameter that parses a comma-separated value (CSV) and produces a tuple. cls (inst) can refer to another parameter class (instance) that will be used to parse and serialize the individual items.

When unique is True, both parsing and serialization methods make sure that values are unique. sort can be a boolean or a function for sorting parameter values.

When min_len (max_len) is set to an integer, an error is raised if the number of elements to serialize or parse (evaluated after potentially ensuring uniqueness) falls below (exceeds) that value. Just like in luigi's ChoiceParameter, choices can be a sequence of accepted values.

When brace_expand is True, brace expansion is applied, potentially extending the list of values. Note, however, that in this case commas that are not meant to act as a delimiter cannot be quoted CSV-style with double quotes; they must be backslash-escaped instead. Unless escape_sep is False, escaped separators (commas) are not split when parsing strings and, likewise, separators contained in values to serialize are escaped.

By default, single values are parsed such that they result in a tuple containing a single item. However, when force_tuple is False, single values that do not end with a comma are not wrapped in a tuple. Likewise, during serialization they are converted to a string as is, whereas a tuple containing only a single item will end with a trailing comma.

Example:

p = CSVParameter(cls=luigi.IntParameter)
p.parse("4,5,6,6")
# => (4, 5, 6, 6)
p.serialize((7, 8, 9))
# => "7,8,9"

# "," that should not be used as delimiter
p = CSVParameter()
p.parse("a,b,\"c,d\"")
# => ("a", "b", "c,d")
# same as (the backslash is doubled so Python passes a literal "\," to parse)
p.parse("a,b,c\\,d")
# => ("a", "b", "c,d")

# uniqueness check
p = CSVParameter(cls=luigi.IntParameter, unique=True)
p.parse("4,5,6,6")
# => (4, 5, 6)

# length check
p = CSVParameter(cls=luigi.IntParameter, max_len=2)
p.parse("4,5,6")
# => ValueError

# choices
p = CSVParameter(cls=luigi.IntParameter, choices=(1, 2))
p.parse("2,3")
# => ValueError

# brace expansion
p = CSVParameter(cls=luigi.IntParameter, brace_expand=True)
# (note that with brace_expand enabled, the quoting of "," only works with backslashes)
p.parse("1{2,3,4}9")
# => (129, 139, 149)

# do not force tuples to wrap single values
p = CSVParameter(cls=luigi.IntParameter, force_tuple=False)
p.parse("1")
# => 1
# note: the result would be (1,) with force_tuple left at True (default)
p.parse("1,")
# => (1,)
p.serialize(1)
# => "1"
p.serialize((1,))
# => "1,"
p.serialize((1, 2))
# => "1,2"

Note

Due to the way instance caching is implemented in luigi, parameters should always have hashable, immutable values. Therefore, this parameter produces a tuple and, in particular, not a list. To avoid undesired side effects, the default value given to the constructor is also converted to a tuple.
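The splitting, uniqueness, and length rules described above can be sketched without law or luigi. The functions `split_unescaped` and `parse_csv` below are hypothetical illustrations of the idea, not the CSVParameter implementation; double-quote handling, brace expansion, trailing commas, and force_tuple are deliberately left out.

```python
def split_unescaped(text, sep=","):
    """Split text on sep, keeping backslash-escaped separators inside items."""
    items, current, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and text[i + 1:i + 2] == sep:
            current.append(sep)  # escaped separator: keep it, drop the backslash
            i += 2
        elif text[i] == sep:
            items.append("".join(current))  # unescaped separator ends the item
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    items.append("".join(current))
    return items


def parse_csv(text, convert=str, unique=False, max_len=None):
    """Parse a CSV string into a tuple, optionally deduplicated and length-checked."""
    values = [convert(item) for item in split_unescaped(text)]
    if unique:
        values = list(dict.fromkeys(values))  # drop duplicates, keep first-seen order
    # The length check runs after uniqueness, as the description above states.
    if max_len is not None and len(values) > max_len:
        raise ValueError(f"got {len(values)} values, at most {max_len} allowed")
    return tuple(values)  # a tuple, so the value is hashable and immutable


numbers = parse_csv("4,5,6,6", convert=int, unique=True)
names = parse_csv("a,b,c\\,d")
```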

_inst#

type: cls

Instance of the luigi parameter class cls or inst directly, that is used internally for parameter parsing and serialization.

regex_sites#

Parameter whose value is a str, and a base class for other parameter types.

Parameters are objects set on the Task class level to make it possible to parameterize tasks. For instance:

class MyTask(luigi.Task):
    foo = luigi.Parameter()

class RequiringTask(luigi.Task):
    def requires(self):
        return MyTask(foo="hello")

    def run(self):
        print(self.requires().foo)  # prints "hello"

This makes it possible to instantiate multiple tasks, eg MyTask(foo='bar') and MyTask(foo='baz'). The task will then have the foo attribute set appropriately.

When a task is instantiated, it will first use any argument as the value of the parameter, eg. if you instantiate a = TaskA(x=44) then a.x == 44. When the value is not provided, the value will be resolved in this order of falling priority:

  • Any value provided on the command line:

    • To the root task (eg. --param xyz)

    • Then to the class, using the qualified task name syntax (eg. --TaskA-param xyz).

  • With [TASK_NAME]>PARAM_NAME: <serialized value> syntax. See ParamConfigIngestion

  • Any default value set using the default flag.

Parameter objects may be reused, but you must then set the positional=False flag.

run()#

The task run method, to be overridden in a subclass.

See Task.run

sort_replicas#

Parameter whose value is a str, and a base class for other parameter types.

Parameters are objects set on the Task class level to make it possible to parameterize tasks. For instance:

class MyTask(luigi.Task):
    foo = luigi.Parameter()

class RequiringTask(luigi.Task):
    def requires(self):
        return MyTask(foo="hello")

    def run(self):
        print(self.requires().foo)  # prints "hello"

This makes it possible to instantiate multiple tasks, eg MyTask(foo='bar') and MyTask(foo='baz'). The task will then have the foo attribute set appropriately.

When a task is instantiated, it will first use any argument as the value of the parameter, eg. if you instantiate a = TaskA(x=44) then a.x == 44. When the value is not provided, the value will be resolved in this order of falling priority:

  • Any value provided on the command line:

    • To the root task (eg. --param xyz)

    • Then to the class, using the qualified task name syntax (eg. --TaskA-param xyz).

  • With [TASK_NAME]>PARAM_NAME: <serialized value> syntax. See ParamConfigIngestion

  • Any default value set using the default flag.

Parameter objects may be reused, but you must then set the positional=False flag.

split_by_year#

A Parameter whose value is a bool. This parameter has an implicit default value of False. For the command line interface this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. When omitting the parameter value, it is still considered True but to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.

You can toggle between the two parsing modes on a per-parameter basis via

class MyTask(luigi.Task):
    implicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.IMPLICIT_PARSING)
    explicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.EXPLICIT_PARSING)

or globally by

luigi.BoolParameter.parsing = luigi.BoolParameter.EXPLICIT_PARSING

for all bool parameters instantiated after this line.

pocket_coffea.law_tasks.tasks.plotting module#

pocket_coffea.law_tasks.tasks.runner module#

class pocket_coffea.law_tasks.tasks.runner.Runner(*args, **kwargs)#

Bases: BaseTask

Run the analysis with pocket_coffea. Requires the CreateDatasets task.

cfg#

Parameter whose value is a str, and a base class for other parameter types.

Parameters are objects set on the Task class level to make it possible to parameterize tasks. For instance:

class MyTask(luigi.Task):
    foo = luigi.Parameter()

class RequiringTask(luigi.Task):
    def requires(self):
        return MyTask(foo="hello")

    def run(self):
        print(self.requires().foo)  # prints "hello"

This makes it possible to instantiate multiple tasks, eg MyTask(foo='bar') and MyTask(foo='baz'). The task will then have the foo attribute set appropriately.

When a task is instantiated, it will first use any argument as the value of the parameter, eg. if you instantiate a = TaskA(x=44) then a.x == 44. When the value is not provided, the value will be resolved in this order of falling priority:

  • Any value provided on the command line:

    • To the root task (eg. --param xyz)

    • Then to the class, using the qualified task name syntax (eg. --TaskA-param xyz).

  • With [TASK_NAME]>PARAM_NAME: <serialized value> syntax. See ParamConfigIngestion

  • Any default value set using the default flag.

Parameter objects may be reused, but you must then set the positional=False flag.

clone_parent(**kwargs)#
clone_parents(**kwargs)#
coffea_output#

Parameter whose value is a str, and a base class for other parameter types.

Parameters are objects set on the Task class level to make it possible to parameterize tasks. For instance:

class MyTask(luigi.Task):
    foo = luigi.Parameter()

class RequiringTask(luigi.Task):
    def requires(self):
        return MyTask(foo="hello")

    def run(self):
        print(self.requires().foo)  # prints "hello"

This makes it possible to instantiate multiple tasks, eg MyTask(foo='bar') and MyTask(foo='baz'). The task will then have the foo attribute set appropriately.

When a task is instantiated, it will first use any argument as the value of the parameter, eg. if you instantiate a = TaskA(x=44) then a.x == 44. When the value is not provided, the value will be resolved in this order of falling priority:

  • Any value provided on the command line:

    • To the root task (eg. --param xyz)

    • Then to the class, using the qualified task name syntax (eg. --TaskA-param xyz).

  • With [TASK_NAME]>PARAM_NAME: <serialized value> syntax. See ParamConfigIngestion

  • Any default value set using the default flag.

Parameter objects may be reused, but you must then set the positional=False flag.

config = None#
exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
executor#

Parameter whose value is a str, and a base class for other parameter types.

Parameters are objects set on the Task class level to make it possible to parameterize tasks. For instance:

class MyTask(luigi.Task):
    foo = luigi.Parameter()

class RequiringTask(luigi.Task):
    def requires(self):
        return MyTask(foo="hello")

    def run(self):
        print(self.requires().foo)  # prints "hello"

This makes it possible to instantiate multiple tasks, eg MyTask(foo='bar') and MyTask(foo='baz'). The task will then have the foo attribute set appropriately.

When a task is instantiated, it will first use any argument as the value of the parameter, eg. if you instantiate a = TaskA(x=44) then a.x == 44. When the value is not provided, the value will be resolved in this order of falling priority:

  • Any value provided on the command line:

    • To the root task (eg. --param xyz)

    • Then to the class, using the qualified task name syntax (eg. --TaskA-param xyz).

  • With [TASK_NAME]>PARAM_NAME: <serialized value> syntax. See ParamConfigIngestion

  • Any default value set using the default flag.

Parameter objects may be reused, but you must then set the positional=False flag.

limit_chunks#

Parameter whose value is an int.

limit_files#

Parameter whose value is an int.

output() dict[str, LocalFileTarget]#

The output that this Task produces.

The output of the Task determines whether the Task needs to be run: the task is considered finished if and only if all of its outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note

If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output
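The completeness rule can be sketched with plain file paths in place of targets. The function `is_complete` and the output keys are hypothetical; treating an empty output collection as "not finished" is an assumption of this sketch.

```python
import tempfile
from pathlib import Path


def is_complete(outputs):
    """A task counts as finished only when every declared output exists."""
    if not outputs:
        return False  # assumption: a task without outputs is never "finished"
    return all(Path(path).exists() for path in outputs.values())


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical output mapping, shaped like a dict of named targets.
    outputs = {"first": Path(tmp) / "first.json",
               "second": Path(tmp) / "second.json"}
    outputs["first"].write_text("{}")
    before = is_complete(outputs)  # one output is still missing
    outputs["second"].write_text("{}")
    after = is_complete(outputs)   # both outputs exist now
```

A partially written set of outputs leaves the task unfinished, so it would be run again.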

process_separately#

A Parameter whose value is a bool. This parameter has an implicit default value of False. For the command line interface this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. When omitting the parameter value, it is still considered True but to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.

You can toggle between the two parsing modes on a per-parameter basis via

class MyTask(luigi.Task):
    implicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.IMPLICIT_PARSING)
    explicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.EXPLICIT_PARSING)

or globally by

luigi.BoolParameter.parsing = luigi.BoolParameter.EXPLICIT_PARSING

for all bool parameters instantiated after this line.

requires() dict[str, Task]#

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires
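As an illustration of the dependency rule, the sketch below orders task names so that requirements come first. The function `execution_order` is hypothetical and only mimics the idea; it does no cycle detection and is not how the scheduler is implemented. The Runner and CreateDatasets names follow the class description above.

```python
def execution_order(task, requires):
    """Return task names in an order where every requirement precedes its dependant."""
    order = []

    def visit(name):
        if name in order:
            return  # already scheduled through another path
        for dependency in requires.get(name, ()):
            visit(dependency)  # requirements are scheduled first
        order.append(name)

    visit(task)
    return order


# Mapping of task name to the names of the tasks it requires.
requirements = {"Runner": ["CreateDatasets"], "CreateDatasets": []}
order = execution_order("Runner", requirements)
```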

run()#

The task run method, to be overridden in a subclass.

See Task.run

scaleout#

Parameter whose value is an int.

skip_bad_files#

A Parameter whose value is a bool. This parameter has an implicit default value of False. For the command line interface this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. When omitting the parameter value, it is still considered True but to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.

You can toggle between the two parsing modes on a per-parameter basis via

class MyTask(luigi.Task):
    implicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.IMPLICIT_PARSING)
    explicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.EXPLICIT_PARSING)

or globally by

luigi.BoolParameter.parsing = luigi.BoolParameter.EXPLICIT_PARSING

for all bool parameters instantiated after this line.

property skip_output_removal: bool#

store_parts() tuple[str]#

Tuple of parts that get added to the store path (local/wlcg). Can be overridden in subclasses to add more parts.

Returns:

Task class name and version

Return type:

tuple[str]
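How the store parts enter a path can be sketched as follows, assuming the store root comes from the ANALYSIS_STORE environment variable as described for BaseTask. The stand-alone function `local_path`, the version string "v1", and the file name are hypothetical.

```python
import os
from pathlib import Path


def local_path(store_parts, *path, env=None):
    """Join the store root, the store parts, and any extra path parts."""
    env = os.environ if env is None else env
    return Path(env["ANALYSIS_STORE"]).joinpath(*store_parts, *path)


# store_parts() is documented to return the task class name and the version.
example = local_path(("Runner", "v1"), "plots", "summary.txt",
                     env={"ANALYSIS_STORE": "store"})
```

A subclass that adds more parts to the tuple would get a deeper directory under the same store root.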

test#

A Parameter whose value is a bool. This parameter has an implicit default value of False. For the command line interface this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. When omitting the parameter value, it is still considered True but to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.

You can toggle between the two parsing modes on a per-parameter basis via

class MyTask(luigi.Task):
    implicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.IMPLICIT_PARSING)
    explicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.EXPLICIT_PARSING)

or globally by

luigi.BoolParameter.parsing = luigi.BoolParameter.EXPLICIT_PARSING

for all bool parameters instantiated after this line.

Module contents#