Input option `for_each`

Difficulty level: easy
Time needed to learn: 10 minutes or less
Key points:
- for_each runs the substep with different parameters

Option for_each allows you to create substeps with different values of one or more variables.

`for_each` with names of variables

If a variable has already been defined with a list of values, you can pass its name to the parameter for_each. The iteration variable takes the same name prefixed with an underscore.

For example, option for_each='method' creates two substeps with m1 and m2, represented by variable _method:

0: file1 file2 m1
1: file1 file2 m2
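The expansion above can be sketched in plain Python. This is an emulation for illustration only, not SoS syntax; the names `method` and `input_files` and their values are assumptions taken from the listing.

```python
# Plain-Python emulation of for_each='method' (not SoS syntax).
method = ["m1", "m2"]             # the variable named in for_each
input_files = ["file1", "file2"]  # assumed step input, taken from the listing

substeps = []
for _index, _method in enumerate(method):
    # Every substep sees the full input plus one value of the loop
    # variable, exposed under the underscore-prefixed name.
    substeps.append(f"{_index}: {' '.join(input_files)} {_method}")

print("\n".join(substeps))
```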

Nested loops are also allowed. For example,

0: _input=file1 file2 _method=m1, _pars=1
1: _input=file1 file2 _method=m2, _pars=1
2: _input=file1 file2 _method=m1, _pars=2
3: _input=file1 file2 _method=m2, _pars=2
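A rough way to picture the nested case is a Cartesian product. The sketch below is plain Python rather than SoS, and it assumes that the first variable varies fastest, which is what the listing above suggests.

```python
from itertools import product

method = ["m1", "m2"]  # values assumed from the listing
pars = [1, 2]

# product() varies its last argument fastest, so pars is passed first
# to reproduce the ordering in the listing (method changes fastest).
nested = [(_method, _pars) for _pars, _method in product(pars, method)]

for _index, (_method, _pars) in enumerate(nested):
    print(f"{_index}: _method={_method}, _pars={_pars}")
```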

If you would like to loop over several parameters in lockstep, you can put them at the same level using 'var1,var2'. For example,

0: _input=file1 file2 _method=m1, _pars=1
1: _input=file1 file2 _method=m2, _pars=2
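Same-level variables behave much like Python's `zip`: the i-th substep receives the i-th value of each variable. A minimal plain-Python sketch (not SoS syntax), with values assumed from the listing:

```python
method = ["m1", "m2"]  # values assumed from the listing
pars = [1, 2]

# Variables at the same level advance together, so they are expected
# to have the same length; this check is part of the sketch only.
if len(method) != len(pars):
    raise ValueError("same-level variables must have equal length")

same_level = list(zip(method, pars))

for _index, (_method, _pars) in enumerate(same_level):
    print(f"{_index}: _method={_method}, _pars={_pars}")
```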

The variable passed to option for_each can be a sequence (list, tuple, set, etc.), or a Pandas Series, Index, or DataFrame. In the last case, each _loop variable represents a row of the dataframe, and you can access single values using the format _loop["header"]. For example:

_index=0
_df=
A        1
B        2
C    Hello
Name: 0, dtype: object
_output=1_2_Hello.txt

_index=1
_df=
A        2
B        4
C    World
Name: 1, dtype: object
_output=2_4_World.txt
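Row-wise iteration over a DataFrame can be approximated with `DataFrame.iterrows()`. The sketch below is plain pandas, not SoS; the frame contents are reconstructed from the output above, and the output name pattern is an assumption inferred from the `_output` lines.

```python
import pandas as pd

# Frame reconstructed from the listing: columns A, B, C with two rows.
df = pd.DataFrame({"A": [1, 2], "B": [2, 4], "C": ["Hello", "World"]})

outputs = []
for _index, (_, _df) in enumerate(df.iterrows()):
    # _df is a Series holding one row; values are accessed by header.
    outputs.append(f"{_df['A']}_{_df['B']}_{_df['C']}.txt")

print(outputs)
```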

`for_each` with dictionary of variables

If you would like to define your own loop variable, or if the default loop variable does not work (e.g. looping through obj.sequence, where _obj.sequence is not a valid variable name), you can use a dictionary syntax in the format of {'varname': sequence}. Multi-variable and nested loops can be specified in the format of {'var1': seq1, 'var2': seq2} (same level) and [{'var1': seq1}, {'var2': seq2}] (different levels).

For example, the first example for this parameter can be written as

0: file1 file2 m1
1: file1 file2 m2

and a later example can be written as

0: _input=file1 file2 method=m1, pars=1
1: _input=file1 file2 method=m2, pars=2

If you need to define nested loops, you can use

0: _input=file1 file2 method=m1, pars=1
1: _input=file1 file2 method=m2, pars=1
2: _input=file1 file2 method=m1, pars=2
3: _input=file1 file2 method=m2, pars=2
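The two dictionary forms can be compared side by side with a small helper. `expand_for_each` is a hypothetical function written for this illustration and is not part of SoS. It assumes that keys in one dictionary advance together and that the first level of a list varies fastest, as the listings above suggest.

```python
from itertools import product


def expand_for_each(spec):
    """Hypothetical helper: expand a for_each-style spec into contexts."""
    # A single dictionary is treated as one level.
    levels = [spec] if isinstance(spec, dict) else list(spec)
    per_level = []
    for level in levels:
        names = list(level)
        # Variables in the same dictionary advance together (like zip).
        per_level.append(
            [dict(zip(names, values)) for values in zip(*level.values())]
        )
    contexts = []
    # Reverse the levels so that the first level varies fastest.
    for combo in product(*reversed(per_level)):
        ctx = {}
        for part in reversed(combo):
            ctx.update(part)
        contexts.append(ctx)
    return contexts


same_level = expand_for_each({"method": ["m1", "m2"], "pars": [1, 2]})
nested = expand_for_each([{"method": ["m1", "m2"]}, {"pars": [1, 2]}])
print(same_level)
print(nested)
```

Because the loop variables are named by the dictionary keys, they appear as `method` and `pars` rather than `_method` and `_pars`.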

The dictionary syntax also supports multiple keys, which helps customize groups of variables. For example, in the script below we only care about situations where n is greater than p,

0 1.txt 100 50
1 2.txt 300 50
2 3.txt 300 100
3 4.txt 300 200
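The filtered pairs can be built with a list comprehension. The candidate values for n and p below are assumptions inferred from the listing above, and the `{index + 1}.txt` naming is likewise a guess at the pattern; the sketch is plain Python, not SoS.

```python
# Candidate values are assumptions inferred from the listing.
n_values = [100, 300]
p_values = [50, 100, 200]

# Keep only the combinations where n is greater than p.
pairs = [(n, p) for n in n_values for p in p_values if n > p]

lines = [f"{i} {i + 1}.txt {n} {p}" for i, (n, p) in enumerate(pairs)]
print("\n".join(lines))
```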

`for_each` with list of contexts

In some cases your "contexts" for substeps are irregular and cannot be easily written as nested loops. It can be easier and clearer to specify the variables and their values for each substep. for_each allows you to specify complete contexts as a list of dictionaries, under the condition that all dictionaries have the same set of keys.

For example, the last example was pretty difficult to understand with _n > _p in a list comprehension. The following example specifies the same loops and is a lot easier to read:

0 1.txt 100 50
1 2.txt 300 50
2 3.txt 300 100
3 4.txt 300 200
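An explicit list of contexts might look like the following in plain Python. The values are taken from the listing above; `check_contexts` is a hypothetical helper that only illustrates the same-keys condition and is not an SoS function.

```python
# One dictionary per substep; values taken from the listing.
contexts = [
    {"n": 100, "p": 50},
    {"n": 300, "p": 50},
    {"n": 300, "p": 100},
    {"n": 300, "p": 200},
]


def check_contexts(contexts):
    """Hypothetical check: every context must define the same variables."""
    expected = set(contexts[0])
    for index, ctx in enumerate(contexts):
        if set(ctx) != expected:
            raise ValueError(f"context {index} has keys {sorted(ctx)}")
    return contexts


for index, ctx in enumerate(check_contexts(contexts)):
    print(index, f"{index + 1}.txt", ctx["n"], ctx["p"])
```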

`for_each` and `group_by`

Options for_each and group_by can be used together, in which case for_each is applied to each substep created by group_by, creating more substeps.

0: _input=file1 method=m1, pars=1
1: _input=file2 method=m1, pars=1
2: _input=file1 method=m2, pars=2
3: _input=file2 method=m2, pars=2
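The combination can be pictured as pairing every input group with every loop context. The sketch below is plain Python, not SoS. It assumes one file per group (as group_by=1 would produce) and orders the substeps to match the listing above.

```python
input_files = ["file1", "file2"]

# Assumed grouping: one file per group.
groups = [[name] for name in input_files]

# Loop contexts with method and pars at the same level.
contexts = [{"method": "m1", "pars": 1}, {"method": "m2", "pars": 2}]

# Pair each context with each group; the ordering follows the listing.
substeps = [(group, ctx) for ctx in contexts for group in groups]

lines = [
    f"{i}: _input={' '.join(group)} method={ctx['method']}, pars={ctx['pars']}"
    for i, (group, ctx) in enumerate(substeps)
]
print("\n".join(lines))
```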
