- Difficulty level: easy
- Time needed to learn: 10 minutes or less
- Key points:
  - Step output is defined for each substep and can be derived from substep input (variable _input)
  - Variable step_output is defined at the completion of the step, and can be passed to other steps
The output statement defines the output files or targets of a SoS step. It is optional but fundamental to the creation of all but very simple workflows. You can check out the How to create dependencies between SoS steps tutorial for a quick overview of the use of output statements. This tutorial lists what you can put in the output statement of a step, with simple examples; refer to other tutorials for more in-depth discussions of these topics.
The output statement is optional. When no output file is defined, a step will have undefined output.
For example, the following workflow has a step A that executes a simple shell script. No output statement is needed and the workflow will work just fine.
In simple workflows with numerically indexed steps, an empty output will be passed to the next step.
The easiest way to explicitly specify the output of a step is to list output files directly in the output statement.
Here we use the touch function of _output, which is of type sos_targets. This function creates the one or more files in variable _output, and it is used quite often in the tutorials because SoS checks whether the output files exist after the execution of the step.
As with the input statement, multiple files can be listed as multiple parameters, sequences (list, tuple, etc.), or variables of string or sequence types.
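The idea that strings, sequences, and variables all collapse into one flat list of targets can be illustrated in plain Python. This is only a rough model of the behavior described above, not how SoS processes its output statement; the helper name flatten_targets is hypothetical.

```python
from pathlib import Path

def flatten_targets(*args):
    """Flatten strings, paths, and nested sequences into one flat list of Paths."""
    flat = []
    for arg in args:
        if isinstance(arg, (str, Path)):
            flat.append(Path(arg))             # a single file name or path
        else:
            flat.extend(flatten_targets(*arg)) # a list, tuple, or other sequence
    return flat

extra = ["c.txt", "d.txt"]                     # a variable of sequence type
targets = flatten_targets("a.txt", ("b.txt",), extra)
print(targets)
```

However the files are passed in, the result is a single ordered list of four paths.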
The output statement can define output for a single substep or for all substeps. That is to say,
- If the output targets are ungrouped, it defines _output. step_output would be an accumulated version of _output.
- If the output targets are grouped with options group_by or for_each, it defines step_output, which should have the same number of groups as step_input.
Let us create a few input files,
The output statement usually defines output of a single substep. In the following example, option group_by creates two substeps with _input being a.txt and b.txt respectively. The _input (actually _input[0]) is of type file_target, which is derived from pathlib.Path, so you can use any member function of pathlib.Path. Here we use with_suffix to obtain a.bak from a.txt.
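Because file_target derives from pathlib.Path, the renaming can be tried in plain Python without SoS. The following minimal sketch uses only the standard library; the helper name backup_name is hypothetical and not part of SoS.

```python
from pathlib import Path

def backup_name(path):
    """Return the same path with its final suffix replaced by .bak."""
    return Path(path).with_suffix(".bak")

for name in ["a.txt", "b.txt"]:
    print(name, "->", backup_name(name))
```

Note that with_suffix replaces only the last suffix, so a name such as archive.tar.gz would become archive.tar.bak.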
As you can see, _output is defined for each substep from _input. But what is step_output?
step_output is defined as an accumulated version of _output, with each _output as one of its groups. It is useful only when the output is imported to other steps, either implicitly as shown below, or as output of functions output_from and named_output.
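The relationship between _output and step_output can be modeled in plain Python as a list of groups. This is a simplified sketch under the assumption that each substep contributes exactly one group; the real sos_targets type is richer, and the function run_step is hypothetical.

```python
from pathlib import Path

def run_step(input_groups):
    """Model a step: one substep per input group, outputs collected as groups."""
    step_output = []                     # stands in for SoS's step_output
    for _input in input_groups:          # each iteration models one substep
        _output = [p.with_suffix(".bak") for p in _input]
        step_output.append(_output)      # each _output becomes one group
    return step_output

groups = [[Path("a.txt")], [Path("b.txt")]]   # as if created with group_by=1
step_output = run_step(groups)
flat = [p for group in step_output for p in group]
print(step_output)   # the grouped view, one group per substep
print(flat)          # the accumulated, ungrouped view
```

A downstream step that receives step_output sees both views: all files together, and the groups that produced them.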
SoS substeps must produce different sets of _output. The following workflow will fail to execute because both substeps will attempt to produce a.bak.
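The rule can be expressed as a small check in plain Python. The sketch below only models the rule as stated above (no two substeps may declare the same set of outputs); it is not the check SoS itself performs, and has_duplicate_output_sets is a hypothetical name.

```python
def has_duplicate_output_sets(output_groups):
    """Return True if two substeps declare exactly the same set of outputs."""
    seen = set()
    for group in output_groups:
        key = frozenset(group)      # order within a group does not matter
        if key in seen:
            return True
        seen.add(key)
    return False

print(has_duplicate_output_sets([["a.bak"], ["a.bak"]]))  # both substeps write a.bak
print(has_duplicate_output_sets([["a.bak"], ["b.bak"]]))  # distinct outputs
```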
In situations where you have predefined input and output pairs, you can define output targets with groups using option group_by. The key here is that the number of groups should match the number of substeps. Technically speaking, the output statement defines step_output and each substep takes one group as its _output.
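The matching requirement between output groups and substeps can be sketched in plain Python. This is an illustrative model only: group_outputs and the file names a.out and b.out are hypothetical, and the fixed group size stands in for the simplest numeric form of group_by.

```python
def group_outputs(outputs, group_size, num_substeps):
    """Split a flat output list into groups and check the count against substeps."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    groups = [outputs[i:i + group_size]
              for i in range(0, len(outputs), group_size)]
    if len(groups) != num_substeps:
        raise ValueError(
            f"{len(groups)} output groups do not match {num_substeps} substeps")
    return groups

input_groups = [["a.txt"], ["b.txt"]]   # two substeps
outputs = ["a.out", "b.out"]            # predefined outputs, one per substep
pairs = list(zip(input_groups, group_outputs(outputs, 1, len(input_groups))))
print(pairs)   # each substep's input paired with its own output group
```

Passing three output files for two substeps would raise a ValueError here, mirroring the mismatch that the text warns against.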
For example,
Similar to named input, you can assign labels to output files and refer to them with _output["label"].
More importantly, these labels define named outputs that can be referred to with function named_output.
The paired_with option can be used to attach variables to output files.
Option group_with can be used to attach variables to output groups, which can be useful as annotations for output files when the output is passed to other steps.
A potentially confusing part of the group_with option is that it assigns elements to either _output or step_output, depending on how the output statement is defined. If the output statement has neither the group_by nor the for_each option, it defines a single _output and group_with should assign a single element to the _output of this specific substep:
If you would like to attach a result to an individual substep, though, it can be easier to just set the variable on _output.