- Difficulty level: easy
- Time needed to learn: 10 minutes or less
- Key points:
  - Step output is defined for each substep and can be derived from substep input (variable _input)
  - Variable step_output is defined at the completion of the step, and can be passed to other steps
The output statement defines the output files or targets of a SoS step. It is optional, but it is fundamental to the creation of all but very simple workflows. You can check out the How to create dependencies between SoS steps tutorial for a quick overview of the use of output statements. This tutorial lists what you can put in the output statement of a step, with simple examples; refer to other tutorials for more in-depth discussions of these topics.
The output statement is optional. When no output file is defined, a step will have undefined output.
For example, the following workflow has a step A that executes a simple shell script. No output statement is needed and the workflow will work just fine.
In simple workflows with numerically indexed steps, an empty output will be passed to the next step.
The easiest way to explicitly specify the output of a step is to list output files directly in the output statement.
Here we use the touch function of _output, which is of type sos_targets. This function creates the file or files listed in _output, and it is used quite often in the tutorials because SoS checks whether the output files exist after the step is executed.
As with the input statement, multiple files can be listed as multiple parameters, sequences (list, tuple, etc.), or variables of string or sequence types.
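The mix of accepted forms can be illustrated with a small plain-Python sketch. This is not SoS code and not how SoS parses the output statement; the helper name flatten_targets is hypothetical and only shows how separate parameters, nested sequences, and string variables can all reduce to one flat list of files.

```python
from pathlib import Path


def flatten_targets(*args):
    """Flatten strings, paths, and nested sequences into one list of paths.

    Hypothetical helper for illustration only; it is not part of SoS.
    """
    flat = []
    for arg in args:
        if isinstance(arg, (str, Path)):
            # A single file name or path becomes one target.
            flat.append(Path(arg))
        else:
            # Anything else is treated as a sequence and flattened recursively.
            flat.extend(flatten_targets(*arg))
    return flat


single = "a.txt"                          # a variable of string type
several = ["b.txt", ("c.txt", "d.txt")]   # a variable of sequence type
targets = flatten_targets(single, several, "e.txt")
names = [str(t) for t in targets]
print(names)
```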
The output statement can define output for a single substep or all substeps. That is to say,
- If the output targets are ungrouped, it defines _output. step_output would be an accumulated version of _output.
- If the output targets are grouped with options group_by or for_each, it defines step_output, which should have the same number of groups as step_input.
Let us create a few input files,
The output statement usually defines output of a single substep. In the following example, option group_by creates two substeps with _input being a.txt and b.txt respectively. The _input (actually _input[0]) is of type file_target, which is derived from pathlib.Path, so you can use any member function of pathlib.Path. Here we use with_suffix to obtain a.bak from a.txt.
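Because file_target is described as derived from pathlib.Path, the behavior of with_suffix can be tried with the standard library alone. The sketch below uses plain Path objects in place of _input[0]; the loop only imitates the two substeps and is not SoS code.

```python
from pathlib import Path

# Stand-ins for the _input of each substep (one file per substep).
substep_inputs = [Path("a.txt"), Path("b.txt")]

backups = []
for substep_input in substep_inputs:
    # with_suffix replaces the last suffix, so a.txt becomes a.bak.
    backup = substep_input.with_suffix(".bak")
    backups.append(backup)
    print(f"{substep_input} -> {backup}")

# Other pathlib.Path members work the same way.
stems = [p.stem for p in substep_inputs]
```

Note that with_suffix only computes a new path; it does not create or rename any file.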
As you can see, _output is defined for each substep from _input. But what is step_output?
step_output is defined as an accumulated version of _output, with _output as its groups. It is useful only when the output is imported to other steps, either implicitly as shown below, or as output of functions output_from and named_output.
SoS substeps must produce different sets of _output. The following workflow will fail to execute because both substeps will attempt to produce a.bak.
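The accumulation itself can be modelled in a few lines of plain Python. This is a conceptual sketch under stated assumptions, not SoS's implementation: the function run_substeps and the list-of-lists representation are hypothetical, and real step_output is a sos_targets object rather than a list.

```python
from pathlib import Path


def run_substeps(input_groups):
    """Model a step: derive one output group per substep, then accumulate.

    Hypothetical model only. Returns (groups, flat), where groups keeps
    each substep's output separate and flat lists every output file.
    """
    groups = []
    for substep_input in input_groups:
        # Each substep derives its own output from its own input.
        substep_output = [p.with_suffix(".bak") for p in substep_input]
        groups.append(substep_output)
    # The accumulated view contains all files, in substep order.
    flat = [p for group in groups for p in group]
    return groups, flat


groups, all_outputs = run_substeps([[Path("a.txt")], [Path("b.txt")]])
print(len(groups), [str(p) for p in all_outputs])
```

In this model, groups plays the role of the per-substep _output values and all_outputs plays the role of the files a later step would receive.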
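The rule can be expressed as a simple check on the output groups. The sketch below is plain Python and only illustrates the constraint; the function find_conflicts is hypothetical, and how SoS actually detects and reports the conflict is not shown here.

```python
def find_conflicts(output_groups):
    """Return (first_substep, second_substep, files) for repeated output sets.

    Hypothetical check for illustration; it compares outputs as sets,
    so the order of files within a group does not matter.
    """
    seen = {}
    conflicts = []
    for index, group in enumerate(output_groups):
        key = frozenset(group)
        if key in seen:
            conflicts.append((seen[key], index, sorted(key)))
        else:
            seen[key] = index
    return conflicts


# Both substeps write a.bak: this is the failing situation.
bad = find_conflicts([["a.bak"], ["a.bak"]])
# Each substep writes its own file: no conflict.
good = find_conflicts([["a.bak"], ["b.bak"]])
print(bad, good)
```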
In situations when you have predefined input and output pairs, you can define output targets with groups using option group_by. The key here is that the number of groups should match the number of substeps. Technically speaking the output statement defines step_output and each substep takes one group as its _output.
For example,
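The matching requirement can be sketched in plain Python. This is a model of the idea rather than SoS code: group_by_n and pair_groups are hypothetical helpers, with group_by_n imitating a numeric group_by value and pair_groups handing one output group to each substep.

```python
def group_by_n(items, n):
    """Split items into consecutive groups of size n (hypothetical helper)."""
    return [items[i:i + n] for i in range(0, len(items), n)]


def pair_groups(input_groups, output_groups):
    """Pair each input group with one output group.

    Raises ValueError when the number of groups differs, mirroring the
    requirement that output groups match the number of substeps.
    """
    if len(input_groups) != len(output_groups):
        raise ValueError(
            f"{len(output_groups)} output groups for "
            f"{len(input_groups)} substeps"
        )
    return list(zip(input_groups, output_groups))


inputs = ["a.txt", "b.txt"]
outputs = ["a.bak", "b.bak"]
pairs = pair_groups(group_by_n(inputs, 1), group_by_n(outputs, 1))
for substep_input, substep_output in pairs:
    print(substep_input, "->", substep_output)
```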
Similar to named input, you can assign labels to output files and refer to them with _output["label"].
More importantly, these labels define named outputs that can be referred to with function named_output.
Option paired_with can be used to attach variables to output files.
Option group_with can be used to attach variables to output groups, which can be useful as annotations for output files when the output is passed to other steps.
A potentially confusing part of the group_with option is that it assigns elements to either _output or step_output, depending on how the output statement is defined. If the output does not have a group_by or for_each option, it defines a single _output and group_with should assign a single element to _output of this specific substep:
If you would like to attach some result to an individual substep, it can be easier to just set the variable to _output.