- Difficulty level: intermediate
- Time needed to learn: 10 minutes or less
- Key points:
  - Function `output_from(step)` refers to output from another `step`
  - `output_from(step)[name]` can be used to refer to named output from `step`
As shown in the example from the tutorial How to use named output in data-flow style workflows, the function `named_output` can be used to refer to named output from another step:
One obvious limitation of `named_output()` is that the name has to be unique in the workflow. For example, in the following script, where another step `test_csv` also gives its output the name `csv`, the workflow would fail due to ambiguity. This is usually not a concern with small workflows. However, as workflows grow more complex, it is sometimes desirable to anchor named output more precisely.
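The ambiguity can be pictured with a small plain-Python model. This is only a sketch of the lookup rule described above, not SoS code and not how SoS is implemented: the helpers `lookup_named_output` and `lookup_by_step` and the file names are hypothetical, while the step names `convert` and `test_csv` and the output name `csv` come from the text.

```python
# Hypothetical model: each step maps its output names to a list of files.
# File names are made up for illustration.
step_outputs = {
    "convert": {"csv": ["a.csv"]},
    "test_csv": {"csv": ["b.csv"]},
}


def lookup_named_output(name, outputs):
    """Look a name up across the whole workflow; fail unless exactly one step defines it."""
    owners = [step for step, named in outputs.items() if name in named]
    if len(owners) != 1:
        raise ValueError(f"named output {name!r} is defined by steps {owners}")
    return outputs[owners[0]][name]


def lookup_by_step(step, name, outputs):
    """Look a name up within one step, so the same name in other steps does not matter."""
    return outputs[step][name]


# Anchoring the lookup to a step is unambiguous even though two steps use the name "csv".
files = lookup_by_step("convert", "csv", step_outputs)
```

In this model a workflow-wide lookup of `csv` raises an error because two steps define it, whereas the step-anchored lookup still succeeds.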
Function output_from(steps, group_by, ...)
Function `output_from` refers to the output of `step`. The returned object is the complete output from `step` with its own sources and groups. Therefore,
- More than one step can be specified as a list of step names
- Option `group_by` can be used to regroup the returned files
- `output_from(step)[name]` refers to all output with source `name`
Function `output_from` imports the output from one or more other steps. For example, in the following workflow `output_from(['step_10', 'step_20'])` takes the output from steps `step_10` and `step_20` as input.
The above example is a simple forward workflow with numerically indexed steps. In this case the parameters of `output_from` can be simplified to just the indexes (integers), so the workflow can be written as
The source steps of `output_from(steps)` do not have to be limited to numerically indexed steps. For example, the above example can be written as:
The sources of the files returned from `output_from()` are by default the names of the steps, so you can refer to these files separately using the `_input[name]` syntax:
If the output has its own sources (names), the sources will be kept.
As usual, keyword arguments of the input statement override the sources of input files:
Similar to the case with `named_output`, the object returned from `output_from()` keeps its original groups. For example,
You can override the groups using the `group_by` option of `output_from`.
Note that we used `_input.with_suffix('.bak')` when `_input` contains only one filename, in which case the statement is equivalent to `_input[0].with_suffix('.bak')`. However, when `_input` contains more than one file, you will have to deal with them one by one as follows: `[x.with_suffix('.bak') for x in _input]`
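The one-by-one pattern can be tried outside of SoS with the standard `pathlib` module. This is a plain-Python analogy, assuming that each element of `_input` behaves like a `pathlib.Path` for `with_suffix`; the variable names and file names are made up. A plain Python list has no `with_suffix` method of its own, so the shortcut on `_input` as a whole is a convenience of SoS and is not reproduced here.

```python
from pathlib import Path

# Stand-ins for _input with one file and with several files (hypothetical names).
single_input = [Path("a.txt")]
multi_input = [Path("a.txt"), Path("b.txt")]

# One file: operate on the only element, as in _input[0].with_suffix('.bak').
single_backup = single_input[0].with_suffix(".bak")

# Several files: change the suffix of each file in turn.
multi_backup = [x.with_suffix(".bak") for x in multi_input]
```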
Going back to our `convert`, `plot` example: when another step is added with the same named output, it is no longer possible to use `named_output(name)`. In this case you can explicitly specify the step in which the named output is defined, and use `output_from(step)[name]` instead of `named_output(name)`, as shown in the following example:
Note that `output_from` is better than `named_output` in its ability to refer to a specific step, but it is also worse than `named_output` for the same reason: tying the input to a particular step makes the workflow more difficult to maintain. We generally recommend the use of `named_output` for its simplicity.
Function `output_from()` obtains the outputs, or more precisely the substep outputs, of another step. There is, however, a case in which a substep is skipped and leaves no output. In this case, the substep output is discarded.
For example, when a substep in step `A` of the following workflow is skipped, the result from `output_from('A')` contains only the output of valid substeps.
However, if you would like to keep a consistent number of substeps across steps, you can get the output from all substeps by using the option `remove_empty_groups=False`.
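The effect of the option can be modeled in a few lines of plain Python. This is a sketch of the behavior described above, not SoS internals: the helper `collect_substep_outputs`, the list-of-lists representation, and the file names are hypothetical, and the default of `True` is inferred from the text, which says skipped substeps are discarded unless `remove_empty_groups=False` is given.

```python
def collect_substep_outputs(substep_outputs, remove_empty_groups=True):
    """Return one group per substep, optionally dropping groups with no output."""
    if remove_empty_groups:
        return [group for group in substep_outputs if group]
    return [list(group) for group in substep_outputs]


# Step A has three substeps; the second one was skipped and left no output.
outputs_of_A = [["A_0.txt"], [], ["A_2.txt"]]

valid_only = collect_substep_outputs(outputs_of_A)
all_groups = collect_substep_outputs(outputs_of_A, remove_empty_groups=False)
```

Keeping the empty group preserves the one-to-one correspondence between the substeps of `A` and the groups seen by the downstream step.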
Function output_from(workflow_name)
`output_from(workflow_name)` is equivalent to `output_from(workflow_name_index)`, where `index` is the largest index of the workflow `workflow_name`.
Function `output_from` is usually used to refer to the output of a specific step. However, similar to the target `sos_step`, which can refer to a numerically indexed workflow, `output_from` can also accept the name of a workflow, in which case it returns the output of the last step of that workflow.
For example, in the following workflow, `output_from('A')` is used to obtain the output of step `A_2`, which is the last step of the workflow `A`. Although `output_from('A')` is identical to `output_from('A_2')`, it frees you from specifying the index of the last step of the workflow, and it is more intuitive to think of `output_from('A')` as the output of the workflow.
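The rule "use the step with the largest index" can be sketched in plain Python as follows. This only illustrates the naming rule stated above; the helper `resolve_last_step` is hypothetical and is not part of SoS, and the step list is made up apart from `A_2`.

```python
import re


def resolve_last_step(workflow_name, step_names):
    """Return the step named <workflow_name>_<index> that has the largest integer index."""
    pattern = re.compile(rf"{re.escape(workflow_name)}_(\d+)")
    indexed = []
    for step in step_names:
        match = pattern.fullmatch(step)
        if match:
            # Compare indexes as integers so that A_10 ranks after A_2.
            indexed.append((int(match.group(1)), step))
    if not indexed:
        raise KeyError(f"no numerically indexed step found for workflow {workflow_name!r}")
    return max(indexed)[1]


last_step = resolve_last_step("A", ["A_1", "A_2", "B_1"])
```

Under this model, asking for workflow `A` resolves to step `A_2`, so adding a later step such as `A_3` changes the result without any edit to the caller.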