- Difficulty level: easy to intermediate
- Time needed to learn: 20 minutes or less
- Key points:
  - A numerically indexed step by default depends on its previous steps
  - A step can depend on steps that generate statically defined outputs
  - A step can depend on steps with `named_output(name)`
  - A step can depend on the output of another step with `output_from(step)`
  - A step can depend on another step with `sos_step(step)`
  - A step can depend on steps of another workflow with `sos_step(workflow)`
  - A step can depend on another step that shares a variable (step option `shared`)
  - A step can depend on a step that provides output through pattern matching of file names (step option `provides`)
SoS provides a plethora of methods to build dependencies between steps. These dependencies form the directed acyclic graphs (DAGs) according to which workflows are executed. Dependencies can be built statically before a workflow is executed or added during its execution, but the general idea is the same.
This tutorial summarizes how SoS steps are connected using very simple examples. The details of each method will be discussed later.
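To make the idea of "executing a workflow according to a DAG" concrete, here is a minimal Python sketch. It is not how SoS is implemented; it only assumes that dependencies have already been resolved into a mapping from each step to the steps it depends on, and it uses the standard library's `graphlib` to derive a valid execution order. The step names `A`, `B`, and `C` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each key depends on the steps in its value set.
dependencies = {
    "B": {"A"},       # B needs A
    "C": {"A", "B"},  # C needs both A and B
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['A', 'B', 'C']
```

Whichever method below creates a dependency, the end result is an edge in a graph of this kind.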
Save and view DAGs of a workflow
- Option `-d` saves the directed acyclic graphs (DAGs) in `.dot` format during the evolution of a workflow. Multiple DAGs are saved in the same output file, one for each change of the DAG.
- Magic `%preview` displays DAGs as an animation in which black, green, and blue represent the `pending`, `running`, and `completed` statuses of the nodes.
Tracing dependencies of existing targets (option `-T`)
The `-T` (tracing) option forces SoS to trace and execute the steps that produce an input or dependent target of a step, even if that target already exists. We use this option in this tutorial to construct the complete DAGs of workflows regardless of the existence of intermediate files.
If no `input` or `output` is defined, a step with a numerically indexed name is executed after its previous step. Therefore, the following workflow will be executed sequentially in numeric order.
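The rule can be pictured with a toy Python model, offered as a sketch rather than as SoS's actual logic: sort the step indices and link each step to the one before it. The function name `sequential_edges` and the indices are hypothetical.

```python
def sequential_edges(indices):
    """Link each numerically indexed step to the step that precedes it."""
    ordered = sorted(indices)
    # Pair every step with its successor: (previous, next).
    return list(zip(ordered, ordered[1:]))

# Steps may be defined in any order; the indices decide the sequence.
edges = sequential_edges([20, 10, 30])
print(edges)  # [(10, 20), (20, 30)]
```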
If the input of a step is not available, SoS will attempt to find another step that generates that file and execute it before this step. A simple data-flow workflow can therefore be written by defining the static input and output of each step. Dependent files can also be specified in `depends:` statements.
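As a rough illustration of file-based matching, the Python sketch below maps every statically declared output to the step that produces it, then looks up each step's `input` and `depends` files in that map. It is a simplified model under stated assumptions: the step table is hypothetical, and the sketch ignores whether files already exist on disk (similar in spirit to running with `-T`).

```python
def file_dependencies(steps):
    """Map each step to the steps that produce its input/depends files."""
    # Index every declared output file by the step that produces it.
    producers = {f: name for name, spec in steps.items() for f in spec["output"]}
    deps = {}
    for name, spec in steps.items():
        needed = spec.get("input", []) + spec.get("depends", [])
        deps[name] = sorted({producers[f] for f in needed if f in producers})
    return deps

# Hypothetical steps with static inputs and outputs.
steps = {
    "A": {"output": ["a.txt"]},
    "B": {"input": ["a.txt"], "output": ["b.txt"]},
    "C": {"input": ["b.txt"], "depends": ["a.txt"], "output": ["c.txt"]},
}
deps = file_dependencies(steps)
print(deps)  # {'A': [], 'B': ['A'], 'C': ['A', 'B']}
```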
If a step has multiple outputs, or its output is not statically defined, the function `named_output()` can be used to refer to a named output of another step. In essence, `named_output()` makes a step depend on the step that defines the named output. This function can also be used in a `depends:` statement.
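The lookup by name can be sketched in Python as follows. This is a toy model, not SoS code: it assumes each step exposes a dictionary from output names to file lists, and the helper `find_named_output` and all names in the table are hypothetical.

```python
def find_named_output(steps, name):
    """Return (step, files) for the step that defines the named output."""
    for step, outputs in steps.items():
        if name in outputs:
            return step, outputs[name]
    raise KeyError(f"no step defines a named output {name!r}")

# Hypothetical step with two named groups of output files.
steps = {
    "A": {"first": ["a1.txt", "a2.txt"], "second": ["b.txt"]},
}
provider, files = find_named_output(steps, "first")
print(provider, files)  # A ['a1.txt', 'a2.txt']
```

Because the dependent step refers to the name rather than to file names, the files behind that name can be numerous or determined only at run time.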
Similar to `named_output`, the function `output_from(step)` imports the entire output of the specified step, thereby creating a step-based dependency. This function can also be used in a `depends:` statement.
If you simply want to execute another step before a step (for example, if that step does not produce any output, so you cannot use `output_from`), you can explicitly depend on it using the target `sos_step`. This allows you to build DAGs explicitly without having to define the input and output of each step.
The argument of `sos_step()` can also be a workflow name, in which case the entire workflow is executed before the step.
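One way to picture how a workflow name selects several steps is the Python sketch below. It assumes, for illustration only, that the steps of a workflow are named `wfname_index`; the helper `steps_of_workflow` and the step names are hypothetical and do not reflect SoS internals.

```python
import re

def steps_of_workflow(step_names, wfname):
    """Select the steps named exactly wfname or wfname_<index>."""
    pattern = re.compile(rf"{re.escape(wfname)}(_\d+)?")
    return [name for name in step_names if pattern.fullmatch(name)]

# Hypothetical step names from two workflows plus a look-alike.
names = ["align_10", "align_20", "call_10", "alignment_10"]
matched = steps_of_workflow(names, "align")
print(matched)  # ['align_10', 'align_20']
```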
If a step depends on a variable, declared with `sos_variable()`, that is shared by another step through the step option `shared`, the step that generates the variable will be executed first so that the variable can be passed.
If none of the above methods works, SoS looks for special steps that provide the required file through pattern matching (step option `provides`). The file can be requested in either an `input:` or a `depends:` statement.
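The matching of a requested file name against a `provides` pattern can be approximated in Python as below. This is a minimal sketch, not SoS's matcher: it assumes a pattern such as `'{filename}.txt'` is turned into a regular expression with one named group per placeholder, and the helper names are hypothetical.

```python
import re

def pattern_to_regex(pattern):
    """Turn a '{name}'-style file pattern into a compiled regex."""
    # re.split with a capture group alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)       # literal text
        else:
            regex += f"(?P<{part}>.+)"     # placeholder
    return re.compile(regex)

def match_provides(pattern, target):
    """Return the placeholder values if target matches, else None."""
    m = pattern_to_regex(pattern).fullmatch(target)
    return m.groupdict() if m else None

print(match_provides("{filename}.txt", "output.txt"))  # {'filename': 'output'}
print(match_provides("{filename}.txt", "output.csv"))  # None
```

The captured values are what allow a single generic step to produce many different files.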
As an interesting way to summarize this tutorial, let us look backward and see how a step can become a dependency of another step:
| Method | Syntax | Matched by | Comment |
|---|---|---|---|
| by step name | `[name]` | `depends: sos_step(name)` | `name` may or may not have an index. Can only be used in a `depends:` statement |
| | | `input: output_from(name)` | Use output from step `name` |
| | | `depends: output_from(name)` | |
| | `[wfname_index]` | `depends: sos_step(wfname)` | Match multiple steps (a workflow) with `wfname` |
| by static output | `output: 'output.txt'` | `input: 'output.txt'` | Matching statically defined targets |
| | | `depends: 'output.txt'` | |
| | `output: A='output.txt'` | `input: 'output.txt'` | Named output can still be referred to directly |
| | | `depends: 'output.txt'` | |
| by named output | `output: A='output.txt'` | `input: named_output('A')` | Name `A` can refer to multiple or dynamic outputs |
| | | `depends: named_output('A')` | |
| by shared variable | `[name: shared='var']` | `depends: sos_variable('var')` | The only way to exchange variables between steps |
| by pattern matching | `[name: provides='output.txt']` | `input: 'output.txt'` | Matching statically defined targets |
| | | `depends: 'output.txt'` | |
| | `[name: provides='{filename}.txt']` | `input: 'output.txt'` | Pattern matching, only for file targets |
| | | `depends: 'output.txt'` | |
Note that although we use `output.txt` as an example of a file target, non-file targets are allowed in all cases where `output.txt` is used, except for the last case of `provides='{filename}.txt'`, since pattern matching is available only for file names.