
Summary of step dependencies

  • Difficulty level: easy to intermediate
  • Time needed to learn: 20 minutes or less
  • Key points:
    • A numerically indexed step by default depends on its previous steps
    • A step can depend on steps that generate statically defined outputs
    • A step can depend on steps with named_output(name)
    • A step can depend on the output of another step with output_from(step)
    • A step can depend on another step with sos_step(step)
    • A step can depend on steps of another workflow with sos_step(workflow)
    • A step can depend on another step that shares a variable
    • A step can depend on a step that provides output through pattern matching of file names

SoS provides several ways to build dependencies between steps. These dependencies form the directed acyclic graphs (DAGs) that determine the order in which workflows are executed. Dependencies can be built statically before the workflow is executed or added during its execution, but the general idea is the same.

This tutorial summarizes how SoS steps are connected using very simple examples. The details of each method will be discussed later.

Numerically indexed steps

If no input or output is defined, a step with a numerically indexed name is executed after the previous step. The following workflow is therefore executed sequentially in numeric order.
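
The idea can be sketched as follows. This is a minimal illustration, not necessarily the tutorial's original cell: the step bodies are made up, and the built-in variable step_name is assumed to hold the name of the running step.

```sos
[step_1]
# no input: or output: is defined, so ordering comes from the numeric index
print(f'Executing {step_name}')

[step_2]
print(f'Executing {step_name}')

[step_3]
print(f'Executing {step_name}')
```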

Output:
Executing step_1
Executing step_2
Executing step_3
[Figure: DAG of the workflow (num.dot)]

Depends on static output from another step

If the input of a step is not available, SoS attempts to find another step that generates the file and executes that step first. A simple data-flow workflow can therefore be written by defining the static input and output of each step. Dependent files can also be specified in depends: statements.
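
A minimal sketch of such a data-flow workflow, assuming it is started from step B and that a.txt does not exist yet. The step names and file name are illustrative, and _output.touch() is assumed to create an empty output file.

```sos
[A]
# statically defined output that step B asks for
output: 'a.txt'
print(f'Generating {_output}')
_output.touch()

[B]
# a.txt is missing, so SoS looks for a step that generates it (step A)
input: 'a.txt'
```

Replacing input: with depends: in step B should create the same dependency without making a.txt the input of the step.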

Output:
Generating a.txt
[Figure: DAG of the workflow (static.dot)]

Depends on named output

If a step has multiple outputs, or its output is not statically defined, the function named_output() can be used to refer to a named output of another step. In effect, named_output() makes a step depend on the step that produces the named output. This function can also be used in a depends: statement.
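
A minimal sketch under the same assumptions as before (illustrative step names, _output.touch() assumed to create the file). The output name A is hypothetical and happens to match the step name only for brevity.

```sos
[A]
# the output is given the name A
output: A='a.txt'
_output.touch()

[D]
# depends on whichever step produces the output named A
input: named_output('A')
```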

[Figure: DAG of the workflow (named.dot)]

Depends on output from another step

Similar to named_output(), the function output_from(step) imports the entire output of the specified step, thereby creating a step-based dependency. This function can also be used in a depends: statement.
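
A minimal sketch, with illustrative step and file names and with _output.touch() assumed to create the file:

```sos
[A]
output: 'a.txt'
_output.touch()

[D]
# takes the entire output of step A as input, so A is executed first
input: output_from('A')
```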

[Figure: DAG of the workflow (output_from.dot)]

Depends on another step

If you simply want to execute another step before a step (for example, because that step produces no output, so output_from cannot be used), you can depend on it explicitly using the target sos_step. This allows you to build DAGs explicitly without having to define the input and output of each step.
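
A minimal sketch, assuming the workflow is started from step D. The step names and bodies are illustrative, and step_name is assumed to be the built-in variable holding the current step's name.

```sos
[A]
# produces no output, so output_from('A') would not help here
print(f'Executing {step_name}')

[D]
# explicit step-level dependency
depends: sos_step('A')
print(f'Executing {step_name}')
```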

Output:
Executing D
[Figure: DAG of the workflow (sos_step.dot)]

Depends on another workflow

The argument of sos_step() can also be a workflow name, in which case the whole workflow is executed before the step.
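
A minimal sketch, assuming the workflow is started from step D. Steps A_1 and A_2 together form a workflow named A; the names and bodies are illustrative.

```sos
[A_1]
print(f'Executing {step_name}')

[A_2]
print(f'Executing {step_name}')

[D]
# 'A' matches the workflow made of A_1 and A_2, not a single step
depends: sos_step('A')
print(f'Executing {step_name}')
```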

Output:
Executing A_1
Executing A_2
Executing D
[Figure: DAG of the workflow (sos_step_wf.dot)]

Depends on shared variable

If a step requires a variable, declared with sos_variable(), that another step shares through the step option shared, the step that generates the variable is executed first so that the variable can be passed.
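
A minimal sketch, assuming the workflow is started from step D. The variable name var and its value are hypothetical.

```sos
[A: shared='var']
# var is made available to other steps through the shared option
print(f'Executing {step_name}')
var = 100

[D]
# requires var, so the step that shares it (A) is executed first
depends: sos_variable('var')
print(f'Executing {step_name}')
```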

Output:
Executing A
Executing D
[Figure: DAG of the workflow (shared.dot)]

Depends on pattern matched output

If none of the above methods applies, SoS looks for special steps that provide the required file through pattern matching. Both input: and depends: statements can be used.
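
A minimal sketch, assuming the workflow is started from step D and that a.txt does not exist yet. The step names are illustrative, and _output.touch() is assumed to create the requested file.

```sos
[A: provides='{filename}.txt']
# matches any requested file ending in .txt; here _output is the requested a.txt
print(f'Generating {_output}')
_output.touch()

[D]
# a.txt is not produced by any static output, so the provides pattern is used
input: 'a.txt'
print(f'Executing {step_name}')
```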

Output:
Generating a.txt
Executing D
[Figure: DAG of the workflow (pattern.dot)]

Summary

As a way to summarize this tutorial, let us look backward and see how a step can become a dependency of another step:

| Method | Syntax | Matched by | Comment |
|---|---|---|---|
| by step name | [name] | depends: sos_step(name) | name may or may not have an index. Can only be used in a depends: statement |
| | | input: output_from(name) | Use output from step name |
| | | depends: output_from(name) | |
| | [wfname_index] | depends: sos_step(wfname) | Match multiple steps (a workflow) with wfname |
| by static output | output: 'output.txt' | input: 'output.txt' | Matching statically defined targets |
| | | depends: 'output.txt' | |
| | output: A='output.txt' | input: 'output.txt' | Named output can still be referred to directly |
| | | depends: 'output.txt' | |
| by named output | output: A='output.txt' | input: named_output('A') | Name A can refer to multiple or dynamic outputs |
| | | depends: named_output('A') | |
| by shared variable | [name: shared='var'] | depends: sos_variable('var') | The only way to exchange variables between steps |
| by pattern matching | [name: provides='output.txt'] | input: 'output.txt' | Matching statically defined targets |
| | | depends: 'output.txt' | |
| | [name: provides='{filename}.txt'] | input: 'output.txt' | Pattern matching, only for file targets |
| | | depends: 'output.txt' | |

Note that although we use output.txt as the example target, non-file targets are allowed in all cases where output.txt is used, except for the last case of provides='{filename}.txt', since pattern matching is available only for file names.