Introduction to SoS Workflows

  • Difficulty level: easy
  • Time needed to learn: 10 minutes or less
  • Key points
    • Process-oriented workflows are executed by specifying steps to execute
    • Outcome-oriented workflows are executed by specifying files to generate
    • Both styles follow the same step dependency rules and can be used together

Process-oriented workflows

Workflow with numerically indexed steps

A SoS workflow has a name and one or more numbered steps. The workflows are defined from sections in a SoS script.

  • If unspecified, SoS will execute a default workflow, or the only workflow defined in a script.
  • A section with only a numeric index belongs to the default workflow
  • A section named NAME or NAME_idx belongs to the NAME workflow
  • A section whose name contains a wildcard character belongs to every workflow that the pattern matches. E.g. *_10 belongs to any workflow, and m*_10 belongs to the mouse and mice workflows.
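The naming rules above can be illustrated with a short Python sketch. It is not how SoS itself resolves sections. The helper names parse_section and steps_of are hypothetical, and the sketch assumes that wildcards are written with * and matched shell-style against the workflow name.

```python
from fnmatch import fnmatchcase

def parse_section(name):
    """Split a section name into (workflow pattern, step index)."""
    if name.isdigit():                 # "10" -> default workflow, step 10
        return "default", int(name)
    prefix, sep, suffix = name.rpartition("_")
    if sep and suffix.isdigit():       # "mouse_10" -> workflow mouse, step 10
        return prefix, int(suffix)
    return name, 0                     # "mapping" -> workflow mapping, step 0

def steps_of(workflow, section_names):
    """Return the section names that belong to `workflow`, ordered by index."""
    selected = []
    for name in section_names:
        pattern, index = parse_section(name)
        if fnmatchcase(workflow, pattern):   # a plain name matches only itself
            selected.append((index, name))
    return [name for index, name in sorted(selected)]

# Sections may be defined in any order; steps are sorted by index.
print(steps_of("default", ["20", "5", "100", "10"]))

named = ["mouse_10", "human_10", "*_30", "m*_40"]
print(steps_of("mouse", named))
print(steps_of("human", named))
```

In this sketch the mouse workflow picks up its own step plus both wildcard sections, while human only matches *_30.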

For example, the following sections specify a workflow with four steps 5, 10, 20, and 100. The workflow steps can be specified in any order and do not have to be consecutive.

In [1]:
INFO: Running 5:
INFO: Running 10:
INFO: Running 20:
INFO: Running 100:
INFO: Workflow default (ID=7bb58e65207b5898) is executed successfully with 4 completed steps.

A workflow specified in this way is the default workflow and is actually called default in SoS output. You can also give the workflow a name and give each step a short description, as follows:

In [2]:
INFO: Running start:
INFO: Running get data:
INFO: Running quality control:
INFO: Running align:
INFO: Workflow mapping (ID=ea4f71b6c0fc2e0b) is executed successfully with 4 completed steps.

Note that the first step is named mapping, without an index, and is assumed to be the first step (index 0) of the workflow. By the same rule, the following workflow, named step, consists of a single step.

In [3]:
INFO: Running step:
INFO: Workflow step (ID=0fb8831fb99f45c5) is executed successfully with 1 completed step.

A SoS script can define multiple workflows. For example, the following sections of a SoS script define two workflows named mouse and human.

In [4]:
INFO: Running mouse_10:
INFO: Running mouse_20:
INFO: Workflow mouse (ID=bebcdf7d6f8eb277) is executed successfully with 2 completed steps.

In this case, the name of the workflow to execute must be specified. This can be done with the %run magic in a Jupyter notebook, or with a positional argument on the command line, e.g.

    % sos run myscript mouse

Note that the workflow argument is not needed if a default workflow is defined in the script, as in the following example:

In [5]:
INFO: Running 10:
INFO: Running 20:
INFO: Running 30:
INFO: Workflow default (ID=5961501ca6254e36) is executed successfully with 3 completed steps.

Multiple steps can share a single section as follows

In [6]:
INFO: Running mouse_10:
INFO: Running mouse_20:
INFO: Running mouse_30:
INFO: Workflow mouse (ID=6a3ab8526e7dfcfe) is executed successfully with 3 completed steps.

and wildcard steps can be used to define a step for multiple workflows:

In [7]:
INFO: Running mouse_10:
INFO: Running mouse_20:
INFO: Running mouse_30:
INFO: Workflow mouse (ID=49534b8844fd79cd) is executed successfully with 3 completed steps.

If the steps defined in a shared section are similar but not identical, the section can use the step variable step_name (discussed elsewhere) to behave differently in different workflows. In the following example, the variable step_name will be mouse_20 or human_20 depending on the workflow being executed, and is used to determine the correct reference genome for each workflow.
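The branching idea can be sketched in plain Python. This is only an illustration of selecting a value from step_name, not the original SoS section. The function reference_for and the genome names in the table are hypothetical placeholders.

```python
# Hypothetical mapping from workflow name to reference genome; the genome
# names are placeholders chosen for illustration only.
REFERENCE_GENOME = {"mouse": "mm10", "human": "hg19"}

def reference_for(step_name):
    """Pick a reference genome from a step name such as 'mouse_20'."""
    workflow = step_name.rsplit("_", 1)[0]   # 'mouse_20' -> 'mouse'
    return REFERENCE_GENOME[workflow]

print(reference_for("mouse_20"))
print(reference_for("human_20"))
```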

In [8]:

Subworkflows

Although workflows are defined separately with all their steps, they do not have to be executed in their entirety. A subworkflow is a workflow defined from one or more steps of an existing workflow. It is specified using the syntax workflow:[from-to], where from-to can be n (step n), -n (up to n), n-m (step n to m), or m- (from m). For example

A              # complete workflow A
A:5-10         # step 5 to 10 of A
A:50-          # step 50 up
A:-10          # up to step 10 of A
A:10           # step 10 of workflow A
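As a rough illustration of how these forms select steps, the following Python sketch expands a subworkflow specification into step indices. It is not SoS's own parser. The function select_steps is hypothetical, and the sketch assumes that both ends of a range are inclusive.

```python
def select_steps(spec, workflows):
    """Return the step indices selected by a spec such as 'A:5-10'.

    `workflows` maps a workflow name to the list of its step indices.
    """
    name, _, step_range = spec.partition(":")
    indices = sorted(workflows[name])
    if not step_range:                       # "A": the complete workflow
        return indices
    if "-" not in step_range:                # "A:10": a single step
        return [i for i in indices if i == int(step_range)]
    first, last = step_range.split("-")      # "5-10", "-10" or "50-"
    low = int(first) if first else float("-inf")
    high = int(last) if last else float("inf")
    return [i for i in indices if low <= i <= high]

workflows = {"A": [5, 10, 20, 50, 100]}
for spec in ["A", "A:5-10", "A:50-", "A:-10", "A:10"]:
    print(spec, select_steps(spec, workflows))
```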

In practice, the -n format is frequently used to execute part of a workflow for debugging purposes, for example:

In [9]:
INFO: Running 10:
INFO: Running 20:
INFO: Workflow default (ID=93cfb73a2b2e3791) is executed successfully with 2 completed steps.

Combined workflows

You can also combine subworkflows to execute multiple workflows one after another. For example,

A + B          # workflow A, followed by B
A:0 + B        # step 0 of A, followed by B
A:-50 + B + C  # up to step 50 of workflow A, followed by B, and C

This syntax can be used from the command line, e.g.

sos-runner myscript align+call

or from the %run magic of Jupyter notebook

In [10]:
INFO: Running check_10:
INFO: Running align_10:
INFO: Running align_20:
INFO: Running call_10:
INFO: Running call_20:
INFO: Workflow check+align+call (ID=de7ec6b500f7ef17) is executed successfully with 5 completed steps.
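The effect of the + operator can be sketched in Python as a simple concatenation of step lists. The function combine and the workflows table are hypothetical, and for brevity the sketch handles whole workflows only, not subworkflow ranges such as A:-50.

```python
def combine(spec, workflows):
    """Expand a combined spec such as 'check+align+call' into an ordered step list."""
    steps = []
    for name in spec.split("+"):
        name = name.strip()                  # allow "A + B" as well as "A+B"
        steps.extend(f"{name}_{index}" for index in sorted(workflows[name]))
    return steps

# Step indices assumed from the output shown above.
workflows = {"align": [10, 20], "call": [10, 20], "check": [10]}
print(combine("check+align+call", workflows))
```

Each workflow runs to completion, in the order given, before the next one starts, which matches the order of the steps in the log above.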

Dependent steps of workflows

When you specify a workflow to execute, its steps might depend on other steps. The section on step dependencies lists a number of methods to create dependencies, but the general idea is that the execution of a workflow can trigger the execution of other steps or workflows.

As a very simple example, step A is executed before step default of the following workflow because sos_step('A') is listed as a dependency of step default.

In [11]:
INFO: Running A:
INFO: A (index=0) is ignored due to saved signature
INFO: Running default:
INFO: Workflow default (ID=b5189ae992731d18) is executed successfully with 1 completed step and 1 ignored step.

As another example, step A is executed before step default because step default requires input test_5.txt, which is provided by step A with a provides option.

In [12]:
INFO: Running default:
INFO: Workflow default (ID=84884b4a622a88b8) is executed successfully with 1 completed step.
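Both kinds of dependency in this section, a direct dependency on a step and a required file that another step provides, can be pictured with a small Python sketch. It is an illustration only and does not reflect how SoS builds its dependency graph. The function execution_order and the depends and provides tables are hypothetical, and the sketch does not detect circular dependencies.

```python
def execution_order(target_step, depends, provides):
    """Return the steps to run, dependencies first, ending with `target_step`.

    depends  maps a step to the things it needs: either ('step', name)
             or ('file', filename).
    provides maps a filename to the step that generates it.
    """
    order = []

    def visit(step):
        if step in order:                    # already scheduled
            return
        for kind, value in depends.get(step, []):
            required = value if kind == "step" else provides[value]
            visit(required)                  # schedule the dependency first
        order.append(step)

    visit(target_step)
    return order

# First example: default depends directly on step A.
print(execution_order("default", {"default": [("step", "A")]}, {}))
# Second example: default needs test_5.txt, which step A provides.
print(execution_order("default",
                      {"default": [("file", "test_5.txt")]},
                      {"test_5.txt": "A"}))
```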

Outcome-oriented workflows

Up to now we have executed SoS workflows by specifying the workflow to execute. SoS also supports "outcome-oriented" workflows, in which steps are triggered in order to generate specified files.

For example, the following workflow is triggered to generate the specified outputs test_15.txt and test_25.txt. Because step A is designed to generate such files, it is executed twice, once for each specified file.

In [13]:
INFO: Running A:
INFO: A output: test_25.txt
INFO: Running A:
INFO: A output: test_15.txt
INFO: Workflow default (ID=63f1ebfda7f844d2) is executed successfully with 2 completed steps.
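One way to picture how a single step can serve several requested files is pattern matching on the file name. The Python sketch below is an assumption-laden illustration, not SoS code. The function match_target is hypothetical, and so is the pattern test_{idx}.txt, which is guessed from the file names in this example.

```python
import re

def match_target(target, pattern="test_{idx}.txt"):
    """Return the variables extracted from `target`, or None if it does not match."""
    # Turn 'test_{idx}.txt' into the regular expression 'test_(?P<idx>.+)\.txt'.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        f"(?P<{part}>.+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    found = re.fullmatch(regex, target)
    return found.groupdict() if found else None

# Each requested file that matches the pattern leads to one run of the step.
for target in ["test_15.txt", "test_25.txt", "other.csv"]:
    print(target, match_target(target))
```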

Nested workflows

A workflow can also be constructed and executed from within a regular step using the function sos_run. Such a workflow is called a "nested workflow".

For example, the default workflow of the following script executes two nested workflows, A+B and C.

In [14]:
INFO: Running default:
INFO: Running A_1:
INFO: Running A_2:
INFO: Running B:
INFO: Running C_1:
INFO: Running C_2:
INFO: Workflow default (ID=85cba502fdf229b9) is executed successfully with 6 completed steps.

Nested workflows can also be triggered by targets to generate, which is equivalent to sos run -t from the command line.

In [15]:
INFO: Running default:
INFO: Running A:
INFO: A output: test_15.txt
INFO: Workflow default (ID=b6659f954bc9f14b) is executed successfully with 2 completed steps.