- Difficulty level: easy
- Time needed to learn: 10 minutes or less
- Key points
- Process-oriented workflows are executed by specifying steps to execute
- Outcome-oriented workflows are executed by specifying files to generate
- Both styles follow the same step dependency rules and can be used together
A SoS workflow has a name and one or more numbered steps. The workflows are defined from sections in a SoS script.
- If unspecified, SoS will execute a default workflow, or the only workflow defined in a script.
- A section with only a numeric index belongs to the default workflow
- A section named NAME or NAME_idx belongs to the NAME workflow
- A section with a wildcard character belongs to any workflow that the pattern matches. E.g. *_10 belongs to any workflow, and m*_10 belongs to the mouse and mice workflows.
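The naming rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not how SoS itself resolves sections: the helpers split_section and steps_of are hypothetical, wildcard matching is approximated with the standard fnmatch module, and a name without an index is treated as index 0.

```python
from fnmatch import fnmatchcase

def split_section(name):
    """Split a section name into (workflow pattern, step index)."""
    if name.isdigit():
        # A purely numeric section belongs to the default workflow.
        return "default", int(name)
    head, sep, tail = name.rpartition("_")
    if sep and tail.isdigit():
        return head, int(tail)
    # A name without an index is treated as step 0 (assumption of this sketch).
    return name, 0

def steps_of(workflow, sections):
    """Return the sections that belong to a workflow, ordered by step index."""
    matched = []
    for name in sections:
        pattern, index = split_section(name)
        if fnmatchcase(workflow, pattern):
            matched.append((index, name))
    return [name for index, name in sorted(matched)]

# Sections may be defined in any order; execution order follows the index.
print(steps_of("default", ["100", "5", "20", "10"]))

named = ["mouse_20", "human_20", "m*_10", "*_30"]
print(steps_of("mouse", named))
print(steps_of("mice", named))
print(steps_of("human", named))
```

In this model, m*_10 is picked up by both mouse and mice but not by human, while *_30 is picked up by all three.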
For example, the following sections specify a workflow with four steps 5, 10, 20, and 100. The workflow steps can be specified in any order and do not have to be consecutive.
A workflow specified in this way is the default workflow and is actually called default in SoS output. You can also give a workflow a name and give each step a short description as follows:
Note that the first step has the name mapping without an index, and is assumed to be the first step (index 0) of the workflow. That is to say, the following workflow consists of a single step named step.
A SoS script can define multiple workflows. For example, the following sections of a SoS script define two workflows named mouse and human.
In this case, the name of the workflow to execute must be specified. This can be done with the %run magic in a Jupyter notebook, or with a positional argument on the command line, e.g.
% sos run myscript mouse
Note that the workflow argument is not needed if a default workflow is defined in the script like the following example
Multiple workflows can share a single step as follows
and wildcard steps can be used to define a step for multiple workflows:
If the steps defined in a shared section are similar but not identical, the section can use the step variable step_name (discussed elsewhere) to behave differently in different workflows. In the following example, the variable step_name will be mouse_20 or human_20 depending on the workflow being executed, and is used to determine the correct reference genome for each workflow.
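The logic inside such a shared step can be sketched in plain Python. The function reference_for and the reference file names below are hypothetical placeholders chosen for illustration; only the step_name values mouse_20 and human_20 come from the text.

```python
def reference_for(step_name):
    """Pick a reference genome from the workflow part of step_name."""
    # "mouse_20" -> "mouse", "human_20" -> "human"
    workflow = step_name.rsplit("_", 1)[0]
    # Hypothetical file names, used only for illustration.
    references = {"mouse": "mouse_reference.fa", "human": "human_reference.fa"}
    return references[workflow]

print(reference_for("mouse_20"))
print(reference_for("human_20"))
```

Keeping the workflow-specific values in a dictionary means the shared section needs no if/else chain when another workflow is added.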
Although workflows are defined separately with all their steps, they do not have to be executed in their entirety. A subworkflow is a workflow defined from one or more steps of an existing workflow. It is specified using the syntax workflow:[from-to], where from-to can be n (step n), -n (up to n), n-m (step n to m), or m- (from m). For example
A # complete workflow A
A:5-10 # step 5 to 10 of A
A:50- # step 50 up
A:-10 # up to step 10 of A
A:10 # step 10 of workflow A
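To make the range forms concrete, the sketch below parses a subworkflow specification and selects the matching step indexes. It is a simplified model and not SoS's own parser: the helper names are hypothetical, the step indexes are made up, and both ends of a range are assumed to be inclusive.

```python
def parse_subworkflow(spec):
    """Return (workflow name, first index or None, last index or None)."""
    name, sep, rng = spec.partition(":")
    if not sep:
        return name, None, None          # "A": the complete workflow
    if "-" not in rng:
        n = int(rng)
        return name, n, n                # "A:10": a single step
    first, _, last = rng.partition("-")
    return (name,
            int(first) if first else None,   # "A:-10" has no lower bound
            int(last) if last else None)     # "A:50-" has no upper bound

def select_steps(spec, indexes):
    """Select step indexes covered by spec (bounds assumed inclusive)."""
    _, first, last = parse_subworkflow(spec)
    return [i for i in sorted(indexes)
            if (first is None or i >= first) and (last is None or i <= last)]

steps = [0, 5, 10, 20, 50, 100]          # hypothetical step indexes of A
for spec in ["A", "A:5-10", "A:50-", "A:-10", "A:10"]:
    print(spec, select_steps(spec, steps))
```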
In practice, the -n format is frequently used to execute part of the workflow for debugging purposes, for example:
You can also combine subworkflows to execute multiple workflows one after another. For example,
A + B # workflow A, followed by B
A:0 + B # step 0 of A, followed by B
A:-50 + B + C # up to step 50 of workflow A, followed by B, and C
This syntax can be used from the command line, e.g.
sos run myscript align+call
or from the %run magic of Jupyter notebook.
When you specify a workflow to execute, its steps might depend on other steps. The section on step dependencies lists a number of methods to create dependencies, but the general idea is that the execution of workflows can trigger the execution of other steps or workflows.
As a very simple example, step A is executed before the default step of the following workflow because sos_step('A') is listed as a dependency of step default.
As another example, step A is executed before step default because step default requires input test_5.txt, which is provided by step A with a provides option.
Up to now, we have executed SoS workflows by specifying the workflow to execute. SoS also supports "outcome-oriented" workflows, in which steps are triggered to generate specified files.
For example, the following workflow is triggered to generate the specified outputs test_15.txt and test_25.txt. Because step A is designed to generate such files, it is executed twice, once for each specified file.
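The way requested files lead to step executions can be sketched as follows. This is a simplified model under stated assumptions, not SoS's implementation: step A is assumed to declare its outputs with the pattern test_{idx}.txt (the text names only the resulting files), and match_provides and plan are hypothetical helpers.

```python
import re

def match_provides(pattern, target):
    """Return the placeholder values if target fits pattern, else None."""
    # Splitting on {name} yields literal text at even positions and
    # placeholder names at odd positions.
    pieces = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for position, piece in enumerate(pieces):
        if position % 2:
            regex += f"(?P<{piece}>.+)"
        else:
            regex += re.escape(piece)
    found = re.fullmatch(regex, target)
    return found.groupdict() if found else None

def plan(targets, providers):
    """List one (step, variables) execution for every requested target."""
    runs = []
    for target in targets:
        for step, pattern in providers.items():
            values = match_provides(pattern, target)
            if values is not None:
                runs.append((step, values))
                break
        else:
            raise ValueError(f"no step provides {target}")
    return runs

providers = {"A": "test_{idx}.txt"}      # assumed pattern for step A
runs = plan(["test_15.txt", "test_25.txt"], providers)
print(runs)
```

Each requested file yields its own execution of A with a different value of idx, which is why the step runs twice.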
A workflow could be constructed and executed from a regular step using function sos_run, which is called a "nested workflow".
For example, the default workflow of the following script executes two nested workflows, A+B and C.
Nested workflows can also be triggered by targets to generate, which is equivalent to sos run -t from the command line.