- Difficulty level: easy
- Time needed to learn: 20 minutes or less
- Key points:
  - Process-oriented workflows specify the workflows and steps to execute
Process-oriented workflows execute user-specified workflows or steps. For example, `sos run script A` executes a workflow named `A`. This can be a single-step workflow with a step `A`, or a multi-step workflow with steps such as `A_10` and `A_20`. The workflow may or may not generate any output.
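The naming convention above (a workflow `A` made of step `A`, or of steps `A_10`, `A_20`, ...) can be modeled with a small helper. This is a hypothetical sketch, not how SoS itself parses scripts: `steps_of_workflow` is an illustrative name. It assumes only that a step name is a workflow name, optionally followed by `_` and a numeric index.

```python
import re

# Hypothetical model of step names: "<workflow>" or "<workflow>_<index>".
# This illustrates the naming convention; it is not SoS's own parser.
STEP_NAME = re.compile(r"^(?P<workflow>[A-Za-z_][A-Za-z0-9_]*?)(?:_(?P<index>\d+))?$")


def steps_of_workflow(step_names, workflow):
    """Return the steps that belong to `workflow`, ordered by numeric index."""
    selected = []
    for name in step_names:
        m = STEP_NAME.match(name)
        if m is None or m.group("workflow") != workflow:
            continue
        index = m.group("index")
        # A step without a numeric suffix (e.g. "A") forms a single-step workflow.
        selected.append((int(index) if index is not None else 0, name))
    return [name for _, name in sorted(selected)]


print(steps_of_workflow(["A_20", "B_10", "A_10"], "A"))  # ['A_10', 'A_20']
```

The sketch sorts on the integer index rather than the raw string, so `A_2` comes before `A_10`.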
In the first example of our tutorial on SoS workflows, the workflow `plot` consists of two steps, `plot_10` and `plot_20`. The magic `%run plot` or the command `sos run script plot` executes all steps in the workflow, whether or not these steps produce any output.
The previous example simply lists the scripts for each step and does not specify the input and output of the steps. SoS assumes that a step without an `input` statement depends on all of its previous steps. That is to say, `plot_20` will be executed after `plot_10`, and `plot_30`, if it exists, will be executed after both `plot_10` and `plot_20`. The entire workflow is therefore executed sequentially.
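The rule that a step without `input` depends on all previous steps can be written as a dependency map and topologically sorted. This is a minimal sketch of the rule as stated, not SoS's scheduler. `implicit_dependencies` is a hypothetical helper, and `plot_30` is included only to illustrate the "if it exists" case.

```python
from graphlib import TopologicalSorter


def implicit_dependencies(steps):
    """Map each step to the steps it must wait for.

    Models the rule above: a step without an input statement depends on
    all of the steps that precede it in the workflow.
    """
    return {step: steps[:i] for i, step in enumerate(steps)}


# Hypothetical three-step workflow; plot_30 only illustrates the rule.
deps = implicit_dependencies(["plot_10", "plot_20", "plot_30"])
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['plot_10', 'plot_20', 'plot_30']
```

Because every step waits for all earlier steps, only one execution order is possible, which is why such a workflow runs strictly sequentially.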
You can add `input` and `output` statements to the steps, which allows you to
- use the variables `_input` and `_output` in scripts, which is arguably more readable;
- let SoS track the input and output of steps and create signatures, so that steps are skipped if they have been executed before (see runtime signature for details);
- let SoS determine step dependencies and build a DAG, so that SoS can execute steps in parallel (see next section).
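The idea behind signatures can be illustrated with content hashes: if the script and the contents of its input and output files match what was recorded last time, the step can be skipped. This is a hypothetical sketch using SHA-256 from `hashlib` and an in-memory dictionary. SoS has its own signature format and storage, which this code does not reproduce; `step_signature` and `run_step` are illustrative names.

```python
import hashlib
from pathlib import Path


def step_signature(script, inputs, outputs):
    """Hash the step script plus the names and contents of its files."""
    h = hashlib.sha256(script.encode())
    for path in list(inputs) + list(outputs):
        h.update(str(path).encode())
        h.update(Path(path).read_bytes())
    return h.hexdigest()


def run_step(script, inputs, outputs, action, signatures):
    """Run `action` unless a saved signature shows the step is up to date.

    Returns True if the step ran and False if it was skipped.
    `signatures` is a plain dict standing in for a signature store.
    """
    key = (script, tuple(inputs), tuple(outputs))
    saved = signatures.get(key)
    if saved is not None and all(Path(p).exists() for p in outputs):
        if saved == step_signature(script, inputs, outputs):
            return False  # skipped: same script, same input and output files
    action()  # the action is expected to write every output file
    signatures[key] = step_signature(script, inputs, outputs)
    return True
```

Changing any input file changes the hash, so the next call runs the step again instead of skipping it.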
The following workflow is a version of the previous workflow with `input` and `output` statements. Note, however, that `plot_20` does not define `input`, because a numerically-indexed step by default takes the `step_output` of its previous step (`plot_10` in this case) as its `step_input`.
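The default `step_input` rule can be modeled as a single pass over the ordered steps. This sketch assumes each step is a tuple `(name, input_files_or_None, output_files)`, where `None` means "no `input` statement". The helper `resolve_inputs` and the file names `data.txt`, `data.csv`, and `plot.png` are hypothetical.

```python
def resolve_inputs(steps):
    """Fill in step_input for numerically-indexed steps that omit `input`.

    `steps` is an ordered list of (name, inputs or None, outputs).
    A step without an input statement receives the step_output of the
    step before it, as described above.
    """
    resolved = []
    previous_output = []
    for name, inputs, outputs in steps:
        step_input = list(inputs) if inputs is not None else list(previous_output)
        resolved.append((name, step_input, list(outputs)))
        previous_output = outputs
    return resolved


workflow = [
    ("plot_10", ["data.txt"], ["data.csv"]),  # hypothetical file names
    ("plot_20", None, ["plot.png"]),          # no input statement
]
for name, step_input, step_output in resolve_inputs(workflow):
    print(name, step_input, "->", step_output)
```

Here `plot_20` ends up with `['data.csv']` as its input without declaring it.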
Conceptually speaking, process-oriented workflows are executed sequentially. When you design a workflow, you focus on the initial input files and how they are processed step by step. However, a complex workflow can have branches, and these branches can be executed in parallel if you specify the input and output of the steps.
For example, your workflow can have multiple starting points with different input files:
In this workflow, steps `10` and `20` are executed in parallel because they have different input files and do not depend on each other.
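To see why declared input and output make parallelism possible, the sketch below links each step to the steps that produce its input files. It then uses Python's `graphlib.TopologicalSorter` to group steps that can run at the same time. This is an illustration under the assumption that dependencies come only from matching file names. SoS's actual dependency rules are richer. `build_dag`, `parallel_batches`, and the file names are hypothetical.

```python
from graphlib import TopologicalSorter


def build_dag(steps):
    """Make each step depend on the steps whose outputs it consumes.

    `steps` is a list of (name, input_files, output_files).
    """
    producer = {}
    for name, _, outputs in steps:
        for f in outputs:
            producer[f] = name
    return {
        name: {producer[f] for f in inputs if f in producer}
        for name, inputs, _ in steps
    }


def parallel_batches(dag):
    """Group steps into batches whose members can run at the same time."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Two starting points with different (hypothetical) input files.
dag = build_dag([("10", ["a.txt"], ["a.out"]), ("20", ["b.txt"], ["b.out"])])
print(parallel_batches(dag))  # [['10', '20']]
```

If step `20` instead read `a.out`, the same code would yield `[['10'], ['20']]`, because step `20` would have to wait for step `10`.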
As a slightly more complex example, the following workflow has two longer branches, with step `20` executed after `10`, and `40` after `30`. More interestingly, because step `10` takes longer to execute, step `40` actually starts before step `20`. That is to say, although the workflow executes sequentially in concept, in reality the steps can be executed out of their numerical order.
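The out-of-order start can be reproduced with a small event simulation. This sketch assumes a greedy scheduler with two workers, in which a step starts as soon as its upstream step has finished. It also uses made-up durations: step `10` takes 5 time units and the others take 1. Neither the durations nor the scheduler come from SoS; `simulate` is a hypothetical helper.

```python
import heapq


def simulate(dag, durations, workers=2):
    """Return the start time of each step under a simple greedy scheduler.

    Hypothetical model: at most `workers` steps run at once, and a step
    starts as soon as all of its upstream steps have finished.
    """
    remaining = {step: set(up) for step, up in dag.items()}
    start, running, now = {}, [], 0.0
    while remaining or running:
        ready = sorted(s for s, up in remaining.items() if not up)
        for step in ready[: workers - len(running)]:
            del remaining[step]
            start[step] = now
            heapq.heappush(running, (now + durations[step], step))
        # Advance time to the next step that finishes.
        now, finished = heapq.heappop(running)
        for up in remaining.values():
            up.discard(finished)
    return start


# Two branches: 10 -> 20 and 30 -> 40, with a slow step 10 (assumed durations).
dag = {"10": set(), "20": {"10"}, "30": set(), "40": {"30"}}
durations = {"10": 5, "20": 1, "30": 1, "40": 1}
print(simulate(dag, durations))  # {'10': 0.0, '30': 0.0, '40': 1.0, '20': 5.0}
```

Step `40` starts at time 1, as soon as `30` finishes. Step `20` has to wait until the slow step `10` completes at time 5.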