- Difficulty level: intermediate
- Time needed to learn: 20 minutes or less
- Key points:
- A mixed-style workflow consists of both regular and auxiliary steps and can be triggered by a workflow name, by targets, or by both.
Process-oriented and outcome-oriented styles are only used to explain two typical camps of workflows. Technically speaking, SoS does not enforce any style; it constructs a workflow as follows:
- If a workflow is specified, SoS collects steps for the workflow and builds a DAG.
- If one or more targets are specified, SoS locates the steps that generate these targets.
- If any of the `input` or `depends` targets are missing, SoS extends the DAG according to step dependency rules.
This approach unifies the two styles and allows users to write workflows that combine both, which we call the mixed style.
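The rules above amount to building a graph of steps and then extending it wherever a required target is missing. The following is a toy Python model of that idea, not SoS's implementation. `extend_dag`, the step names `wf_10`, `wf_20`, `prep`, `fetch` and the file names are hypothetical. It relies on the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to derive an execution order.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def extend_dag(dag, requires, providers, exists):
    """Extend a DAG with the steps that generate missing targets (toy model).

    dag:       step -> set of steps it depends on (modified in place)
    requires:  step -> input/depends targets of that step
    providers: target -> step that can generate it
    exists:    callable returning True if a target is already present
    """
    pending = list(dag)
    while pending:
        step = pending.pop()
        for target in requires.get(step, []):
            if exists(target):
                continue                   # present targets need no extra step
            if target not in providers:
                raise LookupError(f"no step generates missing target {target!r}")
            aux = providers[target]
            dag[step].add(aux)
            if aux not in dag:             # the new step may need targets of its own
                dag[aux] = set()
                pending.append(aux)
    return dag


# Hypothetical steps: "wf_20" needs data.txt, made by "prep", which needs raw.txt from "fetch".
dag = {"wf_10": set(), "wf_20": {"wf_10"}}
extend_dag(dag,
           requires={"wf_20": ["data.txt"], "prep": ["raw.txt"]},
           providers={"data.txt": "prep", "raw.txt": "fetch"},
           exists=lambda target: False)
print(list(TopologicalSorter(dag).static_order()))
```

The topological order is not unique. The only guarantee is that every step comes after the steps it depends on. In this example, `fetch` precedes `prep`, which precedes `wf_20`.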
Auxiliary steps provide a mechanism to produce missing targets, and they can also be used in forward-style workflows. The resulting workflows have numbered "stem" steps plus an arbitrary number of auxiliary steps that provide the input and dependent files these steps require.
The following example demonstrates a nested workflow that combines two forward-style workflows with the help of two auxiliary steps.
In this example,
- A `default` step serves as the entry point of the workflow. It calls a nested workflow `align+call`, in which the `call` steps are executed after the `align` steps.
- `dbsnp.vcf` and `hg19.fa` are required by steps `align_10` and `call_10`. They are provided by two auxiliary steps, so these auxiliary steps are called only if `dbsnp.vcf` and/or `hg19.fa` are missing.
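To make the on-demand behavior of the auxiliary steps concrete, here is a toy Python model of this nested workflow. It does not reflect how SoS actually runs it. `run_nested` and the `REQUIRES`/`PROVIDER` tables are hypothetical. The model assumes that `align_10` needs `hg19.fa`, that `call_10` needs both files, and that each workflow has steps 10 and 20.

```python
import re

# Assumed requirements per step (illustrative only).
REQUIRES = {
    "align_10": ["hg19.fa"],
    "align_20": [],
    "call_10": ["dbsnp.vcf", "hg19.fa"],
    "call_20": [],
}
# Auxiliary steps and the file each one provides.
PROVIDER = {"hg19.fa": "refseq", "dbsnp.vcf": "dbsnp"}


def run_nested(spec, existing=()):
    """Expand a combined workflow such as 'align+call' into an execution order."""
    present = set(existing)
    order = []
    for wf in spec.split("+"):
        # numbered "stem" steps of this workflow, in index order
        stem = sorted((s for s in REQUIRES if re.fullmatch(rf"{re.escape(wf)}_\d+", s)),
                      key=lambda s: int(s.rsplit("_", 1)[1]))
        for step in stem:
            for f in REQUIRES[step]:
                if f not in present:           # auxiliary step only for missing files
                    order.append(PROVIDER[f])
                    present.add(f)
            order.append(step)
    return order


print(run_nested("align+call"))
# ['refseq', 'align_10', 'align_20', 'dbsnp', 'call_10', 'call_20']
print(run_nested("align+call", existing={"hg19.fa", "dbsnp.vcf"}))
# ['align_10', 'align_20', 'call_10', 'call_20']
```

When both reference files already exist, the stem steps run alone. Otherwise each auxiliary step is slotted in before the first step that needs its file.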
Here is another example of a mixed-style workflow, which executes a forward-style workflow to satisfy the dependencies of a makefile-style workflow.
The example is a bit complex:
- There is no `default` entry point. The workflow is triggered by `-t vcf`, where `vcf` is a `named_output` specified in step `call_20`.
- The output of this step is determined by a parameter `sample-name`, which is specified from the command line as `--sample-name KS1`.
- `call_20` is part of a forward-style workflow. Since it does not define any input, it depends on all its previous steps, which in this case is `call_10`.
- `call_10` depends on `dbsnp.vcf` and `hg19.fa`, which are generated by the auxiliary steps `refseq` and `dbsnp`.
- `call_10` also depends on a named output `bam`, which is translated to `KS1_sorted.bam` according to step `align_20`. `KS1_sorted.bam` does not exist, so `align_20` is needed, and for the same reason `align_10` has to be executed.
In the end we obtain the same DAG as in the previous example, although it is constructed in a very different way.
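The backward resolution described in the list above can be traced with another toy model. `make_steps` and `resolve` are hypothetical helpers, and the step table only encodes the dependencies listed in the prose. `sample_name` stands in for the value passed as `--sample-name`. The model assumes that no file exists unless it is listed in `existing`, and it ignores everything else SoS does, such as signatures and actual execution.

```python
def make_steps(sample_name):
    """Hypothetical step table for the example; sample_name comes from --sample-name."""
    return {
        # auxiliary steps that provide reference files on demand
        "refseq": {"provides": ["hg19.fa"]},
        "dbsnp": {"provides": ["dbsnp.vcf"]},
        # forward-style steps without input depend on their previous step ("prev")
        "align_10": {},
        "align_20": {"prev": "align_10", "named": {"bam": f"{sample_name}_sorted.bam"}},
        "call_10": {"depends": ["dbsnp.vcf", "hg19.fa"], "named_depends": ["bam"]},
        "call_20": {"prev": "call_10", "named": {"vcf": f"{sample_name}.vcf"}},
    }


def resolve(steps, target, existing=()):
    """Return the steps, in execution order, needed to produce a -t target."""
    named = {label: f for s in steps.values() for label, f in s.get("named", {}).items()}
    producer = {}
    for name, s in steps.items():
        for f in s.get("provides", []) + list(s.get("named", {}).values()):
            producer[f] = name
    order = []

    def need_file(f):
        if f not in existing:          # only missing targets extend the DAG
            visit(producer[f])         # KeyError if no step can generate f

    def visit(name):
        if name in order:
            return
        s = steps[name]
        if "prev" in s:
            visit(s["prev"])           # forward-style: depends on the previous step
        for f in s.get("depends", []):
            need_file(f)
        for label in s.get("named_depends", []):
            need_file(named[label])    # e.g. named output "bam" -> KS1_sorted.bam
        order.append(name)

    need_file(named.get(target, target))   # -t accepts a named output or a file name
    return order


print(resolve(make_steps("KS1"), "vcf"))
# ['dbsnp', 'refseq', 'align_10', 'align_20', 'call_10', 'call_20']
```

Starting from `vcf` and walking backward yields the same set of steps as the forward-driven example, which is the point made above.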
Nested workflows, which can construct and execute workflows in both process-oriented and outcome-oriented styles, allow even more flexible execution of SoS workflows. Here we simply use `sos_run` to demonstrate several ways to execute the same workflow.
This example has a forward-style workflow `process`, in which step `process_20` depends on an auxiliary step `download`.
- In the first example, with command-line equivalent `sos run myscript process`, the forward-style workflow `process` is executed, and `download` is executed as well because it is required by step 20.
- In the second example, `sos run myscript -t ms.pdf.gz`, the two auxiliary steps `download` and `gzip` are called to produce the target `ms.pdf.gz`.
- In the third example, `sos run myscript -t step20.out`, the `process` workflow is executed partially, until it generates the target `step20.out`.
- In the fourth example, `sos run myscript -t step20.out ms1.pdf.gz`, the `process` workflow is executed partially to produce the target `step20.out`, and two auxiliary steps are executed to produce the additional target `ms1.pdf.gz`.
- In the last example, `sos run myscript process -t ms1.pdf.gz`, SoS executes the workflow `process`, and because that workflow does not generate `ms1.pdf.gz`, it also runs the two auxiliary steps to produce it.
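The five invocations differ only in what SoS is asked to produce. The toy planner below mimics them under stated assumptions; it is not SoS's scheduler. Several details are hypothetical because the contents of `myscript` are not reproduced here:

- the outputs `step10.out`/`step30.out` and the extra step `process_30`;
- the patterns `{name}.pdf` for `download` and `{name}.pdf.gz` for `gzip`;
- `process_20` reaching `download` through the file `ms.pdf`.

```python
import re

# Hypothetical contents of "myscript": forward steps (name, output, extra dependency)
# and two pattern-based auxiliary steps (provides pattern, depends pattern).
FORWARD = [
    ("process_10", "step10.out", None),
    ("process_20", "step20.out", "ms.pdf"),   # assumed: needs download's ms.pdf
    ("process_30", "step30.out", None),
]
AUXILIARY = {
    "download": ("{name}.pdf", None),
    "gzip": ("{name}.pdf.gz", "{name}.pdf"),
}


def match_auxiliary(target):
    """Return (step, name) for the auxiliary step whose pattern matches target."""
    for step, (pattern, _) in AUXILIARY.items():
        prefix, suffix = pattern.split("{name}")
        m = re.fullmatch(re.escape(prefix) + "(.+)" + re.escape(suffix), target)
        if m:
            return step, m.group(1)
    raise LookupError(f"no step generates {target!r}")


def plan(workflow=None, targets=()):
    """Return (step, name) pairs in execution order for a workflow and/or -t targets."""
    order = []

    def add(item):
        if item not in order:
            order.append(item)

    def run_forward(last):
        # execute the forward-style workflow up to and including index `last`
        for step, _, dep in FORWARD[:last + 1]:
            if dep:
                need(dep)
            add((step, None))

    def need(target):
        for i, (_, output, _) in enumerate(FORWARD):
            if output == target:
                run_forward(i)             # partial execution of the workflow
                return
        step, name = match_auxiliary(target)
        dep = AUXILIARY[step][1]
        if dep:
            need(dep.format(name=name))    # e.g. gzip of ms1 first needs ms1.pdf
        add((step, name))

    if workflow is not None:
        if workflow != "process":
            raise ValueError(f"unknown workflow {workflow!r}")
        run_forward(len(FORWARD) - 1)
    for target in targets:
        need(target)
    return order


print(plan("process"))                              # ~ sos run myscript process
print(plan(targets=["ms.pdf.gz"]))                  # ~ sos run myscript -t ms.pdf.gz
print(plan(targets=["step20.out"]))                 # ~ sos run myscript -t step20.out
print(plan(targets=["step20.out", "ms1.pdf.gz"]))   # ~ sos run myscript -t step20.out ms1.pdf.gz
print(plan("process", targets=["ms1.pdf.gz"]))      # ~ sos run myscript process -t ms1.pdf.gz
```

Each pair is `(step, name)`, where `name` is the value matched by an auxiliary pattern and is `None` for stem steps. Partial execution falls out of the same rule: asking for `step20.out` runs the stem steps only up to the one that produces it.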