- Difficulty level: easy
- Time needed to learn: 10 minutes or less
- Key points:
- SoS extends Python 3.6+ and accepts any Python statement
- SoS uses section headers to define workflow steps
- SoS uses `parameter`, `input`, `output`, `depends`, and `task` statements to construct workflows
A SoS script defines one or more workflows, and each workflow consists of one or more steps.
SoS steps are defined by SoS sections. A SoS section consists of a header with one or more step names and optional options. The body of a SoS section consists of optional comments, statements, input, output, depends statements, parameter definitions, and external task definition.
The following terms will be used throughout this documentation:
- Script: A SoS script that defines one or more workflows.
- Section: A group of statements with a header that defines one or more SoS steps. A header can be ignored if it is the first or only section in a script.
- Workflow: A collection of steps that can be executed to complete a certain task.
- Step: A step of a workflow that performs one piece of the workflow.
- Target: Objects that are the input or result of a SoS step. These are usually files, but can also be objects such as other SoS steps and SoS variables.
- Step options: Options specified in section header to assist the definition of the workflow.
- Step input: Specifies the input targets of the step.
- Step output: Specifies the output targets of the step.
- Step dependencies: Specifies the targets that are required by the step.
- Substep: A substep consists of all statements after the `input` specification. It can be executed multiple times, each with a subset of input files and/or different parameters.
- Task: Part or all of a substep that will be executed outside of SoS, potentially on a different server. These are usually resource-intensive jobs that take a long time to complete.
- Action: SoS or user-defined Python functions. They differ from regular Python functions in that they may behave differently in different running modes of SoS (e.g. be ignored when executed in dryrun mode).
SoS is based on the Python 3 (3.6 and above) programming language. If you are unfamiliar with Python, you can learn some basics of Python, usually in less than half a day, by reading some Python tutorials (e.g. the official python tutorial). This short introduction is good enough for you to get started with SoS.
SoS adds the following syntax to standard Python syntax:
| Syntax | Example | Main Usage |
|---|---|---|
| Script format of function call | `R: expand=True`<br>`data <- read.csv("{_input}")` | Verbatim inclusion of scripts with optional indentation and string interpolation |
| Section specification | `[align_20]` | Define steps of workflows |
| SoS statements | | Direct execution of steps |
| | `parameter: cutoff=10` | Obtain option from command line or workflow caller |
| | `input: fastq_files` | Specify input files of a step |
| | `output: f"{_input}.idx"` | Specify output files of a step |
| | `depends: hg19_fa` | Specify step dependencies |
| | `task: queue='cluster'` | Specify external tasks |
These syntaxes are described in detail in the following sections.
A SoS script can be defined in a plain text file. A `.sos` suffix is recommended but not required. A SoS script consists of sections that define steps of one or more workflows.
A SoS script usually starts with the lines
#!/usr/bin/env sos-runner
#fileformat=SOS1.0
The first line allows the script to be run by the command `sos-runner` when it is executed as an executable script. The second line tells SoS the format of the script. The `#fileformat` line does not have to be the first or second line but should be in the first comment block. The latest version of the SoS format is assumed if no format line is present, so it is good practice to specify the file format version to make sure the script is interpreted correctly.
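The rule that the format line must sit in the first comment block can be illustrated with a small Python sketch. This is not how SoS reads scripts; `find_fileformat` is a hypothetical helper, and treating the first line that does not start with `#` as the end of the first comment block is an assumption made for illustration.

```python
def find_fileformat(script_text):
    """Return the value of a '#fileformat=' line found in the first
    comment block of a script, or None if there is no such line.

    Hypothetical helper for illustration only, not part of SoS.
    """
    prefix = "#fileformat="
    for line in script_text.splitlines():
        if not line.startswith("#"):
            # Assumption: the first comment block ends at the first
            # line that is not a comment (including a blank line).
            break
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


header = "#!/usr/bin/env sos-runner\n#fileformat=SOS1.0\n\n[10]\n"
print(find_fileformat(header))
```

A script without the line yields `None`, which corresponds to the case where the latest format version would be assumed.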
A SoS workflow can be embedded in a SoS Notebook, and consists of all SoS sections in SoS cells.
You can preview the embedded workflow with the magic `%preview --workflow` in a SoS Notebook, or convert a SoS Notebook to `.sos` format using the command
sos convert filename.ipynb filename.sos
Note that although the workflow defined in a previous section contains a default global section without header, the global section is not considered part of the embedded workflow.
Global sections can be defined without a section header in a `.sos` file as statements before any other section, or as a regular section with the header `[global]`. Global sections are the only sections that can appear multiple times in a SoS script.
Definitions in the global section are shared by all sections so it is usually used to define global variables and parameters. SoS implicitly defines the following variables in the global section:
- `SOS_VERSION`: version of the SoS interpreter.
- `CONFIG`: configurations read from site, hosts, global, local and user-specific configuration files. See configuration files for details.
A SoS section is marked by a section header in the format of
[names: options]
The header should start with a `[` at the beginning of a line and end with a `]`. It can contain one or more names with optional descriptions (one for each step) and section options (for all steps defined in the section).
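As a rough illustration of the `[names: options]` layout, the Python sketch below splits a header line into its step names and its raw option string. `split_header` is a hypothetical helper, not SoS's parser; it assumes the first `:` separates names from options, and it does not try to tell a header apart from an ordinary Python list literal at the start of a line.

```python
def split_header(line):
    """Split a section header such as "[human_10,mouse_10: shared='K']"
    into a list of step names and the raw option string.

    Hypothetical helper for illustration only, not part of SoS.
    """
    line = line.rstrip()
    # A header starts with '[' in the first column and ends with ']'.
    if not (line.startswith("[") and line.endswith("]")):
        raise ValueError(f"not a section header: {line!r}")
    body = line[1:-1]
    # Assumption: everything before the first ':' is the name list,
    # everything after it is the option string.
    names, _, options = body.partition(":")
    return [n.strip() for n in names.split(",") if n.strip()], options.strip()


print(split_header("[align_20]"))
print(split_header("[human_10,mouse_10: shared='K']"))
```

Splitting on the first colon keeps option values that themselves contain colons, such as a dictionary passed to `shared`, in one piece.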
Section names follow these rules:
| Format | Example | Usage |
|---|---|---|
| `name_index` | `human_10` | Defines step `index` of workflow `name`. Here `name` can be any name with alphanumeric characters, `-`, and `_`. `index` should be a non-negative number. |
| `name` | `update-website` | Section name without index is equivalent to `name_0` |
| `index` | `10` | Section name without workflow name is equivalent to `default_index` |
| `pattern_index` | `*_0`, `human*_10` | Equivalent to step `index` of all matching workflows defined in the script. The pattern should follow Unix filename matching. |
| `stepname (desc)` | `10 (align)` | Optional short description can be used to describe the goal of the step |
| `name1,name2,...` | `human_10,mouse_10` | Comma-separated names define multiple steps for one or more workflows |
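The naming rules in the table can be sketched in Python as follows. This is only an illustration of the rules as stated, not SoS's implementation: `parse_step_name` and `matching_workflows` are hypothetical helpers, and using `fnmatch.fnmatchcase` for the Unix-style pattern matching is an assumption.

```python
import re
from fnmatch import fnmatchcase


def parse_step_name(name):
    """Return (workflow, index) for a single section name.

    Hypothetical helper for illustration only, not part of SoS.
    """
    # Drop an optional trailing description such as ' (align)'.
    name = re.sub(r"\s*\(.*\)\s*$", "", name.strip())
    match = re.fullmatch(r"(?:(.*)_)?(\d+)", name)
    if match is None:
        # A bare name is equivalent to name_0.
        return name, 0
    workflow, index = match.groups()
    # A bare index belongs to the workflow named 'default'.
    return (workflow if workflow is not None else "default"), int(index)


def matching_workflows(pattern, workflows):
    """Expand a workflow pattern such as 'human*' against known names."""
    return [w for w in workflows if fnmatchcase(w, pattern)]


for header_name in ["human_10", "update-website", "10", "10 (align)", "human*_10"]:
    print(header_name, "->", parse_step_name(header_name))

print(matching_workflows("human*", ["human", "human_hg19", "mouse"]))
```

Under these assumptions, a pattern name such as `human*_10` parses to the workflow pattern `human*` and index 10, and the pattern is then expanded against the workflows defined in the script.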
A SoS step accepts the following options:
| Name | Example | Usage |
|---|---|---|
| `skip` | `skip`, `skip=False` | Always or conditionally skip a step; the value given to `skip` should evaluate to `True` or `False` |
| `shared` | `shared='K'`, `shared={'K': 'output[0]'}` | Variable that will be made available to the workflow after completion of the step |
| `provides` | `provides='filename'`, `provides=executable('fastqc')`, `provides='{filename}.bam.gz'`, `provides=['A1.txt', 'A2.txt']` | Targets that will be generated after the completion of the step. This option turns the step into an auxiliary step that will be executed when the provided target is needed. |
Please refer to section SoS Step for more details on these options.
SoS workflows consist of SoS steps. Please refer to section SoS Workflows for the definition of process- and outcome-oriented workflows in SoS.
Most comments in SoS scripts are significant in that they will be displayed as help messages of the script. In particular,
- The first comment block is the description of the script. This is where you introduce the purpose of the workflows.
- Comments immediately before section headers and `parameter:` definitions become the descriptions of the sections and parameters.
- Workflow, step, and parameter descriptions are displayed in the output of the `-h` option of the script.
For example, option `-h` of the `%run` magic displays the help message of the script. The same can be achieved on the command line with
sos run script.sos -h
sos run script.ipynb -h
sos-runner script.sos -h
or
script.sos -h
if you have `#!/usr/bin/env sos-runner` as the first line of the script and give `script.sos` executable permission.
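To make the comment rules above concrete, here is a minimal Python sketch that pulls a script description and per-section descriptions out of a script's text. It is not the code SoS uses to build its help message: `collect_descriptions` is a hypothetical helper, the sample script is invented for illustration, and skipping the `#!` and `#fileformat` lines when building the script description is an assumption.

```python
def collect_descriptions(script_text):
    """Return a dict with the script description under the key 'script'
    and, for each section header, the comment block right above it.

    Hypothetical helper for illustration only, not part of SoS.
    """
    lines = script_text.splitlines()
    descriptions = {}

    # Script description: the first comment block.
    first_block = []
    for line in lines:
        if not line.startswith("#"):
            break
        # Assumption: shebang and format lines are not part of the description.
        if line.startswith("#!") or line.startswith("#fileformat="):
            continue
        first_block.append(line.lstrip("#").strip())
    descriptions["script"] = " ".join(first_block)

    # Section descriptions: comment lines immediately before each header.
    for i, line in enumerate(lines):
        if line.startswith("[") and line.rstrip().endswith("]"):
            block = []
            j = i - 1
            while j >= 0 and lines[j].startswith("#"):
                block.insert(0, lines[j].lstrip("#").strip())
                j -= 1
            descriptions[line.strip()] = " ".join(block)
    return descriptions


SCRIPT = """\
#!/usr/bin/env sos-runner
#fileformat=SOS1.0
# Align reads and index the result.

# Build the index
[align_10]
print("indexing")

# Align the reads
[align_20]
print("aligning")
"""

for key, text in collect_descriptions(SCRIPT).items():
    print(f"{key}: {text}")
```

In this sample, the blank line after the first comment block keeps the script description separate from the description of the first section.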