Syntax and file formats

  • Difficulty level: easy
  • Time needed to learn: 10 minutes or less
  • Key points:
    • SoS is extended from Python 3.6+ and accepts any Python statement
    • SoS uses section headers to define workflow steps
    • SoS uses parameter, input, output, depends, and task statements to construct workflows

Terminology

A SoS script defines one or more workflows, and each workflow consists of one or more steps.

SoS steps are defined by SoS sections. A SoS section consists of a header with one or more step names and optional section options. The body of a SoS section consists of optional comments, statements, input, output, and depends statements, parameter definitions, and external task definitions.

The following terms will be used throughout this documentation:

  • Script: A SoS script that defines one or more workflows.
  • Section: A group of statements with a header that defines one or more SoS steps. The header can be omitted if the section is the first or only section in a script.
  • Workflow: A collection of steps that can be executed to complete a certain task.
  • Step: A step of a workflow that performs one piece of the workflow.
  • Target: Objects that are input or result of a SoS step, which are usually files, but can also be objects such as other SoS steps and SoS variables.
  • Step options: Options specified in section header to assist the definition of the workflow.
  • Step input: Specifies the input targets of the step.
  • Step output: Specifies the output targets of the step.
  • Step dependencies: Specifies the targets that are required by the step.
  • Substep: A substep consists of all statements after the input specification. It can be executed multiple times each with a subset of input files and/or different parameters.
  • Task: Part or all of a substep that will be executed outside of SoS, potentially on a different server. These are usually resource-intensive jobs that take a long time to complete.
  • Action: SoS or user-defined Python functions. They differ from regular Python functions in that they may behave differently in different running modes of SoS (e.g. be ignored when executed in dryrun mode).

SoS Syntax

SoS is based on the Python 3 (3.6 and above) programming language. If you are unfamiliar with Python, you can learn the basics, usually in less than half a day, by reading a Python tutorial (e.g. the official Python tutorial). Such a short introduction is enough to get you started with SoS.

SoS adds the following syntax to standard Python syntax:

| Syntax | Example | Main usage |
|---|---|---|
| Script format of function call | R: expand=True | Verbatim inclusion of scripts with optional indentation and string interpolation |
| | data <- read.csv("{_input}") | |
| Section specification | [align_20] | Define steps of workflows |
| SoS statements | | Direct execution of steps |
| | parameter: cutoff=10 | Obtain option from command line or workflow caller |
| | input: fastq_files | Specify input files of a step |
| | output: f"{_input}.idx" | Specify output files of a step |
| | depends: hg19_fa | Specify step dependencies |
| | task: queue='cluster' | Specify external tasks |

These syntaxes are described in detail in the following sections.

File formats

Native SoS file format

A SoS script can be defined in a plain text file. A .sos suffix is recommended but not required. A SoS script consists of sections that define steps of one or more workflows.

A SoS script usually starts with the lines

#!/usr/bin/env sos-runner
#fileformat=SOS1.0

The first line allows the script to be run by the command sos-runner when it is executed as an executable script. The second line tells SoS the format of the script. The #fileformat line does not have to be the first or second line, but it should be in the first comment block. If no format line is present, the latest version of the SoS format is assumed, so it is good practice to specify the file format version to make sure the script is interpreted correctly.
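The rule about where the format line may appear can be illustrated with a small Python sketch. This is not how SoS itself reads the line; the helper name detect_fileformat and the exact matching rule (a comment of the form #fileformat=VALUE) are assumptions made for illustration.

```python
import re


def detect_fileformat(text, default=None):
    """Return the value of a '#fileformat=' line in the first comment block.

    Hypothetical helper for illustration only, not part of SoS.
    """
    in_block = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            in_block = True
            match = re.match(r"#\s*fileformat\s*=\s*(\S+)", stripped)
            if match:
                return match.group(1)
        elif in_block or stripped:
            # The first comment block has ended, or the script starts with code.
            break
    return default


script = "#!/usr/bin/env sos-runner\n#fileformat=SOS1.0\n\n[10]\nprint('hi')\n"
print(detect_fileformat(script))
```

A format line that appears after the first comment block is ignored by this sketch, and the caller-supplied default is returned instead.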

Notebook format

A SoS workflow can be embedded in a SoS Notebook, and consists of all SoS sections in SoS cells.

You can preview the embedded workflow with the magic %preview --workflow in a SoS Notebook, or convert a SoS Notebook to .sos format using the command

sos convert filename.ipynb filename.sos

Note that although the workflow defined in a previous section contains a default global section without header, the global section is not considered part of the embedded workflow.

Sections

Global sections and default variables

Global sections can be defined without a section header in a .sos file, as statements before any other section, or as a regular section with the header [global]. Global sections are the only sections that can appear multiple times in a SoS script.

Definitions in the global section are shared by all sections so it is usually used to define global variables and parameters. SoS implicitly defines the following variables in the global section:

  • SOS_VERSION: version of SoS interpreter.
  • CONFIG: configurations read from site, hosts, global, local and user specific configuration files. See configuration files for details.

SoS Sections

A SoS section is marked by a section header in the format of

[names: options]

The header should start with a [ from the beginning of a line and end with a ]. It can contain one or more names with optional description (for each step) and section options (for all steps defined in the section).
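As a rough illustration of this layout, the following Python sketch splits a header line into its names and its raw option string. It is a simplified stand-in and not SoS's actual parser: the function name split_header is hypothetical, and the sketch assumes that the first colon separates names from options and that names are separated by commas.

```python
import re

# '[' at the start of the line, names up to the first ':', optional options, final ']'
HEADER = re.compile(r"^\[(?P<names>[^:\]]*)(?::(?P<options>.*))?\]\s*$")


def split_header(line):
    """Split '[names: options]' into (list of names, option string).

    Returns None if the line is not a header, e.g. because it is indented.
    """
    match = HEADER.match(line)
    if match is None:
        return None
    names = [n.strip() for n in match.group("names").split(",") if n.strip()]
    options = (match.group("options") or "").strip()
    return names, options


print(split_header("[human_10, mouse_10: shared='K']"))
print(split_header("[align_20]"))
```

The option string is returned unparsed; interpreting options such as skip or shared is left out of this sketch.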

Section names follow these rules:

| Format | Example | Usage |
|---|---|---|
| name_index | human_10 | Defines step index of workflow name. Here name can be any name with alpha-numeric characters and - and _. index should be a non-negative number. |
| name | update-website | Section name without index is equivalent to name_0 |
| index | 10 | Section name without workflow name is equivalent to default_index |
| pattern_index | *_0, human*_10 | Equivalent to step index of all matching workflows defined in the script. The pattern should follow Unix filename matching |
| stepname (desc) | 10 (align) | Optional short description can be used to describe the goal of the step |
| name1,name2,... | human_10,mouse_10 | Comma separated names define multiple steps for one or more workflows |
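The naming rules can be summarized in a short Python sketch that expands a section name into a (workflow, index) pair and matches wildcard names with Unix-style filename matching. The function names resolve_step and expand_pattern are hypothetical, and the sketch only mirrors the rules listed here; SoS's own implementation may differ.

```python
import re
from fnmatch import fnmatchcase


def resolve_step(name):
    """Expand a section name into a (workflow, index) pair."""
    name = re.sub(r"\s*\(.*\)\s*$", "", name.strip())  # drop "(description)"
    if name.isdigit():
        return "default", int(name)  # "10" is equivalent to default_10
    match = re.fullmatch(r"(.+)_(\d+)", name)
    if match:
        return match.group(1), int(match.group(2))  # "human_10"
    return name, 0  # "update-website" is equivalent to update-website_0


def expand_pattern(name, workflows):
    """Expand a wildcard name such as 'human*_10' over known workflow names."""
    pattern, index = resolve_step(name)
    return [(w, index) for w in workflows if fnmatchcase(w, pattern)]


print(resolve_step("human_10"))
print(resolve_step("10 (align)"))
print(expand_pattern("*_0", ["human", "mouse"]))
```

Comma-separated names would be handled by calling resolve_step on each name in turn.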

A SoS step accepts the following options:

| Name | Example | Usage |
|---|---|---|
| skip | skip | Always or conditionally skip a step, expr should be evaluated as True or False |
| | skip=False | |
| shared | shared='K' | Variable that will be made available to the workflow after completion of the step |
| | shared={'K': 'output[0]'} | |
| provides | provides='filename' | Targets that will be generated after the completion of the step. This option turns the step to an auxiliary step that will be executed when the provided target is needed. |
| | provides=executable('fastqc') | |
| | provides='{filename}.bam.gz' | |
| | provides=['A1.txt', 'A2.txt'] | |

Please refer to section SoS Step for more details on these options.

SoS workflows consist of SoS steps. Please refer to section SoS Workflows for the definition of process- and outcome-oriented workflows in SoS.

Comments and help messages

Most comments in SoS scripts are significant in that they will be displayed as help messages of the script. In particular,

  • The first comment block is the description of the script. This is where you introduce the purpose of the workflows.
  • Comments immediately before section headers and parameter: definitions become the descriptions of the sections and parameters.
  • Workflow, step, and parameter descriptions are displayed in the output of -h of the script.
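To make the "immediately before" rule concrete, here is a minimal Python sketch that pairs each parameter: definition with the comment lines directly above it. It is only an illustration of the rule, not SoS's help generator; the function name parameter_help and the sample script (including the form parameter: excel_file = str) are assumptions.

```python
def parameter_help(text):
    """Map each 'parameter:' name to the comment lines immediately above it."""
    descriptions = {}
    pending = []  # comment lines seen since the last non-comment line
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            pending.append(stripped.lstrip("#").strip())
            continue
        if stripped.startswith("parameter:"):
            name = stripped[len("parameter:"):].split("=", 1)[0].strip()
            descriptions[name] = " ".join(pending)
        # Any non-comment line, including a blank one, ends the comment block.
        pending = []
    return descriptions


# Hypothetical script fragment used only for this example.
sample = (
    "# input excel file\n"
    "parameter: excel_file = str\n"
    "\n"
    "# intermediate csv file\n"
    "parameter: csv_file = 'DEG.csv'\n"
    "parameter: figure_file = 'output.pdf'\n"
)
print(parameter_help(sample))
```

In this sketch a parameter without a comment directly above it gets an empty description.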

For example, option -h of the %run magic displays the help message of the script. The same can be achieved on the command line with

sos run script.sos -h
sos run script.ipynb -h
sos-runner script.sos -h

or

script.sos -h

if you have

#!/usr/bin/env sos-runner

as the first line of the script and give script.sos executable permission.

The help message of an example script looks like this:
usage: sos run /Users/bpeng1/sos/sos-docs/src/user_guide/.tmp_script_cfe4atzs.sos
               [workflow_name | -t targets] [options] [workflow_options]
  workflow_name:        Single or combined workflows defined in this script
  targets:              One or more targets to generate
  options:              Single-hyphen sos parameters (see "sos run -h" for details)
  workflow_options:     Double-hyphen workflow-specific parameters

This workflow converts input excel file
into a .csv file and plot fields log2FoldChange
again stat

Workflows:
  plot

Global Workflow Options:
  --excel-file VAL (as str, required)
                        input excel file
  --csv-file 'DEG.csv'
                        intermediate csv file
  --figure-file 'output.pdf'
                        output figure file

Sections
  plot_10:              Uses command xlsx2csv to convert excel file to csv
                        format
  plot_20:              Load data in csv format and plot log2FoldChange again
                        stat