- Difficulty level: intermediate
- Time needed to learn: 10 minutes or less
- Key points:
  - SoS actions are Python functions that usually start an interpreter to execute a script
  - Parameters of actions allow you to execute actions with additional parameters, control input and output, and execute in containers
Although arbitrary Python functions can be used in the process of a SoS step, SoS defines many special functions called actions that accept some shared parameters and can behave differently in different running modes of SoS.
For example, the command `sleep 5` would be executed in run mode. However, if the action is executed in dryrun mode (option `-n`), it will just print the script it is intended to execute.
Actions can have their own parameters, but they all accept a common set of options that define how they interact with SoS.
Action option `active` is used to activate or deactivate an action. It accepts either a condition that evaluates to a boolean value (`True` or `False`), or one or more integers or slices that correspond to indexes of active substeps.
The first usage allows you to execute an action only if a certain condition is met, so

    if cond:
        action(script)

is equivalent to

    action(script, active=cond)

or

    action: active=cond
        script

in script format. For example, an action can be restricted so that it is executed only if `a.txt` exists.
For the second usage, when a loop is defined by the `for_each` or `group_by` option of the `input:` statement, an action after `input` is repeated for each substep. The `active` parameter accepts an integer (either a non-negative index or a negative index that counts backward), a sequence of indexes, or a slice object, which identifies the substeps for which the action is active.
For example, in an input loop that iterates over a sequence of numbers, a first `run` action can be executed for all groups, a second action only for even-numbered groups, and a last action only for the last group.
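The selection rules for `active` can be illustrated in plain Python. This is only a sketch of the semantics described above, not SoS's implementation; the function name `is_active` and its arguments are hypothetical.

```python
def is_active(active, index, n_substeps):
    """Return True if an action should run for substep `index` (0-based)."""
    if isinstance(active, bool):
        # A condition applies to every substep alike (bool is checked before int).
        return active
    if isinstance(active, int):
        # A single index; negative values count backward from the last substep.
        return index == (active if active >= 0 else n_substeps + active)
    if isinstance(active, slice):
        # A slice selects a subset of the substep indexes.
        return index in range(n_substeps)[active]
    # Otherwise assume a sequence of indexes (or slices).
    return any(is_active(item, index, n_substeps) for item in active)


n = 5
print([i for i in range(n) if is_active(True, i, n)])                 # [0, 1, 2, 3, 4]
print([i for i in range(n) if is_active(slice(None, None, 2), i, n)])  # [0, 2, 4]
print([i for i in range(n) if is_active(-1, i, n)])                   # [4]
```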
Option `allow_error` tells SoS that the action might fail but that this should not stop the workflow from executing. This option essentially turns an error into a warning message and changes the return value of the action to `None`.
For example, a wrong shell script would normally stop the execution of the step, so any action after it is not executed.
With option `allow_error=True`, the error from the `sh` action is turned into a warning and the rest of the step continues to execute.
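The behavior of `allow_error` can be mimicked with a small wrapper around `subprocess`. This is a sketch under stated assumptions: `run_command` is a hypothetical helper, and the failing command is a Python one-liner chosen only so that the example runs anywhere.

```python
import subprocess
import sys
import warnings


def run_command(cmd, allow_error=False):
    """Run `cmd` (a list of arguments); return None if a failure is tolerated."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        message = f"command {cmd!r} failed with exit code {result.returncode}"
        if not allow_error:
            # Default behavior: the failure stops whatever comes next.
            raise RuntimeError(message)
        # Tolerated failure: downgrade to a warning and return None.
        warnings.warn(message)
        return None
    return result.returncode


failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(run_command(failing, allow_error=True))  # a warning is issued, then None is printed
```

Calling `run_command(failing)` without `allow_error=True` raises `RuntimeError`, so code placed after the call would not run.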
All script-executing actions accept an option `args`, which changes how the script is executed.
By default, such an action has an `interpreter` (e.g. `bash`) and a default `args='{filename:q}'`, and the script is executed as `interpreter args`, which is

    bash {filename:q}

where `{filename:q}` is replaced by the script file created from the body of the action.
If you would like to change the command line with additional parameters or a different format of the filename, you can specify an alternative `args` with variables `filename` (filename of the temporary script) and `script` (actual content of the script).
For example, you can pass command line options to a bash script using `args`, and you can even execute a command without `filename`, executing the script directly from the command line instead.
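How `interpreter` and `args` combine into a command line can be sketched with Python string formatting. This is an illustration only: the class `Quotable` and the function `build_command` are hypothetical, and the `q` format spec is modeled here as plain shell quoting, which is an assumption about what `:q` does.

```python
import shlex


class Quotable(str):
    """A string that accepts a `q` format spec, modeled here as shell quoting."""

    def __format__(self, spec):
        if spec == "q":
            return shlex.quote(str(self))
        return super().__format__(spec)


def build_command(interpreter, args, filename, script):
    """Expand `args` with `filename` and `script`, then prepend the interpreter."""
    expanded = args.format(filename=Quotable(filename), script=Quotable(script))
    return f"{interpreter} {expanded}"


# Default form: interpreter followed by the quoted script file.
print(build_command("bash", "{filename:q}", "my script.sh", "echo hi"))
# Extra command line options after the script file.
print(build_command("bash", "{filename:q} ARG1 ARG2", "/tmp/a.sh", "echo hi"))
# No script file at all: pass the script content on the command line.
print(build_command("bash", "-c {script:q}", "/tmp/a.sh", "echo hi"))
```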
Parameters `container` and `engine` specify the name or URL and the execution engine of the container used to execute the action. Parameter `engine` is usually derived from `container` but can be specified explicitly as one of
- `engine='docker'`: Execute the script in the specified container using docker
- `engine='singularity'`: Execute the script with singularity
- `engine='local'`: Execute the script locally; this is the default mode.
Parameters `container` and `engine` accept the following values:
`container` | `engine` | executed by | example | comment |
---|---|---|---|---|
`tag` | | docker | `container='ubuntu'` | docker is the default container engine |
`name` | `docker` | docker | `container='ubuntu', engine='docker'` | treat `name` as docker tag |
`docker://tag` | | docker | `container='docker://ubuntu'` | |
`filename.simg` | | singularity | `container='ubuntu.simg'` | |
`shub://tag` | | singularity | `container='shub://GodloveD/lolcow'` | Image will be pulled to a local image |
`library://tag` | | singularity | `container='library://GodloveD/lolcow'` | Image will be pulled to a local image |
`name` | `singularity` | singularity | `container='a_dir', engine='singularity'` | treat `name` as singularity image file or directory |
`docker://tag` | `singularity` | singularity | `container='docker://godlovdc/lolcow', engine='singularity'` | |
`file://filename` | | singularity | `container='file://ubuntu.simg'` | |
`local://name` | | local | `container='local://any_tag'` | `local://any_tag` is equivalent to `engine='local'` |
`name` | `local` | local | `engine=engine` with `parameter: engine='docker'` | Usually used to override parameter `container` |
Basically,
- `container='tag'` pulls and uses docker image `tag`
- `container='filename.simg'` uses an existing singularity image
- `container='shub://tag'` pulls and uses singularity image `shub://tag`, which will generate a local `tag.simg` file
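The rules for deriving the engine from `container` can be summarized as a small function. This is a sketch that only encodes the cases listed above; the name `resolve_engine` is hypothetical and SoS's own logic may handle more cases.

```python
def resolve_engine(container=None, engine=None):
    """Return 'docker', 'singularity', or 'local' for a container specification."""
    if engine is not None:
        # An explicit engine always wins over what the container name suggests.
        return engine
    if container is None or container.startswith("local://"):
        return "local"
    if container.startswith(("shub://", "library://", "file://")):
        return "singularity"
    if container.endswith(".simg"):
        return "singularity"
    # Plain tags and docker:// URLs default to docker.
    return "docker"


for spec in ["ubuntu", "docker://ubuntu", "ubuntu.simg", "shub://GodloveD/lolcow"]:
    print(spec, "->", resolve_engine(spec))
```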
If a docker image is specified, the action is assumed to be executed in the specified docker container. The image will be automatically downloaded (pulled) if it is not available locally.
For example, executing the following script

    [10]
    python3: container='python'
        set = {'a', 'b'}
        print(set)

under a docker terminal (that is connected to the docker daemon) will
- Pull docker image `python`, which is the official docker image for Python 2 and 3.
- Create a python script with the specified content
- Run the docker container `python` and make the script available inside the container
- Use the `python3` command inside the container to execute the script.
Additional `docker_run` parameters can be passed to actions when the action is executed in a docker image. These options include
- `name`: name of the container (option `--name`)
- `tty`: if a tty is attached (default to `True`, option `-t`)
- `stdin_open`: if stdin should be open (default to `False`, option `-i`)
- `user`: username (default to `root`, option `-u`)
- `environment`: Can be a string, a list of strings or a dictionary of environment variables for docker (option `-e`)
- `volumes`: shared volumes as a string or list of strings, in the format of `hostdir` (for `hostdir:hostdir`) or `hostdir:mnt_dir`, in addition to the current working directory, which will always be shared.
- `volumes_from`: container names or Ids to get volumes from
- `port`: port opened (option `-p`)
- `extra_args`: any extra arguments you would like to pass to the `docker run` process (after you check the actual `docker run` command of SoS)
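To make the mapping from these options to `docker run` flags concrete, here is a rough sketch that assembles an argument list. It is not the command SoS actually generates: the function `docker_run_args` is hypothetical, the `-v` flag for volumes is taken from the docker command line rather than from the list above, and the automatic sharing of the working directory is left out.

```python
def docker_run_args(image, name=None, tty=True, stdin_open=False, user="root",
                    environment=None, volumes=None, port=None):
    """Build a `docker run` argument list from the options described above."""
    args = ["docker", "run"]
    if name:
        args += ["--name", name]
    if tty:
        args.append("-t")
    if stdin_open:
        args.append("-i")
    if user:
        args += ["-u", user]
    # environment may be a string, a list of strings, or a dictionary.
    if isinstance(environment, dict):
        environment = [f"{key}={value}" for key, value in environment.items()]
    elif isinstance(environment, str):
        environment = [environment]
    for item in environment or []:
        args += ["-e", item]
    # volumes may be a string or a list; "hostdir" is short for "hostdir:hostdir".
    if isinstance(volumes, str):
        volumes = [volumes]
    for volume in volumes or []:
        args += ["-v", volume if ":" in volume else f"{volume}:{volume}"]
    if port:
        args += ["-p", str(port)]
    args.append(image)
    return args


print(" ".join(docker_run_args("python", name="demo",
                               environment={"DEBUG": "1"}, volumes="/data")))
```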
Because of the different configurations of docker images, use of docker in SoS can be complicated. Please refer to http://vatlab.github.io/doc/user_guide/docker.html for details.
Option `default_env` sets environment variables if they do not exist in the system. The value of this option should be a dictionary with string keys and values.
For example, if a process depends on an environment variable `DEBUG`, you can set a default value for it.
If users actually set `DEBUG` to something else, the option will not be applied and the shell script will run in production mode.
Option `env` sets environment variables that override system variables defined in `os.environ`. This option can be used to define `PATH` and other environment variables for the action. Note that the effect of this option is limited to this action.
Although all actions accept parameter `input`, its usage varies among actions. Roughly speaking, script-executing actions such as `run`, `bash` and `python` prepend the content of all input files to the script; report-generation actions `report`, `pandoc` and `RMarkdown` append the content of input files after the specified script; and other actions usually ignore this parameter.
For example, if you have defined a few utility functions that will be used by multiple scripts, you can define them in a separate file and include that file in `python` actions through `input`.
Note that although SoS would check the existence of `input` files before executing the action, this option does not define any variable (such as `_input`) to be used in the script.
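The prepend and append behaviors can be pictured as simple string concatenation. This sketch assumes that is all that happens; `compose_script` is a hypothetical helper and not part of SoS.

```python
from pathlib import Path


def compose_script(script, input_files=(), prepend=True):
    """Combine the content of `input_files` with `script`.

    prepend=True models script-executing actions; prepend=False models
    report-generation actions that append the files after the script.
    A missing file raises FileNotFoundError, similar to an existence check.
    """
    contents = "".join(Path(name).read_text() for name in input_files)
    return contents + script if prepend else script + contents
```

With a file `utils.py` that defines a function `helper`, `compose_script("print(helper())\n", ["utils.py"])` yields a script in which `helper` is defined before it is called.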
Similar to `input`, parameter `output` defines the output of an action, which can be a single name (or target) or a list of files or targets. SoS checks the existence of the output targets after the completion of the action.
Option `stdout` is applicable to script-executing actions such as `bash` and `R` and redirects the standard output of the action to the specified file. The value of the option should be a path-like object (`str`, `path`, etc.) or `False`. The file will be opened in `append` mode, so you will have to remove or truncate the file if it already exists. If `stdout=False`, the output will be suppressed (redirected to `/dev/null` under Linux).
Option `stderr` is similar to `stdout` but redirects the standard error output of actions. `stderr=False` also suppresses stderr.
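The redirection rules for `stdout` and `stderr` can be reproduced with `subprocess`. This is a minimal sketch of the described behavior (append mode for files, suppression for `False`); the helper `run_redirected` is hypothetical.

```python
import subprocess


def run_redirected(cmd, stdout=None, stderr=None):
    """Run `cmd` with optional redirection; return the exit code."""

    def open_target(target):
        if target is None:
            return None                # inherit the stream from the parent process
        if target is False:
            return subprocess.DEVNULL  # suppress the stream
        return open(target, "a")       # append, so earlier content is kept

    out, err = open_target(stdout), open_target(stderr)
    try:
        return subprocess.run(cmd, stdout=out, stderr=err).returncode
    finally:
        # Close only real file objects; None and DEVNULL have nothing to close.
        for handle in (out, err):
            if hasattr(handle, "close"):
                handle.close()
```

Running the same command twice with `stdout='out.txt'` leaves two copies of its output in `out.txt`, which is why an existing file has to be removed or truncated first.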
Actions are by default executed directly with their interpreters; for example, an `R` action will trigger a command `Rscript script_name`, where `script_name` is a temporary file with the content of the script.
You could execute the command in a template that is specified either directly with option `template`, or by name with option `template_name`.
Expansion of template
When a template is specified directly, it should be a string with the following variables that will be expanded before execution:
variable | value |
---|---|
`cmd` | the command being executed (e.g. `Rscript script_name`) |
`filename` | the script file (e.g. `script_name`) with type `sos_targets` |
`script` | the script that is being executed |
variable | any keyword argument |
For example, with a template `cat {filename}`, the action prints the content of the script instead of executing it.
In another example, a template can be used to calculate the time used to execute the shell script.
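Template expansion can be approximated with Python's `str.format`. This is a sketch of the idea, not SoS's expansion code: `expand_template` is a hypothetical function, the `time {cmd}` template is just one way a timing template could look, and `prefix` is a made-up keyword argument.

```python
def expand_template(template, cmd, filename, script, **kwargs):
    """Fill in cmd, filename, script, and any extra keyword arguments."""
    return template.format(cmd=cmd, filename=filename, script=script, **kwargs)


values = dict(cmd="Rscript script_name", filename="script_name", script="print(1)")

# Show the script instead of executing it.
print(expand_template("cat {filename}", **values))
# Wrap the original command to time its execution.
print(expand_template("time {cmd}", **values))
# Extra keyword arguments become additional template variables.
print(expand_template("{prefix} {cmd}", prefix="nice", **values))
```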
Pre-defined templates
If option `template_name` is specified, SoS will look into configuration files for a dictionary named `action_templates` for the template, and then for default templates provided by SoS.
For example, if we save templates `show_script` and `time_me` in a configuration file `myconfig.yml`, these templates can be used directly with option `template_name`.
To use the built-in template `conda`, you will need to provide option `env_name` as a keyword argument.
Templates are by default shell scripts (and batch scripts under Windows) and are executed as such. However, an arbitrary interpreter can be specified with a shebang line in the template. For example, a template can wrap the Python script directly to print its execution time. Note that the braces that are not interpolated by SoS are doubled in the Python f-string.
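The lookup order for named templates, and the reason for doubling braces, can both be shown in a few lines. Everything here is illustrative: `find_template` is a hypothetical function, the template bodies are invented for the example, and the configuration is a plain dictionary rather than a parsed `myconfig.yml`.

```python
def find_template(name, config, defaults):
    """Look in the user's `action_templates` first, then in the defaults."""
    user_templates = config.get("action_templates", {})
    if name in user_templates:
        return user_templates[name]
    if name in defaults:
        return defaults[name]
    raise KeyError(f"no template named {name!r}")


config = {"action_templates": {"show_script": "cat {filename}",
                               "time_me": "time {cmd}"}}
defaults = {"time_me": "echo default; {cmd}"}

# The user's definition shadows the default with the same name.
print(find_template("time_me", config, defaults))

# Doubled braces survive expansion as single braces, so a Python f-string
# inside a template can still refer to its own variable `elapsed`.
python_template = ("#!/usr/bin/env python3\n"
                   "print(f'{cmd} took {{elapsed:.1f}}s')")
print(python_template.format(cmd="sh script.sh"))
```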
If an action takes a long time to execute and the step it resides in tends to change (for example, during the development of a workflow step), you might want to keep action-level signatures so that the action can be skipped if it has been executed before.
Action-level signature is controlled by parameter `tracked`, which can be `None` (no signature), `True` (record signature), `False` (do not record signature), a string (filename), or a list of filenames. When this parameter is `True` or one or more filenames, SoS will
- if specified, collect targets specified by parameter `input`
- if specified, collect targets specified by parameter `output`
- if one or more files are specified, collect targets from parameter `tracked`

These files, together with the content of the first parameter (usually a script), will be used to create a step signature and allow actions with the same signature to be skipped.
For example, suppose action `sh` is a time-consuming action that produces output `test.txt`.
Because of the `tracked=True` parameter, a signature will be created with `output`, and the action will not be re-executed even when the step itself is changed (from `sleep(2)` to `sleep(1)`).
Note that the signature can only be saved and used with an appropriate signature mode (`force`, `default`, etc.).
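The idea behind an action-level signature can be sketched as a hash over the script and the tracked files. This is a simplified model, not SoS's signature format: `action_signature` and `run_if_changed` are hypothetical, signatures are kept in an in-memory set, and only files that already exist before the action runs are hashed (outputs produced by the action would need extra handling).

```python
import hashlib
from pathlib import Path


def action_signature(script, tracked_files=()):
    """Hash the script together with the names and contents of tracked files."""
    digest = hashlib.sha256(script.encode("utf-8"))
    for name in sorted(str(f) for f in tracked_files):
        digest.update(name.encode("utf-8"))
        digest.update(Path(name).read_bytes())
    return digest.hexdigest()


def run_if_changed(script, tracked_files, seen, run):
    """Call `run(script)` unless the same signature was recorded before."""
    signature = action_signature(script, tracked_files)
    if signature in seen:
        return False  # skipped: same script and same file contents
    run(script)
    seen.add(signature)
    return True
```

Note that the signature depends only on the script and the tracked files, so changes elsewhere in the surrounding step do not trigger a re-run.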
Option `workdir` changes the current working directory for the action and changes it back once the action is executed. The directory will be created if it does not exist.
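The same create, enter, and restore pattern can be written as a Python context manager. This is a generic sketch of the behavior described for `workdir`; the name `working_directory` is hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch to `path`, creating it if it does not exist."""
    previous = os.getcwd()
    os.makedirs(path, exist_ok=True)
    os.chdir(path)
    try:
        yield
    finally:
        # Always return to the original directory, even after an error.
        os.chdir(previous)
```

Code inside `with working_directory('results'):` runs in `results`, and the original directory is restored afterwards.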