- Difficulty level: easy
- Time needed to learn: 10 minutes or less
- Key points:
  - SoS reads multiple configuration files and merges the results
  - User configuration files can be specified with option `-c`
  - The content of configuration files is available through variable `CONFIG`
  - Host-specific paths can be accessed by `path(name, default)`
SoS reads configurations from
- A site configuration file `site_config.yml` under the sos package directory. This is where system administrators define system-wide configurations (e.g. host definitions) for all users.
- A host configuration file `~/.sos/hosts.yml` that defines properties of local and remote hosts.
- A global sos configuration file `~/.sos/config.yml` that defines other user-specific settings.
- An optional configuration file, specified by command line option `-c`, that defines workflow-specific settings.
The configuration files should be in YAML format or its subset, JSON. When a SoS script is loaded, SoS looks for and parses site and global configuration files and an optional user-specified configuration file. The results are used by SoS for the execution of workflows, and are available to the workflow as a global variable `CONFIG`.
All configurations from the aforementioned files are merged into a single dictionary. A dictionary could therefore contain keys defined in different configuration files, and a later file could overwrite keys defined in an earlier file. For example, if

- `{'A': {'B': 'old', 'C': 'old'}}` is defined in `~/.sos/config.yml` using

      A:
        B: old
        C: old

- `{'A': {'B': 'new', 'D': 'new'}}` is defined in `my_config.yml` using

      A:
        B: new
        D: new

then the final result using `-c my_config.yml` would be `{'A': {'B': 'new', 'C': 'old', 'D': 'new'}}`, as if a single configuration file with content

    A:
      B: new
      C: old
      D: new

is used. This is how site or global configurations are extended or overridden by user configurations.
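The merge behavior described above can be sketched in plain Python. This is a minimal illustration, not SoS's actual implementation: the function name `merge_config` is hypothetical, and the sketch assumes that nested dictionaries are merged key by key while any other value is simply replaced by the later file.

```python
from copy import deepcopy


def merge_config(base, override):
    """Return a new dict with `override` merged recursively into `base`.

    Hypothetical helper for illustration; not part of the SoS API.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dictionaries: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Otherwise the later file wins.
            merged[key] = deepcopy(value)
    return merged


global_config = {"A": {"B": "old", "C": "old"}}  # as in ~/.sos/config.yml
user_config = {"A": {"B": "new", "D": "new"}}    # as in my_config.yml

result = merge_config(global_config, user_config)
print(result)  # {'A': {'B': 'new', 'C': 'old', 'D': 'new'}}
```

Because the inputs are copied, the earlier dictionary is left untouched, which makes it easy to apply several files in sequence.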
A special key `based_on` will be processed after all configuration files are loaded. The value of `based_on` should be one or more keys to other dictionaries in the configuration (e.g. `hosts.cluster`). The consequence of this key is that the items from the referred dictionaries would be merged into the present dictionary if they do not exist in the present dictionary. This allows you to derive a dictionary from an existing one.
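One way to picture the effect of `based_on` is the following Python sketch. It is written under stated assumptions rather than taken from SoS: the helpers `lookup` and `resolve_based_on` and the host `cluster_long` are hypothetical, keys are assumed to be dot-separated as in `hosts.cluster`, and inherited values are shared with the base dictionary rather than copied.

```python
def lookup(config, dotted_key):
    """Follow a dotted key such as 'hosts.cluster' into nested dicts."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve_based_on(config, node=None):
    """Fill in missing items of every dict that carries a `based_on` key.

    Hypothetical helper for illustration; not part of the SoS API.
    """
    node = config if node is None else node
    bases = node.pop("based_on", [])
    if isinstance(bases, str):
        bases = [bases]  # allow a single key or a list of keys
    for dotted_key in bases:
        base = lookup(config, dotted_key)
        resolve_based_on(config, base)  # the base may itself be derived
        for key, value in base.items():
            node.setdefault(key, value)  # keys already present are kept
    for value in node.values():
        if isinstance(value, dict):
            resolve_based_on(config, value)
    return config


config = {
    "hosts": {
        "cluster": {"address": "domain.com", "queue": "medium"},
        # hypothetical host derived from hosts.cluster
        "cluster_long": {"based_on": "hosts.cluster", "queue": "long"},
    }
}
resolve_based_on(config)
print(config["hosts"]["cluster_long"])
```

Here `cluster_long` keeps its own `queue` and inherits `address` from `hosts.cluster`, which is the "merge only if missing" rule described above.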
Internally, SoS interpolates string values as if they are Python f-strings. That is to say, expressions inside `{ }` will be interpolated before they are used.
For example, let us assume that we have an incomplete host definition as follows:

    user_name: user
    hosts:
      desktop:
        paths:
          home: "{os.environ['HOME']}"
      cluster:
        address: "{user_name}@domain.com:{port}"
        port: 123
        queue: medium
        task_template: |
          #PBS -q {queue}

We can see that `hosts` -> `cluster` -> `address` and `task_template` have expressions in `{ }` that will be expanded as f-strings by SoS.
The f-strings will be expanded according to the following rules:
- Variables provided from the workflow or command line have the highest priority. For example, if `queue='long'` is specified as a runtime option of a task, variable `queue` will be expanded as `long`.

      %run -q cluster
      task: queue='long'
      ...

- Variables in the parent dictionary. In this example `port` would be used for `address`, and `queue` would be used for `task_template` if it is not defined from the workflow. That is to say, `queue: medium` provides a default value for variable `queue`.
- Variables in the root of the configuration dictionary. In this example `user_name` is defined and would be used for `address`. Because key `user_name` is frequently used in `hosts.yml`, SoS automatically defines `user_name` as the local user ID (all lower case) in `CONFIG` if it is not defined in any of the configuration files.

Note that module `os` is made available during string interpolation to allow expansion of environment variables from `os.environ`.
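The lookup order above can be imitated in a few lines of Python. This is only a sketch of the priority rules, not how SoS performs the expansion internally: the function `expand` is hypothetical, and building an f-string with `eval` and layering namespaces with `collections.ChainMap` are choices made for this illustration.

```python
import os
from collections import ChainMap


def expand(text, runtime, parent, root):
    """Expand `text` as an f-string (hypothetical helper, not SoS API).

    Names are looked up in `runtime` first, then in the parent
    dictionary, then in the root of the configuration. Module `os`
    is made available last.
    """
    namespace = ChainMap(runtime, parent, root, {"os": os})
    # Turn the raw text into an f-string literal and evaluate it.
    # This simple approach assumes the text does not mix both
    # quote styles inside one expression.
    return eval("f" + repr(text), {}, namespace)


root = {"user_name": "user"}
cluster = {"port": 123, "queue": "medium"}

# `port` comes from the parent dictionary, `user_name` from the root.
print(expand("{user_name}@domain.com:{port}", {}, cluster, root))

# A runtime option overrides the default `queue: medium`.
print(expand("#PBS -q {queue}", {"queue": "long"}, cluster, root))
```

Since `ChainMap` searches its mappings from left to right, the order of its arguments is what encodes the three priority rules.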
Putting this knowledge to use, let us create a configuration file with the `%save` magic.
This configuration file defines hosts named `host_r` and `host_r33` with address `localhost`. The `workflow_template` would be used if the host name is specified with option `-r`. Although the example is meant for a cluster system that loads the appropriate module with command `module load`, this example just uses `echo` to print the `module load` line to show how the `workflow_template` is expanded.
First, if we use host `host_r`, `R_version=3.1` will be used:
If we use host `host_r33`, `R_version=3.3` will be used to expand the `workflow_template` derived from `host_r`.
Finally, if we provide a value of `R_version` from the command line, it will override any existing values defined in the config file.
As shown above, the dictionary loaded from SoS configuration files is available to SoS workflows as variable `CONFIG`. This allows a workflow to retrieve settings from configuration files.
For example, a workflow could be defined as follows, which uses `Bob` as a default value for `manager`,
uses `Elena` when that value is given from the command line,
or, with the following configuration file,
uses default values from the configuration file.
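The order of precedence in these cases can be sketched in plain Python. The sketch assumes that `CONFIG` behaves like an ordinary dictionary; the function `pick_manager` and the configured value `Alice` are hypothetical and only stand in for the workflow and configuration file.

```python
def pick_manager(config, cli_value=None):
    """Choose a manager name (hypothetical helper, not SoS API).

    A value from the command line wins, then the value from the
    configuration, then the hard-coded default 'Bob'.
    """
    if cli_value is not None:
        return cli_value
    return config.get("manager", "Bob")


print(pick_manager({}))                            # hard-coded default
print(pick_manager({}, cli_value="Elena"))         # command line value
print(pick_manager({"manager": "Alice"}))          # hypothetical config value
```

Using `dict.get` with a default keeps the workflow usable even when no configuration file defines the key.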
path(name, default)
The `path` datatype of SoS is derived from `pathlib.Path`. One of the additions of this datatype is the parameters `name` and `default`, which return a pre-defined path defined in `CONFIG["hosts"][current-host]["paths"]`, where `current-host` is normally `localhost` but can be one of the remote hosts if the function is called from a remote host. A `default` value could be returned if `name` is not available in the configuration.
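The lookup that `path(name, default)` performs can be approximated as follows. This is a sketch under assumptions, not the SoS implementation: `lookup_path` is a hypothetical function, the configuration is passed in explicitly instead of being read from `CONFIG`, and raising `KeyError` when neither a configured path nor a default exists is a choice made for this example.

```python
from pathlib import Path


def lookup_path(config, name, default=None, host="localhost"):
    """Return the path called `name` for `host` (hypothetical helper).

    Looks in config["hosts"][host]["paths"] and falls back to
    `default` when the name is not defined there.
    """
    paths = config.get("hosts", {}).get(host, {}).get("paths", {})
    if name in paths:
        return Path(paths[name])
    if default is not None:
        return Path(default)
    raise KeyError(f"path {name!r} is not defined for host {host!r}")


# Illustrative configuration; only the `project` path is defined.
config = {
    "hosts": {
        "localhost": {"paths": {"project": "/Users/bpeng1/Documents"}}
    }
}

print(lookup_path(config, "project"))
print(lookup_path(config, "scratch", default="/tmp"))
```

Keeping the host as a parameter mirrors the idea that the same script can resolve different directories depending on where it runs.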
The `hosts` definitions in `~/.sos/hosts.yml` allow the definition of paths for different hosts. For clarity, let us define a local configuration file that points `localhost` to an `example_host` configuration.
Without worrying about the `localhost` part for now, this configuration file defines a few paths for the localhost. The `paths` could be retrieved using `path(name='project')` so that you can write your script in a host-independent way. For example, the following workflow uses `path(name='project')` to get the host-specific `project` directory, which is defined as `/Users/bpeng1/Documents` in `myconfig.yml`.
If you are uncertain whether `project` is defined for the current host, you can use `default` to specify a default value.