
Configuration files

  • Difficulty level: easy
  • Time needed to learn: 10 minutes or less
  • Key points:
    • SoS reads multiple configuration files and merges the results
    • User configuration files can be specified with option -c
    • The content of configuration files is available through the variable CONFIG
    • Host-specific paths can be accessed by path(name, default)

SoS configuration files

SoS reads configurations from

  • A site configuration file site_config.yml under the sos package directory. This is where system administrators define system-wide configurations (e.g. host definitions) for all users.
  • A host configuration file ~/.sos/hosts.yml that defines properties of local and remote hosts.
  • A global sos configuration file ~/.sos/config.yml that defines other user-specific settings.
  • An optional configuration file, specified with the command line option -c, that defines workflow-specific settings.

The configuration files should be in YAML format, or in JSON, which is a subset of YAML. When a SoS script is loaded, SoS looks for and parses the site, host, and global configuration files, plus the optional user-specified configuration file. SoS uses the results to execute workflows, and they are available to the workflow as the global variable CONFIG.

Merge of multiple configuration files

All configurations from the aforementioned files are merged into a single dictionary. The dictionary can therefore contain keys defined in different configuration files, and a later file can overwrite keys defined in an earlier one. For example, if

  • {'A': {'B': 'old', 'C': 'old'}} is defined in ~/.sos/config.yml using

    A:
        B: old
        C: old
    
  • {'A': {'B': 'new', 'D': 'new'}} is defined in my_config.yml using

    A:
        B: new
        D: new
    

then the final result using -c my_config.yml would be {'A': {'B': 'new', 'C': 'old', 'D': 'new'}}, as if a single configuration file with content

A:
    B: new
    C: old
    D: new

is used. This is how site or global configurations are extended or overridden by user configurations.
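The merge rule can be illustrated with a minimal Python sketch. It assumes that nested dictionaries are merged key by key and that any other value from a later file replaces the earlier one. merge_configs is a hypothetical helper that reproduces the example above; it is not SoS's own implementation.

```python
from copy import deepcopy


def merge_configs(*configs):
    """Merge configuration dictionaries in order; later ones override earlier ones.

    Nested dictionaries are merged key by key, so a key that appears only in an
    earlier file (such as 'C' below) survives. Other values are replaced.
    """
    result = {}
    for config in configs:
        _merge_into(result, config)
    return result


def _merge_into(target, source):
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides are dictionaries: merge them recursively.
            _merge_into(target[key], value)
        else:
            # Otherwise the later value wins (copied so the inputs are not mutated).
            target[key] = deepcopy(value)


global_config = {"A": {"B": "old", "C": "old"}}  # content of ~/.sos/config.yml
user_config = {"A": {"B": "new", "D": "new"}}    # content of my_config.yml, given with -c

merged = merge_configs(global_config, user_config)
print(merged)  # {'A': {'B': 'new', 'C': 'old', 'D': 'new'}}
```

The order of the arguments matters in this sketch. Passing the files in the order SoS reads them (site, host, global, then -c) lets user settings override site and global ones.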

Derived dictionary keys

A special key based_on is processed after all configuration files are loaded. Its value should be one or more keys to other dictionaries in the configuration (e.g. hosts.cluster). Items from the referenced dictionaries are merged into the present dictionary if they do not already exist in it. This allows you to derive a dictionary from an existing one. For example,

In [1]:
Cell content saved to my_config.yml, use option -r to also execute the cell.
In [2]:
{'description': 'Cluster', 'queue_type': 'pbs', 'address': 'domain.com'}
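The cell that defines this configuration is not reproduced here. As a rough illustration, the following Python sketch applies the based_on rule to a hypothetical configuration; the host names cluster and my_cluster are made up. It yields a dictionary with the same items as the output above. How multiple references take precedence, and whether the based_on key is kept, are assumptions of the sketch rather than documented SoS behavior.

```python
from copy import deepcopy


def resolve_based_on(config):
    """Return a copy of config in which every based_on key has been resolved.

    Assumptions of this sketch: based_on holds one dotted key (for example
    'hosts.cluster') or a list of them; items already present are never
    overwritten; earlier references win over later ones; chains of based_on
    are not followed; the based_on key itself is dropped from the result.
    """
    config = deepcopy(config)

    def lookup(dotted_key):
        node = config
        for part in dotted_key.split("."):
            node = node[part]
        return node

    def visit(node):
        if not isinstance(node, dict):
            return
        for value in node.values():
            visit(value)
        if "based_on" in node:
            refs = node.pop("based_on")
            if isinstance(refs, str):
                refs = [refs]
            for ref in refs:
                for key, value in lookup(ref).items():
                    if key != "based_on":
                        # Copy only the items the present dictionary lacks.
                        node.setdefault(key, value)

    visit(config)
    return config


# Hypothetical configuration, for illustration only.
config = {
    "hosts": {
        "cluster": {"description": "Cluster", "queue_type": "pbs"},
        "my_cluster": {"based_on": "hosts.cluster", "address": "domain.com"},
    }
}
resolved = resolve_based_on(config)
print(resolved["hosts"]["my_cluster"])
# {'address': 'domain.com', 'description': 'Cluster', 'queue_type': 'pbs'}
```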

String interpolation

Internally, SoS interpolates string values as if they were Python f-strings. That is to say, expressions inside { } are evaluated and substituted before the values are used.

For example, let us assume that we have an incomplete host definition as follows:

user_name: user
hosts:
  desktop:
    paths:
        home: "{os.environ['HOME']}"
  cluster:
    address: "{user_name}@domain.com:{port}"
    port: 123
    queue: medium
    task_template: |
        #PBS -q {queue}

Here hosts -> desktop -> paths -> home, hosts -> cluster -> address, and hosts -> cluster -> task_template contain expressions in { } that SoS expands as f-strings.
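A minimal Python sketch of this kind of expansion turns each value into an f-string literal and evaluates it. This mimics the behavior described here and is not necessarily how SoS implements it; expand_fstring is a hypothetical helper. The sketch also assumes that values from the cluster dictionary take precedence over those at the root.

```python
import os


def expand_fstring(text, namespace):
    """Expand {expressions} in text as if it were a Python f-string.

    The string is turned into an f-string literal with repr() and evaluated
    with eval(). Names come from namespace, and os is available as a global.
    Only use this on trusted configuration files, because the expressions are executed.
    """
    return eval("f" + repr(text), {"os": os}, dict(namespace))


root = {"user_name": "user"}
cluster = {
    "address": "{user_name}@domain.com:{port}",
    "port": 123,
    "queue": "medium",
    "task_template": "#PBS -q {queue}\n",  # a YAML block scalar (|) keeps the newline
}

# Values from the cluster dictionary override those at the root.
namespace = {**root, **cluster}
print(expand_fstring(cluster["address"], namespace))             # user@domain.com:123
print(repr(expand_fstring(cluster["task_template"], namespace)))  # '#PBS -q medium\n'
```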

The f-strings will be expanded according to the following rules:

  1. Variables provided by the workflow or from the command line have the highest priority. For example, if queue='long' is specified as a runtime option of a task, variable queue will be expanded as long.

    %run -q cluster
    task: queue='long'
    ...

  2. Variables in the parent dictionary. In this example port would be used for address, and queue would be used for task_template if it is not defined by the workflow. That is to say, queue: medium provides a default value for variable queue.

  3. Variables in the root of the configuration dictionary. In this example user_name is defined and would be used for address. Because the key user_name is frequently used in hosts.yml, SoS automatically defines user_name as the local user ID (all lower case) in CONFIG if it is not defined in any of the configuration files.

Note that the module os is available during string interpolation, which allows the expansion of environment variables from os.environ.
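The three priority rules map naturally onto a chain of lookups. The Python sketch below uses collections.ChainMap to search workflow variables first, then the parent dictionary, then the root, with os supplied as a global. It is an illustration under these assumptions, not SoS's actual code, and expand_with_priority is a hypothetical name.

```python
import os
from collections import ChainMap


def expand_with_priority(text, workflow_vars, parent, root):
    """Expand {expressions} in text following the three priority rules.

    ChainMap looks a name up in workflow_vars first, then in the parent
    dictionary, then in the root of the configuration. The os module is
    supplied as a global so that os.environ can be used in expressions.
    """
    namespace = dict(ChainMap(workflow_vars, parent, root))
    return eval("f" + repr(text), {"os": os}, namespace)


root = {"user_name": "user"}
cluster = {
    "address": "{user_name}@domain.com:{port}",
    "port": 123,
    "queue": "medium",
    "task_template": "#PBS -q {queue}",
}

# Rules 2 and 3: port comes from the parent, user_name from the root.
print(expand_with_priority(cluster["address"], {}, cluster, root))  # user@domain.com:123
# Rule 2: with no value from the workflow, queue: medium acts as the default.
print(expand_with_priority(cluster["task_template"], {}, cluster, root))  # #PBS -q medium
# Rule 1: queue='long' from the task or the command line takes priority.
print(expand_with_priority(cluster["task_template"], {"queue": "long"}, cluster, root))  # #PBS -q long
```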

Putting this knowledge to use, let us create a configuration file with the %save magic:

In [3]:
Cell content saved to my_config.yml, use option -r to also execute the cell.

This configuration file defines hosts named host_r and host_r33 with address localhost. The workflow_template is used if the host name is specified with option -r. The example is meant for a cluster system that loads the appropriate module with the command module load. Here, it simply echoes the module load line to show how workflow_template is expanded.

First, if we use host host_r, R_version=3.1 will be used:

In [4]:
module load R/3.1
Hello

If we use host host_r33, R_version=3.3 is used to expand the workflow_template that host_r33 derives from host_r.

In [5]:
module load R/3.3
Hello

Finally, if we provide a value for R_version from the command line, it overrides any values defined in the configuration file.

In [6]:
module load R/4.3
[#] 1 step processed (1 job completed)
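The cell that defines host_r and host_r33 is not shown on this page. The following Python sketch therefore uses a hypothetical configuration whose keys (address, R_version, workflow_template, based_on) follow the text, while its exact layout is an assumption. It reproduces the three module load lines above by combining based_on with command-line overrides. The helper names are illustrative only.

```python
from collections import ChainMap

# Hypothetical configuration: the real cell content is not shown on this page,
# so this layout is only an assumption consistent with the text.
config = {
    "hosts": {
        "host_r": {
            "address": "localhost",
            "R_version": "3.1",
            "workflow_template": "module load R/{R_version}",
        },
        "host_r33": {"based_on": "hosts.host_r", "R_version": "3.3"},
    }
}


def lookup(config, dotted_key):
    """Follow a dotted key such as 'hosts.host_r' into the configuration."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def host_settings(config, name):
    """Return the settings of a host, filling missing items from based_on."""
    host = dict(config["hosts"][name])
    base = host.pop("based_on", None)
    if base is not None:
        for key, value in lookup(config, base).items():
            host.setdefault(key, value)  # the derived host keeps its own values
    return host


def render_workflow_template(config, name, cli_vars=None):
    """Expand workflow_template; command-line values override host values."""
    host = host_settings(config, name)
    namespace = dict(ChainMap(cli_vars or {}, host))
    return eval("f" + repr(host["workflow_template"]), {}, namespace)


print(render_workflow_template(config, "host_r"))    # module load R/3.1
print(render_workflow_template(config, "host_r33"))  # module load R/3.3
print(render_workflow_template(config, "host_r33", {"R_version": "4.3"}))  # module load R/4.3
```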

Use of configuration files

Variable CONFIG

As shown above, the dictionary loaded from SoS configuration files is available to SoS workflow as variable CONFIG. This allows a workflow to retrieve settings from configuration files.

For example, the following workflow uses Bob as the default value for manager:

In [7]:
[#] 1 step processed (1 job completed)

It uses Elena if that value is specified from the command line:

In [8]:
[#] 1 step processed (1 job completed)

Or, with the following configuration file

In [9]:
Cell content saved to myconfig.yml, use option -r to also execute the cell.

it uses the default value from the configuration file:

In [10]:
Martin
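The workflow cells above are not reproduced on this page. The precedence they demonstrate can be sketched in plain Python, assuming the workflow uses CONFIG.get('manager', 'Bob') as the default of a command-line parameter, and assuming a configuration that sets manager to Martin, as the last output suggests. choose_manager is a hypothetical name, not part of SoS.

```python
def choose_manager(config, cli_manager=None):
    """Hypothetical helper mirroring the precedence shown above.

    A value from the command line wins; otherwise the manager key of the
    configuration is used; otherwise the built-in default Bob.
    """
    if cli_manager is not None:
        return cli_manager
    return config.get("manager", "Bob")


print(choose_manager({}))                       # Bob
print(choose_manager({}, cli_manager="Elena"))  # Elena
print(choose_manager({"manager": "Martin"}))    # Martin
```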

Host-dependent paths

The host definitions in ~/.sos/hosts.yml allow the definition of paths for different hosts. For clarity, let us define a local configuration file that points localhost to an example_host configuration.

In [11]:
Cell content saved to myconfig.yml, use option -r to also execute the cell.

Leaving the localhost part aside for now, this configuration file defines a few paths for localhost. The paths can be retrieved using path(name='project'), so you can write your script in a host-independent way. For example, the following workflow uses path(name='project') to get the host-specific project directory, which is defined as /Users/bpeng1/Documents in myconfig.yml.

In [12]:
Working on /Users/bpeng/vatlab/sos-docs/src/user_guide

If you are uncertain whether project is defined for the current host, you can use default to specify a default value:

In [13]:
Working on /Users/bpeng/vatlab/sos-docs/src/user_guide
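Inside a workflow, SoS's own path(name=..., default=...) performs this lookup. The Python sketch below only imitates the behavior described in this section. host_path, the top-level localhost alias key, and the example paths are assumptions for illustration and are not taken from SoS.

```python
from pathlib import Path

_MISSING = object()  # sentinel: distinguishes "no default" from default=None


def host_path(config, host, name, default=_MISSING):
    """Hypothetical stand-in for a host-specific path(name, default) lookup."""
    paths = config.get("hosts", {}).get(host, {}).get("paths", {})
    if name in paths:
        return Path(paths[name])
    if default is not _MISSING:
        return Path(default)
    raise ValueError(f"No path named {name!r} is defined for host {host!r}")


config = {
    "localhost": "example_host",  # assumed: localhost is an alias for example_host
    "hosts": {"example_host": {"paths": {"project": "/home/user/project"}}},
}
host = config["localhost"]
print(host_path(config, host, "project"))                  # /home/user/project
print(host_path(config, host, "scratch", default="/tmp"))  # /tmp
```

Raising an error when neither the path nor a default is available mirrors the advice above: supply default whenever a path may not be defined for the current host.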