
Sharing variables across steps

  • Difficulty level: intermediate
  • Time needed to learn: 20 minutes or less
  • Key points:
    • Variables defined in steps are not accessible from other steps
    • Variables can be shared with steps that depend on them through the target sos_variable

Section option shared

SoS executes each step in a separate process and by default does not return any result to the master SoS process. Option shared is used to share variables between steps. This option accepts:

  • A string (variable name),
  • A map between variable names and expressions (strings) that will be evaluated upon the completion of the step, or
  • A sequence of strings (variables) or maps.

For example,

In [1]:
100
In [2]:
100
200

The dict format of the shared option specifies expressions that are evaluated after the step completes, and can be used to pass pieces of step_output as follows:

In [3]:
a.res
a.txt
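
The three accepted forms of shared can be modeled in plain Python. The sketch below is not SoS code and does not reflect how SoS implements the option; resolve_shared is a hypothetical helper, and the variable names (v1, v2, res, stat) and the list used for step_output are assumptions chosen to mirror the outputs shown above.

```python
from collections.abc import Mapping


def resolve_shared(shared, namespace):
    """Return the variables a step would hand back for a given `shared` spec.

    Hypothetical helper that models the three accepted forms; it is an
    illustration, not the SoS implementation.
    """
    if isinstance(shared, str):
        # A single variable name: look it up in the step's namespace.
        return {shared: namespace[shared]}
    if isinstance(shared, Mapping):
        # A map of name -> expression string, evaluated after the step ends.
        return {name: eval(expr, {}, dict(namespace))
                for name, expr in shared.items()}
    # Otherwise a sequence of strings and/or maps: merge the items in order.
    result = {}
    for item in shared:
        result.update(resolve_shared(item, namespace))
    return result


# Assumed state of a step's namespace when the step completes.
step_namespace = {"v1": 100, "v2": 200, "step_output": ["a.res", "a.txt"]}

single = resolve_shared("v1", step_namespace)
several = resolve_shared(["v1", "v2"], step_namespace)
mapped = resolve_shared(
    {"res": "step_output[0]", "stat": "step_output[1]"}, step_namespace
)
print(single, several, mapped)
```

The map form is the only one that evaluates expressions, which is why it can extract pieces of a larger object such as step_output rather than sharing the object whole.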

sos_variable targets

When we share variables from a step, the variables are available to the steps executed after it. This is why res and stat are accessible from step 20 after the completion of step 10. In a more general case, however, a step needs to depend on a sos_variable target to access the shared variable in a non-forward-style workflow.

For example, in the following workflow, two sos_variable targets create dependencies on steps notebookCount and lineCount, so these two steps are executed before default and provide the required variables.

In [4]:
There are 94 notebooks in this directory
Current notebook has 632 lines
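
The way a variable dependency pulls in the step that provides it can be sketched as a small dependency resolution in plain Python. This is a model under stated assumptions, not SoS code: execution_order is a hypothetical helper, the shared variable names numNotebooks and lineOfThisNotebook are invented for illustration, and the sketch does not detect circular dependencies.

```python
def execution_order(steps, target):
    """Order steps so that providers of required variables run first.

    `steps` maps a step name to {"shared": [...], "needs": [...]}, where
    "needs" lists variables requested through sos_variable-style targets.
    Hypothetical model only; no cycle detection.
    """
    # Map each shared variable to the step that provides it.
    providers = {var: name
                 for name, spec in steps.items()
                 for var in spec["shared"]}
    order = []

    def visit(name):
        if name in order:
            return
        # Run every provider of a needed variable before this step.
        for var in steps[name]["needs"]:
            visit(providers[var])
        order.append(name)

    visit(target)
    return order


steps = {
    "notebookCount": {"shared": ["numNotebooks"], "needs": []},
    "lineCount": {"shared": ["lineOfThisNotebook"], "needs": []},
    "default": {"shared": [], "needs": ["numNotebooks", "lineOfThisNotebook"]},
}
order = execution_order(steps, "default")
print(order)
```

In this model the default step comes last because both variables it needs are provided by other steps, which matches the ordering described for the workflow above.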

Sharing variables from substeps

When you share a variable from a step with multiple substeps, there can be multiple copies of the variable, one for each substep, and it is uncertain which copy SoS will return. The current implementation returns the variable from the last substep, but this is not guaranteed.

For example, in the following workflow multiple random seeds are generated, but only the last seed is shared outside of step 1 and obtained by step 2.

In [5]:
50
606
267
52
701
Got seed 701 at step 2
Got seed 701 at step 2
Got seed 701 at step 2
Got seed 701 at step 2
Got seed 701 at step 2

If you would like to obtain the variable from all substeps, you can prefix the variable name with step_, which is a convention for option shared to collect variables from all substeps.

In [6]:
17
114
688
99
253

You can also use the step_* variables in expressions, as in the following example:

In [7]:
[5, 2, 5, 2, 7, 10, 5, 0, 2, 2]
40

Here we used group_by='all' to collapse multiple substeps into one.
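
The difference between sharing a plain variable and its step_ counterpart can be modeled in plain Python. The sketch below is an illustration rather than SoS code: collect_shared is a hypothetical helper, the names seed and rng are assumptions, and the values are taken from the outputs shown above. The "last substep wins" branch reflects the current behavior described earlier, which is not guaranteed.

```python
def collect_shared(substep_values, name):
    """Model what the next step receives for `name` or `step_<name>`.

    `substep_values` holds the value the variable took in each substep,
    in substep order. Hypothetical helper, not the SoS implementation.
    """
    if name.startswith("step_"):
        # step_ prefix: one entry per substep, in substep order.
        return list(substep_values)
    # Plain name: currently the value from the last substep is returned.
    return substep_values[-1]


seeds = [50, 606, 267, 52, 701]
last_seed = collect_shared(seeds, "seed")
all_seeds = collect_shared(seeds, "step_seed")

# A step_* variable can also feed an expression such as a sum.
rng = [5, 2, 5, 2, 7, 10, 5, 0, 2, 2]
total = sum(collect_shared(rng, "step_rng"))
print(last_seed, all_seeds, total)
```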

Sharing variables from tasks

Variables generated by external tasks add another layer of complexity because tasks usually do not share variables with the substep they belong to. To solve this problem, use the shared option of task to return the variable to the substep:

In [8]:
[0, 7, 2, 23, 24]
56