- Difficulty level: easy
- Time needed to learn: 10 minutes or less
- Key points:
  - Runtime signatures avoid repeated execution of steps
  - Option `-s` controls the behavior of signatures
One of the most annoying problems with the development and execution of workflows is that they can take a very long time to execute. What makes things worse is that we frequently need to re-run the workflow with different parameters and even different tools. Re-executing the whole workflow repeatedly is time-consuming, but repeating only selected steps of a workflow is error-prone.
SoS addresses this problem by using runtime signatures to keep track of execution units, namely the input, output, and dependent targets, and related SoS variables of a piece of workflow. SoS tracks execution of statements at the step level for each substep and saves runtime signatures in a folder called `.sos` under the project directory.
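To make the idea of a runtime signature concrete, here is a minimal Python sketch. It is not how SoS stores signatures: the record layout and the helper names `file_digest` and `step_signature` are hypothetical, chosen only to show that a signature can combine the statements, the file targets, and the relevant variables of a substep.

```python
import hashlib
import json
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of a file's content."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def step_signature(statements, inputs, outputs, variables):
    """Build a toy signature record for one substep (hypothetical layout)."""
    return {
        "statements": hashlib.sha256(statements.encode()).hexdigest(),
        "inputs": {p: file_digest(p) for p in inputs},
        "outputs": {p: file_digest(p) for p in outputs},
        "variables": json.dumps(variables, sort_keys=True),
    }


# Create one output file to stand in for the result of a step.
workdir = tempfile.mkdtemp()
out = os.path.join(workdir, "result.txt")
with open(out, "w") as fh:
    fh.write("a" * 1000)

sig_a = step_signature("write(size)", [], [out], {"size": 1000})
sig_b = step_signature("write(size)", [], [out], {"size": 1000})
sig_c = step_signature("write(size)", [], [out], {"size": 2000})

print(sig_a == sig_b)  # True: nothing changed, so the step could be skipped
print(sig_a == sig_c)  # False: a variable changed, so the step must run again
```

In this toy model, a matching record means the step can be skipped, and any difference in statements, files, or variables invalidates it.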
Before running any examples, let us clear all runtime signatures of workflows executed under the current directory.
SoS workflows can be executed in batch mode and in interactive mode using the SoS kernel in Jupyter notebook or qtconsole. Because the SoS kernel is mostly used to execute short statements in SoS and other kernels, runtime signatures are by default set to `ignore` in interactive mode (and to `default` in batch mode).
A consequence of this setting is that scratch steps will always be executed.
When you execute workflows with magics `%run` and `%sosrun`, you are running workflows in separate processes and the default mode is `default`. In this mode, signatures are created and validated, and steps that have already been executed will not be re-executed.
Let us create a workflow that saves two files, `temp/result.txt` and `temp/size.txt`, with the content of the files controlled by parameter `size`.
When the workflow is first executed, both steps will be executed:
Now, if we re-run the last script, nothing changes and the script takes very little time to execute.
However, if you use a different parameter (not the default `size=1000`), the steps will be rerun.
Signatures are kept at the step level, so if you change the second step of the script, the first step will still be skipped. Note that a step is independent of the script in which it is executed, so a step will be skipped even if its signature was saved by the execution of another workflow. Signatures also tolerate minor changes such as the addition of spaces and comments.
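One way a signature can tolerate added spaces and comments is to hash a parsed form of the code rather than its raw text. The sketch below uses Python's standard `ast` module for this. It is an illustration of the idea only, and `code_fingerprint` is a hypothetical helper, not the mechanism SoS uses.

```python
import ast
import hashlib


def code_fingerprint(source):
    """Hash the syntax tree so that comments and spacing do not matter."""
    tree = ast.parse(source)
    # ast.dump omits line and column positions by default.
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()


original = "size = 1000\ndata = 'a' * size\n"
reformatted = (
    "# build the payload\n"
    "size   =   1000\n"
    "\n"
    "data = 'a' * size  # repeat the letter\n"
)
changed = "size = 2000\ndata = 'a' * size\n"

same = code_fingerprint(original) == code_fingerprint(reformatted)
differs = code_fingerprint(original) != code_fingerprint(changed)
print(same, differs)  # True True
```

Cosmetic edits leave the fingerprint untouched, while a change to a value produces a different fingerprint and would trigger re-execution in this toy model.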
The `assert` mode is used to detect if anything has been changed after the execution of a workflow. For example, let us execute the workflow without parameters, and the signature check would succeed.
If we execute the workflow with another parameter, signature checking would fail because the last signature was saved with option `--size 3000`.

Signature checking would be fine when the workflow is checked with that parameter.
Now, if you change one of the output files, SoS will fail with an error message because `temp/result.txt` has been changed.
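The kind of check performed in `assert` mode can be pictured with a small Python sketch that compares the current content of an output file against a saved digest. The function `assert_unchanged` and the in-memory `saved` dictionary are hypothetical stand-ins for illustration, not SoS internals.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of a file's content."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def assert_unchanged(saved, paths):
    """Raise RuntimeError if any file differs from its saved digest."""
    for path in paths:
        if file_digest(path) != saved[path]:
            raise RuntimeError(f"{path} has been changed")


# Write an output file and record its digest, as a finished step would.
workdir = tempfile.mkdtemp()
result = os.path.join(workdir, "result.txt")
with open(result, "w") as fh:
    fh.write("a" * 1000)
saved = {result: file_digest(result)}

assert_unchanged(saved, [result])  # passes: file matches its saved digest

# Modify the output file after the fact, then check again.
with open(result, "a") as fh:
    fh.write("tampered")
try:
    assert_unchanged(saved, [result])
    detected = False
except RuntimeError as err:
    detected = True
    print(err)
```

The second check fails because the file content no longer matches what was recorded when the step finished.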
The `force` signature mode ignores existing signatures, re-runs the workflow, and saves new signatures. This is needed when you would like to forcefully re-run all the steps to generate another set of output because the outcome of some steps is random, or to re-run the workflow because of changes that are not tracked by SoS, for example after you have installed a new version of a program.
The `build` mode is somewhat the opposite of the `force` mode in that it creates signatures from existing output files, overwriting existing signatures if there are any. It is useful, for example, if you are adding a step to a workflow that you have tested outside of SoS (without signatures) but do not want to rerun it, or if for some reason you have lost your signature files and would like to reconstruct them from existing outputs.
This mode can introduce erroneous files to the signatures because it does not check the validity of the incorporated files. For example, SoS would not complain if you change the parameter and replace `temp/result.txt` with something else.
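The differences between the signature modes can be summarized in a toy Python model. Everything here is an assumption made for illustration: `run_step`, the dictionary `store`, and the string signatures are hypothetical, and the `ignore` branch assumes that no signature is written in that mode. It is a sketch of the behavior described above, not SoS code.

```python
def run_step(mode, store, key, current_sig, execute):
    """Handle one step under a signature mode and return the action taken."""
    matched = store.get(key) == current_sig
    if mode == "ignore":
        execute()  # assumption: signatures are neither checked nor saved
        return "executed"
    if mode == "default":
        if matched:
            return "skipped"
        execute()
        store[key] = current_sig
        return "executed"
    if mode == "force":
        execute()  # existing signature is disregarded, then replaced
        store[key] = current_sig
        return "executed"
    if mode == "build":
        store[key] = current_sig  # trust existing outputs without running
        return "signature built"
    if mode == "assert":
        if not matched:
            raise RuntimeError(f"signature mismatch for {key}")
        return "validated"
    raise ValueError(f"unknown signature mode: {mode}")


store, runs = {}, []
actions = [
    run_step(mode, store, "step_10", "sig-size-1000", lambda: runs.append(1))
    for mode in ("default", "default", "force", "assert")
]
print(actions)    # ['executed', 'skipped', 'executed', 'validated']
print(len(runs))  # 2

# build records a signature without executing anything
rebuilt, rebuilt_runs = {}, []
built = run_step("build", rebuilt, "step_10", "sig-size-1000",
                 lambda: rebuilt_runs.append(1))
print(built, rebuilt_runs)  # signature built []
```

Because the `build` branch never validates the outputs it records, the sketch also shows why that mode can register files that no longer correspond to the current parameters.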