- Difficulty level: easy
- Time needed to learn: 10 minutes or less
- Key points:
  - Option `-e` specifies how SoS handles runtime errors.
  - `-e default` terminates the current step (and branch), but allows other branches to complete.
  - `-e ignore` ignores errors and allows the current and other branches to complete.
  - `-e abort` terminates the current and all running steps immediately.
Runtime errors happen from time to time. Depending on the nature of the errors, you can terminate the entire workflow abruptly, terminate it gracefully, or ignore all errors.
Let us assume that an error happens in a substep of a step, and we need to decide:
- Should running steps or substeps be terminated immediately?
- Should the rest of the substeps of the failing step be executed if they have not been submitted?
- Should the unaffected branches of the DAG be executed while allowing the branch with the failed step to terminate?
- Should SoS try to execute the steps after the failed step?
The answers to these questions are controlled by the following error modes, specified with option `-e` to command `sos run` (or magics `%run` etc. in SoS Notebook):
| mode | running substeps | pending substeps | following steps | unaffected branches | exit status |
|---|---|---|---|---|---|
| `default` | allow complete | allow complete | canceled | allow complete | failed |
| `ignore` | allow complete | allow complete | allow complete | allow complete | success |
| `abort` | aborted | canceled | canceled | canceled | failed |
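The table can also be read as a simple lookup from mode to outcome. The sketch below restates it in plain Python; it is not part of the SoS API, and the names `ErrorModeOutcome`, `ERROR_MODES`, and `describe` are hypothetical helpers introduced only for illustration.

```python
from typing import NamedTuple


class ErrorModeOutcome(NamedTuple):
    """One row of the error-mode table (illustrative, not an SoS type)."""
    running_substeps: str
    pending_substeps: str
    following_steps: str
    unaffected_branches: str
    exit_status: str


# Values copied from the table above, keyed by the argument given to -e.
ERROR_MODES = {
    "default": ErrorModeOutcome(
        "allow complete", "allow complete", "canceled", "allow complete", "failed"
    ),
    "ignore": ErrorModeOutcome(
        "allow complete", "allow complete", "allow complete", "allow complete", "success"
    ),
    "abort": ErrorModeOutcome(
        "aborted", "canceled", "canceled", "canceled", "failed"
    ),
}


def describe(mode: str) -> str:
    """Return a one-line summary of what happens under the given mode."""
    outcome = ERROR_MODES[mode]
    return ", ".join(
        f"{field.replace('_', ' ')}: {value}"
        for field, value in outcome._asdict().items()
    )


for mode in ERROR_MODES:
    print(f"-e {mode} -> {describe(mode)}")
```

Reading the rows side by side shows that `default` and `ignore` differ only in what happens to the following steps and in the exit status, while `abort` is the only mode that touches substeps that are already running.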
Let us use the following example workflow to demonstrate the different modes. In this workflow,
- Step `10` has three substeps that are executed in parallel for 2 seconds. The second substep will generate an error at the end of the step.
- Step `20` follows step `10` and will execute three substeps for 2 seconds.
- Step `30` has `input: None`, so it will start at the same time as step `10`. It is supposed to sleep 3 seconds.
- Step `40` will be executed after step `30` for 1 second.
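To make the timing concrete, here is a minimal Python sketch that models the four steps as data and computes when each would start and end if nothing failed. The durations and dependencies come from the description above; the `STEPS` layout and the `schedule` function are assumptions made for illustration, and the code does not run SoS itself.

```python
# Hypothetical model of the example workflow (not SoS syntax).
# Substeps of a step run in parallel, so a step lasts as long as one substep.
STEPS = {
    "10": {"duration": 2, "after": None},
    "20": {"duration": 2, "after": "10"},
    "30": {"duration": 3, "after": None},  # input: None, so no dependency on step 10
    "40": {"duration": 1, "after": "30"},
}


def schedule(steps):
    """Return {step: (start, end)} in seconds, assuming no step fails."""
    times = {}

    def finish(name):
        # Compute (and cache) the end time of a step from its dependency.
        if name not in times:
            dep = steps[name]["after"]
            start = 0 if dep is None else finish(dep)
            times[name] = (start, start + steps[name]["duration"])
        return times[name][1]

    for name in steps:
        finish(name)
    return times


timeline = schedule(STEPS)
for name, (start, end) in timeline.items():
    print(f"step {name}: starts at {start}s, ends at {end}s")
```

Under these assumptions the error in step `10` is raised at the 2-second mark, when step `30` is still running and steps `20` and `40` have not yet started.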
The execution of this workflow in the different error-handling modes is depicted as follows:
In the `default` error-handling mode, the three substeps of step 10 and step 30 are started at the same time. After substep 10.1 fails, step 10 is stopped, but step 30 is allowed to complete, followed by step 40, because that branch is independent of step 10. Step 20 is canceled due to the error from step 10.
In the `ignore` error-handling mode, the three substeps of step 10 and step 30 are started at the same time. After substep 10.1 fails, step 10 produces a `step_output` with an invalid substep. The workflow continues to execute. Substep `20.1` is not executed, but the other two substeps are executed successfully. The other branch of the DAG (steps `30` and `40`) is not affected by the error. The workflow is considered to have executed successfully in the end despite the error.
In the `abort` error-handling mode, the three substeps of step 10 and step 30 are started at the same time. After substep 10.1 fails, SoS stops step 10 as well as step 30, which is still running. Steps 20 and 40 are canceled as well.
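The three scenarios can be summarized as a per-step outcome for each mode. The following Python sketch derives those outcomes from the step timings and dependencies described earlier. It is a simplified model rather than SoS's actual scheduler: the names `TIMELINE`, `AFTER`, `depends_on`, and `outcome` are hypothetical, and the status labels are chosen only for this illustration.

```python
# Failure-free (start, end) times in seconds and dependencies, as described in the text.
TIMELINE = {"10": (0, 2), "20": (2, 4), "30": (0, 3), "40": (3, 4)}
AFTER = {"10": None, "20": "10", "30": None, "40": "30"}


def depends_on(step, failed):
    """True if `step` is downstream of `failed` in the dependency chain."""
    parent = AFTER[step]
    while parent is not None:
        if parent == failed:
            return True
        parent = AFTER[parent]
    return False


def outcome(mode, failed="10"):
    """Return {step: status} for the given error mode (simplified model)."""
    fail_time = TIMELINE[failed][1]  # the error is raised at the end of the step
    result = {}
    for step, (start, end) in TIMELINE.items():
        if step == failed:
            result[step] = "completed with errors" if mode == "ignore" else "failed"
        elif mode == "ignore":
            # Downstream steps still run; only substeps that need the
            # invalid output (such as 20.1) are skipped.
            result[step] = "completed"
        elif mode == "default":
            result[step] = "canceled" if depends_on(step, failed) else "completed"
        elif mode == "abort":
            if end <= fail_time:
                result[step] = "completed"  # finished before the error
            elif start < fail_time:
                result[step] = "aborted"    # was running when the error occurred
            else:
                result[step] = "canceled"   # never started
        else:
            raise ValueError(f"unknown error mode: {mode}")
    return result


for mode in ("default", "ignore", "abort"):
    print(mode, outcome(mode))
```

In this model, step 30 is the only step whose fate differs between `default` and `abort` because it is the only one still running when the error is raised; step 40 then follows step 30's fate.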