Conditional actions

  • Difficulty level: easy
  • Time needed to learn: 10 minutes or less
  • Key points:
    • Normal break, continue, return structures cannot be used in the implicit loops of substeps
    • Action warn_if gives a warning under specified conditions
    • Action fail_if raises an exception that terminates the substep and therefore the entire workflow if a condition is met
    • Action done_if assumes that the substep is completed and ignores the rest of the statements
    • Action skip_if skips the substep and removes _output even if the _output has been generated

Control structures of substeps

SoS allows the use of arbitrary Python statements in step processes. For example, if you are processing a number of input files and some of them contain errors and have to be ignored, you can write a workflow step as follows:

In [2]:
generating a_0.out
generating a_1.out
generating a_3.out
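
The loop logic behind a step like this can be sketched in plain Python. This is only an illustration, not the original SoS cell: the function name `process` and the `bad` parameter are hypothetical, and the actual file generation is replaced by collecting messages.

```python
def process(indices, bad=frozenset({2})):
    """Return the message for every index that is not skipped."""
    messages = []
    for idx in indices:
        if idx in bad:  # problematic input: skip it
            continue
        messages.append(f"generating a_{idx}.out")
    return messages


for line in process(range(4)):
    print(line)
```

The `continue` statement works here only because the loop is written out explicitly inside the step.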

However, as we have discussed in tutorials How to include scripts in different languages in SoS workflows and How to specify input and output files and process input files in groups, steps written with loops and function calls like sh() are not very readable because the scripts are not clearly presented and users have to follow the logic of the code. Also, the input files are not processed in parallel, so the step is not executed efficiently.

The more idiomatic SoS way to implement the step is to use input and output statements and the script format of function calls as follows:

In [3]:
generating a_0.out
generating a_1.out
generating a_2.out
generating a_3.out

The problem is that substeps are processed concurrently and we do not yet have a way to treat them differently and introduce the logic of

    if idx == 2:  # problematic step
        continue

Action skip_if

The skip_if action allows you to skip certain substeps under certain conditions. The condition can involve a (mostly) hidden variable _index, which is the index of the substep. For example, the aforementioned step can be written as

In [4]:
generating a_0.out
generating a_1.out
generating a_3.out

It is important to remember that skip_if assumes that the substep output is not generated and adjusts _output accordingly. For example, if you pass the output of the step to another step, you will notice that the output of the skipped substep (index 2) is empty.

In [5]:
[##] 2 steps processed (4 jobs completed, 3 jobs ignored)
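
One way to picture this bookkeeping is the plain-Python analogy below. It is not SoS code and does not call the SoS API: `run_with_skip` is a hypothetical helper, and `None` stands in for the empty output of a skipped substep.

```python
def run_with_skip(n, skip):
    """Emulate a step with n substeps where skip(index) drops a substep.

    Returns one entry per substep: the output file name, or None when the
    substep was skipped and its output is treated as never generated.
    """
    outputs = []
    for index in range(n):
        if skip(index):
            outputs.append(None)  # skipped: no output is recorded
            continue
        outputs.append(f"a_{index}.out")
    return outputs


outputs = run_with_skip(4, lambda index: index == 2)
print(outputs)
```

A downstream consumer of `outputs` would find nothing to process for index 2, which mirrors the empty output described above.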

Action done_if

A similar action is done_if, which also ignores the rest of the step process but assumes that the output has already been generated. Consequently, this action does not adjust _output. For example, if some more work applies only to a subset of substeps, you can use done_if to execute the additional code for only the selected substeps.

In [6]:
[##] 2 steps processed (5 jobs completed, 3 jobs ignored)
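
The contrast with skip_if can be sketched with a plain-Python analogy. Again, this is not SoS code: `run_with_done` is a hypothetical helper, and the condition used in the demo (even indices are "done") is an arbitrary assumption chosen for illustration.

```python
def run_with_done(n, done):
    """Emulate a step where done(index) ends a substep early.

    Every substep keeps its output; only substeps that are not "done"
    go on to the additional work.
    """
    outputs, extra_work = [], []
    for index in range(n):
        out = f"a_{index}.out"
        outputs.append(out)  # output is kept in all cases
        if done(index):
            continue  # rest of the substep is ignored
        extra_work.append(out)  # additional work for selected substeps
    return outputs, extra_work


outputs, extra_work = run_with_done(4, lambda index: index % 2 == 0)
print(outputs)
print(extra_work)
```

Here all four outputs remain available to later steps, while the extra work touches only the substeps that did not finish early.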

Action warn_if

Action warn_if is very easy to use. It just produces a warning message if something suspicious is detected.

In [7]:
generating a_0.out
generating a_1.out
generating a_2.out
generating a_3.out
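
The behavior can be approximated in plain Python with the standard `warnings` module. This is a sketch under assumptions, not the SoS implementation: `emulate_warn_if` and `run_with_warning` are hypothetical names, and the message text is made up for the example.

```python
import warnings


def emulate_warn_if(condition, message):
    """Issue a warning when condition is true; never interrupt execution."""
    if condition:
        warnings.warn(message)


def run_with_warning(n):
    """Generate all n outputs, warning about index 2 along the way."""
    generated = []
    for index in range(n):
        emulate_warn_if(index == 2, f"input {index} might be problematic")
        generated.append(f"a_{index}.out")
    return generated


print(run_with_warning(4))
```

Unlike the skipping actions, a warning leaves the set of generated outputs unchanged.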

Action fail_if

Action fail_if terminates the execution of the workflow under certain conditions. It kills all other processes (e.g. working substeps or nested workflows), so it should be used with caution if it is unsafe to terminate the workflow abruptly.

For example, if we decide to terminate the entire workflow if we detect something wrong with an input file, we can do

In [8]:
generating a_0.out
generating a_1.out
generating a_3.out
ExecuteError: [(id=6975767944567788413, index=2)]: input 2 might be problematic
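
In plain Python, the closest analogy is raising an exception that nothing inside the step catches. The sketch below is not SoS code: `SubstepError` and `run_until_failure` are hypothetical names. It also runs sequentially, so it stops before index 3, whereas concurrently running substeps may still finish before the workflow is terminated.

```python
class SubstepError(RuntimeError):
    """Hypothetical stand-in for the error that aborts a workflow."""


def run_until_failure(n, fail):
    """Process substeps in order and abort as soon as fail(index) is true."""
    generated = []
    for index in range(n):
        if fail(index):
            raise SubstepError(f"input {index} might be problematic")
        generated.append(f"a_{index}.out")
    return generated


try:
    run_until_failure(4, lambda index: index == 2)
except SubstepError as error:
    print(f"workflow terminated: {error}")
```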