- Difficulty level: easy
- Time needed to learn: 10 minutes or less
- Key points:
  - By default, substeps are executed in parallel
  - Option `concurrent=False` stops substeps from being executed in parallel
  - Option `concurrent=integer` limits the number of concurrent substeps
  - Certain options and statements prevent substeps from being executed in parallel
By default, the substeps of a step are executed concurrently, subject to any dependencies between them. As the example shows, `start_time` is the start time of all substeps, that is, all substeps are executed concurrently.
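As a rough analogy in plain Python (this is not how SoS is implemented), the sketch below runs four hypothetical substeps on a thread pool. Each substep waits at a `threading.Barrier` sized for all four. The barrier only releases when all four substeps are waiting at it at the same time, so a successful run shows that they were executed concurrently.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

N_SUBSTEPS = 4
# The barrier releases only when all N_SUBSTEPS threads are waiting at it,
# which is possible only if they are running at the same time.
barrier = threading.Barrier(N_SUBSTEPS, timeout=5)

def substep(i):
    """Hypothetical stand-in for one substep; records when it starts."""
    start = time.monotonic()
    barrier.wait()  # raises BrokenBarrierError if substeps were not concurrent
    return i, start

with ThreadPoolExecutor(max_workers=N_SUBSTEPS) as pool:
    results = list(pool.map(substep, range(N_SUBSTEPS)))

starts = [start for _, start in results]
print(f"start times differ by at most {max(starts) - min(starts):.3f} s")
```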
Concurrent execution can lead to unexpected results. In the following example, each of the 4 substeps adds `i` to a shared variable `sum`, but the results are not accumulated because each substep has its own copy of `sum`.
To get the correct `sum` across all substeps, you can execute the substeps sequentially by adding option `concurrent=False`.
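The difference can be modelled in a few lines of Python. This is a simplified model, not SoS's internals: the hypothetical `run_substeps` function gives each "concurrent" substep a private deep copy of the namespace, while "sequential" substeps all update the same namespace, which mirrors the behavior described above.

```python
import copy

def run_substeps(namespace, n, concurrent):
    """Hypothetical model: substep i adds i to namespace['sum'].

    Returns the value of 'sum' as seen by each substep after its update.
    """
    seen = []
    for i in range(n):
        # Concurrent substeps are modelled as working on a private copy;
        # sequential substeps share (and update) the same namespace.
        ns = copy.deepcopy(namespace) if concurrent else namespace
        ns["sum"] += i
        seen.append(ns["sum"])
    return seen

print(run_substeps({"sum": 0}, 4, concurrent=True))   # [0, 1, 2, 3]
print(run_substeps({"sum": 0}, 4, concurrent=False))  # [0, 1, 3, 6]
```

Only the sequential run accumulates to the expected total of 6.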
Input option concurrent
Option `concurrent` accepts the following values:

- `concurrent=True` (default): Execute substeps in parallel, subject to the number of available workers.
- `concurrent=False`: Execute substeps sequentially.
- `concurrent=VAL`, where `VAL` is an integer: Limit the number of concurrent substeps to `VAL`.
By default, SoS submits substeps to all available workers, so the number of substeps running at the same time is limited only by the number of substeps and the number of workers. However, if a substep is resource intensive (e.g. it uses many CPU cores or a lot of memory), you might want to limit the number of concurrent substeps.
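The rules above can be summarized by a small, hypothetical helper (not part of SoS) that computes how many substeps may run at once. It assumes that an integer `concurrent` is still capped by the number of workers, which the text implies for the default but does not state explicitly for integer values.

```python
import math

def effective_concurrency(concurrent, n_substeps, n_workers):
    """Hypothetical helper: number of substeps that can run at the same time."""
    # Check the booleans first: in Python, bool is a subclass of int.
    if concurrent is True:
        limit = n_workers
    elif concurrent is False:
        limit = 1
    elif isinstance(concurrent, int) and concurrent > 0:
        limit = min(concurrent, n_workers)  # assumption: workers still cap VAL
    else:
        raise ValueError("concurrent must be True, False, or a positive integer")
    return min(limit, n_substeps)

for value in (True, False, 2):
    k = effective_concurrency(value, n_substeps=10, n_workers=5)
    waves = math.ceil(10 / k)
    print(f"concurrent={value!r}: {k} at a time, {waves} waves")
# concurrent=True: 5 at a time, 2 waves
# concurrent=False: 1 at a time, 10 waves
# concurrent=2: 2 at a time, 5 waves
```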
For example, the following workflow has two steps, each with 10 substeps, but the first step has option `concurrent=2`, which limits the number of concurrent substeps to 2. As we can see, with option `-j 5` (5 workers), substeps in the first step are executed in pairs, while substeps in the second step are executed in groups of 5.
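To illustrate the same idea outside SoS, the sketch below runs 10 dummy substeps on a pool of 5 threads (the analogue of `-j 5`) and uses a `threading.Semaphore` as a hypothetical stand-in for `concurrent=2`. It reports the peak number of substeps that were running at the same time. In this analogy the extra threads simply wait on the semaphore; a real scheduler would be free to use those workers for other jobs.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def run_step(n_substeps, n_workers, concurrent=None):
    """Run n_substeps dummy substeps on n_workers threads.

    If concurrent is a positive integer, a semaphore caps how many substeps
    run at once. Returns the peak number of simultaneously running substeps.
    """
    limit = threading.Semaphore(concurrent if concurrent else n_workers)
    lock = threading.Lock()
    running = 0
    peak = 0

    def substep(i):
        nonlocal running, peak
        with limit:
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(0.05)  # pretend to do some work
            with lock:
                running -= 1
        return i

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        done = list(pool.map(substep, range(n_substeps)))
    assert done == list(range(n_substeps))
    return peak

print(run_step(10, 5, concurrent=2))  # peak is at most 2
print(run_step(10, 5))                # peak is at most 5
```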
Substeps containing nested subworkflows (function `sos_run`) are also executed concurrently by default. For example, in the following workflow, where four `sleep` subworkflows are executed with different values of parameter `duration`, the subworkflows are executed in parallel and complete in random order.
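A plain-Python analogy, with a hypothetical `sleep_workflow` function standing in for the `sleep` subworkflow, shows the same two effects: the total run time is close to the longest `duration` rather than the sum of all durations, and the order of completion reflects durations and scheduling rather than the order in which the subworkflows were started.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def sleep_workflow(duration):
    """Hypothetical stand-in for a 'sleep' subworkflow with parameter duration."""
    time.sleep(duration)
    return duration

durations = [0.3, 0.1, 0.4, 0.2]
start = time.monotonic()
with ThreadPoolExecutor(max_workers=len(durations)) as pool:
    futures = [pool.submit(sleep_workflow, d) for d in durations]
    # as_completed yields futures in the order they finish, not submission order
    finished = [f.result() for f in as_completed(futures)]
elapsed = time.monotonic() - start

print("completion order:", finished)
print(f"elapsed {elapsed:.2f} s vs {sum(durations):.2f} s if run one by one")
```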
Substeps with statements after `sos_run` are not executed in parallel
Because of the way subworkflows are executed, a subworkflow must be the last statement in the step process to allow the substeps to be executed in parallel. That is to say, the subworkflows in

```
input: ...
sos_run('sub')
print('Done')
```

and

```
input: ...
sos_run('sub1')
sos_run('sub2')
```

will not be executed in parallel. The latter case can, however, be executed in parallel if `sub2` does not have to be executed after `sub1`, by running the two subworkflows side by side with

```
input: ...
sos_run(['sub1', 'sub2'])
```
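The contrast between two separate calls and one call with a list can be sketched in Python with a hypothetical `run_subworkflows` function. It is only an analogy for `sos_run`, not its implementation: a single name runs on its own, while a list of names is submitted to a thread pool together. A barrier in the list case proves that both subworkflows were running at the same moment, because neither can log its end until both have started.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

log = []
log_lock = threading.Lock()

def subworkflow(name, barrier=None):
    """Hypothetical stand-in for a subworkflow; logs its start and end."""
    with log_lock:
        log.append(f"start {name}")
    if barrier is not None:
        barrier.wait()  # passes only when all listed subworkflows have started
    with log_lock:
        log.append(f"end {name}")
    return name

def run_subworkflows(names):
    """Hypothetical analogue of sos_run: one name runs alone, while a list
    of names is submitted together so the subworkflows run side by side."""
    if isinstance(names, str):
        return [subworkflow(names)]
    barrier = threading.Barrier(len(names), timeout=5)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(lambda n: subworkflow(n, barrier), names))

# Two separate calls: sub2 cannot start until sub1 has finished.
run_subworkflows("sub1")
run_subworkflows("sub2")
print(log)  # ['start sub1', 'end sub1', 'start sub2', 'end sub2']

# One call with a list: both subworkflows start before either one ends.
log.clear()
print(run_subworkflows(["sub1", "sub2"]))  # ['sub1', 'sub2']
```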
There is a complication, though: to be executed in parallel, substeps with subworkflows must have `sos_run` as their last statement. For example, if one statement is added after the `sos_run` call, the subworkflows in the previous example are executed sequentially.
This is somewhat limiting for users who are used to using a `default` step to execute multiple subworkflows as follows:
However, remember that function `sos_run` can accept multiple subworkflows and execute them in parallel, so you can still run the subworkflows in parallel as long as they do not depend on each other: