Input option concurrent

  • Difficulty level: easy
  • Time needed to learn: 10 minutes or less
  • Key points:
    • By default, substeps are executed in parallel
    • Option concurrent=False stops the substeps from being executed in parallel
    • Option concurrent=integer limits the number of concurrent substeps
    • Certain options and statements prevent substeps from being executed in parallel

By default, the substeps of a step are executed concurrently, subject to any dependencies between them. For example,

Substep 0 completed in 0.0 seconds
Substep 1 completed in 1.0 seconds
Substep 2 completed in 2.0 seconds
Substep 3 completed in 3.1 seconds

As you can see, start_time is the common start time of all substeps, and the completion times show that all substeps are executed concurrently.
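
The timing pattern above can be mimicked in plain Python. The sketch below is an analogy and not SoS code. It assumes each substep simply sleeps for a time proportional to its index, scaled down to tenths of a second. A `concurrent.futures.ThreadPoolExecutor` stands in for the SoS workers. Because all substeps measure against one shared `start_time`, the last one finishes after about 0.3 seconds. Sequential execution would take 0.6 seconds.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def substep(i, start_time):
    # Stand-in for real work (assumption): substep i takes about i/10 seconds.
    time.sleep(i * 0.1)
    # Report completion time relative to the start time shared by all substeps.
    return i, time.time() - start_time

start_time = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() submits all four substeps at once; results come back in index order.
    results = list(pool.map(substep, range(4), [start_time] * 4))

for i, elapsed in results:
    print(f"Substep {i} completed in {elapsed:.1f} seconds")
```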

Concurrent execution can lead to unexpected results. In the following example, each of the 4 substeps adds i to a shared variable sum. The results are not accumulated, however, because each substep works on its own copy of sum.

sum is 0 at index 0
sum is 1 at index 1
sum is 2 at index 2
sum is 3 at index 3
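
Here is one way to picture why sum is not accumulated. It assumes each substep starts from its own copy of the variables defined before it. The plain-Python sketch below is not SoS code, and the dictionary `step_vars` and function `substep` are hypothetical. It hands every substep a private copy of the step's namespace, so each addition is lost when the substep ends.

```python
from concurrent.futures import ThreadPoolExecutor

# Variables defined in the step before the substeps start (hypothetical namespace).
step_vars = {"sum": 0}

def substep(i, namespace):
    # Modifies only this substep's private copy of the namespace.
    namespace["sum"] += i
    return namespace["sum"]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Each substep receives its own copy of step_vars.
    copies = [dict(step_vars) for _ in range(4)]
    sums = list(pool.map(substep, range(4), copies))

for i, s in enumerate(sums):
    print(f"sum is {s} at index {i}")

print(step_vars["sum"])  # the original namespace is untouched
```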

To accumulate sum across all substeps, execute the substeps sequentially by adding option concurrent=False.

sum is 0 at index 0
sum is 1 at index 1
sum is 3 at index 2
sum is 6 at index 3
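
In the same plain-Python analogy, sequential execution means running the substeps one after another against a single shared namespace. This is not SoS code, and `step_vars` and `substep` are hypothetical names. The sketch reproduces the running totals 0, 1, 3, 6 shown above.

```python
# Variables defined in the step before the substeps start (hypothetical namespace).
step_vars = {"sum": 0}

def substep(i, namespace):
    # Updates the shared namespace, so later substeps see earlier results.
    namespace["sum"] += i
    return namespace["sum"]

# Modeling concurrent=False: substeps run one at a time, in index order.
sums = [substep(i, step_vars) for i in range(4)]

for i, s in enumerate(sums):
    print(f"sum is {s} at index {i}")
```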

Limit the number of concurrent substeps

By default, SoS submits substeps to all available workers. The number of concurrently running substeps is therefore limited only by the number of substeps and the number of workers. However, if a substep is resource intensive (e.g., it uses many CPU cores or a lot of memory), you might want to limit the number of concurrent substeps.

For example, the following workflow has two steps, each with 10 substeps. The first step has option concurrent=2, which limits the number of concurrent substeps to 2. As we can see, with option -j 5 (5 workers), substeps in the first step are executed in pairs, and substeps in the second step are executed in groups of 5.

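
The following plain-Python sketch models this behavior. It is not SoS code, and `run_step` is a hypothetical helper. A pool of 5 worker threads plays the role of `-j 5`. A `threading.Semaphore` caps how many substeps of one step may run at once, similar in spirit to concurrent=2. Each call reports the peak number of substeps that were running simultaneously.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def run_step(pool, n_substeps, concurrent=None):
    """Run n_substeps on a shared worker pool and return the peak concurrency.

    concurrent=None means "no extra limit"; an integer caps how many
    substeps of this step may run at the same time.
    """
    limit = threading.Semaphore(concurrent or n_substeps)
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def substep(i):
        with limit:  # wait until this step is below its concurrency cap
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            time.sleep(0.05)  # stand-in for resource-intensive work
            with lock:
                state["running"] -= 1

    list(pool.map(substep, range(n_substeps)))
    return state["peak"]

with ThreadPoolExecutor(max_workers=5) as pool:    # 5 workers, like -j 5
    peak_first = run_step(pool, 10, concurrent=2)  # first step: concurrent=2
    peak_second = run_step(pool, 10)               # second step: no extra limit

print(peak_first, peak_second)
```

Both steps share the same pool here, as the two steps share the same workers in the text. The per-step cap is enforced separately from the pool size.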

Concurrency for the execution of nested subworkflows

Substeps containing nested subworkflows (function sos_run) are also executed concurrently by default. For example, the following workflow executes four sleep subworkflows with different values of parameter duration. The subworkflows run in parallel and complete in random order.

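
The plain-Python sketch below shows why completion order can differ from submission order. It is not SoS code, and `sleep_workflow` is a hypothetical stand-in for the sleep subworkflow. Four calls with different durations run in parallel. `concurrent.futures.as_completed` yields them in the order they finish, which depends on their durations rather than on the order in which they were started.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def sleep_workflow(duration):
    # Hypothetical stand-in for a "sleep" subworkflow with parameter duration.
    time.sleep(duration)
    return duration

durations = [0.3, 0.1, 0.2, 0.0]  # submitted in this order
start = time.time()
finished = []
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(sleep_workflow, d) for d in durations]
    for future in as_completed(futures):
        finished.append(future.result())  # recorded in completion order
elapsed = time.time() - start

print("submitted:", durations)
print("finished: ", finished)
print(f"total time: {elapsed:.1f} seconds")
```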

There is a complication, though: substeps with subworkflows are executed in parallel only if sos_run is the last statement of the substep. For example, with one more statement added after the sos_run call, the subworkflows in the previous example are executed sequentially.

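
The reason for this rule is not spelled out here. A simple mental model, which is an assumption and not a description of SoS internals, goes like this. The runner can hand a subworkflow off and move on to the next substep only when nothing in the substep still has to run after it. Otherwise, it must wait for the subworkflow to finish first. The plain-Python sketch below contrasts the two cases. It is not SoS code, and `run_substeps` and `sleep_workflow` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def sleep_workflow(duration):
    # Hypothetical stand-in for a "sleep" subworkflow.
    time.sleep(duration)
    return duration

def run_substeps(durations, call_is_last):
    """Return the wall-clock time needed to run one subworkflow per substep."""
    start = time.time()
    with ThreadPoolExecutor(max_workers=len(durations)) as pool:
        pending = []
        for d in durations:
            future = pool.submit(sleep_workflow, d)
            if call_is_last:
                # Nothing follows the call: hand it off and move on.
                pending.append(future)
            else:
                # A statement follows the call: wait for the result first,
                # then the extra statement would run here.
                future.result()
        for future in pending:
            future.result()
    return time.time() - start

parallel = run_substeps([0.1] * 4, call_is_last=True)
sequential = run_substeps([0.1] * 4, call_is_last=False)
print(f"call is last statement: {parallel:.1f} s; statement after call: {sequential:.1f} s")
```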

This is somewhat limiting for users who are used to executing multiple subworkflows from a default step, as follows:


However, function sos_run accepts multiple subworkflows and executes them in parallel. You can therefore still execute the steps in parallel by passing them to a single sos_run call, as long as they do not depend on each other:

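
The same trade-off can be sketched in plain Python. This is not SoS code, and `run_workflows` and the workflow functions are hypothetical. Calling one workflow at a time, as in the default-step pattern, takes the sum of their run times. Handing all independent workflows to one call lets them run in parallel, so the total is close to the longest one. This only holds when the workflows do not depend on each other's results.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def make_workflow(duration):
    # Hypothetical independent subworkflow that takes `duration` seconds.
    def workflow():
        time.sleep(duration)
        return duration
    return workflow

def run_workflows(workflows):
    # Hypothetical analogue of passing several subworkflows to one call:
    # all of them are submitted at once and run in parallel.
    with ThreadPoolExecutor(max_workers=len(workflows)) as pool:
        futures = [pool.submit(w) for w in workflows]
        return [f.result() for f in futures]

workflows = [make_workflow(d) for d in (0.1, 0.2, 0.3)]

start = time.time()
for w in workflows:          # one call per workflow: they run one after another
    w()
one_by_one = time.time() - start

start = time.time()
run_workflows(workflows)     # one call for all workflows: they run in parallel
all_at_once = time.time() - start

print(f"one by one: {one_by_one:.1f} s, all at once: {all_at_once:.1f} s")
```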