Combining tasks

Difficulty level: intermediate
Time need to lean: 10 minutes or less
Key points:
- Option trunk_size groups small tasks into larger ones
- Option trunk_workers determines number of workers per master task
- Tasks can be dispatched and executed on multiple nodes on a cluster system

The problem with many small tasks

From time to time you may face the problem with many small tasks, such as running millions of simulations or analyzing thousands of genes. Whereas each simulation or analysis takes just a few minutes to complete, the entire analysis will take a long time and needs to be performed on a cluster. However, most cluster systems does not welcome millions or small tasks as managing a large number of jobs can pose management challenges to the scheduler.

The bash script approach

What users have usually done are running these analysis in batch, which works more or less like the following script if implemented in SoS:

Processing 1
Processing 2
Processing 3
Processing 4
Processing 5
Processing 6
Processing 7
Processing 8
Processing 9
Processing 10
Processing 11
Processing 12
Processing 13
Processing 14
Processing 15
Processing 16

The args option here determines what will be passed to the underlying bash command, which should contain {filename} as the filename of the temporary file generated by SoS. In this particular example we use

f'{{filename}} {batch*4+1} {(batch+1)*4}'

so that the following bash commands will be executed

bash {filename} 1 4
bash {filename} 5 8
bash {filename} 9 12
bash {filename} 13 16

for substeps with batch equals to 0, 1, 2 and 3 respectively.

Now that we have fewer number of jobs, we can submit the shell scripts to a batch system as tasks

The tasks in this example are executed locally but you can send the tasks to a remote host using

task: queue='host'

or

%run -q host

Grouping SoS tasks

The `trunk_size` task option

The trunk_size=n option groups tasks into groups of size `n` before submitting them to an executor. As a special case, if option `trunk_size` is specified but with a value `None`, all tasks from the step will be grouped together.

The aforementioned example can be implemented in a much easier way as follows using the trunk_size task option:

In this example, 15 tasks are generated from 15 substeps, each running a bash script

echo "Processing {id}"

with id = 0, ..., 15 respectively.

With option trunk_size=4, the tasks are grouped into master tasks with names starting with M5_.

Executing subtasks in parallel

The `trunk_workers` task option

The trunk_workers=n option specify the number of concurrent workers in each task. Similar to option -j for commands sos run and sos execute, it accepts the specification of multiple worker processes on multiple nodes. The value of this parameter will affects variables such as nodes, cores, walltime and mem in task templates.

The master tasks by default execute subtasks sequentially. If the master task has a large number of subtasks and there are computing resources available, you can specifying another option trunk_workers to set the number of workers for each master task. For example, in the following SoS workflow, the 16 tasks are submitted as a single (master) task and will be processed by two workers.

Executing subtasks on cluster system

When you submit a master task to the cluster system, you typically need to specify the resources needed for tasks, using options walltime, mem and cores. These variables will be translated and expanded in a task template (defined in SoS configuration files).

For example, with the following template for a SLURM-based cluster system,

#!/bin/bash
#SBATCH --time={walltime}
#SBATCH --nodes={nodes}
#SBATCH --ntasks-per-node={cores}
#SBATCH --mem-per-cpu={mem // cores // 1000000000}G
#SBATCH --job-name={task}
#SBATCH --output=/home/{user_name}/.sos/{task}.out
#SBATCH --error=/home/{user_name}/.sos/{task}.err

sos execute {task} -v {verbosity} -s {sig_mode} -r {run_mode}

A task with options mem='1G' and walltime='12h' will populate the template with variables mem=1000000000 (1G) and walltime=01:00:00`.

Processing data

If you group multiple tasks using options trunk_size and trunk_worker, the template variables will be adjusted automatically. For example,

input: for_each=dict(id=range(16))

task: trunk_size=None, trunk_workers=2, mem='1G', walltime='1h'
bash: expand=True
    echo "Processing {id+1}"

will generate mem='2000000000' (2G) and walltime=08:00:00' because two concurrent workers will use double the RAM, and each worker will process 8 tasks sequentially, in a total of 8 hours.

Executing subtasks on multiple computing nodes of a cluster system

If you have many small tasks and would like to submit them as a small number or a single job on the cluster system, and would like to make use of multiple nodes to process them, you can specify more than one computing nodes with option trunk_workers. In this case, trunk_workers should be a list with its length indicating the number of nodes and its elements indicating the number of workers on each node.

For example, with the template above, running sos run test.sos -q slurm_cluster will create a single master task with 16 subtasks, processed by 8 workers on two computing nodes.

input: for_each=dict(id=range(16))

task: trunk_size=None, trunk_workers=[4, 4], mem='1G', walltime='1h'
bash: expand=True
    echo "Processing {id+1}"

Under the hood, sos will

determine number of nodes and set variable nodes to 2
adjust mem and walltime according to number of concurrent and sequential tasks.
populate the task template with variables nodes, mem, walltime etc
submit the generated shell script to create a multi-node job
Because the task is executed on a cluster system, it will automatically recognize the nodes and processes per node, create workers accordingly to process the tasks on multiple nodes.