- Difficulty level: intermediate
- Time needed to learn: 10 minutes or less
- Key points:
  - Option `trunk_size` groups small tasks into larger ones
  - Option `trunk_workers` determines the number of workers per master task
  - Tasks can be dispatched and executed on multiple nodes of a cluster system
From time to time you may face the problem of many small tasks, such as running millions of simulations or analyzing thousands of genes. Although each simulation or analysis takes just a few minutes to complete, the entire analysis takes a long time and needs to be performed on a cluster. However, most cluster systems do not welcome millions of small tasks, because managing a large number of jobs poses challenges to the scheduler.

What users usually do is run these analyses in batches, which works more or less like the following script if implemented in SoS:
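A minimal sketch of such a batched workflow (reconstructed from the description below; the loop body is an assumption for illustration):

```sos
[1]
# create 4 substeps, each processing a batch of 4 items
input: for_each=dict(batch=range(4))
bash: args=f'{{filename}} {batch*4+1} {(batch+1)*4}'
  # $1 and $2 are the first and last item of this batch
  for i in `seq $1 $2`
  do
    echo "Processing item $i"
  done
```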
The `args` option here determines the command-line arguments passed to the underlying `bash` command; it should contain `{filename}`, which SoS expands to the name of the temporary script file it generates. In this particular example we use `f'{{filename}} {batch*4+1} {(batch+1)*4}'` so that the following bash commands will be executed

```
bash {filename} 1 4
bash {filename} 5 8
bash {filename} 9 12
bash {filename} 13 16
```

(with `{filename}` replaced by the actual temporary filename) for substeps with `batch` equal to `0`, `1`, `2`, and `3`, respectively.
Now that we have a smaller number of jobs, we can submit the shell scripts to a batch system as tasks:
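The batched workflow can be turned into tasks by adding a `task:` statement before the action (a sketch; the loop body is an assumption for illustration):

```sos
[1]
input: for_each=dict(batch=range(4))
# each substep now becomes an externally executed task
task:
bash: args=f'{{filename}} {batch*4+1} {(batch+1)*4}'
  for i in `seq $1 $2`
  do
    echo "Processing item $i"
  done
```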
The tasks in this example are executed locally, but you can send them to a remote host using `task: queue='host'` or `%run -q host`.
## The `trunk_size` task option
The `trunk_size=n` option groups tasks into master tasks of size `n` before submitting them to an executor. As a special case, if option `trunk_size` is specified with value `None`, all tasks from the step are grouped into a single master task.
The aforementioned example can be implemented in a much easier way using the `trunk_size` task option:
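A sketch of such a workflow, reconstructed from the surrounding description (one `echo` subtask per substep, grouped in fours):

```sos
[1]
# 16 substeps, one per id
input: for_each=dict(id=range(16))
# group every 4 tasks into one master task
task: trunk_size=4
bash: expand=True
  echo "Processing {id}"
```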
In this example, 16 tasks are generated from 16 substeps, each running a bash script `echo "Processing {id}"` with `id` = 0, ..., 15, respectively. With option `trunk_size=4`, the tasks are grouped into master tasks with names starting with `M4_`.
## The `trunk_workers` task option
The `trunk_workers=n` option specifies the number of concurrent workers for each master task. Similar to option `-j` of commands `sos run` and `sos execute`, it accepts the specification of multiple worker processes on multiple nodes. The value of this option affects variables such as `nodes`, `cores`, `walltime`, and `mem` in task templates.
Master tasks by default execute their subtasks sequentially. If a master task has a large number of subtasks and computing resources are available, you can specify option `trunk_workers` to set the number of workers for each master task. For example, in the following SoS workflow, the 16 tasks are submitted as a single (master) task and will be processed by two workers.
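A sketch of such a workflow (the `echo` body is illustrative):

```sos
[1]
input: for_each=dict(id=range(16))
# group all 16 tasks into one master task, processed by 2 workers
task: trunk_size=None, trunk_workers=2
bash: expand=True
  echo "Processing {id}"
```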
When you submit a master task to a cluster system, you typically need to specify the resources needed for the tasks, using options `walltime`, `mem`, and `cores`. These variables will be translated and expanded in a task template (defined in SoS configuration files).
For example, with the following template for a SLURM-based cluster system,

```bash
#!/bin/bash
#SBATCH --time={walltime}
#SBATCH --nodes={nodes}
#SBATCH --ntasks-per-node={cores}
#SBATCH --mem-per-cpu={mem // cores // 1000000000}G
#SBATCH --job-name={task}
#SBATCH --output=/home/{user_name}/.sos/{task}.out
#SBATCH --error=/home/{user_name}/.sos/{task}.err
sos execute {task} -v {verbosity} -s {sig_mode} -r {run_mode}
```
a task with options `mem='1G'` and `walltime='12h'` will populate the template with variables `mem=1000000000` (1G) and `walltime=12:00:00`.
If you group multiple tasks using options `trunk_size` and `trunk_workers`, the template variables will be adjusted automatically. For example,

```sos
input: for_each=dict(id=range(16))
task: trunk_size=None, trunk_workers=2, mem='1G', walltime='1h'
bash: expand=True
  echo "Processing {id+1}"
```

will generate `mem=2000000000` (2G) and `walltime=08:00:00`, because two concurrent workers use double the RAM, and each worker processes 8 tasks sequentially, for a total of 8 hours.
If you have many small tasks and would like to submit them as a small number of jobs (or a single job) on the cluster system, while making use of multiple nodes to process them, you can specify more than one computing node with option `trunk_workers`. In this case, `trunk_workers` should be a list whose length indicates the number of nodes and whose elements indicate the number of workers on each node.
For example, with the template above, running `sos run test.sos -q slurm_cluster` on the following workflow will create a single master task with 16 subtasks, processed by 8 workers on two computing nodes.
```sos
input: for_each=dict(id=range(16))
task: trunk_size=None, trunk_workers=[4, 4], mem='1G', walltime='1h'
bash: expand=True
  echo "Processing {id+1}"
```
Under the hood, SoS will

- determine the number of nodes and set variable `nodes` to `2`
- adjust `mem` and `walltime` according to the number of concurrent and sequential tasks
- populate the task template with variables `nodes`, `mem`, `walltime`, etc.
- submit the generated shell script to create a multi-node job

Because the task is executed on a cluster system, it will automatically recognize the nodes and processes per node, and create workers accordingly to process the subtasks on multiple nodes.