
Output options paired_with and group_with

  • Difficulty level: intermediate
  • Time needed to learn: 20 minutes or less
  • Key points:
    • Option paired_with attaches variables to output targets
    • Option group_with attaches variables to output groups (_output)

Passing of step_output

If a SoS step contains multiple substeps, defined by options such as group_by or for_each, _input becomes each group of step_input, _output becomes the corresponding group of step_output, and the step is executed once for each group.

Moreover, step_output keeps its group information when it is passed on, either as the default input of the next step in a simple forward-style workflow, or as the input of another step through the functions output_from or named_output. As shown in the following example, the step_output of step A becomes the input of step B, creating two substeps.

Example output:
step_output=a.bak b.bak, _input=a.bak
step_output=a.bak b.bak, _input=b.bak
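The passing of groups can be modeled in a few lines of plain Python. This is a sketch, not SoS code. The functions step_a and step_b are hypothetical stand-ins that only mimic the behavior described above: step_output remembers one group per substep, and the receiving step creates one substep per inherited group.

```python
# Plain-Python model, not SoS code: step_a and step_b are hypothetical
# stand-ins that only mimic how groups travel from one step to the next.

def step_a(samples):
    """Each substep of step A produces one output target (its _output)."""
    groups = [[f"{s}.bak"] for s in samples]       # one _output per substep
    step_output = [t for g in groups for t in g]   # all targets of the step
    return step_output, groups

def step_b(step_input, groups):
    """Step B inherits step A's groups and runs one substep per group."""
    return [f"step_input={' '.join(step_input)}, _input={' '.join(g)}"
            for g in groups]

step_output, groups = step_a(["a", "b"])
for line in step_b(step_output, groups):
    print(line)
```

Because step_b receives the groups together with the targets, it runs two substeps without grouping its input itself.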

Attaching attributes to output targets

Recall that the input option paired_with associates each input target with one or more attributes.

Example output:
step_input=a.txt b.txt, _input=a.txt for sample A, _output=a.bak
step_input=a.txt b.txt, _input=b.txt for sample B, _output=b.bak
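As a rough mental model, paired_with attaches the i-th value of each array to the i-th target. The sketch below is plain Python, not the SoS API. Target and pair_with are hypothetical names, and the per-target attribute dictionary only imitates how SoS lets you read _input.sample.

```python
from dataclasses import dataclass, field

@dataclass
class Target:
    """Hypothetical stand-in for a SoS target that can carry attributes."""
    name: str
    attrs: dict = field(default_factory=dict)

def pair_with(targets, values):
    """Attach values[key][i] to targets[i], mimicking input option paired_with."""
    for key, vals in values.items():
        if len(vals) != len(targets):
            raise ValueError(f"{key!r} needs {len(targets)} values, got {len(vals)}")
        for target, val in zip(targets, vals):
            target.attrs[key] = val
    return targets

step_input = pair_with([Target("a.txt"), Target("b.txt")], {"sample": ["A", "B"]})

# group_by=1: each substep sees a single target as _input
report = []
for _input in step_input:
    _output = _input.name.replace(".txt", ".bak")
    report.append(f"_input={_input.name} for sample {_input.attrs['sample']}, _output={_output}")
print("\n".join(report))
```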

We can do the same for _output, but it is trickier because the output statement usually defines only _output, the output of a single substep, and only in rare cases the entire step_output (see output option group_by for details). In either case, the paired_with option applies to whatever targets the output statement defines.

For example, with the input option paired_with, each _input is associated with an attribute sample, and we can pass its value on to _output with the paired_with option of the output statement:

Example output:
step_input=a.txt b.txt, _input=a.txt, _output=a.bak for sample A
step_input=a.txt b.txt, _input=b.txt, _output=b.bak for sample B
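In this per-substep case, the output statement sees only one _output target at a time, so a single value taken from _input is enough. The following plain-Python sketch models that behavior. It is not SoS code, and Target is a hypothetical stand-in for a SoS target.

```python
from dataclasses import dataclass, field

@dataclass
class Target:
    """Hypothetical stand-in for a SoS target that can carry attributes."""
    name: str
    attrs: dict = field(default_factory=dict)

# step_input after input option paired_with: each target already has a sample
step_input = [Target("a.txt", {"sample": "A"}), Target("b.txt", {"sample": "B"})]

report = []
for _input in step_input:  # one substep per input target
    # The output statement defines only this substep's _output (one target),
    # so a single value copied from _input is paired with it.
    _output = [Target(_input.name.replace(".txt", ".bak"))]
    _output[0].attrs["sample"] = _input.attrs["sample"]
    report.append(f"_input={_input.name}, _output={_output[0].name} "
                  f"for sample {_output[0].attrs['sample']}")
print("\n".join(report))
```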

However, if the output statement defines the entire step_output with group_by, option paired_with needs to provide an array with one value for each target (not a single value such as _input.sample as above).

Example output:
step_input=a.txt b.txt, _input=a.txt, _output=a.bak for sample A
step_input=a.txt b.txt, _input=b.txt, _output=b.bak for sample B
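When the whole step_output is listed, the array given to paired_with is aligned with all targets first, and the targets are then split into groups. The plain-Python sketch below models this. It assumes a hypothetical helper define_output whose parameter names echo the SoS options but which is not part of SoS. The length check reflects the requirement of one value per target.

```python
from dataclasses import dataclass, field

@dataclass
class Target:
    """Hypothetical stand-in for a SoS target that can carry attributes."""
    name: str
    attrs: dict = field(default_factory=dict)

def define_output(names, group_by, paired_with):
    """Model an output statement that lists all of step_output.

    paired_with must provide one value per target of step_output (an array);
    the targets are then split into groups of `group_by`, one per substep.
    """
    targets = [Target(n) for n in names]
    for key, vals in paired_with.items():
        if len(vals) != len(targets):
            raise ValueError(f"{key!r} needs {len(targets)} values, got {len(vals)}")
        for target, val in zip(targets, vals):
            target.attrs[key] = val
    return [targets[i:i + group_by] for i in range(0, len(targets), group_by)]

groups = define_output(["a.bak", "b.bak"], group_by=1, paired_with={"sample": ["A", "B"]})
for _output in groups:
    print(f"_output={_output[0].name} for sample {_output[0].attrs['sample']}")
```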

Attributes attached to _output targets are passed to the next steps, either implicitly or explicitly through output_from. This makes it easier to identify the properties of each substep.

Example output:
Continue processing a.bak for sample A
Continue processing b.bak for sample B
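A plain-Python sketch of the receiving side follows. The attributes travel with the targets, so the next step can read them without being told anything extra. This is a model only. Target and next_step are hypothetical names, and the groups are written out by hand instead of coming from SoS.

```python
from dataclasses import dataclass, field

@dataclass
class Target:
    """Hypothetical stand-in for a SoS target that can carry attributes."""
    name: str
    attrs: dict = field(default_factory=dict)

# Groups produced by the previous step; each output target carries `sample`.
previous_groups = [[Target("a.bak", {"sample": "A"})], [Target("b.bak", {"sample": "B"})]]

def next_step(groups):
    """One substep per inherited group; attributes are read from the targets."""
    messages = []
    for _input in groups:
        for target in _input:
            messages.append(f"Continue processing {target.name} for sample {target.attrs['sample']}")
    return messages

print("\n".join(next_step(previous_groups)))
```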

The group_with output option

Just like the group_with option of the input statement, the group_with output option assigns one or more variables to each output group (variable _output). Again, the situation is trickier because the output statement usually defines only _output and only in rare cases the entire step_output (see output option group_by for details). In either case, the group_with option applies to whatever the output statement defines.

That is to say, if the output statement defines _output, group_with simply associates the dictionary with it, and the values should be specific to this particular substep.

Example output:
step_output=a.bak b.bak, _output=a.bak, _output.sample=A
step_output=a.bak b.bak, _output=b.bak, _output.sample=B
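The key difference from paired_with is that group_with attaches variables to the group object (_output) rather than to its targets. The plain-Python sketch below models the per-substep case. Group is a hypothetical stand-in for a SoS group of targets, not the real class.

```python
from dataclasses import dataclass, field

@dataclass
class Group:
    """Hypothetical model of _output: a list of targets plus group-level variables."""
    targets: list
    attrs: dict = field(default_factory=dict)

report = []
for sample in ["A", "B"]:  # one substep per sample
    _output = Group([f"{sample.lower()}.bak"])
    # group_with in a per-substep output statement: a plain dictionary whose
    # values belong to this substep only
    _output.attrs.update({"sample": sample})
    report.append(f"_output={' '.join(_output.targets)}, _output.sample={_output.attrs['sample']}")
print("\n".join(report))
```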

If the output statement defines step_output with group_by, group_with should specify arrays with one element for each group, that is, for each substep.

Example output:
step_output=a.bak b.bak, _output=a.bak, _output.sample=A
step_output=a.bak b.bak, _output=b.bak, _output.sample=B
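With group_by, the arrays given to group_with are indexed by group, not by target. The plain-Python sketch below models this. group_output is a hypothetical helper, not part of SoS. Its length check requires one value per group, which is the opposite of the one-value-per-target rule of paired_with.

```python
from dataclasses import dataclass, field

@dataclass
class Group:
    """Hypothetical model of _output: a list of targets plus group-level variables."""
    targets: list
    attrs: dict = field(default_factory=dict)

def group_output(names, group_by, group_with):
    """Split names into groups and give group i the i-th element of each array."""
    groups = [Group(names[i:i + group_by]) for i in range(0, len(names), group_by)]
    for key, vals in group_with.items():
        if len(vals) != len(groups):
            raise ValueError(f"{key!r} needs {len(groups)} values, got {len(vals)}")
        for group, val in zip(groups, vals):
            group.attrs[key] = val
    return groups

groups = group_output(["a.bak", "b.bak"], group_by=1, group_with={"sample": ["A", "B"]})
for _output in groups:
    print(f"_output={' '.join(_output.targets)}, _output.sample={_output.attrs['sample']}")
```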

Difference between paired_with and group_with

The difference between paired_with and group_with should be clear, although the simple examples above do not demonstrate it. More specifically,

  • paired_with pairs variables with each target of _output
  • group_with attaches variables to _output itself

We did not see any difference because each _output had only one element, so _output.sample could be used in place of _output[0].sample. The following example creates _input groups of size 2 and demonstrates the difference between target variables (replicate) and group variables (group).

Example output:
step 1 step_input=a1.txt a2.txt b1.txt b2.txt
  _input=a1.txt a2.txt, _output=a1.bak a2.bak, _output.group=A
step 1 step_input=a1.txt a2.txt b1.txt b2.txt
  _input=b1.txt b2.txt, _output=b1.bak b2.bak, _output.group=B
step 2 step_input=a1.bak a2.bak b1.bak b2.bak
  _input=a1.bak a2.bak with replicate [1, 2],  _input.group=A
step 2 step_input=a1.bak a2.bak b1.bak b2.bak
  _input=b1.bak b2.bak with replicate [1, 2],  _input.group=B
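The two kinds of variables can be put side by side in a plain-Python model. This is a sketch, not SoS code. Target and Group are hypothetical stand-ins. replicate is stored on each target, as paired_with would do, so it needs four values. group is stored once per group of two targets, as group_with would do, so it needs only two.

```python
from dataclasses import dataclass, field

@dataclass
class Target:
    name: str
    attrs: dict = field(default_factory=dict)   # target-level variables (paired_with)

@dataclass
class Group:
    targets: list
    attrs: dict = field(default_factory=dict)   # group-level variables (group_with)

names = ["a1.bak", "a2.bak", "b1.bak", "b2.bak"]
replicate = [1, 2, 1, 2]   # paired_with: one value per target
group = ["A", "B"]         # group_with: one value per group of 2 targets

targets = [Target(n, {"replicate": r}) for n, r in zip(names, replicate)]
groups = [Group(targets[i:i + 2], {"group": g}) for i, g in zip(range(0, 4, 2), group)]

report = []
for _input in groups:
    reps = [t.attrs["replicate"] for t in _input.targets]   # like _input[i].replicate
    report.append(f"_input={' '.join(t.name for t in _input.targets)} "
                  f"with replicate {reps}, _input.group={_input.attrs['group']}")
print("\n".join(report))
```

Reading replicate requires looking at each target, while group is a single value on the group. This mirrors the _input[0].replicate versus _input.group distinction in the text.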