
Working with arbitrary subkernels

  • Difficulty level: easy
  • Time needed to learn: 20 minutes or less
  • Key points:
    • SoS works with any Jupyter kernel
    • %cd changes directory of all subkernels
    • %expand expands expressions in the cell input before sending it to subkernels
    • %capture captures output from subkernels and saves it into Python variables
    • %render renders output from subkernels in different formats

A SoS kernel is a master kernel that can start, stop, and interact with any Jupyter kernel; SoS calls these kernels subkernels. Although most of the time SoS just passes user input to subkernels and sends outputs from subkernels to the frontend (notebook), you can use a few SoS magics to modify user input before it is sent to the subkernel, and to process outputs before they are sent to the frontend.

Change working directory (magic %cd)

When working with multiple kernels in a notebook, it can be confusing if the kernels have different working directories. The rules of thumb in SoS are as follows:

First, subkernels start in the current working directory of the SoS kernel

In [1]:
Out[1]:
'/Users/bpeng1/sos/sos-docs/src/user_guide'
In [2]:
'/Users/bpeng1/sos/sos-docs/src/user_guide'

Second, changing the working directory in one subkernel does not affect other live kernels

In [3]:
In [4]:
'/Users/bpeng1/sos/sos-docs/src'
In [5]:
Out[5]:
'/Users/bpeng1/sos/sos-docs/src/user_guide'

Third, the %cd magic changes the working directory of all subkernels

In [6]:
/Users/bpeng1/sos/sos-docs
In [7]:
Out[7]:
'/Users/bpeng1/sos/sos-docs'
In [8]:
'/Users/bpeng1/sos/sos-docs'
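
The first two rules are what you would expect if each subkernel runs as its own process. The sketch below is not SoS code: it uses plain Python child processes in place of subkernels, and the helper name cwd_of_child is hypothetical. It only illustrates that a new process inherits the working directory of its parent, and that a later change of directory inside the child stays inside the child.

```python
import os
import subprocess
import sys


def cwd_of_child(code="import os; print(os.getcwd())"):
    """Run `code` in a fresh Python process and return what it prints."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


parent_before = os.path.realpath(os.getcwd())

# Rule 1: a new child process starts in the parent's working directory.
child_start = os.path.realpath(cwd_of_child())

# Rule 2: a chdir inside the child does not change the parent's directory.
child_moved = os.path.realpath(
    cwd_of_child("import os; os.chdir('..'); print(os.getcwd())")
)
parent_after = os.path.realpath(os.getcwd())

print(child_start == parent_before)
print(parent_after == parent_before)
```

Because nothing propagates a directory change between processes automatically, a magic such as %cd has to tell every running subkernel to change its directory.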

Expand input (magic %expand)

Scripts in SoS cells are by default sent verbatim to SoS or the subkernels. For example,

In [9]:
A parameter {par} is specified.

With variable par defined in the SoS kernel,

In [10]:

the %expand magic treats the content of the following cell as a Python f-string and expands expressions inside braces:

In [11]:
[1] "A parameter 100 is specified."

If the script itself contains { }, which is quite common in R, you can double the braces so that they are kept as literal braces

In [12]:
[1] "A parameter 100 greater than 50 is specified."

If the script contains many braces, it is usually better to use a different sigil, such as ${ }, to interpolate the script

In [13]:
[1] "A parameter 100 greater than 50 is specified."
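
The two approaches above can be compared in plain Python. This is a rough sketch, not the SoS implementation: the first part relies on ordinary f-string rules (doubled braces become literal braces), and the second part uses a hypothetical helper, expand_with_sigil, that emulates a custom sigil with a regular expression. The R-like script text is reconstructed from the output shown above and is an assumption.

```python
import re

par = 100

# Default "{ }" sigil behaves like an f-string: double the literal braces.
doubled = f'if ({par} > 50) {{ print("A parameter {par} greater than 50 is specified.") }}'


def expand_with_sigil(text, namespace, sigil="${ }"):
    """Replace every <left>expr<right> in text with the value of expr."""
    left, right = sigil.split()
    pattern = re.compile(re.escape(left) + r"(.+?)" + re.escape(right))
    # eval() is used only for illustration; never apply it to untrusted text.
    return pattern.sub(lambda m: str(eval(m.group(1), {}, namespace)), text)


# With a "${ }" sigil the plain braces of the script need no escaping.
script = 'if (${par} > 50) { print("A parameter ${par} greater than 50 is specified.") }'
custom = expand_with_sigil(script, {"par": par})

print(doubled)
print(custom)
```

Both strings come out identical; the custom sigil simply avoids having to double every brace that belongs to the script.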

Although not the topic of this tutorial, it is worth mentioning that the %expand magic is used in the same way as the expand option of SoS actions, so you can convert the above script, which was executed in an R session, to an R action in a SoS workflow as follows:

In [14]:
[1] "A parameter 100 greater than 50 is specified."

Expand input in subkernels (magic %expand --in)

If a language module supports the expand interface (not all of them do), you can expand the content of a cell in a specified kernel. This is most naturally used in a markdown kernel, where the expanded text is treated as markdown text.

By default the markdown kernel processes the input text literally:

In [15]:
  • Hello, the value of a is {a}

With the %expand magic, you can expand expressions using variables in the SoS kernel (the default).

In [16]:
In [17]:

Hello, the value of a is 5

If you have variables in R,

In [18]:

You can expand the content of the cell using variables (and expressions) in R:

In [19]:

Hello, the value of a**2 is 10000

If you are more familiar with the delimiters used in RMarkdown, you can specify the delimiter as an option of %expand.

In [20]:

Hello, the value of a**2 is 10000

Capture cell output (magic %capture)

The %capture magic captures all or part of the output of a cell to a SoS variable. To understand how this magic works, you need to understand how Jupyter works. Briefly, while a cell is executed, the kernel sends one or more messages of type stream, display_data, or execute_result, along with other control messages, before it signals that the execution has finished. A stream message can contain standard output (stdout) or standard error (stderr), and a display_data message can contain more complex data in a dictionary with keys such as text/html, text/plain, and text/markdown; the frontend decides how to display these messages.

Determine what to capture

The %capture magic can capture the following types of information

name        message
stdout      stdout of stream messages
stderr      stderr of stream messages
text        text/plain of display_data or execute_result messages
html        text/html of display_data or execute_result messages
markdown    text/markdown of display_data or execute_result messages
error       evalue of error message
raw         All above messages
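
To make the table concrete, the sketch below selects one kind of content from a list of (message type, content) tuples, the same shape in which this tutorial displays raw messages. It is an illustration only and not how SoS implements %capture; the function name extract_output and the sample messages are hypothetical.

```python
def extract_output(messages, what):
    """Concatenate the pieces of type `what` found in (msg_type, content) tuples."""
    mime = {"text": "text/plain", "html": "text/html", "markdown": "text/markdown"}
    parts = []
    for msg_type, content in messages:
        if what in ("stdout", "stderr"):
            # stream messages carry a name (stdout or stderr) and the text
            if msg_type == "stream" and content.get("name") == what:
                parts.append(content["text"])
        elif what in mime:
            # rich output is stored by MIME type under the "data" key
            if msg_type in ("display_data", "execute_result"):
                data = content.get("data", {})
                if mime[what] in data:
                    parts.append(data[mime[what]])
        elif what == "error":
            if msg_type == "error":
                parts.append(content.get("evalue", ""))
        else:
            raise ValueError(f"unknown output type: {what}")
    return "".join(parts)


# Hypothetical messages for illustration.
messages = [
    ("stream", {"name": "stdout", "text": "I am from Bash\n"}),
    ("display_data", {"data": {"text/html": "<b>hi</b>", "text/plain": "hi"},
                      "metadata": {}}),
]

print(repr(extract_output(messages, "stdout")))
print(repr(extract_output(messages, "html")))
```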

The first step in capturing output from a cell is to determine what types of messages are sent by the cell. If you are uncertain, you can open the console panel (right click and select New Console for Notebook if you are using JupyterLab), and use the %capture magic without options (or with the raw option).

In [21]:
I am from Bash

The messages returned by the cell are displayed in the console window

[('stream', {'name': 'stdout', 'text': 'I am from Bash\n'})]

for this cell, from which you can see that the output is a stream message named stdout. You can then specify the stdout type,

In [22]:
I am from Bash

The captured result is by default saved to a variable __captured in the SoS kernel:

In [23]:
Out[23]:
'I am from Bash\n'

You can use option -t (--to) to specify the name of the variable

In [24]:
I am from Bash
In [25]:
Out[25]:
'I am from Bash\n'

As a more complex example, the following cell runs a SPARQL query and returns multiple messages.

In [26]:
Return format: JSON
Display: table
Endpoint set to: http://dbpedia.org/sparql
person                                         name
http://dbpedia.org/resource/Paul_Da_Vinci      Paul Da Vinci
http://dbpedia.org/resource/Leonardo_da_Vinci  Leonardo da Vinci
Total: 2, Shown: 2

The __captured variable shows all returned messages

In [27]:
Out[27]:
[('display_data',
  {'data': {'text/html': '<div class="krn-spql"><div class="magic">Return format: JSON</div><div class="magic">Display: table</div><div class="magic">Endpoint set to: http://dbpedia.org/sparql</div></div>',
    'text/plain': 'Return format: JSON\nDisplay: table\nEndpoint set to: http://dbpedia.org/sparql\n'},
   'metadata': {}}),
 ('display_data',
  {'data': {'text/html': '<div class="krn-spql"><table><tr class=hdr><th>person</th>\n<th>name</th></tr><tr class=odd><td class=val><a href="http://dbpedia.org/resource/Paul_Da_Vinci" target="_other">http://dbpedia.org/resource/Paul_Da_Vinci</a></td>\n<td class=val>Paul Da Vinci</td></tr><tr class=even><td class=val><a href="http://dbpedia.org/resource/Leonardo_da_Vinci" target="_other">http://dbpedia.org/resource/Leonardo_da_Vinci</a></td>\n<td class=val>Leonardo da Vinci</td></tr></table><div class="tinfo">Total: 2, Shown: 2</div></div>'},
   'metadata': {}})]

You can then save the content of text/html to a variable html_table

In [28]:
Return format: JSON
Display: table
Endpoint set to: http://dbpedia.org/sparql
person                                         name
http://dbpedia.org/resource/Paul_Da_Vinci      Paul Da Vinci
http://dbpedia.org/resource/Leonardo_da_Vinci  Leonardo da Vinci
Total: 2, Shown: 2

which contains the text/html data of two messages

In [29]:
<div class="krn-spql"><div class="magic">Return format: JSON</div><div class="magic">Display: table</div><div class="magic">Endpoint set to: http://dbpedia.org/sparql</div></div><div class="krn-spql"><table><tr class=hdr><th>person</th>
<th>name</th></tr><tr class=odd><td class=val><a href="http://dbpedia.org/resource/Paul_Da_Vinci" target="_other">http://dbpedia.org/resource/Paul_Da_Vinci</a></td>
<td class=val>Paul Da Vinci</td></tr><tr class=even><td class=val><a href="http://dbpedia.org/resource/Leonardo_da_Vinci" target="_other">http://dbpedia.org/resource/Leonardo_da_Vinci</a></td>
<td class=val>Leonardo da Vinci</td></tr></table><div class="tinfo">Total: 2, Shown: 2</div></div>

Then, if you would like to process the output programmatically, you can use one of the many powerful Python modules

In [30]:
Out[30]:
[<a href="http://dbpedia.org/resource/Paul_Da_Vinci" target="_other">http://dbpedia.org/resource/Paul_Da_Vinci</a>,
 <a href="http://dbpedia.org/resource/Leonardo_da_Vinci" target="_other">http://dbpedia.org/resource/Leonardo_da_Vinci</a>]
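
The cell that produced the output above is not shown here. As one possible approach, the sketch below uses only the standard library's html.parser module to collect the link targets from captured HTML. The class name LinkCollector is hypothetical, the html_table string is a shortened copy of the captured text, and unlike the output above this version collects the href values rather than the anchor elements.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


# Shortened copy of the captured text/html content.
html_table = (
    '<table><tr class=odd><td class=val>'
    '<a href="http://dbpedia.org/resource/Paul_Da_Vinci" target="_other">'
    'http://dbpedia.org/resource/Paul_Da_Vinci</a></td></tr>'
    '<tr class=even><td class=val>'
    '<a href="http://dbpedia.org/resource/Leonardo_da_Vinci" target="_other">'
    'http://dbpedia.org/resource/Leonardo_da_Vinci</a></td></tr></table>'
)

collector = LinkCollector()
collector.feed(html_table)
print(collector.links)
```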

Capture formatted content

If the output of a cell is well-formatted, it is possible to capture the output as variables in a type other than str.

For example, suppose you would like to capture the names and sizes of a few notebook files. Instead of writing a Python script, you could use a shell command as follows

In [31]:
> !ls -l ex*.ipynb | awk '{printf("%s,%d\n", $10, $6)}'
ls: ex*.ipynb: No such file or directory

The output is well formatted, so you can capture it in csv format as follows

In [32]:
> !ls -l ex*.ipynb | awk '{printf("%s,%d\n", $10, $6)}'
ls: ex*.ipynb: No such file or directory

The resulting variable is a pandas DataFrame, but unfortunately the first data line is treated as the header, which is not correct here.

In [33]:
Out[33]:
ls: ex*.ipynb: No such file or directory
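
If the header problem matters, one workaround is to capture the output as plain text and parse it yourself. The following is a minimal sketch under that assumption, using the standard csv module so that every line is treated as data; the file names and sizes in captured are made up for illustration. With pandas, passing header=None to read_csv has a similar effect.

```python
import csv
import io

# Hypothetical captured text in "name,size" form, with no header line.
captured = "example_a.ipynb,1024\nexample_b.ipynb,20480\n"

# csv.reader never assumes a header, so the first line stays a data row.
rows = [(name, int(size)) for name, size in csv.reader(io.StringIO(captured))]
sizes = dict(rows)

print(rows)
print(sizes["example_b.ipynb"])
```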

The %capture magic can capture data in text (default), json, csv, and tsv formats, and can append to an existing variable instead of replacing it (option -a). Please refer to the SoS Magics reference or the command %capture -h for a complete list of options.

Render cell output (magic %render)

The %render magic intercepts the output of a cell and converts it to a certain format before displaying it in the notebook. The format can be any format supported by the IPython.display module and defaults to Markdown.

For example, if you have a dataset

In [34]:

You can format it as HTML

In [35]:
<table>
  <tr>
    <th>Firstname</th>
    <th>Lastname</th> 
    <th>Age</th>
  </tr>

  <tr>
    <td>John</td>
    <td>Smith</td> 
    <td>50</td>
  </tr>

  <tr>
    <td>Eve</td>
    <td>Jackson</td> 
    <td>35</td>
  </tr>
</table>

and render it as an HTML table.

In [36]:
Out[36]:
Firstname Lastname Age
John Smith 50
Eve Jackson 35
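
The cells that define the dataset and produce the HTML text are not shown here. As a rough illustration of the idea, the sketch below builds the same kind of HTML table in Python from a header and a list of rows; the function name to_html_table and the way the data is stored are assumptions. Printing the result from a cell that uses the %render magic with the HTML renderer would display it as a table.

```python
from html import escape


def to_html_table(header, rows):
    """Return an HTML table with one header row and one row per record."""
    lines = ["<table>", "  <tr>"]
    lines += [f"    <th>{escape(str(h))}</th>" for h in header]
    lines.append("  </tr>")
    for row in rows:
        lines.append("  <tr>")
        # escape() keeps characters such as < and & from breaking the markup
        lines += [f"    <td>{escape(str(cell))}</td>" for cell in row]
        lines.append("  </tr>")
    lines.append("</table>")
    return "\n".join(lines)


table = to_html_table(
    ["Firstname", "Lastname", "Age"],
    [("John", "Smith", 50), ("Eve", "Jackson", 35)],
)
print(table)
```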

Currently %render only renders stdout (of stream messages, the default) and text (text/plain of display_data messages) content, so it is a good idea to use %capture raw to check the type of output before you use %render.

The %render magic accepts any renderer that is defined in the IPython.display module. The following cell lists all renderers,

In [37]:
Options of magic %render
* DisplayObject
* TextDisplayObject
* Pretty
* HTML
* Markdown
* Math
* Latex
* SVG
* ProgressBar
* JSON
* GeoJSON
* Javascript
* Image
* Video
* Audio
* Code

and of course the %render magic, by default, treats the output as markdown and displays the items as bullet points:

In [38]:

Options of magic %render

  • DisplayObject
  • TextDisplayObject
  • Pretty
  • HTML
  • Markdown
  • Math
  • Latex
  • SVG
  • ProgressBar
  • JSON
  • GeoJSON
  • Javascript
  • Image
  • Video
  • Audio
  • Code

The ability to render text output as markdown alleviates a problem with Jupyter notebooks: markdown cells cannot contain variables, so you cannot mix results with their descriptions as easily as you can with RMarkdown inline expressions. With the %render magic, however, you can write markdown text as a string in any kernel and use the %render magic to display it.

For example, if you have res obtained from some analysis

In [39]:

You can report the result by generating markdown text programmatically and using the %render magic to render it

In [40]:

Array of size 5

  • -1.0386173237681
  • -0.109619826107393
  • -1.485339927388
  • 1.22426679620287
  • 0.0548661565220672

This is less intuitive than writing down markdown text directly but you have the flexibility to generate the markdown text using any language and you can use conditions and loops to automate the output of long reports.
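
As a small illustration of generating a report with a loop and a condition, the sketch below builds markdown text in Python. The function name markdown_report, the sample values, and the choice to print negative values in bold are all hypothetical; printing the returned string from a cell that uses %render would display it as formatted text.

```python
def markdown_report(values, title="Array"):
    """Build a markdown bullet list, with negative values shown in bold."""
    lines = [f"{title} of size {len(values)}", ""]
    for value in values:
        if value < 0:
            lines.append(f"* **{value}**")
        else:
            lines.append(f"* {value}")
    return "\n".join(lines)


# Hypothetical result values for illustration.
res = [0.5, -1.25, 2.0]
report = markdown_report(res)
print(report)
```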