Working with arbitrary subkernels

Difficulty level: easy
Time needed to learn: 20 minutes or less
Key points:
- SoS works with any Jupyter kernel
- %cd changes the working directory of all subkernels
- %expand expands expressions in the cell input before sending it to subkernels
- %capture captures output from subkernels and saves it into Python variables
- %render renders output from subkernels in different formats

The SoS kernel is a master kernel that can start, stop, and interact with any Jupyter kernel; SoS calls these kernels subkernels. Although most of the time SoS simply passes user input to subkernels and sends outputs from subkernels to the frontend (notebook), you can use a few SoS magics to modify user input before it is sent to a subkernel, and to process output before it is sent to the frontend.


Change working directory (magic `%cd`)

When working with multiple kernels in a notebook, it can be confusing if the kernels have different working directories. The rules of thumb in SoS are as follows:

First, subkernels start in the current working directory of the SoS kernel

'/Users/bpeng1/sos/sos-docs/src/user_guide'

Second, changing the working directory in one subkernel does not affect other live kernels

'/Users/bpeng1/sos/sos-docs/src/user_guide'

Third, the %cd magic changes the working directory of all subkernels

/Users/bpeng1/sos/sos-docs

'/Users/bpeng1/sos/sos-docs'
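The first two rules mirror how operating-system processes treat working directories. The following is a minimal sketch that uses plain Python child processes as stand-ins for subkernels; it does not use SoS itself, and the helper name `cwd_of_new_process` is hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def cwd_of_new_process(code="import os; print(os.getcwd())"):
    """Start a child Python process (a stand-in for a subkernel),
    run `code` in it, and return whatever it prints."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


master_cwd = os.getcwd()

# Rule 1: a newly started child inherits the directory of its parent.
child_cwd = cwd_of_new_process()

# Rule 2: a directory change inside one child does not leak into the
# parent (or into any other child that is already running).
with tempfile.TemporaryDirectory() as tmp:
    moved_cwd = cwd_of_new_process(
        f"import os; os.chdir({tmp!r}); print(os.getcwd())"
    )
cwd_after = os.getcwd()

print(child_cwd, moved_cwd, cwd_after, sep="\n")
```

The third rule has no process-level counterpart: it only works because the master kernel explicitly asks every subkernel to change its directory.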

Expand input (magic `%expand`)

Scripts in SoS cells are by default sent verbatim to SoS or the subkernels. For example,

A parameter {par} is specified.

With variable par defined in the SoS kernel,

the %expand magic treats the content of the following cell as a Python f-string and expands expressions inside braces:

[1] "A parameter 100 is specified."

If the script itself contains { }, which is quite common in R, you can double the braces so that they are kept as literal braces:

[1] "A parameter 100 greater than 50 is specified."
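Because the expansion follows f-string rules, its effect can be previewed in plain Python. The sketch below is an illustration only, not the SoS implementation; the helper `expand_fstring` and the R snippets are hypothetical.

```python
def expand_fstring(template, namespace):
    """Evaluate `template` as if it were the body of a Python f-string.

    repr() turns the text into a valid string literal, and the "f"
    prefix makes eval() interpolate it. Only use this on trusted text.
    """
    return eval("f" + repr(template), {}, dict(namespace))


par = 100

# Expressions inside single braces are replaced by their values.
plain = expand_fstring('print("A parameter {par} is specified.")', {"par": par})

# Braces that belong to the R script are doubled so that they come
# out of the expansion as single literal braces.
guarded = expand_fstring(
    'if ({par} > 50) {{ print("A parameter {par} greater than 50 is specified.") }}',
    {"par": par},
)

print(plain)
print(guarded)
```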

If the script contains many braces, it is better to use a different sigil, such as ${ }, to interpolate the script:

[1] "A parameter 100 greater than 50 is specified."
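The benefit of a custom sigil can be sketched with a small regular-expression based expander. This is an assumption-laden illustration rather than how SoS does it: the function `expand_with_sigil` is hypothetical and, unlike a real f-string, does not support nested braces or format specifiers.

```python
import re


def expand_with_sigil(template, namespace, sigil="${ }"):
    """Replace every <left>expr<right> with the value of expr.

    `sigil` holds the left and right delimiters separated by a space.
    Text outside the delimiters, including plain braces, is untouched.
    """
    left, right = sigil.split()
    pattern = re.compile(re.escape(left) + r"(.*?)" + re.escape(right))
    return pattern.sub(
        lambda match: str(eval(match.group(1), {}, dict(namespace))),
        template,
    )


# Hypothetical R snippet: its own braces no longer need to be doubled.
script = 'if (${par} > 50) { print("A parameter ${par} greater than 50 is specified.") }'
expanded = expand_with_sigil(script, {"par": 100})
print(expanded)
```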

Although not the topic of this tutorial, it is worth mentioning that the usage of the %expand magic is the same as that of the expand option of SoS actions, so you can convert the above script, which was executed in an R session, to an R action in a SoS workflow as follows:

[1] "A parameter 100 greater than 50 is specified."

Expand input in subkernels (magic `%expand --in`)

If a language module supports the expand interface (not all of them do), you can expand the content of a cell in a specified kernel. This is most naturally used in a markdown kernel, where the expanded text is treated as markdown text.

By default the markdown kernel processes the input text literally:

With the %expand magic, you can expand expressions with variables in SoS kernel (default).

If you have variables in R,

You can expand the content of the cell using variables (and expressions) in R:

If you are more familiar with delimiters in RMarkdown, you can specify the delimiter as an option of %expand.

Capture cell output (magic `%capture`)

The %capture magic captures all or part of the output of a cell into a SoS variable. To understand how this magic works, you need to understand how Jupyter works. Briefly, after a cell is executed, the kernel sends one or more stream and display_data messages, along with other control messages, before it sends execute_result to conclude the execution. A stream message contains standard output (stdout) or standard error (stderr), whereas a display_data message can carry much richer data in a dictionary with keys such as text/html, text/plain, and text/markdown; the frontend decides how to display these messages.

Determine what to capture

The %capture magic can capture the following types of information

name	message
`stdout`	`stdout` of `stream` messages
`stderr`	`stderr` of `stream` messages
`text`	`text/plain` of `display_data` or `execute_result` messages
`html`	`text/html` of `display_data` or `execute_result` messages
`markdown`	`text/markdown` of `display_data` or `execute_result` messages
`error`	`evalue` of `error` message
`raw`	All above messages
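The mapping in the table can be sketched as a filter over a list of `(message type, content)` pairs, the same shape in which raw messages are shown in this tutorial. The function `capture` below is a hypothetical illustration, not the code behind the magic, and the sample messages are made up.

```python
# MIME keys looked up in display_data / execute_result messages.
MIME_KEYS = {
    "text": "text/plain",
    "html": "text/html",
    "markdown": "text/markdown",
}


def capture(messages, what="raw"):
    """Return the part of `messages` selected by `what` (see the table)."""
    if what == "raw":
        return list(messages)
    parts = []
    for msg_type, content in messages:
        if what in ("stdout", "stderr"):
            if msg_type == "stream" and content.get("name") == what:
                parts.append(content["text"])
        elif what in MIME_KEYS:
            data = content.get("data", {})
            if msg_type in ("display_data", "execute_result") and MIME_KEYS[what] in data:
                parts.append(data[MIME_KEYS[what]])
        elif what == "error":
            if msg_type == "error":
                parts.append(content.get("evalue", ""))
        else:
            raise ValueError(f"unknown capture type: {what}")
    # Matching pieces from several messages are concatenated.
    return "".join(parts)


messages = [
    ("stream", {"name": "stdout", "text": "I am from Bash\n"}),
    ("stream", {"name": "stderr", "text": "a warning\n"}),
    ("display_data", {"data": {"text/html": "<b>hi</b>", "text/plain": "hi"}, "metadata": {}}),
]
print(repr(capture(messages, "stdout")))
print(repr(capture(messages, "html")))
```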

The first step in capturing output from a cell is to determine what types of messages the cell sends. If you are uncertain, you can open the console panel (right click and select New Console for Notebook if you are using Jupyter Lab), and use the %capture magic without options (or with the raw option).

I am from Bash

The messages that have been returned by the cell will be displayed in the console window

[('stream', {'name': 'stdout', 'text': 'I am from Bash\n'})]

for this cell, from which you can see that the message is of type stdout. You can then specify the stdout type,

I am from Bash

The captured result is by default saved to a variable __captured in the SoS kernel:

'I am from Bash\n'

You can use option -t (--to) to assign the name of the variable

I am from Bash

'I am from Bash\n'

As a more complex example, the following cell runs a SPARQL query and returns multiple messages.

The __captured variable shows all returned messages

[('display_data',
  {'data': {'text/html': '<div class="krn-spql"><div class="magic">Return format: JSON</div><div class="magic">Display: table</div><div class="magic">Endpoint set to: http://dbpedia.org/sparql</div></div>',
    'text/plain': 'Return format: JSON\nDisplay: table\nEndpoint set to: http://dbpedia.org/sparql\n'},
   'metadata': {}}),
 ('display_data',
  {'data': {'text/html': '<div class="krn-spql"><table><tr class=hdr><th>person</th>\n<th>name</th></tr><tr class=odd><td class=val><a href="http://dbpedia.org/resource/Paul_Da_Vinci" target="_other">http://dbpedia.org/resource/Paul_Da_Vinci</a></td>\n<td class=val>Paul Da Vinci</td></tr><tr class=even><td class=val><a href="http://dbpedia.org/resource/Leonardo_da_Vinci" target="_other">http://dbpedia.org/resource/Leonardo_da_Vinci</a></td>\n<td class=val>Leonardo da Vinci</td></tr></table><div class="tinfo">Total: 2, Shown: 2</div></div>'},
   'metadata': {}})]

You can then save the content of text/html to a variable html_table

which contains the text/html data of two messages

<div class="krn-spql"><div class="magic">Return format: JSON</div><div class="magic">Display: table</div><div class="magic">Endpoint set to: http://dbpedia.org/sparql</div></div><div class="krn-spql"><table><tr class=hdr><th>person</th>
<th>name</th></tr><tr class=odd><td class=val><a href="http://dbpedia.org/resource/Paul_Da_Vinci" target="_other">http://dbpedia.org/resource/Paul_Da_Vinci</a></td>
<td class=val>Paul Da Vinci</td></tr><tr class=even><td class=val><a href="http://dbpedia.org/resource/Leonardo_da_Vinci" target="_other">http://dbpedia.org/resource/Leonardo_da_Vinci</a></td>
<td class=val>Leonardo da Vinci</td></tr></table><div class="tinfo">Total: 2, Shown: 2</div></div>

Then, if you would like to process the output programmatically, you can use one of the many powerful Python modules

[<a href="http://dbpedia.org/resource/Paul_Da_Vinci" target="_other">http://dbpedia.org/resource/Paul_Da_Vinci</a>,
 <a href="http://dbpedia.org/resource/Leonardo_da_Vinci" target="_other">http://dbpedia.org/resource/Leonardo_da_Vinci</a>]
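If no third-party HTML library is available, the links can also be extracted with the standard library alone. The sketch below assumes the captured text is held in a variable named `html_table`; the class `LinkCollector` is hypothetical and the HTML is an abbreviated copy of the output above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Abbreviated stand-in for the captured text/html content.
html_table = (
    '<div class="krn-spql"><table>'
    '<tr class=odd><td class=val><a href="http://dbpedia.org/resource/Paul_Da_Vinci" '
    'target="_other">http://dbpedia.org/resource/Paul_Da_Vinci</a></td></tr>'
    '<tr class=even><td class=val><a href="http://dbpedia.org/resource/Leonardo_da_Vinci" '
    'target="_other">http://dbpedia.org/resource/Leonardo_da_Vinci</a></td></tr>'
    "</table></div>"
)

collector = LinkCollector()
collector.feed(html_table)
print(collector.links)
```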

Capture formatted content

If the output of a cell is well-formatted, it is possible to capture the output as variables in a type other than str.

For example, suppose you would like to capture the sizes of a few notebook files. Instead of using a Python script, you could use a shell command as follows

ls: ex*.ipynb: No such file or directory

The output is well formatted so you can capture it in csv format as follows

ls: ex*.ipynb: No such file or directory

The resulting variable is a Pandas DataFrame, but unfortunately the first data line is treated as the header, which is not correct here.
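The header problem is not specific to Pandas: any CSV reader that assumes a header row consumes the first data line. The sketch below reproduces the pitfall and one possible workaround with the standard `csv` module; the captured text and the column names `size` and `name` are hypothetical.

```python
import csv
import io

# Hypothetical header-less output of a shell command: size,filename
captured = "12345,ex1.ipynb\n67890,ex2.ipynb\n"

# A reader that expects a header uses the first data line as column names,
# so only one record is left.
naive = list(csv.DictReader(io.StringIO(captured)))

# Supplying the column names explicitly keeps every line as data.
fixed = list(csv.DictReader(io.StringIO(captured), fieldnames=["size", "name"]))

print(naive)
print(fixed)
```

One way to apply this idea is to capture the output as plain text and parse it yourself with explicit column names.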

The %capture magic can capture data in text (default), json, csv, and tsv formats, and can append to an existing variable instead of replacing it (option -a). Please refer to the SoS Magics reference or the command %capture -h for a complete list of options.

Render cell output (magic `%render`)

The %render magic intercepts the output of a cell and converts it to a certain format before displaying it in the notebook. The format can be any format supported by the IPython.display module and defaults to Markdown.

For example, if you have a dataset

You can format it in HTML format

<table>
  <tr>
    <th>Firstname</th>
    <th>Lastname</th> 
    <th>Age</th>
  </tr>

  <tr>
    <td>John</td>
    <td>Smith</td> 
    <td>50</td>
  </tr>

  <tr>
    <td>Eve</td>
    <td>Jackson</td> 
    <td>35</td>
  </tr>
</table>

and render it as an HTML table.
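HTML text like the above does not have to be written by hand. A minimal sketch, assuming the dataset is available as a header plus a list of rows; the helper `to_html_table` is hypothetical:

```python
from html import escape


def to_html_table(header, rows):
    """Return an HTML table, as text, for a header and a list of rows."""
    lines = ["<table>", "  <tr>"]
    lines += [f"    <th>{escape(str(name))}</th>" for name in header]
    lines.append("  </tr>")
    for row in rows:
        lines.append("  <tr>")
        lines += [f"    <td>{escape(str(cell))}</td>" for cell in row]
        lines.append("  </tr>")
    lines.append("</table>")
    return "\n".join(lines)


html_text = to_html_table(
    ["Firstname", "Lastname", "Age"],
    [["John", "Smith", 50], ["Eve", "Jackson", 35]],
)
# Printing sends the text to stdout, which is what %render picks up by default.
print(html_text)
```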

Currently %render only renders stdout (of stream messages, default) and text (text/plain of display_data messages) contents, and you should probably use %capture raw to check the type of output before you %render.

The %render magic accepts any renderer that is defined in the IPython.display module. The following cell lists all renderers,

Options of magic %render
* DisplayObject
* TextDisplayObject
* Pretty
* HTML
* Markdown
* Math
* Latex
* SVG
* ProgressBar
* JSON
* GeoJSON
* Javascript
* Image
* Video
* Audio
* Code

and a %render magic treats the output as markdown and displays the items as bullet points:

The ability to render text output as markdown alleviates a problem with Jupyter notebooks: markdown cells cannot contain variables, so you cannot mix results with their descriptions as easily as Rmarkdown inline expressions do. With the %render magic, however, you can write markdown text as a string in any kernel and use the magic to display it.

For example, if you have res obtained from some analysis

You can report the result by generating markdown text programmatically and using the %render magic to render it

This is less intuitive than writing down markdown text directly but you have the flexibility to generate the markdown text using any language and you can use conditions and loops to automate the output of long reports.
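As a sketch of what such automation could look like, the following builds a bulleted markdown report with a loop and a condition. The content of `res`, the threshold `alpha`, and the function `markdown_report` are all hypothetical.

```python
def markdown_report(res, alpha=0.05):
    """Build one markdown bullet per result, flagging small p-values."""
    lines = []
    for name, p_value in res.items():
        verdict = "**significant**" if p_value < alpha else "not significant"
        lines.append(f"* `{name}`: p = {p_value:.3g} ({verdict})")
    return "\n".join(lines)


# Hypothetical results of some analysis: name -> p-value.
res = {"geneA": 0.003, "geneB": 0.41}
report = markdown_report(res)
# Printed text goes to stdout, so a %render magic on the cell would
# display it as formatted markdown.
print(report)
```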

