
SoS Notebook

SoS Notebook is a multi-language notebook that allows the use of multiple kernels in a single notebook, with data exchange among live kernels. It is also a Jupyter frontend to the SoS workflow engine that allows local and remote analysis of large datasets, and it enhances the Jupyter frontend with features such as line-by-line execution of cell contents. The unique combination of a multi-language kernel and a workflow engine allows easy transition from interactive data analysis to batch data processing workflows.

In [1]:
Revision Author Date Message
b26cc6c Bo Peng 2018-10-21 update documentation
5f8dc08 Bo Peng 2018-08-18 Update docs on the behavior of %run regarding global sections
37b8f7b Bo Peng 2018-08-15 Update doc on the updated %render and %capture magics
3424a5f Bo Peng 2018-06-27 Update documentation on the new options of magics %use and %with
c59bb79 Bo Peng 2018-06-26 Update documentation on %toc magic

Installing the SoS kernel

Please follow the instructions in Running SoS to install sos, sos-notebook, and relevant language modules. After the installation of sos-notebook, you should install the sos kernel to Jupyter with command

% python -m sos_notebook.install

sos should then appear in the output of:

In [2]:
Available kernels:
  bash         /Users/bpeng1/anaconda3/envs/sos/etc/jupyter/nbdata/kernels/bash
  julia-0.6    /Users/bpeng1/anaconda3/envs/sos/etc/jupyter/nbdata/kernels/julia-0.6
  markdown     /Users/bpeng1/anaconda3/envs/sos/etc/jupyter/nbdata/kernels/markdown
  sos          /Users/bpeng1/anaconda3/envs/sos/etc/jupyter/nbdata/kernels/sos
  ir           /Users/bpeng1/anaconda3/envs/sos/share/jupyter/kernels/ir
  python3      /Users/bpeng1/anaconda3/envs/sos/share/jupyter/kernels/python3
  sparql       /usr/local/share/jupyter/kernels/sparql

To create a SoS notebook, start a Jupyter server using command

$ jupyter notebook

and select a SoS kernel for your new notebook.

User Interface

[Figure: SoS Notebook user interface]

The SoS frontend is based on the Jupyter notebook frontend but adds a side panel and a dropdown list for every code cell.

SoS is also available for JupyterLab, but the side panel is currently unavailable there.

Cell content

A SoS Jupyter notebook accepts the following types of cells:

Cell type | Content | Interpreted by | Behavior
Markdown | Markdown text | Jupyter | Rendered to display titles, tables, etc.
Subkernel | Statements in other languages with optional SoS magics | Subkernels | SoS prepares the statements and evaluates them in subkernels such as R.
SoS | SoS statements without section header | SoS Notebook | Evaluates the cell as a SoS step in a persistent SoS dictionary.
Workflow | SoS statements with section header | SoS Command | Can only be executed by magic %run (workflow in the current cell) or %sosrun (workflow in the entire notebook).

Markdown cells contain Markdown text and are rendered by Jupyter. These cells are used for displaying rich-format text such as titles and tables. Most of the content of this documentation is written in such cells.

The kernel of each code cell is marked by the language selector at the top-right corner of each code cell, and by the color of the prompt area of the cell. For example, the following cell is a code cell with kernel R.

In [3]:
'This is cool !'

Similar to other Jupyter kernels, SoS defines a number of magics, which are special commands that start with character %. For example, expression {2**4} in the following code cell is expanded by magic %expand before it is passed to the underlying R kernel.

In [4]:
The value of expression '2**4' in Python is 16

A note of caution: because the underlying kernels also accept their own magics, SoS magics in subkernel cells must start from the first line of the cell. The following code will send %expand to the underlying Python3 kernel and cause an error. SoS cells are not affected by this restriction, so you can put magics, new lines, and even comments before the actual cell content.

In [5]:
UsageError: Line magic function `%expand` not found.
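
The first-line rule can be pictured as a small pre-processing step. The sketch below is not SoS's actual implementation: `split_magics` is a hypothetical helper, and `SOS_MAGICS` is only the subset of magic names mentioned on this page. It peels off leading SoS magic lines and leaves everything else for the subkernel.

```python
# Hypothetical subset of SoS magic names, taken from this page.
SOS_MAGICS = {"expand", "use", "with", "get", "put",
              "preview", "capture", "run", "sosrun", "toc"}


def split_magics(cell: str):
    """Return (leading SoS magic lines, text forwarded to the subkernel)."""
    lines = cell.split("\n")
    magics = []
    for line in lines:
        words = line.split()
        if line.startswith("%") and words and words[0][1:] in SOS_MAGICS:
            magics.append(line)
        else:
            # The first non-magic line ends the magic block.
            break
    return magics, "\n".join(lines[len(magics):])


# Magic on the first line: SoS handles it, only the body is forwarded.
print(split_magics("%expand\nprint('{2**4}')"))
# A comment before the magic: nothing is recognized, so "%expand"
# is forwarded and the subkernel reports an unknown magic.
print(split_magics("# comment\n%expand\nprint('{2**4}')"))
```

Under this assumption, anything after the first non-magic line, including a later `%expand`, reaches the subkernel verbatim, which is consistent with the error shown above.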

A cell with a SoS kernel can be either a SoS cell or a Workflow cell, with the latter containing section headers. A scratch cell is executed with a SoS kernel so that you can execute arbitrary SoS (Python) statements,

In [6]:
This is a scratch cell

or a single SoS step without header

In [7]:
%preview rand.txt
> rand.txt (51 B):
0 line
-0.2379578 -0.267737 -1.040099 0.1863217 -0.2325915

Here, the SoS magic %preview is used to preview the output of the cell.

The last type of SoS cell contains formal definitions of SoS steps. These cells define complete SoS workflows and can only be executed by the SoS magics %run or %sosrun in Jupyter, or by SoS commands from the command line. For example, executing the following cell would not execute any statement.

In [8]:

and you can execute the cell with a SoS magic %run.

In [9]:
WF
Workflow ID
3B4C7C08FA583377
Index
#1
completed
This is step wf_20 of a workflow

What is even more magical about these cells is that they form notebook workflows that consist of all sections defined in the Jupyter notebook. A %sosrun magic would collect all workflow steps in a notebook and execute them.

In [10]:
WF
Workflow ID
0C0BECAB6328BF28
Index
#2
completed
This is step wf_20 of a workflow
This is step wf_30 of a workflow
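
What separates a workflow cell from a plain SoS cell is the presence of section headers. As a rough illustration (not SoS's actual parser), the hypothetical helper `is_workflow_cell` below detects headers such as `[wf_20]` with a regular expression. The pattern only approximates the header syntax, and the cell contents used in the demo are made up for illustration.

```python
import re

# Rough approximation of a SoS section header such as "[wf_20]" or
# "[global]": a line consisting only of a bracketed step name, optionally
# followed by ": options". The real grammar is richer than this.
SECTION_HEADER = re.compile(r"^\[\s*[A-Za-z_][\w ,*-]*(:.*)?\]\s*$", re.MULTILINE)


def is_workflow_cell(cell: str) -> bool:
    """Return True if the cell contains at least one section header."""
    return SECTION_HEADER.search(cell) is not None


print(is_workflow_cell("[wf_20]\nprint('a step')"))   # workflow cell
print(is_workflow_cell("x = [1, 2]\nprint(x)"))       # plain SoS cell
```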

SoS provides a rich environment for you to analyze data in different languages. Generally speaking, you can

  • Use subkernels to analyze data interactively using different languages.
  • Use SoS cells to execute complete (and separate) scripts to analyze data or test steps of workflows, and
  • If needed, convert SoS cells to workflow cells to create complete workflows to analyze data in batch mode.

Switch between kernels

One of the most important features of the SoS kernel is its support for multiple Jupyter subkernels. A subkernel can be any Jupyter-supported kernel that has been installed locally (or a remote ikernel with a local definition).

A subkernel has the following properties:

Property | Example | Options of magics %use and %with | Comments
name | R, R2 | (positional) | Name to identify the subkernel, usually the same as the language name.
kernel | ir, python, R_remote | -k, --kernel | Name of the Jupyter kernel, which must be shown in the output of command jupyter kernelspec list.
language | R, Python2 | -l, --language | SoS definition of the language, which enables magics %get and %put for the kernel.
color | red, #FAEBD7 | -c, --color | Background color of the cell, with the default defined by the language definition.
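
To make the option layout in the table concrete, here is a minimal sketch of how such a magic line could be parsed with Python's `argparse`. The function `parse_use` is hypothetical and only mirrors the four properties listed above; how SoS itself parses the line and fills in defaults (for example kernel ir for language R) is not modelled.

```python
import argparse
import shlex


def parse_use(line: str) -> argparse.Namespace:
    """Parse the arguments of a %use-like magic (illustrative only)."""
    parser = argparse.ArgumentParser(prog="%use", add_help=False)
    parser.add_argument("name", nargs="?", default=None,
                        help="name that identifies the subkernel")
    parser.add_argument("-k", "--kernel", help="name of the Jupyter kernel")
    parser.add_argument("-l", "--language", help="SoS language definition")
    parser.add_argument("-c", "--color", help="background color of the cell")
    return parser.parse_args(shlex.split(line))


args = parse_use("R2 -k ir -l R -c red")
print(args.name, args.kernel, args.language, args.color)
```

With only a positional name, as in `parse_use("R")`, the remaining attributes stay `None`, which is where a real implementation would substitute the defaults of the language definition.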

You can switch the kernel of a code cell from a dropdown list at the top-right corner of the cell or with the %use or %with magic. Despite the flexibility in the use of local and remote kernels, multiple instances of the same kernel, and self-defined languages, most of the time you will be using magics like

%use R

to switch to a language with default kernel (ir), color (FDEDEC), and name (R).

In [11]:

starts and switches to an ir kernel so that you can enter any R commands as if you are working on a notebook with an ir kernel. Note that R stands for a SoS language extension that uses kernel ir, and you have to use the kernel name (e.g. iperl) if there is no language extension for the kernel.

In [12]:

As you can see, a different style is used for cells executed by the ir kernel. After you are done with the ir kernel, you can switch back to the SoS kernel using magic

In [13]:

The %with magic is similar to magic %use but it does not start a new kernel, and it accepts options --in (-i) and --out (-o) to pass specified input variables to the kernel, and return specified output variables from the kernel after the completion of the evaluation.

For example, if you have

In [14]:

you can pass the input and output variables to magic %with

In [15]:

and obtain the result in the SoS kernel

In [16]:
Out[16]:
[-1.04893205572007, 0.700078926097665, -0.334921652134619]

Note that any new cell will inherit the kernel of its previous code cell.

Side panel

SoS provides a side panel that can be toggled by a cube icon next to the language selection dropdown. The side panel contains a special cell that is used for two purposes:

  1. A scratch cell in which you can evaluate any expression and check its result.
  2. A preview cell in which the %preview magic of any cell would be executed if the side panel is open.

The input area of the panel cell has a dropdown button that allows you to execute a few frequently used magics and previously executed statements.

Because this cell is not part of the main notebook, its output will not be saved with the notebook. This allows you to test commands, check environment, values of variables and content of files without affecting the content of the notebook.

Keyboard Shortcuts

In addition to shortcuts defined by Jupyter (e.g. Ctrl-Enter to evaluate a cell and Shift-Enter to evaluate a cell and move next), the SoS kernel defines the following shortcuts

  1. Ctrl-Shift-Enter sends the current line or selected text to the panel cell for evaluation. This effectively allows you to evaluate the content of a cell line by line. This shortcut works for both code and markdown cells. The panel cell will switch to the kernel of the sending cell if the sending cell is a code cell.
  2. Ctrl-Shift-t executes magic %toc, which displays the table of contents of the current notebook in the side panel, allowing you to easily navigate within a (long) notebook.
  3. Ctrl-Shift-O (output) toggles a code cell tag show_output and marks the output with a gray bar to the right of the output area, or toggles a markdown cell tag hide_output. The tags will be rendered accordingly in HTML reports generated using sos templates.
  4. Ctrl-Shift-v (paste-table) If you have a table copied from an external source such as an HTML page or Excel file and you are inside a markdown cell, this shortcut pastes the table as markdown code into the current cell. This allows easy copying of tables to a SoS notebook.
  5. Ctrl-Shift-m (markdown) toggles a cell between markdown and code type, which can be easier than selecting the code or markdown cell type from the toolbar.
  6. Ctrl-B toggles the side panel. This is for compatibility with the toc2 extension of Jupyter notebook.

Other usage hints

  1. Paste pictures from clipboard: You can paste an image from the clipboard directly into a markdown cell of the Jupyter notebook using Ctrl-V (or Cmd-V under Mac OS X). The key here is that you should select a markdown cell before pasting.
  2. Drag and drop figures: You can drag a picture and drop it onto a markdown cell of the Jupyter notebook.
  3. Tab completion: You can use tab to complete keywords, magics (enter % and press TAB to get a list of magics), variable names, function names, file names, etc.
  4. Inspect name: You can place your cursor inside a keyword, function name, magic, etc., and press CTRL-TAB to inspect it. SoS will show the variable name, help message, etc., depending on the keyword that is being inspected.

How to get help

It is recommended that you go through this document and understand how SoS Notebook works, but it is of course not possible for you to memorize all the magics, SoS actions and their options. To get help, you can

  1. For a complete reference, visit the documentation page of SoS.
  2. For a list of available magics, type % and TAB (completion).
  3. For help on a particular magic, execute the magic with option -h (e.g. %run -h) in the side panel. Alternatively, you can place your cursor on %run and press CTRL-TAB to get a short description of the magic.
  4. For help on a SoS action (e.g. python), place your cursor on the action name (e.g. before : on python:) and press CTRL-TAB to get the help message of the action.

Data exchange among kernels

A SoS notebook can have multiple live kernels and can be used as a collection of cells from multiple independent notebooks. However, the real power of SoS Notebook lies in its ability to exchange data among kernels.

String interpolation (magic %expand)

Cell content is by default sent to SoS or subkernels untouched. The %expand magic interpolates cell content with variables in the SoS kernel as if the content were a Python f-string. For example,

In [17]:
In [18]:
{filename}
In [19]:
somefile
In [20]:
A filename "myfile" is passed 
In [21]:
A filename "somefile" is passed 

In the last example, {filename} is expanded by the %expand magic so the following statements are sent to Python 3:

filename = 'myfile'
print(f'A filename "somefile" is passed ')

As you can imagine, you can keep constants such as filenames and parameters in SoS and use this information to compose scripts in subkernels for execution.

By default, the expand magic expands expressions in braces { }. However, as you can see from the last example, %expand can be awkward if the script already contains { }. Although the official method is to double the braces, as in

In [22]:
A filename "myfile" is passed 

SoS recommends the use of an alternative sigil so that you do not have to change the syntax of the original script

In [23]:
SoS filename somefile, Python 3 filename: "myfile"
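
The effect of an alternative sigil can be sketched in a few lines of Python. This is a simplified model rather than SoS's implementation: the helper `expand` is hypothetical, the sigil `${ }` is just one illustrative choice, and doubled braces or nested braces inside expressions are not handled.

```python
import re


def expand(text: str, namespace: dict, sigil: str = "{ }") -> str:
    """Replace every <left>expr<right> in text with str(eval(expr))."""
    left, right = sigil.split()
    pattern = re.compile(re.escape(left) + r"(.+?)" + re.escape(right))
    # Expressions are evaluated against the (SoS-side) namespace.
    return pattern.sub(lambda m: str(eval(m.group(1), {}, namespace)), text)


sos_vars = {"filename": "somefile"}
script = ("filename = 'myfile'\n"
          "print(f'SoS filename ${filename}, Python 3 filename: \"{filename}\"')")

# Only ${filename} is interpolated; {filename} is left for the
# subkernel's own f-string to resolve.
print(expand(script, sos_vars, sigil="${ }"))
```

With the default `{ }` sigil both occurrences would be replaced before the script reaches the subkernel, which is the awkward case described above.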

Markdown cell and markdown kernel

You can include headers, lists, figures, tables in your Jupyter notebook using markdown cells. These markdown cells are rendered by Jupyter itself and do not interact with the kernels. Consequently, it is not possible to pass information (e.g. results from analysis) to markdown cells to generate dynamic output. In contrast, RStudio/RMarkdown has long allowed the inclusion of expressions in markdown texts.

To overcome this problem, you can install a markdown kernel with commands

pip install markdown-kernel
python -m markdown.kernel install

and write markdown code in code cells with a markdown kernel.

In [24]:

Hello, this is a code cell in markdown kernel, not a markdown cell.

The significance of the markdown kernel is that you can pass information from SoS to it through the %expand magic. For example, suppose you have defined a function to calculate the Fibonacci sequence,

In [25]:

You can use it in Python expressions as follows:

In [26]:

The Fibonacci sequence has value 1 when n=1 and 55 when n=10, which can be calculated recursively by fibo(10)=fibo(9) + fibo(8)=34+21, and so on.
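
The definition of the function is not shown on this page. A minimal recursive definition that is consistent with the values quoted above (1 for n=1, 55 for n=10) might look like the following; the name `fibo` comes from the rendered text, while the body is an assumption.

```python
def fibo(n: int) -> int:
    """Return the n-th Fibonacci number, with fibo(1) == fibo(2) == 1."""
    if n <= 2:
        return 1
    return fibo(n - 1) + fibo(n - 2)


print(fibo(1), fibo(8), fibo(9), fibo(10))
```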

Explicit data exchange (magic %get)

The SoS kernel provides a mechanism to pass variables between SoS and some subkernels using SoS magics.

For example, magic %get can get specified SoS variables from the SoS kernel to the subkernel ir.

In [37]:
In [38]:
  1. -1
  2. 0
  3. 1
  4. 2
  5. 3

SoS tries its best to find the best-matching data types between SoS and the subkernel and converts the data into the subkernel's native data types (e.g. Python's DataFrame to R's data.frame), so the variables you get will always be in the subkernel's native data types, not a wrapper of a foreign object (for example objects provided by rpy2).

In [39]:
'numeric'
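
One way to picture this kind of conversion is to render a Python value as an expression in the target language. The sketch below is an illustration only: `to_r_expr` is a hypothetical helper covering a handful of simple types, not the converter SoS uses, and it ignores data frames, special float values, and mixed-type lists.

```python
def to_r_expr(value) -> str:
    """Render a simple Python value as an R expression (illustrative only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):        # check bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)             # plain numbers are 'numeric' in R
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(value, (list, tuple)):
        return "c(" + ", ".join(to_r_expr(v) for v in value) + ")"
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(to_r_expr([-1, 0, 1, 2, 3]))
print(to_r_expr(["a", True, None]))
```

Evaluating the first result in R yields an ordinary numeric vector, in line with the class reported above.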

Similarly, using magic %put, you can put a variable from the subkernel into the SoS kernel.

In [40]:
In [41]:
In [42]:
Out[42]:
[0, 0, 1, 1, 1]

Variables can also be transferred with options --in (-i) and --out (-o) of magics %use and %with. For example, if you would like to add 2 to all elements in data but are not sure if pandas can do that, you can send the dataframe to R, add 2, and send it back.

In [43]:
In [44]:
Out[44]:
[2, 2, 3, 3, 3]

Implicit data exchange (sos* variables)

In addition to the use of magics %put and %get and parameters --in and --out of magics %use and %with to explicitly exchange variables between SoS and subkernels, SoS automatically shares variables with names starting with sos among all subkernels.

For example,

In [45]:
In [46]:
sos_var is changed to 200
In [47]:
200
In [48]:
Out[48]:
200
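
The sharing rule can be modelled with plain dictionaries standing in for kernel namespaces. This is a conceptual sketch under that assumption, not how SoS talks to real kernels; `sync_shared` is a hypothetical helper and the initial value 100 is made up.

```python
def sync_shared(source: dict, target: dict, prefix: str = "sos") -> list:
    """Copy variables whose names start with prefix from source to target."""
    names = sorted(name for name in source if name.startswith(prefix))
    for name in names:
        target[name] = source[name]
    return names


sos_kernel = {"sos_var": 100, "local_var": 1}
r_kernel = {}

sync_shared(sos_kernel, r_kernel)   # switching from SoS to the subkernel
r_kernel["sos_var"] = 200           # the subkernel changes the variable
sync_shared(r_kernel, sos_kernel)   # switching back to SoS

print(sos_kernel["sos_var"])        # the change is visible in SoS
print("local_var" in r_kernel)      # variables without the prefix stay put
```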

SoS supports an increasing number of languages and provides an interface to add support for other languages. Please refer to chapter Supported Language for details on each supported language. If your language of choice is not yet supported, please consider adding SoS support for your favorite kernel with a pull request.

In [49]:
rm: mydata.csv: No such file or directory

Capture output (Magic %capture)

Magic %capture captures output from a cell, optionally parses it in a specified format (option --as), and saves or appends the result to a variable in SoS (--to or --append). The output of the cell (namely the input to magic %capture) can be

  • standard output, %capture stdout or just %capture
  • standard error, %capture stderr
  • plain text, %capture text from text/plain of message display_data
  • markdown, %capture markdown from text/markdown of message display_data
  • html, %capture html from text/html of message display_data, or
  • raw, which simply returns a list of messages.
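
The sources listed above correspond to fields of the messages that a Jupyter kernel sends back: `stream` messages carry stdout and stderr text, and `display_data` messages carry a dictionary keyed by MIME type. A minimal sketch of selecting one source from a list of such messages follows; the function `capture` is hypothetical and the message dictionaries are trimmed to the fields it reads.

```python
def capture(messages: list, source: str = "stdout"):
    """Extract one kind of output from a list of Jupyter message dicts."""
    if source == "raw":
        return messages
    if source in ("stdout", "stderr"):
        return "".join(m["content"]["text"] for m in messages
                       if m["msg_type"] == "stream"
                       and m["content"]["name"] == source)
    mime = {"text": "text/plain",
            "markdown": "text/markdown",
            "html": "text/html"}[source]
    return "".join(m["content"]["data"][mime] for m in messages
                   if m["msg_type"] == "display_data"
                   and mime in m["content"]["data"])


msgs = [
    {"msg_type": "stream", "content": {"name": "stdout", "text": "1 2 3\n"}},
    {"msg_type": "display_data",
     "content": {"data": {"text/plain": "[1] 1", "text/html": "<b>1</b>"},
                 "metadata": {}}},
]
print(repr(capture(msgs)))
print(repr(capture(msgs, "html")))
```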

For example

In [50]:
1 2 3

captures standard output of the cell to variable r_var

In [51]:
Out[51]:
'1 2 3'
In [52]:
  1. '1'
  2. '2'
  3. '3'
In [53]:
Out[53]:
'[1] "1" "2" "3"'

The %capture magic allows passing information from the output of cells back to SoS. If the output text is structured, it can even parse the output in json, csv (comma separated values), or tsv (tab separated values) formats and save the result as a Python dict or pandas DataFrame.

For example,

In [54]:
a,b
1,2
3,4
In [55]:
Out[55]:
a b
0 1 2
1 3 4
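
Parsing captured text as csv amounts to reading the string as if it were a file. The sketch below uses only the standard library and returns a list of dictionaries, whereas the output above shows that SoS produces a pandas DataFrame; `parse_captured_csv` is a hypothetical name.

```python
import csv
import io


def parse_captured_csv(text: str, delimiter: str = ",") -> list:
    """Parse captured delimited text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


rows = parse_captured_csv("a,b\n1,2\n3,4\n")
# Values stay strings here; a DataFrame-based parser would infer numeric columns.
print(rows)
```

Passing `delimiter="\t"` gives the tsv variant.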

The %capture magic is especially suitable for kernels without a language module, for which there is no way to use a %get magic to retrieve information. For example, a sparql kernel simply executes sparql queries and returns results. It does not have a concept of variables, so you will have to capture its output to handle it in SoS:

In [58]:
Endpoint set to: http://dbpedia.org/sparql
Return format: JSON
In [63]:
Out[63]:
[<a href="http://www.w3.org/2002/07/owl#differentFrom" target="_other">http://www.w3.org/2002/07/owl#differentFrom</a>,
 <a href="http://www.w3.org/2000/01/rdf-schema#seeAlso" target="_other">http://www.w3.org/2000/01/rdf-schema#seeAlso</a>,
 <a href="http://www.w3.org/2002/07/owl#sameAs" target="_other">http://www.w3.org/2002/07/owl#sameAs</a>]

In addition to option --to, the %capture magic can also append cell output to an existing variable using parameter --append. How newly captured data is appended to the existing variable depends on the types of the existing and new data. Basically,

  1. If the variable does not exist, --append is equivalent to --to.
  2. If the existing variable is a list, the new data is appended to the list.
  3. If the new data has the same type as the existing one, the new data will be appended to the existing variable. That is to say, strings are appended to existing strings, dictionaries are merged into existing dictionaries, and rows of data frames are appended to existing data frames.
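
These three rules can be written down as a small function. The version below is a sketch for built-in types only: `append_captured` is a hypothetical helper, `None` stands in for "the variable does not exist", and data frames are left out to keep the example free of third-party packages.

```python
def append_captured(existing, new):
    """Combine newly captured data with an existing value (illustrative)."""
    if existing is None:                     # rule 1: behaves like --to
        return new
    if isinstance(existing, list):           # rule 2: append to the list
        return existing + [new]
    if isinstance(existing, str) and isinstance(new, str):
        return existing + new                # rule 3: concatenate strings
    if isinstance(existing, dict) and isinstance(new, dict):
        return {**existing, **new}           # rule 3: merge dictionaries
    raise TypeError("cannot append "
                    f"{type(new).__name__} to {type(existing).__name__}")


results = []
results = append_captured(results, "result from sh\n")
results = append_captured(results, "result from python\n")
print(results)
```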

For example, we already have a variable table

In [64]:
Out[64]:
a b
0 1 2
1 3 4
In [67]:
a,b
5,6
7,8
In [68]:
Out[68]:
a b
0 1 2
1 3 4
0 5 6
1 7 8

If you would like to collect results from multiple cells, you can create a list and use it to capture them

In [69]:
In [70]:
result from sh
In [71]:
result from python
In [72]:
Out[72]:
['result from sh\n', 'result from python\n']

Preview of results

Instant preview of intermediate results is extremely useful for interactive data analysis. SoS provides rich and extensible preview features to

  • preview files in many different formats,
  • preview variables and expressions in SoS and subkernels,
  • show preview results temporarily in the side panel or permanently in the main notebook, and
  • generate interactive tables and plots for better presentation of data both in Jupyter notebook and in converted HTML reports.

Magic %preview

SoS provides a %preview magic to preview files and variables (and their expressions) after the completion of a cell. By default, %preview displays results in the side panel if the side panel is open, and otherwise in the main notebook. You can override this behavior with options -p (--panel) or -n (--notebook) to always display results in the side panel or notebook.

For example, in a subkernel R, you can do

In [73]:
pdf: 2

to preview a.png generated by this cell. The figure will be displayed in the side panel if the side panel is open, or otherwise in the main notebook.

The %preview magic also accepts SoS variables and expressions. For example, the following cell previews variable mtcars in an R kernel.

In [74]:
%preview mtcars
> mtcars:
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2

You can also specify expressions to be previewed, but as with command line arguments, you will need to quote an expression if it contains spaces. For example,

In [75]:
%preview rownames(mtcars) mtcars['gear']
> rownames(mtcars):
  1. 'Mazda RX4'
  2. 'Mazda RX4 Wag'
  3. 'Datsun 710'
  4. 'Hornet 4 Drive'
  5. 'Hornet Sportabout'
  6. 'Valiant'
  7. 'Duster 360'
  8. 'Merc 240D'
  9. 'Merc 230'
  10. 'Merc 280'
  11. 'Merc 280C'
  12. 'Merc 450SE'
  13. 'Merc 450SL'
  14. 'Merc 450SLC'
  15. 'Cadillac Fleetwood'
  16. 'Lincoln Continental'
  17. 'Chrysler Imperial'
  18. 'Fiat 128'
  19. 'Honda Civic'
  20. 'Toyota Corolla'
  21. 'Toyota Corona'
  22. 'Dodge Challenger'
  23. 'AMC Javelin'
  24. 'Camaro Z28'
  25. 'Pontiac Firebird'
  26. 'Fiat X1-9'
  27. 'Porsche 914-2'
  28. 'Lotus Europa'
  29. 'Ford Pantera L'
  30. 'Ferrari Dino'
  31. 'Maserati Bora'
  32. 'Volvo 142E'
> mtcars['gear']:
gear
Mazda RX4 4
Mazda RX4 Wag 4
Datsun 710 4
Hornet 4 Drive 3
Hornet Sportabout 3
Valiant 3
Duster 360 3
Merc 240D 4
Merc 230 4
Merc 280 4
Merc 280C 4
Merc 450SE 3
Merc 450SL 3
Merc 450SLC 3
Cadillac Fleetwood 3
Lincoln Continental 3
Chrysler Imperial 3
Fiat 128 4
Honda Civic 4
Toyota Corolla 4
Toyota Corona 3
Dodge Challenger 3
AMC Javelin 3
Camaro Z28 3
Pontiac Firebird 3
Fiat X1-9 4
Porsche 914-2 5
Lotus Europa 5
Ford Pantera L 5
Ferrari Dino 5
Maserati Bora 5
Volvo 142E 4
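
The quoting requirement for expressions with spaces follows from splitting the magic line into items the way a shell splits arguments. The snippet below illustrates this with Python's `shlex`, under the assumption of shell-like tokenization; SoS's own parser may treat some characters (such as single quotes inside an expression) differently, and `split_preview_items` is a hypothetical name.

```python
import shlex


def split_preview_items(arguments: str) -> list:
    """Split the argument string of a %preview-like magic into items."""
    return shlex.split(arguments)


# Two expressions without spaces: two items.
print(split_preview_items("rownames(mtcars) mtcars"))
# An expression with a space must be quoted to stay in one piece.
print(split_preview_items('rownames(mtcars) "mtcars[1:3, ]"'))
# Without quotes it is broken at the space.
print(split_preview_items("rownames(mtcars) mtcars[1:3, ]"))
```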

If option -w (--workflow) is specified, SoS collects workflow steps from the current notebook and previews them. This gives you a better sense of what would be saved with magic %sossave or executed with magic %sosrun.

In [76]:

Magic %preview also accepts an option --off, in which case it turns off the preview of output files from the SoS workflow.

Preview variables from another kernel

The %preview magic previews variables in the kernel in which the cell is evaluated. For example, if you have val defined in both the SoS and R kernels, %preview val shows the value from whichever kernel executes the cell:

In [77]:
In [78]:
In [79]:
%preview val
> val:
  1. 1
  2. 2
  3. 1
  4. 2
  5. 1
  6. 2
In [80]:
%preview val
> val: list of length 4
[2, 3, 2, 3]

Note that SoS also displays type information of variables in the SoS kernel, including the numbers of rows and columns for data frames, but not in subkernels.

If you would like to preview a variable in another kernel, you can specify the kernel with option --kernel. For example, the following cell previews variable val of the R kernel from a SoS cell.

In [81]:
%preview val
> val:
  1. 1
  2. 2
  3. 1
  4. 2
  5. 1
  6. 2

Preview of files

The %preview magic can be used to preview files in a variety of formats. For example, the following cell downloads a bam file and uses the %preview magic to preview the content of the file to make sure the file has been downloaded correctly.

In [82]:
issue225.bam:   0%|                                   | 0/4261 [00:00<?, ?it/s]
                                                                                                                                          
Out[82]:
0
%preview issue225.bam
> issue225.bam (4.2 KiB):
> Failed to preview file issue225.bam: The 'pysam; extra == "sam"' distribution was not found and is required by the application
> Failed to preview file or expression issue225.bam

Similarly, you can preview files produced by another kernel

In [83]:
pdf: 2
%preview test.png
> test.png (20.5 KiB):

Note that SoS by default previews PDF files as an embedded object (iframe), but you can use option -s png to convert the PDF to PNG and preview it as an image, if you have ImageMagick and the Python package wand installed.

Preview options

The %preview magic accepts options that are format dependent. This table lists options that are available to specific file types. Options for previewing dataframes will be described later.

File type | Option | Description
pdf | --style png (-s png) | Convert PDF to PNG before preview. All pages are combined to produce a single PNG figure if the PDF file contains multiple pages. This option requires Python module wand.
pdf | --pages 2 3 | With --style png, preview specified pages (page numbers starting from 1) from a multi-page PDF file.
txt | --limit lines (-l) | Number of lines to preview for text files (not limited to files with extension .txt), default to 5.
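
As an illustration of the --limit option for text files, the sketch below returns a size header followed by the first few lines of a file. `preview_text` is a hypothetical helper; the header only loosely imitates the previews shown on this page and is not SoS's exact output.

```python
import os
import tempfile


def preview_text(path: str, limit: int = 5) -> str:
    """Return a header and at most `limit` leading lines of a text file."""
    size = os.path.getsize(path)
    lines = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle):
            if number >= limit:
                break
            lines.append(line.rstrip("\n"))
    header = f"> {os.path.basename(path)} ({size} B):"
    return "\n".join([header] + lines)


if __name__ == "__main__":
    # Demonstrate on a temporary ten-line file.
    with tempfile.TemporaryDirectory() as folder:
        sample = os.path.join(folder, "sample.txt")
        with open(sample, "w", encoding="utf-8", newline="\n") as out:
            out.write("".join(f"line {i}\n" for i in range(10)))
        print(preview_text(sample, limit=3))
```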

Automatic preview

SoS will automatically preview results of Python assignments when you send statements to the side panel for execution. For example, if you execute the following cell with Ctrl-Shift-Enter,

In [84]:

The cell would be executed in the side panel as

%preview s
s='12345'

which allows instant feedback when you step through your code.
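
The rewriting shown above can be approximated by inspecting the statement with Python's `ast` module and prepending a %preview line when it is a simple assignment. This is a sketch of the idea, not the frontend's actual logic; `add_auto_preview` is a hypothetical name.

```python
import ast


def add_auto_preview(statement: str) -> str:
    """Prepend '%preview <names>' if the statement is a plain assignment."""
    try:
        tree = ast.parse(statement)
    except SyntaxError:
        return statement          # not valid Python, leave untouched
    if len(tree.body) == 1 and isinstance(tree.body[0], ast.Assign):
        names = [t.id for t in tree.body[0].targets if isinstance(t, ast.Name)]
        if names:
            return "%preview " + " ".join(names) + "\n" + statement
    return statement


print(add_auto_preview("s='12345'"))
print(add_auto_preview("print(s)"))
```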

SoS would also automatically preview results from SoS workflows that are specified with statement output:. For example, the following SoS cell executes a scratch step (a step without section header) and generates output a.png. The figure would be automatically displayed in the side panel after the step is executed.

In [85]:
null device 
          1 

Preview of remote files

When a workflow or a task is executed remotely (see Remote Execution for details), result files might be on a remote host that is unavailable for local preview. In this case, you can specify the host with option -r HOST and preview files remotely. This essentially executes a command sos preview -r HOST FILE on the remote host and allows you to preview the content of (large) files without copying them locally.

For example, you can execute a workflow on a remote host dev to generate a file (mygraphic.png) and use %preview -r dev to preview it.

In [86]:
UNDEFINED
Workflow ID
UNDEFINED
Index
#3
completed
%preview mygraphic.png -r dev
Failed to preview ['mygraphic.png'] on remote host dev
ERROR: Undefined remote host dev

Preview of tables

If the data to be previewed is a Pandas DataFrame (or a csv or Excel file, which would be previewed as a DataFrame), SoS will preview it as a sortable and searchable table. For example, the following cell gets a data.frame mtcars from the R kernel (as a pandas DataFrame) and previews it in the main notebook:

In [87]:
Loading required package: feather
%preview mtcars
> mtcars: DataFrame of shape (32, 11)
  mpg   cyl   disp   hp   drat   wt   qsec   vs   am   gear   carb  
Mazda RX4 21.0 6.0 160.0 110.0 3.90 2.620 16.46 0.0 1.0 4.0 4.0
Mazda RX4 Wag 21.0 6.0 160.0 110.0 3.90 2.875 17.02 0.0 1.0 4.0 4.0
Datsun 710 22.8 4.0 108.0 93.0 3.85 2.320 18.61 1.0 1.0 4.0 1.0
Hornet 4 Drive 21.4 6.0 258.0 110.0 3.08 3.215 19.44 1.0 0.0 3.0 1.0
Hornet Sportabout 18.7 8.0 360.0 175.0 3.15 3.440 17.02 0.0 0.0 3.0 2.0
Valiant 18.1 6.0 225.0 105.0 2.76 3.460 20.22 1.0 0.0 3.0 1.0
Duster 360 14.3 8.0 360.0 245.0 3.21 3.570 15.84 0.0 0.0 3.0 4.0
Merc 240D 24.4 4.0 146.7 62.0 3.69 3.190 20.00 1.0 0.0 4.0 2.0
Merc 230 22.8 4.0 140.8 95.0 3.92 3.150 22.90 1.0 0.0 4.0 2.0
Merc 280 19.2 6.0 167.6 123.0 3.92 3.440 18.30 1.0 0.0 4.0 4.0
Merc 280C 17.8 6.0 167.6 123.0 3.92 3.440 18.90 1.0 0.0 4.0 4.0
Merc 450SE 16.4 8.0 275.8 180.0 3.07 4.070 17.40 0.0 0.0 3.0 3.0
Merc 450SL 17.3 8.0 275.8 180.0 3.07 3.730 17.60 0.0 0.0 3.0 3.0
Merc 450SLC 15.2 8.0 275.8 180.0 3.07 3.780 18.00 0.0 0.0 3.0 3.0
Cadillac Fleetwood 10.4 8.0 472.0 205.0 2.93 5.250 17.98 0.0 0.0 3.0 4.0
Lincoln Continental 10.4 8.0 460.0 215.0 3.00 5.424 17.82 0.0 0.0 3.0 4.0
Chrysler Imperial 14.7 8.0 440.0 230.0 3.23 5.345 17.42 0.0 0.0 3.0 4.0
Fiat 128 32.4 4.0 78.7 66.0 4.08 2.200 19.47 1.0 1.0 4.0 1.0
Honda Civic 30.4 4.0 75.7 52.0 4.93 1.615 18.52 1.0 1.0 4.0 2.0
Toyota Corolla 33.9 4.0 71.1 65.0 4.22 1.835 19.90 1.0 1.0 4.0 1.0
Toyota Corona 21.5 4.0 120.1 97.0 3.70 2.465 20.01 1.0 0.0 3.0 1.0
Dodge Challenger 15.5 8.0 318.0 150.0 2.76 3.520 16.87 0.0 0.0 3.0 2.0
AMC Javelin 15.2 8.0 304.0 150.0 3.15 3.435 17.30 0.0 0.0 3.0 2.0
Camaro Z28 13.3 8.0 350.0 245.0 3.73 3.840 15.41 0.0 0.0 3.0 4.0
Pontiac Firebird 19.2 8.0 400.0 175.0 3.08 3.845 17.05 0.0 0.0 3.0 2.0
Fiat X1-9 27.3 4.0 79.0 66.0 4.08 1.935 18.90 1.0 1.0 4.0 1.0
Porsche 914-2 26.0 4.0 120.3 91.0 4.43 2.140 16.70 0.0 1.0 5.0 2.0
Lotus Europa 30.4 4.0 95.1 113.0 3.77 1.513 16.90 1.0 1.0 5.0 2.0
Ford Pantera L 15.8 8.0 351.0 264.0 4.22 3.170 14.50 0.0 1.0 5.0 4.0
Ferrari Dino 19.7 6.0 145.0 175.0 3.62 2.770 15.50 0.0 1.0 5.0 6.0
Maserati Bora 15.0 8.0 301.0 335.0 3.54 3.570 14.60 0.0 1.0 5.0 8.0
Volvo 142E 21.4 4.0 121.0 109.0 4.11 2.780 18.60 1.0 1.0 4.0 2.0

Compared to previewing the same variable in R, you have the additional features of

  1. sorting table by clicking the sort icon at the header of each column
  2. displaying a subset of rows that matches specified text in the input text

Note that SoS by default outputs the first 200 rows of the table. You can use option -l (--limit) to change this threshold.

What makes this feature particularly interesting is that the table remains sortable and searchable when you save the Jupyter notebook in HTML format (through command sos convert analysis.ipynb analysis.html --template sos-report or magic %sossave --to html).

Scatter Plot

The tabular preview of a Pandas DataFrame actually uses the default table style of %preview. If you have one or more numeric columns, you can use the scatterplot style of %preview to view the data. For example, the following command plots mpg vs disp of the mtcars dataset, stratified by cyl.

In [88]:
%preview mtcars
> mtcars: DataFrame of shape (32, 11)

The advantage of this scatterplot is that you can see a description of data when you hover over the data points, which can be more informative than static figures produced by, for example, R.

The scatterplot style provides a number of options and you can use option -h with -s to display them:

In [89]:
usage: %preview -s scatterplot [-h] [--ylim YLIM YLIM] [--xlim XLIM XLIM]
                               [--log {x,y,xy,yx}] [--width WIDTH]
                               [--height HEIGHT] [-b BY [BY ...]]
                               [--show SHOW [SHOW ...]]
                               [-t [TOOLTIP [TOOLTIP ...]]] [-l LIMIT]
                               [cols [cols ...]]

positional arguments:
  cols                  Columns to plot, which should all be numeric. If one
                        column is specified, it is assumed to be a x-y plot
                        with x being 0, 1, 2, 3, .... If two or more columns
                        (n) are specified, n-1 series will be plotted with the
                        first column being the x axis, in which case an
                        "_index" name can be used to specify 0, 1, 2, 3, ....
                        This option can be igured if the dataframe has only
                        one or two columns.

optional arguments:
  -h, --help            show this help message and exit
  --ylim YLIM YLIM      Range of y-axis
  --xlim XLIM XLIM      Range of x-axis
  --log {x,y,xy,yx}     Make x-axis, y-axis, or both to logarithmic
  --width WIDTH         Width of the plot.
  --height HEIGHT       Height of the plot.
  -b BY [BY ...], --by BY [BY ...]
                        columns by which the data points are stratified.
  --show SHOW [SHOW ...]
                        What to show in the plot, can be 'lines', 'points' or
                        both. Default to points, and lines if x-axis is
                        sorted.
  -t [TOOLTIP [TOOLTIP ...]], --tooltip [TOOLTIP [TOOLTIP ...]]
                        Fields to be shown in tooltip, in addition to the row
                        index and point values that would be shown by default.
  -l LIMIT, --limit LIMIT
                        Maximum number of records to plot.
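
The rule for the positional cols argument can be easier to follow in code. The following Python sketch only restates the rule from the help message above: one column is plotted against the row index, and with two or more columns the first one is the x axis and the rest are separate series. The function names plan_series and x_values are hypothetical and are not part of SoS, and the sketch does not model the case where cols is omitted. The three rows are taken from the mtcars preview above.

```python
def plan_series(cols):
    """Return (x_name, y_names) for a list of column names.

    Illustration only; this is not the code used by %preview.
    """
    if not cols:
        raise ValueError("this sketch expects at least one column")
    if len(cols) == 1:
        # A single column is plotted against 0, 1, 2, 3, ...
        return "_index", [cols[0]]
    # Otherwise the first column is the x axis and the rest are y series.
    return cols[0], list(cols[1:])


def x_values(x_name, table):
    """Resolve the x axis, treating "_index" as the row number."""
    if x_name == "_index":
        n_rows = len(next(iter(table.values())))
        return list(range(n_rows))
    return table[x_name]


# Lotus Europa, Ford Pantera L, and Ferrari Dino from the table above
table = {
    "mpg": [30.4, 15.8, 19.7],
    "disp": [95.1, 351.0, 145.0],
    "hp": [113.0, 264.0, 175.0],
}

x_name, y_names = plan_series(["_index", "disp", "hp"])
xs = x_values(x_name, table)
```

Here x_name is "_index", so both disp and hp are plotted against the row number, giving n-1 = 2 series for n = 3 columns.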

For example, you can show more tooltips and multiple columns as follows:

In [90]:
%preview mtcars
> mtcars: DataFrame of shape (32, 11)

Command sos preview

The %preview magic has a command line counterpart, sos preview. This command cannot display any figure but can be convenient for previewing the content of compressed files and of files that are not previewed by the operating system (e.g. bam files). The -r option is especially useful for previewing files on a remote host without logging in to the remote host or copying the files to the local host.

Execution of Workflows

We have discussed markdown cells, subkernel cells, and SoS cells, the last of which can be considered subkernel cells that use the SoS (Python) kernel. SoS Notebook supports another type of cell, namely the workflow cell.

A workflow cell is a SoS cell with one or more formal SoS steps, which are marked by a section header. In summary,

  • A workflow cell can contain a complete workflow and be executed by magic %run.
  • Sections defined in all workflow cells in a notebook form the notebook workflow, which can be executed by magic %sosrun or by command sos run (or sos-runner) from the command line.
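
As a rough illustration of how sections spread over several cells combine into one notebook workflow, the Python sketch below scans cell contents for section headers. It assumes that a header is written as [name] or [name_index] on a line of its own, which is the common SoS form. The regular expression and the function names section_names, is_workflow_cell, and notebook_sections are hypothetical simplifications and not the parser that SoS uses.

```python
import re

# Assumption: a header looks like "[global]" or "[example_step_10]".
# Real SoS headers can carry options and several names; this sketch ignores that,
# and it would also misread a Python line that consists only of "[name]".
HEADER = re.compile(r"^\[\s*([A-Za-z_]\w*)\s*\]\s*$")


def section_names(cell_source):
    """Return the section names found in one cell, in order of appearance."""
    names = []
    for line in cell_source.splitlines():
        match = HEADER.match(line)
        if match:
            names.append(match.group(1))
    return names


def is_workflow_cell(cell_source):
    """A cell counts as a workflow cell if it has at least one section header."""
    return bool(section_names(cell_source))


def notebook_sections(cells):
    """Collect the sections of all cells, which together form the notebook workflow."""
    collected = []
    for source in cells:
        collected.extend(section_names(source))
    return collected


cells = [
    "a = [1, 2, 3]\nprint(a)",
    "[example_step_10]\nprint('first')\n\n[example_step_20]\nprint('second')",
    "[stepA]\nprint('stepA')",
]
all_sections = notebook_sections(cells)
```

In this sketch the first cell is an ordinary SoS cell, while the second and third cells each contribute their sections to the notebook workflow.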

Magic %run

Magic %run executes the content of a cell as a complete SoS workflow in a separate namespace.

For example, the following workflow cell defines a SoS step, but SoS Notebook will ignore it when you execute it with Ctrl-Enter.

In [91]:

You can only execute such a cell with magic %run:

In [92]:
EXAMPLE_STEP
Workflow ID
FFCCFD4D23510E85
Index
#4
completed
This is example_step_1

The %run magic treats the content of the cell as a SoS workflow, so you can put multiple sections in the cell:

In [93]:
EXAMPLE_STEP
Workflow ID
2A6A70A1D5D8A926
Index
#5
completed
This is example_step_10
This is example_step_20

You can have a global section, define parameters, and execute the workflow multiple times with multiple %run magics:

In [94]:
EXAMPLE_STEP
Workflow ID
4B25B3A6E97903B4
Index
#6
completed
This is example_step_15 with option 100
This is example_step_25

It is important to remember that the workflow is executed in its own namespace so it needs to be self-contained. That is to say, if you define a variable in the SoS namespace,

In [95]:

the variable is not available to the workflow and has to be passed as a parameter:

In [98]:
> %run --var 500
DEFAULT
Workflow ID
A7892418956F278A
Index
#10
completed
500
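
The effect of a separate namespace can be reproduced in plain Python. The sketch below is only an analogy: it uses exec with a fresh dictionary to stand in for the workflow namespace, and the helper run_workflow is a hypothetical name, not how SoS executes workflows.

```python
def run_workflow(source, parameters=None):
    """Execute source in a fresh namespace that only sees explicit parameters."""
    namespace = dict(parameters or {})
    exec(source, namespace)
    return namespace


# Stands in for the interactive SoS namespace of the notebook
notebook_ns = {"var": 500}

# Stands in for a workflow that uses a variable named var
workflow = "result = var"

try:
    run_workflow(workflow)
    visible = True
except NameError:
    # var lives in notebook_ns, not in the namespace of the workflow
    visible = False

# Passing the value explicitly plays the role of "%run --var 500"
passed = run_workflow(workflow, {"var": notebook_ns["var"]})["result"]
```

Without the explicit parameter the workflow fails with a NameError; with it, the workflow sees the value 500.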

Similar to the command line tool sos run, magic %run accepts a large number of parameters. Please refer to the output of %run -h for details.

Magic %sosrun

Magic %sosrun is similar to %run in its ability to execute workflows with many options. The major difference is that it executes the notebook workflow, which consists of all sections defined in the notebook. The notebook workflow can be displayed with magic %preview --workflow and is the workflow that will be executed if the notebook is executed from the command line with command sos run or sos-runner.

It is worth noting that magics %sosrun and %run can be used together, so that you can run single workflows with %run and multiple workflows with %sosrun. For example, you can debug and execute single workflows with magic %run:

In [99]:
STEPA
Workflow ID
5DD1524529C006EE
Index
#11
completed
stepA with parameter 10

and execute many such workflows in batch mode with magic %sosrun:

In [100]:
MASTER
Workflow ID
C56A78AED7B67E8E
Index
#12
completed
stepA with parameter 30

Conversion between .ipynb and .sos files

You can save a Jupyter notebook with SoS kernel to a SoS script using File -> Download As -> SoS from the browser, or using command

$ sos convert myscript.ipynb myscript.sos

By default, only workflow cells will be saved to a .sos file to create a SoS script in correct syntax. You can also save all cells, including cells in other kernels, in a .sos file using option --all, although the resulting .sos file might not be executable in batch mode.

You can convert an .sos script to .ipynb format using command

$ sos convert myscript.sos myscript.ipynb

or even to an executed notebook with option --execute

$ sos convert myscript.sos myscript.ipynb --execute

and SoS will either assign each SoS step to a cell, or split the workflow according to cell magics if the .sos file was exported with the --all option.

Please refer to the tutorial on File Conversion for details of these commands.

Convert .ipynb to HTML format

Because it is not particularly easy to open an .ipynb file (a live Jupyter server is required) and because of the risk of changing the content of an .ipynb file, it is often desirable to save an .ipynb file in HTML format.

Jupyter uses a template system to control the content and style of the exported HTML file. For example, you can use the default Jupyter template (File -> Save As -> HTML) to save all input and output cells, or you can use the hide code Jupyter extension to manually hide the input or output of each cell and produce a customized HTML file.

SoS provides its own template called sos-report, which can be used from the command line with command

$ sos convert analysis.ipynb analysis.html --template sos-report

or from within Jupyter with magic %sossave with specified filename

%sossave analysis.html -f

or using the same name as the notebook with option --to html

%sossave --to html -f

Option -f overwrites the output file if it already exists. You can also use other SoS-based templates such as sos-full, which displays all cells.

The generated HTML file has the following properties:

  1. It by default only displays markdown cells, input and output cells with tag report_cell, and output cells with tag report_output. This is the report view that only displays the results of interest.

  2. If you point your mouse to the top left corner of the window, a display control panel will appear that lets you select additional items to display, including all input and output cells, input and output prompts, and various messages. This is the notebook view that displays all the details of the analysis.
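
The default report view in item 1 amounts to a small decision rule on cell type and tags. The Python sketch below expresses that rule for cells as they are stored in an .ipynb file, where Jupyter keeps tags in the list metadata.tags of each cell. The function name report_view is hypothetical, the sketch is not the logic of the sos-report template itself, and it does not model the scratch tag.

```python
def report_view(cell):
    """Return the set of parts of a cell shown in the default report view.

    Possible parts are "source" (markdown text), "input", and "output".
    """
    tags = set(cell.get("metadata", {}).get("tags", []))
    if cell.get("cell_type") == "markdown":
        # Markdown cells are displayed by default.
        return {"source"}
    shown = set()
    if "report_cell" in tags:
        # Both input and output of the cell are displayed.
        shown.update({"input", "output"})
    if "report_output" in tags:
        # Only the output is displayed.
        shown.add("output")
    return shown


cells = [
    {"cell_type": "markdown", "metadata": {}, "source": "Results"},
    {"cell_type": "code", "metadata": {"tags": ["report_cell"]}, "source": "x = 1"},
    {"cell_type": "code", "metadata": {"tags": ["report_output"]}, "source": "x"},
    {"cell_type": "code", "metadata": {}, "source": "tmp = 0"},
]
views = [report_view(cell) for cell in cells]
```

The last cell has no tag and is therefore hidden in the report view, although it remains available in the notebook view.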

The report_output tag can be toggled using keyboard shortcut Ctrl-Shift-O, and cells with this tag will be marked by a gray bar to the right of the output area. The report_cell and scratch tags have to be added manually through the tag toolbar (View -> Cell Toolbar -> Tags). Whereas report_output only works for the output of code cells, the scratch tag applies to both markdown and code cells. Finally, if you have created your own customized template, you can define sos-default-template in a configuration file (e.g. sos config --set sos-default-template mytemplate.tpl) instead of specifying it with option --template each time.