- Difficulty level: difficult
- Time needed to learn: 30 minutes or more
- Key points:
  - It is relatively easy to write a language module with basic functions
  - Data exchange for different data types is handled independently, so you can start with the most common types and add more types gradually
SoS can interact with any Jupyter kernel. As shown in the SoS notebook tutorial, SoS can
- List the kernel in the language dropdown box and use it to execute associated cells
- Use the %expand magic to prepare input before sending it to the kernel
- Use the %capture magic to capture the output from the kernel
- Use the %render magic to render output from the kernel
without knowing what the kernel does.
However, if the kernel supports the concept of variables (not all kernels do), a language module for the kernel allows SoS to work more efficiently with it. More specifically, SoS can
- Mark the prompt areas of each cell to differentiate cells that belong to different kernels
- Preview variables when an assignment is executed during line-by-line execution
- Change the current directory of all subkernels with the %cd magic
- Exchange variables between subkernels using magics %put, %get, and %with
- Expand (markdown) text in a subkernel using magic %expand --in
- Preview the content of variables using magic %preview
- Show the version information of kernels using magic %sessioninfo
Although data exchange among subkernels is powerful, it is important to understand that SoS does not transfer any variables among kernels; it creates independent homonymous variables of similar types that are native to the destination language. For example, if you have the following two variables
a = 1
b = c(1, 2)
in R and execute the magic
%get a b --from R
in a SoS cell, SoS actually executes the following statements in the background to create variables a and b in Python:
a = 1
b = [1, 2]
These variables are independent, so changing the value of a or b in one kernel will not affect the other. Note also that a and b are of different types in Python although they are of the same numeric type in R (a is, technically speaking, an array of size 1).
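The copy-not-share behavior can be illustrated in plain Python. This is only a sketch, not SoS code: r_vector_to_python is a hypothetical helper that mimics the conversion rule described above (a length-1 R vector becomes a scalar, a longer one becomes a list), and the dictionaries stand in for the two kernels.

```python
def r_vector_to_python(values):
    """Mimic the rule above: a length-1 vector becomes a scalar, otherwise a list."""
    values = list(values)  # copy, so the result is independent of the source
    return values[0] if len(values) == 1 else values

# Stand-ins for the R variables a = 1 and b = c(1, 2)
r_side = {'a': [1], 'b': [1, 2]}

# Conceptually what `%get a b --from R` creates on the Python side
py_side = {name: r_vector_to_python(vec) for name, vec in r_side.items()}

# Changing the copy leaves the original untouched
py_side['b'].append(3)
print(py_side)
print(r_side)
```

The two containers never share state, which is why a later change in one kernel has to be sent again with another %get or %put.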
The best way to start a new language module is to read the source code of an existing language module and adapt it to your language. Our GitHub organization has a number of language modules. Module sos-r is a good choice, and you should try to match the corresponding items with the code in kernel.py when going through this tutorial.
To support a new language, you will need to write a Python package that defines a class, say mylanguage, that provides the following class attributes:
supported_kernels
supported_kernels should be a dictionary that maps each language to the names of the kernels that support it. For example, ir is the name of the kernel for language R, so this attribute should be defined as:
supported_kernels = {'R': ['ir']}
If multiple kernels are supported, SoS will look for a kernel with a matching name in the order specified. This is the case for JavaScript, where multiple kernels are available:
supported_kernels = {'JavaScript': ['ijavascript', 'inodejs']}
Multiple languages can be specified if a language module supports multiple languages. For example, MATLAB and Octave share the same language module:
supported_kernels = {'MATLAB': ['imatlab', 'matlab'], 'Octave': ['octave']}
Wildcard characters are allowed in kernel names, which is useful for kernels whose names contain version numbers:
supported_kernels = {'Julia': ['julia-?.?']}
Finally, if SoS cannot find any kernel that it recognizes, it will look into the language information of the kernelspec.
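The lookup order and wildcard matching described above can be sketched as follows. This is an illustration under assumptions, not the code SoS uses: find_kernel is a hypothetical helper, and shell-style matching with Python's fnmatch module is assumed for the wildcard characters.

```python
from fnmatch import fnmatchcase

def find_kernel(supported_kernels, available):
    """Return (language, kernel_name) for the first pattern that matches
    an available kernel, or None if nothing matches."""
    for language, patterns in supported_kernels.items():
        for pattern in patterns:            # patterns are tried in the order given
            for name in available:
                if fnmatchcase(name, pattern):
                    return language, name
    return None

supported = {'Julia': ['julia-?.?']}
print(find_kernel(supported, ['python3', 'julia-1.9']))
```

With {'JavaScript': ['ijavascript', 'inodejs']}, the same helper would prefer ijavascript whenever both kernels are installed, because it is listed first.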
background_color
background_color should be a name or #XXXXXX value for a color that will be used in the prompt area of cells that are executed by the subkernel. An empty string can be used to select the default notebook color. If the language module defines multiple languages, a dictionary {language: color} can be used to specify different colors for the supported languages. For example,
background_color = {'MATLAB': '#8ee7f1', 'Octave': '#dff8fb'}
is used for MATLAB and Octave.
cd_command
cd_command is a command that changes the current working directory, specified with {dir} interpolated with the option of magic %cd. For example, the command for R is
cd_command = 'setwd({dir!r})'
where !r quotes the provided dir. Note that { } is used as in a Python f-string, but no f prefix should be used.
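To see what the template produces, the following sketch fills it in with str.format. That SoS performs the interpolation exactly this way is an assumption; the path is a made-up example.

```python
cd_command = 'setwd({dir!r})'   # plain string template, no f prefix

# Assumption: the template is filled in later, here with str.format
statement = cd_command.format(dir='/path/to/project')
print(statement)  # setwd('/path/to/project')
```

The !r conversion applies Python's repr() to the directory, so the path arrives in the subkernel as a quoted string literal.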
options
A Python dictionary with options that will be passed to the frontend. Currently two options, variable_pattern and assignment_pattern, are supported. Both should be regular expressions in JavaScript style.
- Option variable_pattern is used to identify whether a statement is a simple variable (nothing else). If this option is defined and the input text (if executed in the side panel) matches the pattern, SoS will prepend %preview to the code. This option is useful only when %preview var displays more information than var.
- Option assignment_pattern is used to identify whether a statement is an assignment operation. If this option is defined and the input text matches the pattern, SoS will prepend %preview var to the code, where var should be the first matched portion of the pattern (use ( )). This mechanism allows SoS to automatically display the result of an assignment when you step through the code.
Both options are optional.
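As a rough illustration of how the two patterns could be used, the sketch below tests them with Python's re module. The patterns themselves are hypothetical examples for an R-like syntax (they are simple enough to behave the same in JavaScript), and preview_target is a made-up helper, not part of SoS.

```python
import re

options = {
    # Hypothetical patterns, for illustration only
    'variable_pattern': r'^\s*[_A-Za-z][_A-Za-z0-9.]*\s*$',
    'assignment_pattern': r'^\s*([_A-Za-z][_A-Za-z0-9.]*)\s*(=|<-).*$',
}

def preview_target(code):
    """Return the variable name that would be previewed, or None."""
    m = re.match(options['assignment_pattern'], code)
    if m:
        return m.group(1)        # first captured group is the variable name
    if re.match(options['variable_pattern'], code):
        return code.strip()      # the statement is just a variable
    return None

for code in ['x <- c(1, 2)', 'df', 'print(x)']:
    print(repr(code), '->', preview_target(code))
```

Patterns this simple also match comparisons such as x == 1, so a real module may need something stricter.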
An instance of the class is initialized with the SoS kernel and the name of the subkernel, which does not have to be one of the supported_kernels (it could be user-defined), and should provide the following attributes and functions. Because these attributes are instantiated with the kernel name, they can vary (slightly) from kernel to kernel.
Function expand(self, text, sigil)
(new in SoS Notebook 0.20.8)
Function expand should be a Python function that accepts text (most likely in Markdown format) with inline expressions, evaluates the expressions in the subkernel, and returns the expanded text. This can be used by the markdown kernel for the execution of inline expressions of, for example, R markdown text.
Function preview(self, item)
Function preview accepts a name, which should be the name of a variable in the subkernel. This function should return a tuple of two items (desc, preview) where
- desc should be a text (can be empty) that describes the type, size, dimension, or other general information of the variable, which will be displayed after the variable name.
- preview can be
  - A single str that is printed as stdout
  - A dictionary, which should contain keys such as text/plain, text/html, image/png and corresponding data. The data will be sent directly as display_data and allows you to return different types of preview results, even images.
  - A list or tuple of two dictionaries, with the first being the data dictionary and the second being the metadata dictionary for a display_data message.
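A minimal sketch of a preview implementation that returns the simplest form, a (desc, str) tuple, might look like this. FakeSoSKernel is a stand-in so the example runs without Jupyter, sos_MyLanguage is a hypothetical class name, and the shape of the message data (a dictionary with a text key) is an assumption based on Jupyter stream messages.

```python
class FakeSoSKernel:
    """Stand-in for the SoS kernel so the sketch runs without Jupyter."""
    def get_response(self, statement, msg_types, name=None):
        # Pretend the subkernel printed one line to stdout
        return [('stream', {'name': 'stdout', 'text': 'num [1:2] 1 2\n'})]

class sos_MyLanguage:
    def __init__(self, sos_kernel, kernel_name='mykernel'):
        self.sos_kernel = sos_kernel
        self.kernel_name = kernel_name

    def preview(self, item):
        # Ask the subkernel to describe the variable; str() is R syntax,
        # used here only as an example statement
        responses = self.sos_kernel.get_response(
            f'str({item})', ('stream',), name=('stdout',))
        text = ''.join(data['text'] for _, data in responses)
        desc = text.split()[0] if text else ''   # e.g. the type name
        return desc, text                        # (desc, preview as str)

lang = sos_MyLanguage(FakeSoSKernel())
print(lang.preview('b'))
```

Returning a dictionary keyed by MIME type instead of the string would let the same function deliver HTML tables or images.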
Function sessioninfo(self)
Function sessioninfo should be a Python function that returns information about the running kernel, usually including the version of the language, the kernel, and currently used packages and their versions. For R, this means a call to the sessionInfo() function. The return value of this function can be
- A string
- A list of strings or (key, value) pairs, or
- A dictionary.
The function will be called by the %sessioninfo magic of SoS.
Obtain variable from SoS
The get_vars function should be defined as
def get_vars(self, var_names)
where
- self is the language instance with access to the SoS kernel, and
- var_names are names in the sos dictionary.
This function is responsible for probing the type of each Python variable and creating a similar object in the subkernel.
For example, to create a Python object b = [1, 2] in R (magic %get), this function could
- Obtain an R expression to create this variable (e.g. b <- c(1, 2))
- Execute the expression in the subkernel to create variable b in it.
Note that the function get_vars can change the variable name because a valid variable name in Python might not be a valid variable name in another language. The function should give a warning (call self.sos_kernel.warn()) if this happens.
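The first of the two steps, building the expressions, can be sketched as below for a handful of types. Both helpers are hypothetical: python_to_r covers only booleans, numbers, strings, and flat sequences, and the renaming rule (a leading underscore becomes a dot) is an assumption chosen for illustration. Executing the statements in the subkernel is left out.

```python
def python_to_r(value):
    """Return an R expression for a few common Python types (illustrative only)."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return 'TRUE' if value else 'FALSE'
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return '"' + value.replace('\\', '\\\\').replace('"', '\\"') + '"'
    if isinstance(value, (list, tuple)):
        return 'c(' + ', '.join(python_to_r(v) for v in value) + ')'
    raise TypeError(f'Unsupported type: {type(value).__name__}')

def get_vars_statements(sos_dict, var_names, warn=print):
    """Build one R assignment per variable, renaming invalid names."""
    statements = []
    for name in var_names:
        new_name = name
        if name.startswith('_'):         # assumed rule: _x is renamed to .x
            new_name = '.' + name[1:]
            warn(f'Variable {name} is passed from SoS as {new_name}')
        statements.append(f'{new_name} <- {python_to_r(sos_dict[name])}')
    return statements

print(get_vars_statements({'b': [1, 2]}, ['b']))  # ['b <- c(1, 2)']
```

In a real module the warn callback would be self.sos_kernel.warn, and new types can be added to the converter one at a time.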
Send variables to other kernels
The put_vars function should be defined as
def put_vars(self, var_names, to_kernel=None)
where
- self is the language instance with access to the SoS kernel
- var_names is a list of variables that should exist in the subkernel.
- to_kernel is the destination kernel to which the variables should be passed.
Depending on the destination kernel, this function can:
- Return a Python dictionary if direct variable transfer is not supported by the language, in which case the language transfers the variables to SoS and lets SoS pass them along to the destination kernel.
- Return a string if direct variable transfer is supported. SoS will evaluate the string in the destination kernel to pass the variables directly to it.
So basically, a language can start with an implementation of put_vars(to_kernel='sos') and let SoS handle the rest. If the need arises, it can
- Implement variable exchange between instances of the same language. This can be useful because there are usually lossless and more efficient methods in this case.
- Put variables to other languages where direct variable transfer is much more efficient than transferring through SoS.
NOTE: SoS Notebook before version 0.20.5 supported a feature called automatic variable transfer, which automatically transferred variables with names starting with sos between kernels. This feature has been deprecated (#253).
For example, to send an R object b <- c(1, 2) from subkernel R to SoS (magic %put), this function can
- Execute a statement in the subkernel to get the value(s) of variable(s) in some format, for example, a string "{'b': [1, 2]}".
- Post-process these variables to return a dictionary to SoS.
The R sos extension provides a good example to get you started.
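The post-processing step can be as small as parsing the text printed by the subkernel. The sketch below assumes the subkernel emits a Python dictionary literal, as in the example string above; parse_put_response is a hypothetical helper name.

```python
import ast

def parse_put_response(text):
    """Turn the text printed by the subkernel into a dictionary for SoS."""
    # literal_eval accepts only Python literals, so no code is executed
    result = ast.literal_eval(text.strip())
    if not isinstance(result, dict):
        raise ValueError('Expected a dictionary of variable names and values')
    return result

print(parse_put_response("{'b': [1, 2]}"))  # {'b': [1, 2]}
```

Using ast.literal_eval rather than eval is a design choice here: output coming back from a subkernel is treated as data, not as code.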
NOTE: Unlike other language extension mechanisms in which the Python module can get hold of the "engine" of the interpreter (e.g. saspy and MATLAB's Python extension start the interpreter for direct communication) or have access to a lower-level API of the language (e.g. rpy2), SoS only has access to the interface of the language and performs all conversions by executing commands in the subkernels and intercepting their responses. Consequently,
- Data exchange can be slower than other methods.
- Data exchange is less dependent on version of the interpreter.
- Data exchange can happen between a local and a remote kernel.
Also, although it can be more efficient to save large datasets to disk files and load in another kernel, this method does not work for kernels that do not share the same filesystem. We currently ignore this issue and assume all kernels have access to the same file system.
With access to an instance of SoS kernel, you can call various functions of this kernel. However, the SoS kernel does not provide a stable API yet so you are advised to use only the following functions:
The get_response function executes the statement and collects messages sent back from the subkernel. Only messages of the specified msg_type are kept (e.g. stream, display_data), and name can be one or both of stdout and stderr when stream is specified.
The returned value is a list of (msg_type, msg_data) pairs, so
self.sos_kernel.get_response('ls()', ('stream', ),
name=('stdout', ))[0][1]
runs the function ls() in the subkernel, collects stdout, and gets the content of the first message.
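When a subkernel splits its output across several messages, taking only the first one can lose text. The sketch below joins all stdout messages instead. The sample list is fabricated for illustration, and the layout of msg_data (a dictionary with name and text keys) is an assumption based on Jupyter stream messages; collect_stdout is a hypothetical helper.

```python
# Fabricated example of a list of (msg_type, msg_data) pairs
responses = [
    ('stream', {'name': 'stdout', 'text': '[1] "a" "b"\n'}),
    ('stream', {'name': 'stdout', 'text': '[1] 3\n'}),
]

def collect_stdout(responses):
    """Concatenate the text of every stdout stream message."""
    return ''.join(
        data['text']
        for msg_type, data in responses
        if msg_type == 'stream' and data.get('name') == 'stdout')

print(collect_stdout(responses), end='')
```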
Debugging
If you are having trouble figuring out what messages have been returned from subkernels (e.g. display_data and stream can look alike), you can use the %capture magic to show them in the console panel.
You can also define the environment variable SOS_DEBUG=MESSAGE (or MESSAGE,KERNEL etc.) before starting the notebook server. This will cause SoS to, among other things, log messages processed by the get_response function to ~/.sos/sos_debug.log.
Logging
If you would like to add your own debug messages to the log file, you can call env.log_to_file:
from sos.utils import env
env.log_to_file('VARIABLE', f'Processing {var} of type {var.__class__.__name__}.')
If the log message can be expensive to format, you can check if SOS_DEBUG is defined before logging to the log file:
if 'VARIABLE' in env.config['SOS_DEBUG'] or 'ALL' in env.config['SOS_DEBUG']:
    env.log_to_file('VARIABLE', f'Processing {var} of type {var.__class__.__name__}.')
Although you can test your language module in many ways, it is highly recommended that you adopt a standard set of selenium-based tests that are executed by pytest. To create and run these tests, you should
- Install selenium and pytest
- Install Google Chrome and chrome driver
- Set environment variable JUPYTER_TEST_BROWSER to live if you would like to see the tests running. Otherwise the tests will be run in a virtual Chrome browser without display.
- Copy three test files from the tests for sos-r and adapt them for your language.
Test files
The test suite contains three files:
- conftest.py
  This is the configuration file for pytest that defines how to start a Jupyter server with the notebook with the right kernel. You can simply copy this file for your purpose.
- test_interface.py
  This file contains tests on the interface of the language module, including
  - Test for prompt color
  - Test for magic %cd
  - Test for change of variable names for magics %put and %get
  - Test for the automatic exchange of sos variables (variables with names starting with sos)
  - Test for the %preview magic
  - Test for the %sessioninfo magic
- test_data_exchange.py
  This file should contain tests for data exchange between SoS (Python) and the language, and optionally between subkernels. Tests should be separated by data type and direction of data transfer.
All tests should be derived from NotebookTest, imported from sos_notebook.test_utils, and use a pytest fixture notebook as follows:
from sos_notebook.test_utils import NotebookTest

class TestDataExchange(NotebookTest):
    def test_something(self, notebook):
        pass
The notebook fixture
The notebook fixture that is passed to each test function contains a notebook instance that you can operate on. Although it provides a large number of functions, you most likely only need to learn two of them for your tests:
notebook.call(statement, kernel, expect_error=False)
This function appends a new cell to the end of the notebook, inserts the specified statement as its content, changes the kernel of the cell to kernel, and executes the cell. It automatically dedents statement so you can indent multiple statements and call
notebook.call('''\
%put df --to R
import pandas as pd
import numpy as np
arr = np.random.randn(1000)
arr[::10] = np.nan
df = pd.DataFrame({'column_{0}'.format(i): arr for i in range(10)})
''', kernel='SoS')
This function returns the index of the cell so that you can call notebook.get_cell_output(idx) if needed. If you expect to see some warning messages, use expect_error=True. Otherwise the function will raise an exception that fails the test.
notebook.check_output(statement, kernel, expect_error=False, selector=None, attribute=None)
This function calls notebook.call(statement, kernel) and then notebook.get_cell_output(idx, selector, attribute) to get the output. The output contains all the text of the output, and additional text from non-text elements. For example, selector='img', attribute='src' would return the text in <img src="blah"> output. Using this function, most of your unit tests can look like the following
def test_sessioninfo(self, notebook):
    assert 'R version' in notebook.check_output(
        '%sessioninfo', kernel="SoS")
To register a language module with SoS, you will need to add your module to an entry point under section sos_language. This can be done by adding something like the following to your setup.py:
entry_points='''
[sos_language]
Perl = sos_perl.kernel:sos_Perl
'''
With the installation of this package, sos will be able to import the class sos_Perl from module sos_perl.kernel and use it to work with the Perl language.
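For orientation, a bare-bones sos_perl/kernel.py that the entry point above could point to might start out like this. It is only a skeleton under assumptions: the kernel name iperl, the color value, and the Perl chdir command are illustrative choices, and the two data-exchange functions are placeholders to be filled in type by type.

```python
class sos_Perl:
    # Class attributes described earlier in this tutorial
    supported_kernels = {'Perl': ['iperl']}   # kernel name is an assumption
    background_color = '#dde6f5'              # arbitrary example color
    cd_command = 'chdir({dir!r});'            # Perl chdir; !r quotes the path
    options = {}                              # no preview patterns yet

    def __init__(self, sos_kernel, kernel_name='iperl'):
        self.sos_kernel = sos_kernel
        self.kernel_name = kernel_name

    def get_vars(self, var_names):
        # Placeholder: convert each SoS variable and create it in the subkernel
        raise NotImplementedError('data exchange from SoS is not implemented yet')

    def put_vars(self, var_names, to_kernel=None):
        # Placeholder: return a dictionary of variables for SoS to pass along
        return {}
```

Starting from a skeleton like this, the selenium-based tests described above can be enabled one feature at a time.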