- Difficulty level: easy
- Time needed to learn: 15 minutes or less
- Key points:
- Most Python (SoS) data types have intuitive corresponding R data types
There are several options to install R and its Jupyter kernel `irkernel`, the easiest of which might be using `conda`. However, it could be tricky to install third-party R libraries into a `conda` environment, and mixing R packages from the `base` and `r` channels can lead to devastating results.

Anyway, after you have a working R installation with `irkernel` installed, you will need to install:
- The `sos-r` language module,
- The `arrow` library of R, and
- The `feather-format` module of Python.
The `arrow` and `feather-format` modules are needed to exchange data frames between Python and R.
SoS transfers Python variables of the following types to R as follows:
Python | condition | R |
---|---|---|
None | | NULL |
integer | | integer |
integer | large | numeric |
float | | numeric |
boolean | | logical |
complex | | complex |
str | | character |
Sequence (`list`, `tuple`, ...) | homogeneous type | `c()` |
Sequence (`list`, `tuple`, ...) | multiple types | list |
set | | list |
dict | | list with names |
numpy.ndarray | | array |
numpy.matrix | | matrix |
pandas.DataFrame | | data.frame |
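The table above can be read as a type-dispatch function. The sketch below is a hypothetical illustration of those rules, not SoS's actual implementation: the name `r_type_of` is made up, it assumes that "large" means outside R's 32-bit integer range (`.Machine$integer.max` is 2147483647 in R), and it leaves out the `numpy` and `pandas` rows so that it runs with the standard library alone.

```python
from collections.abc import Sequence

# Assumption: "large" integers are those outside R's 32-bit integer range.
R_INT_MAX = 2**31 - 1


def r_type_of(value):
    """Hypothetical helper: name the R type a Python value maps to, per the table."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "logical"
    if isinstance(value, int):
        return "integer" if -R_INT_MAX <= value <= R_INT_MAX else "numeric"
    if isinstance(value, float):
        return "numeric"
    if isinstance(value, complex):
        return "complex"
    if isinstance(value, str):  # str is a Sequence, so handle it before sequences
        return "character"
    if isinstance(value, dict):
        return "list with names"
    if isinstance(value, (set, frozenset)):
        return "list"
    if isinstance(value, Sequence):
        # a single element type gives an atomic vector built with c(),
        # mixed element types give a generic R list
        element_types = {type(item) for item in value}
        return "c()" if len(element_types) <= 1 else "list"
    raise TypeError(f"no mapping sketched for {type(value).__name__}")


print(r_type_of([1, 2, 3]))  # c()
print(r_type_of((1, "a")))   # list
print(r_type_of(10**18))     # numeric
```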
SoS transfers R variables of the following types to SoS (Python) as follows (`n` in the `condition` column is the length of the R object):
R | condition | Python |
---|---|---|
NULL | | None |
logical | n == 1 | boolean |
integer | n == 1 | integer |
numeric | n == 1 | float |
character | n == 1 | string |
complex | n == 1 | complex |
logical | n > 1 | list |
integer | n > 1 | list |
complex | n > 1 | list |
numeric | n > 1 | list |
character | n > 1 | list |
list without names | | list |
list with names | | dict (with ordered keys) |
matrix | | numpy.ndarray |
data.frame | | pandas.DataFrame |
array | | numpy.ndarray |
One of the key problems in mapping R data types to Python is that R does not have scalar types: all scalar variables are actually vectors of length 1. That is to say, in theory, the R variable `a = 1` should be represented in Python as `a = [1]`. However, because Python does differentiate between scalar and array values, we chose to represent R vectors of length 1 as scalar types in Python.
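A minimal sketch of this choice, assuming an R atomic vector is modelled as a plain Python list of its elements; the function name `r_vector_to_python` is hypothetical and this is not SoS's own code:

```python
def r_vector_to_python(values):
    """Hypothetical helper mirroring the n == 1 / n > 1 rows of the table.

    An R vector of length 1 becomes a Python scalar; a longer vector
    stays a Python list. (Zero-length vectors are not covered by the table
    and are simply returned as an empty list here.)
    """
    values = list(values)
    return values[0] if len(values) == 1 else values


print(r_vector_to_python([1]))        # 1
print(r_vector_to_python([1, 2, 3]))  # [1, 2, 3]
```

The cost of this design is that Python can no longer tell whether `1` came from an R vector of length 1; because R has no separate scalar type, sending the scalar back to R yields a length-1 vector again, so nothing is lost on the round trip.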
Most simple Python data types can be converted to R types easily, and the variables can be sent back to SoS without losing information.
However, because Python allows integers of arbitrary precision, which R does not support, large integers are represented in R as floating-point numbers, which might not keep the precision of the original number.
For example, if we send a large integer with 18 significant digits to R, the last digit would be different because of the floating-point representation. This is not a problem introduced by SoS, because you would get the same result if you entered this number in R directly. Consequently, if you send `large_int` back to SoS, the number would be different.
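The precision loss can be reproduced in plain Python, because an R `numeric` is an IEEE 754 double, the same format as a Python `float`. The 18-digit value below is an arbitrary number chosen for illustration, not necessarily the one used in the original example:

```python
large_int = 123456789123456789  # 18 significant digits (illustrative value)

# R stores such a number as a "numeric", i.e. a 64-bit IEEE 754 double
as_r_numeric = float(large_int)

# converting back to an integer does not recover the original number,
# because a double has only a 53-bit significand (about 15-17 decimal digits)
round_trip = int(as_r_numeric)
print(round_trip)               # 123456789123456784
print(round_trip == large_int)  # False
```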
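One-dimensional (vector) data are converted from SoS to R following the first table: sequences whose elements share a single type become atomic vectors created with `c()`, while sequences of mixed types become R lists.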
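To make the homogeneous/mixed distinction concrete, here is a hypothetical sketch that renders a Python sequence as R source text. The helpers `to_r_literal` and `to_r_vector` are made up for illustration (SoS does not necessarily build R code this way), and only `None`, booleans, integers, finite floats and strings are handled:

```python
def to_r_literal(value):
    """Hypothetical: render a simple Python scalar as an R literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # before int, since bool subclasses int
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return f"{value}L"  # the L suffix makes an R integer
    if isinstance(value, float):
        return repr(value)  # assumes a finite float
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_r_vector(seq):
    """Use c() for a single element type, list() for mixed types."""
    items = list(seq)
    body = ", ".join(to_r_literal(item) for item in items)
    if len({type(item) for item in items}) <= 1:
        return f"c({body})"
    return f"list({body})"


print(to_r_vector([1, 2, 3]))        # c(1L, 2L, 3L)
print(to_r_vector([1, "a", True]))   # list(1L, "a", TRUE)
```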
Multi-dimensional data are converted from SoS to R as arrays: a `numpy.ndarray` becomes an R array and a `numpy.matrix` becomes an R matrix.
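One detail any such conversion has to respect is memory order: Python nested lists (and `numpy` arrays by default) are row-major, while R's `matrix()` fills its values column by column (`byrow = FALSE` by default). The sketch below uses a hypothetical helper, `to_r_matrix`, on a nested list; with `numpy`, `arr.flatten(order="F")` would produce the same column-major sequence. It only illustrates the ordering issue and is not SoS's transfer mechanism:

```python
def to_r_matrix(rows):
    """Hypothetical: render a rectangular nested list as an R matrix() call."""
    nrow = len(rows)
    ncol = len(rows[0]) if rows else 0
    if any(len(row) != ncol for row in rows):
        raise ValueError("rows must all have the same length")
    # emit values column by column, matching matrix()'s default fill order
    column_major = [rows[i][j] for j in range(ncol) for i in range(nrow)]
    values = ", ".join(repr(v) for v in column_major)
    return f"matrix(c({values}), nrow = {nrow}, ncol = {ncol})"


print(to_r_matrix([[1, 2, 3], [4, 5, 6]]))
# matrix(c(1, 4, 2, 5, 3, 6), nrow = 2, ncol = 3)
```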
Scalar data are converted from R to SoS following the `n == 1` rows of the second table, so an R vector of length 1 arrives in Python as a scalar.
One-dimensional (vector) data of length greater than 1 are converted from R to SoS as Python lists.
Multi-dimensional data, whether R matrices or arrays, are converted from R to SoS as `numpy` arrays.
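Going the other way, R hands over a matrix as a flat column-major vector plus its dimensions. A hypothetical helper, `from_r_matrix`, can rebuild the row-major layout Python expects; with `numpy` the equivalent is `np.array(values).reshape((nrow, ncol), order="F")`. This is a sketch of the index arithmetic only, not SoS's implementation:

```python
def from_r_matrix(values, nrow, ncol):
    """Hypothetical: rebuild row-major nested lists from R's column-major data."""
    if len(values) != nrow * ncol:
        raise ValueError("values do not match the given dimensions")
    # element (i, j) sits at position i + j * nrow in column-major storage
    return [[values[i + j * nrow] for j in range(ncol)] for i in range(nrow)]


# the data of R's matrix(1:6, nrow = 2), read column by column
print(from_r_matrix([1, 2, 3, 4, 5, 6], 2, 3))  # [[1, 3, 5], [2, 4, 6]]
```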
It is worth noting that R's named lists are transferred to Python as dictionaries, but SoS preserves the order of the keys so that you can recover the order of the list.
Although the dictionary might appear to have a different order when displayed, the order of its keys and values is actually preserved, so it is safe to enumerate the R list in Python in its original order.
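The sketch below assumes an R named list such as `list(b = 2, a = 1, c = 3)` has arrived in Python as a `dict` with its keys in the R order (the values are made up for illustration). Python 3.7+ dictionaries iterate in insertion order, so `enumerate` recovers each element's position. It also shows one common way a dictionary can *look* reordered: `pprint` sorts keys by default, which changes only the display, not the dictionary:

```python
import pprint

# hypothetical result of transferring list(b = 2, a = 1, c = 3) from R
named_list = {"b": 2, "a": 1, "c": 3}

# pretty-printing sorts the keys (sort_dicts=True by default in Python 3.8+)
print(pprint.pformat(named_list))  # {'a': 1, 'b': 2, 'c': 3}

# iteration still follows the original R order; start=1 mirrors R's 1-based indexing
positions = list(enumerate(named_list.items(), start=1))
for position, (name, value) in positions:
    print(position, name, value)
# 1 b 2
# 2 a 1
# 3 c 3
```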