Edit this page on our live server and create a PR by running command !create-pr in the console panel

Working with SAS

  • Difficulty level: easy
  • Time need to lean: 10 minutes or less
  • Key points:
    • You can only exchanged SAS datasets as dataframes with magics %get and %put.
    • SAS statements can start with %, you will have to prefix them with a new line to avoid being handled as SoS magics.
    • If the dataset if not in the WORK library, you will have to pass the SAS dataset with name libname.dataset, and the dataset will be obtained as a dataframe with name libname_dataset.

Installation

To use SAS with SoS, you will need to setup sos-kernel and saspy. The trickiest part is to the creation of a sascfg_personal.py file that specifies how to connect to the SAS server from the Jupyter server.

After you set up a sos-kernel, verified that the sas kernel appears in the list of jupyter kernelspec list, you should start a Jupyter notebook with a SAS kernel and verify if you can execute SAS codes from the Jupyter notebook. Then you can install sos-sas with command

pip install sos-sas

and then select SAS as the language for SAS cells.

Data exachange between SAS and other kernels

A SAS dataset is roughly equivalent to Python's DataFrame (Pandas) and R's dataframe, and this is the only data type that can be exchanged between SAS and other languages.

Sending dataframes to SAS

Data frames in languages such as Python and R can be send to SAS with the usual %get or %put magic. For example, you can %get a dataset from R as follows:

In [1]:
Using SAS Config named: ssh
Out[1]:
SAS Output

The SAS System

Obs mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2

As you can see, SoS basically creates a dataset MTCARS in the WORK library. There is no way for you to specify name of a library so you will have to copy the datasets to a library in SAS if you would like to save the dataset permanently.

In [2]:
Out[2]:
SAS Output

The SAS System

Directory
Libref WORK
Engine V9
Physical Name /scratch/sastemp/SAS_work91950000E467_mendel.dldcc.bcm.edu
Filename /scratch/sastemp/SAS_work91950000E467_mendel.dldcc.bcm.edu
Inode Number 6028402508
Access Permission rwx------
Owner Name u233771
File Size 0KB
File Size (bytes) 169
# Name Member Type File Size Last Modified
1 MTCARS DATA 128KB 04/17/2020 13:02:39

The following example shows the passing of a Python Pandas dataframe to SAS and plot it there. The data is of medium size of 100k rows, but should take just a few seconds to be passed to SAS.

In [3]:
In [4]:
Out[4]:
SAS Output
svgtitle Plot of y by x y -1 0 1 x 0 1 2 3 4 5 6 7 The Sin Curve

Passing SAS Datasets to other languages

If you have a SAS dataset, for example, a MYCLASS dataset as follows

In [5]:
Out[5]:
SAS Output

The Sin Curve

Obs NAME SEX AGE HEIGHT WEIGHT
1 JOHN M 12 59 99.5
2 JAMES M 12 57 83.0
3 ALFRED M 14 69 112.5
4 ALICE F 13 56 84.0

You can transfer the dataset to Python or R and analyze there. Note that SAS dataset is case insensitive so you can get either myclass or MYCLASS, but the variable name will be case sensitive in Python.

In [6]:
> myclass: DataFrame of shape (4, 5)
  NAME   SEX   AGE   HEIGHT   WEIGHT  
0 JOHN M 12.0 59.0 99.5
1 JAMES M 12.0 57.0 83.0
2 ALFRED M 14.0 69.0 112.5
3 ALICE F 13.0 56.0 84.0

If the data is not in the WORK library, you will have to specify library name in the %get (or %put magic). For example, you can get the sashelp.air dataset from SAS with name sashelp.air. However, because this is not a valid name for a single variable, the dataset is obtained as sashelp_air.

In [7]:
> sashelp_air: DataFrame of shape (144, 2)
  DATE   AIR  
0 1949-01-01 112.0
1 1949-02-01 118.0
2 1949-03-01 132.0
3 1949-04-01 129.0
4 1949-05-01 121.0
5 1949-06-01 135.0
6 1949-07-01 148.0
7 1949-08-01 148.0
8 1949-09-01 136.0
9 1949-10-01 119.0
10 1949-11-01 104.0
11 1949-12-01 118.0
12 1950-01-01 115.0
13 1950-02-01 126.0
14 1950-03-01 141.0
15 1950-04-01 135.0
16 1950-05-01 125.0
17 1950-06-01 149.0
18 1950-07-01 170.0
19 1950-08-01 170.0
20 1950-09-01 158.0
21 1950-10-01 133.0
22 1950-11-01 114.0
23 1950-12-01 140.0
24 1951-01-01 145.0
25 1951-02-01 150.0
26 1951-03-01 178.0
27 1951-04-01 163.0
28 1951-05-01 172.0
29 1951-06-01 178.0
30 1951-07-01 199.0
31 1951-08-01 199.0
32 1951-09-01 184.0
33 1951-10-01 162.0
34 1951-11-01 146.0
35 1951-12-01 166.0
36 1952-01-01 171.0
37 1952-02-01 180.0
38 1952-03-01 193.0
39 1952-04-01 181.0
40 1952-05-01 183.0
41 1952-06-01 218.0
42 1952-07-01 230.0
43 1952-08-01 242.0
44 1952-09-01 209.0
45 1952-10-01 191.0
46 1952-11-01 172.0
47 1952-12-01 194.0
48 1953-01-01 196.0
49 1953-02-01 196.0
50 1953-03-01 236.0
51 1953-04-01 235.0
52 1953-05-01 229.0
53 1953-06-01 243.0
54 1953-07-01 264.0
55 1953-08-01 272.0
56 1953-09-01 237.0
57 1953-10-01 211.0
58 1953-11-01 180.0
59 1953-12-01 201.0
60 1954-01-01 204.0
61 1954-02-01 188.0
62 1954-03-01 235.0
63 1954-04-01 227.0
64 1954-05-01 234.0
65 1954-06-01 264.0
66 1954-07-01 302.0
67 1954-08-01 293.0
68 1954-09-01 259.0
69 1954-10-01 229.0
70 1954-11-01 203.0
71 1954-12-01 229.0
72 1955-01-01 242.0
73 1955-02-01 233.0
74 1955-03-01 267.0
75 1955-04-01 269.0
76 1955-05-01 270.0
77 1955-06-01 315.0
78 1955-07-01 364.0
79 1955-08-01 347.0
80 1955-09-01 312.0
81 1955-10-01 274.0
82 1955-11-01 237.0
83 1955-12-01 278.0
84 1956-01-01 284.0
85 1956-02-01 277.0
86 1956-03-01 317.0
87 1956-04-01 313.0
88 1956-05-01 318.0
89 1956-06-01 374.0
90 1956-07-01 413.0
91 1956-08-01 405.0
92 1956-09-01 355.0
93 1956-10-01 306.0
94 1956-11-01 271.0
95 1956-12-01 306.0
96 1957-01-01 315.0
97 1957-02-01 301.0
98 1957-03-01 356.0
99 1957-04-01 348.0
100 1957-05-01 355.0
101 1957-06-01 422.0
102 1957-07-01 465.0
103 1957-08-01 467.0
104 1957-09-01 404.0
105 1957-10-01 347.0
106 1957-11-01 305.0
107 1957-12-01 336.0
108 1958-01-01 340.0
109 1958-02-01 318.0
110 1958-03-01 362.0
111 1958-04-01 348.0
112 1958-05-01 363.0
113 1958-06-01 435.0
114 1958-07-01 491.0
115 1958-08-01 505.0
116 1958-09-01 404.0
117 1958-10-01 359.0
118 1958-11-01 310.0
119 1958-12-01 337.0
120 1959-01-01 360.0
121 1959-02-01 342.0
122 1959-03-01 406.0
123 1959-04-01 396.0
124 1959-05-01 420.0
125 1959-06-01 472.0
126 1959-07-01 548.0
127 1959-08-01 559.0
128 1959-09-01 463.0
129 1959-10-01 407.0
130 1959-11-01 362.0
131 1959-12-01 405.0
132 1960-01-01 417.0
133 1960-02-01 391.0
134 1960-03-01 419.0
135 1960-04-01 461.0
136 1960-05-01 472.0
137 1960-06-01 535.0
138 1960-07-01 622.0
139 1960-08-01 606.0
140 1960-09-01 508.0
141 1960-10-01 461.0
142 1960-11-01 390.0
143 1960-12-01 432.0

Additional notes

% statements

The SAS language has syntaxes that look like Jupyter magics. To avoid SoS treating them as SoS magics, you will have to prefix the statement with a new line. For example, to execute a SAS %put statement as the first statement in a cell, you will have to enter it as the second line.

In [8]:
Out[8]:

100156  ods listing close;ods html5 (id=saspy_internal) file=stdout options(bitmap_mode='inline') device=svg style=HTMLBlue; ods
100156! graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
100157
100158 %put "this is SAS magic"
100159
100160
100161 ods html5 (id=saspy_internal) close;ods listing;
"this is SAS magic"

100162

SAS HTML Output

The sas-kernel generates output in HTML format, with embedded CSS (style sheet). Although the CSS is designed only for SAS outputs, they can change the look and feel of the entire SoS Notebook, even the HTML report generated from SoS Notebooks with SAS outputs. For example, the page you are reading looks a bit funny with elements (e.g. table of contents) look different from other pages, because of these embedded CSS.

SAS and Jupyter on different servers

SoS %get datasets from SAS by importing data from the SAS data files (.sas7bdat files) directly. This is not possible if the SAS server is on a remote windows server that is connected by the sas-kernel via the IOM method. In another word, getting datasets from SAS only works if the SAS and Jupyter servers share the same server or file system, or connected via SSH.

%cd magic

The %cd magic changes the current working directory on the SAS server, which does not have to be the same server on which the Jupyter server resides. The %cd magic will fail to change directory if the specified path does not exist on the remote server.