- Difficulty level: easy
- Time needed to learn: 10 minutes or less
- Key points:
- Use the SoS workflow system to record multi-script, multi-language data analyses
- Add comments and help messages
- Use command line options to apply the workflow to new batches of data
This tutorial is a recap of what we have learned from other tutorials, with an emphasis on how to organize your data analysis so that it is easy to share and reproduce.
As shown in the tutorial "Using SoS workflow system in Jupyter and from command line" and the tutorials that follow it, SoS allows you to perform your data analysis in Jupyter, or to record scripts you developed in other environments in a Jupyter notebook, without a steep learning curve.
Firstly, you can perform your data analysis in Jupyter using multiple kernels in one notebook. Without going into the details on how SoS Notebook can assist the interactive data analysis, here is what the end result might look like.
Multi-language notebook
- Data analysis is performed by multiple kernels
- Analysis in each kernel can be separated into multiple cells
- The %expand magic can be used to pass variables from SoS to subkernels
- The entire data analysis can be rerun using Kernel => Restart Kernel and Run All Cells
This tutorial does not introduce any
Simple workflows with numerically numbered steps
- Workflows with numerically numbered steps
- Definition of input and output is optional
- Execute the workflow from within the notebook using the magics %run or %sosrun, or from the command line using sos run
The multi-language data analysis can be converted almost trivially to an SoS workflow. In contrast to interactive analysis in SoS Notebook, each step must contain a complete script that can be executed independently of the other steps. One of the benefits of the conversion is that the workflow can be executed from the command line.
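As a minimal sketch of what such a workflow might look like (the file name cars.csv and the analysis itself are hypothetical placeholders, not the example from the original notebook), two numerically numbered steps, each holding a complete script in its own language, could be written as:

```sos
# Hypothetical two-step workflow; file names and scripts are placeholders.

[10]
# Step 10 runs first: a complete R script that writes a small CSV file.
R:
    write.csv(head(mtcars), 'cars.csv')

[20]
# Step 20 runs next: a complete Python script that opens the file itself
# and does not rely on any variable left over from the previous step.
python:
    import csv
    with open('cars.csv') as f:
        n_rows = sum(1 for _ in csv.reader(f)) - 1  # exclude the header row
    print(n_rows, 'rows')
```

Steps run in increasing numeric order. Assuming the script is saved as analysis.sos (an assumed name), it could be started with sos run analysis.sos from a terminal, or with the %run magic when the steps live in a notebook cell.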
Adding command line options
- Command line options are defined with the parameter statement
- Both optional and mandatory options are supported
Adding command line options allows you to apply the workflow to other sets of data, usually from the command line.
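A small sketch of the idea, under the assumption that the workflow is saved as analysis.sos and takes a hypothetical input file and cutoff (none of these names come from the original notebook): a parameter with a default value is optional, while a parameter assigned a bare type must be supplied by the user.

```sos
# Assumed invocation from a terminal (file and option values are hypothetical):
#   sos run analysis.sos --infile batch2.csv --cutoff 0.01

[global]
# Optional option: the default is used when --cutoff is not given.
parameter: cutoff = 0.05
# Mandatory option: a bare type (no default) means --infile must be given.
parameter: infile = str

[10]
# expand=True substitutes {infile} and {cutoff} before the script is run.
python: expand=True
    print("Filtering {infile} at cutoff {cutoff}")
```

Defining the options in the global section makes them available to every step, so the same workflow can be pointed at a new batch of data without editing the scripts.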
If your analysis contains multiple related or unrelated steps, you can include them all in the notebook and execute them by name.
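For illustration only, with hypothetical step names convert and plot and placeholder shell commands, named steps might be laid out like this:

```sos
# Hypothetical named steps; each one can be run on its own by name.
# Assumed invocations:
#   %sosrun plot                 (from within the notebook)
#   sos run analysis.sos plot    (from a terminal, assuming this file name)

[convert]
# Placeholder for a data conversion script.
sh:
    echo "running the convert step"

[plot]
# Placeholder for a plotting script.
sh:
    echo "running the plot step"
```

Naming steps this way keeps unrelated pieces of an analysis in one file while still letting you choose which one to execute.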
Now that you have learned the basics of SoS, you can go ahead and use it to organize your scripts. SoS can do considerably more than this, however: it can be used to write more complex workflows and to execute scripts in containers and on remote hosts. The following example from "How to define and execute basic SoS workflows" demonstrates the creation and passing of substeps, and you can learn more from other tutorials.