Script of Scripts (SoS) is a computational environment for the development and execution of scripts in multiple languages for daily computational research. It can be used to develop scripts for interactive data analysis in a Jupyter environment and, with minimal effort, to convert those scripts into workflows that analyze large amounts of data in batch mode.
SoS consists of a polyglot notebook that allows the use of multiple kernels in one Jupyter notebook, and a workflow system that is designed for daily computational research.
The figure above links to a YouTube video of a presentation on SoS at JupyterCon 2018, which introduces both the SoS Notebook and the SoS Workflow System and is a good starting point for learning SoS. The SoS Workflow part starts at the 20-minute mark.
As an interactive environment and notebook tool that promotes literate programming, SoS allows you to perform and record your analyses in different languages in a single Jupyter notebook, with seamless integration of multiple Jupyter kernels (e.g. Python and R). The ability to exchange data between live Jupyter kernels allows you to use the most appropriate language for each part of your analysis.
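As a minimal sketch of how the data-exchange magics are typically used (the variable and subkernel names here are only illustrative), a data frame created in a Python 3 cell can be retrieved in an R cell with the `%get` magic:

```
# cell 1, evaluated by the Python 3 subkernel
import pandas as pd
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]})

# cell 2, evaluated by the R subkernel
%get df --from Python3
summary(df)   # df is transferred as an R data.frame
```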
SoS can work with all Jupyter kernels and currently provides native support (with data exchange) for the following languages:
Bash | JavaScript | Julia | MATLAB | Ruby | Octave | Python 2 | Python 3 | R | SAS | Scilab | Stata | TypeScript | Zsh
Other languages can be supported through third-party language modules. Please feel free to submit a PR if you would like to list your language module below:
Most workflow systems have rigid interfaces and syntaxes for the specification of computational tasks. Inevitably, refactoring code for these pipeline platforms results in implementations that differ notably from their non-pipeline counterparts. Overuse of pipelines, particularly in the early stages of a project, therefore decreases productivity, as researchers are forced to redirect their focus from scientific problems to engineering details. For these reasons, pipelineitis is a nasty disease that discourages the use of workflow tools for exploratory data analysis.
The SoS workflow system is designed for daily data analysis. SoS extends Python 3.6 with a minimal amount of extra syntax. It also follows an incremental design, so you can literally convert your scripts into a workflow in seconds, while more advanced features remain available as needs arise.
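As a sketch of what this looks like (the file, workflow, and data names are made up for illustration), an existing R script can be turned into a one-step SoS workflow simply by placing it, verbatim, into a section of a `.sos` file:

```
# plot.sos -- a one-step workflow named "plot"
[plot_10]
# the original R script, unchanged, wrapped in the R action
R:
    data <- read.csv('data.csv')
    png('data.png')
    plot(data$x, data$y)
    dev.off()
```

The workflow could then be executed with `sos run plot.sos plot`.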
SoS adapts to your definition of a workflow and allows you to define workflows in forward style, in makefile style, or as a mixture of both. More specifically, you can define a workflow as ordered steps that are executed sequentially to process specified input files, as a set of makefile-style rules that are executed to generate specified output files, or as a sequence of steps assisted by makefile-style steps. SoS automatically analyzes the input and output of each step and executes the workflow using a dynamic DAG (Directed Acyclic Graph) that expands and shrinks during execution.
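The following sketch (file names, commands, and step names are assumptions for illustration) shows a forward-style step that depends on a file provided by a makefile-style step; SoS would add the makefile-style step to the DAG only if the index file does not already exist:

```
# forward-style step of workflow "align", executed in numeric order
[align_10]
input: 'sample.fastq'
output: 'sample.bam'
depends: 'ref.fa.bwt'
sh: expand=True
    bwa mem ref.fa {_input} > {_output}

# makefile-style step, executed only when 'ref.fa.bwt' is needed
[build_index: provides='ref.fa.bwt']
sh:
    bwa index ref.fa
```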
One of the major obstacles to reproducible data analysis is the fact that large workflows need to exploit remote computational resources, which usually involves logging in to remote systems, creating system-specific wrapper scripts, submitting jobs, and collecting results after the jobs are done. SoS' remote execution model simplifies all of these steps, so that you can submit tasks to different remote hosts and cluster systems, all from your local desktop or laptop.
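For example (a sketch; the queue name `my_cluster` is assumed to be a host defined in your SoS host configuration, and the script itself is illustrative), turning a step into an external task that is submitted to a cluster only requires a `task` statement with the appropriate resource options:

```
[simulate_10]
output: 'result.txt'
# submit this step as a task to a configured remote queue
task: queue='my_cluster', walltime='2h', cores=1, mem='2G'
python: expand=True
    import random
    random.seed(123)
    with open('{_output}', 'w') as out:
        out.write(str(sum(random.random() for i in range(1000000))))
```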
SoS workflows can be embedded in SoS notebooks and executed either from the command line or from within the notebook. This allows SoS notebooks to include scientific narratives, workflow descriptions, and sample input and output along with the embedded workflows, which makes them much more readable, and therefore much easier to share, than scripts for other workflow systems, and makes them a perfect medium for sharing analytic procedures for daily data analysis.
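As a sketch (the notebook and workflow names are hypothetical), a workflow embedded in `analysis.ipynb` could be executed either with the `%sosrun` magic from within the notebook or with `sos run` from the command line, using the notebook itself as the workflow source:

```
# inside the notebook, in a SoS cell
%sosrun plot

# from the command line
sos run analysis.ipynb plot
```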