
Using SoS Workflow for daily data analysis

  • Difficulty level: easy
  • Time needed to learn: 10 minutes or less
  • Key points:
    • Use the SoS workflow system to record multi-script, multi-language data analysis
    • Add comments and help messages
    • Use command line options to apply the workflow to new batches of data

This tutorial is a recap of what we have learned in other tutorials, with an emphasis on how to organize your data analysis so that it is easy to share and reproduce.

Using SoS to record your data analysis

As shown in the tutorial "Using SoS workflow system in Jupyter and from command line" and the tutorials that follow it, SoS lets you perform your data analysis in Jupyter, or record scripts that you developed in other environments in a Jupyter notebook, without a steep learning curve.

Using SoS Notebook for interactive multi-language data analysis

First, you can perform your data analysis in Jupyter using multiple kernels in one notebook. Without going into the details of how SoS Notebook assists interactive data analysis, here is what the end result might look like.

In [1]:
In [2]:
In [3]:
pdf: 2

This tutorial does not introduce any

Simple SoS workflow

The multi-language data analysis can be converted almost trivially to the following SoS workflow. In contrast to analysis in SoS Notebook, each step must contain a complete script that can be executed independently of other steps. One of the benefits of the conversion is that the workflow can be executed from the command line.

In [4]:
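As a hedged sketch of what such a workflow might contain, the script below converts an Excel file to CSV and then plots two columns with R. The file names match the output shown elsewhere in this tutorial, but the step names, the figure file name, and the column names `log2FoldChange` and `stat` are assumptions made for illustration. It also assumes that the `xlsx2csv` command and R are installed.

```sos
# Convert an Excel file to CSV and plot the result with R.
# This leading comment block serves as the description of the workflow.

[global]
# Definitions in the global section are shared by all steps.
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'
figure_file = 'output.pdf'      # assumed name for the figure

[plot_10]
# Convert the Excel file to CSV with the xlsx2csv command
run: expand=True
    xlsx2csv {excel_file} > {csv_file}

[plot_20]
# Draw a scatter plot from the CSV file (column names are assumptions)
R: expand=True
    data <- read.csv('{csv_file}')
    pdf('{figure_file}')
    plot(data$log2FoldChange, data$stat)
    dev.off()
```

With `expand=True`, expressions in braces are replaced by the values of the corresponding SoS variables before each script runs. If the script is saved as, say, `analysis.sos` (a hypothetical file name), it could be executed with `sos run analysis.sos`, and the comments placed after each section header would be expected to appear as step descriptions in the output of `sos run analysis.sos -h`.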

Allow command line options

Adding command line options allows you to apply the workflow to other sets of data, usually from the command line:

In [5]:
xlsx2csv data/DEG.xlsx > DEG.csv

null device 
          1 
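A workflow along the following lines could produce output like that shown above. It is only a sketch: the parameter names, the default figure file name, and the column names used in the plot are assumptions, not the contents of the original cell.

```sos
[global]
# Excel file to be converted
parameter: excel_file = 'data/DEG.xlsx'
# Intermediate CSV file
parameter: csv_file = 'DEG.csv'
# Figure to be written, in PDF format (assumed default name)
parameter: figure_file = 'output.pdf'

[plot_10]
# Convert the Excel file to CSV
run: expand=True
    xlsx2csv {excel_file} > {csv_file}

[plot_20]
# Plot two columns of the CSV file (column names are assumptions)
R: expand=True
    data <- read.csv('{csv_file}')
    pdf('{figure_file}')
    plot(data$log2FoldChange, data$stat)
    dev.off()
```

Each `parameter:` statement defines a variable with a default value that can be overridden when the workflow is run, for example `sos run analysis.sos --excel_file data/batch2.xlsx --csv_file batch2.csv --figure_file batch2.pdf`, where the script name and the batch file names are hypothetical. The comment above each parameter is intended to double as its help message.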

Adding multiple workflows in one SoS notebook

If your analysis contains multiple related or unrelated steps, you can include them all in the notebook and execute them by name.

In [6]:
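To illustrate how several named workflows might sit side by side, here is a minimal sketch with three single-step workflows. The names `convert`, `plot`, and `summarize`, the column names, and the use of pandas in the last step are all assumptions made for this example.

```sos
[global]
parameter: excel_file = 'data/DEG.xlsx'
parameter: csv_file = 'DEG.csv'
parameter: figure_file = 'output.pdf'

[convert]
# Convert the Excel file to CSV
run: expand=True
    xlsx2csv {excel_file} > {csv_file}

[plot]
# Plot two columns of the CSV file (column names are assumptions)
R: expand=True
    data <- read.csv('{csv_file}')
    pdf('{figure_file}')
    plot(data$log2FoldChange, data$stat)
    dev.off()

[summarize]
# Print summary statistics of the CSV file (assumes pandas is installed)
python: expand=True
    import pandas as pd
    data = pd.read_csv('{csv_file}')
    print(data.describe())
```

Because the script defines more than one workflow, the name of the one to execute is given after the script name, for example `sos run analysis.sos convert` followed by `sos run analysis.sos plot`, with `analysis.sos` again being a hypothetical file name.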

Adding step input and output to analyze input files in parallel

Now that you have learned the basics of SoS, you can go ahead and use them to organize your scripts. However, SoS is a very powerful system that can be used to write more complex workflows and to execute scripts in containers and on remote hosts. The following example from How to define and execute basic SoS workflows demonstrates the creation and passing of substeps, and you can learn more from other tutorials.

In [7]:
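The idea of substeps can be sketched as follows; this is an illustration written for this recap rather than the example from the referenced tutorial. The parameter name `excel_files` and the column names are assumptions.

```sos
[global]
# Excel files to be processed, one substep per file
parameter: excel_files = ['data/DEG.xlsx']

[plot_10]
# group_by=1 creates one substep for each input file
input: excel_files, group_by=1
# {_input:bn} is the base name of the input file without its extension
output: f'{_input:bn}.csv'
run: expand=True
    xlsx2csv {_input} > {_output}

[plot_20]
# The CSV files from the previous step arrive as input, again one per substep
input: group_by=1
# {_input:n} is the input path without its extension
output: f'{_input:n}.pdf'
R: expand=True
    data <- read.csv('{_input}')
    pdf('{_output}')
    plot(data$log2FoldChange, data$stat)
    dev.off()
```

Within each substep, `_input` and `_output` refer to the files handled by that substep, so the same script is applied to every file, and independent substeps can be run concurrently. Several files could be supplied from the command line, for example `sos run analysis.sos --excel_files data/a.xlsx data/b.xlsx`, where the script and data file names are hypothetical.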