Running scripts in docker containers

  • Difficulty level: intermediate
  • Time needed to learn: 30 minutes or less
  • Key points:
    • The container option runs the script inside the specified docker container
    • Action docker_build builds docker images from an embedded Dockerfile

Running a script inside docker

SoS executes scripts inside docker by calling the command docker run with appropriate parameters. Suppose you do not have ruby installed locally and would like to run a ruby script. You can execute it inside a ruby container by specifying the docker image to use:

In [1]:
HINT: Pulling docker image ruby
Line1 contains Cats

The actual docker run command executed by SoS is shown when you execute the script in dryrun mode (with the -n option). In this mode, SoS prints the interpolated script (if option expand=True is set) and the docker command that would execute it:

In [2]:
HINT: Pulling docker image ruby
HINT: docker run --rm   -v /Users/bpeng1/sos/sos-docs/src/user_guide:/Users/bpeng1/sos/sos-docs/src/user_guide -v /Users/bpeng1/sos/sos-docs/src/user_guide/tmpddu7vzmy/docker_run_81390.rb:/var/lib/sos/docker_run_81390.rb    -t  -w=/Users/bpeng1/sos/sos-docs/src/user_guide -u 1985961928:895809667    ruby ruby /var/lib/sos/docker_run_81390.rb
line1 = "Cats are smarter than dogs";
line2 = "Dogs also like meat";

if ( line1 =~ /Cats(.*)/ )
  puts "Line1 contains Cats"
end
if ( line2 =~ /Cats(.*)/ )
  puts "Line2 contains  Dogs"
end

As you can see, the docker command looks similar to

docker run --rm  
    -v $(pwd):$(pwd)
    -v /tmp/path/to/docker_run_30258.rb:/var/lib/sos/docker_run_30258.rb
    -t -P 
    -w=/Users/bpeng1/sos/sos-docs/src/tutorials
    -u 12345:54321    ruby
    ruby /var/lib/sos/docker_run_30258.rb

Basically, SoS downloads a docker image called ruby and runs the command docker run to execute the specified script, with the following options:

  • --rm Automatically remove the container when it exits
  • -v $(pwd):$(pwd) maps current working directory to the docker image so that it can be accessed from within the docker image
  • -v /tmp/path/to/docker_run_30258.rb:/var/lib/sos/docker_run_30258.rb maps a temporary script (/Users/bpeng1/sos/sos-docs/src/tutorials/tmp2zviq3qh/docker_run_30258.rb) to the docker image
  • -t Allocate a pseudo-tty
  • -P Publish all exposed ports to the host interfaces
  • -w=/Users/bpeng1/sos/sos-docs/src/tutorials Set working directory to current working directory
  • -u 12345:54321 Use the host user-id and group-id inside docker so that files created by docker (on shared volumes) are accessible from outside of docker
  • ruby name of the docker image
  • ruby /var/lib/sos/docker_run_30258.rb command that executes the script
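
The way these options fit together can be sketched in Python. This is only an illustration of how such a command line might be assembled, not SoS's actual implementation; the function name build_docker_run and its parameters are hypothetical, and the values are copied from the example command above.

```python
import posixpath
import shlex


def build_docker_run(image, interpreter, host_script, workdir, uid, gid):
    """Assemble a docker run argument list that mirrors the options listed above.

    Illustrative only; this is not SoS's actual implementation.
    """
    # Path of the script inside the container (the /var/lib/sos prefix is
    # taken from the dryrun output shown earlier).
    container_script = posixpath.join("/var/lib/sos", posixpath.basename(host_script))
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:{workdir}",               # share the working directory
        "-v", f"{host_script}:{container_script}",  # share the temporary script
        "-t", "-P",
        f"-w={workdir}",                            # working directory inside docker
        "-u", f"{uid}:{gid}",                       # host user-id and group-id
        image,
        interpreter, container_script,
    ]


cmd = build_docker_run(
    image="ruby",
    interpreter="ruby",
    host_script="/tmp/path/to/docker_run_30258.rb",
    workdir="/Users/bpeng1/sos/sos-docs/src/tutorials",
    uid=12345,
    gid=54321,
)
print(shlex.join(cmd))
```

Keeping the command as a list of arguments and joining it only for display avoids quoting problems when paths contain spaces.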

The details of these options can be found in the docker run manual. They are chosen by default to work in most scenarios but can fail for some docker images, in which case you will have to use additional options to customize how the images are executed. This tutorial demonstrates the use of options for some common scenarios; please refer to the SoS documentation on general action options for details.

Note that

ruby: container='ruby'

is a shortcut for

ruby: container='docker://ruby'

Building docker images (action docker_build)

Building a docker image is usually done outside of SoS if you are maintaining a collection of docker images to be shared by your workflows, your groups, or everyone. However, if you need to create a docker image on-the-fly or would like to embed the Dockerfile inside a SoS script, you can use the docker_build action to build a docker image.

For example, you can build a simple image

In [3]:
Sending build context to Docker daemon  61.09MB
Step 1/1 : FROM ubuntu:14.04
 ---> 6e4f1fe62ff1
Successfully built 6e4f1fe62ff1
Successfully tagged test_docker:latest
Out[3]:
0

and use the image

In [4]:
Pulling docker image test_docker
bin  games  include  lib  local  sbin  share  src

This tutorial will use the docker_build action to build a few simple docker images to demonstrate the use of various options.

Docker build options

Action docker_build accepts usual SoS action options such as workdir, stdout, and stderr. For example, you can suppress the output of the action using options stdout=False and/or stderr=False:

In [5]:
Out[5]:
0

This action also accepts all options of the command docker build, as listed in the docker build documentation. Only the long forms of option names are accepted (e.g. --memory is acceptable but not -m). Option names containing hyphens (e.g. disable-content-trust) should have hyphens replaced with underscores. Boolean options should be specified as name=True.
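
The naming rules above can be illustrated with a small Python sketch that turns keyword options into long-form docker build flags. The helper docker_build_flags is hypothetical and only mirrors the rules described here; in particular, skipping options set to False or None is an assumption of this sketch, not documented SoS behavior.

```python
def docker_build_flags(**options):
    """Translate keyword options into long-form docker build flags.

    Illustrative only; this is not SoS's actual implementation.
    """
    flags = []
    for name, value in options.items():
        # Underscores in the keyword become hyphens in the flag name.
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Boolean options are passed as a bare flag.
            flags.append(flag)
        elif value is False or value is None:
            # Assumption of this sketch: unset options are simply omitted.
            continue
        else:
            flags.extend([flag, str(value)])
    return flags


flags = docker_build_flags(tag="test_docker", memory="512m", disable_content_trust=True)
print(flags)
```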

For example,

In [6]:
Sending build context to Docker daemon  53.87MB
Step 1/2 : FROM ubuntu:14.04
 ---> 6e4f1fe62ff1
Step 2/2 : LABEL testimage=
 ---> Running in 70268c43e676
Removing intermediate container 70268c43e676
 ---> 311ecf436f9d
Successfully built 311ecf436f9d
Successfully tagged test_docker:latest
Out[6]:
0

Running docker images

Customized working directory (workdir and docker_workdir)

SoS by default sets the current working directory of the docker image to the working directory of the host system, essentially adding -w $(pwd) to the command line. For example, with the following docker image, the pwd of the script is the current working directory on the host machine.

In [7]:
Pulling docker image ubuntu:14.04
/Users/bpeng1/sos/sos-docs/src/user_guide

Since the action option workdir can change the working directory of the script, you can use this option to change the working directory of the script inside the docker image as well. For example, SoS in the following example will change the current working directory to the parent directory before executing docker run there.

In [8]:
/Users/bpeng1/sos/sos-docs/src

This default behavior is convenient when you use commands in docker images to process input files on your host machine but it has two caveats:

  1. The docker image might have its own WORKDIR for the command to work. For example, a docker image can provide an application that is not in standard $PATH so it can only be executed in a specified WORKDIR.
  2. You might need to specify a working directory inside of docker that does not exist in the host machine.

Option docker_workdir, if specified, overrides workdir and allows the use of the default or a customized working directory inside of docker images. When docker_workdir is set to None, no -w option will be passed to the docker image and the default WORKDIR will be used. Otherwise an absolute path inside the docker image can be specified.

For example, the following customized docker image has a WORKDIR set to /usr. Its working directory is set to the host working directory by default, to /usr with docker_workdir=None, and to /home/random_user with docker_workdir='/home/random_user'.

In [9]:
Sending build context to Docker daemon  61.09MB
Step 1/2 : FROM ubuntu:14.04
 ---> 6e4f1fe62ff1
Step 2/2 : WORKDIR /usr
 ---> Using cache
 ---> ff584235a068
Successfully built ff584235a068
Successfully tagged test_docker_workdir:latest
Pulling docker image test_docker_workdir
/Users/bpeng1/sos/sos-docs/src/user_guide
/usr
/home/random_user

Note the directory is relative to the docker file system so it does not have to exist on the host system. Docker also creates docker_workdir if it does not exist so you do not have to create the directory in advance.
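
The three cases above can be summarized in a short Python sketch. It only models the rules described in this section and is not SoS's actual implementation; the function workdir_flag and the _UNSET sentinel are hypothetical names introduced for illustration.

```python
import posixpath

_UNSET = object()  # sentinel meaning "docker_workdir was not specified"


def workdir_flag(host_cwd, workdir=None, docker_workdir=_UNSET):
    """Return the -w argument for docker run as a list, or [] for none."""
    if docker_workdir is None:
        # No -w option: the WORKDIR of the image is used.
        return []
    if docker_workdir is not _UNSET:
        # An explicit path inside docker overrides workdir.
        if not posixpath.isabs(docker_workdir):
            raise ValueError("docker_workdir must be an absolute path")
        return [f"-w={docker_workdir}"]
    if workdir is not None:
        # Option workdir changes the host directory the script runs in.
        host_cwd = posixpath.normpath(posixpath.join(host_cwd, workdir))
    return [f"-w={host_cwd}"]


cwd = "/Users/bpeng1/sos/sos-docs/src/user_guide"
print(workdir_flag(cwd))
print(workdir_flag(cwd, workdir=".."))
print(workdir_flag(cwd, docker_workdir=None))
print(workdir_flag(cwd, docker_workdir="/home/random_user"))
```

A sentinel is used because None already has a meaning here (use the image's WORKDIR), so it cannot also stand for "option not given".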

Sharing of input and output files (volumes)

Because the working directory of the docker image is set by default to the current working directory, you can apply a command inside a docker image to work on files in the current working directory, and create files in it as well.

In [10]:
1279 docker.ipynb

This works because SoS automatically shares the current working directory of the host system with the docker image. Because the docker image can only "see" file systems shared by the command docker run, your script will fail if your input files or output files are not under the current working directory.

The problem can be solved by specifying additional shared file systems through parameter volumes. This parameter accepts a single volume (a string) or a list of volumes (a list of strings), each in one of the following formats:

  • A single path (e.g. /Users) which would be shared to the docker image under the same name (e.g. /Users:/Users).
  • A full volume specification [host-src:]container-dest[:<options>], in which case host and container directories can have different names (e.g. /Users:/home).

A special rule here is that the current working directory will not be mapped separately if it is under one of the specified volumes. That is to say, if the current working directory is /Users/bpeng1/project and option volumes='/Users:/home' is specified, the current working directory will be implicitly mapped to /home/bpeng1/project.
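
The implicit mapping can be illustrated with a minimal Python sketch. It assumes POSIX-style paths and volumes given either as a single path or as host-src:container-dest (optionally followed by options); the helpers normalize_volume and map_cwd are hypothetical and do not reflect SoS's actual implementation.

```python
import posixpath


def normalize_volume(spec):
    """Expand a single path such as /Users to /Users:/Users."""
    return spec if ":" in spec else f"{spec}:{spec}"


def map_cwd(cwd, volumes):
    """Return the path of cwd inside docker, or None if no volume covers it."""
    if isinstance(volumes, str):
        volumes = [volumes]
    for spec in volumes:
        # Ignore any trailing :<options> part of the specification.
        host, container = normalize_volume(spec).split(":")[:2]
        prefix = host.rstrip("/") + "/"
        if cwd == host.rstrip("/"):
            return container
        if cwd.startswith(prefix):
            return posixpath.join(container, cwd[len(prefix):])
    return None


print(map_cwd("/Users/bpeng1/project", "/Users:/home"))
print(map_cwd("/Users/bpeng1/project", ["/Users"]))
print(map_cwd("/data/project", "/Users:/home"))
```

When map_cwd returns None, the current working directory is not covered by any volume and would have to be shared separately.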

Docker images with ENTRYPOINT

Some docker images have an entry point which determines the command that will be executed when the image is executed. When such images are executed, parameters passed from the command line will be appended to ENTRYPOINT, so our usual way of specifying an interpreter (e.g. ruby) and a script will not work. If we run the script directly, our "command" (e.g. ruby /var/lib/sos/docker_run_30258.rb) will be appended to the entry point and will not be executed properly. Examples of such images include dceoy/gatk, which has an entry point

["java", "-jar", "/usr/local/src/gatk/build/libs/gatk.jar"]

and does not accept any additional interpreter. What we really need to do is to append "arguments" to this pre-specified command.
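
The effect of an exec-form ENTRYPOINT can be modeled in a few lines of Python. This is only a sketch of how the final command is composed (entry point followed by the arguments given to docker run); the function effective_command is a hypothetical name used for illustration.

```python
def effective_command(entrypoint, run_args):
    """Arguments given to docker run are appended to an exec-form ENTRYPOINT."""
    return list(entrypoint) + list(run_args)


gatk_entrypoint = ["java", "-jar", "/usr/local/src/gatk/build/libs/gatk.jar"]

# Passing an interpreter and a script does not run the script: both end up
# as arguments of the gatk jar.
wrong = effective_command(gatk_entrypoint, ["ruby", "/var/lib/sos/docker_run_30258.rb"])
print(" ".join(wrong))

# Passing only arguments gives the intended command, here for an image
# whose entry point is ls.
right = effective_command(["ls"], ["-l", "SoS_Docker_Guide.ipynb"])
print(" ".join(right))
```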

For example, the test_docker_ls image has an ENTRYPOINT with command ls.

In [11]:
Sending build context to Docker daemon  61.09MB
Step 1/2 : FROM ubuntu:14.04
 ---> 6e4f1fe62ff1
Step 2/2 : ENTRYPOINT ["ls"]
 ---> Using cache
 ---> 52364ac33aad
Successfully built 52364ac33aad
Successfully tagged test_docker_ls:latest
Out[11]:
0

The image is expected to be executed directly with optional parameters and without an interpreter (e.g. docker run test_docker_ls).

Because action script does not have a default interpreter, and option args can be used to construct a command line, we can use docker images with ENTRYPOINT in the format of

In [12]:
Pulling docker image test_docker_ls
-rw-r--r-- 1 1985961928 895809667 34732 Dec 25 23:26 docker.ipynb

which essentially passes -l SoS_Docker_Guide.ipynb to the image and executes command

ls -l SoS_Docker_Guide.ipynb

If the command line is long, another trick is to use {script} in args, which stands for the script of the action. For example, the aforementioned command can be specified as

In [13]:
-rw-r--r-- 1 1985961928 895809667 34732 Dec 25 23:26 docker.ipynb

Common problems

  • If you are using Docker Toolbox instead of Docker for Mac on a Mac, the docker image will be executed inside a VirtualBox virtual machine, which has its own shared directories, allocated CPUs, and memory. It is therefore possible that
  1. Your virtual machine is configured with a small amount of RAM (e.g. 2G), so your docker image runs out of memory even when your system has plenty of RAM left. Re-configure your VirtualBox VM if this happens.
  2. SoS uses native paths for its docker command line, so it will for example map c:\Users to /C/Users under Windows. However, this path might not be accessible from docker if the VirtualBox VM does not share this directory (c:\Users) or if the shared directory has a different name (e.g. /Users instead of /C/Users). Before you use SoS with Docker Toolbox, please make sure that the directory you would like to use is shared in the VM, and use names recognizable by SoS (e.g. share C:\Users as /C/Users, D:\Data as /D/Data).
  • Symbolic links differ from OS to OS, so creating symbolic links inside docker might fail with strange error messages such as "Read-only file system". Even if you can create symbolic links inside docker, the created links might not be accessible from the host machine.

  • Killing a sos task or sos process will not terminate scripts that are executed by the docker daemon. They will have to be killed explicitly using docker commands.
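
For the Windows path naming mentioned in the Docker Toolbox item above (c:\Users shared as /C/Users), the expected share name can be derived with a small Python sketch. The helper windows_to_docker_path is hypothetical and only reproduces the naming convention given in that item; it is not SoS's actual implementation.

```python
import ntpath


def windows_to_docker_path(path):
    """Convert a Windows path such as c:\\Users to the /C/Users form."""
    drive, rest = ntpath.splitdrive(path)
    rest = rest.replace("\\", "/")
    if not drive:
        # No drive letter: only the separators are normalized.
        return rest
    # Drive "c:" becomes the top-level directory "/C".
    return "/" + drive[0].upper() + rest


print(windows_to_docker_path(r"c:\Users"))
print(windows_to_docker_path(r"D:\Data"))
```

Comparing the result with the shared folder names configured in the VirtualBox VM is a quick way to check whether a directory will be visible from docker.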