- Difficulty level: intermediate
- Time needed to learn: 30 minutes or less
- Key points:
- The container option runs the script inside the specified docker container
- Action docker_build builds docker images from an embedded Dockerfile
- The
SoS executes scripts inside docker by calling the command docker run with appropriate parameters. Suppose you do not have ruby installed locally and would like to run a ruby script. You can execute it inside a ruby container by specifying the docker image to use:
The actual docker run command executed by SoS is shown when you execute the script in dryrun mode (with the -n option). In this mode, SoS prints the interpolated script (if option expand=True is set) and the docker command that would execute it:
As you can see, the docker command looks similar to
docker run --rm
-v $(pwd):$(pwd)
-v /tmp/path/to/docker_run_30258.rb:/var/lib/sos/docker_run_30258.rb
-t -P
-w=/Users/bpeng1/sos/sos-docs/src/tutorials
-u 12345:54321 ruby
ruby /var/lib/sos/docker_run_30258.rb
Basically, SoS downloads a docker image called ruby and runs the command docker run to execute the specified script, with the following options:
- --rm: Automatically remove the container when it exits.
- -v $(pwd):$(pwd): maps the current working directory to the docker image so that it can be accessed from within the docker image.
- -v /tmp/path/to/docker_run_30258.rb:/var/lib/sos/docker_run_30258.rb: maps a temporary script (/Users/bpeng1/sos/sos-docs/src/tutorials/tmp2zviq3qh/docker_run_30258.rb) to the docker image.
- -t: Allocate a pseudo-tty.
- -P: Publish all exposed ports to the host interfaces.
- -w=/Users/bpeng1/sos/sos-docs/src/tutorials: Set the working directory to the current working directory.
- -u 12345:54321: Use the host user-id and group-id inside docker so that files created by docker (on shared volumes) are accessible from outside of docker.
- ruby: name of the docker image.
- ruby /var/lib/sos/docker_run_30258.rb: command that executes the script.
The details of these options can be found in the docker run manual. They are chosen by default to work in the majority of scenarios but can fail for some docker images, in which case you will have to use additional options to customize how the images are executed. This tutorial demonstrates the use of options for some common scenarios; please refer to the SoS documentation for general action options.
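To make the role of each option concrete, here is a minimal Python sketch that assembles a command line of the same shape as the one shown above. It is not SoS's implementation; the function name docker_run_command and its parameters are hypothetical, and the /var/lib/sos prefix is simply copied from the example output.

```python
import shlex


def docker_run_command(image, interpreter, host_script, workdir, uid, gid):
    """Assemble a docker run command shaped like the one SoS prints in dryrun mode.

    Hypothetical helper for illustration only.
    """
    script_name = host_script.rsplit("/", 1)[-1]
    container_script = f"/var/lib/sos/{script_name}"  # location taken from the example
    return [
        "docker", "run", "--rm",                    # remove the container on exit
        "-v", f"{workdir}:{workdir}",               # share the working directory
        "-v", f"{host_script}:{container_script}",  # share the temporary script
        "-t", "-P",                                 # pseudo-tty, publish exposed ports
        f"-w={workdir}",                            # working directory inside docker
        "-u", f"{uid}:{gid}",                       # host user-id and group-id
        image,
        interpreter, container_script,              # command run inside the container
    ]


cmd = docker_run_command(
    image="ruby",
    interpreter="ruby",
    host_script="/tmp/path/to/docker_run_30258.rb",
    workdir="/Users/bpeng1/sos/sos-docs/src/tutorials",
    uid=12345,
    gid=54321,
)
print(shlex.join(cmd))
```

Keeping the command as a list of arguments rather than one string avoids quoting problems when a path contains spaces.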
Note that
ruby: container='ruby'
is a shortcut for
ruby: container='docker://ruby'
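The equivalence of the two forms can be sketched in Python as follows. This only illustrates the shortcut described above and is not how SoS parses the option; the function name normalize_container is hypothetical.

```python
def normalize_container(spec):
    """Split a container specification into (engine, image).

    A bare image name is treated as a docker image, mirroring the
    shortcut 'ruby' == 'docker://ruby'. Hypothetical helper.
    """
    if "://" in spec:
        engine, image = spec.split("://", 1)
        return engine, image
    return "docker", spec


print(normalize_container("ruby"))
print(normalize_container("docker://ruby"))
```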
Building a docker image is usually done outside of SoS if you are maintaining a collection of docker containers to be shared by your workflows, your groups, or everyone. However, if you need to create a docker image on-the-fly or would like to embed the Dockerfile inside a SoS script, you can use the docker_build action to build a docker container.
For example, you can build a simple image
and use the image
This tutorial will use the docker_build action to build a few simple docker images to demonstrate the use of various options.
Action docker_build accepts usual SoS action options such as workdir, stdout, and stderr. For example, you can suppress the output of the action using options stdout=False and/or stderr=False:
This action also accepts all options of the command docker build, as listed in the docker build documentation. Only the long form of an option name is accepted (e.g. --memory is acceptable but not -m). Option names containing hyphens (e.g. disable-content-trust) should have the hyphens replaced with underscores. Boolean options should be specified as name=True.
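The naming rules above can be illustrated with a small Python sketch that turns keyword options into long-format docker build flags. This is not the code used by docker_build; the function name docker_build_flags is hypothetical, and dropping options whose value is False or None is an assumption of this sketch.

```python
def docker_build_flags(**options):
    """Translate keyword options into long-format docker build flags.

    Underscores become hyphens, and name=True becomes a bare flag.
    Hypothetical helper; False/None values are simply skipped here.
    """
    flags = []
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            flags.append(flag)
        elif value is False or value is None:
            continue
        else:
            flags.extend([flag, str(value)])
    return flags


print(docker_build_flags(tag="test_docker", no_cache=True, memory="512m"))
# ['--tag', 'test_docker', '--no-cache', '--memory', '512m']
```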
For example,
SoS by default sets the current working directory of the docker image to the working directory of the host system, essentially adding -w $(pwd) to the command line. For example, with the following docker image, the pwd of the script is the current working directory on the host machine.
Since the action option workdir can change the working directory of the script, you can use this option to change the working directory of the script inside the docker image as well. For example, SoS in the following example will change the current working directory to the parent directory before executing docker run there.
This default behavior is convenient when you use commands in docker images to process input files on your host machine but it has two caveats:
- The docker image might have its own WORKDIR for the command to work. For example, a docker image can provide an application that is not in the standard $PATH so it can only be executed in a specified WORKDIR.
- You might need to specify a working directory inside of docker that does not exist on the host machine.
Option docker_workdir, if specified, overrides workdir and allows the use of default or customized working directory inside of docker images. When docker_workdir is set to None, no -w option will be passed to the docker image and the default WORKDIR will be used. Otherwise an absolute path inside the docker image can be specified.
For example, the following customized docker image has a WORKDIR set to /usr. Its working directory is set to the host working directory by default, to /usr with docker_workdir=None, and to /home/random_user with docker_workdir='/home/random_user'.
Note the directory is relative to the docker file system so it does not have to exist on the host system. Docker also creates docker_workdir if it does not exist so you do not have to create the directory in advance.
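The three cases described above (option not given, docker_workdir=None, and an absolute path) can be summarized in a short Python sketch. It is an illustration of the documented behavior rather than SoS's code; the function name workdir_args and the _UNSET sentinel are hypothetical.

```python
_UNSET = object()  # sentinel meaning "docker_workdir was not specified"


def workdir_args(host_workdir, docker_workdir=_UNSET):
    """Return the -w arguments for docker run under the rules described above.

    host_workdir is the working directory on the host (after option workdir
    has been applied). Hypothetical helper for illustration.
    """
    if docker_workdir is _UNSET:
        # default: reuse the host working directory inside docker
        return [f"-w={host_workdir}"]
    if docker_workdir is None:
        # no -w option, so the WORKDIR of the image is used
        return []
    if not docker_workdir.startswith("/"):
        raise ValueError("docker_workdir must be an absolute path inside docker")
    return [f"-w={docker_workdir}"]


print(workdir_args("/home/user/project"))                       # ['-w=/home/user/project']
print(workdir_args("/home/user/project", None))                 # []
print(workdir_args("/home/user/project", "/home/random_user"))  # ['-w=/home/random_user']
```

A sentinel is used because None already has a meaning here (use the WORKDIR of the image), so it cannot also stand for "not specified".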
Because the working directory of the docker image is set by default to the current working directory, you can apply a command inside a docker image to work on files in the current working directory, and create files in it as well.
This works because SoS automatically shares the current working directory of the host system with the docker image. Because the docker image can only "see" file systems shared by the command docker run, your script will fail if your input files or output files are not under the current working directory.
The problem can be solved by specifying additional shared file systems through the parameter volumes. This parameter accepts one volume (a string) or a list of volumes (a list of strings), each in the format of
- A single path (e.g. /Users), which is shared to the docker image under the same name (e.g. /Users:/Users).
- A full volume specification [host-src:]container-dest[:<options>], in which case the host and container directories can have different names (e.g. /Users:/home).
A special rule here is that the current working directory will not be mapped separately if it is under one of the specified volumes. That is to say, if the current working directory is /Users/bpeng1/project and option volumes='/Users:/home' is specified, the current working directory will be implicitly mapped to /home/bpeng1/project.
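The volume rules, including the special rule for the current working directory, can be sketched in Python as follows. This is an illustration under simplifying assumptions (POSIX-style paths only, so Windows drive letters with colons are not handled) and not the logic SoS actually uses; the function name volume_args is hypothetical.

```python
import posixpath


def volume_args(volumes, cwd):
    """Return (docker -v arguments, working directory as seen inside docker).

    volumes is a string or a list of strings, each either a single path or
    host-src:container-dest[:options]. Hypothetical helper, POSIX paths only.
    """
    if isinstance(volumes, str):
        volumes = [volumes]
    args = []
    container_cwd = None
    for vol in volumes:
        parts = vol.split(":")
        if len(parts) == 1:
            host = dest = parts[0]      # single path: same name on both sides
            spec = f"{host}:{dest}"
        else:
            host, dest = parts[0], parts[1]
            spec = vol                  # keep any trailing options untouched
        args += ["-v", spec]
        rel = posixpath.relpath(cwd, host)
        inside = rel != ".." and not rel.startswith("../")
        if container_cwd is None and inside:
            # cwd lies under this volume, so it is mapped implicitly
            container_cwd = posixpath.normpath(posixpath.join(dest, rel))
    if container_cwd is None:
        # cwd is not covered by any volume: share it separately
        args += ["-v", f"{cwd}:{cwd}"]
        container_cwd = cwd
    return args, container_cwd


print(volume_args("/Users:/home", "/Users/bpeng1/project"))
# (['-v', '/Users:/home'], '/home/bpeng1/project')
```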
Some docker images have an entry point which determines the command that will be executed when the image is executed. When such images are executed, parameters passed from the command line are appended to ENTRYPOINT, so our usual way of specifying an interpreter (e.g. ruby) and a script will not work. If we run the script directly, our "command" (e.g. ruby /var/lib/sos/docker_run_30258.rb) will be appended to the entry point and will not be executed properly. Examples of such images include dceoy/gatk, which has an entry point
["java", "-jar", "/usr/local/src/gatk/build/libs/gatk.jar"]
and does not accept any additional interpreter. What we really need to do is to append "arguments" to this pre-specified command.
For example, the test_docker_ls image has an ENTRYPOINT with command ls.
The image is expected to be executed directly with optional parameter and without an interpreter (e.g. docker run test_docker_ls).
Because action script does not have a default interpreter, and option args can be used to construct a command line, we can use docker images with ENTRYPOINT in the format of
which essentially passes -l SoS_Docker_Guide.ipynb to the image and executes command
ls -l SoS_Docker_Guide.ipynb
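The effect of an entry point can be illustrated with a few lines of Python that mimic how docker combines an exec-form ENTRYPOINT with the arguments given to docker run. The function name effective_command is hypothetical and the sketch only models the concatenation, nothing else.

```python
import shlex


def effective_command(entrypoint, args):
    """Command executed in the container: ENTRYPOINT followed by the arguments.

    Hypothetical helper that models an exec-form ENTRYPOINT.
    """
    return list(entrypoint) + shlex.split(args)


# Passing only arguments gives the intended command
print(effective_command(["ls"], "-l SoS_Docker_Guide.ipynb"))
# ['ls', '-l', 'SoS_Docker_Guide.ipynb']

# Passing an interpreter and a script makes them arguments of ls
print(effective_command(["ls"], "ruby /var/lib/sos/docker_run_30258.rb"))
# ['ls', 'ruby', '/var/lib/sos/docker_run_30258.rb']
```

The second call shows why the usual interpreter-plus-script form fails for such images: ruby and the script path are treated as file names to list.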
If the command line is long, another trick is to use {script} in args to refer to the script of the action. For example, the aforementioned command can be specified as
- If you are using Docker Toolbox instead of Docker for Mac on a Mac, the docker image will be executed inside a VirtualBox virtual machine, which has its own shared directories, allocated CPUs, and memory. It is therefore possible that
  - Your virtual machine is configured with only a small amount of RAM (e.g. 2G), so your docker image will run out of memory even when your system has plenty of RAM left. Re-configure your VirtualBox VM if this happens.
  - SoS uses native paths for its docker command line, so it will for example map c:\Users to /C/Users under Windows. However, this path might not be accessible from docker if the VirtualBox VM does not share this directory (c:\Users) or if the shared directory has a different name (e.g. /Users instead of /C/Users). Before you use SoS with Docker Toolbox, please make sure that the directory you would like to use is shared in the VM, and use names recognizable by SoS (e.g. share C:\Users as /C/Users, D:\Data as /D/Data).
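The naming convention for shared directories (C:\Users as /C/Users, D:\Data as /D/Data) can be expressed as a short Python sketch. It only reproduces the mapping given in the examples above and is not taken from SoS; the function name to_docker_path is hypothetical.

```python
from pathlib import PureWindowsPath


def to_docker_path(win_path):
    """Map a Windows path such as c:\\Users to the /C/Users form.

    Hypothetical helper that follows the examples in the text.
    """
    path = PureWindowsPath(win_path)
    if not path.drive:
        raise ValueError(f"{win_path!r} has no drive letter")
    drive = path.drive.rstrip(":").upper()
    # parts[0] is the drive (plus root), the rest are directory names
    return "/" + "/".join([drive, *path.parts[1:]])


print(to_docker_path(r"c:\Users"))  # /C/Users
print(to_docker_path(r"D:\Data"))   # /D/Data
```

PureWindowsPath is used so that the sketch behaves the same on any operating system.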
Symbolic links are different from OS to OS so creating symbolic links inside docker might fail with strange error messages such as "Read-only file system". Even if you can create symbolic links inside docker, the created links might not be accessible from the host machine.
Killing a sos task or sos process will not terminate scripts that are executed by the docker daemon. They will have to be killed explicitly using docker commands.
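As one possible way to do this, the following Python sketch wraps the docker commands docker ps (with an ancestor filter) and docker kill. The function names are hypothetical, and the sketch assumes that every running container started from the given image should be stopped, which may include containers that were not started by SoS.

```python
import subprocess


def list_containers_cmd(image):
    """docker ps command that prints the IDs of running containers of an image."""
    return ["docker", "ps", "--quiet", "--filter", f"ancestor={image}"]


def kill_cmd(container_ids):
    """docker kill command for the given container IDs."""
    return ["docker", "kill", *container_ids]


def kill_containers_of(image, run=subprocess.run):
    """Kill all running containers started from image and return their IDs.

    The run parameter exists so the function can be exercised without docker.
    """
    result = run(list_containers_cmd(image), capture_output=True, text=True, check=True)
    container_ids = result.stdout.split()
    if container_ids:
        run(kill_cmd(container_ids), check=True)
    return container_ids


# Example (requires a running docker daemon):
# kill_containers_of("ruby")
```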