- Difficulty level: intermediate
- Time needed to learn: 30 minutes or less
- Key points:
  - The `container` option runs the script inside the specified docker container
  - Action `docker_build` builds docker images from an embedded `Dockerfile`

SoS executes scripts inside docker by calling command `docker run` with appropriate parameters. If you do not have ruby installed locally and would like to run a ruby script, you can execute it inside a `ruby` container by specifying the docker image to use.
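A minimal sketch of such a step might look like this (the ruby code itself is just an illustration; any script would do):

```
ruby: container='ruby'
    line = "Cats are smarter than dogs"
    puts line
```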
The actual `docker run` command executed by SoS can be shown when you execute the script in dryrun mode (with the `-n` option). In this mode, SoS prints the interpolated script (if option `expand=True` is set) and the docker command used to execute it.
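For example, assuming the step above is saved in a workflow file named `docker_ruby.sos` (a hypothetical name), the dryrun could be invoked with:

```
sos run docker_ruby.sos -n
```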
The docker command printed by SoS looks similar to:

```
docker run --rm
    -v $(pwd):$(pwd)
    -v /tmp/path/to/docker_run_30258.rb:/var/lib/sos/docker_run_30258.rb
    -t -P
    -w=/Users/bpeng1/sos/sos-docs/src/tutorials
    -u 12345:54321 ruby
    ruby /var/lib/sos/docker_run_30258.rb
```
Basically, SoS downloads a docker image called `ruby` and runs command `docker run` to execute the specified script, with the following options:

- `--rm`: Automatically remove the container when it exits.
- `-v $(pwd):$(pwd)`: Map the current working directory to the docker image so that it can be accessed from within the docker image.
- `-v /tmp/path/to/docker_run_30258.rb:/var/lib/sos/docker_run_30258.rb`: Map a temporary script (`/Users/bpeng1/sos/sos-docs/src/tutorials/tmp2zviq3qh/docker_run_30258.rb`) to the docker image.
- `-t`: Allocate a pseudo-tty.
- `-P`: Publish all exposed ports to the host interfaces.
- `-w=/Users/bpeng1/sos/sos-docs/src/tutorials`: Set the working directory to the current working directory.
- `-u 12345:54321`: Use the host user-id and group-id inside docker so that files created by docker (on shared volumes) are accessible from outside of docker.
- `ruby`: Name of the docker image.
- `ruby /var/lib/sos/docker_run_30258.rb`: Command that executes the script.
The details of these options can be found in the docker run manual. They are chosen by default to work in a majority of scenarios but can fail for some docker images, in which case you will have to use additional options to customize how the images are executed. This tutorial demonstrates the use of options for some common scenarios; please refer to the SoS documentation on general action options for details.
Note that `ruby: container='ruby'` is a shortcut for `ruby: container='docker://ruby'`.
Building a docker image is usually done outside of SoS if you are maintaining a collection of docker containers to be shared by your workflows, your group, or everyone. However, if you need to create a docker image on-the-fly or would like to embed the Dockerfile inside a SoS script, you can use the `docker_build` action to build a docker container.
For example, you can build a simple image and then use it to execute a script, as sketched below.
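The image name `test_docker` and the Dockerfile content here are made up for illustration:

```
# build a small image from an embedded Dockerfile
docker_build: tag='test_docker'
    FROM ubuntu:18.04
    RUN apt-get update && apt-get install -y figlet

# use the image to execute a shell script
sh: container='test_docker'
    figlet hello from docker
```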
This tutorial will use the `docker_build` action to build a few simple docker images to demonstrate the use of various options.
Action `docker_build` accepts the usual SoS action options such as `workdir`, `stdout`, and `stderr`. For example, you can suppress the output of the action using options `stdout=False` and/or `stderr=False`.
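A sketch of such a call (the image name is made up):

```
docker_build: tag='test_docker_quiet', stdout=False, stderr=False
    FROM ubuntu:18.04
    RUN echo "this build output will not be shown"
```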
This action also accepts all options to command `docker build`, as listed in the docker build documentation. Only the long format of the option names is accepted (e.g. `--memory` is acceptable but not `-m`). Option names containing hyphens (e.g. `disable-content-trust`) should have the hyphens replaced with underscores, and boolean options should be specified as `name=True`.
For example, a few of these options can be passed as sketched below.
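The image name and option values here are made up; `label`, `compress`, and `memory` correspond to `--label`, `--compress`, and `--memory` of `docker build`:

```
docker_build: tag='test_docker_options', label='test_image', compress=True, memory='2G'
    FROM ubuntu:18.04
```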
SoS by default sets the current working directory of the docker image to the working directory of the host system, essentially adding `-w $(pwd)` to the command line. For example, the `pwd` reported by a script running inside the container below is the current working directory on the host machine.
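A sketch using the stock `ubuntu` image:

```
sh: container='ubuntu'
    # prints the same path as the current working directory on the host machine
    pwd
```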
Since the action option `workdir` changes the working directory of the script, you can use this option to change the working directory of the docker image as well. For example, SoS in the following example will change the current working directory to the parent directory before executing `docker run` there.
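A sketch of this, again with the stock `ubuntu` image:

```
sh: container='ubuntu', workdir='..'
    # the working directory is now the parent of the original working directory
    pwd
```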
This default behavior is convenient when you use commands in docker images to process input files on your host machine but it has two caveats:
- The docker image might have its own `WORKDIR` for the command to work. For example, a docker image can provide an application that is not in the standard `$PATH` so that it can only be executed in a specified `WORKDIR`.
- You might need to specify a working directory inside of docker that does not exist on the host machine.
Option `docker_workdir`, if specified, overrides `workdir` and allows the use of the default or a customized working directory inside the docker image. When `docker_workdir` is set to `None`, no `-w` option will be passed to the docker image and the default `WORKDIR` will be used. Otherwise, an absolute path inside the docker image can be specified.
For example, the following customized docker image has a `WORKDIR` set to `/usr`. Its working directory is set to the host working directory by default, to `/usr` with `docker_workdir=None`, and to `/home/random_user` with `docker_workdir='/home/random_user'`.
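A sketch of this scenario (the image name `test_docker_workdir` and its Dockerfile are made up; only the `WORKDIR` line matters):

```
# image with its own WORKDIR
docker_build: tag='test_docker_workdir'
    FROM ubuntu:18.04
    WORKDIR /usr

# default: the host's current working directory
sh: container='test_docker_workdir'
    pwd

# docker_workdir=None: the image's own WORKDIR (/usr)
sh: container='test_docker_workdir', docker_workdir=None
    pwd

# an absolute path inside the container
sh: container='test_docker_workdir', docker_workdir='/home/random_user'
    pwd
```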
Note that the directory is relative to the docker file system, so it does not have to exist on the host system. Docker also creates `docker_workdir` if it does not exist, so you do not have to create the directory in advance.
Because the working directory of the docker image is set by default to the current working directory, you can apply a command inside a docker image to work on files in the current working directory, and create files in it as well.
This works because SoS automatically shares the current working directory of the host system with the docker image. Because the docker image can only "see" file systems shared by command `docker run`, your script will fail if your input or output files are not under the current working directory.
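For example, a step like the following works only because `data.txt` and `line_count.txt` (both hypothetical names) are under the current working directory:

```
sh: container='ubuntu'
    # both files live in the shared current working directory
    wc -l data.txt > line_count.txt
```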
The problem can be solved by specifying additional shared file systems through parameter `volumes`. This parameter accepts a single volume (a string) or a list of volumes (a list of strings) in the format of
- A single path (e.g. `/Users`), which will be shared to the docker image under the same name (i.e. `/Users:/Users`).
- A full volume specification `[host-src:]container-dest[:<options>]`, in which case the host and container directories can have different names (e.g. `/Users:/home`).
A special rule here is that the current working directory will not be mapped separately if it is under one of the specified volumes. That is to say, if the current working directory is `/Users/bpeng1/project` and option `volumes='/Users:/home'` is specified, the current working directory will be implicitly mapped to `/home/bpeng1/project`.
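A sketch of the `volumes` parameter (the paths are only illustrative):

```
sh: container='ubuntu', volumes=['/tmp', '/Users:/home']
    # /tmp is shared under the same name, /Users is visible as /home
    ls /tmp /home
```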
Some docker images have an entry point, which determines the command that will be executed when the image is run. When such images are executed, parameters passed from the command line are appended to `ENTRYPOINT`, so our usual way of specifying an interpreter (e.g. `ruby`) and a script will not work. If we run the script directly, our "command" (e.g. `ruby /var/lib/sos/docker_run_30258.rb`) will be appended to the entry point and will not be executed properly. Examples of such images include `dceoy/gatk`, which has an entry point `["java", "-jar", "/usr/local/src/gatk/build/libs/gatk.jar"]` and does not accept any additional interpreter. What we really need to do is to append "arguments" to this pre-specified command.
For example, the `test_docker_ls` image has an `ENTRYPOINT` with command `ls`. The image is expected to be executed directly with optional parameters and without an interpreter (e.g. `docker run test_docker_ls`).
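Such an image could have been built with something like the following (the Dockerfile is an assumption; only the `ENTRYPOINT` matters here):

```
docker_build: tag='test_docker_ls'
    FROM ubuntu:18.04
    ENTRYPOINT ["ls"]
```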
Because action `script` does not have a default interpreter, and option `args` can be used to construct a command line, we can use docker images with an `ENTRYPOINT` by passing the desired arguments through `args`.
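A sketch of such a step, using the `test_docker_ls` image above and an arbitrary file name:

```
script: container='test_docker_ls', args='-l SoS_Docker_Guide.ipynb'
```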
This essentially passes `-l SoS_Docker_Guide.ipynb` to the image and executes the command `ls -l SoS_Docker_Guide.ipynb`.
If the command line is long, another trick is to use `{script}` in `args`, so that the script of the action is expanded into the command line. For example, the aforementioned command can be specified as follows.
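A sketch under the same assumptions, with the arguments written as the script body:

```
script: container='test_docker_ls', args='{script}'
    -l SoS_Docker_Guide.ipynb
```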
- If you are using Docker Toolbox instead of Docker for Mac on a Mac, the docker image will be executed inside a VirtualBox virtual machine, which has its own shared directories, allocated CPUs, and memory. It is therefore possible that:
  - Your virtual machine is configured with a small amount of RAM (e.g. 2G), so your docker image can run out of memory even when your system has plenty of RAM left. Re-configure your VirtualBox VM if this happens.
  - SoS uses native paths for its docker command line, so it will for example map `c:\Users` to `/C/Users` under Windows. However, this path might not be accessible from docker if the virtual machine does not share this directory (`c:\Users`) or if the shared directory has a different name (e.g. `/Users` instead of `/C/Users`). Before you use SoS with Docker Toolbox, please make sure that the directory you would like to use is shared in the VM, and use names recognizable by SoS (e.g. share `C:\Users` as `/C/Users` and `D:\Data` as `/D/Data`).
Symbolic links are implemented differently on different operating systems, so creating symbolic links inside docker might fail with strange error messages such as "Read-only file system". Even if you can create symbolic links inside docker, the created links might not be accessible from the host machine.
Killing a sos task or sos process will not terminate scripts that are executed by the docker daemon. They will have to be killed explicitly using docker commands.