Working with Docker

The DGX server is configured to use Docker containers so that each project’s dependencies stay isolated from the host system. The typical process of working with Docker is:

  1. Building or pulling a Docker image.

  2. Tagging the built or pulled Docker image.

  3. Running a Docker container from the built or pulled image.

  4. Attaching to the container either:

    1. Directly through a terminal emulator, or

    2. By setting up the Docker container to be accessible through SSH.

Pulling a Docker Image

Pulling a Docker image is the easiest way to get started. Pulling refers to downloading a Docker image from a registry (such as Docker Hub) to the DGX server. This is done by running docker pull NAME[:TAG|@DIGEST].
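
For example, to pull a prebuilt image from Docker Hub (the image name and tag here are only illustrative; pull whatever image your project needs):

  # pull a prebuilt PyTorch image from Docker Hub
  docker pull pytorch/pytorch:latest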

After pulling, don’t forget to tag the image so it is recognizable.

More information on docker pull can be found at Docker Docs.

Building a Docker Image

Docker images can also be built from a Dockerfile. This is done as follows:

  1. Create a folder and place your Dockerfile, named exactly Dockerfile, in that folder.

  2. Run docker build [path to folder] to build the Docker image (see the example after this list).

  3. When building Docker images, leftover intermediate images are sometimes created and not removed, usually when a Dockerfile fails to build. These images must be removed to save space on the server and to keep the image library clean. This is done by running docker rmi IMAGE_NAME.
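
For example, assuming your Dockerfile lives in a folder called my_project (the folder name and tag below are illustrative):

  # build an image from my_project/Dockerfile
  docker build my_project

  # optionally, tag the image at build time (see the next section for the required format)
  docker build -t your_short_username/description my_project

  # list dangling intermediate images left over from failed builds
  docker images --filter "dangling=true"

  # remove a leftover image by its name or ID
  docker rmi IMAGE_NAME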

Tagging a Docker Image

On the DGX server, images must be tagged with the format your_short_username/description. This can be done by running docker tag IMAGE_NAME YOUR_SHORT_NAME/DESCRIPTION.
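
For example, if your short username is jdoe and you pulled pytorch/pytorch:latest (both names are illustrative):

  # retag the image in the required your_short_username/description format
  docker tag pytorch/pytorch:latest jdoe/pytorch_experiments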

Running the Docker Container

Experiments must be run within the context of a SLURM session. Doing this inside a tmux or screen session is strongly recommended, so that your job keeps running if your SSH connection drops. This can be done using the following steps; a complete worked example follows the list.

  1. Start a tmux/screen session.

  2. In the tmux/screen session, start a SLURM session with srun --job-name=$JOBNAME --pty --ntasks=1 --cpus-per-task=$NCPU --mem=${MEM}G --gres=gpu:$NGPU bash

    $JOBNAME

    A short job name to make your session identifiable. An example for a job using 2 GPUs is 2_gpu

    $NCPU

    Number of CPU cores to assign to the job.

    $MEM

    Amount of RAM to assign to the job, in gigabytes (the G suffix is added in the command above).

    $NGPU

    Number of GPUs to assign to the job.

  3. Start a Docker container with access to the allocated GPUs with: docker run --shm-size=16g -it -p $PortOnDockerHost:$PortInDockerContainer -v $LOCAL_DIR:/workspace $DOCKER_IMAGE bash

    --shm-size

    Shared memory (/dev/shm) available to processes inside the container. 16g is a good value for most jobs.

    $PortOnDockerHost

    The port on the server that Docker maps to the container port. A port must first be exposed within the Dockerfile. The value of $PortOnDockerHost must be within your allocated port range, which can be found at Important notes.

    $PortInDockerContainer

    The port that is exposed within the Dockerfile; i.e., $PortOnDockerHost on the server is mapped to $PortInDockerContainer inside the container.

    $LOCAL_DIR

    The directory on the server (for example, your project or data directory) to mount inside the container at /workspace.

    $DOCKER_IMAGE

    The image tag assigned in the previous section, i.e. your_short_username/description.

  4. If for some reason you’re not automatically attached, run:

    1. docker ps to find the container’s name or ID, then

    2. docker attach [container_name/id] to attach to it.

  5. Once you’re done and you’ve exited the container, the container is stopped but not removed.

    1. If you want to use it again later, run:

      1. docker start [container_name/id] to restart the container, then

      2. docker attach [container_name/id] to attach to it.

    2. If you don’t, remove it using docker rm [container_name/id].

  6. To check if a stopped container still exists, run docker ps -a
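
Putting the steps above together, a minimal worked example might look like the following. The job name, resource sizes, port, directory, and image tag are placeholders; substitute your own values, and make sure the host port is within your allocated port range.

  # 1. start a persistent tmux session
  tmux new -s 2_gpu

  # 2. inside tmux, request a SLURM allocation (here: 2 GPUs, 8 CPU cores, 32 GB RAM)
  srun --job-name=2_gpu --pty --ntasks=1 --cpus-per-task=8 --mem=32G --gres=gpu:2 bash

  # 3. inside the SLURM session, start the container
  docker run --shm-size=16g -it -p 8888:8888 -v /home/jdoe/project:/workspace jdoe/pytorch_experiments bash

  # 5./6. after exiting, the container is stopped but still listed
  docker ps -a                    # list all containers, including stopped ones
  docker start CONTAINER_NAME     # restart the stopped container, then
  docker attach CONTAINER_NAME    # attach to it again
  docker rm CONTAINER_NAME        # or remove it when it is no longer needed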


The next section explains how to set up SSH access to your Docker container so that you can connect to it directly from a remote machine.