Working with Docker
The DGX server is configured to use Docker containers to keep any dependencies containerized. The typical process of working with Docker is:
Building or pulling a Docker image.
Tagging the built or pulled Docker image.
Running a Docker container using docker.
Attaching to the container either:
Directly through a terminal emulator, or
Setting up the Docker container to be accessible through SSH.
Pulling a Docker Image
Pulling a Docker image is the easiest way to get started.
Pulling refers to downloading a Docker image from some repository to the DGX server.
This is done by running docker pull NAME[:TAG|@DIGEST].
After pulling, don’t forget to tag the image so it is recognizable.
More information on docker pull can be found at Docker Docs.
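As a brief illustration of the NAME[:TAG|@DIGEST] syntax, the sketch below composes image references and prints the matching pull commands. The image name ubuntu and the tag 22.04 are only examples, and image_ref is a hypothetical helper, not part of Docker.

```shell
# Hypothetical helper: build an image reference of the form NAME[:TAG|@DIGEST].
# Usage: image_ref NAME [TAG] [DIGEST]  (a digest takes precedence over a tag)
image_ref() {
    if [ -n "${3:-}" ]; then
        printf '%s@%s\n' "$1" "$3"
    elif [ -n "${2:-}" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Print the pull commands rather than running them; drop "echo" to really pull.
echo docker pull "$(image_ref ubuntu)"        # no tag: Docker falls back to "latest"
echo docker pull "$(image_ref ubuntu 22.04)"  # a specific tag
```

Pulling by an explicit tag rather than relying on the default makes it easier to tell later which version of an image is on the server.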
Building a Docker Image
Docker images can also be built from Dockerfiles. This is done with the following:
Create a folder and put your Dockerfile, named Dockerfile, in that folder.
Run docker build [path to folder] to build the Docker image.
When building Docker images, intermediate images are sometimes left behind instead of being removed. This usually happens when a Dockerfile fails to build. These images must be removed to save space on the server and to keep the image library clean. This is done by running
docker rmi IMAGE_NAME
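The build-and-clean-up cycle described above might look like the following sketch. The folder ./my_image is a hypothetical example, and the run wrapper is an assumption added here so the commands are only printed unless RUN=1 is set. Since the server is shared, it is safest to remove only images you created yourself.

```shell
# Dry-run wrapper (illustrative): prints each command; set RUN=1 to execute it.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi
}

BUILD_DIR=./my_image   # hypothetical folder containing a file named "Dockerfile"

run docker build "$BUILD_DIR"             # build the image from that folder
run docker images --filter dangling=true  # list untagged leftover images
run docker rmi IMAGE_NAME                 # replace IMAGE_NAME with an ID listed above
```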
Tagging a Docker Image
On the DGX server, images must be tagged with the format your_short_username/description.
This can be done by running docker tag IMAGE_NAME YOUR_SHORT_USERNAME/DESCRIPTION.
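A minimal sketch of composing such a tag is shown below. The username jdoe and the description pytorch-exp are placeholder values, and make_tag is a hypothetical helper. It lowercases both parts because Docker repository names must be lowercase.

```shell
# Hypothetical helper: build a tag in the username/description format.
# Docker repository names must be lowercase, so the result is lowercased.
make_tag() {
    printf '%s/%s\n' "$1" "$2" | tr '[:upper:]' '[:lower:]'
}

TAG=$(make_tag jdoe pytorch-exp)   # placeholder username and description
echo docker tag IMAGE_NAME "$TAG"  # drop "echo" and substitute the real image name or ID
```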
Optional but recommended: Utilize Docker Volumes for Datasets
See here.
Running the Docker Container
Experiments must be run within the context of a SLURM session. Doing this within a tmux or screen session is strongly recommended. This can be done using the following steps.
Start a tmux/screen session.
In the tmux/screen session, start a SLURM session with
srun --job-name=$JOBNAME --pty --ntasks=1 --cpus-per-task=$NCPU --mem=${MEM}G --gres=gpu:$NGPU bash
- $JOBNAME
A short job name to make your session identifiable. An example for a job using 2 GPUs is 2_gpu.
- $NCPU
Number of CPU cores to assign to the job.
- $MEM
Amount of RAM, in gigabytes, to assign to the job.
- $NGPU
Number of GPUs to assign to the job.
Use docker to start a Docker container with access to the GPUs with:
docker run --shm-size=16g -it -p $PortOnDockerHost:$PortInDockerContainer -v $LOCAL_DIR:/workspace $DOCKER_IMAGE bash
- --shm-size
Size of the shared memory (/dev/shm) available to processes in the container. 16g is a good value for most jobs.
- $PortOnDockerHost
The port on the server that Docker maps to the container port. A port must first be exposed within the Dockerfile. The value of $PortOnDockerHost must be within your allocated port range, which can be found at Important notes.
- $PortInDockerContainer
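The port that is exposed within the Dockerfile, i.e. $PortOnDockerHost is mapped to $PortInDockerContainer.
- $LOCAL_DIR
The directory on the server (given as an absolute path) that is mounted at /workspace inside the container.
- $DOCKER_IMAGE
The tag given to the image in the Tagging a Docker Image section.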
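To make the two commands above concrete, the sketch below fills in the variables and prints the expanded commands so they can be checked before being typed into the tmux/screen session. All values are illustrative assumptions: the port 8888, the directory $HOME/experiments and the tag jdoe/pytorch-exp are placeholders, and the host port must be replaced with one from your allocated range.

```shell
# Example values only; adjust them to your job and your allocated port range.
JOBNAME=2_gpu
NCPU=8
MEM=64
NGPU=2
PortOnDockerHost=8888          # placeholder; must lie within your allocated range
PortInDockerContainer=8888     # placeholder; the port exposed in the Dockerfile
LOCAL_DIR=$HOME/experiments    # hypothetical directory on the server
DOCKER_IMAGE=jdoe/pytorch-exp  # hypothetical image tag

# Step 1 (inside tmux/screen): request an interactive SLURM session.
SRUN_CMD="srun --job-name=$JOBNAME --pty --ntasks=1 --cpus-per-task=$NCPU --mem=${MEM}G --gres=gpu:$NGPU bash"

# Step 2 (inside the SLURM session): start the container.
DOCKER_CMD="docker run --shm-size=16g -it -p $PortOnDockerHost:$PortInDockerContainer -v $LOCAL_DIR:/workspace $DOCKER_IMAGE bash"

# Print the expanded commands for review.
echo "$SRUN_CMD"
echo "$DOCKER_CMD"
```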
If for some reason you’re not automatically attached, run:
docker ps
to see what the name/id of the container is, then
docker attach [container_name/id]
to attach to it.
Once you’re done and you’ve exited the container, your container is stopped but not actually removed.
If you still want to use it eventually, just run:
docker start [container_name/id]
to restart the container, then
docker attach [container_name/id]
to attach to it.
If you don’t, then don’t forget to remove it using
docker rm [container_name/id]
To check if a stopped container still exists, run
docker ps -a
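The clean-up steps above can be sketched as a short command sequence. The helper lifecycle_cmds and the container name my_container are hypothetical. The sketch also assumes you are happy to combine restarting and attaching into one step, which the -a and -i flags of docker start allow, and it narrows docker ps -a to stopped containers with a status filter.

```shell
# Hypothetical helper: print the commands for handling a stopped container.
# $1 is the container name or ID as shown by "docker ps -a".
lifecycle_cmds() {
    echo "docker ps -a --filter status=exited"  # list only stopped containers
    echo "docker start -ai $1"                  # restart and attach in one step
    echo "docker rm $1"                         # remove it once no longer needed
}

lifecycle_cmds my_container   # my_container is a placeholder name
```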
The next section explains how to use Docker with SSH to connect directly to your Docker container from a remote machine.