Docker Volumes for Datasets

Docker volumes are a reasonable way to make datasets easily accessible to yourself and to others. Bind-mounted volumes in particular are convenient and storage-friendly, because they reference the existing data instead of copying it. Anyone on the same machine can find and mount them, and with additional options a volume can be made read-only for everyone who mounts it.

The following datasets already exist on every DGX server as a read-only docker volume:

To create a docker volume from an existing dataset, use the following command:

Creating a read-only docker volume.
docker volume create -o o=bind,ro -o type=none -o device=/cluster/data/<name>/path/to/folder <volume_name>

Note that the -o o=<values> arguments are passed on to mount. Omitting the ro option therefore leaves it up to whoever mounts the volume to decide whether to mount it read-only.
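The creation step can be wrapped in a small shell function that first checks that the dataset directory exists. This is only a sketch: the function name create_ro_volume is hypothetical and not part of Docker, and the flags are exactly those of the command above.

```shell
# Hypothetical helper (not part of Docker): create a read-only bind volume
# for an existing dataset directory.
# Usage: create_ro_volume <dataset_dir> <volume_name>
create_ro_volume() {
    dataset_dir=$1
    volume_name=$2

    # Stop early if the directory is missing, so that a typo in the path
    # is noticed before the volume is created.
    if [ ! -d "$dataset_dir" ]; then
        echo "error: $dataset_dir is not a directory" >&2
        return 1
    fi

    # Same options as in the command above: bind mount, read-only.
    docker volume create \
        -o o=bind,ro \
        -o type=none \
        -o device="$dataset_dir" \
        "$volume_name"
}
```

Example call: create_ro_volume /cluster/data/<name>/path/to/folder <volume_name>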

Verify that the volume has been created and inspect it:

Check on the volume.
docker volume ls
# outputs:
# DRIVER    VOLUME NAME
# local     <volume_name>

docker inspect <volume_name>
Output of docker inspect <volume_name>.
[
    {
        "CreatedAt": "2022-09-15T14:55:07+02:00",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/<volume_name>/_data",
        "Name": "<volume_name>",
        "Options": {
            "device": "/cluster/data/<name>/path/to/folder",
            "o": "bind,ro",
            "type": "none"
        },
        "Scope": "local"
    }
]

Mounting the Docker Volume

When starting a docker container, volumes and mounts can be easily configured using the -v argument:

Mounting docker volumes.
docker run ... -v <volume_name>:/path/to/mount/to ...
# Or with the read-only flag:
docker run ... -v <volume_name>:/path/to/mount/to:ro ...

Note that if the docker volume does not already have the read-only flag in its mount options (check with docker inspect <volume_name> | jq ".[0].Options.o"), anyone mounting it can add, modify, or delete data. The dataset could then end up with root-owned files or with files missing.

To let users decide for themselves whether to mount the volume read-only or writable, omit the ro flag when creating the volume. Users can then append :ro to their mount options, as shown above. If the read-only flag was already set when the volume was created, the :ro flag is not needed when mounting.
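The read-only check can also be scripted without jq by using the --format option of docker volume inspect. The following is a minimal sketch: the function name volume_is_readonly is hypothetical, and it assumes the volume was created with -o o=<values> as described above.

```shell
# Hypothetical helper (not part of Docker): exit status 0 if the volume was
# created with "ro" among its mount options, 1 if not, 2 if inspect fails.
# Usage: volume_is_readonly <volume_name>
volume_is_readonly() {
    # Print only the "o" entry of the volume's Options map, e.g. "bind,ro".
    opts=$(docker volume inspect --format '{{ .Options.o }}' "$1") || return 2

    # Wrap the list in commas so that "ro" only matches as a whole option.
    case ",$opts," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

Example call: volume_is_readonly <volume_name> || echo "append :ro when mounting"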

Peeking into Docker Volumes

Docker volumes cannot be conveniently peeked into from outside a container. To look inside one anyway, start a lightweight docker container such as busybox and mount the volume into it. Example:

List files inside docker volume.
docker run -it --rm -v cifar-10:/mnt busybox ls -lAh /mnt
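For volumes that were created without the ro option, it may be safer to add :ro when peeking, so that the throwaway container cannot change the data. A small sketch, with the hypothetical function name peek_volume:

```shell
# Hypothetical helper (not part of Docker): list the top-level contents
# of a docker volume without being able to modify it.
# Usage: peek_volume <volume_name>
peek_volume() {
    # :ro mounts the volume read-only inside this container, even if the
    # volume itself is writable; --rm removes the container afterwards.
    docker run --rm -v "$1":/mnt:ro busybox ls -lAh /mnt
}
```

Example call: peek_volume cifar-10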