Docker Volumes for Datasets
It is reasonable to use docker volumes to make datasets easily accessible to oneself and to others. In particular, volumes backed by a bind mount are a convenient and storage-friendly way to do so, because the data stays in its original folder and is not copied. The advantage is that such volumes can be listed and mounted by anyone with docker access on the same machine. With additional mount options, the volumes can also be made read-only for everyone who mounts them.
The following datasets already exist on every DGX server as a read-only docker volume:
cifar-10 (CIFAR-10, python version, 178MB)
cifar-100 (CIFAR-100, python version, 178MB)
tinyimagenet-200 (Tiny-ImageNet-200, Kaggle version, 481MB)
To create a docker volume from an existing dataset, use the following command:
docker volume create -o o=bind,ro -o type=none -o device=/cluster/data/<name>/path/to/folder <volume_name>
Note that the values given with -o o=<values> are passed on to mount as mount options. Therefore, omitting the ro option leaves the choice of mounting the volume read-only to whoever mounts the volume in the end.
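Both variants (with and without ro) can be wrapped in a small shell function. The sketch below is written in POSIX sh; the name create_dataset_volume and its optional third argument are hypothetical and not part of docker. It only assembles the docker volume create command shown above, adding ro to the mount options when asked to.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker): create a named volume that
# bind-mounts an existing dataset folder.
#   $1: host folder, e.g. /cluster/data/<name>/path/to/folder
#   $2: volume name
#   $3: optional; "ro" makes the volume read-only for everyone who mounts it
create_dataset_volume() {
  device="$1"
  name="$2"
  mode="${3:-}"
  opts="bind"
  if [ "$mode" = "ro" ]; then
    # The o= values are passed on to mount, so ro applies to every mount.
    opts="bind,ro"
  fi
  docker volume create -o "o=$opts" -o type=none -o "device=$device" "$name"
}
```

For example, create_dataset_volume /cluster/data/<name>/path/to/folder <volume_name> ro reproduces the command above, while leaving out the last argument lets users decide at mount time.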
Verify that the volume has been created and inspect it:
docker volume ls
# outputs:
# DRIVER VOLUME NAME
# local <volume_name>
docker inspect <volume_name>
[
    {
        "CreatedAt": "2022-09-15T14:55:07+02:00",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/<volume_name>/_data",
        "Name": "<volume_name>",
        "Options": {
            "device": "/cluster/data/<name>/path/to/folder",
            "o": "bind,ro",
            "type": "none"
        },
        "Scope": "local"
    }
]
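To check several volumes at once, for example the preinstalled dataset volumes listed above, a short POSIX sh sketch can compare the requested names against the output of docker volume ls -q, which prints only volume names. The function name missing_volumes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every requested volume name that does not
# exist on this machine (prints nothing if all are present).
missing_volumes() {
  # -q makes docker volume ls print only the volume names, one per line.
  existing="$(docker volume ls -q)"
  for name in "$@"; do
    # -x: whole-line match, -F: fixed string (no regex), -q: no output
    if ! printf '%s\n' "$existing" | grep -qxF -- "$name"; then
      echo "$name"
    fi
  done
}
```

On a DGX server where the preinstalled volumes exist, missing_volumes cifar-10 cifar-100 tinyimagenet-200 should print nothing.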
Mounting the Docker Volume
When starting a docker container, volumes and mounts can easily be configured using the -v argument:
docker run ... -v <volume_name>:/path/to/mount/to ...
# Or with the read-only flag:
docker run ... -v <volume_name>:/path/to/mount/to:ro ...
Note that if the docker volume does not already have the read-only flag in its mount options (check with docker inspect <volume_name> | jq ".[0].Options.o"), then anyone mounting it without :ro can add, modify or delete data in the underlying folder. Since containers often run as root, the dataset owner could end up with root-owned files in the dataset, or with files missing.
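The jq check above can be turned into a yes/no test for scripts. This is a sketch in POSIX sh with hypothetical function names; it assumes docker and jq are installed on the host, and it matches ro only as a whole comma-separated option, so a value such as bind,rw is not mistaken for read-only.

```shell
#!/bin/sh
# Succeeds (exit status 0) if a comma-separated mount option string,
# such as "bind,ro", contains the ro option.
has_ro_option() {
  case ",$1," in
    *,ro,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if the named volume was created with ro in its mount options.
# jq -r prints the raw string; // "" falls back to an empty string
# for volumes that have no o option at all.
volume_is_readonly() {
  opts="$(docker inspect "$1" | jq -r '.[0].Options.o // ""')"
  has_ro_option "$opts"
}
```

Usage: if volume_is_readonly cifar-10; then echo "cifar-10 is read-only"; fi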
If users should decide for themselves whether to mount the volume read-only (or be allowed to modify the dataset), leave out the ro option when creating the docker volume; users can then optionally append :ro to their mount options, as shown above. If the read-only flag was already set when the volume was created, appending :ro when mounting is not needed.
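One way to make the safe choice the default in your own scripts is a small helper that builds the -v value and appends :ro unless writing is explicitly requested. This is a hypothetical POSIX sh sketch; appending :ro is harmless for a volume that was already created with ro, and protects the data for one that was not.

```shell
#!/bin/sh
# Hypothetical helper: build the value for docker run's -v argument.
#   $1: volume name
#   $2: mount path inside the container
#   $3: optional; "rw" omits :ro (writing then still requires that the
#       volume itself was created without ro)
dataset_mount_arg() {
  volume="$1"
  target="$2"
  mode="${3:-ro}"
  if [ "$mode" = "rw" ]; then
    echo "$volume:$target"
  else
    echo "$volume:$target:ro"
  fi
}
```

Usage: docker run --rm -v "$(dataset_mount_arg cifar-10 /data)" busybox ls /data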
Peeking into Docker Volumes
Docker volumes cannot conveniently be looked into from outside a container. To do so anyway, start a lightweight docker container such as busybox with the volume mounted and list its contents, for example:
docker run -it --rm -v cifar-10:/mnt busybox ls -lAh /mnt
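The command above can be wrapped in a hypothetical POSIX sh helper that also accepts a subfolder. Two small changes are deliberate assumptions: the volume is mounted with :ro so that peeking can never modify the dataset, and -it is dropped because ls needs no interactive terminal, which also lets the helper run inside scripts.

```shell
#!/bin/sh
# Hypothetical helper: list the contents of a volume (or of a subfolder
# inside it) through a throwaway busybox container.
#   $1: volume name
#   $2: optional subfolder, relative to the volume root
peek_volume() {
  volume="$1"
  subdir="${2:-}"
  # :ro keeps the peek read-only even if the volume was created without ro.
  docker run --rm -v "$volume:/mnt:ro" busybox ls -lAh "/mnt/$subdir"
}
```

Usage: peek_volume cifar-10, or peek_volume <volume_name> <subfolder>.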