Common Docker Commands (continuously updated...)

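1. Build a Docker image:

In the directory containing the Dockerfile, once you have made sure its syntax is correct, run:
docker build -t $image_name:$tag_name .
Afterwards, check with docker images that the image was created.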
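If you build images often, the two commands above can be scripted. The following is only a minimal Python sketch. It assumes the docker CLI is installed and on PATH. build_command, image_exists and build_and_verify are hypothetical helper names. The sketch checks for the image with docker images -q $image_name:$tag_name, which prints the image ID when the image exists and nothing otherwise.

```python
import subprocess

# Hypothetical helpers; they assume the docker CLI is installed and on PATH.

def build_command(image_name, tag_name, context="."):
    """Return the argv for `docker build -t image_name:tag_name context`."""
    return ["docker", "build", "-t", f"{image_name}:{tag_name}", context]

def image_exists(image_name, tag_name):
    """Return True if `docker images -q image_name:tag_name` prints an image ID."""
    result = subprocess.run(
        ["docker", "images", "-q", f"{image_name}:{tag_name}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() != ""

def build_and_verify(image_name, tag_name, context="."):
    """Build the image (raises CalledProcessError if the build fails), then confirm it is listed."""
    subprocess.run(build_command(image_name, tag_name, context), check=True)
    return image_exists(image_name, tag_name)
```

For example, run build_and_verify("my_project", "v1") from the Dockerfile's directory. Here my_project and v1 are placeholder names.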

2. Create a container from an existing image and interact with it through bash:

docker run -it $image_name:$tag_name /bin/bash
Afterwards, check with docker ps -a that the container was created.

3. Remove an existing container:

docker rm $container_ID

4. Remove an existing image:

docker rmi $image_name:$tag_name

5. Open a shell in an existing container:

docker exec -it $container_ID /bin/bash
If the container is not running, start it first with docker start $container_ID.
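The check-then-start logic can be automated. Below is a minimal Python sketch. It assumes the docker CLI is on PATH, and the helper names are hypothetical. docker inspect -f '{{.State.Running}}' prints true or false for a container. The sketch therefore starts the container only when needed, and then docker exec opens the shell.

```python
import subprocess

# Hypothetical helpers; they assume the docker CLI is installed and on PATH.

def is_running(container_id):
    """Return True if `docker inspect` reports the container as running."""
    result = subprocess.run(
        ["docker", "inspect", "-f", "{{.State.Running}}", container_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "true"

def exec_command(container_id, shell="/bin/bash"):
    """Return the argv for `docker exec -it container_id shell`."""
    return ["docker", "exec", "-it", container_id, shell]

def open_shell(container_id, shell="/bin/bash"):
    """Start the container if it is stopped, then open an interactive shell in it."""
    if not is_running(container_id):
        subprocess.run(["docker", "start", container_id], check=True)
    # -it needs a real terminal, so call this from an interactive session.
    subprocess.run(exec_command(container_id, shell), check=True)
```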

6. Copy a file from the host into a container:

Run this on the host (not inside the container): sudo docker cp $host_path $container_ID:$container_path

7. Using GPUs inside a Docker container

7.1 Install Docker on the host (skipped here)
7.2 Install the official NVIDIA Docker
  • Add the NVIDIA repo
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get update
  • Install the GPU-enabled NVIDIA container package (nvidia-docker2) and reload the Docker daemon configuration

    sudo apt-get install nvidia-docker2
    sudo pkill -SIGHUP dockerd
7.3 Use GPUs inside a container with docker run --gpus $GPU_nums ...

* Specify the number of GPUs: docker run -it --shm-size 8G --name="shm_updated" --gpus 2 $image_name:$tag_name /bin/bash

root@12331b9ad979:/code# nvidia-smi
Fri Sep 25 06:19:17 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 30%   46C    P0    57W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 24%   42C    P0    59W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

* Use all GPUs: docker run -it --shm-size 8G --name="shm_updated" --gpus all $image_name:$tag_name /bin/bash


root@fafadb9bc70f:/code# nvidia-smi

Fri Sep 25 06:15:32 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 30%   46C    P0    57W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 24%   42C    P0    59W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:82:00.0 Off |                  N/A |
| 23%   42C    P0    59W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:83:00.0 Off |                  N/A |
| 22%   40C    P0    55W / 250W |      0MiB / 11178MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

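7.4 Using specific GPUs with docker

Add the following option to docker run:
--gpus '"device=2,3"'
For example: docker run -it --shm-size 64G --name="testing" --gpus '"device=2,3"' $image_name:$tag_name /bin/bash

Note: write the value exactly as '"device=2,3"' (double quotes inside single quotes). Otherwise you get the following error: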

docker: Error response from daemon: cannot set both Count and DeviceIDs on device request.
ERRO[0000] error waiting for container: context canceled
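Why are the nested quotes needed? As far as I can tell, the Docker CLI parses the --gpus value as a single CSV record. An unquoted device=2,3 is therefore split at the comma into device=2 and 3, and the bare 3 is taken as a GPU count. The daemon then rejects a request that has both a count and device IDs. The inner double quotes keep device=2,3 together as one CSV field, and the outer single quotes stop the shell from stripping them.

The Python sketch below imitates that split with the standard csv module, which only approximates the CLI's parser and is not Docker's code. It also builds a correctly quoted option with shlex.quote. csv_fields and gpus_option are hypothetical names.

```python
import csv
import shlex

# Hypothetical helpers for illustration. csv_fields only approximates the
# Docker CLI's own CSV parsing of the --gpus value; it is not Docker's code.

def csv_fields(gpus_value):
    """Split a --gpus value into fields the way a CSV reader does."""
    return next(csv.reader([gpus_value]))

def gpus_option(count=None, devices=None):
    """Build a shell-ready --gpus option from a count ('all' or a number) or device IDs."""
    if devices is not None:
        # Inner double quotes keep "device=2,3" together as one CSV field.
        value = '"device=' + ",".join(str(d) for d in devices) + '"'
    elif count is not None:
        value = str(count)
    else:
        raise ValueError("pass either count or devices")
    # shlex.quote wraps the value in single quotes when it contains '"',
    # so the shell passes the double quotes through to docker.
    return "--gpus " + shlex.quote(value)

print(csv_fields("device=2,3"))      # ['device=2', '3']: the stray '3' looks like a count
print(csv_fields('"device=2,3"'))    # ['device=2,3']
print(gpus_option(devices=[2, 3]))   # --gpus '"device=2,3"'
```

gpus_option(count=2) and gpus_option(count="all") give the two forms used in 7.3.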

8. Fixing the problem left over from step 7

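Steps 7.1 to 7.2 produced no other errors. But look closely at the nvidia-smi output in 7.3: the header reads CUDA Version: N/A.
I overlooked this at first through my own carelessness. Only later, when torch.cuda.is_available() returned False in my code, did I realize the GPU was not actually being used. Yet torch.version.cuda returned '10.2', the same as on the host, and torch.backends.cudnn.enabled returned True. I took this to mean that Docker itself was fine and that the problem lay with the CUDA driver. Thinking back, the $image_name:$tag_name image run in 7.3 is a project-specific image without the nvidia/cuda environment.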
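To catch this earlier, you can print these values together from inside the container. The sketch below assumes only that PyTorch may or may not be installed in the image; it returns None if PyTorch is missing. torch_gpu_report is a hypothetical name. Apart from the attributes named above, it uses only torch.cuda.device_count().

Keep in mind that torch.version.cuda is the CUDA version PyTorch was built with. Only cuda_available and device_count say whether a GPU is actually reachable.

```python
import importlib.util

# Hypothetical helper; assumes nothing beyond an optional PyTorch install.

def torch_gpu_report():
    """Collect PyTorch's CUDA diagnostics, or return None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return {
        # False here means no GPU is usable from this process.
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
        # Version PyTorch was compiled against (e.g. '10.2'), not proof a GPU is present.
        "built_with_cuda": torch.version.cuda,
        "cudnn_enabled": torch.backends.cudnn.enabled,
    }

print(torch_gpu_report())
```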

8.1 Using the previously pulled nvidia/cuda:latest image, test the following command: docker run --runtime=nvidia --rm nvidia/cuda:latest nvidia-smi
If it shows the following error, see step 9!

docker: Error response from daemon: Unknown runtime specified nvidia.

If that error does not appear, but the run fails with this one instead:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.0, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.

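Feel like you're about to lose it? Hold on, there is still a fix!
Read the log carefully. The part after requirement error says the image needs cuda>=11.0, and tells you to either update the driver to a newer version or use an older CUDA container. The host driver here is 440.82 (see the nvidia-smi output in 7.3), which does not meet that requirement.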
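The comparison behind that message can be reproduced by hand. Take the minimum CUDA version out of the error and compare it with the host's CUDA version, which is 10.2 in this case according to step 8. A minimal sketch follows. The function names and the (major, minor) tuple comparison are my own illustration, not part of nvidia-container-cli.

```python
import re

# Hypothetical helpers for illustration; not part of nvidia-container-cli.

def required_cuda(error_text):
    """Extract (major, minor) from a 'cuda>=X.Y' requirement, or None if absent."""
    match = re.search(r"cuda>=(\d+)\.(\d+)", error_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def image_too_new(error_text, host_cuda):
    """True if the image needs a newer CUDA than host_cuda (an 'X.Y' string such as '10.2')."""
    needed = required_cuda(error_text)
    if needed is None:
        return False
    major, minor = (int(part) for part in host_cuda.split(".")[:2])
    return needed > (major, minor)

err = ("nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.0, "
       "please update your driver to a newer version, or use an earlier cuda container")
print(required_cuda(err), image_too_new(err, "10.2"))  # (11, 0) True
```

A True result points to the same fix as below: choose an image tag whose CUDA version is not newer than the host's.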

8.2 So pull an older CUDA image with the following command:
docker pull nvidia/cuda:9.0-runtime-ubuntu16.04
Then try again:
docker run --runtime=nvidia --rm nvidia/cuda:9.0-runtime-ubuntu16.04 nvidia-smi
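The result:
[Screenshot: nvidia-smi output from the nvidia/cuda:9.0-runtime-ubuntu16.04 container]
This output means the GPU is finally being used normally! If you want to run Python from bash, you still need to add the project's dependencies to this container.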

9. Fixing the "Unknown runtime specified nvidia." error

Run the following steps in order to fix it:

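9.1 Install nvidia-container-runtime (skip this if it is already installed):
sudo apt-get install nvidia-container-runtime
9.2 Open the Docker daemon configuration file:
sudo vim /etc/docker/daemon.json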
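9.3 Add the following to the .json file. If the file already contains other settings, put the two keys inside its existing top-level { } object instead of adding a second one.
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}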
9.4 Check that the JSON syntax in the file is valid:
[Screenshot: JSON syntax check of daemon.json]
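Editing daemon.json by hand makes it easy to drop a brace or a comma. As an alternative to steps 9.2 to 9.4, the sketch below does the edit in Python:

- It loads the existing file, or starts from an empty object if there is none.
- It merges in the two keys from 9.3 and keeps any other settings.
- It writes the result back with json.dump, so the output is always valid JSON.

add_nvidia_runtime is a hypothetical name, and writing /etc/docker/daemon.json needs root (e.g. run the script with sudo). To only check an existing file, python3 -m json.tool /etc/docker/daemon.json reports an error if the syntax is invalid.

```python
import json
import os

# Hypothetical helper; writing the default path requires root.

def add_nvidia_runtime(path="/etc/docker/daemon.json"):
    """Merge the nvidia runtime settings from step 9.3 into daemon.json and return the result."""
    config = {}
    if os.path.exists(path):
        with open(path) as f:
            text = f.read().strip()
        # json.loads raises json.JSONDecodeError if the existing file is malformed.
        config = json.loads(text) if text else {}
    config["default-runtime"] = "nvidia"
    config.setdefault("runtimes", {})["nvidia"] = {
        "path": "/usr/bin/nvidia-container-runtime",
        "runtimeArgs": [],
    }
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
        f.write("\n")
    return config
```

After it runs, continue with 9.5 to restart Docker.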

9.5 Restart the Docker service:
sudo systemctl daemon-reload
sudo systemctl restart docker

With that done, you can go on with the rest of step 8!

10. Docker image migration

  • saving: docker save -o $IMAGE_NAME.tar $IMAGE_NAME:$tag_name
  • transferring: rsync -av --progress $IMAGE_NAME.tar root@server_ip:~
  • loading: docker load -i $IMAGE_NAME.tar
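The three migration steps can be chained in one script. Below is a minimal Python sketch with these assumptions:

- The docker and rsync CLIs are available.
- You have key-based SSH access to the target, and docker is installed there.
- Running docker load on the target over ssh is my own addition, not part of the original steps.

The helper names and server_ip are placeholders.

```python
import subprocess

# Hypothetical helpers; assumes docker, rsync and ssh access to the target host.

def migration_commands(image_name, tag_name, server):
    """Return the save / transfer / load commands from step 10 as argv lists."""
    # Replace '/' (e.g. nvidia/cuda) so the archive name is a plain file name.
    tar_name = image_name.replace("/", "_") + ".tar"
    return [
        # saving: write the image to a tar archive
        ["docker", "save", "-o", tar_name, f"{image_name}:{tag_name}"],
        # transferring: copy the archive to the home directory on the target
        ["rsync", "-av", "--progress", tar_name, f"{server}:~"],
        # loading: run docker load on the target over ssh (assumption, see above)
        ["ssh", server, "docker", "load", "-i", tar_name],
    ]

def migrate(image_name, tag_name, server):
    """Run the three steps in order, stopping at the first failure."""
    for cmd in migration_commands(image_name, tag_name, server):
        subprocess.run(cmd, check=True)
```

For example: migrate("my_project", "v1", "root@server_ip").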

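11. Create a new Docker image from an updated container

docker commit -a=$AUTHOR_name -m="message" $CONTAINER_ID $IMAGE_NAME:$tag_name
The Dockerfile may keep failing to build, for example because pip install fails for some dependencies in requirements. In that case, first build a base image and start a container from it. Then install the dependencies inside that container and commit it as a new image.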

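12. Clean up unused Docker data

docker system prune --force --volumes
This removes stopped containers, unused networks, dangling images and build cache without asking for confirmation (--force). --volumes also removes unused volumes, so make sure none of them hold data you still need.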


Oops

Brain-computer interface enthusiast, EEG + deep learning practitioner