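Prerequisite: the NVIDIA GPU driver must already be installed on the host, i.e. running nvidia-smi prints its normal output.
Script for deploying the host driver: nvidia-gpu.sh
Environment used in this tutorial: Ubuntu 22.04 LTS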
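The prerequisite above can be checked before going further. The following is a minimal sketch, not part of the original tutorial: `check_driver` is a hypothetical helper name, and it only tests that the command exists and exits successfully.

```shell
#!/bin/sh
# check_driver [CMD]: hypothetical helper, not from the original tutorial.
# Succeeds only if CMD (default: nvidia-smi) is on PATH and exits with 0.
check_driver() {
    cmd="${1:-nvidia-smi}"
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "missing: $cmd"
        return 1
    fi
    if ! "$cmd" >/dev/null 2>&1; then
        echo "failed: $cmd"
        return 2
    fi
    echo "ok: $cmd"
}
```

Calling `check_driver` with no argument on the host should print `ok: nvidia-smi` once the driver is working; any other result means the driver should be fixed before installing the container packages.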
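The main NVIDIA container components are nvidia-container-runtime, nvidia-container-toolkit, libnvidia-container, and the CUDA driver.
Since version 3.6.0, the runtime package has been a package that depends only on the toolkit package (meaning nvidia-container-toolkit, not the NVIDIA CUDA Toolkit). The official site also notes that nvidia-container-toolkit covers the vast majority of needs for typical applications.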
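To tell whether an installed nvidia-container-runtime falls on the newer side of the 3.6.0 boundary described above, a version comparison like the one below may help. This is a sketch under assumptions: `version_ge` is a hypothetical helper, it relies on GNU `sort -V`, and treating 3.6.0 itself as included is a guess, since the text only says "since 3.6.0".

```shell
#!/bin/sh
# version_ge A B: hypothetical helper, succeeds when version A >= version B.
# Relies on GNU sort -V (version sort); the smaller version sorts first.
version_ge() {
    smallest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)
    [ "$smallest" = "$2" ]
}
```

On a Debian or Ubuntu host, the installed version can be read with `dpkg-query -W -f='${Version}' nvidia-container-runtime` and passed as the first argument, for example `version_ge "$installed" 3.6.0`.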
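1. Installation from the official site (requires a proxy or VPN where the NVIDIA repositories are not directly reachable):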
[Command listing not preserved: six commands run at the root@dlp:~ prompt]
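2. Offline deployment (older version)
Download the package locally, upload it to the server, then install it.
Offline deb package download location: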
https://github.com/NVIDIA/libnvidia-container/blob/gh-pages/stable/ubuntu18.04
Local mirror:
https://onenote.zznnwn.cloudns.biz/zh-CN/public-tools/nvidia-deb/
dpkg -i nvidia-container-toolkit_1.4.2-1_amd64.deb
3. The deployment method that worked for this tutorial
For the download location of the deb packages below, see onenote.
dpkg -i libnvidia-container1_1.9.0-1_amd64.deb
dpkg -i libnvidia-container-tools_1.9.0-1_amd64.deb
dpkg -i nvidia-container-toolkit_1.9.0-1_amd64.deb
dpkg -i nvidia-container-runtime_3.9.0-1_all.deb

➜ ~ cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

systemctl restart docker
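The listing above shows the final contents of daemon.json but not how the file was created. One possible way to write it is sketched below. `write_daemon_json` and the `.bak` backup name are hypothetical, and the function overwrites the target, so any other Docker settings already in the file would need to be merged back by hand.

```shell
#!/bin/sh
# write_daemon_json PATH: hypothetical helper, not from the original tutorial.
# Writes the runtime configuration shown in this tutorial to PATH.
# An existing file is first copied to PATH.bak (assumed naming) because
# the heredoc below replaces the whole file.
write_daemon_json() {
    target="$1"
    if [ -f "$target" ]; then
        cp "$target" "$target.bak"
    fi
    cat > "$target" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}
```

Run as root, `write_daemon_json /etc/docker/daemon.json && systemctl restart docker` would apply the configuration in one step.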
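4. Verification (successful)
Note: for a VM created in PVE with GPU passthrough, set the CPU type to host (the default is the qemu type). Otherwise an error reports that the CPU does not support the AVX instruction set, and the GPU is disabled.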
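Whether the guest actually sees AVX can be checked from inside the VM by looking at the CPU flags. The sketch below assumes a Linux guest exposing /proc/cpuinfo; `has_avx` is a hypothetical helper name.

```shell
#!/bin/sh
# has_avx [FILE]: hypothetical helper, succeeds when the first "flags" line
# of FILE (default: /proc/cpuinfo) lists avx as a whole word.
has_avx() {
    cpuinfo="${1:-/proc/cpuinfo}"
    grep -m1 '^flags' "$cpuinfo" | grep -qw avx
}
```

For example, `has_avx && echo "AVX visible" || echo "AVX missing"` can be run before and after changing the CPU type. On the Proxmox host the CPU type is usually changed in the VM's hardware settings or with `qm set <vmid> --cpu host`; treat that exact invocation as an assumption to verify against the installed PVE version.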
root@dmx:/opt/ollama
Tue Jul 23 07:28:58 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.100                Driver Version: 550.100        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090 D      Off |   00000000:01:00.0 Off |                  Off |
|  0%   40C    P5             65W /  425W |    5609MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
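One way to confirm that a container sees the same driver as the host is to compare the "Driver Version" field of the two nvidia-smi outputs. The sketch below is an illustration only: `driver_version` is a hypothetical helper that parses the header line shown above, and the `ubuntu` image in the usage note is an assumption rather than something this tutorial used.

```shell
#!/bin/sh
# driver_version: hypothetical helper, reads nvidia-smi output on stdin and
# prints the value of the first "Driver Version:" field it finds.
driver_version() {
    sed -n 's/.*Driver Version: \([0-9.]*\).*/\1/p' | head -n1
}
```

A comparison could then look like `[ "$(nvidia-smi | driver_version)" = "$(docker run --rm --gpus all ubuntu nvidia-smi | driver_version)" ]`, which succeeds only when both sides report the same driver version.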
References:
Ubuntu 24.04 : NVIDIA Container Toolkit : Install : Server World (server-world.info)
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html (official site)
https://blog.csdn.net/u010953692/article/details/114053593
Offline installation of nvidia-container-toolkit for Docker to enable GPU access inside containers (CSDN blog)
https://www.holelin.cn/2022/03/31/devops/%E8%BF%90%E7%BB%B4-%E7%A6%BB%E7%BA%BF%E5%AE%89%E8%A3%85nvidia-docker2/index.html