Deploying nvidia-container-toolkit on Ubuntu

Prerequisite: the GPU driver must already be installed on the host, i.e. running nvidia-smi prints its normal output.
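The prerequisite can be checked with a few lines of shell before going further. This is a minimal sketch, not part of the original tutorial: `check_driver` is a hypothetical helper name, and it only tests that the command exists and exits successfully, not that the driver version is suitable.

```shell
# Sketch: confirm the host driver responds before installing the toolkit.
# check_driver is a hypothetical helper name, not part of any NVIDIA tool.
check_driver() {
  local cmd="${1:-nvidia-smi}"   # command to probe; defaults to nvidia-smi
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "missing: $cmd not found in PATH" >&2
    return 1
  fi
  if ! "$cmd" >/dev/null 2>&1; then
    echo "failing: $cmd is installed but exited with an error" >&2
    return 2
  fi
  echo "ok: $cmd responds"
}
```

Calling `check_driver` with no argument probes nvidia-smi; a return code of 1 means the command is missing, 2 means it is present but failed.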

Script for installing the host driver: nvidia-gpu.sh

Environment used in this tutorial: Ubuntu 22.04 LTS

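The main NVIDIA container components are nvidia-container-runtime, nvidia-container-toolkit, libnvidia-container, and the CUDA driver.
Since version 3.6.0, the runtime package has been a package that depends only on the toolkit package (meaning the container toolkit, not the NVIDIA CUDA Toolkit). The official documentation also notes that for typical applications nvidia-container-toolkit covers the vast majority of needs.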

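1. Installation from the official repository (requires unrestricted access to nvidia.github.io, e.g. through a proxy):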


root@dlp:~# curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-toolkit.gpg
root@dlp:~# curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | tee /etc/apt/sources.list.d/nvidia-toolkit.list
root@dlp:~# sed -i -e "s/^deb/deb \[signed-by=\/usr\/share\/keyrings\/nvidia-toolkit.gpg\]/g" /etc/apt/sources.list.d/nvidia-toolkit.list
root@dlp:~# apt update
root@dlp:~# apt -y install nvidia-container-toolkit
root@dlp:~# systemctl restart docker
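The sed step in the listing above is the least obvious one: it adds a signed-by option to each deb line so that apt trusts only the keyring downloaded in the first command. The sketch below applies the same rewrite to standard input so it can be tried without touching /etc/apt. `add_signed_by` is a hypothetical helper name, and the pattern is slightly stricter than the original (it matches "deb " with a trailing space, so a deb-src line would be left alone).

```shell
# Sketch: the same rewrite as the sed step above, applied to stdin.
# add_signed_by is a hypothetical name used only for this example.
add_signed_by() {
  local keyring="${1:-/usr/share/keyrings/nvidia-toolkit.gpg}"
  # Using | as the sed delimiter avoids escaping every / in the path.
  sed -e "s|^deb |deb [signed-by=${keyring}] |"
}
```

For example, piping a line such as `deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /` through `add_signed_by` yields the same line with `[signed-by=/usr/share/keyrings/nvidia-toolkit.gpg]` inserted after `deb`. Commented-out lines starting with `#deb` are not changed.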

2. Offline deployment (older version)

Download the package locally, upload it to the server, then install it.

Offline deb package download location:

https://github.com/NVIDIA/libnvidia-container/blob/gh-pages/stable/ubuntu18.04

Local mirror:

https://onenote.zznnwn.cloudns.biz/zh-CN/public-tools/nvidia-deb/

dpkg -i nvidia-container-toolkit_1.4.2-1_amd64.deb

3. The deployment method that worked for this tutorial

The deb packages used below can be downloaded from the onenote link given above.
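Because the packages in this section must be installed in a fixed order, it can help to check that every file is present before installing any of them. The following is a minimal sketch under that assumption; `install_in_order` and the `DRY_RUN` variable are hypothetical names introduced for illustration, not part of dpkg.

```shell
# Sketch: install .deb files in the order given, stopping at the first
# missing file or failed install. install_in_order and DRY_RUN are
# hypothetical names used only for this example.
install_in_order() {
  local dir="$1"; shift
  local pkg
  # First pass: make sure every file is present before changing anything.
  for pkg in "$@"; do
    if [ ! -f "$dir/$pkg" ]; then
      echo "missing: $dir/$pkg" >&2
      return 1
    fi
  done
  # Second pass: install one package at a time, in the order given.
  for pkg in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "dpkg -i $dir/$pkg"
    else
      dpkg -i "$dir/$pkg" || return 1
    fi
  done
}
```

Running it with `DRY_RUN=1` prints the dpkg commands without executing them, which is a cheap way to confirm the order and the file names before installing as root.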

# Install in this order
dpkg -i libnvidia-container1_1.9.0-1_amd64.deb
dpkg -i libnvidia-container-tools_1.9.0-1_amd64.deb
dpkg -i nvidia-container-toolkit_1.9.0-1_amd64.deb
dpkg -i nvidia-container-runtime_3.9.0-1_all.deb

# With this deployment method, the runtime must be added to the Docker configuration
➜ ~ cat /etc/docker/daemon.json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

# Restart Docker
systemctl restart docker
# Finally, deploy the ollama local LLM service

https://github.com/zznn-cloud/zznn-cloud-blog-images/raw/main/Qexo/24/7/image_61c36b996becd1e14b738fbb930de4ac.png
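The daemon.json shown in this section can also be written by a small helper instead of by hand. This is only a sketch: `write_nvidia_daemon_json` is a hypothetical name, and the function overwrites the target file rather than merging with existing settings, which is why it keeps a `.bak` copy first. A host whose daemon.json already holds other options (registry mirrors, log settings) would need a real merge instead.

```shell
# Sketch: write the runtime configuration to a file, keeping a backup of any
# existing one. write_nvidia_daemon_json is a hypothetical helper name.
write_nvidia_daemon_json() {
  local target="${1:-/etc/docker/daemon.json}"
  # Keep the previous file, since this overwrites rather than merges.
  if [ -f "$target" ]; then
    cp -p "$target" "$target.bak"
  fi
  cat > "$target" <<'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
}
```

Passing a scratch path first, for example `write_nvidia_daemon_json /tmp/daemon.json`, lets the result be inspected before the real file is replaced and Docker is restarted.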

4. Verification (successful)

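Note: for a virtual machine created in PVE with GPU passthrough, set the CPU type to host (the default is a QEMU model). Otherwise an error reports that the CPU does not support the AVX instruction set, and the GPU is disabled.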

https://github.com/zznn-cloud/zznn-cloud-blog-images/raw/main/Qexo/24/7/image_20236426c0189f25768346fb91a15c8f.png
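Whether the guest actually sees AVX can be checked from inside the virtual machine. The sketch below assumes an x86 Linux guest, where /proc/cpuinfo lists CPU features on a line starting with "flags". `has_cpu_flag` is a hypothetical helper; its optional second argument exists only so it can be pointed at a sample file.

```shell
# Sketch: report whether a CPU flag is visible inside the guest.
# has_cpu_flag is a hypothetical helper; the second argument exists only so
# the function can read a sample file instead of /proc/cpuinfo.
has_cpu_flag() {
  local flag="$1"
  local cpuinfo="${2:-/proc/cpuinfo}"
  # Take the first "flags" line, put one flag per line, then match exactly
  # so that "avx" does not match "avx2".
  grep -m1 '^flags' "$cpuinfo" | tr -s ' \t' '\n\n' | grep -qxF -- "$flag"
}
```

With this, `has_cpu_flag avx && echo "avx visible"` gives a quick answer after changing the CPU type in PVE and rebooting the guest.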

root@dmx:/opt/ollama# docker run -it --rm --gpus all registry.cn-hangzhou.aliyuncs.com/zznn/mycentos:ubuntu  nvidia-smi
Tue Jul 23 07:28:58 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.100                Driver Version: 550.100        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090 D      Off |   00000000:01:00.0 Off |                  Off |
|  0%   40C    P5             65W /  425W |    5609MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
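One way to turn the visual check above into a scripted one is to compare the driver version reported on the host with the one reported inside the container; if the toolkit is wired up correctly, the two should match. The sketch below only extracts the version from the nvidia-smi banner. `driver_version` is a hypothetical helper name, and parsing the banner text is an assumption about the output layout shown above.

```shell
# Sketch: pull the "Driver Version" value out of nvidia-smi's banner so the
# host and container values can be compared.
# driver_version is a hypothetical name used only for this example.
driver_version() {
  sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

A comparison could then look like `[ "$(nvidia-smi | driver_version)" = "$(docker run --rm --gpus all registry.cn-hangzhou.aliyuncs.com/zznn/mycentos:ubuntu nvidia-smi | driver_version)" ]`, using the same image as in the verification command.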

References:

Ubuntu 24.04 : NVIDIA Container Toolkit : Install : Server World (server-world.info)

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html (official documentation)

https://blog.csdn.net/u010953692/article/details/114053593

Offline installation of Nvidia-container-toolkit for Docker to enable GPU access inside containers (CSDN blog)

https://www.holelin.cn/2022/03/31/devops/%E8%BF%90%E7%BB%B4-%E7%A6%BB%E7%BA%BF%E5%AE%89%E8%A3%85nvidia-docker2/index.html