GPU Driver Persistence Mode and NVLink Check

Reposted from:

https://blog.csdn.net/weixin_54757969/article/details/131960450

Operating system, NVLink bridge, and GPU driver persistence mode

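The Linux distribution I know best is Ubuntu. I first wanted to install 23.04, which ships kernel 6.2, but in the end I chose Ubuntu 22.04 because it is better supported: practically all software that supports Linux provides installation instructions for Ubuntu 22.04. I skipped the server edition and installed the desktop edition directly, with GPU driver version 530.41.03. (Note, 2023.7.26: I later downgraded the driver to 515.65.01 in order to install CUDA 11.7.1 and cuDNN 8.6.)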

Next, to get NVLink working, driver persistence mode needs to be enabled:

sudo nvidia-smi -pm 1
sudo reboot
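
It can be worth confirming the mode after the reboot, since a setting made with `-pm` is not guaranteed to survive a restart on every system. The sketch below is one way to do that, assuming `nvidia-smi` is on the PATH and that the `persistence_mode` query field prints `Enabled` or `Disabled`. The function names are made up for this example.

```python
import subprocess

# Query one CSV line per GPU, e.g. "0, Enabled".
QUERY = ["nvidia-smi", "--query-gpu=index,persistence_mode", "--format=csv,noheader"]


def parse_persistence(csv_text):
    """Map GPU index -> True if persistence mode is reported as Enabled."""
    modes = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, mode = [field.strip() for field in line.split(",", 1)]
        modes[int(index)] = mode.lower() == "enabled"
    return modes


def query_persistence():
    """Run nvidia-smi and parse its output (needs an NVIDIA driver installed)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_persistence(out)
```

Calling `query_persistence()` on the machine itself returns a dictionary such as `{0: True, 1: True}` when both cards have persistence mode on.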

To check:

nvidia-smi topo -m

        GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      NV4     0-23            N/A
GPU1    NV4      X      0-23            N/A
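
In this matrix, a cell such as `NV4` means the two GPUs are connected through four NVLink links, while `X` marks a GPU against itself. A minimal sketch of checking this automatically follows. It assumes the GPU columns come first in the header, as in the output above, and `nvlink_pairs` is a made-up helper name.

```python
import re

GPU_NAME = re.compile(r"^GPU(\d+)$")


def nvlink_pairs(topo_text):
    """Return {(gpu_a, gpu_b): link_count} for GPU pairs joined by NVLink."""
    rows = [line.split() for line in topo_text.splitlines() if line.strip()]
    # The header is the first line that starts with a GPU name.
    header = next((r for r in rows if GPU_NAME.match(r[0])), None)
    if header is None:
        return {}
    gpus = [tok for tok in header if GPU_NAME.match(tok)]
    pairs = {}
    for row in rows:
        if row is header or not GPU_NAME.match(row[0]):
            continue  # skip the header itself and any legend lines
        a = int(GPU_NAME.match(row[0]).group(1))
        # The cells right after the row label line up with the GPU columns.
        for col, cell in zip(gpus, row[1:1 + len(gpus)]):
            b = int(GPU_NAME.match(col).group(1))
            m = re.fullmatch(r"NV(\d+)", cell)
            if m and a < b:  # record each pair once
                pairs[(a, b)] = int(m.group(1))
    return pairs


SAMPLE_TOPO = """\
        GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      NV4     0-23            N/A
GPU1    NV4      X      0-23            N/A
"""

pairs = nvlink_pairs(SAMPLE_TOPO)
print(pairs)
```

An empty result would mean the bridge is not detected, which matches the situation described for the first attempt at seating it.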

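One thing worth noting: when the NVLink bridge is seated properly, you hear a click. The first time I inserted it, enabled persistence mode, and rebooted, the check command showed no working NVLink. I then shut down, cut the power, and reseated the bridge until I heard the click. After booting again, everything worked. Testing the bridge speed, each link reports about 14 GB/s. I don't know why it is that number; is that fast?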

allonx@x299:~/qmx_projects/langchain-ChatGLM$ nvidia-smi nvlink -s
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-9cca89b5-8627-6cff-7cb4-d8fae8dfadbe)
Link 0: 14.062 GB/s
Link 1: 14.062 GB/s
Link 2: 14.062 GB/s
Link 3: 14.062 GB/s
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-f3fc3749-f793-189f-b592-5af6bb39a569)
Link 0: 14.062 GB/s
Link 1: 14.062 GB/s
Link 2: 14.062 GB/s
Link 3: 14.062 GB/s
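
To put the per-link number in context, the links can be summed per GPU. The sketch below assumes each reported value is the speed of one link in one direction. Under that assumption, four links give about 56.25 GB/s per direction, or about 112.5 GB/s in both directions combined, which is in line with the NVLink bandwidth NVIDIA quotes for the RTX 3090. `link_speeds` is a made-up helper name.

```python
import re


def link_speeds(status_text):
    """Return {gpu_index: [speed_in_GB_per_s, ...]} from `nvidia-smi nvlink -s` text."""
    speeds = {}
    gpu = None
    for line in status_text.splitlines():
        m = re.match(r"\s*GPU (\d+):", line)
        if m:
            gpu = int(m.group(1))
            speeds[gpu] = []
            continue
        m = re.match(r"\s*Link \d+: ([\d.]+) GB/s", line)
        if m and gpu is not None:
            speeds[gpu].append(float(m.group(1)))
    return speeds


SAMPLE_STATUS = """\
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-9cca89b5-8627-6cff-7cb4-d8fae8dfadbe)
Link 0: 14.062 GB/s
Link 1: 14.062 GB/s
Link 2: 14.062 GB/s
Link 3: 14.062 GB/s
"""

# Sum the links of each GPU: one direction, then both directions.
for gpu, links in link_speeds(SAMPLE_STATUS).items():
    one_way = sum(links)
    print(f"GPU {gpu}: {len(links)} links, {one_way:.2f} GB/s per direction, "
          f"{2 * one_way:.2f} GB/s bidirectional")
```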


Now check the link capabilities in detail:

allonx@x299:~/qmx_projects/langchain-ChatGLM$ nvidia-smi nvlink -c
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-9cca89b5-8627-6cff-7cb4-d8fae8dfadbe)
Link 0, P2P is supported: true
Link 0, Access to system memory supported: true
Link 0, P2P atomics supported: true
Link 0, System memory atomics supported: true
Link 0, SLI is supported: true
Link 0, Link is supported: false
Link 1, P2P is supported: true
Link 1, Access to system memory supported: true
Link 1, P2P atomics supported: true
Link 1, System memory atomics supported: true
Link 1, SLI is supported: true
Link 1, Link is supported: false
Link 2, P2P is supported: true
Link 2, Access to system memory supported: true
Link 2, P2P atomics supported: true
Link 2, System memory atomics supported: true
Link 2, SLI is supported: true
Link 2, Link is supported: false
Link 3, P2P is supported: true
Link 3, Access to system memory supported: true
Link 3, P2P atomics supported: true
Link 3, System memory atomics supported: true
Link 3, SLI is supported: true
Link 3, Link is supported: false
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-f3fc3749-f793-189f-b592-5af6bb39a569)
Link 0, P2P is supported: true
Link 0, Access to system memory supported: true
Link 0, P2P atomics supported: true
Link 0, System memory atomics supported: true
Link 0, SLI is supported: true
Link 0, Link is supported: false
Link 1, P2P is supported: true
Link 1, Access to system memory supported: true
Link 1, P2P atomics supported: true
Link 1, System memory atomics supported: true
Link 1, SLI is supported: true
Link 1, Link is supported: false
Link 2, P2P is supported: true
Link 2, Access to system memory supported: true
Link 2, P2P atomics supported: true
Link 2, System memory atomics supported: true
Link 2, SLI is supported: true
Link 2, Link is supported: false
Link 3, P2P is supported: true
Link 3, Access to system memory supported: true
Link 3, P2P atomics supported: true
Link 3, System memory atomics supported: true
Link 3, SLI is supported: true
Link 3, Link is supported: false
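
With this many lines it is easy to miss the one capability that differs. The sketch below lists only the capabilities reported as `false`, assuming the `Link N, <capability>: true|false` line format shown above. `unsupported_capabilities` is a made-up helper name.

```python
import re


def unsupported_capabilities(caps_text):
    """Return {(gpu, link): [capability, ...]} for every capability reported false."""
    result = {}
    gpu = None
    for line in caps_text.splitlines():
        m = re.match(r"\s*GPU (\d+):", line)
        if m:
            gpu = int(m.group(1))
            continue
        m = re.match(r"\s*Link (\d+), (.+): (true|false)\s*$", line)
        if m and gpu is not None and m.group(3) == "false":
            result.setdefault((gpu, int(m.group(1))), []).append(m.group(2))
    return result


# Excerpt of the output above (GPU 0, Link 0 only).
SAMPLE_CAPS = """\
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-9cca89b5-8627-6cff-7cb4-d8fae8dfadbe)
Link 0, P2P is supported: true
Link 0, Access to system memory supported: true
Link 0, P2P atomics supported: true
Link 0, System memory atomics supported: true
Link 0, SLI is supported: true
Link 0, Link is supported: false
"""

missing = unsupported_capabilities(SAMPLE_CAPS)
print(missing)
```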


The output includes "Link is supported: false", which puzzled me. Searching online, I found someone with the same output, and the reply said this is normal.

Overall, this build is close to perfect. To save money I did not go with water cooling and used an economical air-cooling setup instead.

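After assembling the machine I first installed Windows for testing. Perhaps because the motherboard is rather old (manufactured in 2017), several attempts to install Windows 11 failed, so I could only install Windows 10. I ran tests with CPU-Z, GPU-Z, Lu Dashi (鲁大师), and 3DMark. Maybe all ASUS TUF 3090 cards behave this way: the 3DMark stability test topped out between 97% and 98%, which just barely passes.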

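One small regret remains: the machine is missing its I/O shield, and I wonder whether a lot of dust will get in during long-term operation. The shield for the X299 Gaming 7 is also hard to find. One occasionally shows up for sale, at nearly 300 yuan for a single plate, which is really not worth it.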

References: