I. Introduction

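At present, the cheapest GPU an individual can get is Intel's ("Team Blue") integrated graphics. It is underwhelming for gaming, but its encode/decode performance is excellent. That makes it well suited to small-scale workloads such as video transcoding and face recognition.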

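For most people, the value-for-money choice is of course to go "ALL IN BOOM" (an all-in-one virtualized box). For a GPU that leaves two routes: Passthrough and SR-IOV (Single Root I/O Virtualization), so the virtualization capabilities deserve a closer look.

Passthrough delivers 100% of the performance, but the device can only be owned by one virtual machine. SR-IOV splits one device into several VFs (Virtual Functions), a bit like clones. Each VF has the full functionality of the original and can be passed through to a different virtual machine, while the host can still use the physical GPU itself. The performance of each VF depends on how many VFs are created.

The catch is hardware support:
- Intel: only 12th-gen and newer CPUs support SR-IOV on the iGPU.
- NVIDIA: only professional cards with the GRID driver officially support vGPU. 20-series and older cards can be unlocked with vgpu_unlock. For the 30 series and newer there are only reports ("Nvidia vGPU technical defenses breached, 30/40-series unlock achieved"), with no public method.
- AMD: not even worth mentioning.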

Goal: enable iGPU SR-IOV on PVE, pass a VF through to a virtual machine, and then configure that VM as a k3s GPU node.

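Approach: SR-IOV is used to virtualize the Intel iGPU, so both the PVE host and the guests (Debian and so on) need the DKMS-based i915-sriov driver. The installation mainly follows the instructions of the i915-sriov-dkms project and pve_source, which cover the kernel version, boot parameters and related changes. Many thanks to both.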

The environment used in this article:

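  • Hardware: a modded MoDT (Mobile on Desktop) platform with a Q1J2 CPU, i.e. an engineering sample (ES) of the mobile i7-1370P. A big reason for choosing it is that its iGPU has 96 EUs.
  • Virtualization platform: PVE 9.0.6, kernel 6.14.8-2-pve
  • Guest OS: Debian 13, kernel 6.12.41+deb13-amd64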

II. Installing the Driver on the Host

1. Hardware Preparation

The BIOS needs the following settings:

  • Enable Intel virtualization (VT-x)
  • Enable VT-d
  • Enable Above 4GB MMIO BIOS assignment

2. Kernel Update and Dependencies

First install the required packages. Before installing, upgrade to the latest version and reboot:

apt update && apt upgrade
# Install the kernel headers, dkms and the build toolchain
apt install -y pve-headers proxmox-headers-$(uname -r) dkms build-*

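Pay particular attention to proxmox-headers-$(uname -r). Make sure exactly one proxmox-headers package is installed, and that it matches the running kernel version.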
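DKMS compiles against /lib/modules/<kernel>/build, the headers tree whose scripts/sign-file shows up in the dkms output below. A minimal pre-flight check might look like this. It assumes the headers package provides that build directory for the running kernel; the function name headers_ok is hypothetical.

```shell
# headers_ok [KERNEL] [ROOT]
# Succeeds if the kernel headers build tree for KERNEL exists.
# KERNEL defaults to the running kernel; ROOT (default: empty, i.e. /) lets the
# check run against another filesystem tree, e.g. for testing.
headers_ok() {
  kver=${1:-$(uname -r)}
  root=${2:-}
  # -d follows symlinks, so a dangling "build" link counts as missing headers.
  [ -d "$root/lib/modules/$kver/build" ]
}
```

On the host, for example: headers_ok || echo "no headers for $(uname -r)".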

3. Driver Installation

The i915-sriov-dkms project provides a .deb package in its releases. Download it and install it:

dpkg -i i915-sriov-dkms_2025.07.22_amd64.deb

# output:
Selecting previously unselected package i915-sriov-dkms.
(Reading database ... 57391 files and directories currently installed.)
Preparing to unpack i915-sriov-dkms_2025.07.22_amd64.deb ...
Unpacking i915-sriov-dkms (2025.07.22) ...
Setting up i915-sriov-dkms (2025.07.22) ...
install dkms modules for all kernels
Loading new i915-sriov-dkms/2025.07.22 DKMS files...
Building for 6.14.8-2-pve

Building initial module i915-sriov-dkms/2025.07.22 for 6.14.8-2-pve
Sign command: /lib/modules/6.14.8-2-pve/build/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub

Building module(s).................................... done.
Signing module /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/i915.ko
Signing module /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/kvmgt.ko
Found pre-existing /lib/modules/6.14.8-2-pve/kernel/drivers/gpu/drm/i915/i915.ko.xz, archiving for uninstallation
Installing /lib/modules/6.14.8-2-pve/updates/dkms/i915.ko.xz
Found pre-existing /lib/modules/6.14.8-2-pve/kernel/drivers/gpu/drm/i915/kvmgt.ko.xz, archiving for uninstallation
Installing /lib/modules/6.14.8-2-pve/updates/dkms/kvmgt.ko.xz
Running depmod.... done.
update-initramfs: deferring update (trigger activated)
Processing triggers for initramfs-tools (0.148.3) ...
update-initramfs: Generating /boot/initrd.img-6.14.8-2-pve

If nothing went wrong, you are now more than halfway there. Check the status of the DKMS module:

dkms status

# output:
i915-sriov-dkms/2025.07.22, 6.14.8-2-pve, x86_64: installed (Original modules exist)

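The kernel version shown for the module must exactly match the running kernel, otherwise the driver will not load. Check the running kernel with uname -r. Here PVE runs 6.14.8-2-pve, which matches the dkms output, so all is well.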
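This comparison can be scripted. The sketch below assumes the newer dkms status line format shown above (name/version, kernel, arch: state). Older dkms releases print name, version, kernel, ... and would need a different split. The function name is hypothetical.

```shell
# dkms_matches_kernel STATUS_TEXT KERNEL
# Succeeds if STATUS_TEXT (the output of "dkms status") contains an installed
# i915-sriov-dkms entry built for exactly KERNEL.
dkms_matches_kernel() {
  printf '%s\n' "$1" | awk -F', ' -v k="$2" '
    # $1 = "i915-sriov-dkms/<version>", $2 = kernel, $3 = "<arch>: <state>"
    $1 ~ /^i915-sriov-dkms\// && $2 == k && $3 ~ /: installed/ { found = 1 }
    END { exit !found }'
}
```

Typical use: dkms_matches_kernel "$(dkms status)" "$(uname -r)" && echo "module matches the running kernel".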

4. Boot Parameters

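  1. Append intel_iommu=on i915.enable_guc=3 i915.max_vfs=7 module_blacklist=xe to the GRUB_CMDLINE_LINUX_DEFAULT variable in /etc/default/grub, separating the parameters with spaces.
     i915.max_vfs sets the maximum number of VFs; 7 is the maximum for the iGPU. Performance is split evenly across the VFs, so the more VFs you create, the weaker each one is.
     On systems that boot with systemd-boot, add the parameters to /etc/kernel/cmdline instead (see the documentation).
  2. Run update-grub and update-initramfs -u to update GRUB and the initramfs. With systemd-boot, run proxmox-boot-tool refresh instead to apply the kernel parameters.
  3. Install sysfsutils, then enable the VFs with echo "devices/pci0000:00/0000:00:02.0/sriov_numvfs = 7" > /etc/sysfs.conf. Here 00:02.0 is the iGPU's PCIe address, which lspci | grep VGA shows. Note that > overwrites the file; use >> if it already contains other entries.
  4. Reboot the system.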
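Step 1 can be made idempotent, so re-running it never duplicates parameters. This is a minimal sketch for the GRUB case, under these assumptions:
  • the variable sits on a single line of the form GRUB_CMDLINE_LINUX_DEFAULT="...";
  • no parameter contains |, & or a double quote, any of which would break the sed replacement.
The function name is hypothetical.

```shell
# add_grub_params FILE PARAM...
# Appends each PARAM to GRUB_CMDLINE_LINUX_DEFAULT in FILE unless it is already
# present as a whole word. Requires GNU sed (for -i), as shipped on PVE/Debian.
add_grub_params() {
  file=$1
  shift
  # Refuse files without the variable instead of silently doing nothing.
  grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" || return 1
  # Current value between the double quotes, e.g. quiet
  current=$(sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/\1/p' "$file")
  new=$current
  for p in "$@"; do
    case " $new " in
      *" $p "*) ;;                 # already present: skip
      *) new="${new:+$new }$p" ;;  # append, separated by a single space
    esac
  done
  # '|' is the sed delimiter because values may contain '/'.
  sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"$new\"|" "$file"
}
```

On the host this would be add_grub_params /etc/default/grub intel_iommu=on i915.enable_guc=3 i915.max_vfs=7 module_blacklist=xe, followed by update-grub and update-initramfs -u.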

5. Testing

The dmesg boot log looks like this:

[    3.285909] Module xe is blacklisted
[ 3.321922] i915: module verification failed: signature and/or required key missing - tainting kernel
[ 3.729149] Setting dangerous option enable_guc - tainting kernel
[ 3.729367] i915: You are using the i915-sriov-dkms module, a ported version of the i915 module with SR-IOV support.
[ 3.729371] i915: Please file any bug report at https://github.com/strongtz/i915-sriov-dkms/issues/new.
[ 3.729372] i915: Module Homepage: https://github.com/strongtz/i915-sriov-dkms
[ 3.730063] i915 0000:00:02.0: [drm] Found ALDERLAKE_P/RPL-P (device ID a720) display version 13.00 stepping E0
[ 3.730087] i915 0000:00:02.0: Running in SR-IOV PF mode
[ 3.730840] i915 0000:00:02.0: [drm] VT-d active for gfx access
[ 3.745839] intel_tcc_cooling: TCC Offset locked
[ 3.877484] Console: switching to colour dummy device 80x25
[ 3.933794] i915 0000:00:02.0: vgaarb: deactivate vga console
[ 3.933865] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[ 3.934243] i915 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 3.936891] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adlp_dmc.bin (v2.20)
[ 3.941676] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.44.1
[ 3.941681] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
[ 3.956178] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
[ 3.956630] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[ 3.956633] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
[ 3.957116] i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
[ 3.959586] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:00:02.0 (ops i915_pxp_tee_component_ops [i915])
[ 3.959742] i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
[ 3.959746] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_ops [i915])
[ 3.983854] intel_rapl_msr: PL4 support detected.
[ 3.996998] intel_rapl_common: Found RAPL domain package
[ 3.997002] intel_rapl_common: Found RAPL domain core
[ 3.997004] intel_rapl_common: Found RAPL domain uncore
[ 3.997465] [drm] Initialized i915 1.6.0 for 0000:00:02.0 on minor 1
[ 3.998487] ACPI: video: Video Device [GFX0] (multi-head: yes rom: no post: no)
[ 3.998876] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input13
[ 4.036635] ZFS: Loaded module v2.3.3-pve1, ZFS pool version 5000, ZFS filesystem version 5
[ 4.091663] fbcon: i915drmfb (fb0) is primary device
[ 4.091607] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[ 4.118316] snd_hda_codec_realtek hdaudioC0D0: autoconfig for ALC897: line_outs=1 (0x14/0x0/0x0/0x0/0x0) type:line
[ 4.118319] snd_hda_codec_realtek hdaudioC0D0: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[ 4.118320] snd_hda_codec_realtek hdaudioC0D0: hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
[ 4.118321] snd_hda_codec_realtek hdaudioC0D0: mono: mono_out=0x0
[ 4.118321] snd_hda_codec_realtek hdaudioC0D0: inputs:
[ 4.118322] snd_hda_codec_realtek hdaudioC0D0: Rear Mic=0x18
[ 4.118323] snd_hda_codec_realtek hdaudioC0D0: Front Mic=0x19
[ 4.118323] snd_hda_codec_realtek hdaudioC0D0: Line=0x1a
[ 4.160982] Console: switching to colour frame buffer device 240x67
[ 4.161122] input: HDA Intel PCH Rear Mic as /devices/pci0000:00/0000:00:1f.3/sound/card0/input14
[ 4.161151] input: HDA Intel PCH Front Mic as /devices/pci0000:00/0000:00:1f.3/sound/card0/input15
[ 4.161172] input: HDA Intel PCH Line as /devices/pci0000:00/0000:00:1f.3/sound/card0/input16
[ 4.161191] input: HDA Intel PCH Line Out as /devices/pci0000:00/0000:00:1f.3/sound/card0/input17
[ 4.161211] input: HDA Intel PCH Front Headphone as /devices/pci0000:00/0000:00:1f.3/sound/card0/input18
[ 4.161231] input: HDA Intel PCH HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input19
[ 4.161254] input: HDA Intel PCH HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input20
[ 4.161306] input: HDA Intel PCH HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input21
[ 4.161387] input: HDA Intel PCH HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input22
[ 4.172203] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[ 4.189473] i915 0000:00:02.0: 7 VFs could be associated with this PF

···················

[ 5.094449] pci 0000:00:02.1: [8086:a720] type 00 class 0x030000 PCIe Root Complex Integrated Endpoint
[ 5.094473] pci 0000:00:02.1: DMAR: Skip IOMMU disabling for graphics
[ 5.094537] pci 0000:00:02.1: Adding to iommu group 16
[ 5.094620] pci 0000:00:02.1: vgaarb: bridge control possible
[ 5.094621] pci 0000:00:02.1: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 5.094625] i915 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=io+mem
[ 5.094684] i915 0000:00:02.1: enabling device (0000 -> 0002)
[ 5.094706] i915 0000:00:02.1: [drm] Found ALDERLAKE_P/RPL-P (device ID a720) display version 13.00 stepping E0
[ 5.094724] i915 0000:00:02.1: Running in SR-IOV VF mode
[ 5.095200] i915 0000:00:02.1: [drm] GT0: GUC: interface version 0.1.20.1
[ 5.095735] i915 0000:00:02.1: [drm] VT-d active for gfx access
[ 5.095787] i915 0000:00:02.1: [drm] Using Transparent Hugepages
[ 5.096382] i915 0000:00:02.1: [drm] GT0: GUC: interface version 0.1.20.1
[ 5.097077] i915 0000:00:02.1: [drm] GT0: GUC: interface version 0.1.20.1
[ 5.098097] i915 0000:00:02.1: GuC firmware PRELOADED version 0.0 submission:SR-IOV VF
[ 5.098099] i915 0000:00:02.1: HuC firmware PRELOADED
[ 5.101202] i915 0000:00:02.1: [drm] Protected Xe Path (PXP) protected content support initialized
[ 5.101205] i915 0000:00:02.1: [drm] PMU not supported for this GPU.
[ 5.101281] [drm] Initialized i915 1.6.0 for 0000:00:02.1 on minor 0
[ 5.101450] pci 0000:00:02.2: [8086:a720] type 00 class 0x030000 PCIe Root Complex Integrated Endpoint
[ 5.101470] pci 0000:00:02.2: DMAR: Skip IOMMU disabling for graphics
[ 5.101520] pci 0000:00:02.2: Adding to iommu group 17
[ 5.101630] pci 0000:00:02.2: vgaarb: bridge control possible
[ 5.101631] pci 0000:00:02.2: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 5.101635] i915 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=none:owns=io+mem
[ 5.101638] i915 0000:00:02.1: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 5.101693] i915 0000:00:02.2: enabling device (0000 -> 0002)
[ 5.101709] i915 0000:00:02.2: [drm] Found ALDERLAKE_P/RPL-P (device ID a720) display version 13.00 stepping E0
[ 5.101727] i915 0000:00:02.2: Running in SR-IOV VF mode
[ 5.101998] i915 0000:00:02.2: [drm] GT0: GUC: interface version 0.1.20.1
[ 5.102451] i915 0000:00:02.2: [drm] VT-d active for gfx access
[ 5.102470] i915 0000:00:02.2: [drm] Using Transparent Hugepages
[ 5.103003] i915 0000:00:02.2: [drm] GT0: GUC: interface version 0.1.20.1
[ 5.103477] i915 0000:00:02.2: [drm] GT0: GUC: interface version 0.1.20.1
[ 5.104145] i915 0000:00:02.2: GuC firmware PRELOADED version 0.0 submission:SR-IOV VF
[ 5.104150] i915 0000:00:02.2: HuC firmware PRELOADED
[ 5.107133] i915 0000:00:02.2: [drm] Protected Xe Path (PXP) protected content support initialized
[ 5.107138] i915 0000:00:02.2: [drm] PMU not supported for this GPU.
[ 5.107203] [drm] Initialized i915 1.6.0 for 0000:00:02.2 on minor 2
[ 5.107346] pci 0000:00:02.3: [8086:a720] type

···················
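Two lines in the host log matter: "Running in SR-IOV PF mode" and "N VFs could be associated with this PF". The small sketch below checks for the first and prints N. It assumes the message wording of the i915-sriov-dkms build shown above; other versions may phrase it differently. The function name is hypothetical.

```shell
# check_sriov_dmesg LOG_TEXT
# Fails if the PF-mode message is absent; otherwise prints the VF count
# reported by i915 (empty output if that line has not been logged yet).
check_sriov_dmesg() {
  printf '%s\n' "$1" | grep -q 'Running in SR-IOV PF mode' || return 1
  printf '%s\n' "$1" \
    | sed -n 's/.*i915 [0-9a-f:.]*: \([0-9][0-9]*\) VFs could be associated with this PF.*/\1/p' \
    | head -n 1
}
```

On the host: check_sriov_dmesg "$(dmesg)". Reading dmesg may require root.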

lspci shows the physical function (00:02.0) plus the 7 VFs:

00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P [UHD Graphics] (rev 04)
00:02.1 VGA compatible controller: Intel Corporation Raptor Lake-P [UHD Graphics] (rev 04)
00:02.2 VGA compatible controller: Intel Corporation Raptor Lake-P [UHD Graphics] (rev 04)
00:02.3 VGA compatible controller: Intel Corporation Raptor Lake-P [UHD Graphics] (rev 04)
00:02.4 VGA compatible controller: Intel Corporation Raptor Lake-P [UHD Graphics] (rev 04)
00:02.5 VGA compatible controller: Intel Corporation Raptor Lake-P [UHD Graphics] (rev 04)
00:02.6 VGA compatible controller: Intel Corporation Raptor Lake-P [UHD Graphics] (rev 04)
00:02.7 VGA compatible controller: Intel Corporation Raptor Lake-P [UHD Graphics] (rev 04)
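The same count can also be read without lspci, from the standard PCI SR-IOV sysfs attributes:
  • sriov_totalvfs: the maximum number of VFs the PF supports;
  • sriov_numvfs: the number currently enabled, i.e. the value sysfsutils writes at boot.
This is a sketch; the function name and its enabled/total output format are illustrative.

```shell
# sriov_vf_status [DEVICE_DIR]
# Prints "enabled/total" VFs for a PF using the standard SR-IOV sysfs files.
# DEVICE_DIR defaults to the iGPU at 0000:00:02.0.
sriov_vf_status() {
  dev=${1:-/sys/bus/pci/devices/0000:00:02.0}
  total=$(cat "$dev/sriov_totalvfs") || return 1
  enabled=$(cat "$dev/sriov_numvfs") || return 1
  echo "$enabled/$total"
}
```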

lspci -vs 00:02.0 shows the details, including the SR-IOV capability and the i915 driver in use:

00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P [UHD Graphics] (rev 04) (prog-if 00 [VGA controller])
DeviceName: Onboard - Video
Subsystem: Intel Corporation Device 2212
Flags: bus master, fast devsel, latency 0, IRQ 167, IOMMU group 0
Memory at 60ec000000 (64-bit, non-prefetchable) [size=16M]
Memory at 4000000000 (64-bit, prefetchable) [size=256M]
I/O ports at 5000 [size=64]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Root Complex Integrated Endpoint, IntMsgNum 0
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [100] Process Address Space ID (PASID)
Capabilities: [200] Address Translation Service (ATS)
Capabilities: [300] Page Request Interface (PRI)
Capabilities: [320] Single Root I/O Virtualization (SR-IOV)
Kernel driver in use: i915
Kernel modules: xe, i915

III. Using the vGPU in a Guest VM

1. Creating the VM

The following VM hardware settings are required:

  • 🖥️OS: Debian 13
  • ⌨️BIOS: OVMF(UEFI)
  • ⚠️EFI Disk: pre_enrolled_keys must be false. Enabling it turns on Secure Boot, which by default blocks DKMS-built modules unless you enroll a MOK.
  • ⚙️Machine: q35

Finally, add a PCI device and pick any one of the VFs. Do NOT pick 00:02.0: that is the physical GPU!
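The same settings can also be applied from the host shell with qm. The sketch below only prints the commands, so you can review them first, and it refuses the PF address. It assumes:
  • the VM already exists and does not yet have an EFI disk;
  • local-lvm is the storage for the EFI disk;
  • the PVE version supports the pre-enrolled-keys option.
The function name is hypothetical.

```shell
# qm_vgpu_cmds VMID VF_ADDRESS [STORAGE]
# Prints (does not run) qm commands for the required VM settings:
# OVMF + q35, an EFI disk without pre-enrolled Secure Boot keys, and one VF.
qm_vgpu_cmds() {
  vmid=$1
  vf=$2
  storage=${3:-local-lvm}
  # 00:02.0 is the physical function (the real iGPU) and must not be passed through.
  case "$vf" in
    00:02.0|*:00:02.0)
      echo "refusing $vf: this is the physical GPU, pick a VF such as 0000:00:02.1" >&2
      return 1 ;;
  esac
  echo "qm set $vmid --bios ovmf --machine q35"
  echo "qm set $vmid --efidisk0 $storage:1,efitype=4m,pre-enrolled-keys=0"
  echo "qm set $vmid --hostpci0 $vf,pcie=1"
}
```

For example, qm_vgpu_cmds 100 0000:00:02.1 prints the commands for a VM with the illustrative ID 100. Piping the output to sh applies them. pcie=1 relies on the q35 machine type set by the first command.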

2. Environment, Driver Installation and Boot Parameters

Follow the PVE section of the previous chapter. The steps are almost identical, with only these differences:

  • Install linux-headers-$(uname -r) instead of the Proxmox headers:

    apt install -y linux-headers-$(uname -r) dkms build-*
  • Remove i915.max_vfs=7 from the boot parameters

  • sysfsutils is not needed, and neither is its configuration

  • Install the firmware: apt install firmware-intel-graphics

3. Testing

The VM's dmesg boot log should contain output similar to this:

[    2.449962] Module xe is blacklisted
[ 2.576150] i915: loading out-of-tree module taints kernel.
[ 2.576191] i915: module verification failed: signature and/or required key missing - tainting kernel
[ 2.791739] Setting dangerous option enable_guc - tainting kernel
[ 2.791901] i915: You are using the i915-sriov-dkms module, a ported version of the i915 module with SR-IOV support.
[ 2.791903] i915: Please file any bug report at https://github.com/strongtz/i915-sriov-dkms/issues/new.
[ 2.791903] i915: Module Homepage: https://github.com/strongtz/i915-sriov-dkms
[ 2.792140] i915 0000:06:10.0: [drm] Found ALDERLAKE_P/RPL-P (device ID a720) display version 13.00 stepping E0
[ 2.792180] i915 0000:06:10.0: Running in SR-IOV VF mode
[ 2.792885] i915 0000:06:10.0: [drm] GT0: GUC: interface version 0.1.20.1
[ 2.793852] i915 0000:06:10.0: vgaarb: deactivate vga console
[ 2.793867] i915 0000:06:10.0: [drm] Using Transparent Hugepages
[ 2.795355] i915 0000:06:10.0: [drm] GT0: GUC: interface version 0.1.20.1
[ 2.795676] i915 0000:06:10.0: [drm] GT0: GUC: interface version 0.1.20.1
[ 2.796194] i915 0000:06:10.0: GuC firmware PRELOADED version 0.0 submission:SR-IOV VF
[ 2.796198] i915 0000:06:10.0: HuC firmware PRELOADED
[ 2.798809] i915 0000:06:10.0: [drm] Protected Xe Path (PXP) protected content support initialized
[ 2.798814] i915 0000:06:10.0: [drm] PMU not supported for this GPU.
[ 2.798960] [drm] Initialized i915 1.6.0 for 0000:06:10.0 on minor 0

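Check that device nodes exist under /dev/dri. card0 and renderD128 are the iGPU. If both card0 and card1 exist, card0 is the VM's default display adapter; removing it and using a serial port as the default console is recommended (see section VI).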

drwxr-xr-x  3 root root        120 Aug 30 17:14 .
drwxr-xr-x 18 root root 3260 Aug 30 17:14 ..
drwxr-xr-x 2 root root 100 Aug 30 17:14 by-path
crw-rw---- 1 root video 226, 1 Aug 30 17:14 card0
crw-rw---- 1 root render 226, 128 Aug 30 17:14 renderD128

Likewise, lspci shows which driver is in use, and vainfo can check whether VA-API works, which confirms that the iGPU is enabled.
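A scripted version of this check finds the first render node and, if there is one, points vainfo at it. It assumes vainfo is installed (Debian package vainfo); its --display drm --device options select a specific DRM node. The helper name is hypothetical.

```shell
# find_render_node [DIR]: print the first renderD* node in DIR (default /dev/dri).
find_render_node() {
  dir=${1:-/dev/dri}
  for n in "$dir"/renderD*; do
    # If nothing matches, the glob stays literal and -e fails.
    if [ -e "$n" ]; then
      echo "$n"
      return 0
    fi
  done
  return 1
}
```

Inside the VM: node=$(find_render_node) && vainfo --display drm --device "$node".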

IV. Using the vGPU in Docker

If you only need the GPU in container tools such as docker or podman, it is easy. For example:

services:
  emby:
    image: amilys/embyserver
    privileged: true
    devices:
      - /dev/dri:/dev/dri
    ports:
      - 8096:8096
    volumes:
      - ./emby:/config
      - ./media:/mnt/media
    environment:
      - TZ=Asia/Shanghai

The key settings are privileged and devices. Taking Emby as the example: if the iGPU is in use, the transcoding status in the settings should look like this:

[Screenshot: Emby transcoding status with the iGPU in use]

V. Adding a k3s vGPU Node

1. Deployment

1.1 The Manual Way

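Adding a GPU node to k3s is a bit more involved, but fortunately Intel provides good official tooling (intel-device-plugins-for-kubernetes).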

Follow the install-with-nfd section of its documentation:

# Start NFD - if your cluster doesn't have NFD installed yet
$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd?ref=<RELEASE_VERSION>'

# Create NodeFeatureRules for detecting GPUs on nodes
$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd/overlays/node-feature-rules?ref=<RELEASE_VERSION>'

# Create GPU plugin daemonset
$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/gpu_plugin/overlays/nfd_labeled_nodes?ref=<RELEASE_VERSION>'

<RELEASE_VERSION> is a release tag such as v0.32.0. Make sure the environment can reach GitHub. Once done, these new resources appear:

  • daemonset.apps/intel-gpu-plugin
  • intel-gpu-plugin-mtt8m

The NODE SELECTOR of daemonset.apps/intel-gpu-plugin is intel.feature.node.kubernetes.io/gpu=true,kubernetes.io/arch=amd64.
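To avoid typing the tag three times, the three commands can be wrapped in a function that takes it once. This is a sketch: apply_intel_gpu_plugin and its DRY_RUN switch are hypothetical, and the URLs are exactly those used above.

```shell
# apply_intel_gpu_plugin TAG
# Applies NFD, the GPU NodeFeatureRules and the GPU plugin daemonset for one
# release tag (e.g. v0.32.0). With DRY_RUN=1 the commands are only printed.
apply_intel_gpu_plugin() {
  tag=$1
  base=https://github.com/intel/intel-device-plugins-for-kubernetes/deployments
  # Order matters: NFD first, then the rules, then the plugin selected by the labels.
  for path in nfd nfd/overlays/node-feature-rules gpu_plugin/overlays/nfd_labeled_nodes; do
    url="$base/$path?ref=$tag"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "kubectl apply -k '$url'"
    else
      kubectl apply -k "$url" || return 1
    fi
  done
}
```

Afterwards, kubectl get nodes -l intel.feature.node.kubernetes.io/gpu=true should list the node that NFD has labelled.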

1.2 The Automatic Way

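Following "Install with HELM charts", first add the chart repositories (jetstack for cert-manager, nfd for NFD, intel for the operator and plugins):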

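helm repo add jetstack https://charts.jetstack.io
helm repo add nfd https://kubernetes-sigs.github.io/node-feature-discovery/charts
helm repo add intel https://intel.github.io/helm-charts/
helm repo update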

As with the manual way, NFD is required; the operator additionally needs cert-manager:

helm install nfd nfd/node-feature-discovery --namespace node-feature-discovery --create-namespace
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --set crds.enabled=true

Install intel-device-plugins-operator:

helm install intel-device-plugins-operator intel/intel-device-plugins-operator --namespace intel-device-plugins --create-namespace

Finally, deploy the intel-device-plugin that matches your platform:

helm install <NAME> intel/intel-device-plugins-<PLUGIN> --namespace intel-device-plugins \
--set nodeFeatureRule=true

<NAME> is an arbitrary Helm release name. <PLUGIN> is one of gpu, sgx, qat, dlb, dsa & iaa, chosen according to your hardware (gpu for the iGPU here).

2. Testing

Testing directly with the OpenCL image is recommended. You need to build the intel-opencl-icd image yourself:

# https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/demo/intel-opencl-icd/Dockerfile
FROM ubuntu:22.04

ARG APT="env DEBIAN_FRONTEND=noninteractive apt"

RUN ${APT} update && ${APT} install -y curl gpg-agent \
&& echo 'deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy/lts/2350 unified' | \
tee -a /etc/apt/sources.list.d/intel.list \
&& curl -s https://repositories.intel.com/gpu/intel-graphics.key | \
gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg \
&& ${APT} update \
&& ${APT} install -y --no-install-recommends \
intel-opencl-icd \
clinfo \
&& ${APT} remove -y curl gpg-agent \
&& ${APT} autoremove -y

Then run the following Job to query the GPU's capabilities:

# https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/demo/intelgpu-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: intelgpu-demo-job
  labels:
    jobgroup: intelgpu-demo
spec:
  template:
    metadata:
      labels:
        jobgroup: intelgpu-demo
    spec:
      restartPolicy: Never
      containers:
        - name: intelgpu-demo-job
          image: intel/intel-opencl-icd:devel
          imagePullPolicy: IfNotPresent
          command: [ "clinfo" ]
          resources:
            limits:
              gpu.intel.com/i915: 1
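One way to run the test end to end, under these assumptions:
  • Docker is available on the GPU node for building the Dockerfile above;
  • the Job manifest is saved as intelgpu-job.yaml (a hypothetical file name);
  • the commands run as root, since k3s ctr needs it.
k3s uses its own containerd rather than Docker's image store, so the locally built image is exported and then imported with k3s ctr. That way imagePullPolicy: IfNotPresent finds it on the node. The function names and the DRY_RUN switch are hypothetical.

```shell
# run_or_print CMD...: run CMD, or only print it when DRY_RUN=1.
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# run_opencl_demo: build the OpenCL image, make it visible to k3s, run the Job
# and print clinfo's output.
run_opencl_demo() {
  tar=/tmp/intel-opencl-icd.tar
  run_or_print docker build -t intel/intel-opencl-icd:devel . || return 1
  # k3s' containerd does not see Docker's images, so export and import the image.
  run_or_print docker save -o "$tar" intel/intel-opencl-icd:devel || return 1
  run_or_print k3s ctr images import "$tar" || return 1
  run_or_print kubectl apply -f intelgpu-job.yaml || return 1
  run_or_print kubectl wait --for=condition=complete job/intelgpu-demo-job --timeout=300s || return 1
  run_or_print kubectl logs job/intelgpu-demo-job
}
```

Run it from the directory containing the Dockerfile, on the node that carries the GPU label, so the imported image is present where the Job gets scheduled.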

The output is long; only an excerpt is shown here:

[Screenshot: excerpt of the clinfo output]

VI. Other Notes

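  1. Using a Windows VM to verify the PVE host configuration is strongly recommended. On Windows, just install the driver with the Intel Graphics driver installer. Afterwards the iGPU should appear directly in Task Manager. If Device Manager shows Code 43 or a similar error, the host's SR-IOV configuration is most likely at fault.
  2. The VM's vIOMMU does not need to be enabled; this has been confirmed.
  3. Keeping the default display adapter in a Linux VM seems to cause problems, but disabling it also removes the web VNC console. Consider using a serial port as the terminal instead:
     • in PVE, add a Serial Port device under Hardware;
     • set Display to Serial terminal 0;
     • add console=tty0 console=ttyS0,115200 to the Linux VM's boot parameters.
  4. When deploying VMs automatically with cloud-init, do not use genericcloud images such as debian-13-genericcloud-amd64-20250924-2245.qcow2 as the template. They target cloud environments and lack some libraries, so installing the DKMS module fails. Use the generic image instead.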