Fixing Proxmox PVE issues when adding or removing PCIe devices

Proxmox Virtual Environment (PVE) offers powerful virtualization capabilities, but tinkering with PCIe devices can lead to unexpected challenges. If, like me, you didn’t realize that adding or removing PCIe devices in your Proxmox PVE host could break your VMs, keep reading.

In my case, I added a new NVMe SSD, but this can also happen when removing PCIe devices.

Because the host may no longer be reachable over the network, access the Proxmox PVE host locally, with a physically attached keyboard and monitor, and log in as a privileged user.

Gathering info: PCIe bus ids

It’s EXTREMELY likely that the device addresses change upon adding or removing any PCIe devices:

[For each restart] the PCI BIOS must walk the base PCI bus (starting at bus 0), subsequent bridges, and bridged devices to search and identify other PCI buses as if it were the first time.

Each time the PCI BIOS discovers another PCI bus after a physical configuration change is made, it increments the bus number and continues to walk the bus until all other buses are discovered.

As it discovers each bus and/or bridge, the PCI BIOS:

  • Records each unique bus number
  • Associates the bus number to a bus or bridged PCI device

Compaq, PCI Bus Numbering in a Microsoft Windows NT Environment

To better understand how the PCIe bus ids have changed, let’s run lspci to list the installed PCIe devices and their respective bus ids.

$ lspci

[...]
01:00.0 Non-Volatile memory controller: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1202 (rev 01)
02:00.0 Ethernet controller: Intel Corporation Device 125c (rev 04)
03:00.0 Ethernet controller: Intel Corporation Device 125c (rev 04)
04:00.0 Ethernet controller: Intel Corporation Device 125c (rev 04)
05:00.0 Ethernet controller: Intel Corporation Device 125c (rev 04)

The NVMe drive I just installed is 01:00.0. That bus id used to belong to the first Intel ethernet controller. So in this case, the bus ids of all the NICs have increased by one.

  • 01:00.0 is now 02:00.0
  • 02:00.0 is now 03:00.0
  • …and so on
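For next time, it can help to save the lspci output before touching the hardware and compare it afterwards. The following is a minimal sketch, assuming you kept such a snapshot; pci_changes is a hypothetical helper name, not a Proxmox tool.

```shell
#!/bin/sh
# pci_changes BEFORE AFTER
# Compare two saved `lspci` listings and report every bus id whose
# device description differs, or that only exists in AFTER.
# (pci_changes is a hypothetical helper, not part of Proxmox.)
pci_changes() {
    awk '
        # First file: remember the description seen at each bus id.
        NR == FNR { addr = $1; $1 = ""; before[addr] = $0; next }
        # Second file: flag ids that are new or now hold another device.
        {
            addr = $1; $1 = ""
            if (!(addr in before))        print addr " new"
            else if (before[addr] != $0)  print addr " changed"
        }
    ' "$1" "$2"
}
```

Usage would look like: lspci > /root/lspci.before (before the change), lspci > /root/lspci.after (after it), then pci_changes /root/lspci.before /root/lspci.after. Note the limitation: identical devices that merely shift (like the four Intel NICs here) look the same at their new addresses, so only the first and last affected bus ids get flagged.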

Potential issues

Issue 1: Network connectivity is gone

If your NICs are PCIe (which is very likely!), they would be affected by this:

The Linux kernel assigns names to network interfaces by combining a fixed prefix and a number that increases as the kernel initializes the network devices. […] If you add another network interface card to the system, the assignment of the kernel device names is no longer fixed. Consequently, after a reboot, the kernel can name the device differently.

When the consistent network device name feature is enabled, the udev device manager creates the names of devices based on different criteria.

  • en for Ethernet

  • [P<domain_number>]p<bus>s<slot>[f<function>][d<device_id>]

Red Hat, RHEL 8 documentation, Chapter 1: Consistent network interface device naming

Run ip link (with the ethernet cable connected) in order to get the interface name for the Proxmox Linux Bridge.

$ ip link

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP mode DEFAULT group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
7: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff

Ignoring the loopback (lo) and the Linux Bridge itself (vmbr0), the only remaining interface is enp5s0. This is the new name of the NIC that the Linux Bridge should use as its port.
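On a host with many interfaces, picking the right line by eye gets tedious. Here is a rough sketch that filters ip link output down to interfaces that report a carrier (LOWER_UP), skipping lo and anything named vmbr*. The function name linked_nics is made up for illustration, and the filter is an assumption about what you want to see: tap or veth interfaces of running guests would show up too.

```shell
#!/bin/sh
# linked_nics: read `ip link` output on stdin and print the names of
# interfaces that have a carrier, ignoring loopback and vmbr* bridges.
# (hypothetical helper for illustration only)
linked_nics() {
    awk '
        /^[0-9]+: / {
            name = $2
            sub(/:$/, "", name)      # "enp5s0:"   -> "enp5s0"
            sub(/@.*$/, "", name)    # "veth0@if3" -> "veth0"
            if (name == "lo" || name ~ /^vmbr/) next
            # $3 is the flag list, e.g. <BROADCAST,MULTICAST,UP,LOWER_UP>
            if (index($3, "LOWER_UP") > 0) print name
        }
    '
}
```

With the cable plugged in, ip link | linked_nics should narrow the list down to the NIC you are looking for.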

Comparing against the naming criteria, we can understand that enp5s0:

  • Is an ethernet connection.
  • Its bus is 5.
  • Its slot within the bus is 0.
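The decoding above can be sketched as a small shell function that turns an interface name back into an lspci-style address. Two assumptions here: ifname_to_pci is a hypothetical name, and, as far as I understand the naming scheme, the bus and slot in the interface name are written in decimal while lspci prints hexadecimal, so the function converts between the two. It ignores the optional domain prefix and device id suffix.

```shell
#!/bin/sh
# ifname_to_pci NAME
# Convert a path-based name such as enp5s0 or enp0s31f6 into the
# bus:slot.function notation used by lspci.
# (hypothetical helper; assumes decimal bus/slot in the name)
ifname_to_pci() {
    name=$1
    case $name in
        en*p*s*) ;;
        *) echo "unsupported name: $name" >&2; return 1 ;;
    esac
    rest=${name#*p}            # "5s0" or "0s31f6"
    bus=${rest%%s*}            # "5"
    rest=${rest#*s}            # "0" or "31f6"
    slot=${rest%%f*}           # "0" or "31"
    slot=${slot%%[!0-9]*}      # drop any trailing suffix such as d<id>
    case $rest in
        *f*) func=${rest#*f}; func=${func%%[!0-9]*} ;;
        *)   func=0 ;;
    esac
    printf '%02x:%02x.%x\n' "$bus" "$slot" "$func"
}
```

For example, ifname_to_pci enp5s0 prints 05:00.0, which matches the last Intel NIC in the lspci listing earlier.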

You can now edit /etc/network/interfaces to reflect the change, and then restart the networking service.

# /etc/network/interfaces

auto lo
iface lo inet loopback

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.120/24
        gateway 192.168.1.1
-       bridge-ports enp4s0
+       bridge-ports enp5s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

$ systemctl restart networking.service

At this point, network connectivity should be restored. Check by pinging Google: ping -c3 google.com.
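If you would rather not edit the file by hand, the same change can be scripted. This is only a sketch under a few assumptions: rename_bridge_port is a made-up name, the bridge has a single port listed on its bridge-ports line (as in my config), and your sed supports the -i.bak form for in-place edits with a backup.

```shell
#!/bin/sh
# rename_bridge_port OLD NEW [FILE]
# Replace OLD with NEW on a single-port "bridge-ports" line.
# FILE defaults to /etc/network/interfaces; the original is kept
# next to it as FILE.bak.
# (hypothetical helper for illustration only)
rename_bridge_port() {
    old=$1
    new=$2
    file=${3:-/etc/network/interfaces}
    sed -i.bak \
        "s/^\([[:space:]]*bridge-ports[[:space:]]\{1,\}\)$old\$/\1$new/" \
        "$file"
}
```

In my case that would be rename_bridge_port enp4s0 enp5s0, followed by the same networking restart as before. Lines that do not match exactly are left untouched, so it is worth checking the file afterwards.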

Issue 2: Broken Passthrough PCI devices

One of Proxmox PVE’s features is PCI passthrough, which lets a guest VM access physical devices on the host directly. As you have probably guessed by now, if any of your VMs use PCI passthrough, they will also be affected by the PCIe bus changes.

The steps below need to be repeated for every VM that uses PCI passthrough.
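To find out which VMs are affected, you can look for hostpci entries in the VM config files. A minimal sketch, assuming the configs live in /etc/pve/qemu-server (the location used later in this post); vms_with_passthrough is a hypothetical helper name. Be aware that a config file may also contain snapshot sections, so a match does not strictly prove the current config uses passthrough.

```shell
#!/bin/sh
# vms_with_passthrough [DIR]
# Print the id of every VM whose config file contains a hostpciN line.
# DIR defaults to /etc/pve/qemu-server.
# (hypothetical helper for illustration only)
vms_with_passthrough() {
    dir=${1:-/etc/pve/qemu-server}
    for conf in "$dir"/*.conf; do
        [ -e "$conf" ] || continue          # no config files at all
        if grep -q '^hostpci[0-9]*:' "$conf"; then
            basename "$conf" .conf
        fi
    done
}
```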

Retrieve the VM id of the relevant machine, with the following command:

$ qm list
VMID    NAME        STATUS     MEM(MB)    BOOTDISK(GB)      PID       
100     routervm     running    8192       32.00             8705  

Then export the VM id as an environment variable, to avoid having to remember it every time: export VM_ID=100.

Ensure the VM is (gracefully) stopped, by running qm shutdown $VM_ID. Once stopped, check the VM config, with qm config $VM_ID.

$ qm config $VM_ID

boot: order=scsi0
cores: 4
cpu: kvm64,flags=+aes
hostpci0: 0000:02:00
hostpci1: 0000:03:00
hostpci2: 0000:04:00
[...]

hostpci0, hostpci1 and hostpci2 are all NICs passed through to the VM. As mentioned earlier, the bus ids of the NICs have increased by one, so in my case I just need to update them in the VM config file, located at /etc/pve/qemu-server/${VM_ID}.conf:

Remember to shut down the machine first if you haven’t yet: qm shutdown $VM_ID

# /etc/pve/qemu-server/$VM_ID.conf

[...]
cores: 4
cpu: kvm64,flags=+aes
- hostpci0: 0000:02:00
+ hostpci0: 0000:03:00
- hostpci1: 0000:03:00
+ hostpci1: 0000:04:00
- hostpci2: 0000:04:00
+ hostpci2: 0000:05:00
memory: 8192
[...]
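The same edit can also be expressed as qm set calls instead of changing the file directly. The sketch below reads qm config output and prints (without running) one qm set command per hostpci entry, with the bus number shifted by a fixed offset. The assumptions: shift_hostpci is a made-up name, the entries use the 0000:BB:DD form shown above, and every passed-through device moved by the same amount, which was true for me but should be checked against lspci first.

```shell
#!/bin/sh
# shift_hostpci VMID OFFSET
# Read `qm config` style lines on stdin and print a `qm set` command
# for each hostpciN entry, adding OFFSET to the (hexadecimal) bus.
# Nothing is executed; review the output before running it.
# (hypothetical helper for illustration only)
shift_hostpci() {
    vmid=$1
    offset=$2
    while IFS= read -r line; do
        case $line in
            hostpci[0-9]*:\ [0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:*) ;;
            *) continue ;;                 # not a plain hostpci address
        esac
        key=${line%%:*}                    # hostpci0
        addr=${line#*: }                   # 0000:02:00[,options]
        opts=
        case $addr in
            *,*) opts=,${addr#*,}; addr=${addr%%,*} ;;
        esac
        domain=${addr%%:*}                 # 0000
        rest=${addr#*:}                    # 02:00
        bus=${rest%%:*}                    # 02
        tail=${rest#*:}                    # 00
        newbus=$(printf '%02x' "$((0x$bus + $offset))")
        printf 'qm set %s --%s %s:%s:%s%s\n' \
            "$vmid" "$key" "$domain" "$newbus" "$tail" "$opts"
    done
}
```

For example, qm config $VM_ID | shift_hostpci $VM_ID 1 prints the three qm set commands matching the diff above. Once they look right, they can be run one by one.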

Now you should be able to start the VM, either from the Proxmox Web UI (remember, we just restored connectivity to it!) or by running qm start $VM_ID.