proxmox vm / no arp replies / mellanox nic
Recently, when upgrading a Proxmox host from 7.2 to 8.4, we ran into a strange issue. The hypervisor host had network connectivity, but the VM guests did not. We continued the upgrade to Proxmox 9.0, but that did not fix the problem.
The host was fine. We could ssh in, ping the gateway, do updates, whatever — but the guests had trouble communicating. In particular: the guests appeared unable to find the gateway at layer 2.
This upgrade happened on a test system that might not have been the cleanest of systems. So the culprit could be anywhere, and after several hours of poking around I was ready to throw in the proverbial towel. To make matters worse, I hadn't been involved with this system and was unaware of which changes had been made. For all I knew, things had been changed on the peer side as well.
Missing ARP replies
What do I mean by “unable to find the gateway”?
When a machine wants to send a packet to another machine over Ethernet, it first needs the MAC (hardware) address of the layer 2 (L2, Ethernet) peer. For communication outside its own network, it needs the address of the gateway.
Usually these MAC addresses are already known from earlier traffic, but when they are not, an explicit query has to be made. In this case the virtual machine (VM) broadcasts an ARP who-has request on the network: “Who has 10.20.30.32?” The gateway (at 10.20.30.32) should respond with an ARP is-at reply, providing its MAC.
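From inside the guest you can watch this resolution fail. A quick check might look something like this (assuming the guest's interface is called eth0, an example name):
# ip neigh show to 10.20.30.32
10.20.30.32 dev eth0 FAILED
# arping -I eth0 -c 3 10.20.30.32
The neighbour entry never becomes REACHABLE, because the is-at reply never arrives.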
Here's what tcpdump showed on the host:
# tcpdump -enni enmlx0 vlan and arp
11:54:40.417131 bc:24:11:xx:xx:xx > ff:ff:ff:ff:ff:ff,
ethertype 802.1Q (0x8100), length 46: vlan 66, p 0, ethertype ARP (0x0806),
Request who-has 10.20.30.32 tell 10.20.30.33, length 28
You see the who-has from 10.20.30.33 asking for the physical address of 10.20.30.32. This kept repeating. No other traffic could be seen for VLAN 66. And because no is-at reply was in sight, the VM obviously had trouble communicating: no L3 traffic is possible without L2.
The VM sat in bridge vmbr66, and the host that did have connectivity was communicating on bridge vmbr55:
# brctl show
bridge name     bridge id               STP enabled     interfaces
vmbr55          8000.248a07xxxxxx       no              enmlx0.55
vmbr66          8000.248a07xxxxxx       no              enmlx0.66
                                                        tap140i0
# ip -br a
lo                UNKNOWN        127.0.0.1/8 ::1/128
enmlx0            UP             fe80::268a:7ff:xxxx:xxxx/64
enmlx0.55@enmlx0  UP
vmbr55            UP             192.168.2.3/31 fe80::268a:7ff:xxxx:xxxx/64
enmlx0.66@enmlx0  UP
vmbr66            UP             fe80::268a:7ff:xxxx:xxxx/64
tap140i0          UNKNOWN
On the directly connected Cumulus L3 switch, we could sniff the traffic as well:
# tcpdump -enni swp56s1 vlan and arp
11:54:40.417428 bc:24:11:xx:xx:xx > ff:ff:ff:ff:ff:ff,
ethertype 802.1Q (0x8100), length 60: vlan 66, p 2, ethertype ARP,
Request who-has 10.20.30.32 tell 10.20.30.33, length 42
11:54:40.417479 d8:c4:97:xx:xx:xx > bc:24:11:xx:xx:xx,
ethertype 802.1Q (0x8100), length 46: vlan 66, p 0, ethertype ARP,
Reply 10.20.30.32 is-at d8:c4:97:xx:xx:xx, length 28
It's really nice that we can do traffic sniffing on these switches. But you have to keep a couple of things in mind: (1) you only get to see local traffic (packets from/to the switch at L2 or L3); (2) not everything necessarily travels the route you think.
In this case, we can clearly see that the switch gets the who-has
(broadcast to ff:ff:ff:ff:ff:ff) and it replies with is-at.
But, keeping the caveats in mind, I was still only 90% sure the switch wasn't to blame.
(We also see that the size of the who-has differs, but that was likely just padding added by either the sending or receiving hardware.)
In the meantime, traffic to/from the host was working fine. The difference was that it used the MAC address of the host, 24:8a:07:xx:xx:xx, instead of bc:24:11:xx:xx:xx.
Traffic flow, changes, and usual suspects
In our setup, guests connect to a Linux bridge on the Proxmox host. That bridge is backed by a Mellanox ConnectX-4 Lx NIC and tagged subinterfaces for VLAN separation. Packets should flow like this: VM → Linux bridge → Mellanox NIC (with VLAN tags) → upstream switch.
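For reference, the relevant part of such a setup in /etc/network/interfaces looks roughly like this (a sketch, not the literal config of this host):
auto enmlx0.66
iface enmlx0.66 inet manual

auto vmbr66
iface vmbr66 inet manual
        bridge-ports enmlx0.66
        bridge-stp off
        bridge-fd 0
Proxmox adds the guest's tap140i0 interface to the bridge when the VM starts.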
The Proxmox host had been upgraded, and along with it, the Debian host OS and Linux kernel: from Proxmox 7.2 on Bullseye with kernel 5.15.35-2-pve, to Proxmox 8.4 on Bookworm with kernel 6.8.12-15-pve, and finally Proxmox 9.0 on Trixie with kernel 6.14.11-2-pve.
It was useful that we had installed
GoCollect on this host. It
allowed us to look back in time and see what versions of kernels were
running earlier (os.kernel).
Trying to debug the cause for the missing packets, I checked the usual culprits (a few of the corresponding commands are sketched after the list):
- iptables/ebtables/nft — any kind of table I could find, both on the switch and on the host — all empty or irrelevant;
- Bridge and VLAN config — manually removing and re-creating the bridge, the VLAN subinterfaces, and the subinterface on the switch;
- FDB weirdness — I tried to find the switch's forwarding tables, but apart from the ip neigh interface — which looked correct — I couldn't find other MAC address tables;
- Forwarding rules — compared the ip_forward settings and synced them with a host that had no problems;
- MAC tricks — chose a fresh VM MAC address;
- VLAN priority — tried to set priority 0 instead of priority 2 (p 2).
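For reference, those checks roughly boiled down to commands like these (a sketch; vmbr66 and enmlx0 are the names from this setup):
# nft list ruleset                    # any nftables/iptables rules?
# ebtables -L                         # legacy bridge filtering rules?
# bridge fdb show br vmbr66           # learned MACs on the bridge
# bridge -d vlan show                 # VLAN membership of bridge ports
# sysctl net.ipv4.ip_forward net.ipv4.conf.all.forwarding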
Nothing.
Offloading
During these debug sessions I asked Mr. Chat for advice, and it came up with suggestions like:
# ethtool -K enmlx0 rxvlan off txvlan off gro off gso off
That disables various offloading options in the NIC: this hardware acceleration speeds up traffic handling, but it may hide something. Maybe now the missing packets would appear?
They didn't. And I was ready to give up.

But as the Dutch saying goes, de aanhouder wint (“the one who perseveres, wins”).
After looking some more at ethtool -k output, the rx-vlan-filter
setting caught my eye. Let's try that.
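Its current state can be checked first; on this host it was enabled:
# ethtool -k enmlx0 | grep vlan-filter
rx-vlan-filter: on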
# ethtool -K enmlx0 rx-vlan-filter off
All connectivity to the host was lost.
According to sources, the networking just needed a “bump” and would then
resume. But no — that setting needed to be on or we had no
connectivity at all.
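(Such a “bump” typically means taking the link down and up again, roughly like this; it did not help here:)
# ip link set enmlx0 down
# ip link set enmlx0 up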
Needing rx-vlan-filter on just to have any connectivity shouldn't be the case, so I went looking for firmware upgrades.
Mellanox firmware
On the NVIDIA site, the mlxup tool was easy to find. And it did
exactly what it was supposed to do:
# ./mlxup --query
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX4LX
Part Number: MCX4121A-ACA_Ax
Description: ConnectX-4 Lx EN network interface card; 25GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
PSID: MT_2420110034
PCI Device Name: 0000:81:00.0
Base MAC: 248a07xxxxxx
Versions:       Current         Available
     FW         14.16.1020      14.32.1010
     PXE        3.4.0812        3.6.0502
     UEFI       N/A             14.25.0017
Status: Update required
That looked promising. Let's update.
# ./mlxup -d 0000:81:00.0
...
Found 1 device(s) requiring firmware update...
Perform FW update? [y/N]: y
About as smooth as a firmware upgrade can get.
And that was it: rx-vlan-filter could now be toggled, and better yet,
the guest VM started communicating.
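Once the new firmware is active (flashing may require a reboot or a reset of the card), ethtool -i reports the running version; on this host, after the upgrade, roughly:
# ethtool -i enmlx0
driver: mlx5_core
version: 6.14.11-2-pve
firmware-version: 14.32.1010 (MT_2420110034)
...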
Timeline
So, what must have happened is the following:
- Sometime between kernel 5.15 and kernel 6.8, a new hardware feature in the Mellanox NIC was put to use;
- This feature didn't work on the older firmware;
- But that was fixed after firmware 14.16.1020 and before 14.27.1016 (seen on a working system).
I spent a bit of time trying to find the offending commit, but I didn't want to revert kernels or firmwares. Simply browsing the kernel changes and the (unfortunately very concise) firmware changelog was not sufficient for me to pinpoint the specific changes in both.
I suspect it may be this firmware fix: “[while] using e-switch vport sVLAN stripping, the RX steering values on the sVLAN might not be accurate”. And, on the kernel side, possibly one of the “[this] patch series deals with vport handling in SW steering” patches.
Once again, I did learn a lot. Next time a NIC decides to misbehave,
I'll make sure to also probe info using
devlink dev eswitch show pci/0000:81:00.0 and
devlink dev param show pci/0000:81:00.0.
Info gathering and tips
We call our Mellanox NICs enmlx0 and upwards using systemd-networkd .link files. This shortens the name from the stable enp129s0f0 or enp129s0f0np0 names. We do this because there is (still) a 15-character limit on interface names. VLANs can go up to 4095, so a subinterface of enp129s0f0np0.4095 would require 18 characters, and won't fit.
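A minimal sketch of such a .link file (the filename is illustrative; matching on Path= or PermanentMACAddress= works too):
# /etc/systemd/network/10-enmlx0.link
[Match]
MACAddress=24:8a:07:xx:xx:xx

[Link]
Name=enmlx0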
If you're looking for the serial number of your Mellanox NIC, you can
use lspci -vv:
# lspci -vv | sed -e '/^[^[:blank:]].*Mellanox/,/^$/!d;/^[^[:blank:]]\|[[]SN[]]/!d'
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[SN] Serial number: MT16xxXxxxxx
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[SN] Serial number: MT16xxXxxxxx
If you happen to have all machine info of your fleet — like we do with
GoCollect — the firmware
version can be found in lshw (app.lshw):
# lshw -json | jq 'recurse(.children[]?)
| select(.vendor == "Mellanox Technologies")
| .configuration'
{
  "autonegotiation": "on",
  "broadcast": "yes",
  "driver": "mlx5_core",
  "driverversion": "6.14.11-2-pve",
  "firmware": "14.32.1010 (MT_2420110034)",
  "latency": "0",
  "link": "no",
  "multicast": "yes"
}