proxmox vm / no arp replies / mellanox nic
Recently, when upgrading a Proxmox host from 7.2 to 8.4, we ran into a strange issue. The hypervisor host had network connectivity, but the VM guests did not. We continued the upgrade to Proxmox 9.0, but that did not fix the problem.
The host was fine. We could ssh in, ping the gateway, do updates, whatever — but the guests had trouble communicating. In particular: the guests appeared unable to find the gateway at layer 2.
This upgrade happened on a test system that might not have been the cleanest of systems. So the culprit could be anywhere, and after several hours of poking around I was ready to throw in the proverbial towel. To make matters worse, I hadn't been involved with this system and was unaware of which changes had been made. For all I knew, things had been changed on the peer side as well.
Missing ARP replies
What do I mean by “unable to find the gateway”?
When a machine wants to send a packet to another machine over Ethernet, it first needs the MAC (hardware) address of the layer 2 (L2, Ethernet) peer. For communication outside its own network, it needs the address of the gateway.
Usually these MAC addresses are already known from earlier traffic, but when they are not, an explicit query has to be made. In this case the virtual machine (VM) broadcasts an ARP who-has request on the network: “Who has 10.20.30.32?” The gateway (at 10.20.30.32) should respond with an ARP is-at reply, providing its MAC.
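From inside the guest you can watch this resolution fail. A quick check might look something like this (assuming the guest's interface is called eth0, an example name):
# ip neigh show to 10.20.30.32
10.20.30.32 dev eth0 FAILED
# arping -I eth0 -c 3 10.20.30.32
The neighbour entry never becomes REACHABLE, because the is-at reply never arrives.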
Here's what tcpdump showed on the host:
# tcpdump -enni enmlx0 vlan and arp
11:54:40.417131 bc:24:11:xx:xx:xx > ff:ff:ff:ff:ff:ff,
ethertype 802.1Q (0x8100), length 46: vlan 66, p 0, ethertype ARP (0x0806),
Request who-has 10.20.30.32 tell 10.20.30.33, length 28
You see the who-has from 10.20.30.33 asking for the physical address of 10.20.30.32. This kept repeating. No other traffic could be seen for VLAN 66. And because no is-at reply was in sight, the VM obviously had trouble communicating: no L3 traffic is possible without L2.
The VM sat in bridge vmbr66, and the host that did have connectivity was communicating on bridge vmbr55:
# brctl show
bridge name     bridge id               STP enabled     interfaces
vmbr55          8000.248a07xxxxxx       no              enmlx0.55
vmbr66          8000.248a07xxxxxx       no              enmlx0.66
                                                        tap140i0
# ip -br a
lo                UNKNOWN        127.0.0.1/8 ::1/128
enmlx0            UP             fe80::268a:7ff:xxxx:xxxx/64
enmlx0.55@enmlx0  UP
vmbr55            UP             192.168.2.3/31 fe80::268a:7ff:xxxx:xxxx/64
enmlx0.66@enmlx0  UP
vmbr66            UP             fe80::268a:7ff:xxxx:xxxx/64
tap140i0          UNKNOWN
On the directly connected Cumulus L3 switch, we could sniff the traffic as well:
# tcpdump -enni swp56s1 vlan and arp
11:54:40.417428 bc:24:11:xx:xx:xx > ff:ff:ff:ff:ff:ff,
ethertype 802.1Q (0x8100), length 60: vlan 66, p 2, ethertype ARP,
Request who-has 10.20.30.32 tell 10.20.30.33, length 42
11:54:40.417479 d8:c4:97:xx:xx:xx > bc:24:11:xx:xx:xx,
ethertype 802.1Q (0x8100), length 46: vlan 66, p 0, ethertype ARP,
Reply 10.20.30.32 is-at d8:c4:97:xx:xx:xx, length 28
It's really nice that we can do traffic sniffing on these switches. But you have to keep a couple of things in mind: (1) you only get to see local traffic (packets from/to the switch at L2 or L3); (2) not everything necessarily travels the route you think.
In this case, we can clearly see that the switch gets the who-has
(broadcast to ff:ff:ff:ff:ff:ff) and it replies with is-at.
But, keeping the caveats in mind, I was still only 90% sure the switch wasn't to blame.
(We also see that the size of the who-has differs, but that was likely just padding added by either the sending or receiving hardware.)
In the meantime, traffic to/from the host was working fine. The difference was that it used the MAC address of the host, 24:8a:07:xx:xx:xx, instead of bc:24:11:xx:xx:xx.
Traffic flow, changes, and usual suspects
In our setup, guests connect to a Linux bridge on the Proxmox host. That bridge is backed by a Mellanox ConnectX-4 Lx NIC and tagged subinterfaces for VLAN separation. Packets should flow like this: VM → Linux bridge → Mellanox NIC (with VLAN tags) → upstream switch.
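For reference, the relevant part of such a setup in /etc/network/interfaces looks roughly like this (a sketch, not the literal config of this host):
auto enmlx0.66
iface enmlx0.66 inet manual

auto vmbr66
iface vmbr66 inet manual
        bridge-ports enmlx0.66
        bridge-stp off
        bridge-fd 0
Proxmox adds the guest's tap140i0 interface to the bridge when the VM starts.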
The Proxmox host had been upgraded, and along with it, the Debian host OS and Linux kernel: from Proxmox 7.2 on Bullseye with kernel 5.15.35-2-pve, to Proxmox 8.4 on Bookworm with kernel 6.8.12-15-pve, and finally Proxmox 9.0 on Trixie with kernel 6.14.11-2-pve.
It was useful that we had installed
GoCollect on this host. It
allowed us to look back in time and see what versions of kernels were
running earlier (os.kernel).
Trying to debug the cause for the missing packets, I checked the usual culprits (a few of the corresponding commands are sketched after the list):
- iptables/ebtables/nft — any kind of table I could find, both on the switch and on the host — all empty or irrelevant;
- Bridge and VLAN config — manually removing and re-creating the bridge, the VLAN subinterfaces, and the subinterface on the switch;
- FDB weirdness — I tried to find the switch's forwarding tables, but apart from the ip neigh interface — which looked correct — I couldn't find other MAC address tables;
- Forwarding rules — compared the ip_forward settings and synced them with a host that had no problems;
- MAC tricks — chose a fresh VM MAC address;
- VLAN priority — tried to set priority 0 instead of priority 2 (p 2).
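For reference, those checks roughly boiled down to commands like these (a sketch; vmbr66 and enmlx0 are the names from this setup):
# nft list ruleset                    # any nftables/iptables rules?
# ebtables -L                         # legacy bridge filtering rules?
# bridge fdb show br vmbr66           # learned MACs on the bridge
# bridge -d vlan show                 # VLAN membership of bridge ports
# sysctl net.ipv4.ip_forward net.ipv4.conf.all.forwarding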
Nothing.
Offloading
During these debug sessions I asked Mr. Chat for advice, and it came up with suggestions like:
# ethtool -K enmlx0 rxvlan off txvlan off gro off gso off
That disables various offloading options in the NIC: this hardware acceleration speeds up traffic handling, but it may hide something. Maybe now the missing packets would appear?
They didn't. And I was ready to give up.

But as the Dutch saying goes, de aanhouder wint (“the one who perseveres, wins”).
After looking some more at ethtool -k output, the rx-vlan-filter
setting caught my eye. Let's try that.
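Its current state can be checked first; on this host it was enabled:
# ethtool -k enmlx0 | grep vlan-filter
rx-vlan-filter: on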
# ethtool -K enmlx0 rx-vlan-filter off
All connectivity to the host was lost.
According to sources, the networking just needed a “bump” and would then
resume. But no — that setting needed to be on or we had no
connectivity at all.
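(Such a “bump” typically means taking the link down and up again, roughly like this; it did not help here:)
# ip link set enmlx0 down
# ip link set enmlx0 up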
Needing rx-vlan-filter on just to have any connectivity shouldn't be the case, so I went looking for firmware upgrades.
Mellanox firmware
On the NVIDIA site, the mlxup tool was easy to find. And it did
exactly what it was supposed to do:
# ./mlxup --query
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX4LX
Part Number: MCX4121A-ACA_Ax
Description: ConnectX-4 Lx EN network interface card; 25GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
PSID: MT_2420110034
PCI Device Name: 0000:81:00.0
Base MAC: 248a07xxxxxx
Versions:       Current         Available
     FW         14.16.1020      14.32.1010
     PXE        3.4.0812        3.6.0502
     UEFI       N/A             14.25.0017
Status: Update required
That looked promising. Let's update.
# ./mlxup -d 0000:81:00.0
...
Found 1 device(s) requiring firmware update...
Perform FW update? [y/N]: y
About as smooth as a firmware upgrade can get.
And that was it: rx-vlan-filter could now be toggled, and better yet,
the guest VM started communicating.
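Once the new firmware is active (flashing may require a reboot or a reset of the card), ethtool -i reports the running version; on this host, after the upgrade, roughly:
# ethtool -i enmlx0
driver: mlx5_core
version: 6.14.11-2-pve
firmware-version: 14.32.1010 (MT_2420110034)
...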
Timeline
So, what must have happened is the following:
- Sometime between kernel 5.15 and kernel 6.8, a new hardware feature in the Mellanox NIC was put to use;
- This feature didn't work on the older firmware;
- But that was fixed after firmware 14.16.1020 and before 14.27.1016 (seen on a working system).
I spent a bit of time trying to find the offending commit, but I didn't want to revert kernels or firmwares. Simply browsing the kernel changes and the (unfortunately very concise) firmware changelog was not sufficient for me to pinpoint the specific changes in both.
I suspect it may be this firmware fix: “[while] using e-switch vport sVLAN stripping, the RX steering values on the sVLAN might not be accurate”. And, on the kernel side, possibly one of the “[this] patch series deals with vport handling in SW steering” patches.
Once again, I did learn a lot. Next time a NIC decides to misbehave,
I'll make sure to also probe info using
devlink dev eswitch show pci/0000:81:00.0 and
devlink dev param show pci/0000:81:00.0.
Info gathering and tips
We call our Mellanox NICs enmlx0 and upwards using systemd-networkd .link files. This shortens the name from the stable enp129s0f0 or enp129s0f0np0 names. We do this because there is (still) a 15-character limit on interface names. VLANs can go up to 4095, so a subinterface of enp129s0f0np0.4095 would require 18 characters, and won't fit.
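A minimal sketch of such a .link file (the filename is illustrative; matching on Path= or PermanentMACAddress= works too):
# /etc/systemd/network/10-enmlx0.link
[Match]
MACAddress=24:8a:07:xx:xx:xx

[Link]
Name=enmlx0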
If you're looking for the serial number of your Mellanox NIC, you can
use lspci -vv:
# lspci -vv | sed -e '/^[^[:blank:]].*Mellanox/,/^$/!d;/^[^[:blank:]]\|[[]SN[]]/!d'
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[SN] Serial number: MT16xxXxxxxx
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[SN] Serial number: MT16xxXxxxxx
If you happen to have all machine info of your fleet — like we do with
GoCollect — the firmware
version can be found in lshw (app.lshw):
# lshw -json | jq 'recurse(.children[]?)
| select(.vendor == "Mellanox Technologies")
| .configuration'
{
  "autonegotiation": "on",
  "broadcast": "yes",
  "driver": "mlx5_core",
  "driverversion": "6.14.11-2-pve",
  "firmware": "14.32.1010 (MT_2420110034)",
  "latency": "0",
  "link": "no",
  "multicast": "yes"
}