2013-06-19

All I Did Was Add a VMkernel Interface (Routing)

That was the call I got today.
"All I did was add a VMkernel interface and my host lost connection to vCenter".

On went my troubleshooting hat.

First, the environment (simplified):

Environment

The physical interfaces on which the VMkernel interfaces reside were trunked with multiple VLANs, in this case VLAN(4) and VLAN(49).

vmk0 was used for ESXi management, with a default gateway of x.x.4.254.

When the user added vmk1, the host became disconnected from vCenter; when he removed vmk1, the host reconnected.

While the host was disconnected, we tried to ping the vmk0 interface: replies were fine.

While the host was disconnected, we tried to connect to the host directly with SSH and with the vSphere Client: both worked.

While the host was disconnected, we tried to ping the vCenter server by its IP address: there was no response.

While the host was disconnected, we tried to ping an address on the external network: replies were fine.

We then looked at the settings on vmk1, and I noticed that the user had not set the VLAN(49) tag on the VMkernel interface. That was clearly a misconfiguration, and once VLAN(49) was added to vmk1, everything worked: the host reconnected to vCenter.

So the problem was solved: VLAN(49) was missing on vmk1. But something did not add up.

I was puzzled. Why would this misconfiguration disconnect the host from vCenter? Then I realized why, hence this post.

When a VMkernel interface is configured, a new entry for its subnet is added to the routing table. There is only one default gateway, and that is the one defined on the management interface. The additional VMkernel interfaces do not have a gateway defined.
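As a rough illustration of that rule, the Python sketch below builds a routing table from a list of VMkernel interfaces: one connected ("Local Subnet") route per interface, and a single default route bound to the management interface. The 10.0.x.0 addresses, the /24 masks, and the names VmkInterface and build_routes are hypothetical stand-ins (the real prefixes are elided as x.x in this post), and the column layout is only loosely modelled on esxcfg-route -l.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class VmkInterface:
    # Hypothetical model of a VMkernel interface: name, IPv4 address, netmask.
    name: str
    ip: str
    netmask: str


def build_routes(interfaces, mgmt_name, default_gateway):
    """Return rows of (network, netmask, gateway, interface).

    Each VMkernel interface contributes a connected route for its own subnet
    with no gateway; only the management interface gets the default route.
    """
    routes = []
    for vmk in interfaces:
        net = ipaddress.IPv4Network(f"{vmk.ip}/{vmk.netmask}", strict=False)
        routes.append((str(net.network_address), str(net.netmask), "Local Subnet", vmk.name))
    # A single default gateway, reached through the management interface.
    routes.append(("default", "0.0.0.0", default_gateway, mgmt_name))
    return routes


if __name__ == "__main__":
    # Hypothetical addresses standing in for the post's x.x.4, x.x.49 and x.x.6 subnets.
    vmks = [
        VmkInterface("vmk0", "10.0.4.10", "255.255.255.0"),
        VmkInterface("vmk1", "10.0.49.10", "255.255.255.0"),
        VmkInterface("vmk2", "10.0.6.10", "255.255.255.0"),
    ]
    for row in build_routes(vmks, "vmk0", "10.0.4.254"):
        print("{:<14}{:<16}{:<14}{}".format(*row))
```

Adding vmk1 adds exactly one row, for its own subnet; the default row never moves off the management interface.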

This was the output of esxcfg-route -l on the host.


To explain the output in plain English:

Anything destined for network x.x.4.0 goes out through vmk0.
Anything destined for network x.x.6.0 goes out through vmk2.
Anything destined for network x.x.49.0 goes out through vmk1.
Everything that does not match the above goes out to the default gateway x.x.4.254 through vmk0.

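So the user had configured vmk1 on x.x.49.0, and the vCenter server was on that same subnet. That meant any traffic destined for vCenter matched the x.x.49.0 entry and went out through vmk1, instead of through vmk0 and the default gateway as it had before vmk1 existed.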
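The effect of adding vmk1 can be sketched with a longest-prefix-match lookup, the usual rule an IP stack uses to choose between overlapping routes: the default route 0.0.0.0/0 matches everything, but loses to any more specific entry. The vCenter address 10.0.49.20, the other 10.0.x.0/24 networks, and the lookup helper are hypothetical stand-ins for the elided x.x values; a gateway of None means the destination is treated as directly connected.

```python
import ipaddress

# Routing entries as (network, gateway, interface); gateway None = directly connected.
# All addresses are hypothetical stand-ins for the post's x.x networks.
BEFORE_VMK1 = [
    ("10.0.4.0/24", None, "vmk0"),
    ("10.0.6.0/24", None, "vmk2"),
    ("0.0.0.0/0", "10.0.4.254", "vmk0"),  # the single default gateway
]
AFTER_VMK1 = BEFORE_VMK1 + [("10.0.49.0/24", None, "vmk1")]

VCENTER_IP = "10.0.49.20"  # hypothetical vCenter address on the x.x.49.0 subnet


def lookup(routes, dest):
    """Return (gateway, interface) of the most specific route containing dest."""
    addr = ipaddress.IPv4Address(dest)
    best = None
    for network, gateway, iface in routes:
        net = ipaddress.IPv4Network(network)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, iface)
    return None if best is None else (best[1], best[2])


if __name__ == "__main__":
    print(lookup(BEFORE_VMK1, VCENTER_IP))  # ('10.0.4.254', 'vmk0'): routed via the default gateway
    print(lookup(AFTER_VMK1, VCENTER_IP))   # (None, 'vmk1'): treated as local, sent out vmk1
```

Note that the lookup never checks whether vmk1 can actually reach anything; the route is chosen purely on the destination address, which is how a misconfigured interface can silently capture traffic.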

But… the user had not assigned the VLAN(49) tag to the interface. vmk1's packets therefore left the host without the correct VLAN tag, so they never made it onto VLAN(49), and the host could not communicate with vCenter. This also explains the earlier symptoms: traffic to the vmk0 subnet and to the external network still went out through vmk0, so only vCenter was unreachable.
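To make the missing tag concrete, here is a minimal Python model of IEEE 802.1Q tagging. A tagged Ethernet frame carries the TPID 0x8100 and a 12-bit VLAN ID right after the source MAC; an untagged frame arriving on a trunk is typically placed on the trunk's native VLAN, or dropped if the switch is configured not to accept untagged frames, but never on VLAN(49). The MAC addresses, the empty payload, the native-VLAN values, and the helper names are hypothetical; the first frame vmk1 would send toward vCenter is an ARP broadcast, modelled here by EtherType 0x0806.

```python
import struct

TPID_8021Q = 0x8100      # 802.1Q Tag Protocol Identifier
ETHERTYPE_ARP = 0x0806


def build_frame(dst_mac, src_mac, ethertype, payload=b"", vlan_id=None):
    """Build a minimal Ethernet II frame, optionally with an 802.1Q tag."""
    frame = dst_mac + src_mac
    if vlan_id is not None:
        # Tag Control Information: PCP=0, DEI=0, lower 12 bits = VLAN ID.
        frame += struct.pack("!HH", TPID_8021Q, vlan_id & 0x0FFF)
    frame += struct.pack("!H", ethertype)
    return frame + payload


def vlan_on_trunk(frame, native_vlan=None):
    """Return the VLAN a trunk port assigns to the frame, or None if it is dropped.

    native_vlan=None models a trunk that does not accept untagged frames.
    """
    (tag_or_type,) = struct.unpack("!H", frame[12:14])
    if tag_or_type == TPID_8021Q:
        (tci,) = struct.unpack("!H", frame[14:16])
        return tci & 0x0FFF
    # Untagged: placed on the native VLAN, if the trunk has one.
    return native_vlan


if __name__ == "__main__":
    broadcast = bytes.fromhex("ffffffffffff")
    vmk1_mac = bytes.fromhex("005056000001")  # hypothetical MAC for vmk1
    untagged = build_frame(broadcast, vmk1_mac, ETHERTYPE_ARP)
    tagged = build_frame(broadcast, vmk1_mac, ETHERTYPE_ARP, vlan_id=49)
    print(vlan_on_trunk(untagged))                 # None: dropped
    print(vlan_on_trunk(untagged, native_vlan=1))  # 1: wrong VLAN, vCenter never sees it
    print(vlan_on_trunk(tagged))                   # 49: reaches VLAN(49)
```

Either way, the untagged ARP request never reaches VLAN(49), so vCenter's address is never resolved and the host's traffic to it goes nowhere.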


Photo by Jelle Oostrom (flickr)

Always follow the route.