2009-05-14

Enable EVC on a Cluster

One of the wonderfully useful features released with ESX 3.5 U2 was Enhanced VMotion Compatibility (EVC).

From the release notes:

Enhanced VMotion Compatibility – Simplifies VMotion compatibility issues across CPU generations. Enhanced VMotion compatibility (EVC) automatically configures server CPUs with Intel FlexMigration or AMD-V Extended Migration technologies to be compatible with older servers. After EVC is enabled for a cluster in the VirtualCenter inventory, all hosts in that cluster are configured to ensure CPU compatibility for VMotion. VirtualCenter does not permit the addition of hosts that cannot be automatically configured to be compatible with those already in the EVC cluster.
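To make the idea of a common CPU baseline concrete, here is a rough sketch in Python. It is only a toy model of the concept, not how VirtualCenter implements EVC: the three feature flags and the function names are made up for illustration, and real CPUs expose far more feature bits through CPUID.

```python
# Toy model of a cluster-wide CPU feature baseline.
# The flags below are illustrative placeholders, not a real CPUID layout.
SSE3 = 1 << 0
SSSE3 = 1 << 1
SSE4_1 = 1 << 2


def visible_features(host_features, baseline_mask):
    """Features a VM would see once the baseline hides everything else."""
    return host_features & baseline_mask


def can_migrate(source_features, dest_features, baseline_mask=None):
    """Simplified rule: the VM must see the same features on both hosts."""
    if baseline_mask is None:
        # No baseline: the raw feature sets are compared directly.
        return source_features == dest_features
    return (visible_features(source_features, baseline_mask)
            == visible_features(dest_features, baseline_mask))


older_host = SSE3 | SSSE3
newer_host = SSE3 | SSSE3 | SSE4_1
baseline = SSE3 | SSSE3  # what the oldest CPU generation in the cluster offers

print(can_migrate(older_host, newer_host))            # False
print(can_migrate(older_host, newer_host, baseline))  # True
```

The point of the sketch is that the newer host does not lose anything physically. The extra feature is simply hidden from the VMs, so every host in the cluster looks the same to them.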

Why am I blogging about this now? This week I came across a situation with a cluster that we wanted to enable EVC on. This is how the story goes.

The client wanted to upgrade the RAM on two 100% identical hosts (same server, same CPU, same configuration, same BIOS level, everything). To bring them down, he first needed to VMotion the VMs off. For some reason some VMs (mainly 64-bit RHEL) would not VMotion off the host, and we were getting CPU mismatch errors.

We powered down the problematic VMs and cold migrated them to another host, then upgraded the RAM and, at the same time, the BIOS. We brought the first host back up, migrated the VMs again (cold migrations for the problematic ones), and brought down the second host. It got the same RAM and BIOS upgrades, and then we brought it up again.

After that, VMotion between the hosts worked fine. But to prevent such issues in the future, he wanted to enable EVC. The catch is that you cannot enable EVC on a cluster that currently has hosts inside of it:

 

image

As is detailed in this post – what you need to do essentially is:

  • Put the host in maintenance mode
  • Remove the host from the original cluster
  • Create a new EVC-enabled cluster
  • Add the host to it (check the NX/XD and Intel VT options in the system BIOS; both need to be enabled)
  • VMotion the VMs from the old cluster to the new one
  • Remove the old cluster

So I started the process, and when I tried to VMotion a VM from the non-EVC cluster into the new EVC-enabled cluster, I got this error:

image

KB 1993, “VMotion CPU Compatibility - Migrations Prevented Due to CPU Mismatch - How to Override Masks”, gives a wealth of information. But to change the masking for a VM you need to power it down (and if I were going to power the VM off anyway, I could simply cold migrate it into the new cluster), or enable a global setting in the vpxd.cfg file (the details are in the KB). The global setting I was not going to touch, because these are not the only hosts and clusters in the datacenter.

We rechecked that both the VT and NX/XD options were enabled in the BIOS, and that the output of

cat /proc/vmware/cpuinfo

on the ESX hosts was identical.
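Comparing that output by eye gets tedious, so here is a minimal sketch of how the check could be scripted. It assumes the output from each host has already been saved as text (for example by redirecting the command above to a file and copying it off the host). The function name `cpuinfo_diff` and the sample strings are hypothetical placeholders, not real cpuinfo output.

```python
import difflib


def cpuinfo_diff(text_a, text_b, name_a="host-a", name_b="host-b"):
    """Return unified-diff lines between two saved outputs (empty list if identical)."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    return list(difflib.unified_diff(lines_a, lines_b,
                                     fromfile=name_a, tofile=name_b,
                                     lineterm=""))


if __name__ == "__main__":
    # Placeholder text standing in for the saved output of each host.
    sample_a = "field-one value-1\nfield-two value-2"
    sample_b = "field-one value-1\nfield-two value-3"

    differences = cpuinfo_diff(sample_a, sample_b)
    if differences:
        print("\n".join(differences))
    else:
        print("Outputs are identical")
```

An empty result only shows that the two text dumps match. It says nothing about whatever compatibility check VirtualCenter itself performs, which is where the error was coming from in my case.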

Carter Shanklin suggested suspending the VMs and then trying to VMotion them, but that did not help either.

So I guess I will have to resort to shutting down all the VMs, cold migrating them into the new EVC-enabled cluster, and then powering them on again.

What do you think could be the cause of this kind of behavior? It should work seeing that both hosts are identical.

Unfortunately, I do not have another solution at this time.