2010-01-12

Bad VM! Bad VM!!

I had an issue today with a slowly responding ESX 3.5 Host.

Looking at the host, I could see that pCPU0 was running constantly at 90-100%.

In esxtop I saw that the console process was showing 50-70% in the %USED column.

Back in top on the host, I saw that vmkload_app was sitting at a steady 30-50% CPU.

The output below is similar to what I was seeing:

TOP

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 7645 root       5 -10   920  920   600 S    30.9  0.3   0:26   0 vmkload_app

ESXTOP

PCPU(%):  95.06,  13.33,  ……… 

ID    GID NAME      NWLD   %USED    %RUN    %SYS  
9      9  console   1      4.40     4.43    0.01 

A Google search turned up these suggestions:

  1. service mgmt-vmware restart
  2. service vmware-vpxa restart
  3. service pegasus restart
  4. Disable HA and re-enable on the Cluster.

None of them helped.

So I tried to get some more information about the process:

ps -ef | grep 7645

and got this:

root      7645     1  0 15:00 ?        00:00:31 /usr/lib/vmware/bin/vmkload_app /usr/lib/vmware/bin/vmware-vmx -ssched.group=host/user -# name=VMware ESX Server;version=3.5.0;licensename=VMware ESX Server;licenseversion=2.0 build-143128; -@ pipe=/tmp/vmhsdaemon-0/vmx8bb2cfd461217725; /vmfs/volumes/49af9c6a-c4e3adf1-e61

Notice that the really interesting part (the VM name) is cut off. I tried piping the output to a file: nope. I tried other options: nope.

So how did I find which VM it was?

Every running process has a directory under /proc. Remember the process ID (7645)?

[ESX]# cat /proc/7645/cmdline
/usr/lib/vmware/bin/vmkload_app/usr/lib/vmware/bin/vmware-vmx-ssched.group=host/user-#name=VMware ESX Server;version=3.5.0;licensename=VMware ESX Server;licenseversion=2.0 build-143128;-@pipe=/tmp/vmhsdaemon-0/vmx8bb2cfd461217725;/vmfs/volumes/49af9c6a-c4e3adf1-e616-001e0bd66d9a/FILE_LOADER/FILE_LOADER.vmx
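The arguments run together in that output because /proc/<pid>/cmdline separates them with NUL bytes rather than spaces. As a small sketch (the function name full_cmdline is my own, and I am assuming tr is available on the service console), the NULs can be turned into newlines so each argument lands on its own line:

```shell
# Print the command line of a process, one argument per line.
# /proc/<pid>/cmdline stores the arguments separated by NUL bytes,
# so a plain `cat` glues them together.
# full_cmdline is a hypothetical helper name, not a VMware tool.
full_cmdline() {
    tr '\0' '\n' < "/proc/$1/cmdline"
}

# Example (PID taken from the top output above):
#   full_cmdline 7645
```

Since the .vmx path is the last argument in the cmdline shown above, adding `| tail -n 1` to the call should leave just that path.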

I looked at the VM and it seemed to be performing correctly. Still, we power cycled it, and that cleared up the process and released the resources on the ESX host.

Two questions I still have:

  1. Is there another way of getting the full command of the process?
  2. What caused this VM to go Meshuga?

Update: Thanks to Nitro for the easier way of doing this:

ps -efww
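With the untruncated output from ps -efww, the PID and the .vmx file can be pulled out in one go. This is only a sketch: list_vm_processes is a name I made up, and it assumes the PID is the second column and that the .vmx path starts with /vmfs/, as it does in the output above.

```shell
# Read `ps -efww` style lines on stdin and print "PID /path/to/vm.vmx"
# for every vmware-vmx process.
# Assumptions: PID is column 2, and the .vmx path begins with /vmfs/.
# list_vm_processes is a hypothetical helper name.
list_vm_processes() {
    awk '/vmware-vmx/ && match($0, /\/vmfs\/.*\.vmx/) {
        print $2, substr($0, RSTART, RLENGTH)
    }'
}

# Example:
#   ps -efww | list_vm_processes
```

Matching the PID from top against the first column of that list then gives the VM without a trip through /proc.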