Got a call today.
Panic!!!!
All VMs on an ESX host just went grey – all disconnected.
Troubleshooting steps:
- Ping ESX host Service Console – All ok
- Check the server in the VI Client – NOT OK – all machines are greyed out (hey, that is what they said, wasn't it?).
- SSH into the Service console - All ok
- Connect the VI Client directly to the host – NOT OK. Could not load the inventory.
- All VMs on the host were running and responding to ping.
- No failover was initiated in the cluster.
- On the console I saw 7 vmware-hostd processes, each using a lot of RAM.
- Ran service mgmt-vmware stop to stop the management agent – GOT STUCK.
- Off to this KB, which helped me stop the service and get the host responsive again:
# cd /var/run/vmware
# ls -l vmware-hostd.PID watchdog-hostd.PID (check that the PID files are there)
# cat vmware-hostd.PID (shows the current PID of the process, e.g. 1234)
# kill -9 <PID> (kill the process)
# rm vmware-hostd.PID watchdog-hostd.PID (remove the stale PID files)
# service mgmt-vmware start (restart the agent)
The host came back online – the VMs were no longer grey.
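For next time, the manual steps above could be wrapped in a small script. This is only a rough sketch and I have not run it on a production host. RUN_DIR, DRY_RUN and the function names are my own inventions for illustration, not part of any VMware tool, and by default it only prints what it would do.

```shell
#!/bin/sh
# Sketch only: wraps the manual hostd recovery steps.
# RUN_DIR and DRY_RUN are hypothetical knobs, not VMware settings.
RUN_DIR="${RUN_DIR:-/var/run/vmware}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = really run them

# Print the PID stored in a PID file.
# Fails if the file is unreadable or does not contain a plain number.
read_pid() {
    pid_file="$1"
    [ -r "$pid_file" ] || return 1
    pid=$(tr -d '[:space:]' < "$pid_file")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    echo "$pid"
}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Kill the hung hostd, remove the stale PID files, start the agent again.
restart_hostd() {
    if hostd_pid=$(read_pid "$RUN_DIR/vmware-hostd.PID"); then
        run kill -9 "$hostd_pid"
    else
        echo "no usable PID in $RUN_DIR/vmware-hostd.PID" >&2
    fi
    run rm -f "$RUN_DIR/vmware-hostd.PID" "$RUN_DIR/watchdog-hostd.PID"
    run service mgmt-vmware start
}
```

To use it, source the file on the service console and call restart_hostd once to review the printed commands, then set DRY_RUN=0 and call it again to actually run them. Checking that the PID is a plain number before calling kill -9 guards against an empty or damaged PID file.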
Here start my questions.