2013-09-13

vSphere App HA - Closer... But Not Yet

vSphere App HA has undergone a decent overhaul - with a substantial increase in the use cases. But we are not there yet.

Not so long ago I wrote a post about the missing piece in the VMware HA puzzle. And I would like to continue the discussion about how this new release completes (or does not yet complete) the puzzle.

So before we get into what exactly it does and what has changed - here are at least three articles that have already posted information about App HA.

First some disclosure. Most of this information is taken from the Beta documentation. That means that there could (or could not) be some changes when the product goes GA.

vSphere App HA lets you define high availability for the applications that are running on the virtual machines in your environment. vSphere App HA performs the following functions.

  • Provides application level visibility and control.
  • Displays application availability in the vSphere Web Client and performs remediation if the service is unavailable.
  • Uses remediation actions defined by the user such as restart service or reset virtual machine, and attempts to restart the application inside the virtual machine.
  • If an application restart fails, resets the virtual machine.

Here is what the architecture looks like.

[image: vSphere App HA architecture]

First the dry details and limitations.

vSphere App HA will be another separate virtual appliance that you have to run in your environment, with minimum requirements of 2vCPU's, 4GB RAM, 20GB disk and a 1Gbps network.

It works only with the Web Client - so the traditional vSphere client is not supported (which is expected).

The services supported by vSphere App HA are:

  • MSSQL 2005, 2008, 2008R2, 2012
  • Tomcat 6.0, 7.0
  • TC Server Runtime 6.0, 7.0
  • IIS 6.0, 7.0, 8.0
  • Apache HTTP Server 1.3, 2.0, 2.2

You can only install one vFabric Hyperic server on one vCenter server with one vSphere App HA plug-in installed (1:1:1).

vSphere App HA has a scale limitation of 400 agents.

Each guest that is to be monitored with App HA needs to have a Hyperic Agent installed and configured with the correct Hyperic Server.

Now comes the not so well-known part.

You have to have a Hyperic Server installed in your environment for this to work.

According to the vFabric Hyperic Supported Configurations and System Requirements document, you will need a vFabric Hyperic Server (Minimum of 4vCPU's and 4GB RAM) which will suffice for up to 50 managed platforms. To get to the level of what App HA can support (400 Agents) - it is recommended to allocate 4vCPU's and 8GB RAM or more.

You will also need a vFabric Hyperic Database, which by default needs 4vCPU's and 4GB RAM, but to bring that up to the level of what App HA can support it is recommended to allocate 4vCPU's and 6GB of RAM.

(it could be that these can be bundled into a single appliance - I am not sure - because I have never deployed this)

All in all, to use this new feature you will need a minimum of 10 vCPU's and about 18GB of RAM.
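For anyone who wants to check the math, here is a minimal sketch that adds up the figures quoted above (the App HA appliance plus the Hyperic Server and Database sized for 400 agents). The component labels and dictionary layout are just assumptions for this example, and the numbers come from the Beta documentation, so they may change at GA.

```python
# Figures quoted in this post, sized for the 400-agent App HA limit.
# The labels are only for this sketch, not official product names.
components = {
    "App HA appliance": {"vcpu": 2, "ram_gb": 4},
    "Hyperic Server": {"vcpu": 4, "ram_gb": 8},
    "Hyperic Database": {"vcpu": 4, "ram_gb": 6},
}

# Sum each resource across all three components.
total_vcpu = sum(c["vcpu"] for c in components.values())
total_ram_gb = sum(c["ram_gb"] for c in components.values())

print(f"{total_vcpu} vCPUs, {total_ram_gb} GB RAM")  # prints "10 vCPUs, 18 GB RAM"
```

Note that the 18GB figure uses the scaled-up Hyperic sizing; with the default 4GB for each of the three components the RAM total would be 12GB.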

In an Enterprise environment I would expect that to be acceptable - since this will only be available in the Enterprise Plus version of vSphere, the customers that will be using it should be OK with allocating these resources. The problem, though, is testing this kind of setup - this is a huge amount of resources to allocate in the lab. Considering of course you already have your vCenter in there, and vCOPS, and VDR, and VIN, and Log Insight… you see where I am going.

I once was shocked about how many resources were needed to run System Center 2012 - it looks like VMware is not far behind. You can expect a blog post …. ;)

The feedback I got in the Beta was that there will most probably be no extra charge for the Hyperic licenses. (Phew)

And of course you should not forget that you have to learn how to install a whole new line of products just to get this working (don't worry, it is only another 600-1000 pages of documentation), and when was the last time you, the VIAdmin, checked out the Hyperic forums? Honestly… ? (You would be surprised at what you will find..)

I must say that the documentation is quite thin at the moment - and I hope that for GA it improves.

Enough dry details..

What can App HA actually do?

Up until now you had VM HA monitoring - which looked at the guest - and if the guest had crashed (blue screen, i.e. no VMware tools, no disk/network activity) then it would restart the VM - according to the settings you had configured.

Now with App HA - if you are running one of the supported applications - then App HA will recognize that a service has failed, it will try and restart that service (X amount of times - according to your settings) and if that does not work, it will restart the machine.
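To make that policy concrete, here is a rough sketch of the "restart the service X times, then reset the VM" logic as I understand it. This is not the App HA or Hyperic API - `remediate`, `restart_service` and `reset_vm` are hypothetical names made up for this example, and the real product may well behave differently.

```python
from typing import Callable, List


def remediate(restart_service: Callable[[], bool],
              reset_vm: Callable[[], None],
              max_restarts: int) -> List[str]:
    """Try restarting the service up to max_restarts times, then reset the VM.

    Both callables are hypothetical stand-ins for whatever the monitoring
    agent really does. Returns a log of the actions taken.
    """
    log = []
    for attempt in range(1, max_restarts + 1):
        log.append(f"restart attempt {attempt}")
        if restart_service():
            # The service came back, so no further action is needed.
            log.append("service is back up")
            return log
    # Every restart attempt failed: fall back to resetting the VM.
    log.append("reset VM")
    reset_vm()
    return log


# Simulate a service that never comes back up.
actions = remediate(restart_service=lambda: False,
                    reset_vm=lambda: None,
                    max_restarts=3)
print(actions)
```

Notice that nothing in this loop knows why the service failed - it only sees "up" or "down".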

Yay!! (Sorry for the sarcasm) Unfortunately this still does not provide high availability for an application.

I deal with several applications that need to be highly available. Some examples that come to mind are Oracle databases, SQL databases, Exchange servers.

VMware HA or App HA still do not provide a solution for application level clustering. All of the above applications use application level clustering.

A quick run through of why this does not solve the problem

I have an SQL server protected by App HA.

  • SQL Service goes down.
  • App HA tries to restart it.
  • No go.
  • Tries again - X amount of times. 
  • No go.
  • App HA will restart the VM.
  • Perhaps when the server comes back up - then SQL may or may not start.
  • Who knows?

Why? Because, if I understand correctly, all that App HA is doing is monitoring a process, up / down. It has no idea why SQL is not working and, even more so, has no idea how to solve it.

It reminds me of the olden days (oh wait - we still do this) that when something wasn't working with your server - then let's give it a reboot and hope that it fixes it.

That is why you have a Windows Application Cluster, Oracle RAC. The application knows how to survive the failure of a service. The clustering mechanism takes care of it.

Yes, I am aware of the fact that for a good number of applications - App HA will provide a wonderful solution, even though it has a cost (both in licensing and required resources) but it will not protect your Exchange server from downtime. It will not protect your Oracle / SQL database from going down. And once the database is down - then all kinds of applications are not going to work..

This is even more important when you have a number of applications using the same database server - which makes more and more sense when you are trying to save on licensing.

You need to provide the appropriate level of protection for your application. From the requirements you have, the SLA will be defined.

Please do not assume that just because you can restart your SQL server when the service goes down, you will meet an SLA. Could it solve your problem? Perhaps..

But would you bet your job/career on it? That is the question.

So for me the puzzle is still not complete. Which reminds me I still have about 250 pieces to get done in my 1000 piece Saving Piglet puzzle that I am doing with the kids.

Update - 16/09/2013

As a result of Jeff Hunter's comment below, I would just like to clarify - App HA is being presented (at the time of this update - screenshot below) as an alternative to application-specific clustering solutions, as can be seen on the vSphere Features page.

[image: screenshot of the vSphere Features page]