2010-10-31

Can We move to only one Physical CPU?



I started reading Eric Siebert's book this afternoon - Maximum vSphere: Tips, How-Tos, and Best Practices for Working with VMware vSphere 4 (which is a great book - highly recommended!!), and for some reason, during the part where he discusses licensing and features, a thought crossed my mind.

I put out a feeler this evening on Twitter with this question:

Quick twitter poll - what is your average CPU usage on your ESX hosts? <25% - <50% - <75% - >75% - Interested to hear... (Sat Oct 30 18:43:27 via TweetDeck)

The answers I received all pointed to the same conclusion.

The constraint that almost everyone hits first is RAM, not CPU. Some admins cannot expand the amount of RAM in their servers, because the cost of the bigger DIMMs is too high and there are not enough slots left in the server. That leaves them with servers that are nearing memory capacity but are nowhere close to fully utilizing the CPU power of the server.

Many people are purchasing dual-socket servers for redundancy or because of the fear of not having the server perform well enough.

From my own environment I can say that my hosts are utilizing around 30% of their CPU, with 2 quad-core CPUs. And from the answers I got tonight to my question above, the results are pretty much the same.

Now perhaps a sacrilegious thought. What would happen if we only used one physical processor in a server?

Today we are talking about six- or eight-core processors, and this number is rising. The number of cores available in a single processor is more or less the same as what the majority of people are using today: 8 cores, from 2 x quad-core processors.

Now you might ask: but don't I lose the redundancy here? This could be true, but how many of you have actually lost a CPU due to malfunction in a server? I personally have not. Ever. I would also suppose that if a physical CPU barfs on you during a production workload, it will not be pretty. The VMs that were running on that processor will obviously keel over and die, but I suppose the rest of the host will not be happy either. From my experience with faulty memory, you are more likely to crash the whole host with a PSOD than to have the host keep functioning with one DIMM less. I guess that with a CPU it will probably be the same. So having a second CPU does not really give you redundancy. I could be wrong here, and if so I would appreciate your feedback with more information.

Now I am sure there are other implications here, regarding the spread of memory and load over the two channels from both processors, and I am also sure that there are other internal ESX performance implications as well. So it is not a simple matter.

How will this change the game though? Well it will cut costs - in two ways.

  1. Licensing. ESX licenses are now counted per processor, and not per set of 2. Removing one processor from a dual-socket host will lower that host's ESX licensing costs by half.
  2. Server hardware. With one processor less, you are able to cut costs on each server.
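
To put rough numbers on the two points above, here is a minimal sketch of the arithmetic. All prices are made-up placeholders (not VMware or vendor list prices), and the `host_cost` helper is purely illustrative; plug in your own quotes.

```python
def host_cost(sockets, cpu_price, license_per_socket, chassis_price):
    """Cost of one host: chassis plus, per populated socket, one CPU and one license."""
    return chassis_price + sockets * (cpu_price + license_per_socket)


# Hypothetical placeholder prices in USD - replace with real quotes.
CHASSIS_PRICE = 4000.0       # server without CPUs (assumption)
CPU_PRICE = 700.0            # one physical processor (assumption)
LICENSE_PER_SOCKET = 3000.0  # one per-processor ESX license (assumption)

dual_socket = host_cost(2, CPU_PRICE, LICENSE_PER_SOCKET, CHASSIS_PRICE)
single_socket = host_cost(1, CPU_PRICE, LICENSE_PER_SOCKET, CHASSIS_PRICE)

# Licensing share only: one license instead of two.
dual_licensing = 2 * LICENSE_PER_SOCKET
single_licensing = 1 * LICENSE_PER_SOCKET

print(f"Dual-socket host:   {dual_socket:.2f}")
print(f"Single-socket host: {single_socket:.2f}")
print(f"Saving per host:    {dual_socket - single_socket:.2f}")
print(f"Licensing per host: {dual_licensing:.2f} -> {single_licensing:.2f}")
```

With per-processor licensing, the license line is halved regardless of the prices you enter; how much the total drops depends on how large the chassis cost is relative to the CPU and license.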

So are we destined to run only 1-socket ESX hosts? I would be interested in hearing your thoughts and insights on this one.

4 comments:

Jannie said...

Most of the servers we sold into SMBs (for virtualisation or otherwise) for the past three years have been single-processor for this very reason. Things are changing slightly now that Essentials Plus permits three dual-processor hosts, though. CPUs are "cheap" (<US$700 for an E5520) but licenses are not.

We are an HP partner and service a lot of kit. If we see one CPU failure a year over thousands of systems, it's a lot; and it's usually in servers that are 5+ years old - not your average vSphere host... We rather advise our clients to spend the money on guaranteed-fix (as opposed to response) maintenance contracts. System board failures are much more common than CPU failures - if I had to buy insurance, I'd rather buy something that protects the entire system than just one component. The box is going down anyway, I don't gain THAT much if I bring that 6-hour call-to-repair window down to 30 minutes if I have HA.

One final note: Personally (and I do quite a bit of x86 pre-sales), I have never been able to justify upgrading a system with additional processors. Invariably, by the time an upgrade is required, it's less expensive to buy a new server than upgrading the original system with additional processors and memory. How do you defend spending as much money on a system with its maintenance about to expire, as you would on a brand new system?

I find it incredibly frustrating when a customer insists on paying a premium for 4-socket technology "because we might want to buy more CPUs in future." In most cases I've been able to talk them out of it, with no ill effects three years down the line. (2-socket systems are another thing, as most decent entry-level systems are 2-socket anyway. Due to volumes, there is usually no cost penalty compared to 1-CPU systems. Therefore I have no beans about running a 2-socket system with 1x CPU.)

Jtaylor said...

I've had two hosts go down due to a malfunctioning CPU, so your assumption is correct (at least in my experience). I prefer having more CPUs just so I can load up my hosts and have fewer total hosts, but that's just me.

Mikidutzaa said...

Even if you see only 30% usage on your server, that doesn't mean you are wasting CPU. It means you have reserve for instantaneous spikes. Don't be fooled by averages!

Check CPU Ready%. It should be less than 1% if your CPU is "wasted", and I bet you it is not, especially if you have multi-vCPU machines.

ChrisVirt said...

Your article has made me wonder the same thing as well.

Some systems require both physical CPUs to be populated to have full access to the expanded memory options (like the Dell R710). Also, there is generally no redundancy configuration for the processors themselves... however, if there were the ability to mirror the two processors to cover a failed processor, that would bring a very interesting angle to this scenario.