Sun’s Grid Engine now features cloud bursting and Apache Hadoop integration

Sun (or is that Oracle…) has released a new version of its Grid Engine which brings it into the cloud.

There are two main additions in this release. The first is integration with Apache Hadoop: Hadoop jobs can now be submitted to Grid Engine as if they were any other compute job. Grid Engine also understands Hadoop’s distributed file system (HDFS), which means it can send work to the part of the cluster that already holds the data (data affinity).
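As a rough illustration of what a submission might look like, here is a minimal sketch using Grid Engine’s qsub from Python. The parallel environment name (hadoop), the slot count and the wrapper script are assumptions for illustration; the actual PE name depends on how the integration is configured at your site.

```python
import subprocess

# Minimal sketch: submitting a Hadoop job to Grid Engine like any other job.
# Assumes a site-configured parallel environment named "hadoop" (hypothetical
# name) and a wrapper script run_wordcount.sh that invokes "hadoop jar ...".
subprocess.run(
    [
        "qsub",
        "-N", "wordcount",      # job name
        "-pe", "hadoop", "16",  # request 16 slots from the hadoop PE
        "run_wordcount.sh",     # wrapper that launches the Hadoop job
    ],
    check=True,  # raise if qsub rejects the submission
)
```

Because the scheduler knows where HDFS blocks live, those slots can be placed on nodes that already hold the job’s input data.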

The second is dynamic resource reallocation, which includes the ability to use on-demand resources from Amazon EC2. Grid Engine is now also able to manage resources across logical clusters, which can be either in the cloud or outside it. This means that Grid Engine can now be configured to “cloud burst” depending on load, which is a great feature. The integration is specifically set up for EC2 and enables scaling down as well as scaling up.
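To make the cloud-bursting idea concrete, here is a minimal sketch of the kind of load-driven decision loop it implies. This is not Grid Engine’s actual implementation; the thresholds and the provision/release helpers are hypothetical stand-ins for real EC2 API calls.

```python
import time

SCALE_UP_THRESHOLD = 50  # queued jobs that trigger a burst (illustrative)
SCALE_DOWN_IDLE = 2      # idle cloud nodes tolerated before shrinking

def pending_jobs() -> int:
    """Stub: ask the scheduler how many jobs are queued (e.g. parse qstat)."""
    return 0

def idle_cloud_nodes() -> int:
    """Stub: count EC2-backed execution hosts with no running jobs."""
    return 0

def provision_ec2_node() -> None:
    """Stub: start an EC2 instance and register it as an execution host."""

def release_ec2_node() -> None:
    """Stub: drain an idle EC2 instance and terminate it."""

def burst_loop() -> None:
    # Scale up when the queue backs up; scale back down when nodes sit idle.
    while True:
        if pending_jobs() > SCALE_UP_THRESHOLD:
            provision_ec2_node()
        elif idle_cloud_nodes() > SCALE_DOWN_IDLE:
            release_ec2_node()
        time.sleep(60)  # re-evaluate load once a minute
```

The point is the symmetry: the same loop that bursts out to EC2 under load also releases instances when the queue drains, so you pay only for what you use.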

This release of Grid Engine also implements a usage accounting and billing feature called ARCo (the Accounting and Reporting Console), making it truly SaaS-ready as it is able to cost and bill jobs.
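ARCo itself is a reporting console built on Grid Engine’s accounting records; as a rough illustration of the raw data that feeds such billing, per-owner usage figures can be pulled from the command line with qacct (the 30-day window below is just an example).

```python
import subprocess

# Sketch: pull per-owner resource usage from Grid Engine's accounting data.
# "qacct -o" summarises consumed resources (wall clock, CPU, memory) per job
# owner; "-d 30" restricts the report to jobs finished in the last 30 days.
report = subprocess.run(
    ["qacct", "-o", "-d", "30"],
    capture_output=True, text=True, check=True,
).stdout
print(report)  # the raw usage a billing layer such as ARCo builds on
```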

Impressive and useful stuff, and if you are interested in finding out more you can do so here.

Is average utilisation of servers in data centres really between 10 and 15%?

There has been an interesting discussion on the Cloud Computing forum hosted on Google Groups (and if you are at all interested in cloud computing I recommend you join it, as it really does have some excellent discussions). What has been interesting about it from my viewpoint is the general consensus that average CPU utilisation in organisational data centres runs between 10 and 15%. Some snippets of the discussion below:

Initial statement on the group discussion

The Wall Street Journal article “Internet Industry is on a Cloud” does not do cloud computing any justice at all.

First: the value proposition of cloud computing is crystal clear. Averaged over 24 hours a day, 7 days a week, 52 weeks a year, most servers have a CPU utilization of 1% or less. The same is also true of network bandwidth. Storage capacity on hard disks that can be accessed only from a specific server is also underutilized. For example, the capacity of hard disks attached to a database server is used only when certain queries require intermediate results to be stored to disk. At all other times that capacity is not used at all.

First response to the statement above on the group

Utilization of *** 1% or less *** ???

Who fed them this? I have seen actual collected data from 1000s of customers showing server utilization, and it’s consistently 10-15%. (Except mainframes.) (But including big proprietary UNIX systems.)

Second response:

Mea culpa. My 1% figure is not authoritative. It is based on my experience with a specific set of servers:

J2EE application servers: only one application is allowed per cluster of servers. So if you had 15% utilization when you designed the application 8 years ago, on current servers it could be 5% or less. With applications that are used only a few hours per week, 1% is certainly possible. The other set of servers for which utilization is really low are departmental web servers and mail servers.

Third response:

Actually, it was across a very large set of companies that hired IBM Global Services to manage their systems. Once a month, along with a bill, each company got a report on outages, costs, … and utilization.

A friend of mine heard of this, and asked “are you, by any chance, archiving those utilization numbers anywhere?” When the answer came back “Yes” — you can guess the rest. He drew graphs of # of servers at a given utilization level. He was astonished that for every category of server he had data on, the graphs all peaked between 10% and 15%. In fact, the mean, the median, and the mode of the distributions were all in that range. Which also indicates that it’s a range. Some were nearer zero, and some were out past 90%. That yours was 1% is no shock. 

Fourth response:

This is no surprise to me, as HPC packages like Sun Grid Engine working on batch jobs can push utilization close to 90%. We had data showing that, without a workload manager of some sort, average utilization is 10% to 15%, confirming what you discovered.

This means that, worldwide, 85% to 90% of the installed computing capacity is sitting idle. Grids improved this utilization rate dramatically, but grid adoption was limited.
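To put a rough number on that claim, a back-of-the-envelope calculation (the fleet size is purely illustrative):

```python
# Back-of-the-envelope: what 10-15% average utilization means in idle capacity.
servers = 1000           # illustrative fleet size
avg_utilization = 0.125  # midpoint of the 10-15% range from the discussion

idle_fraction = 1 - avg_utilization
print(f"Idle capacity: {idle_fraction:.0%}")                      # -> 88%
print(f"Equivalent idle servers: {servers * idle_fraction:.0f}")  # -> 875
```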

If this is not an argument for virtualisation in private data centers / clouds then I don’t know what is. It should also be a big kicker for those who who are considering moving applications to public clouds, out of the data centre and the racks of machines spinning their wheels. It is also a good example of companies planning for Peak capacity (see our previous blog on this). What is really needed is scale on demand and hybrid cloud / Grid technologies such as GigaSpaces which can react to Peak loading in real-time. Consider not only the wasted cost but also the “Green computing” cost for the running of hordes of machines running at 15% capacity….