GigaSpaces release 7.1 of XAP Cloud enabled Middleware – certified for use on Cisco UCS

The upcoming release of GigaSpaces XAP includes the ‘Elastic Data Grid’, which enables deploying a fully clustered application with a single API call. Users specify their business requirements and XAP automatically performs sizing, hardware provisioning, configuration and deployment. The aim is to simplify deployment, reducing effort and cost for enterprise applications that require dynamic scalability.
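To make the sizing step concrete, the calculation behind "specify requirements, get a cluster" looks roughly like the sketch below. This is purely illustrative Python, not the actual GigaSpaces API; the function name and inputs are our own assumptions.

```python
import math

def plan_capacity(dataset_gb, growth_factor, container_gb=8, backups_per_partition=1):
    """Hypothetical sizing step: turn business requirements into a cluster plan."""
    required_gb = dataset_gb * growth_factor                 # headroom for growth
    partitions = math.ceil(required_gb / container_gb)       # primary data partitions
    containers = partitions * (1 + backups_per_partition)    # add hot backups for HA
    return {"partitions": partitions,
            "containers": containers,
            "total_memory_gb": containers * container_gb}

# 100 GB of data with 50% growth headroom, 8 GB containers, one backup each
print(plan_capacity(100, 1.5))
```

The point is that the user states a data volume and availability requirement; the platform derives the partition count, container count and memory footprint, then provisions accordingly.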

Other features of the XAP 7.1 release include:

  • Certified for use with Cisco UCS, providing enhanced performance
  • Built-in multi-tenancy
  • Extended in-memory querying capabilities
  • Real-time distributed troubleshooting
  • Multi-core utilization

More detail can be found on the GigaSpaces website.

Sun’s Grid Engine now features Cloud burst and Apache Hadoop Integration

Sun (or is that Oracle…) has released a new version of its Grid Engine which brings it into the cloud.

There are two main additions in this release. The first is integration with Apache Hadoop: Hadoop jobs can now be submitted to Grid Engine as if they were any other computation job. Grid Engine also understands Hadoop’s global file system, which means it is able to send work to the part of the cluster that holds the relevant data (data affinity).

The second is dynamic resource reallocation, which includes the ability to use on-demand resources from Amazon EC2. Grid Engine is also now able to manage resources across logical clusters, which can be either on or off the cloud. This means that Grid Engine can now be configured to “cloud burst” depending on load, which is a great feature. The integration is set up specifically for EC2 and enables scale-down as well as scale-up.
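In essence, a cloud-burst policy compares queued work with local capacity and rents nodes for the overflow. The sketch below is our own simplification in Python (the slot counts and node cap are invented for illustration); real Grid Engine/EC2 bursting is driven by scheduler configuration rather than hand-written code.

```python
def nodes_to_burst(pending_jobs, local_slots, slots_per_cloud_node, max_cloud_nodes=10):
    """Decide how many on-demand EC2 nodes to run for work beyond local capacity."""
    overflow = max(0, pending_jobs - local_slots)
    if overflow == 0:
        return 0  # scale down: no cloud nodes needed
    needed = -(-overflow // slots_per_cloud_node)  # ceiling division
    return min(needed, max_cloud_nodes)            # cap spend

print(nodes_to_burst(120, 100, 8))   # 3: rent three nodes for the 20-job overflow
print(nodes_to_burst(60, 100, 8))    # 0: local cluster can cope, release cloud nodes
```

Running a loop like this on the scheduler's job queue is what lets the cluster scale down as well as up when the spike passes.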

This release of Grid Engine also implements a usage accounting and billing feature called ARCo, making it truly SaaS-ready as it is able to cost and bill jobs.

Impressive and useful stuff, and if you are interested in finding out more you can do so here.

GigaSpaces finds a place in the Cloud

A new report from analysts The 451 Group outlines the success to date that GigaSpaces has had in the cloud sector. The report notes that GigaSpaces now has 76 customers using its software on cloud-computing platforms, up from 25 on Amazon’s EC2 in February. GigaSpaces has moved its cloud strategy forward in recent weeks, announcing support for deployments on GoGrid and, more recently, tighter integration with VMware, which enables GigaSpaces to dynamically manage and scale VMware instances and let them participate in the scaling of GigaSpaces-hosted applications.

GigaSpaces has a number of hybrid deployments, in which its application stack is hosted in the cloud while the data or services are hosted on premise, and these have had some notable successes.

The GigaSpaces product provides a strong cloud middleware stack that holds logic, data, services and messaging in memory, underpinned by real-time Service Level Agreement enforcement at the application level, which enables the stack to scale up and out in real time based on SLAs set by the business. Because everything is held in memory it is faster than alternative ways of building enterprise-scale applications in the cloud, and it has sophisticated synchronisation services that enable asynchronous (or synchronous) persistence of data to a database or other persistent store.
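That asynchronous persistence is essentially the classic write-behind pattern: writes complete in memory immediately and are flushed to the backing store in the background. A generic sketch in Python (this models the pattern only, not GigaSpaces’ actual mirror/persistency API):

```python
import queue
import threading

class WriteBehindStore:
    """In-memory store that persists updates to a backing store asynchronously."""

    def __init__(self, persist_fn):
        self.cache = {}
        self.pending = queue.Queue()
        self.persist_fn = persist_fn   # e.g. wraps an INSERT/UPDATE against a DB
        threading.Thread(target=self._drain, daemon=True).start()

    def put(self, key, value):
        self.cache[key] = value         # in-memory write returns immediately
        self.pending.put((key, value))  # durable write happens in the background

    def _drain(self):
        while True:
            key, value = self.pending.get()
            self.persist_fn(key, value)
            self.pending.task_done()

# Demo with a dict standing in for the database
db = {}
store = WriteBehindStore(lambda k, v: db.__setitem__(k, v))
store.put("trade-1", {"qty": 100})
store.pending.join()   # for the demo, wait until the background flush completes
print(db["trade-1"])   # {'qty': 100}
```

The caller's latency is that of a memory write; the database write happens on its own schedule, which is what makes the in-memory stack fast while still ending up durable.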

Supporting SLAs on the Cloud

What does it take to make a cloud computing infrastructure enterprise ready? Well, as always, this probably depends on the use case, but support for real-time scaling and SLA support must figure highly.

Software that purports to scale applications on the cloud is not new; have a look at our prior blog post on this topic and you will see some of the usual suspects, such as RightScale and Scalr. A new offering in this space is Tibco Silver. Tibco Silver is trying to solve the problem not of whether cloud services can scale, but of whether the applications themselves can scale with them. Silver addresses this through ‘self-aware elasticity’. Hmmm… sounds good, but what exactly does that mean? It means the system can automatically provision new cloud capacity (be that storage or compute) in response to fluctuations in application usage.

According to Tibco, unlike services in a service-oriented architecture, cloud services are not aware of the SLAs to which they are required to adhere, and Tibco Silver is aimed at providing this missing functionality. Tibco claims that “self-aware elasticity” is something no other vendor has developed. I would dispute this. GigaSpaces XAP, with its ability to deploy to the cloud as well as on-premise using the same technology, has very fine-grained application-level SLA control: when an SLA is breached, the application can react accordingly, whether that be increasing the number of threads, provisioning new instances or distributing workloads in a different way. The GigaSpaces Service Grid technology, which originated from Sun’s Rio project, enables this real-time elasticity. (Interestingly, it seems GigaSpaces is doing some work on enabling its cloud tools to deploy to and manage VMware images on private clouds, as it does with AMIs on Amazon’s public cloud.)
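Mechanically, application-level SLA enforcement is a control loop: sample application metrics, compare them against the SLA, and choose a corrective action. A toy illustration in Python (the metric names, thresholds and actions here are our own, not any vendor’s API):

```python
def sla_actions(metrics, sla):
    """Return the corrective actions implied by breached application-level SLAs."""
    actions = []
    if metrics["latency_ms"] > sla["max_latency_ms"]:
        actions.append("increase worker threads")
    capacity = sla["max_load_per_instance"] * metrics["instances"]
    if metrics["requests_per_sec"] > capacity:
        actions.append("provision new instance")
    return actions

sla = {"max_latency_ms": 200, "max_load_per_instance": 400}
print(sla_actions({"latency_ms": 250, "requests_per_sec": 900, "instances": 2}, sla))
# ['increase worker threads', 'provision new instance']
print(sla_actions({"latency_ms": 90, "requests_per_sec": 300, "instances": 2}, sla))
# []
```

The difference between infrastructure-level and application-level SLAs is simply which metrics feed this loop: CPU and uptime versus request latency and business throughput.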

Without a doubt, the ability to react in real time to application-level SLAs, rather than just to breaches of an SLA at the infrastructure level, is something that will find a welcome home in both private and public clouds.

Cloud Best Practice – what to be aware of!

Some of the key things to think about when putting your application on the cloud are discussed below. Cloud computing is relatively new and best practice is still being established. However, we can learn from earlier technologies and concepts such as utility computing, SaaS, outsourcing and even internal enterprise data centre management, as well as from experience with vendors such as Amazon and FlexiScale.

Licensing: If you are using the cloud for spikes or overspill, make sure that the products you want to use in the cloud are licensed for that kind of use. Certain products restrict how their licences may be used in the cloud. This is especially true of commercial Grid, HPC and DataGrid vendors.

Data transfer costs: When using a provider like Amazon with a detailed cost model, make sure that any data transfers are internal to the provider’s network rather than external. In the case of Amazon, internal traffic is free but you will be charged for any traffic over the external IP addresses.
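As a rough illustration of why this matters, the arithmetic is simple (the per-GB rate below is a placeholder; Amazon’s actual bandwidth pricing is tiered and changes over time):

```python
def monthly_transfer_cost(gb_out, external_fraction, rate_per_gb=0.17):
    """Only traffic leaving the provider's network is billed; internal traffic is free."""
    return gb_out * external_fraction * rate_per_gb

# 1 TB/month, all kept internal vs. all routed over external IPs
print(monthly_transfer_cost(1000, 0.0))  # 0.0
print(monthly_transfer_cost(1000, 1.0))  # roughly 170 dollars at the assumed rate
```

Routing chatty inter-server traffic over public IPs can therefore turn a free internal flow into a recurring monthly bill.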

Latency: If you have low-latency requirements then the cloud may not be the best environment in which to achieve them. If you are running an ERP or similar system in the cloud then the latency may be good enough, but if you are trying to run a binary or FX exchange then the latency requirements are of course very different and more stringent. It is essential to understand the performance requirements of your application and to have a clear understanding of what is deemed business critical.

One Grid vendor that has focused on attacking low latency in the cloud is GigaSpaces, so if you require low latency in the cloud they are one of the companies you should evaluate. For processing distributed data loads there is also the MapReduce pattern and Hadoop; these types of architecture eliminate the boundaries created by scale-out, database-based approaches.

State: Check whether your cloud infrastructure provider offers persistence. When an instance is brought down and then back up, all local changes are wiped and you start with a blank slate. This obviously has ramifications for instances that need to store user or application state. To combat this on their platform, Amazon delivered EC2 persistent storage, in which data can remain linked to a specific computing instance. You should ensure you understand the state limitations of any cloud computing platform that you work with.

Data regulations: If you are storing data in the cloud you may be breaching data laws depending on where your data is stored, i.e. in which country or continent. To combat this, Amazon S3 now supports location constraints, which allow you to specify where in the world to store the data for a bucket, and provides a new API to retrieve the location constraint of an existing bucket. If you are using another cloud provider, you should check where your data is stored.
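For reference, with S3 the location is fixed at bucket creation time by including a location constraint document in the create-bucket request; the body looks like the following (here ‘EU’ was the constraint for European storage at the time; check the S3 documentation for the current set of values):

```xml
<CreateBucketConfiguration>
  <LocationConstraint>EU</LocationConstraint>
</CreateBucketConfiguration>
```

Omitting the document places the bucket in the default US location, so regional placement has to be an explicit decision.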

Dependencies: Be aware of the dependencies of service providers. If service ‘y’ depends on service ‘x’, then if you subscribe to service ‘y’ and service ‘x’ goes down, you lose your service. Always check for such dependencies when you are using a cloud service.

Standardisation: A major issue with current cloud computing platforms is that there is no standardisation of the APIs and platform technologies that underpin the services provided. Although this reflects a lack of maturity, you need to consider how locked in you will be when choosing a cloud platform, as migrating between cloud computing platforms may be very difficult, if not impossible. This may not be an issue if your supplier is IBM and is always likely to be IBM, but it will be an issue if you are just dipping your toe in the water and discover that other platforms are better suited to your needs.

Security: Lack of security, or the apparent lack of it, is one of the perceived major drawbacks of working with cloud platforms and cloud technology. When moving sensitive data about, or storing it in a public cloud, it should be encrypted, and it is important to consider a secure ID mechanism for authentication and authorisation of services. As with normal enterprise infrastructures, only open the ports needed, and consider installing a host-based intrusion detection system such as OSSEC. The advantage of working with an enterprise cloud provider such as IBM or Sun is that many of these security measures are already taken care of. See our prior blog entry on securing n-tier and distributed applications on the cloud.

Compliance: Regulatory controls mean that certain applications may not be able to be deployed in the cloud. For example, the US Patriot Act could have very serious consequences for non-US firms considering US-hosted cloud providers. Be aware that cloud computing platforms are often made up of components from a variety of vendors, who may themselves provide computing in a variety of legal jurisdictions. Be very aware of these dependencies and ensure you factor them into any operational risk management assessment. See also our prior blog entry on this topic.

Quality of service: You will need to ensure that the behaviour and effectiveness of the cloud application you implement can be measured and tracked, both to meet existing and new Service Level Agreements. We have previously discussed some of the tools that come with this built in (GigaSpaces) and other tools that provide functionality you can use with your cloud architecture (RightScale, Scalr, etc.). Achieving quality of service will encompass scaling, reliability, service fluidity, monitoring, management and system performance.

System hardening: Like all enterprise application infrastructures, you need to harden the system so that it is secure, robust, and achieves the necessary functional requirements. See our prior blog entry on system hardening for Amazon EC2.

Content adapted from “TheSavvyGuideTo HPC, Grid, DataGrid, Virtualisation and Cloud Computing” available on Amazon.

Sun’s Cloud Computing Vision

The presentation below is a very good overview of Sun’s Cloud Computing vision. It covers all areas, including public and private cloud, as well as open source initiatives. It’s a very good intro’ to get the skinny on where Sun is right now with Cloud and where it intends to focus and drive forward.

Is average utilisation of servers in data centres really between 10 and 15%?

There has been an interesting discussion on the Cloud Computing forum hosted on Google Groups (if you are at all interested in Cloud I recommend you join it, as it really does have some excellent discussions). What has been interesting about it from my viewpoint is the general consensus that average CPU utilisation in organisational data centres runs between 10 and 15%. Some snippets of the discussion are below:


Initial Statement on the Group discussion

The Wall Street Journal article “Internet Industry is on a Cloud” does not do Cloud computing any justice at all.

First: the value proposition of Cloud computing is crystal clear. Averaged over 24 hours a day, 7 days a week, 52 weeks a year, most servers have a CPU utilization of 1% or less. The same is also true of network bandwidth. The storage capacity of hard disks that can be accessed only from a specific server is also underutilized. For example, the capacity of hard disks attached to a database server is used only when certain queries require intermediate results to be stored to disk. At all other times that capacity is not used at all.

First response to the statement above on the group

Utilization of *** 1 % or less *** ???

Who fed them this? I have seen actual collected data from 1000s of customers showing server utilization, and it’s consistently 10-15%. (Except mainframes.) (But including big proprietary UNIX systems.)

2nd Response:

Mea Culpa. My 1% figure is not authoritative.  It is based on my experience with a specific set of servers: 

J2EE application servers: only one application is allowed per cluster of servers. So if you had 15% utilization when you designed the application 8 years ago, on current servers it could be 5% or less. With applications that are used only a few hours per week, 1% is certainly possible. The other sets of servers for which utilization is really low are departmental web servers and mail servers.

3rd Response 

Actually, it was across a very large set of companies that hired IBM Global Services to manage their systems. Once a month, along with a bill, each company got a report on outages, costs, … and utilization.

A friend of mine heard of this, and asked “are you, by any chance, archiving those utilization numbers anywhere?” When the answer came back “Yes” — you can guess the rest. He drew graphs of # of servers at a given utilization level. He was astonished that for every category of server he had data on, the graphs all peaked between 10% and 15%. In fact, the mean, the median, and the mode of the distributions were all in that range. Which also indicates that it’s a range. Some were nearer zero, and some were out past 90%. That yours was 1% is no shock. 

4th Response:

This is no surprise to me, as HPC packages like Sun Grid Engine working on batch jobs can push utilization close to 90%. We had data showing that without a workload manager of some sort, average utilization is 10% to 15%, confirming what you discovered.

This means that, worldwide, 85% to 90% of the installed computing capacity is sitting idle. Grids improved this utilization rate dramatically, but grid adoption was limited.

If this is not an argument for virtualisation in private data centres / clouds then I don’t know what is. It should also be a big kicker for those who are considering moving applications to public clouds, out of the data centre and away from racks of machines spinning their wheels. It is also a good example of companies planning for peak capacity (see our previous blog on this). What is really needed is scale on demand, and hybrid cloud / Grid technologies such as GigaSpaces can react to peak loading in real time. Consider not only the wasted cost but also the “green computing” cost of running hordes of machines at 15% capacity…


How do you design and handle peak load on the Cloud?

We see these questions time and time again: “How do I design for peak load?” and “How do I scale out on the cloud?”. First, let’s try to give peak load a definition:

We will make a stab at defining peak load as: “a percentage of activity in a day/week/month/year that comes within a window of a few hours, is deemed extreme, and occurs because of either seasonality or unpredictable spikes.”

The Thomas Consulting Group make a good stab (ppt) at a formula to try to predict and plan for peak load. Their formula and an example are shown below:

H = peak hits per second
h = # hits received over a one month period
a = % of activity that comes during peak time
t = peak time in hours
H = h * a / (days * t * minutes * seconds)
H = h * a / (108,000 * t)    [assuming a 30-day month: 30 * 60 * 60 = 108,000]

Determine the peak Virtual Users: Peak hits/second + page view times

U = peak virtual users
H = peak hits per second
p = average number of hits / page
v = average time a user views a page

U = (H / p) * v


h = 150,000,000 hits per month
a = 10% of traffic occurs during peak time
t = peak time is 2 hours
p = a page consists of 6 hits
v = the average view time is 30 seconds

H = (h * a) / (108,000 * t)
H = (150,000,000 * .1) / (108,000 * 2)
H = 48

U = (H / p) * v
U = (48 / 6) * 30
U = 8 * 30
U = 240

Desired Metric – 48 Hits / Sec or 240 Virtual Users
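The formulae above are simple to script. Note that plugging the stated inputs into them gives roughly 69 hits/second and 347 virtual users rather than the slide’s 48/240, so the slide’s figures were presumably produced with slightly different inputs:

```python
def peak_hits_per_sec(h, a, t, days=30):
    """H = h * a / (days * t * 60 * 60); with a 30-day month this is h * a / (108,000 * t)."""
    return h * a / (days * t * 60 * 60)

def peak_virtual_users(H, p, v):
    """U = (H / p) * v."""
    return (H / p) * v

H = peak_hits_per_sec(150000000, 0.10, 2)
print(round(H, 1))                          # 69.4 hits/sec with the stated inputs
print(round(peak_virtual_users(H, 6, 30)))  # 347 virtual users
print(peak_virtual_users(48, 6, 30))        # 240.0, matching the slide's H = 48
```

Either way, the structure of the calculation is the useful part: a monthly hit count plus a peak-concentration assumption yields a concrete hits/second and concurrent-user target to size against.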

In the example Thomas Consulting present above, peak load is 15,000,000 hits in two hours, whereas the normal average for a two-hour window is roughly 411,000 [(((h*12)/365)/24)*2]. That is a difference of more than 35-fold, and this example is not even extreme: online consumer web companies can do 70% of their yearly business in December alone.

Depending on what else occurs within the transaction behind each hit, this could be the difference between having 1 EC2 instance and having 12, or a cost difference between $6,912 and $82,944 over the course of a year (based on a large Amazon EC2 instance). And of course, building for what you think is peak can still lead to problems. A famous quote from Scott Gulbransen of Intuit:

“Every year, we take the busiest minute of the busiest hour of the busiest day and build capacity on that. We built our systems to (handle that load) and we went above and beyond that.” Despite this, the systems still could not handle the load.

What we really want is to build our site for average load, excluding peak, and have scale-on-demand built into the architecture. As EC2 is the most mature cloud platform, we will look at tools that can achieve this on EC2:

GigaSpaces XAP: From version 6.6 of the GigaSpaces XAP platform, cloud tooling is built in. GigaSpaces is a next-generation virtualised middleware platform that hosts logic, data and messaging in memory, and has fewer moving parts, so that scaling out can be achieved linearly, unlike with traditional middleware platforms. GigaSpaces is underpinned by a service grid which enables application-level Service Level Agreements to be set, monitored and acted on in real time. This means that if load increases, GigaSpaces can scale threads or the number of virtualised middleware instances to ensure that the SLA is met, which in our example would be the ability to process the required number of requests. GigaSpaces also partners with RightScale, and lets you try its cloud offering for free before moving to the traditional utility compute pricing model.

Scalr: Scalr is a series of Amazon Machine Images (AMIs) for basic website needs, i.e. an app server, a load balancer and a database server. The AMIs are pre-built with a management suite that monitors the load and operating status of the various servers on the cloud. Scalr purports to increase and decrease capacity as demand fluctuates, as well as detecting and rebuilding improperly functioning instances. Scalr has open source and commercial versions and is a relatively new infrastructure service / application. We liked the ‘Synchronize to All’ feature of Scalr, which auto-bundles an AMI and then re-deploys it on a new instance without interrupting the core running of your site, saving the time of going through the EC2 image/AMI creation process. To find out more about Scalr you should check out the Scalr Google Groups forum.

RightScale: RightScale has an automated cloud management platform. RightScale services include auto-scaling of servers according to usage load, and pre-built installation templates for common software stacks. RightScale supports Amazon EC2, Eucalyptus, FlexiScale and GoGrid, and they are quoted as saying that Rackspace support will also happen at some point. RightScale has a great case study overview on their blog about Animoto, which also explains how they have launched, configured and managed over 200,000 instances to date. RightScale is VC-backed and in December 2008 closed a $13 million Series B funding round. RightScale has free and commercial offerings.

FreedomOSS: Freedom OSS has created custom templates, called jPaaS (JBoss Platform as a Service), for scaling resources such as JBoss Application Server, JBoss Messaging, JBoss Rules, jBPM, Hibernate and JBoss Seam. jPaaS monitors the instances for load and scales them as necessary. It also takes care of updating the vhosts file and other relevant configuration files to ensure that all instances of Apache respond to the hostname. The newly deployed app, running on either Tomcat or JBoss, becomes part of the new app server image.

New HPC and Cloud Book

A new book co-authored by Jim Liddle, one of the contributors to this blog, has been released, entitled “TheSavvyGuideTo HPC, Grid, DataGrid, Virtualisation and Cloud Computing”. The aim of the TheSavvyGuideTo range is to get people up to speed on the subject matter as quickly as possible.

Although the book covers a plethora of technologies, it is a good read for anyone who wants a concise overview of the space, be they a manager, consultant or developer.