New Case Studies for Amazon EC2 / Cloud Computing

Simone Brunozzi, technology evangelist for AWS in Europe, posted some further success stories / use cases for Amazon Web Services in Europe and Asia on the Amazon blog. I’ve reposted the article below, as it always makes interesting reading to see how companies are embracing cloud computing, and particularly what the details of each use case are.

Industria, Iceland

Industria’s mission is to improve the TV viewing experience.

Industria adopted Amazon Web Services for their ZignalCloud service, as well as for the Zignal digital entertainment delivery platform. ZignalCloud lowers the total cost of ownership for service providers, provides predictability of costs, reduces technology risks and decreases time to market.
In their blog, they state:
“An intended consequence of this approach is that we can do it all with no upfront cost for our customers, because we’re effectively using a true cost-sharing model that offers us almost a 100% economy of scale.”

Of course, when you use Amazon Web Services, you’re charged only for what you use, with no upfront investment. You can read more details on AWS’s offerings on our product page.

If you’re interested in ZignalCloud, you can contact Industria in Iceland, Ireland, Bulgaria, UK, Sweden or China.


Imageloop, Germany
Antonio Agudo, COO of CloudAngels.eu, sent us an email documenting a nice success story involving one of their customers, imageloop.com. This is a service that allows you to create slideshows and manage pictures and widgets.

 

When they started imageloop.com’s transition to Amazon Web Services, they needed to convert all their old pictures, generating new thumbnails and output formats.

Normally that would have taken months, but since they had virtually unlimited access to CPU power with EC2, they just launched sixty c1.xlarge instances that fed off conversion jobs from SQS and were done in a day and a half.
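For a sense of what that pattern looks like in practice, here’s a minimal sketch of the kind of SQS-fed conversion worker each of those instances could run. The queue name, message format and convert_image() function are assumptions for illustration (with boto3 standing in for the AWS client library), not imageloop.com’s actual code:

```python
# Minimal sketch of an SQS-fed conversion worker: each instance drains jobs
# from a shared queue until it is empty. All names here are hypothetical.
import boto3

sqs = boto3.client("sqs", region_name="eu-west-1")
queue_url = sqs.get_queue_url(QueueName="image-conversion-jobs")["QueueUrl"]

def convert_image(source_key):
    """Placeholder for the real thumbnail/format conversion of one picture."""
    pass

while True:
    resp = sqs.receive_message(QueueUrl=queue_url,
                               MaxNumberOfMessages=10,
                               WaitTimeSeconds=20)
    messages = resp.get("Messages", [])
    if not messages:
        break  # queue drained: this instance has finished its share of the work
    for msg in messages:
        convert_image(msg["Body"])  # e.g. the S3 key of the original image
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```

The appeal of the pattern is that sixty identical workers need no coordination beyond the queue itself, which is what makes a one-off burst like this so cheap to set up and tear down.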

Then, about a week later, when they were going live, they scheduled one night of downtime for maintenance and converted the images that had accumulated during the week, about 110,000 pictures, using ten EC2 instances for two hours.

Overall, imageloop.com is very pleased with the level of flexibility that Amazon provides.

In Antonio’s words: “The delivery speed of the slideshows is way better than before, and we liked the flexibility and ease with which we were able to build up the platform. Congratulations on a great product!”

And this is Stefan Riehl, imageloop.com’s CEO: “When we began evaluating alternatives to traditional hosting vendors, it became apparent that AWS’s offering is the most mature in the market.”


SnappyFingers, Bangalore, India

 

SnappyFingers is a Question and Answer search engine. SnappyFingers crawls and indexes Frequently Asked Questions on the Internet, and provides search results in an easy-to-view Question/Answer format.
Chirayu Patel was kind enough to share with us some details on how they use Amazon Web Services (AWS) along with some rationale behind their choices.

The three main motivations behind their choices are (in their own words):
– We are extremely reluctant to learn or do anything outside of the SnappyFingers domain. We would rather outsource.
– We are very cost conscious.
– We do write buggy code, but we do not want our systems to die because of it.
 

During the design of SnappyFingers, they considered multiple options, but in the end they picked Amazon Web Services.
A preliminary cost analysis showed that the basic cost of the alternatives to AWS would be lower in the long run, with the added advantage of not being tied to a single vendor. However, once they added in the cost of managing those systems themselves, the financial advantage of using AWS became evident.
This, coupled with the fact that they didn’t want to be distracted by operational burdens unrelated to their core business, made AWS the obvious choice for scaling CPU/storage resources.
SnappyFingers Architecture 
SnappyFingers comprises two systems: a Website and an Information Retrieval System (IRS). The Website is the system that serves user requests, and the IRS is the system that does all the behind-the-scenes work to gather Q&As.
SnappyFingers is mostly written in Python and Java, and uses multiple third-party packages, notably the Django framework, Python’s multiprocessing package, and Apache Lucene, a high-performance, full-featured text search engine library written entirely in Java.

Website

The Website runs on at least three EC2 nodes, and uses the following components.
1. nginx – an extremely fast web server, used to serve static/cached content. It is also used to reverse-proxy traffic to multiple Apache servers.
2. Apache with mod_python to execute the Python code along with the Django framework.
3. Searchers to perform the actual searches on the Q&A index.
4. Spell checkers.
5. PostgreSQL, for system management: recording bugs, registering new services, and such.

Caching is built into the system using a combination of memcached and file-system caching. Static content is served using Amazon CloudFront. Amazon Mechanical Turk is used to test the relevancy of search results.
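To illustrate how a memcached plus file-system combination like that is typically wired into a Django project, here is a minimal settings sketch. The back ends, hosts and paths are assumptions rather than SnappyFingers’ actual configuration, and the CACHES syntax shown is the form used by current Django releases:

```python
# settings.py (sketch): two cache back ends, mirroring the memcached +
# file-system caching combination described above. Values are placeholders.
CACHES = {
    # Fast, shared in-memory cache for rendered fragments and hot lookups.
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    },
    # Local on-disk cache for larger items that should survive a restart.
    "filesystem": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/tmp/site_cache",
    },
}
```

Application code then reads and writes through django.core.cache, while nginx and CloudFront handle whatever has already been rendered out as static content.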
The Information Retrieval System (IRS) is responsible for creating the Q&A indexes that will eventually be used by the searchers. It uses multiple services to do the job:
1. Crawlers to crawl the internet.
2. Parsers to extract Questions and Answers from each page, detect spam, and eliminate duplicate content.
3. Scorers to score the Q&As based on a number of factors. The scoring algorithms are the most dynamic pieces of code, and are under continuous evolution.
4. Indexers to index the Q&As.
These services interact with multiple storage back ends: Amazon S3, Amazon SimpleDB and PostgreSQL. Not all data is stored in all locations: based on data size and retrieval requirements, the data is stored in different places. All data access is done through a Python-based custom ORM (Object Relational Mapping) layer to simplify programming.
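One way to picture that “different data in different locations” rule is as a thin routing layer sitting in front of the stores. The sketch below is an invented illustration of the pattern, not the actual SnappyFingers ORM; the class, the size threshold and the boto3 calls are all assumptions:

```python
# Hypothetical data-access layer that routes records by size: small payloads
# stay with their metadata (think SimpleDB/PostgreSQL), large ones go to S3.
import json
import boto3

class QAStore:
    LARGE_THRESHOLD = 64 * 1024  # bytes; purely illustrative cut-off

    def __init__(self, bucket):
        self.s3 = boto3.client("s3")
        self.bucket = bucket
        self.metadata = {}  # stand-in for the SimpleDB/PostgreSQL record store

    def save(self, qa_id, question, answer, score):
        body = json.dumps({"question": question, "answer": answer})
        record = {"score": score}
        if len(body.encode("utf-8")) > self.LARGE_THRESHOLD:
            # Big payloads are pushed to S3 and referenced by key.
            self.s3.put_object(Bucket=self.bucket, Key=qa_id, Body=body)
            record["location"] = "s3"
        else:
            # Small payloads live inline, next to the metadata.
            record.update(location="inline", body=body)
        self.metadata[qa_id] = record
```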


Another aspect of these services is that they can run on any node. At times they have used a large number of EC2 servers, while at others they have scaled their infrastructure down, depending on the load and their monthly AWS budget.
At present the IRS has consumed roughly 500 GB of storage for a data set of 11 million Q&As.
Inter-service communication uses the concept of pipelines, each with its own set of pipes. Each pipe (an Amazon SQS queue) is owned by a service, which is responsible for processing the messages within it. Once processing is complete, messages are sent to the next pipe in the pipeline.

This architecture has not only allowed SnappyFingers to maintain the modular nature of the system, but also to develop and deploy services in isolation from the rest of the system.

The error handling strategy is simple: on an error, a service will log the error, store the corresponding message in Amazon SimpleDB, and continue processing the next message. The service stops only when the error rate exceeds a configured threshold.
Once the errors have been corrected, the corresponding messages are pushed back to Amazon SQS for completion of processing.
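Putting the pipeline model and the error-handling rules together, a single stage of such a system might look roughly like the sketch below. The queue names, the processing function and the failed-message store are assumptions made for illustration; in the real system failures are persisted to Amazon SimpleDB, which is only stubbed out here:

```python
# Sketch of one pipeline stage: read from this service's own "pipe" (an SQS
# queue), process each message, forward it to the next pipe, park failures
# for later replay, and stop only if the error rate crosses a threshold.
import logging
import boto3

sqs = boto3.client("sqs")
IN_PIPE = sqs.get_queue_url(QueueName="parser-pipe")["QueueUrl"]
OUT_PIPE = sqs.get_queue_url(QueueName="scorer-pipe")["QueueUrl"]
MAX_ERROR_RATE = 0.05  # stop the stage if more than 5% of messages fail

def process_message(body):
    """Placeholder for the real parse/score/index work of this stage."""
    return body

def park_failed_message(body, error):
    """Placeholder: persist the failed message (e.g. in SimpleDB) for replay."""
    logging.error("parked failed message %r: %s", body, error)

def run_stage():
    processed = errors = 0
    while True:
        resp = sqs.receive_message(QueueUrl=IN_PIPE, MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            processed += 1
            try:
                result = process_message(msg["Body"])
                # Hand the message on to the next pipe in the pipeline.
                sqs.send_message(QueueUrl=OUT_PIPE, MessageBody=result)
            except Exception as exc:
                errors += 1
                park_failed_message(msg["Body"], exc)
            finally:
                sqs.delete_message(QueueUrl=IN_PIPE,
                                   ReceiptHandle=msg["ReceiptHandle"])
        if processed and errors / processed > MAX_ERROR_RATE:
            logging.error("error rate above threshold, stopping stage")
            break
```

Replaying corrected messages is then just a matter of sending them back into the appropriate pipe.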
CPU utilization and scaling 
All the IRS services are designed to keep CPU occupancy at 100% (or at a configured value), using Python’s multiprocessing package to spawn or kill processes as needed.
The services are independent of the node on which they are running, and if there is a huge backlog of messages in Amazon SQS, more EC2 nodes can be spawned to handle the extra load.
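A stripped-down version of that spawn-to-target idea, using nothing but the standard multiprocessing package, might look like the following; the worker body and the occupancy target are illustrative, and the real services obviously do far more:

```python
# Size a pool of worker processes to a target CPU occupancy: 1.0 means one
# busy worker per core (roughly 100%), 0.5 about half of that. Illustrative only.
import multiprocessing

def worker(jobs):
    """Pull work items off the queue until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            break
        # ... CPU-bound crawl/parse/score/index work would happen here ...

def start_workers(target_occupancy=1.0):
    n_workers = max(1, int(multiprocessing.cpu_count() * target_occupancy))
    jobs = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(jobs,))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    return jobs, procs  # push work onto `jobs`; push one None per worker to stop
```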

Why, right now, is Amazon the only game in town?

Amazon is currently the big bear of cloud computing platforms. Its web services division has proved disruptive and has consistently shown innovation and breadth of services within its platform. It is growing at a rapid rate. Forty per cent of Amazon’s gross revenues come from its third-party merchants, and Amazon Web Services is an extension of this. The core Amazon site uses its own web services to build the Amazon pages on the fly, dynamically; rendering a page results in approximately 200-300 Amazon Web Services calls. In short, it eats its own dog food.

Why are Amazon good at this?

1. They have a deep level of technical expertise that has come from running one of the largest global online consumer marketplaces.

2. This has led to a culture of scale and operational excellence.

3. They have an appetite for low-margin, high-volume business and, more importantly, they understand it fully.

Let’s look at the competition. Microsoft can certainly satisfy the first point from the list above, but will probably have to buy the second, and have certainly not demonstrated in their history that they have the third. For this reason we cannot expect Azure to be an instant Amazon competitor. What about Rackspace? Well, they can satisfy 1, and to a lesser extent 2, but again it is not clear that they have yet fully assimilated point 3. IBM have both 1 and 2 but again fall down on point 3. Currently Amazon are unique in the combination of what they provide, how they provide it, and how they price it and make money from it.

The core ethos of the Amazon CTO, Werner Vogels, is that “everything breaks all the time“, and it is with this approach that they build their infrastructure. Amazon currently have three worldwide data centres: one on the US east coast, one on the west coast, and one in Ireland. The intent is to have at least one more in Asia-Pacific. Each data centre is on a different flood plain and a different power grid, and has a different bandwidth provider, to ensure redundancy. If S3 is used to store data, then six copies of the data are stored. In short, the infrastructure is built to be resilient.

This does not mean there will not be outages. We know that these have occurred not just for Amazon but for other prominent online companies as well. Amazon’s SLA guarantees 99.95% uptime for EC2 and 99.9% for S3. What does this mean in terms of downtime? For EC2, it is approximately 4 hours and 23 minutes per year. Not good enough? Well, reducing downtime further costs money, and I know many, many enterprise organisations who could only dream of having downtime as low as this. Chasing five-nines availability is in many ways chasing the dream; achieving it is often more costly than the outages it is meant to protect against. Amazon already provides a service health dashboard for all its services, something Google also seems set to do. It is set to provide additional monitoring services later in the year (along with auto-scaling and load-balancing services) that will make the core services even better.
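The downtime figures quoted above follow directly from the SLA percentages. The quick calculation below (assuming a 365-day year) shows where the roughly four hours and 23 minutes comes from:

```python
# Allowed downtime per year implied by an availability SLA.
HOURS_PER_YEAR = 365 * 24  # 8760

for service, sla in (("EC2", 0.9995), ("S3", 0.999)):
    downtime = HOURS_PER_YEAR * (1 - sla)
    hours, minutes = int(downtime), round((downtime % 1) * 60)
    print(f"{service} at {sla:.2%}: ~{hours}h {minutes}m of downtime per year")
# EC2 at 99.95%: ~4h 23m of downtime per year
# S3 at 99.90%: ~8h 46m of downtime per year
```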

Amazon has proved that as soon as you take away the friction of hardware, you breed innovation. The Animoto use case is a good example of this, as is their case study on the Washington Post. There are more Amazon case studies here.

Right now, for my money, Amazon is on its own in what it is providing. Sure, other companies provide hosting and storage, and for many users they will be good enough, but for the sheer innovation and breadth of integrated services, coupled with the low-cost utility compute model, Amazon stands alone.