Amazon S3 adds RRS – Reduced Redundancy Storage

Amazon have introduced a new storage option for Amazon S3 called Reduced Redundancy Storage (RRS) that enables customers to reduce their costs by storing non-critical, reproducible data at lower levels of redundancy than the standard storage of Amazon S3.

It provides a cost-effective solution for distributing or sharing content that is durably stored elsewhere, or for storing thumbnails, transcoded media, or other processed data that can be easily reproduced. The RRS option stores objects on multiple devices across multiple facilities, providing 400 times the durability of a typical disk drive, but does not replicate objects as many times as standard Amazon S3 storage does, and thus is even more cost effective.

Both storage options are designed to be highly available, and both are backed by Amazon S3’s Service Level Agreement.

Once customer data is stored using either Amazon S3’s standard or reduced redundancy storage options, Amazon S3 maintains durability by quickly detecting failed, corrupted, or unresponsive devices and restoring redundancy by re-replicating the data. Amazon S3 standard storage is designed to provide 99.999999999% durability and to sustain the concurrent loss of data in two facilities, while RRS is designed to provide 99.99% durability and to sustain the loss of data in a single facility.

Pricing for Amazon S3 Reduced Redundancy Storage starts at only $0.10 per gigabyte per month and decreases as you store more data. To get started, visit http://aws.amazon.com/s3.

From a programming viewpoint, to take advantage of RRS you need to set the storage class of an object to RRS when you upload it. You do this by setting the x-amz-storage-class header to REDUCED_REDUNDANCY in the PUT request.
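As a rough sketch, this is what an RRS upload looks like using the AWS command line tools (an assumption on my part; the bucket and key names below are placeholders, and any SDK that exposes the storage class will do the same job):

# upload a reproducible object (e.g. a thumbnail) with the RRS storage class
aws s3api put-object --bucket my-media-bucket --key thumbnails/photo-001.jpg --body thumbnails/photo-001.jpg --storage-class REDUCED_REDUNDANCY

# or, using the higher-level copy command
aws s3 cp thumbnails/photo-001.jpg s3://my-media-bucket/thumbnails/photo-001.jpg --storage-class REDUCED_REDUNDANCY

Objects uploaded without the storage class flag keep the standard class, so RRS and standard objects can live side by side in the same bucket.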

Amazon announce new Asia Pacific region in Singapore for their cloud services

Starting today, Asia Pacific-based businesses and global businesses with customers based in Asia Pacific can run their applications and workloads in AWS’s Singapore Region to reduce latency to end-users in Asia and to avoid the undifferentiated heavy lifting associated with maintaining and operating their own infrastructure.

The new Singapore Region launches with multiple availability zones and currently supports Amazon Elastic Compute Cloud (Amazon EC2), Amazon Simple Storage Service (Amazon S3), Amazon SimpleDB, Amazon Relational Database Service (Amazon RDS), Amazon Simple Queue Service (Amazon SQS), Amazon Simple Notification Service (Amazon SNS), Amazon CloudWatch, and Amazon CloudFront. Singapore Region pricing is available on the detail page of each service, at aws.amazon.com/products.

Sun’s Grid Engine now features cloud bursting and Apache Hadoop integration

Sun (or is that Oracle…) has released a new version of their Grid Engine which brings it into the cloud.

There are two main additions in this release. The first is integration with Apache Hadoop: Hadoop jobs can now be submitted to Grid Engine as if they were any other computation job. Grid Engine also understands Hadoop’s distributed file system, which means it is able to send work to the correct part of the cluster (data affinity).

The second is dynamic resource reallocation, which includes the ability to use on-demand resources from Amazon EC2. Grid Engine is now able to manage resources across logical clusters, whether they are in the cloud or outside it. This means that Grid Engine can be configured to “cloud burst” depending on load, which is a great feature. The integration is specifically set up for EC2 and enables scaling down as well as scaling up.

This release of Grid Engine also implements a usage accounting and billing feature called ARCo, making it truly SaaS ready as it is able to cost and bill jobs.

Impressive and useful stuff, and if you are interested in finding out more you can do so here.

GigaSpaces finds a place in the Cloud

A new report from analysts The 451 Group outlines the success to date that GigaSpaces has had in the cloud sector. The report notes that GigaSpaces now has 76 customers using its software on cloud-computing platforms, up from 25 on Amazon’s EC2 in February. GigaSpaces have moved their cloud strategy forward in recent weeks, announcing support for deployments on GoGrid and, more recently, tighter integration with VMware that enables GigaSpaces to dynamically manage and scale VMware instances so that they can participate in the scaling of GigaSpaces-hosted applications.

GigaSpaces have had some notable successes with hybrid deployments, in which their application stack is hosted in the cloud while the data or services are hosted on premise.

The GigaSpaces product provides a strong cloud middleware stack that holds logic, data, services and messaging in memory, underpinned by real-time Service Level Agreement enforcement at the application level, enabling the stack to scale up and out in real time based on SLAs set by the business. Because everything is held in memory, it is faster than alternative ways of building enterprise-scale applications in the cloud, and it has sophisticated synchronisation services that allow data to be written asynchronously (or synchronously) to a database or other persistent store.

Supporting SLAs on the Cloud

What does it take to make a cloud computing infrastructure enterprise ready? Well, as always, this probably depends on the use case, but real-time scaling and SLA support must figure highly.

Software that purports to scale applications on the cloud is not new; have a look at our prior blog post on this topic and you will see some of the usual suspects, such as RightScale and Scalr. A new offering in this space is Tibco Silver. Tibco Silver is trying to solve the problem not of whether cloud services can scale, but of whether the applications themselves can scale with them. Silver addresses this through ‘self-aware elasticity’. Hmmm… sounds good, but what exactly does that mean? It means the system can automatically provision new cloud capacity (be that storage or compute) in response to fluctuations in application usage.

According to Tibco, unlike services in a service-oriented architecture, cloud services are not aware of the SLAs to which they are required to adhere, and Tibco Silver is aimed at providing this missing functionality. Tibco claim that “self-aware elasticity” is something no other vendor has developed. I would dispute this. GigaSpaces XAP, with its ability to deploy to the cloud as well as on-premise using the same technology, has very fine-grained application-level SLA control: when an SLA is breached, the application can react accordingly, whether that means increasing the number of threads, provisioning new instances or distributing workloads in a different way. GigaSpaces Service Grid technology, which originated from Sun’s Rio project, enables this real-time elasticity. (Interestingly, it seems GigaSpaces are doing some work on enabling their cloud tools to deploy to and manage VMware images on private clouds, as they do with AMIs on Amazon’s public cloud.)

Without a doubt, the ability to react in real time to application-level SLAs, rather than just to breaches of an SLA at the infrastructure level, is something that will find a welcome home in both private and public clouds.

Amazon – what is coming soon, and what is not!

We had a meeting with Amazon in the UK recently, covered off some of the pressing issues that we wanted to discuss, and also learnt a little about what Amazon have lined up.

First, what is not going to happen anytime soon:

– From what we heard Amazon are not going to resolve the issue of billing in local currency with locally issued invoices any time soon. See our prior post on this topic. We did learn however that large organisations can request an invoice.

– Right now, if you want to use your own AMI to sell on a SaaS basis using Amazon infrastructure, you have to be a US organisation. Again, Amazon don’t seem to have plans to change this in the immediate timeframe, so that leaves out any organisation outside the US that wants to sell its product as SaaS on Amazon’s web services infrastructure, unless it integrates its own commerce infrastructure rather than using DevPay. This can be both a blessing (you can charge a margin on Amazon’s infrastructure pieces like AMQS) and a curse (it can leave you exposed, as you will be a month behind in billing your clients). Even though Amazon are entrenched right now as the public cloud infrastructure of choice, it wouldn’t be the first time we have seen an 800-pound gorilla displaced from its prime market position. If I were Amazon, I’d fix this, and soon. Microsoft and Rackspace are looking more attractive all the time.

– Amazon’s ingestion services again require you to be a US organisation with a US return address. Are you detecting a common theme here….

And what we can expect to see soon:

– VPC (Virtual Private Cloud) access is in private beta now. This is a mechanism for securely connecting public and private clouds within the EC2 infrastructure.

– High-memory instances, analogous to the existing high-CPU instances, are in the pipeline.

– Shared EBS is in the pipeline.

– Functionality for multiple users associated with a single account is in the pipeline and will provide simple privileges too. This has long been a bone of contention for organisations using AWS, so it will be welcomed.

– Amazon is planning to run a lot more EC2 workshops through local partners.

Other things of note that we learnt were:

– We learnt that large instances currently have their own dedicated physical blade/box.

– As AWS has grown, a large number of machines has become available and organisations can request hundreds of machines easily. Even extreme cases are catered for, e.g. requests for 50,000 machines.

– As a matter of policy, new functionality will be rolled out simultaneously in the EU and US unless there is a good reason not to.

All in all, some exciting stuff, and there were other things in the pipeline they could not share, but the public cloud market is starting to get more players and I think Amazon need to get some of their infrastructure pieces in place sooner rather than later.

Amazon Elastic MapReduce now available in Europe

From the Amazon Web Services Blog:

 Earlier this year I wrote about Amazon Elastic MapReduce and the ways in which it can be used to process large data sets on a cluster of processors. Since the announcement, our customers have wholeheartedly embraced the service and have been doing some very impressive work with it (more on this in a moment).

Today I am pleased to announce Amazon Elastic MapReduce job flows can now be run in our European region. You can launch jobs in Europe by simply choosing the new region from the menu. The jobs will run on EC2 instances in Europe and usage will be billed at those rates.

 Because the input and output locations for Elastic MapReduce jobs are specified in terms of URLs to S3 buckets, you can process data from US-hosted buckets in Europe, storing the results in Europe or in the US. Since this is an internet data transfer, the usual EC2 and S3 bandwidth charges will apply.

Our customers are doing some interesting things with Elastic MapReduce.

At the recent Hadoop Summit, online shopping site ExtraBux described their multi-stage processing pipeline. The pipeline is fed with data supplied by their merchant partners. This data is preprocessed on some EC2 instances and then stored on a collection of Elastic Block Store volumes. The first MapReduce step processes this data into a common format and stores it in HDFS form for further processing. Additional processing steps transform the data and product images into final form for presentation to online shoppers. You can learn more about this work in Jinesh Varia’s Hadoop Summit Presentation.

Online dating site eHarmony is also making good use of Elastic MapReduce, processing tens of gigabytes of data representing hundreds of millions of users, each with several hundred attributes to be matched. According to an article on SearchCloudComputing.com, they are doing this work for $1,200 per month, a considerable savings from the $5,000 per month that they estimated it would cost them to do it internally.

We’ve added some articles to our Resource Center to help you to use Elastic MapReduce in your own applications.

You should also check out AWS Evangelist Jinesh Varia in this video from the Hadoop Summit:

— Jeff;

PS – If you have a lot of data that you would like to process on Elastic MapReduce, don’t forget to check out the new AWS Import/Export service. You can send your physical media to us and we’ll take care of loading it into Amazon S3 for you.

EC2 Linux Monitoring & Tuning Tips

When deploying on EC2, even though Amazon provides the hardware infrastructure, you still need to tune your instance’s operating system and monitor your application. You should review your hardware and software requirements, as well as your application design and deployment strategy.

The Operating System

Change ulimit

‘ulimit’ specifies the number of open files that a process may have. If the value set for this parameter is too low, a file open error, memory allocation failure, or connection establishment error might occur. By default this is set to 1024; normally you should increase it to at least 8096.

Issue the following command to set the value.

ulimit -n 8096

Use the ulimit -a command to display the current values for all limits on system resources.
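Note that a value set with ulimit only applies to the current shell session. A minimal sketch of making the change permanent, assuming a distribution that uses pam_limits and /etc/security/limits.conf:

# raise the open-file limit for all users on future logins
echo "* soft nofile 8096" >> /etc/security/limits.conf
echo "* hard nofile 8096" >> /etc/security/limits.conf

# log in again and confirm the new limit
ulimit -n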

Tune the Network

A good, detailed reference for Linux IP tuning is here. Some of the important parameters to change for distributed applications are below:

TCP_FIN_TIMEOUT

The tcp_fin_timeout variable tells the kernel how long to keep sockets in the FIN-WAIT-2 state if you were the one closing the socket. It takes an integer value, which is set to 60 seconds by default. To set the value to 30, issue the command:

echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

TCP_KEEPALIVE_INTERVAL

The tcp_keepalive_intvl variable tells the kernel how long to wait for a reply to each keepalive probe. In other words, this value is extremely important when you try to calculate how long it will take before your connection dies a keepalive death. The variable takes an integer value; the default is 75 seconds. To set the value to 15, issue the following command:

echo 15 > /proc/sys/net/ipv4/tcp_keepalive_intvl

TCP_KEEPALIVE_PROBES

The tcp_keepalive_probes variable tells the kernel how many TCP keepalive probes to send out before it decides a specific connection is broken. This variable takes an integer value; the default is to send out 9 probes before telling the application that the connection is broken. To change the value to 5, use the following command:

echo 5 > /proc/sys/net/ipv4/tcp_keepalive_probes
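The echo commands above only change the running kernel, so the settings are lost on reboot. A minimal sketch of persisting all three values, assuming the standard /etc/sysctl.conf mechanism:

# persist the TCP tuning values across reboots
cat >> /etc/sysctl.conf <<'EOF'
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 5
EOF

# apply them to the running system without a reboot
sysctl -p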

 

Monitoring

You can monitor system resources from the command line, but to make life easier you can use a monitoring system. A couple of free, open-source monitoring tools that we use:

  • Ganglia: a free monitoring system
  • Hyperic: offers both a commercial and a free version

 

Logging

You will be amazed how few projects care about logging until they hit a problem. Have a consistent logging procedure in place to collect the logs from your different machines so that you can troubleshoot when a problem occurs.
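As an illustration only, here is a minimal collection sketch that pulls application logs from each instance to a central machine over SSH; the hostnames and log paths are hypothetical, and in practice you would drive this from cron or use a dedicated log shipper:

#!/bin/sh
# pull /var/log/myapp from every instance into a per-host directory on this box
for host in ec2-host-1.example.com ec2-host-2.example.com; do
  mkdir -p "/var/log/collected/$host"
  rsync -az -e ssh "$host:/var/log/myapp/" "/var/log/collected/$host/"
done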

Linux Commands

Some Linux commands that we use regularly and that you might find useful are listed below. More details can be found here, here and here.

  • top: display Linux tasks
  • vmstat: report virtual memory statistics
  • free: display the amount of free and used memory in the system
  • netstat: print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships
  • ps: report a snapshot of the current processes
  • iostat: report Central Processing Unit (CPU) statistics and input/output statistics for devices and partitions
  • sar: collect, report, or save system activity information
  • tcpdump: dump traffic on a network
  • strace: trace system calls and signals

Is billing Amazon’s Achilles heel?

Having worked on a number of projects with Amazon Web Services recently, the one non-technical thing that has stood out is the billing model that Amazon adopts. It basically forces the company to have a credit card available, and then Amazon produce an email with the least amount of information possible on it to tell you that your credit card has been charged. If the user wants any kind of invoice, they have to go back to their account and try to download usage amounts and the associated bills. There is no single clean invoice, and a number of ‘features’ are missing for this type of model.

What I am looking for is a way to put some control back into an organisation’s hands, including:

– A way to grant more granular access to users and therefore track who, or which department in the company, is using the service

– Central management of billing, and an actual invoice that can be submitted for recompense, either to another company or internally

– The ability to set budget limits, akin to what you can do with Google AdWords

– Alerting mechanisms that send an SMS when budgets near tolerance levels

– The ability to centrally track usage data so that chargeback mechanisms can cleanly be built and used

There are numerous threads on the Amazon Web Services community forum asking for hard copy invoices. Amazon does provide a page for tax help, but it’s not that helpful 😉

Just some of the things floating around on the thread:

“Sounds silly, isnt’t it? But really, you can shake your head as long as you want, but tax authorities will not accept an invoice which does not state both partie’s VAT-ID number (here in italy, but its the same all over europe). 
If i go to dinner with my clients, the waiter will bring the bill in a carbon copy chemical paper. I HAVE to write my VAT-ID and full company name on it. 
Only THEN, he separates the first from the second sheet of paper, one stays in his records, one in my. 

If they check my books and find an invoice or bill which is not complaint to the formal requirements of having VAT-ID of both parties, they will not accept it and make you pay a fine. Its silly to discuss about the meaning of this, you would have to listen to a very long story about what cross-checks they do with these VAT-IDs. 

Any way, it’s not necessary that you send me a printed invoice, i can print it myself. But IT IS NECESSARY, that the invoice states clearly: 

name, address and VAT-ID of the seller 
name, address and VAT-ID of the purchaser 
description of goods and services 
invoice date, invoice number 

if any of these things are missing, the sheet of paper simply is not an invoice and trying to book it as an expense is a violation of law. 

Currently we are not able to detract AWS expenses of a few 100 US$/month due to these limitations.”

Reply to this post:

“In Czech it is even worse … we have to have hard copy with hand-writen _signature_ to be valid for tax authorities. Problems implications are then quite clear. Silly, but real in Czech. Another more detail, we can not add dinner with customer to our taxes. It has to be paid from the company net profit. “

Another example Reply:

“The same here in germany, we want to start using AWS for some projects but without a proper invoice our accounting will not give us a “go”. 

If this won’t change within this month we will either continue to work with dedicated server networks or might try the google appspot. 

Thats really a shame, because amazon does obviously know how to write correct invoices for amazon.com/.de.”

I believe that this is probably tax related, with Amazon not wanting to amass taxes for regional entities that would be liable for country-specific tax, but it’s a big gap right now, and I don’t have much doubt that it stops further adoption of the services themselves, as organisational procedures are pretty inflexible when dealing with these issues.

Cloud Views Conference Presentations

The recent Cloud Views conference presentations can be viewed below: