Amazon enables easy website hosting with S3 – competes with RackSpace

In a move that puts it into direct competition with the likes of RackSpace, Amazon has announced that you can now host your website using an Amazon S3 account. With these new features, Amazon S3 provides a simple, inexpensive way to host your website in one place.

To get started, open the Amazon S3 Management Console, and follow these steps:

1) Right-click on your Amazon S3 bucket and open the Properties pane

2) Configure your root and error documents in the Website tab

3) Click Save

Amazon provide more information on hosting a static website on Amazon S3 here.
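For those who prefer to script the setup rather than use the console, here is a minimal sketch using the AWS SDK for Python (boto3); the bucket name and document names are assumptions for illustration, not values from Amazon's announcement.

```python
# Hypothetical sketch: configure an existing S3 bucket for static website hosting.
# Assumes a bucket named "example.com" exists and boto3 credentials are configured.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_website(
    Bucket="example.com",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},  # the root document
        "ErrorDocument": {"Key": "error.html"},     # the error document
    },
)

# The site is then served from the bucket's website endpoint, e.g.
# http://example.com.s3-website-us-east-1.amazonaws.com
```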

This is part of a trend that Amazon obviously want to encourage. They recently started an ad placement from JumpBox on their free Web Services developers page offering one-click WordPress deployments, amongst other JumpBox offerings.

The rise of the Cloud Data Aggregators

As storing data in the cloud becomes more normal, users will increasingly find themselves needing to access different types of data regularly. To this end we are starting to see a new breed of applications and services that interact with data stored in the cloud. The challenge is that services which sell their product based on data access are forced to choose which data services to support.

This is further exacerbated in the cloud storage space as there is no ubiquitous API (see our prior post on Amazon S3 becoming a de facto standard interface).

To this end we are starting to see services and applications that offer interesting aggregations of access to data clouds. We look at a few of these below:

GoodReader, Office2 HD, QuickOffice, Documents to Go, iSMEStorage, iWork:

The iPad, iPhone, and Android have some interesting applications which function on top of existing data clouds. All the aforementioned applications work in this way, either letting you view files (in the case of GoodReader) or letting you view and edit files (in the case of Office2, QuickOffice, Documents to Go, iWork, and iSMEStorage). The premise is that if you have data stored in an existing cloud then you can load it, view or edit it in these tools, and store it locally.

Tools such as iWork (which encompasses Pages, Numbers, and Keynote) only work with MobileMe or the WebDAV standard, although the iSMEStorage app gets around this by enabling you to use iWork as an editor for files accessed through its cloud gateway, which can be stored on any number of clouds, using WebDAV even if the underlying cloud does not support WebDAV.

In fact some companies are making data access a pricing feature, for example charging extra for increased connectivity.

Gladinet.com and StorageMadeEasy.com:

Both Gladinet and SME are unique amongst the current cloud vendors in that they enable aggregated access to multiple file clouds. They essentially enable you to access cloud files from multiple different providers from a single file system.

Gladinet is inherently a Windows-only solution with many different offerings, whereas Storage Made Easy has Windows software but also cloud drives for Linux and Mac, plus mobile clients for iOS, Android, and BlackBerry. Gladinet is a client-side service whereas SME is a server-based service using its Cloud Gateway Appliance, which is also available as a virtual appliance for VMware, Xen, etc.

Both offerings support a dizzying array of clouds, such as Amazon S3, Windows Azure Blob Storage, Google Storage, Google Docs, and RackSpace Cloud Files, plus many more.

Such solutions don’t just aggregate cloud services but bring the cloud onto the desktop and onto mobile and tablet devices, making the use of cloud data much more transparent.

As data becomes increasingly outsourced to the cloud for all types of applications and services, I expect we will see more such innovative solutions: applications that give access to aggregated cloud data and extend the services and tools provided by the native data provider.

Is Amazon S3 becoming a de facto standard interface?

I don’t think anyone would argue against Amazon being the big bear of the cloud market, on both the virtual cloud infrastructure and the cloud storage sides of things. Amazon S3 had more than 102 billion objects stored on it as of March 2010.

As befits a dominant player, the interface that Amazon exposes for Amazon S3 is becoming so widely used that it is almost a standard for connecting to cloud storage. Many new and existing players in this space already support the interface as an entry point into their storage infrastructure. For example, Google Storage supports the S3 interface, as does the private cloud vendor Eucalyptus with its Walrus offering. The on-premise cloud appliance vendor Mezeo also recently announced support for accessing their cloud using Amazon S3, as did TierraCloud. There are open-source implementations as well, such as ParkPlace, an Amazon S3 clone and BitTorrent service written in Ruby.

In addition, the multi-cloud vendor Storage Made Easy has implemented an S3 entry point into its gateway so that you can use it with other clouds, such as RackSpace, Google Docs, Dropbox, etc., even where they do not natively support the Amazon S3 API.
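As a quick illustration of why this matters to developers, the same S3 client code can often be pointed at a compatible third-party back-end simply by swapping the endpoint. A minimal sketch using the AWS SDK for Python (boto3) follows; the endpoint URL and credentials are placeholders, not a real provider:

```python
# Sketch: the same S3 client code, pointed at an S3-compatible third-party endpoint.
# The endpoint URL and credentials below are placeholders for illustration only.
import boto3

client = boto3.client(
    "s3",
    endpoint_url="https://s3.example-provider.com",  # any S3-compatible gateway
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Standard S3 calls work unchanged against the alternative back-end.
for bucket in client.list_buckets()["Buckets"]:
    print(bucket["Name"])
```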

So as far as S3 goes it seems you can pretty much access a multitude of storage back-ends using this API, which is not surprising, as vendors want to make it easy for you to move from S3 to their proposition, or they want their proposition to work with existing toolsets and program code. So is it good for the cloud in general? I guess the answer to that is both ‘yes’ and ‘no’.

‘Yes’ from the point of view that standardisation can be a good thing for customers as it gives stability and promotes interoperability. ‘No’ from the point of view that standardisation can easily stifle innovation. I’m happy to say that this is not what is occurring in the cloud storage space as the work around OpenStack and Swift demonstrates.

I think right now, S3 is as close as you will get to a de facto standard for cloud storage API interactions. It probably suits Amazon that this is the case, and it certainly suits consumers and developers. Time will tell how long this situation lasts.

Amazon S3 showing elevated error rates

In a recent post CenterNetworks noted that the Amazon S3 service is showing elevated error rates. They noticed that several images were not loading correctly and they heard from multiple CN readers with the same issue on their sites.

They note the issues seem only to be hitting the U.S. Standard centers — other S3 centers including Northern California, Europe and Asia are functioning correctly.

Amazon S3 adds RRS – Reduced Redundancy Storage


Amazon have introduced a new storage option for Amazon S3 called Reduced Redundancy Storage (RRS) that enables customers to reduce their costs by storing non-critical, reproducible data at lower levels of redundancy than the standard storage of Amazon S3.

It provides a cost-effective solution for distributing or sharing content that is durably stored elsewhere, or for storing thumbnails, transcoded media, or other processed data that can be easily reproduced. The RRS option stores objects on multiple devices across multiple facilities, providing 400 times the durability of a typical disk drive, but does not replicate objects as many times as standard Amazon S3 storage does, and thus is even more cost effective.

Both storage options are designed to be highly available, and both are backed by Amazon S3’s Service Level Agreement.

Once customer data is stored using either Amazon S3’s standard or reduced redundancy storage options, Amazon S3 maintains durability by quickly detecting failed, corrupted, or unresponsive devices and restoring redundancy by re-replicating the data. Amazon S3 standard storage is designed to provide 99.999999999% durability and to sustain the concurrent loss of data in two facilities, while RRS is designed to provide 99.99% durability and to sustain the loss of data in a single facility.

Pricing for Amazon S3 Reduced Redundancy Storage starts at only $0.10 per gigabyte per month and decreases as you store more data.

From a programming viewpoint, to take advantage of RRS you need to set the storage class of an object to RRS when you upload it. You do this by setting the x-amz-storage-class header to REDUCED_REDUNDANCY in the PUT request.
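As a hedged sketch, the same thing can be expressed with the AWS SDK for Python (boto3) via the StorageClass parameter, which the SDK translates into the x-amz-storage-class header on the PUT request; the bucket and key names below are illustrative assumptions.

```python
# Sketch: upload an object to S3 with Reduced Redundancy Storage.
# boto3 sends the x-amz-storage-class: REDUCED_REDUNDANCY header on the PUT request.
import boto3

s3 = boto3.client("s3")

with open("1234_thumb.jpg", "rb") as body:
    s3.put_object(
        Bucket="my-thumbnails",             # illustrative bucket name
        Key="photos/1234_thumb.jpg",        # illustrative object key
        Body=body,
        StorageClass="REDUCED_REDUNDANCY",  # store at reduced redundancy
    )
```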

Is billing Amazon’s Achilles heel?

Having worked on a number of projects with Amazon Web Services recently, the one non-technical thing that has stood out is the billing model that Amazon adopts, which basically forces the company to keep a credit card on file; Amazon then produces an email with the least amount of information possible on it to tell you that your credit card has been charged. If the user wants any kind of invoice, they have to go back to their account and try to download usage amounts and associated bills. There is not one clean invoice, and a number of ‘features’ are missing for this type of model.

What I am looking for is a way to put some control back into an organisation’s hands, including:

– A way to grant more granular access to users and therefore track who, or which department in the company, is using the service

– Central management of billing, and an actual invoice that can be submitted for recompense, either to another company or internally

– Ability to set budget limits, akin to what you can do with Google AdWords

– Alerting mechanisms, such as SMS alerts, for when budgets near tolerance levels

– Ability to centrally track usage data so that chargeback mechanisms can cleanly be built and used

There are numerous threads on the Amazon Web Services community forum asking for hard-copy invoices. Amazon does provide a page for tax help, but it’s not that helpful 😉

Just some of the things floating around on the thread:

“Sounds silly, isn’t it? But really, you can shake your head as long as you want, but tax authorities will not accept an invoice which does not state both parties’ VAT-ID numbers (here in Italy, but it’s the same all over Europe).
If I go to dinner with my clients, the waiter will bring the bill on carbon-copy chemical paper. I HAVE to write my VAT-ID and full company name on it.
Only THEN does he separate the first from the second sheet of paper; one stays in his records, one in mine.

If they check my books and find an invoice or bill which is not compliant with the formal requirements of stating the VAT-ID of both parties, they will not accept it and make you pay a fine. It’s silly to discuss the meaning of this; you would have to listen to a very long story about what cross-checks they do with these VAT-IDs.

Anyway, it’s not necessary that you send me a printed invoice, I can print it myself. But IT IS NECESSARY that the invoice states clearly:

name, address and VAT-ID of the seller 
name, address and VAT-ID of the purchaser 
description of goods and services 
invoice date, invoice number 

if any of these things are missing, the sheet of paper simply is not an invoice and trying to book it as an expense is a violation of law. 

Currently we are not able to deduct AWS expenses of a few hundred US$/month due to these limitations.”

Reply to this post:

“In the Czech Republic it is even worse … we have to have a hard copy with a hand-written _signature_ to be valid for the tax authorities. The implications of the problem are then quite clear. Silly, but real here. One more detail: we cannot add dinner with a customer to our taxes. It has to be paid from the company’s net profit.”

Another example Reply:

“The same here in Germany: we want to start using AWS for some projects, but without a proper invoice our accounting will not give us a ‘go’.

If this doesn’t change within this month we will either continue to work with dedicated server networks or might try Google App Engine.

That’s really a shame, because Amazon obviously does know how to write correct invoices for amazon.com/.de.”

I believe that this is probably tax related, with Amazon not wanting to amass tax liabilities through regional entities that would be liable for country-specific tax, but it is a big hole right now, and I don’t have much doubt that it holds back further adoption of the services, as organisational procedures are pretty inflexible when dealing with these issues.

Mosso come out fighting against S3 / CloudFront with Cloud Files and Limelight

Mosso are certainly not intent on letting Amazon have everything their own way, posting on their blog the “Top 10 Reasons why Cloud Files + Limelight offers a better experience than S3 + CloudFront”.

Competition is a great leveler and fuels innovation, so I am glad to see Mosso taking the lead here. The reasons that they give are reproduced below – I’d be interested in the thoughts of S3 / CloudFront users as to whether these would make them consider moving across, or what they think in general:


1.  World-class technical support is only one click away.

Live support, with real humans based right here in our offices, is available 24/7.  And they are really, really good!

2. World-class technical support is free.

Yes, free.  As in you don’t pay for it.  And it is really, really good support!

3. You can get started in as little as one minute.

Not a programmer?  Not a problem!  You do not have to know how to code to use Cloud Files + CDN!  Our simple web-based interface makes it a snap to share your content!

4. Limelight is a tier one CDN provider.

Really, Limelight is VERY cool, and one of the foremost CDN providers in the industry. That’s why we chose to partner with them!

5. No API is required to share files.

Did we mention this already?  If so, it is worth mentioning again!

6. Language-specific APIs are available, if you need them.

Not everyone knows REST and SOAP, so we’ve created and provide support for the following language APIs – PHP, Python, Java and .NET. We do this to allow you to work in the language you feel most comfortable with.

7. Pricing for data transfer does not vary depending on edge locations.

Data transfer starts at $0.22/GB, no matter what edge location is used to share your content. This should make it easier on you when you are trying to estimate your monthly bill.

8. There are no per requests fees for CDN.

Just another way we simplify your life, and your billing.

9. There is no limit to the number of CDN-enabled containers you can create.

As far as we can tell, you can only have up to 100 distributions in Amazon’s CloudFront system. At Mosso, we try to keep these types of arbitrary limitations to a bare minimum, not just for Cloud Files, but for all of the services we offer.

10. The Cloud Files GUI is easy to use and navigate.

Our browser-based GUI lets you easily upload a file and share it on the CDN without writing a single line of code. Heck, you don’t even need to know a programmer to share content via Cloud Files!

Differences between S3 and EBS

Amazon Elastic Block Store (Amazon EBS) is a new type of storage designed specifically for Amazon EC2 instances. Amazon EBS allows you to create volumes that can be mounted as devices by EC2 instances. Amazon EBS volumes behave as if they were raw, unformatted external hard drives: they can be formatted using a file system such as ext3 (Linux) or NTFS (Windows) and mounted on an EC2 instance, and files are then accessed through the file system. Volumes have user-supplied device names and provide a block device interface.

For a 20 GB volume, Amazon estimates an annual failure rate for EBS volumes from 1-in-200 to 1-in-1000.  The failure rate increases as the size of the volume increases.  Therefore you either need to keep an up-to-date snapshot on S3, or have a backup of the contents somewhere else that you can restore quickly enough to meet your needs in the event of a failure.  
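To make the volume-plus-snapshot workflow concrete, here is a minimal sketch using the AWS SDK for Python (boto3); the availability zone, instance ID, and device name are placeholder assumptions for illustration.

```python
# Sketch: create an EBS volume, attach it to an EC2 instance, and snapshot it to S3.
# The availability zone, instance ID, and device name below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a 20 GB volume in the same availability zone as the target instance.
volume = ec2.create_volume(AvailabilityZone="us-east-1a", Size=20)
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

# Attach it as a block device; it can then be formatted (e.g. mkfs.ext3)
# and mounted from within the instance itself.
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # placeholder instance ID
    Device="/dev/sdf",
)

# Periodic snapshots are stored in S3 and can be used to restore the volume
# (or recreate it in another availability zone) after a failure.
snapshot = ec2.create_snapshot(
    VolumeId=volume["VolumeId"],
    Description="nightly backup of data volume",
)
print(snapshot["SnapshotId"])
```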

EBS accounts can have a  maximum of 20 volumes unless a higher limit is requested from Amazon. The maximum size of a volume is 1 TB and the storage on a volume is limited to the provisioned size and cannot be changed. EBS volumes can only be accessed from an EC2 instance in the same availability zone whereas snapshots on S3 can be accessed from any availability zone. 

Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize the benefits of scale and to pass those benefits on to developers. S3 needs software to be able to read and write files, but it is hugely scalable, stores six copies of data for HA and redundancy, and is rumoured to be written in Erlang.

S3 accounts can have a maximum of 100 buckets, each with unlimited storage and an unlimited number of files. The maximum size of a single file is 5 GB.

S3 is subject to “eventual consistency”, which means that there may be a delay before writes appear in the system, whereas EBS has no consistency delays. Also, an EBS volume can only be accessed by one machine at a time, whereas snapshots on S3 can be shared.

In terms of performance, S3 has the higher latency and also higher variation in latency, and S3 write latency can be higher than read latency. EBS, on the other hand, has lower latency with less variation. It also has writeback caching for very low write latency. However, be aware that writeback caching and out-of-order flushing could result in an unpredictable file system or database corruption.

In terms of throughput, S3 has a maximum throughput of approximately 20 MB/s single-threaded, or 25 MB/s multithreaded, on a small instance. This rises to 50 MB/s on the large and extra-large instances. EBS throughput is limited by the network: approximately 25 MB/s on a small instance, 50 MB/s on large instances, and 100 MB/s on extra-large instances. As both S3 and EBS are shared resources, they are subject to slowdown under heavy load.

For file listing, S3 is slow and search is by prefix only, whereas EBS has fast directory listing and searching. S3 performance can be optimized by using multiple buckets, and write performance by writing keys in sorted order. EBS single-volume performance is similar to a disk drive with writeback caching.
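To illustrate the prefix-only listing, a small boto3 sketch follows; the bucket name and prefix are assumptions for illustration.

```python
# Sketch: S3 has no real directory search; you can only list keys sharing a prefix.
import boto3

s3 = boto3.client("s3")

# List everything "under" a pseudo-directory by listing keys with that prefix.
response = s3.list_objects_v2(Bucket="my-bucket", Prefix="reports/2010/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```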

There is an alternative to EBS for EC2, and that is PersistentFS. With PersistentFS you mount a drive and use it like any other but, and here is the crux, the storage for the device is actually realized as many small chunks in an S3 storage bucket. PersistentFS is a closed-source product based on the FUSE approach.

S3 costs 15 cents per GB per month for storage actually used, plus 1 cent per 10,000 GETs and 1 cent per 1,000 PUTs. EBS costs 10 cents per GB provisioned and 1 cent per 100,000 I/Os. For pricing of PersistentFS and how it compares to both S3 and EBS, I suggest you read this post on the Amazon forums, which was posted by the PersistentFS team.
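To make the pricing model concrete, here is a back-of-the-envelope monthly comparison using the per-GB and per-request prices quoted above; the data size and request/I-O volumes are illustrative assumptions.

```python
# Back-of-the-envelope monthly cost comparison using the prices quoted above.
# The data size and request/I-O counts are illustrative assumptions.
GB = 100                            # GB stored on S3 / provisioned on EBS
gets, puts = 1_000_000, 100_000     # assumed S3 request volume per month
ebs_ios = 10_000_000                # assumed EBS I/O operations per month

s3_cost = GB * 0.15 + (gets / 10_000) * 0.01 + (puts / 1_000) * 0.01
ebs_cost = GB * 0.10 + (ebs_ios / 100_000) * 0.01

print(f"S3:  ${s3_cost:.2f}/month")   # 15.00 + 1.00 + 1.00 = $17.00
print(f"EBS: ${ebs_cost:.2f}/month")  # 10.00 + 1.00        = $11.00
```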

Why, right now, is Amazon the only game in town?

Amazon is currently the big bear of cloud computing platforms. Its web services division has proved disruptive and has consistently shown innovation and breadth of services within its platform. It is growing at a rapid rate. Forty per cent of Amazon’s gross revenues come from its third-party merchants, and Amazon Web Services is an extension of this. The core Amazon site uses its own web services to build the Amazon pages dynamically, on the fly; rendering a page results in approximately 200-300 Amazon Web Services calls. In short, it eats its own dog food.

Why are Amazon good at this?

1. They have a deep level of technical expertise that has come from running one of the largest global online consumer marketplaces.

2. This has led to a culture of scale and operational excellence.

3. They have an appetite for low-margin, high-volume business and, more importantly, they understand it fully.

Let’s look at the competition. Microsoft can certainly satisfy the first point on the list above, but will probably have to buy the second, and has certainly not demonstrated in its history that it has the third. For this reason we cannot expect Azure to be an instant Amazon competitor. What about RackSpace? Well, they can satisfy 1, and to a lesser extent 2, but again it is not clear that they have fully assimilated point 3. IBM have both 1 and 2 but again fall down on point 3. Currently Amazon are unique in the combination of what they provide, how they provide it, and how they price and make money from it.

The core ethos of the Amazon CTO, Werner Vogels, is that “everything fails all the time”, and it is with this approach that they build their infrastructure. Amazon currently have three worldwide data centres: one on the east coast, one on the west coast, and one in Ireland. The intent is to have at least another in AsiaPac. Each data centre is on a different flood plain, is on a different power grid, and has a different bandwidth provider to ensure redundancy. If S3 is used to store data then six copies of the data are stored. In short, the infrastructure is built to be resilient.

This does not mean there will not be outages; we know these have occurred not just for Amazon but for other prominent online companies as well. Amazon’s SLA guarantees 99.95% uptime for EC2 and 99.9% for S3. What does this mean in terms of downtime? For EC2, it is approximately 4 hours and 23 minutes per year. Not good enough? Reducing downtime costs money, and I know many, many enterprise organisations who could only dream of having downtime as low as this. Chasing five-nines availability is in many ways chasing a dream: achieving it is often more costly than the outages it is meant to protect against. Amazon already provides a service health dashboard for all its services, something Google also seems set to do. It is set to provide additional monitoring services later in the year (along with auto-scaling and load-balancing services) that will make the core services even better.
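The downtime figure is straightforward arithmetic on the SLA percentage, as the small sketch below shows.

```python
# Annual downtime implied by an availability SLA (simple arithmetic check).
HOURS_PER_YEAR = 365 * 24  # 8760

for name, sla in [("EC2", 99.95), ("S3", 99.9)]:
    downtime_hours = HOURS_PER_YEAR * (1 - sla / 100)
    hours = int(downtime_hours)
    minutes = round((downtime_hours - hours) * 60)
    print(f"{name} at {sla}%: ~{hours}h {minutes}m of downtime per year")
    # EC2 at 99.95%: ~4h 23m; S3 at 99.9%: ~8h 46m
```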

Amazon has proved that as soon as you take away the friction of hardware you breed innovation. The Animoto use case is a good example of this, as is their case study on the Washington Post. There are more Amazon case studies here.

Right now, for my money, Amazon is on its own for what it is providing. Sure other companies provide hosting, and storage, and for many users they will be good enough, but for the sheer innovation and breadth of integrated services coupled with the low cost utility compute model, Amazon stands alone.

Is Amazon S3 really cheaper than the alternative?

An interesting post asked why Amazon S3 is considered cheaper than the alternative – excerpt below:

With a price tag of $0.150/GB/month, storing 1TB of data costs around $150/month on Amazon S3. But this is a recurring amount. So, for the same amount of data it would cost $1800/year and $3600/2-years. And this doesn’t even include the data transfer costs.

Consider the alternative, with colocation the hardware cost of storing 1TB of data on two machines (for redundancy) would be around $1500/year. But this is fixed. And increasing the storage capacity on each machine can be done at the price of $0.1/GB. Which means that a RAID-1+redundant copies of data on multiple servers for 4TB of data could be achieved at $3000/year and $6000/2-years in a colocation facility. Whereas on S3 the same would cost $7200/year and $14,400/2-years.

Also, adding bandwidth+power+h/w replacement costs at a colocation facility would still keep the costs significantly lower than Amazon S3.

Given this math, what is the rationale behind going with Amazon S3? The Smugmug case study of 600TB of data stored on S3 seems misleading.

I do see several services that offer unlimited storage which is actually hosted on S3. For example, Smugmug, Carbonite etc. all offer unlimited storage for a fixed annual fee. Wouldn’t this send the costs out of the roof on Amazon S3?

The CEO of SmugMug responded:

Hey there, I’m the CEO & Chief Geek at SmugMug. You’re overlooking a few things:

– Amazon keeps at least 3 copies of your data (which is what you need for high reliability) in at least 2 different geographical locations. That’s what we’d do ourselves, too, if we continued to use our own storage internally. So your math is off both on the storage costs and on the costs of maintaining two or more datacenters and the networks between them.

– When Amazon reduces their prices, you instantly get all your storage cheaper. This isn’t something you get with your capital expenditure of disks – your costs are always fixed. This has upsides and downsides, but you certainly don’t get instant price breaks to your OpEx costs. When they added cheaper, tiered storage, our bill with Amazon dropped hugely.

– There’s built-in price pressure with Amazon, too. The cost of one month’s rent is roughly the same as the cost of leaving. So if it gets too expensive (or unreliable or slow or whatever your metrics are), you can easily leave. And Amazon has incentive to keep lowering prices and improving speed & reliability to ensure you don’t leave.

– CapEx sucks. It’s hard on your cashflow, it’s hard on your debt position if you need to lease or finance (we don’t, but that just means it’s even harder on our cashflow), it’s hard on taxes (amortization sucks), etc etc. I vastly prefer reasonable OpEx costs, with no debt load, which is what Amazon gets us.

– Free data transfer in/out of EC2 can be a big win, too. It is for us, anyway.

– Our biggest win is simply that it’s easy. We have a simpler architecture, a lot less people, and a lot less worry. We get to focus on our product (sharing photos) rather than the necessary evils of doing so (managing storage). We have two ops guys for a Top 500 website with over a petabyte of storage. That’s pretty awesome.

So what does this tell us?

1. Opex is better than Capex, especially for something related to the intrinsic running of your business.

2. The utility compute model reduces risk, i.e. your cost of turning it off is the equivalent of one month of running the service.

3. The “ilities” that you get for free, such as HA, redundant copies, and geographical distribution, would need to be paid for in an alternative model and are expensive to build in.

4. The flexibility is greater, i.e. if you need to scale out to double capacity on demand, this is easily achievable with S3 but needs to be planned, built, and executed in the alternative model.