Why, right now, Amazon is the only game in town

Amazon is currently the big beast among cloud computing platforms. Its web services division has proved disruptive and has consistently shown innovation and breadth of services within its platform, and it is growing at a rapid rate. Forty per cent of Amazon's gross revenues come from its third-party merchants, and Amazon Web Services is an extension of this. The core Amazon site uses its own web services to build the Amazon pages dynamically, on the fly, which results in approximately 200 to 300 web service calls for each page. In short, it eats its own dog food.

Why are Amazon good at this?

1. They have a deep level of technical expertise that has come from running one of the largest global online consumer marketplaces.

2. This has led to a culture of scale and operational excellence.

3. They have an appetite for low-margin, high-volume business and, more importantly, they understand it fully.

Let's look at the competition. Microsoft can certainly satisfy the first point in the list above, but will probably have to buy the second, and have certainly not demonstrated the third at any point in their history. For this reason we cannot expect Azure to be an instant Amazon competitor. What about Rackspace? Well, they can satisfy the first point and, to a lesser extent, the second, but again it is not clear that they have fully assimilated the third. IBM have both the first and the second, but again fall down on the third. Currently Amazon are unique in the combination of what they provide, how they provide it, and how they price it and make money from it.

The core ethos of Amazon's CTO, Werner Vogels, is that “everything breaks all the time”, and it is with this approach that they build their infrastructure. Amazon currently have three data centres worldwide: one on the east coast, one on the west coast, and one in Ireland. The intent is to add at least one more in Asia-Pacific. Each data centre is on a different flood plain and a different power grid, and has a different bandwidth provider, to ensure redundancy. If S3 is used to store data then 6 copies of the data are stored. In short, the infrastructure is built to be resilient.

This does not mean there will not be outages; we know they have occurred, not just for Amazon but for other prominent online companies as well. Amazon's SLA guarantees 99.95% uptime for EC2 and 99.9% for S3. What does this mean in terms of downtime? For EC2's 99.95%, it works out at approximately 4 hours and 23 minutes per year. Not good enough? Well, reducing downtime costs money, and I know many, many enterprise organisations who could only dream of downtime as low as this. Chasing five nines (99.999%) availability is in many ways chasing the dream: achieving it is often more costly than the outages it is meant to protect against. Amazon already provides a service health dashboard for all its services, something Google also seems set to do, and it plans to provide additional monitoring services later in the year (along with auto-scaling and load balancing) that should make the core services even better.
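To make the downtime arithmetic explicit, here is a small sketch that converts an uptime percentage into the downtime it permits. It assumes a 365-day year and a 30-day month, and the function names are purely illustrative; five nines is included for comparison.

```python
def allowed_downtime_minutes(uptime_percent: float, period_hours: float) -> float:
    """Downtime (in minutes) permitted by an uptime percentage over a period."""
    return (1 - uptime_percent / 100) * period_hours * 60

HOURS_PER_YEAR = 365 * 24   # assumption: a non-leap year
HOURS_PER_MONTH = 30 * 24   # assumption: a 30-day month

def fmt(minutes: float) -> str:
    """Format minutes as 'Xh Ym', rounded to the nearest minute."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours}h {mins}m"

for label, uptime in [("EC2 (99.95%)", 99.95), ("S3 (99.9%)", 99.9), ("five nines", 99.999)]:
    yearly = allowed_downtime_minutes(uptime, HOURS_PER_YEAR)
    monthly = allowed_downtime_minutes(uptime, HOURS_PER_MONTH)
    print(f"{label}: {fmt(yearly)} per year, {monthly:.1f} min per month")
```

Running it shows that 99.95% leaves roughly 4 h 23 min a year, while 99.9% allows about twice that and five nines only about 5 minutes.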

Amazon has proved that as soon as you take away the friction of hardware you breed innovation. The Animoto use case is a good example of this, as is their case study on the Washington Post. There are many more Amazon case studies on the AWS website.

Right now, for my money, Amazon is on its own in what it provides. Sure, other companies provide hosting and storage, and for many users they will be good enough, but for the sheer innovation and breadth of integrated services, coupled with the low-cost utility compute model, Amazon stands alone.

Securing n-tier and distributed applications on EC2

In this post I will walk you through, at a high level, how to secure a typical tiered application running on EC2. First I will cover the basics of what EC2 provides, and then briefly discuss how these can be used in a real-life scenario.

Security Groups

For network security, EC2 provides security groups, which are essentially inbound firewalls suited to the dynamic nature of EC2. Using security groups you can specify which incoming network traffic should be delivered to your instance.

  • The default mode is to deny access; you have to explicitly open ports to allow inbound network traffic.
  • If no security group is specified, a special default group is assigned to the instance. This group allows all network traffic from other members of the group and discards traffic from other IP addresses and groups. You can change the settings for this group.
  • You can assign multiple security groups to an instance.
  • The security groups for an instance are set at launch time and cannot be changed afterwards. You can, however, dynamically modify the rules in a security group, and the new rules are automatically enforced for all running and future instances; there may be a small delay, depending on the number of instances.
  • You can control access either by named security group or by source IP address range. You can specify the protocol (TCP, UDP, or ICMP) and individual ports or port ranges to open.
  • You can allow access to other users' security groups using a user/group pair.
  • The current API (Amazon EC2, as of 2008-12-17) does not support port ranges for security group rules when using the command line tools or the Query API; you will need to use the SOAP API.
  • An account can have a maximum of 100 security groups.
  • Security groups are just access rules applied to a single instance or a collection of instances; two instances being in the same security group does not in itself give them any special access to each other (the default group only allows this because it contains a rule granting its own members access).
  • An instance running in promiscuous mode cannot sniff any traffic intended for a different instance.
  • A running instance cannot change security group access rules by itself; you need your access keys or X.509 certificate to authorize a change.
  • From within the instance you can get the security group information from the instance metadata (curl http://169.254.169.254/1.0/meta-data/security-groups).
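The rules above amount to a default-deny check: inbound traffic is delivered only if some rule in one of the instance's groups matches its protocol, its port and its source (an IP range or a named source group). The sketch below is a simplified, hypothetical model for reasoning about rule sets, not the EC2 API; the class and function names are illustrative, and ICMP type/code matching is not modelled.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import Optional

@dataclass(frozen=True)
class Rule:
    """One inbound rule: protocol, port range and a single source."""
    protocol: str                       # "tcp" or "udp" (ICMP not modelled)
    from_port: int
    to_port: int
    cidr: Optional[str] = None          # source IP range, e.g. "0.0.0.0/0"
    source_group: Optional[str] = None  # or a named source security group

@dataclass
class SecurityGroup:
    name: str
    rules: list = field(default_factory=list)

def is_allowed(groups, protocol, port, src_ip, src_groups):
    """Default deny: True only if a rule in any of the instance's groups matches.

    src_groups is the set of security groups the sending instance belongs to.
    """
    for group in groups:
        for r in group.rules:
            if r.protocol != protocol or not (r.from_port <= port <= r.to_port):
                continue
            if r.cidr is not None and ip_address(src_ip) in ip_network(r.cidr):
                return True
            if r.source_group is not None and r.source_group in src_groups:
                return True
    return False

web = SecurityGroup("web", [Rule("tcp", 80, 80, cidr="0.0.0.0/0"),
                            Rule("tcp", 443, 443, cidr="0.0.0.0/0")])
print(is_allowed([web], "tcp", 80, "203.0.113.7", set()))   # True
print(is_allowed([web], "tcp", 22, "203.0.113.7", set()))   # False: port 22 never opened
```

Because an instance can carry several groups, the check simply searches all of them; adding a group can only open access, never remove it.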

Key Pair

Amazon discourages the use of passwords; the normal way to access an instance is using ssh and a private key. Amazon EC2 can generate the key pair (a 2048-bit RSA key) for you, and when you start an instance you can attach the key pair name to it, which allows root access. Normally you will customize the AMI with the public keys of your own, less-privileged users and disable root login.

Securing Your Application

Now that we have covered the basics, how can we use them to secure a distributed application? Below is a typical deployment architecture for a tiered application.
[Figure: deployment architecture of a typical tiered application on EC2]

In the above deployment we have created four security groups:

Web security group: allows HTTP (80) and HTTPS (443) from anyone, so that everyone can access the application.

App security group: only allows access from instances running in the web security group, on the required ports, e.g. 8080.

DB security group: only allows access from instances running in the app security group, on the required ports, e.g. 3306.

ssh-admin security group: only allows access to the ssh port (22), and as a matter of policy access is allowed only from specific host addresses or the organisation's network. This makes permissions easy to manage.

As you can start an instance with multiple security groups, web tier instances will run with the web and ssh-admin security groups, app server instances with app and ssh-admin, and finally database instances with db and ssh-admin.

You will not need to change the web, app or db security groups. The cloud administrator can allow or revoke admin access just by adding hosts to, or removing them from, the ssh-admin group's port 22 rule. You can write scripts for this or use any GUI tool (Elasticfox, the Amazon admin console).
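As a rough illustration of this layout, the sketch below describes the four groups as plain data and shows the administrative workflow: the web, app and db groups stay fixed, and only the list of sources allowed into ssh-admin changes. It is a hypothetical model rather than an EC2 client; the admin addresses are documentation-range examples and the ports are the examples from the text.

```python
# Each group maps to a list of (port, source) rules; a source is either
# "group:<name>" (another security group) or a CIDR range.
groups = {
    "web": [(80, "0.0.0.0/0"), (443, "0.0.0.0/0")],
    "app": [(8080, "group:web")],
    "db": [(3306, "group:app")],
    "ssh-admin": [],  # maintained by the cloud administrator
}

# Each tier runs with its own group plus ssh-admin.
tiers = {
    "web": ["web", "ssh-admin"],
    "app": ["app", "ssh-admin"],
    "db": ["db", "ssh-admin"],
}

def grant_ssh(cidr):
    """Allow SSH (port 22) from an admin host or network, on every tier at once."""
    if (22, cidr) not in groups["ssh-admin"]:
        groups["ssh-admin"].append((22, cidr))

def revoke_ssh(cidr):
    """Remove SSH access for an admin host or network from every tier."""
    groups["ssh-admin"] = [r for r in groups["ssh-admin"] if r != (22, cidr)]

def rules_for(tier):
    """All inbound rules that apply to instances in a tier."""
    return [rule for g in tiers[tier] for rule in groups[g]]

grant_ssh("198.51.100.10/32")
print(rules_for("db"))   # [(3306, 'group:app'), (22, '198.51.100.10/32')]
revoke_ssh("198.51.100.10/32")
print(rules_for("db"))   # [(3306, 'group:app')]
```

The point of the design is visible in grant_ssh and revoke_ssh: one change to a single shared group updates admin access on every tier, without touching the application rules.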

Other Best practices

  • Make secure requests to Amazon Web Services.
  • Restrict ssh port (22) access to specific hosts or your organisation's network.
  • You can use, and are encouraged by Amazon to use, another firewall (e.g. iptables) on the instance in conjunction with security groups, to restrict inbound/outbound traffic and gain finer control.
  • Don't open any port unnecessarily.
  • Have a separate application administrator (ssh access to instances) and cloud administrator (sets up security groups and generates key pairs, with access to the Amazon EC2 certificate and access keys but no ssh access to running instances).
  • Disable password-based login (set PasswordAuthentication no in /etc/ssh/sshd_config).
  • Customize the AMI with your own users' public keys and disable root login. If you need root access, use sudo.
  • Keep your AMI up-to-date with security patches and fixes

Could Amazon really pull the plug?

One of the interesting things about the Amazon success story is that the EC2 virtual server technology is often assumed to have originally been an overspill from Amazon's own network: that Amazon sold the extra capacity it had available, which was only needed at peak times. I guess we'll never know the real answer to this, but an interesting post from Benjamin Black, one of the original people to work with Chris Pinkham on what was to become Amazon EC2, seems to dispel this as an urban myth.

Why is this important? Because I still meet people who seem to believe this, and who believe that Amazon could “take back” capacity if they needed it, leaving people using EC2 (i.e. them) high and dry. So could this ever happen? Well, not for this reason, but I think a better question to ask is “Could Amazon pull the plug?”

“Of course not,” I hear you say, “Amazon have just announced an SLA.” Well, that is true: since October 23rd 2008 Amazon have had an SLA that guarantees they will make every reasonable effort to provide 99.9% monthly uptime. If they breach this there is a series of financial credits, which may not make up for the trade you lose while your site is down. To be fair, though, if any SLA is breached you have the same problem wherever the site, service or application is hosted (and remember that SalesForce.com don't provide an SLA, preferring to build trust in their service instead).

One of the things you also sign up to when you use one of Amazon's services is the click-through licence agreement. Delving into this provides more detail on the answer to our core question, “could Amazon really pull the plug?” In section 3.3.2 of this agreement Amazon state the following:

3.3.2. Paid Services (other than Amazon FPS and Amazon DevPay). We may suspend your right and license to use any or all Paid Services (and any associated Amazon Properties) other than Amazon FPS and Amazon DevPay, or terminate this Agreement in its entirety (and, accordingly, cease providing all Services to you), for any reason or for no reason, at our discretion at any time by providing you sixty (60) days’ advance notice in accordance with the notice provisions set forth in Section 15 below.

In essence, this says that if Amazon want to pull the plug, other than because you have made a fundamental breach of contract (the breaches are laid out in section 3.4), they will give you 60 days' notice. Great, so you get 60 days, right? Well, not quite. Another section of the terms of service, “Modifications to this Agreement”, allows Amazon to modify the terms of the whole agreement, and once posted the new terms apply 15 days after posting. Of course, this change could include the section that says Amazon has to give 60 days' notice of termination. OK, so now we get to it: Amazon have to give you 15 days before they pull the plug? Well, not quite. If they redefine their acceptable usage policy, and the new policy prohibits your service or application, then in effect you get no notice before the plug is pulled.

Extreme? Of course, but the reality is that a service like Amazon's (or SalesForce's) is built on trust, and if people don't trust the service they won't use it. Amazon and SalesForce both know this and work hard on creating services that have very little downtime and are flexible and easy to use. This is why their usage is going through the roof.


Alexa Traffic History

Using Amazon EC2 public IP address inside EC2 network

Each AMI instance on EC2 is assigned two IP addresses and corresponding DNS names: a public IP address that is accessible over the internet, and an internal IP address that is only accessible inside the internal EC2 regional network. You don't have any control over the internal IP address; it is assigned randomly when you start the instance. For the public IP address you can assign an Elastic IP address to a running instance. An Elastic IP address is reserved and associated with your account, and you pay for it when it is not in use. If you communicate between instances using public or elastic IP addresses, even within the same region, you pay regional data transfer rates ($0.01 per GB in/out).

There might be some scenarios where you are tempted to use elastic IP addresses to communicate inside the same region, e.g. when your distributed system needs fixed IP addresses, but you should carefully weigh the pros and cons. Not only are you paying for traffic that would be free over internal IP addresses, but performance will also be lower. I ran some simple tests to find out more about this.

For the test I started two large instances in the same region, using the same security group. The results were quite interesting:
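As a back-of-the-envelope sketch of the cost side, the function below prices traffic sent between two instances through their public or elastic IP addresses at the regional rate quoted above; the same traffic over internal addresses is not billed. It assumes that "in/out" means each gigabyte is charged once as outbound on the sender and once as inbound on the receiver (check current pricing), and the 500 GB monthly volume is a made-up example.

```python
REGIONAL_RATE_PER_GB = 0.01  # USD, the regional data transfer rate quoted in the text

def regional_transfer_cost(gb_sent, charge_both_ends=True):
    """Cost of sending gb_sent GB between instances via public/elastic IPs.

    Assumption: with charge_both_ends=True, each GB is billed on the way out
    of the sender and again on the way into the receiver.
    """
    sides = 2 if charge_both_ends else 1
    return gb_sent * REGIONAL_RATE_PER_GB * sides

# Hypothetical example: 500 GB a month between an app server and its database.
print(f"${regional_transfer_cost(500):.2f} per month via elastic IPs, $0.00 via internal IPs")
```

Small per-GB numbers add up quickly for chatty tiers such as an application server talking to its database, which is one more reason to keep that traffic on internal addresses.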

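1) Both the public and the private DNS names resolve to the internal IP address inside EC2.
2) There is a big hit in network latency when using the public IP address instead of the internal one: once the first packet is discounted, round trips took about 1.4 to 1.6 ms via the public address versus about 0.2 to 0.4 ms via the internal one.
3) traceroute shows that with the public IP address, network traffic goes through many more routers (12 hops rather than 3).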

We hope that Amazon will soon provide:

1) Internal static IP addresses, so we don't go through configuration hell and can enjoy fast network communication.
2) Machines without a public IP address or DNS name, e.g. for machines used behind firewalls that will never be accessed directly from outside the EC2 network, such as database or application servers.

Test Details
– Machine A (used to run ping and traceroute), internal IP address: 10.250.79.223
– Machine B, associated with an elastic IP address:
– Internal dns name: ip-10-250-78-208.ec2.internal
– Public dns name: ec2-174-129-227-190.compute-1.amazonaws.com
– Internal ip: 10.250.78.208
– Elastic ip: 174.129.227.190

DNS Ping Tests

ip-10-250-79-223:~# ping ip-10-250-78-208.ec2.internal
PING ip-10-250-78-208.ec2.internal (10.250.78.208) 56(84) bytes of data.
64 bytes from ip-10-250-78-208.ec2.internal (10.250.78.208): icmp_seq=1 ttl=62 time=0.346 ms
64 bytes from ip-10-250-78-208.ec2.internal (10.250.78.208): icmp_seq=2 ttl=62 time=0.226 ms
64 bytes from ip-10-250-78-208.ec2.internal (10.250.78.208): icmp_seq=3 ttl=62 time=0.384 ms
64 bytes from ip-10-250-78-208.ec2.internal (10.250.78.208): icmp_seq=4 ttl=62 time=0.257 ms
64 bytes from ip-10-250-78-208.ec2.internal (10.250.78.208): icmp_seq=5 ttl=62 time=0.252 ms

--- ip-10-250-78-208.ec2.internal ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.226/0.293/0.384/0.060 ms

ip-10-250-79-223:~# ping ec2-174-129-227-190.compute-1.amazonaws.com
PING ec2-174-129-227-190.compute-1.amazonaws.com (10.250.78.208) 56(84) bytes of data.
64 bytes from ip-10-250-78-208.ec2.internal (10.250.78.208): icmp_seq=1 ttl=62 time=6.52 ms
64 bytes from ip-10-250-78-208.ec2.internal (10.250.78.208): icmp_seq=2 ttl=62 time=0.262 ms
64 bytes from ip-10-250-78-208.ec2.internal (10.250.78.208): icmp_seq=3 ttl=62 time=0.329 ms
64 bytes from ip-10-250-78-208.ec2.internal (10.250.78.208): icmp_seq=4 ttl=62 time=0.359 ms
64 bytes from ip-10-250-78-208.ec2.internal (10.250.78.208): icmp_seq=5 ttl=62 time=0.327 ms
64 bytes from ip-10-250-78-208.ec2.internal (10.250.78.208): icmp_seq=6 ttl=62 time=0.367 ms
64 bytes from ip-10-250-78-208.ec2.internal (10.250.78.208): icmp_seq=7 ttl=62 time=1.63 ms

--- ec2-174-129-227-190.compute-1.amazonaws.com ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 5999ms
rtt min/avg/max/mdev = 0.262/1.400/6.520/2.138 ms

The two commands above show that both the public and the private DNS names resolve to the internal IP address when pinged from another EC2 machine.

Public/Private network Ping tests

ip-10-250-79-223:~# ping 10.250.78.208
PING 10.250.78.208 (10.250.78.208) 56(84) bytes of data.
64 bytes from 10.250.78.208: icmp_seq=1 ttl=62 time=7.93 ms
64 bytes from 10.250.78.208: icmp_seq=2 ttl=62 time=0.250 ms
64 bytes from 10.250.78.208: icmp_seq=3 ttl=62 time=0.244 ms
64 bytes from 10.250.78.208: icmp_seq=4 ttl=62 time=0.360 ms
64 bytes from 10.250.78.208: icmp_seq=5 ttl=62 time=0.311 ms

--- 10.250.78.208 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 0.244/1.820/7.938/3.059 ms

ip-10-250-79-223:~# ping 174.129.227.190
PING 174.129.227.190 (174.129.227.190) 56(84) bytes of data.
64 bytes from 174.129.227.190: icmp_seq=1 ttl=52 time=1.62 ms
64 bytes from 174.129.227.190: icmp_seq=2 ttl=52 time=1.50 ms
64 bytes from 174.129.227.190: icmp_seq=3 ttl=52 time=1.46 ms
64 bytes from 174.129.227.190: icmp_seq=4 ttl=52 time=1.52 ms
64 bytes from 174.129.227.190: icmp_seq=5 ttl=52 time=1.49 ms
64 bytes from 174.129.227.190: icmp_seq=6 ttl=52 time=1.37 ms
64 bytes from 174.129.227.190: icmp_seq=7 ttl=52 time=1.38 ms

--- 174.129.227.190 ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 5997ms
rtt min/avg/max/mdev = 1.375/1.482/1.621/0.092 ms

The two ping commands above show the difference in ping performance to the same machine when using its public and private IP addresses.
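The summary lines alone are misleading here: the first packet to the internal address took 7.93 ms, which pulls its average (1.820 ms) above that of the public address (1.482 ms). Comparing the median of the per-packet times gives a fairer picture. The sketch below extracts the time= values from Linux ping output and compares medians; the helper name is illustrative, and the sample lines are copied from the runs above.

```python
import re
from statistics import median

TIME_RE = re.compile(r"time=([\d.]+) ms")

def ping_times(output):
    """Extract per-packet round-trip times (ms) from Linux ping output."""
    return [float(m.group(1)) for m in TIME_RE.finditer(output)]

internal = """64 bytes from 10.250.78.208: icmp_seq=1 ttl=62 time=7.93 ms
64 bytes from 10.250.78.208: icmp_seq=2 ttl=62 time=0.250 ms
64 bytes from 10.250.78.208: icmp_seq=3 ttl=62 time=0.244 ms
64 bytes from 10.250.78.208: icmp_seq=4 ttl=62 time=0.360 ms
64 bytes from 10.250.78.208: icmp_seq=5 ttl=62 time=0.311 ms"""

public = """64 bytes from 174.129.227.190: icmp_seq=1 ttl=52 time=1.62 ms
64 bytes from 174.129.227.190: icmp_seq=2 ttl=52 time=1.50 ms
64 bytes from 174.129.227.190: icmp_seq=3 ttl=52 time=1.46 ms
64 bytes from 174.129.227.190: icmp_seq=4 ttl=52 time=1.52 ms
64 bytes from 174.129.227.190: icmp_seq=5 ttl=52 time=1.49 ms
64 bytes from 174.129.227.190: icmp_seq=6 ttl=52 time=1.37 ms
64 bytes from 174.129.227.190: icmp_seq=7 ttl=52 time=1.38 ms"""

m_int = median(ping_times(internal))
m_pub = median(ping_times(public))
print(f"internal median {m_int} ms, public median {m_pub} ms, ratio {m_pub / m_int:.1f}x")
```

With these samples the medians are 0.311 ms and 1.49 ms, so the public path is slower by a factor of almost five, which the raw averages hide.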

TraceRoute Tests

ip-10-250-79-223:~# traceroute 10.250.78.208
traceroute to 10.250.78.208 (10.250.78.208), 30 hops max, 52 byte packets
1 ip-10-250-76-177 (10.250.76.177) 0.155 ms 0.070 ms 0.046 ms
2 ip-10-250-76-160 (10.250.76.160) 11.776 ms 0.092 ms 0.087 ms
3 ip-10-250-78-208 (10.250.78.208) 0.267 ms 0.160 ms 0.127 ms

ip-10-250-79-223:~# traceroute -m 100 174.129.227.190
traceroute to 174.129.227.190 (174.129.227.190), 100 hops max, 52 byte packets
1 ip-10-250-76-177 (10.250.76.177) 0.121 ms 0.208 ms 0.047 ms
2 ip-10-250-76-3 (10.250.76.3) 0.295 ms 0.208 ms 0.209 ms
3 ec2-75-101-160-114.compute-1.amazonaws.com (75.101.160.114) 0.243 ms 0.226 ms 0.221 ms
4 othr-216-182-224-19.usma1.compute.amazonaws.com (216.182.224.19) 0.677 ms 20.055 ms 0.631 ms
5 72.21.197.200 (72.21.197.200) 0.797 ms 0.673 ms 0.593 ms
6 othr-216-182-232-72.usma2.compute.amazonaws.com (216.182.232.72) 0.897 ms 0.860 ms 0.808 ms
7 72.21.197.201 (72.21.197.201) 0.679 ms 0.865 ms 0.850 ms
8 othr-216-182-232-102.usma2.compute.amazonaws.com (216.182.232.102) 1.084 ms 1.129 ms 0.988 ms
9 othr-216-182-224-18.usma1.compute.amazonaws.com (216.182.224.18) 1.353 ms 1.308 ms 1.472 ms
10 ec2-75-101-160-115.compute-1.amazonaws.com (75.101.160.115) 1.823 ms 1.455 ms 1.608 ms
11 198.19.63.211 (198.19.63.211) 1.299 ms 1.305 ms 1.241 ms
12 ec2-174-129-227-190.compute-1.amazonaws.com (174.129.227.190) 1.363 ms 1.519 ms 1.254 ms
ip-10-250-79-223:~#

The traceroute output shows that traffic has to go through many more hops (12 rather than 3) when using the public IP address; this also requires opening more ports.
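To compare paths without counting lines by eye, the sketch below extracts the numbered hop lines from traceroute output and returns each hop's number and host. It assumes the Linux traceroute output format shown above (one line per hop, starting with the hop number), and the function name is illustrative.

```python
import re

# A hop line starts with the hop number, followed by the host name or address.
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)")

def hops(output):
    """Return (hop_number, host) pairs parsed from traceroute output."""
    result = []
    for line in output.splitlines():
        m = HOP_RE.match(line)
        if m:
            result.append((int(m.group(1)), m.group(2)))
    return result

internal_trace = """traceroute to 10.250.78.208 (10.250.78.208), 30 hops max, 52 byte packets
1 ip-10-250-76-177 (10.250.76.177) 0.155 ms 0.070 ms 0.046 ms
2 ip-10-250-76-160 (10.250.76.160) 11.776 ms 0.092 ms 0.087 ms
3 ip-10-250-78-208 (10.250.78.208) 0.267 ms 0.160 ms 0.127 ms"""

public_trace = """traceroute to 174.129.227.190 (174.129.227.190), 100 hops max, 52 byte packets
1 ip-10-250-76-177 (10.250.76.177) 0.121 ms 0.208 ms 0.047 ms
2 ip-10-250-76-3 (10.250.76.3) 0.295 ms 0.208 ms 0.209 ms
3 ec2-75-101-160-114.compute-1.amazonaws.com (75.101.160.114) 0.243 ms 0.226 ms 0.221 ms
4 othr-216-182-224-19.usma1.compute.amazonaws.com (216.182.224.19) 0.677 ms 20.055 ms 0.631 ms
5 72.21.197.200 (72.21.197.200) 0.797 ms 0.673 ms 0.593 ms
6 othr-216-182-232-72.usma2.compute.amazonaws.com (216.182.232.72) 0.897 ms 0.860 ms 0.808 ms
7 72.21.197.201 (72.21.197.201) 0.679 ms 0.865 ms 0.850 ms
8 othr-216-182-232-102.usma2.compute.amazonaws.com (216.182.232.102) 1.084 ms 1.129 ms 0.988 ms
9 othr-216-182-224-18.usma1.compute.amazonaws.com (216.182.224.18) 1.353 ms 1.308 ms 1.472 ms
10 ec2-75-101-160-115.compute-1.amazonaws.com (75.101.160.115) 1.823 ms 1.455 ms 1.608 ms
11 198.19.63.211 (198.19.63.211) 1.299 ms 1.305 ms 1.241 ms
12 ec2-174-129-227-190.compute-1.amazonaws.com (174.129.227.190) 1.363 ms 1.519 ms 1.254 ms"""

print(f"internal: {len(hops(internal_trace))} hops, public: {len(hops(public_trace))} hops")
```

The header line is skipped because it does not start with a number, so the hop count is simply the length of the returned list.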

Finding information about an Amazon EC2 instance

One thing struck me when I was working with EC2 and wanted to find some information about the instance itself: it did not seem to be readily available, and I spent a frustrating 45 minutes trying to get what I was looking for. Eventually I found a very easy way to get the Amazon instance information using curl. Enter the following command, with the name of the information you are looking for appended:

$ curl http://169.254.169.254/1.0/meta-data/

This command can be appended with:
ami-id
ami-launch-index
ami-manifest-path
hostname
instance-id
local-ipv4
public-keys/
reservation-id
security-groups

You can also pass data to the instance when you start it using the EC2 tools, and retrieve that data from within the instance using a similar curl command. For example, if you started the instance using the command
ec2-run-instances ami-3abe5953 -d "Some data I need"

In the instance
$ curl http://169.254.169.254/1.0/user-data
will return
Some data I need
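The same lookups can be scripted. The sketch below builds meta-data and user-data URLs for the 1.0 API version used above and fetches them with Python's standard urllib. It only works from inside an EC2 instance, and the helper names and two-second timeout are my own choices rather than anything Amazon prescribes; the opener parameter exists so the fetch can be tested without network access.

```python
from urllib.request import urlopen

METADATA_HOST = "http://169.254.169.254"

def metadata_url(item="", version="1.0"):
    """URL for a meta-data item such as 'instance-id'; '' lists all items."""
    return f"{METADATA_HOST}/{version}/meta-data/{item}"

def user_data_url(version="1.0"):
    """URL for the data passed at launch with ec2-run-instances -d."""
    return f"{METADATA_HOST}/{version}/user-data"

def fetch(url, opener=urlopen, timeout=2.0):
    """Fetch a metadata URL and return its body as text (reachable only inside EC2)."""
    with opener(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

On an instance, fetch(metadata_url("instance-id")) returns the instance ID, fetch(metadata_url("security-groups")) the group names, and for the launch example above fetch(user_data_url()) would return "Some data I need".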

I want EC2 Cloud but I've got VMware!

Many organisations are used to using virtualisation in-house, probably through VMware. Often the organisational need is to move an existing virtualised application hosted on VMware to a cloud provider such as EC2. If this is your scenario, standards won't help, but you can still achieve what you need to do. The basic steps are:

1. Shut down the existing VMware image.

2. Grab a copy of QEMU, whose qemu-img tool you can use to convert the image.

3. Convert the VMDK file into a raw image file.

4. As this is a raw image, it should be bootable by a local Xen, QEMU or KVM installation.

5. Now bundle it into an AMI using ‘ec2-bundle-image’.

6. Lastly, upload the bundled image and register it with EC2.

7. The AMI will then appear when you request a list of your images.

We've done this for quite a few clients now, and it is a relatively straightforward process.
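The conversion steps map onto a handful of commands. The sketch below only builds and prints the command lines (a dry run) rather than running them; steps 1 and 4 (shutting down the VM and test-booting locally) stay manual. The qemu-img, ec2-bundle-image, ec2-upload-bundle, ec2-register and ec2-describe-images options are as I remember them from those tools, so check each tool's --help before relying on them; the file names, bucket, user ID and credential placeholders are hypothetical.

```python
import shlex

def vmware_to_ami_plan(vmdk, raw, bucket, prefix, key, cert, user_id,
                       access_key, secret_key, bundle_dir="/tmp/bundle"):
    """Return the command lines (as argument lists) for turning a VMware disk
    into an EC2 AMI. Nothing is executed here."""
    manifest = f"{bundle_dir}/{prefix}.manifest.xml"
    return [
        # Steps 2-3: convert the VMDK disk into a raw image with QEMU's qemu-img.
        ["qemu-img", "convert", "-f", "vmdk", "-O", "raw", vmdk, raw],
        # Step 5: bundle the raw image, signed with your X.509 key and certificate.
        ["ec2-bundle-image", "-i", raw, "-k", key, "-c", cert, "-u", user_id,
         "-d", bundle_dir, "-p", prefix],
        # Step 6: upload the bundle to S3, then register the uploaded manifest.
        ["ec2-upload-bundle", "-b", bucket, "-m", manifest,
         "-a", access_key, "-s", secret_key],
        ["ec2-register", f"{bucket}/{prefix}.manifest.xml"],
        # Step 7: the new AMI should appear in the list of your own images.
        ["ec2-describe-images", "-o", "self"],
    ]

plan = vmware_to_ami_plan("app.vmdk", "app.raw", "my-ami-bucket", "app",
                          "pk-EXAMPLE.pem", "cert-EXAMPLE.pem", "123456789012",
                          "ACCESS_KEY", "SECRET_KEY")
for argv in plan:
    print(shlex.join(argv))
```

Keeping the plan as argument lists means each entry could later be passed to subprocess.run once the options have been checked against your installed tools.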