Amazon EBS Provisioned IOPS volumes can now store up to 16 TB

From today, users of Amazon Web Services can create Amazon EBS Provisioned IOPS volumes that store up to 16 TB and process up to 20,000 input/output operations per second (IOPS).

Amazon Elastic Block Store (Amazon EBS) provides persistent block-level storage volumes for use with Amazon EC2 (Elastic Compute Cloud) instances in the AWS Cloud.

Users can also create Amazon EBS General Purpose (SSD) volumes that can store up to 16 TB and process up to 10,000 IOPS. These volumes are designed for five nines of availability and up to 320 megabytes per second of throughput when attached to EBS-optimized instances.

These performance improvements make it even easier to run applications requiring high performance or large amounts of storage, such as large transactional databases, big data analytics, and log processing systems. Users can now run large-scale, high-performance workloads on a single volume, without needing to stripe together several smaller volumes.
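As a rough sketch (assuming the AWS CLI is installed and configured; the availability zone is illustrative), a single 16 TB Provisioned IOPS volume could be created along these lines:

aws ec2 create-volume --volume-type io1 --size 16384 --iops 20000 --availability-zone us-east-1a

The size is given in GiB, so 16384 corresponds to the new 16 TB ceiling, and the resulting volume can then be attached to an EBS-optimized instance with aws ec2 attach-volume.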

Larger and faster volumes are available now in all commercial AWS regions and in AWS GovCloud (US). To learn more, please check out the Amazon EBS details page.

EC2 Linux Monitoring & Tuning Tips

When deploying on EC2, even though Amazon provides the hardware infrastructure, you still need to tune your instance's operating system and monitor your application. You should review your hardware/software requirements as well as your application design and deployment strategy.

The Operating System

Change ulimit

‘ulimit’ specifies the number of open files that are supported. If the value set for this parameter is too low, a file open error, memory allocation failure, or connection establishment error might be displayed. By default this is set to 1024; normally you should increase it to at least 8096.

Issue the following command to set the value.

ulimit -n 8096

Use the ulimit -a command to display the current values for all limitations on system resources.
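Note that ulimit -n only changes the limit for the current shell session. To make it persistent across logins you would typically add entries to /etc/security/limits.conf (a sketch; the exact file and syntax can vary slightly between distributions):

# /etc/security/limits.conf
*    soft    nofile    8096
*    hard    nofile    8096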

Tune the Network

A good, detailed reference for Linux IP tuning can be found here. Some of the important parameters to change for distributed applications are listed below:

TCP_FIN_TIMEOUT

The tcp_fin_timeout variable tells the kernel how long to keep sockets in the FIN-WAIT-2 state if your side closed the socket. It takes an integer value, which defaults to 60 seconds. To set the value to 30, issue the following command:

echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

TCP_KEEPALIVE_INTERVAL

The tcp_keepalive_intvl variable tells the kernel how long to wait for a reply to each keepalive probe. This value is therefore important when calculating how long it will take before your connection dies a keepalive death. The variable takes an integer value; the default is 75 seconds. To set the value to 15, issue the following command:

echo 15 > /proc/sys/net/ipv4/tcp_keepalive_intvl

TCP_KEEPALIVE_PROBES

The tcp_keepalive_probes variable tells the kernel how many TCP keepalive probes to send out before it decides a specific connection is broken.
The variable takes an integer value; the default is to send out 9 probes before telling the application that the connection is broken. To change the value to 5, use the following command:

echo 5 > /proc/sys/net/ipv4/tcp_keepalive_probes
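Values written to /proc in this way are lost on reboot. To make all three settings persistent, you would normally add them to /etc/sysctl.conf and reload with sysctl -p (a minimal sketch, assuming a standard sysctl setup):

# /etc/sysctl.conf
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 5

sysctl -p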

 

Monitoring

You can monitor system resources from the command line, but to make life easier you can use a monitoring system. A couple of free, open-source monitoring tools that we use:

  • Ganglia: a free monitoring system
  • Hyperic: available in both commercial and free editions
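As an example of how little setup is involved, on a Debian or Ubuntu based AMI the Ganglia agent, collector and web front end can usually be installed straight from the standard repositories (package names are for those distributions and may differ elsewhere):

apt-get install ganglia-monitor gmetad ganglia-webfrontend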

 

Logging

You will be amazed how few projects care about logging until they hit a problem. Have a consistent logging procedure in place to collect the logs from different machines, so that you can troubleshoot when a problem occurs.
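A minimal sketch of what that can look like with rsyslog (the hostname is a placeholder, and rsyslog syntax varies between versions): each instance forwards its syslog stream to a single collector, which is configured to accept it.

# on each application instance (/etc/rsyslog.d/forward.conf): send everything to the collector
*.* @logcollector.example.com:514

# on the log collector (/etc/rsyslog.conf): accept syslog messages over UDP
$ModLoad imudp
$UDPServerRun 514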

Linux Commands

Some Linux commands that we use regularly and that you might find useful. More details can be found here, here, and here:

  • top: display Linux tasks
  • vmstat: report virtual memory statistics
  • free: display the amount of free and used memory in the system
  • netstat: print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships
  • ps: report a snapshot of the current processes
  • iostat: report Central Processing Unit (CPU) statistics and input/output statistics for devices and partitions
  • sar: collect, report, or save system activity information
  • tcpdump: dump traffic on a network
  • strace: trace system calls and signals
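For example, a quick first look at a struggling instance often combines a few of these (the intervals and sample counts below are arbitrary):

vmstat 5 5        # memory, swap and CPU, sampled every 5 seconds
iostat -x 5 3     # extended per-device I/O statistics
sar -n DEV 5 3    # per-interface network throughput
netstat -tnp      # current TCP connections and the owning processes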

Is billing Amazon’s Achilles heel?

Having worked on a number of projects with Amazon Web Services recently, the one non-technical thing that has stood out is the billing model that Amazon adopts, which basically forces the company to have a credit card available; Amazon then produces an email with the least amount of information possible on it to tell you that your credit card has been charged. If the user wants any kind of ‘invoice’, they have to go back to their account and try to download usage amounts and associated bills. There is not one clean invoice, and a number of ‘features’ are missing for this type of model, to name but a few:

What I am looking for is a way to put some control back into an organisation’s hands, including:

– A way to grant more granular access to users and therefore track who / which department in the company is using the service

– Central management of billing, and an actual invoice that can be submitted for recompense, either to another company or internally

– Ability to set budget limits, akin to what you can do in Google AdWords

– Alerting mechanisms, such as SMS, for when budgets near tolerance levels

– Ability to centrally track usage data so that chargeback mechanisms can cleanly be built and used

There are numerous threads on the Amazon Web Services Community forum asking for hard-copy invoices. Amazon does provide a page for tax help, but it’s not that helpful 😉

Just some of the things floating around on the thread:

“Sounds silly, isnt’t it? But really, you can shake your head as long as you want, but tax authorities will not accept an invoice which does not state both partie’s VAT-ID number (here in italy, but its the same all over europe). 
If i go to dinner with my clients, the waiter will bring the bill in a carbon copy chemical paper. I HAVE to write my VAT-ID and full company name on it. 
Only THEN, he separates the first from the second sheet of paper, one stays in his records, one in my. 

If they check my books and find an invoice or bill which is not complaint to the formal requirements of having VAT-ID of both parties, they will not accept it and make you pay a fine. Its silly to discuss about the meaning of this, you would have to listen to a very long story about what cross-checks they do with these VAT-IDs. 

Any way, it’s not necessary that you send me a printed invoice, i can print it myself. But IT IS NECESSARY, that the invoice states clearly: 

name, address and VAT-ID of the seller 
name, address and VAT-ID of the purchaser 
description of goods and services 
invoice date, invoice number 

if any of these things are missing, the sheet of paper simply is not an invoice and trying to book it as an expense is a violation of law. 

Currently we are not able to detract AWS expenses of a few 100 US$/month due to these limitations.”

Reply to this post:

“In Czech it is even worse … we have to have hard copy with hand-writen _signature_ to be valid for tax authorities. Problems implications are then quite clear. Silly, but real in Czech. Another more detail, we can not add dinner with customer to our taxes. It has to be paid from the company net profit. “

Another example reply:

“The same here in germany, we want to start using AWS for some projects but without a proper invoice our accounting will not give us a “go”. 

If this won’t change within this month we will either continue to work with dedicated server networks or might try the google appspot. 

Thats really a shame, because amazon does obviously know how to write correct invoices for amazon.com/.de.”

I believe that this is probably tax related, with Amazon not wanting to amass taxes for regional entities that would be liable for country-specific tax, but it’s a big hole right now, and I don’t have much doubt that it stops further adoption of the services themselves, as organisational procedures are pretty inflexible when dealing with these issues.

Amazon EC2 News / Round Up

There is a good PDF whitepaper on using Oracle with Amazon Web Services which can be downloaded here.


A tutorial by Amazon on creating an Active Directory Domain on Amazon EC2 is a thorough article and well worth the read if you intend to implement this functionality on the cloud.


Simon Brunozzi from Amazon gives a good talk on “From zero to Cloud in 30 minutes” at the Next conference in Hamburg, which can be viewed below.





Leventum talk about how they implemented the first ERP solution on the cloud using Compiere.


Jay Crossler looks at how to visualize different cloud computing algorithms using serious games technologies on the Amazon EC2 cloud below:


Practical Guide for Developing Enterprise Applications for the Cloud

This session was presented at Cloud Slam 09 by Nati Shalom, CTO of GigaSpaces. It provides a practical guideline addressing the common challenges of developing and deploying an existing enterprise application on the cloud. Additionally, you will get the opportunity for hands-on experience running and deploying production-ready applications in a matter of minutes on Amazon EC2.

Using Amazon EC2 for PCI DSS compliant applications

Compliance and regulatory concerns are often voiced when it comes to Cloud Computing, and many of the interesting types of applications organisations would like to deploy to the cloud are those governed by some form of regulatory standard. Let’s look in more detail at one of these.

PCI DSS is a set of comprehensive requirements for enhancing payment account data security and was developed by the founding payment brands of the PCI Security Standards Council, including American Express, Discover Financial Services, JCB International, MasterCard Worldwide and Visa Inc. International, to help facilitate the broad adoption of consistent data security measures on a global basis.

The PCI DSS is a multifaceted security standard that includes requirements for security management, policies, procedures, network architecture, software design and other critical protective measures. This comprehensive standard is intended to help organizations proactively protect customer account data.

So, is it possible to create a PCI DSS compliant application that can be deployed to EC2?

Making an application or system PCI DSS compliant requires an end-to-end system design (or a review, if it is pre-existing) and implementation. In the case of AWS customers attaining PCI compliance (certification), they would have to ensure they met all of the prescribed requirements through the use of encryption and similar measures, very much like other customers have done with HIPAA applications. The AWS design allows customers with varying security and compliance requirements to build to those standards in a customized way.

There are different levels of PCI compliance, and the secondary level is quite a straightforward configuration, but it requires additional things such as third-party external scanning (annually). You can find an example here of the PCI scan report that is done on a quarterly basis for the Amazon platform. This isn’t meant to be a replacement for the annual scan requirement. Customers undergoing PCI certification should have a dedicated scan that includes their complete solution, thereby certifying the entire capability, not just the Amazon infrastructure.

The principles and accompanying requirements around which the specific elements of the DSS are organized are:

Build and Maintain a Secure Network

Requirement 1: Install and maintain a firewall configuration to protect cardholder data

Requirement 2: Do not use vendor-supplied defaults for system passwords and other security parameters

Protect Cardholder Data

Requirement 3: Protect stored cardholder data

Requirement 4: Encrypt transmission of cardholder data across open, public networks

Maintain a Vulnerability Management Program

Requirement 5: Use and regularly update anti-virus software

Requirement 6: Develop and maintain secure systems and applications

Implement Strong Access Control Measures

Requirement 7: Restrict access to cardholder data by business need-to-know

Requirement 8: Assign a unique ID to each person with computer access

Requirement 9: Restrict physical access to cardholder data

Regularly Monitor and Test Networks

Requirement 10: Track and monitor all access to network resources and cardholder data

Requirement 11: Regularly test security systems and processes

Maintain an Information Security Policy

Requirement 12: Maintain a policy that addresses information security
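To give a flavour of what Requirement 1 can look like on EC2, security groups act as the instance-level firewall. A minimal sketch using the AWS command line interface (the group name, ports and CIDR range are illustrative only) restricts inbound traffic to HTTPS from anywhere and SSH from a known administrative range:

aws ec2 create-security-group --group-name pci-web --description "PCI DSS web tier"
aws ec2 authorize-security-group-ingress --group-name pci-web --protocol tcp --port 443 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-name pci-web --protocol tcp --port 22 --cidr 203.0.113.0/24

Anything not explicitly authorized is denied by default, which maps reasonably well onto the deny-all posture the requirement expects; the rest of the requirement (documented rulesets, periodic reviews and so on) remains procedural.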

Many of these requirements can’t be met strictly by a datacenter provider, but in Amazon’s case they will be able to provide a SAS 70 Type II Audit Statement in July that will supply much of the infrastructure information needed to meet PCI DSS certification. The Control Objectives that the Amazon audit will address are:

Control Objective 1: Security Organization: Management sets a clear information security policy. The policy is communicated throughout the organization to users.

Control Objective 2: Amazon Employee Lifecycle: Controls provide reasonable assurance that procedures have been established so that Amazon employee accounts are added, modified and deleted in a timely manner and reviewed on a periodic basis to reduce the risk of unauthorized / inappropriate access.

Control Objective 3: Logical Security: Controls provide reasonable assurance that unauthorized internal and external access to data is appropriately restricted.

Control Objective 4: Access to Customer Data: Controls provide reasonable assurance that access to customer data is managed by the customer and appropriately segregated from other customers.

Control Objective 5: Secure Data Handling: Controls provide reasonable assurance that data handling between the customer point of initiation and the Amazon storage location is secured and mapped accurately.

Control Objective 6: Physical Security: Controls provide reasonable assurance that physical access to Amazon’s operations building and the data centers is restricted to authorized personnel.

Control Objective 7: Environmental Safeguards: Controls provide reasonable assurance that procedures exist to minimize the effect of a malfunction or physical disaster to the computer and data center facilities.

Control Objective 8: Change Management: Controls provide reasonable assurance that changes (including emergency / non-routine and configuration) to existing IT resources are logged, authorized, tested, approved and documented.

Control Objective 9: Data Integrity, Availability and Redundancy: Controls provide reasonable assurance that data integrity is maintained through all phases including transmission, storage and processing, and that the Data Lifecycle is managed by customers.

Control Objective 10: Incident Handling: Controls provide reasonable assurance that system problems are properly recorded, analyzed, and resolved in a timely manner.

Many thanks to Carl from Amazon for his help with this information.

Update: Since this post was published Amazon updated their PCI DSS FAQ. You can find that here.

Overcoming the EC2 Windows AMI 10GB limit

Amazon limits the Windows AMI instance to 10 GB in size, which almost makes the image unusable if you try to add other software within the Windows C: drive. Windows is notoriously heavy on disk space, and whereas 10 GB may seem like a lot, believe us, it isn’t when it comes to Windows and a combination of Windows software.

So what can you do? Well, there are three potential options:

1. You can mount an EBS volume to a directory under C:; MyDigitalLife has a great article on how to achieve this. This volume will become your E: drive.

2. If more temporary space is needed for files, downloads, etc. than the 10 GB limit will give you, it is possible to make temporary folders outside of the C: partition.

– Right-click My Computer.
– Click Properties.
– Click Advanced.
– Click Environment Variables.
– Change the TMP and TEMP variables to whatever you want.

3. Use a combination of Junction Link Magic and WebDrive. First, install whatever you need to the D: drive and use JLM to create junctions from C: to D:. Junctions are effectively a combination of symbolic links and mount points. Install WebDrive to C: and then use it to copy the program files that are on D: to Amazon S3. As D: is transient, this means that if the instance goes down you can copy everything back from S3 to D:.

I’m sure at some point Amazon will get their act together on the instance size for Windows so you don’t have to navigate your way around this, but right now at least this gives you some options.

McKinsey Cloud research kicks up a storm

A research paper on Cloud Computing by McKinsey & Company entitled ‘Clearing the Air on Cloud Computing’ has kicked up a right old storm, with various luminaries either for or against it. The premise of the article is that large organisations adopting the cloud model would be making a mistake and most likely lose money, as outsourcing from a more traditional data centre will likely double the cost ($150 per month per unit for a data centre vs $366 per month per unit for the Amazon virtual cloud). The New York Times has an excellent summary of the study here.


Many of the complaints focus on McKinsey totally missing the “Private Cloud” and basing their assumptions on Public Clouds only. However, there seems to be a general consensus that Amazon is too expensive and will need to adjust to survive. I’m not convinced about this. It is not the first study to suggest that Amazon is more expensive to use than a traditional data centre. Amazon seems to have been doing just fine up to now, and they seem to be getting Enterprises to move across. Whether they replace a whole corporate data centre misses the point; I think this is unlikely, but for certain applications and services it makes perfect sense. Also, as more competition unfolds, economics suggests that prices will naturally adjust if they need to.


You can download a PDF of the McKinsey presentation on this paper here.

Vendors line up to use Cloud as differentiator

Vendors small and large are starting to see Cloud Computing as a great sales and technical differentiator. Three examples of this are:

1. AMD pushed out a press release to let the world know it is saying “yes to cloud”. Whoopee! To quote:

“Advanced Micro Devices CEO Dirk Meyer sees cloud computing as the next great investment for the enterprises and says AMD’s processors are going to be a big part of this type of future data center”.

Given the rapid progression of the public/private cloud and virtualisation markets, I’m sure the chip vendors must be salivating at the potential extra dollars to be made.

2. Ubuntu announced version 9.10 of their Linux distro, codenamed Karmic Koala, and also announced that the server edition will have built-in support for the cloud. Their aim is for Ubuntu to provide a standard set of AMIs (Amazon Machine Images) to enable simplified deployment on EC2. So far, so what, but the developers also aim to integrate support for Eucalyptus, which we have discussed previously. This would enable organisations to use Ubuntu to make their own private clouds within their own data centres. A real differentiator, and a great way to create a point of value differentiation against the likes of Red Hat and SUSE.

3. Cobol and the cloud… two things you probably would not expect to hear in the same sentence. However, Micro Focus has identified a market in which customers can outsource their Cobol applications and Micro Focus can host them on the cloud. Micro Focus is supporting Amazon EC2 to increase the options customers have to reach the cloud and begin capitalizing on the cost savings associated with cloud computing.

Interesting times indeed…

Is it Grid or is it Cloud?

A recent post by the cloud vendor CohesiveFT talks about the potential changes in technical sales cycles when evaluating Grid-based products. I’m not sure I agree totally with the article, but the ethos behind it, i.e. making it easier to trial products, try out solutions, and build apps/services more quickly in order to build internal business cases, is solid.

Cloud is a game changer, which is the intent of the article, but you cannot apply a broad brush to “Grid on the Cloud” as a unilateral game changer in respect of Cloud replacing Grid (which, to be fair, is not the intent of the article). For many companies, replacing internal Grids, or even planning for new Grids, cannot be done on the Cloud. There are challenges of integration, moving data, securing data (and this is where CohesiveFT’s VPN-Cubed product offering can help), physical location, legislation, SLAs and availability (see this article for a good synopsis on this as applied to EC2). Many of these will be resolved in time, and some of course can be resolved right now, but with the move by many vendors to enable existing IT infrastructure and data centers as private clouds, the point is likely to be moot in the future, I think. Right now, an internal Grid is not elastic. It does not add more servers or resources to the service as required, but this will change as such internal Fabric enablers become more normal. In fact, one can imagine a future where such companies may sell excess capacity of their “Grid Clouds” to ensure a more economical running of their infrastructures.