When deploying on EC2, even though Amazon provides the hardware infrastructure, you still need to tune your instances' operating system and monitor your application. You should also review your hardware and software requirements, your application design, and your deployment strategy.
The Operating System
‘ulimit -n’ specifies the maximum number of open files (file descriptors) that a process may hold. If the value is too low, you might see file open errors, memory allocation failures, or connection establishment errors. The default is typically 1024; you should normally increase it to at least 8096.
Issue the following command to set the value.
ulimit -n 8096
Use the ulimit -a command to display the current values of all limits on system resources.
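Keep in mind that ulimit -n only changes the limit for the current shell and the processes it starts, and that the soft limit cannot be raised above the hard limit without root. The following is a minimal sketch of a check you could run before starting your application; the helper name limit_ok is hypothetical, and the target of 8096 is simply the value suggested above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a limit value meets the target.
# "unlimited" always passes.
limit_ok() {
    current=$1
    target=$2
    if [ "$current" = "unlimited" ]; then
        return 0
    fi
    [ "$current" -ge "$target" ]
}

target=8096
soft=$(ulimit -n)       # soft limit of this shell
hard=$(ulimit -H -n)    # hard limit, the ceiling for non-root users

if limit_ok "$soft" "$target"; then
    echo "open-file limit is $soft, no change needed"
elif limit_ok "$hard" "$target"; then
    ulimit -n "$target" && echo "open-file limit raised to $target"
else
    echo "hard limit $hard is below $target; it must be raised as root" >&2
fi
```

If the last branch is hit, the hard limit has to be raised by an administrator before the application user can benefit from a higher value.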
Tune the Network
A good, detailed reference for Linux IP tuning is here. Some of the important parameters to change for distributed applications are listed below:
The tcp_fin_timeout variable tells the kernel how long to keep sockets in the FIN-WAIT-2 state if you were the one closing the socket. It takes an integer value, which defaults to 60 seconds. To set the value to 30, issue the command:
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
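Values written into /proc/sys take effect immediately but are lost on reboot. A common way to keep them is to add the equivalent dotted key to /etc/sysctl.conf and reload it with sysctl -p as root. The sketch below only prints the line to add; the helper names proc_to_sysctl_key and sysctl_conf_line are hypothetical, and the path conversion assumes simple keys like the ones in this post.

```shell
#!/bin/sh
# Hypothetical helper: turn a /proc/sys path into the dotted sysctl key.
proc_to_sysctl_key() {
    printf '%s\n' "$1" | sed -e 's|^/proc/sys/||' -e 's|/|.|g'
}

# Hypothetical helper: print the line to add to /etc/sysctl.conf.
# args: /proc/sys path, value
sysctl_conf_line() {
    printf '%s = %s\n' "$(proc_to_sysctl_key "$1")" "$2"
}

sysctl_conf_line /proc/sys/net/ipv4/tcp_fin_timeout 30
# prints: net.ipv4.tcp_fin_timeout = 30
```

The same key can also be set at runtime with sysctl -w net.ipv4.tcp_fin_timeout=30, which is equivalent to the echo command above.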
The tcp_keepalive_intvl variable tells the kernel how long to wait for a reply to each keepalive probe. This value is therefore extremely important when you calculate how long it will take before a connection dies a keepalive death. The variable takes an integer value, and the default is 75 seconds. To set the value to 15, issue the following command:
echo 15 > /proc/sys/net/ipv4/tcp_keepalive_intvl
The tcp_keepalive_probes variable tells the kernel how many TCP keepalive probes to send out before it decides that a specific connection is broken.
This variable takes an integer value. The default is to send out 9 probes before telling the application that the connection is broken. To change the value to 5, use the following command:
echo 5 > /proc/sys/net/ipv4/tcp_keepalive_probes
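To see what these two settings mean together, here is a rough worked calculation of the "keepalive death" time. It relies on a third variable that is not discussed above, tcp_keepalive_time (the idle time before the first probe, 7200 seconds by default on Linux), and it only applies to sockets that have enabled SO_KEEPALIVE. The function name keepalive_death_seconds is hypothetical.

```shell
#!/bin/sh
# Seconds until an idle connection to an unresponsive peer is declared dead.
# args: tcp_keepalive_time tcp_keepalive_intvl tcp_keepalive_probes
keepalive_death_seconds() {
    echo $(( $1 + $2 * $3 ))
}

keepalive_death_seconds 7200 75 9   # kernel defaults: prints 7875
keepalive_death_seconds 7200 15 5   # values set above: prints 7275
```

With the idle time left at its default, the interval and probe count only shave about ten minutes off the total, so you may also want to look at tcp_keepalive_time if dead peers need to be detected quickly.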
You can monitor system resources from the command line, but monitoring systems make life easier. A couple of free, open source monitoring tools that we use:
- Ganglia: a free monitoring system
- Hyperic: available in both a commercial and a free edition
You will be amazed how few projects care about logging until they hit a problem. Have a consistent logging procedure in place to collect the logs from the different machines, so that you can troubleshoot when a problem occurs.
Some Linux commands that we use regularly and that you might find useful. More details can be found here, here and here
- top: Display Linux tasks
- vmstat: Report virtual memory statistics
- free: Display amount of free and used memory in the system
- netstat: Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships
- ps: Report a snapshot of the current processes
- iostat: Report Central Processing Unit (CPU) statistics and input/output statistics for devices and partitions
- sar: Collect, report, or save system activity information
- tcpdump: Dump traffic on a network
- strace: Trace system calls and signals
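Several of the commands above can be combined into a quick one-shot snapshot of an instance. This is only a sketch: the function names run_if_present and snapshot are hypothetical, the chosen options are one reasonable selection rather than a recommendation, and some tools (iostat and sar come from the sysstat package) may not be installed on a fresh instance, so missing ones are skipped.

```shell
#!/bin/sh
# Hypothetical helper: run a command only if it is installed,
# printing a header so the sections are easy to find in the output.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        "$@"
    else
        echo "== $1 not installed, skipped =="
    fi
}

# Hypothetical helper: collect a short snapshot of system state.
snapshot() {
    run_if_present top -b -n 1      # one batch-mode iteration of top
    run_if_present vmstat 1 3       # three samples, one second apart
    run_if_present free -m          # memory in megabytes
    run_if_present netstat -s       # per-protocol network statistics
    run_if_present ps aux           # all processes
    run_if_present iostat           # CPU and device I/O statistics
    run_if_present sar -u 1 3       # CPU utilisation, three samples
}
```

Calling snapshot and redirecting its output to a file named after the host and time gives you something to compare against when a problem shows up later.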