Hardening RedHat (CentOS) Linux for use in the Cloud

If you need to deploy Linux in the cloud you should consider hardening the instance prior to deployment. Below are guidelines we have pulled together for hardening a RedHat or CentOS instance.

Hardening RedHat Linux guidelines

Enable SELinux

Ensure that /etc/selinux/config includes the following lines:
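For enforcing mode with the standard targeted policy, the values are:

```shell
# /etc/selinux/config
SELINUX=enforcing
SELINUXTYPE=targeted
```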

Run the following on the command line to allow httpd to create outbound network connections:
setsebool httpd_can_network_connect=1

Check the current mode using getenforce (or sestatus for more detail). To toggle enforcing mode at runtime:
echo 1 > /selinux/enforce

Disable unneeded services

chkconfig anacron off
chkconfig autofs off
chkconfig avahi-daemon off
chkconfig gpm off
chkconfig haldaemon off
chkconfig mcstrans off
chkconfig mdmonitor off
chkconfig messagebus off
chkconfig readahead_early off
chkconfig readahead_later off
chkconfig xfs off

Disable SUID and SGID Binaries

chmod -s /bin/ping6
chmod -s /usr/bin/chfn
chmod -s /usr/bin/chsh
chmod -s /usr/bin/chage
chmod -s /usr/bin/wall
chmod -s /usr/bin/rcp
chmod -s /usr/bin/rlogin
chmod -s /usr/bin/rsh
chmod -s /usr/bin/write
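After removing these bits it is worth auditing what setuid/setgid binaries remain; a quick sketch using find:

```shell
# List remaining setuid (4000) or setgid (2000) files on the root filesystem;
# -xdev stops find from descending into other mounted filesystems
find / -xdev -type f \( -perm -4000 -o -perm -2000 \) 2>/dev/null
```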

Set Kernel parameters

At boot, the system reads and applies a set of kernel parameters from /etc/sysctl.conf. Add the following lines to that file to prevent certain kinds of attacks:
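A commonly recommended baseline for these parameters (adjust to suit your environment) is:

```shell
# /etc/sysctl.conf additions: a common network-hardening baseline
net.ipv4.ip_forward = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.log_martians = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.rp_filter = 1
```

Apply the changes without a reboot by running sysctl -p.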


Disable IPv6

Unless your policy or network configuration requires it, disable IPv6. To do so, prevent the kernel module from loading by adding the following line to /etc/modprobe.conf:
install ipv6 /bin/true
Next, add or change the following lines in /etc/sysconfig/network:
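Typically this means:

```shell
# /etc/sysconfig/network
NETWORKING_IPV6=no
IPV6INIT=no
```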

Nessus PCI Scan

Upgrade OpenSSH to the latest version.

Upgrade Bash to the latest version.


Turn off identifying HTTP headers

In /etc/httpd/conf/httpd.conf set the following values
ServerTokens Prod
ServerSignature Off
TraceEnable off

In /etc/php.ini set
expose_php = Off

Change MySQL to listen only on localhost

Edit /etc/my.cnf and add the following to the mysqld section:
bind-address =

Make sure only ports 80, 443 and 21 are open

vi /etc/sysconfig/iptables
and ensure the only ACCEPT rules for new connections are (as displayed by iptables -L):
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:http
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:https
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:ftp
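Note that the lines above are iptables -L output; in /etc/sysconfig/iptables itself the rules take the iptables-save form, along these lines (a sketch; chain names vary between setups):

```shell
# /etc/sysconfig/iptables fragment: accept new connections on 80, 443 and 21 only
-A INPUT -m state --state NEW -p tcp --dport 80 -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport 443 -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport 21 -j ACCEPT
```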

Cloud Advertising: Google Adwords – how much is enough?

Normally this blog is pretty tech focused but we thought we’d depart slightly from our normal modus operandi and provide a high-level overview of Google Adwords spend. We often get asked: how much should we spend? If we are only spending a small amount, should we even bother? Good questions, so here is our 5 cents:

– To figure out effectiveness, plan a test budget and a test campaign matrix and run it for a month or so to see where you get the best bang for your buck.

– Remember it is not about the spend, it is about the ROI. If the ROI holds up, your spend should increase.

– You should focus on Earnings Per Click (EPC), not Cost Per Click (CPC). That is what really counts. (EPC = Customer Value × Conversion Rate)
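To make the EPC formula concrete with hypothetical numbers: a customer value of $50 at a 2% conversion rate gives an EPC of $1.00, so any CPC below that is profitable:

```shell
# Hypothetical figures: customer value $50, conversion rate 2%
awk 'BEGIN { printf "EPC = $%.2f\n", 50 * 0.02 }'   # prints: EPC = $1.00
```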

Focus on how to increase EPC during your trial. In particular:

– Set up Google Adwords conversion tracking – without it your campaign is worthless. You need to be able to track conversions.

– Focus on refining the Ad to make it as compelling as possible. Monitor the conversions won (or lost) due to the change.

– You must create relevance between the Ad and the landing page, otherwise Google will score you down as your prospects quickly click away and/or Google checks the page for relevant keywords.

– Focus on the most cost-effective keywords. Don’t bother with those outside your value range, i.e. those that eat into your ROI or result in a negative ROI.

– Use lots of negative keywords to prevent untargeted traffic.

That’s it! There are a gazillion great ways of refining Adwords and making it work for you (long-tail keywords, different match types etc.) but these high-level tips should get you on the right road from the beginning.


Ed Snowden’s email service shuts down – advises against trusting physical data to US companies – what are the options?

It has been a while since we did a post and a lot has happened in that time, including the explosion of the Edward Snowden PRISM snooping revelations. These have continued to gather momentum, culminating in the closure of Lavabit, the email service that Snowden used. The owner, Ladar Levison, said that he had to walk away to prevent becoming complicit in crimes against the American public. All very cryptic and chilling. He also advised that he “would strongly recommend against anyone trusting their private data to a company with physical ties to the United States.” So what to do if you have data stored on remote servers?

Well, firstly, you may not care. The data you are storing may not be sensitive at all, and that is the key: you need a strategy for how you deal with sensitive data and the sharing of sensitive data. So what can you do?

1. You could consider encrypting the data that is stored on cloud servers. There are various ways to do this. There are client-side tools such as BoxCryptor that do a good job of this, and there are also more enterprise-type platform solutions such as CipherCloud and Storage Made Easy that enable private-key encryption of data stored remotely. Both of the latter can be deployed on-premise behind the corporate firewall.

2. You could consider a different policy entirely for sharing sensitive data. On a personal basis you could use OwnCloud or even set up a Raspberry Pi as your own personal DropBox, or again you could use Storage Made Easy to create your own business cloud, keeping sensitive data behind the firewall and encrypting remote data stored outside the firewall.

The bottom line: think about your data security, have a policy, and think about how you protect sensitive data.


Understanding DNS, propagation and migration

We recently had a customer migrating from one DNS provider to another due to large outages from their existing supplier, i.e. a failure to keep their DNS services working correctly. They went ahead and migrated by changing the A records and MX records for their domain/sub-domains, and only contacted us when they started getting outages during propagation; they suspected they must have done something wrong and were not sure how to check.

The best way to check this is to use the dig command. DIG is an acronym for Domain Information Groper. Passing a domain name to the dig command by default displays the A record of the queried site (the IP address).

We can use dig to check that the new nameservers are correctly returning the A and MX records. To do this:

dig @<nameserver URL or IP> <domain to check>

If this is correct then it means that the name servers have the correct records which means when they are changed at the registrar we can assume they will be correct.

In the case of the company in question the new DNS service was correctly returning the new NameServer and MX records for the domain, but their local recursor was still returning the old NameServer records as propagation had not taken place.

Other recursors can be checked to identify whether propagation has taken place there, i.e.:

dig @<recursor IP> ns <domain> would check a specific recursor, for example Verizon’s.

Others of note are OpenDNS ( and Google (

Others can be found on the OpenNic Wiki

So in the company’s case, caching of the prior NameServers and the TTL (time to live) was causing the problem, as the new NameServers had not completed propagating. Essentially there were two different “nameservers”, each returning different values, being selected randomly (due to cached NS records).

One of the things we were able to do to help smooth the transition was to ensure each NameServer returned identical values by making both zones 100% identical, i.e. on the original service we changed the NameServer NS records to match the new NameServer NS records. Ideally this would have been done as soon as migration occurred.

Finding disk bottlenecks on Linux

We recently had a client that had some issues with their site slowing down. They thought initially it was due to MySQL locks but this was not the case. It was clear that the problem was with the disk: some process was hammering it. When running top we could see that the CPU wait time was 25-30%.

Running vmstat we could also see the wait time was quite high, so the question was which process was causing the issue. Interestingly, a Google web search brings up almost no coherent posts on finding disk bottlenecks. The traditional solution is good old iostat, which provides information about disk reads and writes per partition, but it does not tell you which process is causing the disk I/O. Later versions of the Linux kernel provide quite good diagnostic information about disk I/O, but this is not documented in the reasonably popular older posts on the subject of disk thrashing.

For recent kernel versions you can use iotop to pinpoint the process that is doing the disk I/O. To do this:

1. Start iotop

2. Press the left arrow twice so that the sort field is on disk write.

3. You will now see, in real time, which processes are writing to the disk.

4. If you wish to get a historic view of writes to date then press ‘a’ (press ‘a’ one more time to switch back).

In this client’s case the issue was that their temp directory was on the same physical drive as their site and MySQL DB. Moving the temp directory to a separate drive resolved the issue.


Some MongoDB and MySQL comparisons for a real world site

We recently did some tests with regards to replacing an existing MySQL implementation with MongoDB. I thought some of the tests would be interesting to share.

MySQL ver 14.12 Distrib 5.0.27, for Win32 (ia32)

MongoDB v2.0.4 for Win32 (journaling not enabled)

The test centred around a table with 400,000 records with numbered names.

The table was indexed on two fields, id and an_id

Selection from a specific folder by name (no index on ‘name’):

SELECT id FROM table WHERE (an_id=2 AND name='some name_251504');

db.files.find({an_id:1, name:'some name_255500'}, {id:1});

MySQL: 0.83 s
MongoDB: 0.44 s


Increased the record count to 800,000 (reaching the 32-bit OS limit on data file size) and added an index for ‘name’.

Data file sizes:

MySQL: 238 MB
MongoDB: 1.4 GB


Selection of files from a specific folder by name pattern:

SELECT count(*) FROM table WHERE (an_id=1 AND name like '%ame_2%');

db.files.find({an_id:0, fi_name:/ame_2/}, {id:1, fi_name:1}).count();

> 202,225 records found

MySQL: 9.69 s (first run), 0.69 s (subsequent runs)
MongoDB: 3.62 s (first run), 1.34 s (subsequent runs)

* The match pattern was changed slightly between runs to prevent cache usage.


Selection by id range:

select count(*) from table where (id > 500000 and id < 550000);

db.files.find({id:{$gt:500000, $lt:550000}}).count()

> 50,000 records found

MySQL: 0.02 s
MongoDB: 0.08 s


Delete 10 records:

delete from table where (id > 800000 and id < 800010);

db.files.remove({id:{$gt:800000, $lt:800010}});

MySQL: 0.13 s
MongoDB: 0.00 s


Delete 50,000 records:

delete from table where (id > 600000 and id < 650000);

db.files.remove({id:{$gt:600000, $lt:650000}});

MySQL: 5.72 s
MongoDB: 2.00 s


Update 10 records:

UPDATE table SET name='some new name' WHERE (an_id=2 AND id > 200000 AND id <= 200010);

db.files.update({an_id:1, id:{$gt:200000, $lte:200010}}, {$set:{name:'some new name'}}, false, true);

MySQL: 0.08 s
MongoDB: 0.02 s


Update 50,000 records:

UPDATE table SET name='some new name 2' WHERE (id > 250000 AND id <= 300000);

db.files.update({id:{$gt:250000, $lte:300000}}, {$set:{name:'some new name2'}}, false, true);

MySQL: 10.63 s
MongoDB: 3.54 s

Insert 50 records:

MySQL: 0.08 s
MongoDB: 0.02 s

Insert 500 records:

MySQL: 0.13 s
MongoDB: 0.09 s

Conclusions and other thoughts:

MongoDB has a clear advantage on speed, and this increases as more records are added.

Concerns are:

– MongoDB is not as battle tested or hardened

– The “gotchas” (in part a lack of knowledge on our side).

– In MySQL data can be obtained from multiple tables with a single query, whereas in MongoDB multiple queries seem to be needed to obtain data from multiple collections. While there are latency advantages when dealing with a single collection, these are negligible when dealing with multiple collections. Also, tuning MySQL buffers and partitioning reduces the speed advantage once again.

The conclusion was to stick with MySQL but to keep an eye on MongoDB.

DropBox is just a frontend to Amazon S3 with a killer sync feature

Musing about iCloud, the forthcoming SkyDrive integration into Windows 8, and Google Drive got me thinking about DropBox, the company whose business model is built on charging when everyone else is starting to give large amounts of storage away for free. DropBox’s killer feature is sync replication. It just works, and consumers have shown they love the simplicity of it. However Apple have replicated the simplicity of the sync, albeit only for iOS users, and Microsoft are now close to the same with Live Mesh.

DropBox store the files you give them on Amazon S3. This surprises many people who had assumed they are stored on DropBox servers. It means the entire DropBox business model is beholden to Amazon Web Services. Amazing when you think about it, and highly illustrative that what DropBox really brings to the table is great software with a killer feature. But what is going to happen when everyone else has that killer feature, with 10x to 20x more storage for free?

A recent article had DropBox valued at 4 billion dollars. This is a valuation on a company doing revenues of between 100 and 200 million dollars per year, into which investors have poured 257 million dollars in funding. Perhaps it’s me, but I just don’t see it. Yes, they have a gazillion subscribers, but so what? In a commoditised industry that struggles to convert more than 2% of the user base, why should that get me excited? But there is DropBox Teams for businesses, right? Ever used it? Then try it and you won’t need me to draw a conclusion.

So what for DropBox if there is no mega IPO coming along? They turned down Mr Jobs (a mistake), so who else would be interested? What about Amazon? After all, DropBox really is the ultimate sync client for Amazon S3. With Amazon now looking towards private cloud it would seem a match made in heaven. As with all good things, time will tell…

Comprehensive overview of PaaS Platforms

Looking to implement a PaaS? Wondering what product to start with or how they compare? Well, there may not be an app for that, but there is a collaborative spreadsheet.

To view the spreadsheet directly on Google Docs click here (it seems Google only supports 50 concurrent viewers of a spreadsheet, so if you have an issue try again later).

Are we witnessing the death of public File Sharing services ?

The decline of MegaUpload and the rumours that the FBI has another hotlist of sites to go after have left other file sharing services running for their proverbial lives, with legitimate services often deciding to remove public file sharing from their offerings, despite arguments that the MegaUpload “bust” has done little to reduce internet piracy.

A list stored on the pastebin service shows the extent that the MegaUpload and MegaVideo closure has had on services:

  1. MegaUpload – Closed.
  2. FileServe – Closing; no longer sells premium accounts.
  3. FileJungle – Deleting files. Locked in the U.S.
  4. UploadStation – Locked in the U.S.
  5. FileSonic – the news is arbitrary (under FBI investigation).
  6. VideoBB – Closed! Would disappear soon.
  7. Uploaded – Banned in the U.S.; the FBI went after the owners, who are gone.
  8. FilePost – Deleting all material (will leave only executables, pdfs, txts).
  9. Videoz – Closed and locked in countries affiliated with the USA.
  10. 4shared – Deleting copyrighted files and waiting in line at the FBI.
  11. MediaFire – Called to testify in the next 90 days; will open its doors to the FBI.
  12. Org torrent – Could vanish with everything within 30 days (“he is under criminal investigation”).
  13. Network Share mIRC – Awaiting the decision of the case to continue or terminate everything.
  14. Koshiki – Operating 100%; Japan will not join SOPA/PIPA.
  15. Shienko Box – 100% working; China/Korea will not join SOPA/PIPA.
  16. ShareX BR – Group UOL/BOL/iG say they will join SOPA/PIPA.

For certain sites that previously failed to remove copyrighted files for long periods of time, this was clearly illegal, and in our opinion these services are rightly targeted. Other services that tuned their whole offering to enable users to upload copyrighted content, while charging users to access that illegally obtained content, can also have no complaints at legal intervention.

However we are more concerned about other services such as Box, DropBox etc. which offer public file sharing for legitimate purposes; to treat these services the same as those aforementioned is clearly wrong. It is like trying to ban cars because robbers choose to use cars as getaway vehicles when robbing banks. Clearly the car manufacturers did not design cars to rob banks! While the analogy may sound trite, what is concerning is that authorities may well go after every file sharing service just because they possess public file sharing features.

Dealing with MySQL issues in the Cloud: Automating restart on error

MySQL is the mainstay of most cloud applications (including this WordPress blog!), but if MySQL has an issue, either through the number of connections maxing out or MySQL being locked and unavailable, it can result in site outages. We’ve seen clients who ended up with their SQL DB down for anything from a couple of hours to a couple of days before they realised there was an issue.

To that end we wrote a small script that can be used to automate the restarting of MySQL in such scenarios.

The script is called mysqlrestart.sh and is listed below. You need root access to use it. If you use it and ever reboot the server, you will need to log in as root and run nohup ./mysqlrestart.sh & to restart it.

The script checks every 30 seconds for the number of connections; if it cannot get a connection, or the number of connections is greater than the defined threshold (90 in the example below), it will restart MySQL.



#!/bin/bash
# mysqlrestart.sh - restart MySQL when it is unreachable or connections exceed a threshold

SQLCONNECTION_THRESHOLD=90

echo `date` sqlrestart started >> run.out

while true; do

        # Count connected threads; strip column names so awk sees "Threads_connected <n>"
        sqlconnections=`mysql --skip-column-names -s -e "SHOW STATUS LIKE 'Threads_connected'" -u root | awk '{print $2}'`

        # Exclude this script's own connection from the count. If the query
        # failed the count is empty, which leaves -1 here and forces a restart.
        sqlconnections=$((sqlconnections - 1))

        echo `date` sqlrestart connections $sqlconnections >> run.out

        if [ $sqlconnections -gt $SQLCONNECTION_THRESHOLD ] || [ $sqlconnections -lt 0 ]; then

                echo `date` restarting mysql server $sqlconnections >> restart.out

                service mysql restart >> restart.out 2>&1

                echo `date` restart complete >> restart.out
        fi

        sleep 30
done