Some MongoDB and MySQL comparisons for a real world site

We recently ran some tests with a view to replacing an existing MySQL implementation with MongoDB. I thought some of the results would be interesting to share.

MySQL ver 14.12 Distrib 5.0.27, for Win32 (ia32)

MongoDB v2.0.4 for Win32 (journaling not enabled)

The test was centred on a table that has 400 000 records with numbered names.

The table was indexed on two fields, id and an_id.

Selection from a specific folder by name:

SELECT id FROM table WHERE (an_id=2 AND name='some name_251504');

db.files.find({an_id:1, name:'some name_255500'}, {id:1});

* no index for ‘name’

MySQL: 0.83 s

MongoDB: 0.44 s

Increased the number of records to 800 000 (this reached the data file size limit on a 32-bit OS)

* Added index for ‘name’

Data file sizes:

MySQL: 238 MB

MongoDB: 1.4 GB

Selection of files from a specific folder by name pattern:

SELECT count(*) FROM table WHERE (an_id=1 AND name like '%ame_2%');

db.files.find({an_id:0, fi_name:/ame_2/}, {id:1, fi_name:1}).count();

> 202 225 records found

MySQL: 9.69 s (first run), 0.69 s (subsequent runs)

MongoDB: 3.62 s (first run), 1.34 s (subsequent runs)

* The match pattern was changed slightly between runs to prevent cache usage

Count:

select count(*) from table where (id > 500000 and id < 550000);

db.files.find({id:{$gt:500000, $lt:550000}}).count()

> 50 000 records found

MySQL: 0.02 s

MongoDB: 0.08 s

Delete 10 records:

delete from table where (id > 800000 and id < 800010);

db.files.remove({id:{$gt:800000, $lt:800010}});

MySQL: 0.13 s

MongoDB: 0.00 s

Delete 50 000 records:

delete from table where (id > 600000 and id < 650000);

db.files.remove({id:{$gt:600000, $lt:650000}});

MySQL: 5.72 s

MongoDB: 2.00 s

Update 10 records:

UPDATE table SET name='some new name' WHERE (an_id=2 AND id > 200000 AND id <= 200010);

db.files.update({an_id:1, id:{$gt:200000, $lte:200010}}, {$set:{name:'some new name'}}, false, true);

MySQL: 0.08 s

MongoDB: 0.02 s

Update 50 000 records:

UPDATE table SET name='some new name 2' WHERE (id > 250000 AND id <= 300000);

db.files.update({id:{$gt:250000, $lte:300000}}, {$set:{name:'some new name2'}}, false, true);

MySQL: 10.63 s

MongoDB: 3.54 s

Insert 50 records:

MySQL: 0.08 s

MongoDB: 0.02 s

Insert 500 records:

MySQL: 0.13 s

MongoDB: 0.09 s

Conclusions and other thoughts:

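MongoDB has a clear advantage on speed in most of these tests (the indexed range count and the repeat pattern searches being the exceptions), and this increases as more records are added.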
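To put rough numbers on this, the timings reported above can be reduced to MySQL-to-MongoDB ratios, where a value above 1 means MongoDB was faster. This is only a sketch: the ratio helper and the short labels are our own naming, and the figures are single measurements copied from the tests above, so treat the ratios as indicative rather than as a benchmark.

```shell
#!/bin/bash
# Ratio of MySQL time to MongoDB time; a value above 1 means MongoDB was faster.
# Prints "n/a" when the MongoDB time was recorded as 0.00 s (below timer resolution).
# "ratio" is a hypothetical helper name used only for this illustration.
ratio() {
    awk -v mysql="$1" -v mongo="$2" 'BEGIN {
        if (mongo + 0 == 0) { print "n/a"; exit }
        printf "%.2f\n", mysql / mongo
    }'
}

# label:mysql_seconds:mongodb_seconds, copied from the results above
while IFS=: read -r label mysql mongo; do
    printf '%-16s %s\n' "$label" "$(ratio "$mysql" "$mongo")"
done <<'EOF'
select by name:0.83:0.44
count id range:0.02:0.08
delete 10:0.13:0.00
delete 50 000:5.72:2.00
update 10:0.08:0.02
update 50 000:10.63:3.54
insert 50:0.08:0.02
insert 500:0.13:0.09
EOF
```

Laid out this way, the bulk deletes and updates come out at roughly three times faster on MongoDB, while the indexed range count favours MySQL.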

Concerns are:

– MongoDB is not as battle tested or hardened

– The “gotchas” (due in part to our own lack of knowledge)

– In MySQL, data can be obtained from multiple tables with a single query, whereas in MongoDB it seems multiple queries are needed to obtain data from multiple collections. While there are latency advantages when dealing with a single collection, these become negligible when dealing with multiple collections. Also, tuning MySQL buffers and partitioning reduces the speed advantage further.

The conclusion was to stick with MySQL but to keep an eye on MongoDB.

DropBox is just a frontend to Amazon S3 with a killer sync feature

Musing about iCloud, the forthcoming SkyDrive integration into Windows 8, and Google Drive got me thinking about DropBox, the company whose business model is built on charging when everyone else is starting to give large amounts of storage away for free. DropBox’s killer feature is their sync replication. It just works, and consumers have shown they love the simplicity of it. However, Apple have replicated the simplicity of the sync, albeit only for iOS users, and Microsoft are now close to the same with Live Mesh.

DropBox store the files you give them on Amazon S3. This surprises many people who had assumed that they are stored on DropBox servers. This means that the entire DropBox business model is beholden to Amazon Web Services. Amazing when you think about it, and highly illustrative that what DropBox really brings to the table is great software with a killer feature. But what is going to happen when everyone else has that killer feature, with 10x to 20x more storage for free?

A recent article had DropBox valued at 4 billion dollars. This is a valuation of a company with revenues of between 100 and 200 million dollars per year, into which investors have poured 257 million dollars in funding. Perhaps it’s me, but I just don’t see it. Yes, they have a gazillion subscribers, but so what? In a commoditised industry that struggles to convert more than 2% of the user base, why should that get me excited? But there is DropBox Teams for businesses, right? Ever used it? Try it and you won’t need me to draw a conclusion.

So what for DropBox if there is no mega IPO coming along? They turned down Mr Jobs (a mistake), so who else would be interested? What about Amazon? After all, DropBox really is the ultimate sync client for Amazon S3. With Amazon now looking towards private cloud, it would seem a match made in heaven. As with all good things, time will tell…

Comprehensive overview of PaaS Platforms

Looking to implement a PaaS? Wondering which product to start with or how they compare? Well, there may not be an App for that, but there is a collaborative spreadsheet.

To view the spreadsheet directly on Google Docs click here (it seems Google only supports 50 concurrent connections to a spreadsheet, so if you have an issue try again later).

Are we witnessing the death of public File Sharing services ?

The decline of MegaUpload and the rumours that the FBI has another hotlist of sites to go after has left other file sharing services running for their proverbial lives, with legitimate services often deciding to remove public file sharing from their own services, despite arguments that the MegaUpload “bust” has done little to reduce internet piracy.

A list stored on the pastebin service shows the extent of the impact that the MegaUpload and MegaVideo closure has had on other services:

  1. MegaUpload – Closed.
  2. FileServe – Closing does not sell premium.
  3. FileJungle – Deleting files. Locked in the U.S.
  4. UploadStation – Locked in the U.S.
  5. FileSonic – the news is arbitrary (under FBI investigation).
  6. VideoBB – Closed! would disappear soon.
  7. Uploaded – Banned U.S. and the FBI went after the owners who are gone.
  8. FilePost – Deleting all material (so will leave executables, pdfs, txts)
  9. Videoz – closed and locked in the countries affiliated with the USA.
  10. 4shared – Deleting files with copyright and waits in line at the FBI.
  11. MediaFire – Called to testify in the next 90 days and it will open doors pro FBI
  12. Org torrent – could vanish with everything within 30 days “he is under criminal investigation”
  13. Network Share mIRC – awaiting the decision of the case to continue or terminate Torrente everything.
  14. Koshiki – operating 100% Japan will not join the SOPA / PIPA
  15. Shienko Box – 100% working china / korea will not join the SOPA / PIPA
  16. ShareX BR – group UOL / BOL / iG say they will join the SOPA / PIPA

Where sites previously failed to remove copyrighted files for long periods of time, this is clearly illegal, and in our opinion these services should rightly be targeted. Other services that tuned their whole offering to enable users to upload copyrighted content, whilst then charging users to access the illegally obtained copyrighted content, can also have no complaints at legal intervention.

However, we are more concerned about other services such as Box, DropBox etc. who offer public file sharing for legitimate purposes, and to treat these services the same as those aforementioned is clearly wrong. It is like trying to ban cars because robbers choose to use cars as getaway vehicles when robbing banks. Clearly the car manufacturers did not design cars to rob banks! While the analogy may sound trite, what is concerning is that authorities may well go after every file sharing service just because they possess public file sharing features.

Dealing with MySQL issues in the Cloud: Automating restart on error

MySQL is the mainstay of most Cloud Applications (including this WordPress blog!). However, if MySQL has an issue, either through the number of connections maxing out or through MySQL being locked and unavailable, it can result in site outages. We’ve seen clients whose SQL DB was down for anything from a couple of hours to a couple of days before they realised there was an issue.

To that end we wrote a small script that can be used to automate the restarting of MySQL in such scenarios.

The script is called mysqlrestart.sh and is listed below. You need root access to be able to use it. Start it with nohup ./mysqlrestart.sh & and, if you ever reboot the server, log in as root and run the same command again to restart it.

Once started, the script loops and checks every 30 seconds, so there is no need to schedule it with Cron (which cannot run jobs more often than once a minute). On each pass it checks the number of connections, and if it cannot get a connection or the number of connections is greater than the number defined (90 in the example below), it restarts MySQL.

#!/bin/bash

# Restart MySQL when the number of client connections exceeds this value
SQLCONNECTION_THRESHOLD=90

echo "$(date) sqlrestart started" >> run.out

while true; do
        # Empty if the mysql client cannot connect
        sqlconnections=$(mysql --skip-column-names -s -e "SHOW STATUS LIKE 'Threads_connected'" -u root | awk '{print $2}')

        # exclude myself from the number of thread connections
        # (an empty value becomes -1, which triggers a restart below)
        sqlconnections=$((sqlconnections - 1))

        echo "$(date) sqlrestart connections $sqlconnections" >> run.out

        if [ "$sqlconnections" -gt "$SQLCONNECTION_THRESHOLD" ] || [ "$sqlconnections" -lt 0 ]
        then
                echo "$(date) restarting mysql server $sqlconnections" >> restart.out
                service mysql restart >> restart.out 2>&1
                echo "$(date) restart complete" >> restart.out
        fi

        sleep 30
done
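The restart decision in the script above can be pulled out into a small function so that it can be tried without a live MySQL server. This is only a sketch: needs_restart is a hypothetical helper name that is not part of the original script, and the sample values fed to it are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): succeeds when a restart is needed.
#   $1 = raw Threads_connected value (empty if the mysql client could not connect)
#   $2 = connection threshold
needs_restart() {
    local raw="${1:-0}" threshold="$2"
    # Exclude the monitoring connection itself; an empty reading becomes -1
    local connections=$((raw - 1))
    [ "$connections" -gt "$threshold" ] || [ "$connections" -lt 0 ]
}

# Made-up sample readings: two healthy values, one over the threshold, one failed query
for sample in 12 91 92 ""; do
    if needs_restart "$sample" 90; then
        echo "reading='$sample' -> restart"
    else
        echo "reading='$sample' -> ok"
    fi
done
```

Note that a reading of 91 does not trigger a restart with a threshold of 90, because the script subtracts its own connection before comparing.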

Using the Power of Cloud Computing with SaaS Services

We recently started documenting the services we use on a day-to-day basis and it struck us just how much we use SaaS and Cloud Computing services. To that end we thought it would be fun / beneficial to share some of these and how and why we use them:

File Server: StorageMadeEasy

We use SkyDrive (25 GB free) in conjunction with Amazon S3 and an in-house PogoPlug installation to store files. We use the Storage Made Easy Cloud File Server to provide a unified view of all of our files (we are trialling PogoPlug support with them). This also enables us to assign user / file permissions and governance on the consolidated information stores, which is very useful. We can also create collaboration groups across the consolidated information stores for sharing with clients, and where the files are stored is abstracted (to the client). We also use the service to manage and share files on our iPad (tablet) and Windows Phone 7 clients.

CRM: Zoho CRM

Having used several different CRM systems over time, personally we prefer Zoho. For up to 3 users it is free, and adding users and services is reasonably priced. It’s also very easy to change the default templates, and easy to use via the HTML5 mobile App. SalesForce is synonymous with CRM systems and SaaS, but Zoho is a good (cheaper) alternative.

Project Management: Basecamp

Basecamp can be a little expensive, but if you manage projects and want to collaborate it is hard to beat, as its simplicity and easy-to-use web interface stand the test of time.

Source Code / Bug Tracker: BitBucket

There are many source code and bug tracking systems, many of them free but we like BitBucket. It’s free for up to 3 users and is a solid source code and bug tracker. For source code editing we use Textastic which enables us to hook into specific source files stored on SkyDrive via WebDav using the SMEStorage CloudDav feature which is enabled when the iPad app is purchased.

Analytics: Google

Google Analytics is essential for tracking website statistics. There are alternatives (such as the fantastic Piwik) but it is hard to beat for ease of use. It is not flawless; the lack of ability to track IP addresses (and therefore do a reverse DNS lookup) is frustrating, for example. We use AnalyticsPro on the iPad for mobile access/tracking.

Email: Gmail

Google Apps Gmail is a great email system. We’ve used it for years and only have good things to say about it. We back up our Gmail to SkyDrive using the SMEStorage Cloud File Server so that it can be indexed and searchable along with our other files, and of course also for resiliency. For offline access on our iPads we use a customised version of Remail that we enhanced for the iPad.

Inbound Lead Tracking: LeadLander

LeadLander enables us to track companies visiting our website, how often they visit, and the new people feature enables us to directly contact leads. It’s a great service and great value at a couple of thousand dollars per year.

Server Monitoring: Server Density & Pingdom & WatchMouse & PagerDuty

Server Density is great for monitoring thresholds on Apache, Server Processes, MySQL connections etc. We plug this into PagerDuty so we can be alerted by phone if major thresholds are breached. Server Density also provide an iOS App for push alerts. Pingdom is used as an added check for server outages, and WatchMouse is used to check quality of service by measuring the time taken to load pages on a site.

MySQL Admin: PHPMyAdmin

We tend to use the command line but if we need to access MySQL graphically we use PHPMyAdmin.

Call Services: e-Receptionist and Skype and TollFreeforwarding

e-Receptionist is great for virtual teams if you want to route calls to Sales / Support etc. and you are not physically in the same location. Skype of course needs no explanation, except that we back up our Skype conversations to SkyDrive using the SMEStorage Cloud File Server so they can be indexed and searchable, as we do a lot of communication over Skype. TollFreeForwarding is used to give us an international number which then routes into the e-Receptionist infrastructure.

Online Marketing: Google Adwords and BuySellAds.com and LinkedIn Ads

Google Adwords is synonymous with online marketing and can be a great sales tool if used correctly, and BuySellAds is useful to advertise services to targeted sites using banner ads. LinkedIn Ads is a great way to reach a very targeted audience and in our experience is a great way to complement any online marketing campaign.

Invoicing Services: BlinkSale

BlinkSale is a great, simple SaaS invoicing service that is low cost and very easy to use and administer. There are many others but for simplicity you can’t beat BlinkSale.

Blog: WordPress

There are many others, such as Google Sites and Tumblr, but there are so many ways to use WordPress and so many plug-ins that there really is nothing else to compete with it.

Social: Twitter and FriendFeed and Identi.ca and Google Plus and Hootsuite

We use a variety of social sites to push out news. There are of course so many now that it would be impossible to list them all but we covered the major ones. If you use multiple Twitter or social accounts then Hootsuite is a must.

… and there you have it. The types of SaaS cloud services a distributed virtual company can use to run their business.

Amazon Cloud is now FISMA certified: Joins Google and Microsoft

The Amazon Cloud has now been classed as FISMA certified. FISMA is an acronym for the Federal Information Security Management Act. FISMA sets security requirements for federal IT systems and is a required certification for US federal government projects.

This is the third set of certifications Amazon has recently announced, coming on top of VPC ISO 27001 certification and SAS 70 Type II certification.

The accreditation covers EC2 (Amazon Elastic Compute Cloud), S3 (Simple Storage Service), VPC (Virtual Private Cloud), and includes Amazon’s underlying infrastructure.

AWS’ accreditation covers FISMA’s low and moderate levels. This level of accreditation requires a set of security configurations and controls that includes documenting the management, operational and technical processes used in securing physical and virtual infrastructure, and a requirement for third-party audits.

Other vendors who recently announced FISMA certification were Google, with Google Apps for Government, and Microsoft, with its Business Productivity Online Suite among other cloud services (although there was a spat between Microsoft and Google regarding these claims).

Expect to see further certifications, as these are a prerequisite for expansion into lucrative government and private sector contracts as vendors feel more comfortable choosing Cloud resources and commoditisation marches on.

The Cloud and the power of “one”

One of the interesting things about the last 12-18 months is how the Cloud has put power into the hands of consumers. What I mean by this is, imagine the following scenario in the world pre-cloud:

“A user buys some software over the Internet or Shrink wrapped. They receive it and install it. It either does not work for them or they cannot figure out how to use it so they basically write off the cash and don’t use the App. End of story.”

Now let’s look at what happens in the Cloud:

“The user either buys an Application from an App Store, be it desktop or mobile, or a holiday from a holiday store, and then decides either that the application is rubbish or does not work for them (or they have not RTFM), or has a bad experience on holiday. The user then uses social networks and/or the review forum on the App Store to comment on the bad experience.”

In the latter case this “review” and negative experience puts other people off trying the App / the holiday / the hotel etc. In some cases it can mean the difference between continually selling product and selling nothing, as users look at the last bad review and then move somewhere else to continue their search. One person can have the power to seriously undermine your whole product marketing and application strategy.

Even worse, many of the review forums (Apple’s App Store and Google Marketplace come to mind…) don’t even let you post a counter-review to explain that either the user has got it wrong, or misunderstood, or to genuinely offer to correct a bug. Worse still, some vendors can use a strategy of targeting competitive products to “put people off” purchase. In some cases this has led to the vendors involved seeking out legal action.

So what can you do to protect your product and your reputation ?

Well, the first step has already been taken: you are at least thinking about it and conscious of it, which is half the battle. What you should do is have your marketing or support team adopt a strategy that includes:

– Monitoring App Stores and review forums where your product features
– Monitoring social media for keywords about your product or company
– Setting up Google Alerts for keywords about your product and company
– Checking your Twitter messages and also posts on LinkedIn and Facebook pages. These are easily missed.

When you see such reviews, always try to follow up with the user, engage with them and resolve their issue, even if this means refunding them. Even if a refund seems like the last thing you want to do, offer it; it is not worth your reputation. As part of the process, try to see if they will change their review, even if it is only to neutral.

The Cloud brings power to the masses in more ways than one, and a single user can have a dramatic network effect on your business if you are not careful!

Using Cloud payment gateways can seriously damage your business

Recently we had a customer contact us who had been using Google Checkout to sell an Android application. The customer logged into their Merchant account, as they do each morning, to be confronted with “Your merchant Account has been closed” and some templated details about why this could be the case. No specifics, just “we have done this because you may have fallen foul of one of the following reasons”. You can see the full message here.

This in and of itself is incredible if you think about it: you turn up for work one day and check your business account, only to find that it’s no longer accessible and no reason has been given. But it gets worse. Within the body of the message the customer is told that any income they have in their account has been refunded, and that access to account details is no longer permitted.

Where we come in is that we were asked if we could help. Well, you’d reasonably expect this would be a simple process. Contact Google, figure out what the issue is and resolve it. Right? Wrong!

Firstly, it’s impossible to contact Google by phone to try and resolve this. It seems that Google has a UK address (this is for a UK company) and telephone number, but all you get is a recorded message. The address and telephone number are below in full:

Google Payment Limited
Belgrave House
76 Buckingham Palace Road
London SW1W 9TQ
United Kingdom
0207 031 3000

So, we fished about the web to see if we could find a support address for Google Checkout, which we did: checkout-support@google.com. But guess what? You get an auto-responder informing you that direct emails are not answered and to use the Merchant help centre, which is basically a set of links and forums.

We did some more checking on the web and found that this is far from a lone issue. Companies invest in using the Google Cloud payment gateway and then at some point their account is de-activated, any payments for goods they have sent are refunded, and they cannot access any account details, or indeed contact Google directly to see what they have done wrong. This is akin to turning up at your bank to find out you can no longer access your account, your funds have been given back to your customers despite you having sent them products, and your bank manager will not even entertain talking with you. Seriously, we’re not often dumbfounded but in this case we are!

Now some businesses may indeed have fallen foul of Google’s Terms of Service, but this is far from an isolated problem. A quick scan of the Google Merchant forum shows many furious customers who have experienced the same issue, and getting in touch with Google at all with regard to Google Checkout issues seems impossible.

So what can you do? Well, it seems all you can do is apply for re-instatement. Further digging found us a link through which you can ask Google, via an online contact form, why the account was suspended. Other than this it seems there is little you can do. Nor is this a new issue. When digging about looking for solutions we found a particular post, almost 3 years old, that reads like a litany of the issues our client faces.

Nor is this issue limited to Google Merchant services; PayPal users seem to have experienced very similar issues.

So, the moral here is: choose your Cloud payment gateway carefully. Your business may depend on it! Stick to reputable merchant service providers and check out their customer service prior to doing any integration work. Make sure you can contact them and fully understand their terms of service.

Amazon enables easy website hosting with S3 – competes with RackSpace

In a move that has put it into direct competition with the likes of RackSpace, Amazon has announced that you can now host your website using an Amazon S3 account. With these new features, Amazon S3 now provides a simple and inexpensive way to host your website in one place.

To get started, open the Amazon S3 Management Console, and follow these steps:

1) Right-click on your Amazon S3 bucket and open the Properties pane

2) Configure your root and error documents in the Website tab

3) Click Save
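The two settings from step 2 (root document and error document) can also be written down as a website configuration document rather than entered in the console. The sketch below only builds and prints that JSON. The website_config helper name, the file names index.html and error.html and the bucket name are our own assumptions, and the AWS CLI command in the final comment is one possible way to apply the result that is not part of Amazon's announcement.

```shell
#!/bin/bash
# Hypothetical helper: print an S3 website configuration document.
#   $1 = root (index) document, $2 = error document
website_config() {
    local index_doc="$1" error_doc="$2"
    printf '{"IndexDocument":{"Suffix":"%s"},"ErrorDocument":{"Key":"%s"}}\n' \
        "$index_doc" "$error_doc"
}

# Assumed file names, for illustration only
website_config index.html error.html

# Assumption: with the AWS CLI installed and credentials configured, the output
# could be saved to website.json and applied to a bucket of your own with:
#   aws s3api put-bucket-website --bucket my-example-bucket \
#       --website-configuration file://website.json
```

The bucket contents still need to be publicly readable for the site to be served, which is covered in Amazon's own documentation.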

Amazon provide more information on hosting a static website on Amazon S3 here.

This is part of a trend that Amazon obviously want to encourage. They recently started an ad placement from JumpBox on their free Web Services developers page to offer one-click WordPress deployments, amongst other JumpBox offerings.