Ed Snowden’s email service shuts down – advises not to trust private data to US companies – what are the options?

It has been a while since we did a post, and a lot has happened in that time, including the explosion of the Edward Snowden and PRISM snooping revelations. These have continued to gather momentum, culminating in the closure of Lavabit, the email service that Snowden used. The owner, Ladar Levison, said that he had to walk away to avoid becoming complicit in crimes against the American public. All very cryptic and chilling. He also advised that he “would strongly recommend against anyone trusting their private data to a company with physical ties to the United States.” So what should you do if you have data stored on remote servers?

Well, firstly, you may not care. The data you are storing may be in no way sensitive, and that is the key: you need a strategy for how you store and share sensitive data. So what can you do?

1. You could consider encrypting the data that is stored on cloud servers. There are various ways to do this. Client-side tools such as BoxCryptor do a good job of it, and there are also more enterprise-oriented platform solutions, such as CipherCloud and Storage Made Easy, that enable private key encryption of data stored remotely. Both of these platforms can be deployed on-premise behind the corporate firewall.

2. You could consider a different policy entirely for sharing sensitive data. On a personal basis you could use OwnCloud or even set up a Raspberry Pi as your own personal DropBox. For a business, you could again use Storage Made Easy to create your own business cloud that keeps sensitive data behind the firewall and encrypts remote data stored outside the firewall.

The bottom line: think about your data security, have a policy, and think about how you protect sensitive data.

 

DropBox is just a frontend to Amazon S3 with a killer sync feature

Musing about iCloud, the forthcoming SkyDrive integration into Windows 8, and Google Drive got me thinking about DropBox, the company whose business model is built on charging when everyone else is starting to give large amounts of storage away for free. DropBox’s killer feature is its sync replication. It just works, and consumers have shown they love the simplicity of it. However, Apple have replicated the simplicity of the sync, albeit only for iOS users, and Microsoft are now close to doing the same with Live Mesh.

DropBox store the files you give them on Amazon S3. This surprises many people, who had assumed that they are stored on DropBox servers. It means that the entire DropBox business model is beholden to Amazon Web Services. Amazing when you think about it, and highly illustrative that what DropBox really brings to the table is great software with a killer feature. But what is going to happen when everyone else has that killer feature, with 10x to 20x more storage for free?

A recent article had DropBox valued at 4 billion dollars. This is a valuation of a company with revenues of between 100 and 200 million dollars per year, into which investors have poured 257 million dollars in funding. Perhaps it’s me, but I just don’t see it. Yes, they have a gazillion subscribers, but so what? In a commoditised industry that struggles to convert more than 2% of its user base, why should that get me excited? But there is DropBox Teams for businesses, right? Ever used it? If not, try it and you won’t need me to draw a conclusion.

So what next for DropBox if there is no mega IPO coming along? They turned down Mr Jobs (a mistake), so who else would be interested? What about Amazon? After all, DropBox really is the ultimate sync client for Amazon S3. With Amazon now looking towards private cloud, it would seem a match made in heaven. As with all good things, time will tell...

The rise of the Cloud Data Aggregators

As storing data in the cloud becomes more and more normal, users will increasingly find themselves needing to access different types of data regularly. To this end we are starting to see a new breed of applications and services that interact with data stored in the cloud. The challenge is that vendors whose products or services are built on data access have to choose which data services to support.

This is further exacerbated in the cloud storage space as there is no ubiquitous API (see our prior post on Amazon S3 becoming a de facto standard interface).

To this end we are starting to see services and applications that themselves offer interesting aggregations of access to data clouds. We look at a few of these below:

GoodReader, Office2 HD, QuickOffice, Documents to Go, iSMEStorage, iWork:

The iPad, iPhone and Android have some interesting applications that function on top of existing data clouds. All the aforementioned applications work in this way, either letting you view files (in the case of GoodReader) or letting you view and edit them (in the case of Office2, QuickOffice, Documents to Go, iWork and iSMEStorage). The premise is that if you have data stored in an existing cloud then you can load it into these tools, view or edit it, and store it locally.

Tools such as iWork (which encompasses Pages, Numbers and Keynote) only work with MobileMe or the WebDav standard. The iSMEStorage App gets around this by enabling you to use iWork as an editor for files accessed through its cloud gateway. Those files can be stored on any number of clouds and are reached using WebDav, even if the underlying cloud does not support WebDav.

In fact some companies are making data access a feature in pricing, for example by charging extra for increased connectivity.

Gladinet.com and StorageMadeEasy.com:

Both Gladinet and Storage Made Easy (SME) stand out amongst the current Cloud vendors in that they enable aggregated access to multiple file clouds. They essentially enable you to access cloud files from multiple different providers through a single file system.

Gladinet is inherently a Windows-only solution with many different offerings, whereas Storage Made Easy has Windows software as well as cloud drives for Linux and Mac and mobile clients for iOS, Android and BlackBerry. Gladinet is a client-side service, whereas SME is a server-based service using its Cloud Gateway Appliance, which is also available as a virtual appliance for VMware, Xen etc.

Both offerings support a dizzying array of clouds, such as Amazon S3, Windows Azure Blob Storage, Google Storage, Google Docs, RackSpace Cloud Files and many more.

Such solutions don’t just aggregate cloud services but bring the cloud into the desktop and onto the Mobile / Tablet, making the use of cloud data much more transparent.
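To make the idea of a single file system over several clouds concrete, here is a minimal Python sketch. It is not how Gladinet or SME are actually implemented. The class names (InMemoryCloud, CloudAggregator), the mount names and the /<mount>/<key> path layout are all assumptions made for illustration, and the in-memory dictionaries stand in for real provider APIs.

```python
class InMemoryCloud:
    """Hypothetical stand-in for one storage provider.

    A real backend would call that provider's API instead of using a dict.
    """

    def __init__(self, name):
        self.name = name
        self.files = {}

    def put(self, key, data):
        self.files[key] = data

    def get(self, key):
        return self.files[key]

    def keys(self):
        return sorted(self.files)


class CloudAggregator:
    """Presents several backends as one namespace: /<mount>/<key>."""

    def __init__(self):
        self.mounts = {}

    def mount(self, mount_name, backend):
        self.mounts[mount_name] = backend

    def _resolve(self, path):
        # Split "/s3/reports/q1.txt" into mount "s3" and key "reports/q1.txt".
        mount_name, _, key = path.strip("/").partition("/")
        if mount_name not in self.mounts or not key:
            raise FileNotFoundError(path)
        return self.mounts[mount_name], key

    def write(self, path, data):
        backend, key = self._resolve(path)
        backend.put(key, data)

    def read(self, path):
        backend, key = self._resolve(path)
        return backend.get(key)

    def listdir(self):
        # One merged listing across every mounted provider.
        return sorted(
            f"/{mount_name}/{key}"
            for mount_name, backend in self.mounts.items()
            for key in backend.keys()
        )


agg = CloudAggregator()
agg.mount("s3", InMemoryCloud("Amazon S3"))
agg.mount("azure", InMemoryCloud("Windows Azure Blob Storage"))
agg.write("/s3/reports/q1.txt", b"quarterly numbers")
agg.write("/azure/photo.jpg", b"\xff\xd8")
print(agg.listdir())
```

The caller only ever sees paths. Which provider holds a file is decided by the first path segment, so adding another cloud is a matter of mounting one more backend.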

As data becomes more outsourced (to the cloud) for all types of applications and services, I expect we will see more such innovative solutions and applications that give access to aggregated cloud data and extend the services and tools provided by the native data provider.

Cloud Failure – Files cannot be downloaded from Box.net

Again the ugly issue of what you do when the cloud goes wrong rears its head. Right now, if you log in to box.net and try to download a file, you cannot. Instead you get a screen like the one below. I’m sure Box are aware of this, but it again shows the total reliance you have on an outsourced infrastructure in the cloud: their problems become your problems.

[Image: screenshot of the Box.net download error screen]

Is Amazon S3 really cheaper than the alternative?

An interesting post asked why Amazon S3 is considered cheaper than the alternative – excerpt below:

With a price tag of $0.150/GB/month, storing 1TB of data costs around $150/month on Amazon S3. But this is a recurring amount. So, for the same amount of data it would cost $1800/year and $3600/2-years. And this doesn’t even include the data transfer costs.

Consider the alternative, with colocation the hardware cost of storing 1TB of data on two machines (for redundancy) would be around $1500/year. But this is fixed. And increasing the storage capacity on each machine can be done at the price of $0.1/GB. Which means that a RAID-1+redundant copies of data on multiple servers for 4TB of data could be achieved at $3000/year and $6000/2-years in a colocation facility. Whereas on S3 the same would cost $7200/year and $14,400/2-years.

Also, adding bandwidth+power+h/w replacement costs at a colocation facility would still keep the costs significantly lower than Amazon S3.

Given this math, what is the rationale behind going with Amazon S3? The Smugmug case study of 600TB of data stored on S3 seems misleading.

I do see several services that offer unlimited storage which is actually hosted on S3. For example, Smugmug, Carbonite etc. all offer unlimited storage for a fixed annual fee. Wouldn’t this send the costs out of the roof on Amazon S3?
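The S3 side of the excerpt’s arithmetic is easy to check. The short Python sketch below assumes the quoted price of $0.15/GB/month and, like the excerpt, rounds 1TB to 1000GB. The function name is our own, and transfer and request charges are left out, as they are in the excerpt.

```python
from decimal import Decimal

# Assumptions taken from the excerpt: $0.15 per GB per month, and
# 1 TB rounded to 1000 GB (which is how it arrives at $150/month).
PRICE_PER_GB_MONTH = Decimal("0.15")
GB_PER_TB = 1000


def s3_storage_cost(terabytes, months):
    """Storage-only cost in dollars (hypothetical helper, no transfer fees)."""
    return PRICE_PER_GB_MONTH * GB_PER_TB * terabytes * months


for tb in (1, 4):
    for months in (1, 12, 24):
        cost = s3_storage_cost(tb, months)
        print(f"{tb} TB for {months:>2} months: ${cost:,.2f}")
```

This reproduces the $1,800 and $3,600 figures for 1TB and the $7,200 and $14,400 figures for 4TB, so the S3 numbers in the excerpt hold up. The colocation figures are the poster’s own estimates and cannot be derived in the same way.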

The CEO of SmugMug responded:

Hey there, I’m the CEO & Chief Geek at SmugMug. You’re overlooking a few things:

– Amazon keeps at least 3 copies of your data (which is what you need for high reliability) in at least 2 different geographical locations. That’s what we’d do ourselves, too, if we continued to use our own storage internally. So your math is off both on the storage costs and then the costs of maintaining two or more datacenters and the networks between them.

– When Amazon reduces their prices, you instantly get all your storage cheaper. This isn’t something you get with your capital expenditure of disks – your costs are always fixed. This has upsides and downsides, but you certainly don’t get instant price breaks to your OpEx costs. When they added cheaper, tiered storage, our bill with Amazon dropped hugely.

– There’s built-in price pressure with Amazon, too. The cost of one month’s rent is roughly the same as the cost of leaving. So if it gets too expensive (or unreliable or slow or whatever your metrics are), you can easily leave. And Amazon has incentive to keep lowering prices and improving speed & reliability to ensure you don’t leave.

– CapEx sucks. It’s hard on your cashflow, it’s hard on your debt position if you need to lease or finance (we don’t, but that just means it’s even harder on our cashflow), it’s hard on taxes (amortization sucks), etc etc. I vastly prefer reasonable OpEx costs, with no debt load, which is what Amazon gets us.

– Free data transfer in/out of EC2 can be a big win, too. It is for us, anyway.

– Our biggest win is simply that it’s easy. We have a simpler architecture, a lot less people, and a lot less worry. We get to focus on our product (sharing photos) rather than the necessary evils of doing so (managing storage). We have two ops guys for a Top 500 website with over a petabyte of storage. That’s pretty awesome.

So what does this tell us?

1. Opex is better than Capex, especially for something intrinsic to the running of your business.

2. The utility compute model reduces risk i.e. your cost of turning it off is the equivalent of one month of running the service.

3. The “ilities” that you get for free, such as HA, redundant copies, geographical distribution etc., would need to be paid for in an alternative model and are expensive to build in.

4. The flexibility is greater i.e. if you need to scale out to double capacity on demand then this is easily achievable with S3 but needs to be planned, built and executed in the alternative model.