While the case for SQS is too obvious to need a second thought, SimpleDB, S3 and even EC2 are highly contentious.
Instead of EC2, you can get a VPS from Slicehost for a reasonable sum (depending on how you compare). But if you use S3 or SimpleDB, EC2 is probably a must.
So then, why S3? I find serving assets (images) from S3 too slow, and you also cannot do CNAME tricks to maximize the number of connections a browser opens. So think of S3 as the ultimate data store and backup server, but never the main data server. Dreamhost, on the other hand, can store and serve your data at a ridiculously cheap price. So even if you use S3, you still need something else to actually serve your data.
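For readers unfamiliar with the "CNAME trick": browsers cap the number of parallel connections per hostname, so pointing several hostnames (CNAMEs) at the same asset server lets a page download more images at once. Below is a minimal sketch of how asset paths might be spread across such hosts. The hostnames `assets0.example.com` through `assets3.example.com` and the `asset_url` helper are hypothetical, not from the post.

```python
import zlib

# Hypothetical hostnames: each would be a DNS CNAME pointing at the same asset server.
ASSET_HOSTS = [f"assets{i}.example.com" for i in range(4)]


def asset_url(path, hosts=ASSET_HOSTS):
    """Map an asset path to one of several hostnames.

    CRC32 of the path picks the host, so a given image always comes from the
    same hostname (keeping browser caches effective), while different images
    are spread across hostnames so the browser opens more parallel connections.
    """
    path = path.lstrip("/")
    # zlib.crc32 is stable across runs; Python's built-in hash() of a str is
    # salted per process and would reshuffle hosts on every restart.
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return f"http://{hosts[index]}/{path}"
```

Calling `asset_url("images/logo.png")` always yields the same host for that image. Hashing the path, rather than rotating hosts round-robin, is what keeps each image cacheable under a single URL.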
SimpleDB is even more troublesome. It is not an RDBMS, so you get limited query capability (no SQL!), no foreign keys, no transactions and no data constraints. On top of that, you have to deal with "eventual consistency": a read issued right after a write may still return the old value.
Why use Amazon AWS at all then?
The short answer: because it is there. I recently restarted a project that uses S3 as data storage. While I had to spend a few hours setting up my VPS and installing MySQL, RMagick, ImageMagick, Ruby, etc. (why does something always go wrong?), the data upload to S3 just... works. When the image server piece wasn't running right, I just had to tell my code to start using S3 URLs and it... worked. It's just so nice that it's always there: no need to back it up, no need to move it around, etc.
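Switching from a self-hosted image server to S3 URLs can be as small as one flag in a URL helper. This is a minimal sketch, not the author's code. It assumes a hypothetical bucket name (`my-asset-bucket`), a hypothetical image-server host and publicly readable objects, and it uses S3's virtual-hosted-style URL form `https://<bucket>.s3.amazonaws.com/<key>`.

```python
from urllib.parse import quote

# Hypothetical settings; neither value comes from the post.
S3_BUCKET = "my-asset-bucket"
IMAGE_SERVER = "http://images.example.com"


def image_url(key, use_s3=False):
    """Return the public URL for a stored image.

    Flipping use_s3 to True points pages at the copy already in S3,
    e.g. while the self-hosted image server is misbehaving.
    """
    # Percent-encode unsafe characters (spaces etc.) but keep "/" separators.
    key = quote(key.lstrip("/"))
    if use_s3:
        # Virtual-hosted-style S3 URL; assumes the object is publicly readable.
        return f"https://{S3_BUCKET}.s3.amazonaws.com/{key}"
    return f"{IMAGE_SERVER}/{key}"
```

Because every page builds its image links through one function, the fallback is a configuration change rather than a code hunt.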
Using Amazon AWS really comes down to the fact that you do not need to do admin work. Because I use S3, I do not need to worry about moving my data, backing it up, or splitting it between nodes due to size constraints.
Similarly, with SimpleDB I expect I will no longer need to muck around with MySQL settings, figure out how to set up clusters, etc.
But, no transactions! "Eventual consistency"! How???
Well, all of those things have workarounds in code. And guess what: I am better at programming than at admin work, not to mention I like programming more! I'd rather spend a few weeks figuring out a slick way to deal with "eventual consistency" and end up with supremely reliable data storage. The alternative is writing cron jobs to back up MySQL and installing MySQL whenever I need one... *shudder*
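One common code workaround for eventual consistency is to give each item a version number and have the reader retry until it sees a version at least as new as the one it just wrote (read-your-writes). The sketch below illustrates that idea under stated assumptions and is not the author's implementation. The `version` attribute, `read_at_least` and the toy `LaggyStore` are all hypothetical. In real code, a call to the SimpleDB client would take the place of `store.get`.

```python
import time


class LaggyStore:
    """Toy eventually consistent store: the first `lag` reads return the stale item."""

    def __init__(self, stale, fresh, lag):
        self.stale, self.fresh, self.lag = stale, fresh, lag

    def get(self):
        if self.lag > 0:
            self.lag -= 1
            return self.stale
        return self.fresh


def read_at_least(read, min_version, attempts=5, base_delay=0.1):
    """Call read() until it returns an item whose 'version' is >= min_version.

    Stale (or missing) results are retried with exponential backoff;
    TimeoutError is raised if no fresh result appears within `attempts` reads.
    """
    for attempt in range(attempts):
        item = read()
        if item is not None and item.get("version", -1) >= min_version:
            return item
        if attempt < attempts - 1:  # no point sleeping after the final read
            time.sleep(base_delay * 2 ** attempt)
    raise TimeoutError(f"no read returned version >= {min_version}")


# The writer just stored version 2; the store serves version 1 for two more reads.
store = LaggyStore({"version": 1, "title": "old"}, {"version": 2, "title": "new"}, lag=2)
print(read_at_least(store.get, min_version=2, base_delay=0.01))
# prints {'version': 2, 'title': 'new'}
```

Retrying only bounds the wait. It cannot guarantee freshness, so the caller still has to decide what to do on `TimeoutError`, for example by displaying the value it just wrote.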
Of course there's also the whole scalability benefit, but with 500 visitors/day... I do not need to worry about that just yet.
3 comments:
You're kidding when you say that you don't need to back up your data, right? I've not seen anything published by Amazon that tells me one lick about why they are trustworthy with my data. If they have published this, I would welcome having it pointed out to me.
In my opinion, it is unwise to entrust the continued existence of your business or personal data to a third party (even one with a name as big as Amazon, and trust me, I respect Amazon) without a clear picture of the technology through which they guarantee zero data loss.
Taking it a step further, even with that level of information I would still engineer a plan by which I have final say in how my data is protected.
It's my data. It's my life. It's my job.
mikedoug, you are absolutely correct to point out that Amazon has not published a guarantee of data protection.
In all my development work, I always have to weigh doing things myself against using existing "stuff".
For example: should I use an open-source library instead of writing it myself? Should I use an existing protocol or write my own?
The decision usually comes down to: can I do it better than they can? More often than not, my answer is no. If yes, the next question is: do I have the time for it? Again, more often no than yes.
So if the question is whether I can provide reliable storage for my assets that is accessible over HTTP, can be set to public or private, and goes down only 3 hours in a whole year (like S3), the answer is yes. Do I have time for it? No.
I'd rather be coding site functionality.
But if, like you, my job were to make sure data never ever ever goes away at any cost, then I'd agree that my time would be well invested in building such a system.
That's a fair response. At the same time, however, there are multiple online companies with which you can store files, so you can always back up your S3 data with another online provider and feel confident that you won't lose it.
That would (likely) cost you the least programming time while ensuring your business survives a major disaster.
To be clear, my point was more about protection against absolute loss, not about 3 hours of downtime a year (though that figure just grew by another 8 hours last weekend).