R – Is a cloud service suitable for this application


I'm looking for details of the cloud services popping up (eg. Amazon/Azure) and am wondering if they would be suitable for my app.

My application basically has a single table database which is about 500GB. It grows by 3-5 GB/Day.
I need to extract text data from it, about 1 million rows at a time, filtering on about 5 columns. This extracted data is usually about 1-5 GB and zips up to 100-500MB and then made available on the web.

There are some details of my existing implementation here
One 400GB table, One query – Need Tuning Ideas (SQL2005)

So, my question:
Would the existing cloud services be suitable to host this type of app? What would the cost be to store this amount of data and bandwidth (bandwidth usage would be about 2GB/day)?

Are the persistence systems suitable for storing large flat tables like this, and do they offer the ability to search on a number of columns?

My current implementation runs on sub $10k hardware so it wouldn't make sense to move if costs are much higher than, say, $5k/yr.

Best Solution

Given the large volume of data and the rate that it's growing, I don't think that Amazon would be a good option. I'm assuming that you'll want to be storing the data on a persistent storage. But with EC2 you need to allocate a given amount of storage and attach it as a disk. Unless you want to allocate a really large amount of space (and then will be paying for unused disc space), you will have to constantly be adding more discs. I did a quick back of the envalop calculation and I estimate it will cost between $2,500 - $10,000 per year for hosting. It's difficult for me to estimate accurately because of all of the variable things that amazon charges for (instance uptime, storage space, bandwidth, disc io, etc.) Here's the EC2 pricing .