Cloud Databases Are the Future

As I type this, I’m sitting here watching MySQL shut down after a bad up/down sequence, and for the last five hours it’s been rebuilding its indexes… Fundamentally everything is down at this point; if it weren’t for the backend services that are not tied to the database, things would be ugly.

I’ve been preaching internally about cloud databases for the last few months, and I’m totally convinced they’re the future. Why, you ask? I only have a few requirements for my cloud database:

  • Distributed and replicated store – I would like to make sure that I have N copies of any object and can support a read throughput of X across the store.

  • Opaque object identifiers within a “tablespace” – Every object should get a unique identifier on storage, and I should be able to reference it via a “key”. I don’t really care about being able to assign the identifier myself.

  • Indexable fields – This is where I can spec a primary key. I don’t want auto-increment, since that’s just a crutch for uniqueness… I’ve been happy to use things like: base64.urlsafe_b64encode(uuid.uuid1().bytes).rstrip('=')

  • Fast – Well duh, hopefully this is accomplished with the replication and the ability to tune the replication.
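
To make the key idea from the list above concrete, here is a minimal sketch in Python 3. The function names make_key and key_to_uuid are mine, purely for illustration. The one-liner in the list is Python 2 style; in Python 3 the encoder returns bytes, so the padding has to be stripped as b"=" and the result decoded. Keep in mind that uuid1() embeds a timestamp and a node identifier (typically the host's MAC address), so these keys are unique but not secret.

```python
import base64
import uuid


def make_key() -> str:
    """Return a 22-character, URL-safe key built from a version 1 UUID."""
    raw = uuid.uuid1().bytes                 # 16 raw bytes
    encoded = base64.urlsafe_b64encode(raw)  # 24 bytes, always ending in b"=="
    return encoded.rstrip(b"=").decode("ascii")


def key_to_uuid(key: str) -> uuid.UUID:
    """Reverse make_key(): restore the stripped padding and decode."""
    padded = key + "=" * (-len(key) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    key = make_key()
    print(key, key_to_uuid(key))
```

The key is short enough to drop into a URL, and no central counter is needed to hand it out, which is the whole point of skipping auto-increment.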

What can do this? Or, where should I be spending my time evaluating technologies?

  • CouchDB
  • MongoDB
  • Amazon S3
  • Some custom Lucene + Storage mechanism
  • Google Base
  • others?

Once that is settled, though, I need something like Django support. If you spend any time with Django on Google App Engine, you see that it works, but half the power of Django is missing since the nice folks at Google ripped the model layer out. It would be really nice to have the benefit of one of the above stores and build a whole Django model layer on top of it…