Tuesday, April 24, 2012

More on DynamoDB - The good part!

In a previous post on DynamoDB, I told you we were in the process of migrating from MongoDB to DDB for our largest datastore. We have now moved a bit further on this, and we, myself included, have a pretty positive view of DDB. It really is a viable MongoDB alternative if you can live with the limitations. There are many limitations, but I would like to put this differently: I would say this is an opportunity to simplify and get really good performance from your base-level data, and to combine it with other technologies where appropriate.

I don't think any serious application that uses a database could live with DynamoDB only, unless the application developers were prepared to do just about everything database-related, beyond the most simple, themselves. For example, you might need a secondary index. DDB doesn't provide them, so what you could do is use another DDB table as an index into the main data. Which is fine, but you have to implement it yourself: no more CREATE INDEX statement, no more ensureIndex() command, and no more "the index is there so the optimizer will use it". Instead it is "I now have an index on that previously unindexed attribute, so I rewrite my code to take advantage of it".
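
To make the "second table as an index" idea a bit more concrete, here is a minimal sketch in Python. It is not our actual code and it does not talk to DynamoDB at all: the Table class is an in-memory stand-in for a key-value table, and the names IndexedStore, put and find_by are made up for illustration. The point is only that the application has to keep the index table in step with the main table on every write.

```python
class Table:
    """In-memory stand-in for a key-value table (assumption: not a real DDB client)."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)


class IndexedStore:
    """Main table plus a hand-maintained index table on one attribute."""

    def __init__(self, indexed_attr):
        self.indexed_attr = indexed_attr
        self.main = Table()   # item id -> item
        self.index = Table()  # attribute value -> set of item ids

    def put(self, item_id, item):
        # Remove the stale index entry first if the item already exists.
        old = self.main.get(item_id)
        if old is not None:
            old_ids = self.index.get(old.get(self.indexed_attr))
            if old_ids:
                old_ids.discard(item_id)
        self.main.put(item_id, dict(item))
        # Then record the new attribute value in the index table.
        value = item.get(self.indexed_attr)
        if value is not None:
            ids = self.index.get(value) or set()
            ids.add(item_id)
            self.index.put(value, ids)

    def find_by(self, value):
        # Two lookups: the index table first, then the main table per id.
        ids = self.index.get(value) or set()
        return [self.main.get(item_id) for item_id in sorted(ids)]
```

Note that in a real setup the two writes are separate requests, so the application also has to decide what happens if one of them fails. The sketch ignores that.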

That said, how I would like to see DDB, and this is how we use it here at Recorded Future, is as a store for low-level objects, like BLOBs, pieces of text, pieces of XML, collections of keywords, you name it. You then reference that data with an id that is looked up in some kind of supporting technology, like a free-text search engine or MySQL or both.
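
As a rough illustration of that split, here is a small Python sketch. Both dictionaries are stand-ins I made up for this post: blob_store plays the role of the low-level object store, and keyword_index plays the role of the supporting search technology. Neither is a real DDB or Sphinx API.

```python
import uuid

blob_store = {}     # id -> payload; stand-in for the low-level object store
keyword_index = {}  # keyword -> set of ids; stand-in for a search engine


def store_document(text):
    """Store the payload under a fresh id and index its words separately."""
    doc_id = uuid.uuid4().hex
    blob_store[doc_id] = text
    for word in set(text.lower().split()):
        keyword_index.setdefault(word, set()).add(doc_id)
    return doc_id


def search(word):
    """Ask the supporting index for ids, then fetch the payloads by id."""
    ids = keyword_index.get(word.lower(), set())
    return [blob_store[doc_id] for doc_id in sorted(ids)]
```

The object store only ever sees get-by-id and put-by-id, which is the kind of access a simple key-value store handles well. Everything that looks like a query lives in the supporting technology.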

What we are looking at doing here at Recorded Future is to use DDB for just this kind of stuff. The supporting technologies are, in our case, MongoDB and Sphinx (yes, MongoDB: we have data in MongoDB today that will not work well in DDB, data that has secondary indexes on it, data that uses more MongoDB features, etc.). But this may change. The database we are moving from MongoDB to DDB is just as simple and straightforward as is required to make it a good fit for DDB.

And despite the limited functionality, DDB has several advantages:
  • It performs well, and I can pay for only the throughput I need. Actually, pricing is one of the intriguing aspects of DDB: you basically pay for throughput, not for storage, number of servers, number of users or something as arcane as that.
  • It is managed by Amazon, and Amazon seems to do a good job here.
  • DDB currently lacks any kind of backup mechanism, and as DDB isn't exposed outside the managed Amazon DDB environment, there isn't much I can do about it, so I just ignore it and tell my managers that Amazon will not allow me to back up our data (yes, I am kidding now).
  • There are several reasonably well-working APIs (Ruby, Java, etc.) that are integrated in the same Amazon APIs as the other Amazon services (the REST-based API that these are built on leaves a fair bit to be desired, as does the documentation).
We are not live with DDB yet. We need to figure out a way to perform backups, for example (as we can't get backups out of DDB, we have to find a way in our application to catch data before it enters DDB), and we have coding to do. Still, my initial reservations regarding DDB are not as strong as they used to be. One has to know the limitations, face the facts and work with them. But that is life in a Cloud environment anyway.
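
One way to "catch data before it enters DDB" could look something like the Python sketch below. This is an assumption about a possible approach, not what we have built: BackedUpStore, restore and the one-JSON-record-per-line log format are made up for illustration, and the items dictionary stands in for the managed store.

```python
import json


class BackedUpStore:
    """Appends every write to a local log before handing it to the store."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.items = {}  # stand-in for the managed store (not a real DDB client)

    def put(self, key, item):
        # Log first, so anything that reaches the store is also in the backup.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "item": item}) + "\n")
        self.items[key] = item


def restore(log_path):
    """Rebuild the latest state by replaying the log from the start."""
    items = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            items[record["key"]] = record["item"]  # later writes win
    return items
```

The log grows with every write, so something like this would need compaction or rotation before it is usable for a datastore of any size.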

As for the DDB pricing model, should we call that Cloud-based pricing?


1 comment:

Angelica Gomez said...

Hello. I am interested in the data migration that your organization did.
We are also having a similar migration - from SimpleDB to DynamoDB.
Could you give some tips/standards as to how primary keys can be chosen? Thanks!

P.S. I added you on Google+