Karlsson on databases and stuff: March 2012

Wednesday, March 21, 2012

Amazon DynamoDB ... Is it any good?

As you might have noticed, I'm getting further away from MySQL here. This is just how things are I guess; I simply do much less work with MySQL these days. The first migration, some time back, was from MySQL to MongoDB. This was pretty successful: we still have some data in MySQL, but the bulk of the data is in MongoDB right now.

Running any database on Amazon (and we run all our databases on Amazon or on the Amazon RDS service) may be costly, depending on how you utilize your resources. The recently announced Amazon DynamoDB is Amazon's NoSQL service offering, but it is not like MongoDB with a twist, far from it. If you have read what I have written about MongoDB, you know that I have now and then complained about the lack of functionality, but to be honest, I have learnt to live with MongoDB and its shortcomings, and have started to like many of its JavaScript features (one thing I hate about it, and about JavaScript in general though, is the numeric datatype. It's just plain silly).

That said, we are now taking a shot at migrating again, this time to DynamoDB. In comparison with MongoDB, DynamoDB is incredibly simplistic; there are very few things you can do. To begin with, there is one "index" and one index only on each table. This index is either a unique hash key (that is what they call it) or a combination of a hash-key and a range-key (a unique composite key). I'll get into the gory details later. You cannot have secondary indexes, i.e. indexes on any attribute other than the "primary key" or whatever you want to call it.

You can then read data in one of three ways. Simple:

You read a single row by unique key access. If you have a composite key, provide both the hash-key and the range-key, else provide just the hash key.
You scan the whole table.
If you have a composite key, access by the hash-key part and scan (you may filter, but in essence, this is still a scan) on the range key.

There is nothing else you can do, and note that unless doing a full table scan, you must always provide the hash-key, i.e. if you do not know the exact hash key for the row to get, you have to do a full table scan. There is just no other way.
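To make the three access paths above a bit more concrete, here is a toy in-memory model of a table with a composite key. Note that this is not the DynamoDB API; the class ToyTable and all of its method names are made up for illustration, and it only sketches the shape of what you can and cannot ask for.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class ToyTable:
    """Hypothetical in-memory stand-in for a table with a (hash, range) key."""

    def __init__(self) -> None:
        # hash key -> {range key -> row}
        self._rows: Dict[Any, Dict[Any, Row]] = {}

    def put(self, hash_key: Any, range_key: Any, row: Row) -> None:
        self._rows.setdefault(hash_key, {})[range_key] = row

    def get(self, hash_key: Any, range_key: Any) -> Optional[Row]:
        """Way 1: single row by the full unique key."""
        return self._rows.get(hash_key, {}).get(range_key)

    def scan(self, predicate: Callable[[Row], bool] = lambda row: True) -> List[Row]:
        """Way 2: walk every row in the table, optionally filtering."""
        return [
            row
            for partition in self._rows.values()
            for row in partition.values()
            if predicate(row)
        ]

    def query(
        self, hash_key: Any, range_filter: Callable[[Any], bool] = lambda rk: True
    ) -> List[Row]:
        """Way 3: exact hash key, then a filtered walk over its range keys."""
        partition = self._rows.get(hash_key, {})
        return [partition[rk] for rk in sorted(partition) if range_filter(rk)]


table = ToyTable()
table.put("user1", 1, {"msg": "first"})
table.put("user1", 2, {"msg": "second"})
table.put("user2", 1, {"msg": "other"})

single = table.get("user1", 2)
recent = table.query("user1", lambda rk: rk >= 2)
# Without knowing the hash key, the only option left is a full scan.
matches = table.scan(lambda row: row["msg"] == "other")
```

The point of the sketch is the last line: looking for a row by anything other than its hash key means touching every row.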

The supported datatypes aren't overly exciting either: Number, String, and Sets of Numbers and Strings. The String type is UTF-8 and the Number is a signed number with 38 digits of precision. Other notable limits are a maximum of 64 KB per row, and that a single scan will only cover up to 1 MB of data. Note that there is no binary datatype (we have binary data in our MongoDB setup and use base64 encoding for that in DynamoDB).
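The base64 workaround for binary data can be sketched like this in Python. The helper names are made up, and the size check is a rough assumption on my part: it only counts the encoded payload plus whatever you pass in as other_bytes, while the service may count row size differently (attribute names, for example).

```python
import base64

MAX_ROW_BYTES = 64 * 1024  # the 64 KB per-row limit mentioned above


def encode_binary(blob: bytes) -> str:
    """Turn raw bytes into an ASCII string that fits a String attribute."""
    return base64.b64encode(blob).decode("ascii")


def decode_binary(text: str) -> bytes:
    """Reverse of encode_binary."""
    return base64.b64decode(text)


def fits_in_row(blob: bytes, other_bytes: int = 0) -> bool:
    """Rough check: does the encoded blob plus other row data stay within the limit?

    base64 turns every 3 input bytes into 4 output characters (with padding).
    """
    encoded_len = 4 * ((len(blob) + 2) // 3)
    return encoded_len + other_bytes <= MAX_ROW_BYTES


payload = bytes([0x00, 0xFF, 0x10])
stored = encode_binary(payload)
restored = decode_binary(stored)
```

Keep in mind that base64 inflates the data by about a third, so the 64 KB row limit really means something like 48 KB of raw binary data per row, at best.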

Pricing is interesting. What you pay for is throughput and storage, which is pretty different from what you may be used to. Throughput may be adjusted to what you need, and it's calculated in KB of row data per second, i.e. a table with rows of up to 1 KB in size and a requirement of 10 reads per second means you need 10 units of read capacity (there is a similar throughput number for write capacity). Read more on pricing here.
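The arithmetic behind that example is simple enough to write down. This is just a sketch of how I read the throughput model; the function name is made up, and the assumption that row sizes are rounded up to the next whole KB is mine, so check the pricing page before you rely on it.

```python
import math


def capacity_units(row_size_bytes: int, ops_per_second: int) -> int:
    """Estimate the capacity units needed for a given row size and request rate.

    Assumption: each operation costs one unit per started KB of row data.
    """
    kb_per_op = max(1, math.ceil(row_size_bytes / 1024))
    return kb_per_op * ops_per_second


# The example from the text: rows of up to 1 KB, 10 reads per second.
units_needed = capacity_units(1024, 10)
```

Under that rounding assumption, a row of 1.5 KB costs as much as a 2 KB row, so keeping rows just under a KB boundary would matter for the bill.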

All in all, DynamoDB still has to prove itself, and in some situations it might turn out expensive, in other situations not so. I have one major gripe with DynamoDB before I close for this time: DynamoDB is not Open Source, nor is it a product you can buy, except as a service from Amazon. You want to run DynamoDB on a machine in your datacenter? Forget it. This annoys the h*ll out of me!

We are still testing, but so far I am reasonably happy with DynamoDB, despite the issues listed above. The lack of tools (no, there are no DynamoDB tools. At all. No backup tool, no import / export, nothing) means that a certain amount of app development is necessary to access it, even for the simplest of things. The missing backup in particular is a problem, but I am sure this will be fixed soon.

/Karlsson

Wednesday, March 7, 2012

Amazon RDS take two

I wrote in a previous blogpost, a few weeks ago, that we had started to migrate our MySQL databases to Amazon RDS. We have now been running this for quite a while, and so far we have not had any issues at all. The plan was to migrate even more, but this has not happened yet, as we got into another interesting migration: MongoDB to DynamoDB!

So far we have done some benchmarking, and we are pretty happy with the results. DynamoDB has some interesting features, and one of the most interesting is how it is priced: you pay for performance and resources used. This is not too dissimilar to what I suggested way back at a MySQL Sales conference, so maybe my head was correctly screwed on after all.

As usual, Amazon will not give many details on what they are doing, and how, but as DynamoDB is related to Cassandra, we have to assume it is an LSM-tree database. Amazon also claims that Flash / SSD storage is used for DynamoDB.

The main issue I have with DynamoDB isn't performance; I think this is an area where Amazon have done their homework. But DynamoDB is really limited in terms of functionality: there are only some simple operations available, and you can have only 1 key (which may be composite, but still). There is a very limited number of datatypes, but this isn't that much of a problem for me per se (see this blog post). Rather, the types that are available aren't really that generic (a numeric type with 38 digits of precision and a UTF-8 string). I ask where the generic variable length byte stream type is (in my mind, this is the one datatype that all databases should provide. There is hardly anything that cannot be represented as such a datatype).

Anyway, when we get further along with our DynamoDB testing, I'll let you know.

Cheers
/Karlsson