Monday, August 6, 2012

Finally, the simple comparison of KVSs and some rants about performance

Faster and faster, that is what we want from our databases. And the biggest roadblock for the MySQL Dragster is the speed of the hard disk, right? No, I'm not going to debate that, that is just the case. And how do you fix that then? Well, if what is limiting your dragster is a roadblock, then either you drive around the roadblock or you make it disappear, or in computer terms:
  • Avoid using the disk by instead putting as much data as you can in RAM
  • Use faster disks (like SSDs)
Now, to be honest, this analogy isn't that good, as the performance-limiting factor of the disk is so huge, and contrary to popular belief, it's not getting any better! But we have SSDs, you say? Yes, that makes the hard drive faster, but the CPU and RAM are getting faster even faster! But let's assume that we have enough memory so we do not need the disk. Will just about everything go at the speed of light? Nope, what happens here is that stuff that wasn't even noticeable in terms of limiting performance when the disk was there, as disk I/O is such a huge bottleneck, suddenly shows its dirty face!

Like this: As CPU cores are getting faster, but not that much faster anymore, due to physical limitations, we get more and more of these CPU cores instead. And any limitation in getting those CPUs to work well together suddenly turns into a major headache! Like a mutex shared by all threads. Like the Query Cache mutex in MySQL, for example!

With this in mind I am now ready to commence with the benchmarks that I wrote about in May. Yes, it took some time to get the data loaded into MySQL, and in the process I managed to create a new Open Source project for exporting and importing JSON data from and to MySQL. With that in place I now have something that is sort-of real-world data. I had to remove a few columns (or attributes, if you are a JSON taliban) to make MySQL Cluster work with this data, because MySQL Cluster stores VARCHAR data as fixed-length data on disk, which means a few things:
  • There is a lot more stuff to write to disk.
  • UTF-8 means that there is 3 times more data to write!
All this means that MySQL Cluster may well work as an alternative to some key-value store setups, but not all, and it depends on what "value" means here. If "value" means "document" or "object", then we need to use a VARCHAR or something like that for the value, which will be really limiting in the case of MySQL Cluster. And I'll try to be real nice to MySQL Cluster here, so I end up with a really simple schema:
CREATE TABLE `keyvalue` (
  `id` bigint(20) NOT NULL,
  `value1` int(11) DEFAULT NULL,
  `value2` double DEFAULT NULL,
  PRIMARY KEY (`id`)
)
And into this table I load some 105.000.000 rows. Should be simple with MySQL Cluster, right? Except that MySQL Cluster will only accommodate 512 Mb of hash data per partition (this is a really, really silly limitation! Is 512 Mb the MySQL Cluster version of "640 K should be enough for anyone"?). But this is fixable, and with 4 partitions, this works just fine.
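For reference, a minimal sketch of the kind of workaround I mean, assuming the partitions are asked for explicitly when creating the NDB table (the exact statement I used may have differed slightly):

CREATE TABLE `keyvalue` (
  `id` bigint(20) NOT NULL,
  `value1` int(11) DEFAULT NULL,
  `value2` double DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=ndbcluster
-- Spread the hash index over 4 partitions so that no single
-- partition hits the 512 Mb limit.
PARTITION BY KEY (`id`) PARTITIONS 4;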

As a side note, without disk data, MySQL Cluster feels MUCH more stable. The occasional data loss and other weirdness that I experienced when I tried to load the table with VARCHAR data are now completely gone. So not only does disk data limit you in terms of data types (VARCHAR), it also seems to need some more development. And no, I wasn't in the mood to reproduce the errors I got.

Anyway, my server here at home, with an 8-core AMD CPU and 16 GB RAM, was waiting to run this benchmark. I'm testing MySQL with InnoDB, MySQL Cluster and MongoDB. The test program is the same in all cases: I read 1.000.000 rows 10 times, distributed over 100 threads. To be fair to all, I made sure that what data I had would fit in memory and that it WAS in memory, so I did a couple of warmup runs first. In the case of NDB I used the MySQL API, not NDBAPI (I will try that eventually). The results I got were these:
  • MongoDB - 110.000 reads per second
  • MySQL with InnoDB - 30.000 reads per second
  • MySQL with NDB - 32.000 reads per second
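Each read in the MySQL cases is a plain single-row primary key lookup, along the lines of the statement below (this is a sketch of the access pattern only, not the actual benchmark code, and the id is just an example value):

-- One benchmark read: fetch a single row by primary key, with the id
-- picked from the 105.000.000 keys loaded earlier.
SELECT value1, value2 FROM keyvalue WHERE id = 4711;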
In the case of NDB, I had these settings, beyond the standard stuff:
ndb_use_exact_count=0
ndb_force_send=0
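Those two can go in the [mysqld] section of my.cnf, or be set at runtime, roughly like this (a sketch; check the scope of the variables in your MySQL version):

-- Disable exact row counts and forced buffer sends for NDB on this mysqld.
SET GLOBAL ndb_use_exact_count = 0;
SET GLOBAL ndb_force_send = 0;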
And that makes one helluva difference, I can tell you that! Why this isn't the default I do not really know; I assume there is a good reason for it, but someone has to tell me what it is. Now, I did load data also, and the results there were similar, but as I was loading JSON, which is pretty native to MongoDB, that was expected: MongoDB was about 2.5 times faster than NDB / InnoDB, which were on par with each other. I will not give any exact numbers here, as loading data depends on so much more in terms of tuning.

This is not the end of the story though. As MySQL was way behind MongoDB in terms of performance, but InnoDB and NDB were on par with each other, one might at least want to test the theory that it is the MySQL layer that is slowing things down, and this can be tested by running MySQL / NDB with more than one mysqld. That is the next thing to try. Then there is the HANDLER interface and the proper NDBAPI as well; the latter should be a lot faster, really. And yes, I really should see what MyISAM can do for me. And MariaDB.

And before I end, please, MySQL Cluster developers, remove that silly 512 Mb HASH index size per partition limitation (see the manual here). As RAM gets less and less expensive, and if we, for example, wanted to avoid network and disk I/O as much as possible and instead use RAM and CPU (like in an Amazon environment), then this turns into a rather serious limitation. My tests above were on hard iron though.

/Karlsson
Who will do some more benchmarking tomorrow.

6 comments:

x said...

You can use the MAX_ROWS attribute when creating your NDB table, which would create multiple partitions per LCP, but it has significant disadvantages - such as automatic repartitioning no longer working.
What seems to be a more effective solution is to make sure to run the multi-threaded data node instead of the single-threaded data node. That gives you one LCP per thread. And then if necessary, run multiple data nodes per machine.
It's all a fun hack. I haven't gotten around to checking whether using the SHM connector between nodes on the same machine boosts performance, but I have heard some hints that it may.

Karlsson said...

Cool. I wasn't sure how MAX_ROWS would work here, but I'll try it, as well as the other options. Testing SHM was on my mind also.

x said...

Oh! I forgot to mention an important detail when using NDB...
By default, you're probably only running with 1 connection from your MySQL API node to your NDB data nodes. That will severely limit your performance. You would want one NDB API connection per real core on your MySQL API node to your data nodes.
This can be configured by setting the ndb-cluster-connection-pool variable.

Karlsson said...

That one I know about. I am running 4 connections now, to leave some cores for the benchmark program itself.

Sinisa said...

Anders,

Can you post details of our benchmarking environment ???

Sinisa said...

Sorry, Anders, I meant that I would like to see details of your benchmarking code and not our !!!!

I would just like to repeat it with certain possible mods.