Tuesday, August 7, 2012

Some corrections and additions to my simple KVS tests.

This is the first follow-up to my post on a simple test of KVS alternatives. To recap, I tested a simple single table schema in MySQL using the NDB and InnoDB storage engines. To have a Key-Value store to compare with, I did the same test in MongoDB. All tests were done of the same system, an 8-core AMD Linux box with 16 Gb RAM. The tests consisted of reading 1.000.000 rows, out of the total 105.000.000 in the table, distributed over 100 threads 10 times, a total of 10.000.000 rows read then. The test program I use makes sure that the same random ID's of the table are reused each time and the same are used for all servers.

Now, firstly, after some checking I realized that I had not fully cached the InnoDB engine, so it was doing a certain, small, amount of disk I/O still. I fixed this and the number now looks like this:
  • MongoDB - 110.000 rows read per second.
  • InnoDB - 39.000 rows read per second.
  • NDB - 32.000 rows read per second.
So now InnoDB is faster than NDB. But NDB has some tricks up it's sleeve, like running with multiple mysqld servers, and I have today finished my test-program to support just this. Also, I have had some useful hints of NDB engine tuning, so I'll try that one too, but testing NDB takes more time as restarting NDB is much, much slower than MongoDB or MySQL with InnoDB.

But I have another test result today! I realized that although I no big fan of the query cache, I should be able to use that here too. And I don't need that a big a cache, as I am only reading some 1.000.000 rows. So off I went, turned on and tuned in, without dropping out, the query cache and ran my test again. I soon realized one thing: Warming up the query cache was going to be real slow. But warming up MongoDB is just as slow, MongoDB really is targeted for as much as possible in RAM, the disk I/O the do is hardly optimized (they use mmap, so what can you expect). Once the query cache was nice and warm, I ran my benchmark (this was using the mysqld with InnoDB, which matters less as all reads are now done in the query cache). And what I got was about 34.000 rows read per second. This is not a 100% fair comparison of course, as the query cache doesn't need to cache that much (only 1.000.000 queries), but really, it should have been faster than caching in InnoDB, I was a bit disappointed with this and I'll see if I can find the bottleneck somewhere in the code.

But I'm not finished yet. The MEMORY engine and NDB with a few more mysqld servers remains to be tested, as well as Tarantool, the MySQL HANDLER interface and NDBAPI eventually. Not necessarily in that order.

And before closing, if you are wondering, the test program is written in plain C, no C++ or Java or anything or fancy stuff like that. Also, the test program uses proper multi-threading, I do not have multiple processes running around here,



Mark Callaghan said...

After two posts you have yet to describe what versions of MySQL, NDB and Cluster have been used. Lots of other details are missing.9 I think you should focus on reporting what was done before adding the editorial.

Karlsson said...


That's a good point, I'll add some more details in a post real soon, but in short I have used very recent versions.