Tuesday, August 14, 2012

MySQL Cluster / NDB as a key value store - Better results this time

This is a follow-up to my series on "how much performance do I get when I access RAM only", the most recent post being this. As you can see there, MySQL didn't really perform that well, and MySQL Cluster in particular wasn't very fast. In fact, it was slower than InnoDB and almost every other storage engine, with the exception of MyISAM.

Well, I finally had some bandwidth to finish my benchmark with NDBAPI. This took some time, as I also had some other MySQL Cluster tests to do (multiple mysqld, cramming everything into one ndbmtd etc), but finally I had it done. And this time, things worked better. Using the MySQL NDBAPI I managed to get about 90.000 single row reads per second using 100 threads in a simple 105.000.00 table with 3 columns and a BIGINT PRIMARY KEY, compared to about 32k single row reads per second when using MySQL Cluster through the SQL interface. MySQL with InnoDB got some 39k single row reads per second.
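For anyone who wants to picture the shape of a test like this, here is a rough sketch in Python of a multi-threaded single row read loop. This is not my actual NDBAPI benchmark code: the function name `run_read_benchmark`, its parameters, and the plain dict standing in for the table are all made up for illustration, and the `lookup` callable is where a real client call would go. Also note that Python threads share one interpreter lock, so the numbers this prints say nothing about any of the databases.

```python
import random
import threading
import time


def run_read_benchmark(lookup, key_count, threads=100, reads_per_thread=1000):
    """Run `threads` workers doing random single-key reads.

    Returns (reads per second, number of lookups that found a row).
    `lookup` is a hypothetical stand-in for a primary key read.
    """
    # One extra party so the main thread can start the clock
    # at the moment all workers are released.
    barrier = threading.Barrier(threads + 1)
    hits = [0] * threads

    def worker(slot):
        rng = random.Random(slot)  # per-thread generator, seeded for repeatability
        barrier.wait()
        found = 0
        for _ in range(reads_per_thread):
            if lookup(rng.randrange(key_count)) is not None:
                found += 1
        hits[slot] = found

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for w in workers:
        w.start()
    barrier.wait()
    started = time.perf_counter()
    for w in workers:
        w.join()
    elapsed = time.perf_counter() - started
    return (threads * reads_per_thread) / elapsed, sum(hits)


if __name__ == "__main__":
    # Assumption: an in-memory dict with a 3-value row per integer key.
    table = {k: (k, "a", "b") for k in range(10_000)}
    rate, found = run_read_benchmark(table.get, len(table), threads=8, reads_per_thread=5_000)
    print(f"{rate:,.0f} reads/s, {found} rows found")
```

The barrier is there so that thread startup time is not counted as read time, which matters when the whole run only takes a second or so.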

MySQL Cluster using NDBAPI still doesn't beat MongoDB (which got about 110k single row reads per second), but it is close enough to be useful. I have some tuning left to do here, so I might get better results in a few days. From this, one might jump to the conclusion that it's mysqld that is the bottleneck, and that is probably correct, but then you would expect two mysqlds to help, and they don't. On the other hand, the explanation for this might be that I am maxing out the CPU (all my cores are at 100%). To test this, I'll put a mysqld on another box. Before I go on: the reason I do this is only to see if I can find the bottleneck. Using a second machine in the actual comparison would be unfair to, in particular, MongoDB.

My first idea was to run MySQL Cluster on my desktop Windows box, as this is one of the more powerful boxes I have, and I imagined that MySQL Cluster for Windows had improved since I last tried it and complained, but that turned out not to be the case. The msi installer for MySQL Cluster doesn't work at all, it seems (Bug #66386 and Bug #66387). To be honest, it seems like this installer has never really worked. If I were Oracle (which I am not), I'd either take it away or fix it. As it stands, it is just an annoyance and a waste of time. (And don't tell me "No one uses MySQL Cluster on Windows". Yes, that might well be true, but that doesn't mean it shouldn't be fixed, unless Windows is not supposed to be a supported platform. If you drive that reasoning further, then you might also say that there is no market for MySQL Cluster 7.2.8 as no one uses it).

This means that I have two options:
- Persist in getting MySQL Cluster 7.2.7 to work on Windows.
- Start up my old Linux desktop box.

I'd like to get Cluster running on Windows again, and write a blog post on how to get it to work (and possibly even create an installer myself). On the other hand, although the Linux box isn't that hot, it should be warm enough (it's an ancient 4-core AMD box, loosely based on a scrapped mobo from a box at the old MySQL Uppsala office). So which one to do? We'll see, but possibly both. My 4-core Linux box should be running anyway (it was shut down when I moved my office a few weeks back), and MySQL Cluster really should work on my Windows box, if for no other reason than to be able to say "I run MySQL Cluster on Windows", so you cannot use that as a reason not to fix the obvious MySQL Cluster installer issues.

And then I have Handlersocket, Tarantool and some other things to try. If you haven't realized it by now, I am enjoying this. And I haven't really seen many tests like this: "I have one box with lots of memory so I can fit my whole database in RAM. Now, which one is fastest?".

/Karlsson

8 comments:

Todd Farmer said...

Not to ignore your points about the .MSI package for MySQL Cluster, but have you considered the alternatives for deploying Cluster on Windows? I wrote my own startup scripts so that I could use the .ZIP packages (which I prefer):

http://mysqlblog.fivefarmers.com/2012/03/16/mysql-cluster-quick-start-script/

Karlsson said...

Todd!


Yeah, I have written scripts like that myself, but I was hoping that as an msi package was available, it would actually work, which it doesn't. So it's down to fixing my old scripts and starting to use them again. And actually, all I needed was the mysqld with the NDB storage engine (which didn't work after using the msi installer either, but I have yet to figure out why).

Cheers
/Karlsson

x said...

I still think your ndb numbers are a bit low.
;-)

Karlsson said...

Antony!

I think so too, but the CPU is maxed out and I have tried a whole bunch of things. I guess I could get it to perform better, but there you are. The load is read-only and in-memory only, doing single row reads using primary key lookups. I know I have a few things to adjust in the NDB API code on the side of my benchmark. I'm not sure how much more I want to play with this though, but so far it's been fun so I'll keep at it a bit more.

Cheers
/Karlsson

Todd Farmer said...

I've verified your bug reports and added one of my own (66394). Thanks for reporting the problems - it's clearly something that should have been caught earlier, and I'll work to make sure future releases have adequate testing of the installer packaging.

Bogdan Kecman said...

Hey AK,

Nice to see you're still playing with Cluster :) ... a question - how many data nodes are you running there? How many replicas? From your post it's not clear, but it looks like you are running either a single data node with NoOfReplicas=1 or 2 data nodes on a single server with NoOfReplicas=2. Neither of those two is optimized for max throughput, and there's really no point in running Cluster on fewer than 2 dedicated machines for data nodes.

As for Windows, you probably have a point there, but not only have I never seen Windows in production, I have never seen Cluster installed from any type of package in production - all production systems use tar.gz. Now the cluster manager is slowly creeping in but.. that's a whole other story..

stay good
b.

Karlsson said...

Nogdan!

Yeah, I'm certain that this is not an optimal config for NDB. But I've heard so many times that MySQL Cluster, in particular with NDB, should be a good Key-Value Store. So I wanted to test a KVS based environment, and a simple one with just one server (1 node only, no replication).

What one should be aware of is that RAM prices are going way down right now, so an in-memory setup with a few tens of GB of data can easily fit in an average server box, which is different from a few years ago. So really, not that many machines are needed to support, say, a 20 or 30 GB in-memory database. Trust me, I would have liked to see NDB work better, and I will not give this up. I do not think that my hardware setup is that weird, except that I should probably have one more box as a replica. Besides that, having some 4 or 8 boxes is just not required in terms of RAM anymore. If NDB requires it because of CPU requirements, then so be it, but then MongoDB is probably a lot more cost efficient. And I'd really want NDB to win!

/Karlsson

Karlsson said...

And Bogdan, sorry for my bad speling of your name. Completely unintentional.

/Karlsson