Thursday, August 23, 2012

MySQL Cluster performance revisited - tcmalloc kicks some ***

My now long-running series of posts on getting max performance from a very simple MySQL Cluster setup (see details here) is continuing here. As a short intro to what I am trying out here, is to see if I can validate the claim that MySQL Cluster / NDB would be a good replacement for a Key Value Store (KVS) such as MongoDB. I test this in a rather simple single-server environment, but this is for a reason, just not a coincidence: The reason is that RAM is getting inexpensive and servers that can take a lot of RAM are also getting less expensive, which in turns means that the saying that many small servers are more cost-effective that few big ones, might not be as valid as it used to be. Also, I wanted to test what MySQL Cluster can do for me, from a KVS centric view. In short, I run on one server (16 Gb RAM, 8 cores) with all data in memory, what do I get.

I can tell you I have been trying real hard to get MySQL Cluster to work better here. Yes, SQL parsing in MySQL is to blame for a lot of why MySQL Cluster is so much slower than MongoDB. But why is MySQL Cluster slower than a fully cached (hot) InnoDB then? Admittedly, I was running the InnoDB tests with MySQL 5.6, so there you have one more thing for me to try (i.e. running InnoDB tests with the mysqld packaged with MySQL Cluster).

A lot of different MySQL Configurations was tried, as was a few other things. Lookiing at what was going on in the box revealed that most of the CPU was consumed by mysqld, so I decided to try to speed that up a bit, as the CPUs were maxed out by mysqld using up some 85 % or so. Using tcmalloc seemed like one obvious fix here, and I got some 10% better performance using that, MySQL Cluster using the SQL interface now performance about 35 k requests per second, which still is below InnoDB (with MySQL 5.6) and not even close to MongoDB, but still an improvement.

But as I said, there is still more stuff to test and I am not giving up yet. But I have given up on distributing some mysqld load to some other server. I got that working, but as the load on the test is distributed evenly across the mysqld servers, the slowest one will hold things up (that is how thing work right now, but I can and will revisit that one). Tarantool, some more NDBAPI tests and then some remains to be tested!

Take care
/Karlsson

7 comments:

Mikael Ronstrom said...

Try jemalloc, it will probably get even more improvement. tcmalloc improves InnoDB also by about 10% where jemalloc improves around 15% and this gain is higher with bigger machines most likely. Also make sure to use sufficient number of ndb cluster connections, should improve up to 8 or 16 at least.

Anders Karlsson said...

I'll try jemalloc. I have tried with many NDB connections, up to 16, but I did not see a big difference. The bottleneck was CPU and it was mysqld maxing out. So I'll be fuun to see what I get from InnoDB with the same mysqld as NDB.

/Karlsson

Antony T Curtis said...

What Mikael says..
Plus... you can remove a fair sized chunk of the MySQL overhead by *gasp* using stored procedures.
There is a significant cost to first time invocation in a connection but if you keep your entire transaction in a stored procedure, it helps keeps NDB Cluster happy by keeping the transactions short. There are then further tricks you can do to help shorten the transaction times...

Anders Karlsson said...

Anthony!

Hey, that would be a real interesting thing to try! Not that I am convinced it will help, but it is sure worth a try! Also, this is rather easy to try :-)

/Karlsson

Nicolas Seyvet said...

Could u publish the my.cnf and config.ini files? NDB is tricky to configure right.

I do not know if you are using it but for example the number of open table in the cache can dramatically change performance.

[mysqld]
ndbcluster
set-variable=max_connections=250

skip-external-locking
key_buffer_size = 256M
max_allowed_packet = 1M
table_open_cache = 500
sort_buffer_size = 1M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size= 16M

Another thing is the number of connections from mysqld to the NDP cluster.
ndb-cluster-connection-pool=...

See http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-program-options-mysqld.html

I am quite interested in your results :)

Anders Karlsson said...

Nicolas!

Well, setting things like key_buffer_size isn't terribly valid when using NDB. The ¤ of open tables is often an issue, but here I am using just 1 of them, so that shouldn't be a big deal either. Any other MyISAM related config parameters aren't terribly relevant either (yes, some of them come into action for stuff liek sorting and that, but here I am using single row PK lookups, so they don't make much difference).

Cluster connectin pool is something I have experimented with a lot, which made very little difference, among the ndb-specific mysqld parameters, what DID make a difference was ndb_use_exact_count=0. But except for that, I don't know.

Among the config.ini parameters, I have tried quite a few combinations, currently this is what I have in the ndb default section:
NoOfReplicas=1
DataMemory=8000M
IndexMemory=1500M
ODirect=1
MaxNoOfExecutionThreads=4
LockPagesInMainMemory=1
TimeBetweenLocalCheckpoints=0
RealtimeScheduler=1
LongMessageBuffer=32M

Some of these are not very relevant for read performance. If someone wants to add something, I am willing to listen, but note that these are the parameters that are used when I run with NDBAPI also, which is A LOT faster!

As for the ndb_xxx parameters in my.ini, I'd really like to know if there is any of those that can make a real big difference in terms of my very simple workload.

Cheers
/Karlsson

Nicolas Seyvet said...

Anders
¨
Sorry for myisam params, I copied and pasted some config file I had.)

I have seen the same problem with an overloaded mysqld (300+%) on a multicore while ndbmtd is idling at 10%.

And I have not found any way to improve this significantly.

NDP_API is much faster as you indicated. For the reasons you mentioned.

Nothing to add :)