Thursday, March 10, 2011

Want to accellerate the MySQL Slave? Here is something to try!

I admit this wasn't my idea. Nor do I claim it will solve all problems. Or that it will always speed things up. But in principle, this should work in many cases.

The deal is this: For certain kind of Slave operations, UPDATE, DELETE and INSERT ... SELECT, the operation is both a read operation, as well as a write. To be frank (but I am not, I'm Anders), this is the case with just about any MySQL operation, but for the ones I list, there is a distinct set of rows, defined by the statement itself, that will have to be read.

The above is something we can take advantage on. As there is just one thread applying the changes from the master in the slave, as the slave operations are serialized, this thread is easily the bottleneck. But there is one aspect that doesn't necessarily need to be serialized, which is the read of the data to be modified. We can, to put it simple, pre-warm the MySQL cache with the data that is used by the statements above.

What is needed to make this work is to read from the MySQL relay log, ahead of the SQL Slave Replication thread, and convert the UPDATE, INSERT or DELETE statement to a SELECT, and then execute that SELECT statements. As I said, I didn't invent this, but I wanted to try it out. Hence you can now download slavereadahead 1.0 from Sourceforge. This is a really simple piece of software that just what I describe above. But before you get started, let me tell you some caveats:

  • I am NOT reading directly from the MySQL relay log files, rather, I use the output from the SHOW RELAYLOG EVENTS command, which means you have to be running MySQL 5.5 or up.
  • Because of the above, not all statements will be available to cache, as row-based events are not shown properly in this case.
  • Also, it might be the case, but I have not tested this, that the overhead of doing this through the server cause quite a bit of overhead. I do not know.
Now, I have not tested the effectiveness of the program much. I know it builds and works on Ubuntu 9.x on 64-bit Intel platforms, but that is about it for hard facts. What I have done to compensate for this is to make the program configurable in many ways. The SELECT statements are for example executed by separate threads, you can configure how many you want. If you run out of these threads, the SELECT will be executed by the "main" thread, which means that you can run with no specific SQL threads at all, if you wish. Also, how much ahead I read and how much I read at the time is also configurable.

The algorithm I use for which statements to pick up and execute is pretty simple, and the conversion from INSERT, UPDATE and DELETE into SELECT is also not very fancy. For us here at Recorded Future, it seems to work though. Any errors in the conversion to / execution of SQL statements are silently ignored.

I'd be really glad if someone would try out the usefulness or not of this program. It comes with the standard GNU autobuild setup, i.e. configure / make, and should be easy to get going with. There is also a basic documentation PDF included.

So, download from Sourceforge and try it out!



datacharmer said...

How does this tool differ from Maatkit's mk-slave-prefetch?

Karlsson said...

To be honest, I had all but forgotten about this. So see it as an alternative.

My program is written in C and use the SHOW RELAYLOG EVENTS command, maakkit is written in Perl and reads from the relay log files. Not that much different, I admit. But hey, what's wrong with diversity? The maatkit version has a bunch of more options, and the algorithm is differnt. To me, I like C better for these kind of things, but maybe that just me. Quite possibly it is.


Harrison said...

mk-slave-prefetch is quasi-deprecated, so having more alternatives is better:

Mark Callaghan said...

I can't believe Domas has yet to reply here with a link to the source for his version of it.