Karlsson on databases and stuff

Tuesday, September 11, 2012

MySQL Configuration Wizard - Don't always trust it...

So, the MySQL Configuration Wizard, which is there to make MySQL easier to use on Windows, will just plainly aid in setting up a safe and sound MySQL configuration. Right? Just run it and you are safe. OK? But things do break, specifically hard drives and stuff, so to keep your database in shape you have a second set of disks in your Windows machine, in a RAID configuration, for all that data that you do not want to lose. Or you might even connect, say, your D: drive to an expensive but safe SAN! Then you run the MySQL Configuration Wizard and tell it to place your InnoDB data files on that D: drive. Phew! Now you are OK! That RAID set was expensive, but better safe than sorry! Now you can place all that important data into MySQL, and what's left on the C: drive is just some programs, which can be replaced if necessary, but the data is safe!

NO! That's NOT how it works, not at all. There are two things wrong here:

Just because you move the InnoDB files, which might well be the storage engine you use, doesn't mean that the other MySQL datafiles are there. Fact is, you CANNOT change where MySQL places the other datafiles, they get into C:\ProgramData\MySQL Server 5.5/data whatever you do! Why this is so I do not know, but if you want to change this location, you have 2 options:

Edit the my.ini file generated by the Config Wizard: change the datadir setting, and move the files and directories from the C:\ProgramData\MySQL Server 5.5/data directory to the directory where you want the files on your D: drive.
Instruct Windows beforehand where you want your datafiles, which is real easy, just change the setting in the registry that determines where ProgramData is: HKLM\Software\Microsoft\Windows NT\CurrentVersion\ProfileList\ProgramData. Real easy and safe, right :-)

If you are OK with the other MySQL datafiles staying where they are, and just want the InnoDB files to be safe, you still have to fix one thing: the innodb_log_group_home_dir setting. This follows the datadir setting, which means that if you move the InnoDB datafiles to another drive using the Config Wizard, the transaction log files will not move with them! They remain in C:\ProgramData\MySQL Server 5.5/data! Ouch! See this bug: 66769
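A quick way to spot this is to read the generated my.ini and check which drive each directory setting ends up on. The following is only a minimal sketch, assuming that an InnoDB directory setting that is missing from my.ini follows datadir (as described above for innodb_log_group_home_dir). The function name, the sample paths and the sample my.ini content are all made up for illustration.

```python
import configparser
import ntpath

# Settings that decide where MySQL and InnoDB files end up.
PATH_SETTINGS = ("datadir", "innodb_data_home_dir", "innodb_log_group_home_dir")


def settings_off_drive(ini_text, wanted_drive):
    """Return {setting: path} for every path setting NOT on wanted_drive.

    Assumption: an InnoDB directory setting that is missing from my.ini
    follows datadir, which is the behaviour described in the post.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.ini may hold bare options without a value
        interpolation=None,    # do not treat % in values as special
        strict=False,          # tolerate repeated options
    )
    parser.read_string(ini_text)
    mysqld = parser["mysqld"] if parser.has_section("mysqld") else {}

    def value(name):
        raw = mysqld.get(name)
        return raw.strip().strip('"') if raw else None

    datadir = value("datadir")
    off_drive = {}
    for name in PATH_SETTINGS:
        path = value(name) or datadir
        if path is None:
            continue  # nothing configured at all, nothing to report
        drive = ntpath.splitdrive(path)[0]
        if drive.upper() != wanted_drive.upper():
            off_drive[name] = path
    return off_drive


# Hypothetical my.ini where only the InnoDB datafiles were moved to D:
SAMPLE_INI = """
[mysqld]
datadir="C:/Example/MySQL/data/"
innodb_data_home_dir="D:/MySQL Datafiles/"
"""

for setting, path in settings_off_drive(SAMPLE_INI, "D:").items():
    print(f"{setting} is not on D: -> {path}")
```

With the sample above, both datadir and innodb_log_group_home_dir are reported, even though the latter is never mentioned in the file. That is exactly the trap: the log files silently follow datadir.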

/Karlsson

Thursday, September 6, 2012

MySQL Server installer on Windows

I don't know what makes this so hard for MySQL Developers at Oracle. Look, I know you guys are trying hard, that's not it, and I also know that chasing SQL Server is a priority, and that is fine. But for that to work, the MySQL MSI Installer has to get better! It just has to. There are some basic issues, like these:

You cannot use the MSI installer to install two MySQL servers on the same box. If the installer sees an old installation, it insists on upgrading it if you are installing a newer version, and if you are installing an older version, well, that will be refused.
You cannot, using the MSI installer, install both 32 and 64-bit versions on the same box. If you try that, even if the versions are the same, the second install insists that what you have already installed is newer than the one you are installing, although they have the same version number, but one is 32 bit and one is 64 bit.
In the Instance configurator when it comes to the data directory location, if you select "Installation path" you might think that the data directory will be placed where you are installing MySQL right now. Think again, this is not so. What this means is that if you install MySQL on the box, and MySQL hasn't been installed there before, then Installation path means just that. So let's assume that you choose to install MySQL in C:\foo, then the data directory will be C:\foo\data. Which is fine. If you then uninstall this version of MySQL and then install some other version of it, this time in, say, C:\bar, then you would imagine that the data directory would be C:\bar\data, but it will still be C:\foo\data. I'm not sure where this is stored, but it is somewhere and it's really annoying.

So, if a developer wants to run with the 32-bit version of the MySQL library (say because he / she is using a 32-bit driver for MySQL), then he / she cannot use the MSI. If someone had given this an extra round of thinking, then the MySQL 64-bit server installation would have included both the 32-bit and the 64-bit versions of the client. Right? I'll see if I have the bandwidth and energy to report these issues as bugs.

Will the MySQL Installer help then? That is a 201 Mb package that contains, according to the download page "All MySQL Products. For all Windows Platforms. In one package." Right, that should do the trick then? Nope, once you have made the choice to install MySQL Server, be it 32 or 64 bit, you can only install one of them.

Look, I'm trying hard not to be a grumpy old man here, but it is difficult (partly because I'm getting old I guess). I am aware that I can always download and unpack the ZIP archive (which is what I will do now), but please folks, this is on Windows and I'd rather stay away from that on that platform. And Oracle / MySQL took the effort to create not just one, but 2 installers, but neither of them allows you to install 2 MySQL servers on the same machine, which is too bad.

And by the way, no, just because I am installing MySQL, then uninstalling it, and then installing it again, doesn't mean I want to use the same data. It just doesn't.

And I am afraid that if you think MariaDB is any better, think again, that is NOT the case. Too bad.

/Karlsson

Wednesday, August 29, 2012

Revisiting libmysqld, the client / server overhead and all that. And an apology

I wrote about the performance gains with libmysqld a few days ago but I had just too many things in my head to do a proper comparison with the MySQL Client / Server protocol. Yes, libmysqld is faster, but not as much faster as I thought, and blogged about. What happened was that I had another thing to try which I had forgotten about, which was to test using the Client / Server protocol without the dreaded CLIENT_COMPRESS flag (see more on this here).

Without CLIENT_COMPRESS, I could see NDB performance improve by some 25 - 30 %. But with InnoDB, which I just tested, I achieved some 98 k row reads per second! Yikes, I should have tested that one before comparing with libmysqld (in which case I got 115 k rows read per second, which is still faster).

The good thing with all this is a multitude of things:

We know for sure that you should NOT use the CLIENT_COMPRESS flag. Just don't. At least not when you have many small operations going and the database is largely in RAM. I'll test this in some more detail later, to see if I can find some good cases where CLIENT_COMPRESS is a good fit, but in this case, it's not.
When data is in memory, and you aren't using sharding, MongoDB really isn't that much faster, maybe some 10% compared to MySQL using InnoDB. But then you get transactions, joins and all sorts of goodies with MySQL.
The MySQL Client / Server protocol is FAR from as sluggish as I suspected!
The MySQL Parser and Optimizer is not as much of an overhead as I was led to believe.
Using MySQL with InnoDB in a simple table might not be such a bad Key Value Store as you might think, but as always it depends on the mileage.

Frankly, the result is pretty much a surprise to me, MongoDB isn't that much faster than MySQL AT ALL, at least not in the case when data is in RAM. And if you ask how the MEMORY engine performs, well, about the same as InnoDB, slightly faster, but not by so much that I can say anything conclusive.

What remains to test then? Well, I have Tarantool and HandlerSocket to test. And possibly a few more things. Also, I want to test what happens when there are some bigger documents in store that will not fit in memory in either MongoDB or MySQL. BLOBs anyone?

/Karlsson
Apologizing for comparing apples to oranges. Heck I already KNEW that I had used the CLIENT_COMPRESS flag, so why did I reference those tests before that was removed? I just forgot it I guess.

The real differences between READ-COMMITTED and REPEATABLE-READ transaction isolation levels

Justin Swanhart wrote a blog on Differences between READ-COMMITTED and REPEATABLE-READ transaction isolation levels and I thought I'd give my view on this issue.

To begin with, from a technical standpoint, Justin is correct in his statements, so that is not why I am writing this (and this is the reason I'm writing a separate blogpost instead of just commenting on his), but there is more to it than that.

What Justin really writes about is how these isolation levels are implemented in a particular database, in this case InnoDB. For example the sentence "There are a few differences between READ-COMMITTED and REPEATABLE-READ, and they are all related to locking." makes this very clear. These isolation levels, as defined by ANSI SQL as well as how they are implemented by other databases, have nothing to do with locking. What I am trying to say is that when you issue the statement SET TRANSACTION ISOLATION LEVEL READ COMMITTED what you are saying is not that "I want these kinds of locks placed here and there when I issue any upcoming SQL statements", what you are saying is that "I want to use the READ COMMITTED isolation level, which means that the data returned has certain attributes as related to consistency". As a side-effect of those attributes with regard to the returned dataset and when you use the InnoDB Storage Engine, you get just those locks, but that is another thing. (Like when you turn the steering wheel in your car, what you are saying is NOT that some gears and bolts should twist and turn in this or the other way, what you say is that "please dear car turn right or I am going to run into that upcoming brick wall", which of course in turn means that those gears and bolts and stuff get into action, but that is not what you asked for, you asked for the car to turn, whichever way it can. And again, I'm not saying that those gears and stuff ain't interesting, they are!). And none of this is to imply that Justin's post was incorrect or not useful, I just think, as I wrote above, that there is a lot more to it.

So, having determined that, what ARE you saying to the database when you issue those commands? The answer is this (I'm excluding READ UNCOMMITTED here):

READ COMMITTED - Please give me only rows back that are committed!
REPEATABLE READ - Please give me rows back that are committed, and if I read one of those again, in the same transaction, make sure those rows are still there and haven't been modified.
SERIALIZABLE - Please give me rows back that are committed, and make sure that if the same SQL statement is executed again, I get the exact same rows, none being modified, deleted or added.

Now, read the above once again, and for READ COMMITTED, we only make sure that the rows are committed, we do not say WHEN they should have been committed, only that they are when the row is read. So what does this mean then? Well, it means that there is a level of flexibility in the implementation here. Some implementer might say that the point in time to look at is at the start of the statement, another implementer might consider it the point when the row is read. And this is a BIG difference:

In the first implementation, it means that when using READ COMMITTED, the read is a "consistent read", which means that the whole set of records looks like it once existed. Or in other words, if I for example sum the value of some column, that sum will be or will have been the sum of that column in the database at some point in time.
In the second interpretation of READ COMMITTED, one may, for example, have rows added to the result set "after" the point where the current row is read. Which means that this is an inconsistent read.
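To make the difference between the two interpretations concrete, here is a toy sketch. It is not how InnoDB or any other engine does it: the "table" is just a Python dict, the function names are made up, and the concurrent transaction is simulated by a callback that commits a change after the first row has been read.

```python
def sum_balances(committed, snapshot_at_statement_start, commit_during_scan):
    """Sum all balances while another transaction commits mid-scan.

    committed: dict of row key -> balance, the committed state of the toy table.
    snapshot_at_statement_start: True for the first interpretation (the
        statement sees the data as of its start), False for the second
        (each row is looked up at the moment it is read).
    commit_during_scan: callable that changes `committed`; it is called after
        the first row has been read, to mimic a concurrent COMMIT.
    """
    keys = sorted(committed)
    visible = dict(committed) if snapshot_at_statement_start else committed
    total = 0
    for position, key in enumerate(keys):
        total += visible[key]
        if position == 0:
            commit_during_scan(committed)
    return total


def move_50_from_a_to_b(table):
    # A committed transfer: the true sum is 200 both before and after it.
    table["b"] += 50
    table["a"] -= 50


consistent = sum_balances({"a": 100, "b": 100}, True, move_50_from_a_to_b)
inconsistent = sum_balances({"a": 100, "b": 100}, False, move_50_from_a_to_b)
print(consistent)    # 200, a sum that really existed at some point in time
print(inconsistent)  # 250, a sum that never existed in the database
```

Both runs only ever return committed data, so both qualify as READ COMMITTED in the sense above, yet only the first one gives a sum that the database actually held at some point.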

What is annoying is that there are actually different implementations out there. Let's see what ANSI SQL-99 has to say about what is allowed to happen when using READ COMMITTED:

P2 (‘‘Non-repeatable read’’): SQL-transaction T1 reads a row. SQL-transaction T2 then modifies
or deletes that row and performs a COMMIT. If T1 then attempts to reread the row, it may
receive the modified value or discover that the row has been deleted.
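
The P2 phenomenon quoted above can be sketched in the same toy fashion. Again this is an illustration only: the class and function names are invented, and "repeatable read" is faked by remembering what the transaction has already read, which is just one of several ways an implementation could provide that guarantee.

```python
class ToyTransaction:
    """A reader over a shared dict of committed rows (not a real database)."""

    def __init__(self, committed, repeatable_read):
        self.committed = committed
        self.repeatable_read = repeatable_read
        self.already_read = {}  # rows this transaction has seen so far

    def read(self, key):
        if self.repeatable_read and key in self.already_read:
            return self.already_read[key]
        value = self.committed.get(key)  # None means the row is gone
        self.already_read[key] = value
        return value


def reread_after_commit(repeatable_read):
    committed = {"row1": "original"}
    t1 = ToyTransaction(committed, repeatable_read)
    first = t1.read("row1")
    committed["row1"] = "modified"   # T2 modifies the row and COMMITs
    second = t1.read("row1")
    return first, second


print(reread_after_commit(repeatable_read=False))  # ('original', 'modified')
print(reread_after_commit(repeatable_read=True))   # ('original', 'original')
```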

What makes things a bit easier is that most implementations, although not all by far, use the second interpretation (I have to admit I have only looked at SQL-99 here, and not in that much detail, so things might have changed before or after SQL-99).

There is one thing though that should be pointed out in Justin's blog, which is that what he writes is not MySQL specific, it's InnoDB specific! So different storage engines may behave differently and may place different locks.

Committed by yours truly

/Karlsson

Monday, August 27, 2012

Fast and furious!

A few days ago I wrote a bit on my first results of comparing MySQL with MongoDB as a Key-Value Store, something that has been going on for way too long, but I am not finished yet. Last time I used MySQL Embedded Library to bypass the MySQL Client Server protocol to see what the overhead was, and the result was that it is big (and again, note that the same networking was used with MongoDB and I was also using Unix Domain Sockets, as well as plain TCP/IP, so don't ask me to fix any network issues I might have). Using Embedded Server with InnoDB was actually faster than using MongoDB, some 3 times faster compared to using the client / server protocol.

That one out of the way, I now wanted to see what I could get if I used the storage engine that was fastest in Client / Server mode, MEMORY. That took a while to fix, as to have an Embedded Server application, like my test application here, use the MEMORY engine, I have to load the data into the MEMORY table somehow each time I run the application. No big deal but a slight update to my benchmarking application was needed, as well as some debugging as embedded server is pretty picky with you doing things the right way and in exactly the right order, and is much less forgiving than the MySQL Client library. Anyway, I now have it fixed, and the result? Fast. Real fast and furious: 172 k rows read per second! Compared to 110 k rows read per second with MongoDB (but that is MongoDB in Client Server mode of course). Using the MySQL Client, the MEMORY engine achieved 43 k row reads per second, which means that libmysqld is 4 times as fast (that is, 300% faster)! How is that for a performance improvement?

Which is not to say that we all should start building libmysqld applications right now. But what I want to say is that if you want to improve the performance of MySQL, looking into the Client / Server protocol would be a good starting point, there is a lot of performance to get there. The results noted could be interpreted as at least 75% of the time that MySQL processes a query, excluding disk I/O (this is the MEMORY engine after all), is spent in the Client / Server protocol. And looking at it differently: A key value store such as MongoDB might not be as fast as we think, but MongoDB sure does have a more efficient C/S protocol!
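
The 75% figure follows directly from the two throughput numbers, if one assumes that the time per row is simply the inverse of the rate and that the whole difference is attributable to the Client / Server protocol. A small worked sketch (the function names are mine):

```python
def speedup(slow_rate, fast_rate):
    """How many times as fast the fast variant is."""
    return fast_rate / slow_rate


def share_of_time_saved(slow_rate, fast_rate):
    """Fraction of the per-row time that disappears in the fast variant."""
    # Time per row is 1 / rate, so the saved share is 1 - slow_rate / fast_rate.
    return 1 - slow_rate / fast_rate


client_server = 43_000   # row reads per second, MEMORY engine via the MySQL client
embedded = 172_000       # row reads per second, MEMORY engine via libmysqld

print(speedup(client_server, embedded))              # 4.0
print(share_of_time_saved(client_server, embedded))  # 0.75
```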

/Karlsson

Sunday, August 26, 2012

So, what is the client/server protocol overhead then? Or: How MySQL can finally kick some MongoDB ****

You may have followed, or have tried to follow, my different attempts at getting Key-Value Store performance, in this case represented by MongoDB, from MySQL on a single machine with all data in RAM. I have not been very successful so far, but many smart people, way smarter than yours truly, have been giving me suggestions for things to try to get MySQL closer to the performance of MongoDB, where MongoDB did some 110 k row reads per second, whereas MySQL was at best reading some 43 k rows per second (using the HEAP / MEMORY storage engine) and 46 k row reads per second (using NDB and without CLIENT_COMPRESS). Note that not all combinations have been tested, so it would be reasonably safe to assume that using the HEAP / MEMORY storage engine without CLIENT_COMPRESS would be even faster than the 43 k rows I got when using CLIENT_COMPRESS.

As I could see that the CPU load on mysqld was very high, and as everything is in memory and hence there is no disk I/O, then we were probably CPU bound here. So bypassing mysqld when using MySQL Cluster, directly accessing NDBAPI did get me much better performance, some 90 k row reads, but that is still behind MongoDB.

Among the suggestions of what the bottleneck in MySQL was, one was parsing and optimization. Another was the question of how big the client / server overhead was. I have already figured out that using the compressed client protocol actually limited performance quite a bit here, which might not be that much of a surprise, but that it affected performance by as much as the 25 - 30 % I found was surprising.

Anyway, the easiest way to see how much the client / server overhead was, was to run with libmysqld, the MySQL Embedded Library. Using libmysqld required some changes in my benchmark program, as I wanted to make sure that data was in memory before starting the test. Having done that I then proceeded to test using InnoDB in libmysqld. And hey, now we see some very different results! 115 k row reads per second! Yes, faster than MongoDB!

The documentation for libmysqld claims that the embedded InnoDB engine is not reentrant, but I run multiple connections here (in multiple threads), and it seems to work just fine. But this might be a mistake.

If this is truly so (I'm not 100% sure about these results yet, and I'd really like to see someone else testing something similar with "normal" mysqld vs. libmysqld), then we know a few things:

The overhead of MySQL parsing and optimization is not even close to as high as one might think (yes, I was also thinking that all this CPU overhead might be due to SQL parsing etc).
There is something in the MySQL Client / Server protocol that one should look at (note that I was testing with Unix Domain sockets, and also with the normal TCP/IP network, and sockets were faster but not by much. So this is not due to some network issues on my machine, and also the same network was used for testing with MongoDB).

What I suspect is that the client / server protocol sends a bit too many messages and that this in turn causes a lot of context switches (I have not looked at this yet, but you bet I will, i.e. how many context switches do I get with MongoDB compared to MySQL).
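
Counting context switches does not need anything fancy. As a rough sketch, assuming a Unix system and a benchmark client that can be driven from Python, the standard resource module reports how many voluntary and involuntary context switches the calling process has made. Note that this only covers the client process itself, not mysqld or mongod, and that the workload below is just a placeholder, not my benchmark.

```python
import resource
import time


def context_switches():
    """Return (voluntary, involuntary) context switches of this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def switches_during(workload):
    """Run workload() and return the context switches it added."""
    voluntary_before, involuntary_before = context_switches()
    workload()
    voluntary_after, involuntary_after = context_switches()
    return (voluntary_after - voluntary_before,
            involuntary_after - involuntary_before)


def placeholder_workload():
    # Stand-in for the real benchmark loop; each sleep gives up the CPU.
    for _ in range(10):
        time.sleep(0.001)


voluntary, involuntary = switches_during(placeholder_workload)
print(f"voluntary={voluntary} involuntary={involuntary}")
```

Running the same measurement around a MySQL read loop and a MongoDB read loop would give a first hint as to whether the number of round trips differs.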

Another conclusion is that there would be a lot to potentially gain from improving the client / server protocol here. And some of you out there might say that my testcase is a bit extreme, having all data in RAM and all that, but I am not so sure that is the case, if you are running in a cloud, getting a lot of RAM is not that hard, but getting real fast disk-I/O is, at least right now and at reasonable price, i.e. adding RAM is less expensive than adding more powerful I/O.

The next cool thing to try then: Using libmysqld with NDB, an option that has been around for a while now, but I have not seen it documented much, which is strange as libmysqld really should serve NDB much better than many other storage engines (as the mysqld / libmysqld component, when using MySQL Cluster, is largely stateless).

/Karlsson
Once again playing with libmysqld, and still thinking this is a "forgotten" technology with some interesting aspects to it.

Thursday, August 23, 2012

MySQL Cluster performance up again, or CLIENT_COMPRESS considered harmful

I'm back again (previous post in this series is here), with some interesting finds related to some more testing of MySQL Cluster (yes, I have promised to test more things than this, but I got so entangled with NDB that I just had to give it one more shot). Looking at my benchmarking code, I realized I used the CLIENT_COMPRESS flag when I connected to MySQL. This flag was there in the code where I connected to MySQL (using mysql_real_connect(), this is a C program after all) and it was probably just pasted in from some other code somewhere. Not that this flag isn't known to me or anything, but I had not tested the compressed or non-compressed MySQL client protocols much. I guess I at one time had assumed that CLIENT_COMPRESS at best helps when sending large packets between the client and the MySQL server, and above all, that for many small packets, it wasn't terribly helpful, but also not terribly harmful. Turns out I was wrong (yepp, that DOES happen).

Googling for CLIENT_COMPRESS, I didn't find much more than this either, to be honest: if you have many small packets, it's not going to be very helpful, but not very harmful either.

In this case though, it was the MySQL daemon maxing out the CPU that was the issue, so maybe I should try to run without CLIENT_COMPRESS. As stated above, Googling for this did not, at least not initially, provide much help, but as the CPU was maxed out, and compression consumes a lot of CPU power, maybe we should avoid compression.

The result? 25 - 30 % more performance, just like that! MySQL Cluster with NDB is now managing some 46 k requests per second, as compared to the previous 35 k! Not a bad improvement. All in all, if you are using MySQL Cluster through the MySQL API, as opposed to the NDB API, you probably want to avoid using CLIENT_COMPRESS: you are likely to make many small SQL statements with limited sizes of the result sets, and with all data in memory (well, not all if you use STORAGE DISK, but that has issues of its own), chances are that your performance bottleneck on the database side of things will be the CPU.
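
To get a feeling for why compression is a poor fit for many small packets, here is a minimal sketch using zlib, which is the library the compressed MySQL protocol is based on. It only compares payload sizes before and after compression. It does not reproduce the actual protocol framing, nor does it measure the CPU cost, and both payloads are made up.

```python
import zlib


def compressed_size(payload):
    """Size in bytes of payload after zlib compression at the default level."""
    return len(zlib.compress(payload))


small_result = b"42\tKarlsson\t2012-08-23"   # made-up single-row result
large_result = b"some repetitive column value\n" * 1000

for name, payload in (("small", small_result), ("large", large_result)):
    print(name, len(payload), "->", compressed_size(payload))
```

The small payload comes out larger than it went in, while the large and repetitive one shrinks considerably. So with a key-value style workload you pay CPU on both ends and get nothing back on the wire.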

But don't get too excited, as I am now going to revisit this with InnoDB also! (Yes, that is mean)

/Karlsson