Wednesday, August 29, 2012

Revisiting libmysqld, the client / server overhead and all that. And an apology

I wrote about the performance gains with libmysqld a few days ago but I had just too many things in my head to do a proper comparison with the MySQL Cluster / Server protocol. Yes, libmysqld is faster, but not as much faster as I thought, and blogged about. What happened was that I had another thing to try which I had forgotten about, which was to test using the Client / Server protocol without the dreaded CLIENT_COMPRESS flag (see more on this here).

Without CLIENT_COMPRESS, I could see NDB performance improve by some 25 - 30 %. But with InnoDB, which I just tested, I achieved some 98 k row reads per second! Yikes, I should have tested that one before comparing with libmysqld (in which case I got 115 k rows read per second, which is still faster).

The good thing with all this a multitude of things:
  • We know for sure that you should NOT use the CLIENT_COMPRESS flag. Just don't. At least not when you have many small operations going and the database is largely in RAM. I'll test this in some more detail later, to see if I can find some good cases where CLIENT_COMPRESS is a good fit, but in this case, it's not.
  • When data is in memory, and you aren't using sharding, MongoDB really isn't that much faster, maybe some 10% compared to MySQL using InnoDB. But then you get transactions, joins and all sorts of goodies with MySQL.
  • The MySQL Client / Server protocol is FAR from as sluggish as I suspected!
  • The MySQL Parser and Optimizer is not that much of an overhead as I was lead to believe.
  • Using MySQL with InnoDB in a simple table might be such a bad Key Value Store as you, but as always it depends on the milage.
Frankly, the result is pretty much a surprise to me, MongoDB isn't that much faster than MySQL AT ALL, at least not in the case when data is in RAM. And if you ask how the MEMORY engine performance, well, about the same as InnoDB, slightly faster, but no as much as to say anything conclusively.

What remains to test then? Well, I have Tarantool and HANDLER SOCKET to test. And possibly a few more things. Also, I want to test what happens when there are some biggers sized documents in store that will not fit in memory in either MongoDB or MySQL? BLOBs anyone?

Apologizing for comparing apples to oranges. Heck I already KNEW that I had used the CLIENT_COMPRESS flag, so why did I reference those test before that was removed? I just forgot it I guess.

The real differences between READ-COMMITTED and REPEATABLE-READ transaction isolation levels

Justin Swanhart wrote a blog on Differences between READ-COMMITTED and REPEATABLE-READ transaction isolation levels and I thought I'd give my view on this issue.

To begin with, from a technical standpoint, Justin is correct in his statements, that is not why I am writing this (and this is the reason I'm writing a separate blogpost instead of just commenting his), but there is more to it than that.

What Justin really writes about is how these isolation levels are implemented in a particular database, in this case InnoDB. For example the sentence "There are a few differences between READ-COMMITTED and REPEATABLE-READ, and they are all related to locking." makes this very clear, these isolation levels, as defined by ANSI SQL as well as how they are implemented by other databases, has nothing to do with locking. What I am trying to say that when you issue the statement SET TRANSACTION ISOLATION LEVEL READ COMMITTED what you are saying is not that "I want these kinds of locks places here and there when I issue any upcoming SQL statements", what you are saying is that "I want to use the READ COMMITTED isolation level, which means that the data returned has certain attributes as related to consistency". As a side-effect of those attributes with regard to the returned dataset and when you use the InnoDB Storage Engine, you get just those locks, but that is another thing. (Like when you turn the steering wheel in your car, what you are saying is NOT that some gears and bolts should twist and turn in this or the other way, what you say is that "please dear car turn right or I am going to run into that upcoming brisk wall", which of course in turns means that those gears and bolts and stuff gets into action, but that is not what you asked for, you asked for the car to turn, and which way it can. And again, I'm not saying that those gears and stuff ain't interesting, they are!). And none of this to imply that Justins post was incorrect or not useful, I just think, as I wrote above, that there is a not more to it.

So, having determined that, what ARE you saying to the database when you issue those commands. And the answer is this (I'm excluding READ UNCOMMITTED here):
  • READ COMMITTED - Please give me only rows back that are committed!
  • REPEATABLE READ - Please give me rows back that are committed, and if I read one of those again, in the same transaction, make sure those rows are still there and haven't been modified.
  • SERIALIZABLE - Please give me rows back that are committed, and make sure that if the same SQL statement is executed again, I get the exact same rows, none being modified, deleted or added.
Now, read the above once again, and for READ COMMITTED, we only make sure that the rows are committed, we WHEN they should have been committed, only that they are when the row is read. So what does this mean then? Well, it means that there is a level of flexibility in the implementation here. Some implementer might say that the point in time to look at is at the start of the statement, another implementer might consider it the point when the row is read. And this is a BIG difference:
  • In the first implementation, it means that when using READ COMMITTED, the read is a "consistent read", which means that the whole set of records looks like it once existed. Or in other words, if I for example sum the value of some column, that sum will be or will have been the sum of that column in the database at some point in time.
  • In the second interpretation of READ COMMITTED, one may, for example, have rows added to the result set "after" the point when where the current row is read. Which means that this is an inconsistent read.
What is annoying is that there are actually different implementations out there. Let's see what ANSI SQL-99 has to say about what is allowed to happen when using READ COMMITTED:

P2 (‘‘Non-repeatable read’’): SQL-transaction T1 reads a row. SQL-transaction T2 then modifies
or deletes that row and performs a COMMIT. If T1 then attempts to reread the row, it may
receive the modified value or discover that the row has been deleted.

What makes things a bit easier is that most implementations, although not all by far, use the second interpretation (I have to admit I have only looked at SQL-99 here, and not in that much detail, so things might have changed before or after SQL-99).

There is one thing though that should be pointed out in Justins blog, which is what he writes is not MySQL specific, it's InnoDB specific! So different storage engines may behave differently and may place different locks.

Committed by yours truly

Monday, August 27, 2012

Fast and furious!

A few days I wrote a bit on my first results of comparing MySQL with MongoDB as a Key-Value Store, something that has been going on for way to long, but I am not finished yet. Last time I used MySQL Embedded Library to bypass the MySQL Client Server protocol to see what the overhead was, and the result was that it is big (and again, note that the same networking was used with MongoDB and I was also using Unix Domain Sockets, as well as plain TCP/IP, so don't ask me to fix any network issues I might have). Using Embedded Server with InnoDB was actually faster than using MongoDB, some 3 times faster compared to using the client / server protocol.

That one out of the way, I now wanted to see what I could get if I used the storage engine that was fastest in Client / Server mode, MEMORY. That took a while to fix, as to have an Embedded Server application, like my test application here, use the MEMORY engine, I have to load the data into the MEMORY table somehow each time I run the application. No big deal but a slight update to my benchmarking application was needed, as well as some debugging as embedded server is pretty picky with you doing things the right way and in exactly the right order, and is much less forgiving than the MySQL Client library. Anyway, I now have it fixed, and the result. Fast. Real fast and furious: 172 k rows read per second! Compared to 110k rows read per second with MongoDB (but that is MongoDB in Client Server mode of course). Using the MySQL Client, the MEMORY engine achieved 43 k row reads per second, which means that libmysqld is 400% faster! How is that for a performance improvement.

Which is not to say that we all should start building libmysqld applications right now. But what I want to say is that if you want to improve the performance of MySQL, looking into the Client / Server protocol would be a good starting point, there is a lot of performance to get there. The results noted could be interpreted as at least 75% of the time that MySQL processes a query, excluding disk I/O (this is the MEMORY engine after all), is spent in the Client / Server protocol. And looking at it differently: A key value store such as MongoDB might not be as fast as we think, but MongoDB sure does have a more efficient C/S protocol!


Sunday, August 26, 2012

So, what is the client/server protocol overhead then? Or: How MySQL can finally kick some MongoDB ****

If you have followed, or have tried to follow, my different attempts at getting Key-Value Store, in this case represented by MongoDB, from MySQL on a single machine with all data in RAM. I have not been very successful so far, but many smart people, way smarter than yours truly, has been giving me suggestions for things to try to get MySQL closer to the performance of MongoDB, where MongoDB did some 110 k row reads per second, whereas MySQL was at best reading some 43 k rows per second (using the HEAP / MEMORY storage engine) and 46 k row reads per second (using NDB and without CLIENT_COMPRESS). Note that not all combinations has been tested, so it would be reasonably safe to assume that using the HEAP / MEMORY storage engine and excluding the memory storage engine, would be even faster than the 43 k rows when using CLIENT_COMPRESS.

As I could see that the CPU load on mysqld was very high, and as everything is in memory and hence there is no disk I/O, then we were probably CPU bound here. So bypassing mysqld when using MySQL Cluster, directly accessing NDBAPI did get me much better performance, some 90 k row reads, but that is still behind MongoDB.

Among the suggestions of what the bottleneck in MySQL was, was Parsing and optimization. Also, the question of how big the client / server overhead was. I have already figured out that using the compress client protocol actually limited performance quite a bit here, which might not be that much of a surprise, but that it affected performance by as much as 25 - 30 % as I found was surprising.

Anyway, the easiest way to see how much the client / server overhead was, was to run with libmysqld, the MySQL Embedded Library. Using libmysqld required some changes in my benchmark program, as I wanted to make sure that data was in memory before starting the test. Having done that I then commenced to test with using InnoDB in libmysqld. And hey, now we see some very different results! 115 k row reads per second! Yes, faster than MongoDB!

The documentation for libmysqld claims that the embedded InnoDB engine is not reentrant, but I run multiple connections here (in multiple threads), and it seems to work just fine. But this might be a mistake.

If this is truly so, I'm not 100% sure about these results yet, I'd really like to see someone else testing something similar with "normal" mysqld vs. libmysqld, then we know a few things:
  • The overhead of MySQL parsing and optimization is not even close to as high as much as one might think (yes, I was also thinking that all this CPU overhead might be due to SQL parsing etc).
  • There is something in the MySQL Client / Server protocol that one should look at (note that I was testing with Unix Domain sockets, and also with the normal TCP/IP network, and sockets was faster but not much. So this is not due to some network issues on my machine, and also the same network was using for testing with MongoDB).
What I suspect is that the client / server protocol sends a bit too many messages and that this in turn causes a lot of context switches (I have not looked at this yet, but you bet I will, i.e. how many context switches do I get with MongoDB compared to MySQL).

Another conclusion is that there would be a lot to potentially gain from improving the client / server protocol here. And some of you out there might say that my testcase is a bit extreme, having all data in RAM and all that, but I am not so sure that is the case, if you are running in a cloud, getting a lot of RAM is not that hard, but getting real fast disk-I/O is, at least right now and at reasonable price, i.e. adding RAM is less expensive than adding more powerful I/O.

The next cool thing to try then: Using libmysqld with NDB, an option has has been around for a while now, but I have not seen it documented much, which is strange as libmysqld really should server NDB much better than many other storage engines (as the mysqld / libmysqld component, when using MySQL Cluster, is largely stateless).

Once again playing with libmysqld, and still thinking this is a "forgotten" technology with some interesting aspects to it.

Thursday, August 23, 2012

MySQL Cluster performance up again, or CLIENT_COMPRESS considered harmful

I'm back again (previous post in ths series is here)., with some interesting finds related to some more testing of MySQL Cluster testing (yes, I have promissed to test more things than this, but I got so entangled with NDB that I just had to give it one more shot). Looking at my benchmarking code, I realized I used the CLIENT_COMPRESS flag when I connected to MySQL. This flag was there in the code where I connected to MySQL (using mysql_real-connect(), this is a C program after all) and it was probably just pasted in from some other code somewhere. Not that this flag isn├Ąt known to me or anything, but I had not tested the compressed or non-compress MySQL client protocols much. I guess I at one time had assumed that CLIENT_COMPRESS at best helps when sending large packets between the client and the MySQL server, and above all, that for many small packets, it wasn't terribly helpful, but also not terribly harmful. Turns out I was wrong (yepp, that DOES happen).

Googling for CLIENT_COMPRESS, I didn't find much more than this either, to be honest, if you have many small packets, it's not going to be very helpful, but not very harmful either.

In this case though, it was the MySQL daemon maxing out the CPU that was the issue, so maybe I should to to run without CLIENT_COMPRESS. As stated above, Googling for this did not, at least not initially, provide much help, but as the CPU was maxed out, and compression consumes CPU power a lot, maybe we should avoid compression.

The result? 25 - 30 % more performance, just like that! MySQL Cluster with NDB is now managing some 46 k requests per second, as compared to the previous 35 k! Not a bad improvement. All in all, using MySQL Cluster using the MySQL API, as opposed to NDB, you probably want to avoid using CLIENT_COMPRESS and you are likely to make many small SQL statements with limited sizes of the result sets, and all data in memory (well, not all if you use STORAGE DISK, but that has issues of it's own), chances are that your performance bottleneck of the database side of things, will be the CPU.

But don't get too excited, as I am now going to revisit this with InnoDB also! (Yes, that is mean)


MySQL Cluster performance revisited - tcmalloc kicks some ***

My now long-running series of posts on getting max performance from a very simple MySQL Cluster setup (see details here) is continuing here. As a short intro to what I am trying out here, is to see if I can validate the claim that MySQL Cluster / NDB would be a good replacement for a Key Value Store (KVS) such as MongoDB. I test this in a rather simple single-server environment, but this is for a reason, just not a coincidence: The reason is that RAM is getting inexpensive and servers that can take a lot of RAM are also getting less expensive, which in turns means that the saying that many small servers are more cost-effective that few big ones, might not be as valid as it used to be. Also, I wanted to test what MySQL Cluster can do for me, from a KVS centric view. In short, I run on one server (16 Gb RAM, 8 cores) with all data in memory, what do I get.

I can tell you I have been trying real hard to get MySQL Cluster to work better here. Yes, SQL parsing in MySQL is to blame for a lot of why MySQL Cluster is so much slower than MongoDB. But why is MySQL Cluster slower than a fully cached (hot) InnoDB then? Admittedly, I was running the InnoDB tests with MySQL 5.6, so there you have one more thing for me to try (i.e. running InnoDB tests with the mysqld packaged with MySQL Cluster).

A lot of different MySQL Configurations was tried, as was a few other things. Lookiing at what was going on in the box revealed that most of the CPU was consumed by mysqld, so I decided to try to speed that up a bit, as the CPUs were maxed out by mysqld using up some 85 % or so. Using tcmalloc seemed like one obvious fix here, and I got some 10% better performance using that, MySQL Cluster using the SQL interface now performance about 35 k requests per second, which still is below InnoDB (with MySQL 5.6) and not even close to MongoDB, but still an improvement.

But as I said, there is still more stuff to test and I am not giving up yet. But I have given up on distributing some mysqld load to some other server. I got that working, but as the load on the test is distributed evenly across the mysqld servers, the slowest one will hold things up (that is how thing work right now, but I can and will revisit that one). Tarantool, some more NDBAPI tests and then some remains to be tested!

Take care

Free but not Gratis: A call for Open Source for everyone!

The term Open Source is not as old as you may think, and the concept actually predates the name. Initially the keyword was Free not Open, but Free is here in the sense of Freedom not in the sense "without cost", and this conflict in the English term "Free" was one of the big reasons that Free really wasn't a good word here. Which all in all doesn't mean that Free isn't still used to describe the Open Source movement, like in FSF (Free Software Foundation).

And Free as in Freedom, not Free as in "without cost", is an important distinction. What the deal was, in my interpretation at least but there are many different views here, was that the software should be available for use by anyone and for any purpose as long as they followed the rules. And the rules was there for a number of purposes, two important ones being:
  • To ensure that the software in question remained free and open.
  • To encourage the development of more free and open software.
Now, these are points I fully agree with, for a number of reasons. But I also think that that, and this is where I am not so politically correct anymore (if that is something I ever was), that there are markets for:
  • Making a living on Open Source software. And no one said this shouldn't be possible, but here I am thinking beyond master programmers like Linus, RMS and Ken Thompson, I'm thinking about your everyday coder, building applications for a customer, providing support or packaging software. Yes, there are many good developers out there without the ability to be master programmers like Linus or Alan Cox, but which are still needed and has competences beyond those guys, like Domain Knowledge.
  • Proprietary software. Yes, I am aware that this is something you probably didn't want to hear, but I do think that proprietary software will still be around and is still needed in many places. Let me tell you why I think so,and the main reason is that there are areas in Software that just isn't viable as Open Source. Like small specialized areas where it's not possible for someone to create good Open Source Software. This is not to say that specialized software cannot be Open Source in meaning that the source is open, but not Open Source in the meaning that there is a community development effort, with people around the globe finding and fixing issues adding features.
 Yes, I do believe that a developer who has significant domain knowledge of something, say car sales, can bring significant contributions to an Open Source project, even though that developer might not be more than an average programmer. And if the project in question is an Open Source system for, say, car rental, then his input might well be more important than any input from, say, Linus Torvalds.

Another thing to realize is that the higher up the ladder of software you go, from operating systems and device drivers, up through databases and middleware, past generic office applications and to domain specific applications adopted for a very specific purpose, the more non-IT specific knowledge is needed. To create an Operating System, like Linux, you need to be really good at, well this is a surprise I guess, but I do believe that being an Ace on Operating Systems is a good thing. And Operating Systems is something you can learn at University and the theories and operations of an OS are generic IT stuff. To a large extent, the same goes for databases. But for a system for, say, car rental, that is something you do not learn at University. And here are some issues with the Open Source movement, if you ask me:
  • The domain of Open Source is IT, i.e. computers, software etc, and the users of it are also presumed to be IT folks (which doesn't mean that Open Source isn't used by everywhere, it's just that Open Source it isn't a conscious choice by your average user at Joe Bloggs Car and Trunck Rental).
  • Open Source doesn't in and of itself bring a lot of advantages to those guys at Joe Bloggs Car and Truck Rental. Which is not to say that Open Source doesn't push prices down or isn't the most cost effective solution, but the Open nature of, say, Linux, means little beyond "lower price" to them,
That Linux is not as strong on the Desktop as on the Server side I think is largely based on this. It's not that the end users don't appreciate Linux and Openness, it's just that it's not easy to see that advantages in the case of an Operating System or even in the case of, say, Open Office, beyond that fact that the cost is less. But really, did want Freedom just to be cheap? There has to be more to it?

I think there is a market for much more domain specific Open Source software. Really, I do! But I think we (us IT folks) must start to be much more open towards things like end user needs and application specifics. In many cases, the way something domain specific works has been determined by some ancient IT-system built ages ago, so that the effective business logic is defined by how this old IT system works, not the other way around (flight booking anyone). I also firmly beleive that there is a big market for something like that out there, this is an area where the non-as-smart-as-RMS developers (like myself) can contribute, where we have some domain specific knowledge that can be turned into a useful and powerful application. But the Open Source movement needs to find a way to accommodate that, to understand that just because someone is not a good developer for writing SCSI device drivers, might not mean that he or she doesn't posses valuable knowledge, and to support those domain specific effort. Yes, to write an OS or a Database System, you can surely do that based on knowledge you picked up at University, but that is not the case for many, if not most, end user applications. Hey, maybe we can learn something interesting domain specific while contributing something IT specific to a project.

Feeling Free and Open

Tuesday, August 14, 2012

MySQL Cluster / NDB as a key value store - Better results this time

Following up on my series of "how much performance do I get when I access RAM only", the most recent post being this, and as you can see there, MySQL didn't really perform that well. And MySQL Cluster in particular wasn't very fast, fact is, it was slower than InnoDB and almost any other storage engine, with the exception of MyISAM then.

Well, I finally had some bandwidth to finish testing my benchmark with NDBAPI. This took some time, as i also had some other MySQL Cluster tests to do (multiple mysqld, cramming everything into one ndbmtd etc), but finally I had it done. And this time, things worked better. Using MySQL NDBAPI I managed to get about 90.000 single row reads per second using 100 threads in a simple 105.000.00 table with 3 columns and a BIGINT PRIMARY KEY, compared to about 32k single row reads when using MySQL Cluster using the SQL interface. MySQL with InnoDB got some 39k rows per second.

Still, MySQL Cluster using NDBAPI still doesn't beat MongoDB (which got about 110k single rows reads per second), but still close enough to be useful. I still have some tuning to do here, so I might get better results in a few days. From this, one might jump to the conclusion that it's mysqld that is the bottleneck here, and that is probably correct, but then you would expect two mysqlds to help, but no. On the other hand, the explanation to this might be that I am maxing out the CPU (all my cores are at 100%). To test this, I'll put a mysqld daemon on another box. Before I go on, the reason I do this is to see if I can find the bottleneck, using a second machine would be unfair to, in particular, MongoDB.

My first idea was to run MySQL Cluster on my desktop Windows box, as this is one of the more powerful boxes I have, and I imagined that the MySQL Cluster for Windows had improved since I last tried it and complained, but that turned out not to be the case. The msi installer for MySQL Cluster doesn't work at all it seems (Bug #66386 and Bug #66387). To be honest, it seems like this installer really has never worked, if I was Oracle (which I am not), I'd just take it away or fix it, as it stands, it is just an annoyance and a waste of time. (And don't tell me "No one use MySQL Cluster on Windows". Yes, that might well be true, but that doesn't mean it shouldn't be fixed if Windows is not supposed to be a supported platform. If you drive that reasoning further, then you might also say that there is no market for MySQL Cluster 7.2.8 as no one uses it).

This means that I have two options:
- Persist in getting MySQL Cluster 7.2.7 to Work on Windows.
- Start up my old Linux desktop box.

I'd like to get Cluster running on Windows again, and write a blog post on how to get it to work (and possibly even create an installer myself). On the other hand, although the Linux box isn't that how, it should be warm enough (it's a 4-Core ancient AMD box, loosely based on scrapped Mobo from a box at the old MySQL Uppsala office). So which one to do? We'll see, but possibly both, my 4-core Linux box should be running anyway (I was shut down when I moved my office a few weeks back) and MySQL Cluster really should work on my Windows box, if for no other reason so to say that "I run MySQL Cluster on Windows" so you cannot use that as a reason not to fix the obvious MySQL Cluster Installer issues.

And then I have Handlersocket, Tarantool and some other things to try. If you haven't realized it by now, I am enjoying this. And I haven't really seem many tests like this: "I have one box with lots of memory so I can fit my whole database in RAM. Now, which one is fastest?".


Thursday, August 9, 2012

MySQL as a Key Value Store (KVS) - Why, some background and some more tests

You might have seen my attempts at testing MySQL as a Key Value Store (KVS) (here and here), i.e. used like MongoDB, and I have had some comments to those. Above all, many wanted to know the details of my test environment. To be honest, this started off as a very simple thing, but it turned out a bigger than I thought, so I guess its reasonable to give some background. I apologize for not giving you this background before, but as I said, my initial intention was to just run bog standard MongoDB versus MySQL and see what happened.


Here at Recorded Future, when I joined about 2 years ago, we used a lot of MySQL, and it was causing us serious problems. We had a pretty standard school-book style relational schema, and performance was really hurting us bad. Because of the amount and type of data we manage, because we run on Amazon and because we are still developing and changing, this schema didn't fit our needs at all.
Read - scalability was not a good option either, as we are massaging so much data that we had big issues with just having the Slaves keep up with the master. Schema changes, as we developed our application, was also difficult.

What we ended up doing was a redesign of a lot of things, ending up with something that could use a KVS, as these are easier to scale, using sharding. This had some negative implication too, like it was no longer possible to go into MySQL and issue a query for data just like that. And no, using the mongo shell for this, as not only were we now using MongoDB, we had data spread over several MongoDB stores. On the good side though, things was not much more performant and easier to scale, write scalability with MongoDB is good and easy to use and configure. Which is not to say that MongoDB solves all problems in the world.

So, having implemented MongoDB and having gotten used to it and even starting to like it a fair bit, I also started seeing some other things. Like, Why can't MySQL be used for this? I mean MySQL is a bit more mature than MongoDB and should be better at handling, say disk-IO. On the other hand, MongoDB is newer. One thing I already knew was that MongoDB really needs to be in-memory to a large extent, when it hits the disk, performance goes downhill fast, so here MySQL is better. But when we DO have enough RAM (and as we use several MongoDB datastores for different purposes, and we are on Amazon, where you can get respectable amounts of RAM, this is usually the case) what happens? If I let MySQL use RAM it can get so I don't get any disk-IO for reads at all, will it be as fast as MongoDB? Or maybe even faster.
And among the MySQL Storage Engines, the one mostly targeted as a Key Value Store is NDB? How's that going to work.


Test environment

So, to test all this, but to make sure I wasn't affected by Amazons ups-and-downs, I pulled one of our MongoDB datastores, some 105.000.000 rows of real world data. Initially, my plan was to use the full MongoDB datastore, but I had to exclude some fields as these were VARCHAR and putting them on Disk with NDB was causing just too much disk-I/O, as NDB stores disk data as fixed length (so a UTF-8 VARCHAR(256) field occupies some 768 bytes). I ended up with a table schema like this:
CREATE TABLE `keyvalue` (
  `id` bigint(20) NOT NULL,
  `value1` int(11) DEFAULT NULL,
  `value2` double DEFAULT NULL,
  PRIMARY KEY (`id`)
In the case of MongoDB, the id column was used as _id, which is MongoDBs way of saying "PRIMARY KEY".

 Having done all that, I need some tools for my test bench:
  • Homebrew server with an AMD FX-8120 8 core CPU and 16 Gb RAM on an ASUS M5A88-V Mobo. This is a neat server MoBo really, supporting SATA 6, USB 3.0 and other niceties but I use a Lite-On LNE100TX network card instead of the built-in Realtek crap that many MoBos, inparticular AMD ones, seems to be equipped with.
  • A bunch of disks, the disks where database resides is a LVM Stripe of two Seagate Barracuda 7200.11 500 Gb disks using xfs. But this is pretty insignificant as there should be no disk-I/O at all, or at least extremely limited, when testing.
  • Ubuntu 10.10 (I have some reasons for this a bit ancient version. And yes, I have patched that 16 Gb memory issue in Linux with some AMD MoBos).
  • MySQL 5.6.5 64-bit (for all MySQL tests except NDB)
  • MySQL Cluster 7.2.6 64-bit (for MySQL NDB tests) 
  • MongoDB 2.0.5 64-bit
Having all that ready I now had a slightly smaller table taking up some 4.2 Gb when exported as JSON. So this data has been imported into MongoDB and MySQL with a number of different storage engines, and I was ready to do some testing. I use a test program that kicks of SQL queries with over a specific T of threads, each SQL (with Mongo this is not SQL of course, but the test program is the same) containing a variable part, which is the ID for the row to look up, The program starts by reading a file with a number of random IDs, the IDs are spread over the threads and then each thread looks up the rows N times.

In this case I picked some numbers for my test, so the number of threads I use is 100, not an unreasonable number, but big enough to make any concurrency issues visible and the number of rounds is 10. So to be clear, the 1.000.000 IDs are distributed over the threads, so each thread processes 10.000 IDs each, running then 10 times, which means I do some 10.000.000 random reads in parallel. Not the most exciting benchmark on the planet, I know, but it was all the same interesting and also to some extent a real-world test.

Before each test run, I did a few initial runs to ensure that the cached were warmed up. Also, I did look at vmstat and stuff to ensure no disk-I/O was taking place.


Current state of tests

I have so far tested MongoDB, MySQL Cluster and MySQL InnoDB and since last I have also tested the MySQL MEMORY engine and MySQL with MyISAM. So far the results are like this:
  • MongoDB - 110.000 rows read per second
  • MySQL Cluster - 32.000 rows read per second
  • MySQL with InnoDB - 39.000 rows read per second
  • MySQL with MEMORY / HEAP - 43.000 rows read per second
  • MySQL with MyISAM - 28.000 rows read per second
The last two ones were very disappointing to me, and trust me, I want them to be good. I really wanted MySQL to beat MongoDB, and I really wanted to see how the much lower overhead of MyISAM would make it run circles around InnoDB, but no.

As MyISAM only caches the keys itself, not the actual data, this was an issue, but still, the MyISAM data file should be well cached by Linux for me, and rightly so, there was no disk-I/O. Could context switching when going to Linux for data be an issue? I decided to try something fun. I created a new unique index on the table, covering all columns, which meant that if I used this covering index, all data would be fulfilled by the index (but the index in itself would be bigger of course). This didn't help at all, quite the opposite. Playing with named keycaches and stuff didn't help at all, nor did using mmap. Which is not to say that these guys do not help to improve performance, all this means is that when all data is already im memory, they don't seem to help. I did figure out one thing that did work though, which was using tcmalloc. Enabling tcmalloc when starting the mysql server, which is easy these days as MySQL comes with the tcmalloc library included and mysqld_safe has an command-line option for it. Real easy, this really should be the default.

With this in place, I managed to get:
  • MySQL with MyISAM - 37.000 rows read per second
I'll try using tcmalloc with the other engines also, and see what happens.


In conclusion so far

Don't take me wrong here,  I'm not out to prove that MySQL or MongoDB sucks. Nor am I out to do some exact benchmarking. What I am out to do is to see if the claim that an RDBMS can perform as well as a standard Key Value Store in an environment optimized for a Key Value Store (i.e. all data in memory). And such an environment isn't that specialized actually. databases around a few 10's of gigabytes are really common and can easily fit on memory. And although I do prefer the more strict datamodel of an RDBMS to the pretty schema-free design of a NoSQL alternative such as MongoDB (MongoDB really has some reliance on schema, but in essence, it's schema-free), I have to admit that sometimes you need more performance, and distributing the load over multiple servers is a lot easier when you don't have to worry about global transactions or global consistency, and frankly, in many cases you can live without that, to an extent,

What remains to be tried is MySQL with NDB using multiple mysqld servers, InnoDB using tcmalloc and tarantool and the MySQL HANDLER interface (actually, I have already tested the latter, but I am not sure I got the tests right).

Who apologizes for the omission of the test setup before publishing the first results

Tuesday, August 7, 2012

Sales Wolfs vs Techies? Is that really an issue?

Are you to be aware of the Sales Wolf really? Is this something to be scared of? Sean Hull writes a bit about this in a recent post but I think he got a few things wrong, although I also agree with a lot of stuff in there. Above all, I think Sean oversimplifies what Sales are up to.

Before I go on though, let me tell you that I have worked at Oracle, not only as part of the MySQL -> Sun -> Oracle transition but before that, actually my first job outside Telco and my first job for a US company was for Oracle. And I learnt so much stuff there, it was an incredible time. And not only was Oracle good at taking care of me and letting me go to classes and conferences and stuff, they actually helped me find those things and pointed me there. I went from a C programmer with some interest in databases to a decent support engineer and then to performance consultant and eventually a Pre-Sales Engineer, in a few hectic years. But I was younger then.

So, back to the real issue, is Sales just about getting the customer locked in and getting as much money as fast as possible from the unsuspecting management person, in the process bypassing the far smarter DBAs? Nope, that's not how I looked at my career as a Sales Engineer, I do not think I ever ended up selling something, together my my sales guys then, that I cannot stand up for today. Sales is so much more, just think about it, as there is competition out there.

Does your Oracle Sales Team really want to get unfriendly with the techies at the customer site? Yes, you often sell to someone in management, but just as often to a techie or to someone in management who is a techie. I there are very few Sales Guys out there who are both successful and is selling stuff the customer doesn't want or need.

Is the techie Ops and DBA guys really the experts? They know what you have alright, that is mostly correct, but they do not always know what the options are and what the advantages and disadvantages are. And again, this is natural: If you are a seasoned Oracle DBA, having taken all the classes in Oracle and knowing your TNSNAMES.ORA and INIT.ORA and other bits of Oracle inside and out, why would you want MySQL? MySQL is something you don't know? Why would you want something like that? I'm not saying that Oracle DBSs are bigots, quite the contrary, all I am saying is that we ALL have reasons beyond pure facts to react to things. It's not like a DBA or a sysadmin is necessarily more unbiased than someone in management, just because they know the existing technology better.

For me as a Sales Engineer, I think selling is about trust and about positioning. Can you trust me when I say Oracle might be a better choice for your database needs? Or that the answer might be MySQL? Or PostgreSQL? I'm really a curious person, and I really love to know what people are doing with technology, I am mostly not that interested in technology for the sake of technology itself. I'm not an Open Source taliban either, despite having worked at MYSQL for 6+ years (before that for close to 20 years at commercial closed source database companies, mostly). I like Open Source stuff mostly because it's a brilliant means of developing software. And yes, I have said no numerous times, and I have my sales guys with me (mostly)!

Look at a car salesguy. Does he want to sell you the most expensive car he has, then he is probably not a good salesguy in the long run, as you, as a customer, will eventually figure out that this car really didn't fit you that well, and next time around, you want to buy a car from someone else.

Having said all this, I agree with many other points in the post in question. Yes, make a list of things, if you can, but remember that the company trying to sell to you might have something you can benefit from, but which requires some effort or change on your side to change. Do not just just look at what you know and have experience in, but on what you can learn. If the technology isn't used in your industry, is that a good thing or a bad thing? If it's not used by your competitors, are there advantages to the technology that your can use to your advantage versus your competitors?


Some corrections and additions to my simple KVS tests.

This is the first follow-up to my post on a simple test of KVS alternatives. To recap, I tested a simple single table schema in MySQL using the NDB and InnoDB storage engines. To have a Key-Value store to compare with, I did the same test in MongoDB. All tests were done of the same system, an 8-core AMD Linux box with 16 Gb RAM. The tests consisted of reading 1.000.000 rows, out of the total 105.000.000 in the table, distributed over 100 threads 10 times, a total of 10.000.000 rows read then. The test program I use makes sure that the same random ID's of the table are reused each time and the same are used for all servers.

Now, firstly, after some checking I realized that I had not fully cached the InnoDB engine, so it was doing a certain, small, amount of disk I/O still. I fixed this and the number now looks like this:
  • MongoDB - 110.000 rows read per second.
  • InnoDB - 39.000 rows read per second.
  • NDB - 32.000 rows read per second.
So now InnoDB is faster than NDB. But NDB has some tricks up it's sleeve, like running with multiple mysqld servers, and I have today finished my test-program to support just this. Also, I have had some useful hints of NDB engine tuning, so I'll try that one too, but testing NDB takes more time as restarting NDB is much, much slower than MongoDB or MySQL with InnoDB.

But I have another test result today! I realized that although I no big fan of the query cache, I should be able to use that here too. And I don't need that a big a cache, as I am only reading some 1.000.000 rows. So off I went, turned on and tuned in, without dropping out, the query cache and ran my test again. I soon realized one thing: Warming up the query cache was going to be real slow. But warming up MongoDB is just as slow, MongoDB really is targeted for as much as possible in RAM, the disk I/O the do is hardly optimized (they use mmap, so what can you expect). Once the query cache was nice and warm, I ran my benchmark (this was using the mysqld with InnoDB, which matters less as all reads are now done in the query cache). And what I got was about 34.000 rows read per second. This is not a 100% fair comparison of course, as the query cache doesn't need to cache that much (only 1.000.000 queries), but really, it should have been faster than caching in InnoDB, I was a bit disappointed with this and I'll see if I can find the bottleneck somewhere in the code.

But I'm not finished yet. The MEMORY engine and NDB with a few more mysqld servers remains to be tested, as well as Tarantool, the MySQL HANDLER interface and NDBAPI eventually. Not necessarily in that order.

And before closing, if you are wondering, the test program is written in plain C, no C++ or Java or anything or fancy stuff like that. Also, the test program uses proper multi-threading, I do not have multiple processes running around here,


Monday, August 6, 2012

Finally, the simple comparison of KVSs and some rants about performance

Faster and faster, that is what we want from our databases. And the biggest roadblock for the MySQL Dragster is the speed of the hard disk, right? No, I'm not going to debate that, that is just the case. And how do you fix that then? Well, if what is limiting your dragster is a roadblock, then either you drive round the roadblock of you make it disappear faster, or in computer terms:
  • Avoid using the disk by instead putting as much data as you can in RAM
  • Use faster disks (like SSDs)
Now, to be honest, this analogy isn't that good, as the performance limiting factor of the disk so huge, and contrary to popular belief, it's not getting any better! But we have SSDs you say? Yes, that makes the hard drive faster, but the CPU and RAM is getting faster even faster! But let's assume that we have enough memory so we do not need the disk? Will just about everything go at the speed of light? Nope, what happens here is that stuff that wasn't even noticeable in terms of limiting performance when the disk was there, as disk-I/O is such a huge bottleneck, suddenly show it's dirty face!

Like this:  As the CPU cores are getting faster, but not that much faster, anymore, due to physical limitations, we have more and more of these CPU cores instead. And suddenly, any limitation in getting those CPUs to work well together suddenly turns into a major headache! Like a mutex shared by all threads. Like the Query Cache mutex in MySQL for example!

With this in mind I am now ready to commence with the benchmarks that I wrote about in may. Yes, it took some time to get the data loaded into MySQL and in the process I managed to create a new Open Source project for exporting and importing JSON data from and to MySQL With that in place I now have something that is sort-of real world data. I had to remove a few columns (or Attributes of you are a JSON taliban) to make MySQL Cluster work with this data, this because MySQL Cluster stores VARCHAR data as fixed length data on disk which means a few things:
  • There is a lot more stuff to write to disk.
  • UTF-8 means that there is 3 times more data to write!
All this means that MySQL Cluster may well work in well as an alternative to some key-valuestore setups, but not all, and it depends what "value" means here. If "value" means "document" or "object", then we need to use a VARCHAR or something like that for the value, which will be real limiting in the case of MySQL Cluster. And I'll try to be real nice to MySQL Cluster here, so I end up with a really simple schema:
CREATE TABLE `keyvalue` (
  `id` bigint(20) NOT NULL,
  `value1` int(11) DEFAULT NULL,
  `value2` double DEFAULT NULL,
  PRIMARY KEY (`id`)
And in this table I load some 105.000.000 rows. Should be simple with MySQL Cluster, right? Except that MySQL Cluster will only accommodate 512 Mb of hash data per partition (this is a really, really silly limitation! Is 512 Mb the MySQL Cluster version of "640 K should be enough for anyone?"). But this fixable, and with 4 partitions, this works just fine.

As a side note, without disk data, MySQL Cluster feels MUCH more stable. The occasional data loss and other weirdness that I experienced what I tried to load the table with VARCHAR data are now completely gone. So not only does disk data limit you on in terms of data types (VARCHAR), it also seems to need some more development. And no, I wasn't in the mood of reproducing the errors I got.

Anyway, on my server here at home, having an 8-Core AMD CPU and 16 Gb RAM, waiting to run this benchmark. I'm testing MySQL with InnoDB, MySQL Cluster and MongoDB. The test program is the same in all cases, I read 1.000.000 rows 10 times distributed over 100 threads. To be fair to all, I made sure that what data I had would fit in memory and that it WAS in memory, so I did a couple of warmup runs first. In the case of NDB I used the MySQL API, not NDBAPI (I will try this eventually). The results I got was this:
  • MongoDB - 110.000 reads per second
  • MySQL with InnoDB - 30.000 reads per second
  • MySQL with NDB - 32.000 reads per second
In the case of NDB, I had these setting, beyond the standard stuff:
And that makes one helluva difference, I can tell you that! Why this isn't by default I don not really know, I assume there is a good reason for it, but someone has to tell me what it is. Now, I did load data also, and the results there were similar, but as I was loading JSON and that is pretty native to MongoDB, that was expected, MongoDB was about 2.5 times faster than NDB / InnoDB, which were on par with each other. I will not give any exact numbers here as loading data depends on so much more in terms of tuning.

This is not the end of this story though, if we assume, as MySQL was way behind MongoDB in terms of performance, but InnoDB and NDB were on par with eachother, one might at least want to try the theory that it is the MySQL part that is slowing things down, and this can be tested by running MySQL / NDB with more than one mysqld, and that is the next thing to try. Then we have the HANDLER interface and proper NDBAPI also, the latter should be a lot faster, really. And yes, I really should see what MyISAM can do for me. And MariaDB.

And before I end, please MySQL Cluster developers, remove that silly 512 Mb HASH index size per partition limitation (see the manual here). As RAM get's less and less expensive, and if we, for example, wanted to avoid Network and Disk I/O as much as possible, and instead use RAM and CPU (like in an Amazon environment), then this turns into a rather sreious limitation. My tests above was on hard iron though.

Who will do some more benchmarking tomorrow.