I tried this tool with an InnoDB table, and that worked great, importing 100.000.000+ records at about 10k records / s, all CPUs going at full speed. As JSON is native to MongoDB, I assumed it would be faster and consume less resources there, and that was true, 22k records / s were imported into MongoDB, but I still think my tool worked well.
Now, having done all that, and having all the necessary tools ready, it was to to set up MySQL Cluster on this box. I have 8 cords and 16 Gb RAM on it, so a 20 Gb table should be too bad, at least not if I use Cluster disk storage. But it was some time ago since I used MySQL Cluster, so after downloading Cluster 7.2.6 I decided to follow the manual. After all, I was running MongoDB in a pretty standard off-the shelf config, and that worked fine, as did MySQL Community edition, so MySQL 7.2.6 shouldn't really have much issues here. Right?
Well, I was wrong. Or rather, the docs were wrong. And still the defaults for many of the MySQL Cluster parameters seems to indicate that this is meant to run on an Andriod based Cellphone rather than a mid-range home-brew server. All this not considering the incorrect syntax used in many example.
After downloading MySQL Cluster 7.2.6. I followed the manual for installing the thing, and even at that state I fealt that I was in for a bumpy ride here. MySQL 5.5 Section 17.2 MySQL Cluster installation says. "This section describes the basics for planning, installing, configuring, and running a MySQL Cluster.". Fact is, it does not, rather it gives and overview of some of the architectural aspects and some hardware requirements for MySQL Cluster, and covers nothing on planning or installing MySQL Cluster. This is not to say this section is bad, it's just that it's not about what it say it should be about.
OK, I'll live with that, lets move on. I downloaded a tar-file, and I know how to deal with that, but just to figure out if there are any specifics, I have a look in the docs, as this is Ubuntu Linux I look in 126.96.36.199. Installing a MySQL Cluster Binary Release on Linux. This is some of the things mentioned on here that I take to heart. Not. If I did, stuff would work. Like this little command:
tar -C /usr/local -xzvf mysql-cluster-gpl-7.1.23-linux-i686-glibc23.tar.gz
Why the manual wants me to unpack a MySQL 7.1 tar file is beyond me, I don't think that would end up being a 7.2 Cluster. The Management nodes are set up like this, according to this document:
cp bin/ndb_mgm* /usr/local/bin
Again, I can live with it, but I don't much like it, as someone else, who actually CAN benefit from MySQL Cluster would be put off due to issues like that. But as for myself, I move on to section 17.2.3. Initial Configuration of MySQL Cluster. Here, the config.ini settings are again incorrect, among the issues I found was that the portnumber setting (for the TCP DEFAULT section) is now obsolete, and for the management node the configdir setting, which is required, is missing.
Having fixed all this and possibly a few more issues that I have forgotten about, it was time to create some tables. As I was planning to store data on disk, I had a few more steps before I could create the table to load data into, I had to create a LOGFILE GROUP and a TABLESPACE for my disk data, where there is a working example in section 188.8.131.52. MySQL Cluster Disk Data Objects. This section has a sample configuration section where a disk based is set up. OK, fine. Time for setting up a LOGFILE GROUP:
ADD UNDOFILE 'undo_1.log'
I started loading data in MySQL Cluster. MyJSON loader failed, running it, it seemed to hang MySQL Cluster, I have yet to determine what the issue is, but I'll get to that real soon. Instead, I decidedd to load data into MySQL Cluster using the native MySQL method, by first mysqlexport data from my INNODB table, and then doing a mysql command line import (without the CREATE TABLE statement in the export file of course). This is running now, and it is dead slow. I'll have to figure out why. But the DataMemory and IndexMemory have been set to 2Gb and 1 Gb respectively, which isn't much but should be enough. Also, I have configured MySQL Cluster without replication to speed things up. I will eventually get this data loaded, but this method is way too slow, so I have to think up something else. Now I get a "table is full" after som 2 million rows (I have 100 million rows to load). Memory usage isn't high, and the data files on disk should be big enough (32 G) so I need to investigate what is going wrong. One thing is clear though, getting MySQL Cluster going is WAY more complicated than either MySQL or MongoDB. And note I have done it before, although it was some time ago since I did it last.
Hence the story of this benchmark continues in another blogpost.