Karlsson on databases and stuff

Wednesday, April 24, 2013

In Santa Clara now. 2 talks coming up

I'm in Santa Clara for Percona Live now, and things are looking good! The announcement of the merger of Monty Program and SkySQL is a good one! If you are around, I'll be speaking on MySQL on AWS on Thursday at 1:50 PM in Ballroom F. On Friday, at the SkySQL Solutions Day (if you are at Percona Live and don't know about this, registration is free! Come see us, the program is here: http://www.skysql.com/content/mysql-cloud-database-solutions-day-schedule), I'll be talking about MySQL and MariaDB with JSON at 11:15 AM in Grand Ballroom B!

I'll also be releasing a new version of my MySQL JSON tools real soon!

/Karlsson

Friday, March 29, 2013

See you at the UC in April!

I'll be speaking at the MySQL Conference and Expo on April 22-25 in Santa Clara. On April 25 at 1:50 PM I'll be talking about using MySQL in the Amazon AWS cloud, but to be honest, I haven't done much formal preparation. I will prepare some slides, but the fact is that most of this session will be practical, hands-on stuff. Largely, I'll show things that I used when I was Database Architect and Admin for a reasonably large AWS installation.

Anyone telling you that Amazon AWS is just like any other environment, except that disk I/O is slower, doesn't get it; there is much more to AWS than that. By using the services that come as part of AWS, there are loads of things you as a DBA or DevOps engineer can do to simplify and automate everyday tasks. Backups, slave provisioning and availability are all areas that can make really good use of AWS. So, armed with an AWS account and some MySQL instances running there, I'll be showing you some real-world examples.

Also, you may ask why I haven't been blogging much recently. If this worries you, I think you should get a day job. Jokes aside, I was testing MySQL replication in real life, but it sort of failed on me: I ended up with twins alright, but one boy and one girl. These two have taken a lot of my time recently, and the joys of blogging and writing code in my spare time have been replaced by the joy of changing diapers on these little babies.

I will do some more blogging now, though; I have promised myself to do that, but I have loads of other things to do as well.

/Karlsson

Monday, January 21, 2013

Talking at the SkySQL Roadshow in Stockholm

SkySQL Roadshow is coming to Stockholm on Feb 7, come by and meet us. I'll be ending the day with a talk on Big Data, which will be a more generic Big Data talk with some MySQL relevance, but with the focus on Big Data in general.

I haven't been blogging much recently, but there are reasons for that. Since Dec 1, I am the proud father of twins, a little boy and a little girl. I have yet to teach them to write proper SQL; they have particular issues with subqueries, but we'll get there. To create the usual mess and make sure things are on the brink of running out of control, we decided to renovate our flat in the middle of all this. But I'll get there, and once we have a new kitchen installed, I'll do some more blogging; I have some things piled up to write about.

/Karlsson

Friday, January 4, 2013

MySQL JSON import / export tools updated

A user of mysqljsonimport, Josh Baird, reminded me of a feature which I should have added from the start, but which was forgotten. The deal is that when you put a bunch of JSON objects in a file, you have a couple of options on how to do this.

The most obvious is maybe to export as a JSON array of objects, like this:
[
{"id":1, "name": "Geraint Watkins"},
{"id":2, "name": "Kim Wilson"}
]
But this is not what mysqljsonimport supported, and this is not how, say, MongoDB exports JSON by default. The reason is that for large amounts of data this is cumbersome, as what is in the file is actually one big JSON array containing all the data. This is difficult to parse: it requires that a lot of data is read and that the whole array is kept in memory, unless some clever processing is done. And even if we are clever, this is still not efficient. Rather, what mysqljsonimport supported, and how MongoDB exports to JSON, is multiple objects without separators, i.e. you read an object, process it, and then skip any optional whitespace until you reach the next object, like this:

{"id":1, "name": "Geraint Watkins"}
{"id":2, "name": "Kim Wilson"}

The latter is more efficient, but the former is also commonly used. So mysqljsonimport now supports both formats, and mysqljsonexport can optionally export a single JSON array of objects to a file.
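The difference between the two formats is easy to see in code. Below is a minimal Python sketch, not mysqljsonimport's actual implementation (the function names iter_json_objects and dump_json_objects are hypothetical). It peeks at the first non-blank character: if it is a [, the whole text is parsed as one array; otherwise it decodes one object at a time with the standard library's json.JSONDecoder.raw_decode, skipping whitespace between objects. For brevity it works on a string already in memory; a real importer would read the file incrementally, which is exactly where the concatenated format pays off.

```python
import json
from typing import Iterable, Iterator


def iter_json_objects(text: str) -> Iterator[dict]:
    """Yield JSON objects from either a single JSON array of objects or a
    sequence of concatenated objects separated only by optional whitespace."""
    decoder = json.JSONDecoder()
    length = len(text)

    def skip_ws(i: int) -> int:
        # Advance past JSON whitespace (blank, tab, CR, LF) between objects.
        while i < length and text[i] in " \t\r\n":
            i += 1
        return i

    pos = skip_ws(0)
    if pos < length and text[pos] == "[":
        # Array format: the whole file is one JSON value and must be parsed
        # (and held in memory) in one go before the first object is available.
        yield from json.loads(text)
        return

    # Concatenated format: decode one object, skip whitespace, repeat.
    while pos < length:
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
        pos = skip_ws(pos)


def dump_json_objects(rows: Iterable[dict], as_array: bool = False) -> str:
    """Serialize rows either as one object per line (the default) or as a
    single JSON array of objects."""
    lines = [json.dumps(row) for row in rows]
    if as_array:
        return "[\n" + ",\n".join(lines) + "\n]\n"
    return "".join(line + "\n" for line in lines)
```

Feeding either of the two example files above to iter_json_objects yields the same two objects. The array branch has to parse everything before returning the first object, while the concatenated branch hands back each object as soon as it is decoded.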

Download the most recent versions from SourceForge: mysqljsonimport 1.5 and mysqljsonexport 1.2

Cheers
/Karlsson

Wednesday, January 2, 2013

Amazon AWS for MySQL folks - Speaking at Percona Live 2013

I'll be speaking at the Percona Live Conference and Expo in Santa Clara (April 22-25, 2013), and this time I'll do a different talk from what I usually do. The plan is to be low-level, down-and-dirty practical: showing the Amazon AWS API, writing scripts that use it, and showing how to use them together with MySQL. I have said it before and I'll say it again: to get the most from your cloud, you have to understand and use the unique features of the cloud environment you use.

Can you create an elastic MySQL setup on Amazon? What about HA? How can you add slaves seamlessly? And automatically? I'll try to cover and show as much of this as possible, but the presentation is far from ready, so I am happy to accept suggestions on specifics to cover. See some more details on my talk here.

Hope to see you in Santa Clara in April!

/Karlsson

Friday, December 21, 2012

Galera features beyond just HA

Galera from Codership has been getting a lot of attention recently. Galera provides a nice High Availability solution for MySQL, offering synchronous replication with conflict detection while using the classic InnoDB storage engine. No more playing about with special storage engines or DRBD failover: just continue to use InnoDB and add Galera as the secret sauce for High Availability.

Some of the neat features of Galera include multi-master replication, a lightweight implementation of replication, and zero failover time thanks to the multi-master ability. This is not a complete HA solution though, just a component of one: we still need to add some monitoring and failover mechanisms, but as Galera is multi-master this is greatly simplified and can in many cases be handled by the driver or the application with little overhead.

Now, replication in Galera is synchronous, so that should slow things down a bit, right? Well, yes, but on the other hand Galera can use multiple threads to apply data on the slave, which should compensate for that somewhat. And how does it compare to MySQL semi-synchronous replication, which on paper shouldn't be that much different?

So I was curious about the multi-threaded apply on the slave that Galera supports. Could this be the multi-threaded apply that MySQL has been waiting for all this time? (No, the per-schema parallel implementation in MySQL 5.6 doesn't count in my mind.) So I set out to try this, and this was my thinking:

The parallel nature of this should be best exposed when you have many small transactions, so each INSERT is a single-row, autocommit transaction.
For the sake of the test, remove as much InnoDB overhead as possible and run on a RAM disk (tmpfs).
The schema should be simple
Simple INSERTs are to be tested, nothing else
Multiple INSERT threads.
Multi-master operation, but no conflicts.

This is admittedly a simple test case, but it should tell us something. The schema looks like this:
CREATE TABLE `tab1` (
`c1` int(11) NOT NULL,
`c2` char(100) DEFAULT NULL,
PRIMARY KEY (`c1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
And the data to be inserted is:

column c1 - Unique sequential integer.
column c2 - A random string of 5 to 100 characters.
1,000,000 rows are inserted using 400 threads (200 on each MySQL server).
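The load described above can be sketched in Python. This is a hypothetical harness, not the program used for the numbers below: it only generates the rows and statements (with %s placeholders, as used by DB-API drivers such as MySQL Connector/Python) and leaves connecting and executing with autocommit to you. The point it illustrates is one way to arrange "multi-master, but no conflicts": each of the 400 threads gets its own contiguous slice of c1 values, so no two threads, and hence no two servers, ever insert the same primary key. All function names are made up for illustration.

```python
import random
import string
from typing import Iterator, Tuple

TOTAL_ROWS = 1_000_000
SERVERS = 2
THREADS_PER_SERVER = 200


def random_c2(rng: random.Random) -> str:
    """A random string of 5 to 100 letters, as described for column c2."""
    n = rng.randint(5, 100)  # randint is inclusive at both ends
    return "".join(rng.choice(string.ascii_letters) for _ in range(n))


def thread_key_range(thread_no: int, total_threads: int, total_rows: int) -> range:
    """Give each thread its own contiguous, non-overlapping slice of c1
    values (starting at 1); leftover rows go to the lowest thread numbers."""
    per_thread, extra = divmod(total_rows, total_threads)
    start = thread_no * per_thread + min(thread_no, extra)
    size = per_thread + (1 if thread_no < extra else 0)
    return range(start + 1, start + 1 + size)


def server_for_thread(thread_no: int) -> int:
    """Threads 0-199 talk to server 0, threads 200-399 to server 1."""
    return thread_no // THREADS_PER_SERVER


def insert_statements(thread_no: int, rng: random.Random) -> Iterator[Tuple[str, Tuple[int, str]]]:
    """Yield one single-row INSERT (SQL text, parameters) per c1 value;
    each would be executed as its own autocommit transaction."""
    total_threads = SERVERS * THREADS_PER_SERVER
    for c1 in thread_key_range(thread_no, total_threads, TOTAL_ROWS):
        yield ("INSERT INTO tab1 (c1, c2) VALUES (%s, %s)", (c1, random_c2(rng)))
```

With these numbers each thread inserts 2,500 rows; thread t would run its statements against server server_for_thread(t), committing each INSERT on its own.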

The hardware I am using for this test is my usual homebrew 8-core AMD box with 16 GB RAM, nothing exciting but useful.
InnoDB used a standard configuration here, nothing special, and Galera was using 16 apply threads on the slave, which is probably excessive for this use case. Both the MySQL and the Galera setups used two MySQL servers on the same box.

MySQL with semi-synchronous replication achieved some 4,830 INSERTs per second.
Galera achieved some 12,987 INSERTs per second, roughly 2.7 times the performance!
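For reference, here is the arithmetic behind the comparison, assuming the rates held steady over the whole 1,000,000-row run (the elapsed times are derived from the rates, not separately measured):

```latex
\frac{12\,987}{4\,830} \approx 2.69,
\qquad
t_{\text{semi-sync}} \approx \frac{1\,000\,000}{4\,830\ \text{s}^{-1}} \approx 207\ \text{s},
\qquad
t_{\text{Galera}} \approx \frac{1\,000\,000}{12\,987\ \text{s}^{-1}} \approx 77\ \text{s}
```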

This test wasn't particularly scientific, but then most applications aren't terribly scientific either. To me, it seems like Galera is the replication system MySQL should have had from the beginning! This rocks! And it also confirms what I was thinking from the start: that Galera has more to give than a plain HA solution!

/Karlsson

Monday, November 26, 2012

In my mind: Why the ORDBMS idea failed

Some 15 years ago, the idea of an ORDBMS (Object-Relational Database Management System) was red hot, and I was very close to the flaming hot center of that. I worked for Informix at the time, and Informix bought Illustra, which was the hottest and coolest of the databases of its time; hey, it was an ORDBMS.

This was not a bad idea per se, and I got entangled with it, was really enthusiastic about the idea, and spent a lot of time evangelizing this technology. For Informix, this was as much a market positioning change as a technical one: Informix went from being the cheap redneck cousin to becoming the Gordon Gekko of databases. Before this, Oracle was the Big Market Leader, Sybase was the technology leader and Informix was the price leader (no, I'm not talking about technical realities; there was a lot of good technical stuff in all of these, this is about how the world at large perceived these guys). But Illustra and another Informix project, XPS (aimed at the data warehouse market), were going to take Informix to places it had never been before. Oh, the billboard wars, the day when Informix went past Sybase, those were fun days.

From a financial POV, Informix lost it, we already know that (read "The Real Story of Informix Software and Phil White" by Steve W. Martin, ISBN: 978-0-09721822-2-5), but that's not, in my mind, the whole story. Even though I think there are many good prospects for an ORDBMS system, it's not really as generic as I figured it was back then (OK, I was wrong, I admit it, it does happen).

From a technical standpoint, what went wrong was (this is my take on it, by the way) that the cool ORDBMS features shoehorned into an aging Informix RDBMS design ended up being largely the worst of both worlds. That has been fixed, to an extent, in more recent Informix releases, but now it's too late :-(

And from a conceptual view, this is also what I think is wrong with the whole ORDBMS thinking. I know and love the traditional RDBMS model, with a fixed number of columns and a variable number of rows; even if this is a simple model, it works really well for data. It makes plain data easy to visualize and understand, and it is also a well-researched and well-understood model for data. OO, on the other hand, has also been thoroughly researched, but the implementations and their functionality differ a lot. Also, OO is a developer-focused way of looking at data, for an application, and objects are a natural way of looking at things and make things easy from an application POV. But representing data as an object is a different thing. Not a bad or a good way, but different. The relational model also lends itself to building control structures for data, as it (assuming that the RDBMS is used in some kind of normalized form) represents data at a very low level, lower than how most applications or end users view data. And objects are a way of combining all this data into something that is more application-centric.
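To make the contrast concrete, here is a minimal Python sketch with made-up customer and order data (all names are hypothetical, and it is not modeled on any particular ORDBMS feature): the same information first stored as normalized rows with a fixed set of columns, then combined into objects the way an application tends to want it.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical normalized data: each "table" has a fixed set of columns
# and a variable number of rows.
customers = [  # (customer_id, name)
    (1, "Alice"),
    (2, "Bob"),
]
orders = [  # (order_id, customer_id, amount)
    (10, 1, 25.0),
    (11, 1, 40.0),
    (12, 2, 15.0),
]


@dataclass
class Order:
    order_id: int
    amount: float


@dataclass
class Customer:
    customer_id: int
    name: str
    orders: List[Order] = field(default_factory=list)

    def total_spent(self) -> float:
        return sum(o.amount for o in self.orders)


def load_customers() -> List[Customer]:
    """Combine the low-level rows into application-centric objects:
    each customer row becomes an object that owns its order rows."""
    by_id = {cid: Customer(cid, name) for cid, name in customers}
    for order_id, cid, amount in orders:
        by_id[cid].orders.append(Order(order_id, amount))
    return list(by_id.values())
```

The rows can be queried and combined in any direction (orders by amount, customers without orders, and so on), while the objects are convenient mainly for the access path the application was written around.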

So the ORDBMS systems turned out too non-OO to attract the OO people, and at the same time the OO features were too non-relational for the SQL experts, who ignored them (like: "Why would I want a result set with a variable number of columns?").

And before I close this: yes, I know there are many ORDBMS applications out there that work well and that utilize all the cool ORDBMS features. Also, in Oracle and in particular Postgres, among others, ORDBMS features are being developed. And inside Postgres, the ORDBMS features are a building block for more than one generic RDBMS feature. But for database people in general, ORDBMS is something we don't see much of.

/Karlsson