Karlsson on databases and stuff: July 2012

Thursday, July 26, 2012

MySQL JSON import tool version 1.4 is out

This is just a quick notice that there is a new version of the JSON import tool available now. The main thing in it is that it fixes an iddues with threadinig that caused a threaded load (and this is the default) to hang of there was an error or the import was interrupted. Also, there are some minor fixes and also a runtime status printing, available by sending a SIGUSR1 signal to the process, feature is available.

Download from sourceforge.

Cheers
/Karlsson

Tuesday, July 24, 2012

What is a MySQL GA? RC? Beta?

I still do not know how this works for MySQL, but whatever the thiinking is, I do not like it. There is a flow of new features coming into the GA releases which is my mind usually isn't a good thing (i.e. a new feature means a new source of bugs). This is particularily bad in the case of MySQL due to the risk of regression bugs and lack of modularization.

And I know, what you are going to say, if we didn't get those new features then that old GA version would not be useful, if for no other reason so due to Moore's Law, the software has to keep up with what is going on in the world of Hardware. But MySQL has never been good at that, for example, can someone explain why the sample MySQL config file for big MySQl server, my-huge.cnf, starts with this comment:

# Example MySQL config file for very large systems.
#
# This is for a large system with memory of 1G-2G where the system runs mainly
# MySQL.

When was a "very large" system one with "1G-2G" of memory? Was it before or after the civil war? What usefulness does something like this have? (And this is from MySQL 5.6, which isnät even GA yet).

MySQL has never been very good at keeping up with itself either, the same my-huge.cnf contains a long comment section on how to set up replication by mocking with the config file, something that has been adviced against for ages and which isn't even supported in MySQL 5.6 (or MySQL 5.5 for that matter).

What MySQL needs, in my mind, is more version, less emphasis on adding necessary features for new hardware or for fixing bugs, and more GA versions. 5.6 has been real slow in appearing, which is annoying. There is some good stuff in there, but it's very late.

If you ask me, GA means "This versioon will only get bugfixes from now on". And if this means that important features that should be in production has a long time getting there, then the issue is not to add new features to GA releases released during the Carter administration, but to get some more GA releases out there.

And besides that: Can we modernixe MySQL a bit? Remove old stuff from the distribution that doesn't work or that noone really should use? Add stuff that is up-to-date, like sample config files for Amazon EC2? Right? And possibly tools integrated with EC2? Or iOS? Or Ubuntu?

/Karlsson
And yes, I am aware that I myself is a firm beliver in awk, vi and autotools, which are awfully old technologies. So maybe my views shouldn't be taken too seriously, and I just want to stir up a debate?

New MySQL JSON import and export tools

Yo!

My idea for a JSON based SQL schema tool that I wrote about some time ago has been on the backburner as I wanted to finish two other projects:

A better JSON import tool to substitute the old mysqljsonload.
Fix up mysqljsonexport to align it with mysqljsonimport.

These two projects are now done and the program are ready for download from Sourceforge, mysqljsonimport from here and mysqljsonexport from here. There is some serious changes to these program here, for example, the multi-threading in mysqljsonimport is now "dynamic", so that as many threads as needed (up to a user specified limit) are created to support faster loads and in case of mysqlexport, mainly this now has it's options aligned with mysqlimport. This means that now you can export and import a whole database in JSON with these tools with a single command.

As usual with my tools, there are A LOT of options and a lot of flexibility, you can specify fixed values for certain MySQL columns to import or export, and both tools also support an automatic internal increment feature. Also, MySQL columns imported may now contain MySQL functions or constants, basically anything that is allowed as a value in a MySQL INSERT can be used.

Hope you enjoy these tools, there is still a few things missing, like better documentation and more tests and stuff, and also better parallel export for mysqljsonexport, but that is for another version of these tools.

As you might remember, these tools started off as a tool to do some InnoDB, NDB and MongoDB benchmarking. I never got as far with that as I wanted to, but now I have the tools to complete those benchmarks. Also I will soon start working on a tool for that database-schema-in-JSON tool, so stay tuned to this channel folks, and don't touch that dial!

Cheers
/Karlsson

Tuesday, July 17, 2012

Database SCHEMA as JSON - A second OSS opportunity!

I have already written about how documentation is an opportunity for Open Source Software to show off, and area where we have the potential to be better than the Open Source folks. Another such area is interoperability, and for myself, interoperability between databases is what springs to mind.

Database schema interoperability

We all know what a MySQL schema looks like: I bunch of CREATE DATABASE, CREATE TABLE CREATE this-and-that, all together in a file. Usually, when someone asks for a schema, this is what they get, and this is independent on if the user is using PostgreSQL, MySQL, Oracle or whatever database.

In some way, most SQL based databases follow some variation of the SQL standard, but as they all, without exception, also have extensions (which they are allowed to, even if the follow the SQL standard) and they all also interpret things differently. At least in the case of MySQL, there is also the schema "translation" effect to take into account. A column defined as BOOL doesn't actually appear as BOOL when you look at the create schema, neither in the actual type used, nor in the actual semantics (more values that TRUE (1) and FALSE (0) may be represented), and this is not the only example.

All in all, although databases use SQL to define the schema, the schema isn't portable, at all. At least in the case of MySQL, we have to add the different features of the different versions, meaning that an exported schema is full of those /*!40110 WEIRDO_SYNTAX */ comments.

In conclusion, the schema isn't really portable at all, and to an extent not even between different versions of the same database server software. The big issue is that the features supported are part of the actual syntax of the statements that create the schema. But there should be a better way of doing it, and us Open Source folks could well be the ones to do it, i.e. create a way to represent a schema that makes the schema portable across versions and across different databases and in the case of MySQL also across storage engines, where application. And what better format to do this than JSON?

The details

Thinking a bit about how to represent a schema in JSON, I came of with a few ideas:

A bunch of top level objects, like GRANTS and DATABASES and also a top level object that defines the source of the schema.
Some other common objects, like TABLEs, VIEWs and ROUTINEs.
A means to provide versioned schema information.
A means to provide both proprietary (like STORAGE ENGINE) and standardized (NOT NULL) attributes, and also some kind of suggested mapping (like in the case of MySQL, a column type might be both the proprietary MySQL type, like MEDIUMINT, and the best standardized fit for this type, INTEGER).

The beauty of this is that JSON is "tagged", so that any attribute that is meaningless when importing the schema, like STORAGE ENGINE when importing a schema from MySQL into PostgreSQL, can be silently ignored, or the import utility could be smart enough to know how to map this proprietary attribute to PostgreSQL.

For this to work, we need to decide on some basics, like what to contain in the schema and how to name it. Basically I think that the best common denominator would be SQL-92, so that anything that is SQL-92 and is not ambiguous, should look the some in the exported JSON schema, independent of what database we are exporting from.

What is more needed is some tool to export and import schemas to these different database system, but in most cases, I don't think that would be terribly complex. The neat way of this way of doing stuff is that we have a common vehicle, or JSON schema, that allows to to easily migrate a schema, once we have the schema export / import tool ready. On top of this, we should be able to migrate data, but again I think JSON would be brilliant for this (yes, I am a big fan of JSON, but you probably know that by now).

Some coding also is necessary for this, but that is what we Open Source dudes and dudettes are good at, right? If someone else than myself thinks this is a good idea, let me know, and we could do some work. I would really like some PostgreSQL, Oracle etc. folks involved, and not only MySQLers! I truly believe that interoperability is the next big thing, the thing that will make Cloud computing real valuable!

/Karlsson

Sunday, July 15, 2012

JSON export tool for MySQL 1.0 available

It took longer than expected to get this ready, but I got ambitious and I also decided to introduce a reasonable test suite. But not it is done, my JSON export tool for MySQL is ready for download from sourceforge.

So, what's in there? A basic tool for exporting tables from MySQL in JSON format, that's what is there. It has some cool features, like batching (so you do not need 10 Gb of RAM to export a 10 Gb table), which by the way is automatic, the ability to export from an ad-hoc SQL query, JSON format support for BOOL and NULL values, fixed values (i.e. you can add a fixed column value to each row exported), LIMIT support, conditional export using a WHERE extension and much more.

What is missing is base64 support for BINARY and BLOB columns, and this is a shame. Currently, this data is exported as is and is supposed to be null-terminated (which they usually are not), so for any generic binary data in MySQL, the program will not work as expected. Also, there are some more minor issues to fix. There is pdf documentation to download from the link above if all you want to do is have a look at that.

I will update the JSON load tool soon, and this will have it's name changed to mysqljsonimport and will have features added to make it more compatible with mysqljsonexport, but this is a later story.

/Karlsson

An Open Source opportunity, part 1

I guess a lot of folks will disagree with me here, but I have to say this: I think that Open Source is a great way of developing great software, but some opportunities that this way of developing software are lost or underused. I will start with one thing that is close to my heart, but there is more to it, so I will try to follow up with some more stuff later.

Documentation

Yes, documentation. You know that lengthy novel you write after people numerous times has asked for obvious things that just about anyone can see if they read your code, but they don't seem to do just that. Weird isn't it, that people try to use your code to do productive stuff instead of just putting it to good use in a productive way.

Yes, we all seem to assume that the whole world are developers and that using Open Source software somehow implies that you are just another developer. Frankly, this is not so. I think that lack of documentation is the biggest flaw of most Open Source projects. Documentation seems to come as an afterthought once the deed is done, and it's contents more often than should be the case, seems to document how the code works instead of how the program is supposed to be used.

And this is even more annoying as we are all Open Source dudes, we should be able to work together to create documentation that is not just as good as commercial software but a lot better. For example, we should be able to do better cross referencing to other Open Source software documentation. And we should be able to create a structure for at least the basic stuff of our software that should be easily recognized independent of the software we use. For example, if I include some other software package and also use autotools to build my software, then I should be able to reference the relevant documentation for those packages, so that anyone not familiar with autotools, but who still wants to use my software, should be able to find any additional docs for autotools without me copying that as part of my documentation (something I really shouldn't do anyway).

And we are a community, right? We should be able to have other than ourselves, the software authors, to comment, add use cases and create references and examples. Does a project Wiki fulfil that purpose? Nope, not in my mind, as I here look at comments and samples for the documentation itself.

What I would want to see eventually

Writing documentation is considered boring (I don't, but I assume many do), so why don't we have a way to collaborate, so as to allow those who are better at docs to provide that. Where is that website that allows me to write and collect documentation for all sorts of open source projects? That allows dynamic cross referencing? That allows users to comment and to add use cases? That allows me to generate personalized doc sets for the software I use? That has a similar structure and common sections for different software packages, when applicable (like "Installing", "Building from source", "Change log", "Reference", "Configuring", "Quick start"). A system that allows be to view are create versioned manuals (like: "Hey, I am using MySQL 5.6, I'm not interesting what came and went in version 4.7 or something like that").

A final thought: If you had never driven a car in your whole live, would a detailed instructing on how the Otto-engine works help you? Probably not. Which is not to say that a description in detail of that same engine would not be helpful in some cases, but largely not as part of describing how to use a car in real life.

For now, this is it but I will visit another pet subject of mine that we OSS folks should be better at: APIs.

/Karlsson

Sunday, July 1, 2012

So, what happened to the JSON unloader?

I promissed to release a complenent to my jsonload project, a json unload / export tool, in a blog post last week, but since then, not much has happened. Well, what happened was two things:

Amazon distracted me with all sorts of issues, downs, connectioon problems and power outages.
I got WAY more ambitious than I first planned to be.

You know what it is like, you really want to do the right thing, and when you start doing that you figure what the right thing really is, and it's much more featureful and performant that you had originally envisioned. But stuff is in the works, and it's pretty cool. In the process, I have introdused a few features that are such that I want to link them up to how jsonload works, so that these guys work nicely together. Also, I'm changing the names of these guys, so in the next version they will be called mysqljsonexport and mysqljsonimport respectively. And then there is the usual set of cool features, the best one, in terms of the export, is a batching feature that allows you to export really large tables that will not fit in RAM, and this feature is automatic, a logic which took some time to develop but which works real nice.

So sometime next week, if nothing else breaks, I'll be ready for an initial release of mysqljsonexport and a new new version of mysqljsonimport. So don't touch that dial just yet.

/Karlsson