Tuesday, July 17, 2012

Database SCHEMA as JSON - A second OSS opportunity!

I have already written about how documentation is an opportunity for Open Source Software to show off, and area where we have the potential to be better than the Open Source folks. Another such area is interoperability, and for myself, interoperability between databases is what springs to mind.

Database schema interoperability

We all know what a MySQL schema looks like: I bunch of CREATE DATABASE, CREATE TABLE CREATE this-and-that, all together in a file. Usually, when someone asks for a schema, this is what they get, and this is independent on if the user is using PostgreSQL, MySQL, Oracle or whatever database.

In some way, most SQL based databases follow some variation of the SQL standard, but as they all, without exception, also have extensions (which they are allowed to, even if the follow the SQL standard) and they all also interpret things differently. At least in the case of MySQL, there is also the schema "translation" effect to take into account. A column defined as BOOL doesn't actually appear as BOOL when you look at the create schema, neither in the actual type used, nor in the actual semantics (more values that TRUE (1) and FALSE (0) may be represented), and this is not the only example.

All in all, although databases use SQL to define the schema, the schema isn't portable, at all. At least in the case of MySQL, we have to add the different features of the different versions, meaning that an exported schema is full of those /*!40110 WEIRDO_SYNTAX */ comments.

In conclusion, the schema isn't really portable at all, and to an extent not even between different versions of the same database server software. The big issue is that the features supported are part of the actual syntax of the statements that create the schema. But there should be a better way of doing it, and us Open Source folks could well be the ones to do it, i.e. create a way to represent a schema that makes the schema portable across versions and across different databases and in the case of MySQL also across storage engines, where application. And what better format to do this than JSON?

The details

Thinking a bit about how to represent a schema in JSON, I came of with a few ideas:
  • A bunch of top level objects, like GRANTS and DATABASES and also a top level object that defines the source of the schema.
  • Some other common objects, like TABLEs, VIEWs and  ROUTINEs.
  • A means to provide versioned schema information.
  • A means to provide both proprietary (like STORAGE ENGINE) and standardized (NOT NULL) attributes, and also some kind of suggested mapping (like in the case of MySQL, a column type might be both the proprietary MySQL type, like MEDIUMINT, and the best standardized fit for this type, INTEGER).
The beauty of this is that JSON is "tagged", so that any attribute that is meaningless when importing the schema, like STORAGE ENGINE when importing a schema from MySQL into PostgreSQL, can be silently ignored, or the import utility could be smart enough to know how to map this proprietary attribute to PostgreSQL.

For this to work, we need to decide on some basics, like what to contain in the schema and how to name it. Basically I think that the best common denominator would be SQL-92, so that anything that is SQL-92 and is not ambiguous, should look the some in the exported JSON schema, independent of what database we are exporting from.

What is more needed is some tool to export and import schemas to these different database system, but in most cases, I don't think that would be terribly complex. The neat way of this way of doing stuff is that we have a common vehicle, or JSON schema, that allows to to easily migrate a schema, once we have the schema export / import tool ready. On top of this, we should be able to migrate data, but again I think JSON would be brilliant for this (yes, I am a big fan of JSON, but you probably know that by now).

Some coding also is necessary for this, but that is what we Open Source dudes and dudettes are good at, right? If someone else than myself thinks this is a good idea, let me know, and we could do some work. I would really like some PostgreSQL, Oracle etc. folks involved, and not only MySQLers! I truly believe that interoperability is the next big thing, the thing that will make Cloud computing real valuable!

/Karlsson

1 comment:

Unknown said...

I'm down. Where do we start?