Monday, May 27, 2013

JSON datatype in MariaDB prototype

I have patched up a MariaDB version with JSON support, just for the fun of it. This is not the best version of MariaDB around, as I am not a MariaDB developer by any means, and although I have played with the MySQL and MariaDB sources before, I have never attemped to look like I know it in detail. So although this works, it's probably full of memory leaks, errors and bad code (my additions, that is).

That said, it actually works. Surprise! For a simple prototype, that is to show off a specific feature.

So, this is what I have: I grabbed MariaDB 10.0.2 sources and worked from there. To support the JSON specifics, I included the Jansson C-library for JSON. So far so good, then I wanted a JSON datatype, that was the first step. Adding a new datatype to MariaDB / MySQL is a pretty major undertaking though, so I decided to try something different, I decided to kidnap an existing datatype, preferably one with attributes that I could reuse for my JSON type, so I ended up grabbing the little used MEDIUMBLOB datatype.

I also decided to use plain JSON text as the representation, and not use some compacted binary representation, the only compaction I do is to remove duplicate members and unnecessary spaces. So for a JSON object like this:
{"fname":  "John", "dept": {  "name":  "accounting", "id": 1 }, "lname": "Doe", "fname":  "George"}
I will store
{"fname":"George","lname":"Doe","dept":{"name":"accounting","id":1}}
Which is no big deal I guess. So with this in hand, I can now create a table like this:
CREATE TABLE appuser(id VARCHAR(10) NOT NULL PRIMARY KEY, name JSON);

The JSON datatype checks that the input string is valid JSON, which is different from the way dynamic columns work. Also, the data is compacted as mentioned above.

Let's insert some data to test this out:
INSERT INTO appuser VALUES('johndoe', '{"fname": "John", "lname": "Doe", "dept": {"name": "accounting", "id": 1}, "role": "admin"}');
INSERT INTO appuser VALUES('susanscott', '{"lname": "Scott", "fname": "Susan", "dept": {"name": "services", "id": 2}}');
INSERT INTO appuser VALUES('janeroe', '{"lname": "Roe", "fname": "Jane", "dept": {"name": "accounting", "id": 1}}');
INSERT INTO appuser VALUES('tomdraper', '{"fname": "Tom", "lname": "Draper", "dept": {"name": "management", "id": 3}, "subordinates": ["johndoe", "susanscott"]}');

Following this, we can do some simple JSON manipulation like this;
SELECT id FROM appuser WHERE json_object_get(name, 'role') = 'admin';

And if the name of the 'admin' role changes to 'administrator' we can do this:
UPDATE appuser SET name = json_object_set(name, 'role', 'administrator') WHERE json_object_get(name, 'role') = 'admin';

And to find whoever is supervising Susan Scott, we can do this:
SELECT id FROM appuser WHERE json_array_contains(json_object_get(name, "subordinates"), "susanscott");

Finally, John Doe is actually the supervisor for Jane Roe, so let's fix that:
UPDATE appuser SET name = json_object_set(name, "subordinates", '["janeroe"]') WHERE id = 'johndoe';

I have a few more things in there, and there are many things missing, this was done just to get a feel for things. The main tool for all thsi is Jansson, as mentioned above, and Jansson has a bunch of cool tricks up it's sleeve. We could easily implement merging of object and of arrays, more array management functions and stuff like that.

Again, what do you think? Is this useful? I sure think this is easier to use than the Dynamic Columns (but these have other advantages). I still have things to do, both functions to be added as well as SELECT ... INTO OUTFILE and LOAD DATA INFILE and stuff like that.

Does anyone want to play with this? I do have a x86_64 build if someone wants to try it.

/Karlsson

3 comments:

Pavel Stratil said...

Sounds really cool! We'd definitely love to use this in production, the only other option to do the same is xml which makes queries look like encrypted garbage. Is there an ETA when this could actually get merged into the mariadb trunk?

Christoph Evers said...

Sounds really cool indeed. I was just thinking of a library for doing these kind of things (to avoid pulling the whole doc, update and putting back)...
I would love to be able to play with it if possible!

Anders Karlsson said...

In all honesty, this was just an experiment and I'm not a developer. What I could do though, was to convert the JSON function here into a UDF library. This would not implement the JSON datatype, but except for that it could be a useful.