Karlsson on databases and stuff: json

Showing posts with label json. Show all posts

Tuesday, February 28, 2017

JSON with MariaDB 10.2

JSON is fast becoming the standard format for data interchange and for unstructured data, and MariaDB 10.2 adds a range on JSON supporting functions, even though a JSON datatype isn't implemented yet. There are some reasons why there isn't a JSON datatype, but one is that there are actually not that many advantages to that as JSON is a text-based format. This blog post aims to describe JSON and the use cases for it, as well as to describe the MariaDB 10.2 JSON functions and uses for these, as well as showing some other additions to MariaDB 10.2 that are useful for JSON processing.

So to begin with then, why do we need JSON? Or to put it differently, why do we not store all data in JSON? Well, the reason as I see it is that some data we work with really is best treated as schemaless whereas some other data really should be handled in a more strict way in a schema. Which means that in my mind mixing relational data with unstructured data is what we really want. And using JSON for unstructured data is rather neat, and JSON is even standardized (see json.org).

There are reasons why this hasn't always been so. When the sad old git that is writing this stuff started working in this industry, which I think was during the Harding administration, computers were rare, expensive, handled only by experts (so how I got to work with them is a mystery) and built from lego-bricks, meccano and pieces of solid gold (to keep up the price). Also, they were as powerful as a slide-ruler, except it was feed with punched-cards (and probably powered by steam). Anyway, no one in their right mind would have considered string pictures of cute felines as something to be on a computer, or actually stuff to be stored in the database. The little that would fit was the really important stuff - like price, amount in stock, customer name, billing address and such - and nothing else. And not only that, stuff that was stored had some kind of value, somehow, which meant it had to follow certain rules (and following rules is something I am good at? I wonder how I ended up in this business. Again). Like, a price had to be a number of some kind, with a value 0 or higher and some other restrictions. As you see, these were hard and relentless times.

And then time moved on and people started buying things on the internet (whatever the internet is. I think it is some kind of glorified, stylish version of punched cards) and stuff such as Facebook and Google came around. The issue with computer storage was now not how to fit all that nicely structured data in it, but rather once we have filled that hard drive on your desktop with all the product, customers and transactions from Amazon (I think Amazon has something to do with Internet, but I am not sure) and a full 17.85% of that drive is now occupied by that old-style structured data, what more do we put in there? Maybe we could put some more information on the items for sale in that database, and some general information on who is buying it? That should fill up that disk nicely, right? Well, yes, but that new data, although related to the structured data I already have, is largely unstructured. Say, for example, you write a review of a product on Amazon late in the morning after a good deal of heavy "partying" (which is not an Internet thing, I think), the contents of that would hardly be considered "structured". If you didn't like the product (which you probably didn't), then the appropriate terms for large parts of that review would probably be "profanity" or "foul language").

The way to deal with the above is a mix of structured and unstructured data, with some kind of relation between the two. Like a column of unstructured data in each relational database table (or should I say "relation", just to show my age? Or maybe I should pretend to be really young, modern and cool, possibly sporting a hipster beard and all that, by calling it a "collection").

With that out of the way, let's consider an example using structured as well as non-structured JSON data. Assume we have a store selling different types of clothing, pants, jackets, shoes and we are to create a table to hold the inventory. This table would have some columns that are always there and which have the same meaning for all rows in the table, like name, amount in stock and price. These are items that are well suited for a relational format. On top of this we want to add attributes that have different meaning for each type of or even each instance of items. Here we have things like colour, width, length and size. These we consider non-relational as the interpretation of these attributes are different depending of the type of garment (like size M or shoe sizes or a "zebra striped" colour) and some garments might have some unique attribute, like designer or recommended by staff or something. Our table might then look something like this:
MariaDB> CREATE TABLE products(id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
price DECIMAL(9,2) NOT NULL,
stock INTEGER NOT NULL,
attr VARCHAR(1024));
In this table we have a few columns that look like columns in any relational database table, and then we have a column, called attr, that can hold any relevant attribute for the garment in question and we will store that as JSON a JSON string. You probably notice that we aren't using a JSON datatype here as that is not present in MariaDB, despite that there are JSON functions, but those JSON functions act on a text-string with JSON content. These functions are introduced in MariaDB 10.2 (which is in Beta as I write this), but there are a few bugs that means you should use MariaDB 10.2.4 or higher, which means as for now we assume that MariaDB 10.2.4 or higher is being used.

But there is one issue with the above that I don't particularly care for and that is, as the attr column is plain text, any kind of data can be put in the attr column, even non-valid JSON. The good thing is that there is a fix for this in MariaDB 10.2, which is CHECK constraints that actually work, and this is a little discussed feature of MariaDB 10.2. The way this works is that this kind of constraint kicks in whenever a row is INSERTed or UPDATEed, any CHECK constraint runs and validates the data and if the validation fails the operation also fails. Before I show an example I just want to mention one JSON function we are to use here, which is JSON_VALID which takes a string and checks if it is valid JSON. Note that although CHECK constraints are particularly valid here, check constraints can be used for any kind of data validation.

Armed with this, let's rewrite the statement that creates the table like this:
MariaDB> CREATE TABLE products(id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
price DECIMAL(9,2) NOT NULL,
stock INTEGER NOT NULL,
attr VARCHAR(1024),
CHECK (JSON_VALID(attr)));

Let's give this a try now:
MariaDB> INSERT INTO products VALUES(NULL, 'Jeans', 10.5, 165, NULL);
ERROR 4025 (23000): CONSTRAINT `CONSTRAINT_1` failed for `inventory`.`products`
Ok, that didn't work out. What happens here is that a NULL string isn't a valid JSON value, so we need to rewrite our table definition:
MariaDB> CREATE TABLE products(id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
price DECIMAL(9,2) NOT NULL,
stock INTEGER NOT NULL,
attr VARCHAR(1024),
CHECK (attr IS NULL OR JSON_VALID(attr)));
Following this we can try it again:
MariaDB> INSERT INTO products VALUES(NULL, 'Jeans', 10.5, 165, NULL);
Query OK, 1 row affected (0.01 sec)
MariaDB> INSERT INTO products VALUES(NULL, 'Shirt', 10.5, 78, '{"size": 42, "colour": "white"}');
Query OK, 1 row affected (0.01 sec)
MariaDB> INSERT INTO products VALUES(NULL, 'Blouse', 17, 15, '{"colour": "white}');
ERROR 4025 (23000): CONSTRAINT `CONSTRAINT_1` failed for `inventory`.`products`
That last statement failed because of malformed JSON (a double quote was forgotten about), so let's correct that:
MariaDB> INSERT INTO products VALUES(NULL, 'Blouse', 17, 15, '{"colour": "white"}');
Query OK, 1 row affected (0.01 sec)
One thing that has yet to be discussed is indexes on JSON values. As the attr column in our example is a plain text, we can of course index it as usual, but that is probably not what you want to do, rather what would be neat would be to create an index on individual attributes in that JSON string. MariaDB doesn't yet support functional indexes, i.e. functions not on values but on computed values. What MariaDB does have though is Virtual Columns, and these can be indexed and as of MariaDB 10.2 these virtual columns don't have to be persistent, (read more on Virtual Columns here: Putting Virtual Columns to good use).

The easiest way to explain this is with an example. Let's say we want an index on the colour attribute, if such a thing exists. For this we need two things: A virtual column that contains the colour attribute as extracted from the attr column, and then an index on that. In this case we will be using the JSON_VALUE function that takes a JSON value and a path, the latter describing the JSON operation to be performed, somewhat like a query language for JSON.

We end up with something like this:
MariaDB> ALTER TABLE products ADD attr_colour VARCHAR(32) AS (JSON_VALUE(attr, '$.colour'));
MariaDB> CREATE INDEX products_attr_colour_ix ON products(attr_colour);
With that in place, let's see how that works:
MariaDB> SELECT * FROM products WHERE attr_colour = 'white';
+----+--------+-------+-------+---------------------------------+-------------+
| id | name   | price | stock | attr                            | attr_colour |
+----+--------+-------+-------+---------------------------------+-------------+
| 2 | Shirt | 10.50 |    78 | {"size": 42, "colour": "white"} | white       |
| 3 | Blouse | 17.00 |    15 | {"colour": "white"}             | white       |
+----+--------+-------+-------+---------------------------------+-------------+
2 rows in set (0.00 sec)
And let's see if that index is working as it should:
MariaDB> EXPLAIN SELECT * FROM products WHERE attr_colour = 'white';
+------+-------------+----------+------+-------------------------+-------------------------+---------+-------+------+------------+
| id   | select_type | table    | type | possible_keys           | key                     | key_len | ref   | rows | Etra       |
+------+-------------+----------+------+-------------------------+-------------------------+---------+-------+------+------------+
|    1 | SIMPLE      | products | ref | products_attr_colour_ix | products_attr_colour_ix | 99      | const |    2 | Uing where |
+------+-------------+----------+------+-------------------------+-------------------------+---------+-------+------+------------+
1 row in set (0.00 sec)

And just to show that the column attr_colour is a computed column that depends on the attr column, lets try updating the colour for the blouse and make that red instead of white and then search that. To replace a value in a JSON object MariaDB 10.2 provides the JSON_REPLACE functions (for all JSON functions in MariaDB 10.2 see MariaDB Knowledge Base).
MariaDB> UPDATE products SET attr = JSON_REPLACE(attr, '$.colour', 'red') WHERE name = 'Blouse';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
MariaDB> SELECT attr_colour FROM products WHERE name = 'blouse';
+-------------+
| attr_colour |
+-------------+
| red         |
+-------------+
1 row in set (0.00 sec)

There is more to say about JSON in MariaDB 10.2 but I hope you now have a feel on what's for offer.

Happy SQL'ing
/Karlsson

Tuesday, October 11, 2016

Getting data out of SQL Server in a MariaDB friendly way.

I know what you are thinking "Oh my, it's that time again, Karlsson insist that data should get out of some other database and into MariaDB. What is he up to this time?" and that is exactly right, this time data is coming out of SQL Server, out of that expensive, closed source, Windows-only software and into the lightweight, fast and cost effective MariaDB.

I have already shown how this works when getting data out of Oracle in releasing MyOraDump, so inline with that the corresponding program to get data out of SQL Server is called MyMSSQLDump and it has features along same same lines as MyOraDump. To connect to SQL Server (or Sybase, but this I haven't tested) I use FreeTDS, which is a nice Open Source SQL Server / Sybase driver. The output formats supported are MySQL (i.e. A file with (INSERT INTO...), MSSQL (same as MySQL format but aligned for MSSQL), JSON, JSON Array (similar to JSON but all data in one array) and CSV. All formats are very flexible and the common SQL Server / Sybase datatypes are supported.

As usual, the code is written in C and use autotools for building. I have only tested building it on CentOS (6 and 7) so far, but there is nothing magic to it so it should work on other platforms too. Download it from sourceforge.

Happy SQL'ing
/Karlsson

Monday, August 8, 2016

MyQuery 3.5.6 released

I released version 3.5.6 of MyQuery, and there are quite a number of new features and fixes in there. The #1 bugfix is that the annoying access warnings that poped up from Windows when saving to the registry are gone, as I have now moved the registry to a more Windows 10 acceptable place. Among the new features are:

JSON format output when saving results.
More flexible CSV format output with many new options.
Ability to save Dyncol as JSON in CSV and JSON output.
Nicer formatting of numbers in status dialogs.
Auto refresh of status dialogs

As this is a version with many new features, I still consider this a Beta. I have built it on Windows 10 and tested it on Windows 10 and 7, 64-bit, although MyQuery itself is still a 32-bit windows application.

Happy SQL'ing
/Karlsson

Tuesday, February 16, 2016

Loading JSON into MariaDB or even MySQL - mysqljsonimport 2.0 is available

It was a long time since I updated mysqljsonimport or mysqljsonexport and I had a few things I wanted to do with them. This release is significant enough for me to bump it up to 2.0, and the same is in the works for mysqljsonexport. The one big thing that is now implemented is reasonably advanced support for MariaDB Dynamic Columns, and it is actually pretty flexible, allowing you to load a nested JSON object into a MariaDB Dynamic Column. But don't worry, it will still link and run with MySQL if that is what you want to do (but then you will not have the dynamic column features, for obvious reasons),

Download from Sourceforge as usual (yes, I know I am oldfashioned and that I should have put it on github). Also as usual is the documentation in pdf format that is also downloadable separately.

/Karlsson

Wednesday, August 26, 2015

MyOraDump, Oracle dump utility, version 1.2

I have now released version 1.2 of MyOraDump, my Oracle data extraction tool. This version has one new feature, which is transaction support for MySQL format exports which does speed up loading data a lot! Trust me, it really does! Also I have fixed a bug that did cause a crash at the end of the run, I have no idea why this didn't show up before, but there you go and now it is fixed.

MyOraDump 1.2 can be downloaded sourceforge, and as usual there is also a pdf only download if you want to read up on the tool before using it.

/Karlsson

Wednesday, October 16, 2013

MariaDB Dynamic Columns client API

I have blogged on using MariaDB Dynamic Columns already, and I hope this was useful and introduction. I have a few more things on this subject though, but one so far little known and used feature is the Client API for Dynamic Columns, see the MariaDB Knowledge Base for details. What this is all about is that Dynamic Columns were originally envisioned as a means of managing the "dynamic columns" used by Cassandra when using the MariaDB Cassandra Storage Engine. But as we realize, this is the server side of things (the Storage Engine) but there is a corresponding client library also, that is part of the MariaDB Client API.

As you have seen if you have read my previous blog on this subject, or whatever is written about MariaDB Dynamic Columns elsewhere, which is not much, MariaDB actually has no external representation of the dynamic column: Either you get the binary representation or you parse the binary representation using one of the supplied functions (like COLUMN_GET to get the value of an attribute in a dynamic column) or you get the binary. The only exception is the function COLUMN_JSON that retrieves the whole dynamic column and converts it to JSON, supporting nested objects also. Regrettably, there is no built-in means of adding a dynamic column in JSON format (Ulf Wendel has provided some JSON User Designed Functions though, but they don't line up with Dynamic Columns on the other hand).

Now, if we assume we are not using the COLUMN_GET function or something like that, can I programmatically parse a dynamic_column on the client? Well, I could sure use the server functions again, calling COLUMN_LIST and functions like that from a SELECT statement or in a Storage Procedure (I'll show this later), but what is real handy (if you are a C-developer like myself) is the DynamicColumns API functions in the MariaDB Client Library.

To begin with, these functions are not as easy to use as it might seem in the Knowledge Base article mentioned above. There are some bugs in what files are required that currently makes it kind of messy to build programs that use the Dynamic Columns API (which brings up another question: How many times can you write Dynamic Columns in one single blogpost?) This will be fixed eventually (it is known, reported bugs we talk about here), but until it is, let me show you some examples, so you get the point of this API.

To begin with, the API assumes that data passed to it is in the form of a DYNAMIC_COLUMN struct, which is a string with a specified length, i.e. it is NOT a null-terminated string but instead it is really, truly binary. This might be silly, but it is just how things are. So the DYNAMIC_COLUMN struct is set up with the data you got from a MariaDB Dynamic Column binary data. Still, this might not be valid Dynamic Column data, so to check that you should call the mariadb_dyncol_check() referencing this data to make sure it is valid Dynamic Column data.

The function that you then want to use is by far the most powerful one, mariadb_dyncol_unpack(). This will unpack a MariaDB dynamic column are retrieve the basic information on it:

How many attributes there are (how many "Dynamic Columns")
What the names of those attributes are.
The the values of those attributes are.

The values are returned as DYNAMIC_COLUMN_VALUE structs and the names as MYSQL_LEX_STRING structs. Most of this is described in the knowledge base article mentioned above.

I spent some time playing with this, most of that time was spent understanding how the API was meant to work and how to build an application on it. One night, after some heavy drinking, I decided that the best way to show this would be to write some code, instead of just blog about it on and on, so still severely drunk I wrote a simple tool for extracting Dynamic Column data from MariaDB and show it in some different ways (JSON, pretty JSON, Indented dynamic columns etc). To allow you to build it yourself I used autotools, but fact is to use that, you have to also copy some includefiles from the MariaDB source distribution (the reason is those bugs I mention above).

So there is a pre-built executable (I built this on an Ubuntu 10.10) that assumes you use the MariaDB 10.x Client Library (I used 10.0.4). If this doesn't work, as I said, you can always build it yourself (which isn't a throughly tested procedure). Also, for any of this to work, you need the Jansson JSON library.

I'll develop some more Dynamic Columns related tools eventually, but for now this is what I have in the dyncoltools toolset, the dyncoldebug tool available on sourceforge.

/Karlsson

Thursday, September 19, 2013

MariaDB Dynamic Columns

MariaDB has a feature called Dynamic Columns which is not in MySQL, and this feature requires some explanation. It is used for example by the Cassandra Storage Engine, which is also unique to MariaDB, and as this is a schema-less database, which means we need a way to handle the fact that one one end MariaDB has a fixed set of columns defined by the schema, and on the other end, Cassandra provides any kind of attribute that the developer feels he wants to have for a particular "row" (which is a row in MariaDB but is not called a row in Cassandra).

But ignoring the Cassandra engine for a while, let's look at what us mere mortals, brought up on mothers milk, SQL and Relational Theory under the watching eyes of E.F. Codd, can use this for, and fact is that it can be quite useful. All in all, what Dynamic Columns provide here is a means of adding non-schema data to a row in a structured way, you know where you used to emulate an array or something in a VARCHAR or use full blown Anchor Modelling for properties.

Which is not to say that Dynamic Columns is the best things since sliced bread, but it does have some good uses, so let me show you how they work. To begin with, a Dynamic Column contains structured data in the form of one or more pairs of <attribute name> = <attribute value>. In MariaDB 5.5, <attribute name> was numeric, in 10.x it is a proper name as it should have been from the start. All examples here assume you are using MariaDB 10.x.

The attribute names are not predefined or in a schema or anything, each instance of a Dynamic Column (i.e. every Dynamic Column in every row) might have different attributes with different values and types. And yes, is typed, but typing is automatic per instance of a Dynamic Columns (i.e. every Dynamic Column in every row of the table might have different type for an for the same ). The possible types are:

String (Binary or Normal)
Date
Datetime
Decimal
Double
Integer (Signed or Unsigned)
Time
Dynamic column

As can be seen from this, nesting is actually available, i.e. might be in the form of another Dynamic Column. So let's dig in with some simple examples.

First, lets create a table that holds some useful attributes, in this case I have a table with my users, and for each user I want to store a range of mixed attributes that the application finds useful, but which are not in the normal TABLE schema, such a birthplace, car make, etc etc.
CREATE TABLE users(userid INT NOT NULL PRIMARY KEY AUTO_INCREMENT, username VARCHAR(255), firstname VARCHAR(255), lastname VARCHAR(255), attrs BLOB);
Now we have to insert some data into this table. To create a Dynamic Column value you use the COLUMN_CREATE function which takes a even variable number of arguments, which are name value pairs. For example:
INSERT INTO users VALUES(NULL, 'bloggs', 'Joe', 'Bloggs', COLUMN_CREATE('shirtsize', 'L', 'colorpref', 'red'));
INSERT INTO users VALUES(NULL, 'doe', 'John', 'Doe', COLUMN_CREATE('shirtsize', 'L', 'vegetarian', FALSE));
Now we hopefully have some data in the table, and as you can see, we are mixing schema data with schemaless data in the same row. Before I end for this time on this subject, let's get a look at that data in the attrs column. There are a few different means of doing that, either you can get a specific Dynamic Column value, like this:
SELECT COLUMN_GET(attrs, 'colorpref' AS CHAR) FROM users;
+----------------------------------------+
| COLUMN_GET(attrs, 'colorpref' AS CHAR) |
+----------------------------------------+
| red                                    |
| NULL                                   |
+----------------------------------------+
2 rows in set (0.00 sec)
Or I can get all the attributes of a dynamic column in one go, in which case I get JSON back:
SELECT COLUMN_JSON(attrs) FROM users;
+-------------------------------------+
| COLUMN_JSON(attrs)                  |
+-------------------------------------+
| {"colorpref":"red","shirtsize":"L"} |
| {"shirtsize":"L","vegetarian":0}    |
+-------------------------------------+
2 rows in set (0.00 sec)
That's it for now, but I'll be back with some more details and some cool things you can do with MariaDB Dynamic Columns!

/Karlsson

Monday, May 27, 2013

JSON datatype in MariaDB prototype

I have patched up a MariaDB version with JSON support, just for the fun of it. This is not the best version of MariaDB around, as I am not a MariaDB developer by any means, and although I have played with the MySQL and MariaDB sources before, I have never attemped to look like I know it in detail. So although this works, it's probably full of memory leaks, errors and bad code (my additions, that is).

That said, it actually works. Surprise! For a simple prototype, that is to show off a specific feature.

So, this is what I have: I grabbed MariaDB 10.0.2 sources and worked from there. To support the JSON specifics, I included the Jansson C-library for JSON. So far so good, then I wanted a JSON datatype, that was the first step. Adding a new datatype to MariaDB / MySQL is a pretty major undertaking though, so I decided to try something different, I decided to kidnap an existing datatype, preferably one with attributes that I could reuse for my JSON type, so I ended up grabbing the little used MEDIUMBLOB datatype.

I also decided to use plain JSON text as the representation, and not use some compacted binary representation, the only compaction I do is to remove duplicate members and unnecessary spaces. So for a JSON object like this:
{"fname": "John", "dept": { "name": "accounting", "id": 1 }, "lname": "Doe", "fname": "George"}
I will store
{"fname":"George","lname":"Doe","dept":{"name":"accounting","id":1}}
Which is no big deal I guess. So with this in hand, I can now create a table like this:
CREATE TABLE appuser(id VARCHAR(10) NOT NULL PRIMARY KEY, name JSON);

The JSON datatype checks that the input string is valid JSON, which is different from the way dynamic columns work. Also, the data is compacted as mentioned above.

Let's insert some data to test this out:
INSERT INTO appuser VALUES('johndoe', '{"fname": "John", "lname": "Doe", "dept": {"name": "accounting", "id": 1}, "role": "admin"}');

INSERT INTO appuser VALUES('susanscott', '{"lname": "Scott", "fname": "Susan", "dept": {"name": "services", "id": 2}}');

INSERT INTO appuser VALUES('janeroe', '{"lname": "Roe", "fname": "Jane", "dept": {"name": "accounting", "id": 1}}');

INSERT INTO appuser VALUES('tomdraper', '{"fname": "Tom", "lname": "Draper", "dept": {"name": "management", "id": 3}, "subordinates": ["johndoe", "susanscott"]}');

Following this, we can do some simple JSON manipulation like this;

SELECT id FROM appuser WHERE json_object_get(name, 'role') = 'admin';

And if the name of the 'admin' role changes to 'administrator' we can do this:

UPDATE appuser SET name = json_object_set(name, 'role', 'administrator') WHERE json_object_get(name, 'role') = 'admin';

And to find whoever is supervising Susan Scott, we can do this:
SELECT id FROM appuser WHERE json_array_contains(json_object_get(name, "subordinates"), "susanscott");

Finally, John Doe is actually the supervisor for Jane Roe, so let's fix that:
UPDATE appuser SET name = json_object_set(name, "subordinates", '["janeroe"]') WHERE id = 'johndoe';

I have a few more things in there, and there are many things missing, this was done just to get a feel for things. The main tool for all thsi is Jansson, as mentioned above, and Jansson has a bunch of cool tricks up it's sleeve. We could easily implement merging of object and of arrays, more array management functions and stuff like that.

Again, what do you think? Is this useful? I sure think this is easier to use than the Dynamic Columns (but these have other advantages). I still have things to do, both functions to be added as well as SELECT ... INTO OUTFILE and LOAD DATA INFILE and stuff like that.

Does anyone want to play with this? I do have a x86_64 build if someone wants to try it.

/Karlsson

Sunday, May 26, 2013

SQL and JSON, what do you think?

As you might know, I'm a big fan of JSON. One big reason is that I believe that JSON is closer to most developers view on data, whereas the Relational SQL based model is closer to what someone working with data itself or someone working with infrastructure. What I mean here is that neither view is wrong, but they are different.

So, given that, can we merge the Object JSON world with the relational model? Well, not JSON, but Hibernate does it quite well. This is one of my objects to the NoSQL world, that the datamodel is closely linked to the application at hand, and less so to data itself and to other applications. Stuff such as accounts, privileges, accounting data, orders and many other things are global, and are not specifically connected a specific application, but in many NoSQL applications, this is what it ends up being.

And there are not that many good solutions, how can I easily explore data in a NoSQL database, where the data model is application based, without me knowing the application? See what a high-ranking 10gen person, Dwight Merriman (Chairman of the board) has to say on the subject in an interview in 2011.

On the other hand, we relational folks has a few things to answer to. We insist that we know how data is to be managed (and I think we do) and we seem to ignore the fast that development tools and environments has changed sine the 1980s (we still insist that proprietary client/server protocols is the way to go, and that SQL rules), In my mind, SQL and Relational rules for data, but not necessarily for applications, and we (including yours truly) should wake up and appreciate what is going on here: We need better, more developer focused, means of accessing data, even if that data is relational data.

The NoSQL camp is getting ready to fix what they can fix: proper query languages (Cassandra CQL being one), better data access methods and probably much more that I am not aware of right now.

Which bring me to what we can do to be more developer friendly on the SQL and relational side of things. And the answer is, as I have already hinted, JSON support! MariaDB has a few trucks up it's sleeve, like the Dynamic Columns support. But this is nowhere near enough, the idea is OK but the way to use dynamic columns for a developer doesn't help much, rather the interface is more targeted at making the Cassandra Storage Engine features easier to access from the MariaDB SQL Interface. But I'll do a blog later on the Dynamic Column support and JSON later on anyway.

What I would want, but maybe that's just me, is a proper JSON datatype, with JSON functions, syntax and semantics. That is a starting point I think, and it shouldn't be too difficult. But many other JSON related things that we could do spring to mind:

SQL / JSON language mix - What I mean here is that JSON isn't just a string datatype, but that we can include plain JSON, where approprite, in the SQL:
INSERT INTO table VALUES(57, {"empno": 1, "ename": "Scott})
JavaScript stored procedures
ROW data as JSON support in the MySQL API
JSON column data as ROW data in the MySQL API and tools.
A JSON Storage Engine
JSON support in LOAD DATA INFILE
JSON support in SELECT .. INTO OUTFILE
REST / JSON interface to complement the MySQL interface
JSON attribute indexing

A then again some stuff I haven't though of so far. A reason for this blogpost if to get your opinion? What do you think? Would this be useful? Would is make MySQL easier to use?

And example why this is useful: instead of having to add columns to a table for some new attribute, attributes that are useful mostly for the applications, is then done by just adding the to a JSON column in the table and the application can add what it wants to that column. We have all seen this, haven't we? Some odd column that the application wants to keep track of in some table, some columns the data is which is pretty uninteresting as data, but is still necessary by the application? This is where a proper JSON type would help? Also, the MariaDB Dynamic Columns help with that, but I think JSON would be a far more elegant solution?

/Karlsson

Sunday, May 19, 2013

What's the deal with NoSQL?

Everybody seems to be looking at and debating NoSQL these days, and so am I and I thought I'd say a few words about it. Which is not to say I haven't said stuff before, bit them I was mainly targeting specific attributes of many NoSQL solutions (like "eventual consistency" or, as you might call it, "instant inconsistency", What I was opposing is that "eventual consistency" has anything to do with just that, consistency. Rather, what this means is that at any point in time the system is inconsistent, and even if it might be consistent, you cannot rely on it being so. Which is fine, but don't call it consistency, call it inconsistency. Allowing a database to be somewhat inconsistent doesn't necessarily mean that it's something wrong with it).

All this said, what is going on here, why are we MySQL and MariaDB users seeing so many MongoDB, Cassandra and LevelDB applications pop up? Come on, these are typically less functional implementations of a database than even the most basic MySQL setup? No transactions, no joins, no standards etc. etc. And the answer, if you want to hear what I have to say, is ease of use. So let's explore that a bit.

Following the Object Orientation frenzy of the 1990s, when any application project ended up consisting of endless sessions modeling objects, usually involving expensive consultants, dresses in expensive, blue suits. And when that was done (which took years!) you had a way cool object model, but no money left to do the actual implementation, i.e. do the real programming (shiver), and you went to some other project and the nicely dressed object design consultant left to see another OO sucker.

Now, objects are much more standard, even non-OO languages have a big chunk of OO features, and these are used enhance programmer productivity and better code and design. Which is fine (except that if you were one of those OO consultants, which means you are now out of a job, as such mundane tasks of writing is not something you would ever do, such dirty stuff is better left to "programmers". Oh no, I forgot that you are now an ITIL consultant, that just slipped my mind) but how does this relate to MySQL and MariaDB. The answer is that MySQL, which was once considered real easy to use, no longer is as easy as it used to be. The Relational data model is still brilliant when you look at data as data, and that is how many of us look at it, so we go through the process of mapping data to objects, if that is what it takes. SQL and Java, PHP or whatever merges, and the application now contains a layer mapping objects to real data. Or we use hibernate, which does this automatically for us.

But a new cadre of developers are emerging, and they look at OO as natural and they look at objects as data (it's not. Data, in my mind, should be independent from the application using it, objects on the other hand, are closely tied to the application at hand). With which I do not mean that there is something wrong with building applications using objects, quite the opposite. But if all you know is objects, then using relational technology turns difficult, and SQL, for all the good things with it, seems old-fashioned and arcane, which it is (but it is so widely used you cannot avoid it). So you go with something that looks at objects as all you need, and present that in some object format. Like JSON.

And again, there is nothing wrong with that. But if we who are on the SQL and Relational track just discards these NoSQL technologies, we are not making any friends. We have to accept that MySQL and MariaDB really aren't that easy to use anymore, at least not for newcomers.

And then there is another thing: Some data, like Big Data, has attributes that really doesn't fit well in a relational model. Data where the attribute of a value can't easily be determined once and for all, and you need to reprocess that data (large test objects, images and maps are some examples). In this case, you really need to extend the relational model, somehow.

But SQL-based relational isn't going away. The Relational model is still one of the best ways to look at data, it's just that we also need some other ways of looking at data. And it needs to be easier to access. And we shouldn't really have to push SQL down the throat of every single developer, trying to develop an application using some OO technology. The answer is we need both. And these technologies needs to interoperate. I want to use SQL for my data. But I also want JSON and REST for my data. And there shouldn't be much of a performance overhead. All in all, we SQL folks need to wake up and data easier to use again. We know data better than the Cassandra and MongoDB folks. We know transactions better than them too. But they know how to work with developers who doesn't know who The Beatles were and make Relational easy to use for them, without them having to learn JSON (and now having to listen to a tirade about todays youngsters not knowing what real music is and that it died with John Lennon! What? You don't know who John Lennon was! That's exactly what I mean, you have no taste at all!).

Just my 2 cents...

/Karlsson

Monday, May 13, 2013

Version 1.6 of mysqljsonimport now available

Yes, finally! This took some time, but I have been so busy with other things, work-related as well as domestic, that I just haven't had the time for this. But finally version 1.6 is available for download from sourceforge. The downloads is as usual the autoconf enabled source code and PDF documentation in PDF.

So, what is new you ask, well there is one big new feature which took a lot more effort than I expected. When this program was written at first, I still have the table/use use in mind. What this means is that I visioned JSON objects to be mapped to a table. This is not how programmers view JSON, but this is how data is viewed in many databases, even NoSQL ones such as MongoDB. So I wanted an import tool for simple row-structured JSON objects.

Now, there is a different way to look at things, which is to see the data in the JSON file as objects, and each member as one or more rows in a table. This sort of makes up an object (yes, this is very simplistic, but you get the point). So data might look like this:
[
{"nodename": "server1"
"users": [{"id": "joe", "name": "Joe bloggs"},
{"id": "sue", "name": "Sue Bloggs"}
],
"hosts" [{"name"; "internal", "address": "192.168.0.78"},
{"name": "external", "address": "11.186.19.177"}
]
},
{"nodename": "server2"
"users": [{"id": "dick", "name": "Rickard Bloggs"}
],
"hosts" [{"name"; "internal", "address": "192.168.0.75"},
{"name": "external", "address": "11.186.19.161"}
]
}
]
Here we would be loading into tables users and hosts and we would load some 7 rows in those two tables. I think what is also clear is that there is a whole bunch of stuff here to make this smarter, like other fields of the object affecting the data that is loaded, either being added to the data or to filter what data is loaded. But none of that is in place right now, for this version, this is just a simple object to table load. The old row-by-row formats are still supported (plain JSON format and Array format).

Also, something cool to add is to add support for MariaDB dynamic colums. I have some ideas here, but I have yet to write the code.

In addition, this release adds a --dry-run option has been added, which allows you to test config files and settings, before starting to load.

I'm planning to write more about MySQL / MariaDB and JSON here eventually, and also about plain JSON, but for now, have fun, take care and happy SQLing.

/Karlsson

Tuesday, April 30, 2013

JSON with MariaDB and MySQL Slides available

As you may know, I'm a big fan of JSON and I am working on releasing a new version of my JSON tools real soon. Meanwhile on the SkySQL Solutions Day on April 26 I gave a talk on JSON with MySQL and MariaDB, and the slides are available on slideshare.

/Karlsson

Wednesday, April 24, 2013

In Santa Clara now. 2 talks coming up

I'm in Santa Clara for Percona Live now, and things are looking good! The announceent on the merger of Monto Program and SkySQL is a good one! If you are around, I'll be speaking on MySQL on AWS on Thursday at 1:50 PM in Ballroom F. On Friday at SkySQL Solutions day (if you are at Percona Live and don't know about this, registration is free! Come see us, the program is here: http://www.skysql.com/content/mysql-cloud-database-solutions-day-schedule) I'll be talking about MySQL and MariaDB with JSON at 11:15 AM in Grand Ballroom B!

I'm also about to release a new version of my MySQL JSON tools real soon!

/Karlsson

Thursday, July 26, 2012

MySQL JSON import tool version 1.4 is out

This is just a quick notice that there is a new version of the JSON import tool available now. The main thing in it is that it fixes an iddues with threadinig that caused a threaded load (and this is the default) to hang of there was an error or the import was interrupted. Also, there are some minor fixes and also a runtime status printing, available by sending a SIGUSR1 signal to the process, feature is available.

Download from sourceforge.

Cheers
/Karlsson

Tuesday, July 24, 2012

New MySQL JSON import and export tools

Yo!

My idea for a JSON based SQL schema tool that I wrote about some time ago has been on the backburner as I wanted to finish two other projects:

A better JSON import tool to substitute the old mysqljsonload.
Fix up mysqljsonexport to align it with mysqljsonimport.

These two projects are now done and the program are ready for download from Sourceforge, mysqljsonimport from here and mysqljsonexport from here. There is some serious changes to these program here, for example, the multi-threading in mysqljsonimport is now "dynamic", so that as many threads as needed (up to a user specified limit) are created to support faster loads and in case of mysqlexport, mainly this now has it's options aligned with mysqlimport. This means that now you can export and import a whole database in JSON with these tools with a single command.

As usual with my tools, there are A LOT of options and a lot of flexibility, you can specify fixed values for certain MySQL columns to import or export, and both tools also support an automatic internal increment feature. Also, MySQL columns imported may now contain MySQL functions or constants, basically anything that is allowed as a value in a MySQL INSERT can be used.

Hope you enjoy these tools, there is still a few things missing, like better documentation and more tests and stuff, and also better parallel export for mysqljsonexport, but that is for another version of these tools.

As you might remember, these tools started off as a tool to do some InnoDB, NDB and MongoDB benchmarking. I never got as far with that as I wanted to, but now I have the tools to complete those benchmarks. Also I will soon start working on a tool for that database-schema-in-JSON tool, so stay tuned to this channel folks, and don't touch that dial!

Cheers
/Karlsson