Friday, May 31, 2013

MyQuery 3.5.2 Beta available

I have just posted version 3.5.2 of MyQuery on sourceforge. This version adds just one feature: The ability to run LOAD DATA LOCAL INFILE and in the process adds a few features to this, such as a progess window and the ability to stop a running load.

Download and enjoy!

/Karlsson

Monday, May 27, 2013

JSON datatype in MariaDB prototype

I have patched up a MariaDB version with JSON support, just for the fun of it. This is not the best version of MariaDB around, as I am not a MariaDB developer by any means, and although I have played with the MySQL and MariaDB sources before, I have never attemped to look like I know it in detail. So although this works, it's probably full of memory leaks, errors and bad code (my additions, that is).

That said, it actually works. Surprise! For a simple prototype, that is to show off a specific feature.

So, this is what I have: I grabbed MariaDB 10.0.2 sources and worked from there. To support the JSON specifics, I included the Jansson C-library for JSON. So far so good, then I wanted a JSON datatype, that was the first step. Adding a new datatype to MariaDB / MySQL is a pretty major undertaking though, so I decided to try something different, I decided to kidnap an existing datatype, preferably one with attributes that I could reuse for my JSON type, so I ended up grabbing the little used MEDIUMBLOB datatype.

I also decided to use plain JSON text as the representation, and not use some compacted binary representation, the only compaction I do is to remove duplicate members and unnecessary spaces. So for a JSON object like this:
{"fname":  "John", "dept": {  "name":  "accounting", "id": 1 }, "lname": "Doe", "fname":  "George"}
I will store
{"fname":"George","lname":"Doe","dept":{"name":"accounting","id":1}}
Which is no big deal I guess. So with this in hand, I can now create a table like this:
CREATE TABLE appuser(id VARCHAR(10) NOT NULL PRIMARY KEY, name JSON);

The JSON datatype checks that the input string is valid JSON, which is different from the way dynamic columns work. Also, the data is compacted as mentioned above.

Let's insert some data to test this out:
INSERT INTO appuser VALUES('johndoe', '{"fname": "John", "lname": "Doe", "dept": {"name": "accounting", "id": 1}, "role": "admin"}');
INSERT INTO appuser VALUES('susanscott', '{"lname": "Scott", "fname": "Susan", "dept": {"name": "services", "id": 2}}');
INSERT INTO appuser VALUES('janeroe', '{"lname": "Roe", "fname": "Jane", "dept": {"name": "accounting", "id": 1}}');
INSERT INTO appuser VALUES('tomdraper', '{"fname": "Tom", "lname": "Draper", "dept": {"name": "management", "id": 3}, "subordinates": ["johndoe", "susanscott"]}');

Following this, we can do some simple JSON manipulation like this;
SELECT id FROM appuser WHERE json_object_get(name, 'role') = 'admin';

And if the name of the 'admin' role changes to 'administrator' we can do this:
UPDATE appuser SET name = json_object_set(name, 'role', 'administrator') WHERE json_object_get(name, 'role') = 'admin';

And to find whoever is supervising Susan Scott, we can do this:
SELECT id FROM appuser WHERE json_array_contains(json_object_get(name, "subordinates"), "susanscott");

Finally, John Doe is actually the supervisor for Jane Roe, so let's fix that:
UPDATE appuser SET name = json_object_set(name, "subordinates", '["janeroe"]') WHERE id = 'johndoe';

I have a few more things in there, and there are many things missing, this was done just to get a feel for things. The main tool for all thsi is Jansson, as mentioned above, and Jansson has a bunch of cool tricks up it's sleeve. We could easily implement merging of object and of arrays, more array management functions and stuff like that.

Again, what do you think? Is this useful? I sure think this is easier to use than the Dynamic Columns (but these have other advantages). I still have things to do, both functions to be added as well as SELECT ... INTO OUTFILE and LOAD DATA INFILE and stuff like that.

Does anyone want to play with this? I do have a x86_64 build if someone wants to try it.

/Karlsson

Sunday, May 26, 2013

SQL and JSON, what do you think?

As you might know, I'm a big fan of JSON. One big reason is that I believe that JSON is closer to most developers view on data, whereas the Relational SQL based model is closer to what someone working with data itself or someone working with infrastructure. What I mean here is that neither view is wrong, but they are different.

So, given that, can we merge the Object JSON world with the relational model? Well, not JSON, but Hibernate does it quite well. This is one of my objects to the NoSQL world, that the datamodel is closely linked to the application at hand, and less so to data itself and to other applications. Stuff such as accounts, privileges, accounting data, orders and many other things are global, and are not specifically connected a specific application, but in many NoSQL applications, this is what it ends up being.

And there are not that many good solutions, how can I easily explore data in a NoSQL database, where the data model is application based, without me knowing the application? See what a high-ranking 10gen person, Dwight Merriman (Chairman of the board) has to say on the subject in an interview in 2011.

On the other hand, we relational folks has a few things to answer to. We insist that we know how data is to be managed (and I think we do) and we seem to ignore the fast that development tools and environments has changed sine the 1980s (we still insist that proprietary client/server protocols is the way to go, and that SQL rules), In my mind, SQL and Relational rules for data, but not necessarily for applications, and we (including yours truly) should wake up and appreciate what is going on here: We need better, more developer focused, means of accessing data, even if that data is relational data.

The NoSQL camp is getting ready to fix what they can fix: proper query languages (Cassandra CQL being one), better data access methods and probably much more that I am not aware of right now.

Which bring me to what we can do to be more developer friendly on the SQL and relational side of things. And the answer is, as I have already hinted, JSON support! MariaDB has a few trucks up it's sleeve, like the Dynamic Columns support. But this is nowhere near enough, the idea is OK but the way to use dynamic columns for a developer doesn't help much, rather the interface is more targeted at making the Cassandra Storage Engine features easier to access from the MariaDB SQL Interface. But I'll do a blog later on the Dynamic Column support and JSON later on anyway.

What I would want, but maybe that's just me, is a proper JSON datatype, with JSON functions, syntax and semantics. That is a starting point I think, and it shouldn't be too difficult. But many other JSON related things that we could do spring to mind:

  • SQL / JSON language mix - What I mean here is that JSON isn't just a string datatype, but that we can include plain JSON, where approprite, in the SQL:
    INSERT INTO table VALUES(57, {"empno": 1, "ename": "Scott})
  • JavaScript stored procedures
  • ROW data as JSON support in the MySQL API
  • JSON column data as ROW data in the MySQL API and tools.
  • A JSON Storage Engine
  • JSON support in LOAD DATA INFILE
  • JSON support in SELECT .. INTO OUTFILE
  • REST / JSON interface to complement the MySQL interface
  • JSON attribute indexing
A then again some stuff I haven't though of so far. A reason for this blogpost if to get your opinion? What do you think? Would this be useful? Would is make MySQL easier to use?

And example why this is useful: instead of having to add columns to a table for some new attribute, attributes that are useful mostly for the applications, is then done by just adding the to a JSON column in the table and the application can add what it wants to that column. We have all seen this, haven't we? Some odd column that the application wants to keep track of in some table, some columns the data is which is pretty uninteresting as data, but is still necessary by the application? This is where a proper JSON type would help?  Also, the MariaDB Dynamic Columns help with that, but I think JSON would be a far more elegant solution?

/Karlsson

Sunday, May 19, 2013

What's the deal with NoSQL?

Everybody seems to be looking at and debating NoSQL these days, and so am I and I thought I'd say a few words about it. Which is not to say I haven't said stuff before, bit them I was mainly targeting specific attributes of many NoSQL solutions (like "eventual consistency" or, as you might call it, "instant inconsistency", What I was opposing is that "eventual consistency" has anything to do with just that, consistency. Rather, what this means is that at any point in time the system is inconsistent, and even if it might be consistent, you cannot rely on it being so. Which is fine, but don't call it consistency, call it inconsistency. Allowing a database to be somewhat inconsistent doesn't necessarily mean that it's something wrong with it).

All this said, what is going on here, why are we MySQL and MariaDB users seeing so many MongoDB, Cassandra and LevelDB applications pop up? Come on, these are typically less functional implementations of a database than even the most basic MySQL setup? No transactions, no joins, no standards etc. etc. And the answer, if you want to hear what I have to say, is ease of use. So let's explore that a bit.

Following the Object Orientation frenzy of the 1990s, when any application project ended up consisting of endless sessions modeling objects, usually involving expensive consultants, dresses in expensive, blue suits. And when that was done (which took years!) you had a way cool object model, but no money left to do the actual implementation, i.e. do the real programming (shiver), and you went to some other project and the nicely dressed object design consultant left to see another OO sucker.

Now, objects are much more standard, even non-OO languages have a big chunk of OO features, and these are used enhance programmer productivity and better code and design. Which is fine (except that if you were one of those OO consultants, which means you are now out of a job, as such mundane tasks of writing is not something you would ever do, such dirty stuff is better left to "programmers". Oh no, I forgot that you are now an ITIL consultant, that just slipped my mind) but how does this relate to MySQL and MariaDB. The answer is that MySQL, which was once considered real easy to use, no longer is as easy as it used to be. The Relational data model is still brilliant when you look at data as data, and that is how many of us look at it, so we go through the process of mapping data to objects, if that is what it takes. SQL and Java, PHP or whatever merges, and the application now contains a layer mapping objects to real data. Or we use hibernate, which does this automatically for us.

But a new cadre of developers are emerging, and they look at OO as natural and they look at objects as data (it's not. Data, in my mind, should be independent from the application using it, objects on the other hand, are closely tied to the application at hand). With which I do not mean that there is something wrong with building applications using objects, quite the opposite. But if all you know is objects, then using relational technology turns difficult, and SQL, for all the good things with it, seems old-fashioned and arcane, which it is (but it is so widely used you cannot avoid it). So you go with something that looks at objects as all you need, and present that in some object format. Like JSON.

And again, there is nothing wrong with that. But if we who are on the SQL and Relational track just discards these NoSQL technologies, we are not making any friends. We have to accept that MySQL and MariaDB really aren't that easy to use anymore, at least not for newcomers.

And then there is another thing: Some data, like Big Data, has attributes that really doesn't fit well in a relational model. Data where the attribute of a value can't easily be determined once and for all, and you need to reprocess that data (large test objects, images and maps are some examples). In this case, you really need to extend the relational model, somehow.

But SQL-based relational isn't going away. The Relational model is still one of the best ways to look at data, it's just that we also need some other ways of looking at data. And it needs to be easier to access. And we shouldn't really have to push SQL down the throat of every single developer, trying to develop an application using some OO technology. The answer is we need both. And these technologies needs to interoperate. I want to use SQL for my data. But I also want JSON and REST for my data. And there shouldn't be much of a performance overhead. All in all, we SQL folks need to wake up and data easier to use again. We know data better than the Cassandra and MongoDB folks. We know transactions better than them too. But they know how to work with developers who doesn't know who The Beatles were and make Relational easy to use for them, without them having to learn JSON (and now having to listen to a tirade about todays youngsters not knowing what real music is and that it died with John Lennon! What? You don't know who John Lennon was! That's exactly what I mean, you have no taste at all!).

Just my 2 cents...

/Karlsson

Friday, May 17, 2013

MyQuery 3.5.1 beta released!

After a lot of fuzz, I am now releasing MyQuery version 3.5.1. This version introduces one big feature, a brand new Dictionary viewer. In addition to that, there are numerous bug fixes and the removal of one feature, which is the option to run with just 1 connection: In this version, 2 connections will always be used, and I have some good reasons to remove this as being optional, fact is, running with 1 connection was hard to diagnose, caused a lot of problems, and had no real benefit actually, just drawbacks.

So, for you Windows users, MyQuery 3.5.1 is now out there, but it is really a beta. The beta is caused by the new Dictionary viewer, the rest should be pretty stable.

Download it from sourceforge.

Happy SQLing
/Karlsson

Monday, May 13, 2013

Version 1.6 of mysqljsonimport now available

Yes, finally! This took some time, but I have been so busy with other things, work-related as well as domestic, that I just haven't had the time for this. But finally version 1.6 is available for download from sourceforge. The downloads is as usual the autoconf enabled source code and PDF documentation in PDF.

So, what is new you ask, well there is one big new feature which took a lot more effort than I expected. When this program was written at first, I still have the table/use use in mind. What this means is that I visioned JSON objects to be mapped to a table. This is not how programmers view JSON, but this is how data is viewed in many databases, even NoSQL ones such as MongoDB. So I wanted an import tool for simple row-structured JSON objects.

Now, there is a different way to look at things, which is to see the data in the JSON file as objects, and each member as one or more rows in a table. This sort of makes up an object (yes, this is very simplistic, but you get the point). So data might look like this:
[
{"nodename": "server1"
 "users": [{"id": "joe", "name": "Joe bloggs"},
    {"id": "sue", "name": "Sue Bloggs"}
  ],
  "hosts" [{"name"; "internal", "address": "192.168.0.78"},
    {"name": "external", "address": "11.186.19.177"}
  ]
  },
{"nodename": "server2"
 "users": [{"id": "dick", "name": "Rickard Bloggs"}
  ],
  "hosts" [{"name"; "internal", "address": "192.168.0.75"},
    {"name": "external", "address": "11.186.19.161"}
  ]
  }
]
Here we would be loading into tables users and hosts and we would load some 7 rows in those two tables. I think what is also clear is that there is a whole bunch of stuff here to make this smarter, like other fields of the object affecting the data that is loaded, either being added to the data or to filter what data is loaded. But none of that is in place right now, for this version, this is just a simple object to table load. The old row-by-row formats are still supported (plain JSON format and Array format).

Also, something cool to add is to add support for MariaDB dynamic colums. I have some ideas here, but I have yet to write the code.

In addition, this release adds a --dry-run option has been added, which allows you to test config files and settings, before starting to load.

I'm planning to write more about MySQL / MariaDB and JSON here eventually, and also about plain JSON, but for now, have fun, take care and happy SQLing.

/Karlsson