Karlsson on databases and stuff: November 2012

Monday, November 26, 2012

In my mind: Why the ORDBMS idea failed

Some 15 years ago, the idea of an ORDBMS (Object-Relational Database Management System) was red hot, and I was very close to the flaming hot center of it. I worked for Informix at the time, and Informix bought Illustra, which was the hottest and coolest database of its time. Hey, it was an ORDBMS!

This was not a bad idea per se. I got entangled with it, was really enthusiastic about it, and spent a lot of time evangelizing the technology. For Informix, this was as much a market repositioning as a technical change: Informix went from being the cheap redneck cousin to being the Gordon Gekko of databases. Before this, Oracle was the Big Market Leader, Sybase was the technology leader and Informix was the price leader (no, I'm not talking about technical realities; there was a lot of good technical stuff in all of these. This is about how the world at large perceived these guys). But Illustra and another Informix project, XPS (aimed at the data warehouse market), were going to take Informix to places it had never been before. Oh, the billboard wars, the day when Informix went past Sybase, those were fun days.

From a financial POV, Informix lost it, we already know that (read "The Real Story of Informix Software and Phil White" by Steve W. Martin, ISBN: 978-0-09721822-2-5). But that is not, in my mind, the whole story. Even though I think there are many good prospects for ORDBMS systems, the idea is not as generic as I figured back then (OK, I was wrong, I admit it, it does happen).

From a technical standpoint, what went wrong (this is my take on it, by the way) was that the cool ORDBMS features, shoehorned into an aging Informix RDBMS design, ended up being largely the worst of both worlds. That has been fixed, to an extent, in more recent Informix releases, but now it's too late :-(

And from a conceptual view, this is also what I think is wrong with the whole ORDBMS way of thinking. I know and love the traditional RDBMS model, with a fixed number of columns and a variable number of rows. Even though it is a simple model, it works really well for data. It makes plain data easy to visualize and understand, and it is a well-researched and well-understood model for data. OO has also been thoroughly researched, but the implementations and their functionality differ a lot. Also, OO looks at data with a developer focus, from the point of view of an application, and for an application, objects are a natural way of looking at things that makes life easy. But representing data as objects is a different thing. Not a better or worse way, but different. The relational model also lends itself to building control structures for data, because, assuming the RDBMS is used in some kind of normalized form, it represents data at a very low level, lower than the level at which most applications or end users view data. Objects, on the other hand, are a way of combining all that data into something more application-centric.
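To make the "different levels" point a bit more concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for any RDBMS. The customer/phone schema, the Customer class and load_customer are all hypothetical, made up for illustration. The tables hold plain, normalized rows with a fixed set of columns; the application then assembles those rows into one object shaped the way the application wants to see the data.

```python
import sqlite3
from dataclasses import dataclass, field

# Relational side (hypothetical schema): fixed columns, any number of rows,
# normalized so each phone number is its own row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE phone (
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    number      TEXT NOT NULL
);
INSERT INTO customer VALUES (1, 'Anders');
INSERT INTO phone VALUES (1, '+46 8 123 45'), (1, '+46 70 555 12');
""")

# Application side: one object that combines rows from both tables.
@dataclass
class Customer:
    id: int
    name: str
    phones: list[str] = field(default_factory=list)

def load_customer(conn: sqlite3.Connection, customer_id: int) -> Customer:
    """Assemble a Customer object from the low-level relational rows."""
    cid, name = conn.execute(
        "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    phones = [row[0] for row in conn.execute(
        "SELECT number FROM phone WHERE customer_id = ? ORDER BY number",
        (customer_id,),
    )]
    return Customer(cid, name, phones)

c = load_customer(conn, 1)
print(c)
```

In this sketch the mapping from rows to object lives entirely in the application; the database neither knows nor cares that a Customer exists.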

So the ORDBMS systems turned out to be non-OO enough not to attract the OO people, while the OO features were non-relational enough to make the SQL experts ignore them (as in: "Why would I want a result set with a variable number of columns?").

And before I close this: yes, I know there are many ORDBMS applications out there that work well and that use all the cool ORDBMS features. Also, ORDBMS features are being developed in Oracle, in Postgres in particular, and in others. Inside Postgres, the ORDBMS features are a building block for more than one generic RDBMS feature. But for database people in general, ORDBMS is something we don't see much of.

/Karlsson

Thursday, November 22, 2012

Character sets, Collations, UTF-8 and all that

Yesterday, at the first real meeting of the Swedish MySQL User Group here in Stockholm, I presented a talk on character sets, collations and stuff like that. If you read this blog, you know that I have written about this before, but yesterday's presentation was a fair bit more detailed. You can view the full presentation on SlideShare.

One thing I talked a lot about was collations and how they affect matters, and this has more of an impact than you might think, in particular when using UTF-8. You would think that using UTF-8 solves most character set problems (at least when using 4-byte UTF-8), but no. Collations still come into play, there are many of them, and the effect of choosing the wrong one can be really bad.
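To see why UTF-8 alone doesn't settle the question, here is a small Python sketch (the helper name utf8_lengths is made up for illustration) that prints how many bytes each character takes. A and Ä are different code points with different byte sequences; whether they compare as equal is up to the collation, not the encoding. It also flags characters needing 4 bytes, which MySQL's 3-byte utf8 character set (utf8mb3) cannot store, while utf8mb4 can.

```python
def utf8_lengths(text: str) -> dict[str, int]:
    """Map each character in text to the number of bytes it takes in UTF-8."""
    return {ch: len(ch.encode("utf-8")) for ch in text}

lengths = utf8_lengths("AÄ€😀")
for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} {ch}: {n} byte(s), {ch.encode('utf-8').hex()}")

# A and Ä are different code points with different bytes; the encoding
# says nothing about whether a collation should treat them as equal.
print("A".encode("utf-8") == "Ä".encode("utf-8"))

# Characters that need 4 bytes do not fit MySQL's 3-byte utf8 (utf8mb3).
needs_utf8mb4 = [ch for ch, n in lengths.items() if n > 3]
print(needs_utf8mb4)
```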

Let me take an example. You would think that a UNIQUE or PRIMARY KEY constraint on a text-based column (using something like a VARCHAR or CHAR type) would only prevent two rows from holding the same string, and that any two string values that are different may coexist in two different rows. Think again.

A collation defines how the characters in a character set are sorted and compared, and most localized collations have some weird attributes to them. There are rules that linguists think are reasonable for a particular language, and that are hence present in the Unicode standard, but they might not be widely accepted by the community at large. So, back to my example. Let's say we are in Sweden; then 4 (yes, four) different collations may be applicable:

utf8_bin - A plain binary collation: comparisons are done on the binary value of the characters.
utf8_unicode_ci - A pretty reasonable collation, based on a generic Unicode compromise on how things are, and are not, sorted across the globe. Sort of.
utf8_general_ci - A simplified, faster, general variation of utf8_unicode_ci.
utf8_swedish_ci - A collation specific to Sweden, with some interesting Swedish specifics.
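One way to think of a collation is as a function that turns a string into a sort key: two strings are equal when their keys are equal, and they sort in key order. The Python sketch below models three of these behaviours under heavily simplified, assumed rules; binary_key, folding_key and swedish_key are hypothetical helpers, not MySQL's actual algorithms. folding_key (case- and accent-insensitive) roughly stands in for both utf8_general_ci and utf8_unicode_ci, whose real differences are not modelled, and swedish_key only captures that Å, Ä and Ö are letters of their own, sorted after Z.

```python
import unicodedata

SWEDISH_EXTRA = "åäö"  # separate letters in Swedish, sorted after z in this order

def strip_accents(s: str) -> str:
    """Decompose (NFD) and drop combining marks: 'Ä' -> 'A'."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def binary_key(s: str) -> bytes:
    # Stand-in for utf8_bin: compare the raw UTF-8 bytes.
    return s.encode("utf-8")

def folding_key(s: str) -> str:
    # Rough stand-in for utf8_general_ci / utf8_unicode_ci:
    # case- and accent-insensitive, so 'Ä' compares equal to 'a'.
    return strip_accents(s).casefold()

def swedish_key(s: str) -> tuple[int, ...]:
    # Rough stand-in for utf8_swedish_ci: case-insensitive, but å, ä and ö
    # get weights above every Unicode code point, so they sort after z.
    key: list[int] = []
    for ch in unicodedata.normalize("NFC", s).casefold():
        if ch in SWEDISH_EXTRA:
            key.append(0x110000 + SWEDISH_EXTRA.index(ch))
        else:
            key.extend(ord(c) for c in strip_accents(ch))
    return tuple(key)

words = ["Öl", "Zebra", "Äpple", "apa"]
for name, key in [("binary", binary_key), ("folding", folding_key),
                  ("swedish", swedish_key)]:
    print(name,
          "A == Ä:", key("A") == key("Ä"),
          "a == A:", key("a") == key("A"),
          "order:", sorted(words, key=key))
```

With this small word list the three keys give three different orders, and only the folding key treats A and Ä as the same string.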

So let's see how this works in practice. Let's try a table that looks like this:
CREATE TABLE `utf8_table` (
`swedishname` char(10) CHARACTER SET utf8
COLLATE utf8_general_ci NOT NULL,
PRIMARY KEY (`swedishname`)
) ENGINE=InnoDB;
What happens with this data:
INSERT INTO utf8_table VALUES('A');
INSERT INTO utf8_table VALUES('Ä');
In Sweden, these two are different (the second A has an umlaut, and Ä is a letter of its own in Swedish). In the rest of the world, and in utf8_general_ci, these two are the same, so the above will not work: the second INSERT fails with a duplicate key error on the PRIMARY KEY, despite the characters being different! So we try this instead:
DROP TABLE `utf8_table`;
CREATE TABLE `utf8_table` (
`swedishname` char(10) CHARACTER SET utf8
COLLATE utf8_swedish_ci NOT NULL,
PRIMARY KEY (`swedishname`)
) ENGINE=InnoDB;
And with the same data:
INSERT INTO utf8_table VALUES('A');
INSERT INTO utf8_table VALUES('Ä');
And this works as it should: both rows are inserted!
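The same behaviour can be reproduced outside MySQL as a quick experiment. This sketch assumes Python's built-in sqlite3 module: Connection.create_collation registers a comparison function under a name, and the PRIMARY KEY index compares values using the column's declared collation. The collations accent_insensitive and accent_sensitive, and the helper try_insert, are hypothetical stand-ins for utf8_general_ci and utf8_swedish_ci that only model whether A and Ä are equal, not the real MySQL sort orders.

```python
import sqlite3
import unicodedata

def _cmp(a: str, b: str) -> int:
    """Return -1, 0 or 1, as SQLite expects from a collation function."""
    return (a > b) - (a < b)

def accent_insensitive(a: str, b: str) -> int:
    # Hypothetical stand-in for utf8_general_ci: fold case and drop accents.
    def fold(s: str) -> str:
        decomposed = unicodedata.normalize("NFD", s)
        return "".join(c for c in decomposed
                       if not unicodedata.combining(c)).casefold()
    return _cmp(fold(a), fold(b))

def accent_sensitive(a: str, b: str) -> int:
    # Hypothetical stand-in for utf8_swedish_ci, as far as A vs Ä goes:
    # fold case only, so 'A' == 'a' but 'A' != 'Ä'.
    return _cmp(a.casefold(), b.casefold())

COLLATIONS = {"accent_insensitive": accent_insensitive,
              "accent_sensitive": accent_sensitive}

def try_insert(collation: str) -> list[str]:
    """Create the one-column PRIMARY KEY table with the given collation,
    insert 'A' and 'Ä', and return the rows that made it in."""
    if collation not in COLLATIONS:
        raise ValueError(f"unknown collation {collation!r}")
    conn = sqlite3.connect(":memory:")
    for name, func in COLLATIONS.items():
        conn.create_collation(name, func)
    # The collation name comes from the whitelist above, so formatting it in is safe.
    conn.execute(
        f"CREATE TABLE utf8_table (swedishname TEXT COLLATE {collation} "
        "NOT NULL PRIMARY KEY)"
    )
    for value in ("A", "Ä"):
        try:
            conn.execute("INSERT INTO utf8_table VALUES (?)", (value,))
        except sqlite3.IntegrityError as exc:
            print(f"{collation}: {value!r} rejected: {exc}")
    rows = [r[0] for r in conn.execute(
        "SELECT swedishname FROM utf8_table ORDER BY rowid")]
    conn.close()
    return rows

print(try_insert("accent_insensitive"))
print(try_insert("accent_sensitive"))
```

With the accent-insensitive collation the second INSERT is rejected with an IntegrityError and only 'A' remains; with the accent-sensitive one both rows go in, mirroring the two MySQL tables above.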

I will write another blog post on this soon, with some more examples, but for now:
Cheers
/Karlsson
PS. I apologize if you have problems reading the above; it probably comes from the embedded Swedish characters in the text :-(

Tuesday, November 13, 2012

This sucks! Well, maybe it does and maybe it doesn't...

Imagine that Microsoft and Apple got into a big fight for the market some 15 years ago and that Apple lost. Big time. Apple went down completely and there was nothing left. And as an IT expert, you were called in to look at what remained, what could be salvaged and what was just a waste of everyone's time and money.

If you had seen the iPhone back then, what would you have said? (I'm not so sure myself; chances are I would have been terribly negative.) Note that there would have been no App Store, no HTML5 sites, none of that neat stuff.

Or, to make a different analogy: was VHS better than Betamax? Well, that depends on who you ask: the end consumer wanting to rent a movie, or the techie looking at the specifications of the technology in question.

Just after the Second World War, in a Germany in shambles, the Allies went in and had a look at Germany, and with them they brought some smart dudes to look at what was useful, what was not and what was rubbish. Reginald Rootes, who together with his brother Billy ran the Rootes Group, one of the big 5 car producers in Britain at the time, came along to, among other places, Wolfsburg to have a look at the VW plant. Despite being advised that the VW was a viable product, and seeing it for himself, Reggie wasn't interested. Now, some 60+ years later, all the remaining Rootes brands and factories are long gone (the last one, producing Peugeots, closed in 2007), and VW is fighting with Toyota for the title of the world's largest car maker.

All in all, something that has serious issues might have them just because of where it is in its development, and you need to look further down the road to see the potential. And don't make the mistake of thinking that a good or bad implementation of an idea says much about the real potential of that idea.

Take virtualization. Running a database in a virtualized environment was a big no-no just a few years ago. Now things have developed, performance is much better, and many of us can use a virtualized environment for many, if not most, of our database needs, be it Oracle, MySQL, Postgres or whatever.

Go back 20 years and ask yourself how you would have reacted if someone had told you that, 20 years later, many large enterprises would run large parts of their infrastructure on an operating system developed by a Finnish student in his spare time, in an outsourced environment run by an Internet bookstore company? Would you have believed it? Nah, don't think so.

So what is the next big thing, then? I try to spend some time on that, and when I get to test or try something, I really try to separate the idea, be it a new operating system, a new type of access method or whatever, from its actual implementation. The latter says less about the former than you think.

Also, technology isn't everything. Far from it. The best technology doesn't always win. And as for the new technologies you look at, their usefulness and applicability aren't always what you think. Did the web turn out to do what we were expecting? What did you expect to be able to do with a cellphone some 15 years ago, besides making phone calls and sending text messages? I believe there is a synergy between the potential of a technology and the applications for it, and that synergy is what drives things forward. So don't be so fast to click that "This sucks" button.

/Karlsson
Sorry that this post isn't very MySQL-focused, but I think it is still applicable. Even if the implementation sucks.