Karlsson on databases and stuff: October 2013

Tuesday, October 29, 2013

Bet the company! Just do it!

The term "bet the company" means that a company, large or small, puts everything it has into one big move. Betting the company when the company is small is one thing; this happens every day (you can say that MySQL did this, but they did it when they were small and there wasn't really much to bet with). It is true that Henry Ford bet the company on the T-Ford, but Ford really wasn't that big when Henry Ford started building the Model T using the assembly line, which meant that the cars could be made much less expensively.

But how often do you see a major company betting itself on one single product line? Not that often, I say, but that might be because I do not know industries outside IT and cars that well, and also because you tend to forget the failures and the companies associated with them.

One reasonably well-known failure was the Schlitz Brewing Company, which went downhill in the early 1980's after betting on new brewing technologies to make a less expensive beer and increase margins. In the process, though, the beer ended up tasting of even less than other American beers of that time, if that is possible, and it also looked awful. So Schlitz lost its customers, went downhill and was acquired a few times. Their most innovative brewery, as far as technology goes, still exists, is owned by Anheuser-Busch and produces, well, you can guess it by now, Bud Light.

Anyway, enough of interesting things such as cars and beer, let's get back to boring computers again. Some of the biggest bet-the-company projects we have seen are in the IT industry. One is when Microsoft decided to go with Windows, skipping out of the development of OS/2. If you look back at it now, and consider that at that time IBM was still the dominant player, and also that IBM had brought Microsoft to the position it was in (by giving them the right to print money, sort of, in the shape of allowing Microsoft the rights to DOS), this was a pretty risky bet by Microsoft, but it worked out. There were more IT bet-the-company projects in the 1980's, from Commodore and Atari and so on, but they largely failed. Also, we have Apple giving us the Mac, but in this case Apple as a corporate entity never bet on the Macintosh until it had proven itself in the marketplace.

But I think that one of the most substantial bet-the-company projects ever, in any industry, was when IBM bet on the 360 architecture in the beginning of the 1960's. At the time IBM had a huge array of conflicting and incompatible lines of computers. This was the case with the computer industry in general at the time, as it was largely a custom or small-scale design and production industry, but IBM was such a large company that the problems with this were getting obvious: when upgrading from one of the smaller series of IBM computers to a larger one, the effort of that transition was so big that you might as well go for a competing product. The effort was actually similar, as you had to redo everything. So IBM was getting some serious competition from the "BUNCH" (Burroughs, Univac, NCR, CDC and Honeywell), as they were called later, when General Electric and RCA left the field of computing in the late 1960's.

Yes, the IBM 360 was truly revolutionary. It was a whole new range of computers, including software, hardware and peripherals. It had a whole bunch of features we now take for granted, like 8-bit byte addressing (yes, this was not an obvious feature, and the term "byte" itself, although not invented for the 360, was first used by IBM and was made 8 bits and nothing else with the 360-series). The risk that IBM took with the 360 series was enormous, and this is another factor in what betting the company is all about: taking a really big risk. Looking at what Xerox did, or rather didn't do, with PARC explains a big chunk of why PARC was a failure, in terms of turning Xerox around, and why the 360 was not. The 360 really did turn IBM around. Many at IBM were not happy with the way that the 360-series was developing and what it was doing to existing IBM product lines, but the IBM management at that time stood by their decisions. As for Xerox in the 1970's, they were much less supportive of PARC (and this is a gross understatement).

Anyway, the IBM management that dared to take on the 360-project is to be applauded, and I think we need more of this kind of daring management. Stuff is just happening too slowly and it's time for a quantum leap. And although there is a risk to betting the company, the gains can be enormous, and NOT taking that risk is also a way of betting the company (look at those office equipment manufacturers that didn't dare to go into computers).

I think this is one of the issues with many Open Source projects: they are run and supported by people using the project the way it works today, who hence have limited interest in taking a huge risk on some new, unknown technology that currently has limited value for them. As Henry Ford is supposed to have said: "If I had asked people what they wanted, they would have said they wanted a faster horse".

/Karlsson

Wednesday, October 16, 2013

MariaDB Dynamic Columns client API

I have blogged on using MariaDB Dynamic Columns already, and I hope this was a useful introduction. I have a few more things to say on this subject though, and one so far little known and little used feature is the Client API for Dynamic Columns; see the MariaDB Knowledge Base for details. What this is all about is that Dynamic Columns were originally envisioned as a means of managing the "dynamic columns" used by Cassandra when using the MariaDB Cassandra Storage Engine. But as we realize, this is the server side of things (the Storage Engine), and there is also a corresponding client library, which is part of the MariaDB Client API.

As you have seen if you have read my previous blog on this subject, or whatever is written about MariaDB Dynamic Columns elsewhere, which is not much, MariaDB actually has no external representation of the dynamic column: either you get the binary representation as it is, or you parse it using one of the supplied functions (like COLUMN_GET to get the value of an attribute in a dynamic column). The only exception is the function COLUMN_JSON, which retrieves the whole dynamic column and converts it to JSON, supporting nested objects also. Regrettably, there is no built-in means of adding a dynamic column in JSON format (Ulf Wendel has provided some JSON User Defined Functions, but they don't line up with Dynamic Columns).

Now, if we assume we are not using the COLUMN_GET function or something like that, can I programmatically parse a dynamic column on the client? Well, I could sure use the server functions again, calling COLUMN_LIST and functions like that from a SELECT statement or in a Stored Procedure (I'll show this later), but what is really handy (if you are a C-developer like myself) is the Dynamic Columns API functions in the MariaDB Client Library.

To begin with, these functions are not as easy to use as it might seem from the Knowledge Base article mentioned above. There are some bugs in what files are required that currently make it kind of messy to build programs that use the Dynamic Columns API (which brings up another question: How many times can you write Dynamic Columns in one single blogpost?). This will be fixed eventually (these are known, reported bugs we are talking about here), but until it is, let me show you some examples, so you get the point of this API.

First, the API assumes that data passed to it is in the form of a DYNAMIC_COLUMN struct, which is a string with a specified length, i.e. it is NOT a null-terminated string but instead it is really, truly binary. This might be silly, but it is just how things are. So the DYNAMIC_COLUMN struct is set up with the binary data you got from a MariaDB Dynamic Column. Still, this might not be valid Dynamic Column data, so you should call mariadb_dyncol_check() on it to make sure that it is.
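To show why the explicit length matters, here is a minimal sketch in C. Note that the bin_buf struct and the two helper functions are hypothetical stand-ins made up for this illustration; they are not the real DYNAMIC_COLUMN definition from the MariaDB headers. The only point is that a buffer which carries its own length keeps bytes that a C string function would silently drop at the first zero byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a length-carrying binary buffer.
   This is NOT the real DYNAMIC_COLUMN struct. */
typedef struct {
    const char *str;    /* raw bytes, may contain '\0' anywhere */
    size_t      length; /* number of valid bytes in str */
} bin_buf;

/* Wrap raw bytes together with their length. With the MySQL/MariaDB
   C API the length of a fetched column would typically come from
   mysql_fetch_lengths(), never from strlen(). */
static bin_buf bin_buf_from_bytes(const char *data, size_t len)
{
    bin_buf b;
    assert(data != NULL);
    b.str = data;
    b.length = len;
    return b;
}

/* How many bytes would survive if the buffer were treated as a
   null-terminated string: everything up to the first zero byte. */
static size_t bytes_seen_as_c_string(const bin_buf *b)
{
    const char *nul;
    assert(b != NULL);
    nul = memchr(b->str, '\0', b->length);
    return nul ? (size_t)(nul - b->str) : b->length;
}
```

If the two numbers differ for a value you fetched, then any code path that copies it with strcpy() or measures it with strlen() will truncate it, and the truncated data will most likely no longer pass a validity check.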

The function that you then want to use is by far the most powerful one, mariadb_dyncol_unpack(). This will unpack a MariaDB dynamic column and retrieve the basic information on it:

How many attributes there are (how many "Dynamic Columns")
What the names of those attributes are.
What the values of those attributes are.

The values are returned as DYNAMIC_COLUMN_VALUE structs and the names as MYSQL_LEX_STRING structs. Most of this is described in the knowledge base article mentioned above.
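To give a feel for what you do with a count, an array of names and an array of values, here is a small sketch. It deliberately does not use the MariaDB headers: attr_name, attr_value and format_attrs() are hypothetical stand-ins invented for this example, with only two value types, so treat it as an illustration of walking two parallel arrays and not as the real structs or the real API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins, NOT the real MYSQL_LEX_STRING and
   DYNAMIC_COLUMN_VALUE structs. Names and strings carry a length
   and are not assumed to be null-terminated. */
typedef struct { const char *str; size_t length; } attr_name;
typedef enum { ATTR_INT, ATTR_STRING } attr_type;
typedef struct {
    attr_type   type;
    long long   int_value;   /* used when type == ATTR_INT */
    const char *str_value;   /* used when type == ATTR_STRING */
    size_t      str_length;
} attr_value;

/* Write one "name=value" line per attribute into out.
   Returns the number of characters written (excluding the
   terminating '\0'), or -1 if out is too small. */
static int format_attrs(const attr_name *names, const attr_value *vals,
                        unsigned count, char *out, size_t out_size)
{
    size_t used = 0;
    unsigned i;
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (i = 0; i < count; i++) {
        int n;
        if (vals[i].type == ATTR_INT)
            n = snprintf(out + used, out_size - used, "%.*s=%lld\n",
                         (int)names[i].length, names[i].str,
                         vals[i].int_value);
        else
            n = snprintf(out + used, out_size - used, "%.*s=%.*s\n",
                         (int)names[i].length, names[i].str,
                         (int)vals[i].str_length, vals[i].str_value);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;   /* output error or line did not fit */
        used += (size_t)n;
    }
    return (int)used;
}
```

The "%.*s" format is what makes this safe for length-counted names: it prints exactly as many bytes as the length says, instead of running on until it finds a zero byte.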

I spent some time playing with this, and most of that time was spent understanding how the API was meant to work and how to build an application on it. One night, after some heavy drinking, I decided that the best way to show this would be to write some code, instead of just blogging about it on and on, so still severely drunk I wrote a simple tool for extracting Dynamic Column data from MariaDB and showing it in some different ways (JSON, pretty JSON, indented dynamic columns etc). To allow you to build it yourself I used autotools, but the fact is that to use that, you also have to copy some include files from the MariaDB source distribution (the reason is those bugs I mentioned above).

So there is a pre-built executable (I built this on Ubuntu 10.10) that assumes you use the MariaDB 10.x Client Library (I used 10.0.4). If this doesn't work, as I said, you can always build it yourself (which isn't a thoroughly tested procedure). Also, for any of this to work, you need the Jansson JSON library.

I'll develop some more Dynamic Columns related tools eventually, but for now what I have in the dyncoltools toolset is the dyncoldebug tool, which is available on SourceForge.

/Karlsson