Monday, January 26, 2009

MySQL Embedded and Technically Embedded at the User Conference

Yepp, it's back again. By popular request, libmysqld is here with us again, having been off to attend other important library matters, whatever those are. But before I go on about this, let me tell what the heck I'm talking about, if you missed the neginning here.

MySQL can be embedded in your application, this is what is often called MySQL Embedded. If you are an ISV (Independent Software Vendor) or OEM (Original Equiment Manufacturer) then this should interest you. If you have never theard those acronyms before, then what this means is that you represent a company that, as part of some unique solution, integrates components from outside vendors. such as MySQL (and you you think that OEM in particular is a weird acronym, theen you are not alone, but this is historical).

Now, assume you build an application that might not be completely Open Source or use an license that is incompatible with GPL. So you have to use non-GPL software. Then this is where MySQL Embedded comes in. It has a different license (and a price), that allows you to do just that.

But there is some confusion here. Embedded as I described it so far is in terms of how MySQL is bundled with an application, it really has no technical implications. So then we can talk about deeply embedded ocg technically embedded, which means that the MySQL Database runs as a component inside your application. No Windows Services. No daemons. Nothing like that. When the application starts, MySQL starts with it, and when the application shuts down, for whatever reason, MySQL stops. This is what libmysqld is. A library that contains nearly the whole MySQL Server that you link with your application.

I will be speaking about libmysqld at the MySQL User Conference later this year. I have a pretty good idea on what to talk about, you can see it for yourself on the description of my talk if you follow the link given. But I would want to hear from you. Is there anything more you want to know. I will do some benchmarking eventually (I haven't seen anyone else doing it), but that I plan to show and discuss here at the Blog. Also, if you are interested in libmysqld, what features do you lack? The current "C library-only" is a limitation, what would you like to see next? Do you have experiences with libmysqld? Good or bad? Come to my talk, and let's talk about it.

Your truly embedded
/Karlsson

Putting 5.x features to good use. Of Events, Information_schema etc.

When I first looked at EVENTs in MySQL 5.1, I was a bit unclear on what the advantages were. Hey, what's wrong with crontab? I can tell you one thing, that I realized after I started using EVENTs that they can do, that a crontab event cannot: You do NOT have to put a passord in a script! Noone really said anything about this obvious advantage, and it was only after I started using EVENTs that I figured out that obvious advantage.

That said, I have used some new features in 5.x to implement a monitoring solution for my PapaBlues website, and on top of this, I am managing the site using some Joomla components that I have developed (the whole site is built with Joomla).

I use an EVENT to once per day collect STATUS data. The way this works is that I have a PROCEDURE that is called by the EVENT. I realized that for a recurring event, and mostly for a single shot event, you really want all the code in a PROCEDURE that is then called by the EVENT in question. Why? Simple, you want to test the code, without waiting for the procedure to run. You want to run the code at will. You want to change the procedure code without having to respecify the event. No, this is way to go, trust me.

The Joomla module I have written controls the running of the EVENT, and a screenshot looks something like this:
Here I can enable or diable the EVENT. I can define how often the EVENT runs and when it starts. Which is all I need to do, really. Also, I can, using the "Run" icon, run the collector right now.

The way I collect data is by DATE. If the proceure that does the collection runs twice in a day, the previous contents are overwritten. To ensure that not TOO much data is collected, I use a table to control what status variables I want to save. To simply handling of status data, and the collection of data, I have a table that contains what I call "STATUS GROUP", and then there is another table that contains the VARIBLES that are part of each group. This makes it a lot easier to monitor the different status variables.

For each group, I can decide if the variables in that group are added to the history or not. Any variable can of course be in more than one GROUP. The Joomla interface looks like this:
Here we see a status GROUP that contain the COM_ status variables for DML commands. Plain an simple. And when viewing the status, which I do in yet another Joomla screen, I can choose which group to view, or all. Also, I can choose to view data between certain dates and also do a free text search on variables.

The tables I use to control this looks like this:
CREATE TABLE IF NOT EXISTS `papa_db_status_group`(
`id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
`updated` TIMESTAMP NOT NULL,
`name` VARCHAR(60) NOT NULL,
`description` VARCHAR(255),
`is_default` BOOLEAN NOT NULL,
`history` BOOLEAN NOT NULL,
UNIQUE(`name`),
PRIMARY KEY(`id`)) ENGINE=InnoDB;

CREATE TABLE IF NOT EXISTS `papa_db_status_group_var`(
`id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
`group_id` INTEGER UNSIGNED NOT NULL,
`variable_name` VARCHAR(60) NOT NULL,
UNIQUE(`group_id`, `variable_name`),
FOREIGN KEY(`group_id`) REFERENCES `papa_db_status_group`(`id`),
PRIMARY KEY(`id`)) ENGINE=InnoDB;

CREATE TABLE IF NOT EXISTS `papa_db_status_history`(
`id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
`history_date` DATE NOT NULL,
`variable_name` VARCHAR(60) NOT NULL,
`variable_value` VARCHAR(20480),
UNIQUE(`history_date`, `variable_name`),
PRIMARY KEY(`id`)) ENGINE=MyISAM;

As for my procedure that collects the history, it looks like this:
CREATE PROCEDURE `papa_update_status_history`()
BEGIN
DECLARE v_dummy INTEGER;
DECLARE v_today DATE;
DECLARE c1 CURSOR FOR SELECT MAX(1) FROM `papa_db_status_group_var` AS sgv
JOIN `papa_db_status_group` AS sg ON sgv.`group_id` = sg.`id`
WHERE sgv.`variable_name` = 'Uptime_since_flush_status'
AND sg.`history`;

SET v_today = CURRENT_DATE();

/* Update values that already existed. */
UPDATE `papa_db_status_history` AS sh JOIN information_schema.`GLOBAL_STATUS` AS gs
ON sh.`variable_name` = gs.`VARIABLE_NAME` AND sh.`history_date` = v_today
SET sh.`variable_value` = gs.`VARIABLE_VALUE`;

/* Insert values that didn't exist. */
INSERT INTO `papa_db_status_history`(`id`, `history_date`, `variable_name`, `variable_value`)
SELECT NULL, v_today, gs.`VARIABLE_NAME`, gs.`VARIABLE_VALUE`
FROM information_schema.`GLOBAL_STATUS` AS gs
LEFT OUTER JOIN `papa_db_status_history` AS sh ON gs.`VARIABLE_NAME` = sh.`variable_name`
AND sh.`history_date` = v_today
WHERE EXISTS(SELECT 1 FROM `papa_db_status_group_var` AS sgv JOIN `papa_db_status_group` AS sg
ON sgv.`group_id` = sg.`id`
WHERE sgv.`variable_name` = gs.`VARIABLE_NAME`
AND sg.`history`)
AND sh.`id` IS NULL;

/* Make sure that uptime_since_flush_status is always saved. */
OPEN c1;
FETCH c1 INTO v_dummy;
IF v_dummy IS NULL THEN
/* First, try to update an existing value. */
UPDATE `papa_db_status_history` AS sh JOIN information_schema.`GLOBAL_STATUS` AS gs
ON sh.`variable_name` = 'UPTIME_SINCE_FLUSH_STATUS' AND gs.`VARIABLE_NAME` = 'UPTIME_SINCE_FLUSH_STATUS'
AND sh.`history_date` = v_today
SET sh.`variable_value` = gs.`VARIABLE_VALUE`;

/* Insert the value if it doesn't exist. */
INSERT INTO `papa_db_status_history`(`id`, `history_date`, `variable_name`, `variable_value`)
SELECT NULL, v_today, gs.`VARIABLE_NAME`, gs.`VARIABLE_VALUE`
FROM information_schema.`GLOBAL_STATUS` AS gs
LEFT OUTER JOIN `papa_db_status_history` AS sh ON sh.`variable_name` = 'UPTIME_SINCE_FLUSH_STATUS'
WHERE gs.`VARIABLE_NAME` = 'UPTIME_SINCE_FLUSH_STATUS'
AND sh.`id` IS NULL;
END IF;
CLOSE c1;

END
//

And finally, the event looks like this:
CREATE EVENT `papa_update_status_history_event`
ON SCHEDULE EVERY 1 DAY
STARTS CONCAT(DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY), ' 00:00:00')
DISABLE
DO CALL `papa_update_status_history`;

Before I close this, if you wonder why I always collect the current value of the status variable 'UPTIME_SINCE_FLUSH_STATUS', then this is because, when I show my statistics, I also show the value per hour, and for this, I always need this value.

Hope this provides some help for you
/Karlsson

Friday, January 9, 2009

Sun x2250 box experiences

I'm not trying to push Sun hardware here, I'm not even paid for doing so, I'm just reading up on what is available. In my experience, the probably most common MySQL box is the DL380 from the guys that also make a nice steak-sause (I'm not sure I can mention names here). This is a nice steady workhorse of box that's been around since the Lincoln administration or something like that. A good old box anyway, with ample space for disks and stuff and there is also the smaller brother, the DL360.

Now Sun has, since last summer or so, an entry level x86 box in it's offering. I haven't tested it, but am curious of any real world experiences with this machine, which is competitively priced, it seems, the Sun X2250. This is much smaller than the DL380 (it's only 1U high) and has space for only 2 SATA disks. But today, when there are more and more SAN, NAS and stuff like that used, lack of internal disk-space isn't that much of an issue. And the thing supports 32 Gb memory. And Sun wasn't tempted to go DDR3, so memory for this puppy shouldn't be too expensive.

So has anyone used the X2250, is it any good as an entry level MySQL server box? And any other comments? (except that there is still a lack of a mid-range box in the Sun lineup, the next machine in price is much more expensive, but that I already know).

Your temporary hardware guy, with a screwdriver and soldering iron in hand
/Karlsson

Multicolumn UPDATE with Subquery the MySQL way

We all know the usual UPDATE statement, for example:
UPDATE table_1 SET column_a1 = 57;
And if we want to update more than 1 column, we do this:
UPDATE table_1 SET column_a1 = 57, column_a2 = 37;
Or, assuming we are using something else than MySQL, we can do this:
UPDATE table_1 SET (column_a1, column_a2) = (57, 37);
The last two statements do the same thing, but using slightly different syntaxes (is that a proper word?), both in the SQL standard, and both supported by most RDBMSs, but only the former one by MySQL. Which really doesn't matter much, as these are just two ways to express the same thing, and the two ways aren't really that different.

An UPDATE statement with a subquery, where you use the subquery to fetch the information for the new values for the updated columns is not an uncommon operation. It can look like this, for example:
UPDATE table_a SET column_a1 = (SELECT column_b1 FROM table_b
WHERE table_b.column_b3 = table_a.column_a3);

Nothing special with this, this is the way MySQL does it, and this is how the SQL standard says it should be done. When looking at this syntax and comparing it with an UPDATE without a subquery, you realize that if you are to update 2 columns in table_a, from 2 columns in
table_b you would do this:
UPDATE table_a SET column_a1 = (SELECT column_b1 FROM table_b
WHERE table_b.column_b3 = table_a.column_a3),
column_a2 =
SELECT column_b2 FROM table_b
WHERE table_b.column_b3 = table_a.column_a3);

Now, looking at this, one realizes that in most cases when this syntax is used, the condition in the two subqueries, will almost always be the same, we are just getting two different columns
from table_b. The SQL standard allows you to use a constriction with a ROW value in these cases, and it looks like this:
UPDATE table_a SET (column_a1, column_a2) = (SELECT
column_b1, column_b2 FROM table_b

WHERE table_b.column_b3 = table_a.column_a3);

This will achieve the same result as the second of the SQL statements above, but is less error prone (I one have to specify the query once) and it makes it easy to understand the intention of the query, as well as for the optimizer to do it's job better, oh my, what a blessing. Except that this later construct, supported by most SQL databases, and
part of the SQL standard, is not supported by MySQL. Yikes!

Which doesn't mean that you cannot DO this with MySQL. Not at all, but the MySQL syntax is a bit different, and is an extension of the SQL syntax. If you are used to some other RDBMS than MySQL, then you might not know about the MySQL specific multi table UPDATE and DELETE statements. Here we will look at how the former can be used to achieve what we did with standard SQL above:
UPDATE table_a JOIN table_b ON table_a.column_a3 = table_b.column_b3
SET table_a.column_a1 = table_b.column_b1,
table_a.column_a2 = table_b.column_b2;

In a sense, this makes some sense (sic!). The table join is an ANSI join, which is allowed here but confusingly enough, not with multi-table. But the fact remains, this is not standard SQL. But it is less error prone than the variation above with two subqueries, which IS standard SQL, and is probably faster also, unless the MySQL subquery optimizer does something very smart here.

But all in all, I'd prefer that MySQL supported the standard "row constructor" syntax (this is the term for the ( value [, value] ) syntax in general) for four reasons:
  • It IS the SQL standard way of doing things.
  • It is supported by most other RDBMSs.
  • It is just a syntactic thing (as, as can be seen above, the same can be achieved with a slightly different syntax in MySQL).
  • It is easier to see what is happening, in this case. ANSI JOINs are nice, but in this case, I think they blur the subject of what we are trying to achieve.
And to finish up, no I do not think that MySQL should NOT support the multi-table UPDATE syntax, this is a useful addition to the standard SQL syntax, but in this case, the standard is better (and it is standard), but the multi-table UPDATE can do more things that you cannot do with the "row constructor" UPDATE syntax. Maybe I should do a worklog for "row constructor" style UPDATE support for MySQL...

/Karlsson

Thursday, January 8, 2009

Of MySQL 5.1 and Joomla part 1

Some of you might know that I run a little website on my few spare hours (actually, I have several sites, but I have one that takes up some 95% of all the time I spend on these sites). The site is called PapaBlues and if you pop by some time, and you have seen it before, then you realize that there has been a very major restructuring. The old site is all gone, and the thing is now built using the Joomla CMS as the framework, whereas the old site was using homebuild PHP, HTML and SQL in a bit of a mess, like so many other sites.

I took the new site live just before Christmas, and as MySQL 5.1 had just been declared GA, I decided to use this for my site. I have to say that I am very happy with it so far, stable and performant, and with a some new useful features, I will write more about these in a later blogpost.

If you are currently using Joomla and want to upgrade to MySQL 5.x, then I can tell you that this works fine. To an extent at least, but the limitations are by far more in Joomla than in MySQL, sadly.

What I mean when I say that I use Joomla as a framework, is that I use the Joomla basic libraries, database components etc, and some base components also, but very few of the Joomla modules, components or templates. Except the articles, all the content is provided by my own home-built components that keep track of blues bands, gigs, festivals etc, all running in my own designed template (this I am a bit proud of, as I am a very bad GUI and grphic designer, but I think the look of this turned out OK).

One thing that bugs me with Joomla is that the developers of it seems set in MySQL 4.0 days, actually, in the database mdules, there is a specific "SET SQL_MODE = 'MYSQL40'" command. Come on now, Joomly guys, MySQL 4.0 is about to become extinct! In the comment to this command in the database module, there is even a comment that says that this is to overcome the issues with "STRICT" settings. Yeah right, I want it STRICT so that I know things are rights, just as I want NO WARNINGS when I compile a piece of code. No, this isn't good enough.

In theory, Joomla is database agnostic, at least on paper. In the real world, it is firmly set not only with MySQL but also with MySQL 4.x syntax and behaviour. Try to use the Joomla installer to install a stored procedure or function. It will not work, the SQL installer doesn't understand the delimiter command. This is annoying, but frankly, the stuff still works, even with MySQL 5.1.

You have complete control of your SQL in your own modules, which makes life easier. In the database tables that I use for my modules, not the tables that Joomle use interally, I use InnoDB with aggressive foreign keys all over the place, just the way I like it. I want to be able to mock around a bit with the data, even with interactive SQL, knowing that my relations are still intact.

NULL values is another area where the Joomla database developers has gone all wrong. So very wrong. In essence, they assume that NULL is the same as 0 (for a numeric values) or the empty string for a string value. This I can to an extent understand, as an HTML form, which is the base for much of the administration of a Joomla site at least, doesn't know about NULL either. This makes things a mess when it comes to FOREIGN KEYS. A NULL in a referencing key is allowed, without a matching references key, but not so a 0 or an empty string.

But why would you want that anyway? A refernce to any value is useless, right? Well, if we are talking single column keys, yes, that is probably true, but for a multi column key, this is a very useful feature. I will explain why in a later blogpost, and how this works.

All in all then, am I happy with my Joomla and MySQL 5.1 combination? Yes, sure, it works as expected. I do have quite a few issues with Joomla, the docs are no good, there are few really good books on it (some half-good are avilable now), the database support is not as good as it should be, and the database independence features that Joomla provides aren't good enough, and aren't used much even in the basic Joomla modules. Fact is, I tried running without MYSQL40 mode, and failed. And with STRICT mode, it was even worse. I gave up trying to fix this.

But it does work, it comes for free, the framework seems reasonably stable and once you get the hang of it, it is easy to use. The issue is that it just takes too much work to get the hang of it. If you have a simple site with some article you want to publish, then Joomla is just fine. If you are picky with the database interface and has some special needs, prepare to spend some time with it.

One sign, if you ask me, that the docs are not that good, is that fact that even developing an application with Joomla, not developing Joomla itself, I constantly go back to read the source.

But Joomla is helping me, the site is finally up and running, I have developed quite a bunch of useful Joomla utilities and components (one utility is a PapaTable class, derived from the JTable class, that fixes one big NULL value issue with Joomla), and people have told me they like the site, so maybe I shouldn't complain. And instead of complaining, I plan to write a MySQL 5.0 database module for Joomla, based on the MySQL 4.0 assuming ones that already exist. And then we'll see.

And BTW, don't try running Joomla with the Joomla system tables in InnoDB, it is error prone and not even that useful.

/Karlsson