Friday, November 13, 2009

InnoDB Plugin (With some Windows focus) - Part 2

In the last post on this subject, I discussed the operational benefits of the InnoDB plugin, such as creating and dropping indexes, and how much faster and less obtrusive to other operations this is with the plug-in version of InnoDB compared to the built-in one.

This time, I will discuss another benefit of the plugin: the operational metadata views it provides. A way overdue feature of InnoDB is the ability to inspect locks and lock waiters, to be able to effectively manage lock concurrency issues. In any database with even a moderately complex schema and a reasonable amount of writes, concurrency issues will happen. And the problem is that if you cannot monitor them properly, you may not know what is going on. Users have sessions waiting for locks, then tire of waiting and abandon the operation, sometimes leaving locks of their own around!

Another issue is that end-users usually cannot tell the difference between a straight performance problem, such as a too small cache, a slow disk or a bad network, and a concurrency issue, i.e. waiting for a lock. And how we find and fix these two kinds of issues is distinctly different!

OK, so now let's have a look at how this works in practice. To begin with, we need to configure the InnoDB plugin monitoring INFORMATION_SCHEMA tables, and here I will install all of them, not only the ones related to locking. The settings needed in the [mysqld] section of the my.ini file, to support the InnoDB plugin and all the corresponding monitoring tables, are then:
ignore_builtin_innodb
plugin_load="innodb=ha_innodb_plugin.dll;innodb_trx=ha_innodb_plugin.dll;innodb_locks=ha_innodb_plugin.dll;innodb_lock_waits=ha_innodb_plugin.dll;innodb_cmp=ha_innodb_plugin.dll;innodb_cmp_reset=ha_innodb_plugin.dll;innodb_cmpmem=ha_innodb_plugin.dll;innodb_cmpmem_reset=ha_innodb_plugin.dll"

Note that the "plugin_load" parameter must be on a single line! Now, restart the MySQL server and we can start to have some fun!

First create an InnoDB table for our testing:
CREATE TABLE locktest (
c1 int(11) NOT NULL AUTO_INCREMENT,
c2 char(10) DEFAULT NULL,
PRIMARY KEY (c1),
KEY c2 (c2)
) ENGINE=InnoDB;

And insert some data:
INSERT INTO locktest VALUES(1, '1');
INSERT INTO locktest VALUES(2, '2');
INSERT INTO locktest VALUES(10, '16');
INSERT INTO locktest VALUES(11, '16');
INSERT INTO locktest VALUES(15, '18');

And we are set to do some testing. First, open two sessions against the database where the test table was created, and then, in the first session, do a SELECT ... FOR UPDATE in a transaction:
BEGIN WORK;
SELECT * FROM locktest WHERE c1 > 8 FOR UPDATE;

Now, in the other session, let's try to update one of the locked rows:
UPDATE locktest SET c2 = '17' WHERE c1 = 11;

And then, in a third session, let's see what we have in the lock and lock-waits tables:
use information_schema
SELECT lock_mode, lock_type, lock_table, lock_data FROM innodb_locks;
+-----------+-----------+-------------------------+-----------+
| lock_mode | lock_type | lock_table              | lock_data |
+-----------+-----------+-------------------------+-----------+
| X         | RECORD    | `plugintest`.`locktest` | 11        |
| X         | RECORD    | `plugintest`.`locktest` | 11        |
+-----------+-----------+-------------------------+-----------+
2 rows in set (0.00 sec)
and:
SELECT * FROM innodb_lock_waits;
+-------------------+-------------------+-----------------+------------------+
| requesting_trx_id | requested_lock_id | blocking_trx_id | blocking_lock_id |
+-------------------+-------------------+-----------------+------------------+
| 6007              | 6007:0:16396:12   | 6001            | 6001:0:16396:12  |
+-------------------+-------------------+-----------------+------------------+
1 row in set (0.00 sec)

As we can see, InnoDB will only show locks that someone is waiting on, but that is cool. Look at the lock_data column in the innodb_locks table: you see the actual data being waited on there. The lock_mode column is X in both rows (the two rows are the lock requested by the waiting transaction and the lock held by the blocking one, on the same record). In your case, you may see X,GAP for one of them. That is because you have index gap locking on, or in other words, you have not set innodb_locks_unsafe_for_binlog.
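If you want to see which statements are involved, and not only the raw lock data, you can join these tables with innodb_trx. Something like this works (just my own convenience query, built on columns these three tables do provide):
SELECT r.trx_id AS waiting_trx, r.trx_query AS waiting_query,
       b.trx_id AS blocking_trx, b.trx_query AS blocking_query
  FROM information_schema.innodb_lock_waits w
  JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
  JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;
Run from the third session, this shows our waiting UPDATE and the transaction blocking it on a single row.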

All in all, tracking down concurrency issues with these tables is A LOT easier than the old "show innodb status" way of dealing with things.

/Karlsson

Friday, October 30, 2009

InnoDB Plugin (With some Windows focus) - Part 1

So the InnoDB plugin is now, from 5.1.38, distributed with MySQL. It is still not enabled by default though. What this means is that there are 2 InnoDB implementations that come with your MySQL installation: the old-school standard one, which is compiled into the MySQL binary and is enabled by default, and the plugin.

The plugin comes as a DLL, and to load it, you have to do two things:
- Disable the builtin InnoDB Engine.
- Load the plugin.
To do this, on Windows with a standard configuration, you edit the my.ini file and add the following two lines:
ignore_builtin_innodb
plugin_load=innodb=ha_innodb_plugin.dll

And that's about it; if you restart MySQL now, the plugin version of InnoDB will be used. So, what is new in the plugin then? Many things, actually: some rather cool features, a bunch of performance and scalability improvements, and some enhancements to the operational aspects.
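If you want to check which InnoDB you ended up with after the restart (my own quick check, not strictly needed), have a look at the plugin list:
SHOW PLUGINS;
With the two lines above in place, InnoDB should be listed with ha_innodb_plugin.dll as the library, and SELECT @@innodb_version should report the plugin version.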

I will write a few blog posts on the InnoDB plugin, and I will use Windows for what I am doing, mainly as I have noticed there is little in the way of documentation, blogs etc. on this subject with a Windows focus.

And let me tell you that although the performance and scalability improvements are what is most talked about, they are by far not the only changes. Even if you have OK performance today, use Windows and have little need for more scalability, there is stuff here for you. But before I go into today's subject, let me tell you one important thing: the InnoDB plugin, as it stands here, is not considered GA yet. That is planned for December this year, but right now it's considered Beta, despite the fact that MySQL 5.1.38 itself is GA. So be a bit careful here. That said, the plugin in and of itself actually is GA, so I don't think we'll see many issues here.

Now, today's lesson: Fast index creation
This is a feature that is waaay overdue, if you ask me, but it's available now. What it means is that in most cases, creating or dropping an index does not mean that the table needs to be rewritten anymore! This is a major advantage when it comes to managing a MySQL installation. To me, this alone is reason to leave MyISAM / Maria and what have you, and go InnoDB instead!

The interesting thing right now, in 5.1.38 and up, is that there are 2 InnoDB engines, and we can compare them, old and new, by just flipping a few parameters and restarting MySQL! So let's do that.

I create an InnoDB table:
CREATE TABLE t1(c1 INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
c2 VARCHAR(1024)) ENGINE=InnoDB;

The next step is to insert some data. I insert some semi-random data in the c2 column, 300 bytes long. It's not terribly random actually, but I wanted a simple test, so I went on and inserted some 311296 rows (just an arbitrary number).
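If you want to reproduce something similar, a sketch along these lines will do (this is a reconstruction, not my exact script; the data just needs to be long and vaguely random):
INSERT INTO t1(c2) VALUES(REPEAT('x', 300));
-- Each run of this statement doubles the number of rows, so
-- repeat it until you have a few hundred thousand of them:
INSERT INTO t1(c2) SELECT REPEAT(SUBSTRING(MD5(RAND()), 1, 10), 30) FROM t1;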

I then want to create an index on the first 10 bytes of the c2 column. This is what happened with the InnoDB plugin being used:
mysql> create index ix1 on t1(c2(10));
Query OK, 0 rows affected (21.75 sec)
Records: 0 Duplicates: 0 Warnings: 0

And then I drop the index:
mysql> drop index ix1 on t1;
Query OK, 0 rows affected (0.14 sec)
Records: 0 Duplicates: 0 Warnings: 0

At this point, I should say one thing. Before I created the index, I restarted MySQL, to make sure that I could "benchmark" this properly, without data in the cache.

OK, now I flip the configuration back to using the built-in InnoDB Engine, restart MySQL and run the same simple test again, and this is what happened:
mysql> create index ix1 on t1(c2(10));
Query OK, 311296 rows affected (1 min 6.28 sec)
Records: 311296 Duplicates: 0 Warnings: 0

mysql> drop index ix1 on t1;
Query OK, 311296 rows affected (43.27 sec)
Records: 311296 Duplicates: 0 Warnings: 0

Quite a difference, right? Using the InnoDB plugin, index creation is 3 times as fast! And this is not that big an index! And dropping the index is more than 300 times faster! That's the kind of performance improvement I like!

/Karlsson

Monday, October 12, 2009

Of UNICODE, UTF-8, Character sets part 2

Welcome to this second post in this series on UNICODE, character sets and what have you. In the first of these posts, I went through some of the history of character set support, and some anomalies, and finished around the mid-1990's, when we had a bunch of reasonably well standardized 8-bit character sets. And then something happens...

Jacques Chirac becomes president of France. Now wait, that wasn't it. No, what happened was the Internet, and suddenly the time of IT as isolated islands, where we could decide for ourselves how we wanted our computers to operate and what character set to use, came to an end. Suddenly, a user in France could view a webpage created in Japan. And now the issue with character sets becomes a real problem. Luckily, stuff had been going on since the late 1980's, more specifically UNICODE. Whoa, UNICODE comes to the rescue.

The first installments of UNICODE utilized a 16-bit character format. This later sneaked into operating systems, libraries and all over the place. The char datatype in C was supposed to be substituted for the wchar_t datatype (W as in Wide). This did not become very popular, but the scheme persisted and is still in use. Windows has many hooks for this, and most Windows API functions have wchar_t counterparts, and there are even portability macros (the TEXT() macro being one of them, working together with datatypes such as LPTSTR and TCHAR). This turned out to be most used in countries, mostly in Asia, with large character sets that were present in the UNICODE character set. The Windows macros and types made it easier for far-east developers to make their code portable to more ISO-8859-1 friendly languages.

What made this not so popular was that it still was a headache creating portable code, and also, the UNICODE character set was moving on, and was soon more than 16-bit. So the 2-byte datatype we used for characters (wchar_t is usually mapped to a short) wasn't enough. Now, working with 2 bytes instead of one was bad enough, but working with 4, 5 or 6 per character was just too much.

A full UNICODE character now needs up to 4 bytes to represent as such (and the original UTF-8 design even allowed for sequences of up to 6 bytes). So much for all that hard work by Microsoft and all those macros and API rewrites. Just consider looking for the end of a C-style NULL-terminated string: even with 2 fixed bytes, this is much more difficult than what it used to be. With more bytes per character, even more so!

So along come some schemes that allow you to "encode" a UNICODE character in some other, hopefully easier to manage, form. The most popular of these, by far, is UTF-8. This is a means of encoding a UNICODE character into a variable-length sequence of bytes. The nice thing with UTF-8 is that the first 128 positions are encoded exactly like old-style 7-bit ASCII: one byte, with the highest bit being 0! The way UTF-8 works means that these 128 byte values will never appear as part of any other UNICODE character, or in other words, the high-order bit is ALWAYS 1, except for these first characters.

So, given all this, it seems like functions like strlen will work with a UNICODE string? Well, sort of, but it will give you the length in bytes, not in characters. But besides that, it will work. And so will strcpy, strcat etc.
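MySQL makes exactly the same byte-versus-character distinction, by the way. Assuming a UTF-8 connection (my example, just to illustrate the point):
SET NAMES utf8;
SELECT LENGTH('åäö'), CHAR_LENGTH('åäö');
-- LENGTH() counts bytes and returns 6 here, whereas
-- CHAR_LENGTH() counts characters and returns 3.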

So is all hanky-panky then? Nah! Let's look at a UNICODE transformation besides UTF-8, namely UTF-16. This is still a variable-length encoding, like UTF-8, and it's not that much used actually. But some people tend to THINK it is. What is being heavily used, as mentioned above in Windows, as well as in many Linux technologies, though, is UCS-2. So what is that? Well, UCS-2 is based on an ISO standard, ISO 10646. This preceded UNICODE slightly, and was, like early UNICODE, fixed 16-bit. UCS-2 means "Universal Character Set, 2 bytes"! When UNICODE came around, the (then) fixed encoding was UTF-16. These two, UCS-2 and UTF-16, are very often confused. But whereas UCS-2 is still a fixed 16-bit character set encoding, UTF-16 has developed and is now a variable-length encoding, though of course still very similar to UCS-2. Gosh, I wonder what these folks were smoking when they figured this one up.

There is one UNICODE encoding that is fixed length, and that is called UTF-32 (or, if you ask ISO, UCS-4). This is very seldom used in practice.

But UNICODE seems to persist, in particular UTF-8, which is, for example, standardized in Java. As for MySQL, it supports UTF-8 as well as classic ISO-8859-1 (which is called latin1 in MySQL, as you know by now, if you did last week's lesson) and several other character sets. One character set not well supported, though, is UCS-2. You can define it as a server character set, but a UCS-2 client is not accepted. And I think this would be difficult to implement; there are just too many places where MySQL regards a character as being 8 bits. UTF-8 is much easier: as long as we don't care about the actual length, we can treat a UTF-8 string as a long ASCII string, and all special characters that we may be looking for in a UTF-8 string, such as CR/LF, ASCII NULL and semicolon, are in the 0-127 range.

Now, let's have a quick look at UTF-8 and MySQL. As UTF-8 is variable length, any string space, if defined by the # of characters, is bound to also be variable length, right? Ha ha, got you there! No, MySQL will allocate 3 bytes for each character in a UTF-8 string. What, 3 bytes, isn't UTF-8 up to 4 bytes? Yes, but MySQL only supports UNICODE "codepoints" that can be represented with up to 3 UTF-8 bytes.
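You can see this for yourself in the dictionary, by the way:
SELECT CHARACTER_SET_NAME, MAXLEN
  FROM information_schema.CHARACTER_SETS
  WHERE CHARACTER_SET_NAME IN ('latin1', 'utf8');
-- latin1 reports MAXLEN 1; utf8 reports MAXLEN 3, which is
-- what MySQL reserves per character for utf8 columns.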

OK, that's enough for now. I'll finish off this series with a post on collations, some general ideas on character set development and testing, and a few beers.

/Karlsson

Thursday, October 8, 2009

Of UNICODE, UTF-8, Character sets part 1

Why would you care about UNICODE? Come on now, most people can read English, and English can be written using only 7-bit ASCII, so who needs more? Well, I think it's safe to say that the Internet (remember that? Netscape, WWW, .com booms, pet food on the net etc.) changed all that. Now applications can be found and run everywhere by anyone, more or less, so even if the application speaks English, and even if the user does, you may end up with users inputting data using some other character set.

For someone like myself, having grown up in a "beyond A-Z" part of the world (Sweden, which is one of the easy cases), I can tell you how annoying it is when I input my address on some webpage (this happens even on Swedish websites) using some Swedish characters (I got 2 of the 3 beyond-A-Z characters in the name of the street where I live), and it comes out looking like someone just smashed a fly prominently placed in the name of my street.

For a developer, this is difficult. Having someone test it is bad enough. And then we have things like localized keyboards (I got one of them), printers, OCR software etc. With that in mind, I plan to write a few blog posts on character sets, unicode and stuff like that.

Before I start this first post though, let me tell you that although I am pretty familiar with the subject, I'm far from an expert, so you may well catch an error or two. Please help me correct them if you find any!

So, that said, where shall we start? Well, let's begin with some basics, way back in the Reagan administration. The 7-bit ASCII character set was what was used all over the place when I started in this industry. The only competition was EBCDIC, but that was IBM mainframe only. This was in the early 1980's, but even then, we needed to use Swedish characters sometimes (I am a "swedish character" myself, I guess), and in the 7-bit ASCII world, this was handled by changing some lesser used punctuation marks to the 6 Swedish characters (å, ä, ö and the upper case versions Å, Ä and Ö). This was an issue as a C developer, which I was back then, as the punctuation marks changed were the pipe, backslash, and the curly and square brackets! Yikes! Usually, you could put your VT220 terminal in US ASCII mode, but when you printed, the printer was often shared with office workers, meaning that the printouts often looked like:
main(int argc, char *argvÄÅ)
ä
printf("Hello WorldÖn");
å
Well, you see what I mean, quite unreadable, looks even worse than a Python script. I was about to write that I might have gotten the above slightly wrong, as it was a long time ago since I used this, but then I decided to look it up, and it turned out I actually had it all right, which goes to show that this was something you really had to learn if you were writing code in C here in Sweden back then in the stone ages.

Now, time went on (well, actually, it didn't: 7-bit Swedish ASCII is still in use out there, quite a bit in homebrew ERP systems and stuff like that), and the next step was support for all (or most) of the western world characters in one character set. And the 8-bit ASCII sets were born. This was pretty good, actually, and was pioneered mostly in the DEC VT220 terminal, and then it spread. There were still some variations of the 8-bit character set, but they were much fewer. The most common, by far, is the ISO 8859-1 character set, which contains most characters used in major western world common languages.

Why do I use such weird language here, you ask: "major western world common languages", why do I not just say "western world languages"? Because that would be incorrect, that's why. Take my native Sweden for example. I think most Swedes will agree that 8859-1 contains all characters used in the official Swedish language, and that there is just one such language. And this just isn't true, I'm afraid. Neither 8859-1 nor any of the other 8859 variations covers any of the special characters in the 4 (I think there are 4, where 3 are sort-of common and used) Sami languages / dialects.

8859-1 has a few variations (I know, I know, this is getting boring. ALL these character sets have variations). One such is 8859-15, which, among other things, contains the Euro symbol. 8859-1 also has another name, which should be well known to you MySQLers: latin1! And what about Windows? Windows uses codepages (cp), and cp1252 is the one used by non-UNICODE Windows variations in most of the western world. And cp1252 is the same as 8859-1, right? Nope, it's not, but for all practical purposes, it can be treated as being so.

So what is the difference between cp1252 and ISO 8859-1, you ask? The difference lies in something that hardly anyone uses anymore: the control characters. CP1252 contains only the non-printable characters used in 7-bit ASCII, in the range 0-31, whereas 8859-1 and -15 also have some control characters in the range 128-159. In the latter range, CP1252 has some real, printable characters.

This difference is due to ISO 8859-1 being so much older, from days when we actually used control characters (do you youngsters reading this even know what those are? If not, ask your granddaddy). But besides this, they are the same. This means that web pages, which typically use 7-bit ASCII (very old pages do), 8859-1 or UTF-8 (other variations DO exist, but these are the most common ones), can be displayed on Windows using CP1252, as 1252 just adds characters in a control character range, and control characters aren't used on a web page (except the basic LF, CR/LF, LF/CR and ... NO, don't get me started on THAT for God's sake!).

So along comes 8859-15, which builds on 8859-1 but adds the Euro sign, among a few other things. And as CP1252 was already in wide use, and as 8859-1 was largely compatible with CP1252 for all practical uses, and because no one in their right mind uses control characters much anymore, the committee defining 8859-15 was smart enough to put the additional characters in the same places as the existing ones in CP1252 (the Euro sign is a good example; CP1252 contains the Euro sign in the upper control character range). HA HA HA, got you there! This is ISO, a bunch of smart people; of course they would not put the Euro sign in 8859-15 in the same place as it was in CP1252! The effect is that, I think, most people who think they use 8859 actually use CP1252 (as the Euro sign is used more and more, and the 1252 encoding of it is probably more well known).
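A MySQL aside that illustrates the point (my observation; try it on your own server): what MySQL calls latin1 is really cp1252 rather than strict ISO 8859-1, so the Euro sign lands on the cp1252 position:
SELECT HEX(CONVERT('€' USING latin1));
-- Returns 80, the cp1252 position of the Euro sign,
-- not A4 where 8859-15 put it.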

OK, so this is a mess. You understand that by now, I think; it's not just me who is a mess, the whole character set thing is. Luckily UNICODE will fix that, so more on that in the next post on this subject (and if you believe that UNICODE will fix this and stop the controversy, let me tell you about a New York bridge that I can get you a real good deal on). And also something on collations. What are those? And why? And what happened to the squirrel? We'll be right back, so don't touch that dial!

/Karlsson
AKA The Swedish character

Wednesday, October 7, 2009

MyQuery 3.2.1 ready for use!

If you need an ad-hoc query tool for MySQL on Windows, then MyQuery might be what you are looking for. MyQuery is an Open Source, free Windows tool, supporting colour syntax highlighting, multi-tab editing, several information, status and dictionary features and much more.

The latest release is MyQuery 3.2.1, and if you have followed the development of MyQuery, you know that the 3.x releases are focused on features and usability, and 3.2.1 has advances in both areas:
  • Much enhanced keyboard navigation. It is now possible to navigate the main window and most functions and dialogs using only the keyboard. Tabbing around in the main dialog is now supported, for example.
  • Enhanced keyboard accelerators, where accelerators for almost all functions have been added, and there is now also a non-modal dynamic help dialog, i.e. dynamic in the sense that it shows the current key assignments. This help may also be printed, if necessary.
  • Printing support. This was much more difficult than expected, but I set my goals high. I wanted to be able to print both script and results (the former was much easier than the latter, due to Scintilla handling most of it for me), and both at once, i.e. printing a query with its results on one page. Also, I wanted to give it some nice touches, like headers, framing etc. But I'm happy with the way it turned out.
MyQuery 3.2.1 is available for download from Sourceforge!

I will follow up this post with some more, detailing some of the features of MyQuery!

/Karlsson

What's the matter with Sourceforge

Sourceforge used to be my preferred way of sharing and finding software. Sourceforge used to be there always, hosting loads of projects and stuff, some of them good, some bad, some dead, some very active. But many fine projects were there, including a big bunch of good MySQL projects, and most of my own projects are hosted there.

Now, Sourceforge has, somehow, turned dead slow. The interface is now nice and inviting, sort of, but some of the old interface is still there. But it is so sloooow. Gosh. Is Launchpad the name of the game? I have seen Launchpad more as a tool for co-developing software and stuff, but I have hardly looked at it, so I don't know. All I want is a means of sharing my projects. They are so small and insignificant that few are likely to want to step in and work on them with me, so although they are Open Source, I think there are few people reading the source (and maybe that is just as well).

Anyway, I need to look for something else than Sourceforge, because Sourceforge, at least for me, is just a joke now.

Or am I missing something? And if not, what's the best alternative these days?

/Karlsson

Thursday, September 10, 2009

MyQuery 3.1.1 ready with a minor (but nasty) bugfix.

As noted in a previous post, I have released MyQuery 3.1. A user found a nasty little unnecessary bug that caused a crash in the profile dialog, when the "Group by status" checkbox was clicked while there was no data. This has now been fixed, and a 3.1.1 version is available on Sourceforge for download.

/Karlsson

Wednesday, September 9, 2009

Released: MyQuery 3.1 Beta

MyQuery 3.1 is now ready for download from Sourceforge! There is a bunch of cool features, but the theme was Convenience and DBA Features. On top of this, UTF-8 support has also been added, not in a terribly advanced way, but it should work for most needs.

Among the new features are:
  • Events dialog to manage MySQL events
  • Much enhanced drag-and-drop support. Drag a table, index or column from the dictionary window to the editor, and the CREATE statement will be pasted for you.
  • SQL variables dialog, with helptext and manual links for most variables.
  • SQL Profiling support, including the ability to compare the profile of two different queries.
In addition, there have been some bug fixes, memory leak fixes, and, as usual (you are sitting down now, right?), updated documentation. I still have a few things to add, like online documentation, some NDB features and stuff like that, but MyQuery is getting pretty feature packed now, in particular when it comes to DBA features.

There are three different download files available from Sourceforge as above:
  • A pdf - Documentation.
  • MyQuery_3_1_setup.zip - The binary installer for Windows, including a setup program and the documentation pdf.
  • MyQuery_3_1.zip - The full sourcecode.
I want to add a few things in next release:
  • Support for the InnoDB plugin INFORMATION_SCHEMA tables, including a lock graph.
  • Support for NDB objects, such as tablespaces and stuff.
  • A few wizards.
  • Support for other storage engine specific features.
  • User and authorization management dialogs.
But that is for version 3.2, which will be a while. Keep me posted with any suggestions, ideas, bugs etc.

Cheers
/Karlsson

Tuesday, September 8, 2009

SSD disks... Bah!

SSD disks seem to be the hot and cool new disk technology. And I have to admit I don't understand why this is so hot. I think it's useful alright, but really, putting an ancient block-level interface, one that dates back to the 1970's or so, in front of a persistent, random-access and fast medium just seems backwards to me. Why? Because by doing this, we retain compatibility with that old block-level access of course, but we lose performance and we lose random access.
Properly used, flash-based memory should provide fast, persistent random-access memory; that is the promise of the technology, right? Yet many seem concerned with what filesystem to use on SSD disks. Hey, come on, it's just like RAM! Slower, yes, but persistent. I don't need block-level access; it doesn't help, and the interfaces are inherently slow. And if the underlying medium is random access, why, for heaven's sake, why?

Yes, I know there are a few issues to solve with flash storage still, and that it will eventually "burn out", but it seems like those problems are being worked on and are much smaller than they used to be.

No, I don't want an SSD-optimized filesystem, because I don't want to access flash over a serialized storage interface anyway! Can we instead have a standardized means of accessing flash as it should be accessed: random access! Persistent! SSD is just a stop-gap solution for the real thing, in my mind. And yes, I understand that things take a while to develop. And yes, I understand that compatibility with older equipment, hardware and OSes is a good thing. But the real McCoy here is, hey, you guessed it, a means of random access to flash! The issues with erasing flash should also be possible to solve, probably in software. So we probably need a software layer here.

Maybe we should be virtualizing RAM to flash. Maybe we should add some new interface for a "persistent malloc". Maybe applications should not need to be flash aware (I think they should, as only the application knows what data needs persistence and what does not). But in any case, applications should be able to access flash as true random-access memory, although I do understand that there are limits to this in terms of erasing. That is the problem I want fixed, not a new filesystem!

In a database, putting the transaction log on flash should be great, for example, but there are many more uses. Can we see a flash-based MySQL storage engine? No, not one optimized for SSD, one that accesses flash directly!

But SSDs should, in my mind, eventually be gone. Which is not the same thing as saying it will happen. We used the silly IDE interface for way too long, for example.

/Karlsson

Friday, August 28, 2009

SHOW GLOBAL VARIABLES ... autocommit?? What?

The autocommit variable is a session-only variable, right? Easy to confirm:
mysql> select @@global.autocommit;
ERROR 1238 (HY000): Variable 'autocommit' is a SESSION variable
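The session scope, on the other hand, behaves just as expected (adding this here for contrast):
mysql> select @@session.autocommit;
This happily returns 1, autocommit being on by default in a new session.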


But if I do this another way, I get a different result:
mysql> show global variables like 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+
1 row in set (0.00 sec)


To me, this is a bug, right? But before I report it, I wanted to see if you agree, or not? Might there be stuff out there that relies on this strange behaviour? And it gets worse, I'm afraid. In a new MySQL session:
mysql> show session variables where variable_name = 'tmp_table_size' or variable_name = 'autocommit';
+----------------+----------+
| Variable_name  | Value    |
+----------------+----------+
| autocommit     | ON       |
| tmp_table_size | 13631488 |
+----------------+----------+
2 rows in set (0.00 sec)

mysql> show global variables where variable_name = 'tmp_table_size' or variable_name = 'autocommit';
+----------------+----------+
| Variable_name  | Value    |
+----------------+----------+
| autocommit     | ON       |
| tmp_table_size | 13631488 |
+----------------+----------+
2 rows in set (0.01 sec)

Looks fair, right? So now let's set these two variables in this session:
mysql> set tmp_table_size = 102400;
Query OK, 0 rows affected (0.00 sec)

mysql> set autocommit = 0;
Query OK, 0 rows affected (0.00 sec)

OK, now let's look at the result, in session and global scope again:
mysql> show session variables where variable_name = 'tmp_table_size' or variable_name = 'autocommit';
+----------------+--------+
| Variable_name  | Value  |
+----------------+--------+
| autocommit     | OFF    |
| tmp_table_size | 102400 |
+----------------+--------+
2 rows in set (0.00 sec)

mysql> show global variables where variable_name = 'tmp_table_size' or variable_name = 'autocommit';
+----------------+----------+
| Variable_name  | Value    |
+----------------+----------+
| autocommit     | OFF      |
| tmp_table_size | 13631488 |
+----------------+----------+
2 rows in set (0.00 sec)


Whoa, this doesn't look right, does it? I change one variable and it affects GLOBAL scope, but not the other variable. And there is a difference between these two variables, which is that autocommit really is a session-only variable! So why in heaven's name does it pop up when I do a SHOW GLOBAL VARIABLES? And why is the global value changed when I set the session value? This seems to be how it works for all session-only variables.
There might be some sense in this that I do not understand, so please enlighten me! But if that is the case, then the docs are wrong! Like MySQL manual section 5.1.5, which says: "Several system variables exist only as session variables" and then goes on to list these, which are all shown, as above, as both session and global (but always with the same value). The same section also states that "Most of them are not displayed by SHOW VARIABLES", which isn't true at all, actually: they are ALL displayed by SHOW VARIABLES!

And MySQL manual section 12.5.5.41 says that "With the GLOBAL modifier, SHOW VARIABLES displays the values that are used for new connections to MySQL", which actually isn't true either (just because I set autocommit to 0, and that is the value displayed by SHOW GLOBAL VARIABLES, new connections will still have autocommit on).

So, what do you say? Am I just confused? Are the docs wrong, or even right in some awkward way? Or is this a bug? Or? And if you are wondering, the effect on the corresponding INFORMATION_SCHEMA tables is the same...

/Karlsson

Thursday, August 27, 2009

Drag-and-Drop in C on Windows, the easy way

OK, yesterday I published a post where I complained about the complexities of drag-and-drop in plain C on Windows, and how this was related to Object Orientation gone bad. Why do you need three or four objects, just to drag a string of text from one place to another?

Well then, if you agree that this is outrageous, I have help for you now. I have just published my plain C library that does just this. Your simple Win32 C program need not worry about OLE or COM, about IDataObject or interfaces or any of that stuff. Just use hardcore C, like all macho developers like myself do. You need my library though, and I admit I AM cheating a bit in it, but actually, it does the trick and it follows the rules, both from the COM/OLE and Win32 points of view.

Now, to drag and receive a text string, you need my library and you need to call 4 plain C functions or so, nuthin' else: init the library, register as a target for a drop, start dragging, and clean up as a target for a drop. Then you need to handle the drop itself of course, but this is easy enough.

So the help is now available on Sourceforge, where you will find the source for the library, a Visual Studio project for it, complete with a sample, and a full PDF document describing the whole thing.

So, happy hacking, and don't drag your feet!
/Karlsson

Wednesday, August 26, 2009

Of OO gone bad and how I played a trick on Microsoft

Look back at the early / mid 1990's: if there was one paradigm that was going to rule the world from then on, it was that everything would be Object Oriented. I was pro-OO as a general idea also, and still am, but I believe that there is still stuff to solve. But hey, that wasn't how the early 1990's saw it: everything OO was supposedly good. SQL with OO extensions (remember the ORDBMS ideas? I still have Stonebraker's book, and it's worth reading, but nothing much came out of it).

I think there is a whole bunch of reasons why OO didn't end up being as important as it was foreseen to be back then (it's still important and widely used, I know that, but if you were there in the early 1990's OO rage, you couldn't even go to the bathroom or have a beer without creating an object, and as we realize, after a few of those "Beer objects" it turns more and more difficult to create them), and you couldn't go to a bar and chat a lady up without getting involved with polymorphism (and that is NOT a "bad" disease; in case you are not into OO, it's supposed to be a good thing. But ladies in bars rarely understand that).

Anyway, back to the original discussion. What killed the all-encompassing OO world was, among other things, the web. Once WWW was stamped in everyone's heads, all such theoretical stuff as OO was pushed to the side, and more traditional, but simple, efficient and workable solutions such as PHP and MySQL came around, none of them terribly OO.

So let's assume that OO didn't die, but became less important (as did those consultants of the day who didn't write a single line of code, or set up a single server, or anything, but despite this were the highest paid in the industry, because they were "OO consultants", drawing diagrams and charts and going to meetings, expecting that awful code to write itself so they didn't have to. Once that era was over, some of their firms got prestigious jobs at solid and reliable companies such as Enron). That was a long parenthesis, by the way; I'm getting good at those.

Whoa, will I ever get to the point? Yeah, yeah, I'm getting there. One thing on everybody's mind at the time was 32-bit computing and the next Windows release, Windows 95! Windows 95 used the "new" Windows 32-bit API Win32 (hey, Microsoft marketing had a long thought about THAT name), to replace Windows 3.11 and its predecessors' 16-bit... (you get a hug from me if you can guess the answer) Win16.

So one might think that Microsoft would be smart now. Win32 was a good API, tried in Windows NT, and solid and reliable. And a solid 32-bit API it was and is (I was programming in Win16 way back in 1992 or so). So the new cool GUI features of Windows 95 that would revolutionize the world and create world peace (except possibly for parts of the Middle East; they had to wait till Windows 98) would just be added to the, by now rather massive, Windows 32-bit API, which was easy to use as it was now clean 32-bit (Win16 was a mix, at best), and doing it in C would make it easy to add any cool OO technologies on top of that. That's the good thing with C: you can interface just about anything to it. So this was NOT what Microsoft did.

No, the power of OO was still in effect during the time when Windows 95 was developed. OK, of the main application languages on Windows at the time, there were few: among them C, Visual Basic and then some weirdo scripting stuff like HyperCard (oh please, let's forget that. Don't bring that up again!). No, Microsoft invented COM, which is a pure OO layer. And OO is the future!

Well, there is an issue there. Doing full OO on top of non-OO is pretty easy; the other way around, not so much. So COM turned out to be much less OO than expected, as it had to interface with VB and stuff like that (people had largely stopped caring for C now; it wasn't OO, you know. But VB was easy to "use" (which means that anyone can write useless and bad code in no time at all), so that was OK). And in the process, Microsoft wanted to convert the C people into C++ developers, so here came MFC.

Now, the world of OO wasn't as simple as that, not even for dear old Microsoft. Even though stuff like Informix had an OO SQL, and C++ was OO, as was Smalltalk and a bunch of other languages, and some had had OO "extensions" added to them (the technical term for OO features in a non-OO language is "lipstick on a pig"), this was not as bright and shiny. C was known for being reasonably portable and simple to access, but that was because the concepts of C are based on something fundamental, which is how a computer works. An int datatype is what the CPU sees as an integer. A pointer is a CPU reference. Easy to understand and use, once you get the idea.

The world of OO, though, was based firmly in a swamp of OO consultants driving the world of OO and IT and world peace (see above) by shouting at each other and, when that didn't help, writing long books, documents and drawings that they threw at each other. The result was not world peace, surprisingly, but that we ended up with OO that was there and that worked, but that was different all over the place. Some had multiple inheritance, others didn't. Some were basically non-OO and had had some OO lipstick painted on them by some high-ranking OO consultant, and some were so OO that basically the only keyword in the language used was object. And some had interfaces and others didn't. And some implementations of some OO stuff had templates, others didn't (you know who you are, but I am not naming names here). And they were all objects, and they were all incompatible, so to the rescue came, not the good lord, but the web. And all this OO stuff was left as it was, and forgotten about.

So here I am, finally getting to the point. I am working on my MyQuery MySQL query tool version 3.1 now, and most of it is done. It's pretty packed with cool features for a DBA and all that. Since version 3.0 I have also had dictionary views. These were useful, and you could double-click on, say, a table in one of them, and it would be pasted to the editor window. Useful, yes, but the proper way to do such an operation, i.e. copying data from one window to another, is drag-and-drop. And drag-and-drop first appeared in Windows 95, and using it does require some OO machinery, actually. It is very OO. Gosh! I have a text here, I take it with the mouse, drag it to another window, and release the mouse button; why can't I do that the way I do anything else? Like interacting with the mouse? You need to know when the mouse comes to a window? Handle a message! When it leaves a window? Handle a message! When a button is clicked? Well, fact is, you handle a message! UNLESS! You are entering the window whilst dragging something! Then you need to create a whole bunch of COM objects! Why in heaven's name is this necessary?

Well, that last question I can't answer, but I have been digging into this now, on and off, for a few weeks, and I know now that COM isn't as bad as it looked from the start. COM really is implemented in C anyway; it's just that Microsoft has done a crap job of telling you how this works, and has only provided a select number of interfaces, where they tell you how the interfaces work from the point of view of the user of the interface, so that, say, a VB developer gets to know how to do this or that, but not how it gets from an OO COM object into a distinctly non-OO VB application.

Well, that was the case until now. Eat my dust, Billy Gates! In the course of creating drag-and-drop for MyQuery, I felt this was such a useful effort for someone else (and myself in future projects) that I implemented a generic C layer on top of stupid COM. I looked at what it takes to do a simple drag-and-drop with COM in VB and with MFC COM objects. Tell you what, my approach means less code. It really does. I expose only very few functions, and once the init is done, to drag: call two functions (both in my library), simple. And then to register as a target for a drag operation (this is not the kind of "drag operation" that you can get from a shady doctor in San Francisco though) is two simple calls. And when an object is dropped, what happens? You have two choices: a callback function is called, or, hold your breath: a message is received. With all the data you need!

The library will be on Sourceforge, or possibly somewhere else, shortly.

And just to be clear: I don't mind OO. But I don't think OO as a generic concept was ever done right! Which is why the importance of OO has diminished, in my mind. Java is OO alright, but this is more a convenience than a fundamental driving factor behind it; it just makes developers' jobs easier, hopefully, which of course isn't bad per se, but we hoped for more, much more. And we did NOT look forward to, 15 years later, spending two weeks dragging the non-objects out of COM, just to support bl**dy drag-and-drop!

/Karlsson
Not being 100% objective here

Tuesday, August 11, 2009

A minimal MySQL ODBC application

I got the question of how small a packaged application with a built-in MySQL server could be made. I had an idea that it could be made very small, but I didn't have any exact numbers. The platform was Windows, so I decided to try to build such an application, to test it.
I had a few requirements:
  • A minimal amount of registry settings.
  • I didn't want to spend too much time on the application itself, so this should be simple.
  • I wanted to have a full-blown installation package, i.e. install the application and then run it, so no servers to be started manually or anything like that. Just install and run.
  • No assumptions about the Windows PC where this was installed (in the end I actually made one such simplification: I decided to use a fixed MySQL port number).
  • Installer was InnoSetup, because I like it and I know it well.
So, I started building a simple dialog-based Windows application, using ODBC. It took me a very short time, a few hours at the most (202 lines of code, plus a dialog resource). The next step was the ODBC thingy. ODBC really insists that something is installed, at least that Windows knows where an ODBC driver is (but a DSN isn't required, and with some tricks you can get around even having a known driver). The good thing was that I can call the driver anything I like, so I chose the name of my application (MiniODBC). I then needed to have my installation program tell ODBC about this, and this is done with a registry setting:
Create the key HKLM\Software\ODBC\ODBCINST.INI\MiniODBC and then set the value Driver to the full path of the ODBC driver (which is part of my application, and is placed in the application directory; this is perfectly OK).
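In InnoSetup terms, that registry setting looks something like this (a sketch; the driver DLL name is just a placeholder for whatever MySQL ODBC driver DLL you ship):
[Registry]
Root: HKLM; Subkey: "Software\ODBC\ODBCINST.INI\MiniODBC"; ValueType: string; ValueName: "Driver"; ValueData: "{app}\myodbc5.dll"
The {app} constant expands to the installation directory, so the driver can live right next to the application itself.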

OK, so now for the most important part of this, the MySQL server. What do I need? Note that this is for a minimal setup! I need the server itself of course, mysqld.exe. Then I need a language file, errmsg.sys, from an appropriate language directory. Then I need some database files, or rather, to make things really small, I could have skipped this and had the application create the single table I needed, and have MySQL create the InnoDB data and log files (yes, I'm using InnoDB). To simplify a bit, I decided to ship with a small default InnoDB file though (10M). I realize this is a large part of what is distributed, but I know it can be fixed, so I leave it at this for now. I also included mysqladmin.exe, so I can shut down the server nicely from the application.

I also need a my.ini. Except for the usual options, I include skip_grant_tables so I don't have to ship all the usual MySQL host, db, user etc. tables. I used one database only, with one table, so one db.opt file and one .frm file, and that's it.
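For reference, a minimal my.ini for a setup like this would look roughly like so (a sketch with assumed paths and the fixed port mentioned above, not my exact file):
[mysqld]
# Rewritten by the installer to match the installation directory.
basedir=C:/MiniODBC
datadir=C:/MiniODBC/data
port=3307
# No privilege tables are shipped, so don't look for them.
skip_grant_tables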

As far as the installation goes, this was made really simple: I just copied the files, then set one registry setting, the ODBC one mentioned above, and tweaked the my.ini file to match the installation directory. So what is missing? Starting the MySQL server, maybe, you say? I decided to do this in the application. No big deal really, as I put the MySQL server and .ini file in the same directory as the application. An excerpt of the code looks like this:
char szMysqld[MAX_PATH];
static char szMysqlAdmin[MAX_PATH];
static char szMyIni[MAX_PATH + 80];
char szFile[MAX_PATH];
size_t i;

switch(nMsg)
{
case WM_INITDIALOG:
    /* Get the directory that the application was started from. */
    GetModuleFileName(NULL, szFile, sizeof(szFile));
    for(i = strlen(szFile); szFile[i] != '\\'; i--)
        ;
    szFile[i + 1] = '\0';

    /* Build full paths to mysqld.exe and mysqladmin.exe. */
    strcpy(szMysqld, szFile);
    strcat(szMysqld, "mysqld.exe");
    strcpy(szMysqlAdmin, szFile);
    strcat(szMysqlAdmin, "mysqladmin.exe");

    /* Build the --defaults-file argument, quoted in case the path has spaces. */
    strcpy(szMyIni, "--defaults-file=\"");
    strcat(szMyIni, szFile);
    strcat(szMyIni, "my.ini\"");

    /* Start MySQL. */
    ShellExecute(hDlg, NULL, szMysqld, szMyIni, NULL, SW_HIDE);

And yes, I'm aware that this isn't 100% optimal, but it works as an example (no UNICODE string support, limited overflow checking etc).

In conclusion, having built an InnoSetup package with this content, the installation package took up a bit more than 2 MB compressed (and this could have been reduced if I hadn't shipped the InnoDB files). After installation, I ended up with an installation directory of 13 files (14 after the first run: the MySQL .err file) taking up about 25 MB of disk space, more than half of this being the data itself.

If you want a copy of the application and source (well, without the source, this is a pretty limited application, to say the least), drop me an email and I'll fix that for you.

/Karlsson
Who has to do some Linux work soon, I have done much too much Windows stuff lately

On Open Source Projects and commercializing them

I got to thinking about the commercialization of Open Source software projects happening these days. Many of them have been acquired by larger companies, or are about to be, and I think a discussion here is important. By which I in no way mean that I have all the answers; quite the opposite, this post is more about raising questions, and maybe proposing a few possible answers.

So, do we really need commercialization of these projects? If you ask me (but I am a Sales Engineer after all, so what do you expect), I think the answer must be yes. And then the question is how this is best done. I think we can agree that Open Source, as a development model, is just great, but to keep it going, something more is needed. And there are reasons why we want commercialization of these projects. Among them, in my mind, are:
  • A commercial entity is needed for many things related to the project, such as partnerships, contacts, employees etc.
  • Some things are best done on a commercial basis. Certain things that are part of the development of a larger project are just too hard to find Open Source volunteers for. And there are such aspects, such as QA, docs etc.
  • And last but not least, long-term stability. In the world of large legacy enterprise installations, things are different than in your average webshop or in more modern enterprises such as Google, Yahoo etc. There just isn't competence enough in-house to drive development of important software projects yourself, and you cannot expect such organizations to take up continued development of some Open Source project if the project should fail or the main developers leave it. In that instance, someone must pick up where the Open Source project left off, and continue development, at least for existing paying users.
So, how do we go ahead with this then? Most successful Open Source projects of some size have some kind of commercial sponsor anyway. But we don't want these to "kidnap" the project. Not because they are evil, but because it just happens. In the case of MySQL, for example, it did happen to an extent. Largely, the whole development of MySQL ended up within the MySQL company. The effects of this may be debated, but it did happen; this is no secret. And we are trying to fix this, but it is not that easy. There are some reasons for this, like the dual-licensing model, but that could have been fixed, I think, if it had been considered earlier on.

My idea for a solution to this dilemma is to let the project have two different entities driving it: the Open Source company and the project. These two are closely linked, but are still separate. The different responsibilities should be clear, as should any licensing issues. If we look at Red Hat versus Fedora, I think this is a pretty good model. One might argue that RHEL is a bit slow in reacting to new features and developments, and Fedora too fast, but these are details.

Commercial Open Source is what we need a model for. And in my mind, this model should not be there at the cost of the Open Source project itself. No, they should vitalize each other, but still largely have different goals and end users. And no, I do not for one minute believe that a reasonably large Open Source project could survive just by adding tech support and PS to an existing project, and live off that. I am a strong believer in the notion that the commercial Linux vendors, like Red Hat, SuSE etc., have done Linux a much bigger favour than you may think. The reason is that having a commercial vendor behind something like Linux means that many more dare to try it and use it. You think everyone can use and learn Linux? And anyone can support it? Well, if you have a system that is at the core of your business, or is critical to what you do in general, then you have to ensure long-term stability. And I mean long term. Asking a question on a Linux forum doesn't help in the long run. You might get an answer today, but what about tomorrow? And buying tech support from some local Linux expert is fine, but what about the long term? No, we need commercial entities here, just like we need the Open Source projects themselves. And I think that we should be looking for a model where these two can live side by side, not in competition, but adding value to each other. I think Red Hat has done a good job here. I think MySQL is improving a lot here, but this still needs to be discussed.

/Karlsson
Getting philosophical

Wednesday, July 29, 2009

MyQuery 3.0 Alpha available now!

MyQuery 3.0 is now available, as 3.0.0 Alpha, for download from Sourceforge. I am aware that this is probably surprising for many MySQL users, as the original plan was to release 2.4, and now we end up with 3.0 instead, which is the reverse of what MySQL is up to now (going from 6.0 to 5.4).

Now, jokes aside, the reason for 3.0 instead of 2.4 is that 3.0 really contains a whole bunch of new features and bug fixes. Among the cool new features are:
  • Tabbed editing - This allows you to edit several files at the same time, in different tabs. The old "History" function is still there, although of less use now I guess.
  • Dual connections - MyQuery can now keep two connections active at the same time: one for issuing queries, and one for controlling the whole thing. This allows the controlling connection, for example, to issue a KILL QUERY if you have a long-running query that you want to stop.
  • Status monitors - You can now monitor global and session status easily, and I have included "lap time" and "total time" monitoring, and some nifty means to choose exactly what status you want to see.
  • MySQL statement monitoring - This shows statements from SHOW PROCESSLIST, both in a plain processlist format, and in a specific dialog which keeps a history of statements, with any literals removed. You want to check which statements run most often and take the most time? This is the feature for you (although this is not as effective as the MySQL Enterprise Monitor Query Analyzer, as the latter will track all queries, even ones that are very short, whereas I poll SHOW PROCESSLIST).
  • Dictionary dialogs - Dictionary for tables and routines, that you may use as reference and to get at the CREATE statements for them, copying them to the editor or to the clipboard.
  • Non-modal dialogs - One thing that the multiple connections allow me to do is to have multiple windows open at the same time, without them blocking each other. I use mutexes to control access between different windows sharing a connection.
  • Keyboard accelerators - These were there before also, but now there are more of them, and they are configurable by the end user.
  • Editor keyword files - The keywords used by the editor for syntax highlighting are now configurable, by placing them in "keyword files". You may have several different sets of keyword files, and easily change between them.
In addition to the above, I have refactored a few things to make things work better, and some features which have been there before have been fixed up. This goes, for example, for the auto-reconnect feature, which has been there for a while, but never worked really well. Now it works better, and is easier to maintain.

As for this being an Alpha version, this is based on it not being 100% feature complete yet. Also, there is still a fair amount of testing to do. The features to add before GA are a few more dictionary views, and then the testing of them, of course.

If you test it and get a crash, there is now a crash handling feature included. This will bring up a dialog and an optional emailing window, allowing you to send the dumpfile to me. Feel free to attach that file and drop it off to me, though be warned that it may turn out rather large.

Happy SQL'ing
/Karlsson

Monday, July 27, 2009

Using Windows multiple memory heaps for finding leaks

If you, like me (but sometimes I feel it's just me left), use C and the Win32/Win64 API for Windows programming, you might be interested in this little trick which I use quite often on Windows to make sure I do not have any memory leaks.

Usually, you allocate from the heap using something like malloc for a level of portability, as the Windows HeapXxx calls are Windows specific, and malloc maps to these guys anyway, usually.

In Windows, each process has a default heap. In most cases, DLL data is private, but heap allocations are usually done from the process heap. This causes an all too common problem when hunting memory leaks: there may be a leak, or there may not be, as everyone is allocating from the same heap, including any libraries, static or dynamic, that you may use, when all you want to know is whether you have a leak in your own code.

And this is what you can do here: although a process has a default heap, there is no stopping it from having several heaps. The memory allocation functions that you mostly use on Win32, like HeapAlloc and HeapFree, take a handle to the heap to allocate from as the first parameter, and this handle can come from two places:
  • GetProcessHeap() - Get the default heap for the process.
  • HeapCreate() - Create a new heap.
Often, you see code like this then:
LPTSTR pStr;

if((pStr = HeapAlloc(GetProcessHeap(), 0, nLen)) == NULL)
    return NULL;
and so on. Not too exciting stuff. Instead, I recommend keeping a global handle for the heap to use for all your allocations, and using this handle, like this:
HANDLE g_hHeap;

int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
    LPSTR lpCmdLine, int nCmdShow)
{
    ...
    g_hHeap = GetProcessHeap();
    ...
}

LPTSTR SomeFunction(SIZE_T nLen)
{
    LPTSTR pStr;

    if((pStr = (LPTSTR) HeapAlloc(g_hHeap, 0, nLen)) == NULL)
        return NULL;
    ...
}
This looks like the same thing, but it gives you one advantage: instead of setting your heap handle to the return value from GetProcessHeap(), you can create your own heap using HeapCreate(), and you need only change this in one place. And using some other nifty calls, you can "walk" this heap and see how many allocations you have and how big they are. And that is from your heap only! No external DLLs, nothing but what you have allocated yourself! If you allocate 15 bytes, you will see 15 more bytes allocated from your heap, no more and no less.

How can you use this then? The way I do it, if it is a GUI application, is to provide a menu option with a dialog that shows me the number of allocated blocks, their total size and stuff like that (also, I usually do this in such a way that I can ifdef that menu option away if I want to). So how do you "walk" a heap then? Oh, that's easy, here is a simple sample for you:
PROCESS_HEAP_ENTRY entry;
unsigned int nBlocks;
unsigned int nFreeBlocks;
SIZE_T nSize;

...
/* lpData must be NULL before the first HeapWalk() call. */
entry.lpData = NULL;
nBlocks = nFreeBlocks = 0;
nSize = 0;

/* Walk the heap and collect allocation statistics. In a multithreaded
   program, wrap this loop in HeapLock()/HeapUnlock(). */
while(HeapWalk(g_hHeap, &entry) != 0)
{
    /* Only count entries that are actually allocated blocks. */
    if(!(entry.wFlags & PROCESS_HEAP_ENTRY_BUSY))
    {
        nFreeBlocks++;
        continue;
    }
    nBlocks++;
    nSize += entry.cbData;
}
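
Since everything goes through g_hHeap, switching from the default heap to a private one is a change in a single place. Here is a minimal sketch of how that might look; the USE_PRIVATE_HEAP define and the InitHeap function are just my own illustration of the ifdef idea mentioned above:

#include <windows.h>

HANDLE g_hHeap;

/* Call once at startup, before any allocations are made. */
BOOL InitHeap(void)
{
#ifdef USE_PRIVATE_HEAP
    /* A growable private heap: 64 KB initial size, no maximum. */
    g_hHeap = HeapCreate(0, 64 * 1024, 0);
#else
    g_hHeap = GetProcessHeap();
#endif
    return g_hHeap != NULL;
}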

I find this really useful, and it allows me to easily track down those small leaks (as I know both the size AND the count, this is much easier) that can, in a production system, end up being major leaks and cause big issues.

/Karlsson

Thursday, July 23, 2009

Black box programming in C. Why? and a little How.

I'm not too hot on C++, I admit that, although I am hot on some of the advantages of C++! One is the way it is possible to program using a class as a "black box", where the developer of that "black box" decides what is public, what is protected and what is private.

The reason this is a good thing is simple: it makes it possible to control what gets accessed in a complex data structure, and how it is accessed, and that is determined by the developer of that complex data structure, not its users. Without this, you would have to document which aspects of the structure should and can be used, and even that might not be enough; I mean, who reads the spec?

This is particularly important with Open Source software. If I work on an open source project and a simple C struct is available to me, I will use it. Hey, it's there and I can see it. And if there are any ill effects of using a particular member of the struct, I can check that by reading the code! Right?? Nah, not really. It may well be that a particular member of a published struct is distinctly private and is planned for uses beyond what is currently implemented. And when that happens, your code may break. I think this is an issue we are fighting with at MySQL right now, but we are not unique; I have worked with other RDBMS projects (written in C), even commercial ones, and have seen the same effect. Just because it is not Open Source doesn't mean you are protected, it just means that you have some more control (you know WHO writes the code, and if they screw up, you know where to find them).

I think this is a very important issue with Open Source in general. How can you be sure that all the developers who write code in your project follow the ground rules? Not that they are unintelligent or mean, but really, getting to grips with a large chunk of complex code like MySQL is difficult. The nice thing with C++ here is that it allows you to document this in the source itself, sort of, and the compiler will, to an extent, also enforce it.

But as for me, I am no fan of C++ in general. Or rather, I don't mind C++ in itself that much, but all the different frameworks, the discussions, the features that are not agreed on and that work differently (templates) and the non-standardized aspects (name mangling, for example) drive me crazy.

So, can I achieve something similar in C then? Yes, you can! And it works surprisingly well, at least in small to mid-size projects.

To begin with, do NOT, I repeat NOT, put struct definitions in header files! This is just plain bad! This doesn't mean that structs shouldn't be published; it just means that the implementation should be "hidden" from its users, and only a pointer type is declared in the header file:

mycode.h
typedef struct tagMYSTRUCT *PMYSTRUCT;

mycode.c
struct tagMYSTRUCT
{
    int nId;
    char *pName;
};
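
The implementation file then provides the functions that create, access and free the type; only these functions can see the struct members. A minimal sketch of what that might look like (the function names here are my own, made up for illustration):

#include <stdlib.h>
#include <string.h>
/* #include "mycode.h" goes here too. */

PMYSTRUCT MyStructCreate(int nId, const char *pName)
{
    PMYSTRUCT p;

    if((p = malloc(sizeof(struct tagMYSTRUCT))) == NULL)
        return NULL;
    p->nId = nId;
    /* Copy the name into memory owned by the struct. */
    if((p->pName = malloc(strlen(pName) + 1)) == NULL)
    {
        free(p);
        return NULL;
    }
    strcpy(p->pName, pName);
    return p;
}

int MyStructGetId(PMYSTRUCT p)
{
    return p->nId;
}

void MyStructFree(PMYSTRUCT p)
{
    free(p->pName);
    free(p);
}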

This works so well, and I use it all the time! Even in my own small projects, where I am the only developer. Hey, even I lose track of my own code at times!

Now, this of course implies a few things. To begin with, if the whole project resides in one implementation file, mycode.c in this case, this doesn't help at all. So the lesson is to keep the project split into several files, and to implement each piece of functionality as one or more types, usually structs, that are controlled by the code in a single implementation file.

Another advantage of only having forward declared structs in header files is that the ordering of include files becomes much less important, and the issue of include file X requiring Y, but Y also requiring X, is much easier to solve and much less of an issue in general.
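
To illustrate, here is a contrived two-header sketch of my own (the names are made up): each header only needs a forward declaration of the other's struct, not the other header itself.

/* a.h */
struct tagB;                    /* forward declaration, no #include "b.h" needed */
typedef struct tagA *PA;
void ASetPeer(PA pA, struct tagB *pB);

/* b.h */
struct tagA;                    /* and the same trick the other way around */
typedef struct tagB *PB;
void BSetPeer(PB pB, struct tagA *pA);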

The disadvantage here is that for every member that is accessed, I must provide an accessor function, as in the sketch above. This is usually considered good practice in C++ too, but in the C++ case it is a bit easier, as the implementation can be inlined (performance) and kept in the header file (documentation). Can we solve this with C then?

Well, yes, to an extent. If we assume that we really would like the nId member of my sample struct above to be public, we could do something like this:

mycode.h
typedef struct tagMYSTRUCT
{
    int nId;                /* The public part. */
} MYSTRUCT, *PMYSTRUCT;

mycode.c
typedef struct tagMYSTRUCTPRIVATE
{
    int nId;                /* Must match the public struct exactly. */
    char *pName;            /* Private members go after the public ones. */
} MYSTRUCTPRIVATE, *PMYSTRUCTPRIVATE;

In this case, in the implementation file, I need to cast PMYSTRUCT to PMYSTRUCTPRIVATE. This is not such a big issue, and it is a workable solution, although it has a few disadvantages, like the multiple include issue. Also, I wrote somewhere above that you shouldn't put struct definitions in include files; hey, I'm just trying to be practical here. This way of doing things is a bit iffy, I guess.
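
To show what I mean, here is a minimal sketch of such a cast, continuing the example above (the accessor name is again made up):

/* mycode.c */
char *MyStructGetName(PMYSTRUCT p)
{
    /* The public struct is the leading part of the private one, so this
       cast is safe as long as the two layouts are kept in sync. */
    PMYSTRUCTPRIVATE pPriv = (PMYSTRUCTPRIVATE)p;

    return pPriv->pName;
}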

Another neat C++ feature is inheritance. This can, to an extent, also be done in C, although it too is iffy... I'll get into that one later. But I guess you have figured it out already, if you have read this far.

/Karlsson
Getting out into the sun. Time for some open air motoring!

Wednesday, July 22, 2009

Opinionated review on an opinionated book: Joel on Software

I just finished reading Joel Spolsky's book "Joel on Software", and I had a good and interesting read, I have to say. This doesn't mean that I agree with everything in it. The book is by now a bit dated, published in 2004, and as it is a collection of writings from Spolsky's website www.joelonsoftware.com, much of it is more dated than that.

Spolsky writes in a rather humorous, witty style, and he is very opinionated, with strong opinions on many things. He sometimes hits some rather minor issue he dislikes with a baseball bat. But I enjoyed reading the book; it was a fun read and it got me thinking about issues, even when I didn't agree with Spolsky, and frankly, most of the time I didn't (he's not too hot on Open Source, for example, and gives examples of why it's not going to work, and we now know, as the book is dated, how things went. He was wrong). But the point of this book isn't to get you to buy into Spolsky's ideas; rather, it's to start you thinking about the issues he writes about.

Before you go on and buy this book, though, you should be aware that Spolsky is a big-time Windows fan, there is no doubt about that (and I tend to like Windows as a desktop OS too). But again, that is not the point. The point is to get those cogs in your brain working.

As for entertainment, this book isn't bad, as is often the case with strongly opinionated writing. There are a couple of annoying things in it though, once you have learned to live with Spolsky having a different opinion from yours (or mine, for the most part). One is that Spolsky also seems pretty full of himself at times. The software company he owns is one that can do no wrong, anywhere in the book. The only time it is mentioned doing something wrong, it is to tell how incredibly well they dealt with that problem and fixed it, no doubt better than anyone else.

But that aside, if you want to read a book on software development, this is a good one (yes, there is very little code in the book, but even so I got a few ideas for the next version of MyQuery coming up soon). It covers things that are interesting to think about, and which you might not agree with, but which are still worth thinking about.

Also, for all you hard-core Linux folks out there, I think this could be interesting to read: something written by a hard-core Windows guy, to get an idea of how the Windows developers of the world think, and why, and what their arguments are. And Spolsky actually talks more about the differences here than about what is good and bad with Windows, Linux etc.

Also, I said that Spolsky is a hard-core Windows guy, and I am in part a Windows guy too, but that doesn't mean Spolsky has many kind words for what is going on at Microsoft these days (oh, Spolsky used to work at Microsoft, which he writes about many times; it is no surprise that in THOSE days little, if anything, was amiss in Redmond). .NET does not rank highly in Spolsky's eyes, and in this sense it is interesting to see that in some ways he was right (it is complex, and getting .NET out there has been a long process). That it actually got out there and gained a reasonably large following was also expected, but that so many developers would embrace .NET with their hearts was apparently not.

And just to make things clear: As a Windows developer, I insist on C and on the Win16/Win32/Win64 API. No MFC, no C#, no .NET and above all, no VB.

/Karlsson

Thursday, July 16, 2009

Cross platform GUI portability, part 2

So, if you read my first post on the subject of cross platform GUIs, you probably think I missed one aspect: cross platform GUI toolkits. Don't these, in one shape or another, solve this problem? If you ask me, the answer is no, and I will now set out to make my point.

The problem is that the main issue persists: do you want your app to look like Windows on Windows and like a Mac app on OSX, always, for all intents and purposes? Or do you want your application to look and feel like "your application" on all platforms? Whichever path you choose, there are solid reasons for both approaches. And that is the issue. wxWidgets (formerly wxWindows), Java: all fine technologies, but they do not solve the basic issue.

Is the solution to make the GUI adaptable to both approaches on all platforms, and have it configurable to support all of these aspects? In a sense, I think it is, but on the other hand I feel that the effort needed to achieve this is terribly counterproductive. It really is. It is cool and it looks nice, but in the end it does little to drive technology forward, in a philosophical sense.

I have seen a few commercial toolkits do this, with great technical success. The drawbacks:
  • These toolkits are usually expensive.
  • Now you have a third API to program against: not Win32, not OSX, but something else. Not to mention Gnome or KDE.
  • What good does it do your application, really? I have seen applications built with these toolkits expose a 100% Windows look and feel on a Macintosh! And it looks really neat, and achieving a Windows look-and-feel on a Mac is not an easy task; these toolkits are advanced. But the value for the end user, once the interface they want to use is set? Not much.
To an extent, this reminds me of the 1980s and early 1990s mix of different networking technologies, where some vendors were pushing cross-network APIs. Useful? Well, sort of. Valuable there and then? Well, yes, to an extent. Useful in the long run? Nope. Definitely not. And then you have all this code, when you realize that TCP/IP is all you need to care about, written not against a TCP/IP library but against some other kind of weirdo library that has a cost, but which provides you with network portability across TCP/IP, IPX/SPX, Banyan Vines and DECnet. Which would be useful, if it wasn't for the fact that no one uses anything other than TCP/IP.

Things aren't even close to that situation with GUIs, and I will take a shot at a cross platform wxWidgets based app later this year. But for now, I am pretty much convinced that no matter how much I try, I will have to make compromises between functionality, OS/GUI integration, usability and code overhead.
  • No, I am NOT going to sprinkle #ifdefs across my code.
  • Yes, I know I need to make compromises.
  • Yes, I am aware that my knowledge of Win16 and Win32 is of limited use.
Despite this, I want to try it. And to ease the minds of my MyQuery users: no, the target for this will NOT be MyQuery, but something else...

/Karlsson
About to go to bed.