Thursday, November 22, 2012

Character sets, Collations, UTF-8 and all that

Yesterday at the first Swedish MySQL User Group real meeting here in Stockholm, I presented a talk on character sets, collations and stuff like that. If you read this blog, you know that I have written about this before, but the presentation I did yesterday was a fair bit more detailed. You can view the full presentation on slideshare:
 One thing I talked a lot on was collations and how they affect matters, and this has more of an impact than you think, in particular when using UTF-8. You would think that using UTF-8 most character set problems are solved (at least when using 4-byte UTF-8), but no. Collations are still added to this, and there are many of them and the effect of choosing the wrong one can be real bad.

Let me take an example. You would think think that using a UNIQUE or PRIMARY KEY on a text-based column (using something like a VARCHAR or CHAR type) in a table would ensure that any two strings are unique, but that two strings values that are different may coexist in two different rows. Think again.

A collation defines how characters in a character set are sorted and compared. And most localized collations have some weird attributes to them. There are things that linguistics think are reasonable for a particular language, and that are hence present in the UNICODE standard, but it might not be widely accepted by the community at large. So back to my original example. Let's say we are in Sweden, then 4 (yes, four) different collations may be applicable:
  • utf8 binary - This is a plain binary collation, comparisons are done on the binary value of the characters.
  • utf8_unicode - This is a pretty reasonable collations based on some generic compromise in UNICODE on how things are sorted, and are not sorted across the globe. Sort of.
  • utf8_general - This is a simplified, faster general variation compared to utf8_Unicode
  • utf8_swedish - This is a collations that is specific to Sweden with some interesting Swedish specifics.
So lets's see how this work in practice. Lets try a table that looks like this:
CREATE TABLE `utf8_table` (
  `swedishname` char(10) CHARACTER SET utf8

    COLLATE utf8_general_ci NOT NULL,
  PRIMARY KEY (`swedishname`)
) ENGINE=InnoDB;

What happens with this data:
INSERT INTO utf8_table VALUES('A');
INSERT INTO utf8_table VALUES('Ä');
In Sweden, these two are unique (the second A has an umlaut). In the rest of the world, these two are the same, so the above will not work, a PRIMARY KEY error will happen on the second row, despite the characters being different! So we try this instead:
CREATE TABLE `utf8_table` (
  `swedishname` char(10) CHARACTER SET utf8

    COLLATE utf8_swedish_ci NOT NULL,
  PRIMARY KEY (`swedishname`)
) ENGINE=InnoDB;

And with the same data:
INSERT INTO utf8_table VALUES('A');
INSERT INTO utf8_table VALUES('Ä');
And this works as it should work, both rows are inserted!

I will write another blog post on this soon, with some more examples eventually, but for now:
Cheers
/Karlsson
PS. I apologize if you have problems reading the above, it probably comes from the fact that there are embedded Swedish characters in the text :-(

Tuesday, November 13, 2012

This sucks! Well, maybe it does and maybe it doesn't...

Imagine that Microsoft and Apple got into a big fight for the market some 15 years ago and that Apple lost. Big time. Apple went down completely and there was nothing left. And as an IT expert, you were called in to look at what remained, what could be salvaged and what was just a waste of everyones time and money.

If you had seen the iPhone back then what would you have said? (I'm not so sure myself, chances are I would have been terribly negative). Note that there would have been no AppStore, no HTML5 sites, none of that neat stuff.

Or to make a different analogy: Was VHS better than BetaMax? Well, that depends on who you ask: The end consumer wanting to rent a movie or the techie looking at the specifications of the technology in question.

Just after the second world war, in a Germany in shambles, the allies went in and had that look at Germany, and with them they brought some smart dudes, to look at what was useful, what was not and what was rubbish. Reginald Rootes, who together with his brother Billy ran the Rootes Group, one of the big 5 producers of cars i Britain at the time, came along to, among other places, Wolfsburg to have a look at the VW plant. Despite being adviced that the VW was a viable product and seeing it himself, Reggie wasn't interested. Now, some 60+ years later, all the remaining Rootes brands and factories are long gone (the last one, producing Peugeots, closed in 2007). And VW is fighting with Toyota for the title of the world largest car maker.

All in all, stuff that might have serious issues, might be just because of development issues, and you need to look further down the road to see the potential. And don't make the mistake of thinking that the good or bad implementation of an idea says much about the real potential of that idea.

Take Virtualization. Running a database in a virtualized environment was a big no-no just a few years ago. Now things have developed, performance is much better and many of us can use a virtualized environment for many, if not most, of our database needs. Be it Oracle, MySQL or Postgres or whatever.

Go back 20 years and ask yourself how you would have reacted if someone told you that in 20 years, many large enterprises would have large parts of their infrastructure run on a operating system developed by a Finnish student in his spare time in an outsourced environment run by an Internet bookstore company? Nah, don't think so.

So what is the next big thing then? I try to spend some time on it, and when I get to test or try something, I really try to separate the implementation of the idea, be it a new operating system, a new type of access method or whatever, from the actual implementation. The latter says less about the former than you think.

Also, technology isn't everything. Far from it. The best technology doesn't always win. And as for the new technologies you look at, the usefulness and applicability of those isn't always what you think. Did the web turn out to do what we were expecting? What did you expect to be able to do with a cellphone some 15 years ago, besides making phone calls and sending text messages? I believe there is a synergy between the potential of a technology and the applications for it that is the drive forward. And don't be so fast to click that "This sucks" button.

/Karlsson
Sorry for this post not being that MySQL focused, but I think it is still applicable. Even if the implementation sucks.

Tuesday, October 23, 2012

MyQuery 3.5.0 available

I just made MyQuery 3.5.0 available for download from sourceforge. If you do not know about this program, this is my Windows based MySQL Query editor and database monitor program. And yes, it is Windows only.

It was a long time since I updated MyQuery, but I have been been at it for a long time, but slowly as I haven't had the bandwidth, and I have collected a few things I wanted in this release, but of all those things, many has to wait for now, as decided to add a major feature which did require a fair bit of coding and testing: autocomplete. When editing SQL statements, autocomplete for names of tables, columns etc is really useful, but it is not as easy as it sounds. What makes this a bit complicated are things like the default database (which I need to figure out for anything that is typed), table aliases and separate name spaces (which is a good thing, but makes like in this instance difficult).

Table aliases are particularly nasty as MyQuery really doesn't parse the SQL but just uses the current names of all objects. So when I type table1. you might expect a lit of all columns in the table table1 to be listed, but it isn't as easy as that as table1 might be an alias for another table. The way I manage this right now is to list all columns in all tables when I see something which looks like a table name (table1 in this case), but there is no such table (and no such database for that matter), I list all columns in all tables in the current database. This might not be what you want, but it's the best I can do right now.

To be honest, I really think autocomplete is a terribly useful feature, in particular when dealing with databases with many tables or tables with long names (such as when working with the information_schema database).

Another thing I have added in this release is a more flexible way of dealing with database status variables, which can now be organized into groups, which are easy to manage and which are persisted.

What I want to add is stuff like MySQL 5.6 information_schema additions, MariaDB specific functions and engines, JSON load and unload and some other MySQL 5.6 stuff. In addition, I'd like to do some integration with my sqlstats plugin. But now I have at least finished MyQuery 3.5.0, and I'm quite happy with it. If you have used earlier versions of MyQuery, this version also fixes a bunch of minor memory leaks and at least one crashing bug (which happened in some instances in the connect dialog).

Happy SQLing!
/Karlsson

Tuesday, October 9, 2012

Too many or too big open MyISAM tables??

MySQL tries to be smart and to cram as much performance out of available hardware, and one thing that most MySQLers knows is that opening a table (ie. opening the frm file, and depending on the storage engine, any additional files related to the table) is expensive from a performance point of view. When we see the opened_tables status variable increasing, we tend to increase the table_open_cache setting and keep as many tables as possible open and avoid reopening them.

When it comes to MyISAM though, this has a nasty side-effect. When the server crashes and there are MyISAM tables open, these might well need to be REPAIRed. And if you have big MyISAM tables, this is an issue. Let's say that your application use several MyISAM tables, with the same content, and that you create new tables after a certain time, to keep the size of each individual table down? There are other reasons why you have this effect also, say a large MyISAM table that this rarely updated or read from, but suddenly it is, and then it is kept alone again? When a crash occurs, you might be stuck with several MyISAM tables that are open, but have not been accessed in a long time so might not need to be open, but still there is a risk that these will require a long and boring REPAIR.

There is a Feature request for this; 67142, but I was thinking that instead of having low level server code do this, this would be a good thing to implement using EVENTs. So let's give it a shot.

To being with, I need to figure out when a table was last touch. MySQL doesn't record when a table was last read from, but if we are OK with flushing tables that haven't been written to in a specific time, then the update_time column in the information_schema table does the trick. To figure out what tables to flush, I need to select table and database names from this table for all tables that haven't been modified within a certain time. Also, I must filter so I only get MyISAM tables, and make sure that I don't hit the MyISAM tables in the mysql database. An appropriate SELECT may look like this:
SELECT table_schema, table_name
  FROM information_schema.tables
  WHERE engine = 'MyISAM' AND table_schema != 'mysql'
    AND UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(update_time) > <seconds>


That's not enough of course, all this gets me are the table and database names, we need to do something more than this. But let's start from the beginning. An EVENT will run an SQL statement, and in my world, there is mostly one statement it should run, which is a CALL to a stored procedure. And in this case, this procedure should use the SQL above to figure out what tables to flush, and then do the actual flushing. As the FLUSH command, when using inside a MySQL Routine, will not take any parameters, we have to run this as a PREPARED statement. All in all, when we end up with is something like this:
DELIMITER //
DROP PROCEDURE IF EXISTS myisamflush;
CREATE PROCEDURE myisamflush(p_sec INTEGER)
BEGIN
   DECLARE v_no_data BOOL DEFAULT FALSE;
   DECLARE v_database VARCHAR(64);
   DECLARE v_table VARCHAR(64);
   DECLARE v_dummy INTEGER;
   DECLARE cur1 CURSOR FOR SELECT table_schema, table_name
     FROM information_schema.tables
     WHERE engine = 'MyISAM' AND table_schema != 'mysql'
       AND UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(update_time) >

         p_sec;
   DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_no_data = TRUE;

   OPEN cur1;

-- Loop for all MyISAM-tables that haven't been updated

-- for p_sec seconds.
   FETCH cur1 INTO v_database, v_table;
   WHILE NOT v_no_data DO
-- Create the FLUSH statement.
      SET @tmpstmt = CONCAT('FLUSH TABLES ', v_database, '.', v_table);

-- Prepare and execute the statement.
      PREPARE stmt1 FROM @tmpstmt;
      EXECUTE stmt1;
      DEALLOCATE PREPARE stmt1;

-- Get the next table / database.
      FETCH cur1 INTO v_database, v_table;
   END WHILE;
   CLOSE cur1;

-- This is to overcome a bug that causes a warning,

-- even though the warning was caught.
   SELECT COUNT(*) INTO v_dummy FROM mysql.user;
END;
//
DELIMITER ;


Note that noop SELECT at the end? In some versions of MySQL, when a handler is run, like in this case the CONTINUE HANDLER FOR NOT FOUND warnings will still be around here, which means the execution of the procedure will finish with a warning. To get rid of that, I issue an SQL statement that always runs.I'm not sure why this is happeniing, and for it might be that I have a mistake somewhere, but I do not think so.

So far so good, now all we need is an event, in this case I will run every 15 minutes and flush tables that hasn't been used in the last 15 minutes, but you can set this to anything:
DROP EVENT IF EXISTS myisamflush;
CREATE EVENT myisamflush
  ON SCHEDULE EVERY 15 MINUTE
  DO CALL myisamflush(900);


Was this enough? Maybe, but to get this working check if the event scheduler is running:
mysql> SHOW VARIABLES LIKE 'event%';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| event_scheduler | OFF   |
+-----------------+-------+


In this case, it wasn't running so we should start it (and we should also modify the config file so that the event scheduler is running when the server is restarted, but that is a different story):
mysql> set global event_scheduler=1;

Before we finish up, note that everything above assumes that you are running with root privileges or similar.

Cheers
/Karlsson

Monday, October 1, 2012

Moving on and flying high...

As of today, Oct 1, yours truly is again working closer to the MySQL community. I have had 2 great years at Recorded Future, no doubt, and I have had a chance to work with some cool technologies and learn quite a few things I didn't know before. But not it's tim to high the sky, and I am from today working as Sales Engineer at SkySQL.

This is going to be fun, I will keep up my blogging, I hope to keep up my MongoDB investigations and above all I'll be involved with the SkySQL Cloud offerings!

That's all for now, but if you know me, that's not the end of it, far from it!

Cheers
/Karlsson

Tuesday, September 11, 2012

MySQL Configuration Wizard - Don't always trust it...

So, the MySQL Configuration Wizard for making MySQL Easier to use on Windows will just plainly aid in setting up a safe and sound MySQL Configuration. Right? Just run it and you are safe. OK? But thing do break, specifically harddrives and stuff, so to keep your database in shape you have a second set of disks in your Windows machine, for all that data that you do not want to loose in RAID configuration. Or you might even connect, say, your D:-drive to an expensive but safe SAN! Then you run the MySQL Configuration Wizard and tell it to place your InnoDB data files in that D: drive. Phew! Now you are OK! That RAID-set was expensive, but better safe than sorry! Now you can place all that important data into MySQL, and what's left on the C:-drive is just some programs, which can be replaced if necessary, but the data is safe!

NO! That's NOT how it works, not at all. There are two things wrong here:
  • Just because you move the InnoDB files, which might well be the storage engine you use, doesn't mean that the other MySQL datafiles are there. Fact is, you CANNOT change where MySQL places the other datafiles, they get into C:\ProgramData\MySQL Server 5.5/data whatever you do! Why this is so I do not know, but if you want to change this location, you have 2 options:
    • Edit the my.ini file generated by the Config Wizard. And change the datadir setting and move the files and directories from the C:\ProgramData\MySQL Server 5.5/data directory to the directory where you want the files on your D:drive.
    •  Instruct Windows beforehand where you want your datafiles, which is real easy, just change the setting in the registry that determines where ProgramData is: HKLM\Software\Microsoft\Windows NT\CurrentVersion\ProfileList\ProgramData. Real easy and safe, right :-)
  • If you are OK with the MySQL other datafiles, and just want the InnoDB files to be safe, you still have to fix one thing: The innodb_log_group_home_dir setting. This follows the datadir setting, which means that if you move the InnoDB datafiles to another drive using the Config Wizard, the transaction log files will not move with it! They remain in C:\ProgramData\MySQL Server 5.5/data! Ouch! See this bug:66769
/Karlsson

Thursday, September 6, 2012

MySQL Server installer on Windows

I don't what makes this so hard for MySQL Developers at Oracle. Look, I know you guys are trying hard, that's not it, and I also know that chasing SQL Server is a priority, and that is fine. But for that to work, the MySQL MSI Installer has to get better! It just has to. Like some basic issues like these:

  • You can not use the MSI installer to install two MySQL servers on the same box. If the installer seems an old installation, it insists on upgrading it if you are installing a newer version, and if you are installing an older version, well, that will be refused.
  • You cannot, using the MSI installer, install both 32 and 64-bit versions on the same box. If you try that, even if the versions are the same, the second install insists that what you have already installed is newer than the one you are installing, although the have the same version number, but one is 32 bit and one is 64 bit.
  • In the Instance configurator when it comes to the data directory location, if you select "Installation path" you might think that the data directory will be placed where you are installing MySQL right now. Think again, this is not so. What this means is that if you install MySQL on the box, and MySQL hasn't been installed there before, then Installation path means just that, So let's assume that you choose to install MySQL in C:\foo, then the data directory will be C:\foo\data. Which is fine. If you then uninstall this version of MySQL and then install some other version of it, this time in, say, C:\bar, then you would image that the data directory would be C:\bar\data, but it will still be C:\foo\data. I'm not sure where this is stored, but it is somewhere and it's really annoying.
So, if a developer wants to run with the 32-bit version of the MySQL library (say because he / she is using a 32-bit driver for MySQL), then you cannot use the MSI. If someone would have given this an extra round of thinking, then the MySQL 64-bit server installation, would have included both the 32-bit and the 64-bit versions of the client. Right? I'll see if I have the bandwidth and energy to report these issues as bugs.

Will the MySQL Installer help then? That is a 201 Mb package that contains, according to the download page "All MySQL Products. For all Windows Platforms. In one package." Right, that should do the trick then? Nope, once you have made the choice to install MySQL Server, be it 32 or 64 bit, you can only install one of them.

Look, I'm trying hard not to be a grumpy old man here, but it is difficult (partly because I'm getting old I guess). I am aware that I can always download and unpack the ZIP archive (which is what I will do now), but please folks, this is on Windows and I'd rather stay away from that on that platform. And Oracle / MySQL took the effort to create not just one, but 2 installers, but none of them allow you to install 2 MySQL servers on the same machine, which is too bad.

And by the way, no just because I am installing MySQL and then uninstalling it, and then installing it again, doesn't mean I want to use the same data. It just doesn't.

And I am afraid that if you think MariaDB is any better, think again, that is NOT the case. Too bad.

/Karlsson