Karlsson on databases and stuff

Sunday, October 24, 2010

It's about time

TIMESTAMP and DATETIME is how dates and times, or temporal, data is represented ion MySQL. These types are combined with a bunch of operators on these types, such as ADDATE and TIME_FORMAT and such. So is this enough to effectively manage temporal data then? Nope, in my mind, far from it.

To begin with, time isn't such a simple type of data as we might look at it. Is a date just a count of the number of seconds since 00:00:00 on Jan 1 1970 or something like that? And we then convert it to a calendar and all is fine? Is time data really linear in that sense? No, it's not. Dec 24 2010 is Christmas Eve and when you get your Christmas gifts here in Sweden. Is this significant? Also, it's a Friday. That this data is a Friday can be calculated from the date or the seconds-since-jan-1-1970 counter, but that is a different thing. Looking at that counter, you cannot see that there is no way to look at that number and figure out that this day is special, but it is.

To manage this, we have the date and time functions in MySQL, which help us figure things like the above out, but there are issues with it still. To begin woth, the constructs quickly turn difficult. Secondly, you cannot really do an indexed search on temporal data, except pretty simple cases, such as searching for a data and time, before or later than a given data and things like that. A search for "all saturdays" will sure not use an index.

There are more issues though, like that fact that, as I have shown above, a temporal value as we know them above, contains hidden data, something that is considered bad (like the day of the week, the month, the fact that a data is a leap day etc).

And it doesn't end there, there are different calendars in use, in different parts of the world and in different inductries. If you have a mortage on your house, chances are that the interest you pay is the same each month, despite the fact that the interest in calculated over the full year and different months have different number of days. The reason is that interest uses a "360 day" calendar, where each month is assumed to have 30 days. And different financial instruments use diferent calendars (for example the one called 366 day, which is our normal calendar and the 365 day calendar. which is the normal calendar with the exception that there is no leap day). If you try to match one financial instrument to another, when they use different calendars, is not a lot of fun (I've done it, so I know).

But chances are that many of you now think that this is not a big problem for you, as you don't see people asking questions on data with complex temporal relationships. And to that I say that, there is a big chance that this is because:

Temporal data is difficult. More so than simple integers and strings, disregarding the simple before date X or something like that. But we just don't ask the questions like "how much of the sales per quarter happen on the last day of the quarter" and things like that. Which doesn't mean the information isn't interesting, but the questions is difficult to formalize and even more so to execute in a database. Not to mention checking for overlapping dates and things like that.
We are used to not being able to ask these kinds of questions.

But really we should be able to ask, query and get data with a lot more temporal information. When was the last time you asked a temporal data query in Google (not that you CAN do this). Like, how has the matching or IBM and Pressrelease changed over the last 10 years, i.e. how many Press releases did IBM post in 2001, in 2002 etc. Yes, the data is there somewhere, querying it is difficult. And using temporal data combined with data you get on the web should allow you to determine more than what you can today (i.e. IBM and Press release matches X number of documents in Google right now). Adding a temporal axis to web data should allow you to determine what might happen.

One issue with all this is that databases are hardly geared up to support this, and a second is that the data to support this isn't available in large quantities yet. This is one of the things we are looking at at Recorded Future, but there is more stuff coming. The web is there with masses of data, now we need to make sense of all that data. One aspect which is there but which hasn't yet been explored much is just the temporal aspect of data.

/Karlsson

Tuesday, October 19, 2010

MyQuery 3.4.0 Released!

I have not finished MyQuery 3.4.0, after a long time of weekend and spare-time development. This version adds some features that are basically Scintilla featires that I have integrated with. This includes error indicators, margin markers and some other things. In particular the margin markers changes a few things in the interface: When running up to, or continue from a point in teh script, then use the margin marker instead of the current line! If you don't like this, then don't set the margin marker and the current line will be used just like in old releases.

Another addition this release is Online help. I get the help contents from the database that you connect to, which is installed with most MySQL installations in the tables called help_xxx in the mysql schema.

External program has a few more options, and in this release the installer installs a few of those, just to show how it works, like the MySQL schema in an editor window and direct access to the MySQL commandline.

In addition to this, and a few more minor features, there is a whole bunch of bugfixes in this release, which doesn't mean that some doesn't remain. Above all, I have reengineered how I handle multiple access to the MySQL connection.

Download it from Soureforge! I will follow up this post with some more on how to use MyQuery, how to build extensions and plugins and some things like that!

Happy SQLing
/Karlsson
PS. If you have issues with MyQuery, drop me an email, but note that I am now at anders@recordedfuture.com!

Monday, October 18, 2010

Moving on, but not far

As of today I have a new employer, and I no longer work for MySQL / Sun / Oracle, instead I am now Database Architect at Recorded Future (http://www.recordedfuture.com). This is exciting stuff, really so. RF pulls data from teh Web, but instead of just allowing searching, RF will analyze what it find to spot trends and things like that.

It is still early days, we need to collect a lot more data than today, and managing that will be one of my challenges. But an interesting and fun challenge!

This doesn't mean I will not maintain MyQuery anymore, I sure will. And Recorded Future is by the way MySQL users, so I will probably work even more with MySQL related stuff than today. I will also continue to blog about what we are doing, and about other database related things as well. In particular, I will work with and blog about maintaining very large databases!

So watch this space, and see you soon
/Karlsson

Thursday, October 14, 2010

MyQuery 3.4 soon to be released

I've been spending some cycles now and then on MyQuery 3.4. One thing I wanted to do with this release was to fix some long-standing bugs I knew about, but hadn't identified (Are you seeing ReleaseSemaphore errors? That's one of them). Also, I had a couple of issues with the new Plugins, as well as some ideas for new features, such as OnLine Help.

In addition to that, I also felt that the Scintilla editor controls tht I was using wasn't used to it's full potential.

As I have the new 3.4 in front of me now, I have done something about all those things. I have cleaned up some of the more recent code, I have added OnLine help, in this case I get that from the MySQL help tables. I am pretty happy with how a few more steps to integration with Scintilla could add so much more usability, Scintilla really is a great editor control with a lot of potential, I haven't reallly explored it fully yet, and there is more to do. Also, Scintilla is really solid.

What remains to be done is to test, package and publish MyQuery 3.4. This will be a Beta, but I will release a GA based on it real soon. Before that, I have one or two plugins I want to add (Replication monitoring for example). Also, I have a plan to add some more advanced error handling, using Scintilla to show erroneous text in a better way than today. But all in all, we are now close to GA.

Cheers and see you soon!
/Karlsson

Monday, September 13, 2010

More on MySQL Cluster monitoring with MyQuery 3.3

A few days ago, I posted a blog on monitoring MySQL Cluster / NDB with the aid of MyQuery accessing the ndbinfo tables. Now, there are more ways than that to query the status of MySQL Cluster / NDB, you can use the MGMAPI, which is the low level API used to monitor and manage a NDB Cluster setup.So, if this API is available, and is accessible from C, and MyQuery has the ability to be extended with DLLs written in C or C++, then why not create a MyQuery plugin for managing a NDB Cluster? Tell you what, that is already part of MyQuery, which has by the way had a few bugfixes and is now up to version 3.3.4.

To get started, if you haven't done so already, download MyQuery and install it, and make sure to install the plugins / User defined tools. Once you have done that we are sort-of ready to go, but first a few words pf caution regarding the sourcecode.

If you haven't used any of the NDB low-level APIs, i.e. NDBAPI and MGMAPI, let me fill you in on a few details. NDBAPI is used for normal database DML and DDL, but it is much more low level than SQL, but also much MUCH faster. NDBAPI is strictly a C++ interface, although there are other bindings (Java) on top of it. MGMAPI on the other hand, which is used for monitoring and managing an NDB Cluster, is C on the surface, meaning the the interface used no classes, no overloading no nothing, and most MGMAPI code I have seen uses straght C. Bit this isn't the whole truth. Some NDBAPI includefiles that are also needed by MGMAPI still have no C++ specifics, but one thing it uses, which is valid in C++ but not in C, is to define structs with no members (don't ask me why), like this:

struct ndb_logevent_STTORRYRecieved { };

So even though the MyQuery plugin that I will show later is written in C and is contained in the file NdbMonitor.c, it still must be compiled as C++ code. In Visual Studio 2008, this is done by right clicking on the file in Solution Explorer (yes, the file NdbMonitor.s, nit the project, that will not work!) select the Properties menu option and in the dialog go to C7C++ and then Advanced and then change Compile as to Compile as C++ Code. Also, when you look at the MyQuery Solution, note that the NdbMonitor project is compiled and linked against MySQL Cluster, and in fact, this is not the binary MySQL Cluster that you download from MySQL, rather this is built from source, see more in my previous blog postings on the subject.

OK, so that's it for the sourcecode, now let's look at what this little puppy can do when we run it (and it's part of the biinary MyQuery download by the way, so you do not have to use MGM API yourself or anything). To use this tool, after installing MyQuery, connect to an SQL node in a NDB Cluster and then select the Tools->NDB Monitor menu option. This will show a dialog like this:

If this is the first time you use NDB Monitor, the default Cluster connection string is filled in the connect string edit box, and adjust this to reflect your NDB Cluster setup. You may connect to NDB Cluster on any platform here, not just Windows. Then click Connect and the current Cluster status will be shown when you have connected to the Cluster:

Most of the information here should be pretty obvious. The traffic lights are green if all datanodes are up, yellow if one is down and red if two or more are down. To refresh the status click Refresh and you may automatically refresh by turning on the Autorefresh checkbox and enter the number of seconds between refreshing.

There are two tabs, one for the nodes themselves and one for the node groups. When the node view is active, you can right click on a node to bring up a menu that allows you to do one of three things:

Stop a node - This will bring up a dialog that allows you to stop a node, gracefully or just kill it.
Restart node - This will bring down a node and then restart it. You have the option of leving it in "non-started" mode also.
Start node - If you have done a restart and left a node in non-started mode, then you can start it using this menu option.

And that's about it, NdbMonitor isn't more fancy than that just now, but I am planning a few more things:

Cluster backup - This is not that hard to implement, I just haven't gotten through to it yet.
Cluster log management - This is a bit more difficult, but it involves subscribing to the cluster log and showing log contents in a new tab.

If you have more ideas for future versions of NdbMonitor, let me know!

/Karlsson

Friday, September 10, 2010

MySQL Cluster users - Check out MyQuery 3.3.2

In MyQuery 3.3, where I just released version 3.3.2 which includes some MySQL Cluster NB specific fixes, there is some support for accessing MySQL Cluster / NDB beyond what is available as ndbinfo table and such things. MyQuery is a Windows based MySQL GUI, if you didn't know that already, and just because you run this tool on Windows (it is only supported on Windows), doesn't mean you cannot monitor a Cluster that runs on, say, Linux. Or Windows, should you so wish.

There are 2 predefined tools that support MySQL Cluster there, one that uses the ability to predefine a SQL statement that runs in a non-modal dialog and is optionally updated and one that uses the feature that allows a DLL, using a sepcfic MyQuery API, to run as a tool, and in this case, this tool is linked up with the NDBAPI to allow monitoring and a certain amount of control overthe Cluster.

Monitoring NDB Cluster memory usage
In MySQL Cluster / NDB 7 and up, there are tables in a schema specific to NDB, called ndbinfo that includes a bunch of tables with NDB specific information.
Defining a tool in MyQuery that monitors memory usage, like this tool. by selecting from a table is real easy. The Memory usage monitor is included with MyQuery 3.3 and up, so you can just download MyQuery and then connect to the Cluster and use it, but possibly you want to create tools of your own for other ndbinfo tables or something completely different.
Assuming we didn't have the NDB Menory tool already, and wated to create it, this is how you wold do that:

First, test the SQL statement you want to use in the MyQuery editor. Always use fully qualified table names (i.e. prefixing them with the database name), unless you have a specific reason not to do this. In this case, we end up with this statement:
select * from ndbinfo.memoryusage
The next step is to create the tool. Select the Settings -> Configure user tools menu option, and you will get a dialog that shows all the tools that are currently defined. We are to create a new tool, so click the new button.
In the dialog that pops up, give the tool an appropriate name and select a type. In this case, the type is SQL SELECT. Then click OK.
A new tool is now created for you, but it will not be saved until we have added some required info, in this case teh SQL statement. In this case, if you have the SQL text in the editor, click the Copy button and select Copy from editor, else just tpe in the SELECT statement you want to run.
We probably want to tool to be shown in the Tool menu, and hence be accessible, only when it will work properly, in this case, the ndbinfo plugin is required. For this, select the Enabled for plugin radio button and then select the ndbinfo plugin in the listbox.
The last thing we want to do is select the connection to use. There are three options here: Use the connection used to execute normal SQL in MyQuery (Use execute connection), use the connection normally used by MyQuery tools (if MyQuery isn't connected with dual connections, then these two options are the same of course), which is the default (Use main connection) or use a completely separate connection. In our case, just use the default.
For tools like this, you would usually not enable the Modal dialog option. This is an option only used in certain special cases.
Finally, you might want to add some Help text.
That's it, now we have a new tool! Click OK and you are done!

Now, in the Tools menu, your newly created tool should appear. Select it, and a non-Modal dialog will appear, just like the one for the NDB Memory info tool. It should look something like this:

As you can see, there is the ability to refresh the contents automatically, to refresh every 2 seconds, just check the Autimatic refresh checkbox and set the time to 2 and you are all set. Let me know if you find out any really cools ways of using this feature for tools, and if they are generic enough, I will include them in a future release of MyQuery.

Happy Clustering!
/Karlsson

Thursday, September 9, 2010

MyQuery 3.3.1 releases - Fixes MyQuery 3.3.0 platform issues

I have released MyQuery 3.3.1 now. This is a minor release in many ways, but still significant in others. MyQuery 3.3.0 had issues on platforms where the runtime library I used wasn't installed. Also, there was a minor (but crashing) bug that caused Ndb Monitor to fail. All this, and a few other minor things, are fixed now.

I will eventually blog a bit more on the new features in MyQuery 3.3, such as the plugins, the InnoDB Lock monitor and how to easily define your own monitor dialogs with a simple SELECT statement.

Til then, download MyQuery 3.3 from Sourceforge.

Cheers
/Karlsson
And by the way, I'm sorry that I failed some QA with version 3.3.0, but it's just me coding away here, and I have a limited amount of machines and above all, time.