To begin with, time isn't such a simple type of data as we might look at it. Is a date just a count of the number of seconds since 00:00:00 on Jan 1 1970 or something like that? And we then convert it to a calendar and all is fine? Is time data really linear in that sense? No, it's not. Dec 24 2010 is Christmas Eve and when you get your Christmas gifts here in Sweden. Is this significant? Also, it's a Friday. That this data is a Friday can be calculated from the date or the seconds-since-jan-1-1970 counter, but that is a different thing. Looking at that counter, you cannot see that there is no way to look at that number and figure out that this day is special, but it is.
To manage this, we have the date and time functions in MySQL, which help us figure things like the above out, but there are issues with it still. To begin woth, the constructs quickly turn difficult. Secondly, you cannot really do an indexed search on temporal data, except pretty simple cases, such as searching for a data and time, before or later than a given data and things like that. A search for "all saturdays" will sure not use an index.
There are more issues though, like that fact that, as I have shown above, a temporal value as we know them above, contains hidden data, something that is considered bad (like the day of the week, the month, the fact that a data is a leap day etc).
And it doesn't end there, there are different calendars in use, in different parts of the world and in different inductries. If you have a mortage on your house, chances are that the interest you pay is the same each month, despite the fact that the interest in calculated over the full year and different months have different number of days. The reason is that interest uses a "360 day" calendar, where each month is assumed to have 30 days. And different financial instruments use diferent calendars (for example the one called 366 day, which is our normal calendar and the 365 day calendar. which is the normal calendar with the exception that there is no leap day). If you try to match one financial instrument to another, when they use different calendars, is not a lot of fun (I've done it, so I know).
But chances are that many of you now think that this is not a big problem for you, as you don't see people asking questions on data with complex temporal relationships. And to that I say that, there is a big chance that this is because:
- Temporal data is difficult. More so than simple integers and strings, disregarding the simple before date X or something like that. But we just don't ask the questions like "how much of the sales per quarter happen on the last day of the quarter" and things like that. Which doesn't mean the information isn't interesting, but the questions is difficult to formalize and even more so to execute in a database. Not to mention checking for overlapping dates and things like that.
- We are used to not being able to ask these kinds of questions.
One issue with all this is that databases are hardly geared up to support this, and a second is that the data to support this isn't available in large quantities yet. This is one of the things we are looking at at Recorded Future, but there is more stuff coming. The web is there with masses of data, now we need to make sense of all that data. One aspect which is there but which hasn't yet been explored much is just the temporal aspect of data.
/Karlsson