A long time ago, there were no objects. There was clean, cold code, and the closer to the hardware the faster, not only because it is faster that way, but also because not just anyone would write code in those days. Then came programming languages in the 1950's and structured programming in the 1960's (the first commercial software offering was Auto-Flow, a program that would generate flowchart prints from existing programs written in Cobol, Fortran etc. Auto-Flow dates from the early 1960's).
The 1970's came around, and besides awful clothing styles and horrible interior decoration colour schemes, and things such as ABBA, we got Personal Computers, sort of, and the Relational Database (Oracle was first with an true software offering in this arena, in 1977).
In the 1980's Object Orientation, which had been rather esoteric and academic in the 1970's, turned into reality. (Also, Oracle got BIG). And if you ask me, this was a good thing, if it wasn't for what happened in the 1990's.
The 1990's saw, among other things, my (still) favorite hangout Akkurat open in Stockholm serving English Cask Conditioned Real Ale and Belgian Beers, among other things. One of the worst things that I think happened in the 1990's though was that Object Orientation, which had so far been a real cool way of writing organized code, which in itself also documented, in the code, relationships and interactions between objects, into a Religion. Oh my, among the most awful things was getting into a customer where an OO consultant had taken control. You couldn't even go to teh bathroom without having a crapper object with number one and number two methods. Awful stuff.
And in the midst of that came... Yes, you know what I am talking about, the ORDBMS and OODBMS, the Object relational and the Object Oriented Databases. I was out there, trying to convince people to buy the ORDBMS I was promoting at the time, something I was doing with some, but still limited, success. I admit it, I was wrong, this was, in the general case, not a good idea!
Looking back, I think I know what I think the problem is with the OO database things. At least I have an opinion (you expected that I guess) and here it is: OO is fine, it's a great way or organizing data in your code, but just because can make code good (it doesn't HAVE to, just as little as C automatically makes your code "portable"), it doesn't necessarily make your data any good.
One thing here is that there is lots of code that use your data, and there are many way to look at data at. So your code makes do with a view of data that fits whatever task your code is set out to do. But the same data may well be applicable to another use, using some other, completely different, structure, relationships and methods. The mistake, in my view, is to say that just because I can process data in an OO way, my data should be organized in an OO manner. I do not think so.
Another thing is that, when looking at data, the Relational model is brilliant. It allows you to view data from any angle, independent of how it will later be processed, which is the key here: Data is data, and processing of that data is processing of that data. And yes, these two things are linked, but that doesn't mean they have to have the same structure.
A third things is that OO from the start was a developer thing. It allowed you to write black-box style code, it allowed smarter code, and it encouraged structure to the code and made it easy to provide and enforce strict interfaces between different aspects of the code. But it was still about code. I think nearly every highly paid OO evangelist of the 1990 was looking at OO from the POV of code, and not data.
My view is a bit like this, when it comes to relational databases and different kinds of programming methods and whathaveyounot. The RDBMS is like a container, you can fill it with whatever you want, but it is just contents, it will not move or do anything, it will just stay there in the container, and the bookshelf you put in there is there the next time you open the container (hopefully). The transport mechanisms, the trucks, ships, air carriers, satellites and what have you not, is the processing we do. You pick the best processing for the job. There may be better ways of transporting your data than a container, but because all containers look the same, this creates so much of an advantage as the disadvantage of not filling up every square inch of the ship is minor.
And then you have the standard things that allows your container to be loaded, moved, fastened, unfastened, lifted and loaded again etc in a standard way. This is your query language. Of which SQL might well not be the best (I don't think it is), but that is beside the point, that everybody uses it is what the advantage is. It really is.
And when I pack my container to move it, I don't have to worry about if the container will fly, go by truck or be shipped. I really do not care, I just fill it with stuff. Just as I just provide my database tables.
And another things, looking at data in the shape of fixed named columns and variable number of rows is something that everyone, more or less, can understand. It's a universal language that goes WAY beyond the RDBMS.
To conclude, no, I don't think OO sucks, I think it rocks! But for code, not for data. And no, I don't think that your data is good just because it's in an RDBMS. And no, I do not even think an RDBMS is good for everything! There are cases, many of them, where all you need is an organized persistent storage for your programs data: No-SQL is great for that. But if data is data, and not just an extension of the RAM occupied by your program, then an SQL based RDBMS is often (but not always) your best bet!
/Karlsson
I am Anders Karlsson, and I have been working in the RDBMS industry for many, possibly too many, years. In this blog, I write about my thoughts on RDBMS technology, happenings and industry, and also on any wild ideas around that I might think up after a few beers.
Thursday, February 3, 2011
Wednesday, January 26, 2011
Speaking at the MySQL UC...
I will be speaking at the MySQL Conference & Expo in April (The conference is on April 11 - 14 see more info here: http://en.oreilly.com/mysql2011). The subject for my speach is Large datasets in MySQL on Amazon EC2 and this is pretty appropriate. If you have been reading my blog before, you might know that we run all our systems on Amazon here at Recorded Future, and to be honest, at first I wasn't impressed. Then I learned to live with it, and now I'm a convert!
But I think Amazon might have done a bad job at selling and presenting this service to the world. The pro's are not what people think (cost, for example), and there are cons you didn't expect either. What the real issue is to get the right software that is at least to some extent EC2 aware. Yes, you run on your choise of operating system (we are on Ubuntu), and yes, it sure looks like your standard Linux box out there, but a bit more flexible. But this is not the whole truth.
The key in EC2 is the E as in Elastic. Here, in our EC2 environment, I want software that can scale. A lot. Here, I want software that is flexible (can be started, stopped, moved, extended, reconfigured). Here, I do NOT want a distributed, but static, clustered system.
So, see you in Santa Clara in April!
/Karlsson
But I think Amazon might have done a bad job at selling and presenting this service to the world. The pro's are not what people think (cost, for example), and there are cons you didn't expect either. What the real issue is to get the right software that is at least to some extent EC2 aware. Yes, you run on your choise of operating system (we are on Ubuntu), and yes, it sure looks like your standard Linux box out there, but a bit more flexible. But this is not the whole truth.
The key in EC2 is the E as in Elastic. Here, in our EC2 environment, I want software that can scale. A lot. Here, I want software that is flexible (can be started, stopped, moved, extended, reconfigured). Here, I do NOT want a distributed, but static, clustered system.
So, see you in Santa Clara in April!
/Karlsson
Sunday, January 2, 2011
Not so much on databases, more on Open Source and Democracy
Those who read what I write here know that I am sometimes pretty opinionated, and that I have some strong feelings about some things. I am not to discuss politics here, but I will discuss one topic that that is one of the subjects closest to my heart, a thing that I think is crucial in todays world, and that has been a driving force in making the world what it is, both good and bad. That things is democracy.
Way back when I was 20-something I begun to appreciate what is sometimes called "Adult cartoons", then not Adult in the sense of being sexually explicit, but rather Adult in the sense that these cartoons are about things closer to us adults. I don't know about the rest of the world, but in Sweden this is pretty popular (you might have seen the "Rocky" cartoon for example, published in many languages, which is pretty typical of this genre).
You may think what you want about something like the "Rocky" cartoon, that is OK with me, but I still read it and enjoy it. Which is not to say that I don't read other books too, quite the opposite, I read newspapers, magazines, novels and fact-books every day. But if you disapprove Rocky, which you are free to do, that is OK, as long as you do not try to stop me reading Rocky just because you don't like it. That is censorship, plain and simple.
All this is not to say that we can publicly say or state just about anything. If what we state is considered harmful in some way to someone else, in a practical sense, then we should think again. But that is for the courts to decide. Just because someone dislikes this or that work of art, does not mean it should be banned. If you don't like a particular painting, book, cartoon or TV-show, then just don't view it. Simple. If a published material causes you harm in some way, for example if it humiliates someone personally or some ethnic etc. group, then let the courts decide. These are not simple thing, by any measure, but it is something we have to do as a democrcy: Decide what is harmful in such a way so that it really should be pubishable to state it publicly, and what is not.
Someone who should NOT decide this is some commercial entity with it's own agenda. To protect the innocent, I use the name of an appropriate fruit to name the specific company I am talking about here, so we call them Grapefruit Inc.
The company in question has allowed the Swedish magazine "Galago", which an Adult (not sexually explicit) cartoon magazine, to publish their "App" only under the condition that Grapefruit Inc can censor the contents. This is not the first time this company does this, and the affects, among other people, users of the products noPhone and noPad from Grapefruit Inc.
And before I go on. Yes, I do have a noPad. And I lov it, hardware is great, as is the software, it truly brilliant. But as the deals with publishers aren't open, I will not pay for any Apps, as I will not know if, and if so what, has been censored.
Another magaine with the same issue is a Danish IT-magazine that has published quite a few positive articles on Android. One thing with an Andriod phone is that it is not censored (again, if you ask, Flash is pretty crap technology, but it is out there and it's being used on sites I want to visit. Not so with my noPad, but my HTC Desire does it like a charm. As well as acting as a 3G WiFi router). What's next? Will they ban TCP/IP because IPX/SPX is better (which it probably is)?
In a way, this is the perfect mix of censorship as seen in, say, the old Eastern German, with modern day market economy: You get censorship, but you have to pay for it!
So, who comes to the rescue? Android, which is Open Source! No censoring, thank you! Open Source delivers! As for myself, who has spent 6 years with MySQL as a Sales Engineer, I have been on the look for the "killer reason" to go Open Source over all those years. To me, this is it, a better reason than any other I have seen: No censorship!
To conclude, I am aware that this is a very opinionated posting, and must be very clear that the opinions expressed here are my own only, and are not related to my day job. I would hope that my employer shares some of my views, but again, what is expressed here is my own thinking, and mine alone. Competely uncensored!
/Karlsson
Way back when I was 20-something I begun to appreciate what is sometimes called "Adult cartoons", then not Adult in the sense of being sexually explicit, but rather Adult in the sense that these cartoons are about things closer to us adults. I don't know about the rest of the world, but in Sweden this is pretty popular (you might have seen the "Rocky" cartoon for example, published in many languages, which is pretty typical of this genre).
You may think what you want about something like the "Rocky" cartoon, that is OK with me, but I still read it and enjoy it. Which is not to say that I don't read other books too, quite the opposite, I read newspapers, magazines, novels and fact-books every day. But if you disapprove Rocky, which you are free to do, that is OK, as long as you do not try to stop me reading Rocky just because you don't like it. That is censorship, plain and simple.
All this is not to say that we can publicly say or state just about anything. If what we state is considered harmful in some way to someone else, in a practical sense, then we should think again. But that is for the courts to decide. Just because someone dislikes this or that work of art, does not mean it should be banned. If you don't like a particular painting, book, cartoon or TV-show, then just don't view it. Simple. If a published material causes you harm in some way, for example if it humiliates someone personally or some ethnic etc. group, then let the courts decide. These are not simple thing, by any measure, but it is something we have to do as a democrcy: Decide what is harmful in such a way so that it really should be pubishable to state it publicly, and what is not.
Someone who should NOT decide this is some commercial entity with it's own agenda. To protect the innocent, I use the name of an appropriate fruit to name the specific company I am talking about here, so we call them Grapefruit Inc.
The company in question has allowed the Swedish magazine "Galago", which an Adult (not sexually explicit) cartoon magazine, to publish their "App" only under the condition that Grapefruit Inc can censor the contents. This is not the first time this company does this, and the affects, among other people, users of the products noPhone and noPad from Grapefruit Inc.
And before I go on. Yes, I do have a noPad. And I lov it, hardware is great, as is the software, it truly brilliant. But as the deals with publishers aren't open, I will not pay for any Apps, as I will not know if, and if so what, has been censored.
Another magaine with the same issue is a Danish IT-magazine that has published quite a few positive articles on Android. One thing with an Andriod phone is that it is not censored (again, if you ask, Flash is pretty crap technology, but it is out there and it's being used on sites I want to visit. Not so with my noPad, but my HTC Desire does it like a charm. As well as acting as a 3G WiFi router). What's next? Will they ban TCP/IP because IPX/SPX is better (which it probably is)?
In a way, this is the perfect mix of censorship as seen in, say, the old Eastern German, with modern day market economy: You get censorship, but you have to pay for it!
So, who comes to the rescue? Android, which is Open Source! No censoring, thank you! Open Source delivers! As for myself, who has spent 6 years with MySQL as a Sales Engineer, I have been on the look for the "killer reason" to go Open Source over all those years. To me, this is it, a better reason than any other I have seen: No censorship!
To conclude, I am aware that this is a very opinionated posting, and must be very clear that the opinions expressed here are my own only, and are not related to my day job. I would hope that my employer shares some of my views, but again, what is expressed here is my own thinking, and mine alone. Competely uncensored!
/Karlsson
Wednesday, December 29, 2010
Ubuntu Upstart for automatic MySQL start and stop - Part 2
I've done a little change to the upstart script I published in a recent blogpost. The issue is with mysqld_safe which has a few issues it seems, but so far I haven't tracked them down exactly. Anyway, if mysqld_safe is not in your path and you run it with the full path (as I do in the Upstart script which is run by root that may well not have the mysql bin directory in it's path), then mysqld_safe will behave strange in at least 5.5 when called with the full path, i.e.
/mnt/mysql/bin/mysqld_safe --defaults-file=/etc/mysql/my.cnf &
will not work. If I then put the appropriate directory in my path, like this:
PATH=/mnt/mysql/bin:$PATH
/mnt/mysql/bin/mysqld_safe --defaults-file=/etc/mysql/my.cnf &
It still will not work as expected. In the first case, it complains that it cannot find my_print_defaults, and then just goes on, seemingly using some default values, error logging to /usr/local/mysql/data and trying to run /usr/local/mysql/bin/mysqld, and before you ask, yes I have set things like ledir and basedir in the [mysqld_safe] section in the .cnf file.
Anyway, the fix for all this is to run mysqld_safe with the current directory being the same as basedir (i.e. one step "above" where mysqld_safe is located). To tell Upstart to run in a specific directory, you use the chdir stanza. So my scripted ended up looking like this now:
chdir $MYSQL_HOME
It will not work. And as usual, debugging this with Upstart is nearly impossible. If you try using the above syntax, all you get is an information that the script would not run (and, yes, the console Upstart stanza is more or less useless. Why I cannot tell Upstart to output to the current terminal is beyond me, and why there is no verbose mode, I do not know either).
I will check up the issue with mysqld_safe and send a bug report to MySQL (if it isn't there already). As for upstart, I am so annoyed by it at this point in time, that I'm not sure I have the energy to post a bug report, as I cannot even tell what so-called "feature" is most annoying. But again, this may be because of my limited upstart exposure. Upstart is sure different from working with chef though (which I am also new to). One single, minor, error in chef, and you get an error listing the size of Encyclopedia Britannica. Whereas Upstart on the other side would say something like "job failed", and nothing more, if it was managing the Titanic.
/Karlsson
/mnt/mysql/bin/mysqld_safe --defaults-file=/etc/mysql/my.cnf &
will not work. If I then put the appropriate directory in my path, like this:
PATH=/mnt/mysql/bin:$PATH
/mnt/mysql/bin/mysqld_safe --defaults-file=/etc/mysql/my.cnf &
It still will not work as expected. In the first case, it complains that it cannot find my_print_defaults, and then just goes on, seemingly using some default values, error logging to /usr/local/mysql/data and trying to run /usr/local/mysql/bin/mysqld, and before you ask, yes I have set things like ledir and basedir in the [mysqld_safe] section in the .cnf file.
Anyway, the fix for all this is to run mysqld_safe with the current directory being the same as basedir (i.e. one step "above" where mysqld_safe is located). To tell Upstart to run in a specific directory, you use the chdir stanza. So my scripted ended up looking like this now:
#And lastly, again before you ask, don't try the easy way out here:
# MySQL Service for Recorded Future
#
description "MySQL Server"
author "Anders Karlsson, Recorded Future"
start on (net-device-up
and local-filesystems
and runlevel [2345])
stop on runlevel [016]
expect fork
kill timeout 2
# Set variables.
env MYSQL_ETC=/etc/mysql
env MYSQL_PIDFILE=/var/run/mysql.pid
env MYSQL_HOME=/usr/local/mysql5.5
env MYSQL_INSTANCE=my
umask 007
chdir /usr/local/mysql5.5
exec $MYSQL_HOME/bin/mysqld_safe --defaults-file=$MYSQL_ETC/$MYSQL_INSTANCE.cnf >> /tmp/x.out &
post-start script
loop=600
# Wait for MySQL to start.
while [ $loop -gt 0 ]; do
if $MYSQL_HOME/bin/mysqladmin --defaults-file=$MYSQL_ETC/$MYSQL_INSTANCE.cnf ping; then
break
fi
loop=$(($loop - 1))
sleep 1
done
exit 0
end script
# Send a soft SIGTERM to MySQL before Upstart will kill it.
# A Sigterm to mysqld will cause a controlled shutdown.
pre-stop script
exec kill -SIGTERM `cat $MYSQL_PIDFILE`
# Wait for MySQL to end. Flushing buffers and all.
loop=600
while [ $loop -gt 0 ]; do
# If the pidfile is found, then continue waiting.
if [ -e $MYSQL_PIDFILE ] ; then
loop=$((loop - 1))
sleep 1
continue
fi
break
done
end script
chdir $MYSQL_HOME
It will not work. And as usual, debugging this with Upstart is nearly impossible. If you try using the above syntax, all you get is an information that the script would not run (and, yes, the console Upstart stanza is more or less useless. Why I cannot tell Upstart to output to the current terminal is beyond me, and why there is no verbose mode, I do not know either).
I will check up the issue with mysqld_safe and send a bug report to MySQL (if it isn't there already). As for upstart, I am so annoyed by it at this point in time, that I'm not sure I have the energy to post a bug report, as I cannot even tell what so-called "feature" is most annoying. But again, this may be because of my limited upstart exposure. Upstart is sure different from working with chef though (which I am also new to). One single, minor, error in chef, and you get an error listing the size of Encyclopedia Britannica. Whereas Upstart on the other side would say something like "job failed", and nothing more, if it was managing the Titanic.
/Karlsson
Saturday, December 25, 2010
Ubuntu Upstart for automatic MySQL start and stop
Here at Recorded Future we use Ubuntu (running on Amazon EC2), but so far we have not explored Ubuntu Upstart that much. During the holidays I made an effort to get acquainted with Upstart and to implement proper MySQL start and stop with it.
If you do not know Upstart, this is the way you start and stop services in Ubuntu, and it serves the same purpose as the old /etc/init.d scripts, but are a bit more structured and powerful. That said, Upstart is regrettably far from complete, although the functionality is much better and Upstart has some cool features, some things do not work that well. For one thing, documentation, where it exists, is useless, at best. Secondly, there is very limited ability to test and develop Upstart scripts. And this is made worse by the fact that the documentation is so bad. Another thing is that Upstart insist on stopping services, by default, by sending a brutal kill signal. Not good for databases, mostly.
In the /etc/init directory are the Upstart scripts you have. In difference to the old init.d scripts, you cannot disable a service in Upstart curenntly. If it is in /etc/init it will be started at system start. That's it. And this is something that I am sure will be fixed, but for now, again, is something we have to live with. Upstart scripts have the suffix .conf (don't ask me why), so the default MySQL Upstart script, for example, is called /etc/init/mysql.conf.
In an Upstart script, there are Stanzas that determine what to do. Like the exec Stanza that runs a program for example. And you may then ask, when is it run? Startup? Shutdown? And the answer is startup. For shutting things down, as I said before, Upstart will by default just send a kill -9 signal.
The minimal startup script you can have, and this actually works in a reasonable way, is to just have one line with an exec stanza, like this:
exec /usr/bin/mydaemon
Which will start the daemon. For stopping the daemon, Upstart will send a -9 signal to the started process by default, and nothing more is needed in the Upstart script.
For MySQL, we need to make things a bit more complicated. The default mysql.conf Upstart script really is not good. For one thing, it will not do a controlled shutdown of MySQL (this is possible even if Upstart will eventually send a kill -9 anyway). Secondly, this script assumes that what we use is a standard Ubunty installed MySQL distribution, so if you have installed MySQL in /usr/bin/mysql5147 or somethings like that, you are out of luck.
So what I wanted to create was an Upstart script for MySQL that fullfilled these requirements:
So here we go, a complete MySQL Upstart script, the way I want it to work:
To be honest, this is not what I create for all our MySQL servers. Instead I used this to create a chef template, chef is what we use for configuration management here (see http://www.opscode.com/ for more on chef), and here it is put to good ude to generate an Upstart script for MySQL. The above is just an example.
/Karlsson
If you do not know Upstart, this is the way you start and stop services in Ubuntu, and it serves the same purpose as the old /etc/init.d scripts, but are a bit more structured and powerful. That said, Upstart is regrettably far from complete, although the functionality is much better and Upstart has some cool features, some things do not work that well. For one thing, documentation, where it exists, is useless, at best. Secondly, there is very limited ability to test and develop Upstart scripts. And this is made worse by the fact that the documentation is so bad. Another thing is that Upstart insist on stopping services, by default, by sending a brutal kill signal. Not good for databases, mostly.
In the /etc/init directory are the Upstart scripts you have. In difference to the old init.d scripts, you cannot disable a service in Upstart curenntly. If it is in /etc/init it will be started at system start. That's it. And this is something that I am sure will be fixed, but for now, again, is something we have to live with. Upstart scripts have the suffix .conf (don't ask me why), so the default MySQL Upstart script, for example, is called /etc/init/mysql.conf.
In an Upstart script, there are Stanzas that determine what to do. Like the exec Stanza that runs a program for example. And you may then ask, when is it run? Startup? Shutdown? And the answer is startup. For shutting things down, as I said before, Upstart will by default just send a kill -9 signal.
The minimal startup script you can have, and this actually works in a reasonable way, is to just have one line with an exec stanza, like this:
exec /usr/bin/mydaemon
Which will start the daemon. For stopping the daemon, Upstart will send a -9 signal to the started process by default, and nothing more is needed in the Upstart script.
For MySQL, we need to make things a bit more complicated. The default mysql.conf Upstart script really is not good. For one thing, it will not do a controlled shutdown of MySQL (this is possible even if Upstart will eventually send a kill -9 anyway). Secondly, this script assumes that what we use is a standard Ubunty installed MySQL distribution, so if you have installed MySQL in /usr/bin/mysql5147 or somethings like that, you are out of luck.
So what I wanted to create was an Upstart script for MySQL that fullfilled these requirements:
- Starts MySQL automatically.
- Waits for MySQL to be available before exiting.
- Be configurable to support different MySQL install locations, data directories etc.
- Do a clean shutdown of MySQL when stopping the MySQL services.
So here we go, a complete MySQL Upstart script, the way I want it to work:
#
# MySQL Service for Recorded Future
#
description "MySQL Server"
author "Anders Karlsson, Recorded Future"
start on (net-device-up
and local-filesystems
and runlevel [2345])
stop on runlevel [016]
expect fork
kill timeout 2
# Set variables.
env MYSQL_ETC=/etc/mysql
env MYSQL_PIDFILE=/var/run/mysql.pid
env MYSQL_HOME=/usr/local/mysql5.5
env MYSQL_INSTANCE=my
umask 007
exec $MYSQL_HOME/bin/mysqld_safe --defaults-file=$MYSQL_ETC/$MYSQL_INSTANCE.cnf >> /tmp/x.out &
post-start script
loop=600
# Wait for MySQL to start.
while [ $loop -gt 0 ]; do
if $MYSQL_HOME/bin/mysqladmin --defaults-file=$MYSQL_ETC/$MYSQL_INSTANCE.cnf ping; then
break
fi
loop=$(($loop - 1))
sleep 1
done
exit 0
end script
# Send a soft SIGTERM to MySQL before Upstart will kill it.
# A Sigterm to mysqld will cause a controlled shutdown.
pre-stop script
exec kill -SIGTERM `cat $MYSQL_PIDFILE`
# Wait for MySQL to end. Flushing buffers and all.
loop=600
while [ $loop -gt 0 ]; do
# If the pidfile is found, then continue waiting.
if [ -e $MYSQL_PIDFILE ] ; then
loop=$((loop - 1))
sleep 1
continue
fi
break
done
end script
To be honest, this is not what I create for all our MySQL servers. Instead I used this to create a chef template, chef is what we use for configuration management here (see http://www.opscode.com/ for more on chef), and here it is put to good ude to generate an Upstart script for MySQL. The above is just an example.
/Karlsson
Monday, December 20, 2010
Using the right tool for the job at hand - MongoDB, MySQL and Sphinx
You may have seen my posting regarding "eventual consistency" some months ago, and you may have come to the conclusion that I was insisting that a SQL based RDBMS is the way to go for just about anything. Tell you what, that is not so. And nether am I against using. say, MongoDB, where it is appropriate.
The whole deal with Eventual consistency is something that I am still opposed to, I want to know if my data is consistent. And I am not not sure that you cannot have a fully consistent, distributed system either. But I guess that debate goes on. And I still want my base data to be consistent. Like in RDBMS-SQL-Foreign-keys-all-over-the-place-and-not-a-bl**dy-bit-lost-in-the-MyISAM-swamp consistent. That is what I want the base data to look like. And if there are compromises with this, which it may well be, then I want to know about those too.
So, having covered that, what am I trying to say? Well, if you properly normalize your data, then the more you normalize and the more strict you are with data quality, the more troublesome management of that data is going to be, and that is something we have to live with I guess. But if you then are to ask some hefty queries on that data, where the data is organized in such a way to make data real consistent, and the queries just want to data, and the query-side of things really doesn't care about normalization at all, how do you deal with that? One way of course may be to replicate to something more query-friendly, possibly a second MySQL-server or possibly even a bunch of such servers in a scale-out scenario. But your data structures still look really complex, having being built to support storage, update, maintenance and consistency requirement foremost.
At Recorded Future we have taken a different path in our latest release: Choose the best tool for the job at hand. We use MySQL with InnoDB for our data loading and storage. And for that, MySQL worka real well. So we have the data we have collected and processed and organized, structed nicely in an RDBMS.
Now, on the other side of things, where queries are made, things look different, there we want to fulfil 2 needs, basically:
Forgetting Sphinx for now, what we do with Sphinx is actually really simple, and concentrating on MongoDB, where we also do pretty simple things, but the requirements and the scale in the case of MongoDB is higher for us. MongoDB so far has performed well for us. We are running in an Amazon EC2 environment, and that has issues of it's own (in particular this seemed to be the case with Sphinx, but they are on the case). As for Mongo, this is so common in EC2 environments so I guess ot has been more tested.
I admit to liking MongoDB. It is lightweight, easy to manage and has some cool features, such as automatic, and reasonably transparent, sharding. MongoDB at first seems to have good documentation, there is a lot of examples and samples. What is lacking is a good reference though, and a good administration guide is also missing and the Sharding support is still not well documented from admin point-of-view. The JavaScript orientation in MongoDB at first was annoying to me, but increasingly I find it useful. One thing to note about MongoDB is that it needs it's memory: the more the better. For good performance with MongoDB you really want to avoid disk accesses (in particular in EC2 environments). What we do to avoid disk-accesses as much as we can, we have gzipped the data items. We store data in JSON format (yes, we are a JSON shop, sorry XMLers out there) and compressing data fast and having a fast JSON parser is crucial. And yes, we DID think about the case with frequently accessed data: we have the option, on an instance-by-instance basis, to store data as plain JSON or compressed. Plain JSON has advantages, such as as MongoDB is using JavaScript, JSON is pretty much Native to MongoDB, making certain operations real easy.
We are always on the lookout for new technologies, and we do try many things, but the current setup is really useful and we do get much better performence and scalability. And yes, we do get both, with the same number of servers, we get better performance, and much better distributed load of the machines. Now we are waiting for Amazon to fix their disk IO and Network issues.
Your truly
/Karlsson
Hope to see you in Santa Clara in April at the MySQL UC!
The whole deal with Eventual consistency is something that I am still opposed to, I want to know if my data is consistent. And I am not not sure that you cannot have a fully consistent, distributed system either. But I guess that debate goes on. And I still want my base data to be consistent. Like in RDBMS-SQL-Foreign-keys-all-over-the-place-and-not-a-bl**dy-bit-lost-in-the-MyISAM-swamp consistent. That is what I want the base data to look like. And if there are compromises with this, which it may well be, then I want to know about those too.
So, having covered that, what am I trying to say? Well, if you properly normalize your data, then the more you normalize and the more strict you are with data quality, the more troublesome management of that data is going to be, and that is something we have to live with I guess. But if you then are to ask some hefty queries on that data, where the data is organized in such a way to make data real consistent, and the queries just want to data, and the query-side of things really doesn't care about normalization at all, how do you deal with that? One way of course may be to replicate to something more query-friendly, possibly a second MySQL-server or possibly even a bunch of such servers in a scale-out scenario. But your data structures still look really complex, having being built to support storage, update, maintenance and consistency requirement foremost.
At Recorded Future we have taken a different path in our latest release: Choose the best tool for the job at hand. We use MySQL with InnoDB for our data loading and storage. And for that, MySQL worka real well. So we have the data we have collected and processed and organized, structed nicely in an RDBMS.
Now, on the other side of things, where queries are made, things look different, there we want to fulfil 2 needs, basically:
- Fast quering for data, in out case these are instances.
- Fast retrieval of attributes of the instances that was retrieved.
Forgetting Sphinx for now, what we do with Sphinx is actually really simple, and concentrating on MongoDB, where we also do pretty simple things, but the requirements and the scale in the case of MongoDB is higher for us. MongoDB so far has performed well for us. We are running in an Amazon EC2 environment, and that has issues of it's own (in particular this seemed to be the case with Sphinx, but they are on the case). As for Mongo, this is so common in EC2 environments so I guess ot has been more tested.
I admit to liking MongoDB. It is lightweight, easy to manage and has some cool features, such as automatic, and reasonably transparent, sharding. MongoDB at first seems to have good documentation, there is a lot of examples and samples. What is lacking is a good reference though, and a good administration guide is also missing and the Sharding support is still not well documented from admin point-of-view. The JavaScript orientation in MongoDB at first was annoying to me, but increasingly I find it useful. One thing to note about MongoDB is that it needs it's memory: the more the better. For good performance with MongoDB you really want to avoid disk accesses (in particular in EC2 environments). What we do to avoid disk-accesses as much as we can, we have gzipped the data items. We store data in JSON format (yes, we are a JSON shop, sorry XMLers out there) and compressing data fast and having a fast JSON parser is crucial. And yes, we DID think about the case with frequently accessed data: we have the option, on an instance-by-instance basis, to store data as plain JSON or compressed. Plain JSON has advantages, such as as MongoDB is using JavaScript, JSON is pretty much Native to MongoDB, making certain operations real easy.
We are always on the lookout for new technologies, and we do try many things, but the current setup is really useful and we do get much better performence and scalability. And yes, we do get both, with the same number of servers, we get better performance, and much better distributed load of the machines. Now we are waiting for Amazon to fix their disk IO and Network issues.
Your truly
/Karlsson
Hope to see you in Santa Clara in April at the MySQL UC!
Thursday, December 2, 2010
Working with MySQL on Windows - Part 1
This is the first post in an attempt to write about how to get along with MySQL on Windows. Before I start, I want to stress that the focus here is seasoned Windows users getting started with MySQL. As for MySQL, I am prett much Operating System agnostic, having used Linux / Unix for some 25 - 30 years ( started using Unix as of Version 6. That's OLD). But I have also been a Windows user and developer since Windows 3.0 days.
Also, when I say I'm a Windows developer, let me tell you where I come from. I come from learning Win16 in C. Hardcoded message loops and message handlers and that kind of stuff. And that is how I still develop on Windows, but these days with Win32 / Win64 and exposing more Windows APIs. So I am no C# or .NET expert by any means.
So, you are ready to get started with MySQL on Windows? OK, first you should download it, possibly from here: http://dev.mysql.com/downloads/mysql/. And no, you don't have to pay anything, MySQL is free for us developers. And no, you will not need to compile anything from source or something, quite the opposite, MySQL comes with a very nice installer. Just download the MSI installer package for your platform (32 or 64 bit that is), and get started.
Usually when Installing MySQL, you can choose most of the defaults. After the installation, a configuration application starts, and you can usually choose most of the defaults for a starter here also. One things that your might want to look at is the character set selection. Latin1 is one option that is usually OK and that is similar odl and tried Windows CP1252. It's good enough as a start. If you plan to develop Web applications, then you might want to choose UTF8.
OK, assuming all went well, where do a get started with really using MySQL? Well, there isn't much in terms of a Windows GUI for MySQL installed as part of MySQL. And not only that, if you are a Visual Studio user, possibly developing in C#, there is very little in Visual Studio that shows that there is a new database server installed? Well, this we will fix, all this stuff sure is available from MySQL, you just have to know where to look.
First, as for a MySQL GUI for Windows, you have a few choices. If you want a complete MySQL design and SQL tool, then look no further than MySQL Workbench from MySQL themselves. This is a database design tool for MySQL that is loaded with cool features. Again, you can download it and use it at no cost, it is available here: http://dev.mysql.com/downloads/workbench/5.2.html
There are alternatives to MySQL Workbench, and many of them are also free and Open Source. HeidiSQL is one such option, and I have a tool myself called MyQuery. Both of these are a little less ambitious than MySQL Workbench, but also has features of their own. HeidiSQL is very much DBA focused and has loads of DBA style functions. My own MyQuery is focused on working with MySQL Scripts and shines when it comes to extensibility among other things. MyQuery can be downloaded here: http://sourceforge.net/projects/myquery/
But what about your applications? How is a database connection to MySQL achieved? Well, there are several ways to connect to MySQL. When you download the MySQL Server, it comes with a C API. If you are an old man like me, that might well be the way to go. The C API is OK and is well documented here: http://dev.mysql.com/doc/refman/5.5/en/c.html. The C API comes in static and dynamic shape. If you don't know the the difference or don't really care, then use the dynamic build. Using the static library is more demanding. MySQL itself, including the C API, is built with Visual Studio, so if that is what you use for your application (and I guess you are), that is OK. And you can do with the Express editions of Visual Studio.
Now, chances are you are not an old C hacker like myself, but a more modern person, using C#, Visual Basic or Java. If that is the case, MySQL provides help even here. If you are using Visual Studio with any of these languages, you want to download and install MySQL Connector/Net. A Connector is what MySQL calls their database drivers. And yes, even if you are on Java you want the .Net connector. The reason is that the .Net connector also includes the Visual Studio integration, although here it doesn't seem that the Express editions are enough. Connector/.Net can be downloaded here: http://dev.mysql.com/downloads/connector/net/.
And then for Java, you need Connector/J, available here: http://dev.mysql.com/downloads/connector/j/
Now, one word of caution before I leave you for this time: Both Connector/Net and Connector/J does some cool things for you. If you look at how MySQL works, or have a look at the MySQL C API, you realize that MySQL has some very particular ways of dealing with some things, like array inserts, prepared statements and things like that. To make life as a MySQL developer easier, both of these connectors do some very cool things to isolate you from that. Also, these connectors to an extent address some things beyond MySQL itself, such as high availability. And thirdly, tuning the Connectors is often just as important as tuning the server! What I am saying here is, in short, something as unknown and arcane as read the documentation. In particular this is true for Connector/J, which has many parameters, is very advanced, but also very fast and can help you speed up your application if you know how to work it.
Bye for now, I'll be back in a jiffy with some more Windows specific MySQL notes
/Karlsson
Also, when I say I'm a Windows developer, let me tell you where I come from. I come from learning Win16 in C. Hardcoded message loops and message handlers and that kind of stuff. And that is how I still develop on Windows, but these days with Win32 / Win64 and exposing more Windows APIs. So I am no C# or .NET expert by any means.
So, you are ready to get started with MySQL on Windows? OK, first you should download it, possibly from here: http://dev.mysql.com/downloads/mysql/. And no, you don't have to pay anything, MySQL is free for us developers. And no, you will not need to compile anything from source or something, quite the opposite, MySQL comes with a very nice installer. Just download the MSI installer package for your platform (32 or 64 bit that is), and get started.
Usually when Installing MySQL, you can choose most of the defaults. After the installation, a configuration application starts, and you can usually choose most of the defaults for a starter here also. One things that your might want to look at is the character set selection. Latin1 is one option that is usually OK and that is similar odl and tried Windows CP1252. It's good enough as a start. If you plan to develop Web applications, then you might want to choose UTF8.
OK, assuming all went well, where do a get started with really using MySQL? Well, there isn't much in terms of a Windows GUI for MySQL installed as part of MySQL. And not only that, if you are a Visual Studio user, possibly developing in C#, there is very little in Visual Studio that shows that there is a new database server installed? Well, this we will fix, all this stuff sure is available from MySQL, you just have to know where to look.
First, as for a MySQL GUI for Windows, you have a few choices. If you want a complete MySQL design and SQL tool, then look no further than MySQL Workbench from MySQL themselves. This is a database design tool for MySQL that is loaded with cool features. Again, you can download it and use it at no cost, it is available here: http://dev.mysql.com/downloads/workbench/5.2.html
There are alternatives to MySQL Workbench, and many of them are also free and Open Source. HeidiSQL is one such option, and I have a tool myself called MyQuery. Both of these are a little less ambitious than MySQL Workbench, but also has features of their own. HeidiSQL is very much DBA focused and has loads of DBA style functions. My own MyQuery is focused on working with MySQL Scripts and shines when it comes to extensibility among other things. MyQuery can be downloaded here: http://sourceforge.net/projects/myquery/
But what about your applications? How is a database connection to MySQL achieved? Well, there are several ways to connect to MySQL. When you download the MySQL Server, it comes with a C API. If you are an old man like me, that might well be the way to go. The C API is OK and is well documented here: http://dev.mysql.com/doc/refman/5.5/en/c.html. The C API comes in static and dynamic shape. If you don't know the the difference or don't really care, then use the dynamic build. Using the static library is more demanding. MySQL itself, including the C API, is built with Visual Studio, so if that is what you use for your application (and I guess you are), that is OK. And you can do with the Express editions of Visual Studio.
Now, chances are you are not an old C hacker like myself, but a more modern person, using C#, Visual Basic or Java. If that is the case, MySQL provides help even here. If you are using Visual Studio with any of these languages, you want to download and install MySQL Connector/Net. A Connector is what MySQL calls their database drivers. And yes, even if you are on Java you want the .Net connector. The reason is that the .Net connector also includes the Visual Studio integration, although here it doesn't seem that the Express editions are enough. Connector/.Net can be downloaded here: http://dev.mysql.com/downloads/connector/net/.
And then for Java, you need Connector/J, available here: http://dev.mysql.com/downloads/connector/j/
Now, one word of caution before I leave you for this time: Both Connector/Net and Connector/J does some cool things for you. If you look at how MySQL works, or have a look at the MySQL C API, you realize that MySQL has some very particular ways of dealing with some things, like array inserts, prepared statements and things like that. To make life as a MySQL developer easier, both of these connectors do some very cool things to isolate you from that. Also, these connectors to an extent address some things beyond MySQL itself, such as high availability. And thirdly, tuning the Connectors is often just as important as tuning the server! What I am saying here is, in short, something as unknown and arcane as read the documentation. In particular this is true for Connector/J, which has many parameters, is very advanced, but also very fast and can help you speed up your application if you know how to work it.
Bye for now, I'll be back in a jiffy with some more Windows specific MySQL notes
/Karlsson
Subscribe to:
Posts (Atom)