Karlsson on databases and stuff: July 2010

Tuesday, July 6, 2010

Prepared statements - Are they useful or not?

MySQL has supported prepared statements since version 5.0, but the use of the non-prepared statement API is still much more popular. Here, I'll explain what prepared statements are in short, and show some examples and why you would, and would not, use them.

The concept of a prepared statement is that it a SQL statement that is lacking any parameters or literals and with these being replaced by placeholders. Then the statement is parsed and optimized without those values, and the placeholders are referenced to program variables. Once this is done, to execute the prepared statement, you set the program variables to the appropriate values and then execute the statement. You may change the program variables referencing the placeholder values and re-execute the SQL statement as many times as you want, without having to re-parse or optimize the statement.

The different steps then are:

Prepare the statement - You prepare the statement with the appropriate placeholders (which are question marks), like: SELECT foo FROM bar WHERE foobar = ?
Bind the placeholder values to program variables.
Bind the result columns to program variables (if there are any columns returned that is).
Set the program variables that are bound to the statement placeholder literals.
Execute the prepared statement.
If there are columns returned, i.e. this is a SELECT or SHOW statement for example, store the result, fetch the rows and the free the result.
Repeat from step 4 until done.
Free the prepared statement.

This doesn't seem so complicated, right? Well, there are more than one way of dealing with prepared statements in MySQL. One way is the way employed by the PREPARE and EXECUTE SQL statements. This is different than what I described above, but the basics are there. What we are to look at here though is the Prepared statement MySQL C-API. And to be clear, I look at the API from the C API level, and not from the underlying "wire" protocol, although in this case there is little difference.

Working with prepared statement in MySQL, as in most other database systems, require a bit more work than playing with simple non-prepared statements. The prepare phase is done with the mysql_stmt_prepare() call, which use a MYSQL_STMT to keep track of the statement, and which has been initialized with a call to mysql_stmt_init().

The next step is to bind the placeholders, and this is done with a call to mysql_stmt_bind_param(), which takes the MYSQL_STMT struct as above, as well as an array of MYSQL_BIND structs, each of them referencing one bin parameter in the statement, and each to be filled in with information to the MySQL server about where the program variable is, what type it has and a pointer to an indicator of the NULL / NOT NULL status if this variable. Note that to pass a NULL, you do not pass the string NULL, which will do just that: Use the string "NULL", instead you MUST set the is_null member indicator of the MYSQL_BIND struct.

After binding parameters, you have to bind result columns, if there are any columns returned. The way this is done with the MySQL C API looks a bit awkward, but not really complicated. In essence, you call mysql_stmt_result_metadata() to get a MYSQL_RES, which is a familiar struct for any user of the classic MySQL C API, and then you use that to get any information you want on the result data from the statement.

You can nod bind the result columns, and again you use an array of MYSQL_BIND structs for this. Note that you have to pass appropriate data, like space for the returned column data and for a NULL indicator. Failure to do so will cause a segmentation fault and may in extreme cases cause really bad karma.

This was the complex part, now all you have to do is set the program variables referenced by the MYSQL_BIND array for the parameters, call mysql_stmt_execute() to execute the statement and then, if there are columns returned, call mysql_stmt_store_result() to get the result, then call mysql_stmt_fetch() to get each row, and for every row, you can look at the data referenced by the MYSQL_BIND struct array you passed for the column data. When you are done with the result, call mysql_stmt_free_result() and that's about it.

Well, to be honest, there is a bit more to it, but this is the basics. So what are the advantages? It really should be faster, as there is a lot less parsing and optimization going on, and in many database systems, using prepared statements increases performance A LOT. Not so with MySQL, probably due to the limited amount of optimization we do, and the rather basic SQL syntax we support. But there are cases where there are distinct advantages, in particular SELECTs. One reason for this is that a SELECT is usually a more complex SQL than, say, an INSERT or a SELECT (and mostly also an UPDATE), whcih means there is more to gain from limited time in the parser and optimizer. Also, a SELECT is in the same way also more complicated to optimize. I have done some simple tests, and has seen some 50% performance gain when using prepared statements with a still simple SELECT with one join. On the other hand, I have yet to see any advantages for INSERTs, but also no performance loss either, instead, I could see just about the same performance.

And are there disadvantages? Yes, I'm afraid so. For CALL statements, in/out parameters are not supported, nor is handling result sets from prepared statements. In addition, multiple statement results are not supported at all when using the prepared statement C API. Read more on the limitations in the MySQL manual.

These limitations are the worst aspects of using prepared statements, and I really hope we will be able to fix them soon.

/Karlsson

Open source or Open Core or Commercial... Does it matter??

This is my 2 cents in the Open Source vs. Open Code vs. Commercial debate. And it's a long one...

Maybe some of you reading this are offended already, but bear with me, I'll get there. The way I see the Open Source model, having worked with OSS at MySQL for 6+ years now, is that this is a great way of developing software. Not brilliant, but great, but I'll get there also.

Users of OSS, in my mind, are OSS users for one or more of three reasons:

It's Open - The users using OSS for this reason believes that being open is in and of itself a great thing, enough so to use OSS even when non-OSS is less expensive and/or better.
Cost - OSS is typically less expensive than non-OSS, and this is the reason these users get here. There are then 2 subgroups here, one that represents users that just aren't funded at all, many websites are in this category, the users building Joomla and Drupal sites and the like, I think you get the point. The second group are those that have funding, but would rather spend their money on luxury items and a new car than of a software license.
Technology - This is a category that many think they are in, but I don't think this is mostly not the case. These are the users on a unique piece of software that is either not existing as non-OSS, or where the OSS variations are so much more powerful than the commercial counterparts. In all honesty, although I am aware these cases exist, I do not think that that there are THAT many. But there are those there Cost + Technology plays in, i.e. even though a commercial option exists, it is just too expensive.

OK, so now we know (what I think) are the reasons that Open Source exists, is in wide use and is growing. For the first group, the ones that see Openness as a good enough reason in and of itself, I think this is a smaller group of the total number of users. But that openness is not really, in my mind, well defined.

If Oracle would take the sourcecode for the Oracle database and release it under GPL, then it would be Open in most peoples mind I guess. But that piece of code is massive, and few people outside the Oracle developers would have the time, resources and knowledge to understand, extend and modify it, so what how Open is it really then? I think to an extent MySQL is case in point here, although it is GPL licenced and the sourcecode is open and free, there are few outside contributors, as compared to the large number of users. I think most users building a website using Drupal cares much about MySQL being open or closed or whatever. I think most of them care about the cost being low. And one sure could argue that low cost comes from the source being open, that is probably true to a large extent, but that doesn't mean that commercial software or non-OSS also can be low cost (shareware for example).

What this boils down to, in my mind then, is that although we all enjoy the low cost of OSS, less care about it really being open and if so how, and more about it being inexpensive. And I say that as someone who doesn't actually mind reading sourcecode, and this is something I do on a regular basis, read and sometimes tinker with the MySQL source. But I really do not think that I am typical here.

And all this is not to say that there is something wrong with OSS, quite the opposite, but often it is more about cost than actual openness. And this is worrying, but there are exceptions. Linux is one such example, although the kernel is since long ago developed by a rather small closely knit community, utilities and programs surrounding and extending the kernel, such as modules, the GNU packages and that stuff, are developed separately from this group, by individuals or groups of them with specific needs or knowledge. The key here is the open interfaces. You don't have to understand every aspect of the Linux kernel to develop a well working and efficient utility or even kernel module.

But I do not think that even Linux is developed enough in this area as I would like it to see. To me, who really believe that Open Source Software is a good thing and an excellent model for development, I would like to see an even more "contributor friendly" architecture. I think Unix got a long way here in it's early days, with the principles of simple and easy to use APIs (like pipes) and programs could do one hing and do it well. But those days are gone now, that was 30 - 40 years ago or so, and we need to develop things, and I haven't seen that happening. FSF and GPL and all that defines to extent the framework for distributed software in terms of legalities and many other aspects, but there is little help in how to make the software that can now in theory be read by anyone truly open. If we assume that Oracle made their sourcecode GPL, but did not provide any documentation on how the sourcecode works (which is not a requirement of GPL) and removed all the sourcecode comments (which is not a requirement either), how open would that be, really? I do not think it would help much in terms of openness, to be honest. Sure, it would be open for someone who wanted to hickjack some intricate part of the Oracle sourcecode, but that would need a large investment in investigating the code, so this would probably only be reasonable for a some other large corporate entity. But the code would really be open for the rest of us.

Instead of discussing Open Source vs. Open Code vs. Commercial, I think it would be much more interesting to discuss how we develop software that truly is contributor friendly. Code that is easy to add to, code that lives in an environment where changes and additions can easily be made, reviewed and tested. Code that allows itself to be built by anyone, anywhere so that I can test my code on a 16 CPU x86 box somewhere in australia, provided by a nice person I don't even know, although I am located in Sweden. Code that is required to have proper commenting, proper structured APIs and natural points for injecting new and changed code. And above all, code that lets someone with excellent domain knowledge (in for example indexing algorithms, GIS, text search, APIs, disk management etc., if we talk about databases) to write code and test, without being a database expert or even knowing the inner details of the system he/she writes code in, and not being brilliant developers themselves.

Is this a dream? Maybe, Is Drizzle the answer (I know someone will suggest that), and I say no, it's just not enough, it's just more of the same (plugins), it doesn't really provide anything new in how we develop things or how those developments are published and distributed.

In short, I think the Open Source vs. Open Code debate is just nitpicking and boring. Neither model just isn't good enough to be truly friendly and open for contribution. The difference lies more in how and with what we we can commercialize our efforts, which is a valid concern, but my main concern, as you can see, is that I believe that neither model is truly open. And I would rather see a truly contributor friendly Open Code model than the current state of affairs.

/Karlsson

Saturday, July 3, 2010

A life among databases...

A long time ago, in the early 1980's, I decided to change jobs. I was a young guy, with no real experience of commercial software or anything like that, rather I was a self-taught sysadmin for an ancient UNix system. The company I worked for was in the Telco business, so I looked for another job where I could develop myself and at the same to use my telco knowledge.

I found a Telco startup, privately held. I must say that the fact that it was privately held meant nothing to me at the time. Nothing. They were building a system, the servers were VAXes running VMS, and again a became a sysadmin.

Having been sysadmin at this company for a while, building up the central datacenter (to be honest, in todays words, that was what I was doing, but at the time, I had no real clue. I was mosr enthusiastic and ready to take on any task than I was smart or intelligent or knew what I was doing, really). But I wanted to develop myself, and at the previous job I had learnt to code in C (it was a Unix system after all), so I slowly migrated into database development, managing sysadmin duties on the side.

Still, I wasn't truly professional I think. But I was willing to work and I was persistent and just wouldn't let go. I came to the office dressed in a pair of Jeans and a T-Shirt, and wasn't really aware that sometimes it would be a good idea to dress up or something (this was the 1980's still, so that might have been a good idea back then).

As a developer, I realized that the system used a database, and a SQL database! I had no clue whatsoever what this was. But I started writing code creating tables and working my way through this, learning as I went. That you needed something called an "INDEX" became obvious to me after I had shown my latest creating to my colleagues and the things was just soooo slow. In the end, I actually picked up the manual for that "SQL Database", whatever THAT was.

After about a year at this company, they decided to move their operations abroad, and I wanted to stay in Sweden, so I went looking for another job. The company behind the SQL Database I had used was looking for people it seemed, so I applied for a job. What it was like working for a US based software company was something I had no clue about. I got the job and turned up for my duties as a support engineer dressed in Jeans and a T-Shirt, and got to work. What a support engineer was really supposed to be doing wasn't something I really knew, it was more along the lines of people calling me with questions, and I tried to help them, as best as I could.

The company in question was Oracle. And Oracle really did support me, and courage me, to develop myself, to go to training classes (I didn't ask for these, I was just sent away on them), to take on other jobs inside the organization to to develop my technical and business skills.

For all this, I am grateful. Oracle largely shaped me for my future career among database companies, and if that is a good or bad thing is up for you to decide. Now I'm back at Oracle, and I still enjoy it. I am aware that not everyone will agree with me here, but I am glad to be back, after nearly 20 years after I left, and many things with this geart company is still around.

All in all, I'm sure Oracle is a good home for MySQL. You may think differently, but I am honored to work for Oracle, and even more so with MySQL at Oracle. Frankly, I can't see that it can get any better.

/Karlsson