Tuesday, July 6, 2010

Prepared statements - Are they useful or not?

MySQL has supported prepared statements since version 5.0, but the use of the non-prepared statement API is still much more popular. Here, I'll explain what prepared statements are in short, and show some examples and why you would, and would not, use them.

The concept of a prepared statement is that it a SQL statement that is lacking any parameters or literals and with these being replaced by placeholders. Then the statement is parsed and optimized without those values, and the placeholders are referenced to program variables. Once this is done, to execute the prepared statement, you set the program variables to the appropriate values and then execute the statement. You may change the program variables referencing the placeholder values and re-execute the SQL statement as many times as you want, without having to re-parse or optimize the statement.

The different steps then are:
  1. Prepare the statement - You prepare the statement with the appropriate placeholders (which are question marks), like: SELECT foo FROM bar WHERE foobar = ?
  2. Bind the placeholder values to program variables.
  3. Bind the result columns to program variables (if there are any columns returned that is).
  4. Set the program variables that are bound to the statement placeholder literals.
  5. Execute the prepared statement.
  6. If there are columns returned, i.e. this is a SELECT or SHOW statement for example, store the result, fetch the rows and the free the result.
  7. Repeat from step 4 until done.
  8. Free the prepared statement.
This doesn't seem so complicated, right? Well, there are more than one way of dealing with prepared statements in MySQL. One way is the way employed by the PREPARE and EXECUTE SQL statements. This is different than what I described above, but the basics are there. What we are to look at here though is the Prepared statement MySQL C-API. And to be clear, I look at the API from the C API level, and not from the underlying "wire" protocol, although in this case there is little difference.

Working with prepared statement in MySQL, as in most other database systems, require a bit more work than playing with simple non-prepared statements. The prepare phase is done with the mysql_stmt_prepare() call, which use a MYSQL_STMT to keep track of the statement, and which has been initialized with a call to mysql_stmt_init().

The next step is to bind the placeholders, and this is done with a call to mysql_stmt_bind_param(), which takes the MYSQL_STMT struct as above, as well as an array of MYSQL_BIND structs, each of them referencing one bin parameter in the statement, and each to be filled in with information to the MySQL server about where the program variable is, what type it has and a pointer to an indicator of the NULL / NOT NULL status if this variable. Note that to pass a NULL, you do not pass the string NULL, which will do just that: Use the string "NULL", instead you MUST set the is_null member indicator of the MYSQL_BIND struct.

After binding parameters, you have to bind result columns, if there are any columns returned. The way this is done with the MySQL C API looks a bit awkward, but not really complicated. In essence, you call mysql_stmt_result_metadata() to get a MYSQL_RES, which is a familiar struct for any user of the classic MySQL C API, and then you use that to get any information you want on the result data from the statement.

You can nod bind the result columns, and again you use an array of MYSQL_BIND structs for this. Note that you have to pass appropriate data, like space for the returned column data and for a NULL indicator. Failure to do so will cause a segmentation fault and may in extreme cases cause really bad karma.

This was the complex part, now all you have to do is set the program variables referenced by the MYSQL_BIND array for the parameters, call mysql_stmt_execute() to execute the statement and then, if there are columns returned, call mysql_stmt_store_result() to get the result, then call mysql_stmt_fetch() to get each row, and for every row, you can look at the data referenced by the MYSQL_BIND struct array you passed for the column data. When you are done with the result, call mysql_stmt_free_result() and that's about it.

Well, to be honest, there is a bit more to it, but this is the basics. So what are the advantages? It really should be faster, as there is a lot less parsing and optimization going on, and in many database systems, using prepared statements increases performance A LOT. Not so with MySQL, probably due to the limited amount of optimization we do, and the rather basic SQL syntax we support. But there are cases where there are distinct advantages, in particular SELECTs. One reason for this is that a SELECT is usually a more complex SQL than, say, an INSERT or a SELECT (and mostly also an UPDATE), whcih means there is more to gain from limited time in the parser and optimizer. Also, a SELECT is in the same way also more complicated to optimize. I have done some simple tests, and has seen some 50% performance gain when using prepared statements with a still simple SELECT with one join. On the other hand, I have yet to see any advantages for INSERTs, but also no performance loss either, instead, I could see just about the same performance.

And are there disadvantages? Yes, I'm afraid so. For CALL statements, in/out parameters are not supported, nor is handling result sets from prepared statements. In addition, multiple statement results are not supported at all when using the prepared statement C API. Read more on the limitations in the MySQL manual.

These limitations are the worst aspects of using prepared statements, and I really hope we will be able to fix them soon.

/Karlsson

8 comments:

Sheeri K. Cabral said...

My "short but sweet" for whether or not prepared statements are useful is that they're useful for security (ie, parameterization helps defeat SQL injection) but not useful for performance.

Didier Spezia said...

We also had good results with prepared statements on high-throughput servers running not-so-complex select queries (and even with multiple-rows insert queries).

IMO the worst aspects of prepared statements with MySQL are the extra roudtrips they require to prepare/deallocate them, and the fact memory for prepared statements is not shared across sessions.

Oracle has an excellent implementation of prepared statements (both on client API and on server side).

Regards,
Didier.

rpbouman said...

Hi Anders!

"But there are cases where there are distinct advantages, in particular SELECTs"

mm, but this assumes the IO associated with executing the SELECT (be it prepared or not) is small in comparison to parsing and building the plan. I would assume that in most cases the IO is more expensive than a bit of parsing and plan-building, especially in case the SQL implementation is simple (like MySQL's)

It would of course change in case you do a lot of these statements (in the same session since that's MySQL PS scope), but I don't think there are that many applications that fit this pattern.

Karlsson said...

Sherri: Yes, that is absolutely true, I wanted to write that to the post, but realized it was already a bit long. But you are of course right!

Roland: Well, actually my testing shows that even for simple SELECTs there is a benefit. And the more times a statement is executed, and the more complex it is, the more beneficial it is. My tests were doing using the InnoDB engine. What you say about IO is true, but these days servers has a lot of memory, and although IO is still an issue, that is often so only to support persistence (for INSERT and UPDATES). But if IO really IS the big deal, then you are right, there is always a balance.

Didier: I agree with you, Oracle has a great implementation and I have used it on and off for 20+ years. Great stuff. As for MySQL, well we still have a few things to learn, but I think that the SQL is simpler and the optimizer makes some shortcuts are among the reasons that they aren't as efficient with MySQL.

In general, my main hangup is that with MySQL they are still client based resultset based, and materialized on top of that. So no mysql_stmt_use_result() call :-(

/Karlsson

Anonymous said...

well actually i think SELECTs are a bad use case for prepared statements in MySQL. for one there is the additional network round trip. worse yet the chances o being able to reuse a prepared statements are low to non existant for the bulk of MySQL apps since they are written in script languages that do not really support any sort of pooling for prepared statement handles.

worse yet if the scalar values you bind have varying selectivity you run into crappy query plans if you reuse prepared statement handles, since MySQL only computes a generic query plan. some other RDBMS compute the query plan optimized for the first values bound, AFAIK Oracle recently got the ability to maintain multiple query plans for one prepared statement handle.

also of course if your data distribution changes over the lifetime of your prepared statement you might also be in trouble.

so i think there are many many usecase where parepared statements in their current form hurt performance severely for SELECTs.

Stewart Smith said...

of course, the implementation in MySQL makes them not only completely useless (server side prepared statements) but actively harmful.

Outsourcing server memory management to the client is just not a good idea.

Karlsson said...

lsmith: That script languages that are often used with MySQL has limited capabilities of dealing with statement handles is hardly an issue with MySQL, but rather with those languages, although I do get your point in the general sense. As for network roundtrips being an issue, that is possibly true in some cases, but my testing shows that there actually IS a benefit, and that it is quite relevant, in the 30 - 50% range for a SELECTs that are reused often.

Stewart: As for memory management on the Client, I couldn't agree with you more, but this is no different for prepared statements than for the non-prepared statement C API / protocol. That resultsets are materialized on the Client is hardly optimal.

igor said...

Ismith: MySQL optimizer cannot build generic plans. So it has to re-optimize the query for each new set of parameters.
Only certain "safe" transformations (like conversion outer join operations into inner join operations) are done once and forever. BTW some of them are performed at the compilation of the prepared statement, the others are done at the first execution.

Karlsson: I would recommend you to ask one of your colleagues at Oracle Konstantin Osipov. the author of the MySQL PS code, how MySQL prepared statement are compiled and what is actually done at their execution. You'll see that the picture of PS you drew is quite inexact.

Regards,
Igor Babaev, Monty Program AB