Wednesday, June 17, 2009

Of Oracle API's, array interface and bind variables

As we know, the MySQL prepared statements API leaves a bit to be desired, although there is a fair amount of progress. With MySQL, using the "normal" API or the prepared statement API usually doesn't have that much different in terms of performance.

When you use Oracle though, things are different. There is no separate "prepared statement" API, there is just on interface, which has a lot of functions and structures, and isn't the easiest to use, because of the complexity, but it IS very functional and performant.

If we look at the low-level Oracle interface, there used to be three of them:
  • Oracle Call Interface (OCI) - This is the the interface that is most complex, but also most functional. It is a C level interface along the lines of the MySQL C API, i.e. you still pass SQL statements, it's not a file level interface of some kind. This is also the interface that is used to build Oracle Data Cartridges (i.e. the "real" Cartridges that plug into the database and are called like stored routines).
  • Oracle PRO* - These are the Oracle Embedded SQL interfaces. And if you think that the Oracle Embedded SQL Precompilers produce OCI code, then think again, the precompilers use a different library. But they may be used together, with some work.
  • Oracle SQL*Plus API. As far as I know, SQL*Plus (which is the Oracle interactive command line query tool) used a different interface from the other two guys. This was never confirmed though, so I might be wrong here and it might have changed.
Now, using any of the two public API's above, OCO or PRO* (C, Fortran etc), you nearly always use what are prepared statements in MySQL. Why? Because Oracle is pretty slow at parsing and optimizing statements, and also because that when you work with Oracle, you tend to write rather complex SQL (as the SQL Syntax is a bit more rish and because this is what a macho Oracle developer is supposed to do. Write complex SQL), which just adds to the time it takes to parse and optimize. On the other hand, once a statement is parsed, reusing it is blindingly fast in comparison. So in the case or Oracle, prepared statements are used ALL THE TIME, frankly, often you see prepared statements with bind variables being used even if you are to execute a statement just once, and this is sometimes faster than just running a single statement with the data being part of the statement, the way we mostly do it with MySQL.

So, what is the Array Interface then? This is an extension to PRO* and is also available with OCI (everything is available in OCI) that allows you to so what Ronald Bradford described in his blog, i.e. inserting multiple rows with one statement. But the Array interface does this differently, the INSERT statement looks the same, the difference is on the API side of things. What you do is to parse you statement as:
But when you bind the ? parameter, you bind it to an array of integers, and when you execute the statement, you tell Oracle how many array records to process. This is very fast, the speedup provided, compared to executing N separate INSERT statement, is at least as big as it is with MySQL using the array interface, actually higher (As Oracle is typically not that good at executing many small statements).

Which is the better way of dealing with arrays then? If you ask me, I have a slight preference for Oracle, and let me tell you why: With MySQL, arrays are only available in this shape and form for INSERT statements, for obvious reasons. With Oracle, the same array interface is applicable also for UPDATE, DELETE and (don't hold your breath), SELECT! Yes, you can allocate an array of N rows, and then tell Oracle to fetch N rows at the time, again this is way faster than fetching row-by-row (on the other hand, the way data is fetched is very different between MySQL and Oracle. MySQL will always materialize a dataset somewhere (now I am talking API datasets, I know are many materializations going on inside MySQL also, before we reach the protocol), where as Oracle will typically not do this, at least not on the Client side of things (if this is good or bad is a different issue).

Another advantage, if you ask me, of the Oracle way of doing things, is the the SQL statement itself is intact. The disadvantages are at least two:
  • You have to use bind variables.
  • The API is a bit complex.
So, all in all, I like the idea that MySQL has some kind of array interface. The optimal would be to have an API array interface AND a SQL based (like MySQL) array interface. But maybe I'm just dreaming.