Wednesday, October 15, 2008

Using SQL to generate SQL

Using SQL to generate SQL is a trick that is used by many DBA's and is a really useful way to achieve certain things. It's not that much used in applications, but for DBA's it is great, as it allows you to do certain things that would otherwise require a script or something like that. The idea is simple enough, you write a SQL statement that in turn will generate another SQL statement or set of statements, the generated SQL is written to a file, and then you run (for example run the mysql "source" command) on the file with the generated SQL.

Simple? Yes, but effective. As an example, using MySQL, lets say you wanted to have a script that would run "OPTIMIZE TABLE" on all tables with 1000 rows or more in the current database? You cannot do this with one SQL statement, right? To begin with, you can only run OPTIMIZE TABLE on one table at the time?
So we need an SQL statement that will generate a bunch of relevant OPTIMIZE TABLE statements, and then we need to run the latter. Like this:

SELECT CONCAT('OPTIMIZE TABLE ', table_name, ';')

FROM information_schema.tables
WHERE table_schema = DATABASE()
AND table_rows >= 1000;

So all you have to do now is to write the output of the above SQL statement into a file, and then run that file, right? If you have been using Oracle or some database tool other than the mysql commandline, you know that these tools support writing SQL output to a file, without getting any extra output that what you get from the SQL statement, which is what we want here. No column names, no summaries, no timings, just the SQL result. You can to an extent achieve this with the mysql commandline, but you cannot create a script and run it from inside the same SQL script, as you cannot embed the "tee" command in a script that you run in batch. But from the commandline it works, to an extent. So to achieve the above with a script, you can use a script like this:

tee tmp.sql

SELECT CONCAT('OPTIMIZE TABLE ', table_name, ';')
FROM information_schema.tables
WHERE table_schema = DATABASE()
AND table_rows >= 1000;
notee
source tmp.sql

Now, for the above to work, there are some assumptions. To begin with, as I said, you have to run this from the commandline, you cannot use tee whith mysql in batch mode, such as when piping a command into it. Also, you have to run mysql with the options --skip-column-names and --silent. And finally, the mysql "tee" command has an irritating aspect: It will append, not overwrite, the outfile. Yikes!

A better way then, if you really want a script for this, is to pipe the output from one mysql session into another. If I, into a file named runopt.sql put this:

SELECT CONCAT('OPTIMIZE TABLE ', table_name, ';')
FROM information_schema.tables
WHERE table_schema = DATABASE()
AND table_rows >= 1000;

I can then create a script like this to optimize tables in the test database with 1000 rows or more:

mysql --skip-column-names test < tmp.sql | mysql test

Now
, this is all well, but wouldn't it be nice to be able to write to a file and then source that file from one SQL script (why the tee command is not allowed in batch mode beats me. I'll have to check this up)? Well you can use my MyQuery Open Source tool for that (available for download here: https://sourceforge.net/projects/myquery/ DIsclaimer: This is a Windows GUI tool, no Unix / Linux support as yet. Sorry). This has a special output command for these kind of no-frills output: sparse_output. To keep MyQuery scripts somewhat compatible with mysql, commands unique to MyQuery, such as this one, are commented.

So with myquery, the above can be achived with this simple script:

/*MYQUERY sparse_output_file tmp.sql */

SELECT CONCAT('OPTIMIZE TABLE ', table_name, ';')
FROM information_schema.tables
WHERE table_schema = DATABASE()
AND table_rows >= 1000;
source tmp.sql