Thursday, December 18, 2008

The forgotten FOREIGN KEY constraint

If you are using MySQL with the InnoDB Storage Engine, you know and probably use FOREIGN KEYs. There are some who dislikes the use of these, saying the are bad for performance, but I tend to disagree, if the price of keeping my data consistent is a slight performance degradation, then so be it.

Anyway, I was going to write about a couple of lesser known FOREIGN KEY constraint attributes here. We all know what a FOREIGN KEY constraint is, right? A reference from a column or set of columns in one table to the PRIMARY KEY in another table? Yes, this is correct, but it's not the whole truth, actually, a FOREIGN KEY is a reference to a PRIMARY KEY or UNIQUE key in another table. Being able to reference a UNIQUE constraint is sometimes rather useful.

But before we go into that, lets look at a few subtle differences between a PRIMARY KEY and a UNIQUE constraint. In RDBMS design lingo, the UNIQUE constraing usually represents a candidate key. This means that the UNIQUE constraint columns could be used as a PRIMARY KEY, but they are not.

Often a UNIQUE constraint is used to ensure that, say, a NAME of an entity is unique, even though the entity itself has an identifier on it. The identifier used as the PRIMARY KEY is then often an AUTO_INCREMENT column. A few reasons why we don't want to have the name column as the PRIMARY KEY is that having a long string as an identifier is cumbersome, and that an INTEGER AUTO_INCREMENT column is so much easier to handle as an identifier (AUTO_INCREMENT being one such reason).

The PRIMARY KEY is often then, at least if I can decide, a surrogate key or an opaque identifier, i.e. a value that carries no other information than being a key.

That the PRIMARY KEY column is an opaque value has the advantage of removing the issues when using, say, a name as a key, that is referenced by columns in other tables, and then the name is changed.

A UNIQUE constraint is similar to the PRIMARY KEY constraint in that is provides uniqueness of a column or set of columns in a table. The PRIMARY KEY is just the primary identifier. One difference though is that a UNIQUE constraint column or set of columns may be NULL.

A FOREIGN KEY column may also NOT reference an entity in a PRIMARY KEY or UNIQUE constraint. This is when the FOREIGN KEY column is NULL. Note that a NULL in a FOREIGN KEY is not matching a NULL UNIQUE tuple!

Lets show an example of how to use a FOREIGN KEY that references a UNIQUE constraint, an example that is useful and demonstrates most the things I described above.

Let's say I want to keep track of rockbands in a table called bands. Among the things I am tracking is the location of the band. I want to at least know the country the band comes from, but possibly also the city. So I have two more tables, country and city. Each city is of course also assigned to a country.

All in all, I have a table of countries, a table of cites in coutries and a table of bands know to be in a specific country or possibly a city.

So:
CREATE TABLE country(id INTEGER NOT NULL, name CHAR(30) NOT NULL, PRIMARY KEY(id));
CREATE TABLE city(id INTEGER NOT NULL, country_id INTEGER NOT NULL, name CHAR(30) NOT NULL, PRIMARY KEY(id), FOREIGN KEY(country_id) REFERENCES country(id));

That look right, OK? Now, I want to create the bands table. The first attempt is this:
CREATE TABLE bands(id INTEGER NOT NULL, name CHAR(30) NOT NULL, country_id INTEGER NOT NULL, city_id INTEGER, PRIMARY KEY(id), FOREIGN KEY(country_id) REFERENCES country(id), FOREIGN KEY(city_id) REFERENCES city(id));

Does this look right to you? Does it ensure data consistency? In a sense yes, but not in the case of country and city in the bands table: It is possible for me to designate a band to be in a city that is not in the country that that city is in! So I could say that The Rolling Stones are in Berlin in England, which, considering the shape of Keith Richards and the medication he is using, might not be so surprising. All the same, this is not correct.

What we have is not really just a bunch of countries and cities. The city table not only defines the cities, it also defines the valid combinations of city and country! And these combinations are unique! (yes, I know there are many cities with the same name in many countries). So what my cities table really should look like is this:
CREATE TABLE city(id INTEGER NOT NULL, country_id INTEGER NOT NULL, name CHAR(30) NOT NULL, PRIMARY KEY(id), FOREIGN KEY(country_id) REFERENCES country(id), UNIQUE(country_id, id));

And now I can ensure that Mick Jagger and his friends doesn't end of in the wrong country? I create my bands table like this:
CREATE TABLE bands(id INTEGER NOT NULL, name CHAR(30) NOT NULL, country_id INTEGER NOT NULL, city_id INTEGER, PRIMARY KEY(id), FOREIGN KEY(country_id, city_id) REFERENCES city(country_id, id));

And note that it is OK to create a band where city_id is null! But we are still not quite home yet. The issue is with NULL values and the combination of NULL and non-NULL values. The above setup will actually allow us to enter ANY value for country_id for a band, as long as city_id is NULL. This is because, as we know, NULL means undefined. And the combination of a know defined value and a non-defined NULL values is... NULL and non-defined. So we need one more foregn key here:
CREATE TABLE bands(id INTEGER NOT NULL, name CHAR(30) NOT NULL, country_id INTEGER NOT NULL, city_id INTEGER, PRIMARY KEY(id), FOREIGN KEY(country_id, city_id) REFERENCES city(country_id, id), FOREIGN KEY(country_id) REFERENCES country(id));

What about the NULL discussion above? Wouldn't it be reasonable to think that if country_id is specified, and city_id is NULL, that one could ensure that a country exists by a lookup in the city table? And that this would be valid RDBMS design? Nope, it's actually the other way around, this is actually HELPING us create a workable database design! Think about what would happen if we had a country with no defined cities yet? We would not be able to place a band in that country, even by setting city_id to NULL, as there would be no such country_id in the city table!

6 comments:

Shlomi said...

Hi,

Nice post,
One comment though: InnoDB does treat UNIQUE keys as candidate keys: if no PRIMARY KEY has been defined for your InnoDB table, InnoDB selects the first UNIQUE KEY and uses it.

Regards,
Shlomi Noach

Shlomi said...

Sorry, clicked too fast...
The above is true, of course, only if the *is* a unique key to be used, otherwise InnoDB creates its own PRIMARY KEY (based on the internal row_id, I believe)

Regards

Roland Bouman said...

Hi!

"We all know what a FOREIGN KEY constraint is, right? A reference from a column or set of columns in one table to the PRIMARY KEY in another table?"

Actually, the term "foreign key" just means that a column or collection of columns holds values that appear as key, that is, identifiers of another table. So, the columns are meant for storing key entries, but key entries from another table - hence, 'foreign' key. 'Foreign key' is a semantic term.

The term 'foreign key constraint' is simply an enforced rule that ensures the values are in fact valid keys of the other table: A foreign key is a way to maintain referential integrity.

Innodb is interesting in that it does *not* enforce that the foreign key constraint points to PRIMARY KEY or UNIQUE constraint columns. As long as the referenced columns (that is, columns in the 'parent') form the prefix of any index, InnoDB will accept the foreign key definition. Rather strange as it allows many parent rows to exist for a single child row (as well as of course many child rows per parent row)

Kind regards,

Roland

mtocker said...

I'm from the other school of thought you mentioned - people who like to disable foreign keys. I just thought I'd expand a little on my reasoning:

- Having foreign keys increases your locking - and as a result decreases concurrency.

- If your engine is transactional/ACID compliant, then you (hopefully) shouldn't end up with inconsistencies unless you have some sort of logic error (it happens!). Based on this, some choose to enable foreign keys during development, disable them in heavy production.

- The InnoDB implementation is a bit ugly. It's missing a few features. One smaller one I'd like is deferrable constraints, so that if I am committing a chunk of work the checks only have to pass on commit (to understand how much of an issue this is, look what mysqldump has to do to restore data if you have foreign keys).

Christian said...

I found this store that perfoms a drop table with cascade constraint for a given table:

http://laboticadellanga.blogspot.com/2009/01/drop-table-cascade-constraints-sql.html

devon broad said...

young children and should you may notice totally different lenders you may value more highly to utilize. Through check cashing endeavor for making every knowledge around heat and friendly, quickly, respectful, and in addition useful, we tend to tend to is ok thus on accumulate your own commitment. Visit around any USA day cash advances spot lately which we are able to demonstrate the item for you.