Wednesday, August 29, 2012

The real differences between READ-COMMITTED and REPEATABLE-READ transaction isolation levels

Justin Swanhart wrote a blog on Differences between READ-COMMITTED and REPEATABLE-READ transaction isolation levels and I thought I'd give my view on this issue.

To begin with, from a technical standpoint, Justin is correct in his statements, that is not why I am writing this (and this is the reason I'm writing a separate blogpost instead of just commenting his), but there is more to it than that.

What Justin really writes about is how these isolation levels are implemented in a particular database, in this case InnoDB. For example the sentence "There are a few differences between READ-COMMITTED and REPEATABLE-READ, and they are all related to locking." makes this very clear, these isolation levels, as defined by ANSI SQL as well as how they are implemented by other databases, has nothing to do with locking. What I am trying to say that when you issue the statement SET TRANSACTION ISOLATION LEVEL READ COMMITTED what you are saying is not that "I want these kinds of locks places here and there when I issue any upcoming SQL statements", what you are saying is that "I want to use the READ COMMITTED isolation level, which means that the data returned has certain attributes as related to consistency". As a side-effect of those attributes with regard to the returned dataset and when you use the InnoDB Storage Engine, you get just those locks, but that is another thing. (Like when you turn the steering wheel in your car, what you are saying is NOT that some gears and bolts should twist and turn in this or the other way, what you say is that "please dear car turn right or I am going to run into that upcoming brisk wall", which of course in turns means that those gears and bolts and stuff gets into action, but that is not what you asked for, you asked for the car to turn, and which way it can. And again, I'm not saying that those gears and stuff ain't interesting, they are!). And none of this to imply that Justins post was incorrect or not useful, I just think, as I wrote above, that there is a not more to it.

So, having determined that, what ARE you saying to the database when you issue those commands. And the answer is this (I'm excluding READ UNCOMMITTED here):
  • READ COMMITTED - Please give me only rows back that are committed!
  • REPEATABLE READ - Please give me rows back that are committed, and if I read one of those again, in the same transaction, make sure those rows are still there and haven't been modified.
  • SERIALIZABLE - Please give me rows back that are committed, and make sure that if the same SQL statement is executed again, I get the exact same rows, none being modified, deleted or added.
Now, read the above once again, and for READ COMMITTED, we only make sure that the rows are committed, we WHEN they should have been committed, only that they are when the row is read. So what does this mean then? Well, it means that there is a level of flexibility in the implementation here. Some implementer might say that the point in time to look at is at the start of the statement, another implementer might consider it the point when the row is read. And this is a BIG difference:
  • In the first implementation, it means that when using READ COMMITTED, the read is a "consistent read", which means that the whole set of records looks like it once existed. Or in other words, if I for example sum the value of some column, that sum will be or will have been the sum of that column in the database at some point in time.
  • In the second interpretation of READ COMMITTED, one may, for example, have rows added to the result set "after" the point when where the current row is read. Which means that this is an inconsistent read.
What is annoying is that there are actually different implementations out there. Let's see what ANSI SQL-99 has to say about what is allowed to happen when using READ COMMITTED:

P2 (‘‘Non-repeatable read’’): SQL-transaction T1 reads a row. SQL-transaction T2 then modifies
or deletes that row and performs a COMMIT. If T1 then attempts to reread the row, it may
receive the modified value or discover that the row has been deleted.

What makes things a bit easier is that most implementations, although not all by far, use the second interpretation (I have to admit I have only looked at SQL-99 here, and not in that much detail, so things might have changed before or after SQL-99).

There is one thing though that should be pointed out in Justins blog, which is what he writes is not MySQL specific, it's InnoDB specific! So different storage engines may behave differently and may place different locks.

Committed by yours truly


gggeek said...

Are you sure that most databases implement "read committed" the 2nd way? As far I understood from the original blog post, InnoDB will use a data snapshot taken at the start of the staemenet.
And iirc, Oracle does as well in its default transaction isolation level...

Karlsson said...

You are right, but I never said anything else (I hope). Both InnoDB as well as Oracle treats READ COMMITTED in a similar way, and so does most RDBMSs. But Informix does / didn't, if I am not mistaken for example, and this is still somewhat in line with SQL-99 (but it also depends on the interpretation of SQL-99, which makes things worse). What I was trying to say that, don't trust that ALL RDBMS systems will do that same thing, and that will still be SQL-99 compliant!


arm said...

Great post Karlsson! Wanted to confirm the following and feel that this should be mentioned for completeness: in repeatable-read isolation level, rows returned when the query is executed the first time can still be modified and deleted by other transactions unless the reads were "locking", which they are not by default. And, in serializable mode, rows once read can not be updated or deleted in other transactions because the reads are always "locking". Right?