Monday, March 29, 2010

OK, you have waited long enough, here's my take on NoSQL

I have to admit I'm getting old, and I am now scarily close to being a sad old git. Not only that, I'm also a database guy, and I have worked with SQL-based relational databases for more than 20 years now.

So considering my age and heritage, I really should just dismiss the NoSQL movement as something for those young kids with earrings and a baseball cap (and to add insult to injury, the cap is worn backwards), and as beneath any serious database dude like myself, with my loads of experience (like the invaluable experience of having run Oracle on a Wang word processor. Real valuable stuff, I tell you). But no, you will not hear me do that. You will also not hear me say that NoSQL key-value stores will replace all SQL databases within 5 years (if I worked for an analyst and was paid dearly to say things like that, I might have, though. Any takers?).

My take is actually quite simple. The strength of the relational model is that it is incredibly generic. The lack of a specific order and hierarchy makes it even more generic. I think few people would argue against the claim that more or less all applications served by NoSQL could just as well be served by a SQL database, if it weren't for two things:
  • Scalability - The lack of the strict consistency rules of RDBMS heritage in most NoSQL implementations makes them much more scalable. The very nature of most NoSQL stores is distributed, and the lack of strict distributed consistency makes this distribution scalable beyond what is usually possible with an RDBMS, given the same platform etc.
  • Performance - This largely follows from the above, i.e. a NoSQL store being more scalable makes it easier to squeeze more performance out of it.
Now, with all this in mind, am I saying that NoSQL has all the advantages of an RDBMS, but with much better scalability? Nope, no way, José.

The strict consistency requirements of an RDBMS are also an advantage. If I understand them correctly, the proponents of NoSQL stores don't think that consistency is bad, it's just that they don't want to pay the price for it in terms of performance. And to be frank, although data inconsistency is acceptable in many cases, it still has to be controlled. Uncontrolled inconsistency, i.e. you don't know how inconsistent your store is or in what way, is not something we want. So even a NoSQL store is limited in how much consistency it can give up.
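
To make "controlled" inconsistency a bit more concrete, here is a toy sketch in Python. It is not how any particular NoSQL store works; the ToyStore class, its replicas and the lag() measure are all made up for illustration. Writes go to a primary and are copied to replicas later, so a replica read may be stale, but the store can at least tell you how far behind each replica is.

```python
class Replica:
    """A hypothetical read replica that applies the primary's log lazily."""
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far


class ToyStore:
    """Hypothetical primary with asynchronously updated replicas."""
    def __init__(self, n_replicas=2):
        self.log = []      # ordered list of (key, value) writes
        self.primary = {}
        self.replicas = [Replica() for _ in range(n_replicas)]

    def put(self, key, value):
        # The write returns immediately; replicas are NOT updated here.
        self.primary[key] = value
        self.log.append((key, value))

    def read_replica(self, i, key):
        # May return stale data (or None) if replica i has not caught up.
        return self.replicas[i].data.get(key)

    def lag(self, i):
        # The "controlled" part: we know how many writes replica i is missing.
        return len(self.log) - self.replicas[i].applied

    def sync(self, i):
        # Apply all outstanding log entries to replica i, in order.
        replica = self.replicas[i]
        for key, value in self.log[replica.applied:]:
            replica.data[key] = value
        replica.applied = len(self.log)


store = ToyStore()
store.put("user:1", "Anders")
stale = store.read_replica(0, "user:1")  # replica has not seen the write yet
lag_before = store.lag(0)
store.sync(0)
fresh = store.read_replica(0, "user:1")
lag_after = store.lag(0)
```

The write is fast because nothing waits for the replicas, which is the scalability gain, and the price is that a reader may see old data until sync() has run. Knowing the lag is what separates inconsistency you can live with from inconsistency you can't.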

So it all comes down to performance then. We sacrifice consistency to gain performance through scalability. Right? If you agree with that, then I think NoSQL is not a long-term solution. I am not saying that "NoSQL is for kids, real databases need SQL". That was largely the argument against SQL-based databases in the 1980s, when hierarchical databases still ruled and SQL just had too big an overhead, or so it was thought. The difference here is that SQL had higher functionality than the competing technologies of the 1980s, but not enough performance in many cases. But performance is bound to go up. All the time. And for much less money. At least for a while to come. Look at virtualization. I've been a proponent of that for quite a while, and just a few years back, the argument against it was that "performance sux". Well, compared to raw iron maybe it did, but that wasn't the point. The point was, did I get enough performance? And in many cases I did, with an environment that was a lot easier to manage and at a low cost.

What this means to me is that there is a place for NoSQL stores right now, where the performance and size requirements are really high, and where one is willing to compromise on consistency. But a technology that gives up functionality, features and ease of use in exchange for performance will continue to be a niche technology. That doesn't mean it's useless or anything, quite the opposite. I'm a pragmatist at heart, and whatever works, works. But if I had the choice of storing my data in a consistent or an inconsistent state, and if both solutions provided enough performance for my needs, I'd go consistent any time.

And then there is one more thing. The scalability of the NoSQL stores is largely due to their distributed nature. And there are arguments out there that say that you cannot create a consistent, distributed, scalable datastore. I think you can, I'm convinced of it actually. There may be other compromises needed to achieve that, but I am sure that it can be done.

/Karlsson

8 comments:

lsmith said...

you miss an important aspect of NoSQL: it can make life easier too. if you have loosely structured data, you can just throw it in the storage, get it out again, even write queries against this loose structure and index ... well, depending on the NoSQL data store chosen.

so just focusing on scaling and performance is missing a big part of why NoSQL is appealing.

Karlsson said...

Well, that is true. And in terms of loosely structured data, NoSQL is pretty cool. In a way, the Web has triggered an explosion of loosely structured data, whereas an RDBMS sort of implies that there is structure to data, and that that structure is known in advance. I was, admittedly, focusing more on the NoSQL vs. RDBMS issue, but there are instances where data isn't structured and where NoSQL makes good sense.

Mark Callaghan said...

You should not assume NoSQL implies scalable or NoSQL implies distributed. That is marketing FUD. NoSQL implies no SQL and nothing else. Some NoSQL servers are distributed and scalable (Cassandra, HBase). Others are very similar to sharded MyISAM (1 writer or N readers, not crash safe, require explicit sharding).

Karlsson said...

Mark!

As NoSQL isn't really strictly defined, I guess the term is up for interpretation. Would Ingres then qualify as being NoSQL (as it has Quel at its roots, although it also speaks SQL)? I would say no.

When writing this post, I looked around a bit. One NoSQL definition is in Wikipedia: http://en.wikipedia.org/wiki/NoSQL
where it says regarding the history of NoSQL: "the name was an attempt to describe the emergence of a growing number of non-relational, distributed data stores that often did not attempt to provide ACID guarantees, in reference to the popular naming scheme for classic relational database systems: MySQL, MS SQL, PostgreSQL, etc.".

Actually, to claim that NoSQL is anything that doesn't "speak" SQL would be pretty restrictive. Any database that is non-relational is closer, but not really there yet either, in my mind.

But from what I have seen the NoSQL term being used for, it has mostly been highly scalable stores, mostly for non-structured data, that limit requirements on consistency to achieve scalability.

At the upcoming MySQL UC, I am hosting a BoF session on the history of database systems: http://en.oreilly.com/mysql2010/public/schedule/detail/13607
and maybe we can discuss this there? What's up next? NoSQL? Quel? XQuery (not a bad candidate in my mind)?

/Karlsson

Nick said...

"Quel" -> hadn't heard that one for a while. You must be preparing slides for the history talk if that acronym is top of mind. :)

Gerry Narvaja said...

I must be as old as you are, since I know what you meant by Wang word processor. I used to hang around them.

Having been around databases for a while myself, I see NoSQL as a new tag for a concept as old as databases themselves. I even worked with a NoSQL database back in the late 80s. Hey, even flat files could be called NoSQL as well.

I think that at some point people should start ignoring the hype and ask themselves the question: "Am I better off with non-RDBMS storage for my data?", which is the question I'm not seeing answered in most of the blogs on the subject.

My $.02
G

Mark Callaghan said...

@Anders,

I will be sure to track you down at the conference.

Dave said...

minor typo: inredibly should be incredibly

re history: key/blob databases (loosely structured NoSQL) bring back the fun that was the network database navigation problem. My application has to go get the blobs I want, open them up, parse them, etc.