Tuesday, October 4, 2011

NoSQL for us RDBMS folks - MongoDB 101

As you probably know, I have been doing RDBMS work for many years, some 25+ years by now. At Recorded Future I am the database architect, and although an RDBMS is used extensively, MySQL in this case, we are looking at options, and are currently doing more and more work using a NoSQL Solution, probably te most popular one by now, namely MongoDB.

And before you complan: NoSQL is not a good term, but someone with a longer NoSQL background should then find something better, not yours truly. And for all intents and purposes, you know what I am talking about, right?

I plan to post a few MongoDB posts here, how it looks like from an RDBMS dudes (like myself) POV. This is the first installment, but there will be more. I should also point out that I am no expert in NoSQL technologies in general, nor specifically in MongoDB, but I am trying, and our MongoDB setup is large and complex enough to serve as a decent example: We have a couple of Tb of data in Mongo, we use Replication and we use Sharding / Clustering whatever you want to call it. In addition, we run this puppy on Amazon EC2.

So, why did we move to MongoDB, and what are the results? I think the reason is two-fold:
  1. Better performance. Not better performance per se, but by using sharding and large in-memory stores, we can achive this.
  2. Better and easier to set up scalability. Again, this doesn't come by itself, nor is it without problems. But the built-in sharding in MongoDB actually does work reasonably well, and is largely transparent to the application.
This sounds great, doesn't it? Everyone wants better perforamnce and better scalability, right? And why can't we have MySQL to scale in the same way as MongoDB? Well, there are drawbacks to all this, but in our case, I think we can live with it.

An important thing to note here, I think, is this: Scalability is transparent and easy to use and implement because the basic idea of a NoSQL system (a key-value store in this case) is so simple: One unique key points to one value. That's it. This is easy to shard. Dead easy. A multi-table relational schema is much less so. But you could simplify the schema in MySQL to make it just as easy to shrd, but then what is the point of an RDBMS at all? And MongoDB Scalability is an incredibly important part of MongoDB performance (in addition to it's in-memory nature).

In our case, using Amazon EC2, scaleability is really important. A single Amazon instance is limited in terms of memory and CPU power. The limits are high (some 70G RAM for example), but there is still a limit. If you have, say, 500 Gb or a Tb or so of data, 70Gb might not be that much, so you have to shard or cluster or something. And this is where MongoDB rocks, it is easy to shard, the simple data structures in a NoSQL database mena this this can be done largely transparent, and all means that MongoDB Rocks for us.

Do we have other options? You bet! MySQL Cluster, NimbusDB (aka NuoDB), Oracle, Cassandra, etc etc, but we ended up with MongoDB for now.

So from a DBA POV, what is mongo then? To begin with, there is no SQL interface and no "query language" per se, rather to the user, MongoDB looks a lot like a Java Script environment with a lot of space for variables, sort of. MongoDB has databases, and in databases there are collections, which are sort of like tables in an RDBMS. A collection is a key-value store, you can define a key for the collection, or MongoDB will generate one for you. MongoDB is schemaless, which means that a collection doesn't not have any predefined columns or anything. Instead, if an object that you store has an attribute foo then you assign a value to it in your object and insert the object. That's it. And you can have an index on foo if you want to. And index in this case is a traditional B-Tree index, nothing more exciting than that.

An Object as above in Mongo is a JSON structure, which internally in Mongos is represented as BSON. To access the collections you use Java Script as I said before, like this to insert an object into the collection bar with the attribute foo:
db.bar.insert({foo: "foobar"});
In SQL this means: INSERT INTO bar(foo) VALUES('foobar')
Used this way, Mongo will assign the object a unique id, called _id, which is similar to the PRIMARY KEY in an SQL RDBMS. To retrieve the row we just inserted, use the command:
db.bar.find()
Which in SQL means: SELECT * FROM bar
And you get something like this back:
{ "_id" : ObjectId("4e8c9b2e2fde67676c2381ae"), "foo" : "anka" }

MongoDB has a big bunch of built-in Java Script methods, what I showed above was just a very basic sample. To use Mongo, knowing Java Script and JSON in general is more or less a requirement, and if you don't know these, and start using MongoDB, you will know then soon.

I will write some more blog-posts on Mongo eventually, but before I close this post, let me tell you one more thing. I never created the collections (bar) I used above. The reason is that yiu do not have to, it is created for you as you access it. The same goes for databases. This is the reason that you often end up with a bunch of misspelled databases and collections in mongo, which you can of cource drop after you have made a mistake, but as we are all lazy, you mostly have a bunch of non-used databases and collections.

/Karlsson

4 comments:

Anonymous said...

Hi Anders, if you're evaluating NoSQL, you might want to take a look at the tool I'm working on - it's an in-memory key/value storage.

Daniƫl van Eeden said...

Nice article.

What are the drawbacks of using MongoDB?
What about MongoDB replication from/to MongoDB?
What about statistics, tuning?

Karlsson said...

Drawbacks of MongoDB? I think I can see 3 big ones right now, compared to, say, MySQL:
- Much smaller community (means lack of help and support).
- Less mature software.
- Less flexibility.
On the plus sida are:
+ Real good performance.
+ Good scalability.
+ Ease of use and setup, even advanced configurations.

As for MongoDB Replication, we use what they call Replica Sets. It works reasonably well but is much different from what a MySQL user is used to. Main gripe is that the slaves aren'tt really as autonomous as in the case of MySQL.

As for tuning and instrumenation for that, this is one area where MongoDB still has some way to go. As does MySQL if you ask me.

/Karlsson

Karlsson said...

Kostja!

I'll sure have a look at your tool!

/Karlsson