Wednesday, November 2, 2011

MongoDB for MySQL folks part 3 - More on queries and indexes

Last time I wrote about MongoDB for MySQL DBAs I described some of the basics of MongoDB querying, and this time I'll follow that up with some more on querying.

As we saw last time, the basic format of a MongoDB query is:
Note that you do NOT replace db with the name of the database you want to query here, you just make the database you want to use the current one and issue the query, such as:
> use test
> db.mycoll.find()
The example above will find all objects in the mycoll collection, and will include all the object attributes and also the key (_id), like this:
{ "_id" : ObjectId("4eb0634807b16556bf46b214"), "c1" : 1 }
{ "_id" : ObjectId("4eb0634a07b16556bf46b215"), "c2" : 1 }
{ "_id" : ObjectId("4eb0635607b16556bf46b216"), "c1" : 2, "c2" : 2 }
{ "_id" : ObjectId("4eb0635e07b16556bf46b217"), "c3" : 3 }
The Object id is generated by MongoDB itself here, although you can set it yorself if you want to, as long as it's unique. The insert method is used to insert data:
> db.mycoll.insert({c3: 4, c4: 'some string'})
> db.mycoll.find()
results in;
{ "_id" : ObjectId("4eb0634807b16556bf46b214"), "c1" : 1 }
{ "_id" : ObjectId("4eb0634a07b16556bf46b215"), "c2" : 1 }
{ "_id" : ObjectId("4eb0635607b16556bf46b216"), "c1" : 2, "c2" : 2 }
{ "_id" : ObjectId("4eb0635e07b16556bf46b217"), "c3" : 3 }
{ "_id" : ObjectId("4eb063d307b16556bf46b218"), "c3" : 4, "c4" : "some string" }
And as you can see, typing is automatic, or you can look at it as being type agnostic. Now, this wasn't much more than we saw last time, what we want is to select some specific objects and possibly get some specific columns from it, this is done by specifying one or two arguments to the find() method. For example, if I only want to get the object back that I inserted last above, I'd do this:
> db.mycoll.find({c3: 4})
{ "_id" : ObjectId("4eb063d307b16556bf46b218"), "c3" : 4, "c4" : "some string" }
And this wasn't really complicated, right? The condition is passed as a Java Script object notation, and that is fairly uncomplicated. But what happens for something slightly more than this really simple example, like a rangesearch? To get all objects where the c3 member is 4 or higher (which results in the same object as above by the way), you would write something like this, and :
> db.mycoll.find({c3: {$gt: 3}})
{ "_id" : ObjectId("4eb063d307b16556bf46b218"), "c3" : 4, "c4" : "some string" }
I will show some more $-operations beyond $gt in a later post, for now just accept that they exist and are documented here: Advanced Queries

The _id column is just annoying here, right now, but it is always shown by default, as are all the other object. To get rid of it for now, this will do the trick:
> db.mycoll.find({c3: {$gt: 3}}, {_id: 0})
{ "c3" : 4, "c4" : "some string" }
Not too bad, right, and kinda easy to understand. The falgs you pass for each field in the second argument may have one of three values:
  • 1 - Include this field. This is the default.
  • 0 - Do not include this field.
  • -1 - Include no fields except this one and the ObjectId. You may have more of these, in which case all the -1 flagged fields will be included.
Let's try a more advanced version. I want to the the c1 and c2 attributes, and nothing else, then I do this:
> db.mycoll.find({},{c1: -1, c2: -1, _id: 0})
{ "c1" : 1 }
{ "c2" : 1 }
{ "c1" : 2, "c2" : 2 }
{ }
{ }
As you can see, I have to explicitly exclude the _id field.

Online help
The mongo commandline tool for once has decent online help. Typing just help will show the options. For help on database specific operations, type and for collection specific operations, type, such as In JavaScript, a function is just another script, and adding arguments to the function will execute the function, but maybe you want to see how the function is implemented? The just type the name of the function, like this:
> db.mycoll.find
function (query, fields, limit, skip) {
return new DBQuery(this._mongo, this._db, this, this._fullName, this._massageObject(query), fields, limit, skip);

DBA Work - Indexing data and explain
What would a mongo DBA want to do? Let's try creating an index. Let's say we want an index on the c1 attribute in the mycoll collection as above, then we must use the ensureIndex() method on the collection in question, telling what columns I want to index, like this:
> db.mycoll.ensureIndex({c1: 1})
And that's it. Let's try to query that collection again, this time using the c1 column as an argument, and hopefully the index will be used:
{ "_id" : ObjectId("4eb0634807b16556bf46b214"), "c1" : 1 }
Right. But is the index used? I want to know that it is for a fact, or if it isn't, so I have something to complain to my developers about. In MySQL, you want use the EXPLAIN command and figure out what index are being used, but with mongo? Easy. Use the explain method, like this:
> db.mycoll.find({c1: 1}).explain()
"cursor" : "BtreeCursor c1_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"c1" : [
Hey, that's prett cool, right! The index is a standard B-tree index (the only index type available in MongoDB). An index can also be unique, like this:
> db.mycoll.ensureIndex({c2: 1}, {unique: true})
Which will create a unique index on the c2 attribute, but in our case it will not work:
E11000 duplicate key error index: test.mycoll.$c2_1 dup key: { : null }
What's going on here? Well, the c2 attribute isn't included in all objects, and but the index will include all objects, and MongoDB considers NULL a duplicate here (unlike an SQL NULL in which case this is not the case). So the real question here is, what do you want? As MongoDB is schema-free, and you can have any kind of attributes, and also looking at the data above, what I would probably want is an index on the c2 attrbute that makes sure that c2 is unique WHEN INCLUDED, if the c2 attribute isn't part of the object, then please mr. Indexer, ignore it. This is called a sparse index in MongoDB, and what it means is an index that just indexes the objects where the attribute is included.

Note that this may not always be what you want with non-unique indexes, but it often it is, and it makes seaching and inserting faster (as the index is smaller). In the case you have an attribute that is only rarely part of the object, and you want to find the objects where it IS included, this is just what you want.

In our case, the index is created like this:
> db.mycoll.ensureIndex({c2: 1}, {unique: true, sparse: true})
And this time we had no errors. Let's see how it works, first get some data:
> db.mycoll.find({}, {c2: 1, _id:0})
{ }
{ "c2" : 1 }
{ "c2" : 2 }
{ }
{ }
Now, let's see if the unique index on c2 will guarantee uniqueness by inserting a new row with an existing value for c2:
> db.mycoll.insert({c2: 1})
E11000 duplicate key error index: test.mycoll.$c2_1 dup key: { : 1.0 }
Yo! That worked as expected! As does this (which gives no errors):
> db.mycoll.insert({c2: 3})

That's it for now, I'll be back soon with some more MongoDB DBA stuff: Sharding!

No comments: