Friday, February 17, 2012

Do I really want to get rid of SQL_MODE?

I recently wrote about SQL_MODE, and Ronald Bradford was quick to reply with his view on things. Although my post was a bit of a rant, and I think it was a necessary rant, it seems that Ronald and I largely agree on the overall matter of things.

Now today WebYog implies that I and Ronald want to get rid of it and replace it with more strict, rigid checks on data entering MySQL. Tell you want, however much I would like that, I am not stupid, I realize that is not practical. Rather, I say sure, keep SQL_MODE, but make it possible to enforce it and check it, and do not allow just about anyone with access to the database to compromise the data in it by a single command.

WebYog also implies that this would break a lot of applications. I say that is not the case as I say you can still set it at anything you want, just that it is enforced!

Even better, as WebYog says that applications share a MySQL instance, they may have, probably do have, different requirements. OK. Read what I wrote: I suggest you set it and enforce it on the database level. Right?

WebYou then pops up this example. With a table like this:
CREATE TABLE `test`.`strict`(
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`col1` VARCHAR(10) NOT NULL,
`col2` VARCHAR(10),
PRIMARY KEY (`id`));
An insert like this would fail:
INSERT INTO `test`.`strict`(`col2`) VALUES (‘string’);
Yes it would. And that INSERT is plain silly. Why am I assuming a value without specifying it anywhere, nowhere in the INSERT statement, not in the table schema, I just assume that MySQL will figure out an appropriate value for me? And if you DO assume something, like the value of the column c2 should be set to an empty string when not specified, then why not set the default value for it in the create table statement? Like this:
CREATE TABLE `test`.`strict`(
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`col1` VARCHAR(10) NOT NULL,
`col2` VARCHAR(10) DEFAULT '',
PRIMARY KEY (`id`));

Or? And note that in this case, having done things right, you don't get a warning! (Yes, the INSERT on the original table actually issues a warning. Why would I want to have code that issues warnings. And if the user can leave the column out without specifying it, why is it NOT NULL).

Actually, this behavior shows another MySQL weirdo behavior. If I change the table definition above slightly, and allow NULL values in the col2 column, what do I get from an INSERT if i leave col2 out? The same INSERT as above. The column col2 has the same database. The INSERT still specifies no value for it. Now col2 will be set to NULL. And this is supposed to be coherent, consistent and easy to use?

Now, when the column col2 allows NULLS, MySQL does something with my schema also, it adds a DEFAULT NULL clause to col2. So that makes a bit more sense, if it wasn't for the fact that it doesn't add a DEFAULT '' when I specify NOT NULL.

Enough of weird MySQL behavior complaints, all database systems have these, to an extent. But it would be nice if MySQL was more in line with the rest of them, which SQL_MODE is on its way to do, but then MySQL should also be able to enforce it, if I wanted it to.

/Karlsson
PS Comments are welcome but before you fire off your tomatoes, read what I wrote.

Thursday, February 16, 2012

More on database consistency

I've written a few times about database consistency before, mainly in conjunction with NoSQL and the concept of Eventual consistency. Now, I'm about to do an update on the subject, as I have come to realize a few things.

From an oldtimer like myself, having been an SQL guy for 25 years, I remember Punk-rock and even The Beatles and I having hair growing out of my ears, what can be contributed? Well, let me beging with stating what I mean when I say Database consistency. What I mean is Consistency as the C in ACID (no, we aren't talking drugs here, we are talking databases). Let's see what the online authorative reference work on just about anything on this planet, from the size of J-Lo's feet to the number of Atoms in the universe (those two numbers are quite far apart by the way), Wikipedia: "The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including but not limited to constraints, cascades, triggers, and any combination thereof." In other words, consistency means that the databas is always in a consistent state, the different data items in it (rows, if you wish) are "in sync" with eachother. I think most of you agree with this notion.

Now, when it comes to NoSQL databases, like MongoDB, this terminology is different. These guys introduced Eventual consistency, which means that the database will eventually reach a consistent state with regards to a specific transaction that changes that "state" of the database. But there are multiple transactions at the same time, and they aren't necessarily, in an Eventual consistentcy model, consistent with eachother as they aren't on the same node. But the theory goes that some time, eventually, they will. If the system never stops, and transactions keep coming, then eventual consistens is determined to happen within 100 ms or less from the point in time when pigs fly. But if you stop all state changing transactions, then the state of the database will reach consistency. Eventually.

Now in NoSQL circles there is a thing called a Consistent read. If my database was consistent, then any read is consistent, right? And in the case of use SQL RDBMS folks, consistency is about the state of the database when I write to it? Well, if you have an eventual consistency model, where you have data distributed all over the place, things are different. To begin with, the basic thing that you have to make sure, and the NoSQL databases do this, is to ensure that the writes to the databases are all in order (we know this from MySQL also, and it is part of the issue with the MySQL slaves, and the NoSQL guys aren't fixing this particular bottleneck). And here we mean they are in order in each and every node. Across nodes, we don't care, which is where I get my abilility to scale out writes from!

A consistent read is a read where the data I am reading is in a consistent state, or sometime that my data is the most recent data. These two aren't always the same, but the second (reading most recent data) typically implies the former, although I assume this is not always the case. This is VERY different from the meaning of Database Consistency as we RDBMS folks look at it. All the same, the concept sure is useful, and as the NoSQL distributed systems doesn't need to keep the data consistent on a global level, a lot of shortcuts can be taken. But having Read Consistency has litte to do with Database Consistency. Your NoSQL fans will complain here and try to tell you that these achieve the same thing, but they don't. Achieving global Database Consistency costs an arm and a leg or two in performance, but the database is ALWAYS consistent.

So two different things, both with advantages and disadvantages, but they are STILL different! And the NoSQL folks will confuse things by allowing you not to have even Read Consistency, somewhat implying that turning it on means you get Database Consistency and that the Read Consistency model (which is very very simple by the way) means you get the effect of Database Consistency using Eventual Consistency. Nope. You don't. Which doesn't make it bad, but IT IS NOT THE SAME THING!

/Karlsson

Wednesday, February 15, 2012

Why I think SQL_MODE is useless...

I think the SQL_MODE setting is a sort-of good idea, but that the implementation of it is useless, more or less. What I would want from a feature like this is to allow me to control the quality of the data in the database, but that is not what SQL_MODE does, regrettably.

Rather, what SQL_MODE does is define a default quality checking level for an individual session. This is NOT the same thing as guaranteeing that data has certain qualities in the database.
  1. To begin with, a specific session can change it. If you want to insert an invalid data, just set SQL_MODE=ALLOW_INVALID_DATES and go ahead and insert.
  2. Secondly, setting this to a non-default value requires no specific privileges at all!
  3. Thirdly, as anyone can bypass these checks, you would assume that there was a way to check afterwards that data follows a specific SQL_MODE, or at least a function that I could apply to a table or column to check it, but no, no such thing exists.
  4. The different consistency checks provided are somewhat ambigious and difficult to understand. Some applicatioon that found certaian aspects of MySQL 4.0 to be incompatible with their applications sets SQL_MODE=MYSQL40, which implies a lot of things.
  5. And last but not least: This variable controls data quality. If some user decides to set this to some value to get around the database quality checks, you expect it to be logged somewhere, but no. Well, you can see it if you look carefully in the binlog, after the fact (and there you don't see the actual SQL_MODE, only the numeric representation of it).

And having said all this, I know what many of you think: Why let the database chack your data, that is the job of the application anyway? Right? Well, sort of. In my case, when I write applications of build websites, SQL_MODE isn't often an issue (except that Joomla insists on setting SQLMODE=MYSQL40, which means that my nice FK constrained database gets screwed up). But in larger team, like at work, developers are in one part of the office, and I am playing with the database. My responsibility: Make sure data is there is is right. The developers responsibility: Build their applications and make them run.

Here at Recorded Future, this is a very, very small problem. We have a good team, which isn't that big, we communicate and we also use less and less of MySQL these days. But I have been in places where this is a very real problem, and if delevopers use stuff like Hibernate it gets even worse, as the developers are now even more distanced from the real data.

And note here: This is not to say that developers lack competence and knowledge of databases, or that they write bad database code something, it's just that their focus is different from someone like myself, who work with databases on a regular basis.

And look, if you were a consultant or a developer, wouldn't it be nice to look at a database and just KNOW what data goes in there. Will I find a date that is the 31st of April in a datetime field or not?

And yes, there are applications that just will not run with MySQL 5.x. Fine, allow an option to be set for this, but don't make it changeable on a session level! On a global level then? With multiple applications using different databases, this is also a bad idea. I'd say make it a per database attribute!

And then we come to the situation where you really NEED to disable constraints, SQL_MODE etc. OK, I can allow this, assuming:

  • A higher level privilege than normal, or even better, a special privilege, a special privilege, is needed.
  • There has to be a way to check it, like an ALTER DATABASE SQL_MODE=xxx, ALTER TABLE SQL_MODE=xxx etc. And also some means of column level checking (can you write a script for this checking? Sure. Will it be faster or more convenient than running it as part of the database process? Nope, far from it. And convenience is important here as we want to run this often, to make sure that our data is what it should be).

And by the way, it is OK to change SQL_MODE on a session level, but only change it UP (where this is applicable). I.e. if the global SQL_MODE is ALLOW_INVALID_DATES, then a session can turn this off, but not ON (and by the way, again, this is a weird name. Id's rather have all SQL_MODE options be names as a FEATURE that is a good things: STRICT_DATE_CHECKS with the default being ON would be a better option, in my mind).

That's enough of complaints for today, take care
/Karlsson

Monday, February 6, 2012

On datatypes, domains and why I think it's time we reconsidered

What's in a datatype then? A MySQL SMALLINT? A C int? An Oracle BLOB? One thing is for sure, they are not very well standardized, not even within the same environment, and much less so across them. And what does it means, really? When should I use a BLOB, when a BINARY VARCHAR and when to use a long long?

A datatype defines many attributes:
  • What data I can store in it: Only numbers? Printable alaphanumeric characters? Unicode? Binary data? An object?
  • What I can do with a value of that particular type and how does it behave? Can I concatenate two values? Add them (that is NOT the same as concatenate! The same operator (+) may be used, but it's not the same thing!)? Save it to a file?
  • How is it stored and represented in technical terms. And this can be argued, a C int doesn't define how it is stored on disk, but a MySQL INT does! And a BLOB is a type that is largely there just because it defines how to handle things on disk!
To link this up with MySQL, let's look at the built-in MySQL datatypes. Some of them are rarely used, and for all intents and purposes, many of them are overlapping and in some cases they are there JUST BECAUSE they have different semantics and behaviour! I am, for example, not a big fan of all the different INT types that MySQL supports; tinyint, smallint mediuming, bigint all in signed and unsigned shape. Why is this useful? Yes, I know a smallint is smaller on disk so there is 2 bytes less to read from disk compared to an int. Yeah right, big deal: Look now, if that is a big deal for your application, then you have bigger problems. Which doesn't mean that using a smallint isn't a means of, in some cases, improve performance a small bit. And that this is sometimes useful. But really, this should be hidden somewhere, we just have too many datatypes. The same goes for all the different BLOB datatypes! The BLOB datatypes has another issue by the way, the names of them a a bit silly: "TINYBLOB" is a contradiction, as BLOB stands for Binary LARGE Object! A Large object that is Tiny? And a something that is at the most 255 bytes isn't really "Large" anymore, that was way back in the 1970's or so.

No, I'm not a big fan of determining which datatype to use by considering storage requirements to optimize performance. That is something you might do as a performance enhancing measure after the job is done, possibly. In C, which is a language I have used longer than I care to remember, int seems to serve the role of being the default numeric integer datatype, which is not a bad thing, in particular not in C as C is a language used to write low-level stuff in, stuff that is performance sensitive, and a C int is linked to hardware specific integers.

FLOAT is another issue altogether. In very few business applications is there a need for FLOAT or DOUBLE in terms of IEEE floating point values. Really! This is a scientific number format, that is also supported by the Floating point unit in most computers of today, so operations on them are pretty fast these days. But I have seen too many business applications where the developer uses a FLOAT in the database for any kind of numeric value that is not an integer, which is a distinctly bad idea! It works as long as the numbers aren't too big, and when the operations on them are reasonably simple. This is not to say a FLOAT or DOUBLE isn’t useful, just that they are more targeted for scientific programming (Hey, they are after all defined by IEEE!). What most people really should use is DECIMAL (I'm using MySQL datatypes for reference, but you know what I mean). This is a datatype that accepts (hey, big surprise coming up here) decimal numbers, fixed point decimal numbers! Operations on DECIMAL are slower though, as they aren't really supported directly by either C (which is the language that MySQL is largely written in) or by the CPU / FPU. But they are better for business applications (for scientific applications, use FLOAT or DOUBLE. And continue to write your code in Fortran by all means!)

But all these datatype are increasingly used also to define the domain of the data in them! The SQL DOMAIN feature is in the SQL Standard but is largely forgotten. It is there in PostgreSQL for you to try. The advantage of this, in my mind, isn't just that you can define in more exact terms what data should go into a database column, without writing code to do that, and to make it apparent in the database schema what data is expected, it also means that we can get away from the wide array of different built-in datatypes in, say, MySQL. Instead we can stick with a few optimized ones, let them have a simple inherent domain and then we can define the application specific domains in application code. Anyone for a IP_ADDRESS datatype? A ZIPCODE datatype? An URL type. All those a reasonable requests from the application POV, but it really shouldn't (in my mind) be defined by the database server (As: What are the semantics of a ZIP-code? They are different in different countries, and the post-office can change their minds (although that is something that takes a long time I guess)). Why? As this is application specific domain! And application specifics should be in application code! Simple as that!

So what datatypes are useful then, the way I look at it:
  • RAW Data - This should be a basic type. As long as can be, or as short. Any data can be stored in it, and nearly as much as you would like.
  • Integers - Although an integer can be considered to be a special case of a more generic number, the integer domain is so generic, and the integer is such a common and basic type that is deserves a place here.
  • Decimal – Fixed point decimal numbers.
  • Float - Now we are getting close to the application domain here, but these guys deserve a place here anyway, as they are so common and hooked up to the programming languages and hardware.
  • DATETIME - Only one of them is really needed. One of DATETIME, DATE, YEAR, MONTH, TIMESTAMP, TIME etc etc. that is. Just one, very generic base type, where the specific domain is, you've heard it before, application specific!
  • Text - Reluctantly I add this to the list. Although this is just a RAW with some limitations (usually only UTF8 or ISO8859-15 or something characters), there are some operations and attributes that are so tied up with text strings that a TEXT type is reasonable to include.
And that's about it! Yes! We could add a few standard domains of top of all this, like FLOAT and DOUBLE, SMALLINT, TINYINT etc etc. But as for the basic types, this is what I would like to see. As for the weirdo TIMESTAMP semantics in MySQL, add them as triggers on your table, in application code or, best, as a specific domain (I do know that not all weirdiness of TIMESTAMP can be represented by a standard SQL DOMAIN, which is both a sign of just how weird TIMESTAMP is, and of some of the limitation of the SQL DOMAIN).

I have now mentioned SQL DOMAINs a few times, so lets spend some quality time with them right now before closing up, here is a simple session using PostgreSQL 8.4:
  • First we create a domain:
    CREATE DOMAIN yesno AS CHAR(1)
    DEFAULT 'N'
    NOT NULL CHECK (VALUE = 'Y' OR VALUE = 'N');
  • Then we create a table that uses that domain:
    CREATE TABLE user_active(username CHAR(10), is_active YESNO);
  • And then we check if it works using a few SQL statements:
    INSERT INTO user_active VALUES('foo', NULL);
    ERROR: domain yesno does not allow null values
    INSERT INTO user_active VALUES('foo', 'Y');
    INSERT 0 1
    INSERT INTO user_active VALUES('foo', 'N');
    INSERT 0 1
    INSERT INTO user_active VALUES('foo', 'A');
    ERROR: value for domain yesno violates check constraint "yesno_check"
Before I wrap this up, I want to say one more thing on the issue of SQL Domains: There is an alternative n PostgreSQL, which is user defined datatypes using a CREATE TYPE SQL command. To begin with, I like domains much better as they are declarative, which means I don’t have to read a whole bunch of (usually uncommented and undocumented) code to understand what they type does. Secondly, this little warning in the PostgreSQL 9.1 manual isn’t too encouraging (page 1289): “To create a new base type, you must be a superuser. (This restriction is made because an erroneous type definition could confuse or even crash the server.)

I have more to say on types, and I will follow up with a new post on this subject eventually, but this is enough for now.

/Karlsson

Wednesday, February 1, 2012

More on using the commandline vs. Visual editors

Shlomi Noach has made a couple of posts (here and here) on Visual Editors vs. commandline for MySQL working with SQL statements. All in all, I tend to agree with him in many respects, but I am surprised that he recommands MySQL Query Browser, as this is the editor that breaks one of the things that seems to annoy him (and me) most: Every SQL statement that is issed uses it's own connection. Which is not to say others may feel differently about it.

As for myself, I started developing my own MyQuery tool because there are things you just cannot do from the commandline, and there are things that are really complex and non-inituitive to do from the commandline, and you need a better tool than the commandline. Again: This is not to say that there is something wrong with the MySQL commandline, it really should not be full featured, but it should run all SQL commands with ease.

So, what was I missing in the commandline MySQL that led to the development of MyQuery, and project that I am still working on, on and off? Mainly, using the commandline makes it REAL hard to develop and debug scripts. That was the main reason! At the time that I released MyQueru 1.0, I was involved in a project that required A LOT of SQL scripting. And if you have a big, big SQL script, then you know what happens: The scripts runs for a few 100 lines, and then there is an error or a typy, and the commandline just keeps processing until we get an error, and then it stops (alternatively, you use --force, in which case you usually get truckloads of errors, and even if youjust get one error,it gets lost among all the other errors).

OK, so the script failed with an error, then you just fix the error and run it again? Right? Well, that approach assumes that your script is idempotent, which isn't always possible and is at least difficult in many cases. Wouldn't it be neat to have the script running stop when there is an error, allow you to fix the error, and then keep running from where the error occured? That is exactly what you can do with MyQuery.

Alternatively, here is another script running scenario:
You have a script that takes forever to run. You might have the odd error in there, but you can fix them afterwards, just run the script please! But also tell me about any errors in there, so I can fix them! OK, MyQuery does this too. You can run with force in MyQuery also, but the errors are should in a special errors tab. After the script has finished running, you can see the errors in the error output tab, and that is not intermixed with any status printouts or anything. And yes, click on an error text and you get to that position in the script.

Using SQL to generate SQL? Yeah, you can do that with the MySQL commandline, but MyQuery makes it far easier.

MyQuery has a bunch of other neat features, like the ability to save and store blob data to disk (doing that from the commandline is somewhat possible, but only with many limitations. INTO DUMPFILE only works for single row for example), extend the interface with custom tools, using SQL statements, or the MyQuery API and many other things (One supplied tool will nicely break up a long SQL statement into several indented lines, simple but useful when working with big SQL statements). That MyQuery has a colour-coded editor with a very flexible setup is also a bonus that makes it easier to work with.

Actually, the reason I started developing MyQuery is probably still the best reason to use it: To run scriripts.

/Karlsson

Wednesday, January 18, 2012

Database Innovation, pleeease!

I think you have heard me say it before, but in this case I think repetion is needed: We should be much more innovative in the database world. And no, I am not talking NoSQL here, not at all. For all the good things with the NoSQL technologies and the movement itself, it's not really innovative. Rather, in my mind, NoSQL sacrifices functionality for performance, largely. The schema-less design of most of these technologies is probably the one aspect I would consider innovative, the rest is just RAM based storage, sharding, key-based lookups and good, old B-Trees.

Talking about B-Trees, isn't it time we retired them soon? There should be better ways if indexing data. Look at something like Mongo. With MongoDB, you really want to have your indexes in-memory, all of them, without that, performance will be awful (there are exceptions though, but in general this is true). Now, a B-Tree is an index mechanism that has worked well, as the structure of it lends itself to good performance be it on disk or memory, although in general, a B-Tree is built for disk-based storage with caching; for in-memory use, there are better, more efficient, indexing (or access) methods. So if an index in Mongo is supposed to be in memory, why choose a disk-oriented indexing mechanism? T-Trees are there, they are optimized for in-memory use and has been around for ages? I guess the answer is tradition.

Tell you what, tradition is a BAD BAD argument for anything in an industry that changes as fast as the IT-industry. Would anyone suggest that Facebook base their hardware platform in Motorola 6800 CPUs? I think not. But the B-Tree predates the 6800 by far.

Which is not to say that the B-Tree is so bad (or that the Motorola 6800 is either), it's not, but we have much more diverse needs these days, so there should be more diverse access methods in use, but the B-Tree persists, despite that.

And look at SSD-disks. Yeah, the future, right? A largely random access style memory hooked onto an interface designed for electro-mechanical harddisks in the 1970's. Innovative? I think not. Apple got it right in attaching Flash on the Mobo and PCI-based Flash is growing and coming down in rice, so it seems things are moving there at least.

But in any case, Flash / SSD isn't an electromechanical disk with cylinders and sectors, despite what the SSD interface tells us. And if the B-Tree works well on disk, we talk electro-mechanical disks. Where is the access methods designed specifically to reap the benefits of direct attached Flash?

And to be honest, the SQL-Based RDBMS, something which I have spent my career with, in one shape or te other, for 25'is years, is hopelessly outdated, but that is not why I'm no big fan of the NoSQL movement. Rather, my problem is just that the NoSQL movement really doesn't represent something new or is a disruptive technology in any way. Where, my friends, is the disruptive database technology? A Technology built for (you are sitting down now, I hope) the 21st century, If you missed it, we are there now, since 11 years back actually, so start inventing.

And yes, I know about the different MySQL variations with sharding, storage engines etc. etc. But that is not terribly innovative or new. The closest we get to a disruptive technology in the database world recently, is the column based storage databases. But thses are not gereric enough in my mind, and also, most of them have a SQL based interface tucked onto them. And I understand why they want SQL, they need this to be able to sell it, as all consumers of database products (most at least) wants SQL to integrate it with some tool or infrastructure. And I understand this too, but it brings up a question. Where is the customer or end-users who is willing to sacrifice using a query language as old as Led Zepplin to instead get the benefits from some new disruotive database technology?

But this is, I'm afraid, a bit of and chicken-and-egg-situation. The customer isn't requesting innovative products as that technology doesn't exist much, and the products aren't developed and research isn't much done as the customers aren't there. This really has to change soon, and I am sure it will. If for no other reason so that I can retire in peace, knowing that my SQL skills are truly outdated and I will not have to work, because noone wants my skills!

/Karlsson
Lookiing forward to retirement

Tuesday, January 17, 2012

MongoDB for MySQL Folks part 4 - Sharding

Welcome part four of this series of blog-posts on MongoDB, where we previously looked at:
These were introducing some basic concept when it comes to querying MongoDB and to show some simple use cases. By now you realize that MongoDB is different from MySQL, but you probably knew that already, but why would you move from MySQL to MongoDB? Well, you know the answer to that: Performance and Scalability. If MongoDB didn't provide pretty seamless sharding, scaling out over a large number of nodes, then MongoDB wouldn't be that interesting.

With MongoDB, it's performance depends on having large portions of data in RAM, and this is no different from MySQL, but it's even more true with MongoDB. But if you were running on a single machine, the amount of RAM you can use is limited, there is a limit to how much RAM you can (c)RAM into a single box. This is will limit performance of course, and is again not much different from MySQL. What is different is that MongoDB has a solution: transparent sharding (yes, I am aware of the different transparent sharding implementation of MySQL, like Scalebase, but that is a different story, that I will get into at a later stage).

Without Sharding, we would not have gone into using MongoDB here at Recorded Future. Why anyone would use MongoDB in the versions before when sharding was introduced, is beyond me.

MongoDB sharding from above

From a high-level view, this is how sharding works in MongoDB:
  • Data is distributed across one or more mongo servers automatically. A server may be either a single MongoDB server or a replica set (more on this in a later post).
  • Data is distributed in ranges in a user-defined shard-key. The shard key is, as is obvioious, a unique key. It may well be the unique _id identifier that MongoDB assigns to each document, but that is not necessary. Each such range is called a chunk and is some 64 Mb is size by default.
  • Each MongoDB server holds a number of these chunks.
  • Balancing a sharded setup involves moving chunks between the servers and this is automatic. In a perfectly balanced MongoDB shard setup, each involved server has the same number of chunks.
So far so good. Nothing incredibly complex, right, and also useful, right? Yes,, useful and workable, but you have to know what you are doing here.

There are actually three kinds of servers involved here:
  • Mongo shard servers. These are the MongoDB servers or replicasets that holds the actual sharded data. These are the same servers as for a non-sharded MongoDB setup, and there is no stopping you connecting to it just like that, accessing the sharded data in a non sharded way. Also, there is no stopping you adding databases, collections and documents to this server. The server may well hold both sharded and non-sharded data. The latter seems like an advantage, but actually is not and should probably be avoided. The number of mongo shard servers or shard replica sets determines how much data is distributed obviously, with 2 shard servers, each server will hold about 50% of the data etc.
  • Mongo "router" mongos. The mongos process is the process that the application connects to, and it is responsible for distributing a query between to the appropriate shard servers for a particular query. This is a different program than the other mongo servers, and it also has the role of performing the automatic balancing. The interface to it, from the point-of-view of the poor old application, is the same though, so an application that works with a non-sharded MongoDB should also work with a sharded one, but it connects to mongos instead of mongod. You may have as many mongos servers as you please.
  • The Mongo "config server". This is process that runs the usual mongod server, but it has a special role. Instead of storing the actual data, the mongoc manages the metadata, in other words, the config server data tells mongos in which mongo shard server an actual document is located, based on some query. It might seem like this server is a single point of failure, but it is not, you may have 3 of them, in which case they are replicated using a special mechanism.
To summarize this: The application talks to mongos. Mongos need to know where that data that the application is located, so it asks the mongoc. Once it has this data, it goes ahead and asks the mongo shard server for the actual data. And you might think that the extra roundtrip that the mongos has to do do the config server will slow things down and that the config server(s) is a potential bottleneck, but this is not so as the mongos process caches the config data. If the mongos data gets outdated (stale in MongoDB terms), the cache is refreshed.

Setting up MongoDB sharding


There are a couple of weird issues when you configure MongoDB for sharding, and some which are not so obvious. What I have described above is how the MongoDB server processes interact, and setting that up is difficult enough, although far from as difficult as it used to be.

You might be tempted to think that once the above setup is running, all data will be sharded just like that. And that would be nice and that would be the natural way for things to work, but hey, this is MongoDB, so it was close, but no cigar! No, you still have to tell the system that a particular database is sharded, and then enable sharding and determine the key to use for each and every individual collection in that database.

And again, as often is the case with mongo, some things are controlled from the commandline, some things are in configuration files and some things are stored in MongoDB itself. Also, in many cases, even though you have to set certain things in a configuration file, they end up in the database itself anyway. Why this is so you have to ask someone else than yours truly.

So, to get thing straight, this is how you set up sharding with Mongo in not so few and not so easy steps:
  • Start your mongod daemons that stores the actual data. Start them with the --shardsvr option enabled. Apparently this is no longer needed, but I think it's a good thing to put it there anyway.
  • Start your mongoc servers, you have 1 or 3 of these. These are mongod servers running with the --configsvr option.
  • Wait until the config servers are up and running. All of them! This is not well documented, but mongos will not start unless at least 1 config server is running, and the first time around, ALL config servers must be running!
  • Start your mongos servers (routers). Make sure that they actually start, if the config servers cannot be contacted, then mongos will just fail with a message in the logfile.
  • Now, we must tell the config servers where our shards are, lets say we have two of them, "datasvr" and "datasvr2", both running on port 33010. Then enter the mongo shell, connecting to the mongos host and port (in this case mongos_host and 33011 respectively). First you must be in the admin database, and then you can add the shards:
    $ mongo --host mongos_host:33011
    mongo> use admin
    mongo> db.runCommand({addShard: "datasvr1:33010"});
    mongo> db.runCommand({addShard: "datasvr2:33010"});
  • Now we can make sure that we shard the data we have, so enable sharding for the database in question, lets call the database "mydb". Enter the mongo shell and connect with one of the mongos servers, again in the admin database:
    mongo> use admin
    mongo> db.runCommand({enablesharding: "mydb"});
  • And then we have one step left: enable sharding on the collections we want to shard, again from the mongo commandline connected to mongos. Here we specify the name of the collection and the key used for sharding. In this case, we use the always present unique _id column as the shard key and the collection is called mycoll:
    mongo> use admin
    mongo> db.runCommand({shardcollection: "mydb.mycoll", key: {_id: 1}});

Does the commands above that you run from mongos look weird to you? Like the key specification key: {_id: 1}? Yes, at first it looks awkward, even to me. But after a while you get used to it and can start to appreciate the strict Java Script / JSON syntax used all over the place in MongoDB, it is actually quite powerful and easy to use, once you get used to it.

Now, we have enabled sharding for one colletion here. If you have been through all this fuzz to create a sharded setup, I guess you would want all tables in that setup to use the powerful sharding mechanism? Like, automatically? Nope, can't do. You have to enable sharding manually for each MongoDB database and collection you create.

I plan to dig deeper into mongo in a later post, to show some monitoring features, how to use Java Script, how to set up replication and some other things. But for now, this is it!

Cheers
/Karlsson