MongoDB: NoSQL Operations Hands-On

This blog entry is from Greg Riedesel (http://sysadmin1138.net)

Nuri Halprin taught the MongoDB: NoSQL Operations Hands-On class this afternoon. The intent of this class was to give sysadmins a solid familiarity with MongoDB and those things near and dear to our hearts: high availability and disaster recovery.

MongoDB is a pretty popular NoSQL database. It has some RDBMS-like features, but isn't one; there will be no JOINing here, please move along. Even without such key functionality these systems are extremely useful, especially for activities that can be run on a single large table. These databases are very new, so many of us haven't run into them before, or are having to learn about them real fast.

Nuri provided a very good foundation for sysadmins. Mongo uses memory-mapped files, which lets it lean on the operating system's memory management subsystem. It's written in C++, a compiled language, for speed. It uses BSON as its native format, which is a good thing to keep in mind when dealing with data export/import.
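For anyone who hasn't played with memory-mapped files directly, here is a minimal sketch of the general idea in Python. It is not how Mongo itself is implemented, just an illustration of writing to a file through a mapping and letting the OS handle the paging. The function name is made up for this example.

```python
import mmap
import os
import tempfile


def mapped_roundtrip(payload: bytes) -> bytes:
    """Write a non-empty payload to a temp file through a memory map, then read it back."""
    fd, path = tempfile.mkstemp()
    try:
        # An empty file cannot be mapped, so size the file first.
        os.ftruncate(fd, len(payload))
        with mmap.mmap(fd, len(payload)) as view:
            view[:] = payload  # looks like a memory write; the OS manages the pages
            view.flush()       # ask the OS to write dirty pages back to the file
        os.close(fd)
        fd = None
        # Read the file the ordinary way to show the bytes really landed on disk.
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        if fd is not None:
            os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(mapped_roundtrip(b"hello mongo"))
```

The point for ops folks is that the database is not managing its own cache here; the kernel is, which is why free RAM and page-fault behavior matter so much for this kind of design.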

MongoDB was also written from the bolts out to be run on commodity hardware or cloudy VMs, not massively redundant single servers. The recommended mode of operation is a replica set of at least three nodes, which gives high availability. For scaling out, it can shard to spread read and write loads as well as simply spread data around. Sharded replica sets are perfectly allowed, and are just how the big environments run. All this redundancy means a node can completely fall out and the impact will be minimal.
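The three-node minimum makes more sense with the arithmetic written out. Electing a primary needs a majority of the voting members, so a rough sketch of how many nodes a set can lose looks like this (the function names are mine, not anything from Mongo):

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the set."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """How many members can drop out while a majority can still be formed."""
    return voting_members - majority(voting_members)


if __name__ == "__main__":
    for size in (2, 3, 5):
        print(size, "members: majority", majority(size),
              "survives", tolerated_failures(size), "failure(s)")
```

A two-node set cannot lose anything and still elect a primary, while three nodes can lose one, which is why three is the practical floor for high availability.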

As of the current version (2.2) Mongo does not have a very full-featured security model. Users can be assigned at the database level, but not per collection or per document. There is no native encryption support at either the file level or the network level (this is what IPSec is for).

Backup and restore is a complex issue for Mongo. Since Mongo systems tend to be extremely large, traditional back-up-everything approaches are problematic. Sharded systems complicate the challenge of taking a coherent backup. It is possible to take such backups, but the resources required to do so may not be reasonable. This doesn't sit well with some sysadmins, but it is part of the risk management we have to take into account.

That said, there are tools to do backups. The mongodump utility is the built-in one for exporting large quantities of data, which is restored through mongorestore. Use of these, or more traditional LVM-snapshot methods, will get a backup of a Mongo system.
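As a rough sketch of what wrapping those two tools in a backup script might look like, the helpers below only build the command lines and leave the running to you. The host name and paths are placeholders, the helper names are made up, and the flags used (`--host`, `--out`, `--db`, `--oplog`, `--drop`, `--oplogReplay`) should be checked against the documentation for the version you actually run.

```python
import shlex
from typing import List, Optional


def mongodump_command(host: str, out_dir: str,
                      db: Optional[str] = None, oplog: bool = False) -> List[str]:
    """Build an argv list for mongodump; nothing is executed here."""
    if db and oplog:
        # --oplog captures a point-in-time view and only applies to full dumps.
        raise ValueError("--oplog cannot be combined with a single-database dump")
    argv = ["mongodump", "--host", host, "--out", out_dir]
    if db:
        argv += ["--db", db]
    if oplog:
        argv.append("--oplog")
    return argv


def mongorestore_command(host: str, dump_dir: str,
                         drop: bool = False, oplog_replay: bool = False) -> List[str]:
    """Build an argv list for mongorestore; nothing is executed here."""
    argv = ["mongorestore", "--host", host]
    if drop:
        argv.append("--drop")         # drop each collection before restoring it
    if oplog_replay:
        argv.append("--oplogReplay")  # replay the oplog captured by --oplog
    argv.append(dump_dir)
    return argv


if __name__ == "__main__":
    # Placeholder host and directory, purely for illustration.
    dump = mongodump_command("db1.example.net:27017", "/backups/today", oplog=True)
    print(" ".join(shlex.quote(part) for part in dump))
```

Returning argv lists rather than shell strings means they can be handed straight to `subprocess.run` without worrying about quoting.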

Replicas and sharding were the focus of the hands-on session in the second half of the class. Nuri passed around a few USB drives with a class kit on them that contained Mongo binaries (Windows, Linux, and OSX) and some test databases. We then went through the paces of getting a MongoDB system running, importing data into it two different ways, and then configuring it as a three-node replica set.
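For reference, the configuration a replica set is initiated with is just a small document: a set name plus a numbered list of member hosts. Here is a sketch that builds one. The set name and the three localhost ports are my assumptions for a single-laptop test, not the values from the class kit.

```python
import json
from typing import Dict, List


def replica_set_config(name: str, hosts: List[str]) -> Dict:
    """Build a replica set configuration document: set name plus numbered members."""
    return {
        "_id": name,
        "members": [{"_id": index, "host": host} for index, host in enumerate(hosts)],
    }


if __name__ == "__main__":
    # Hypothetical test layout: three mongod processes on one machine.
    config = replica_set_config(
        "rs0", ["localhost:27017", "localhost:27018", "localhost:27019"]
    )
    print(json.dumps(config, indent=2))
```

The printed JSON is the shape you would hand to `rs.initiate()` in the mongo shell once each mongod has been started with the matching replica set name.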

Sharding is more complicated. First, it requires at least two replica sets of identical node counts. Second, it requires a separate config server (three such for a production setup, but this is just test). Third, a mongoS process needs to run somewhere, which is the process that actually balances chunks between shards. Fourth, once you pick your shard key (selected during creation) you can't change it without a full export/import cycle, so pick well.

This was a very good class for the new sysadmin to learn the ops-relevant issues surrounding a MongoDB deployment.
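If the shard-key part of the sharding discussion above is hard to picture, this toy model may help. It is not Mongo's code, only a sketch of range-based chunking: the key space is cut at split points, each resulting chunk lives on one shard, and a lookup decides where a given key goes. The class name, shard names, and split points are all invented.

```python
from bisect import bisect_right
from typing import List


class ChunkMap:
    """Toy range-based chunk table: N split points define N + 1 chunks."""

    def __init__(self, split_points: List[str], shards: List[str]) -> None:
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one shard entry per chunk")
        if split_points != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = split_points
        self.shards = shards

    def shard_for(self, key: str) -> str:
        # Each chunk covers [lower, upper): a key equal to a split point
        # belongs to the chunk that starts at that split point.
        return self.shards[bisect_right(self.split_points, key)]


if __name__ == "__main__":
    # Three chunks: keys below "h", "h" up to "p", and "p" onward.
    chunks = ChunkMap(["h", "p"], ["shard0", "shard1", "shard0"])
    for name in ("greg", "nuri", "zed"):
        print(name, "->", chunks.shard_for(name))
```

Every document's home is decided by that one key, which is why a poorly chosen key is so painful to undo later.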