MongoDB - Best practices


MongoDB is a document database whose flexible document schemas make development more productive, while its architecture and the options it offers for configuring a cluster provide scalability, performance and availability.

Data Modeling

Deciding which data to normalize and which to denormalize is important, so it is worth taking some time to think about the approach that best suits our needs. There are several modeling strategies for reading and writing our data, both normalized and denormalized, such as one-to-few and one-to-many.

An example of one-to-few is the phone numbers of a contact, which can be embedded directly in the document they belong to, so they can be read and edited in a single call:

{
  name: 'Peter Pan',
  ssn: '456-123-0987',
  numbers: [
    {name: 'Personal', number: '918-876-5199'},
    {name: 'Home', number: '417-789-9995'}
  ]
}
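As a rough sketch of the single-call edit described above, the helper below builds the filter and update documents for appending one more phone number to the embedded array with the $push operator. The `contacts` collection name, the `addNumberUpdate` helper and the sample phone number are assumptions made for illustration; field names are written in lowercase.

```javascript
// Hypothetical helper: builds the arguments for one update call that
// appends a phone number to the embedded `numbers` array of a contact.
function addNumberUpdate(ssn, label, number) {
  return {
    // Selects the contact document to edit.
    filter: { ssn: ssn },
    // $push appends one element to the embedded array.
    update: { $push: { numbers: { name: label, number: number } } }
  };
}

// The phone number below is made up for the example.
const op = addNumberUpdate('456-123-0987', 'Work', '555-010-0000');

// In the mongo shell (the `contacts` collection name is assumed):
//   db.contacts.updateOne(op.filter, op.update);
```

Because the numbers live inside the contact document, no second collection or query is involved.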

In the one-to-many case, a single object can be related to a few hundred, or at most a few thousand, other objects. Take the purchase invoices of a mini-market as an example: there is a collection of products, each with a unique identifier, and another collection of purchase invoices that reference those products.

Each product is structured as follows:

{
  _id: ObjectId('AAAA'),
  number: 123,
  name: 'Milk 1 lt',
  price: 0.97,
  qty: 2
}

And each purchase invoice as follows:

{
  date: '2016-11-24T22:43:03.288Z',
  total: 39.64,
  tax: 9.70,
  products: [
    ObjectId('AAAA'),
    ObjectId('AABB'),
    // etc
  ]
}

We could get all the information in two queries, making a "join" at the application level:

// Assumes each invoice also stores a `number` field
invoice = db.invoices.findOne({number: 123});
invoice_products = db.products.find({_id: {$in: invoice.products}}).toArray();
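The two queries can be wrapped in a small function. The following is only a sketch: `findInvoiceWithProducts` is a hypothetical name, `db` is assumed to be any object exposing mongo-shell-style collections, and invoices are assumed to carry a `number` field plus a `products` array of ids.

```javascript
// Sketch of the application-level join for a mongo-shell-style `db` object
// (synchronous `findOne` and `find(...).toArray()`).
function findInvoiceWithProducts(db, invoiceNumber) {
  // Query 1: fetch the invoice itself.
  const invoice = db.invoices.findOne({ number: invoiceNumber });
  if (invoice === null) {
    return null; // no such invoice, so there is nothing to join
  }
  // Query 2: fetch every referenced product in one round trip.
  const products = db.products
    .find({ _id: { $in: invoice.products } })
    .toArray();
  // Combine both results in application code: the id array is replaced
  // by the full product documents.
  return Object.assign({}, invoice, { products: products });
}
```

Keeping the join in one place makes it easier to add error handling later, or to switch strategy if the number of referenced products grows.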

Adding Indexes

Adding indexes allows search operations to be performed much faster. Keep in mind that MongoDB generally uses a single index per query, and that every additional index has to be maintained on each write, so having many indexes is not necessarily a benefit. It is always good to analyze how the data is actually queried before adding them.

By default, MongoDB creates a unique index on the '_id' field when the collection is created; this prevents the insertion of two documents with the same value in the '_id' field.

To create an index, the following statement is executed:

db.collection_name.createIndex({fieldName: 1})

Here collection_name is the name of the collection and fieldName is the field to index (the value 1 requests ascending order). The index allows us to make more efficient queries that filter on fieldName.
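One way to check whether a query actually benefits from an index is to inspect its explain() output. The sketch below assumes the classic queryPlanner.winningPlan layout, with `stage`, `inputStage` and `inputStages` keys; `planUsesIndex` is a hypothetical helper, and other server versions may structure the plan differently.

```javascript
// Walks an explain() plan tree and reports whether any stage is an
// index scan (IXSCAN). Returns false for a missing plan.
function planUsesIndex(plan) {
  if (!plan) return false;
  if (plan.stage === 'IXSCAN') return true;
  // Most stages have a single child stage...
  if (planUsesIndex(plan.inputStage)) return true;
  // ...while some (for example OR) have several.
  return (plan.inputStages || []).some((child) => planUsesIndex(child));
}

// In the mongo shell, for example:
//   var plan = db.collection_name.find({ fieldName: 'x' })
//                .explain().queryPlanner.winningPlan;
//   planUsesIndex(plan); // false suggests a full collection scan
```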

Use Replica Sets

Replica sets provide high availability through failover: if the primary node fails, another node is elected as the new primary and the set keeps running after a brief election. It is important to keep in mind that replica sets are not backups. Replication cannot protect against human error, because mistakes such as accidentally dropping a collection or deploying a faulty version of an application are replicated to every member.
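For failover to be useful, the application has to know about more than one member of the set. A minimal sketch of building such a connection string follows; the host names, the set name `rs0` and the `replicaSetUri` helper are placeholders chosen for this example.

```javascript
// Builds a standard MongoDB connection string that lists several replica
// set members, so a driver can locate the current primary after a failover.
function replicaSetUri(hosts, dbName, setName) {
  return 'mongodb://' + hosts.join(',') + '/' + dbName +
         '?replicaSet=' + encodeURIComponent(setName);
}

// Host names, database name and set name below are placeholders.
const uri = replicaSetUri(
  ['db1.example.com:27017', 'db2.example.com:27017', 'db3.example.com:27017'],
  'database_name',
  'rs0'
);
```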

Always back up your data

We all know the importance of backing up data. Once deployed to production, you must have a strategy for taking backups and restoring them in any kind of eventuality, such as data loss. With a small amount of data and a single MongoDB instance, you can use a couple of utilities provided by MongoDB for this purpose: mongodump and mongorestore.

mongodump can "export" a database, a collection or the result of a query:

$ mongodump --host localhost:27017 --db database_name --out /home/user/database_name

The above example backs up database_name under the /home/user/database_name path: mongodump creates a sub-folder there, named after the database, which contains the BSON files of the backup.

To restore a backup, use mongorestore:

$ mongorestore --host localhost:27017 --drop --db database_name /home/user/database_name/database_name

The above command restores the backup located in /home/user/database_name/database_name into the database database_name; the '--drop' option tells mongorestore to drop each collection from the target database before restoring it.
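If backups are taken regularly, writing each one to its own dated folder keeps earlier dumps from being overwritten. The sketch below only assembles the argument list, using the same --host, --db and --out flags as the command above; the `mongodumpArgs` helper and the /home/user/backups path are assumptions for illustration.

```javascript
// Builds the argument list for a mongodump run that writes into a dated
// sub-directory, so each backup is kept separately.
function mongodumpArgs(host, dbName, baseDir, date) {
  // First 10 characters of the ISO timestamp: the UTC date as YYYY-MM-DD.
  const stamp = date.toISOString().slice(0, 10);
  return ['--host', host, '--db', dbName, '--out', baseDir + '/' + stamp];
}

const args = mongodumpArgs('localhost:27017', 'database_name',
                           '/home/user/backups', new Date());

// With Node.js the dump could then be started as, for example:
//   require('child_process').spawnSync('mongodump', args, { stdio: 'inherit' });
```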

Use the most recent stable version

Technology advances by leaps and bounds, so keeping up with new versions lets us take advantage of new features, performance improvements and ease of use. At the time of writing, the most recent stable version is 3.2.11. It is advisable to get it from the official download center [https://www.mongodb.com/download-center] or through the package manager of the operating system.

Avoid using MongoDB on 32-bit systems

On 32-bit systems, MongoDB processes are limited to about 2 GB of data. This is rarely a problem nowadays, since 64-bit systems are the norm.

Security

Restricting who or what can access stored information is vital. By default MongoDB does not enable access control, but it does provide the means to secure access to information, including authentication, authorization and transport encryption. It is good to go through a security checklist before deploying to production. Here are some important points:

  • Enable access control and enforce authentication: For example, you can use the authentication mechanism that MongoDB includes so that valid credentials must be provided to read or modify the information in the database.

  • Configure role-based access control based on the needs of the application or system: You can define roles that are allowed to perform only certain operations, such as a read-only user.

  • Encrypt communication: If MongoDB is accessed by applications running on hosts other than the one MongoDB is hosted on, encrypting all incoming and outgoing traffic with TLS/SSL prevents third parties from reading the information in transit.

  • Limit exposure to the network: Allow MongoDB to operate only on trusted networks such as private networks or, better yet, if your application is on the same host as MongoDB, limit it to local access only.

  • Run MongoDB with a dedicated user: Run MongoDB processes under a dedicated operating system user that has no unnecessary permissions.
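As an illustration of the role-based access control point above, the sketch below builds the document for a read-only user. `read` is one of MongoDB's built-in roles; the `readOnlyUser` helper, the user name `report_reader` and the password placeholder are assumptions for this example.

```javascript
// Builds the document passed to db.createUser() for a user that is only
// allowed to read a single database.
function readOnlyUser(userName, password, dbName) {
  return {
    user: userName,
    pwd: password,
    // The built-in `read` role grants read access to the named database only.
    roles: [{ role: 'read', db: dbName }]
  };
}

// In the mongo shell, connected as a user administrator:
//   use database_name
//   db.createUser(readOnlyUser('report_reader', '<password>', 'database_name'));
```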

There are more good practices than those mentioned here, so it is always worth doing a bit of research and keeping them in mind when planning a deployment to production. Maintaining checklists of these items helps keep our applications healthy throughout their life cycle.