Introduction

No-SQL Databases like MongoDB use asynchronous, statement-based replication because it’s platform independent and allows more flexibility within a replica set.

Replication is the concept of maintaining multiple copies of your data.

Why Replication?

Replication is extremely important because you can never assume that all your servers will always be available. It has become the core of distributed systems over the course of time.

Replication tries to mitigate the issue of server downtime, and all-time access to data even in case of disaster or maintenance work.

The point of replication is to make sure that in the event your server goes down, you can still access your data.

This concept is called availability.

A database that does not use replication only has a single database server are referred as standalone nodes. In this setup, databases can perform reads and writes only while that single node is up and running.

But if the node goes down, we lose all access to that data.

How does Replication work?

In a replicated system, there are a couple extra nodes (read: database server) on hand, and they hold copies of our data.

A group of nodes that each have copies of the same data is called a replica set.

In a replica set, entire data is handled by default through one of the nodes(master), and it’s up to the remaining nodes(slaves) in the set to sync up with it and replicate any new data that’s been written through an asynchronous mechanism. This mechanism is also known as Master-Slave system, which serves the purpose that all nodes stay consistent to each other.

In times of disaster when the primary node goes down, one of the secondary nodes can take its place as primary in a process known as failover. The process of selecting the new master or primary node is literally called as election, where, nodes actually vote for one another.

To maintain a durable system, the process of failover happens in a very quick time and the end application will not even sense as if something has gone wrong at all. After, the original primary node comes back to life, it catches up on the missed data via process of syncing and re-joins the replica set.

Availability and redundancy of data are typical properties of a durable database solution.

Different forms of Data replication

Data replication can take one of two forms, i.e. binary replication and statement-based replication.

Binary Replication

Suppose application is inserting a document into database system, and after process completes it has a few bytes on disk that were written to contain some new data.

The way binary replication works is by examining the exact bytes that changed in the data files and recording those changes in a binary log. The secondary nodes then receive a copy of the binary log and write the specified data that changed to the exact byte locations that are specified on the binary log.

Pros:

· Replicating data is smooth on the secondary nodes since they get really specific instructions on what bytes to change and what to change them to.

· Secondary nodes aren’t even aware of the statements that they’re replicating.

Cons:

· Assumption is made that OS will be consistent across replica set. However, if one set is on Windows and other is on Linux, the same binary logs cannot function.

· In case of same OS, all the machines should have same instruction set.

· In case, data set is not updated on of the servers, it will result in corrupted data.

Statement Based Replication

After a write operation is completed on the primary node, the write statement itself is stored in a specific Log and the secondary nodes sync their logs with the primary node’s log and replay any new statements on their own data.

Pros:

· This approach works regardless of the operating system or instruction set of the nodes in the replica set.

· Data consistency.

· No OS level or machine level dependency, hence valuable for any cross-platform solution that requires multiple OS in same replica set.

Cons:

· Process of replication is slow in comparison with Binary replication.

· Statement based replication uses actual database commands to write in logs, hence, operation is bit heavier than Binary replication and workload is more.

Full Stack developer, avid photographer and cyclist, Fitness trainer. contact: reeshabh.choudhary@gmail.com