Database Replication

Reeshabh Choudhary
7 min read · Feb 6, 2024

👷‍♂️ Software Architecture Series — Part 22.

Replication is the concept of maintaining multiple copies of your data.

Replication is a vital aspect of distributed systems and has become one of their core building blocks over time. Its significance lies in ensuring continuous access to data even when some nodes or servers are unavailable due to downtime, disasters, or maintenance work. By replicating data across multiple servers, the risk of losing access to critical information during a server failure is greatly reduced: the redundancy ensures that data remains available even if one server becomes inaccessible.

The point of replication is to make sure that, in the event a node goes down, you can still access your data through a standby database.

How does Replication work?

In a replicated system, a couple of extra nodes (read: database servers) are kept on hand, and they hold copies of our data. A group of nodes that each hold a copy of the same data is called a replica set.

In a replica set, all writes are by default handled through one of the nodes (the master), and it is up to the remaining nodes (the slaves) to sync up with it and replicate any newly written data through an asynchronous mechanism. This arrangement is known as a master-slave system, and its purpose is to keep all nodes consistent with one another.
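To make the idea concrete, here is a minimal Python sketch of the master-slave pattern: a single master accepts writes and appends them to a log, while a slave asynchronously pulls and applies new entries. All class and method names are hypothetical, for illustration only.

```python
import threading

class MasterNode:
    """Accepts all writes and appends them to a replication log."""
    def __init__(self):
        self.data = {}
        self.log = []          # ordered list of (key, value) write operations
        self.lock = threading.Lock()

    def write(self, key, value):
        with self.lock:
            self.data[key] = value
            self.log.append((key, value))

class SlaveNode:
    """Read-only replica that periodically syncs from the master's log."""
    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0       # index into the master's log we have applied

    def sync(self):
        # Pull and apply any log entries we have not seen yet. This is
        # asynchronous: the master never waits for this to happen.
        entries = self.master.log[self.applied:]
        for key, value in entries:
            self.data[key] = value
        self.applied += len(entries)

master = MasterNode()
slave = SlaveNode(master)
master.write("user:1", "Alice")   # write goes to the master only
slave.sync()                      # slave catches up some time later
print(slave.data)                 # {'user:1': 'Alice'}
```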

In times of disaster, when the primary node goes down, one of the secondary nodes can take its place as primary in a process known as failover. The process of selecting the new master (primary) node is called an election, in which the nodes vote for one another.

To maintain a durable system, failover happens very quickly, so the end application may not even sense that something has gone wrong. Once the original primary node comes back to life, it catches up on the missed data via a syncing process and rejoins the replica set.
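A toy version of an election might look like the sketch below: each surviving node votes for the most up-to-date replica. Real elections (for example, Raft-style ones) also deal with terms, ties, and network partitions; this only illustrates the basic idea, and all field names are invented.

```python
def elect_new_primary(replicas):
    """Pick a new primary after the old one fails.

    Simplified election: the surviving replica that has applied the most
    log entries (i.e., is most up-to-date) wins; ties are broken by
    preferring the lower node id, for determinism.
    """
    candidates = [r for r in replicas if r["alive"]]
    if not candidates:
        raise RuntimeError("no surviving replica to promote")
    return max(candidates, key=lambda r: (r["applied_log_index"], -r["id"]))

replicas = [
    {"id": 1, "alive": False, "applied_log_index": 120},  # failed primary
    {"id": 2, "alive": True,  "applied_log_index": 118},
    {"id": 3, "alive": True,  "applied_log_index": 119},
]
new_primary = elect_new_primary(replicas)
print(f"promoting node {new_primary['id']} to primary")   # node 3
```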

Availability and redundancy of data are typical properties of a durable database solution.

Primarily, there are two classifications of data models: the relational model, also known as SQL-based, and the non-relational model, also known as NoSQL. In the context of data replication, let us briefly discuss how various replication strategies apply to these different data models. Replication plays a vital role in ensuring data consistency and availability across distributed systems, and each data model presents unique characteristics and requirements, necessitating different replication approaches.

Replication Strategies for Relational Databases

In the relational model, characterized by SQL-based databases, there are three main strategies: transactional, snapshot, and merge replication, each serving distinct use cases and scenarios.

Transactional Replication

Transactional replication is characterized by real-time replication of transactions or changes from the original database to the replicated database in a sequential manner. This means that users experience these changes on the replicated database almost instantly, ensuring transactional consistency. This type of replication is commonly used in server-to-server environments where data needs to be synchronized in real-time across multiple servers. The advantage of transactional replication lies in its ability to provide up-to-date and consistent data across all replicas, ensuring that the distributed system operates seamlessly.

When a transaction commits in an ACID-compliant database, the system immediately writes a transaction record to the transaction log. The transaction log serves as a chronological record of all committed transactions, including their before and after states. This log acts as a safeguard against potential failures, ensuring that changes made during transactions are preserved.

Database replication utilizes this transaction log to create and maintain replicas of the original database. A replication process continuously monitors the transaction log for any committed changes. As new transactions commit, the replication process captures those changes from the transaction log and applies them to a backup database (slave).

In this way, the backup database is kept up to date with the changes made in the primary database (master). This allows the backup database to mirror the original database’s state, creating a replica that is synchronized with the primary database’s data.
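At its core, this is a log-tailing loop: the replication process remembers how far into the transaction log it has read and applies everything committed after that point. A simplified sketch, assuming the log is an ordered list of committed changes (real systems ship a binary or logical log, but the tailing pattern is the same):

```python
def replicate_from_log(transaction_log, replica_state, last_applied):
    """Apply every committed log entry after `last_applied` to the replica."""
    for position in range(last_applied, len(transaction_log)):
        op, key, value = transaction_log[position]
        if op in ("INSERT", "UPDATE"):
            replica_state[key] = value
        elif op == "DELETE":
            replica_state.pop(key, None)
    return len(transaction_log)   # new position for the next round

log = [("INSERT", "order:7", {"total": 40}),
       ("UPDATE", "order:7", {"total": 55})]
replica = {}
pos = replicate_from_log(log, replica, 0)
print(replica, pos)   # {'order:7': {'total': 55}} 2
```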

In the event of a failure or downtime of the primary database, the replication process enables a seamless switch to the replica. As the replica has been continuously updated with the changes from the transaction log, it reflects the latest committed data, ensuring data availability and minimizing downtime.

This replication approach is particularly valuable for disaster recovery and high availability scenarios. By maintaining real-time or near-real-time replicas, organizations can ensure business continuity even in the face of hardware failures, software glitches, or other unexpected disruptions.

Asynchronous Replication Strategy: In many cases, replication is asynchronous, meaning there may be some delay between the time a transaction is committed on the master database and when it is applied to the slave databases. This approach helps reduce potential performance impact on the primary database. However, it also means that there might be a brief period when the slave databases are not fully up-to-date with the primary database.

Deferred Commit Strategy: In some databases, a commit can be deferred until the transaction has been successfully replicated to the slave database(s). This is known as synchronous or semi-synchronous replication. By deferring the commit until replication is complete, it ensures that the slave databases are fully synchronized with the primary database before acknowledging the transaction as committed. This provides stricter data consistency guarantees at the cost of potential performance overhead.
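The practical difference between the two strategies is when the primary acknowledges the commit. The sketch below contrasts them in one hypothetical commit function; the Replica class and its apply() method are illustrative stand-ins, not any particular database’s API.

```python
class Replica:
    """Stand-in for a slave database; apply() blocks until the change lands."""
    def __init__(self):
        self.transactions = []

    def apply(self, transaction):
        self.transactions.append(transaction)
        return True   # acknowledgement back to the primary

def commit(transaction, log, replicas, mode="async", quorum=1):
    """Commit under asynchronous vs. semi-synchronous replication.

    mode="async":     acknowledge immediately; replicas catch up later.
    mode="semi-sync": acknowledge only after `quorum` replicas confirm.
    """
    log.append(transaction)          # durable on the primary first
    if mode == "async":
        return "committed"           # replicas will apply it eventually
    acks = sum(1 for r in replicas if r.apply(transaction))
    if acks >= quorum:
        return "committed"           # enough replicas are in sync
    raise RuntimeError("commit deferred: not enough replica acknowledgements")

log, replicas = [], [Replica(), Replica()]
print(commit({"id": 1}, log, replicas, mode="semi-sync", quorum=1))  # committed
```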

Practically, the combination of asynchronous writes to data files and immediate writes to the transaction log, along with various replication strategies, allows for efficient and reliable data management in distributed systems, ensuring data integrity and availability even in the face of failures or replication delays.

Snapshot Replication

Snapshot replication captures and overwrites a snapshot of data from the original database onto the receiving database. Unlike transactional replication, snapshot replication does not continuously monitor for data updates. Instead, it distributes data exactly as it appears at a specific moment in time. Snapshot replication is typically used when data changes happen infrequently, as it is less efficient in updating changes in real-time. While this type of replication may be slower than transactional replication, it is well-suited for scenarios where occasional data updates are acceptable, and real-time synchronization is not critical.
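A snapshot replicator can be pictured as a periodic copy-and-overwrite job, with no change tracking between runs. A minimal sketch:

```python
import copy
import time

def snapshot_replicate(source, target):
    """Overwrite the target with a point-in-time copy of the source.

    Unlike log tailing, nothing is tracked between runs: each run simply
    re-distributes the data exactly as it looks at this moment.
    """
    snapshot = copy.deepcopy(source)   # freeze the current state
    target.clear()
    target.update(snapshot)            # overwrite, not merge
    return time.time()                 # timestamp of the snapshot

catalog = {"sku-1": 9.99, "sku-2": 4.50}
reporting_copy = {}
taken_at = snapshot_replicate(catalog, reporting_copy)
catalog["sku-1"] = 12.99               # later change, invisible to the copy
print(reporting_copy)                  # still the state at snapshot time
```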

Merge Replication

Merge replication is the most complex type of SQL database replication. It involves the merging of data from multiple databases into a single receiving database. Unlike transactional and snapshot replication, merge replication allows for independent changes to be made by both the publisher (original database) and the subscribers (replicated databases). This model enables one publisher to send changes to multiple subscribers, making it suitable for server-to-client environments where bidirectional data synchronization is necessary. Merge replication provides the flexibility to accommodate multiple sources of data and ensures that all changes are merged seamlessly into the receiving database, maintaining data integrity across the distributed system.
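Conceptually, merge replication gathers timestamped changes from the publisher and all subscribers and resolves conflicts when the same row was changed in more than one place. The sketch below uses last-write-wins as the conflict resolver; real merge replication also supports custom resolvers and priority rules, and the change format here is invented for illustration.

```python
def merge_replicate(publisher, subscribers):
    """Merge independent changes from a publisher and its subscribers.

    Each change is a (key, value, timestamp) tuple. Conflicts (the same
    key changed on both sides) are resolved by last-write-wins.
    """
    all_changes = list(publisher) + [c for sub in subscribers for c in sub]
    merged = {}
    for key, value, ts in sorted(all_changes, key=lambda c: c[2]):
        merged[key] = value            # later timestamp overwrites earlier
    return merged

publisher_changes  = [("stock:42", 10, 100), ("price:42", 9.99, 105)]
subscriber_changes = [[("stock:42", 8, 110)],      # offline edit, later timestamp
                      [("name:42", "Widget", 90)]]
print(merge_replicate(publisher_changes, subscriber_changes))
# {'name:42': 'Widget', 'stock:42': 8, 'price:42': 9.99}
```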

Replication Strategies for NoSQL Databases


In multi-master databases like DynamoDB, multiple nodes act as masters, allowing read and write operations on any node. This model offers high availability and distributed data updates. However, it still has a point of failure: when a master node goes down, electing a new master causes a brief downtime, potentially leading to SLA violations.

Master-slave architectures, exemplified by MongoDB, consist of a master node that handles all write operations and replicates data to multiple read-only slave nodes. While this model provides read scaling, it still faces the risk of a single point of failure: the master node going down disrupts write operations and necessitates the promotion of a new master.

To address this limitation, master-less architectures, like ScyllaDB, have emerged. In these databases, data is replicated across multiple nodes, and all nodes are considered equal. This approach ensures that no single node can bring down the entire cluster, enhancing resilience and fault tolerance. In a typical master-less setup, each dataset is replicated across three or more replicas, distributing data efficiently and eliminating single points of failure.

By adopting a NoSQL database that employs a master-less architecture, applications gain an additional layer of resilience, particularly for high-volume and low-latency use cases. With multiple equal replicas, master-less databases offer uninterrupted operations even in the face of node failures, ensuring consistent data availability and minimizing downtime for critical applications.
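Master-less databases typically coordinate through quorums instead of a single master: a write succeeds once W replicas acknowledge it, and a read consults R replicas so that, with W + R > N, every read overlaps the latest write. A simplified sketch (real systems such as ScyllaDB add hinted handoff, repair, and tunable consistency levels):

```python
def quorum_write(replicas, key, value, ts, w):
    """Write to all live replicas; succeed once `w` of them acknowledge."""
    acks = 0
    for replica in replicas:
        if replica["alive"]:
            replica["data"][key] = (value, ts)   # store value with timestamp
            acks += 1
    return acks >= w

def quorum_read(replicas, key, r):
    """Read from `r` live replicas and return the newest version seen."""
    versions = [rep["data"].get(key) for rep in replicas if rep["alive"]][:r]
    versions = [v for v in versions if v is not None]
    return max(versions, key=lambda v: v[1])[0] if versions else None

# 3 replicas, one of them down; W=2 and R=2, so W + R > N and reads
# always intersect the most recent successful write.
replicas = [{"alive": True, "data": {}}, {"alive": True, "data": {}},
            {"alive": False, "data": {}}]
assert quorum_write(replicas, "k", "v1", ts=1, w=2)
print(quorum_read(replicas, "k", r=2))   # 'v1', despite the dead node
```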

Different forms of Data Replication

Data replication can take one of two forms: binary replication and statement-based replication.

Binary Replication

Suppose an application inserts a document into the database system; after the process completes, a few bytes on disk have been written to contain the new data.

Binary replication works by examining the exact bytes that changed in the data files and recording those changes in a binary log. The secondary nodes then receive a copy of the binary log and write the changed data to the exact byte locations specified in the log.
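A stripped-down model of this is sketched below: the binary log records offsets and raw bytes, and a secondary replays them against a byte-for-byte identical data file. It also makes clear why the approach breaks down when the files are not laid out identically. The log format here is invented for illustration.

```python
def record_binary_change(offset, new_bytes):
    """An entry in the binary log: which bytes changed, and at what offset."""
    return {"offset": offset, "bytes": new_bytes}

def apply_binary_log(data_file, binary_log):
    """Write each logged change to the exact byte locations it specifies."""
    for entry in binary_log:
        start = entry["offset"]
        data_file[start:start + len(entry["bytes"])] = entry["bytes"]

# The primary's data file after an insert, and the byte-diff it produced.
primary_file = bytearray(b"\x00" * 16)
change = record_binary_change(offset=4, new_bytes=b"\x01\x02\x03")
primary_file[4:7] = change["bytes"]

# A secondary with an identical starting file replays the same byte changes.
secondary_file = bytearray(b"\x00" * 16)
apply_binary_log(secondary_file, [change])
assert secondary_file == primary_file    # byte-for-byte identical
```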

Pros:

· Replication is efficient on the secondary nodes, since they receive very specific instructions about which bytes to change and what to change them to.

· Secondary nodes do not even need to be aware of the statements they are replicating.

Cons:

· It assumes the operating system is consistent across the replica set. If one node runs Windows and another runs Linux, the same binary logs cannot be applied.

· Even with the same OS, all machines must have the same instruction set.

· If the data set is not identical on one of the servers, applying byte-level changes results in corrupted data.

Statement-Based Replication

After a write operation completes on the primary node, the write statement itself is stored in a dedicated log; the secondary nodes sync their logs with the primary node’s log and replay any new statements on their own data.
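Using SQLite purely as a stand-in for a primary and a secondary node, the sketch below records each write statement in a log and replays the log on another node. The logging scheme is invented for illustration; real statement-based replication ships the log over the network.

```python
import sqlite3

def replay_statements(statement_log, replica_conn):
    """Re-execute the primary's logged SQL statements on a secondary."""
    for stmt, params in statement_log:
        replica_conn.execute(stmt, params)
    replica_conn.commit()

# The primary records each write statement (not bytes) in its log.
statement_log = []
primary = sqlite3.connect(":memory:")
primary.execute("CREATE TABLE users (id INTEGER, name TEXT)")
statement_log.append(("CREATE TABLE users (id INTEGER, name TEXT)", ()))
primary.execute("INSERT INTO users VALUES (?, ?)", (1, "Alice"))
statement_log.append(("INSERT INTO users VALUES (?, ?)", (1, "Alice")))
primary.commit()

# A secondary on any OS or architecture can replay the same statements.
secondary = sqlite3.connect(":memory:")
replay_statements(statement_log, secondary)
print(secondary.execute("SELECT * FROM users").fetchall())   # [(1, 'Alice')]
```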

Pros:

· This approach works regardless of the operating system or instruction set of the nodes in the replica set.

· Data consistency: every node executes the same logical statements, so replicas stay logically consistent.

· No OS-level or machine-level dependency, hence valuable for any cross-platform solution that requires multiple operating systems in the same replica set.

Cons:

· Replication is slower in comparison with binary replication.

· Statement-based replication writes actual database commands to its logs; hence, each operation is a bit heavier than in binary replication, and the workload on the secondaries is higher, since every statement must be re-executed rather than applied as a byte change.


Reeshabh Choudhary

Software Architect and Developer | Author: Objects, Data & AI.