Database replication copies data across multiple servers for scalability, availability, and disaster recovery. Understanding replication strategies helps you make informed decisions about consistency, performance, and failure handling.
Why Replicate?
Scaling Reads
Without replication, all database reads hit a single server, creating a bottleneck. With replication, you distribute read traffic across multiple replicas, dramatically increasing your read capacity.
Without replication:
All Reads → Single Database (bottleneck)
With replication:
Reads → Primary
→ Replica 1
→ Replica 2
→ Replica 3
High Availability
Replication enables failover. When your primary fails, you can promote a replica to take its place with minimal downtime (automating that promotion is covered under Failover Strategies below).
Primary fails → Promote replica → Minimal downtime
Geographic Distribution
Place replicas close to your users to reduce latency. Users in each region read from their local replica, improving response times significantly.
US Users → US Replica (low latency)
EU Users → EU Replica (low latency)
Single-Leader Replication
How It Works
In single-leader replication, one node (the primary) accepts all writes and propagates changes to replicas through a replication log. Any node can serve reads.
Writes → Primary → Replication Log → Replica 1
→ Replica 2
→ Replica 3
Reads → Any node
Synchronous vs Asynchronous
Synchronous:
With synchronous replication, the primary waits for its replica(s) to acknowledge the write before returning success to the client. This provides strong consistency but adds latency.
Client → Primary → Wait for Replica ACK → Return Success
- Strong consistency
- Higher latency
- Replica failure blocks writes
Asynchronous:
Asynchronous replication returns success immediately after the primary commits, replicating in the background. This is faster but risks data loss if the primary fails before replication completes.
Client → Primary → Return Success
↓
Replicate (background)
- Eventually consistent
- Lower latency
- Data loss risk on primary failure
Semi-Synchronous
Semi-synchronous replication offers a middle ground. The primary waits for acknowledgment from at least one replica while other replicas receive updates asynchronously.
Client → Primary → Wait for ONE replica ACK → Return Success
↓
Other replicas async
This strikes a practical balance between consistency and performance.
PostgreSQL Configuration
Here is how you configure single-leader replication in PostgreSQL. The primary needs WAL (Write-Ahead Log) settings enabled, while the replica connects to the primary and runs in standby mode.
-- Primary: postgresql.conf
wal_level = replica
max_wal_senders = 10
synchronous_commit = on
synchronous_standby_names = 'replica1'
-- Replica: postgresql.conf (PostgreSQL 12+)
primary_conninfo = 'host=primary port=5432 application_name=replica1'
Since PostgreSQL 12, recovery.conf and standby_mode are gone: you create an empty standby.signal file in the replica's data directory to start it in standby mode. The application_name in primary_conninfo must match synchronous_standby_names on the primary. To fail over, promote the replica with pg_ctl promote or the pg_promote() SQL function rather than a trigger file.
MySQL Configuration
MySQL uses binary logging for replication. You configure a unique server ID on each node, enable binary logging on the primary, and point the replica to the primary's log position.
-- Primary: my.cnf
[mysqld]
server-id = 1
log_bin = mysql-bin
binlog_format = ROW
-- Replica (MySQL 8.0.23+ renames these to CHANGE REPLICATION SOURCE TO / START REPLICA)
CHANGE MASTER TO
  MASTER_HOST='primary',
  MASTER_USER='replication',
  MASTER_PASSWORD='password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4; -- take the file and position from SHOW MASTER STATUS on the primary
START SLAVE;
Using ROW-based binary logging (binlog_format = ROW) is recommended for most scenarios as it captures the actual data changes rather than SQL statements.
Replication Lag
The Problem
With asynchronous replication, there is always a delay between writes to the primary and their appearance on replicas. This can lead to confusing user experiences.
Time 0: Write to Primary (balance = $100)
Time 1: Read from Replica (balance = $0, stale!)
Time 2: Replica catches up (balance = $100)
Handling Lag
Read-your-writes consistency:
After a user modifies data, force subsequent reads to the primary so they immediately see their changes.
// After a write, read back from the write (primary) connection
public function updateProfile(Request $request)
{
    $user = $request->user();
    $user->update($request->all());

    // Force this read to the primary so the user sees their own change
    return User::on('mysql::write')->find($user->id);
}
Monotonic reads:
Track which replica position a user has seen and ensure subsequent reads come from replicas at least as current. This prevents users from seeing data "go backward."
// Track the replication position each user session has seen.
// findReplicaAtPosition() and replicaPosition() are illustrative helpers:
// the first returns the name of a replica connection that has replayed at
// least the given position, the second reads that replica's current position.
$lastPosition = session('replica_position');
$connection = $this->findReplicaAtPosition($lastPosition);
$user = User::on($connection)->find($id);
session(['replica_position' => $this->replicaPosition($connection)]);
Causal consistency:
Include version information in requests so the server can wait for replicas to catch up before responding. This ensures causally related reads see their dependencies.
// Include a version/timestamp in requests.
// waitForVersion() is an illustrative helper: it blocks until some
// replica has replayed at least $minVersion, then returns its connection name.
public function getOrder(Request $request, $id)
{
    $minVersion = $request->header('X-Min-Version');
    $replica = $this->waitForVersion($minVersion);

    return Order::on($replica)->find($id);
}
Multi-Leader Replication
Use Cases
Multi-leader replication is essential when you need writes to succeed in multiple geographic locations without cross-region latency:
- Multi-datacenter operation
- Offline-capable applications
- Collaborative editing
Architecture
In multi-leader setups, each datacenter has its own primary that accepts writes locally. Primaries replicate to each other asynchronously.
Datacenter A Datacenter B
↓ ↓
Primary A ←──────────→ Primary B
↓ ↓
Replicas Replicas
Conflict Resolution
When both leaders modify the same data, you need a strategy to resolve the conflict.
Last-write-wins (LWW):
The simplest approach uses timestamps to determine which write survives. This is easy to implement but can silently discard data.
Leader A: UPDATE users SET name='Alice' WHERE id=1 (timestamp: 100)
Leader B: UPDATE users SET name='Bob' WHERE id=1 (timestamp: 101)
Result: name = 'Bob' (higher timestamp wins)
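A minimal LWW resolver is just a timestamp comparison. Here is a sketch, assuming each version exposes an updated_at timestamp (a hypothetical field):
// Keep whichever version carries the higher timestamp; ties go to local.
function resolveLastWriteWins(object $local, object $remote): object
{
    return $local->updated_at >= $remote->updated_at ? $local : $remote;
}
Note that the discarded write is silently lost, which is exactly the LWW tradeoff described above.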
Merge values:
For certain data types, you can merge conflicting values rather than choosing one. This works well for sets and counters.
Leader A: tags = ['php', 'laravel']
Leader B: tags = ['php', 'vue']
Result: tags = ['php', 'laravel', 'vue'] (union)
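In PHP, a set-style merge is a simple union. A sketch assuming plain arrays of tags:
// Union of both sides; array_unique drops duplicates like 'php'
function mergeTags(array $local, array $remote): array
{
    return array_values(array_unique(array_merge($local, $remote)));
}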
Custom resolution:
Complex domains often require application-specific conflict resolution logic that understands your business rules.
class ConflictResolver
{
    /**
     * Three-way merge: $base is the common ancestor version,
     * $local and $remote are the conflicting versions from each leader.
     */
    public function resolve($local, $remote, $base)
    {
        // Application-specific rule: edits made by an admin always win
        if ($local->updated_by_admin) {
            return $local;
        }

        return $remote;
    }
}
The three-way merge with base, local, and remote versions gives you the most flexibility to implement sophisticated resolution strategies.
Leaderless Replication
How It Works
Leaderless systems have no single leader. Clients write to multiple nodes directly and read from multiple nodes, using quorums to ensure consistency.
Write → Node 1 ✓
→ Node 2 ✓
→ Node 3 ✗
(2 of 3 = success with W=2)
Read → Node 1: value A
→ Node 2: value A
→ Node 3: value B (stale)
(2 of 3 agree = return A with R=2)
Quorum Consistency
The quorum formula determines whether reads and writes overlap, guaranteeing that reads see the latest writes.
N = total nodes
W = write quorum
R = read quorum
If W + R > N, reads see latest writes
Example: N=3, W=2, R=2
- Write succeeds on 2+ nodes
- Read queries 2+ nodes
- At least 1 node has latest value
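A minimal sketch of a quorum write, assuming a hypothetical per-node client exposing a write() method:
// Attempt the write on every node; succeed once at least $w acknowledge.
function quorumWrite(array $nodes, string $key, string $value, int $w): bool
{
    $acks = 0;
    foreach ($nodes as $node) {
        try {
            $node->write($key, $value); // hypothetical node client
            $acks++;
        } catch (\Throwable $e) {
            // Node unreachable; the quorum may still be met by the others
        }
    }
    return $acks >= $w;
}
With N=3, calling quorumWrite($nodes, $key, $value, 2) tolerates one node failure while still guaranteeing overlap with R=2 reads.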
Read Repair
When a client detects inconsistency during reads, it can repair stale nodes by writing the current value back to them.
Client reads from 3 nodes:
- Node 1: version 5
- Node 2: version 5
- Node 3: version 4 (stale)
Client sends version 5 to Node 3 (repair)
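A sketch of read repair using the same hypothetical node client, where read() returns a version/value pair:
// Read from all nodes, return the newest value, and push it back
// to any node that answered with an older version.
function readWithRepair(array $nodes, string $key)
{
    $responses = [];
    foreach ($nodes as $node) {
        $responses[] = ['node' => $node, 'reply' => $node->read($key)];
    }

    // Sort newest version first
    usort($responses, fn ($a, $b) => $b['reply']['version'] <=> $a['reply']['version']);
    $latest = $responses[0]['reply'];

    foreach ($responses as $r) {
        if ($r['reply']['version'] < $latest['version']) {
            $r['node']->write($key, $latest['value'], $latest['version']); // repair
        }
    }

    return $latest['value'];
}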
Anti-Entropy
A background process continuously compares data across nodes using efficient data structures like Merkle trees to detect and repair inconsistencies.
Merkle Tree comparison:
Node A: hash(data) = abc123
Node B: hash(data) = abc123 ✓
Node C: hash(data) = def456 ✗ → sync needed
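The simplest form of this comparison hashes key ranges ("buckets") and syncs only the ranges that differ; a real Merkle tree layers these hashes so entire matching subtrees are skipped without touching every bucket. A sketch with hypothetical hashForBucket()/syncBucket() methods:
// Compare per-bucket hashes between two nodes and reconcile
// only the buckets whose hashes differ.
function antiEntropy($nodeA, $nodeB, int $buckets = 1024): void
{
    for ($i = 0; $i < $buckets; $i++) {
        if ($nodeA->hashForBucket($i) !== $nodeB->hashForBucket($i)) {
            $nodeB->syncBucket($i, $nodeA);
        }
    }
}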
Failover Strategies
Automatic Failover
Automatic failover detects primary failure and promotes a replica without manual intervention. This requires consensus among nodes to prevent split-brain scenarios.
1. Detect primary failure (heartbeat timeout)
2. Elect new primary (consensus)
3. Reconfigure replicas
4. Update client connections
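As a sketch of the detection step, a watchdog might require several consecutive missed heartbeats before triggering an election; pingPrimary() and startElection() are hypothetical callables, and production systems usually delegate this to tooling such as Patroni or orchestrator:
// Declare the primary dead only after repeated missed heartbeats,
// so a single dropped packet does not trigger a failover.
function watchPrimary(callable $pingPrimary, callable $startElection): void
{
    $missed = 0;
    while (true) {
        $missed = $pingPrimary() ? 0 : $missed + 1;
        if ($missed >= 3) {
            $startElection(); // consensus elects the new primary
            return;
        }
        sleep(1);
    }
}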
Manual Failover
In some cases, you may prefer manual control over failover. These commands promote a replica to become the new primary.
# PostgreSQL: Promote replica
pg_ctl promote -D /var/lib/postgresql/data
# MySQL: Stop replication and clear the replica configuration
STOP SLAVE;
RESET SLAVE ALL;
Split-Brain Prevention
Split-brain occurs when multiple nodes believe they are the primary, potentially causing data divergence and corruption. Fencing mechanisms prevent the old primary from accepting writes.
Fencing: Ensure old primary can't accept writes
Methods:
- STONITH (Shoot The Other Node In The Head)
- Shared storage locks
- Network partition detection
The dramatic name "STONITH" reflects the seriousness of split-brain: sometimes the safest action is to forcibly shut down the questionable node.
Laravel Configuration
Read/Write Splitting
Laravel makes it easy to route reads to replicas and writes to the primary. The sticky option ensures a session reads from the primary after writing.
// config/database.php
'mysql' => [
'read' => [
['host' => 'replica1.example.com'],
['host' => 'replica2.example.com'],
],
'write' => [
'host' => 'primary.example.com',
],
'sticky' => true, // Use write connection after writing
],
The sticky option is crucial for read-your-writes consistency. Without it, a user might not see their own changes immediately after saving.
Explicit Connection Selection
Sometimes you need explicit control over which connection to use, such as when reading critical data that must be current.
// Force primary for critical reads
$user = User::on('mysql::write')->find($id);
// Explicit replica
$users = User::on('mysql::read')->get();
Monitoring
Key Metrics
Monitoring replication lag is essential for maintaining data consistency and detecting issues before they affect users.
-- PostgreSQL: Check replication lag
SELECT
client_addr,
state,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;
-- MySQL: Check replica status (SHOW REPLICA STATUS on 8.0.22+)
SHOW SLAVE STATUS\G
-- Look for: Seconds_Behind_Master
The lag_bytes metric in PostgreSQL shows how far behind replicas are in terms of WAL position. In MySQL, Seconds_Behind_Master provides a time-based lag estimate.
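A scheduled check could run the PostgreSQL query above from Laravel and log lagging replicas; the connection name and the 10 MB threshold here are assumptions:
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

// Warn when any replica has fallen more than $maxLagBytes behind in WAL replay
function checkReplicationLag(int $maxLagBytes = 10_000_000): void
{
    $rows = DB::connection('pgsql')->select(
        'SELECT client_addr,
                pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
         FROM pg_stat_replication'
    );

    foreach ($rows as $row) {
        if ($row->lag_bytes > $maxLagBytes) {
            Log::warning("Replica {$row->client_addr} lagging {$row->lag_bytes} bytes");
        }
    }
}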
Alerting Thresholds
Set up alerts at multiple severity levels to catch replication issues before they become critical.
alerts:
- name: ReplicationLag
condition: lag_seconds > 30
severity: warning
- name: ReplicationLagCritical
condition: lag_seconds > 300
severity: critical
- name: ReplicationBroken
condition: slave_io_running = 'No'
severity: critical
A warning at 30 seconds gives you time to investigate, while critical alerts at 5 minutes or replication failure demand immediate attention.
Choosing a Strategy
| Requirement | Strategy |
|---|---|
| Simple scaling | Single-leader |
| Multi-datacenter | Multi-leader |
| High write availability | Leaderless |
| Strong consistency | Synchronous single-leader |
| Low latency | Async replication |
| Conflict tolerance | Multi-leader with resolution |
Conclusion
Database replication involves tradeoffs between consistency, availability, and latency. Single-leader replication is simplest and suits most applications. Multi-leader enables multi-datacenter deployments but requires conflict handling. Leaderless systems offer high availability but eventual consistency. Choose based on your consistency requirements, failure tolerance, and geographic distribution needs.