Database Replication Strategies

Reverend Philip · Dec 22, 2025

Scale reads and improve availability with database replication. Compare single-leader, multi-leader, and leaderless approaches.

Database replication copies data across multiple servers for scalability, availability, and disaster recovery. Understanding replication strategies helps you make informed decisions about consistency, performance, and failure handling.

Why Replicate?

Scaling Reads

Without replication, all database reads hit a single server, creating a bottleneck. With replication, you distribute read traffic across multiple replicas, dramatically increasing your read capacity.

Without replication:
All Reads → Single Database (bottleneck)

With replication:
Reads → Primary
     → Replica 1
     → Replica 2
     → Replica 3

High Availability

Replication provides automatic failover capability. When your primary fails, you can promote a replica to take its place with minimal downtime.

Primary fails → Promote replica → Minimal downtime

Geographic Distribution

Place replicas close to your users to reduce latency. Users in each region read from their local replica, improving response times significantly.

US Users → US Replica (low latency)
EU Users → EU Replica (low latency)

Single-Leader Replication

How It Works

In single-leader replication, one node (the primary) accepts all writes and propagates changes to replicas through a replication log. Any node can serve reads.

Writes → Primary → Replication Log → Replica 1
                                  → Replica 2
                                  → Replica 3
Reads  → Any node

Synchronous vs Asynchronous

Synchronous:

With synchronous replication, the primary waits for at least one replica to acknowledge the write before returning success to the client. This provides strong consistency but adds latency.

Client → Primary → Wait for Replica ACK → Return Success
  • Strong consistency
  • Higher latency
  • Replica failure blocks writes

Asynchronous:

Asynchronous replication returns success immediately after the primary commits, replicating in the background. This is faster but risks data loss if the primary fails before replication completes.

Client → Primary → Return Success
              ↓
         Replicate (background)
  • Eventually consistent
  • Lower latency
  • Data loss risk on primary failure

Semi-Synchronous

Semi-synchronous replication offers a middle ground. The primary waits for acknowledgment from at least one replica while other replicas receive updates asynchronously.

Client → Primary → Wait for ONE replica ACK → Return Success
              ↓
         Other replicas async

Balance between consistency and performance.
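In PostgreSQL this maps to the ANY form of synchronous_standby_names (available since version 10). The snippet below is a sketch; the standby names are illustrative:

-- postgresql.conf on the primary (PostgreSQL 10+)
synchronous_commit = on
synchronous_standby_names = 'ANY 1 (replica1, replica2, replica3)'
-- Commits wait for any one of the listed standbys; the others stay asynchronous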

PostgreSQL Configuration

Here is how you configure single-leader replication in PostgreSQL. The primary needs WAL (Write-Ahead Log) settings enabled, while the replica connects to the primary and runs in standby mode.

-- Primary: postgresql.conf
wal_level = replica
max_wal_senders = 10
synchronous_commit = on
synchronous_standby_names = 'replica1'

-- Replica: recovery.conf
standby_mode = on
primary_conninfo = 'host=primary port=5432'
trigger_file = '/tmp/postgresql.trigger'

The trigger file provides a simple failover mechanism: create this file on the replica, and it will promote itself to primary. Note that recovery.conf applies to PostgreSQL versions before 12; from PostgreSQL 12 onward, a standby is marked by an empty standby.signal file, primary_conninfo moves into postgresql.conf, and promotion is done with pg_ctl promote or the pg_promote() function.
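Two built-in functions are useful during failover drills. This is a quick sketch rather than a full runbook, and pg_promote() requires PostgreSQL 12 or newer:

-- On any node: returns true on a standby, false on the primary
SELECT pg_is_in_recovery();

-- On a standby (PostgreSQL 12+): promote it to primary
SELECT pg_promote();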

MySQL Configuration

MySQL uses binary logging for replication. You configure a unique server ID on each node, enable binary logging on the primary, and point the replica to the primary's log position.

-- Primary: my.cnf
[mysqld]
server-id = 1
log_bin = mysql-bin
binlog_format = ROW

-- Replica
CHANGE MASTER TO
  MASTER_HOST='primary',
  MASTER_USER='replication',
  MASTER_PASSWORD='password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=0;
START SLAVE;

Using ROW-based binary logging (binlog_format = ROW) is recommended for most scenarios because it captures the actual data changes rather than the SQL statements that produced them. On MySQL 8.0.23 and later, the same setup uses the newer CHANGE REPLICATION SOURCE TO and START REPLICA statements; the older MASTER/SLAVE syntax shown above still works but is deprecated.
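The MASTER_LOG_FILE and MASTER_LOG_POS values above are placeholders; in practice you read the real coordinates from the primary before pointing the replica at it:

-- On the primary: note the File and Position columns and use them
-- as MASTER_LOG_FILE and MASTER_LOG_POS on the replica
SHOW MASTER STATUS;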

Replication Lag

The Problem

With asynchronous replication, there is always a delay between writes to the primary and their appearance on replicas. This can lead to confusing user experiences.

Time 0: Write to Primary (balance = $100)
Time 1: Read from Replica (balance = $0, stale!)
Time 2: Replica catches up (balance = $100)

Handling Lag

Read-your-writes consistency:

After a user modifies data, force subsequent reads to the primary so they immediately see their changes.

// After write, read from primary
public function updateProfile(Request $request)
{
    $user = $request->user();
    $user->update($request->all());

    // Force read from primary
    return User::on('primary')->find($user->id);
}

Monotonic reads:

Track which replica position a user has seen and ensure subsequent reads come from replicas at least as current. This prevents users from seeing data "go backward."

// Track the replication position this user has already seen
$lastPosition = session('replica_position');

// Pick a replica that has caught up to at least that position
// (findReplicaAtPosition() is an application-level helper)
[$connection, $position] = $this->findReplicaAtPosition($lastPosition);
$user = User::on($connection)->find($id);

session(['replica_position' => $position]);

Causal consistency:

Include version information in requests so the server can wait for replicas to catch up before responding. This ensures causally related reads see their dependencies.

// Include version/timestamp in requests
public function getOrder(Request $request, $id)
{
    $minVersion = $request->header('X-Min-Version');

    // waitForVersion() is an application helper that blocks until a replica
    // has replayed at least $minVersion, then returns its connection name
    $replica = $this->waitForVersion($minVersion);
    return Order::on($replica)->find($id);
}

Multi-Leader Replication

Use Cases

Multi-leader replication is essential when you need writes to succeed in multiple geographic locations without cross-region latency:

  • Multi-datacenter operation
  • Offline-capable applications
  • Collaborative editing

Architecture

In multi-leader setups, each datacenter has its own primary that accepts writes locally. Primaries replicate to each other asynchronously.

Datacenter A          Datacenter B
    ↓                     ↓
Primary A ←──────────→ Primary B
    ↓                     ↓
Replicas              Replicas

Conflict Resolution

When both leaders modify the same data, you need a strategy to resolve the conflict.

Last-write-wins (LWW):

The simplest approach uses timestamps to determine which write survives. This is easy to implement but can silently discard data.

Leader A: UPDATE users SET name='Alice' WHERE id=1  (timestamp: 100)
Leader B: UPDATE users SET name='Bob' WHERE id=1    (timestamp: 101)

Result: name = 'Bob' (higher timestamp wins)
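A minimal resolver for this strategy can be as small as a timestamp comparison. The PHP sketch below is illustrative; it assumes each conflicting row carries the timestamp assigned by its originating leader:

// Last-write-wins: keep whichever version has the newer timestamp.
// Note that this silently discards the losing write.
function resolveLastWriteWins(object $local, object $remote): object
{
    return $local->updated_at >= $remote->updated_at ? $local : $remote;
}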

Merge values:

For certain data types, you can merge conflicting values rather than choosing one. This works well for sets and counters.

Leader A: tags = ['php', 'laravel']
Leader B: tags = ['php', 'vue']

Result: tags = ['php', 'laravel', 'vue'] (union)
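For a set-valued field like tags, the merge is a simple union. A hedged PHP sketch:

// Merge-by-union for set-like values (order is not significant)
function mergeTags(array $local, array $remote): array
{
    return array_values(array_unique(array_merge($local, $remote)));
}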

Custom resolution:

Complex domains often require application-specific conflict resolution logic that understands your business rules.

class ConflictResolver
{
    public function resolve($local, $remote, $base)
    {
        // Application-specific logic
        if ($local->updated_by_admin) {
            return $local;
        }
        return $remote;
    }
}

The three-way merge with base, local, and remote versions gives you the most flexibility to implement sophisticated resolution strategies.

Leaderless Replication

How It Works

In leaderless replication there is no designated primary. Clients write to multiple nodes directly and read from multiple nodes, using quorums to maintain consistency.

Write → Node 1 ✓
     → Node 2 ✓
     → Node 3 ✗
     (2 of 3 = success with W=2)

Read  → Node 1: value A
     → Node 2: value A
     → Node 3: value B (stale)
     (2 of 3 agree = return A with R=2)

Quorum Consistency

The quorum formula determines whether reads and writes overlap, guaranteeing that reads see the latest writes.

N = total nodes
W = write quorum
R = read quorum

If W + R > N, reads see latest writes

Example: N=3, W=2, R=2
- Write succeeds on 2+ nodes
- Read queries 2+ nodes
- At least 1 node has latest value
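Stores like Cassandra and Riak implement this inside their drivers, but the write-side logic reduces to counting acknowledgments. A simplified sketch, where $nodes and writeToNode() are hypothetical:

// Send the write to all N nodes and succeed once at least W acknowledge
function quorumWrite(array $nodes, string $key, string $value, int $w): bool
{
    $acks = 0;
    foreach ($nodes as $node) {
        try {
            if (writeToNode($node, $key, $value)) {
                $acks++;
            }
        } catch (Throwable $e) {
            // A single failed node does not abort the write; we only need W acks
        }
    }
    return $acks >= $w;
}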

Read Repair

When a client detects inconsistency during reads, it can repair stale nodes by writing the current value back to them.

Client reads from 3 nodes:
- Node 1: version 5
- Node 2: version 5
- Node 3: version 4 (stale)

Client sends version 5 to Node 3 (repair)
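The same idea in code: read from several nodes, pick the highest version, and push it back to any stale node. This sketch assumes hypothetical readFromNode()/writeToNode() helpers and a per-key version number:

// Read from several nodes, return the newest value, and repair stale copies
function readWithRepair(array $nodes, string $key): ?array
{
    $responses = [];
    foreach ($nodes as $node) {
        $responses[$node] = readFromNode($node, $key); // ['version' => int, 'value' => mixed] or null
    }

    // Find the most recent version seen across the nodes
    $latest = null;
    foreach ($responses as $response) {
        if ($response !== null && ($latest === null || $response['version'] > $latest['version'])) {
            $latest = $response;
        }
    }

    // Write the latest value back to any node that returned an older (or no) copy
    foreach ($responses as $node => $response) {
        if ($latest !== null && ($response === null || $response['version'] < $latest['version'])) {
            writeToNode($node, $key, $latest);
        }
    }

    return $latest;
}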

Anti-Entropy

A background process continuously compares data across nodes using efficient data structures like Merkle trees to detect and repair inconsistencies.

Merkle Tree comparison:
Node A: hash(data) = abc123
Node B: hash(data) = abc123 ✓
Node C: hash(data) = def456 ✗ → sync needed
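A full Merkle tree compares hashes level by level so mismatches are narrowed down in logarithmic steps; the flat sketch below captures only the core idea of hashing key ranges and resyncing the ranges whose hashes differ:

// Compare per-range hashes from two nodes and return the ranges that need resync
function rangesToSync(array $hashesA, array $hashesB): array
{
    $stale = [];
    foreach ($hashesA as $range => $hash) {
        if (($hashesB[$range] ?? null) !== $hash) {
            $stale[] = $range;
        }
    }
    return $stale;
}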

Failover Strategies

Automatic Failover

Automatic failover detects primary failure and promotes a replica without manual intervention. This requires consensus among nodes to prevent split-brain scenarios.

1. Detect primary failure (heartbeat timeout)
2. Elect new primary (consensus)
3. Reconfigure replicas
4. Update client connections

Manual Failover

In some cases, you may prefer manual control over failover. These commands promote a replica to become the new primary.

# PostgreSQL: Promote replica
pg_ctl promote -D /var/lib/postgresql/data

# MySQL: Promote replica (stop replication and clear replica configuration)
STOP SLAVE;
RESET SLAVE ALL;

Split-Brain Prevention

Split-brain occurs when multiple nodes believe they are the primary, potentially causing data divergence and corruption. Fencing mechanisms prevent the old primary from accepting writes.

Fencing: Ensure old primary can't accept writes

Methods:
- STONITH (Shoot The Other Node In The Head)
- Shared storage locks
- Network partition detection

The dramatic name "STONITH" reflects the seriousness of split-brain: sometimes the safest action is to forcibly shut down the questionable node.

Laravel Configuration

Read/Write Splitting

Laravel makes it easy to route reads to replicas and writes to the primary. The sticky option ensures a session reads from the primary after writing.

// config/database.php
'mysql' => [
    'read' => [
        ['host' => 'replica1.example.com'],
        ['host' => 'replica2.example.com'],
    ],
    'write' => [
        'host' => 'primary.example.com',
    ],
    'sticky' => true,  // Use write connection after writing
],

The sticky option is crucial for read-your-writes consistency. Without it, a user might not see their own changes immediately after saving.

Explicit Connection Selection

Sometimes you need explicit control over which connection to use, such as when reading critical data that must be current.

// Force primary for critical reads
$user = User::on('mysql::write')->find($id);

// Explicit replica
$users = User::on('mysql::read')->get();

Monitoring

Key Metrics

Monitoring replication lag is essential for maintaining data consistency and detecting issues before they affect users.

-- PostgreSQL: Check replication lag
SELECT
    client_addr,
    state,
    pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;

-- MySQL: Check slave status
SHOW SLAVE STATUS\G
-- Look for: Seconds_Behind_Master

The lag_bytes metric in PostgreSQL shows how far behind replicas are in terms of WAL position. In MySQL, Seconds_Behind_Master provides a time-based lag estimate.
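If you prefer an application-side check, a Laravel sketch along these lines works; the pgsql connection name and the 64 MB threshold are assumptions for illustration:

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

// Query replication lag on the primary and warn when a replica falls far behind
$replicas = DB::connection('pgsql')->select(
    "SELECT client_addr,
            pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
     FROM pg_stat_replication"
);

foreach ($replicas as $replica) {
    if ($replica->lag_bytes > 64 * 1024 * 1024) {
        Log::warning("Replica {$replica->client_addr} is {$replica->lag_bytes} bytes behind");
    }
}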

Alerting Thresholds

Set up alerts at multiple severity levels to catch replication issues before they become critical.

alerts:
  - name: ReplicationLag
    condition: lag_seconds > 30
    severity: warning

  - name: ReplicationLagCritical
    condition: lag_seconds > 300
    severity: critical

  - name: ReplicationBroken
    condition: slave_io_running = 'No'
    severity: critical

A warning at 30 seconds gives you time to investigate, while critical alerts at 5 minutes or replication failure demand immediate attention.

Choosing a Strategy

Requirement                Strategy
Simple scaling             Single-leader
Multi-datacenter           Multi-leader
High write availability    Leaderless
Strong consistency         Synchronous single-leader
Low latency                Async replication
Conflict tolerance         Multi-leader with resolution

Conclusion

Database replication involves tradeoffs between consistency, availability, and latency. Single-leader replication is simplest and suits most applications. Multi-leader enables multi-datacenter deployments but requires conflict handling. Leaderless systems offer high availability but eventual consistency. Choose based on your consistency requirements, failure tolerance, and geographic distribution needs.
