Database replication copies data across multiple servers for scalability, availability, and disaster recovery. Understanding replication strategies helps you make informed decisions about consistency, performance, and failure handling.
Why Replicate?
Scaling Reads
Without replication, all database reads hit a single server, creating a bottleneck. With replication, you distribute read traffic across multiple replicas, dramatically increasing your read capacity.
Without replication:
All Reads → Single Database (bottleneck)
With replication:
Reads → Primary
→ Replica 1
→ Replica 2
→ Replica 3
High Availability
Replication enables failover. When your primary fails, you can promote a replica to take its place with minimal downtime (automating that promotion is covered under Failover Strategies below).
Primary fails → Promote replica → Minimal downtime
Geographic Distribution
Place replicas close to your users to reduce latency. Users in each region read from their local replica, improving response times significantly.
US Users → US Replica (low latency)
EU Users → EU Replica (low latency)
Single-Leader Replication
How It Works
In single-leader replication, one node (the primary) accepts all writes and propagates changes to replicas through a replication log. Any node can serve reads.
Writes → Primary → Replication Log → Replica 1
→ Replica 2
→ Replica 3
Reads → Any node
Synchronous vs Asynchronous
Synchronous:
With synchronous replication, the primary waits for its replica(s) to acknowledge the write before returning success to the client. This provides strong consistency but adds latency.
Client → Primary → Wait for Replica ACK → Return Success
- Strong consistency
- Higher latency
- Replica failure blocks writes
Asynchronous:
Asynchronous replication returns success immediately after the primary commits, replicating in the background. This is faster but risks data loss if the primary fails before replication completes.
Client → Primary → Return Success
↓
Replicate (background)
- Eventually consistent
- Lower latency
- Data loss risk on primary failure
Semi-Synchronous
Semi-synchronous replication offers a middle ground. The primary waits for acknowledgment from at least one replica while other replicas receive updates asynchronously.
Client → Primary → Wait for ONE replica ACK → Return Success
↓
Other replicas async
This strikes a practical balance between consistency and performance.
PostgreSQL Configuration
Here is how you configure single-leader replication in PostgreSQL. The primary needs WAL (Write-Ahead Log) settings enabled, while the replica connects to the primary and runs in standby mode.
-- Primary: postgresql.conf
wal_level = replica
max_wal_senders = 10
synchronous_commit = on
synchronous_standby_names = 'replica1'
-- Replica: postgresql.conf (PostgreSQL 12+)
primary_conninfo = 'host=primary port=5432 application_name=replica1'
Since PostgreSQL 12, recovery.conf and standby_mode are gone: you create an empty standby.signal file in the replica's data directory to start it in standby mode. The application_name in primary_conninfo must match synchronous_standby_names on the primary. To fail over, promote the replica with pg_ctl promote or the pg_promote() SQL function rather than a trigger file.
MySQL Configuration
MySQL uses binary logging for replication. You configure a unique server ID on each node, enable binary logging on the primary, and point the replica to the primary's log position.
-- Primary: my.cnf
[mysqld]
server-id = 1
log_bin = mysql-bin
binlog_format = ROW
-- Replica (MySQL 8.0.23+ renames these to CHANGE REPLICATION SOURCE TO / START REPLICA)
CHANGE MASTER TO
  MASTER_HOST='primary',
  MASTER_USER='replication',
  MASTER_PASSWORD='password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4; -- take the file and position from SHOW MASTER STATUS on the primary
START SLAVE;
Using ROW-based binary logging (binlog_format = ROW) is recommended for most scenarios as it captures the actual data changes rather than SQL statements.
Replication Lag
The Problem
With asynchronous replication, there is always a delay between writes to the primary and their appearance on replicas. This can lead to confusing user experiences.
Time 0: Write to Primary (balance = $100)
Time 1: Read from Replica (balance = $0, stale!)
Time 2: Replica catches up (balance = $100)
Handling Lag
Read-your-writes consistency:
After a user modifies data, force subsequent reads to the primary so they immediately see their changes.
// After a write, read back from the write (primary) connection
public function updateProfile(Request $request)
{
    $user = $request->user();
    $user->update($request->all());

    // Force this read to the primary so the user sees their own change
    return User::on('mysql::write')->find($user->id);
}
Monotonic reads:
Track which replica position a user has seen and ensure subsequent reads come from replicas at least as current. This prevents users from seeing data "go backward."
// Track the replication position each user session has seen.
// findReplicaAtPosition() and replicaPosition() are illustrative helpers:
// the first returns the name of a replica connection that has replayed at
// least the given position, the second reads that replica's current position.
$lastPosition = session('replica_position');
$connection = $this->findReplicaAtPosition($lastPosition);
$user = User::on($connection)->find($id);
session(['replica_position' => $this->replicaPosition($connection)]);
Causal consistency:
Include version information in requests so the server can wait for replicas to catch up before responding. This ensures causally related reads see their dependencies.
// Include a version/timestamp in requests.
// waitForVersion() is an illustrative helper: it blocks until some
// replica has replayed at least $minVersion, then returns its connection name.
public function getOrder(Request $request, $id)
{
    $minVersion = $request->header('X-Min-Version');
    $replica = $this->waitForVersion($minVersion);

    return Order::on($replica)->find($id);
}
Multi-Leader Replication
Use Cases
Multi-leader replication is essential when you need writes to succeed in multiple geographic locations without cross-region latency:
- Multi-datacenter operation
- Offline-capable applications
- Collaborative editing
Architecture
In multi-leader setups, each datacenter has its own primary that accepts writes locally. Primaries replicate to each other asynchronously.
Datacenter A Datacenter B
↓ ↓
Primary A ←──────────→ Primary B
↓ ↓
Replicas Replicas
Conflict Resolution
When both leaders modify the same data, you need a strategy to resolve the conflict.
Last-write-wins (LWW):
The simplest approach uses timestamps to determine which write survives. This is easy to implement but can silently discard data.
Leader A: UPDATE users SET name='Alice' WHERE id=1 (timestamp: 100)
Leader B: UPDATE users SET name='Bob' WHERE id=1 (timestamp: 101)
Result: name = 'Bob' (higher timestamp wins)
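A minimal LWW resolver is just a timestamp comparison. Here is a sketch, assuming each version exposes an updated_at timestamp (a hypothetical field):
// Keep whichever version carries the higher timestamp; ties go to local.
function resolveLastWriteWins(object $local, object $remote): object
{
    return $local->updated_at >= $remote->updated_at ? $local : $remote;
}
Note that the discarded write is silently lost, which is exactly the LWW tradeoff described above.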
Merge values:
For certain data types, you can merge conflicting values rather than choosing one. This works well for sets and counters.
Leader A: tags = ['php', 'laravel']
Leader B: tags = ['php', 'vue']
Result: tags = ['php', 'laravel', 'vue'] (union)
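In PHP, a set-style merge is a simple union. A sketch assuming plain arrays of tags:
// Union of both sides; array_unique drops duplicates like 'php'
function mergeTags(array $local, array $remote): array
{
    return array_values(array_unique(array_merge($local, $remote)));
}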
Custom resolution:
Complex domains often require application-specific conflict resolution logic that understands your business rules.
class ConflictResolver
{
    /**
     * Three-way merge: $base is the common ancestor version,
     * $local and $remote are the conflicting versions from each leader.
     */
    public function resolve($local, $remote, $base)
    {
        // Application-specific rule: edits made by an admin always win
        if ($local->updated_by_admin) {
            return $local;
        }

        return $remote;
    }
}
The three-way merge with base, local, and remote versions gives you the most flexibility to implement sophisticated resolution strategies.
Leaderless Replication
How It Works
Leaderless systems have no single leader. Clients write to multiple nodes directly and read from multiple nodes, using quorums to ensure consistency.
Write → Node 1 ✓
→ Node 2 ✓
→ Node 3 ✗
(2 of 3 = success with W=2)
Read → Node 1: value A
→ Node 2: value A
→ Node 3: value B (stale)
(2 of 3 agree = return A with R=2)
Quorum Consistency
The quorum formula determines whether reads and writes overlap, guaranteeing that reads see the latest writes.
N = total nodes
W = write quorum
R = read quorum
If W + R > N, reads see latest writes
Example: N=3, W=2, R=2
- Write succeeds on 2+ nodes
- Read queries 2+ nodes
- At least 1 node has latest value
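A minimal sketch of a quorum write, assuming a hypothetical per-node client exposing a write() method:
// Attempt the write on every node; succeed once at least $w acknowledge.
function quorumWrite(array $nodes, string $key, string $value, int $w): bool
{
    $acks = 0;
    foreach ($nodes as $node) {
        try {
            $node->write($key, $value); // hypothetical node client
            $acks++;
        } catch (\Throwable $e) {
            // Node unreachable; the quorum may still be met by the others
        }
    }
    return $acks >= $w;
}
With N=3, calling quorumWrite($nodes, $key, $value, 2) tolerates one node failure while still guaranteeing overlap with R=2 reads.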
Read Repair
When a client detects inconsistency during reads, it can repair stale nodes by writing the current value back to them.
Client reads from 3 nodes:
- Node 1: version 5
- Node 2: version 5
- Node 3: version 4 (stale)
Client sends version 5 to Node 3 (repair)
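A sketch of read repair using the same hypothetical node client, where read() returns a version/value pair:
// Read from all nodes, return the newest value, and push it back
// to any node that answered with an older version.
function readWithRepair(array $nodes, string $key)
{
    $responses = [];
    foreach ($nodes as $node) {
        $responses[] = ['node' => $node, 'reply' => $node->read($key)];
    }

    // Sort newest version first
    usort($responses, fn ($a, $b) => $b['reply']['version'] <=> $a['reply']['version']);
    $latest = $responses[0]['reply'];

    foreach ($responses as $r) {
        if ($r['reply']['version'] < $latest['version']) {
            $r['node']->write($key, $latest['value'], $latest['version']); // repair
        }
    }

    return $latest['value'];
}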
Anti-Entropy
A background process continuously compares data across nodes using efficient data structures like Merkle trees to detect and repair inconsistencies.
Merkle Tree comparison:
Node A: hash(data) = abc123
Node B: hash(data) = abc123 ✓
Node C: hash(data) = def456 ✗ → sync needed
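The simplest form of this comparison hashes key ranges ("buckets") and syncs only the ranges that differ; a real Merkle tree layers these hashes so entire matching subtrees are skipped without touching every bucket. A sketch with hypothetical hashForBucket()/syncBucket() methods:
// Compare per-bucket hashes between two nodes and reconcile
// only the buckets whose hashes differ.
function antiEntropy($nodeA, $nodeB, int $buckets = 1024): void
{
    for ($i = 0; $i < $buckets; $i++) {
        if ($nodeA->hashForBucket($i) !== $nodeB->hashForBucket($i)) {
            $nodeB->syncBucket($i, $nodeA);
        }
    }
}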
Failover Strategies
Automatic Failover
Automatic failover detects primary failure and promotes a replica without manual intervention. This requires consensus among nodes to prevent split-brain scenarios.
1. Detect primary failure (heartbeat timeout)
2. Elect new primary (consensus)
3. Reconfigure replicas
4. Update client connections
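As a sketch of the detection step, a watchdog might require several consecutive missed heartbeats before triggering an election; pingPrimary() and startElection() are hypothetical callables, and production systems usually delegate this to tooling such as Patroni or orchestrator:
// Declare the primary dead only after repeated missed heartbeats,
// so a single dropped packet does not trigger a failover.
function watchPrimary(callable $pingPrimary, callable $startElection): void
{
    $missed = 0;
    while (true) {
        $missed = $pingPrimary() ? 0 : $missed + 1;
        if ($missed >= 3) {
            $startElection(); // consensus elects the new primary
            return;
        }
        sleep(1);
    }
}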
Manual Failover
In some cases, you may prefer manual control over failover. These commands promote a replica to become the new primary.
# PostgreSQL: Promote replica
pg_ctl promote -D /var/lib/postgresql/data
# MySQL: Stop replication and clear the replica configuration
STOP SLAVE;
RESET SLAVE ALL;
Split-Brain Prevention
Split-brain occurs when multiple nodes believe they are the primary, potentially causing data divergence and corruption. Fencing mechanisms prevent the old primary from accepting writes.
Fencing: Ensure old primary can't accept writes
Methods:
- STONITH (Shoot The Other Node In The Head)
- Shared storage locks
- Network partition detection
The dramatic name "STONITH" reflects the seriousness of split-brain: sometimes the safest action is to forcibly shut down the questionable node.
Laravel Configuration
Read/Write Splitting
Laravel makes it easy to route reads to replicas and writes to the primary. The sticky option ensures a session reads from the primary after writing.
// config/database.php
'mysql' => [
'read' => [
['host' => 'replica1.example.com'],
['host' => 'replica2.example.com'],
],
'write' => [
'host' => 'primary.example.com',
],
'sticky' => true, // Use write connection after writing
],
The sticky option is crucial for read-your-writes consistency. Without it, a user might not see their own changes immediately after saving.
Explicit Connection Selection
Sometimes you need explicit control over which connection to use, such as when reading critical data that must be current.
// Force primary for critical reads
$user = User::on('mysql::write')->find($id);
// Explicit replica
$users = User::on('mysql::read')->get();
Monitoring
Key Metrics
Monitoring replication lag is essential for maintaining data consistency and detecting issues before they affect users.
-- PostgreSQL: Check replication lag
SELECT
client_addr,
state,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;
-- MySQL: Check replica status (SHOW REPLICA STATUS on 8.0.22+)
SHOW SLAVE STATUS\G
-- Look for: Seconds_Behind_Master
The lag_bytes metric in PostgreSQL shows how far behind replicas are in terms of WAL position. In MySQL, Seconds_Behind_Master provides a time-based lag estimate.
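A scheduled check could run the PostgreSQL query above from Laravel and log lagging replicas; the connection name and the 10 MB threshold here are assumptions:
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

// Warn when any replica has fallen more than $maxLagBytes behind in WAL replay
function checkReplicationLag(int $maxLagBytes = 10_000_000): void
{
    $rows = DB::connection('pgsql')->select(
        'SELECT client_addr,
                pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
         FROM pg_stat_replication'
    );

    foreach ($rows as $row) {
        if ($row->lag_bytes > $maxLagBytes) {
            Log::warning("Replica {$row->client_addr} lagging {$row->lag_bytes} bytes");
        }
    }
}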
Alerting Thresholds
Set up alerts at multiple severity levels to catch replication issues before they become critical.
alerts:
- name: ReplicationLag
condition: lag_seconds > 30
severity: warning
- name: ReplicationLagCritical
condition: lag_seconds > 300
severity: critical
- name: ReplicationBroken
condition: slave_io_running = 'No'
severity: critical
A warning at 30 seconds gives you time to investigate, while critical alerts at 5 minutes or replication failure demand immediate attention.
Choosing a Strategy
| Requirement | Strategy |
|---|---|
| Simple scaling | Single-leader |
| Multi-datacenter | Multi-leader |
| High write availability | Leaderless |
| Strong consistency | Synchronous single-leader |
| Low latency | Async replication |
| Conflict tolerance | Multi-leader with resolution |
Conclusion
Database replication involves tradeoffs between consistency, availability, and latency. Single-leader replication is simplest and suits most applications. Multi-leader enables multi-datacenter deployments but requires conflict handling. Leaderless systems offer high availability but eventual consistency. Choose based on your consistency requirements, failure tolerance, and geographic distribution needs.