Distributed Locking Guide | Coordination Patterns for Microservices

Distributed locking coordinates access to shared resources across multiple processes or machines. In a single-process application, language-level locks handle synchronization. In distributed systems, processes don't share memory, so coordination requires external mechanisms. Getting distributed locking right is surprisingly difficult, and getting it wrong causes data corruption, lost updates, and race conditions.

The fundamental challenge is that distributed systems lack a global clock and messages can be delayed, lost, or reordered. A process that thinks it holds a lock might actually have lost it. A process waiting for a lock might receive a stale grant. These edge cases turn simple-sounding operations into complex distributed systems problems.

Why You Need Distributed Locks

Distributed locks prevent concurrent operations that would conflict. Classic examples include preventing double-processing of messages, serializing access to external APIs with rate limits, coordinating leader election, and ensuring only one process runs a scheduled task.

Consider a payment webhook handler. Stripe sends a webhook for a successful payment. If Stripe doesn't receive a timely success response, for example because of a network issue, it retries the webhook. Without locking, two processes might handle the same payment at the same time, potentially double-crediting the user's account.

The following code shows a naive implementation that's vulnerable to race conditions. Even with a "check if already processed" guard, two processes could both pass the check before either marks the payment as processed.

class PaymentWebhookHandler
{
    public function handle(WebhookPayload $payload): void
    {
        $paymentId = $payload->paymentId;

        // Without locking - race condition possible
        if ($this->alreadyProcessed($paymentId)) {
            return;
        }

        // Another process might be here simultaneously
        $this->processPayment($payload);
        $this->markProcessed($paymentId);
    }
}

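A distributed lock ensures only one process handles a given payment at a time. The following version acquires a lock before checking or processing, so concurrent deliveries of the same payment cannot run side by side. A second delivery that arrives while the first holds the lock is rejected and can be retried later.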

class PaymentWebhookHandler
{
    public function handle(WebhookPayload $payload): void
    {
        $paymentId = $payload->paymentId;
        $lockKey = "payment:process:{$paymentId}";

        $lock = $this->lockManager->acquire($lockKey, ttlMs: 30_000); // 30-second TTL

        if (!$lock) {
            // Another process is handling this payment
            throw new PaymentProcessingInProgressException();
        }

        try {
            if ($this->alreadyProcessed($paymentId)) {
                return;
            }

            $this->processPayment($payload);
            $this->markProcessed($paymentId);
        } finally {
            $lock->release();
        }
    }
}

The try/finally block ensures the lock is released even if processing throws an exception. The TTL (time-to-live) provides a safety net if the process crashes before releasing the lock.

Redis-Based Locking

Redis is popular for distributed locking due to its speed and atomic operations. A basic lock uses SET with NX (only set if not exists) and PX (expiration in milliseconds).

The following implementation shows a basic Redis lock with atomic acquire and release operations. Pay attention to the token-based release, which prevents a process from accidentally releasing another process's lock.

class RedisLock
{
    public function __construct(private Redis $redis) {}

    public function acquire(string $key, int $ttlMs): ?Lock
    {
        $token = bin2hex(random_bytes(16));

        $acquired = $this->redis->set(
            $key,
            $token,
            ['NX', 'PX' => $ttlMs]
        );

        if (!$acquired) {
            return null;
        }

        return new Lock($key, $token, $this);
    }

    public function release(string $key, string $token): bool
    {
        // Atomic check-and-delete to prevent releasing someone else's lock
        $script = <<<LUA
            if redis.call("get", KEYS[1]) == ARGV[1] then
                return redis.call("del", KEYS[1])
            else
                return 0
            end
        LUA;

        return (bool) $this->redis->eval($script, [$key, $token], 1);
    }
}

The token ensures a process only releases its own lock. Without this, a slow process might release a lock that a different process now holds. The Lua script makes the check-and-delete atomic.
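To see why the token matters, here is a small in-memory model of the same two operations: SET with NX and PX, and the Lua check-and-delete. It is a sketch for illustration only. The InMemoryLockStore class and its explicit $nowMs clock parameter are hypothetical stand-ins for Redis and are not part of any library.

```php
<?php

// Hypothetical in-memory stand-in for the two Redis operations used by RedisLock.
// Time is passed in explicitly so the expiry scenario is deterministic.
final class InMemoryLockStore
{
    /** @var array<string, array{token: string, expiresAt: int}> */
    private array $entries = [];

    // Models SET key token NX PX ttlMs: succeeds only if the key is absent or expired.
    public function setNxPx(string $key, string $token, int $ttlMs, int $nowMs): bool
    {
        $entry = $this->entries[$key] ?? null;
        if ($entry !== null && $entry['expiresAt'] > $nowMs) {
            return false;
        }
        $this->entries[$key] = ['token' => $token, 'expiresAt' => $nowMs + $ttlMs];
        return true;
    }

    // Models the Lua script: delete only if the stored token matches the caller's.
    public function releaseIfOwner(string $key, string $token, int $nowMs): bool
    {
        $entry = $this->entries[$key] ?? null;
        if ($entry === null || $entry['expiresAt'] <= $nowMs || $entry['token'] !== $token) {
            return false;
        }
        unset($this->entries[$key]);
        return true;
    }

    // Returns the token of the current (unexpired) holder, or null.
    public function holder(string $key, int $nowMs): ?string
    {
        $entry = $this->entries[$key] ?? null;
        return ($entry !== null && $entry['expiresAt'] > $nowMs) ? $entry['token'] : null;
    }
}

$store = new InMemoryLockStore();

// Process A acquires a 30-second lock at t=0, then stalls.
$aAcquired = $store->setNxPx('payment:process:42', 'token-a', 30_000, 0);

// Process B is refused while A's lock is live...
$bEarly = $store->setNxPx('payment:process:42', 'token-b', 30_000, 10_000);

// ...but succeeds once A's lock has expired.
$bLate = $store->setNxPx('payment:process:42', 'token-b', 30_000, 31_000);

// A finally finishes and tries to release; the token check turns this into a no-op.
$aRelease = $store->releaseIfOwner('payment:process:42', 'token-a', 35_000);

echo $store->holder('payment:process:42', 35_000), PHP_EOL; // token-b
```

Without the token comparison, A's late release would have deleted B's lock and let a third process in while B was still working.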

Lock Expiration and Safety

Lock TTLs (time-to-live) prevent deadlocks when processes crash. If a process acquires a lock and dies without releasing it, the TTL ensures the lock eventually expires. But TTLs create their own problems.

If the lock expires while work is still in progress, another process can acquire it. Now two processes operate concurrently, which is exactly what the lock was supposed to prevent. This happens whenever the holder stalls for longer than the TTL, for example during a stop-the-world pause, a slow downstream call, or a suspended VM, and it is a fundamental challenge in distributed locking.

Fencing tokens help detect stale lock holders. Each lock acquisition gets a monotonically increasing token. The protected resource remembers the highest token it has accepted and rejects any operation that carries an older one.

The following example demonstrates fencing tokens in action. The lock manager assigns an incrementing token with each acquisition, and the protected resource validates that operations come from the current lock holder.

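class FencedLockManager
{
    public function __construct(private Redis $redis) {}

    public function acquire(string $key, int $ttlMs): ?FencedLock
    {
        // Check, increment, and set in one Lua script so the steps are atomic.
        // The counter only advances when the lock is actually granted, and the
        // stored value (the fencing token) doubles as the ownership token.
        // On Redis Cluster, both keys must hash to the same slot.
        $script = <<<LUA
            if redis.call("exists", KEYS[1]) == 1 then
                return 0
            end
            local token = redis.call("incr", KEYS[2])
            redis.call("set", KEYS[1], token, "PX", ARGV[1])
            return token
        LUA;

        $fencingToken = (int) $this->redis->eval(
            $script,
            [$key, "lock:fence:{$key}", $ttlMs],
            2
        );

        if ($fencingToken === 0) {
            return null; // Another process holds the lock
        }

        return new FencedLock($key, $fencingToken, $this);
    }
}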

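// Resource that uses fencing tokens
class ProtectedResource
{
    // Highest token accepted so far. A real resource must persist this with its
    // data and check it in the same atomic operation as the write.
    private int $lastSeenToken = 0;

    public function update(int $fencingToken, array $data): void
    {
        // Reject older tokens; an equal token comes from the current holder,
        // which may legitimately write more than once.
        if ($fencingToken < $this->lastSeenToken) {
            throw new StaleLockException(
                "Fencing token {$fencingToken} is older than {$this->lastSeenToken}"
            );
        }

        $this->lastSeenToken = $fencingToken;
        $this->doUpdate($data);
    }
}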

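Fencing tokens require cooperation from the protected resource: it has to store the highest token it has seen and check it in the same atomic operation as the write. If you control both the lock client and the resource, this pattern provides strong safety guarantees even when locks expire unexpectedly.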
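The timeline below replays the scenario fencing tokens are designed for. It is a hypothetical, self-contained stand-in for ProtectedResource: FencedStore keeps the highest accepted token in memory, the token values 33 and 34 are arbitrary, and StaleLockException is defined here only so the sketch runs on its own.

```php
<?php

final class StaleLockException extends RuntimeException {}

// Hypothetical resource that remembers the highest fencing token it has accepted.
final class FencedStore
{
    private int $highestToken = 0;
    /** @var list<string> */
    public array $writes = [];

    public function write(int $fencingToken, string $value): void
    {
        if ($fencingToken < $this->highestToken) {
            throw new StaleLockException("token {$fencingToken} < {$this->highestToken}");
        }
        $this->highestToken = $fencingToken;
        $this->writes[] = $value;
    }
}

$store = new FencedStore();

// Client A acquires the lock with token 33, then pauses.
$tokenA = 33;

// A's lock expires; client B acquires it with the next token and writes.
$tokenB = 34;
$store->write($tokenB, 'written by B');

// A wakes up, still believing it holds the lock, and tries to write.
$aRejected = false;
try {
    $store->write($tokenA, 'written by A');
} catch (StaleLockException $e) {
    $aRejected = true;
}

var_dump($aRejected, $store->writes);
```

The lock service alone could not have stopped A, because A's write happens after its lock has already lapsed; only the resource, by comparing tokens, can tell that the write is stale.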

Lock Renewal

Long-running operations need lock renewal. Instead of setting a long TTL (which delays recovery after crashes), acquire a short TTL and periodically renew it while work continues.

This implementation automatically renews the lock at half the TTL interval. If renewal fails (perhaps because of a network partition), the onLockLost callback signals the application to stop processing.

use React\EventLoop\Loop;
use React\EventLoop\TimerInterface;

class RenewableLock
{
    private bool $released = false;
    private ?TimerInterface $renewalTimer = null;

    public function __construct(
        private string $key,
        private string $token,
        private int $ttlMs,
        private RedisLock $manager,   // assumed to expose renew() as well as release()
        private Closure $onLockLost   // invoked once if a renewal fails
    ) {
        // Renew at half the TTL interval
        $this->startRenewal(intdiv($ttlMs, 2));
    }

    private function startRenewal(int $intervalMs): void
    {
        // Requires a running ReactPHP event loop; blocking code stalls this timer.
        $this->renewalTimer = Loop::addPeriodicTimer(
            $intervalMs / 1000,
            function () {
                if ($this->released) {
                    return;
                }

                $renewed = $this->manager->renew($this->key, $this->token, $this->ttlMs);

                if (!$renewed) {
                    // Lost the lock: stop renewing and signal the caller to stop processing
                    $this->stopRenewal();
                    ($this->onLockLost)();
                }
            }
        );
    }

    private function stopRenewal(): void
    {
        if ($this->renewalTimer !== null) {
            Loop::cancelTimer($this->renewalTimer);
            $this->renewalTimer = null;
        }
    }

    public function release(): void
    {
        $this->released = true;
        $this->stopRenewal();
        $this->manager->release($this->key, $this->token);
    }
}

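Renewing at half the TTL leaves a safety margin: a renewal delayed by a slow round trip still reaches Redis before the key expires. This version treats the first failed renewal as lock loss and does not retry. If you want to ride out a single transient error, renew more often (for example every third of the TTL) so a second attempt still lands before expiry. The timer also fires only while the event loop is running, so the protected work has to be non-blocking. A long synchronous call stalls renewal just as a crash would, and the lock can expire underneath it.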
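RenewableLock calls a renew() method on its manager, but the RedisLock class shown earlier defines only acquire() and release(). One way to write it is another token-checked Lua script that extends the TTL with PEXPIRE only if the caller still owns the key. This is a sketch that assumes the phpredis extension's eval() as used in release(); the standalone renewLock() function name is hypothetical, and in RedisLock it would become a renew() method.

```php
<?php

// Hypothetical helper: extend a lock's TTL only if $token still owns it.
// $redis is expected to be a phpredis \Redis instance, or anything with the
// same eval(string $script, array $args, int $numKeys) method.
function renewLock(object $redis, string $key, string $token, int $ttlMs): bool
{
    $script = <<<LUA
        if redis.call("get", KEYS[1]) == ARGV[1] then
            return redis.call("pexpire", KEYS[1], ARGV[2])
        else
            return 0
        end
    LUA;

    // KEYS[1] = $key; ARGV[1] = $token; ARGV[2] = new TTL in milliseconds.
    return (bool) $redis->eval($script, [$key, $token, $ttlMs], 1);
}
```

A false return means the key has expired or now holds another process's token, which is the condition RenewableLock treats as lock loss. Checking ownership and extending in one script matters for the same reason as in release(): a separate GET followed by PEXPIRE could extend a lock that changed hands in between.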

Redlock Algorithm

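Single-instance Redis locking can fail if Redis restarts without having persisted the lock key, or if it fails over to a replica that had not yet received the key (Redis replication is asynchronous). The Redlock algorithm addresses this by acquiring locks across multiple independent Redis instances. A lock is considered held only when acquired from a majority of instances.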

The following implementation shows the core Redlock algorithm. You'll acquire locks from multiple Redis instances, verify you have a quorum, and account for the time elapsed during acquisition when calculating the remaining TTL.

class Redlock
{
    private array $instances;
    private int $quorum;
    private int $retryCount = 3;
    private int $retryDelayMs = 200;

    public function __construct(array $redisInstances)
    {
        $this->instances = $redisInstances;
        $this->quorum = intdiv(count($redisInstances), 2) + 1;
    }

    public function acquire(string $key, int $ttlMs): ?Lock
    {
        $token = bin2hex(random_bytes(16));

        for ($attempt = 0; $attempt < $this->retryCount; $attempt++) {
            $startTime = microtime(true) * 1000;
            $acquired = 0;
            $acquiredInstances = [];

            foreach ($this->instances as $instance) {
                if ($this->tryAcquire($instance, $key, $token, $ttlMs)) {
                    $acquired++;
                    $acquiredInstances[] = $instance;
                }
            }

            // Calculate time elapsed during acquisition
            $elapsedMs = (microtime(true) * 1000) - $startTime;
            $remainingTtl = $ttlMs - $elapsedMs;

            // Check if we have quorum and enough time remaining
            if ($acquired >= $this->quorum && $remainingTtl > 0) {
                return new Lock($key, $token, $acquiredInstances, $remainingTtl);
            }

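            // Failed to get quorum: release on every instance, not only the ones
            // that reported success, because a SET may have succeeded even though
            // its reply was lost. The token check makes extra releases harmless.
            foreach ($this->instances as $instance) {
                $this->release($instance, $key, $token);
            }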

            // Random delay before retry
            usleep(rand(0, $this->retryDelayMs) * 1000);
        }

        return null;
    }
}

The elapsed time calculation is crucial. If it takes too long to acquire locks from a majority, the first locks you acquired may have already expired by the time you finish. The algorithm accounts for this by verifying sufficient TTL remains.
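A worked version of that check makes the arithmetic concrete. Redlock client libraries typically also subtract an allowance for clock drift between instances. The sketch below uses 1% of the TTL plus 2 ms, a value taken from common reference clients, but treat the constant as an assumption to tune. The redlockValidityMs() function is hypothetical.

```php
<?php

// Hypothetical helper mirroring the check in Redlock::acquire(), plus a
// clock-drift allowance (1% of the TTL + 2 ms, an assumed default).
function redlockValidityMs(int $ttlMs, float $elapsedMs, int $instances, int $acquired): ?float
{
    $quorum = intdiv($instances, 2) + 1;   // majority, e.g. 3 of 5
    $driftMs = intdiv($ttlMs, 100) + 2;    // allowance for clock drift
    $validityMs = $ttlMs - $elapsedMs - $driftMs;

    // The lock is usable only with a majority AND time left on it.
    return ($acquired >= $quorum && $validityMs > 0) ? $validityMs : null;
}

// 5 instances, 10 s TTL, acquisition took 50 ms, 3 instances granted the lock:
// 10000 - 50 - 102 = 9848 ms of usable lock time.
var_dump(redlockValidityMs(10_000, 50.0, 5, 3));

// Same timing, but only 2 of 5 granted it: no quorum, so null.
var_dump(redlockValidityMs(10_000, 50.0, 5, 2));

// Acquisition was so slow that the lock would already have expired: null.
var_dump(redlockValidityMs(10_000, 9_950.0, 5, 5));
```

The holder should plan its work to finish within the returned validity time, not the nominal TTL.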

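Redlock is controversial. Martin Kleppmann's analysis argued that its safety depends on timing assumptions (bounded network delay, bounded process pauses, and bounded clock drift) that real systems can violate, and that it does not produce fencing tokens. For many applications, a single Redis instance with proper monitoring is sufficient. Use Redlock when you need stronger guarantees and understand its limitations.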

Database-Based Locking

Databases can implement distributed locks using row-level locking or advisory locks. This works well when you already have a database and don't want additional infrastructure.

PostgreSQL provides advisory locks that are independent of table data. The following examples show both advisory locks (lightweight, session-scoped) and table-based locks (persistent, with TTL support).

// PostgreSQL advisory locks
class PostgresAdvisoryLock
{
    public function acquire(string $key): bool
    {
        $lockId = crc32($key);

        // pg_try_advisory_lock returns true if lock acquired
        $result = DB::select(
            "SELECT pg_try_advisory_lock(?) as acquired",
            [$lockId]
        );

        return $result[0]->acquired;
    }

    public function release(string $key): void
    {
        $lockId = crc32($key);
        DB::select("SELECT pg_advisory_unlock(?)", [$lockId]);
    }
}

// Table-based locking (assumes `key` is the primary key or has a unique index)
class TableBasedLock
{
    // Returns the ownership token on success, or null if the lock is held.
    public function acquire(string $key, int $ttlSeconds): ?string
    {
        $expiresAt = now()->addSeconds($ttlSeconds);
        $token = Str::uuid()->toString();

        try {
            // Atomic insert: the unique key rejects a second holder
            DB::table('distributed_locks')->insert([
                'key' => $key,
                'token' => $token,
                'expires_at' => $expiresAt,
            ]);

            return $token;
        } catch (UniqueConstraintViolationException $e) {
            // Lock row exists: take it over only if it has expired. Checking
            // expiry inside the UPDATE keeps the check and the write atomic, so
            // a holder that just extended its lock cannot be displaced.
            $updated = DB::table('distributed_locks')
                ->where('key', $key)
                ->where('expires_at', '<', now())
                ->update([
                    'token' => $token,
                    'expires_at' => $expiresAt,
                ]);

            return $updated > 0 ? $token : null;
        }
    }

    public function release(string $key, string $token): void
    {
        // Delete only our own row, never another holder's
        DB::table('distributed_locks')
            ->where('key', $key)
            ->where('token', $token)
            ->delete();
    }
}

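Advisory locks are simpler but are tied to the database session. If your connection drops, the lock is released automatically. The flip side is that release() must run on the same connection that acquired the lock, and session-level advisory locks do not work reliably behind a transaction-mode connection pooler such as PgBouncer, which can route each transaction to a different server session. Table-based locks persist across connections and support TTLs, but require more complex logic to handle expiration.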
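One caveat with crc32() is that it maps keys into a 32-bit space, so two unrelated keys can collide and block each other. pg_try_advisory_lock() also accepts a single 64-bit bigint key. The following sketch derives the ID from the first eight bytes of a SHA-256 hash, which makes collisions far less likely. The advisoryLockId() name is hypothetical.

```php
<?php

// Hypothetical helper: derive a signed 64-bit advisory lock ID from a string key.
// PHP integers are signed 64-bit on 64-bit builds, matching PostgreSQL's bigint;
// unpack('J') yields a negative int when the top bit is set, which bigint accepts.
function advisoryLockId(string $key): int
{
    $digest = hash('sha256', $key, true);          // 32 raw bytes
    return unpack('J', substr($digest, 0, 8))[1];  // first 8 bytes, big-endian
}

// In PostgresAdvisoryLock this would replace crc32($key), e.g.:
// DB::select('SELECT pg_try_advisory_lock(?) AS acquired', [advisoryLockId($key)]);
```

Whatever function you choose, acquire() and release() must compute the ID the same way, or the unlock call will target a different lock.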

When to Avoid Distributed Locks

Distributed locks add complexity and potential failure modes. Before reaching for a lock, consider alternatives.

Idempotent operations don't need locks. If processing the same item twice produces the same result, concurrent processing is safe. Design operations to be idempotent when possible.
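A small illustration of the difference uses an in-memory ledger as a stand-in for a database table. Recording a credit keyed by the payment ID is idempotent, while incrementing a balance is not. In a real database, the keyed version would typically be an insert guarded by a unique constraint on the payment ID. The Ledger class and its method names are hypothetical.

```php
<?php

// Hypothetical in-memory ledger contrasting a non-idempotent and an idempotent write.
final class Ledger
{
    private int $naiveBalance = 0;
    /** @var array<string, int> credits keyed by payment ID */
    private array $credits = [];

    // Not idempotent: each delivery adds the amount again.
    public function creditNaive(string $paymentId, int $amountCents): void
    {
        $this->naiveBalance += $amountCents;
    }

    // Idempotent: the payment ID is the key, so a redelivery rewrites
    // the same entry with the same value.
    public function creditIdempotent(string $paymentId, int $amountCents): void
    {
        $this->credits[$paymentId] = $amountCents;
    }

    public function naiveBalance(): int
    {
        return $this->naiveBalance;
    }

    public function balance(): int
    {
        return array_sum($this->credits);
    }
}

$ledger = new Ledger();

// The same webhook delivered twice.
for ($i = 0; $i < 2; $i++) {
    $ledger->creditNaive('pay_123', 5000);
    $ledger->creditIdempotent('pay_123', 5000);
}

echo $ledger->naiveBalance(), ' vs ', $ledger->balance(), PHP_EOL; // 10000 vs 5000
```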

Optimistic concurrency uses version numbers instead of locks. Read the current version, do work, write with a version check. If the version changed, retry. This works well for low-contention scenarios.
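The read, compute, and conditional-write loop can be sketched with an in-memory versioned store. VersionedStore, compareAndSet() and addWithRetry() are hypothetical names. In SQL, the conditional write is usually an UPDATE that includes the expected version in its WHERE clause, followed by a check of the affected row count.

```php
<?php

// Hypothetical versioned store: a write succeeds only if the version is unchanged.
// SQL equivalent: UPDATE t SET value = ?, version = version + 1
//                 WHERE id = ? AND version = ?  -- then check affected rows
final class VersionedStore
{
    /** @var array<string, array{value: int, version: int}> */
    private array $rows = [];

    public function read(string $id): array
    {
        return $this->rows[$id] ?? ['value' => 0, 'version' => 0];
    }

    public function compareAndSet(string $id, int $expectedVersion, int $newValue): bool
    {
        if ($this->read($id)['version'] !== $expectedVersion) {
            return false; // someone else wrote first
        }
        $this->rows[$id] = ['value' => $newValue, 'version' => $expectedVersion + 1];
        return true;
    }
}

// Read, compute, write with a version check; retry on conflict.
function addWithRetry(VersionedStore $store, string $id, int $delta, int $maxAttempts = 5): bool
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $row = $store->read($id);
        if ($store->compareAndSet($id, $row['version'], $row['value'] + $delta)) {
            return true;
        }
    }
    return false;
}

$store = new VersionedStore();

// Two clients read the same version concurrently.
$a = $store->read('counter');
$b = $store->read('counter');

$aOk = $store->compareAndSet('counter', $a['version'], $a['value'] + 10); // wins
$bOk = $store->compareAndSet('counter', $b['version'], $b['value'] + 5);  // stale version, rejected

// B retries: it re-reads the new version and succeeds.
$bRetry = addWithRetry($store, 'counter', 5);

print_r($store->read('counter')); // value 15, version 2
```

Under heavy contention, the retries themselves become the bottleneck, which is why this approach suits low-contention data.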

Message partitioning ensures only one consumer processes each partition. Messages for the same entity go to the same partition, serializing access without explicit locks.
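This sketch shows key-based routing. Hashing the entity ID modulo the partition count sends every message for that entity to the same partition, and a single consumer per partition then handles them one at a time, in order. The partitionFor() helper is hypothetical and uses crc32() only because PHP has it built in. Real brokers apply their own hash to the message key; Kafka's default partitioner, for example, uses murmur2.

```php
<?php

// Hypothetical partitioner: every message for an entity maps to the same partition.
function partitionFor(string $entityId, int $partitionCount): int
{
    return crc32($entityId) % $partitionCount;
}

// Five messages for two orders, routed across 4 partitions.
$messages = ['order-7', 'order-9', 'order-7', 'order-7', 'order-9'];
$byPartition = [];
foreach ($messages as $i => $orderId) {
    $byPartition[partitionFor($orderId, 4)][] = "{$orderId}#{$i}";
}

print_r($byPartition);
```

One consequence of this design is that changing the partition count remaps most keys to different partitions, so resizing needs care if per-entity ordering must hold across the change.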

Conclusion

Distributed locking coordinates access to shared resources across processes. Redis provides fast, simple locking for most use cases. Fencing tokens and lock renewal handle edge cases around expiration. Database-based locking works when you want to avoid additional infrastructure.

Distributed locks are difficult to get completely right. Edge cases around timing, expiration, and failures are subtle. Prefer designs that avoid locks when possible: idempotent operations, optimistic concurrency, and partitioning often work better. When you do need locks, understand their failure modes and plan accordingly.
