Rate limiting protects services from excessive use, whether from legitimate traffic spikes, misbehaving clients, or malicious attacks. Without rate limiting, a single client can monopolize resources, degrading service for everyone else. Effective rate limiting balances protecting the system with allowing legitimate usage.
The challenge is defining what's "excessive." The same rate might be normal for one use case and abusive for another. Rate limiting strategies must account for different client types, endpoints, and usage patterns.
Token Bucket Algorithm
The token bucket is the most common rate limiting algorithm. Imagine a bucket that holds tokens. Tokens are added at a fixed rate until the bucket is full. Each request consumes a token. If no tokens are available, the request is rejected or delayed.
This model allows burst traffic while enforcing an average rate. A bucket with capacity 100 and refill rate of 10/second allows 100 requests instantly (the burst), then 10 requests per second sustained. Here's how you can implement a basic token bucket in PHP. The key insight is calculating how many tokens have accumulated since the last request.
class TokenBucketRateLimiter
{
    private float $tokensPerSecond;
    private int $bucketCapacity;

    public function __construct(float $tokensPerSecond, int $bucketCapacity)
    {
        $this->tokensPerSecond = $tokensPerSecond;
        $this->bucketCapacity = $bucketCapacity;
    }

    public function attempt(string $key): bool
    {
        $now = microtime(true);

        // A missing bucket starts full: new clients get the whole burst.
        // getBucket()/saveBucket() read and write your storage (e.g. Redis).
        $bucket = $this->getBucket($key) ?? [
            'tokens' => (float) $this->bucketCapacity,
            'last_request' => $now,
        ];

        // Calculate tokens accrued since the last request
        $elapsed = $now - $bucket['last_request'];
        $tokensToAdd = $elapsed * $this->tokensPerSecond;

        // Update the token count, capped at capacity
        $newTokens = min(
            $this->bucketCapacity,
            $bucket['tokens'] + $tokensToAdd
        );

        if ($newTokens < 1) {
            return false; // Rate limited
        }

        // Consume a token
        $this->saveBucket($key, [
            'tokens' => $newTokens - 1,
            'last_request' => $now,
        ]);

        return true;
    }
}
You'll notice the algorithm doesn't require a background process to add tokens. Instead, it calculates accumulated tokens lazily on each request, based on elapsed time. The only state kept per key is a token count and the timestamp of the last request.
Sliding Window Algorithm
Sliding window rate limiting provides smoother rate enforcement than fixed windows. Instead of resetting at fixed intervals, it considers requests over a rolling time period.
The sliding window log tracks exact timestamps of each request. To check if a request is allowed, count requests within the window. This is accurate but memory-intensive for high-volume scenarios. Use this approach when you need precise rate limiting and your request volume is manageable.
class SlidingWindowLogLimiter
{
    private \Redis $redis;
    private int $maxRequests;
    private int $windowSeconds;

    public function __construct(\Redis $redis, int $maxRequests, int $windowSeconds)
    {
        $this->redis = $redis;
        $this->maxRequests = $maxRequests;
        $this->windowSeconds = $windowSeconds;
    }

    public function attempt(string $key): bool
    {
        $now = time();
        $windowStart = $now - $this->windowSeconds;

        // Remove entries that have fallen out of the window
        $this->redis->zRemRangeByScore($key, '-inf', (string) $windowStart);

        // Count requests still inside the window
        $count = $this->redis->zCard($key);

        if ($count >= $this->maxRequests) {
            return false;
        }

        // Record this request (unique member, scored by timestamp)
        $this->redis->zAdd($key, $now, uniqid('', true));
        $this->redis->expire($key, $this->windowSeconds);

        return true;
    }
}
The sliding window counter approximates the log approach with less memory. It combines counts from the current and previous windows, weighted by time elapsed. This gives you most of the benefits of sliding windows without the memory overhead.
class SlidingWindowCounterLimiter
{
    private \Redis $redis;
    private int $maxRequests;
    private int $windowSeconds;

    public function __construct(\Redis $redis, int $maxRequests, int $windowSeconds)
    {
        $this->redis = $redis;
        $this->maxRequests = $maxRequests;
        $this->windowSeconds = $windowSeconds;
    }

    public function attempt(string $key): bool
    {
        $now = time();
        $currentWindow = intdiv($now, $this->windowSeconds);
        $previousWindow = $currentWindow - 1;

        // Get counts from both windows (GET returns false for a
        // missing key, which casts to 0)
        $currentCount = (int) $this->redis->get("{$key}:{$currentWindow}");
        $previousCount = (int) $this->redis->get("{$key}:{$previousWindow}");

        // Weight the previous window by how much of it still overlaps
        $elapsed = $now % $this->windowSeconds;
        $previousWeight = ($this->windowSeconds - $elapsed) / $this->windowSeconds;
        $weightedCount = $currentCount + ($previousCount * $previousWeight);

        if ($weightedCount >= $this->maxRequests) {
            return false;
        }

        // Increment the current window's counter
        $this->redis->incr("{$key}:{$currentWindow}");
        $this->redis->expire("{$key}:{$currentWindow}", $this->windowSeconds * 2);

        return true;
    }
}
The weighted calculation smooths the transition between windows. At the start of a new window, the previous window has full weight. As time progresses, the previous window's influence decreases until it has no effect.
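A small numeric check makes the weighting concrete. This sketch is in Python for brevity; the counts and window size are hypothetical.

```python
def weighted_count(current_count: int, previous_count: int,
                   elapsed: int, window_seconds: int) -> float:
    """Estimate requests in the rolling window by blending the previous
    fixed window's count with the fraction of it that still overlaps."""
    previous_weight = (window_seconds - elapsed) / window_seconds
    return current_count + previous_count * previous_weight

# 60-second window: the previous window saw 80 requests, the current one
# 30, and we are 15 seconds into the current window, so 45/60 of the
# previous window still overlaps the rolling window.
print(weighted_count(30, 80, elapsed=15, window_seconds=60))  # 30 + 80 * 0.75 = 90.0
print(weighted_count(30, 80, elapsed=60, window_seconds=60))  # previous window fully expired: 30.0
```

With a limit of 100, the first estimate (90) would still admit the request; a few seconds earlier, the heavier weighting of the previous window would have rejected it.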
Rate Limit Keys
The key determines what gets rate limited together. Common key strategies serve different purposes.
Per-IP limiting protects against single sources overwhelming the system. It's simple but doesn't work well for shared IPs (corporate NAT, mobile carriers).
Per-user limiting provides fair access regardless of source IP. Authenticated users are identified consistently across devices and networks.
Per-API-key limiting suits services with API key authentication. Each key gets its own quota, enabling different limits for different customers.
The following strategy class shows how to select the appropriate key based on the request context. You'll want to adapt this logic to your authentication system and business requirements.
class RateLimitKeyStrategy
{
    public function getKey(Request $request, string $endpoint): string
    {
        // Authenticated users: rate limit by user
        if ($user = $request->user()) {
            return "rate:{$endpoint}:user:{$user->id}";
        }

        // API key requests: rate limit by key
        // (hashed, to avoid storing raw credentials in cache keys)
        if ($apiKey = $request->header('X-API-Key')) {
            return "rate:{$endpoint}:key:" . hash('sha256', $apiKey);
        }

        // Anonymous: rate limit by IP
        return "rate:{$endpoint}:ip:{$request->ip()}";
    }
}
Including the endpoint in the key allows different limits for different operations. A search endpoint might have stricter limits than a profile view endpoint.
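One way to express those per-endpoint limits is a lookup table consulted alongside the key. A minimal sketch (in Python for brevity; the endpoint names and numbers are hypothetical):

```python
# Hypothetical per-endpoint limits as (max_requests, window_seconds)
ENDPOINT_LIMITS = {
    "search":  (30, 60),    # expensive query: 30 per minute
    "profile": (300, 60),   # cheap read: 300 per minute
}
DEFAULT_LIMIT = (60, 60)    # anything not listed: 60 per minute

def limit_for(endpoint: str) -> tuple[int, int]:
    """Look up the (max_requests, window_seconds) pair for an endpoint."""
    return ENDPOINT_LIMITS.get(endpoint, DEFAULT_LIMIT)

print(limit_for("search"))  # (30, 60)
print(limit_for("orders"))  # (60, 60) -- falls back to the default
```

The endpoint name used in the lookup should be the same one embedded in the rate limit key, so that the limit and the counter always refer to the same operation.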
Implementing in Laravel
Laravel provides built-in rate limiting through middleware. The throttle middleware uses cache-based limiters. You can define custom limiters in your RouteServiceProvider for more complex scenarios.
// routes/api.php
Route::middleware(['throttle:api'])->group(function () {
    Route::get('/users', [UserController::class, 'index']);
    Route::post('/orders', [OrderController::class, 'store']);
});

// app/Providers/RouteServiceProvider.php
protected function configureRateLimiting(): void
{
    RateLimiter::for('api', function (Request $request) {
        return Limit::perMinute(60)->by($request->user()?->id ?: $request->ip());
    });

    // Different limits for different endpoints
    RateLimiter::for('uploads', function (Request $request) {
        return Limit::perHour(10)->by($request->user()?->id ?: $request->ip());
    });

    // Tiered limits based on subscription
    RateLimiter::for('tiered', function (Request $request) {
        $user = $request->user();

        return match ($user?->subscription_tier) {
            'enterprise' => Limit::perMinute(1000)->by($user->id),
            'professional' => Limit::perMinute(100)->by($user->id),
            default => Limit::perMinute(10)->by($user?->id ?: $request->ip()),
        };
    });
}
The tiered example demonstrates how rate limits can vary by user type. Enterprise customers get higher limits than free users, creating incentive to upgrade while protecting your infrastructure.
Communicating Limits
Clients need to know their rate limit status. Standard headers communicate current limits and remaining quota. Providing this information helps clients self-regulate and build better user experiences.
class RateLimitHeadersMiddleware
{
    public function handle(Request $request, Closure $next): Response
    {
        $response = $next($request);

        $key = $this->getKey($request);
        $limits = $this->limiter->getLimits($key);

        $response->headers->set('X-RateLimit-Limit', (string) $limits['limit']);
        $response->headers->set('X-RateLimit-Remaining', (string) $limits['remaining']);
        $response->headers->set('X-RateLimit-Reset', (string) $limits['reset']);

        if ($limits['remaining'] <= 0) {
            $response->headers->set('Retry-After', (string) max(0, $limits['reset'] - time()));
        }

        return $response;
    }
}
When limits are exceeded, return appropriate HTTP status codes and helpful error messages. The 429 status code is specifically designed for rate limiting. Always include a Retry-After header to tell clients when they can try again.
public function handleRateLimitExceeded(Request $request): JsonResponse
{
    $retryAfter = $this->limiter->availableIn($this->getKey($request));

    return response()->json([
        'error' => 'rate_limit_exceeded',
        'message' => 'Too many requests. Please try again later.',
        'retry_after' => $retryAfter,
    ], 429)->withHeaders([
        'Retry-After' => $retryAfter,
    ]);
}
A well-designed error response helps developers debug their integrations. Include the retry time both in the header and the response body for convenience.
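On the client side, that Retry-After value is exactly what a well-behaved integration should honor. A minimal retry sketch (in Python for brevity; send_request is a hypothetical transport callable returning a status, headers, and body):

```python
import time

def call_with_retry(send_request, max_attempts: int = 3):
    """Call send_request, retrying on 429 and sleeping for the number of
    seconds the server advertises in its Retry-After header."""
    for attempt in range(max_attempts):
        status, headers, body = send_request()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Honor the server's hint; fall back to 1 second if absent
            time.sleep(int(headers.get("Retry-After", 1)))
    return status, body

# Demo with a fake transport: one 429 (Retry-After: 0), then success
responses = iter([(429, {"Retry-After": "0"}, None), (200, {}, "ok")])
print(call_with_retry(lambda: next(responses)))  # (200, 'ok')
```

Clients that sleep for the advertised interval instead of hammering the endpoint recover faster and waste less of their own quota.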
Distributed Rate Limiting
In distributed systems, rate limiting must work across multiple servers. A limit of 100 requests per minute should apply to all requests, regardless of which server handles them.
Redis provides atomic operations for distributed counters. INCR gives you an atomic per-window counter; pairing it with an expiration on first increment implements a simple fixed window. This implementation is simple and works well for most use cases, though note that INCR and EXPIRE are two separate commands, so a failure between them could leave a counter without a TTL.
class RedisRateLimiter
{
    public function attempt(string $key, int $limit, int $windowSeconds): array
    {
        $current = $this->redis->incr($key);

        // Set the TTL only when the key is first created
        if ($current === 1) {
            $this->redis->expire($key, $windowSeconds);
        }

        return [
            'allowed' => $current <= $limit,
            'remaining' => max(0, $limit - $current),
            'reset' => $this->redis->ttl($key),
        ];
    }
}
For token bucket in distributed systems, Lua scripts ensure atomicity. Without Lua, a race condition could allow multiple requests to read the same token count and each consume a token, exceeding the limit.
public function attemptTokenBucket(string $key, float $rate, int $capacity): bool
{
    $script = <<<LUA
    local now = tonumber(ARGV[1])
    local rate = tonumber(ARGV[2])
    local capacity = tonumber(ARGV[3])

    local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'last')
    local tokens = tonumber(bucket[1]) or capacity
    local last = tonumber(bucket[2]) or now

    local elapsed = now - last
    tokens = math.min(capacity, tokens + (elapsed * rate))

    if tokens < 1 then
        return 0
    end

    redis.call('HMSET', KEYS[1], 'tokens', tokens - 1, 'last', now)
    -- EXPIRE requires an integer number of seconds
    redis.call('EXPIRE', KEYS[1], math.ceil(capacity / rate * 2))

    return 1
    LUA;

    return (bool) $this->redis->eval($script, [$key, microtime(true), $rate, $capacity], 1);
}
The Lua script executes atomically on the Redis server. All the token calculation and update happens in a single operation, eliminating race conditions that would occur with multiple Redis commands.
Graceful Degradation
When rate limits are approached, consider degrading gracefully rather than rejecting requests outright. Return cached responses, reduce response detail, or queue requests for later processing. This approach improves the user experience during traffic spikes.
class GracefulRateLimiter
{
    public function handle(Request $request, Closure $next): Response
    {
        $usage = $this->limiter->getUsage($this->getKey($request));

        // Below 50%: full service
        if ($usage < 0.5) {
            return $next($request);
        }

        // 50-80%: return a cached/simplified response
        if ($usage < 0.8) {
            return $this->simplifiedResponse($request);
        }

        // 80-100%: queue for later, return 202 Accepted
        if ($usage < 1.0) {
            $this->queueRequest($request);
            return response()->json(['status' => 'queued'], 202);
        }

        // Over limit: reject
        return $this->rateLimitExceeded($request);
    }
}
This graduated approach keeps the service responsive even under heavy load. Users get some response rather than a flat rejection, and the system stays protected.
Conclusion
Rate limiting protects services from overuse and abuse. Token bucket allows bursts while enforcing average rates. Sliding windows provide smooth enforcement. The rate limit key determines what gets limited together.
Communicate limits clearly through headers. Handle exceeded limits gracefully with appropriate status codes and retry information. In distributed systems, use Redis or similar for coordination. Consider graceful degradation as limits approach.
Effective rate limiting balances protection with usability. Too strict, and legitimate users suffer. Too lenient, and the system remains vulnerable. Monitor actual usage patterns and adjust limits based on real-world behavior.