Memory leaks have a distinctive signature in production: everything works fine after a deployment, then gradually slows down over hours or days, until the process is restarted and the cycle begins again. The restart masks the symptom, but the leak continues.
In garbage-collected languages, leaks typically mean objects that should be freed are being held by references that aren't being cleaned up. In languages with manual memory management, it's often a missing free call or improper resource cleanup. In both cases, the process grows indefinitely until it crashes or becomes too slow to function.
Recognizing Memory Leaks
Before diagnosing, recognize the pattern:
Leak indicators:
- Memory usage grows monotonically over time
- Restarts temporarily fix performance issues
- Memory usage after GC still trends upward
- Process is eventually OOM-killed by the OS or container runtime
- Response times degrade over a process's lifetime
Not necessarily a leak:
- Memory usage increases with traffic (expected — caches fill)
- Memory spikes during peak hours, returns to baseline at night
- High memory after a large batch job that caches results
The key distinction: a leak grows continuously without bound. Normal memory growth plateaus.
Monitoring for Memory Leaks
Process-Level Memory Tracking
// Node.js: track memory over time
const MEMORY_CHECK_INTERVAL = 30 * 1000; // 30 seconds
const MEMORY_ALERT_THRESHOLD = 500 * 1024 * 1024; // 500MB
setInterval(() => {
  const usage = process.memoryUsage();
  const metrics = {
    rss: Math.round(usage.rss / 1024 / 1024),                   // Resident Set Size (total)
    heapTotal: Math.round(usage.heapTotal / 1024 / 1024),       // Heap allocated
    heapUsed: Math.round(usage.heapUsed / 1024 / 1024),         // Heap in use
    external: Math.round(usage.external / 1024 / 1024),         // C++ objects
    arrayBuffers: Math.round(usage.arrayBuffers / 1024 / 1024), // ArrayBuffers
    uptimeMinutes: Math.round(process.uptime() / 60)
  };
  logger.info('memory_usage', metrics);
  if (usage.heapUsed > MEMORY_ALERT_THRESHOLD) {
    logger.warn('high_memory_usage', {
      heapUsedMB: metrics.heapUsed,
      threshold: MEMORY_ALERT_THRESHOLD / 1024 / 1024
    });
  }
}, MEMORY_CHECK_INTERVAL);
Alerting on Growth Rate
Absolute memory usage isn't as informative as the growth rate:
# Python: track memory with growth rate detection
import psutil
import time
from collections import deque
class MemoryLeakDetector:
    def __init__(self, window_minutes=60):
        self.window = deque(maxlen=window_minutes * 2)  # 2 samples/minute at a 30-second check interval
        self.process = psutil.Process()

    def check(self):
        current_mb = self.process.memory_info().rss / 1024 / 1024
        self.window.append((time.time(), current_mb))
        if len(self.window) < 10:
            return  # Not enough data yet

        # Calculate growth rate over the window
        oldest_time, oldest_mb = self.window[0]
        newest_time, newest_mb = self.window[-1]
        elapsed_minutes = (newest_time - oldest_time) / 60
        if elapsed_minutes > 0:
            growth_rate_per_hour = (newest_mb - oldest_mb) / elapsed_minutes * 60
            if growth_rate_per_hour > 50:  # Growing > 50MB/hour
                alert(f'Potential memory leak: +{growth_rate_per_hour:.1f}MB/hour')
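A minimal sketch of wiring the detector into a background thread, polling on the 30-second cadence the deque sizing assumes (the `alert()` call above stands in for whatever paging hook you already have):

# Sketch: poll the detector every 30 seconds from a daemon thread
import threading
import time

detector = MemoryLeakDetector(window_minutes=60)

def monitor_loop():
    while True:
        detector.check()
        time.sleep(30)  # Matches the deque sizing: 2 samples per minute

threading.Thread(target=monitor_loop, daemon=True).start()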
Container-Level Monitoring
# Kubernetes: set memory limits and track OOMKills
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: api
          resources:
            requests:
              memory: "256Mi"
            limits:
              memory: "512Mi"  # Container killed if it exceeds this
# Check for OOMKill events
kubectl get events --field-selector reason=OOMKilling
# Check container restart count (repeated restarts = likely leak)
kubectl get pods -o json | jq '
  .items[]
  | {
      name: .metadata.name,
      restarts: .status.containerStatuses[0].restartCount,
      lastState: .status.containerStatuses[0].lastState
    }
  | select(.restarts > 5)'
Diagnosing Node.js Memory Leaks
Heap Snapshots
A heap snapshot captures all objects in memory and their references at a point in time. Compare two snapshots (before and after a suspected leak) to find what's growing:
// Take heap snapshot via HTTP endpoint (internal only)
const v8 = require('v8');
const fs = require('fs');
app.get('/internal/heap-snapshot', (req, res) => {
  // Require internal auth token
  if (req.headers['x-internal-token'] !== process.env.INTERNAL_TOKEN) {
    return res.status(403).end();
  }
  const filename = `/tmp/heap-${Date.now()}.heapsnapshot`;
  // writeHeapSnapshot is synchronous and blocks the event loop while it writes;
  // it returns the path of the written file, not a stream
  const snapshotPath = v8.writeHeapSnapshot(filename);
  res.json({ file: snapshotPath, size: fs.statSync(snapshotPath).size });
});
# From a running production pod
kubectl exec -it api-pod-abc123 -- node -e "
const v8 = require('v8');
const f = v8.writeHeapSnapshot('/tmp/heap1.heapsnapshot');
console.log('Written to:', f);
"
# Download the snapshot
kubectl cp api-pod-abc123:/tmp/heap1.heapsnapshot ./heap1.heapsnapshot
# Open in Chrome DevTools:
# DevTools → Memory → Load → select heapsnapshot file
Take a snapshot, wait for the leak to grow, take another. Use the "Comparison" view in Chrome DevTools to see what objects grew.
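If you'd rather script the pair than take them by hand, a small sketch (the delay is a placeholder; size it to how fast your leak grows):

// Sketch: write two snapshots a fixed interval apart for DevTools comparison
const v8 = require('v8');

function captureSnapshotPair(delayMinutes = 30) {
  const before = v8.writeHeapSnapshot(`/tmp/heap-before-${Date.now()}.heapsnapshot`);
  console.log('First snapshot:', before);
  setTimeout(() => {
    const after = v8.writeHeapSnapshot(`/tmp/heap-after-${Date.now()}.heapsnapshot`);
    console.log('Second snapshot:', after, '- load both in DevTools and diff');
  }, delayMinutes * 60 * 1000);
}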
Common Node.js Leak Patterns
Unbounded in-memory caches:
// Leak: cache grows without limit
const cache = new Map();

function getUser(userId) {
  if (!cache.has(userId)) {
    cache.set(userId, db.query('SELECT * FROM users WHERE id = ?', [userId]));
  }
  return cache.get(userId);
}
// After 1 million unique users, the cache holds 1 million entries

// Fix: use an LRU cache with a size limit
const LRU = require('lru-cache');
const cache = new LRU({
  max: 1000,       // Maximum 1000 items
  ttl: 1000 * 300, // 5-minute TTL (lru-cache v7+ option name)
});
Event listener accumulation:
// Leak: listener added on every request but never removed
app.get('/stream', (req, res) => {
  // BUG: new listener added for every incoming request
  emitter.on('data', (data) => {
    res.write(data);
  });
  // When the request ends, the listener is never cleaned up.
  // After 1000 requests: 1000 listeners firing for every event
});

// Fix: clean up listeners when the connection closes
app.get('/stream', (req, res) => {
  const handler = (data) => res.write(data);
  emitter.on('data', handler);
  // Critical: remove the listener when the client disconnects
  req.on('close', () => {
    emitter.off('data', handler);
  });
});
Closures capturing large objects:
// Leak: closure captures the entire large object
function processLargeDataset(data) {
  const summary = { count: data.length, total: data.reduce((s, d) => s + d.value, 0) };
  // BUG: the timer closure captures 'data' (potentially hundreds of MB)
  setTimeout(() => {
    console.log('Processing complete for dataset size:', data.length);
    // 'data' is kept alive until this timer fires
  }, 60000);
  return summary;
}

// Fix: only capture what you need
function processLargeDataset(data) {
  const summary = { count: data.length, total: data.reduce((s, d) => s + d.value, 0) };
  const dataLength = data.length; // Capture only the primitive
  setTimeout(() => {
    console.log('Processing complete for dataset size:', dataLength);
    // 'data' is now eligible for GC
  }, 60000);
  return summary;
}
Diagnosing Python Memory Leaks
tracemalloc
Python's built-in memory tracing:
import tracemalloc
import linecache
def take_snapshot(label):
    """Print the top 10 memory allocations."""
    snapshot = tracemalloc.take_snapshot()
    stats = snapshot.statistics('lineno')
    print(f"\n=== Memory snapshot: {label} ===")
    for stat in stats[:10]:
        frame = stat.traceback[0]
        filename = frame.filename
        lineno = frame.lineno
        line = linecache.getline(filename, lineno).strip()
        print(f"{stat.size / 1024:.1f} KB: {filename}:{lineno}: {line}")

# Usage:
tracemalloc.start()
handle_requests_for_a_while()
take_snapshot('after_load_test')
tracemalloc.stop()
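tracemalloc can also diff two snapshots directly, mirroring the heap-snapshot comparison workflow from the Node.js section; a sketch reusing the same placeholder workload:

# Sketch: diff two tracemalloc snapshots to see which lines grew
tracemalloc.start()
before = tracemalloc.take_snapshot()
handle_requests_for_a_while()
after = tracemalloc.take_snapshot()

for stat in after.compare_to(before, 'lineno')[:10]:
    print(stat)  # e.g. "app/cache.py:42: size=12.0 MiB (+12.0 MiB), count=100000 (+100000)"

tracemalloc.stop()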
memory_profiler for Function-Level Analysis
from memory_profiler import profile
@profile
def process_orders(order_ids: list[int]):
    results = []
    for order_id in order_ids:
        # select_related only works for forward FK/one-to-one relations;
        # a reverse relation like 'items' needs prefetch_related
        order = (Order.objects.select_related('user')
                 .prefetch_related('items')
                 .get(id=order_id))
        results.append(transform_order(order))
    return results
# Run with:
# python -m memory_profiler my_script.py
# Output:
# Line # Mem usage Increment Line Contents
# ==============================================
# 5 45.2 MiB 45.2 MiB def process_orders(order_ids):
# 6 45.2 MiB 0.0 MiB results = []
# 7 45.2 MiB 0.0 MiB for order_id in order_ids:
# 8 182.7 MiB 137.5 MiB order = Order.objects...get(id=order_id)
# ← Memory grows here each iteration, not released
Django ORM Leak: Iterator vs All
# Leak: loads all 1M orders into memory at once
def export_orders():
    orders = Order.objects.filter(status='completed').all()
    for order in orders:  # All 1M orders already in RAM
        write_to_csv(order)

# Fix: use iterator() for memory-efficient streaming
def export_orders():
    orders = Order.objects.filter(status='completed').iterator(chunk_size=1000)
    for order in orders:  # Loads 1000 at a time, releases after processing
        write_to_csv(order)
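If the export only needs a few columns, combining `values_list()` with `iterator()` avoids building full model instances at all. A sketch, with hypothetical field names and a hypothetical CSV helper:

# Sketch: stream only the columns the CSV needs
def export_orders():
    rows = (Order.objects.filter(status='completed')
            .values_list('id', 'total', 'created_at')  # Hypothetical fields
            .iterator(chunk_size=1000))
    for row in rows:  # Plain tuples, not model instances
        write_row_to_csv(row)  # Hypothetical helper, analogous to write_to_csv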
Diagnosing Go Memory Leaks
// Go: expose pprof endpoint for production profiling
import (
    "log"
    "net/http"
    _ "net/http/pprof"
    "runtime/debug"
)

func main() {
    // Expose pprof on an internal-only port
    go func() {
        log.Println(http.ListenAndServe("127.0.0.1:6060", nil))
    }()

    // Make the GC run more aggressively (SetGCPercent lives in runtime/debug)
    debug.SetGCPercent(20)
    // Default is 100 (GC when the heap doubles)
    // Lower = more frequent GC, higher = fewer GC pauses

    startApplication()
}
# Capture heap profile
go tool pprof http://localhost:6060/debug/pprof/heap
(pprof) top20 # Show top allocators by memory
(pprof) list myFunc # Annotated source: bytes allocated per line
(pprof) web # Open the call graph in a browser (requires graphviz)
# Compare two heap profiles to find what's growing
go tool pprof -base heap1.pb.gz heap2.pb.gz
(pprof) top20 # Shows allocations that grew between snapshots
Common Go leak: goroutine leak
// Leak: goroutine started, never exits
func handleRequest(conn net.Conn) {
go func() {
// This goroutine blocks forever if channel is never closed
data := <-dataChannel
process(data)
}()
// If dataChannel is never closed, goroutine accumulates
}
# Check the goroutine count via the pprof endpoint
curl http://localhost:6060/debug/pprof/goroutine?debug=1
# A goroutine count that grows without bound = goroutine leak
// Fix: use context cancellation
func handleRequest(ctx context.Context, conn net.Conn) {
    go func() {
        select {
        case data := <-dataChannel:
            process(data)
        case <-ctx.Done():
            return // Goroutine exits when the request context is cancelled
        }
    }()
}
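To catch this class of leak before you reach for pprof, it's cheap to record the goroutine count on an interval. A minimal sketch; the log call stands in for whatever metrics client you use:

// Sketch: periodically record the goroutine count as a metric
import (
    "log"
    "runtime"
    "time"
)

func watchGoroutines() {
    for range time.Tick(30 * time.Second) {
        // A count that climbs steadily across samples indicates a goroutine leak
        log.Printf("goroutines=%d", runtime.NumGoroutine())
    }
}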
Prevention Practices
Code Review Checklist
Review these patterns for memory issues:
□ Caches: does every cache have a size limit and/or TTL?
□ Event listeners: is every addEventListener paired with removeEventListener?
□ Database results: is the result set size bounded? (LIMIT, pagination)
□ Background goroutines/threads: do they have exit conditions?
□ Long-lived objects: do they hold references to large data?
□ Connection pools: are connections returned after use?
□ File handles: are files/sockets closed in finally/defer blocks?
□ Timers: are recurring timers cleared when components unmount? (see the sketch below)
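For that last item, the pattern worth enforcing in review is symmetry between setup and teardown. A minimal Node.js sketch, assuming a component-style lifecycle:

// Sketch: pair every setInterval with a clearInterval on teardown
class Poller {
  start() {
    this.timer = setInterval(() => this.poll(), 5000);
  }
  stop() {
    clearInterval(this.timer); // Without this, the interval keeps the closure
    this.timer = null;         // (and everything it references) alive forever
  }
  poll() { /* fetch and process updates */ }
}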
Load Testing for Memory
Run load tests long enough to surface leaks:
# k6: sustained load test to surface memory leaks
k6 run --duration 1h --vus 50 \
--out influxdb=http://influxdb:8086/k6 \
load-test.js
# Monitor memory during test:
# kubectl top pods -w # Watch memory over the hour
# Alert if any pod grows > 200MB above baseline
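The load-test.js referenced above can be as small as this sketch (the endpoint and think time are placeholders):

// load-test.js: minimal k6 script for a sustained soak test
import http from 'k6/http';
import { check, sleep } from 'k6';

export default function () {
  const res = http.get('http://api.internal/healthz'); // Placeholder endpoint
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1); // 1s of think time per virtual-user iteration
}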
A 30-minute load test often reveals leaks that wouldn't appear in a 5-minute smoke test. If memory is still climbing at the end of the hour-long run, you have a leak.
Memory leaks punish you quietly and then suddenly. The process that's been slowly bloating for three days crashes at 3 AM on a Friday. Building memory tracking, alerting, and regular heap analysis into your workflow is the difference between finding leaks in development and finding them during incidents.
Building something that needs to scale? We help teams architect systems that grow with their business. scopeforged.com