Metrics Cardinality Explosion: How to Monitor Without Blowing Up Your Bill

Philip Rehberger · Mar 17, 2026 · 6 min read

Adding a user ID label to a Prometheus metric seems harmless. With 100,000 users, it creates 100,000 unique time series. This is cardinality explosion, and it will crash your monitoring stack.


What Is Cardinality?

In the context of metrics, cardinality refers to the number of unique label combinations for a given metric. Every unique combination of label values creates a separate time series in your monitoring system.

A metric with no labels has cardinality 1. A metric with an environment label that has two values (production, staging) has cardinality 2. A metric with environment (2 values) and method (5 values) and status_code (10 values) has cardinality 2 * 5 * 10 = 100.

High cardinality is not inherently bad. The problem is unbounded cardinality: when a label's value set grows without limit.

The Explosion Scenario

Here's how it happens. An engineer wants to track API requests by user to understand which users are hammering the API:

// DANGEROUS: user_id can have millions of unique values
$metrics->increment('api_requests_total', [
    'method'  => $request->method(),
    'path'    => $request->path(),
    'user_id' => $request->user()?->id,  // Unbounded!
]);

In Prometheus, this creates a new time series for every unique combination of method, path, and user ID. With 100,000 users each hitting 10 different endpoints, that's 1,000,000 time series for this single metric. Prometheus keeps an in-memory entry for every active series, so this will either crash your Prometheus instance or drive your managed monitoring bill into four figures per month.

Labels that are always unbounded:

  • User IDs
  • Request IDs or trace IDs
  • IP addresses
  • Email addresses
  • Any database row ID
  • Full URL paths (with query parameters)

How to Find High-Cardinality Metrics

In Prometheus, query for time series counts:

# Find metrics with the most time series
topk(10, count by (__name__)({__name__=~".+"}))

This shows your top 10 highest-cardinality metrics. Any metric with more than 10,000 time series deserves investigation.

For individual metrics:

# How many time series does this metric have?
count(http_requests_total)

# How many unique values does the user_id label have?
count(count by (user_id)(http_requests_total))
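
Prometheus also exposes cardinality statistics through its TSDB status API (visible in the web UI under Status > TSDB Stats). A quick way to pull them, assuming Prometheus on its default port and jq installed:

# Top metric names by series count, straight from the TSDB head
curl -s http://localhost:9090/api/v1/status/tsdb \
  | jq '.data.seriesCountByMetricName'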

Fixing Cardinality Problems

Drop the High-Cardinality Label

The simplest fix: if you don't query by user ID at the metrics level, remove it.

// Safe: bounded label values
$metrics->increment('api_requests_total', [
    'method'      => $request->method(),
    'endpoint'    => $this->normalizedPath($request->path()),
    'status_code' => $response->status(),
]);

If you need per-user analysis, logs and distributed traces are better tools. Query your log aggregation system for "all requests by user 4821." Metrics are for aggregate analysis; logs and traces are for individual analysis.
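
In Laravel, for example, a minimal sketch of moving that per-user detail into structured logs instead (assuming a JSON log driver; the event name is illustrative):

// Per-user detail goes into log context, not metric labels. The log
// backend indexes these fields per document, so filtering by user_id
// costs nothing in time-series cardinality.
Log::info('api.request', [
    'user_id' => $request->user()?->id,
    'method'  => $request->method(),
    'path'    => $request->path(),  // the raw path is fine in logs
]);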

Normalize Paths

URL paths often have high cardinality due to IDs embedded in them:

/api/invoices/1
/api/invoices/2
/api/invoices/3
...
/api/invoices/100000

Normalize to route templates:

private function normalizedPath(string $path): string
{
    // Replace numeric IDs with an {id} placeholder
    $path = preg_replace('/\/\d+/', '/{id}', $path);

    // UUIDs slip past the numeric pattern and need their own rule
    return preg_replace('/\/[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}/i', '/{uuid}', $path);
}

// /api/invoices/4821 becomes /api/invoices/{id}
// /api/clients/99/projects/7 becomes /api/clients/{id}/projects/{id}

Laravel's router knows the route name, which is always bounded:

$routeName = Route::currentRouteName() ?? 'unknown';
// 'admin.invoices.show' - always a fixed set of values
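
Tying this together, a sketch of recording the metric in route middleware, reusing the hypothetical $metrics client from the earlier examples:

public function handle(Request $request, Closure $next)
{
    $response = $next($request);

    // Every label here comes from a bounded set
    $this->metrics->increment('api_requests_total', [
        'method'      => $request->method(),
        'endpoint'    => Route::currentRouteName() ?? 'unknown',
        'status_code' => $response->getStatusCode(),
    ]);

    return $response;
}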

Bucket High-Cardinality Values

Sometimes you want some granularity without full cardinality. Bucket values into meaningful ranges:

private function invoiceSizeBucket(int $lineItemCount): string
{
    return match (true) {
        $lineItemCount <= 5   => 'small',
        $lineItemCount <= 20  => 'medium',
        $lineItemCount <= 100 => 'large',
        default              => 'xlarge',
    };
}

$metrics->increment('invoice_generated_total', [
    'size_bucket' => $this->invoiceSizeBucket($invoice->lineItems->count()),
]);

Now instead of potentially thousands of unique line_item_count values, you have four: small, medium, large, xlarge.

Use Recording Rules to Pre-Aggregate

If you need high-cardinality data for analysis but not for alerting, aggregate it with recording rules before it reaches long-term storage:

groups:
  - name: aggregation
    interval: 1m
    rules:
      # Aggregate per-endpoint metrics (high cardinality) into
      # per-service totals (low cardinality) for long-term storage
      - record: service:http_requests:rate5m
        expr: |
          sum by (service, status_class) (
            label_replace(
              rate(http_requests_total[5m]),
              "status_class",
              "${1}xx",
              "status_code",
              "([0-9]).*"
            )
          )

Keep the high-cardinality raw metric for a short window (for debugging) and the pre-aggregated series for 30 days or more (for trend analysis). Note that a single Prometheus applies one retention window to all data, so in practice this split means remote-writing or federating only the recorded series to long-term storage.
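
The recorded series then answers long-range questions cheaply. For example, a 30-day 5xx trend per service, using the names from the rule above:

# Served entirely by the low-cardinality recorded series,
# never touching the raw per-endpoint metric
sum by (service) (service:http_requests:rate5m{status_class="5xx"})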

Prometheus Cardinality Limits

Configure Prometheus to reject scrapes that blow past sample and label limits before they cause problems:

# prometheus.yml
scrape_configs:
  - job_name: 'laravel-app'
    scrape_interval: 15s
    # Fail the entire scrape if the target exposes more than
    # 10,000 samples, so one bad deploy can't flood the TSDB
    sample_limit: 10000
    # Fail the scrape if any sample carries more than 30 labels
    label_limit: 30

# Cap total disk usage with the --storage.tsdb.retention.size flag

For fleet-wide control, the Prometheus Operator can enforce these limits across every scrape target (enforcedSampleLimit, enforcedLabelLimit), or you can drop high-cardinality labels at scrape time with relabeling. Relabeling is a safety net rather than a fix: series that become identical once a label is dropped collide at ingestion, so the label should still be removed at the source:

scrape_configs:
  - job_name: 'laravel-app'
    metric_relabel_configs:
      # Drop user_id label before storing
      - action: labeldrop
        regex: user_id
      # Drop entire metric if it has an IP label
      - source_labels: [client_ip]
        action: drop
        regex: '.+'

High-Cardinality Observability with Other Tools

Some observability questions genuinely need high cardinality. "Which specific users are seeing errors?" "Which exact request IDs are slow?" These are legitimate questions that metrics aren't suited for.

Use the right tool:

Structured logs for per-user, per-request analysis. Your log aggregation system (Datadog, CloudWatch, Loki) can filter by user_id because logs are stored as individual documents, not as time series.

Distributed traces for per-request latency breakdown. Traces carry arbitrary attributes without cardinality concerns because they're stored as individual trace records.
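
For example, with the OpenTelemetry PHP API (a sketch assuming a configured tracer):

use OpenTelemetry\API\Trace\Span;

// Attributes live on individual span records, not in a
// time-series index, so unbounded values are safe here
$span = Span::getCurrent();
$span->setAttribute('user.id', (string) $request->user()?->id);
$span->setAttribute('invoice.id', (string) $invoice->id);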

Honeycomb is a specialized observability tool designed for high-cardinality analytics. Unlike Prometheus, it stores every event and lets you group and filter by any combination of fields. It's expensive but powerful for teams that need this kind of high-cardinality analysis.

Designing Metrics for Low Cardinality

When designing new metrics, follow these guidelines:

Enumerate allowed label values. Before adding a label, list every possible value. If you can't list them all, the label is high-cardinality.

// Good: finite, enumerable set
'status' => 'success|failure|timeout'
'tier'   => 'starter|professional|enterprise'
'method' => 'GET|POST|PUT|DELETE|PATCH'

// Bad: unbounded
'user_id'    => $user->id
'invoice_id' => $invoice->id
'ip_address' => $request->ip()

Aim for under 100 unique time series per metric. Most well-designed metrics have 5-50 time series. 100+ should trigger a review. 1000+ is a problem.

Use histograms instead of per-request metrics. Instead of tracking each request's latency as a separate event, use a histogram that pre-buckets values:

$metrics->histogram('http_request_duration_seconds', $duration, [
    'method'   => $request->method(),
    'endpoint' => Route::currentRouteName(),
]);

Histograms aggregate observations into predefined buckets, giving you percentile queries without per-request cardinality. Keep histogram labels especially lean, though: each bucket is its own time series, so a histogram with 12 buckets multiplies every label combination by 12 (plus the _sum and _count series).
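
On the Prometheus side, percentiles then come straight from the bucket counters:

# p95 latency per endpoint, computed from pre-bucketed data
histogram_quantile(
  0.95,
  sum by (le, endpoint) (rate(http_request_duration_seconds_bucket[5m]))
)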

Monitoring Your Monitoring

Prometheus exposes its own metrics. Alert on them:

- alert: PrometheusHighCardinality
  expr: prometheus_tsdb_head_series > 1000000
  for: 5m
  annotations:
    summary: "Prometheus has over 1M time series - investigate cardinality"

- alert: PrometheusStorageHigh
  expr: |
    (
      prometheus_tsdb_storage_blocks_bytes
      / prometheus_tsdb_storage_blocks_bytes offset 1d
    ) > 1.5
  for: 1h
  annotations:
    summary: "Prometheus storage grew 50% in 24 hours - cardinality explosion?"

Catching cardinality growth early, before it crashes your monitoring stack, is much better than discovering it during an incident when your dashboards are down.

Practical Takeaways

  • Every unique label combination creates a separate time series; cardinality is the product of every label's value count, so a single unbounded label blows up the total
  • Common offenders: user IDs, request IDs, full URL paths with parameters, IP addresses
  • Fix high cardinality by dropping the label, normalizing paths to route templates, or bucketing values
  • Use recording rules to pre-aggregate high-cardinality metrics before long-term storage
  • Use logs and traces for per-user and per-request analysis; use metrics for aggregate analysis
  • Alert on Prometheus time series count to catch cardinality explosions before they cause outages

Need help building reliable systems? We help teams architect software that scales. scopeforged.com
