Prometheus is an open-source monitoring system designed for reliability and scalability. It collects metrics by scraping HTTP endpoints, stores them in a time-series database, and provides a powerful query language for analysis and alerting. Understanding Prometheus architecture and best practices helps you build effective monitoring for your applications.
The pull-based model is fundamental to Prometheus. Instead of applications pushing metrics, Prometheus periodically scrapes metric endpoints. This simplifies application code, enables service discovery, and allows Prometheus to detect when targets are down.
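As a sketch of that model, a minimal scrape configuration (the job name and targets here are illustrative) tells Prometheus where and how often to pull:

```yaml
# prometheus.yml -- minimal illustrative example
global:
  scrape_interval: 15s   # how often Prometheus pulls each target
scrape_configs:
  - job_name: 'app'
    metrics_path: /metrics
    static_configs:
      - targets: ['app-1:9090']
```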
Metric Types
Prometheus supports four metric types, each suited for different measurements.
Counters track cumulative values that only increase. Request counts, error counts, and bytes transferred are counters. Query rate of change using rate() or increase().
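For example, with the request counter used throughout this article, rate() gives a per-second rate while increase() gives the total growth over the window:

```
# Per-second request rate, averaged over 5 minutes
rate(http_requests_total[5m])
# Total requests over the last hour
increase(http_requests_total[1h])
```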
// Exposing counter metrics in PHP (Laravel)
use Illuminate\Http\Response;
use Illuminate\Support\Facades\Cache;

class MetricsController extends Controller
{
    public function index(): Response
    {
        $metrics = [];

        // Counter: total requests
        $requestCount = Cache::get('metrics:request_count', 0);
        $metrics[] = "# HELP http_requests_total Total HTTP requests";
        $metrics[] = "# TYPE http_requests_total counter";
        $metrics[] = "http_requests_total{method=\"GET\",status=\"200\"} $requestCount";

        // The exposition format requires a trailing newline
        return response(implode("\n", $metrics) . "\n")
            ->header('Content-Type', 'text/plain; version=0.0.4');
    }
}
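A scrape of this endpoint returns the plain-text exposition format; the sample value below is illustrative:

```
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1027
```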
Gauges track values that can go up or down. Queue depth, active connections, and temperature are gauges. Query current values directly.
// Gauge: current queue depth
$queueDepth = Queue::size('default');
$metrics[] = "# HELP queue_depth Current jobs in queue";
$metrics[] = "# TYPE queue_depth gauge";
$metrics[] = "queue_depth{queue=\"default\"} $queueDepth";
Histograms track the distribution of values. Request latency and response sizes benefit from histograms. They provide count, sum, and configurable buckets for calculating percentiles.
// Histogram: request duration
$buckets = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10];
$metrics[] = "# HELP http_request_duration_seconds Request duration";
$metrics[] = "# TYPE http_request_duration_seconds histogram";
foreach ($buckets as $bucket) {
    $count = Cache::get("metrics:duration_bucket:$bucket", 0);
    $metrics[] = "http_request_duration_seconds_bucket{le=\"$bucket\"} $count";
}
$metrics[] = "http_request_duration_seconds_bucket{le=\"+Inf\"} " . Cache::get('metrics:duration_count', 0);
$metrics[] = "http_request_duration_seconds_sum " . Cache::get('metrics:duration_sum', 0);
$metrics[] = "http_request_duration_seconds_count " . Cache::get('metrics:duration_count', 0);
Summaries are similar to histograms but calculate quantiles on the client side. They're less flexible but more accurate for specific quantiles. Generally prefer histograms for new implementations.
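To see why cumulative buckets are enough for percentile estimates, here is a sketch in Python (with made-up bucket counts) of the linear interpolation that PromQL's histogram_quantile() performs:

```python
import math

def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative histogram buckets.

    buckets: list of (le, cumulative_count) sorted by le, ending with
    (math.inf, total) -- mirroring the le="+Inf" bucket. Interpolates
    linearly inside the bucket containing the target rank, like PromQL.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_le, prev_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # Rank falls in the +Inf bucket: return the last finite bound
                return prev_le
            # Linear interpolation within this bucket
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count

# Example: 50 observations <= 0.1s, 40 more <= 0.5s, 10 more <= 1s
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.5, buckets))   # 0.1
print(histogram_quantile(0.95, buckets))  # 0.75
```

The estimate's accuracy depends on bucket boundaries, which is why choosing buckets around your latency objectives matters.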
Instrumenting Applications
Good instrumentation captures what's happening inside your application. The RED method covers key metrics: Rate, Errors, and Duration.
use Closure;
use Illuminate\Http\Request;
use Illuminate\Http\Response;
use Illuminate\Support\Facades\Cache;

class MetricsMiddleware
{
    public function handle(Request $request, Closure $next): Response
    {
        $start = microtime(true);
        $response = $next($request);
        $duration = microtime(true) - $start;

        $method = $request->method();
        $status = $response->status();
        $path = $this->normalizePath($request->path());

        // Increment request counter
        $this->incrementCounter("requests:$method:$path:$status");

        // Record duration in histogram buckets
        $this->recordHistogram("duration:$method:$path", $duration);

        return $response;
    }

    private function normalizePath(string $path): string
    {
        // Replace numeric IDs with placeholders to control label cardinality
        return preg_replace('/\/\d+/', '/:id', $path);
    }

    private function incrementCounter(string $key): void
    {
        Cache::increment("metrics:$key");
    }

    private function recordHistogram(string $key, float $value): void
    {
        Cache::increment("metrics:$key:count");
        // Caution: float increments depend on the cache driver
        // (Redis INCRBY is integer-only); consider storing milliseconds
        Cache::increment("metrics:$key:sum", $value);

        // Buckets are cumulative: increment every bucket the value fits in
        $buckets = [0.01, 0.05, 0.1, 0.5, 1, 5];
        foreach ($buckets as $bucket) {
            if ($value <= $bucket) {
                Cache::increment("metrics:$key:bucket:$bucket");
            }
        }
    }
}
For production PHP applications, use dedicated libraries like promphp/prometheus_client_php that handle metric storage and exposition efficiently.
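As a sketch of what that looks like with promphp/prometheus_client_php (check the README of the version you install for the exact API; the APC storage adapter here is one of several):

```php
use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;
use Prometheus\Storage\APC;

$registry = new CollectorRegistry(new APC());

// Counter with labels
$counter = $registry->getOrRegisterCounter(
    'app', 'http_requests_total', 'Total HTTP requests', ['method', 'status']
);
$counter->inc(['GET', '200']);

// Histogram with explicit buckets
$histogram = $registry->getOrRegisterHistogram(
    'app', 'http_request_duration_seconds', 'Request duration',
    ['method'], [0.01, 0.05, 0.1, 0.5, 1, 5]
);
$histogram->observe(0.23, ['GET']);

// Render the /metrics response body
$renderer = new RenderTextFormat();
echo $renderer->render($registry->getMetricFamilySamples());
```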
PromQL Fundamentals
PromQL (Prometheus Query Language) retrieves and transforms metrics. Understanding PromQL is essential for dashboards and alerts.
Instant vectors return the most recent value for each time series:
# All HTTP request counters
http_requests_total
# Filter by label
http_requests_total{status="500"}
# Regex matching
http_requests_total{path=~"/api/.*"}
Range vectors return values over a time range, used with functions:
# Request rate over 5 minutes
rate(http_requests_total[5m])
# Error rate as percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m])) * 100
Aggregation operators combine time series:
# Total requests per status code
sum by (status) (rate(http_requests_total[5m]))
# 99th percentile latency
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
# Average CPU usage across instances (rate() first, since the metric is a counter)
avg without (instance) (rate(process_cpu_seconds_total[5m]))
Alerting Rules
Prometheus alerting rules define conditions that trigger alerts. Alertmanager handles routing, grouping, and notification.
# prometheus/rules/application.yml
groups:
  - name: application
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }}"

      # Slow responses
      - alert: SlowResponses
        expr: |
          histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "95th percentile latency above 1s"
          description: "P95 latency is {{ $value | humanizeDuration }}"

      # Service down
      - alert: ServiceDown
        expr: up{job="api"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.instance }} is down"
Alertmanager routes alerts to appropriate channels:
# alertmanager.yml
route:
  receiver: 'default'
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
    - match:
        severity: warning
      receiver: 'slack'

receivers:
  - name: 'default'
    email_configs:
      - to: 'ops@example.com'
  - name: 'pagerduty'
    pagerduty_configs:
      - service_key: '<pagerduty-key>'
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
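Both configurations can be validated before deployment; promtool ships with Prometheus and amtool with Alertmanager:

```
promtool check rules prometheus/rules/application.yml
amtool check-config alertmanager.yml
```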
Service Discovery
Prometheus discovers scrape targets dynamically. Static configuration works for small deployments, but dynamic discovery scales better.
# prometheus.yml
scrape_configs:
  # Static targets
  - job_name: 'api'
    static_configs:
      - targets: ['api-1:9090', 'api-2:9090']

  # Kubernetes service discovery
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods with the scrape annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use custom port if specified: combine host with annotated port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__

  # EC2 service discovery
  - job_name: 'ec2'
    ec2_sd_configs:
      - region: us-east-1
        port: 9090
    relabel_configs:
      - source_labels: [__meta_ec2_tag_Environment]
        target_label: environment
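On the Kubernetes side, a pod opts in through the annotations those relabel rules read (the prometheus.io/* names are a widely used convention, not built-in; they must match your relabel config):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
spec:
  containers:
    - name: api
      image: example/api:1.0   # illustrative image
      ports:
        - containerPort: 9090
```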
Recording Rules
Recording rules precompute expensive queries, improving dashboard performance and enabling alerts on complex expressions.
groups:
  - name: aggregations
    rules:
      # Precompute request rate by service
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))

      # Precompute error ratio
      - record: job:http_errors:ratio5m
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (job) (rate(http_requests_total[5m]))

      # Precompute latency percentiles
      - record: job:http_latency:p99
        expr: |
          histogram_quantile(0.99, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))
Use recording rules when queries take too long for dashboards, when the same expensive query appears in multiple places, or when you need to alert on aggregated data.
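Recorded series behave like any other metric, so alert expressions simplify; for example, the error-rate alert from earlier can reference the precomputed ratio:

```yaml
- alert: HighErrorRate
  expr: job:http_errors:ratio5m > 0.05
  for: 5m
  labels:
    severity: critical
```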
High Availability
Single Prometheus instances are single points of failure. For production, run multiple Prometheus instances scraping the same targets.
# Run two identical Prometheus instances
# Both scrape the same targets and evaluate the same rules
# Use external labels to distinguish them
global:
  external_labels:
    prometheus_replica: 'prometheus-1'
Thanos or Cortex provide long-term storage and global query across multiple Prometheus instances:
# Thanos sidecar uploads blocks to object storage
- name: thanos-sidecar
  image: quay.io/thanos/thanos:latest   # pin a specific version in production
  args:
    - sidecar
    - --prometheus.url=http://localhost:9090
    - --objstore.config-file=/etc/thanos/bucket.yml
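Thanos Query can then deduplicate the identical series scraped by both replicas using that external label (flag names are from memory of recent Thanos releases; older versions use --store instead of --endpoint):

```
thanos query \
  --query.replica-label=prometheus_replica \
  --endpoint=sidecar-1:10901 \
  --endpoint=sidecar-2:10901
```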
Conclusion
Prometheus provides a robust foundation for application monitoring. The pull-based model and powerful query language enable flexible metrics collection and analysis. Proper instrumentation using the RED method captures application behavior. Alerting rules and Alertmanager provide reliable notification.
Start with the basic metrics: request rate, error rate, and latency. Add application-specific metrics as you identify what's important to monitor. Use recording rules for expensive queries and dashboards. Run multiple instances for high availability. Prometheus scales with your monitoring needs.