Distributed tracing follows requests as they travel through microservices, revealing latency bottlenecks and failure points. OpenTelemetry has become the standard for implementing observability, unifying tracing, metrics, and logs.
Why Distributed Tracing?
The Visibility Problem
Consider a typical microservices request flow. When something goes wrong or runs slowly, finding the root cause becomes a guessing game without proper instrumentation.
This diagram shows a typical request flowing through multiple services. Without tracing, you have no way to know which service is causing the three-second latency.
User request → API Gateway → Auth Service → User Service → Database
                           → Order Service → Inventory Service → Database
                           → Payment Service → External API
Where did the 3-second latency come from?
Which service failed?
Why did this specific request fail when others succeeded?
Without tracing, debugging distributed systems is guesswork.
What Tracing Provides
Tracing gives you a hierarchical view of exactly where time is spent. Each operation is captured as a span, and you can see parent-child relationships that reveal the actual execution path.
The trace below shows the same request broken down into spans, with timing for each operation.
Trace ID: abc123
├── Span: API Gateway (2880ms)
│   ├── Span: Auth Service (20ms)
│   └── Span: Order Service (2800ms) ← BOTTLENECK
│       ├── Span: Inventory Service (100ms)
│       └── Span: Database Query (2600ms) ← ROOT CAUSE
└── Span: Response (10ms)
This visualization immediately shows that a slow database query in the Order Service is causing the latency, not network issues or other services.
OpenTelemetry Concepts
Traces, Spans, and Context
Understanding the relationship between traces and spans is fundamental. A trace represents the complete journey of a request, while spans are the individual operations within that journey.
The following code demonstrates how to create and configure spans in OpenTelemetry. You'll set the span kind, add attributes for context, and establish parent-child relationships between spans.
use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\Context\Context;

// A trace represents the entire request journey (trace_id: abc123).
// Spans are individual operations within the trace.
$span = $tracer->spanBuilder('process-order')
    ->setSpanKind(SpanKind::KIND_SERVER)
    ->startSpan();

// Spans have timing, status, and attributes
$span->setAttribute('order.id', $orderId);
$span->setAttribute('order.total', $total);

// Spans can be nested (parent-child relationship)
$childSpan = $tracer->spanBuilder('validate-inventory')
    ->setParent($span->storeInContext(Context::getCurrent()))
    ->startSpan();

// End spans when the work completes so they are exported
$childSpan->end();
$span->end();
Notice how the child span explicitly references its parent. This creates the hierarchical structure you see in trace visualizations.
Context Propagation
For tracing to work across service boundaries, context must travel with requests. This is how trace IDs and span IDs get passed between services.
When making HTTP calls between services, you need to propagate the trace context in headers. This code shows both the sending and receiving sides of context propagation.
use OpenTelemetry\API\Trace\Propagation\TraceContextPropagator;

// Context carries trace information across service boundaries;
// the traceparent and tracestate HTTP headers are the carrier.

// Outgoing request (sender): inject the current context into the headers
$headers = [];
$propagator = TraceContextPropagator::getInstance();
$propagator->inject($headers);
$request = new Request('GET', 'http://inventory-service/check', $headers);

// Incoming request (receiver): extract the context from the incoming headers
$context = $propagator->extract($request->getHeaders());
$span = $tracer->spanBuilder('handle-request')
    ->setParent($context)
    ->startSpan();
The W3C Trace Context format is the standard for propagating trace context via HTTP headers. OpenTelemetry handles this automatically when you use its HTTP client instrumentation.
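For reference, a propagated request carries a traceparent header (and optionally tracestate) shaped like this; the trace and span IDs below are the example values from the W3C specification:

traceparent: {version}-{trace-id}-{parent-span-id}-{trace-flags}
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01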
Laravel Integration
Installation
Getting started with OpenTelemetry in Laravel requires installing the SDK and an exporter. OTLP (OpenTelemetry Protocol) is the recommended export format.
Run these composer commands to install the necessary packages for OpenTelemetry in your Laravel application.
composer require open-telemetry/sdk
composer require open-telemetry/exporter-otlp
Service Provider
You'll want to configure OpenTelemetry as a singleton in your service container. This ensures consistent tracer configuration throughout your application.
Create this service provider to configure the OpenTelemetry tracer. The singleton ensures all parts of your application use the same configured tracer instance.
// app/Providers/TracingServiceProvider.php
namespace App\Providers;

use Illuminate\Support\ServiceProvider;
use OpenTelemetry\API\Trace\TracerInterface;
use OpenTelemetry\Contrib\Otlp\OtlpHttpTransportFactory;
use OpenTelemetry\Contrib\Otlp\SpanExporter;
use OpenTelemetry\SDK\Trace\SpanProcessor\SimpleSpanProcessor;
use OpenTelemetry\SDK\Trace\TracerProvider;

class TracingServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        $this->app->singleton(TracerInterface::class, function () {
            $exporter = new SpanExporter(
                (new OtlpHttpTransportFactory())
                    ->create(config('tracing.endpoint'), 'application/json')
            );

            $tracerProvider = new TracerProvider(
                new SimpleSpanProcessor($exporter)
            );

            // Tracer name and version identify this service in the backend
            return $tracerProvider->getTracer(
                config('app.name'),
                config('app.version')
            );
        });
    }
}
For production, consider using BatchSpanProcessor instead of SimpleSpanProcessor to reduce the performance impact of exporting spans.
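As a rough sketch of that swap (the exact constructor arguments depend on the SDK version you install, so treat this as an outline rather than a drop-in replacement):

use OpenTelemetry\SDK\Common\Time\ClockFactory;
use OpenTelemetry\SDK\Trace\SpanProcessor\BatchSpanProcessor;
use OpenTelemetry\SDK\Trace\TracerProvider;

// Spans are queued and exported in batches instead of one HTTP call per span
$tracerProvider = new TracerProvider(
    new BatchSpanProcessor($exporter, ClockFactory::getDefault())
);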
Middleware for HTTP Tracing
This middleware wraps every incoming HTTP request in a span, capturing timing, status codes, and any errors that occur.
Add this middleware to automatically trace all HTTP requests. It extracts incoming context, creates a span for the request, and records the response status.
// app/Http/Middleware/TracingMiddleware.php
namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use OpenTelemetry\API\Trace\Propagation\TraceContextPropagator;
use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\API\Trace\StatusCode;
use OpenTelemetry\API\Trace\TracerInterface;
use OpenTelemetry\Context\ContextInterface;
use Throwable;

class TracingMiddleware
{
    public function __construct(private TracerInterface $tracer) {}

    public function handle(Request $request, Closure $next)
    {
        // Extract context from incoming headers (traceparent/tracestate)
        $context = $this->extractContext($request);

        // Start a span for this request
        $span = $this->tracer->spanBuilder($request->method() . ' ' . $request->path())
            ->setParent($context)
            ->setSpanKind(SpanKind::KIND_SERVER)
            ->startSpan();

        // Add request attributes
        $span->setAttribute('http.method', $request->method());
        $span->setAttribute('http.url', $request->fullUrl());
        $span->setAttribute('http.user_agent', $request->userAgent());

        // Activate the span so spans created further down become its children
        $scope = $span->activate();

        try {
            $response = $next($request);

            $span->setAttribute('http.status_code', $response->status());
            if ($response->status() >= 400) {
                $span->setStatus(StatusCode::STATUS_ERROR);
            }

            return $response;
        } catch (Throwable $e) {
            $span->recordException($e);
            $span->setStatus(StatusCode::STATUS_ERROR, $e->getMessage());
            throw $e;
        } finally {
            $span->end();
            $scope->detach();
        }
    }

    private function extractContext(Request $request): ContextInterface
    {
        // Pull the W3C trace context out of the incoming HTTP headers
        return TraceContextPropagator::getInstance()->extract($request->headers->all());
    }
}
The finally block is critical here. It ensures spans are properly closed even when exceptions occur, preventing memory leaks and incomplete traces.
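You also need to register the middleware so it actually runs. A sketch for the classic HTTP kernel (in Laravel 11 and later the equivalent registration lives in bootstrap/app.php):

// app/Http/Kernel.php
protected $middleware = [
    // ...existing global middleware
    \App\Http\Middleware\TracingMiddleware::class,
];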
Database Query Tracing
Laravel's query events make it easy to trace every database query. This helps identify N+1 problems and slow queries in production.
Register this listener in your AppServiceProvider to automatically create spans for every database query. You'll see query timing and the actual SQL in your traces.
// app/Providers/AppServiceProvider.php
use Illuminate\Database\Events\QueryExecuted;
use Illuminate\Support\Facades\DB;
use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\API\Trace\TracerInterface;

public function boot(): void
{
    DB::listen(function (QueryExecuted $query) {
        $tracer = app(TracerInterface::class);

        // The listener fires after the query ran, so backdate the span start
        // by the reported duration to give the span a meaningful length
        $startNanos = (int) ((microtime(true) - $query->time / 1000) * 1e9);

        $span = $tracer->spanBuilder('db.query')
            ->setSpanKind(SpanKind::KIND_CLIENT)
            ->setStartTimestamp($startNanos)
            ->startSpan();

        $span->setAttribute('db.system', $query->connectionName);
        $span->setAttribute('db.statement', $query->sql);
        $span->setAttribute('db.duration_ms', $query->time);

        $span->end();
    });
}
Be careful about logging the full SQL statement in production if your queries might contain sensitive data. Consider truncating or sanitizing the statement.
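A minimal way to do that is to cap the statement length before attaching it; the 500-character limit below is an arbitrary choice:

// Truncate the SQL before recording it; adjust the limit to taste
$span->setAttribute('db.statement', mb_substr($query->sql, 0, 500));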
HTTP Client Tracing
When your application calls external APIs, you'll want to trace those requests too. This wrapper ensures outgoing HTTP calls are visible in your traces.
Use this traced HTTP client wrapper for all external API calls. It creates spans for outgoing requests and injects trace context so downstream services can continue the trace.
// app/Services/TracedHttpClient.php
namespace App\Services;

use GuzzleHttp\Client;
use OpenTelemetry\API\Trace\Propagation\TraceContextPropagator;
use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\API\Trace\StatusCode;
use OpenTelemetry\API\Trace\TracerInterface;
use Psr\Http\Message\ResponseInterface;
use Throwable;

class TracedHttpClient
{
    public function __construct(
        private Client $client,
        private TracerInterface $tracer
    ) {}

    public function request(string $method, string $url, array $options = []): ResponseInterface
    {
        $span = $this->tracer->spanBuilder("HTTP $method")
            ->setSpanKind(SpanKind::KIND_CLIENT)
            ->startSpan();
        $scope = $span->activate();

        $span->setAttribute('http.method', $method);
        $span->setAttribute('http.url', $url);

        // Inject trace context into outgoing headers so the downstream
        // service continues this trace under the client span
        $options['headers'] = $options['headers'] ?? [];
        TraceContextPropagator::getInstance()->inject($options['headers']);

        try {
            $response = $this->client->request($method, $url, $options);
            $span->setAttribute('http.status_code', $response->getStatusCode());
            return $response;
        } catch (Throwable $e) {
            $span->recordException($e);
            $span->setStatus(StatusCode::STATUS_ERROR);
            throw $e;
        } finally {
            $scope->detach();
            $span->end();
        }
    }
}
The context injection is what enables distributed tracing across service boundaries. Without it, you would only see traces within a single service.
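Usage is a drop-in replacement for a plain Guzzle client; for example, constructing the wrapper with a fresh client and the tracer registered earlier:

$client = new TracedHttpClient(new \GuzzleHttp\Client(), app(TracerInterface::class));

// The downstream service receives a traceparent header and continues the trace
$response = $client->request('GET', 'http://inventory-service/check');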
Visualization with Jaeger
Docker Setup
Jaeger is a popular open-source tracing backend that provides a web UI for exploring traces. This Docker Compose configuration gets you started quickly for local development.
Run this Docker Compose configuration to start Jaeger locally. You'll access the UI at port 16686 and send traces to port 4318.
# docker-compose.yml
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"   # UI
      - "4318:4318"     # OTLP HTTP
    environment:
      - COLLECTOR_OTLP_ENABLED=true
Configuration
Point your application at the Jaeger collector, and traces will start flowing immediately.
Add this configuration file to connect your Laravel application to Jaeger. The sample rate controls what percentage of traces are collected.
// config/tracing.php
return [
    'enabled' => env('TRACING_ENABLED', true),
    'endpoint' => env('OTEL_EXPORTER_OTLP_ENDPOINT', 'http://localhost:4318/v1/traces'),
    'service_name' => env('OTEL_SERVICE_NAME', config('app.name')),
    'sample_rate' => env('OTEL_SAMPLE_RATE', 1.0),
];
In production, you'll likely want to set the sample rate below 1.0 to reduce storage costs while still capturing representative traces.
Sampling Strategies
Head-Based Sampling
Head-based sampling decides whether to record a trace at the very beginning, before any processing happens. This is simple and efficient but can miss important traces.
Use head-based sampling when you want simple, predictable sampling. This example samples 10% of all traces.
// Decide at trace start whether to sample
use OpenTelemetry\SDK\Trace\Sampler\TraceIdRatioBasedSampler;
use OpenTelemetry\SDK\Trace\TracerProvider;

$sampler = new TraceIdRatioBasedSampler(0.1); // keep 10% of traces

$tracerProvider = new TracerProvider(
    [$processor],  // span processors
    $sampler
);
Tail-Based Sampling
Tail-based sampling waits until a trace is complete before deciding whether to keep it. This lets you prioritize interesting traces like errors or slow requests.
Tail-based sampling is configured in the OpenTelemetry Collector, not in your application. This configuration keeps all error traces and traces longer than one second.
# Decide after the trace completes, based on its characteristics.
# Requires an OpenTelemetry Collector that buffers traces.
# otel-collector-config.yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: slow-traces
        type: latency
        latency:
          threshold_ms: 1000
Tail-based sampling requires an OpenTelemetry Collector to buffer traces before the sampling decision. This adds infrastructure complexity but provides much better sampling quality.
Custom Sampling
When you need fine-grained control over which traces to keep, implement a custom sampler. This example always samples errors and specific critical endpoints.
This custom sampler demonstrates business-aware sampling. You can ensure critical paths like checkout are always traced while sampling other traffic at a lower rate.
use OpenTelemetry\Context\ContextInterface;
use OpenTelemetry\SDK\Common\Attribute\AttributesInterface;
use OpenTelemetry\SDK\Trace\Sampler\SamplerInterface;
use OpenTelemetry\SDK\Trace\SamplingResult;

class PrioritySampler implements SamplerInterface
{
    public function shouldSample(
        ContextInterface $parentContext,
        string $traceId,
        string $spanName,
        int $spanKind,
        AttributesInterface $attributes,
        array $links
    ): SamplingResult {
        // Always sample errors
        if ($attributes->get('error') === true) {
            return new SamplingResult(SamplingResult::RECORD_AND_SAMPLE);
        }

        // Always sample business-critical endpoints
        if (str_contains($spanName, '/api/checkout')) {
            return new SamplingResult(SamplingResult::RECORD_AND_SAMPLE);
        }

        // Sample 10% of other traffic
        if (random_int(1, 100) <= 10) {
            return new SamplingResult(SamplingResult::RECORD_AND_SAMPLE);
        }

        return new SamplingResult(SamplingResult::DROP);
    }

    public function getDescription(): string
    {
        return 'PrioritySampler';
    }
}
This approach ensures you never miss traces for business-critical paths while still controlling costs for high-volume endpoints.
Best Practices
Meaningful Span Names
Span names should describe what the operation does, not just that something happened. Include the service name or class for easy identification.
Choose descriptive span names that will help you identify operations quickly when browsing traces. The examples below show the difference between generic and useful names.
// Bad: Generic names
$span = $tracer->spanBuilder('process')->startSpan();
$span = $tracer->spanBuilder('query')->startSpan();
// Good: Descriptive names
$span = $tracer->spanBuilder('OrderService.processPayment')->startSpan();
$span = $tracer->spanBuilder('SELECT users WHERE id = ?')->startSpan();
Useful Attributes
Attributes provide the context needed to debug issues. Add identifiers, counts, and relevant business data, but never include sensitive information.
Add attributes that will help you understand and debug issues. Include identifiers and business context, but be careful to avoid logging sensitive data.
// Add context that helps debugging
$span->setAttribute('user.id', $userId);
$span->setAttribute('order.id', $orderId);
$span->setAttribute('order.item_count', count($items));
$span->setAttribute('feature.flag', $featureEnabled);
// Don't add sensitive data
// $span->setAttribute('user.password', $password); // NO!
// $span->setAttribute('credit_card', $cardNumber); // NO!
Error Recording
When exceptions occur, record them with context that helps diagnose the problem. Include relevant state that led to the error.
Record exceptions with additional context that explains what was happening when the error occurred. This makes debugging much faster.
try {
    $result = $this->processOrder($order);
} catch (InsufficientInventoryException $e) {
    $span->recordException($e);
    $span->setStatus(StatusCode::STATUS_ERROR, 'Insufficient inventory');
    $span->setAttribute('inventory.requested', $e->requested);
    $span->setAttribute('inventory.available', $e->available);
    throw $e;
}
Span Events
Events mark significant moments within a span without creating child spans. Use them for cache misses, retries, or other noteworthy occurrences.
Use span events to record significant occurrences without the overhead of creating separate spans. They're ideal for cache misses, retries, and other notable moments.
// Record significant events within a span
$span->addEvent('cache.miss', [
    'cache.key' => $cacheKey,
]);

$span->addEvent('retry.attempt', [
    'retry.count' => $attempt,
    'retry.delay_ms' => $delay,
]);
Events are lightweight and don't create the overhead of additional spans, making them ideal for logging multiple occurrences within a single operation.
OpenTelemetry Collector
Centralized Processing
The OpenTelemetry Collector acts as a central hub for receiving, processing, and exporting telemetry data. This configuration receives traces via OTLP and exports to both Jaeger and Tempo.
Deploy this Collector configuration to centralize trace processing. The batch processor improves efficiency, and you can export to multiple backends simultaneously.
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  attributes:
    actions:
      - key: environment
        value: production
        action: upsert

exporters:
  jaeger:
    endpoint: jaeger:14250
    tls:
      insecure: true
  otlp:
    endpoint: tempo:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, attributes]
      exporters: [jaeger, otlp]
Using a Collector decouples your application from specific backends. You can change where traces go without modifying application code.
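In practice the switch is a configuration change: the application keeps exporting OTLP, and only the endpoint moves from Jaeger to the Collector. The otel-collector hostname below is an assumption; use whatever your deployment exposes:

# .env
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318/v1/traces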
Correlating Logs and Traces
Adding trace context to your logs enables jumping from a log entry directly to the related trace. This connection is invaluable during incident response.
This log processor automatically adds trace and span IDs to every log entry. You can then click through from logs to traces in your observability platform.
// Include trace context in logs (Monolog 2 array-style processor;
// in Monolog 3 the record is a LogRecord object with an extra property)
use OpenTelemetry\API\Trace\Span;

class TracingLogProcessor
{
    public function __invoke(array $record): array
    {
        $span = Span::getCurrent();
        $context = $span->getContext();

        $record['extra']['trace_id'] = $context->getTraceId();
        $record['extra']['span_id'] = $context->getSpanId();

        return $record;
    }
}

// Usage in logging
Log::info('Order processed', [
    'order_id' => $orderId,
    // trace_id and span_id added automatically
]);
Most observability platforms like Grafana and Datadog can link logs to traces when they share the same trace ID.
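To attach the processor, push it onto the underlying Monolog logger, for example from a service provider's boot method (a sketch; adjust the import to wherever you placed TracingLogProcessor):

use Illuminate\Support\Facades\Log;
use Monolog\Logger;

// Every record written through this channel now carries trace_id and span_id
$monolog = Log::getLogger();
if ($monolog instanceof Logger) {
    $monolog->pushProcessor(new TracingLogProcessor());
}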
Conclusion
Distributed tracing transforms debugging from guesswork to precision. OpenTelemetry provides a vendor-neutral standard for instrumenting applications. Start with automatic instrumentation for HTTP and database operations, then add custom spans for business-critical paths. Use sampling to control costs at scale, and correlate traces with logs for complete observability. The investment in tracing pays off immediately when debugging production issues.