Distributed Tracing Implementation Guide

Distributed tracing tracks requests as they flow through a microservice architecture. When a single user action triggers calls across multiple services, tracing connects those calls into one unified view. Without tracing, debugging a distributed system means manually correlating logs across services, which is tedious and error-prone.

A trace represents an end-to-end request. Each service's work appears as a span within the trace. Spans have parent-child relationships showing which service called which. Timing data reveals where latency accumulates. Error flags show where failures originate.

Tracing Concepts

Traces and spans form a tree structure. The root span represents the initial request. Child spans represent downstream calls. Each span captures start time, duration, and metadata.

// Conceptual span structure
class Span
{
    public string $traceId;      // Shared across all spans in trace
    public string $spanId;       // Unique to this span
    public ?string $parentId;    // Parent span (null for root)
    public string $operationName;
    public int $startTime;
    public int $duration;
    public array $tags;          // Key-value metadata
    public array $logs;          // Timestamped events
}
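
To make the tree structure concrete, here is a minimal sketch that rebuilds the parent-child hierarchy from a flat list of spans. The `buildSpanTree` helper and the array-based span records are assumptions made for illustration; a tracing backend does this for you.

```php
<?php
// Hypothetical helper: groups flat span records into a parent -> children tree.
// Each span is an array with 'spanId', 'parentId' (null for the root) and 'name'.
function buildSpanTree(array $spans): array
{
    $childrenByParent = [];
    $root = null;

    foreach ($spans as $span) {
        if ($span['parentId'] === null) {
            $root = $span;
        } else {
            $childrenByParent[$span['parentId']][] = $span;
        }
    }

    if ($root === null) {
        throw new InvalidArgumentException('Trace has no root span');
    }

    // Recursively attach each span's children under a 'children' key.
    $attach = function (array $span) use (&$attach, $childrenByParent): array {
        $span['children'] = array_map(
            $attach,
            $childrenByParent[$span['spanId']] ?? []
        );

        return $span;
    };

    return $attach($root);
}

$tree = buildSpanTree([
    ['spanId' => 'a', 'parentId' => null, 'name' => 'GET /checkout'],
    ['spanId' => 'b', 'parentId' => 'a', 'name' => 'create_order'],
    ['spanId' => 'c', 'parentId' => 'b', 'name' => 'process_payment'],
    ['spanId' => 'd', 'parentId' => 'a', 'name' => 'send_email'],
]);
```

The root span ends up with two children, and `process_payment` is nested under `create_order`, which mirrors the call order between services.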

Context propagation passes trace context between services. HTTP headers typically carry the trace ID and the parent span ID:

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01

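The W3C Trace Context standard defines the format: four dash-separated fields holding the version, the 32-character trace ID, the 16-character parent span ID, and the trace flags (01 means the trace is sampled). Following the standard lets different tracing implementations interoperate.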
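
As an illustration of the header layout, the sketch below splits a `traceparent` value into its fields. `parseTraceparent` is a hypothetical helper that only handles the version-00 layout; in practice the tracing library's propagator does this parsing.

```php
<?php
// Hypothetical helper: splits a W3C traceparent header into its four fields.
// Returns null when the header does not match the version-00 layout.
function parseTraceparent(string $header): ?array
{
    $pattern = '/^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/';
    if (preg_match($pattern, trim($header), $m) !== 1) {
        return null;
    }

    [, $version, $traceId, $parentId, $flags] = $m;

    // Version ff is reserved as invalid; all-zero IDs are invalid as well.
    if ($version === 'ff'
        || $traceId === str_repeat('0', 32)
        || $parentId === str_repeat('0', 16)) {
        return null;
    }

    return [
        'version'  => $version,
        'traceId'  => $traceId,
        'parentId' => $parentId,
        // Bit 0 of the flags byte is the "sampled" flag.
        'sampled'  => (hexdec($flags) & 0x01) === 1,
    ];
}

$parsed = parseTraceparent('00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01');
```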

Instrumenting Applications

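Automatic instrumentation captures common operations without code changes: libraries hook into HTTP clients, database drivers, and queue processors. Manual instrumentation fills the gaps by wrapping application-specific operations in spans of their own: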

// Manual span creation with OpenTelemetry
use OpenTelemetry\API\Globals;
use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\API\Trace\StatusCode;

class OrderService
{
    public function createOrder(array $data): Order
    {
        $tracer = Globals::tracerProvider()->getTracer('order-service');

        $span = $tracer->spanBuilder('create_order')
            ->setSpanKind(SpanKind::KIND_INTERNAL)
            ->setAttribute('order.customer_id', $data['customer_id'])
            ->setAttribute('order.items_count', count($data['items']))
            ->startSpan();

        // Activate the span so spans created inside processOrder() become its children
        $scope = $span->activate();

        try {
            $order = $this->processOrder($data);

            $span->setAttribute('order.id', $order->id);
            $span->setAttribute('order.total', $order->total);

            return $order;
        } catch (\Throwable $e) {
            $span->recordException($e);
            $span->setStatus(StatusCode::STATUS_ERROR, $e->getMessage());
            throw $e;
        } finally {
            $scope->detach();
            $span->end();
        }
    }
}

Context propagation ensures spans connect across service boundaries:

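// Propagate context in HTTP requests
use OpenTelemetry\API\Globals;
use OpenTelemetry\API\Trace\Propagation\TraceContextPropagator;
use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\Context\Context;
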
class TracedHttpClient
{
    public function request(string $method, string $url, array $options = []): Response
    {
        $tracer = Globals::tracerProvider()->getTracer('http-client');

        $span = $tracer->spanBuilder("HTTP $method")
            ->setSpanKind(SpanKind::KIND_CLIENT)
            ->setAttribute('http.method', $method)
            ->setAttribute('http.url', $url)
            ->startSpan();

        $scope = $span->activate();

        try {
            // Inject trace context into headers
            $carrier = [];
            TraceContextPropagator::getInstance()->inject(
                $carrier,
                null,
                Context::getCurrent()
            );

            $options['headers'] = array_merge(
                $options['headers'] ?? [],
                $carrier
            );

            $response = $this->client->request($method, $url, $options);

            $span->setAttribute('http.status_code', $response->status());

            return $response;
        } finally {
            $scope->detach();
            $span->end();
        }
    }
}
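
To show what the injection step amounts to on the wire, here is a library-free sketch that builds the outgoing header by hand. `formatTraceparent` and `newSpanId` are hypothetical helpers; real code should rely on the propagator rather than on string formatting.

```php
<?php
// Hypothetical helper: builds the traceparent value for an outgoing call.
// The trace ID is carried over unchanged; the calling span's ID fills the parent-id field.
function formatTraceparent(string $traceId, string $spanId, bool $sampled): string
{
    return sprintf('00-%s-%s-%s', $traceId, $spanId, $sampled ? '01' : '00');
}

// Span IDs are 8 random bytes, rendered as 16 lowercase hex characters.
function newSpanId(): string
{
    return bin2hex(random_bytes(8));
}

$traceId = '0af7651916cd43dd8448eb211c80319c';
$clientSpanId = newSpanId();

$headers = ['traceparent' => formatTraceparent($traceId, $clientSpanId, true)];
```

The receiving service reads the span ID from this header and uses it as the parent of its own server span, which is how the two halves of the call end up in the same tree.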

Extract context from incoming requests:

class TracingMiddleware
{
    public function handle(Request $request, Closure $next): Response
    {
        // Extract context from incoming request headers
        $context = TraceContextPropagator::getInstance()->extract(
            $request->headers->all()
        );

        $tracer = Globals::tracerProvider()->getTracer('api');

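        // Raw paths with embedded IDs create many distinct span names;
        // prefer the route template if your router exposes one
        $span = $tracer->spanBuilder($request->path())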
            ->setParent($context)
            ->setSpanKind(SpanKind::KIND_SERVER)
            ->setAttribute('http.method', $request->method())
            ->setAttribute('http.url', $request->fullUrl())
            ->setAttribute('http.user_agent', $request->userAgent())
            ->startSpan();

        $scope = $span->activate();

        try {
            $response = $next($request);

            $span->setAttribute('http.status_code', $response->status());

            return $response;
        } catch (\Throwable $e) {
            $span->recordException($e);
            $span->setStatus(StatusCode::STATUS_ERROR);
            throw $e;
        } finally {
            $scope->detach();
            $span->end();
        }
    }
}

Tracing Infrastructure

Tracing systems have three components: instrumentation (in applications), collectors (receive and process spans), and backends (store and query traces).

# OpenTelemetry Collector configuration
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

  attributes:
    actions:
      - key: environment
        value: production
        action: upsert

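exporters:
  # Jaeger accepts OTLP directly; recent Collector releases no longer ship
  # the dedicated jaeger exporter
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  # The debug exporter replaces the deprecated logging exporter
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Batch last so attribute changes are applied before spans are grouped
      processors: [attributes, batch]
      exporters: [otlp/jaeger, debug]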
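
The two batch processor settings above (`timeout` and `send_batch_size`) describe a size-or-time flush rule. The sketch below mimics that rule in plain PHP; `SpanBatcher` is a hypothetical class, and unlike the real processor it only checks the timeout when a new span arrives instead of running a background timer.

```php
<?php
// Hypothetical in-memory batcher mirroring the two batch settings:
// flush when the buffer reaches $maxSize spans or when $timeoutSeconds has elapsed.
final class SpanBatcher
{
    private array $buffer = [];
    private float $lastFlush;
    /** @var callable */
    private $export;

    public function __construct(
        callable $export,
        private int $maxSize = 1024,
        private float $timeoutSeconds = 1.0
    ) {
        $this->export = $export;
        $this->lastFlush = microtime(true);
    }

    public function add(array $span, ?float $now = null): void
    {
        $now ??= microtime(true);
        $this->buffer[] = $span;

        $full = count($this->buffer) >= $this->maxSize;
        $stale = ($now - $this->lastFlush) >= $this->timeoutSeconds;

        if ($full || $stale) {
            $this->flush($now);
        }
    }

    public function flush(?float $now = null): void
    {
        if ($this->buffer !== []) {
            ($this->export)($this->buffer);
            $this->buffer = [];
        }
        $this->lastFlush = $now ?? microtime(true);
    }
}

// Usage: a batch size of 2 and a long timeout, so only the size rule fires.
$batches = [];
$batcher = new SpanBatcher(
    function (array $batch) use (&$batches): void {
        $batches[] = $batch;
    },
    2,
    60.0
);

$batcher->add(['name' => 'a']);
$batcher->add(['name' => 'b']); // buffer is full, triggers a flush
$batcher->add(['name' => 'c']); // stays buffered until the next flush
```

Batching trades a little delivery latency for far fewer export requests, which is why it is usually the last processor in the pipeline.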

Configure your application to export to the collector:

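// OpenTelemetry SDK configuration
// Class names follow the opentelemetry-php SDK; verify them against your installed version
use OpenTelemetry\API\Trace\Propagation\TraceContextPropagator;
use OpenTelemetry\Contrib\Otlp\OtlpHttpTransportFactory;
use OpenTelemetry\Contrib\Otlp\SpanExporter;
use OpenTelemetry\SDK\Common\Attribute\Attributes;
use OpenTelemetry\SDK\Resource\ResourceInfo;
use OpenTelemetry\SDK\Sdk;
use OpenTelemetry\SDK\Trace\SpanProcessor\BatchSpanProcessor;
use OpenTelemetry\SDK\Trace\TracerProvider;

$transport = (new OtlpHttpTransportFactory())->create(
    'http://collector:4318/v1/traces',
    'application/json'
);

$exporter = new SpanExporter($transport);

$tracerProvider = TracerProvider::builder()
    // Batch spans in-process before sending them to the collector
    ->addSpanProcessor(BatchSpanProcessor::builder($exporter)->build())
    ->setResource(ResourceInfo::create(Attributes::create([
        'service.name' => 'order-service',
        'service.version' => '1.2.3',
        'deployment.environment' => 'production',
    ])))
    ->build();

// Register the provider and propagator so Globals::tracerProvider() returns them
Sdk::builder()
    ->setTracerProvider($tracerProvider)
    ->setPropagator(TraceContextPropagator::getInstance())
    ->setAutoShutdown(true)
    ->buildAndRegisterGlobal();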

Sampling Strategies

High-traffic services generate enormous trace volumes. Sampling reduces volume while preserving visibility.

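Head-based sampling decides at trace start whether to sample. It is simple and cheap, but because the decision is made before the outcome is known, it can discard traces that later turn out to contain errors or unusual latency.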

// Probabilistic sampling
$sampler = new TraceIdRatioBasedSampler(0.1);  // Sample 10% of traces
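
The key property of ratio-based sampling is that the decision is derived from the trace ID, so every service reaches the same verdict without coordination. The sketch below illustrates the idea; `shouldSample` is a hypothetical function and a simplification, not the SDK's exact algorithm.

```php
<?php
// Simplified sketch, not the SDK's exact algorithm: maps the last 8 hex
// characters of the trace ID to a number in [0, 2^32) and samples when it
// falls below ratio * 2^32. The same trace ID always gives the same answer,
// so a trace is kept or dropped as a whole.
function shouldSample(string $traceId, float $ratio): bool
{
    if ($ratio >= 1.0) {
        return true;
    }
    if ($ratio <= 0.0) {
        return false;
    }

    $bucket = hexdec(substr($traceId, -8)); // 0 .. 4294967295

    return $bucket < $ratio * 4294967296;
}

$decision = shouldSample('0af7651916cd43dd8448eb211c80319c', 0.1);
```

Because trace IDs are random, roughly the configured fraction of traces falls below the threshold over time.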

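Tail-based sampling decides after seeing the complete trace. It can keep every error trace or slow trace, but the collector must buffer all spans of a trace until the decision is made, and all spans of a trace must reach the same collector instance.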

# Collector tail sampling configuration
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow
        type: latency
        latency: {threshold_ms: 1000}
      - name: probabilistic
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
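
The three policies above combine as a logical OR: a trace is kept when any one of them matches. The sketch below expresses that decision in plain PHP. `keepTrace` and the span array shape are assumptions made for illustration, and trace latency is approximated by the longest span rather than measured from the earliest start to the latest end.

```php
<?php
// Hypothetical evaluator for the three policies. A trace is kept when any
// policy matches. $trace is a list of spans, each with 'status' ('OK' or
// 'ERROR') and 'durationMs'; $random is a value in [0, 100).
function keepTrace(
    array $trace,
    float $random,
    int $thresholdMs = 1000,
    float $percentage = 10.0
): bool {
    if ($trace === []) {
        return false;
    }

    foreach ($trace as $span) {
        if ($span['status'] === 'ERROR') {
            return true; // "errors" policy
        }
    }

    // Approximate trace latency as the longest span (normally the root).
    $longest = max(array_column($trace, 'durationMs'));
    if ($longest > $thresholdMs) {
        return true; // "slow" policy
    }

    return $random < $percentage; // "probabilistic" policy
}

$fastOkTrace = [
    ['status' => 'OK', 'durationMs' => 40],
    ['status' => 'OK', 'durationMs' => 12],
];

$kept = keepTrace($fastOkTrace, mt_rand(0, 9999) / 100);
```

Healthy, fast traces are kept only about 10% of the time, while every errored or slow trace survives, which is the main advantage over head-based sampling.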

Adaptive sampling adjusts rates based on traffic, ensuring coverage during low traffic and controlling costs during spikes.
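
One simple way to think about adaptive sampling is as a target throughput: pick how many traces per second you want to keep and derive the probability from the observed request rate. The function below is a hypothetical sketch of that calculation, with an assumed floor so heavy traffic never drops visibility to zero.

```php
<?php
// Hypothetical rate calculator: aims for a fixed number of sampled traces per
// second, so the sampling probability rises when traffic is low and falls
// during spikes. $minRate keeps a small floor of visibility under heavy load.
function adaptiveSampleRate(
    float $observedPerSecond,
    float $targetPerSecond,
    float $minRate = 0.001
): float {
    if ($observedPerSecond <= 0.0) {
        return 1.0; // No traffic measured yet: keep everything.
    }

    $rate = $targetPerSecond / $observedPerSecond;

    return max($minRate, min(1.0, $rate));
}

$quietRate = adaptiveSampleRate(5.0, 10.0);    // low traffic
$busyRate  = adaptiveSampleRate(1000.0, 10.0); // traffic spike
```

At 5 requests per second everything is sampled, while at 1,000 requests per second only about 1% is, so the stored volume stays close to the target in both cases.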

Adding Business Context

Technical spans show service calls. Business context makes traces actionable.

class PaymentProcessor
{
    public function processPayment(Order $order): PaymentResult
    {
        $span = $this->tracer->spanBuilder('process_payment')
            ->startSpan();

        // Add business context
        $span->setAttribute('order.id', $order->id);
        $span->setAttribute('customer.id', $order->customer_id);
        $span->setAttribute('payment.amount', $order->total);
        $span->setAttribute('payment.currency', $order->currency);
        $span->setAttribute('payment.method', $order->payment_method);

        try {
            $result = $this->gateway->charge($order);

            // Record outcome
            $span->setAttribute('payment.success', $result->success);
            $span->setAttribute('payment.transaction_id', $result->transactionId);

            if (!$result->success) {
                $span->setAttribute('payment.failure_reason', $result->failureReason);
                $span->addEvent('payment_declined', [
                    'reason' => $result->failureReason,
                ]);
            }

            return $result;
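        } catch (\Throwable $e) {
            // Gateway exceptions should mark the span as failed before propagating
            $span->recordException($e);
            $span->setStatus(StatusCode::STATUS_ERROR, $e->getMessage());
            throw $e;
        } finally {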
            $span->end();
        }
    }
}

Events within spans capture important moments:

$span->addEvent('cache_miss', ['key' => $cacheKey]);
$span->addEvent('retry_attempt', ['attempt' => $attempt, 'reason' => $error->getMessage()]);
$span->addEvent('circuit_breaker_opened');

Correlating Traces with Logs

Connect logs to traces by including trace context:

class TracingLogger
{
    public function log(string $level, string $message, array $context = []): void
    {
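        // OpenTelemetry\API\Trace\Span, not the conceptual Span class shown earlier
        $spanContext = Span::getCurrent()->getContext();

        // Outside an active span the context is invalid (all-zero IDs), so skip it
        if ($spanContext->isValid()) {
            $context['trace_id'] = $spanContext->getTraceId();
            $context['span_id'] = $spanContext->getSpanId();
        }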

        $this->logger->log($level, $message, $context);
    }
}

Query logs by trace ID to see all log entries for a request:

# Elasticsearch query
{
  "query": {
    "term": {
      "trace_id": "0af7651916cd43dd8448eb211c80319c"
    }
  }
}

Analyzing Traces

Use traces to identify performance bottlenecks, error sources, and dependency issues.

class TraceAnalyzer
{
    public function findBottlenecks(array $traces): array
    {
        $serviceLatencies = [];

        foreach ($traces as $trace) {
            foreach ($trace['spans'] as $span) {
                $service = $span['service.name'];
                $serviceLatencies[$service][] = $span['duration'];
            }
        }

        return collect($serviceLatencies)
            ->map(fn ($latencies) => [
                'p50' => $this->percentile($latencies, 50),
                'p95' => $this->percentile($latencies, 95),
                'p99' => $this->percentile($latencies, 99),
            ])
            ->sortByDesc('p95')
            ->toArray();
    }

    public function findErrorSources(array $traces): array
    {
        return collect($traces)
            ->filter(fn ($trace) => $trace['has_error'])
            // map, not flatMap: each trace contributes a single root-cause span
            ->map(fn ($trace) => $this->findRootCauseSpan($trace))
            ->filter()
            ->groupBy('service.name')
            ->map(fn ($errors) => $errors->count())
            ->sortDesc()
            ->toArray();
    }
}
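
The analyzer above calls two helpers it does not define. The sketch below shows one plausible shape for each as free functions: a nearest-rank percentile, and a root-cause search that picks an errored span with no errored child. Both are assumptions for illustration, including the `error`, `spanId`, and `parentId` keys on each span.

```php
<?php
// Nearest-rank percentile: the smallest value with at least $p percent of
// the samples at or below it. Hypothetical stand-in for percentile().
function percentile(array $values, float $p): float
{
    if ($values === []) {
        throw new InvalidArgumentException('No values given');
    }

    sort($values);
    $rank = (int) ceil($p / 100 * count($values));

    return (float) $values[max(0, $rank - 1)];
}

// Hypothetical stand-in for findRootCauseSpan(): among errored spans, pick one
// that has no errored child, i.e. the deepest point the failure can be traced to.
function findRootCauseSpan(array $spans): ?array
{
    $errored = array_filter($spans, fn (array $s) => $s['error']);
    $parentsOfErrors = array_column($errored, 'parentId');

    foreach ($errored as $span) {
        if (!in_array($span['spanId'], $parentsOfErrors, true)) {
            return $span;
        }
    }

    return null;
}

$latencies = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];
$p95 = percentile($latencies, 95);

$rootCause = findRootCauseSpan([
    ['spanId' => 'a', 'parentId' => null, 'service.name' => 'api', 'error' => true],
    ['spanId' => 'b', 'parentId' => 'a', 'service.name' => 'payments', 'error' => true],
    ['spanId' => 'c', 'parentId' => 'a', 'service.name' => 'email', 'error' => false],
]);
```

Errors usually propagate upward through the call chain, so attributing the failure to the deepest errored span avoids blaming every caller along the way.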

Conclusion

Distributed tracing provides visibility into request flow across services. Instrument applications with span creation and context propagation. Use collectors to process and route spans. Sample appropriately to manage volume. Add business context to make traces actionable.

Tracing complements metrics and logs in the observability stack. Metrics show aggregates, logs show details, traces show flow. Together they provide the visibility needed to operate distributed systems reliably.
