Distributed tracing tracks requests as they flow through microservices architectures. When a single user action triggers calls across multiple services, tracing connects these calls into a unified view. Without tracing, debugging distributed systems means correlating logs across services—tedious and error-prone.
A trace represents one end-to-end request. Each unit of work within it (a service handling the request, a database query, an outbound call) appears as a span. Spans have parent-child relationships showing which operation called which. Timing data reveals where latency accumulates, and error status shows where failures originate.
Tracing Concepts
Traces and spans form a tree structure. The root span represents the initial request. Child spans represent downstream calls. Each span captures start time, duration, and metadata.
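// Conceptual span structure (illustrative, not an OpenTelemetry class)
class Span
{
    public string $traceId;     // Shared across all spans in the trace
    public string $spanId;      // Unique to this span
    public ?string $parentId;   // Parent span ID (null for the root span)
    public string $operationName;
    public int $startTime;      // Start timestamp (OpenTelemetry uses nanoseconds since the Unix epoch)
    public int $duration;       // Elapsed time, in the same unit as $startTime
    public array $tags;         // Key-value metadata (OpenTelemetry calls these attributes)
    public array $logs;         // Timestamped events (OpenTelemetry calls these span events)
}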
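Spans usually reach a tracing backend one at a time, and the backend rebuilds each trace's tree from the parentId links. A minimal sketch of that reassembly, assuming spans arrive as plain arrays using the field names of the conceptual class above (the function name buildSpanTree is hypothetical):

```php
<?php
declare(strict_types=1);

/**
 * Rebuild a trace tree from a flat list of spans (hypothetical helper).
 * Each span is an array with 'spanId', 'parentId' (null for the root) and any other fields.
 * Returns the root spans, each with a nested 'children' list.
 * Orphaned spans (whose parent never arrived) are ignored in this sketch.
 */
function buildSpanTree(array $spans): array
{
    // Index spans by their parent ID; roots go under the '' key.
    $childrenOf = [];
    foreach ($spans as $span) {
        $childrenOf[$span['parentId'] ?? ''][] = $span;
    }

    // Recursively attach each span's children.
    $attach = function (array $span) use (&$attach, $childrenOf): array {
        $span['children'] = array_map($attach, $childrenOf[$span['spanId']] ?? []);
        return $span;
    };

    return array_map($attach, $childrenOf[''] ?? []);
}
```

For a request that queries a database and calls an inventory service, the result is one root with two children, and the inventory call's own server span nested one level deeper.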
Context propagation passes trace context between services. An HTTP header typically carries the trace ID and the parent span ID:
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
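The W3C Trace Context standard defines this format: four dash-separated fields giving the version (00), a 16-byte trace ID (32 hex characters), the 8-byte ID of the calling span (16 hex characters), and the trace flags (01 means sampled). Using a standard ensures interoperability between different tracing implementations.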
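To make the format concrete, here is a minimal parser for the version-00 header shown above. It is a sketch for illustration only (the function name parseTraceparent is hypothetical); in practice the OpenTelemetry propagator does this parsing and also handles future header versions, which this sketch simply rejects.

```php
<?php
declare(strict_types=1);

/**
 * Parse a W3C traceparent header (version 00 only; hypothetical helper).
 * Returns null for malformed headers so the caller can start a new trace instead.
 */
function parseTraceparent(string $header): ?array
{
    // version-traceid-parentid-flags, all lowercase hex
    if (!preg_match('/^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/', trim($header), $m)) {
        return null;
    }
    [, $traceId, $parentId, $flags] = $m;

    // All-zero trace and parent IDs are invalid per the specification.
    if ($traceId === str_repeat('0', 32) || $parentId === str_repeat('0', 16)) {
        return null;
    }

    return [
        'traceId'  => $traceId,
        'parentId' => $parentId,
        // Bit 0 of the trace flags is the "sampled" flag.
        'sampled'  => (hexdec($flags) & 0x01) === 0x01,
    ];
}
```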
Instrumenting Applications
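Automatic instrumentation captures common operations without code changes: instrumentation libraries hook into HTTP clients, database drivers, and queue processors. For application-specific operations that no library knows about, create spans manually: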
// Manual span creation with OpenTelemetry
use OpenTelemetry\API\Globals;
use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\API\Trace\StatusCode;
class OrderService
{
    public function createOrder(array $data): Order
    {
        $tracer = Globals::tracerProvider()->getTracer('order-service');
        $span = $tracer->spanBuilder('create_order')
            ->setSpanKind(SpanKind::KIND_INTERNAL)
            ->setAttribute('order.customer_id', $data['customer_id'])
            ->setAttribute('order.items_count', count($data['items']))
            ->startSpan();
        // Make the span current so spans started inside processOrder() become its children
        $scope = $span->activate();
        try {
            $order = $this->processOrder($data);
            $span->setAttribute('order.id', $order->id);
            $span->setAttribute('order.total', $order->total);
            return $order;
        } catch (\Throwable $e) {
            $span->recordException($e);
            $span->setStatus(StatusCode::STATUS_ERROR, $e->getMessage());
            throw $e;
        } finally {
            // Always restore the previous context and end the span
            $scope->detach();
            $span->end();
        }
    }
}
Context propagation ensures spans connect across service boundaries:
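// Propagate context in HTTP requests
use OpenTelemetry\API\Globals;
use OpenTelemetry\API\Trace\Propagation\TraceContextPropagator;
use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\API\Trace\StatusCode;
use OpenTelemetry\Context\Context;
class TracedHttpClient
{
    // $this->client is the wrapped HTTP client
    public function request(string $method, string $url, array $options = []): Response
    {
        $tracer = Globals::tracerProvider()->getTracer('http-client');
        // Older semantic-convention attribute names; newer versions use
        // http.request.method, url.full and http.response.status_code
        $span = $tracer->spanBuilder("HTTP $method")
            ->setSpanKind(SpanKind::KIND_CLIENT)
            ->setAttribute('http.method', $method)
            ->setAttribute('http.url', $url)
            ->startSpan();
        // Activate the client span so the injected context names it as the parent
        $scope = $span->activate();
        try {
            // Inject trace context (traceparent/tracestate) into the outgoing headers
            $carrier = [];
            TraceContextPropagator::getInstance()->inject(
                $carrier,
                null,
                Context::getCurrent()
            );
            $options['headers'] = array_merge(
                $options['headers'] ?? [],
                $carrier
            );
            $response = $this->client->request($method, $url, $options);
            $span->setAttribute('http.status_code', $response->status());
            return $response;
        } catch (\Throwable $e) {
            $span->recordException($e);
            $span->setStatus(StatusCode::STATUS_ERROR);
            throw $e;
        } finally {
            $scope->detach();
            $span->end();
        }
    }
}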
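Under the hood, injection keeps the trace ID unchanged and writes the client span's own ID into the parent-id field, so the downstream server span becomes its child. A dependency-free sketch of building that header (the helper names outgoingTraceparent and newSpanId are hypothetical; in real code the propagator derives all of this from the active context):

```php
<?php
declare(strict_types=1);

/**
 * Build the traceparent header for an outgoing call (hypothetical helper).
 * The trace ID is carried over unchanged; the parent-id field becomes the
 * ID of the client span that wraps the outgoing request.
 */
function outgoingTraceparent(string $traceId, string $clientSpanId, bool $sampled): string
{
    return sprintf('00-%s-%s-%s', $traceId, $clientSpanId, $sampled ? '01' : '00');
}

/** Generate a random, non-zero 8-byte span ID as 16 lowercase hex characters. */
function newSpanId(): string
{
    do {
        $id = bin2hex(random_bytes(8));
    } while ($id === str_repeat('0', 16));
    return $id;
}
```

Every hop therefore shares one trace ID, while the parent-id field changes at each call, which is exactly what lets a backend stitch the spans into a tree.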
On the receiving side, extract the context from incoming requests so the server span continues the caller's trace:
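use Illuminate\Http\Request;
use OpenTelemetry\API\Globals;
use OpenTelemetry\API\Trace\Propagation\TraceContextPropagator;
use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\API\Trace\StatusCode;
use Symfony\Component\HttpFoundation\Response;
class TracingMiddleware
{
    public function handle(Request $request, Closure $next): Response
    {
        // Extract the caller's trace context from the incoming request headers
        $context = TraceContextPropagator::getInstance()->extract(
            $request->headers->all()
        );
        $tracer = Globals::tracerProvider()->getTracer('api');
        // A route template (e.g. orders/{id}) keeps span names low-cardinality;
        // fall back to the raw path if routing has not happened yet
        $route = $request->route()?->uri() ?? $request->path();
        $span = $tracer->spanBuilder($request->method() . ' ' . $route)
            ->setParent($context)
            ->setSpanKind(SpanKind::KIND_SERVER)
            ->setAttribute('http.method', $request->method())
            ->setAttribute('http.url', $request->fullUrl())
            ->setAttribute('http.user_agent', $request->userAgent())
            ->startSpan();
        $scope = $span->activate();
        try {
            $response = $next($request);
            // getStatusCode() works for every Symfony-based response type
            $status = $response->getStatusCode();
            $span->setAttribute('http.status_code', $status);
            if ($status >= 500) {
                $span->setStatus(StatusCode::STATUS_ERROR);
            }
            return $response;
        } catch (\Throwable $e) {
            $span->recordException($e);
            $span->setStatus(StatusCode::STATUS_ERROR);
            throw $e;
        } finally {
            $scope->detach();
            $span->end();
        }
    }
}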
Tracing Infrastructure
Tracing systems typically have three components: instrumentation (in applications), collectors (receive, process, and route spans), and backends such as Jaeger (store and query traces).
# OpenTelemetry Collector configuration
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  attributes:
    actions:
      - key: environment
        value: production
        action: upsert
  batch:
    timeout: 1s
    send_batch_size: 1024
exporters:
  # Jaeger ingests OTLP natively; the dedicated "jaeger" exporter was removed from the Collector
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  # "debug" replaces the deprecated "logging" exporter
  debug:
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes, batch]  # batch runs last
      exporters: [otlp/jaeger, debug]
Configure your application to export to the collector:
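// OpenTelemetry configuration
use OpenTelemetry\API\Trace\Propagation\TraceContextPropagator;
use OpenTelemetry\Contrib\Otlp\OtlpHttpTransportFactory;
use OpenTelemetry\Contrib\Otlp\SpanExporter;
use OpenTelemetry\SDK\Common\Attribute\Attributes;
use OpenTelemetry\SDK\Resource\ResourceInfo;
use OpenTelemetry\SDK\Resource\ResourceInfoFactory;
use OpenTelemetry\SDK\Sdk;
use OpenTelemetry\SDK\Trace\SpanProcessor\BatchSpanProcessor;
use OpenTelemetry\SDK\Trace\TracerProvider;
$transport = (new OtlpHttpTransportFactory())->create(
    'http://collector:4318/v1/traces',
    'application/json'
);
$exporter = new SpanExporter($transport);
// Resource attributes identify the service on every exported span
$resource = ResourceInfoFactory::emptyResource()->merge(ResourceInfo::create(Attributes::create([
    'service.name' => 'order-service',
    'service.version' => '1.2.3',
    'deployment.environment' => 'production',
])));
$tracerProvider = TracerProvider::builder()
    ->addSpanProcessor(BatchSpanProcessor::builder($exporter)->build()) // buffer spans, export in batches
    ->setResource($resource)
    ->build();
// Register globally so Globals::tracerProvider() returns this provider
Sdk::builder()
    ->setTracerProvider($tracerProvider)
    ->setPropagator(TraceContextPropagator::getInstance())
    ->setAutoShutdown(true) // flush buffered spans when the process exits
    ->buildAndRegisterGlobal();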
Sampling Strategies
High-traffic services generate enormous trace volumes. Sampling reduces volume while preserving visibility.
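Head-based sampling decides at trace start whether to sample and propagates that decision in the trace flags. It is cheap and simple, but because the decision is made before the request completes, it can drop rare error or slow traces.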
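// Probabilistic head sampling: start ~10% of new traces; ParentBased makes
// downstream services follow the caller's decision from the traceparent flags
use OpenTelemetry\SDK\Trace\Sampler\ParentBased;
use OpenTelemetry\SDK\Trace\Sampler\TraceIdRatioBasedSampler;
$sampler = new ParentBased(new TraceIdRatioBasedSampler(0.1));
// Pass it to the provider, e.g. TracerProvider::builder()->setSampler($sampler)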
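Ratio-based head sampling works best as a deterministic function of the trace ID, so every service that sees a trace reaches the same decision without coordinating. A minimal sketch of the idea, assuming a threshold on the trace ID's lowest 56 bits; the OpenTelemetry SDKs use a comparable trace-ID-based rule, but the exact algorithm varies by version, and the name shouldSample is hypothetical:

```php
<?php
declare(strict_types=1);

/**
 * Deterministic ratio sampling keyed on the trace ID (hypothetical helper).
 * The same trace ID always yields the same decision, in every service.
 */
function shouldSample(string $traceId, float $ratio): bool
{
    if ($ratio <= 0.0) {
        return false;
    }
    if ($ratio >= 1.0) {
        return true;
    }
    // Treat the lowest 56 bits (last 14 hex chars) as a uniformly random number.
    $value = hexdec(substr($traceId, -14)); // 0 .. 2^56 - 1, fits in a 64-bit int
    return $value < $ratio * 2 ** 56;
}
```

A useful side effect of thresholding is monotonicity: a trace kept at 10% is also kept at any higher ratio, so services configured with different ratios still agree on the traces the lower ratio keeps.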
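Tail-based sampling decides after seeing the complete trace, so it can keep every error trace and every slow trace. The cost is buffering spans until the trace is complete and, when several collector instances run, routing all spans of a trace to the same instance.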
# Collector tail sampling configuration
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow
        type: latency
        latency: {threshold_ms: 1000}
      - name: probabilistic
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
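Conceptually, the collector buffers a trace's spans for decision_wait and then evaluates the policies; with policies like these, a trace is kept if any of them matches. A simplified sketch of that decision with the three policies hard-coded (the function name and span field names are assumptions, and the real processor supports many more policy types):

```php
<?php
declare(strict_types=1);

/**
 * Simplified tail-sampling decision for one buffered trace (hypothetical helper).
 * $spans: list of ['status' => 'OK'|'ERROR'|'UNSET', 'startMs' => float, 'endMs' => float]
 * $random: a value in [0, 1) for the probabilistic policy, injected for testability.
 */
function keepTrace(array $spans, float $thresholdMs, float $percentage, float $random): bool
{
    if ($spans === []) {
        return false;
    }

    // Policy "errors": keep if any span has ERROR status.
    foreach ($spans as $span) {
        if ($span['status'] === 'ERROR') {
            return true;
        }
    }

    // Policy "slow": trace duration = latest end minus earliest start.
    $duration = max(array_column($spans, 'endMs')) - min(array_column($spans, 'startMs'));
    if ($duration >= $thresholdMs) {
        return true;
    }

    // Policy "probabilistic": keep a fixed percentage of everything else.
    return $random < $percentage / 100;
}
```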
Adaptive sampling adjusts the rate to traffic: it keeps a larger share of traces when traffic is low, to preserve coverage, and a smaller share during spikes, to control cost.
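There is no single adaptive algorithm; one simple approach, sketched here under the assumption of a fixed budget of traces per second, is to recompute the sampling ratio after each measurement window. The class name, parameters and minimum-rate floor are all hypothetical:

```php
<?php
declare(strict_types=1);

/**
 * Adaptive sampling-rate controller (hypothetical helper).
 * Aims to keep roughly $targetPerSecond traces per second regardless of traffic.
 */
final class AdaptiveSampler
{
    private float $rate = 1.0;

    public function __construct(
        private float $targetPerSecond,
        private float $minRate = 0.001, // floor so some traces survive even huge spikes
    ) {}

    /** Recompute the rate from the number of traces started in the last window. */
    public function update(int $tracesInWindow, float $windowSeconds): float
    {
        $observedPerSecond = $tracesInWindow / $windowSeconds;
        $this->rate = $observedPerSecond <= $this->targetPerSecond
            ? 1.0 // low traffic: keep everything
            : max($this->minRate, $this->targetPerSecond / $observedPerSecond);
        return $this->rate;
    }

    public function rate(): float
    {
        return $this->rate;
    }
}
```

With a target of 10 traces per second, a quiet window of 5 per second keeps every trace, while a spike to 1,000 per second drops the ratio to 0.01.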
Adding Business Context
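Technical spans show which services were called. Business context (order IDs, amounts, payment methods) makes traces actionable: you can find the trace for a specific failed order or compare latency by payment method.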
use OpenTelemetry\API\Trace\StatusCode;
class PaymentProcessor
{
    public function processPayment(Order $order): PaymentResult
    {
        $span = $this->tracer->spanBuilder('process_payment')
            ->startSpan();
        // Make the span current so the gateway's HTTP spans nest under it
        $scope = $span->activate();
        // Add business context (keep card numbers and other sensitive data out of attributes)
        $span->setAttribute('order.id', $order->id);
        $span->setAttribute('customer.id', $order->customer_id);
        $span->setAttribute('payment.amount', $order->total);
        $span->setAttribute('payment.currency', $order->currency);
        $span->setAttribute('payment.method', $order->payment_method);
        try {
            $result = $this->gateway->charge($order);
            // Record outcome; a decline is a business result, not a span error
            $span->setAttribute('payment.success', $result->success);
            $span->setAttribute('payment.transaction_id', $result->transactionId);
            if (!$result->success) {
                $span->setAttribute('payment.failure_reason', $result->failureReason);
                $span->addEvent('payment_declined', [
                    'reason' => $result->failureReason,
                ]);
            }
            return $result;
        } catch (\Throwable $e) {
            $span->recordException($e);
            $span->setStatus(StatusCode::STATUS_ERROR, $e->getMessage());
            throw $e;
        } finally {
            $scope->detach();
            $span->end();
        }
    }
}
Events within spans capture important moments:
$span->addEvent('cache_miss', ['key' => $cacheKey]);
$span->addEvent('retry_attempt', ['attempt' => $attempt, 'reason' => $error->getMessage()]);
$span->addEvent('circuit_breaker_opened');
Correlating Traces with Logs
Connect logs to traces by including trace context:
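use OpenTelemetry\API\Trace\Span;
use Psr\Log\LoggerInterface;
class TracingLogger
{
    public function __construct(private LoggerInterface $logger) {}
    public function log(string $level, string $message, array $context = []): void
    {
        $spanContext = Span::getCurrent()->getContext();
        // Outside any span the current span is a no-op with all-zero IDs; skip those
        if ($spanContext->isValid()) {
            $context['trace_id'] = $spanContext->getTraceId();
            $context['span_id'] = $spanContext->getSpanId();
        }
        $this->logger->log($level, $message, $context);
    }
}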
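What ends up in the log store is simply a structured record carrying the two IDs. A dependency-free sketch of formatting one JSON log line, assuming the field names trace_id and span_id used in the query below (the function name logLine is hypothetical; match whatever your log pipeline and index mapping expect):

```php
<?php
declare(strict_types=1);

/**
 * Format a structured log line that carries trace context (hypothetical helper).
 * IDs that are malformed or all zeros (no active span) are omitted.
 */
function logLine(string $level, string $message, string $traceId, string $spanId, array $context = []): string
{
    $record = ['level' => $level, 'message' => $message] + $context;

    // Only attach IDs that are well-formed, lowercase hex and non-zero.
    $validTrace = preg_match('/^[0-9a-f]{32}$/', $traceId) === 1 && $traceId !== str_repeat('0', 32);
    $validSpan = preg_match('/^[0-9a-f]{16}$/', $spanId) === 1 && $spanId !== str_repeat('0', 16);
    if ($validTrace && $validSpan) {
        $record['trace_id'] = $traceId;
        $record['span_id'] = $spanId;
    }

    return json_encode($record, JSON_THROW_ON_ERROR);
}
```

For example, logLine('info', 'order created', '0af7651916cd43dd8448eb211c80319c', 'b7ad6b7169203331', ['order_id' => 42]) returns {"level":"info","message":"order created","order_id":42,"trace_id":"0af7651916cd43dd8448eb211c80319c","span_id":"b7ad6b7169203331"}.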
Query logs by trace ID to see all log entries for a request (map trace_id as a keyword field so exact-match term queries behave predictably):
# Elasticsearch query
{
  "query": {
    "term": {
      "trace_id": "0af7651916cd43dd8448eb211c80319c"
    }
  }
}
Analyzing Traces
Use traces to identify performance bottlenecks, error sources, and dependency issues.
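// Assumes Laravel's collect() helper and traces shaped as
// ['has_error' => bool, 'spans' => [['service.name' => ..., 'duration' => ...], ...]]
class TraceAnalyzer
{
    public function findBottlenecks(array $traces): array
    {
        $serviceLatencies = [];
        foreach ($traces as $trace) {
            foreach ($trace['spans'] as $span) {
                $service = $span['service.name'];
                // Raw duration includes time spent waiting on child spans
                $serviceLatencies[$service][] = $span['duration'];
            }
        }
        return collect($serviceLatencies)
            ->map(fn ($latencies) => [
                'p50' => $this->percentile($latencies, 50),
                'p95' => $this->percentile($latencies, 95),
                'p99' => $this->percentile($latencies, 99),
            ])
            ->sortByDesc('p95')
            ->toArray();
    }
    public function findErrorSources(array $traces): array
    {
        // findRootCauseSpan() (not shown) returns the span where the error originated, or null
        return collect($traces)
            ->filter(fn ($trace) => $trace['has_error'])
            ->map(fn ($trace) => $this->findRootCauseSpan($trace)) // one span per trace, so map, not flatMap
            ->filter()
            // A closure, because Laravel reads dots in a string key as nested access
            ->groupBy(fn ($span) => $span['service.name'])
            ->map(fn ($errors) => $errors->count())
            ->sortDesc()
            ->toArray();
    }
    // Nearest-rank percentile
    private function percentile(array $values, int $p): int|float
    {
        sort($values);
        $rank = (int) ceil($p / 100 * count($values));
        return $values[max(0, $rank - 1)];
    }
}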
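Two helpers round out this kind of analysis. The analyzer calls a findRootCauseSpan() it does not define; one possible heuristic, assumed here rather than taken from any library, is to pick an errored span none of whose children also errored, i.e. the deepest point where the failure surfaced. Separately, because a parent's duration includes its children, ranking services by self time (duration minus direct children's durations) avoids blaming callers for their callees' latency. Both functions, and the per-span has_error and parentId fields, are hypothetical:

```php
<?php
declare(strict_types=1);

/**
 * Heuristic root-cause finder (hypothetical): the first errored span
 * with no errored children. Returns null if no span errored.
 */
function findRootCauseSpan(array $trace): ?array
{
    $erroredParents = [];
    foreach ($trace['spans'] as $span) {
        if ($span['has_error'] && $span['parentId'] !== null) {
            $erroredParents[$span['parentId']] = true;
        }
    }
    foreach ($trace['spans'] as $span) {
        if ($span['has_error'] && !isset($erroredParents[$span['spanId']])) {
            return $span;
        }
    }
    return null;
}

/**
 * Self time per span ID: duration minus the summed durations of direct children.
 * Clamped at zero because concurrent children can overlap; this sketch does not
 * merge overlapping intervals.
 */
function selfTimes(array $spans): array
{
    $childTotal = [];
    foreach ($spans as $span) {
        if ($span['parentId'] !== null) {
            $childTotal[$span['parentId']] = ($childTotal[$span['parentId']] ?? 0) + $span['duration'];
        }
    }
    $result = [];
    foreach ($spans as $span) {
        $result[$span['spanId']] = max(0, $span['duration'] - ($childTotal[$span['spanId']] ?? 0));
    }
    return $result;
}
```

In a trace where the API, orders and payments spans all report errors along one call chain, the heuristic points at payments, the innermost failing call, rather than the API span that merely propagated it.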
Conclusion
Distributed tracing provides visibility into request flow across services. Instrument applications with span creation and context propagation. Use collectors to process and route spans. Sample appropriately to manage volume. Add business context to make traces actionable.
Tracing complements metrics and logs in the observability stack. Metrics show aggregates, logs show details, traces show flow. Together they provide the visibility needed to operate distributed systems reliably.