Code quality metrics provide objective measurements of codebase health. While subjective assessments like "this code feels messy" have value, metrics offer concrete data for tracking improvements, setting quality gates, and identifying problem areas. Understanding what metrics measure and their limitations helps you use them effectively.
Metrics aren't goals in themselves. Goodhart's Law applies: when a measure becomes a target, it ceases to be a good measure. Teams that optimize for metrics rather than actual quality produce code that looks good by the numbers but remains difficult to work with. Use metrics as indicators, not objectives.
Cyclomatic Complexity
Cyclomatic complexity measures the number of independent paths through code. Each decision point (if, while, for, case) adds to complexity. Higher complexity means more paths to test and more cognitive load to understand.
// Cyclomatic complexity: 1 (no branches)
function add(int $a, int $b): int
{
    return $a + $b;
}
// Cyclomatic complexity: 4 (multiple branches)
function calculateDiscount(Order $order): float
{
    $discount = 0;

    if ($order->total > 1000) { // +1
        $discount = 0.1;
    } elseif ($order->total > 500) { // +1
        $discount = 0.05;
    }

    if ($order->customer->isVip()) { // +1
        $discount += 0.05;
    }

    return $discount;
}
Generally, keep methods under complexity 10. Methods over 20 are difficult to test thoroughly. High complexity often indicates a method doing too much; consider extracting submethods or using polymorphism.
// Reduced complexity through polymorphism
interface DiscountStrategy
{
    public function calculate(Order $order): float;
}

class TieredDiscount implements DiscountStrategy
{
    public function calculate(Order $order): float
    {
        return match (true) {
            $order->total > 1000 => 0.10,
            $order->total > 500 => 0.05,
            default => 0,
        };
    }
}

class VipDiscount implements DiscountStrategy
{
    public function calculate(Order $order): float
    {
        return $order->customer->isVip() ? 0.05 : 0;
    }
}
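Composition keeps complexity flat as rules multiply. A minimal standalone sketch, simplified to plain order totals so it runs without the Order class (DiscountPipeline and its callable rules are hypothetical names, not part of the refactoring above):

```php
<?php

// Hypothetical coordinator: each rule stays trivially
// simple, and the pipeline adds only one decision point
// (the loop) no matter how many rules are registered.
final class DiscountPipeline
{
    /** @var callable[] each rule: fn (float $orderTotal): float */
    private array $rules;

    public function __construct(callable ...$rules)
    {
        $this->rules = $rules;
    }

    public function totalDiscount(float $orderTotal): float
    {
        $discount = 0.0;

        foreach ($this->rules as $rule) {
            $discount += $rule($orderTotal);
        }

        return $discount;
    }
}
```

Each rule can be unit-tested in isolation, and adding a new discount never raises the complexity of existing code.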
Code Coverage
Code coverage measures what percentage of code is executed during tests. It indicates test thoroughness but not test quality.
# PHPUnit with coverage
./vendor/bin/phpunit --coverage-html coverage/
# Coverage output example:
# Lines: 87.5% (1400/1600)
# Functions/Methods: 92.3% (240/260)
# Classes: 95.0% (38/40)
Line coverage shows which lines were executed. Branch coverage shows which conditional branches were taken. Path coverage shows which paths through the code were exercised.
// This function reaches 100% line coverage with a single
// test, yet branch coverage stays incomplete
function applyVipDiscount(Order $order): float
{
    $total = $order->total;

    if ($order->customer->isVip()) { // branch point
        $total *= 0.95;
    }

    return $total;
}
// One test with a VIP customer executes every line,
// achieving 100% line coverage, but never takes the false
// branch of the isVip() check: the non-VIP path is untested.
Coverage targets (often 80%) provide baselines, but don't obsess over percentages. Some code (error handlers, edge cases) is difficult to test and low-value to cover. Focus coverage efforts on business-critical code.
Coupling and Cohesion
Coupling measures dependencies between modules. High coupling means changes ripple across the codebase. Low coupling allows modules to change independently.
Afferent coupling (Ca) counts incoming dependencies: how many modules depend on this one. Efferent coupling (Ce) counts outgoing dependencies: how many modules this one depends on.
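One rough way to see efferent coupling without a full analyzer is to count a file's distinct imports. A heuristic sketch (it misses inline fully qualified names and `new` expressions, so treat the number as a floor):

```php
<?php

// Rough efferent coupling estimate: count the distinct
// `use` imports in a PHP source file
function estimateEfferentCoupling(string $source): int
{
    preg_match_all('/^use\s+([\w\\\\]+)/m', $source, $matches);

    return count(array_unique($matches[1]));
}
```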
// High efferent coupling: depends on many concrete classes
class OrderProcessor
{
    public function process(Order $order): void
    {
        $inventory = new InventoryService();
        $payment = new PaymentGateway();
        $email = new EmailService();
        $audit = new AuditLogger();
        $metrics = new MetricsCollector();
        // Uses all these dependencies directly
    }
}
// Lower coupling: depends on abstractions
class OrderProcessor
{
    public function __construct(
        private InventoryInterface $inventory,
        private PaymentInterface $payment,
        private NotifierInterface $notifier,
    ) {}

    // Depends on interfaces, not implementations
}
Cohesion measures how related a module's responsibilities are. High cohesion means a class does one thing well. Low cohesion means a class handles unrelated concerns.
// Low cohesion: multiple unrelated responsibilities
class UserManager
{
    public function createUser(array $data): User { }
    public function sendEmail(User $user, string $subject): void { }
    public function generateReport(): string { }
    public function backupDatabase(): void { }
}
// High cohesion: focused responsibility
class UserService
{
    public function createUser(array $data): User { }
    public function updateUser(User $user, array $data): User { }
    public function deleteUser(User $user): void { }
    public function findByEmail(string $email): ?User { }
}
Technical Debt
Technical debt metrics attempt to quantify maintainability issues. Tools like SonarQube estimate "debt" in time required to fix issues.
Technical Debt Ratio = Remediation Cost / Development Cost
Example:
- Estimated time to fix all issues: 40 hours
- Estimated time to develop the code: 400 hours
- Debt ratio: 10%
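The same arithmetic as a small helper (debtRatio is a hypothetical name; tools like SonarQube derive the inputs from their own cost models):

```php
<?php

// Technical debt ratio: remediation cost over development
// cost, returned as a percentage
function debtRatio(float $remediationHours, float $developmentHours): float
{
    if ($developmentHours <= 0.0) {
        throw new InvalidArgumentException('Development cost must be positive');
    }

    return ($remediationHours / $developmentHours) * 100.0;
}

// debtRatio(40, 400) returns 10.0, matching the example above
```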
Debt ratio provides a high-level health indicator. Track it over time to see if debt is accumulating or being paid down. Spikes often correlate with rushed features or deadline pressure.
Individual debt items matter more than aggregate numbers. A single architectural flaw might be more damaging than dozens of minor style issues. Prioritize debt based on impact, not just quantity.
Duplication
Code duplication indicates potential abstraction opportunities. Duplicated code means bugs must be fixed in multiple places and changes require updates everywhere.
// Duplicated validation logic
class UserController
{
    public function store(Request $request): Response
    {
        if (strlen($request->password) < 8) {
            return response()->json(['error' => 'Password too short'], 400);
        }
        if (!filter_var($request->email, FILTER_VALIDATE_EMAIL)) {
            return response()->json(['error' => 'Invalid email'], 400);
        }
        // ...
    }

    public function update(Request $request): Response
    {
        if (strlen($request->password) < 8) { // Duplicated
            return response()->json(['error' => 'Password too short'], 400);
        }
        if (!filter_var($request->email, FILTER_VALIDATE_EMAIL)) { // Duplicated
            return response()->json(['error' => 'Invalid email'], 400);
        }
        // ...
    }
}
// Extracted to reusable validation
class UserRequest extends FormRequest
{
    public function rules(): array
    {
        return [
            'email' => 'required|email',
            'password' => 'required|min:8',
        ];
    }
}
Tools such as phpcpd detect duplication automatically. Set thresholds: small duplication (3-5 lines) may not warrant abstraction, while larger blocks usually should.
Maintainability Index
The Maintainability Index combines cyclomatic complexity, lines of code, and Halstead volume into a single score. Higher scores indicate more maintainable code; the commonly used variant normalizes the score to a 0-100 scale.
MI = 171 - 5.2 * ln(V) - 0.23 * G - 16.2 * ln(L)
Where:
- V = Halstead Volume
- G = Cyclomatic Complexity
- L = Lines of Code
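The formula translates directly to code, assuming the three inputs have already been computed by an analysis tool (PHP's log() is the natural logarithm the formula requires):

```php
<?php

// Maintainability Index from precomputed inputs
function maintainabilityIndex(float $halsteadVolume, int $complexity, int $linesOfCode): float
{
    return 171.0
        - 5.2 * log($halsteadVolume)
        - 0.23 * $complexity
        - 16.2 * log($linesOfCode);
}

// Example: V = 100, G = 5, L = 50 scores roughly 82.5
```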
Scores above 65 are generally maintainable. Scores below 20 indicate highly unmaintainable code. The index provides a quick health check but hides specific problems behind an aggregate number.
Using Metrics Effectively
Set quality gates in CI/CD pipelines to prevent degradation:
# .github/workflows/quality.yml
- name: Run static analysis
  run: |
    ./vendor/bin/phpstan analyse --error-format=github

- name: Check complexity
  run: |
    ./vendor/bin/phpmd app text codesize --suffixes php \
      --reportfile-xml complexity-report.xml

- name: Enforce coverage threshold
  run: |
    ./vendor/bin/phpunit --coverage-clover coverage.xml
    php check-coverage.php coverage.xml --min-coverage=80
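The check-coverage.php script above is not a standard tool. A minimal sketch might read the project-level metrics from the Clover report PHPUnit produces (attribute names follow the Clover XML format):

```php
<?php

// Extract overall line coverage (percent) from a
// PHPUnit Clover XML report
function cloverLineCoverage(string $cloverXml): float
{
    $xml = new SimpleXMLElement($cloverXml);
    $metrics = $xml->project->metrics;

    $statements = (int) $metrics['statements'];
    $covered = (int) $metrics['coveredstatements'];

    return $statements > 0 ? ($covered / $statements) * 100.0 : 100.0;
}
```

A thin wrapper would compare this value against the --min-coverage argument and exit non-zero when the build should fail.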
Track metrics over time. Single snapshots tell little; trends reveal whether quality is improving or degrading. Display metrics on dashboards alongside deployment frequency and incident rates.
Use metrics to identify hotspots. Files with high complexity, low coverage, and frequent changes are maintenance risks. Prioritize refactoring efforts on these hotspots.
// Find risky files
class CodebaseAnalyzer
{
    public function findHotspots(): array
    {
        $metrics = $this->collectMetrics();
        $changes = $this->getChangeFrequency();

        return collect($metrics)
            ->map(fn ($file) => [
                'file' => $file['path'],
                // Default to zero changes for files with no history;
                // the parentheses matter, since * binds tighter than ??
                'risk' => $file['complexity'] * ($changes[$file['path']] ?? 0),
            ])
            ->sortByDesc('risk')
            ->take(10)
            ->toArray();
    }
}
Limitations
Metrics measure what's measurable, not what matters. Clean code principles like clarity, expressiveness, and appropriate naming don't reduce to numbers.
Gaming metrics is easy. Teams pressured to hit coverage targets write meaningless tests. Teams avoiding complexity limits split methods arbitrarily. The underlying problems remain.
Context matters. A 50-complexity method in a state machine might be acceptable. A 10-complexity method in simple business logic might be too high. Apply judgment alongside metrics.
Conclusion
Code quality metrics provide valuable signals about codebase health. Cyclomatic complexity indicates testability. Coverage shows test thoroughness. Coupling and cohesion measure design quality. Technical debt tracks accumulated issues.
Use metrics as tools, not targets. Track trends over time. Focus on hotspots with high risk. Remember that metrics indicate symptoms; fixing underlying causes matters more than improving numbers. The goal is maintainable, reliable code, not perfect metric scores.