Every engineering team carries technical debt. The problem isn't having it; the problem is that most teams can't describe it in terms that drive action. "The codebase is a mess" loses every prioritization battle against "this feature will generate $200k in new ARR."
To win that argument you need numbers. Not arbitrary complexity scores, but metrics tied to business costs that stakeholders can weigh against feature work.
This post covers how to measure technical debt in ways that actually change how your organization responds to it.
What Technical Debt Actually Costs
Technical debt has two types of costs:
Interest payments: The ongoing slowdown that debt causes. Every feature takes longer because of unclear code, missing tests, fragile integrations, and workarounds built on workarounds. This is the compounding cost.
Principal: The effort required to eliminate the debt. Rewriting a module, adding comprehensive tests, or migrating off a deprecated dependency.
Most teams focus on the principal ("it would take 3 sprints to refactor this") without measuring the interest ("but it's adding 20% overhead to every feature we build in this area"). Once you can show accumulated interest, the conversation shifts.
Metric 1: Cycle Time by Module
Cycle time — the time from code commit to production deployment — varies dramatically across codebases. Areas with high debt typically show:
- Longer PR review times (reviewers need more time to understand changes)
- More back-and-forth review cycles
- More time in QA (failures surface in testing that weren't caught earlier)
- More hotfixes after deployment
Measure cycle time per module or service by pulling data from your version control system:
```python
import subprocess
from collections import defaultdict

def get_commits_with_files(days=90):
    """Get commits and which files they touched in the last N days."""
    result = subprocess.run(
        ['git', 'log', f'--since={days} days ago',
         '--name-only', '--format=%H|%ai|%s'],
        capture_output=True, text=True
    )
    commits = []
    current = None
    for line in result.stdout.strip().split('\n'):
        # A header line has at least two pipes (subjects may contain more)
        if '|' in line and len(line.split('|')) >= 3:
            if current:
                commits.append(current)
            hash_, date, subject = line.split('|', 2)
            current = {'hash': hash_, 'date': date, 'files': []}
        elif line.strip() and current:
            current['files'].append(line.strip())
    if current:
        commits.append(current)  # don't drop the final commit
    return commits

def churn_by_module(commits):
    """Count commits touching each module (first two path segments)."""
    module_churn = defaultdict(int)
    for commit in commits:
        modules_touched = set()
        for file in commit['files']:
            parts = file.split('/')
            if len(parts) > 1:
                modules_touched.add(parts[0] + '/' + parts[1])
        for module in modules_touched:
            module_churn[module] += 1
    return dict(sorted(module_churn.items(), key=lambda x: x[1], reverse=True))
```
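Churn tells you how often a module changes, but not how long changes take. To get actual cycle time you need PR timestamps from your VCS host. Assuming you have exported them into dicts like the ones below (the field names are illustrative, not any particular API's), the per-module median is a few lines:

```python
from statistics import median
from datetime import datetime
from collections import defaultdict

def cycle_times_by_module(prs):
    """Median hours from PR opened to merged, grouped by module.

    Each PR dict is assumed to look like:
    {'module': 'src/billing', 'opened_at': ISO-8601, 'merged_at': ISO-8601}
    """
    samples = defaultdict(list)
    for pr in prs:
        opened = datetime.fromisoformat(pr['opened_at'])
        merged = datetime.fromisoformat(pr['merged_at'])
        hours = (merged - opened).total_seconds() / 3600
        samples[pr['module']].append(hours)
    return {m: round(median(h), 1) for m, h in samples.items()}

prs = [
    {'module': 'src/billing', 'opened_at': '2024-05-01T09:00:00', 'merged_at': '2024-05-03T09:00:00'},
    {'module': 'src/billing', 'opened_at': '2024-05-04T09:00:00', 'merged_at': '2024-05-08T09:00:00'},
    {'module': 'src/auth',    'opened_at': '2024-05-02T10:00:00', 'merged_at': '2024-05-02T16:00:00'},
]
print(cycle_times_by_module(prs))
# {'src/billing': 72.0, 'src/auth': 6.0}
```

Median beats mean here: one PR that sat open over a holiday shouldn't dominate a module's score.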
High churn with high cycle time in the same module is a reliable signal of debt. Modules you touch constantly but never feel confident about are costing you real time every sprint.
Metric 2: Defect Density
Count bugs by the module or component they originate from. Track this over rolling 90-day windows.
```sql
-- In your issue tracker database or via API
SELECT
    component,
    COUNT(*) AS bug_count,
    COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS pct_of_total_bugs,
    AVG(DATEDIFF(resolved_at, created_at)) AS avg_days_to_resolve
FROM issues
WHERE
    type = 'bug'
    AND created_at >= DATE_SUB(NOW(), INTERVAL 90 DAY)
    AND resolved_at IS NOT NULL
GROUP BY component
ORDER BY bug_count DESC
LIMIT 20;
```
A component that represents 8% of your codebase but produces 35% of your bugs has measurable debt. More importantly, you can calculate its cost: if each bug takes an average of 6 engineer-hours to fix, and you're fixing 15 bugs per month from that component, that's 90 engineer-hours per month — about 135 eight-hour engineer-days per year — attributable to debt in one area.
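That arithmetic is worth wrapping in a helper so you can rerun it per component as the numbers change. A minimal sketch using the example rates above (your hours-per-bug figure should come from your own tracker data):

```python
def defect_interest(bugs_per_month, hours_per_bug, hours_per_day=8):
    """Translate a component's bug rate into ongoing engineering cost."""
    hours_month = bugs_per_month * hours_per_bug
    hours_year = hours_month * 12
    return {
        'hours_per_month': hours_month,
        'hours_per_year': hours_year,
        'days_per_year': round(hours_year / hours_per_day),
    }

print(defect_interest(bugs_per_month=15, hours_per_bug=6))
# {'hours_per_month': 90, 'hours_per_year': 1080, 'days_per_year': 135}
```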
Metric 3: Test Coverage Delta
Test coverage alone is a poor proxy for code quality (you can have 90% coverage and still have terrible tests). But coverage in high-churn areas is meaningful. Code you change constantly without test coverage is where bugs hide.
```bash
# Generate a Clover coverage report with per-file data
phpunit --coverage-clover coverage.xml

# Extract low-coverage, high-churn files
python3 scripts/debt/coverage_churn_matrix.py \
    --coverage coverage.xml \
    --days 90 \
    --threshold 50
```
```python
# coverage_churn_matrix.py
import argparse
import subprocess
import xml.etree.ElementTree as ET
from collections import defaultdict

def parse_coverage(xml_path):
    """Read per-file statement coverage from a Clover XML report."""
    tree = ET.parse(xml_path)
    root = tree.getroot()
    coverage = {}
    for file_elem in root.iter('file'):
        name = file_elem.get('name')
        metrics = file_elem.find('metrics')
        if metrics is not None:
            statements = int(metrics.get('statements', 0))
            covered = int(metrics.get('coveredstatements', 0))
            pct = (covered / statements * 100) if statements > 0 else 100
            coverage[name] = pct
    return coverage

def get_file_churn(days):
    """Count how many commits touched each file in the last N days."""
    result = subprocess.run(
        ['git', 'log', f'--since={days} days ago', '--name-only', '--format='],
        capture_output=True, text=True
    )
    churn = defaultdict(int)
    for line in result.stdout.strip().split('\n'):
        if line.strip():
            churn[line.strip()] += 1
    return dict(churn)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--coverage', default='coverage.xml')
    parser.add_argument('--days', type=int, default=90)
    parser.add_argument('--threshold', type=float, default=50)
    args = parser.parse_args()

    coverage = parse_coverage(args.coverage)
    churn = get_file_churn(args.days)

    # Find high-churn, low-coverage files. Note: Clover may report absolute
    # paths while git reports repo-relative ones; normalize if they differ.
    # Files absent from the coverage report default to 100% and are skipped.
    risk_files = [
        {'file': f, 'churn': churn.get(f, 0), 'coverage': coverage.get(f, 0)}
        for f in set(coverage) | set(churn)
        if churn.get(f, 0) > 5 and coverage.get(f, 100) < args.threshold
    ]
    risk_files.sort(key=lambda x: x['churn'], reverse=True)
    for item in risk_files[:20]:
        print(f"{item['churn']:4d} commits | {item['coverage']:5.1f}% coverage | {item['file']}")
```
This matrix is one of the most actionable outputs you can produce. A file with 40 commits and 12% test coverage is a verified risk area. You can show exactly what it's costing you when bugs emerge from it.
Metric 4: Complexity Trends
Cyclomatic complexity measures the number of linearly independent paths through code. High complexity correlates with defect rates. Track it over time to see whether debt is growing or shrinking.
```bash
# PHP: use PHP_CodeSniffer or phpmetrics
phpmetrics --report-html=reports/metrics/ src/

# Or use phploc for quick summary stats
phploc src/ --log-csv=reports/metrics/phploc-$(date +%Y%m%d).csv
```
The trend matters more than the absolute number. A codebase where average complexity is increasing month over month is accumulating debt faster than it's paying it off.
Store results in a time-series database or even a simple CSV per run and chart it. When the line trends upward for three consecutive months, that's a concrete conversation starter in planning.
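A minimal check over those stored snapshots makes the "three consecutive months" rule concrete. This sketch assumes one (month, average complexity) pair per run:

```python
def months_rising(history):
    """Count consecutive month-over-month increases at the end of the series.

    `history` is a list of (month, avg_complexity) tuples, oldest first.
    """
    streak = 0
    for (_, prev), (_, curr) in zip(history, history[1:]):
        streak = streak + 1 if curr > prev else 0
    return streak

history = [('2024-01', 9.8), ('2024-02', 9.6), ('2024-03', 9.9),
           ('2024-04', 10.4), ('2024-05', 10.9)]
if months_rising(history) >= 3:
    print('Complexity has risen for 3+ consecutive months -- raise it in planning.')
```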
Metric 5: Dependency Age and Vulnerability Count
Outdated dependencies are a specific, measurable form of debt:
```bash
# Node.js
npm audit --json | jq '.vulnerabilities | length'
npm outdated --json | jq 'length'

# PHP / Composer
composer audit --format=json | jq '.advisories | length'
composer outdated --direct --no-dev --format=json | jq '.installed | length'

# Python
pip list --outdated --format=json | jq 'length'
pip-audit --format=json | jq '[.dependencies[].vulns[]] | length'
```
You can express this in risk terms: "We have 12 dependencies that have not been updated in over 18 months. Three of them have known CVEs with CVSS scores above 7.0."
This language is understood by security teams, CTOs, and customers who ask about your security posture. It converts invisible debt into visible risk.
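Turning raw audit output into that sentence takes one aggregation step. A sketch, assuming you have parsed the advisories into dicts with `package` and `cvss` fields (these field names are assumptions; adapt them to your tool's actual JSON):

```python
def summarize_advisories(advisories, cvss_floor=7.0):
    """Count advisories at or above a CVSS threshold, and which packages."""
    high = [a for a in advisories if a.get('cvss', 0) >= cvss_floor]
    packages = sorted({a['package'] for a in high})
    return {'high_count': len(high), 'packages': packages}

advisories = [
    {'package': 'libfoo', 'cvss': 9.8},
    {'package': 'libbar', 'cvss': 5.3},
    {'package': 'libbaz', 'cvss': 7.5},
]
print(summarize_advisories(advisories))
# {'high_count': 2, 'packages': ['libbaz', 'libfoo']}
```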
Building the Debt Dashboard
Individual metrics are useful, but a consolidated view creates the most leverage. Build a simple dashboard that shows:
| Module | Churn (90d) | Bug Count (90d) | Coverage % | Avg Complexity | Debt Score |
|---|---|---|---|---|---|
| Billing | 87 | 23 | 34% | 18.2 | HIGH |
| Auth | 12 | 2 | 88% | 6.1 | LOW |
| Checkout | 64 | 18 | 41% | 14.7 | HIGH |
| Dashboard | 31 | 4 | 72% | 8.3 | MEDIUM |
The composite debt score can be as simple as a weighted sum:
```python
def debt_score(churn, bugs, coverage, complexity):
    """Combine the four metrics into a coarse HIGH/MEDIUM/LOW rating."""
    # Normalize each input to a 0-100 scale, then weight
    churn_score = min(churn, 100)          # ~100 commits per 90 days caps the scale
    bug_score = min(bugs / 25 * 100, 100)  # ~25 bugs per 90 days caps the scale
    coverage_score = 100 - coverage        # invert: low coverage = high score
    complexity_score = max(0, min((complexity - 5) / 20 * 100, 100))  # 5 = baseline
    weighted = (
        churn_score * 0.25 +
        bug_score * 0.35 +
        coverage_score * 0.25 +
        complexity_score * 0.15
    )
    if weighted > 65:
        return 'HIGH'
    if weighted > 35:
        return 'MEDIUM'
    return 'LOW'
```
The specific weights should reflect your team's experience of what actually causes pain. Adjust them until the output matches your intuition about which modules are genuinely problematic.
Translating Metrics to Business Cases
Once you have data, translate it into the language of planning:
Velocity tax calculation: "Our billing module has a debt score of HIGH. Developers estimate it takes 40% longer to ship features there than in comparable modules. Last quarter we shipped 8 billing features. At 40% overhead, we paid roughly 3 features worth of engineering time in interest on that debt."
Risk quantification: "The checkout module produced 18 bugs in 90 days. Our average cost to fix and recover from a production bug is 8 engineer-hours. That's 144 engineer-hours, or roughly $21,600 in direct engineering cost, not counting customer impact."
Investment case: "A focused 3-sprint effort to refactor checkout and bring test coverage from 41% to 80% would cost approximately 180 engineer-hours. Based on current bug rates, we'd expect to recover that investment within 6-8 months through reduced incident volume alone, and accelerate feature delivery by an estimated 25% in that area."
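That payback estimate is easy to sanity-check in a few lines. The 60% bug-reduction figure below is an illustrative assumption, not a measured value; replace it with data from your own before/after comparisons:

```python
def payback_months(invest_hours, bugs_per_month, hours_per_bug, bug_reduction):
    """Months until saved bug-fixing time repays a refactoring investment."""
    monthly_savings = bugs_per_month * hours_per_bug * bug_reduction
    return invest_hours / monthly_savings

# Checkout: 18 bugs per 90 days = 6 per month, 8 hours each;
# assume the coverage work cuts the bug rate by 60%
print(round(payback_months(180, 6, 8, 0.60), 1))
# 6.2
```

Note this counts incident savings only; any feature-delivery speedup shortens the payback further.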
This is how you win budget for technical debt work. Not "the code is messy" but "this is costing us $21,600 per quarter and here's the ROI of fixing it."
Making It Sustainable
One-off analyses don't change behavior. Automate the data collection and run it on a schedule:
```yaml
# .github/workflows/debt-report.yml
name: Technical Debt Report

on:
  schedule:
    - cron: '0 9 * * 1'  # Every Monday at 9am UTC

jobs:
  report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for churn analysis
      - name: Run debt analysis
        run: |
          python scripts/debt/generate_report.py \
            --output reports/debt-$(date +%Y-%m-%d).json
      - name: Post to Slack
        run: |
          python scripts/debt/post_to_slack.py \
            --report reports/debt-$(date +%Y-%m-%d).json
```
When the report lands in your team's Slack channel every Monday, it becomes part of the routine. Trends become visible. When a module's score spikes, someone asks why. That conversation is the whole point.
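A minimal version of that Slack-posting script is little more than payload formatting plus a webhook POST. The report structure and webhook URL below are assumptions; shape them to match whatever your report generator emits:

```python
import json
from urllib import request

def build_slack_payload(report):
    """Format a debt report dict into a Slack incoming-webhook payload."""
    lines = [
        f"*{m['module']}*: score {m['score']}, churn {m['churn']}, coverage {m['coverage']}%"
        for m in report['modules']
    ]
    return {'text': 'Weekly technical debt report\n' + '\n'.join(lines)}

report = {'modules': [
    {'module': 'billing', 'score': 'HIGH', 'churn': 87, 'coverage': 34},
]}
payload = build_slack_payload(report)

# Posting requires a real incoming-webhook URL configured in your workspace:
# req = request.Request(webhook_url,
#                       data=json.dumps(payload).encode(),
#                       headers={'Content-Type': 'application/json'})
# request.urlopen(req)
```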
The Key Principle
Technical debt is not a code quality problem. It's a business risk that happens to live in code. Once your metrics demonstrate that clearly, debt work competes fairly in planning conversations. Before that, it will always lose to the next shiny feature.
Measure, quantify, and show the interest payments. Then make the investment case.
Tackling complex architecture decisions? We help teams build systems that last. scopeforged.com