The Coverage Lie
You have 85% code coverage. Your CI is green. You feel good about your test suite. Then a developer accidentally flips a comparison operator from > to >=, and the bug slips through. Your tests ran over that code. They just didn't notice that anything had changed.
This is the fundamental problem with coverage as a quality metric: it measures execution, not verification. Mutation testing fixes this.
What Mutation Testing Does
Mutation testing works by introducing small, deliberate bugs into your source code—called "mutants"—and then running your test suite against each mutant. If your tests catch the bug (the mutant is "killed"), your tests are doing their job. If your tests still pass with a bug present (the mutant "survives"), you have a gap in your test suite.
Common mutations include:
- Changing > to >= or <
- Changing && to ||
- Replacing true with false
- Removing a function call entirely
- Changing + to - in arithmetic
- Replacing a return value with null
Your mutation score is the percentage of mutants killed. A score of 80% means 20% of the mutants the tool generated went undetected by your tests.
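To make the kill/survive idea concrete, here is a minimal hand-rolled sketch in plain PHP. It is not how Infection works internally; the checkout rule, the three mutants, and the helper names `suitePasses` and `mutationScore` are all hypothetical and exist only for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical rule: checkout is allowed for a positive total and a logged-in user.
$original = fn (int $total, bool $loggedIn): bool => $total > 0 && $loggedIn;

// Hand-written mutants, each mimicking one kind of small change a tool might make.
$mutants = [
    '> to >='  => fn (int $total, bool $loggedIn): bool => $total >= 0 && $loggedIn,
    '> to <'   => fn (int $total, bool $loggedIn): bool => $total < 0 && $loggedIn,
    '&& to ||' => fn (int $total, bool $loggedIn): bool => $total > 0 || $loggedIn,
];

// Each test case is [total, loggedIn, expected result].
$weakSuite = [[10, true, true]];
$strongSuite = [[10, true, true], [0, true, false], [10, false, false]];

/** Returns true when every case in the suite passes for the given implementation. */
function suitePasses(callable $impl, array $suite): bool
{
    foreach ($suite as [$total, $loggedIn, $expected]) {
        if ($impl($total, $loggedIn) !== $expected) {
            return false;
        }
    }
    return true;
}

/** Percentage of mutants that make at least one test case fail (that is, are killed). */
function mutationScore(array $mutants, array $suite): float
{
    $killed = 0;
    foreach ($mutants as $mutant) {
        if (!suitePasses($mutant, $suite)) {
            $killed++;
        }
    }
    return 100.0 * $killed / count($mutants);
}

// Both suites pass against the original, which is the precondition for a mutation run.
printf("weak suite: %.0f%%\n", mutationScore($mutants, $weakSuite));     // weak suite: 33%
printf("strong suite: %.0f%%\n", mutationScore($mutants, $strongSuite)); // strong suite: 100%
```

The weak suite executes every part of the expression, so a coverage tool would report the line as covered, yet two of the three mutants survive it. Adding the boundary case and the logged-out case is what kills them.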
Infection: Mutation Testing for PHP
Infection is the standard mutation testing tool for PHP. Install it and run against your test suite:
composer require --dev infection/infection
# Run mutation testing
vendor/bin/infection --threads=4 --min-msi=70 --min-covered-msi=80
The flags here are important:
- --threads=4 runs mutations in parallel (mutation testing is slow by nature; parallelism helps)
- --min-msi=70 requires a Mutation Score Indicator of at least 70% overall
- --min-covered-msi=80 requires a score of at least 80% when only mutants in covered code are counted
A typical run output looks like this:
370 mutations were generated:
304 mutants were killed
12 mutants were not covered by tests
54 mutants survived
0 mutants resulted in a timeout
Metrics:
Mutation Score Indicator (MSI): 82%
Mutation Code Coverage (MCC): 97%
Covered Code MSI (CMSI): 85%
The interesting number is the 54 survivors. Each one represents a type of bug your tests wouldn't catch.
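The three percentages can be reproduced from the counts in the sample output. The sketch below is a simplified calculation under stated assumptions, not Infection's actual code: it treats timeouts as detected mutants and rounds to whole percentages, and the function name `mutationMetrics` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Simplified metric calculation (an assumption, not Infection's implementation):
 * timeouts count as detected, and percentages are rounded to whole numbers.
 */
function mutationMetrics(int $killed, int $notCovered, int $survived, int $timedOut = 0): array
{
    $total = $killed + $notCovered + $survived + $timedOut;
    $covered = $total - $notCovered;
    $detected = $killed + $timedOut;

    // Guard against division by zero when nothing was generated or covered.
    $percent = fn (int $part, int $whole): int => $whole > 0 ? (int) round(100 * $part / $whole) : 0;

    return [
        'msi'  => $percent($detected, $total),   // detected / all mutants
        'mcc'  => $percent($covered, $total),    // covered mutants / all mutants
        'cmsi' => $percent($detected, $covered), // detected / covered mutants
    ];
}

$metrics = mutationMetrics(304, 12, 54);
printf("MSI %d%%, MCC %d%%, CMSI %d%%\n", $metrics['msi'], $metrics['mcc'], $metrics['cmsi']);
// MSI 82%, MCC 97%, CMSI 85%
```

The gap between MSI and Covered Code MSI comes entirely from the 12 uncovered mutants: they count against the overall score but are left out of the covered-code score.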
Reading the Infection Report
Infection generates an HTML report showing each mutant and its status. Here's how to interpret a survived mutant:
Original code:
public function isEligibleForDiscount(Invoice $invoice): bool
{
    return $invoice->total > 500 && $invoice->client->isPremium();
}
Survived mutant (Infection changed > to >=):
public function isEligibleForDiscount(Invoice $invoice): bool
{
    return $invoice->total >= 500 && $invoice->client->isPremium();
}
// Tests still passed! The boundary case (exactly $500) is untested.
This tells you exactly what test to write: an invoice with a total of exactly $500.00 should either qualify or not qualify for the discount, and you haven't specified which.
Writing Tests That Kill Mutants
Surviving mutants point to specific gaps. Here's how to respond:
For the boundary case above:
public function test_invoice_at_exact_threshold_is_not_eligible(): void
{
    $client = Client::factory()->premium()->create();
    $invoice = Invoice::factory()->for($client)->create(['total' => 500.00]);
    $service = new DiscountEligibilityService();

    // Exactly $500 does NOT qualify; requires strictly more than $500
    $this->assertFalse($service->isEligibleForDiscount($invoice));
}

public function test_invoice_above_threshold_is_eligible(): void
{
    $client = Client::factory()->premium()->create();
    $invoice = Invoice::factory()->for($client)->create(['total' => 500.01]);
    $service = new DiscountEligibilityService();

    $this->assertTrue($service->isEligibleForDiscount($invoice));
}
These two tests together kill the > to >= mutant and the > to < mutant. They also document the business rule precisely: discount requires strictly more than $500.
Configuring Infection for Your Project
Infection is configured via infection.json5 at the project root:
{
    "source": {
        "directories": ["app"],
        // Exclude paths are resolved relative to the source directories above
        "excludes": [
            "Http/Controllers",
            "Console",
            "Providers"
        ]
    },
    "mutators": {
        "@default": true,
        "MethodCallRemoval": false
    },
    "testFramework": "phpunit",
    "testFrameworkOptions": "--testsuite=Unit",
    "logs": {
        "text": "infection.log",
        "html": "infection.html",
        "summary": "infection-summary.log"
    },
    "minMsi": 70,
    "minCoveredMsi": 80
}
Key decisions:
Exclude controllers and providers from mutation testing. These are glue code; their behavior is better tested by integration tests, and mutating them generates false signals.
Run only unit tests during mutation testing. Integration tests are too slow to run against hundreds of mutants. Your unit tests should kill unit-level mutants.
Disable specific mutators that generate noise for your codebase. MethodCallRemoval is often too aggressive and generates mutants that aren't meaningful for your logic.
The Slow Problem and How to Address It
Mutation testing is inherently slow. If you have 500 tests and generate 400 mutants, you're potentially running your test suite 400 times. There are strategies to make this manageable:
Run in CI on a schedule, not on every commit. Mutation testing is a quality audit tool, not a per-commit gate. Run it nightly or weekly.
Use --filter to target specific files. When working on a new feature, run mutation testing only against that module:
vendor/bin/infection \
--filter=app/Services/Billing \
--threads=8
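Use --only-covered to skip uncovered code. Don't waste time mutating code with no tests. Check vendor/bin/infection --help for your installed version first, since option names and defaults may differ between Infection releases: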
vendor/bin/infection --only-covered
Increase thread count. On a CI machine with 8 cores, --threads=8 can cut runtime substantially, on the order of 70% depending on the suite.
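A back-of-the-envelope estimate shows why these strategies matter. The sketch below assumes the worst case, where the full suite runs once per mutant, and a purely hypothetical suite time of 30 seconds; `estimateMinutes` is an illustrative helper, not part of any tool.

```php
<?php
declare(strict_types=1);

/**
 * Worst-case wall-clock estimate in minutes: every mutant triggers one full
 * suite run, and the work divides evenly across threads (an idealization).
 */
function estimateMinutes(int $mutants, float $secondsPerRun, int $threads = 1): float
{
    return $mutants * $secondsPerRun / max(1, $threads) / 60;
}

// 400 mutants with an assumed 30-second suite.
printf("1 thread:  %.0f min\n", estimateMinutes(400, 30.0));    // 1 thread:  200 min
printf("8 threads: %.0f min\n", estimateMinutes(400, 30.0, 8)); // 8 threads: 25 min
```

An ideal eight-way split is an 87.5% reduction; real runs tend to fall short of that because of process startup and shared resources. Real tools can also beat the worst case by running only the tests relevant to each mutant, so treat these figures as an upper bound for the assumed inputs.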
Interpreting Mutation Score Thresholds
What's a good mutation score? It depends on code criticality:
| Code Type | Target MSI |
|---|---|
| Core business logic | 85%+ |
| Service layer | 75%+ |
| Utilities and helpers | 70%+ |
| Controllers and glue | Not applicable |
Don't chase 100%. Some mutants are equivalent to the original: they change the code but not its observable behavior, so no test can kill them. Trying to kill every surviving mutant has diminishing returns past 85-90%.
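One reason 100% is often out of reach is the mutant that cannot be told apart from the original by any test. The sketch below shows a classic case using a hypothetical `maxOf` helper: changing > to >= only affects which of two equal values is returned, so the result is the same either way.

```php
<?php
declare(strict_types=1);

function maxOf(int $a, int $b): int
{
    return $a > $b ? $a : $b;
}

// Mutant: > changed to >=. When $a === $b, both branches return the same value.
function maxOfMutant(int $a, int $b): int
{
    return $a >= $b ? $a : $b;
}

/** Counts inputs in a small range where the mutant's result differs from the original's. */
function countDifferences(int $low, int $high): int
{
    $differences = 0;
    for ($a = $low; $a <= $high; $a++) {
        for ($b = $low; $b <= $high; $b++) {
            if (maxOf($a, $b) !== maxOfMutant($a, $b)) {
                $differences++;
            }
        }
    }
    return $differences;
}

echo countDifferences(-50, 50), "\n"; // 0
```

A survivor like this is not a test gap. Once you have confirmed that a mutant is of this kind, it is reasonable to ignore it or to disable the mutator for that piece of code.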
Mutation Testing vs. Coverage
The two metrics answer different questions:
| Metric | Question Answered |
|---|---|
| Code coverage | Did my tests execute this code? |
| Mutation score | Would my tests catch a bug here? |
You want both. High coverage with low mutation score means your tests run the code but don't verify its behavior. Low coverage with high mutation score means the code that is tested is tested well, but large swaths of code are untested entirely.
Target: high coverage AND high mutation score for critical business logic.
Integrating into Your Development Workflow
The most effective workflow:
- Write code and tests normally
- Before marking a feature as complete, run mutation testing on the new code
- Kill surviving mutants by writing better tests
- Add mutation testing to your weekly CI run to track score trends over time
Track mutation scores in your CI dashboard over time. A declining score indicates test quality is degrading as new code is written without adequate tests.
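Trend tracking can be as simple as comparing the latest score with the previous one. The sketch below assumes you already store one score per scheduled run in an array; the function name `scoreIsDeclining` and the one-point tolerance are hypothetical choices, not something Infection provides.

```php
<?php
declare(strict_types=1);

/**
 * Returns true when the latest score dropped by more than $tolerance
 * percentage points compared with the previous run.
 *
 * @param float[] $history Scores ordered oldest to newest.
 */
function scoreIsDeclining(array $history, float $tolerance = 1.0): bool
{
    $count = count($history);
    if ($count < 2) {
        return false; // Not enough data to compare.
    }

    $previous = $history[$count - 2];
    $latest = $history[$count - 1];

    return ($previous - $latest) > $tolerance;
}

var_dump(scoreIsDeclining([82.0, 81.5])); // bool(false): within tolerance
var_dump(scoreIsDeclining([82.0, 78.0])); // bool(true): dropped four points
```

The tolerance keeps small run-to-run fluctuations from raising an alert, so only a meaningful drop is flagged.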
Practical Takeaways
- Code coverage measures test execution; mutation score measures test quality
- Infection is the standard PHP mutation testing tool; configure it to target your business logic
- Surviving mutants are specific test gaps, not abstract problems; each one points to a test to write
- Run mutation testing on a schedule rather than every commit due to runtime cost
- Target a mutation score of roughly 85% for core business logic; don't chase 100%