Mutation Testing: The Metric That Exposes Weak Test Suites

The Coverage Lie

You have 85% code coverage. Your CI is green. You feel good about your test suite. Then a bug slips through that a developer introduced by accidentally flipping a comparison operator from > to >=. Your tests ran over that code. They just didn't notice anything changed.

This is the fundamental problem with coverage as a quality metric: it measures execution, not verification. Mutation testing fixes this.

What Mutation Testing Does

Mutation testing works by introducing small, deliberate bugs into your source code—called "mutants"—and then running your test suite against each mutant. If your tests catch the bug (the mutant is "killed"), your tests are doing their job. If your tests still pass with a bug present (the mutant "survives"), you have a gap in your test suite.

Common mutations include:

Changing > to >= or <
Changing && to ||
Replacing true with false
Removing a function call entirely
Changing + to - in arithmetic
Replacing a return value with null

Your mutation score is the percentage of generated mutants that your tests kill. A score of 80% means 20% of the bugs the mutation tool introduced went undetected by your tests.
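To make "killed" and "survived" concrete, here is a minimal sketch in plain PHP, without Infection or PHPUnit. The rule, the hand-written mutant, and the suitePasses() helper are hypothetical names invented for illustration; a real tool generates the mutants and runs your actual test suite.

```php
<?php
declare(strict_types=1);

// Hypothetical rule under test, plus one hand-written mutant (> became >=).
$original = fn (float $total): bool => $total > 500;
$mutant   = fn (float $total): bool => $total >= 500;

// A "test suite" here is just a list of [input, expected result] pairs.
function suitePasses(callable $impl, array $cases): bool
{
    foreach ($cases as [$input, $expected]) {
        if ($impl($input) !== $expected) {
            return false; // one failing case fails the whole suite
        }
    }
    return true;
}

$weakSuite   = [[100.0, false], [900.0, true]]; // never touches the boundary
$strongSuite = [...$weakSuite, [500.0, false]]; // adds the boundary case

// A mutant survives when the suite still passes against the mutated code.
$survivesWeak   = suitePasses($mutant, $weakSuite);
$survivesStrong = suitePasses($mutant, $strongSuite);

echo 'weak suite:   mutant ', $survivesWeak ? 'survived' : 'killed', PHP_EOL;
echo 'strong suite: mutant ', $survivesStrong ? 'survived' : 'killed', PHP_EOL;
```

Both suites execute the comparison, so both would report the line as covered. Only the suite that pins down the boundary value notices the change.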

Infection: Mutation Testing for PHP

Infection is the standard mutation testing tool for PHP. Install it and run against your test suite:

composer require --dev infection/infection

# Run mutation testing
vendor/bin/infection --threads=4 --min-msi=70 --min-covered-msi=80

The flags here are important:

--threads=4 runs mutations in parallel (mutation testing is slow by nature; parallelism helps)
--min-msi=70 fails the run if the overall Mutation Score Indicator (MSI) drops below 70%
--min-covered-msi=80 fails the run if fewer than 80% of the mutants in test-covered code are killed

A typical run output looks like this:

370 mutations were generated:
    304 mutants were killed
     12 mutants were not covered by tests
     54 mutants survived
      0 mutants resulted in a timeout

Metrics:
         Mutation Score Indicator (MSI): 82%
         Mutation Code Coverage (MCC): 97%
         Covered Code MSI (CMSI): 85%

The interesting number is the 54 survivors. Each one represents a type of bug your tests wouldn't catch.
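The three percentages follow directly from the raw counts. The sketch below recomputes them, assuming that killed and timed-out mutants both count as detected; the function name mutationMetrics() is made up for this example, and errored mutants are left out because the run above reports none.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: derives the three scores from raw mutant counts.
 * Assumes timed-out mutants count as detected, like killed ones.
 */
function mutationMetrics(int $killed, int $notCovered, int $survived, int $timedOut): array
{
    $total    = $killed + $notCovered + $survived + $timedOut;
    $covered  = $total - $notCovered; // mutants on lines that some test executes
    $detected = $killed + $timedOut;

    // Guard against division by zero when no mutants were generated.
    $percent = fn (int $part, int $whole): float => $whole > 0 ? 100.0 * $part / $whole : 0.0;

    return [
        'msi'        => $percent($detected, $total),
        'mcc'        => $percent($covered, $total),
        'coveredMsi' => $percent($detected, $covered),
    ];
}

// Counts taken from the sample run above.
$metrics = mutationMetrics(304, 12, 54, 0);

foreach ($metrics as $name => $value) {
    printf("%-10s %.2f%%\n", $name, $value);
}
```

With the counts from the sample run, the values round to 82%, 97%, and 85%, matching the report. The gap between MSI and covered-code MSI comes entirely from the 12 uncovered mutants.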

Reading the Infection Report

Infection can generate an HTML report (enabled through the logs setting in its configuration) showing each mutant and its status. Here's how to interpret a survived mutant:

Original code:

public function isEligibleForDiscount(Invoice $invoice): bool
{
    return $invoice->total > 500 && $invoice->client->isPremium();
}

Survived mutant (Infection changed > to >=):

public function isEligibleForDiscount(Invoice $invoice): bool
{
    return $invoice->total >= 500 && $invoice->client->isPremium();
}
// Tests still passed! The boundary case (exactly $500) is untested.

This tells you exactly what test to write: an invoice with a total of exactly $500.00 should either qualify or not qualify for the discount, and you haven't specified which.

Writing Tests That Kill Mutants

Surviving mutants point to specific gaps. Here's how to respond:

For the boundary case above:

public function test_invoice_at_exact_threshold_is_not_eligible(): void
{
    $client = Client::factory()->premium()->create();
    $invoice = Invoice::factory()->for($client)->create(['total' => 500.00]);

    $service = new DiscountEligibilityService();

    // Exactly $500 does NOT qualify; requires strictly more than $500
    $this->assertFalse($service->isEligibleForDiscount($invoice));
}

public function test_invoice_above_threshold_is_eligible(): void
{
    $client = Client::factory()->premium()->create();
    $invoice = Invoice::factory()->for($client)->create(['total' => 500.01]);

    $service = new DiscountEligibilityService();

    $this->assertTrue($service->isEligibleForDiscount($invoice));
}

These two tests together kill the > to >= mutant and the > to < mutant. They also document the business rule precisely: discount requires strictly more than $500.

Configuring Infection for Your Project

Infection is configured via infection.json5 at the project root:

{
    "source": {
        "directories": ["app"],
        // Exclude paths are resolved relative to the source directories above
        "excludes": [
            "Http/Controllers",
            "Console",
            "Providers"
        ]
    },
    "mutators": {
        "@default": true,
        "MethodCallRemoval": false
    },
    "testFramework": "phpunit",
    "testFrameworkOptions": "--testsuite=Unit",
    "logs": {
        "text": "infection.log",
        "html": "infection.html",
        "summary": "infection-summary.log"
    },
    "minMsi": 70,
    "minCoveredMsi": 80
}

Key decisions:

Exclude controllers, console commands, and providers from mutation testing. These are glue code; their behavior is better tested by integration tests, and mutating them mostly produces noise.

Run only unit tests during mutation testing. Integration tests are too slow to run against hundreds of mutants. Your unit tests should kill unit-level mutants.

Disable specific mutators that generate noise for your codebase. MethodCallRemoval is often too aggressive and generates mutants that aren't meaningful for your logic.

The Slow Problem and How to Address It

Mutation testing is inherently slow. If you have 500 tests and generate 400 mutants, you're potentially running your test suite 400 times. There are strategies to make this manageable:

Run in CI on a schedule, not on every commit. Mutation testing is a quality audit tool, not a per-commit gate. Run it nightly or weekly.

Use --filter to target specific files. When working on a new feature, run mutation testing only against that module:

vendor/bin/infection \
    --filter=app/Services/Billing \
    --threads=8

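Skip uncovered code. Don't waste time mutating code with no tests. Depending on your Infection version, this is either the default behavior (with --with-uncovered to opt back in) or enabled with --only-covered; check vendor/bin/infection --help for your release: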

vendor/bin/infection --only-covered

Increase thread count. On a CI machine with 8 cores, --threads=8 can cut runtime by around 70%, though the exact gain depends on how well your tests run in parallel.

Interpreting Mutation Score Thresholds

What's a good mutation score? It depends on code criticality:

Code Type	Target MSI
Core business logic	85%+
Service layer	75%+
Utilities and helpers	70%+
Controllers and glue	Not applicable

Don't chase 100%. Some mutants are equivalent: they change the code without changing its observable behavior, so no test can kill them. Spending time on every surviving mutant has diminishing returns past 85-90%.
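As a sketch of a mutant that behaves identically to the original (often called an equivalent mutant), consider a hypothetical maxOf() helper. Swapping > for >= only changes which branch runs when both arguments are equal, and in that case both branches return the same value, so no assertion on the result can tell the two versions apart.

```php
<?php
declare(strict_types=1);

// Hypothetical helper and a hand-written mutant of it (> became >=).
function maxOf(int $a, int $b): int
{
    return $a > $b ? $a : $b;
}

function maxOfMutant(int $a, int $b): int
{
    return $a >= $b ? $a : $b;
}

// Brute-force comparison over a small range: count inputs where they differ.
$differences = 0;
for ($a = -50; $a <= 50; $a++) {
    for ($b = -50; $b <= 50; $b++) {
        if (maxOf($a, $b) !== maxOfMutant($a, $b)) {
            $differences++;
        }
    }
}

echo "inputs where the mutant differs: {$differences}", PHP_EOL;
```

A survivor like this is a candidate for being ignored rather than a prompt for a new test.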

Mutation Testing vs. Coverage

The two metrics answer different questions:

Metric	Question Answered
Code coverage	Did my tests execute this code?
Mutation score	Would my tests catch a bug here?

You want both. High coverage with low mutation score means your tests run the code but don't verify its behavior. Low coverage with high mutation score means the code that is tested is tested well, but large swaths of code are untested entirely.

Target: high coverage AND high mutation score for critical business logic.

Integrating into Your Development Workflow

The most effective workflow:

Write code and tests normally
Before marking a feature as complete, run mutation testing on the new code
Kill surviving mutants by writing better tests
Add mutation testing to your weekly CI run to track score trends over time

Track mutation scores in your CI dashboard over time. A declining score indicates test quality is degrading as new code is written without adequate tests.

Practical Takeaways

Code coverage measures test execution; mutation score measures test quality
Infection is the standard PHP mutation testing tool; configure it to target your business logic
Surviving mutants are specific test gaps, not abstract problems; each one points to a test to write
Run mutation testing on a schedule rather than every commit due to runtime cost
Target 85%+ mutation score for core business logic; don't chase 100%
