Developers often use code coverage metrics to gauge how well tested an application is. There is a wide variety of coverage metrics, including statement, branch, MC/DC, method, file, and path coverage. Statement coverage, the number of statements executed by tests divided by the total number of statements, is the simplest of these, yet also the most commonly used.

While the overall statement coverage of a test suite provides some insight into its (in)completeness, it reduces the quality measure to a single ratio, so developers may miss valuable information about their test suite and its limitations. For developers of large, stable projects with many statements, it is often difficult to notice any change in this metric from one commit (patch) to the next. For example, in a project with 1 million lines of code, a change in coverage of even 100 lines would move overall coverage by only one hundredth of one percentage point.
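The arithmetic behind the 1-million-line example can be checked with a short sketch. The function name and the baseline of 700,000 covered lines are hypothetical, chosen only for illustration, and lines stand in for statements.

```python
from fractions import Fraction


def coverage_percent(covered: int, total: int) -> Fraction:
    """Statement coverage as an exact percentage: covered / total * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return Fraction(covered, total) * 100


# Hypothetical project: 1,000,000 lines, of which 700,000 are covered.
TOTAL_LINES = 1_000_000
before = coverage_percent(700_000, TOTAL_LINES)

# A patch changes the coverage status of 100 lines (here: 100 more are covered).
after = coverage_percent(700_100, TOTAL_LINES)

# Difference in percentage points; exact because Fraction avoids float rounding.
delta = after - before
print(f"before={float(before):.2f}%  after={float(after):.2f}%  delta={float(delta)} pp")
```

Reported to one decimal place, as many dashboards do, both values would read 70.0%, so the change would be invisible.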

Nonetheless, these changes can add up over time: although a single 100-line patch may not make a noticeable change in coverage, many small patches together can. Even more concerning, coverage of some lines may change non-deterministically due to inherent non-determinism in the tests. Even in smaller projects, where an increase in the overall coverage might be more noticeable, tracking only this simple ratio does not capture which statements are covered. In an extreme case, a project with 50% code coverage could maintain that overall coverage while completely flipping the set of statements covered. Coverage can also increase, seemingly indicating a better test suite, even when that is not necessarily the case. For example, if code is removed, coverage might go up despite a drop in the number of executed statements, because the total number of statements decreases even more.
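Both pitfalls can be made concrete by treating coverage as a set of statements rather than a ratio. The following is a minimal sketch, assuming statements are identified by integers; the helper names and the numbers are hypothetical and not taken from any particular tool.

```python
def coverage_ratio(covered: set, statements: set) -> float:
    """Fraction of `statements` that also appear in `covered`."""
    return len(covered & statements) / len(statements)


def coverage_diff(before: set, after: set) -> dict:
    """Which statements gained or lost coverage between two runs."""
    return {
        "newly_covered": after - before,
        "no_longer_covered": before - after,
    }


# Case 1: the ratio stays at 50% while the covered set flips completely.
statements = set(range(10))
flip_before = {0, 1, 2, 3, 4}
flip_after = {5, 6, 7, 8, 9}
flip = coverage_diff(flip_before, flip_after)
print(coverage_ratio(flip_before, statements), coverage_ratio(flip_after, statements))
print(len(flip["newly_covered"]), len(flip["no_longer_covered"]))

# Case 2: the ratio rises although fewer statements are executed,
# because 30 uncovered statements were deleted.
old_statements = set(range(100))
old_covered = set(range(60))   # 60 of 100 executed
new_statements = set(range(70))
new_covered = set(range(55))   # only 55 executed, but of 70
old_ratio = coverage_ratio(old_covered, old_statements)
new_ratio = coverage_ratio(new_covered, new_statements)
print(f"{old_ratio:.3f} -> {new_ratio:.3f}")
```

In the first case the ratio reports no change at all, while the set difference shows that every covered statement was replaced by a different one.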

One approach for gaining better insights from statement coverage is to focus not on the coverage of the entire system under test (SUT) but, instead, only on the coverage of each patch (changed statements) applied to the SUT. Collecting patch coverage can be useful because if a patch is insufficiently covered, developers can easily flag it in code review and require that more tests be added with it.
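Patch coverage can be sketched as the fraction of changed statements that the tests execute. The code below is an illustration under stated assumptions: statements are identified by hypothetical `(file, line)` pairs, and the 0.8 review threshold is an arbitrary example value, not a recommendation from the study.

```python
from typing import Optional


def patch_coverage(changed: set, covered: set) -> Optional[float]:
    """Fraction of changed statements executed by tests.

    Returns None when the patch changes no statements, since the
    ratio is undefined in that case.
    """
    if not changed:
        return None
    return len(changed & covered) / len(changed)


def needs_more_tests(changed: set, covered: set, threshold: float = 0.8) -> bool:
    """Flag a patch whose coverage falls below a (hypothetical) threshold."""
    ratio = patch_coverage(changed, covered)
    return ratio is not None and ratio < threshold


# Hypothetical patch touching four statements in two files.
changed = {
    ("src/parser.py", 10),
    ("src/parser.py", 11),
    ("src/parser.py", 12),
    ("src/lexer.py", 40),
}
# Statements executed by the test suite; ("src/util.py", 3) is not part of the patch.
covered = {("src/parser.py", 10), ("src/lexer.py", 40), ("src/util.py", 3)}

ratio = patch_coverage(changed, covered)
flagged = needs_more_tests(changed, covered)
print(ratio, flagged)
```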

However, even if a patch is well covered, the impact of the patch on the non-patch (unchanged statements) part of the SUT is not known a priori.
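One way to observe that impact after the fact is to compare the covered sets before and after the patch while excluding the patched statements themselves. This is a rough sketch that assumes statement identifiers stay stable across the two versions; a real tool would need to map lines across the diff. All names and data are hypothetical.

```python
def non_patch_coverage_changes(covered_before: set, covered_after: set, patch: set) -> dict:
    """Coverage changes among statements the patch did not touch."""
    unchanged_before = covered_before - patch
    unchanged_after = covered_after - patch
    return {
        "newly_covered": unchanged_after - unchanged_before,
        "no_longer_covered": unchanged_before - unchanged_after,
    }


# Hypothetical patch that is fully covered by tests...
patch = {("a.py", 5), ("a.py", 6)}
covered_before = {("a.py", 1), ("b.py", 2), ("b.py", 3)}
covered_after = {("a.py", 1), ("a.py", 5), ("a.py", 6), ("b.py", 2)}

# ...yet an unchanged statement in another file is no longer executed.
impact = non_patch_coverage_changes(covered_before, covered_after, patch)
print(impact["no_longer_covered"])
```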

To better understand code coverage, how it changes, and how developers can better reason about their code and their tests, we have conducted empirical studies and begun to build tools that surface how the set of statements covered changes, rather than focusing simply on the percentage of statements covered.

For more information please see our paper:

A Large-Scale, Longitudinal Study of Test Coverage Evolution (Michael Hilton, Jonathan Bell, Darko Marinov). In 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE 2018), 2018.
