5.1 Metrics



The ideal bug detector would detect all extant bugs without flagging correct code as incorrect. The initial output from `cqual` is a list of warnings, each indicating a type error somewhere in the program. Some of these correspond to real bugs; others are false positives stemming from our conservative tainting approach (and from the lack of full polymorphism). False negatives are also of interest: we would like every vulnerability to show up as a warning. One complicating factor is that many warnings can result from the same bug. For example, if many functions that read network data call a single function containing a format string bug, then all of those warnings may disappear once that one bug is fixed.
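To illustrate how a single bug can account for several warnings, here is a minimal hypothetical sketch in C. The names `log_msg`, `handle_user`, and `handle_pass` are invented for this example and do not come from any program we tested. Two handlers pass network-derived text into one shared logging helper; if that helper used the text as its format string, a tainting analysis might report a separate warning along each caller's path, yet a single fix inside the helper removes them all. The sketch writes into a buffer with `snprintf` rather than printing, so the result can be inspected.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical logging helper shared by several request handlers.
 * The vulnerable form would use untrusted text as the format string:
 *     snprintf(buf, n, msg);
 * Every caller feeding network data into msg could then be flagged.
 * The fixed form below supplies a constant format string, so the
 * untrusted text is only ever an argument and is copied verbatim. */
static void log_msg(char *buf, size_t n, const char *msg)
{
    snprintf(buf, n, "%s", msg);
}

/* Hypothetical handlers: each passes network-derived data to log_msg,
 * so each one is a separate tainted path into the same single bug. */
static void handle_user(char *buf, size_t n, const char *net_data)
{
    log_msg(buf, n, net_data);
}

static void handle_pass(char *buf, size_t n, const char *net_data)
{
    log_msg(buf, n, net_data);
}
```

With the fix in place, input such as `"%x%x%n"` is logged literally instead of being interpreted as conversion specifiers.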

We chose the following metrics, measured per program:

- How many known vulnerabilities were detected, and how many went undetected?
- How many false positives were there?
- How easy was it to check whether a warning was a real bug?
- How long did the automatic analysis take, and what were its resource needs?
- How easy was it to prepare programs for analysis?
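The first two metrics can be made concrete with a small tally, sketched here in C under stated assumptions: every warning has already been triaged by hand and tagged with an identifier for the real bug it traces to, or marked as a false positive. The types and names (`struct warning`, `FALSE_POSITIVE`, `compute_metrics`) are hypothetical and not part of `cqual`. Because one bug can produce many warnings, the sketch counts distinct bugs rather than raw warnings; the remaining metrics (ease of checking, running time, preparation effort) are not captured by such a count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker for a warning that traces to no real bug.
 * Real bug identifiers are assumed to be non-negative. */
#define FALSE_POSITIVE (-1)

/* One triaged warning (illustrative field name). */
struct warning {
    int bug_id; /* id of the real bug behind it, or FALSE_POSITIVE */
};

struct metrics {
    int distinct_bugs;   /* real bugs behind at least one warning */
    int false_positives; /* warnings that trace to no real bug */
    int known_detected;  /* known vulnerabilities that drew a warning */
    int known_missed;    /* known vulnerabilities with no warning */
};

/* Has warning i's bug already appeared in an earlier warning? */
static int seen_before(const struct warning *w, size_t i)
{
    for (size_t j = 0; j < i; j++)
        if (w[j].bug_id == w[i].bug_id)
            return 1;
    return 0;
}

/* Does any warning trace to the given bug? */
static int flagged(const struct warning *w, size_t n, int bug_id)
{
    for (size_t i = 0; i < n; i++)
        if (w[i].bug_id == bug_id)
            return 1;
    return 0;
}

static struct metrics compute_metrics(const struct warning *w, size_t n,
                                      const int *known, size_t n_known)
{
    struct metrics m = {0, 0, 0, 0};
    assert(w != NULL || n == 0);
    assert(known != NULL || n_known == 0);

    for (size_t i = 0; i < n; i++) {
        if (w[i].bug_id == FALSE_POSITIVE)
            m.false_positives++;
        else if (!seen_before(w, i))
            m.distinct_bugs++; /* count each bug once, however many warnings */
    }
    for (size_t k = 0; k < n_known; k++) {
        if (flagged(w, n, known[k]))
            m.known_detected++;
        else
            m.known_missed++; /* a false negative */
    }
    return m;
}
```

For instance, three warnings that all trace to one bug count as a single detected bug, which mirrors the shared-function scenario described earlier.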

**Figure 6:** Results of our experimental evaluation of the tool. Program size is given both unpreprocessed and preprocessed, in thousands of lines of code, excluding comments. Time is the wall-clock time for one run of `cqual`. Warnings is the total number of warnings issued by `cqual` after the GUI's recommendations were followed, and Bugs is the number of real vulnerabilities found.
[Table: columns Name, Version, Description, Lines (unpreprocessed), Lines (preprocessed), Time, Warnings, Bugs. Only the final row survives, and only partially: … identification service, 0.2k, 1.2k, 3s, 0, 0.]


Umesh Shankar 2001-05-16