The ideal bug detector would detect all extant bugs without
flagging correct code as being incorrect. The initial output from
cqual is a list of warnings that indicate a type error
somewhere in the program. Some of these correspond to real bugs;
others are false positives stemming from our conservative
tainting approach (and lack of full polymorphism). False
negatives are also of interest: we would like all vulnerabilities
to show up as warnings. One complicating factor is that many warnings
can result from the same bug. For example, if many functions that
read network data call a single function with a format string bug,
then all of those warnings may go away once that one bug is fixed.
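The many-warnings-from-one-bug situation can be sketched in C. This is an illustration only, not code from any program we tested: the helper and handler names (report_buggy, report_fixed, handle_login, handle_chat, handle_quit) are hypothetical, and the sketch assumes each handler receives untrusted network data.

```c
#include <stdio.h>

/* Hypothetical shared helper containing the one real bug: the caller's
   data is passed as the format string itself. */
static int report_buggy(char *out, size_t n, const char *data) {
    return snprintf(out, n, data);       /* BUG: data may contain % directives */
}

/* Hypothetical fixed helper: data is only ever an argument to a
   constant format string. */
static int report_fixed(char *out, size_t n, const char *data) {
    return snprintf(out, n, "%s", data);
}

/* Three hypothetical handlers, each assumed to receive untrusted network
   input. A taint analysis may report a warning along each of these call
   paths, although the defect itself lives in report_buggy alone. */
static int handle_login(char *out, size_t n, const char *net) {
    return report_buggy(out, n, net);
}

static int handle_chat(char *out, size_t n, const char *net) {
    return report_buggy(out, n, net);
}

static int handle_quit(char *out, size_t n, const char *net) {
    return report_buggy(out, n, net);
}
```

Replacing the body of report_buggy with that of report_fixed would repair all three call paths at once, which is why a raw warning count can overstate the number of distinct bugs.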
We chose the following metrics, measured per-program:
- How many known vulnerabilities were detected and how many went
undetected?
- How many false positives were there?
- How easy was it to check whether a warning was a real bug?
- How long did the automatic analysis take, and what were its
resource needs?
- How easy was it to prepare programs for analysis?
Figure 6: Results of our experimental evaluation of the
tool. The size of the program is measured unpreprocessed and
preprocessed, in thousands of lines of code, excluding
comments. Time is the wall clock time for a run of cqual.
Warnings counts the total number of warnings issued by cqual after
the GUI's recommendations were followed,
and Bugs is the number of real vulnerabilities found.
Umesh Shankar
2001-05-16