We have speculated in several places that our kernel modifications affect data cache utilization. DCPI allows us to estimate the mean cycles per instruction (CPI) for each procedure in a profile, and to estimate the fraction of dynamic stalls caused by data-cache misses. We found that the CPI for the user-mode commSelect() procedure declined from 1.69 to 1.62 as a result of our kernel changes, mostly because of fewer data-cache misses.
We also found that the CPI for in_pcblookup() increased from about 1.28 to 11.15, apparently as a result of our kernel changes, even though we did not change the code for this kernel procedure. This suggests that our changes inadvertently created a particularly unlucky conflict in the data caches between the data structures used by in_pcblookup() and those used by select().
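
To illustrate how such a conflict can inflate CPI without any change to the code, the following minimal sketch simulates a direct-mapped data cache and counts misses when two data structures are accessed alternately. The cache geometry (8 KB, 32-byte lines, direct-mapped), the base addresses, and the function names are all assumptions chosen for illustration; they are not measurements or parameters taken from our experiments.

```python
def count_misses(addresses, cache_bytes=8192, line_bytes=32):
    """Count misses in a direct-mapped cache for a sequence of byte addresses.

    cache_bytes and line_bytes are hypothetical parameters, not measured values.
    """
    num_sets = cache_bytes // line_bytes
    tags = [None] * num_sets          # one resident line per set
    misses = 0
    for addr in addresses:
        line = addr // line_bytes     # cache-line number of this address
        index = line % num_sets       # set that the line maps to
        tag = line // num_sets        # identifies which line occupies the set
        if tags[index] != tag:
            misses += 1
            tags[index] = tag         # evict whatever was there before
    return misses


def interleaved(base_a, base_b, rounds):
    """Alternate between one word of structure A and one word of structure B."""
    seq = []
    for _ in range(rounds):
        seq.append(base_a)
        seq.append(base_b)
    return seq


# Structures exactly one cache size apart map to the same set and evict each other.
conflicting = count_misses(interleaved(0x10000, 0x10000 + 8192, 100))

# Shifting one structure by a single cache line removes the conflict.
separate = count_misses(interleaved(0x10000, 0x10000 + 8192 + 32, 100))

print(conflicting, separate)  # prints "200 2"
```

In this toy model, moving one structure by a single cache line reduces the miss count from 200 to 2. A change in data layout elsewhere in the kernel could in principle have a comparably large effect on a procedure whose code is unchanged.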