We have shown that FFPF increases packet filtering efficiency even for relatively simple tasks. The previous tests fail to show, however, where the performance gains originate and how the system would operate with more complex filters. Table 1 breaks down the overhead in several subtasks.
The first rows deal with general overhead, namely the cost of calling a filter, the total per-filter overhead in the flowgraph (measured with filters that return immediately after being called, so as to show only framework overhead), the cost of saving an element in an index buffer, and the cost of saving a 1500B packet to the packet buffer. Saving a reference in the index buffer is a factor 50 cheaper than saving a full packet, which shows that in the presence of overlapping flows FFPF's flowgroups can truly increase efficiency. This, combined with memory mapping of buffers, is perhaps the most important factor behind the gradual degradation of performance when running multiple applications.
The next rows show resource consumption for a number of frequently executed filters, namely the Aho-Corasick pattern matching algorithm used in snort [31], and a simple tcpdump filter executed in FPL-2 code and in BPF, respectively.
These rows show that FPL-2 is four times as efficient as BPF, even for such a trivial filter. While not shown, the cost savings grow with expression complexity, as expected. Unfortunately, the performance of really elaborate filters, such as those shown in Figures 6 and 7, cannot be compared, as such complex filters cannot be expressed in BPF at all.
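To give a feel for what such a trivial filter amounts to, consider a hypothetical "UDP destination port 53" match (the exact tcpdump expression used in the measurement is not reproduced here). Once compiled, it reduces to a handful of offset checks and byte comparisons, sketched below in Python rather than FPL-2 or BPF bytecode; the offsets assume Ethernet plus IPv4:

```python
ETH_HLEN = 14  # Ethernet header length

def match_udp_dport(pkt: bytes, port: int) -> bool:
    """Hypothetical trivial filter: accept IPv4/UDP packets to `port`."""
    if len(pkt) < ETH_HLEN + 20 + 8:          # room for IP + UDP headers?
        return False
    if pkt[12:14] != b"\x08\x00":             # ethertype must be IPv4
        return False
    ihl = (pkt[ETH_HLEN] & 0x0F) * 4          # IP header length in bytes
    if pkt[ETH_HLEN + 9] != 17:               # IP protocol must be UDP (17)
        return False
    off = ETH_HLEN + ihl                      # start of the UDP header
    dport = (pkt[off + 2] << 8) | pkt[off + 3]
    return dport == port
```

Even this simple predicate shows why per-filter call overhead matters: the useful work is only a few loads and compares per packet.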
Pattern matching can also be seen to be costly. We show the case where an application (e.g., snort) is interested only in packets that contain a given signature. Processing costs are especially high when the signature is not found, because the entire packet must be scanned (the result shown is for 1500-byte packets).
By executing this function in the kernel, FFPF eliminates a journey to
userspace for every packet, avoiding unnecessary packet copies,
context switches and signalling. Note that even compared to the high
overhead of pattern matching, the overhead of storing packets is
significant.
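The worst-case behaviour described above can be seen in a minimal Aho-Corasick matcher. The following is a generic sketch of the algorithm snort uses, not FFPF's implementation: when the signature is absent, the scan loop still visits every byte of the payload before giving up:

```python
from collections import deque

def build_automaton(patterns):
    """Build an Aho-Corasick automaton: goto trie, failure links, outputs."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pat)
    queue = deque(goto[0].values())        # depth-1 states keep fail = 0
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]: # follow failure links upward
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]         # inherit matches of the suffix state
    return goto, fail, out

def scan(payload, goto, fail, out):
    """Return (start, pattern) matches; touches every byte even on a miss."""
    s, hits = 0, []
    for i, ch in enumerate(payload):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i - len(p) + 1, p) for p in out[s])
    return hits
```

The miss case is the expensive one from the table: scanning a benign 1500-byte packet does as much per-byte work as scanning one that matches near the end.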
The complete cost of context switching is hard to measure, largely due to the asynchronous nature of userspace/kernel communication. One cost that is quantifiable is that of waking up a user process (Table 1). At 600 cycles (four times the overhead of a filter stage), this cost is significant. To minimize this overhead, users can reduce communication by batching packets: waking up a client process only once every n packets reduces this type of overhead by a factor n. In FFPF, n is determined by the size of the circular buffers and can be thousands of packets.
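The batching idea can be sketched as follows; BatchedRing and its wakeup callback are illustrative names of ours, not FFPF's interface:

```python
class BatchedRing:
    """Sketch of wakeup batching: the producer signals the consumer only
    once per `batch` packets, so the per-packet share of the (expensive)
    wakeup drops by a factor `batch`."""
    def __init__(self, slots, batch, wakeup):
        self.buf = [None] * slots
        self.head = 0            # next write slot in the circular buffer
        self.pending = 0         # packets enqueued since the last wakeup
        self.batch = batch
        self.wakeup = wakeup     # stand-in for waking the user process

    def enqueue(self, pkt):
        self.buf[self.head] = pkt
        self.head = (self.head + 1) % len(self.buf)
        self.pending += 1
        if self.pending >= self.batch:
            self.wakeup()        # one wakeup amortized over `batch` packets
            self.pending = 0
```

For example, enqueueing 1000 packets with a batch size of 100 triggers 10 wakeups instead of 1000.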
Furthermore, comparing the filtering and framework overheads in Table 1 shows that the costs due to FFPF's complexity contribute only a moderate amount to the overall processing. Finally, we show in a related publication that the IXP implementation is able to sustain full Gigabit rates for the same simple filter that was used for Figure 1, while a few hundred Mbps can still be sustained for complex filters that check every byte in the packet [29]. As the FPL-2 code on the IXP is used as a pre-filtering stage, we are able to support line rates without being hampered by bottlenecks such as the PCI bus and host memory latency, which is not true of most existing approaches. We conclude that FFPF can be used as an efficient solution for both simple (e.g., BPF) and more complex (sampling, pattern matching) tasks.