We have shown that FFPF increases packet filtering efficiency even for relatively simple tasks. The previous tests fail to show, however, where the performance gains originate and how the system would operate with more complex filters. Table 1 breaks down the overhead in several subtasks.
The first rows deal with general overhead, namely the cost of calling a filter, the total per-filter overhead in the flowgraph (measured with filters that return immediately after being called, so as to show only framework overhead), the cost of saving an element in an index buffer, and the cost of saving a 1500B packet to the packet buffer. Saving a reference in the index buffer is a factor 50 cheaper than saving a full packet, which shows that in the presence of overlapping flows FFPF's flowgroups can truly increase efficiency. This, combined with memory mapping of buffers, is perhaps the most important factor behind the gradual degradation of performance when running multiple applications.
The next rows show resource consumption for a number of frequently executed filters, namely the Aho-Corasick pattern matching algorithm used in snort [31], and a simple tcpdump filter executed in FPL-2 code and in BPF, respectively.
These rows show that FPL-2 is four times as efficient as BPF, even for such a trivial filter. While not shown, the cost savings grow with expression complexity, as expected. Unfortunately, the performance of really elaborate filters, such as those shown in Figures 6 and 7, cannot be compared, as such complex filters cannot be expressed in BPF at all.
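To give a feel for what such a trivial filter amounts to, consider a hypothetical "UDP destination port 53" match (the exact tcpdump expression used in the measurement is not reproduced here). Once compiled, it reduces to a handful of offset checks and byte comparisons, sketched below in Python rather than FPL-2 or BPF bytecode; the offsets assume Ethernet plus IPv4:

```python
ETH_HLEN = 14  # Ethernet header length

def match_udp_dport(pkt: bytes, port: int) -> bool:
    """Hypothetical trivial filter: accept IPv4/UDP packets to `port`."""
    if len(pkt) < ETH_HLEN + 20 + 8:          # room for IP + UDP headers?
        return False
    if pkt[12:14] != b"\x08\x00":             # ethertype must be IPv4
        return False
    ihl = (pkt[ETH_HLEN] & 0x0F) * 4          # IP header length in bytes
    if pkt[ETH_HLEN + 9] != 17:               # IP protocol must be UDP (17)
        return False
    off = ETH_HLEN + ihl                      # start of the UDP header
    dport = (pkt[off + 2] << 8) | pkt[off + 3]
    return dport == port
```

Even this simple predicate shows why per-filter call overhead matters: the useful work is only a few loads and compares per packet.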
Pattern matching can also be seen to be costly. We show the case where an application (e.g., snort) is interested only in packets that contain a given signature. Processing costs are especially high when the signature is not found, because the entire packet must be scanned (the result shown is for 1500-byte packets).
By executing this function in the kernel, FFPF eliminates a journey to
userspace for every packet, avoiding unnecessary packet copies,
context switches and signalling. Note that even compared to the high
overhead of pattern matching, the overhead of storing packets is
significant.
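The worst-case behaviour described above can be seen in a minimal Aho-Corasick matcher. The following is a generic sketch of the algorithm snort uses, not FFPF's implementation: when the signature is absent, the scan loop still visits every byte of the payload before giving up:

```python
from collections import deque

def build_automaton(patterns):
    """Build an Aho-Corasick automaton: goto trie, failure links, outputs."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pat)
    queue = deque(goto[0].values())        # depth-1 states keep fail = 0
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]: # follow failure links upward
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]         # inherit matches of the suffix state
    return goto, fail, out

def scan(payload, goto, fail, out):
    """Return (start, pattern) matches; touches every byte even on a miss."""
    s, hits = 0, []
    for i, ch in enumerate(payload):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i - len(p) + 1, p) for p in out[s])
    return hits
```

The miss case is the expensive one from the table: scanning a benign 1500-byte packet does as much per-byte work as scanning one that matches near the end.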
The complete cost of context switching is hard to measure, largely due to the asynchronous nature of userspace/kernel communication. One cost that is quantifiable is that of waking up a user process (Table 1). At 600 cycles (four times the overhead of a filter stage), this cost is significant. To minimize this overhead, users can reduce communication by batching packets: waking up a client process only once every n packets reduces this type of overhead by a factor n. In FFPF, n is determined by the size of the circular buffers and can be thousands of packets.
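The batching idea can be sketched as follows; BatchedRing and its wakeup callback are illustrative names of ours, not FFPF's interface:

```python
class BatchedRing:
    """Sketch of wakeup batching: the producer signals the consumer only
    once per `batch` packets, so the per-packet share of the (expensive)
    wakeup drops by a factor `batch`."""
    def __init__(self, slots, batch, wakeup):
        self.buf = [None] * slots
        self.head = 0            # next write slot in the circular buffer
        self.pending = 0         # packets enqueued since the last wakeup
        self.batch = batch
        self.wakeup = wakeup     # stand-in for waking the user process

    def enqueue(self, pkt):
        self.buf[self.head] = pkt
        self.head = (self.head + 1) % len(self.buf)
        self.pending += 1
        if self.pending >= self.batch:
            self.wakeup()        # one wakeup amortized over `batch` packets
            self.pending = 0
```

For example, enqueueing 1000 packets with a batch size of 100 triggers 10 wakeups instead of 1000.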
Furthermore, comparing the filtering and framework overheads in Table 1 shows that the costs due to FFPF's complexity contribute only a moderate amount to the overall processing. Finally, we show in a related publication that the IXP implementation is able to sustain full Gigabit rates for the same simple filter that was used for Figure 1, while a few hundred Mbps can still be sustained for complex filters that check every byte in the packet [29]. As the FPL-2 code on the IXP is used as a pre-filtering stage, we are able to support line rates without being hampered by bottlenecks such as the PCI bus and host memory latency, which is not true of most existing approaches. We conclude that FFPF can be used as an efficient solution for both simple (e.g., BPF) and more complex (sampling, pattern matching) tasks.