We use two definitions for counting flows: active flows and flow arrivals. A
flow is active during a time bin if it sends at least one packet during that
time bin. Consecutive TCP connections between the same two computers that happen
to share the same port numbers are considered a single flow and they will be
reported in the same flow record under our current assumptions. Active flows
with none of their packets sampled by the flow slicing process will have no
records, so at least some of the flow records we get should be counted as more
than one active flow for the total estimate to be unbiased. We count records
with a packet counter of $1$ as $1/p$ flows, where $p$ is the flow slicing
probability, and other records as $1$ flow; this gives us unbiased estimates
for the number of active flows.
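The counting rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `estimate_active_flows`, the argument `packet_counters` (one packet counter per flow record in the time bin), and the symbol `p` for the flow slicing probability are assumptions made for this example.

```python
from typing import Iterable


def estimate_active_flows(packet_counters: Iterable[int], p: float) -> float:
    """Estimate the number of active flows from flow-record packet counters.

    p is the flow slicing probability (hypothetical parameter name).
    Records with a packet counter of 1 count as 1/p flows, all other
    records count as 1 flow.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    total = 0.0
    for counter in packet_counters:
        if counter < 1:
            # A flow record exists only if at least one packet was sampled.
            raise ValueError("packet counters must be at least 1")
        total += 1.0 / p if counter == 1 else 1.0
    return total
```

For example, with `p = 0.25`, records with counters `[1, 3, 1, 7]` contribute 4 + 1 + 4 + 1 flows to the estimate.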
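Proof: Consider a flow of $s$ packets, each sampled by the flow slicing process
with probability $p$, and let $\hat{f}$ be the contribution of the flow to the
estimate of the number of active flows. There are three possible cases: if a
packet before the last gets sampled, the record has a packet counter of at
least $2$ and $\hat{f}=1$; if only the last packet gets sampled, the packet
counter is $1$ and $\hat{f}=1/p$; and if none of the packets gets sampled there
will be no flow record, so $\hat{f}=0$. The probability of the first case is
$1-(1-p)^{s-1}$, the probability of the second is $(1-p)^{s-1}p$, and that of
the third is $(1-p)^{s}$.

$$E[\hat{f}] = \left(1-(1-p)^{s-1}\right)\cdot 1 + (1-p)^{s-1}p\cdot\frac{1}{p} + (1-p)^{s}\cdot 0 = 1-(1-p)^{s-1}+(1-p)^{s-1} = 1$$
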
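As a sanity check on the three-case argument, the expectation can be evaluated exactly with rational arithmetic. The sketch below assumes a flow of `s` packets, each sampled by flow slicing with probability `p`, and that every packet after the first sampled one is counted in the record; the function name and the symbols `s` and `p` are chosen for this illustration only.

```python
from fractions import Fraction


def expected_contribution(s: int, p: Fraction) -> Fraction:
    """Exact expected contribution of one s-packet flow to the flow count.

    Assumes each packet is sampled with probability p until the first
    sampled packet, after which all remaining packets are counted.
    """
    if s < 1:
        raise ValueError("a flow must have at least one packet")
    miss = 1 - p
    prob_early = 1 - miss ** (s - 1)       # a packet before the last is sampled
    prob_last_only = miss ** (s - 1) * p   # only the last packet is sampled
    prob_none = miss ** s                  # no packet is sampled, no record
    # The three cases are exhaustive and mutually exclusive.
    assert prob_early + prob_last_only + prob_none == 1
    # Contributions: 1 flow, 1/p flows, and 0 flows respectively.
    return prob_early * 1 + prob_last_only * (1 / p) + prob_none * 0
```

Calling `expected_contribution(s, Fraction(1, 10))` for any flow size `s >= 1` returns exactly 1, which matches the unbiasedness claim.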
The estimators for the number of bytes and packets in a flow were trivial to
generalize to the case where we apply random packet sampling before flow slicing
because the expected number of packets and bytes after packet sampling was
exactly $q$ times the number before, where $q$ is the packet sampling
probability. For the number of active flows there is no such simple
relationship; in fact, it has been shown that the number of active flows cannot
be estimated without significant bias once random packet sampling has been
applied [5]. However, by slightly changing the definition of flow counts we can
take advantage of the SYN flags used by TCP flows.