Flow arrivals are defined only for TCP flows, which should start with one SYN packet. A flow is considered to have arrived in a bin if its SYN packet is in that time bin. Flows that are active during a certain bin but whose SYN packet precedes the bin do not count as flow arrivals for that bin (but they do count as active flows). If we look at the core flow slicing algorithm, we can use the following estimator to compute the number of flow arrivals.
Given that the SYN flag is set in the flow record if it was set in any of the packets counted against the record, it is trivial to prove that this estimator leads to unbiased estimates of the number of flow arrivals if we make one assumption (assumption 1).
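The argument above can be illustrated with a minimal sketch. It assumes, beyond what this section states, that the core flow slicing algorithm creates a record at each packet of a not-yet-tracked flow with some slicing probability (called `z` here), so that a flow whose only SYN is its first packet ends up with the SYN flag in its record with probability `z`. Under that assumption each SYN-flagged record contributes `1/z`. The names `FlowRecord`, `estimate_flow_arrivals` and `z` are illustrative, not taken from the original.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    packets: int  # packets counted against the record in this bin
    syn: bool     # True if any counted packet had the SYN flag set


def estimate_flow_arrivals(records, z):
    """Estimate flow arrivals from flow-sliced records.

    Assumption (hypothetical model): a flow's SYN reaches the record
    with probability z, so every SYN-flagged record stands for 1/z
    arrivals. Records without the SYN flag contribute nothing.
    """
    if not 0 < z <= 1:
        raise ValueError("z must be in (0, 1]")
    syn_records = sum(1 for r in records if r.syn)
    return syn_records / z
```

For example, two SYN-flagged records collected with `z = 0.5` would be scaled to an estimate of four arrivals; records without the flag are ignored because, by definition, their flows did not arrive in this bin or their SYN was not counted.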
The flow arrival information is preserved by random packet sampling. Duffield et al. propose two estimators of the number of flow arrivals that operate on flow records collected after random sampling of the traffic [9]. The formulas for the individual contributions of flow records to the total estimate of the number of flow arrivals are as follows.
Duffield et al. show [9] that both estimators are unbiased for flows that have exactly one SYN packet. Both estimators overestimate the number of flow arrivals if flows have more than one SYN packet. For flows without any SYN packets, which according to our definition of flow arrivals (which differs slightly from that used in [9]) should not be counted, the two estimators give different contributions, so to make the second estimator unbiased we need another assumption.
Flows retaining SYN packets after the random packet sampling stage will retain a single SYN packet, and the first estimator estimates the number of flow arrivals based on the number of such flows. We can easily combine it with the estimator for the core flow slicing algorithm to get an estimator for the number of flow arrivals for the combined algorithm using random packet sampling and flow slicing.
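One plausible form of such a combined estimator can be sketched as follows. It is not the formula from the original text. It assumes a packet sampling probability `p` and a slicing probability `z` (both names are illustrative), and that a flow's single SYN is its first packet, so the SYN reaches the final record only if it survives packet sampling (probability `p`) and then triggers record creation (probability `z`). Each SYN-flagged record is therefore scaled by `1/(p*z)`.

```python
def estimate_arrivals_combined(syn_flags, p, z):
    """Estimate flow arrivals after packet sampling plus flow slicing.

    syn_flags: iterable of booleans, one per flow record, True when
               the record has the SYN flag set.
    p: packet sampling probability (assumed name).
    z: flow slicing probability (assumed name).

    Assumed model: a SYN-flagged record is observed with probability
    p * z per arriving flow, so each one counts for 1 / (p * z).
    """
    if not (0 < p <= 1 and 0 < z <= 1):
        raise ValueError("p and z must be in (0, 1]")
    syn_records = sum(1 for flag in syn_flags if flag)
    return syn_records / (p * z)
```

With `p = 1` this reduces to scaling by `1/z` alone, and with `z = 1` it reduces to scaling by `1/p`, which is the behavior one would expect from each stage on its own under the same assumptions.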
The second estimator treats separately the flows that only have a SYN packet after packet sampling and the other flows that survive it. Fortunately, we can differentiate between the two types of flows even after flow slicing is applied. If a flow with a single SYN packet is sampled by flow slicing, its record will have a packet count of 1 and the SYN flag set. If any other flow is sampled by flow slicing and has a packet count of 1 at the end of the bin, only its last packet was sampled, so it will not have the SYN flag set, because that would put it into the category of flows with a single SYN packet surviving the packet sampling. Thus we can combine the second estimator with flow slicing to obtain another estimator.
Note that if assumption 1 is violated and we have more than one SYN packet at the beginning of the flow, say due to SYN retransmissions, both estimators will be biased towards over-counting. But if repeated SYNs are a rare enough occurrence, the effect on a final estimate based on many flow records will be small.
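The size of this over-counting can be made concrete with a small worked calculation. The sketch below is a simplified model, not the paper's analysis: it assumes a flow starts with `k` SYN packets, each packet survives packet sampling with probability `p`, each surviving packet of an untracked flow creates the record with probability `z`, and every SYN-flagged record contributes `1/(p*z)` to the estimate. All names are illustrative.

```python
def expected_syn_contribution(k, z, p=1.0):
    """Expected contribution of one flow that starts with k SYN packets.

    Assumed model: each of the k leading SYN packets independently
    survives sampling and creates the record with probability q = p * z,
    provided no earlier packet already created it. The record carries
    the SYN flag if it was created at any of those k packets, which
    happens with probability 1 - (1 - q)**k, and such a record
    contributes 1 / q to the estimate.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    q = p * z
    if not 0 < q <= 1:
        raise ValueError("p * z must be in (0, 1]")
    return (1 - (1 - q) ** k) / q
```

With a single SYN the expected contribution is exactly 1, matching the unbiased case. With `k = 2` and `q = 0.25` it rises to 1.75, and for small `q` it approaches `k`, so a flow with retransmitted SYNs is counted almost once per SYN under this model.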