The problem

A click-through agreement is set up when (the webmaster of) one site A, the referrer, applies for an account at the target site B that is running a click-through payment program. This typically takes place by A filling out a form for B in which A provides, for example, the address to which payment checks should be mailed. After establishing the account, A is given advertising material in the form of Hypertext Markup Language (HTML) commands to include in its web pages, possibly along with accompanying images ("banners") to display on those pages. These HTML commands include a hypertext link to the target site B; i.e., when a user views A's page and clicks on this link, the user's browser retrieves the referred-to page from B. In this sense, the user has "clicked through" A to get to B. Typically B maintains an account statement for A, so that A can periodically visit site B to see how many referrals A's pages have made to B (and thus the amount of money to which A is entitled).

To understand the risks of fraud in this mechanism, we need to review the actual Hypertext Transfer Protocol (HTTP) messages exchanged during a click-through. This exchange is shown in Figure 1. The exchange begins when the user's browser retrieves a web page from site A, say https://siteA.com/pageA.html. This page contains a hypertext link to a page served by B, say https://siteB.com/pageB.html. A included this link in pageA.html when it registered to participate in B's click-through payment program. When the user clicks on that link, her browser retrieves pageB.html from B. B can use an HTTP request header, the Referer header (the name is spelled this way in HTTP), to determine the URL of the page that referred the user to site B. In this case, the Referer field will indicate the URL of pageA.html, and so A will be credited with the referral.
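
As an illustration of how B might credit a referral from the incoming request, the following is a minimal sketch. It is not any provider's actual accounting code: the registry `REGISTERED_PAGES`, the function `credit_referral`, and the account name are hypothetical, and the request is modeled simply as a mapping of header names to values. Note that the header is spelled `Referer` on the wire.

```python
from collections import Counter
from typing import Mapping, Optional

# Hypothetical registry kept by B: referring page URL -> referrer account.
REGISTERED_PAGES = {"https://siteA.com/pageA.html": "A"}


def credit_referral(headers: Mapping[str, str], credits: Counter) -> Optional[str]:
    """Credit the account whose registered page is named in the Referer header.

    Returns the credited account, or None if the request carries no Referer
    header or names a page that is not registered.
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    referer = normalized.get("referer")
    account = REGISTERED_PAGES.get(referer) if referer else None
    if account is not None:
        credits[account] += 1
    return account


# Usage: message 2 of the exchange, as seen by B.
credits: Counter = Counter()
credit_referral({"Referer": "https://siteA.com/pageA.html"}, credits)
```

Everything in this sketch runs on B's side, and A never observes the request, which is the root of the problem discussed next.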

**Figure 1:** **A click-through**: User U retrieves `pageA.html` from A (message 1) and clicks on a link in it, causing `pageB.html` on site B to be requested (message 2) and loaded (message 3).

The possibilities for abuse in this system should be evident. Since there is no communication to A after the user clicks on the link to pageB.html, there is no way for A to know how many referrals its pages give B. So, B can simply ignore the Referer field of requests and thus fail to give proper credit to A; as described earlier, this is called hit shaving in the click-through payment industry. B is also subject to abuses by A, e.g., if A generates false requests to B with Referer fields naming pageA.html. In this way, A can unfairly inflate the payment that it receives from B, and we will henceforth refer to such practices by A as hit inflation. Like hit shaving, hit inflation is a recognized problem in the click-through payment industry. Many providers of click-through payment programs threaten to cancel a referrer's account if hit inflation by the account holder is detected, but for obvious reasons providers do not typically reveal their methods for detecting hit inflation. It is likely that these methods are at least partially based on monitoring user IP addresses, and in particular on detecting multiple requests from the same IP address or domain. Indeed, many click-through programs agree to pay only for "unique" referrals, i.e., referred requests from users at different addresses.
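
The "unique" referral policy can be illustrated with a short sketch. Providers do not disclose their real detection methods, so this only shows the idea of paying once per distinct address; the function `count_unique_referrals`, the log format (pairs of client IP address and Referer value), and the addresses themselves are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_unique_referrals(log: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count, for each referring URL, the number of distinct client IP addresses."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for client_ip, referer in log:
        # Repeated requests from one address collapse into a single set entry.
        seen[referer].add(client_ip)
    return {referer: len(ips) for referer, ips in seen.items()}


# Hypothetical request log at B: three requests, two from the same address.
log = [
    ("198.51.100.1", "https://siteA.com/pageA.html"),
    ("198.51.100.1", "https://siteA.com/pageA.html"),
    ("203.0.113.7", "https://siteA.com/pageA.html"),
]
unique = count_unique_referrals(log)  # pageA.html is credited with 2 referrals
```

Under such a policy, false requests that A issues from a single machine earn at most one credit.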

In this paper we treat only the problem of hit shaving. Nevertheless, the threat of hit inflation shapes the class of solutions that we are willing to consider. For example, one approach to detecting hit shaving would be for A to craft pageA.html so that its link purportedly to pageB.html is actually a link to a URL on site A; then, when the user clicks on that link, A retrieves pageB.html from B and serves it to the user. This enables A to know precisely how many referrals it gives to B. However, this exacerbates the problem of detecting hit inflation, because it establishes a norm in which A directly issues to B all requests for which it should be credited. This hampers B's ability to detect hit inflation based on user IP addresses. For this reason, we require that our solutions do not change the fact that the user's browser requests B's pages directly from B.
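
To make the last point concrete, the following toy simulation compares the client addresses B would observe when browsers contact B directly with those it would observe if A relayed every request. It is only a sketch: `SITE_A_IP`, `ips_seen_by_b`, and all addresses are hypothetical.

```python
from typing import List

SITE_A_IP = "192.0.2.10"  # hypothetical address of A's server


def ips_seen_by_b(user_ips: List[str], proxied_through_a: bool) -> List[str]:
    """Return the client addresses B observes for a sequence of user clicks."""
    if proxied_through_a:
        # A fetches pageB.html itself, so every request arrives from A's address.
        return [SITE_A_IP for _ in user_ips]
    # The browser contacts B directly, so B sees each user's own address.
    return list(user_ips)


users = ["198.51.100.1", "198.51.100.2", "203.0.113.7"]
direct = len(set(ips_seen_by_b(users, proxied_through_a=False)))   # 3 distinct addresses
proxied = len(set(ips_seen_by_b(users, proxied_through_a=True)))   # 1 distinct address
```

With relaying, three genuine users and three fabricated requests look identical to B, so address-based checks can no longer separate them.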