Alejandro Cuevas, Carnegie Mellon University; Fieke Miedema, Delft University of Technology; Kyle Soska, University of Illinois Urbana Champaign and Hikari Labs, Inc.; Nicolas Christin, Carnegie Mellon University and Hikari Labs, Inc.; Rolf van Wegberg, Delft University of Technology
A number of recent studies have investigated online anonymous ("dark web") marketplaces. Almost all leverage a "measurement-by-proxy" design, in which researchers scrape market public pages, and take buyer reviews as a proxy for actual transactions, to gain insights into market size and revenue. Yet, we do not know if and how this method biases results.
We build a framework to reason about marketplace measurement accuracy, and use it to contrast estimates projected from scrapes of Hansa Market with data from a back-end database seized by the police. We further investigate, by simulation, the impact of scraping frequency, consistency and rate-limits. We find that, even with a decent scraping regimen, one might miss approximately 46% of objects—with scraped listings differing significantly from not-scraped listings on price, views and product categories. This bias also impacts revenue calculations. We find Hansa's total market revenue to be US $50M, which projections based on our scrapes underestimate by a factor of four. Simulations further show that studies based on one or two scrapes are likely to suffer from a very poor coverage (on average, 14% to 30%, respectively).
A high scraping frequency is crucial to achieve reliable coverage, even without a consistent scraping routine. When high-frequency scraping is difficult, e.g., due to deployed anti-scraping countermeasures, innovative scraper design, such as scraping most popular listings first, helps improve coverage. Finally, abundance estimators can provide insights on population coverage when population sizes are unknown.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
author = {Alejandro Cuevas and Fieke Miedema and Kyle Soska and Nicolas Christin and Rolf van Wegberg},
title = {Measurement by Proxy: On the Accuracy of Online Marketplace Measurements},
booktitle = {31st USENIX Security Symposium (USENIX Security 22)},
year = {2022},
isbn = {978-1-939133-31-1},
address = {Boston, MA},
pages = {2153--2170},
url = {https://www.usenix.org/conference/usenixsecurity22/presentation/cuevas},
publisher = {USENIX Association},
month = aug
}