Cobra: Content-based Filtering and Aggregation of Blogs and RSS Feeds
Blogs and RSS feeds are becoming increasingly popular. The blogging site LiveJournal has over 11 million user accounts, and according to one report, over 1.6 million postings are made to blogs every day. The “Blogosphere” is a new hotbed of Internet-based media that represents a shift from mostly static content to dynamic, continuously updated discussions. The problem is that finding and tracking blogs with interesting content is an extremely cumbersome process. In this paper, we present Cobra (Content-Based RSS Aggregator), a system that crawls, filters, and aggregates vast numbers of RSS feeds, delivering to each user a personalized feed based on their interests. Cobra consists of a three-tiered network of crawlers that scan web feeds, filters that match crawled articles to user subscriptions, and reflectors that provide recently matching articles on each subscription as an RSS feed, which can be browsed using a standard RSS reader. We present the design, implementation, and evaluation of Cobra in three settings: a dedicated cluster, the Emulab testbed, and PlanetLab. We present a detailed performance study of the Cobra system, demonstrating that the system scales well to support a large number of source feeds and users; that the mean update detection latency is low (bounded by the crawler rate); and that an offline service provisioning step, combined with several performance optimizations, is effective at reducing memory usage and network load.
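The three-tier flow described in the abstract (crawlers, then filters, then reflectors) can be illustrated with a minimal in-memory sketch. This is not Cobra's implementation: the class names, the `run_pipeline` helper, the hash-based duplicate check, and the rule that a subscription matches when all of its keywords appear in an article are assumptions made only for illustration. Feed fetching and RSS parsing are left out; `articles` stands in for the already-parsed contents of a feed.

```python
import hashlib
from collections import defaultdict, deque


class Crawler:
    """Hypothetical crawler tier: reports only articles it has not seen before."""

    def __init__(self):
        self.seen = set()  # digests of (feed URL, article text) pairs

    def crawl(self, feed_url, articles):
        # `articles` stands in for the parsed result of fetching feed_url.
        fresh = []
        for text in articles:
            key = (feed_url + "\n" + text).encode("utf-8")
            digest = hashlib.sha1(key).hexdigest()
            if digest not in self.seen:
                self.seen.add(digest)
                fresh.append(text)
        return fresh


class Filter:
    """Hypothetical filter tier: matches articles against keyword subscriptions."""

    def __init__(self):
        self.subscriptions = {}  # subscription id -> set of lowercase keywords

    def subscribe(self, sub_id, keywords):
        self.subscriptions[sub_id] = {k.lower() for k in keywords}

    def match(self, article):
        # Assumed rule: a subscription matches if every keyword is present.
        words = set(article.lower().split())
        return [sid for sid, kws in self.subscriptions.items() if kws <= words]


class Reflector:
    """Hypothetical reflector tier: keeps recent matches per subscription."""

    def __init__(self, max_items=10):
        self.feeds = defaultdict(lambda: deque(maxlen=max_items))

    def publish(self, sub_id, article):
        self.feeds[sub_id].append(article)

    def feed(self, sub_id):
        return list(self.feeds[sub_id])


def run_pipeline(crawler, flt, reflector, feed_url, articles):
    """Push one crawl of one feed through all three tiers."""
    for article in crawler.crawl(feed_url, articles):
        for sub_id in flt.match(article):
            reflector.publish(sub_id, article)
```

In this sketch, calling `run_pipeline` twice with the same articles leaves each personalized feed unchanged, because the crawler tier drops content it has already seen, and the bounded `deque` keeps only the most recent matches per subscription.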
author = {Ian Rose and Rohan Murty and Peter Pietzuch and Jonathan Ledlie and Mema Roussopoulos and Matt Welsh},
title = {Cobra: Content-based Filtering and Aggregation of Blogs and {RSS} Feeds},
booktitle = {4th USENIX Symposium on Networked Systems Design \& Implementation (NSDI 07)},
year = {2007},
address = {Cambridge, MA},
url = {https://www.usenix.org/conference/nsdi-07/cobra-content-based-filtering-and-aggregation-blogs-and-rss-feeds},
publisher = {USENIX Association},
month = apr
}