Dr NMS or: How Facebook Learned to Stop Worrying and Love the Network

Friday, May 15, 2015 - 9:00am-9:30am

Jose Leitao and David Rothera, Facebook

Abstract: 

Want to learn how Facebook operates its global network to support more than 1.3 billion users? We will describe the technologies and methods we use to manage Facebook's production network. The neteng org at Facebook has built or leveraged several systems for managing and operating the production network, including an audit framework, alarm daemons, drainers, and an automatic remediation engine. This talk will focus on these technologies and how they have helped improve user experience, manage complexity, automate day-to-day operations, mitigate impact, and increase reliability.

Jose Leitao and David Rothera are production network engineers on the Network Infrastructure Engineering team at Facebook. Their team's responsibilities include maintaining, monitoring, and improving the global production network infrastructure.

BibTeX
@conference {208853,
author = {Jose Leitao and David Rothera},
title = {Dr {NMS} or: How Facebook Learned to Stop Worrying and Love the Network},
year = {2015},
address = {Dublin},
publisher = {USENIX Association},
month = may
}
