Insight: In-situ Online Service Failure Path Inference in Production Computing Infrastructures
Hiep Nguyen, Daniel J. Dean, Kamal Kc, and Xiaohui Gu, North Carolina State University
Online service failures in production computing environments are notoriously difficult to debug. When those failures occur, the software developer often has little information for debugging. In this paper, we present Insight, a system that reproduces the execution path of a failed service request onsite immediately after a failure is detected. Once a request failure is detected, Insight dynamically creates a shadow copy of the production server and performs guided binary execution exploration in the shadow node to learn how the failure occurs. Insight leverages both environment data (e.g., input logs, configuration files, states of interacting components) and runtime outputs (e.g., console logs, system calls) to guide the failure path finding. Insight does not require source code access or any special system recording during normal production runs. We have implemented Insight and evaluated it using 13 failures from a production cloud management system and 8 open-source software systems. The experimental results show that Insight can successfully find high-fidelity failure paths within a few minutes. Insight is lightweight and unobtrusive, making it practical for online service failure inference in the production computing environment.
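The idea of using runtime outputs to guide path finding can be illustrated with a small sketch. This is not Insight's implementation: Insight explores binary execution on a shadow server, whereas the toy below assumes a hand-written branch table (`TOY_PROGRAM`) and a hypothetical helper (`find_failure_path`), both invented here for illustration. It performs a depth-first search over branch outcomes and discards any candidate path whose log messages diverge from the console log observed in the failed run.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model (an assumption, not Insight's representation):
# each state maps a branch outcome (True/False) to
# (next_state, message_logged_on_that_edge). next_state None ends the run.
Edge = Tuple[Optional[str], Optional[str]]
Program = Dict[str, Dict[bool, Edge]]
Path = List[Tuple[str, bool]]

TOY_PROGRAM: Program = {
    "start": {True: ("parse", "request received"), False: (None, "idle")},
    "parse": {True: ("lookup", None), False: (None, "bad request")},
    "lookup": {True: (None, "served"), False: ("retry", "cache miss")},
    "retry": {True: (None, "served"), False: (None, "ERROR: backend timeout")},
}


def find_failure_path(
    program: Program,
    observed_logs: List[str],
    state: Optional[str] = "start",
    path: Optional[Path] = None,
    emitted: int = 0,
) -> Optional[Path]:
    """Depth-first search over branch outcomes, pruned by the observed logs."""
    path = path if path is not None else []
    if state is None:
        # The run ended: accept only if every observed message was reproduced.
        return path if emitted == len(observed_logs) else None
    for choice in (True, False):
        next_state, message = program[state][choice]
        count = emitted
        if message is not None:
            # Prune: this edge logs something the failed run did not log next.
            if count >= len(observed_logs) or observed_logs[count] != message:
                continue
            count += 1
        found = find_failure_path(
            program, observed_logs, next_state, path + [(state, choice)], count
        )
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    logs = ["request received", "cache miss", "ERROR: backend timeout"]
    print(find_failure_path(TOY_PROGRAM, logs))
```

In this toy, the observed log rules out the `lookup` and `retry` branches that would have printed "served", so the search reaches the failing path without enumerating every branch combination. A real system would also have to handle branches that emit no output, which cannot be pruned this way.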
author = {Hiep Nguyen and Daniel J. Dean and Kamal Kc and Xiaohui Gu},
title = {Insight: In-situ Online Service Failure Path Inference in Production Computing Infrastructures},
booktitle = {2014 USENIX Annual Technical Conference (USENIX ATC 14)},
year = {2014},
isbn = {978-1-931971-10-2},
address = {Philadelphia, PA},
pages = {269--280},
url = {https://www.usenix.org/conference/atc14/technical-sessions/presentation/nguyen},
publisher = {USENIX Association},
month = jun
}