Root Cause Analysis—Beginner
Hoover Room
This version of the class is aimed at the mid-level sysadmin. You manage servers and/or network gear, look at packet traces, and poke through logs, but wouldn’t consider yourself an expert at any of this. You want a chance to tackle the problem on your own, then want guided practice on technique: analyzing a packet trace for performance problems, extracting insights from trending charts, and correlating log entries from multiple devices. In this version of the class, we spend time together reviewing concepts (e.g., caching and spindles), applying techniques (e.g., Wireshark features), and asking questions about protocols (e.g., TCP, SMB, and NFS). In addition to the technical contributors, each team will need a problem manager: perhaps a senior engineer, perhaps a resource or project manager comfortable with coordinating teams of techs.
Troubleshooting is hard. In hindsight, the answer to a problem is often obvious, but in the chaos and confusion of the moment (too much data flowing in, time pressure, misleading clues), slicing through the distractions and focusing on the key elements is tough. This is a hands-on seminar: you will work through case studies taken from real-world situations. We divide into groups of 5–7, review a simplified version of Advance7’s Rapid Problem Resolution (RPR) methodology, and then alternate on a half-hour cycle between coming together as a class and splitting into groups. During class time, I will describe the scenario, explain the current RPR step, and offer to role-play key actors. During group time, I will walk around, coaching and answering questions.
The course material includes log extracts, packet traces, strace output, network diagrams, Cacti snapshots, and vendor tech support responses, all taken from actual RCA efforts. Preview the deck to get a feel for how your day will look. BYOL (Bring Your Own Laptop) for some hands-on, interactive, team-oriented, real-world puzzle solving.
System administrators and network engineers tasked with troubleshooting multidisciplinary problems; problem managers and problem analysts wanting experience coordinating teams.
Practice in employing a structured approach to analyzing problems that span multiple technology spaces.
Case studies:
- Remote Office Bumps: A remote office ties back to the campus via a 10MB circuit. Intermittently, opening documents on the campus-based file-server is slow, printing is slow, Exchange appointments vanish…
- Many Applications Crash: Outlook crashes, Word documents fail to save, Windows Explorer hangs. The office automation applications servicing ~1500 users intermittently report a range of error messages; users reboot their machines. Some days are fine, other days are terrible, and the symptoms are worsening…