Table of Contents
A New Focus for a New Century: Availability and Maintainability >> Performance
Thanks to Darrell Long for FAST!
Outline
The past: research goals and assumptions of the last 15 years
After 15 years of research on price-performance, what’s next?
Downtime Costs (per Hour)
Total Cost Ownership Hypothesis
Cost of Ownership after 15 years of improving price-performance?
What have we learned from past projects?
Jim Gray: Trouble-Free Systems
Butler Lampson: Systems Challenges
John Hennessy: What Should the “New World” Focus Be?
IBM Research (10/15/2001)
Bill Gates M/S (1/15/2002): “Trustworthy Computing”
New research goals for a New Century: ACME
Where does ACME stand today?
ACME: Availability
ACME: Claims of 5 9s?
ACME: Uptime of HP.com?
“Microsoft fingers technicians for crippling site outages”
ACME: Learning from other fields: disasters
ACME: Learning from other fields: human error
ACME: The Automation Irony
Learning from other fields: Bridges
Summary: the present
Outline
Recovery-Oriented Computing Philosophy
ROC approach
ROC Part 1: Failure Data: Lessons about human operators
Failure Data: Public Switched Telephone Network (PSTN) record
Blocked Calls: PSTN in 2000
Failure Data: 2 Internet Sites
Internet Site Failures
ROC Part 1: Failure Data Collection (so far)
ROC Part 2: ACME Benchmarks
Availability benchmarking 101
Availability Benchmarking Environment
Example: 1 fault in SW RAID
Software RAID: QoS behavior
ROC Part 2: ACME Benchmarks (so far)
ROC Part 3: Margin of Safety in CS&E?
ROC Part 4: Create and Evaluate Techniques to help ACME
Safe, forgiving space for operator?
Partitioning and Redundancy?
Geographic distribution, Paired Sites
Input Insertion for Detection?
Aid Diagnosis?
Automation vs. Aid?
Refresh via Restart?
Support Operator Trial and Error?
Undo for Sysadmin
Summary: from ACME to ROC
Interested in ROCing?
BACKUP SLIDES
A science fiction analogy: Autonomic vs. ROC
Outage Report
TCO breakdown (average)
Internet x86/Linux Breakdown
Evaluating ROC: human aspects
Example results: software RAID (2)
Lessons Learned from Other Cultures