Statistics for Engineers

Heinrich Hartmann, Circonus

Abstract: 

Gathering telemetry data is key to operating reliable distributed systems at scale. Once you have set up your monitoring systems and recorded all relevant data, the challenge becomes making sense of it and extracting valuable information, such as:

  • Is the system down?
  • Is user experience degraded for some percentage of our customers?
  • How did our query response times change with the last update?

Statistics is the art of extracting information from data. In this tutorial, we cover the basic statistical knowledge that helps you in your daily work as an SRE: probabilistic models, summarizing distributions with mean values, quantiles, and histograms, and the relations among these summaries.
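As a rough illustration of why these summaries differ, the sketch below computes a mean, nearest-rank quantiles, and a fixed-width histogram over a small set of request latencies. The latency values, the `quantile` and `histogram` helpers, and the choice of the nearest-rank quantile method are all hypothetical and chosen for illustration only. This is not the tutorial's own material.

```python
import math
import statistics
from collections import Counter

# Hypothetical request latencies in milliseconds (illustrative values only):
# nine "normal" requests and one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 14, 13]


def quantile(samples, q):
    """Return the q-quantile (0 < q <= 1) using the nearest-rank method.

    Nearest rank: the smallest sample such that at least q * n samples
    are less than or equal to it. (One of several common definitions.)
    """
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def histogram(samples, bin_width):
    """Count samples per fixed-width bin; keys are the bins' lower bounds."""
    counts = Counter((x // bin_width) * bin_width for x in samples)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print("mean:", statistics.mean(latencies_ms))
    print("p50: ", quantile(latencies_ms, 0.50))
    print("p90: ", quantile(latencies_ms, 0.90))
    print("p99: ", quantile(latencies_ms, 0.99))
    print("hist:", histogram(latencies_ms, 10))
```

On this made-up data, the single 250 ms outlier pulls the mean up to 37 ms, although the median is 13 ms. The p99 exposes the slow tail, and the histogram shows both the bulk of the requests and the outlier. No single number captures the whole distribution.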

The tutorial focuses on practical aspects and gives you hands-on knowledge of how to handle, import, analyze, and visualize telemetry data with UNIX tools and the IPython toolkit.
This tutorial has been given on several occasions over the last year and has since been refined and extended; cf. Twitter #StatsForEngineers.

BibTeX
@conference {208525,
author = {Heinrich Hartmann},
title = {Statistics for Engineers},
year = {2016},
address = {Dublin},
publisher = {USENIX Association},
month = jul
}