Tutorials in the LISA17 Training Program are half-day or full-day sessions taught by industry experts. These sessions offer a highly curated selection of instructor-led tutorials, all geared towards helping you create, maintain, and monitor efficient and secure systems.
LISA17 mini tutorials take place Wednesday through Friday as part of the main Conference Program and offer 90-minute overviews of new and emerging technologies. These sessions are included in the registration fee for the Conference Program.
A variety of topics are covered at LISA17. Use the icons listed below to focus on a key subject area:
Follow the icons throughout the training sessions below. You can combine Conference Program registration with training sessions to build the conference that meets your needs. Pick and choose the sessions that best fit your interests—focus on just one topic or mix and match.
Our Guarantee
If you're not happy, we're not happy. If you feel a tutorial does not meet the high standards you have come to expect from USENIX, let us know by the first break and we will seat you in any other available tutorial immediately.
Continuing Education Units (CEUs)
USENIX provides Continuing Education Units for a small additional administrative fee. The CEU is a nationally recognized standard unit of measure for continuing education and training and is used by thousands of organizations.
Each full-day tutorial qualifies for 0.6 CEUs. You can request CEU credit by completing the CEU section on the registration form. USENIX provides a certificate for each attendee taking a tutorial for CEU credit. CEUs are not the same as college credits. Consult your employer or school to determine their applicability.
Training Materials
USB Drives
Training materials will be provided to you on an 8GB USB drive. If you'd like to access them during your class, please remember to bring a laptop. There will not be any formally printed materials.
Half Day Morning
Kick off your journey to becoming a DevOps master by learning Kubernetes from the ground up. Get started with an introduction to distributed systems and the architecture behind Kubernetes; then learn about Kubernetes APIs and API object primitives. By the end of this workshop you’ll be deploying, scaling, and automating container-based solutions using open source tools for distributed computing.
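For a taste of the API object primitives covered, here is a minimal sketch of a Deployment manifest (the name and image are illustrative, not taken from the workshop materials):

    # deployment.yaml: ask Kubernetes to keep three nginx replicas running
    apiVersion: apps/v1beta1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: nginx:1.13
            ports:
            - containerPort: 80

Something like "kubectl apply -f deployment.yaml" creates it, and "kubectl scale deployment web --replicas=5" scales it; the workshop builds toward automating exactly this kind of workflow.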
Slides: http://bit.ly/lisa17-k8s. Bring a laptop with the following materials: http://bit.ly/lisa17-k8s#/workshop-setup.
For developers, systems administrators, "DevOps" folks, architects, and those who are interested in learning about distributed systems via hands-on examples. Attendees should have some basic knowledge of Linux Containers (docker) and have an interest in using distributed architectures to develop web solutions.
Attendees will learn how to deploy, scale, update, and manage container-based solutions through hands-on examples and exercises.
Kubernetes, Distributed computing and solutions delivery, SRE, container operations
Ryan Jarvinen, Red Hat
Ryan Jarvinen is an Open Source Advocate at CoreOS, focusing on improving developer experience in the container community. He lives in Oakland, California, and is passionate about open source, open standards, open government, and digital rights. You can reach him as ryanj on Twitter, GitHub, and IRC.
In this talk, I outline the characteristics that define a "container host": an OS tuned to run software in containers. We will explore the benefits and peculiarities of a stripped-down, lightweight minimal OS image and the implications for configuration management and update strategies.
Then I explore the architecture of two common container hosts, CoreOS and Project Atomic. Each has characteristics that make it suitable for different environments. Users will install one of the two environments and follow along, probing and observing how a container host differs in operation from a conventional package-based host.
Finally, I will look at how a sysadmin's day-to-day tasks and operations differ when running infrastructure services and providing application runtime environments for developers and users on container hosts. We will establish base network services (DNS, NTP, authentication) on container hosts, as well as install and demonstrate utility containers that provide the standard admin tools stripped from lightweight hosts.
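As a hypothetical flavor of that last point, an admin "toolbox" on a host that ships no admin tools can be run from a container that shares the host's namespaces, so its tools can see real processes and interfaces:

    # Illustrative only; the image and flags will vary by site
    docker run -it --rm \
      --privileged --pid=host --net=host \
      -v /:/host \
      fedora /bin/bash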
Sysadmins and service designers interested in learning to use container hosts to reduce host management.
Participation requires access to a local or cloud VM service.
Attendees will understand the goals and basic design requirements for container hosts. They will get an overview of the design of both CoreOS and Atomic host, highlighting the differences in architecture and how these inform the choice of container host for an installation.
They will learn how to boot and integrate container hosts into their existing infrastructure. They will know how to install and use traditional host tools from containers and how to manage, update and customize container hosts.
They will create a sample cluster of either CoreOS or Atomic hosts in a demo environment.
- Container Hosts
- Large Scale Container Infrastructure
- Atomic Host and CoreOS architecture
Mark Lamourine, Red Hat
Mark Lamourine fell into system administration when the VAX shop he worked in as a student inherited a set of HP/UX boxes. He became the de facto admin because he was the only one in the group who had read a man(8) page. Since then he's done stints as a developer, a QA engineer, a lab infrastructure manager, and an infrastructure admin at a now-defunct worldwide ISP. These days he plays the Sysadmin Advocate to software developers who think software is done when they've installed it once in Vagrant.
When not computer geeking, Mark geeks out on road bicycles. He's been riding road fixed-gear for fun since before that was a thing.
This class will teach administrators how to get a project up and running with Azure Resource Manager (ARM) templates. These templates are an easy way to define, manage, and deploy instances into the Azure cloud. Additionally, I will go over some basic best practices for making your templates more manageable.
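As a sketch of what a template looks like (the parameter name is illustrative), here is a minimal ARM template that deploys a single storage account:

    {
      "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
      "contentVersion": "1.0.0.0",
      "parameters": {
        "storageAccountName": { "type": "string" }
      },
      "variables": {
        "location": "[resourceGroup().location]"
      },
      "resources": [
        {
          "type": "Microsoft.Storage/storageAccounts",
          "name": "[parameters('storageAccountName')]",
          "apiVersion": "2016-01-01",
          "location": "[variables('location')]",
          "sku": { "name": "Standard_LRS" },
          "kind": "Storage",
          "properties": {}
        }
      ]
    }

Deploying it is then a single CLI call, something like "az group deployment create --resource-group mygroup --template-file azuredeploy.json --parameters storageAccountName=mystorage1234".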
System Administrators who are new to Azure, or who have not worked with Resource Manager Templates in the past. Anyone interested in streamlining and automating his or her workflow in the Azure cloud.
Attendees will take back to work the basic skills to get started automating their Azure deployments, along with the baseline understanding and knowledge needed to work with ARM templates.
- Azure Resource Manager
- Azure PowerShell/Azure CLI
- Basics of the ARM Template layout
  - Metadata
  - Parameters
  - Variables
  - Template file
- Using parameters and variables to generalize your deployment
- Adjusting resource sizing on the fly
- Deploying Resources
  - Base resource
  - Sizing
  - Monitoring Configuration
  - Resource Dependencies
- Troubleshooting Templates
- Tips and tricks to help you configure templates
George Beech, Stack Exchange
George has been an SRE generalist at Stack Exchange since October 2011. Before that, he worked for a multinational CRM company running their IVR infrastructure. He has worked on every part of the stack, from Windows to Linux to the network infrastructure. He is currently serving his first term as a LOPSA Director. His experience working in the IT field over more than a decade has led him to love working with multiple technologies, and has allowed him to experience everything from running a small network as a consultant to being part of a large team running very large scale infrastructure.
In the past he has spoken at LISA, Velocity NYC, local user groups, and LOPSA-East, and he has written about his experience working on a high-volume web infrastructure on his personal blog as well as the Server Fault blog.
Full Day
Insufficient knowledge of operating system internals is my most common reason for passing on an interview candidate. Anyone can learn that you run tool X to fix problem Y. But what happens when there is no tool X, or when you can't even accurately pinpoint the root cause of "It's sometimes slow"?
This will be a no-holds-barred, fury-road-paced review of all major parts of modern operating systems with specific emphasis on what's important for system administrators. It will provide just enough of an academic focus to bridge the "whys" so you can make better use of fiddling with the "whats" on a day-to-day basis. As an added bonus, it will prime you for the following day's "Linux Performance Tuning" tutorial with Theodore Ts'o.
You will learn about process management, scheduling, file system architecture and internals, interrupt management, the mysteries of the MMU and TLB, Bélády's anomaly, page replacement algorithms, and hopefully a bit of networking. In a nutshell, we'll cover 16 weeks of college-level material in a few hours.
Buckle up.
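As one small taste of the page-replacement material, here is a minimal Python sketch of Bélády's anomaly: under FIFO replacement, the classic reference string below suffers more page faults with four frames than with three.

    from collections import deque

    def fifo_faults(refs, nframes):
        """Count page faults under FIFO replacement with nframes frames."""
        frames, faults = deque(), 0
        for page in refs:
            if page not in frames:
                faults += 1
                if len(frames) == nframes:
                    frames.popleft()  # evict the oldest resident page
                frames.append(page)
        return faults

    refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
    print(fifo_faults(refs, 3))  # 9 faults
    print(fifo_faults(refs, 4))  # 10 faults: more memory, worse result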
The Automation Tools Bootcamp is a tutorial for individuals looking for exposure to and usage of new IT automation tools. We will learn about and then use Vagrant, Chef, Packer, Docker, Terraform, and Artifactory to deploy a small application in local VMs.
We will cover a progression of tasks, leveraging information from previous sections to deploy a small app that runs identically on your local development machine or on a shared server. Get rid of the “it works for me” mentality when you know your local VM is identical to your co-workers' and your shared environments.
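For instance, the Vagrant stage starts from something as small as this sketch of a Vagrantfile (the box name is illustrative):

    # Vagrantfile: one identical VM for every engineer on the team
    Vagrant.configure("2") do |config|
      config.vm.box = "ubuntu/xenial64"   # base image, fetched once
      config.vm.provider "virtualbox" do |vb|
        vb.memory = 1024
      end
      # Later sections hand the VM to Chef for real provisioning
      config.vm.provision "shell", inline: "echo provisioned"
    end

"vagrant up" builds the VM, "vagrant ssh" logs you in, and "vagrant destroy" throws it away so you can break things safely.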
Operations, QA, those who choose to call themselves DevOps, and even managers can come learn.
These automation tools are freely available to engineers, enabling them to safely break local environments until the change in configuration has been perfected. Basic exposure to these tools will allow attendees to return to work with new ways to tackle the problems they face daily.
Vagrant, Chef, Packer, Docker, Terraform, and Artifactory
Tyler Fitch, Adobe
Tyler is a Site Reliability Engineer for the Adobe Stock site—working to automate all the things done to build and release changes to the Stock platforms. He recently finished three years of "post graduate work" in DevOps as an Architect in Chef's Customer Success Program where he helped Chef's largest enterprise customers have delightful experiences in IT Automation. He lives in Vancouver, Washington, and when he’s not programming enjoys lacrosse and using his passport.
Half Day Afternoon
Tasks like the management and maintenance of services that are critical to the business are on the daily to-do list of every system administrator. Containers and microservice-based architectures also mean that the number of services a sysadmin has to manage is ever growing. To successfully manage thousands of services, we need smart tools that can help us. In this session, we will look at systemd, the init system and service manager used by all major Linux distributions. The session will be a hands-on, interactive look at the architecture, capabilities, and administrative how-tos of systemd. Anyone who is new to systemd or looking to dig deeper into some of the advanced features should attend. Please bring a laptop with a virtual machine running a distribution of your choice that uses systemd.
Linux system administrators, package maintainers, and developers who are transitioning to systemd, or who are considering doing so.
Understanding of how systemd works, where to find the configuration files, and how to maintain them.
- The basic principles of systemd
- systemd's major components
- Anatomy of a systemd unit file (see the sketch after this list)
- Understanding and optimizing the boot sequence
- Improved system logging with the journal
- Resource management via systemd's cgroups interface
- Simple security management with systemd and the kernel's capabilities
- systemd, containers, and virtualization
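To set expectations for the unit-file material, here is a minimal sketch of a service unit (the service and binary are hypothetical):

    # /etc/systemd/system/myapp.service
    [Unit]
    Description=My example daemon
    After=network.target

    [Service]
    ExecStart=/usr/local/bin/myapp --foreground
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

After a "systemctl daemon-reload", the service can be started and enabled with "systemctl enable --now myapp.service", and its logs read with "journalctl -u myapp".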
Michal Sekletar, Red Hat
Michal Sekletar joined Red Hat in 2011 and currently works as a Senior Software Engineer on the "Plumbers" team. He spends his days working on and supporting init systems and other low-level user-space components. He holds a Master's degree from the Brno University of Technology. His other professional interests include programming languages, algorithms, and UNIX-like (other than Linux) operating systems.
Have you ever wondered how to find the "one metric that matters" (for your team)? Or how to magically communicate why your team is doing what you're doing so everyone can understand? Or, moving back several steps: how should you decide which work to focus on? This tutorial isn't a magic pill, but it's the closest thing to one, and it will get you to where you can answer all of those questions. And once you learn it, you'll be able to sketch it out on the back of a napkin.
I’ve used this simple framework with:
- Fortune 500 executives, to decide on the right metrics for their latest initiatives and communicate them throughout the organization
- Sysadmins, to communicate their latest improvement work across their own teams and to "the business"
- My own research, ranging from complex hardware studies to the State of DevOps Reports
The framework works for all types of measures: system, survey, technical, financial, etc.
Engineers, managers, anyone needing to plan or understand a system.
When you leave this tutorial, you’ll be able to:
- Communicate your measurement framework in a straightforward manner
- Identify key measures for your own improvement work, and share this easily with the data team (whether that’s you or another team)
- Chain your measurement frameworks, allowing you to link executive-level initiatives to middle management goals to practitioner workstreams
Metrics
Heidi Waterhouse, Consultant
Technical interviews can be intimidating, but it’s easier if you have confidence in yourself and your ability to answer complicated questions. The hardest questions are not about sorting algorithms, but how you’ll work in a team, how you’ll resolve conflicts, and what it will be like to manage and work with you. This workshop exists to address the skills and theories of presenting yourself as confident, capable, and coachable.
We envision the audience for this tutorial to be people interviewing for technical or technical-adjacent roles at technology companies who are early career (2-7 years). It is meant for beginners, but all are welcome if they want to brush up on their interviewing skills.
The audience will experience hands-on practice and can expect to learn tactics for preparing for and excelling at interviews. We will provide handouts for participants to use after the workshop and for practice. Participants will learn how to accomplish the checkpoints of a hiring workflow, including phone screens, phone interviews, in-person interviews, and how to accept or reject an offer. The take-home worksheets will provide types of interview questions, a job-search rubric, self-evaluation forms, and resources for further research.
Culture, Interviewing, Career, Early Career, Technology Industry
Carol Smith, Microsoft
Carol Smith has over 12 years of experience with programs, communities, and partnerships. She worked at GitHub managing education partnerships for the Student Developer Pack and at Google managing the Google Summer of Code program. She has a degree in Journalism from California State University, Northridge, and is a cook, cyclist, and horseback rider.
Heidi Waterhouse, Consultant
Heidi Waterhouse is a freelance technical writer, information architect, and active conference speaker. Her experience as an in-demand consultant has given her insight into the interview process across several industry segments and allows her to generate meaningful answers to a wide variety of weird interview questions. In her spare time, she considers the technical writing aspects of sewing patterns.
Full Day
Today's threats to the enterprise are manifested in many ways, but all share similar traits: the adversaries are highly intelligent, well-funded, and determined to gain access. In this class, we will explore the murky world of the black hats. We will examine your security footprint as they view it and discuss ways to minimize it, various vectors for attack, and how to detect and defend. We will spend time talking about current threats and how they can impact your company, and we will build upon the foundations of good security practice. This class has been updated with current events and topics relevant to environment profiling, social engineering, and new attack vectors. As with all my classes, this will be accompanied by a pinch of humor and a large dollop of common sense.
Participants should be beginning to mid-level system administrators of any stripe with an interest in IT Security and a desire to understand their potential adversaries. It is suggested that participants have experience with *nix command line and virtual hosts.
Tools, tips, tricks, and a working security toolkit which can be implemented to improve monitoring, detection, and defense in your organization. Experience working with (mostly) free security software tools.
Security, Risk Evaluation, Social Engineering
Branson Matheson, Cisco Systems, Inc.
Branson is a 29-year veteran of system architecture, administration, and security. He started as a cryptologist for the US Navy and has since worked on NASA shuttle and aerospace projects, TSA security and monitoring systems, secure mobile communications, and Internet search engines. He has also run his own company while continuing to support many open source projects. Branson speaks to and trains sysadmins and security personnel worldwide, and he is currently a senior technical lead for Cisco Cloud Services. Branson holds several credentials and generally likes to spend time responding to the statement "I bet you can't...."
Half Day Morning
Open source relational databases like MySQL and PostgreSQL power some of the world's largest websites, including Yelp. They can be used out of the box with few adjustments and rarely require a dedicated database administrator for the first few months or even years. This means that System Administrators and Site Reliability Engineers are usually the first to respond to some of the more "interesting" issues that can arise as you scale your databases. This tutorial will cover MySQL, but many of the concepts apply to PostgreSQL and other open source RDBMSs. We'll first go over a broad set of DBA basics to introduce MySQL database administration, and next cover the InnoDB storage engine, database defense, and monitoring. Finally, I'll cover the wide array of online resources, books, open source toolkits, and scripts from MySQL, Percona, and the Open Source community that will make the job easier.
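For a flavor of the query-plan material, a sketch in plain SQL (the table and index names are hypothetical):

    -- Ask MySQL how it plans to execute a query
    EXPLAIN SELECT id, total
    FROM   orders
    WHERE  customer_id = 42;

    -- An index on the filtered column often turns a full table scan
    -- into a fast lookup
    ALTER TABLE orders ADD INDEX idx_customer (customer_id);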
Sysadmins and SREs of all levels who want or need to learn MySQL or to support an open source relational database.
Sysadmins and SREs who join us for this tutorial will come away with a real-world, production-ready understanding of why and how MySQL works the way it does.
- MySQL Installation and Configuration
- Architecture and Filesystem Layout
- InnoDB Tuning and Optimization
- Transactions
- Replication and Scaling Out
- Schema/Query Basics, Indexes, and Query Plans
- Deciphering Common Errors
- Monitoring
- Backup and Restore
- Troubleshooting
- Online Communities
- Open Source Toolkits
Jenni Snyder, Yelp
The R programming language and ecosystem constitute a rich tool set for performing system analyses, for communicating the results and importance of those analyses, and for automating the process with reproducible and repeatable results. This brief introduction to R and its ecosystem will provide a walk along the mainline — coming up to speed on R, accessing data, and getting results.
This tutorial will
- motivate you to pick up R
- introduce the basics of the R language
- demonstrate useful techniques using R and RStudio
- illustrate ways to simplify your life by automating data analysis and reporting
In-class demonstrations will be complemented with hands-on opportunities during the workshop. Additional exercises and data sets that students can explore following the workshop will be provided.
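A minimal sketch of that workflow in base R, assuming a hypothetical CSV export of per-host disk usage:

    # Read operational data, summarize it, and save a quick chart
    usage <- read.csv("disk_usage.csv")    # columns: host, pct_used
    summary(usage$pct_used)                # descriptive statistics
    print(usage[usage$pct_used > 90, ])    # hosts needing attention
    png("disk_usage.png")
    hist(usage$pct_used, main = "Disk usage across hosts", xlab = "% used")
    dev.off()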
This tutorial is designed for system administrators who are awash in operational data and who want to do a more efficient job of understanding their data and communicating their findings to others. Some facility with programming and a knowledge of basic descriptive statistics are assumed. Prior knowledge of R is not required.
- Understanding where R fits into the system administrator’s tool set
- Acquaintance with R, R packages, and R Studio
- Familiarity with basic R data-manipulation techniques
- Motivation to learn or improve your R skills
- Next steps to take in mastering R
Analytics of System Data
Robert Ballance, Independent Computer Scientist
Dr. Robert Ballance recently completed a White House Presidential Innovation Fellowship, where he applied his skills with R to analyzing and delivering broadband deployment data to communities across the U.S.A. He first developed his R-programming skills while managing large-scale High-Performance Computing systems for Sandia National Laboratories. While at Sandia, he developed several R packages used internally for system analysis and reporting. Prior to joining Sandia in 2003, Dr. Ballance managed systems at the University of New Mexico High Performance Computing Center. He has consulted, taught, and developed software, including R packages, Perl applications, C and C++ compilers, programming tools, Internet software, and Unix device drivers. He is a member of USENIX, the ACM, the IEEE Computer Society, the Internet Society, and the American Association for the Advancement of Science. He was a co-founder of the Linux Clusters Institute and recently served as Secretary of the Cray Users Group. Bob received his Ph.D. in Computer Science from U.C. Berkeley in 1989.
Terraform is a tool for deploying and configuring cloud infrastructure in AWS, Google Compute Engine, Digital Ocean, Azure, and many, many other platforms. It is a consistent, robust, well-maintained alternative to clicking in a web interface or writing custom provisioning code against the cloud provider's API.
This tutorial will show code and runtime examples of deploying various types of cloud infrastructure in AWS, Google Compute Engine, and others. Interactivity is unfortunately not offered due to the logistics of billing for arbitrary cloud resources.
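For flavor, here is a minimal sketch of Terraform's configuration language (the AMI ID is a placeholder):

    # main.tf: one provider, one EC2 instance
    provider "aws" {
      region = "us-west-2"
    }

    resource "aws_instance" "web" {
      ami           = "ami-0123456789abcdef0"  # placeholder image ID
      instance_type = "t2.micro"
    }

"terraform plan" previews the change and "terraform apply" makes it so; the same declare-plan-apply workflow carries over to every provider.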
Novice- to intermediate-level sysadmins who want to learn what Terraform is and what it's good for, why you'd use it instead of your cloud provider's web interface or API, and how to implement common patterns across several different providers.
What is Terraform? What is it good for? How do we use it to build/manage infrastructure? How do we scale it to a team?
Terraform
Whether you are a sysadmin, dev, or web ops, time management can be more difficult than any technology issue. This class is for new and junior system admins who have found themselves in over their heads, overloaded, and looking for a better way to survive the tech world.
This tutorial presents fundamental techniques for eliminating interruptions and distractions so you have more time for projects, prioritization techniques so the projects you do work on have the most impact, plus "The Cycle System," which is the easiest and most effective way to juggle all your tasks without dropping any.
Sysadmins, devs, operations, and their managers
By the end of this class, you will be able to schedule and prioritize your work (rather than be interruption-driven), have perfect follow-through (never forget a request), and limit your work-time to 40 hours a week (have a life).
- How to manage all the work you have to do.
- How to prioritize and eliminate unnecessary tasks.
- How to manage interruptions: preventing them and handling the ones you get.
- The Cycle System for recording and processing to-do lists
- Task grouping: batching, sharding, and multitasking
Tom Limoncelli, Stack Overflow, Inc.
Tom is the SRE Manager at StackOverflow.com and author of Time Management for System Administrators (O'Reilly). He is co-author of The Practice of System and Network Administration (3rd edition just released) and The Practice of Cloud System Administration. He is an internationally recognized author, speaker, system administrator, and DevOps advocate. He's previously worked at small and large companies including Google, Bell Labs/Lucent, and AT&T. His blog is http://EverythingSysadmin.com and he tweets @YesThatTom.
Half Day Afternoon
All too often, technical teams spend so much time firefighting that they can’t stop to identify and eliminate the problems—the underlying causes—of incidents. Incident resolution is about taking care of the customer—restoring a service to normal levels of operation ASAP. Without a process in place to turn the problem into a known error, the root causes of the incident remain, resulting in recurrences of the incident.
The goals of the Problem Management Process are to prevent repeat incidents and to minimize the impact of incidents and problems that cannot be prevented. Most technical people already have experience in root cause analysis and problem resolution. This tutorial will help them to be measurably more consistent, mature, and effective in their practices. Using IT Infrastructure Library (ITIL) best practices, this tutorial will deliver step-by-step instructions on building and managing a problem process.
Technical people and managers responsible for the support of live production services. This is an operational support process that can be put in place from the bottom up. The more teams involved in the process—DBAs, system administrators, developers, helpdesk—the greater the scope of problems that can be addressed.
- a step-by-step guide for building and implementing a problem process and the reasons behind each step
- a process template with examples that can be easily adapted to fit your organization’s current and future needs
- instructions on setting up a Known Error Database and communicating workarounds to impacted support teams
- guidance for getting buy-in from peers and managers
- a complete kit for starting to use After Action Reviews to handle the human component of problems
- Incident response vs. problem resolution
- Root cause analysis techniques
- Making decisions that are aligned with business objectives
- Getting buy-in from teammates, colleagues and managers
- Proactive problem management
- After-action reviews
Jeanne Schock, Armstrong Flooring Inc.
Jeanne Schock has a background in Linux/FreeBSD/Windows system administration that includes working at a regional ISP, a large video hosting company and a Top Level Domain Registry services and DNS provider. About 7 years ago she transitioned to a role building and managing processes in support of IT operations, disaster recovery, and continual improvement. She is a certified Expert in the IT Infrastructure Library (ITIL) process framework with in-the-trenches experience in Change, Incident, and Problem Management. Jeanne also has a pre-IT academic and teaching career and is an experienced trainer and public presenter.
If you still haven't checked out that Docker thing, but need (or want) to get started with containers, this tutorial is for you!
After a short introduction explaining various usage scenarios for containers, we will roll up the sleeves of our T-shirts, and run a few simple containers through the Docker CLI. We will explain the difference between containers and images, and write a Dockerfile to build an image for a trivial application. Finally, we will present Compose, a tool to build, run, and manage stacks with multiple containers.
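As a preview of the Dockerfile step, a sketch for a hypothetical one-file Python application:

    # Dockerfile: package hello.py and its runtime into an image
    FROM python:3.6
    COPY hello.py /hello.py
    CMD ["python", "/hello.py"]

"docker build -t hello ." turns this into an image, and "docker run --rm hello" runs it as a container.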
No prior knowledge of Docker is needed. If you know how to interact with the UNIX command line, you're set! Some demos will feature code snippets in Python, Ruby, or even C; but you will be perfectly fine even if your language of choice is Bash.
Advanced topics like networks, volumes, plugins, multi-stage builds, health checks, etc. will be mentioned but not covered in depth.
The tutorial will be hands-on. You will be provided with a pre-configured Docker environment running on a cloud VM (you won't need to set up Docker or Vagrant or VirtualBox on your machine).
Devs and ops who have managed to avoid the container hype so far but now want to catch up on all that Docker jazz
The audience will learn about the basic principles of containers: what they are, what they're for, and why they have been trending for the last few years.
They will also learn how to use the Docker CLI to run simple containers; build container images with Dockerfiles; start multi-container applications with Docker Compose.
This will allow them to understand containers in general and Docker in particular; use them in simple scenarios; and have a reference point for more complex ones.
Docker, containers
Jérôme Petazzoni, Docker Inc.
Ansible is a fantastic starting point for automation—especially when the learning curve or the infrastructure overhead of Chef/Puppet is too high. New users can start writing useful automation playbooks with just an SSH connection and an hour (or two) reading the docs.
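As a sketch of what those first playbooks look like (the host group and package are illustrative):

    # site.yml: make sure NTP is installed and running on all web servers
    ---
    - hosts: webservers
      become: yes
      tasks:
        - name: install ntp
          package:
            name: ntp
            state: present
        - name: start and enable ntp
          service:
            name: ntp
            state: started
            enabled: yes

A single command, "ansible-playbook -i inventory site.yml", applies it over plain SSH with no agents to install.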
This tutorial will alternate between lecture and hands-on activities using (instructor-supplied) disposable cloud infrastructure.
Sysadmins with zero exposure to Ansible through intermediate-level users who want a guided tour of its potential.
Knowledge of what Ansible is, how it works, and how it compares with other configuration-management tools; hands-on experience using Ansible to solve real-world problems; and opinionated best-practices for saving blood, sweat, and/or tears.
Ansible
Your site’s back up, you’re back in business. Do you have a way to make sure that problem doesn’t happen again? And if you do, do you like how it works?
Heroku uses a blameless retrospective process to understand and learn from our operational incidents. We’ve recently released the templates and documentation we use in this process, but experience has taught us that facilitating a retrospective is a skill that’s best taught person to person.
This tutorial will take you through a retrospective based on the internal and external communications of a real Heroku operational incident. We’ve designed it to help you experience first-hand the relaxed, collaborative space that we achieve in our best retrospectives. We’ll practice tactics like active listening, redirecting blame, and reframing conversations. Along the way, we’ll discuss how we developed this process, what issues we were trying to solve, and how we’re still iterating on it.
Managers, tech leads, anyone interested in retrospective culture and iterating on processes.
Attendees will have the materials and firsthand experience to advocate for (or to begin) an incident retrospective process at their workplace, or to improve a process they might already be using.
- Why run a retrospective
- Goal of a retrospective
- Blameless retrospectives
- Facilitating: redirecting blame, reframing, drawing people out
- How to structure a retrospective
- Preparing for a retrospective
- Five "why"s / infinite "how"s
- Human error
- Achieving follow-through on remediation items
Courtney Eckhardt
Courtney Eckhardt first got into retrospectives when she signed up for comp.risks as an undergrad (and since then, not as much has changed as we’d like to think). Her perspectives on engineering process improvement are strongly informed by the work of Kathy Sierra and Don Norman (among others).
Full Day
Intermediate and advanced Linux system administrators who want to understand their systems better and get the most out of them.
The ability to hone your Linux systems for the specific tasks they need to perform.
- Strategies for performance tuning
- Characterizing your workload's requirements
- Finding bottlenecks (see the sketch after this list)
- Tools for measuring system performance
- Memory usage tuning
- Filesystem and storage tuning
- Network tuning
- Latency vs. throughput
- Capacity planning
- Profiling
- Memory cache and TLB tuning
- Application tuning strategies
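As a hypothetical first pass at the bottleneck hunt referenced above, the kind of quick survey these tools make possible:

    vmstat 1 5       # CPU run queue, memory pressure, swap at a glance
    iostat -x 1 5    # per-device utilization and I/O latency
    sar -n DEV 1 5   # per-interface network throughput
    perf top         # which kernel/user functions burn CPU right now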
Theodore Ts'o, Google
Theodore Ts'o is the first North American Linux Kernel Developer, and started working with Linux in September 1991. He previously served as CTO for the Linux Foundation, and is currently employed at Google. Theodore is a Debian developer, and is the maintainer of the ext4 file system in the Linux kernel. He is the maintainer and original author of the e2fsprogs userspace utilities for the ext2, ext3, and ext4 file systems.
Half Day Morning
There are many times when the daily grind pushes you out of your comfort zone. Sometimes you're in a bind, and the best way forward is fashioning a tool out of what's available. Sometimes those really are nails you see around you. This class looks at some of the normal, and some of the not-so-normal, uses for Golang in systems administration.
- New Golang programmers who want to get a better idea of using the language (should have some familiarity with Golang).
- Old dogs looking for new tricks.
- Several MacGyver tools that may come in handy.
- Techniques and approaches for some out-of-the-box thinking.
- Running a quick and dirty TLS-secured web server for file transfers (see the sketch after this list)
- Collecting and serving up system metrics
- Driving web applications from the command line
- Speaking HTTP/2
- Fanning out shell results from one system to many with SSH
- Rolling your own container system
- and more
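A minimal sketch of the first item, the quick-and-dirty TLS file server (you supply cert.pem and key.pem, e.g. self-signed via openssl):

    // filesrv.go: serve the current directory over HTTPS on port 8443
    package main

    import (
        "log"
        "net/http"
    )

    func main() {
        http.Handle("/", http.FileServer(http.Dir(".")))
        log.Fatal(http.ListenAndServeTLS(":8443", "cert.pem", "key.pem", nil))
    }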
Chris McEniry, Sony Interactive Entertainment
Chris "Mac" McEniry is a practicing sysadmin responsible for running a large ecommerce and gaming service. He's been working and developing in an operational capacity for 15 years. In his free time, he builds tools and thinks about efficiency.
eBPF (extended Berkeley Packet Filter) is a modern kernel technology that can be used to introduce dynamic tracing into a system that wasn't prepared or instrumented in any way. The tracing programs run in the kernel, are guaranteed to never crash or hang your system, and can probe every module and function—from the kernel to user-space frameworks such as Node and Ruby.
In this workshop, you will experiment with Linux dynamic tracing first-hand. First, you will explore BCC, the BPF Compiler Collection, which is a set of tools and libraries for dynamic tracing. Many of your tracing needs will be answered by BCC, and you will experiment with memory leak analysis, generic function tracing, kernel tracepoints, static tracepoints in user-space programs, and the "baked" tools for file I/O, network, and CPU analysis. You'll be able to choose between working on a set of hands-on labs prepared by the instructors, or trying the tools out on your own test system.
Next, you will hack on some of the bleeding edge tools in the BCC toolkit, and build a couple of simple tools of your own. You'll be able to pick from a curated list of GitHub issues for the BCC project, a set of hands-on labs with known "school solutions", and an open-ended list of problems that need tools for effective analysis. At the end of this workshop, you will be equipped with a toolbox for diagnosing issues in the field, as well as a framework for building your own tools when the generic ones do not suffice.
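For a flavor of BCC's Python front end, here is the classic hello-world-style probe (it assumes a recent kernel with the bcc package installed):

    from bcc import BPF

    # Compile and attach a tiny BPF program: log every clone() syscall
    b = BPF(text="""
    int kprobe__sys_clone(void *ctx) {
        bpf_trace_printk("new process via clone()\\n");
        return 0;
    }
    """)
    b.trace_print()  # stream the trace messages to your terminal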
Developers, SRE, ops engineers
Low-overhead, production-ready tools based on the BPF kernel technology for CPU sampling, memory leak analysis, I/O and file issues, and many other performance and troubleshooting scenarios.
Performance, Monitoring, Tracing, BPF, Kernel
Sasha Goldshtein, CTO, Sela Group
Sasha Goldshtein is the CTO of Sela Group, a Microsoft MVP, Pluralsight author, and international consultant and trainer. Sasha is the author of two books and multiple online courses, and a prolific blogger. He is also an active open source contributor to projects focused on system diagnostics, performance monitoring, and tracing—across multiple operating systems and runtimes. Sasha authored and delivered training courses on Linux performance optimization, event tracing, production debugging, mobile application development, and modern C++. Between his consulting engagements, Sasha speaks at international conferences world-wide.
Dozens of commands! Hundreds of options! Git has dumbfounded sysadmins and developers alike since its appearance in 2005.
And yet, this ingenious software is among the most fantastically useful ever developed.
Learn Git from the ground up and the inside out with Git Foundations Training!
This half-day class explores Git's internals in depth and includes unique practical exercises to gain familiarity and comfort in handling the nuts and bolts.
Bring with you:
- A laptop with a UNIX-like command-line environment on which "git --version" displays a version (any version).
- A willingness to learn.
No prior knowledge of Git is required. Basic Unix/Linux command line experience is assumed. Experienced users of Git have given rave reviews; the class is not aimed only at beginners, but at anyone wishing to thoroughly understand and use Git to the fullest.
- A thorough and practical understanding of the internals of Git
- The ability to easily and *confidently* manipulate Git repositories and their contents
- Readiness to pick up and *quickly* learn more exotic and advanced Git commands (and to read the man pages easily!)
Git internals are covered in depth, beginning from basic definitions and proceeding through the essentials of graph theory needed to appreciate Git's architecture. Expect plenty of audience Q&A, live demonstrations, and diagrams throughout. Following this complete theory portion comes the practical portion of the course, with hands-on exercises to ensure retention and application of all theory.
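A taste of those nuts and bolts, safe to try in a scratch repository:

    git init scratch && cd scratch
    echo 'hello' > greeting
    git hash-object -w greeting    # stores a blob; prints its SHA-1
    git cat-file -t <sha1>         # the object's type: blob
    git cat-file -p <sha1>         # the object's content: hello

(Substitute the SHA-1 printed by hash-object for <sha1>.)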
Mike Weilgart, Vertical Sysadmin, Inc.
Mike Weilgart has loved maths and computers all his life. Graduating high school at the age of 13, he thereafter worked in a variety of positions including software QA, calculus teacher, and graphic design, before resolving to put his love of computers to professional use as a Linux sysadmin and trainer. Mike currently consults at a Fortune 50 company as an automation specialist, and enjoys nothing more than training people to full mastery of their tools.
Speedy change control is not an oxymoron. This tutorial will provide practical, actionable steps to streamline and speed up change control at your organization without increasing risk. In The Visible Ops Handbook, authors Behr, Kim, and Spafford identify a culture of change management as common to high-performing IT groups: "change management does not slow things down in these organizations." This tutorial will help anyone wishing to implement phase one of the handbook: "Stabilize the Patient" and "Modify First Response." While I draw heavily on IT Infrastructure Library (ITIL) guidance, much of this is common-sense good practice based on lessons learned from past success and failure. No special ticketing system, tools, or ITIL knowledge is necessary. I am a certified ITIL Expert, and I have over five years of experience designing, improving, and managing a successful change management process at an audited technology company delivering public registry and DNS services running on complex technologies across international data centers.
Individuals and managers involved in preparing for and deploying changes and software builds in production environments.
- templates for change request types and procedures
- templates for creating standard operating procedures
- ITIL-aligned talking points for making your case for these process improvements
- Change management
- Process
- Different change types to help you speed up the process
- Assessing risks and potential impact
- Defining change authorities specific for each change type
- Metrics for measuring change process performance against goals
- Release and deployment management
- Continuous delivery
Jeanne Schock, Armstrong Flooring Inc.
Jeanne Schock has a background in Linux/FreeBSD/Windows system administration that includes working at a regional ISP, a large video hosting company and a Top Level Domain Registry services and DNS provider. About 7 years ago she transitioned to a role building and managing processes in support of IT operations, disaster recovery, and continual improvement. She is a certified Expert in the IT Infrastructure Library (ITIL) process framework with in-the-trenches experience in Change, Incident, and Problem Management. Jeanne also has a pre-IT academic and teaching career and is an experienced trainer and public presenter.
Half Day Afternoon
This tutorial will give you ways of diagnosing and preempting PostgreSQL performance issues, using a wide range of tools and techniques to measure and improve your database's performance. We will cover query optimisation, configuration, and OS settings for your database server, plus pooling, caching, replication, and partitioning strategies that can be used to ensure performance at scale.
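For a taste of the query-optimisation portion, a sketch in plain SQL (the table and column are hypothetical):

    -- See what the planner actually does with a slow query
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM orders WHERE customer_id = 42;

    -- If that reveals a sequential scan, an index may be the cheapest fix
    CREATE INDEX CONCURRENTLY orders_customer_id_idx
        ON orders (customer_id);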
The target audience for this talk is server administrators and developers working with PostgreSQL, or considering using it. No specific knowledge of PostgreSQL is required but some background in RDBMS or SQL is recommended.
System administrators will benefit by learning about:
- what aspects of server and PostgreSQL configuration affect database performance and how to choose and tweak them
- how to monitor the database server to maintain high performance
Developers will benefit by learning about:
- detecting performance issues in their database usage
- optimising their queries
This tutorial breaks down the various potential causes of performance issues in PostgreSQL and shows how to diagnose, fix, and monitor them:
- Query performance issues
- Choosing the right PostgreSQL configuration within hardware and OS limitations
- Operating system and hardware tweaks that can affect performance
- Optimising database usage
- Monitoring your database and database server's performance
Camille Baldock, Salesforce
Camille Baldock is an infrastructure engineer with the Heroku Department of Data. She works on distributed systems monitoring, operations, automation, and tuning for Heroku Postgres.
All distributed systems make tradeoffs and compromises. Different designs behave very differently with respect to cost, performance, and how they behave under failure conditions.
It's important to understand the tradeoffs that the building blocks in your systems make, and the implications this has for your system as a whole. In this workshop we'll look at several examples of different real-world distributed systems and discuss their strengths and shortcomings.
This workshop will include some practical elements. Attendees will be given some system designs to read and to evaluate, and then we'll discuss the implications of each design together as a group.
People working with distributed systems who want to fill in the blanks as to what "distributed systems" are supposed to be.
They will know the basic building blocks of distributed systems and how to choose between different implementations as needed.
They will know the names and basic details of common distributed systems patterns, why they exist, and what happens when they are not applied correctly.
Distributed Systems Primer
John Looney, Intercom
John Looney is an SRE at Intercom, pretending to be a Product Engineer, improving infrastructure and reliability while pretending to also add features customers want.
Previously, he spent a decade in Google SRE running GFS, Borg, Colossus, Chubby, Datacenter Automation, Ads Quality pipelines, and Ads Serving systems.
He has been on the programme committee of SREcon Dublin for the last three years and presented a 'Large Scale Design' tutorial at LISA in 2012.
Attendees will learn how CI/CD pipelines can increase IT velocity (from Dev to Ops), increase code quality, and lower risk; they will also learn how to implement CI/CD pipelines in two popular tools, Jenkins and GitLab CI.
Infrastructure engineers, system administrators, or DevOps engineers familiar with Git who have to set up or support CI/CD pipelines.
Familiarity with CI/CD concepts; ability to implement CI/CD pipelines using popular tools such as Jenkins and GitLab CI.
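As a preview of the GitLab CI portion of the outline below, a minimal sketch of a .gitlab-ci.yml (the make targets are illustrative):

    # .gitlab-ci.yml: a three-stage pipeline run on every push
    stages:
      - build
      - test
      - deploy

    build_job:
      stage: build
      script:
        - make build

    test_job:
      stage: test
      script:
        - make test

    deploy_job:
      stage: deploy
      script:
        - make deploy
      only:
        - master    # deploy only from the master branch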
- Introduction and orientation
  - Origin of Continuous Integration (CI) at ThoughtWorks
  - Widespread adoption; how CI relates to DevOps
  - Basic tasks: Build, Test, Deploy
- Jenkins
  - Overview and Architecture
  - Definition of Key Terms
  - Building, Testing and Deploying (with hands-on lab)
  - Checking Pipeline status with Jenkins Blue Ocean UI
  - Troubleshooting
- GitLab CI
  - Architecture: GitLab, GitLab CI Multi Runner, ephemeral test environments
  - Definitions: pipeline, stage, job, build, runner, environment, artifact, cache
  - Setting up runners: adding job runners; host instance types (shell, Docker, ssh, etc.); runner/job tags
  - Building, Testing, and Deploying (with hands-on lab)
  - Troubleshooting: build logs; enabling verbose builds; increasing "loglevel"; interactive access to containers
Aleksey Tsalolikhin, Vertical Sysadmin, Inc.
Aleksey Tsalolikhin is a practitioner in the operation of information systems. Aleksey's mission is to improve the lives of fellow practitioners through effective training in excellent technologies. Aleksey is the principal at Vertical Sysadmin, which provides on-site training on UNIX shell basics, version control with Git, configuration management, Continuous Integration/Continuous Deployment, SQL basics, and more.
In this tutorial, you will set up your own Docker cluster using the native orchestration features provided by the SwarmKit library. (SwarmKit has been integrated with the Docker Engine since Docker 1.12.)
Then you will use that cluster to deploy and scale a sample application architected around microservices.
We will cover deployment tips, service discovery, and load balancing; we will show how to integrate Swarm and Compose to obtain a seamless, automated "dev-to-prod" workflow; and we will show how to collect logs and metrics on a containerized platform.
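A taste of the workflow (the service name and image are illustrative):

    # Turn one Docker Engine into a swarm manager...
    docker swarm init
    docker swarm join-token worker   # prints the join command for workers

    # ...then deploy and scale a replicated service across the cluster
    docker service create --name web --replicas 3 -p 80:80 nginx
    docker service scale web=10
    docker service ps web            # where are my containers running?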
To get the most out of this tutorial, you should already be familiar with Docker! If you plan to attend this right after the other tutorial, "Getting started with Docker and containers," you will definitely have to mind the gap.
The tutorial will be hands-on; each attendee will be provided with a cluster of Docker nodes running on cloud VMs. The only software required on your machine is an SSH client (and a web browser).
Folks who were excited by (or forced to deploy) Docker Swarm, but want to go beyond the trivial prototype, implement a seamless dev-to-prod workflow, and tackle logging, metrics, security, etc.
After this tutorial, the audience will know how to map their existing "ops knowledge" of traditional platforms to container platforms.
Docker, cluster, Swarm, orchestration, containers