Summit Program

All sessions will be held in Lincoln 5 unless otherwise noted.

 

Friday, November 13, 2015

8:00 am–9:00 am Friday

Continental Breakfast

Thurgood Marshall Ballroom Foyer

9:00 am–10:30 am Friday

LISA15 Keynote Address

Lean Configuration Management

Jez Humble, VP, Chef

Thurgood Marshall Ballroom

Jez Humble is a vice president at Chef, a lecturer at UC Berkeley, and co-author of the Jolt Award-winning Continuous Delivery, published in Martin Fowler’s Signature Series, and Lean Enterprise, in Eric Ries’ Lean series. He has worked as a software developer, product manager, executive, consultant, and trainer across a wide variety of domains and technologies. His focus is on helping organizations deliver valuable, high-quality software frequently and reliably by implementing effective engineering practices.

Configuration management is an essential ingredient in creating high performance IT. But how you implement it matters. In this talk Jez will present the principles that enable high throughput and stability and the configuration management practices behind them, using models drawn from the Lean movement.

Available Media

10:30 am–10:45 am Friday

Break with Refreshments

Thurgood Marshall Ballroom Foyer

10:45 am–11:30 am Friday

Invited Talk

Scaling Mobile Testing on AWS: Emulators All the Way Down

Kim Moir, Mozilla Corporation

This talk will explore the evolution of Mozilla's continuous integration infrastructure for Firefox for Android, from our early device lab, to running tests on reference cards in custom racks, to our current implementation running on emulators in AWS. In addition, I'll discuss how we reduced the cost of running our tests in AWS by using spot instances and fine-tuning the selection of instance types. Finally, I'll discuss how we analyzed regression data to prune the number of tests we run, extending the capacity of our test pools and reducing costs. To give you some scope, our continuous integration farm consists of 6500 machines and runs 75,000 combined daily build and test jobs, triggered by an average of 300 pushes.

11:30 am–12:15 pm Friday

Invited Talk

How I Learned to Stop Worrying and Love Push-On-Submit

Sam Mussmann, Google

Pushing configuration changes on submit promises lower push latency and fewer humans in the loop, but at the cost of some intimidating risks: How much will we have to change our configuration? What will I really gain from this? What if these robots take my job?

Sam Mussmann will answer these questions and more from his experience of creating and rolling out a push-on-submit system for his team at Google, a system that his team would now not be caught dead without. A highlight reel of benefits includes faster incident response, better configuration, and better-documented push processes.

Available Media

12:15 pm–1:30 pm Friday

Luncheon

1:30 pm–2:15 pm Friday

Lightning Talks I

Building a Distribution and Continuous Delivery for Network Devices

Akshat Sharma, Cisco Systems, Inc. 

At Cisco, we're working to change the way our gear participates in the datacenter ecosystem. Over the next few months, both NXOS and IOS-XR will ship releases that expose our Linux Hosting environment. This lightning talk is a brief story of our funky embedded Linux turning into a faux server distribution and how we built tools to support a CI pipeline along the way. Because Cisco is an enormous company with lots of interesting people and priorities, I'll share some anecdotes that should be entertaining or at least amusing (embedded devices are bedeviling in some ways: limited memory, disk, compute, etc.).

Distributed Systems at Scale: Reducing the Fail

Kim Moir, Mozilla Corporation

Mozilla has a large continuous integration farm that we use to build and test our products. This talk will discuss 10 spectacular ways it can burst into flames and the steps we are taking to make it more resilient.

Amake: Cached Builds of Top-Level Targets

Jim Buffenbarger, Boise State University

I will describe a software-build tool named Amake, an extension of GNU Make. Its additional features solve important problems that have, until now, only been addressed by "high-end" build tools (e.g., ClearCase and Vesta).

With a typical build tool, if a top-level target must be updated, intermediate targets must be built from sources, and then combined to build the top-level target. The enhancements described here allow a top-level target to be fetched from a shared cache, without building, or even fetching its intermediate-target dependencies. Thus, a developer's workspace may need only contain sources and top-level targets. This reduces build time, reduces network traffic, and saves disk space.
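The idea of fetching a top-level target straight from a shared cache can be illustrated with a small sketch. This is not Amake's actual mechanism, which the abstract does not detail; the content-hash key, the in-memory dictionary standing in for the shared cache, and the names content_key and build_top_level are all assumptions made for illustration.

```python
import hashlib


def content_key(sources):
    """Hash source names and contents into a single cache key (assumed scheme)."""
    digest = hashlib.sha256()
    for name in sorted(sources):
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(sources[name].encode())
        digest.update(b"\0")
    return digest.hexdigest()


def build_top_level(sources, cache, log):
    """Return the top-level artifact, preferring the shared cache.

    `cache` is a plain dict standing in for a shared cache service.
    `log` collects the steps taken so the behaviour can be inspected.
    """
    key = content_key(sources)
    if key in cache:
        # Cache hit: no intermediate targets are built or fetched.
        log.append("fetched top-level target from cache")
        return cache[key]

    # Cache miss: build every intermediate target, then combine them.
    objects = []
    for name in sorted(sources):
        log.append(f"compiled {name}")
        objects.append(f"obj({sources[name]})")
    artifact = "link(" + ",".join(objects) + ")"
    cache[key] = artifact
    log.append("stored top-level target in cache")
    return artifact
```

In this sketch a second developer with identical sources gets the linked artifact without any compile steps, which is where the savings in build time, network traffic, and disk space would come from.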

2:15 pm–3:00 pm Friday

Invited Talk

Pivotal Cloud Foundry Release Engineering: Moving Integration Upstream Where It Belongs

Evan Willey and Dave Liebreich, Pivotal

In our current release engineering workflow, upstream teams do the integration work for us. We focus on coordinating the integration work from those 20 teams, instead of integrating their products directly.

This was not the case one year ago.

This transition was not a simple effort, as there are 20+ open and closed source teams that contribute bits to Pivotal’s commercial offering of the Cloud Foundry PaaS (Platform as a Service). Through automation and system optimization, Release Engineering has become a team that focuses on enabling the teams that build each component to integrate it into our product directly. This allows us to perform integration where it is most effective and least costly.

This presentation describes how we've approached the transition from integrators to enablers and what we've learned along the way.

3:00 pm–3:45 pm Friday

Invited Talk

Chaos Patterns—Architecting for Failure in Distributed Systems

Jos Boumans, Krux

As we architect our systems for greater demands, scale, uptime, and performance, the hardest thing to control becomes the environment in which we deploy and the subtle but crucial interactions between complicated systems. Chaos Patterns help us establish and implement a virtuous cycle that lets us both prove and improve our system along each of these dimensions before the inevitable happens.

While it may seem reckless or counter-intuitive, our experience has proven that it's a matter of how and when (not if) we will learn about the limitations and failure modes of the system.

This is the story of the pitfalls we encountered, and how, through architecture, convention, and common sense, we managed to build an infrastructure that is "Always Up" from the end-user perspective, and incredibly economical to build, scale, and operate. Using chaos testing, we learn more about how our system fails from a 10-second controlled failure than from a multi-hour uncontrolled outage.

In this session we will cover various implementation techniques, available to any developer and operator, which will vastly increase the resilience of your systems and provide a superior end user experience—from optimizing your use of DNS for failure, to configuring your CDN to have your back, to synthetic responses and expected database outages.
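One of the techniques named above, synthetic responses, might look roughly like the following. This is a minimal sketch under stated assumptions, not the speaker's implementation: BackendDown, handle_request, and the two backend functions are hypothetical names, and the "controlled failure" is simply a backend that always raises.

```python
class BackendDown(Exception):
    """Raised by a backend call when the dependency is unavailable."""


def handle_request(user_id, backend):
    """Serve real data when possible, a synthetic response otherwise.

    `backend` is any callable taking a user id; it is a stand-in for a
    database or service call.
    """
    try:
        items = backend(user_id)
        return {"status": 200, "items": items, "synthetic": False}
    except BackendDown:
        # Expected outage: degrade to an empty but well-formed answer
        # so the end user never sees an error page.
        return {"status": 200, "items": [], "synthetic": True}


def healthy_backend(user_id):
    """Hypothetical backend that answers normally."""
    return [f"item-for-{user_id}"]


def failing_backend(user_id):
    """Hypothetical backend with an injected, controlled failure."""
    raise BackendDown("injected failure")
```

Calling handle_request with failing_backend exercises the failure path on purpose, which is the kind of short controlled experiment the abstract contrasts with a long uncontrolled outage.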

3:45 pm–4:00 pm Friday

Break with Refreshments

Thurgood Marshall Ballroom Foyer

4:00 pm–4:45 pm Friday

Invited Talk

Managing Project Migrations at Scale

Dinah McNutt, Google

Problem: Migrate 1400 build projects to a container-based build environment.

Challenges:

  • Projects were spread across many teams. We had to create a migration plan that would reduce the overhead on each team.
  • Fixed deadline for completing work (no stragglers!)
  • New build environment significantly different from the old one—some tools not available, path changes, resource restrictions.
  • Autonomous development teams with different project configurations

Solutions:

  • Transparently modify build helper programs where possible to detect environment (old versus new) and "do the right thing"
  • Automate as much of the migration as possible so dev teams don't have to understand the internals of the build system
  • Scripts to assess migration risk for each project (risky projects received direct attention from releng team)
  • Begin standardization effort to make future automation easier
  • Tools to assist teams in identifying correct containers in which to run
  • Daily standups helped the distributed team stay focused and improved communication
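A script to assess migration risk, as mentioned in the solutions above, could be sketched as follows. The abstract does not describe the actual scoring, so everything here is an assumption: the Project fields, the UNAVAILABLE_TOOLS set, the MEMORY_LIMIT_GB value, and the weights are hypothetical, chosen only to mirror the three differences the abstract lists (missing tools, path changes, resource restrictions).

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    tools: set = field(default_factory=set)   # tools the build invokes
    hardcoded_paths: int = 0                  # count of absolute paths in build files
    peak_memory_gb: float = 0.0               # observed peak memory of a build


# Hypothetical values; a real migration would derive these from the new environment.
UNAVAILABLE_TOOLS = {"legacy-packager"}
MEMORY_LIMIT_GB = 8.0


def migration_risk(project):
    """Score a project; higher means more likely to need hands-on help."""
    score = 3 * len(project.tools & UNAVAILABLE_TOOLS)  # missing tools
    score += min(project.hardcoded_paths, 5)            # path changes, capped
    if project.peak_memory_gb > MEMORY_LIMIT_GB:        # resource restrictions
        score += 2
    return score


def triage(projects, threshold=3):
    """Split project names into (risky, automatic) lists, each sorted."""
    risky = sorted(p.name for p in projects if migration_risk(p) >= threshold)
    automatic = sorted(p.name for p in projects if migration_risk(p) < threshold)
    return risky, automatic
```

Projects in the risky list would get direct attention from the release engineering team, while the rest go through the automated migration.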

Lessons Learned:

  • Standardization is still the key to providing support at scale
  • Most engineers don't care what their build-related files look like
  • Ideally, engineers should not have to know anything about the build system

4:45 pm–5:30 pm Friday

Lightning Talks II

Science, Reproducibility, and Dependency Management

Sarah Elkins, SRA International

The scientific method includes the principle of reproducibility. The scope for reproducibility is broad for scientific computing experiments and clinical health studies using software applications, and it is difficult to ensure perfect reproducibility, but release engineering has some ways to address and improve it. This lightning talk will focus on relevant aspects of dependency management for builds, touching on third party (open source) packages, artifact repository servers, and proxying from approved external repositories.

Automated Testing and Release Engineering

Brian Colfer, Apple Inc.

The heart of the continuous delivery process is confidence based on testing. Design each stage in the delivery pipeline to provide more confidence in the value of the release. Break up the execution of large test suites into smaller test runs focussed on a single point of concern. At each stage, give Engineering confidence that they can make more changes and QA confidence that the product is ready for more testing. Treating tests as an integral part of the product value gives Operations the confidence to release to production.

Available Media

Stop Releasing off Your Laptop—Implementing a Mobile App Release Management Process from Scratch in a Startup or Small Company

Lukas Blakk, Pinterest

You arrive at a new job ready to help a small company get mobile apps out the door quickly and with a high bar for quality. iOS builds are being shipped from one engineer's laptop, there's no automated QA to visualize release readiness, and branching for release building is an entirely manual, minimally documented procedure. What do you take on first, and how can you get to a scalable process with automated testing and a faster release cadence?

This describes my first day at Pinterest. In a former life, at Mozilla, with incredibly scaled infrastructure, telemetry, and process, we were able to confidently ship high-quality builds across multiple platforms in the blink of an eye (or at least within 24 hours). With mobile apps, while there are external restrictions on how fast you can move on some platforms, there's a whole world of improvements that can be made: getting release builds off individual contributors' computers, building out automated releases, improving communication between Product and Engineering around what's in each version, and helping a company on a growth spurt build a release process that can scale.