Stop Releasing off Your Laptop—Implementing a Mobile App Release Management Process from Scratch in a Startup or Small Company

Lukas Blakk; Dave Liebreich

Summit Program

All sessions will be held in Lincoln 5 unless otherwise noted.

Friday, November 13, 2015

8:00 am–9:00 am	Friday
Continental Breakfast
Thurgood Marshall Ballroom Foyer
9:00 am–10:30 am	Friday
LISA15 Keynote Address
Lean Configuration Management
Jez Humble, VP, Chef
Thurgood Marshall Ballroom
Jez Humble is a vice president at Chef, a lecturer at UC Berkeley, and co-author of the Jolt Award-winning Continuous Delivery, published in Martin Fowler’s Signature Series, and Lean Enterprise, in Eric Ries’ Lean series. He has worked as a software developer, product manager, executive, consultant, and trainer across a wide variety of domains and technologies. His focus is on helping organizations deliver valuable, high-quality software frequently and reliably through implementing effective engineering practices.

Configuration management is an essential ingredient in creating high-performance IT, but how you implement it matters. In this talk, Jez will present the principles that enable high throughput and stability and the configuration management practices behind them, using models drawn from the Lean movement.
10:30 am–10:45 am	Friday
Break with Refreshments
Thurgood Marshall Ballroom Foyer
10:45 am–11:30 am	Friday
Invited Talk
Scaling Mobile Testing on AWS: Emulators All the Way Down
Kim Moir, Mozilla Corporation
This talk will explore the evolution of Mozilla's continuous integration infrastructure for Firefox for Android: from our early device lab, to running tests on reference cards in custom racks, to our current implementation running on emulators in AWS. In addition, I'll discuss how we reduced the cost of running our tests in AWS by using spot instances and fine-tuning the selection of instance types. Finally, I'll discuss how we analyzed regression data to prune the number of tests we run, extending the capacity of our test pools and reducing costs. To give you some scope, our continuous integration farm consists of 6,500 machines and runs 75,000 combined daily build and test jobs, triggered by an average of 300 pushes.
11:30 am–12:15 pm	Friday
Invited Talk
How I Learned to Stop Worrying and Love Push-On-Submit
Sam Mussmann, Google
Pushing configuration changes on submit promises faster push latency and fewer humans in the loop, but at the cost of some intimidating risks: How much will we have to change our configuration? What will I really gain from this? What if these robots take my job? Sam Mussmann will answer these questions and more from his experience of creating and rolling out a push-on-submit system for his team at Google, a system that his team would now not be caught dead without. A highlight reel of benefits includes faster incident response, better configuration, and better-documented push processes.
12:15 pm–1:30 pm	Friday
Luncheon
1:30 pm–2:15 pm	Friday
Lightning Talks I

Building a Distribution and Continuous Delivery for Network Devices
Akshat Sharma, Cisco Systems, Inc.
At Cisco, we're working to change the way our gear participates in the datacenter ecosystem. Over the next few months, both NXOS and IOS-XR will ship releases that expose our Linux hosting environment. This lightning talk is a brief story of our funky embedded Linux turning into a faux server distribution and how we built tools to support a CI pipeline along the way. Because Cisco is an enormous company with lots of interesting people and priorities, I'll share some anecdotes that should be entertaining or at least amusing (embedded devices are bedeviling in some ways: limited memory, disk, compute, etc.).

Distributed Systems at Scale: Reducing the Fail
Kim Moir, Mozilla Corporation
Mozilla has a large continuous integration farm that we use to build and test our products. This talk will discuss 10 spectacular ways it can burst into flames and the steps we are taking to make it more resilient.

Amake: Cached Builds of Top-Level Targets
Jim Buffenbarger, Boise State University
I will describe a software-build tool named Amake, an extension of GNU Make. Its additional features solve important problems that have, until now, only been addressed by "high-end" build tools (e.g., ClearCase and Vesta). With a typical build tool, if a top-level target must be updated, intermediate targets must be built from sources and then combined to build the top-level target. The enhancements described here allow a top-level target to be fetched from a shared cache without building, or even fetching, its intermediate-target dependencies. Thus, a developer's workspace need contain only sources and top-level targets. This reduces build time, reduces network traffic, and saves disk space.
2:15 pm–3:00 pm	Friday
Invited Talk
Pivotal Cloud Foundry Release Engineering: Moving Integration Upstream Where It Belongs
Evan Willey and Dave Liebreich, Pivotal
In our current release engineering workflow, upstream teams do the integration work for us. We focus on coordinating the integration work from those teams instead of integrating their products directly. This was not the case one year ago. The transition was not a simple effort, as there are 20+ open and closed source teams that contribute bits to Pivotal’s commercial offering of the Cloud Foundry PaaS (Platform as a Service). Through automation and system optimization, Release Engineering has become a team that focuses on enabling the teams that build each component to integrate it into our product directly. This allows us to perform integration where it is most effective and least costly. This presentation describes how we've approached the transition from integrators to enablers and what we've learned along the way.
3:00 pm–3:45 pm	Friday
Invited Talk
Chaos Patterns—Architecting for Failure in Distributed Systems
Jos Boumans, Krux
As we architect our systems for greater demands, scale, uptime, and performance, the hardest thing to control becomes the environment in which we deploy and the subtle but crucial interactions between complicated systems. Chaos Patterns help us establish and implement a virtuous cycle that lets us both prove and improve our system along each of these dimensions before the inevitable happens. While it may seem reckless or counter-intuitive, our experience has proven that it's a matter of how and when (not if) we will learn about the limitations and failure modes of the system.

This is the story of the pitfalls we encountered, and how, through architecture, convention, and common sense, we managed to build an infrastructure that is "Always Up" from the end-user perspective, and incredibly economical to build, scale, and operate. Using chaos testing, we learn more about how our system fails from a 10-second controlled failure than from a multi-hour uncontrolled outage.

In this session we will cover various implementation techniques, available to any developer and operator, which will vastly increase the resilience of your systems and provide a superior end-user experience, from optimizing your use of DNS for failure, to configuring your CDN to have your back, to synthetic responses and expected database outages.
3:45 pm–4:00 pm	Friday
Break with Refreshments
Thurgood Marshall Ballroom Foyer
4:00 pm–4:45 pm	Friday
Invited Talk
Managing Project Migrations at Scale
Dinah McNutt, Google
Problem: Migrate 1400 build projects to a container-based build environment.

Challenges:
- Projects were spread across many teams. We had to create a migration plan that would reduce the overhead on each team.
- Fixed deadline for completing work (no stragglers!)
- New build environment significantly different from the old one: some tools not available, path changes, resource restrictions
- Autonomous development teams with different project configurations

Solutions:
- Transparently modify build helper programs where possible to detect environment (old versus new) and "do the right thing"
- Automate as much of the migration as possible so dev teams don't have to understand the internals of the build system
- Scripts to assess migration risk for each project (risky projects received direct attention from the releng team)
- Begin standardization effort to make future automation easier
- Tools to assist teams in identifying correct containers in which to run
- Daily standups helped the distributed team stay focused and improved communication

Lessons Learned:
- Standardization is still the key to providing support at scale
- Most engineers don't care what their build-related files look like
- Ideally, engineers should not have to know anything about the build system
4:45 pm–5:30 pm	Friday
Lightning Talks II

Science, Reproducibility, and Dependency Management
Sarah Elkins, SRA International
The scientific method includes the principle of reproducibility. The scope for reproducibility is broad for scientific computing experiments and clinical health studies using software applications, and it is difficult to ensure perfect reproducibility, but release engineering has some ways to address and improve it. This lightning talk will focus on relevant aspects of dependency management for builds, touching on third-party (open source) packages, artifact repository servers, and proxying from approved external repositories.

Automated Testing and Release Engineering
Brian Colfer, Apple Inc.
The heart of the continuous delivery process is confidence based on testing. Design each stage in the delivery pipeline to provide more confidence in the value of the release. Break up the execution of large test suites into smaller test runs focused on a single point of concern. At each stage, give Engineering confidence that they can make more changes and QA confidence that the product is ready for more testing. Treating tests as an integral part of the product value gives Operations the confidence to release to production.

Stop Releasing off Your Laptop—Implementing a Mobile App Release Management Process from Scratch in a Startup or Small Company
Lukas Blakk, Pinterest
You arrive at a new job ready to help a small company get mobile apps out the door quickly and with a high bar for quality. iOS builds are being shipped from one engineer's laptop, there's no automated QA to visualize release readiness, and branching for release building is an entirely manual, minimally documented procedure. What do you take on first, and how can you get to a scalable process with automated testing and a faster release cadence?

This describes my first day at Pinterest. In a former life, at Mozilla, with incredibly scaled infrastructure, telemetry, and process, we were able to confidently ship high-quality builds across multiple platforms in the blink of an eye (or at least within 24 hours). With mobile apps, while there are external restrictions on how fast you can move on some platforms, there's a whole world of improvements that can be made: getting release builds off of individual contributor computers, building out automated releases, improving communications between Product and Engineering around what's in each version, and helping a company on a growth spurt build a release process that can scale.
