8:30 a.m.–8:45 a.m. |
Monday |
Continental Breakfast
Second Floor Foyer |
8:45 a.m.–9:00 a.m. |
Monday |
Dinah McNutt, Google, Inc.
|
9:00 a.m.–9:45 a.m. |
Monday |
Tom Santero, New York Times, Inc. The Search Team at the New York Times manages multiple internal and external-facing services, powering everything from Site Search to a public Semantic API. Each of these services is unique, comprising various programming languages, API servers, web servers, distributed databases, and so on.
Recently we undertook a complete revamp of our entire toolchain: migrating from SVN to GitHub, running and configuring a new build system, and taking ownership of metrics and monitoring throughout the entire stack. Building out this toolchain from scratch afforded us the opportunity to carefully evaluate our needs and weigh the tradeoffs. One of our primary focuses was achieving continuous automated deployments.
This presentation will provide an overview of the process and tooling we selected, illustrating the path code travels from development to production. Attendees can expect a serious deliberation over balancing time to production vs. test coverage, as well as a discussion of the custom tooling we developed for collecting and displaying release metrics.
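As a rough, hypothetical illustration (not the Times' actual tooling), one such release metric, the time a commit takes to reach production, could be computed like this:

    # Toy sketch of a release metric a deploy pipeline might record.
    # The function name and timestamps are illustrative assumptions.
    from datetime import datetime

    def time_to_production(commit_time: datetime, deploy_time: datetime) -> float:
        """Hours between a commit landing and that commit reaching production."""
        return (deploy_time - commit_time).total_seconds() / 3600.0

    # Example: a commit merged at 09:15 and deployed at 11:45 took 2.5 hours.
    print(time_to_production(datetime(2015, 11, 9, 9, 15),
                             datetime(2015, 11, 9, 11, 45)))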
Tom Santero is a Software Engineer at The New York Times Company, working on Search, Archives and Semantics. Previously, Tom was Director of Evangelism at Basho Technologies, makers of the Riak distributed database. An ardent advocate of lifelong learning (and alliteration), Tom is an active member of the FP and distributed systems community, organizes the NYC Erlang users group and currently sits as an Industry Track Co-Chair for the 2015 ACM International Conference on Distributed Event-Based Systems.
|
9:45 a.m.–10:30 a.m. |
Monday |
Caskey Dickson, Google, Inc. This talk is an experience report on the use of continuous release engineering to automate administration of an in-house software deployment to a moderately sized fleet (fewer than 1,000 nodes).
At Google, the team I am on now uses an automatic process to take "green" builds from our continuous integration system and promote them to test, staging, and production releases. Unless an error is detected by automated checks, once a line of code has been committed to the revision control system it will appear in production approximately 14 days later. My talk describes our release process automation, how it differs from the manual, periodic (approximately quarterly) releases done before, and how it has yielded unanticipated benefits as well as introduced new failure modes into our workflow.
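As a hedged sketch of what an automated promotion gate of this kind could look like (the helper objects, environment names, and soak time are assumptions, not Google's internal tooling):

    # Illustrative only: ci and deployer are hypothetical interfaces.
    import time

    ENVIRONMENTS = ["test", "staging", "production"]

    def promote_green_build(build, ci, deployer, soak_seconds=3600):
        """Promote a CI-verified ("green") build through each environment,
        stopping as soon as any automated check fails."""
        if not ci.is_green(build):
            return False                        # never promote a red build
        for env in ENVIRONMENTS:
            deployer.release(build, env)
            time.sleep(soak_seconds)            # let the release soak before checking
            if not deployer.health_checks_pass(build, env):
                deployer.roll_back(build, env)  # automated checks gate every hop
                return False
        return True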
Caskey Dickson is a Site Reliability Engineer/Software Engineer at Google, where he works writing and maintaining monitoring services that operate at "Google scale." In online service development since 1995, before coming to Google he was a senior developer at Symantec, wrote software for various internet startups such as CitySearch and CarsDirect, ran a consulting company, and even taught undergraduate and graduate computer science at Loyola Marymount University. He has a B.S. in Computer Science, a Master's in Systems Engineering, and an M.B.A. from Loyola Marymount.
|
10:30 a.m.–11:00 a.m. |
Monday |
Break with Refreshments
Second Floor Foyer |
11:00 a.m.–11:45 a.m. |
Monday |
Michael Stahnke, Puppet Labs, Inc. Release Engineering can be a science. Like any science, there is a goal of controlling for variables to understand how change impacts the results of the overall system (in this case, the software delivery pipeline). At Puppet Labs, our goal is to control for variances in operating systems, package managers, service managers, et al., so that our customers don’t have to. But to achieve this, we must purposely add variables into our release train. We currently build and test for more than 60 operating system targets, and that number is only increasing. When you ship an abstraction layer, you have to test and build for all of the underlying components, and at our scale that must be automatic and developer-serviceable.
I'll give you an overview of our build system world, and discuss the challenges, rewards, and horrors of trying to provide fast feedback to hungry development teams across dozens of operating systems and network devices. I’ll cover using continuous delivery processes and ideals to ship an on-premise product, what metrics we’ve found the most useful for decision making, and, of course, share a few failures you can laugh at and not repeat in your own environments.
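To make that scale concrete, a build matrix of this kind might be fanned out roughly as follows (the target list and build_package helper are illustrative assumptions, not Puppet Labs' actual pipeline):

    # Hypothetical sketch: build one commit for many OS targets in parallel.
    from concurrent.futures import ThreadPoolExecutor

    TARGETS = [
        ("el", "6", "x86_64"), ("el", "7", "x86_64"),
        ("debian", "7", "amd64"), ("ubuntu", "14.04", "amd64"),
        # ...and dozens more operating system/architecture combinations
    ]

    def build_all(commit, build_package, workers=8):
        """Kick off a package build per target and collect the results."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {pool.submit(build_package, commit, target): target
                       for target in TARGETS}
        # The executor has joined all jobs by here; gather results per target.
        return {target: future.result() for future, target in futures.items()}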
Michael Stahnke is Director of Engineering Services at Puppet Labs. Previously, he was the Community Manager and the Release Manager, where he built out the Release Engineering team. He came to Puppet Labs from Caterpillar, Inc. where he was an Infrastructure Architect, system administration team lead, and open source evangelist. Michael also helped get the Extra Packages for Enterprise Linux (EPEL) repository off the ground in 2006, and is the author of Pro OpenSSH (Apress, 2005).
|
11:45 a.m.–12:30 p.m. |
Monday |
Jonathan Reams, MongoDB, Inc. At MongoDB, we release new packages, binaries, and debug symbols with every commit that passes all the tests in our Continuous Integration system, and make them available for download. For some time we only provided MD5 checksums with our downloads. Packages that natively support signing, such as RPMs and Windows MSIs, were manually signed when creating a stable release. This meant that private keys had to be distributed to servers where users regularly logged in and manually interacted with them, increasing the risk that a key might be leaked or that the signing process might have problems. The manual process required for signing also meant that only stable release binaries were signed. Tarballs, which are the canonical way we distribute MongoDB, lacked signature files entirely.
We recently created and deployed an extensible tool to automate the generation of a variety of checksums and signatures for all our cross-platform packages and downloads with an automated “notary service.” The CI system submits artifacts to the notary service during the build process over a RESTful interface and gets all the checksums and signature files it needs returned. Keys for signing can be restricted to a single hardened server, with a single endpoint and API for signing tarballs, RPMs, and Windows MSIs. The distribution and hosting of checksums and signature files is also automated in the CI tool, reducing errors and ensuring that the signatures match the artifacts when they were produced, rather than as an after-effect of the release process.
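A minimal sketch of the kind of client call a CI job might make to such a service (the URL, form fields, and key name are assumptions, not MongoDB's actual notary API):

    # Illustrative client for a hypothetical signing ("notary") endpoint.
    import requests

    def sign_artifact(path, notary_url="https://notary.example.com/sign", key="release"):
        """Upload one build artifact and return its checksum/signature files."""
        with open(path, "rb") as artifact:
            resp = requests.post(notary_url,
                                 files={"artifact": artifact},
                                 data={"key": key})
        resp.raise_for_status()
        return resp.json()   # e.g. {"sha256": "...", "gpg_signature": "..."}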
In this talk, we will discuss the problems we had, the tool we created to solve them, and the unexpected issues we ran into along the way to making our deployment of the tool scale in production.
Jonathan Reams is a build engineer at MongoDB, Inc. on the Core Server team. Currently he works on the toolchain and build system for the MongoDB server. Before joining the build team, he worked as a Systems Engineer on MongoDB’s DevOps team and Columbia University IT’s UNIX Systems Engineering group.
|
12:30 p.m.–1:30 p.m. |
Monday |
Lunch for Workshop Attendees
Metropolitan Ballroom |
1:30 p.m.–2:15 p.m. |
Monday |
From 6 days to 18 Minutes: A Tale of Release Engineering by Geoff Halprin, Telstra
The 10 Commandments of Release Engineering by Dinah McNutt, Google
Building a Build Server Farm by Jonathan Reams, MongoDB
|
2:15 p.m.–3:00 p.m. |
Monday |
Christian Legnitto, Facebook, Inc. Poor documentation, platform limitations, app stores, carriers, immature tooling with major changes every year, few industry best practices, and relatively inexperienced developers make it painful to be a mobile release engineer. At Facebook we feel the pain acutely as we have:
- Many apps…
- on multiple platforms…
- used by hundreds of millions of people every day…
- in developed and developing markets…
- with hundreds of engineers contributing to those apps…
- and a very fast ship cycle
Because of this scale we also run into a lot of mobile issues before others do. I'll talk about the general challenges the industry faces shipping on mobile today, illustrated using real war stories from mobile release engineering at Facebook.
Christian Legnitto leads Mobile Release Engineering at Facebook. He has worked in release engineering for over seven years; prior to joining Facebook, he was a Release Engineer at both Apple and Mozilla.
Facebook: https://www.facebook.com/legnitto Twitter: @LegNeato
|
3:00 p.m.–3:45 p.m. |
Monday |
Curt Patrick, Netflix, Inc. During the course of my twenty-four years as a Release Engineer, tools have improved, machines have gotten faster, and we have become more interconnected; yet for the first three quarters of that time, most of what I had been doing had not changed that much. I was still gathering up tasks that slowed down developers in their work, taking responsibility for automating, running, and monitoring those tasks, and taking further responsibility to chaperone the product out the door to the customer.
More recently, however, I have been noting some evolutionary changes that seem like game changers to me. Especially as a contributor on the Engineering Tools team at Netflix over the past four years (we provide build and release best practices and tooling for the entire company), I have not only had the chance to participate in some changing trends as they happen, but I am also in an environment where those changes get discussed extensively and collaboratively. It has given me pause to consider which changes are window dressing or simple technical improvements, versus fundamental shifts. The changing landscape of software development is touching every company today, presenting professional Release Engineers with interesting and challenging opportunities within their specific contexts.
In my presentation I will draw on my recollections of “the early days” to point out how a few fundamentals used to be but aren’t so much any more and, more importantly, how failing to note these changes can result in obsolete thinking habits that are now counter-productive. Technical changes are more obvious, but thinking habits are more pervasive. As the development environment changes, our approaches to solving Release Engineering problems must be refined. The objectives don’t change all that much, but our strategies must. I will call out a few modern trends in the development environment that impress me for their impact on Release Engineers. Although it is likely that you will have to adapt and modify anything I have to share, my goal is to offer a few observations about how these changes impact your day-to-day job and should impact your preparations for the future.
Curt Patrick has been a Release Engineer almost his entire career in Software Development even though he didn’t know that was what it was going to be called when he started doing it twenty-four years ago. For him, now is the most exciting time to be in this field of work, not just because the problems are more challenging, and the tools are more sophisticated, but also because the Netflix Engineering Tools team where he currently works is such an invigorating environment in which to collaborate on solutions.
|
3:45 p.m.–4:00 p.m. |
Monday |
Break with Refreshments
Second Floor Foyer |
4:00 p.m.–4:45 p.m. |
Monday |
Ryan McKern, Puppet Labs, Inc. Reliably distributing software is a notoriously difficult problem, and almost every operating system and programming language vendor has tried to solve it. This has led to a herd of packaging systems, almost none of which are cross-compatible; some manage system-level software, while others focus on extending their own language (often by trampling on system-level software). And like all competing standards, every packaging system comes with its own sharp corners, dull edges, and hidden idiosyncrasies to deal with along the path to packaging happiness. In an attempt to answer the question "How do I install this software and ensure that its dependencies are fulfilled?", some novel solutions have begun to see popular adoption. But a lot of these newer tools and techniques tread the same ground as their predecessors while overlooking the lessons that were learned along the way.
I'll talk about the state of native packaging systems on some popular platforms (Debian/Ubuntu, RHEL/CentOS/Fedora, and Mac OS X), packaging systems for popular languages (Ruby, Python, Perl, and Node), and the ways that developers are attempting to work around the limitations of these systems. I'll review the reasons that tools like curl | bash installers, FPM, and omnibus packages have become popular by sharing lessons I've learned while working with these systems. While this will be an amusing presentation, I'll show how native packages can address the concerns that have pushed Release Engineers and Developers away. I will also talk about what native packaging systems can learn from the next generation of packaging tools.
Ryan McKern has the second best record collection in the Puppet Labs Release Engineering team. Before being caught in the pull of Puppet Labs' gravity, he was a Web Infrastructure Administrator at The MathWorks, Inc. for seven years, where he led the charge to embrace native packaging and configuration management, and to stop building special snowflake servers.
|
4:45 p.m.–5:30 p.m. |
Monday |
Chris Crall, Google, Inc. Kubernetes provides the infrastructure to schedule and manage Docker containers in cloud and datacenter environments. It is a powerful mechanism that simplifies deployment, maintenance and scaling of applications.
This session explores the topic of configuration and container management. We will discuss how traditional configuration systems make it easier to configure and manage a cluster, their role in container-based application definition, and the evolving nature of declarative application definitions.
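For readers unfamiliar with the declarative style under discussion, here is a minimal, illustrative application definition of the sort Kubernetes consumes, written as a Python dict (the names, image, and replica count are placeholders, not part of the session):

    # Desired state, not imperative steps: "run three copies of this container."
    app_definition = {
        "apiVersion": "v1",
        "kind": "ReplicationController",
        "metadata": {"name": "frontend"},
        "spec": {
            "replicas": 3,                        # desired number of identical pods
            "selector": {"app": "frontend"},
            "template": {
                "metadata": {"labels": {"app": "frontend"}},
                "spec": {
                    "containers": [
                        {"name": "frontend",
                         "image": "example/frontend:1.0",
                         "ports": [{"containerPort": 80}]}
                    ]
                },
            },
        },
    }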
Chris Crall is in Product Management at Google, working on configuration and deployment for Google Cloud. He is responsible for the tools used by customers to deploy and manage Google Cloud Platform resources. Before joining Google in 2013, Chris worked on Microsoft Azure. Chris' career has covered the spectrum of developer, architect, and PM in distributed computing, security, and cloud. He's worked for Hewlett-Packard, Commerce One, and Microsoft, and now calls Google home.
|
5:30 p.m. |
Monday |
Dinah McNutt, Google, Inc.
|