LISA tutorials survey the topic, then dive into the specifics of what to do and how to do it. Instructors are well-known experts in their fields, selected for their ability to teach complex subjects. Attend tutorials at LISA16 and take valuable skills back to your company or organization. New topics are woven in with old favorites to create the most comprehensive training program to date.
LISA16 mini tutorials take place Wednesday through Friday as part of the main Conference Program and offer 90-minute overviews of new and emerging technologies. These sessions are included in the registration fee for the Conference Program.
A variety of topics are being covered at LISA16. Use the icons listed below to focus on a key subject area:
Follow the icons throughout the training sessions below. You can combine days of the conference program or workshops with training sessions to build the conference that meets your needs. Pick and choose the sessions that best fit your interests—focus on just one topic or mix and match.
Continuing Education Units (CEUs)
USENIX provides Continuing Education Units for a small additional administrative fee. The CEU is a nationally recognized standard unit of measure for continuing education and training and is used by thousands of organizations.
Each full-day tutorial qualifies for 0.6 CEUs. You can request CEU credit by completing the CEU section on the registration form. USENIX provides a certificate for each attendee taking a tutorial for CEU credit. CEUs are not the same as college credits. Consult your employer or school to determine their applicability.
Training Materials
USB Drives
Training materials will be provided to you on an 8GB USB drive. If you'd like to access them during your class, please remember to bring a laptop. There will not be any formally printed materials, but print-on-demand stations will be available.
Full Day
The Linux operating system is commonly used both in the data center and for scientific computing applications; it is used in embedded systems as small as a wristwatch, as well as in large mainframes. As a result, the Linux system has many tuning knobs so that it can be optimized for a wide variety of workloads. Some tuning of the Linux operating system has been done "out of the box" by enterprise-optimized distributions, but there are still many opportunities for a system administrator to improve the performance of his or her workload on a Linux system.
This class will cover the tools that can be used to monitor and analyze a Linux system, and key tuning parameters to optimize Linux for specific server applications, covering the gamut from memory usage to filesystem and storage stacks, networking, and application tuning.
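As a tiny taste of that tuning surface, the sketch below (Python; the three knobs shown are common examples chosen for illustration, not the class's syllabus) reads a few kernel tunables from /proc/sys, which is the same interface the sysctl command uses:

```python
# Minimal sketch: inspect a few common kernel tunables via /proc/sys.
# Writing to the same files (as root) is how `sysctl -w` changes them.
knobs = [
    "vm/swappiness",        # how aggressively the kernel swaps
    "vm/dirty_ratio",       # writeback threshold for dirty page cache
    "net/core/somaxconn",   # cap on a socket's listen() backlog
]

for knob in knobs:
    with open("/proc/sys/" + knob) as f:
        print(knob.replace("/", "."), "=", f.read().strip())
```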
Intermediate and advanced Linux system administrators who want to understand their systems better and get the most out of them.
The ability to hone your Linux systems for the specific tasks they need to perform.
- Strategies for performance tuning
- Characterizing your workload's requirements
- Finding bottlenecks
- Tools for measuring system performance
- Memory usage tuning
- Filesystem and storage tuning
- Network tuning
- Latency vs. throughput
- Capacity planning
- Profiling
- Memory cache and TLB tuning
- Application tuning strategies
Theodore Ts'o, Google
Theodore Ts'o is the first North American Linux Kernel Developer, and started working with Linux in September 1991. He previously served as CTO for the Linux Foundation, and is currently employed at Google. Theodore is a Debian developer, and is the maintainer of the ext4 file system in the Linux kernel. He is the maintainer and original author of the e2fsprogs userspace utilities for the ext2, ext3, and ext4 file systems.
Overview
The Automation Tools Bootcamp is a tutorial for individuals looking for exposure to and usage of new IT automation tools. We will learn about and then use Vagrant, Chef, Packer, Docker, Terraform and Artifactory to deploy a small application in local VMs.
We will cover a progression of tasks, leveraging information from previous sections to deploy a small app that runs identically on your local development machine or on a shared server. Get rid of the “it works for me” mentality when you know your local VM is identical to your co-workers' and your shared environments.
Operations, QA, those who choose to call themselves DevOps, and even managers can come learn.
These automation tools are freely available to engineers, enabling them to safely break local environments until the change in configuration has been perfected. Basic exposure to these tools will allow attendees to return to work with new ways to tackle the problems they face daily.
Vagrant, Chef, Packer, Docker, Terraform, and Artifactory
Tyler Fitch, Chef
Tyler is an Architect in Chef’s Customer Success program, championing successful patterns and delightful experiences in automation to enterprise customers. Prior to working at Chef, he spent a decade as an engineer for Adobe, developing and automating commerce services for adobe.com using a variety of technologies. He lives in Vancouver, Washington, and when he’s not programming enjoys lacrosse and using his passport.
With this hands-on tutorial, you will develop an understanding for designing, building, and running reliable Internet services at a large scale.
This tutorial is suitable for executives who need to specify and evaluate systems, engineers who build systems, and IT professionals who want to run first-class services built with reliable systems.
You will take back an understanding of how to evaluate system designs, how to specify and build large systems, and how to operate these systems in the real world in a way that will scale as the system grows.
- Designing Reliable Systems
- Building Reliable Systems
- Running Reliable Systems
Salim Virji, Google
Salim Virji is a Site Reliability Engineer at Google. He has worked on infrastructure software, back-end systems, front-end applications, and delightful ways to connect them all. He lives and works in New York City.
Half Day Morning
Gardner Room
This introductory tutorial will start by examining some of the ethical responsibilities that come along with access to other users' data, accounts, and confidential information. We will look at several case studies involving both local and cloud usage. All attendees are strongly encouraged to participate in the discussion. Numerous viewpoints will be considered in order to give students a perspective from which to develop their own reasoned response to ethical challenges.
Anyone who is a system administrator or has access to personal/confidential information, or anyone who manages system administrators or makes policy decisions about computer systems and their users. There are no prerequisites for this class.
After completing this tutorial you will be better prepared and able to resolve ethically questionable situations and will have the means to support your decisions.
- Why it is important to set your ethical standards before an issue comes up
- Who is impacted by "expectations of ethical conduct"
- Why this isn't just an expectation of system administrators
- Implicit expectations of ethical behavior
- Ethics and The Cloud
- Coercion to violate ethics
- Well-intentioned violations of privacy
- Collection, retention, and protection of personal data
- Management directives vs. friendships
- Software piracy/copying in a company, group, or department
Lee Damon, University of Washington
Lee Damon has a B.S. in Speech Communication from Oregon State University. He has been a UNIX system administrator since 1985 and has been active in SAGE (US) & LOPSA since their inceptions. He assisted in developing a mixed AIX/SunOS environment at IBM Watson Research and has developed mixed environments for Gulfstream Aerospace and QUALCOMM. He is currently leading the development effort for the Nikola project at the University of Washington Electrical Engineering department. Among other professional activities, he is a charter member of LOPSA and SAGE and past chair of the SAGE Ethics and Policies working groups. He chaired LISA '04 and co-chaired CasITConf '11, '13, and '14.
Nicole Forsgren, DORA
Fairfax Room
This tutorial is a course in statistics with a specific focus on system administrators and the types of data they face. We assume little prior knowledge of statistics and cover the most common concepts in descriptive statistics, applying them to data taken from real-life examples. Our aim is to show which methods provide good interpretations of data, such as distributions and probability, and how to formulate basic statements about the properties of observed data.
The first part will cover descriptive statistics for single datasets, including mean, median, mode, range, and distributions. When discussing distributions, we will cover probabilities through percentiles (e.g., a normal distribution is very uncommon in ops data). This session will use a prepared dataset and spreadsheet (LibreOffice or OpenOffice, because they work on all platforms). We have data on the number of players of an online game over a six-month period. In this exercise, we will analyze the distribution and try to make statements like, “What is the likelihood that we will see more than 27,000 simultaneous players?” One of the lessons is that the top 5% of the distribution accounts for almost a doubling in players, which is interesting. We then extend the discussion to organizational implications: imagine that your job is to buy resources for a service like this, and you have to double your rig to cope with something that is only 5% likely to happen. How would you explain that in a meeting?
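To make that kind of percentile statement concrete, here is a minimal sketch in Python with NumPy. It is not the tutorial's own material: the players.csv file is a hypothetical stand-in for the game dataset, with one simultaneous-player count per line.

```python
import numpy as np

# Hypothetical stand-in for the tutorial's game dataset:
# one observation of simultaneous player counts per line.
players = np.loadtxt("players.csv")

# Descriptive statistics for a single dataset.
print("mean:  ", players.mean())
print("median:", np.median(players))
print("range: ", players.max() - players.min())

# A probability statement from the empirical distribution:
# what fraction of observations exceed 27,000 simultaneous players?
print("P(players > 27,000) ~", (players > 27000).mean())

# The capacity-planning angle: how much headroom does the top 5% demand?
print("95th percentile:", np.percentile(players, 95), "peak:", players.max())
```

If the peak is nearly double the 95th percentile, you have reproduced the lesson above: sizing for the rare top 5% can mean buying twice the capacity.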
The second part will discuss comparisons using two common methods that can be calculated in a spreadsheet: correlations and regressions. Correlation will be used as a tool to identify interesting relationships in data; ranked correlation may be considered for two datasets that have the same “flow” but on separate ranges (e.g., the correlation between web requests and database requests). Regression can also be used to identify relationships. For example, using a regression plot between two variables, one could identify bottlenecks by comparing the load of two tiers (db tier vs. web tier). In a scalable system, we would expect a nice 45-degree linear relationship between the two. However, if the database tier struggles before the web tier, we would see the observations bend upward away from the linear approximation (if the db load is on the y axis) as the load increases.
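As a rough illustration of that bottleneck check, the following sketch (again Python with NumPy; the two load series are made-up numbers, not course data) fits a straight line to db-tier load against web-tier load and inspects where the observations pull away from it:

```python
import numpy as np

# Hypothetical per-minute load samples for two tiers of one service.
web_load = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
db_load  = np.array([11, 19, 31, 42, 55, 70, 90, 115], dtype=float)

# Pearson correlation: is there a strong linear relationship at all?
r = np.corrcoef(web_load, db_load)[0, 1]
print("correlation:", round(r, 3))

# Least-squares line: db_load ~ slope * web_load + intercept.
slope, intercept = np.polyfit(web_load, db_load, 1)
print("slope:", round(slope, 2), "intercept:", round(intercept, 2))

# A residual that turns sharply positive at the top of the range means
# the db tier is rising faster than the line predicts: a bottleneck hint.
residuals = db_load - (slope * web_load + intercept)
print("residuals at the three highest loads:", residuals[-3:])
```

Here the correlation is still high, but the residual swinging sharply positive at the highest load is the upward bend described above.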
Throughout, we will focus on takeaways, coupling the different statistical methods with the types of answers they can provide, such as: “Can the average of a dataset explain the outer limits of my data?” It is easy to lose an audience with a topic like statistics. We are aware of this risk and will use active learning tools such as Socrative and Kahoot to engage the audience and get them participating.
Sysadmins who are faced with data overload and wish they had some knowledge of how statistics can be used to make more sense of it. We assume little prior knowledge of statistics, but a basic mathematical proficiency is recommended.
- A fundamental understanding of how descriptive statistics can provide additional insight into the data of the sysadmin world, and a foundation for further self-study in statistics
- A basic set of statistical approaches that can be used to identify fundamental properties of the data attendees see in their own environments, and to identify patterns in that data
- The ability to make accurate and clear statements about their metrics that are valuable to the organization
- Descriptive statistics for single datasets, including: mean, median, mode, range, and distributions
- Basic analysis of distributions and probabilities using percentiles typically seen in ops
- Interpretation of analyses to include team and business implications
- Regression analysis to suggest predictive relationships, with an emphasis on interpretation and implications
- Correlation analysis and broad pattern detection (if time allows)
Kyrre Begnum, Oslo and Akershus University College of Applied Sciences
Kyrre Begnum works as an Associate Professor at Oslo and Akershus University College of Applied Sciences where he teaches sysadmin courses at the MSc and BSc levels. Kyrre holds a PhD from the University of Oslo with a focus on understanding the behavior of large systems. He has experience with large scale virtual machine management, cloud architectures and developing sysadmin tools. His research focus is on practical and understandable approaches that bring advanced models to real life scenarios.
Nicole Forsgren, DORA
Dr. Nicole Forsgren is an IT impacts expert who shows leaders and practitioners how to unlock the potential of technology change in their organizations. Best known for her work with tech professionals and as the lead investigator on the State of DevOps Reports, she is CEO and Chief Scientist at DORA (DevOps Research and Assessment) and an Academic Partner at Clemson University. In a previous life, she was a professor, sysadmin, and hardware performance analyst.
Half Day Afternoon
Fairfax Room
People think of “on call” as responding to a pager that beeps because of an outage. In this class, you will learn how to run an on-call system that improves uptime and reduces how often you are paged. We will start with a monitoring philosophy that prevents outages. Then we will discuss how to construct an on-call schedule—possibly in more detail than you've cared about before—but, as a result, it will be more fair and less stressful. We'll discuss how to conduct “fire drills” and “game day exercises” that create antifragile systems. Lastly, we'll discuss how to conduct a postmortem exercise that promotes better communication and prevents future problems.
Managers or sysadmins with on-call responsibility
- Knowledge that makes being on call more fair and less stressful
- Strategies for using monitoring to improve uptime and reliability
- Team-training techniques such as "fire drills" and "game day exercises"
- How to conduct better postmortems/learning retrospectives
- Why your monitoring strategy is broken and how to fix it
- Building a more fair on-call schedule
- Monitoring to detect outages vs. monitoring to improve reliability
- Alert review strategies
- Conducting “fire drills” and “game day exercises”
- "Blameless postmortem documents"
Thomas Limoncelli, StackOverflow.com
Tom is an internationally recognized author, speaker, system administrator, and DevOps advocate. His latest book, the 3rd edition of The Practice of System and Network Administration, launched last month. He is also known for The Practice of Cloud System Administration and Time Management for System Administrators (O'Reilly). He works in New York City at StackOverflow.com and has previously worked at Google, Bell Labs/Lucent, AT&T, and others. He blogs at EverythingSysadmin.com and tweets @YesThatTom. He lives in New Jersey.
Gardner Room
Sysadmins freely acknowledge how important documentation is to their daily lives, and in the same sentence will loudly complain that they don’t have time to produce documentation. This class is about how to produce effective, useful and timely documentation as part of your normal sysadmin activities. Particular emphasis is placed on documentation as a time-saving tool rather than a workload imposition.
System administrators of all types and levels who need to produce documentation for the systems they manage, or who want to improve their documentation skills. Documentation can be the difference that turns you from a good sysadmin into a great sysadmin!
- The skills to improve personal and team documentation quality
- A solid understanding of how to establish and maintain effective documentation practices
- Why system administrators need to document
- Documentation as part of your daily workflow
- Targeting your audience
- Common mistakes made in documentation
- Tools to assist the documentation process (including effective use of wikis)
Half Day Morning
Lex Neva, Heroku
Fairfax Room
Your site’s back up, you’re back in business. Do you have a way to make sure that problem doesn’t happen again? And if you do, do you like how it works?
Heroku uses a blameless retrospective process to understand and learn from our operational incidents. We’ve recently released the templates and documentation we use in this process, but experience has taught us that facilitating a retrospective is a skill that’s best taught person to person.
This tutorial will take you through a retrospective based on the internal and external communications of a real Heroku operational incident. We’ve designed it to help you experience first-hand the relaxed, collaborative space that we achieve in our best retrospectives. We’ll practice tactics like active listening, redirecting blame, and reframing conversations. Along the way, we’ll discuss how we developed this process, what issues we were trying to solve, and how we’re still iterating on it.
Managers, tech leads, and anyone interested in retrospective culture and iterating on processes.
Attendees will have the materials and first-hand experience to advocate for (or to begin) an incident retrospective process at their workplace, or to improve a process they might already be using.
- Why run a retrospective
- Goal of a retrospective
- Blameless retrospectives
- Facilitating: redirecting blame, reframing, drawing people out
- How to structure a retrospective
- Preparing for a retrospective
- Five “why”s/infinite “how”s
- Human error
- Achieving follow-through on remediation items
Courtney Eckhardt, Heroku
Courtney comes from a background in customer support and internet anti-abuse policy. She combines this human-focused experience with the principle of Conway’s Law and the work of Kathy Sierra and Don Norman into a wide-reaching and humane concept of operational reliability.
Lex Neva, Heroku
Lex Neva is probably not a super-villain. He has six years of experience keeping large services running, including Linden Lab's Second Life and DeviantArt.com, and he is currently an SRE at Heroku. While originally trained in computer science, he's found that he most enjoys applying his software engineering skills to operations. A veteran of many large incidents, he has strong opinions on incident response, on-call sustainability, and reliable infrastructure design, and he currently runs SRE Weekly (sreweekly.com).
It's 2016, and at this point why would anyone care about an init system? Well, apparently, not only is process management essential to the operating system, but all the hype around things like containers and resource management is also making this topic sexy. This session will be a hands-on, interactive look at the architecture, capabilities, and administrative how-tos of systemd. Anyone who's new to systemd or looking to dig deeper into some of the advanced features should attend. Please bring a laptop with a virtual machine running a distribution of your choice that uses systemd.
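By way of a preview of the unit file anatomy covered below, here is a minimal sketch of a service unit. The service name and paths are hypothetical; the directives are standard systemd options.

```ini
# /etc/systemd/system/myapp.service (hypothetical example)
[Unit]
Description=Example application service
After=network.target

[Service]
ExecStart=/usr/local/bin/myapp --config /etc/myapp.conf
Restart=on-failure
# Resource management through systemd's cgroups interface:
MemoryLimit=512M

[Install]
WantedBy=multi-user.target
```

After dropping a file like this in place, `systemctl daemon-reload` makes systemd read it, `systemctl start myapp` launches it, and `systemctl status myapp` shows its state and recent journal lines.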
Linux system administrators, package maintainers and developers who are transitioning to systemd, or who are considering doing so.
Understanding of how systemd works, where to find the configuration files, and how to maintain them.
- The basic principles of systemd
- systemd's major components
- Anatomy of a systemd unit file
- Understanding and optimizing the boot sequence
- Improved system logging with the journal
- Resource management via systemd's cgroups interface
- Simple security management with systemd and the kernel's capabilities
- systemd, containers, and virtualization
Ben Breard, Red Hat
Ben Breard is the Technology Product Manager for Linux Containers at Red Hat, where he focuses on driving the container roadmap and RHEL Atomic Host, and evangelizes open source technology in his free time. Previously, he was a Solutions Architect and worked closely with key customers on cloud/systems management, virtualization, and all things RHEL. Ben joined Red Hat in 2010 and currently works out of Dallas, Texas.
Lee Damon, University of Washington
Gardner Room
Systems Administrators are expected to be intelligent, dedicated, and professional experts in our field. Yet when compared to other professions of similar education, we often do not receive credit for our efforts and receive less respect from our fellow workers.
This problem doesn’t just affect our personal well-being; businesses make poorer decisions when input from technical people is disregarded or overlooked. As professionals, we are all expected to step up and defend ourselves, our teams, and our projects. Being able to communicate meaningfully and accurately is critical to our success.
This tutorial will provide practical techniques for both in-person and written interpersonal challenges. Difficult conversations are a part of life as well as business, and we need to develop the tools for dealing with them. We will review materials from several sources, including our own experiences, and work through practical exercises to give attendees a strong starting point for their own difficult communication challenges.
IT Professionals and anyone who must deal with difficult people under stressful conditions.
- How to deal effectively with verbal and written conflict
- How to identify and stop verbal and written abuse
- How to maximize your chances to succeed in difficult conversations
- E.I.Q. and how to use it
- Satir Modes of Conversation
- Verbal Jujitsu
- Lifescripts
John H. Nyhuis
John H. Nyhuis is an Infrastructure Engineer, serving as IT Director at the Altius Institute for Biomedical Sciences. He brings 20 years of experience in infrastructure engineering and IT management across industry, academic, and medical environments, including extensive experience with scalable system architecture, implementation, optimization, and deployment:
- Leadership: Experienced at building consensus in highly diverse environments. Project Management (Scrum and LEAN), Risk Management, IT audits and remediation, HIPAA, FERPA
- Management: Expense Controls, Budgeting, Employee Management, Project Proposals, Process Improvement
- Vendor Relations: Contract Negotiation, Fundraising / Equipment Donations
- Architecture/Design: Cloud Computing, Virtualization, Automation, Scalability, Root Cause Analysis
- Deployments: Massively Parallel Implementations, Global Deployments, Code Management, Release Testing
In his free time, John serves as an Economic Development Commissioner for the City of Lake Forest Park, in the great state of Washington.
Lee Damon, University of Washington
Lee Damon has a B.S. in Speech Communication from Oregon State University. He has been a UNIX system administrator since 1985 and has been active in SAGE (US) & LOPSA since their inceptions. He assisted in developing a mixed AIX/SunOS environment at IBM Watson Research and has developed mixed environments for Gulfstream Aerospace and QUALCOMM. He is currently leading the development effort for the Nikola project at the University of Washington Electrical Engineering department. Among other professional activities, he is a charter member of LOPSA and SAGE and past chair of the SAGE Ethics and Policies working groups. He chaired LISA '04 and co-chaired CasITConf '11, '13, and '14.
Full Day
Constitution Ballroom B
Insufficient knowledge of operating system internals is my most common reason for passing on an interview candidate. Anyone can learn that you run tool X to fix problem Y. But what happens when there is no tool X, or when you can't even accurately pinpoint the root cause of why “it's sometimes slow”?
This will be a no-holds-barred, fury-road-paced review of all major parts of modern operating systems with specific emphasis on what's important for system administrators. It will provide just enough of an academic focus to bridge the "whys" so you can make better use of fiddling with the "whats" on a day-to-day basis.
You will learn about process management, scheduling, file system architecture and internals, interrupt management, the mysteries of the MMU and TLB, Bélády's anomaly, page replacement algorithms, and hopefully a bit of networking. In a nutshell, we'll cover 16 weeks of college-level material in a few hours.
Buckle up.
- All admins who did not take the Comp-Sci academic route and never had a course in OS internals
- Inexperienced admins whose coursework or training didn't cover OS internals in the depth it should have (modern OS courses have become a shadow of their former selves and commonly require writing no OS code)
- More experienced admins who haven't had to address these sorts of issues on a regular basis, who probably know a lot about some individual aspects but could benefit from having everything put into a broader context
Attendees will gain a deeper understanding of what goes on inside the kernel and the areas where things can go wrong. We'll explore how little the concept of "system load" captures about the true system state, and attendees will be prepared to improve both their operational response methodologies as well as their monitoring goals.
Morning:
- Scheduling and Process Management
- Memory Management and the MMU
- Virtualization and its impact on these
Afternoon:
- File System Architecture for sysadmins, covering ext2/3/4, NTFS, and ZFS
- Storage layer performance, disks, RAID, and SANs
- The impact of virtualization on these
Caskey L. Dickson, Microsoft Corporation
Caskey L. Dickson is a Site Reliability Engineer at Microsoft where he is part of the leadership team reinventing operations at Azure. Before that he was at Google where he worked as an SRE/SWE, writing and maintaining monitoring services that operate at "Google scale" as well as business intelligence pipelines. He has worked in online services since 1995 when he turned up his first web server and has been online ever since. Before working at Google, he was a senior developer at Symantec, wrote software for various Internet startups such as CitySearch and CarsDirect, ran a consulting company, and even taught undergraduate and graduate computer science at Loyola Marymount University. He has a B.S. in Computer Science, a Masters in Systems Engineering, and an M.B.A. from Loyola Marymount.
9:30 am–12:30 pm: Commonwealth Ballroom
1:30 pm–5:00 pm: Back Bay Ballroom D (LISA Lab)
The tutorial will cover topics in Software Defined Networking (SDN) in a presentation format oriented towards network and system administrators. SDN separates the network's control plane (the software that decides how traffic is forwarded) from its data plane (the routers and switches in the network that forward packets).
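To give a feel for what programming the control plane looks like, here is a minimal sketch of an app for the Ryu controller (one of the controllers covered below). The class name is hypothetical and the handler is only a stub; a real application would install flow rules in response to the event.

```python
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class MinimalApp(app_manager.RyuApp):
    """Skeleton control-plane app: reacts to packet-in events."""
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in_handler(self, ev):
        # A data-plane switch has punted a packet to the control plane;
        # this is where a real app would compute and install flow rules.
        self.logger.info("packet-in from switch %s", ev.msg.datapath.id)
```

Run with `ryu-manager minimal_app.py` against an OpenFlow 1.3 switch. The separation is visible in the code: the switch only forwards, while this process decides.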
This course will cover the aspects of Software Defined Networking that relate most closely to network operations. We will divide the course into four parts:
- Overview and motivation of SDN
- Commercial operational SDN controllers (Ryu, ODL) and switch capabilities
- Network virtualization technologies
- Network operations use cases (including SDN for the wide area, data centers, home networks, and wireless)
The after-lunch portion of this class will be held in the LISA Lab.
Beginner and Intermediate Virtual Infrastructure Administrators
Attendees will take back knowledge about SDN that will help them evaluate whether it is an appropriate technology to apply in their own networks.
Attendees will better understand what SDN is, the types of problems that it can (and cannot) solve, the capabilities of current software controller platforms, and the capabilities (and shortcomings) of existing hardware switches.
The course will also include "war stories" from successful (and stunted) SDN deployments that will help attendees better evaluate the suitability of SDN for solving their own network management problems.
- Overview and motivation of SDN
- Commercial operational SDN controllers (Ryu, ODL) and switch capabilities
- Ryu
- Open Daylight
- An overview of hardware switch capabilities
- Network virtualization technologies
- Nicira NSX
- FlowVisor
- Network operations use cases (including SDN for the wide area, data centers, home networks, and wireless)
- SDX: Software Defined Internet Exchange Points
- SD-WAN: SDN in Wide Area Networks
- SDN in home networks
- SDN in data centers
Nick Feamster, Princeton University
Nick Feamster is a professor in the Computer Science Department at Princeton University and the Acting Director of the Princeton University Center for Information Technology Policy (CITP). Before joining the faculty at Princeton, he was a professor in the School of Computer Science at Georgia Tech. He received his Ph.D. in Computer Science from MIT in 2005, and his S.B. and M.Eng. degrees in Electrical Engineering and Computer Science from MIT in 2000 and 2001, respectively. His research focuses on many aspects of computer networking and networked systems, with a focus on network operations, network security, and censorship-resistant communication systems. In December 2008, he received the Presidential Early Career Award for Scientists and Engineers (PECASE) for his contributions to cybersecurity, notably spam filtering. His honors include the Technology Review 35 "Top Young Innovators Under 35" award, the ACM SIGCOMM Rising Star Award, a Sloan Research Fellowship, the NSF CAREER award, the IBM Faculty Fellowship, the IRTF Applied Networking Research Prize, and award papers at the SIGCOMM Internet Measurement Conference (measuring Web performance bottlenecks), SIGCOMM (network-level behavior of spammers), the NSDI conference (fault detection in router configuration), USENIX Security (circumventing web censorship using Infranet), and USENIX Security (web cookie analysis).
Half Day Afternoon
Commonwealth Room
Data analysis is not just about discovery; it's about communication. Good communication tells stories. Savvy system administrators provide their management with the background needed to maintain operations, manage budgets, and support users, and provide their coworkers with the insights needed to keep their systems solid.
The R programming language and ecosystem constitute a rich tool set for performing system analyses, for communicating the results and importance of those analyses, and for automating the process with reproducible and repeatable results. This brief introduction to R and its ecosystem will provide a walk along the mainline—coming up to speed on R, accessing data, analyzing data, and getting the message out.
This tutorial is designed to:
- motivate you to pick up R
- demonstrate useful techniques using R
- illustrate ways to simplify your life by automating data analysis and reporting
In-class demonstrations will be augmented with hands-on opportunities during the workshop. Additional exercises and data sets that students can explore following the workshop will be provided. If you plan on working on the exercises, install R and (optionally) R Studio.
System administrators who are awash in operational data and want to do a more efficient job of understanding their data and communicating their findings. Facility with programming and knowledge of basic descriptive statistics is assumed. Prior knowledge of R is not required.
- Acquaintance with R, R packages, and R Studio
- Understanding where R fits into the system administrator’s tool set
- Familiarity with basic R data-manipulation techniques
- Motivation to learn or improve your R skills
- Next steps in learning and mastering R
- Introduction to the R ecosystem
- R as a language
- Basic programming in R
- The data analysis workflow
- Reading and writing data from files and pipes
- Data frames and data frame manipulations
- Exploratory analysis
- Using the ggplot2 package for graphing
- Other useful R packages
Examples will be based on situations encountered during routine system operations.
Robert A. Ballance, Ph.D.
Dr. Robert Ballance honed his R-programming skills while managing large-scale High-Performance Computing systems for Sandia National Laboratories. While at Sandia, he developed several R packages used internally for system analysis and reporting. Prior to joining Sandia in 2003, Dr. Ballance managed systems at the University of New Mexico High Performance Computing Center. He has consulted, taught, and developed software, including R packages, Perl applications, C and C++ compilers, programming tools, Internet software, and Unix device drivers. He is a member of USENIX, the ACM, the IEEE Computer Society, the Internet Society, and the Long Now Foundation. He was a co-founder of the Linux Clusters Institute and recently served as Secretary of the Cray Users Group. Bob received his Ph.D. in Computer Science from U.C. Berkeley in 1989. He is currently serving as a White House Presidential Innovation Fellow.
Fairfax Room
Whether you are a sysadmin, dev, or web ops, time management can be more difficult than any technology issue. This class is for new and junior system admins who have found themselves in over their heads, overloaded, and looking for a better way to survive the tech world.
This tutorial presents fundamental techniques for eliminating interruptions and distractions so you have more time for projects, prioritization techniques so the projects you do work on have the most impact, plus "The Cycle System," which is the easiest and most effective way to juggle all your tasks without dropping any.
Sysadmins, devs, operations, and their managers
By the end of this class, you will be able to schedule and prioritize your work (rather than be interruption-driven), have perfect follow-through (never forget a request), and limit your work-time to 40 hours a week (have a life).
- Why typical “time management” strategies don’t work for sysadmins
- What makes “to-do” lists fail, and how to make them work
- How to eliminate “I forgot” from your vocabulary
- How to manage interruptions: preventing them, managing the ones you get
- Delegating to coworkers without them knowing
- Achieving perfect follow-through
- The Cycle System for recording and processing to-do lists
- Prioritization techniques
- Task grouping: batching, sharding, and multitasking
- Handling situations like a big outage disrupting your perfectly planned day
Thomas Limoncelli, StackOverflow.com
Tom is an internationally recognized author, speaker, system administrator, and DevOps advocate. His latest book, the 3rd edition of The Practice of System and Network Administration, launched last month. He is also known for The Practice of Cloud System Administration and Time Management for System Administrators (O'Reilly). He works in New York City at StackOverflow.com and has previously worked at Google, Bell Labs/Lucent, AT&T, and others. He blogs at EverythingSysadmin.com and tweets @YesThatTom. He lives in New Jersey.
Cody Chapman, Heraflux Technologies
David Klee, Heraflux Technologies
Constitution Ballroom A
Not very long ago, the very idea of virtualizing production, mission-supporting enterprise applications was so career-threatening that only the brave dared entertain it for longer than a few seconds. Fast forward to now: virtualization is so pervasive and well-accepted that the inverse is true. "Virtualize First" is now a standard corporate mandate in large enterprises, and no modern commercial application is exempt.
Sadly, embracing virtualization has turned out not to be the panacea for everything that ails an enterprise. In fact, virtualization often contributes to sub-optimal performance, availability, recoverability, and agility for many applications in the enterprise—with lots of frustration, heartburn, reduced productivity, and, yes, interrupted personal lives. In a rush to be part of the "cool crowd," many enterprises fail to identify and account for the intricacies and requirements of the virtualization platform, relegating such considerations to the secondary or tertiary tiers of the "due diligence" scale.
If you have adopted virtualization as a platform for your mission-critical applications, or if you are in the process of doing so, please be sure to attend this tutorial. It will provide a comprehensive and detailed knowledge transfer that enables you to avoid the common pitfalls encountered in a VMware vSphere virtualization project. We will discuss and explain the considerations for successfully running your mission-critical applications on a vSphere-based infrastructure without loss of performance, availability, recoverability, or resilience. The tutorial will go beyond the standard slide-ware and present an actual demonstration of the effects of certain configuration optimization strategies on the overall condition of the virtualized applications and the virtual infrastructure as a whole.
- Infrastructure, Solution and Enterprise Architects
- Virtual Infrastructure and Applications Administrators
- Network Administrators
- IT Operators
The tutorial will be interactive, encouraging questions from participants—so please come in with your own unique and specific questions. The tutorial will provide you with tips and tricks drawn directly from the most current VMware guidance, recommendations, and knowledge-based references, as well as from real-life customer situations.
- Virtualization concepts
- Virtualization stack
- Hardware abstraction and the relationship and inter-dependencies between the physical and virtual components
- Pooling and sharing resources in a virtual environment
- Common assumptions that lead to performance degradation for virtualized applications
- Configuration optimization that enhances performance
- Availability and resilience within a VMware vSphere virtual infrastructure
Deji Akomolafe, Microsoft Applications Virtualization Lead, VMware
Deji Akomolafe (a CTO Ambassador and Staff Solutions Architect within VMware's Global Field and Partner Readiness Group) specializes in the virtualization of Microsoft Business Critical Applications on the VMware's vSphere platform. Deji is a regular speaker at many industry-leading technical conferences and workshops (including VMworld, SQL Saturday, EMCWorld, and Partners Exchange), presenting technical subject matters related to virtualization and providing technical guidance to help clients enhance their expertise and ability to optimally virtualize and operate their critical applications.
Cody Chapman, Heraflux Technologies
Cody Chapman is a Solutions Architect with Heraflux Technologies. His areas of expertise are virtualization, cloud, storage, performance, datacenter architecture, risk mitigation through high availability and disaster recovery, and performing technical exorcisms. He has worked on systems large and small in a wide variety of industries. He is actively working to automate every facet of datacenter and database management. You can read his blog at heraflux.com, and reach him on Twitter at @codyrchapman.
David Klee, Heraflux Technologies
David Klee is a Microsoft MVP and VMware vExpert with over seventeen years of IT experience. David is the Founder of Heraflux Technologies, a consultancy focused on data virtualization and performance tuning, datacenter architecture, and business process improvements. You can read his blog at davidklee.net and reach him on Twitter at @kleegeek.
Full Day
AJ Bowen, Convox
Constitution Ballroom A
Docker is an open platform to build, ship, and run any application, anywhere. In this hands-on tutorial, you will learn advanced Docker concepts, and see how to deploy and scale applications using Docker Swarm clustering abilities and other open source tools of the Docker ecosystem.
This tutorial is living material: it is delivered at least once a month in public sessions all around the U.S. and Europe. Since the Docker platform in general, and Docker Swarm in particular, evolve rapidly, this tutorial evolves as well, following closely the releases of the various components of the Docker ecosystem: Engine, Compose, Swarm, Machine.
Docker users who want production-grade container deployments.
You should be familiar with Docker and basic Docker commands (docker run, docker ps, and docker stop) as well as the Dockerfile syntax (at least RUN, CMD, and EXPOSE commands). Ideally, you should have experimented with Compose. If you have limited Docker knowledge but consider yourself a quick learner, don't hesitate to attend: there will be numerous examples and demos, and you will be able to test them out on your own Docker cluster!
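If you want to gauge that background, the level assumed is roughly that of this minimal Dockerfile sketch (the application and file names are hypothetical):

```dockerfile
# Hypothetical minimal image for a small Python web application.
FROM python:2.7
# RUN executes a build-time step inside the image.
RUN pip install flask
COPY app.py /srv/app.py
# EXPOSE documents the port the containerized app listens on.
EXPOSE 8000
# CMD sets the default process started by `docker run`.
CMD ["python", "/srv/app.py"]
```

Building it with `docker build -t myapp .` and starting it with `docker run -d -p 8000:8000 myapp` covers the docker run/ps/stop cycle mentioned above.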
After this tutorial, you will know how to deploy applications to production with Docker and containers. We will tackle lots of frequently asked questions in the Docker ecosystem: how to manage the lifecycle of container images, how to implement service discovery across Docker clusters, how to load balance traffic on scaled applications, how to perform security upgrades, and more.
Containers, Docker, Orchestration, Scheduling, and Service Discovery
Jérôme Petazzoni, Docker Inc.
Jérôme works at Docker, where he helps others to containerize all the things. He was part of the team that built, scaled, and operated the dotCloud PaaS before it became Docker. When annoyed, he threatens to replace things with a very small shell script.
Constitution Ballroom B
The course is a direct response to the many requests I have gotten for “more tools,” and I have written it with an eye to meeting that goal. The class will be taught through a “secure and defend” plan: we will spend the majority of class time learning offensive and defensive tools, then break into teams and work to secure, and set up monitoring for, provided on-site test environments. In the second phase of the class, students will come to the LISA Lab to use the attack tools and defend their environments from their peers. There will be scheduled times for the teams, independently or in groups, to deal with created “incidents.”
This will be a coordinated event that I will support both in my role as instructor and as a member of LISA Build and Labs, and the second phase will run throughout the conference. I will have some form of visual scorekeeping in the Lab, where people can walk in and see what's going on with the event. At the end, I will provide prizes and/or accolades for the best teams.
Participants should be beginning-to-advanced system administrators of any stripe with an interest in IT security and a desire to learn how to attack and defend against potential threats in their environments. Participants are required to have experience with the *nix command line, basic networking, and virtual environments.
Knowledge of how to evaluate an environment, find and mitigate vulnerabilities, improve security monitoring, and detect and defend against attacks. Students will learn how to use a working security toolkit that can be applied directly to their home environments.
- Basic security concepts and architectural design
- How to scope and scan an environment using readily available tools and general sysadmin knowledge
- How to identify, understand, and remediate vulnerabilities, and verify the solution
- How to monitor and react to incursions
Branson Matheson, Cisco Systems, Inc.
Branson is a 29-year veteran of system architecture, administration, and security. He started as a cryptologist for the US Navy and has since worked on NASA shuttle and aerospace projects, TSA security and monitoring systems, secure mobile communications, and Internet search engines. He has also run his own company while continuing to support many open source projects. Branson speaks to and trains sysadmins and security personnel worldwide, and he is currently a senior technical lead for Cisco Cloud Services. Branson holds several credentials and generally likes to spend time responding to the statement “I bet you can't....”
Back Bay Ballroom D (LISA Lab)
In this hands-on hardware workshop, we explore the boundaries of traditional systems and where they converge with networks of billions of embedded devices. Starting with the theory of the Internet of Things, related data transports, and common protocols, we create embedded systems using a set of loaned hardware. Focusing on 802.3, 802.11, and Bluetooth Smart transports, we implement our own IoT edge routers serving our own network of sensor and actuator embedded computers. We will implement a simple messaging application using MQTT or AMQP, and round out the training by integrating our piecemeal solutions into a full-fledged IoT system.
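As a flavor of the messaging exercise, here is a minimal MQTT sketch in Python. The paho-mqtt client library, the broker address, and the topic names are assumptions for illustration; the workshop supplies its own environment.

```python
import paho.mqtt.client as mqtt  # assumed client library

BROKER = "broker.example.org"  # hypothetical broker address

def on_message(client, userdata, msg):
    # Called once per message on any subscribed topic.
    print(msg.topic, msg.payload)

client = mqtt.Client()
client.on_message = on_message
client.connect(BROKER, 1883)  # 1883 is the standard MQTT port
client.subscribe("sensors/+/temperature")  # '+' matches one topic level
client.publish("sensors/lab1/temperature", "21.5")
client.loop_forever()  # process network traffic until interrupted
```

The same publish/subscribe shape applies whether the peers are sensor nodes, actuators, or the edge routers built in the workshop.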
Intermediate hardware or network engineers benefit most from this workshop.
Attendees will take back to their work a broad understanding of what will power the next generation of embedded devices and how those devices interface with traditional large Internet systems.
Device classes
- Whirlwind tour of hardware
- Vendor market trends
- Small manufacturing
Transports
- Copper 802.3
- Wireless 802.11
- Bluetooth 1-3
- Bluetooth Smart
- Zigbee and ANT+
- Z-Wave
- 6LoWPAN
- LoRa and SigFox
Protocols
- Legacy
- MQTT
- AMQP
- CoAP
- ZeroMQ
Michael Schloh, Europalab Networks
Michael Schloh von Bennewitz is a computer scientist specializing in network engineering, embedded design, and mobile platform development. Responsible for research, development, and maintenance of packages in several community software repositories, he actively contributes to the open source development community.
Michael speaks four languages fluently and presents at technical events every year. He teaches workshops exclusively on Internet of Things and Embedded Computing technology, traveling with a mobile laboratory of over 300 sensors, actuators, and computer devices.
Michael's IoT knowledge profits from years of work at telecoms and relationships with industry leaders. He is an Intel Innovator, Samsung partner, and Mozilla committer with a mandate to promote IoT technology.
Additional information is found at http://michael.schloh.com/
Half Day Morning
Speedy Change Control is not an oxymoron. This tutorial will provide practical, actionable steps to streamline and speed up change control at your organization without increasing risk. In The Visible Ops Handbook, authors Behr, Kim, and Spafford identify a culture of change management as common to high-performing IT groups: “change management does not slow things down in these organizations.” This tutorial will help anyone wishing to implement phase one of the Visible Ops Handbook: “Stabilize the Patient” and “Modify First Response.” While I draw heavily on IT Infrastructure Library (ITIL) guidance, much of this is common-sense good practice based on lessons learned from past successes and failures. No special ticketing system, tools, or ITIL knowledge is necessary. I am a certified ITIL Expert with over five years of experience designing, improving, and managing a successful change management process at an audited technology company delivering public registry and DNS services on complex technologies across international data centers.
Technical people and managers who participate in a change management process, or who would like to build one but are afraid that doing so will slow them down.
- Templates for change request types and procedures
- Templates for creating standard operating procedures
- ITIL-aligned talking points for making your case for these process improvements
- Better understanding of change management and process in general
- Different change types
- Assessing risks and potential impact
- Defining change authorities specific for each change type
- Metrics for measuring change process performance against goals
- Release and deployment management
- DevOps
- Continuous delivery
Jeanne Schock
Jeanne Schock has a background in Linux/FreeBSD/Windows system administration that includes working at a regional ISP, a large video hosting company and a Top Level Domain Registry services and DNS provider. About six years ago she transitioned to a role building, managing, and promoting processes in support of IT operations, disaster recovery, and continual improvement. She is a certified Expert in the IT Infrastructure Library (ITIL) process framework with in-the-trenches experience with such processes as Change, Incident, and Problem Management. Jeanne also has a pre-IT academic and teaching career and is an experienced trainer and public presenter, most recently speaking to D.C. chapters of the American Society for Quality, Software Special Interest Group and IEEE Computer Society.
Sasha Goldshtein, Sela Group
Commonwealth Room
This tutorial will give you experience with two powerful Linux performance analysis tools: perf and BPF. Learn how to profile CPU usage, create flame graphs, trace TCP connections, investigate file system latency, explore software internals, and more.
perf_events, aka "perf" after its front-end, is a Linux mainline tool for profiling and tracing. We will summarize some of its most useful one-liners, and discuss real world challenges and solutions for using it with JIT runtimes (Java, Node.js), and in cloud environments.
Enhanced BPF (Berkeley Packet Filter) is a new in-kernel programmable runtime with a variety of uses, including extending Linux static and dynamic tracing capabilities. We'll primarily focus on the BPF Compiler Collection (bcc) front-end for BPF, which provides a toolkit of many ready-to-run analysis tools, including DTrace classics like execsnoop, opensnoop, and biolatency, and new tools including memleak, trace, and argdist. bcc also provides Python and C interfaces for writing your own powerful dynamic tracing-based tools, and we'll show how that can be done.
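As a taste of that Python interface, here is a minimal sketch of a bcc-based tool, patterned on bcc's introductory examples (it assumes a BPF-capable kernel with bcc installed, and root privileges):

```python
from bcc import BPF

# BPF program, written in C: attach a kprobe to the clone syscall and
# write a line to the kernel trace pipe on every new process created.
prog = """
int kprobe__sys_clone(void *ctx) {
    bpf_trace_printk("clone() called\\n");
    return 0;
}
"""

b = BPF(text=prog)   # compile and load the program into the kernel
b.trace_print()      # stream the trace output until interrupted
```

Tools like execsnoop are this same pattern grown up: a small C program that gathers data in the kernel, plus Python that formats it for humans.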
We will spend more time exploring the new world of BPF and its features that were made available in the Linux 4.4 release. Enhanced BPF has become a recent hotspot for systems innovation, helping create other new technologies including bcc, the kernel connection multiplexer (KCM), and eXpress Data Path (XDP), and it is being developed by engineers from many companies, including Facebook, PLUMGrid, Netflix, Cisco, Huawei, GitHub, SELA, and Intel. Join this workshop to get up to speed with BPF for tracing, try some hands-on labs, and gain real experience with the technology from contributor and performance expert Brendan Gregg.
- perf
- Enhanced Berkeley Packet Filter (BPF)
- BPF Compiler Collection
- Python and C interfaces to BPF
Brendan Gregg, Netflix
Brendan Gregg is a senior performance architect at Netflix, where he does large scale computer performance design, evaluation, analysis, and tuning. He is the author of multiple technical books including Systems Performance published by Prentice Hall, and received the USENIX LISA Award for Outstanding Achievement in System Administration. He was previously a performance lead and kernel engineer at Sun Microsystems, where he developed the ZFS L2ARC and led performance investigations. He has also created numerous performance analysis tools, which have been included in multiple operating systems. His recent work includes developing methodologies and visualizations for performance analysis.
Sasha Goldshtein, Sela Group
Sasha Goldshtein is the CTO of Sela Group, a Microsoft C# MVP and Azure MRS, a Pluralsight author, and an international consultant and trainer. Sasha is a book author, a prolific blogger and open source contributor, and author of numerous training courses including .NET Debugging, .NET Performance, Android Application Development, and Modern C++. His consulting work revolves mainly around distributed architecture, production debugging and performance diagnostics, and mobile app development.
Half Day Afternoon
Commonwealth Room
Go is a relatively young language that was built with systems programming in mind. Its compact yet powerful grammar aids the swift development of efficient tools for everyday work. Despite its young age, it has already taken a prominent position for system tools. This hands-on tutorial focuses on reading and writing the Go programming language.
Anyone with a little bit of programming experience who wants to pick up Go
The ability to read and write Go
- Control Structures
- Types
- Functions
- Goroutines
- Channels
Chris McEniry, Sony Interactive Entertainment
Chris "Mac" McEniry is a practicing sysadmin and architect responsible for running a large E-commerce and gaming service. He's been working and developing in an operational capacity for 15+ years. In his free time, he builds tools and thinks about efficiency.
Fairfax Room
All too often, technical teams spend so much time firefighting that they can't stop to identify and eliminate the problems—the underlying causes—of incidents. Incident resolution is about taking care of the customer: restoring a service to normal levels of operation ASAP. Without a process in place to turn the problem into a known error, the root causes of the incident remain, resulting in recurrences of the incident.
The goals of the Problem Management process are to prevent recurrence of incidents, prevent problems and resulting incidents from happening, and minimize the impact of incidents and problems that cannot be prevented. Most technical people already have experience in root cause analysis and problem resolution. This tutorial will help them be measurably more consistent, mature, and effective in their practices. Using IT Infrastructure Library (ITIL) best practices, it will deliver step-by-step instructions on building and managing a problem process. I am a certified ITIL Expert, and I designed, implemented, and then managed a problem process for four years at a registry and DNS service provider with complex technologies across international datacenters.
Technical people and managers responsible for the support of live production services. This is an operational support process that can be put in place from the bottom up. The more teams involved in the process—DBAs, system administrators, developers, helpdesk—the greater the scope of problems that can be addressed.
- A step-by-step guide for building and implementing a problem process and the reasons behind each step
- A process template with examples that can be easily adapted to fit your organization’s current and future needs
- Instructions on setting up a Known Error Database and communicating workarounds to impacted support teams
- Guidance for getting buy-in from peers and managers
- Incident response vs. problem resolution
- Root cause analysis techniques
- Making decisions that are aligned with business objectives
- Getting buy-in from teammates, colleagues and managers
- Proactive problem management
- After-action reviews as a tool
- “Root cause” vs. multiple causes
Jeanne Schock
Jeanne Schock has a background in Linux/FreeBSD/Windows system administration that includes working at a regional ISP, a large video hosting company and a Top Level Domain Registry services and DNS provider. About six years ago she transitioned to a role building, managing, and promoting processes in support of IT operations, disaster recovery, and continual improvement. She is a certified Expert in the IT Infrastructure Library (ITIL) process framework with in-the-trenches experience with such processes as Change, Incident, and Problem Management. Jeanne also has a pre-IT academic and teaching career and is an experienced trainer and public presenter, most recently speaking to D.C. chapters of the American Society for Quality, Software Special Interest Group and IEEE Computer Society.