Over 600 Million Members and Hundreds of Micro Services: How We Scaled Our Monitoring System to Keep up

Thursday, June 13, 2019 - 2:00 pm–3:00 pm

Mahak Lamba, LinkedIn

Abstract: 

Behind our platform serving over 600 million users is an ever-scaling infrastructure comprising hundreds of servers running in different geographic locations and hosting a multitude of services, including database services like Espresso, streaming services like Kafka, offline jobs, and ML ranking services, written in various languages such as Java, Python, and Go. To keep up with these ever-growing member and infrastructure needs, we need to scale our monitoring systems accordingly so that we can efficiently deliver a seamless user experience. But is this possible using existing tools and technologies?

This talk will focus on the scale at which we operate, the challenges we faced while scaling our monitoring system, and a 360-degree view of how we monitor our microservice architecture.

Mahak Lamba, LinkedIn

I joined LinkedIn as an SRE intern while completing my degree in Computer Science. After graduating in 2017, I joined LinkedIn as a Site Reliability Engineer on the Production-SRE team, where I am mainly responsible for building applications and tools for efficient troubleshooting, issue detection, and correlation.

BibTeX
@conference {233251,
author = {Mahak Lamba},
title = {Over 600 Million Members and Hundreds of Micro Services: How We Scaled Our Monitoring System to Keep up},
year = {2019},
address = {Singapore},
publisher = {USENIX Association},
month = jun
}
