Serverless Data Processing and Machine Learning

Monday, October 29, 2018 - 11:45 am–12:30 pm

Sunil Mallya, AWS

Abstract: 

Serverless computing reduces infrastructure complexity and provides fine-grained billing and easy scalability. Setting up concurrent data processing infrastructure pipelines that support many users is a complex task; moreover, utilization, cost, and performance are hard to tune for these pipelines. Machine Learning (ML) workloads are on the uptick, and the likes of Apache Spark aim to provide an end-to-end data-to-ML story, but they run into the same complexities. Data and ML workflows are not disjoint; they share a lot in common.

In this talk, I will present a serverless data and machine learning pipeline that includes a MapReduce framework built using Amazon S3 and AWS Lambda. We'll see how it can help alleviate issues with concurrent processing, cost, and scaling. I will also showcase how machine learning algorithms like K-Means clustering can be built on top of this framework, exploiting its inherently distributed architecture. We'll then discuss the benefits and challenges of the framework, with a focus on production deployments for ML models.
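As a rough illustration of how K-Means fits a MapReduce shape, the sketch below runs each iteration as a map step (per-chunk partial sums and counts for each centroid) followed by a reduce step (merging the partials into new centroids). It is not the framework presented in the talk: the function names are hypothetical, and the mappers are plain local function calls standing in for what would be, under the abstract's description, AWS Lambda invocations reading chunks from Amazon S3.

```python
def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_chunk(chunk, centroids):
    """Mapper (hypothetical): partial sums and counts per centroid for one chunk."""
    partial = {}
    for point in chunk:
        idx = nearest(point, centroids)
        sums, count = partial.get(idx, ([0.0] * len(point), 0))
        partial[idx] = ([s + p for s, p in zip(sums, point)], count + 1)
    return partial


def reduce_partials(partials, centroids):
    """Reducer (hypothetical): merge mapper outputs into updated centroids."""
    totals = {}
    for partial in partials:
        for idx, (sums, count) in partial.items():
            acc, n = totals.get(idx, ([0.0] * len(sums), 0))
            totals[idx] = ([a + s for a, s in zip(acc, sums)], n + count)
    updated = []
    for i, old in enumerate(centroids):
        if i in totals:
            sums, count = totals[i]
            updated.append([s / count for s in sums])
        else:
            updated.append(list(old))  # no points assigned: keep the old centroid
    return updated


def kmeans(chunks, centroids, iterations=10):
    """Run a fixed number of map/reduce rounds over pre-split chunks of points."""
    for _ in range(iterations):
        # Assumption: in a serverless setup each map_chunk call would run
        # concurrently as its own function invocation; here it runs in-process.
        partials = [map_chunk(chunk, centroids) for chunk in chunks]
        centroids = reduce_partials(partials, centroids)
    return centroids
```

Because each mapper returns only per-centroid sums and counts, the data passed to the reducer scales with the number of clusters rather than the number of points, which is what makes the algorithm a natural fit for a distributed map/reduce layout.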

Sunil Mallya, AWS

Sunil is a Sr. AI Solutions Architect focused on Deep Learning in the Machine Learning Lab team at AWS, working with customers in various industry verticals. Prior to that, he co-founded Neon Labs, a company that applies neuroscience and machine learning to image analysis and video thumbnail recommendation. He has also worked on building large-scale, low-latency systems at Zynga and has an acute passion for serverless computing. He holds a master’s degree in computer science from Brown University.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@conference {221845,
author = {Sunil Mallya},
title = {Serverless Data Processing and Machine Learning},
year = {2018},
address = {Nashville, TN},
publisher = {USENIX Association},
month = oct
}
