SparkPost: The Day the DNS Died

Friday, 31 August, 2018 - 14:00–14:50

Jeremy Blosser, SparkPost

Abstract: 

More than 30% of the world's non-spam email is sent using SparkPost's technology, and our cloud service sends over 15 billion messages per month. Deploying that service on AWS has provided all the expected cloud benefits of flexibility and scalability, but it has also brought particular challenges due to email's unique profile and needs.

Our DNS needs are particularly extreme. Our infrastructure currently has to support 8,000 DNS queries per second. We have experienced several issues deploying a service model that can meet this need, and a major DNS-related outage in May of 2017 caused significant pain for our customers and sent us back to the drawing board once again. We recently completed a ground-up DNS service redesign that includes dedicated VPCs with optimized security groups and ACLs, distribution across tiers and availability zones, resolver tuning and custom configurations, and multiple local caching resolvers per instance.

In this talk, we will discuss our history addressing this challenge and lessons learned, the May outage event itself, and our current architecture's design and results. Attendees will gain an understanding of what it takes to host a robust DNS service in AWS at a scale beyond what is currently natively supported by AWS' resolver services.

Jeremy Blosser, SparkPost

Jeremy Blosser has worked in systems administration and engineering for 20 years, and most of that time has included a focus on reliably delivering email and other traffic at scale. He is currently the Principal Operations Engineer at SparkPost, responsible for technical architecture oversight and keeping the cloud service operating and healthy. He lives in Texas with his wife and five kids.

BibTeX
@inproceedings {218925,
author = {Jeremy Blosser},
title = {{SparkPost}: The Day the {DNS} Died},
booktitle = {SREcon18 Europe/Middle East/Africa (SREcon18 Europe)},
year = {2018},
address = {Dusseldorf},
url = {https://www.usenix.org/node/218926},
publisher = {USENIX Association},
month = aug
}