Nicolas Arroyo, Bloomberg LP
The 'thundering herd problem' occurs when multiple threads wait on the same event and are all woken at the same time. If only one thread can handle the event, the others waste resources on no-op context switches. This problem has been largely resolved in modern kernels and through the use of notification APIs (e.g., epoll, kqueue, or IOCP).
We will present how we investigated and identified an unexpected variant of this problem. We will review our performance troubleshooting process, starting with aggregated sampling, followed by dynamic instrumentation and detailed sampling, and finally, kernel mode sampling. At each step, we will explain what information we gained and how it helped us discover the problem: system calls buried inside commonly used libraries that use absolute timers, which caused threads to synchronize and led to a multitude of threads waking up at the same time.
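To make the timer effect concrete, here is a minimal sketch of one way absolute timers can line threads up. It is an illustration under stated assumptions, not the code or the exact mechanism from the talk: the function names, the call-in times, and the 100 ms period are all hypothetical, and the sketch assumes that the absolute deadlines are aligned to a shared period grid. It only computes wake-up times; no real threads or system calls are involved.

```python
# Hypothetical model: times are integer milliseconds on a shared clock.

def next_wake_relative(now_ms: int, period_ms: int) -> int:
    """Relative timer: wake `period_ms` after the moment the thread calls in."""
    return now_ms + period_ms

def next_wake_absolute(now_ms: int, period_ms: int) -> int:
    """Absolute timer (assumed grid-aligned): wake at the next multiple of `period_ms`."""
    return (now_ms // period_ms + 1) * period_ms

# Assumed call-in times for four threads that start slightly out of phase.
call_times_ms = [13, 27, 41, 58]
period_ms = 100

relative = [next_wake_relative(t, period_ms) for t in call_times_ms]
absolute = [next_wake_absolute(t, period_ms) for t in call_times_ms]

# Relative deadlines keep the original spread; absolute ones collapse onto one instant.
print("relative wake times:", relative)
print("absolute wake times:", absolute)
print("distinct relative wake times:", len(set(relative)))
print("distinct absolute wake times:", len(set(absolute)))
```

In this model, the relative deadlines preserve the threads' original phase offsets, while the grid-aligned absolute deadlines all fall on the same instant, so every thread becomes runnable at once even though no shared event woke them.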

Nicolas Arroyo is a seasoned developer with 20 years of experience across diverse domains, including machine learning, data science, security, performance, systems architecture, embedded systems, distributed systems, and networking. He is passionate about performance optimization, scalability, and solving complex technical challenges. Currently, he focuses on performance analysis and tooling for low-latency, high-throughput financial systems.

author = {Nicolas Arroyo},
title = {Case Study: A Thundering Herd in the Wild},
year = {2025},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = mar
}