ENG25519: Faster TLS 1.3 handshake using optimized X25519 and Ed25519

Authors: 

Jipeng Zhang, CCST, Nanjing University of Aeronautics and Astronautics; Junhao Huang, Guangdong Provincial Key Laboratory IRADS, BNU-HKBU United International College and Hong Kong Baptist University; Lirui Zhao, CCST, Nanjing University of Aeronautics and Astronautics; Donglong Chen, Guangdong Provincial Key Laboratory IRADS, BNU-HKBU United International College; Çetin Kaya Koç, CCST, Nanjing University of Aeronautics and Astronautics, Iğdır University, and University of California Santa Barbara

Abstract: 

The IETF released RFC 8446 in 2018 as the TLS 1.3 standard, which recommends X25519 for key exchange and Ed25519 for identity verification. These computations are the most time-consuming steps in the TLS handshake. Intel introduced AVX-512 in 2013 as an extension of AVX2, and in 2018 Cannon Lake CPUs implemented AVX-512IFMA, an AVX-512 subset that adds support for 52-bit integer multipliers.

This paper first revisits various optimization strategies for ECC and presents a more performant X25519/Ed25519 implementation using the AVX-512IFMA instructions. These optimization strategies cover all levels of ECC arithmetic, including finite field arithmetic, point arithmetic, and scalar multiplication computations. Furthermore, we formally verify our finite field implementation to ensure its correctness and robustness.
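To make the finite field level concrete, the following is a minimal Python sketch of multiplication in GF(2^255 - 19) built from two IFMA-style primitives, which return the low and high 52-bit halves of a product. It assumes a 5-limb, radix-2^51 representation, a common choice for Curve25519 that the abstract does not confirm. The names `madd52lo`, `madd52hi`, `to_limbs`, `from_limbs`, and `fe_mul` are illustrative and not taken from the paper's library, and Python integers stand in for 64-bit vector lanes, so lane overflow is not modeled.

```python
# Sketch only: scalar emulation of IFMA-style multiply-accumulate,
# assuming a radix-2^51, 5-limb representation (an assumption, not the paper's).
P = 2**255 - 19
MASK51 = (1 << 51) - 1
MASK52 = (1 << 52) - 1


def madd52lo(acc, a, b):
    """acc + low 52 bits of the product of the 52-bit operands a and b."""
    return acc + (((a & MASK52) * (b & MASK52)) & MASK52)


def madd52hi(acc, a, b):
    """acc + bits 52..103 of the product of the 52-bit operands a and b."""
    return acc + (((a & MASK52) * (b & MASK52)) >> 52)


def to_limbs(x):
    """Split an integer below 2^255 into five 51-bit limbs, least significant first."""
    return [(x >> (51 * i)) & MASK51 for i in range(5)]


def from_limbs(limbs):
    """Recombine limbs and reduce fully modulo P."""
    return sum(v << (51 * i) for i, v in enumerate(limbs)) % P


def fe_mul(f, g):
    """Multiply two field elements given as limbs below 2^52."""
    lo = [0] * 10
    hi = [0] * 10
    for i in range(5):
        for j in range(5):
            # f[i]*g[j] = low + high * 2^52, and 2^52 = 2 * 2^51, so the
            # high half lands one limb up and is doubled afterwards.
            lo[i + j] = madd52lo(lo[i + j], f[i], g[j])
            hi[i + j + 1] = madd52hi(hi[i + j + 1], f[i], g[j])
    cols = [lo[k] + 2 * hi[k] for k in range(10)]

    # Fold the upper columns using 2^255 = 19 (mod P).
    h = [cols[k] + 19 * cols[k + 5] for k in range(5)]

    # Carry propagation; the carry out of the top limb wraps around times 19.
    carry = 0
    for k in range(5):
        h[k] += carry
        carry = h[k] >> 51
        h[k] &= MASK51
    h[0] += 19 * carry
    h[1] += h[0] >> 51
    h[0] &= MASK51
    return h  # loosely reduced: every limb is still below 2^52
```

The result is only loosely reduced, which is enough for it to be fed back into `fe_mul`; `from_limbs` performs the final reduction. A vectorized implementation would run several such multiplications in parallel lanes and would need explicit bounds on the accumulators.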

In addition to the cryptographic implementation, we explore the deployment of our optimized X25519/Ed25519 library in the TLS protocol layer and the TLS ecosystem. To this end, we design and implement an OpenSSL ENGINE called ENG25519, which propagates the performance benefits of our ECC library to both. Through ENG25519, TLS applications benefit directly from the underlying cryptographic improvements without any changes to the source code of OpenSSL or of the applications. Moreover, we discover that the cold-start issue of vector units degrades the performance of cryptography in the TLS protocol, and we develop an auxiliary thread with a heuristic warm-up scheme to mitigate this issue.
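The abstract does not describe the warm-up heuristic itself, so the following Python sketch is purely hypothetical. It shows one way an auxiliary thread could decide when to issue warm-up work: only while cryptographic calls were seen recently, and only after the vector units have plausibly gone idle. The names `WarmupPolicy`, `note_crypto_call`, `should_warm`, `dummy_vector_work`, and `warmup_loop`, as well as both threshold values, are assumptions made for illustration. Python cannot issue AVX-512 instructions, so the workload is a placeholder, and core placement of the thread is not modeled.

```python
import threading
import time


class WarmupPolicy:
    """Hypothetical heuristic: warm up only while crypto traffic is recent."""

    def __init__(self, active_window=1.0, idle_gap=0.0005):
        self.active_window = active_window  # seconds during which traffic counts as recent
        self.idle_gap = idle_gap            # idle seconds after which units may have cooled
        self.last_crypto = None
        self._lock = threading.Lock()

    def note_crypto_call(self, now):
        """Record the time of the latest X25519/Ed25519 call."""
        with self._lock:
            self.last_crypto = now

    def should_warm(self, now):
        """True if traffic was recent but the units have been idle long enough."""
        with self._lock:
            last = self.last_crypto
        if last is None:
            return False
        idle = now - last
        return self.idle_gap <= idle <= self.active_window


def dummy_vector_work():
    """Placeholder for a short burst of vector instructions."""
    acc = 0
    for i in range(64):
        acc ^= i * i
    return acc


def warmup_loop(policy, stop, interval=0.0002, work=dummy_vector_work,
                clock=time.monotonic):
    """Body of the auxiliary thread: poll the policy until `stop` is set."""
    while not stop.wait(interval):
        if policy.should_warm(clock()):
            work()
```

In this sketch the thread would be started with `threading.Thread(target=warmup_loop, args=(policy, stop), daemon=True)`, where `stop` is a `threading.Event`, and the cryptographic code path would call `policy.note_crypto_call(time.monotonic())`. Bounding warm-up to a window after recent traffic keeps an idle server from burning cycles.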

Finally, this paper reports a successful integration of ENG25519 into an unmodified DNS over TLS (DoT) server called unbound, which further highlights the practicality of ENG25519. We also report benchmarks of the TLS 1.3 handshake and DoT queries, achieving a speedup of 25% to 35% in TLS 1.3 handshakes per second and an improvement of 24% to 41% in the peak server throughput of DoT queries.
