HadaFS: A File System Bridging the Local and Shared Burst Buffer for Exascale Supercomputers

Authors: 

Xiaobin He, National Supercomputing Center in Wuxi; Bin Yang, Tsinghua University, Dept. of C.S; National Supercomputer Center in Wuxi; Jie Gao and Wei Xiao, National Supercomputing Center in Wuxi; Qi Chen, Tsinghua University, Dept. of C.S; Shupeng Shi and Dexun Chen, National Supercomputing Center in Wuxi; Weiguo Liu, Shandong University; Wei Xue, Tsinghua University, Dept. of C.S; Tsinghua University, BNRist.; National Supercomputer Center in Wuxi; Zuo-ning Chen, Chinese Academy of Engineering

Abstract: 

Current supercomputers introduce SSDs to form a Burst Buffer (BB) layer that meets the growing I/O requirements of HPC applications. BBs can be divided into two types by deployment location. The local BB is known for its scalability and performance; the shared BB offers advantages in data sharing and deployment cost. Unifying the advantages of the local BB and the shared BB is a key issue in the HPC community.

We propose HadaFS, a novel BB file system that brings the advantages of local BB deployments to shared BB deployments. First, HadaFS offers a new Localized Triage Architecture (LTA) to address the challenges of ultra-scale expansion and data sharing. Second, HadaFS proposes a full-path indexing approach with three metadata synchronization strategies to address both the complex metadata management of traditional file systems and its mismatch with application I/O behaviors. Finally, HadaFS integrates a data management tool named Hadash, which supports efficient data queries in the BB and accelerates data migration between the BB and traditional HPC storage. HadaFS has been deployed on the Sunway New-generation Supercomputer (SNS), where it serves hundreds of applications and scales to a maximum of 600,000 clients.
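The abstract contrasts full-path indexing with the metadata management of traditional hierarchical file systems. The minimal Python sketch below illustrates only the general idea, not HadaFS's actual implementation: a hierarchical namespace resolves `/a/b/c` one component at a time, whereas a flat table keyed by the full path answers the same lookup with a single probe. The class names, the `Meta` fields, and the probe counters are hypothetical, and the sketch omits the three metadata synchronization strategies, which the abstract does not describe.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Meta:
    """Hypothetical per-file metadata record (fields are illustrative)."""
    is_dir: bool = False
    size: int = 0


class FullPathIndex:
    """Flat metadata table keyed by the absolute path string."""

    def __init__(self) -> None:
        self.table: dict[str, Meta] = {"/": Meta(is_dir=True)}
        self.probes = 0  # key lookups performed by lookup()

    def create(self, path: str, is_dir: bool = False) -> None:
        self.table[path] = Meta(is_dir=is_dir)

    def lookup(self, path: str) -> Optional[Meta]:
        self.probes += 1  # one probe, however deep the path is
        return self.table.get(path)

    def list_dir(self, path: str) -> list[str]:
        # Trade-off: listing a directory becomes a prefix scan over keys.
        prefix = path.rstrip("/") + "/"
        return sorted(
            k[len(prefix):] for k in self.table
            if k.startswith(prefix) and k != prefix and "/" not in k[len(prefix):]
        )


class Node:
    def __init__(self, is_dir: bool = True) -> None:
        self.meta = Meta(is_dir=is_dir)
        self.children: dict[str, Node] = {}


class HierarchicalIndex:
    """Directory tree resolved component by component."""

    def __init__(self) -> None:
        self.root = Node()
        self.probes = 0

    def create(self, path: str, is_dir: bool = False) -> None:
        parts = [p for p in path.split("/") if p]
        node = self.root
        for p in parts[:-1]:
            node = node.children[p]  # parent directories must already exist
        node.children[parts[-1]] = Node(is_dir=is_dir)

    def lookup(self, path: str) -> Optional[Meta]:
        node: Optional[Node] = self.root
        for p in (p for p in path.split("/") if p):
            self.probes += 1  # one probe per path component
            node = node.children.get(p)
            if node is None:
                return None
        return node.meta


flat, tree = FullPathIndex(), HierarchicalIndex()
for path, is_dir in [("/job", True), ("/job/run1", True), ("/job/run1/ckpt", False)]:
    flat.create(path, is_dir)
    tree.create(path, is_dir)

flat.lookup("/job/run1/ckpt")
tree.lookup("/job/run1/ckpt")
print(flat.probes, tree.probes)  # 1 3
```

The cost moves rather than disappears: `list_dir` on the flat table must scan for keys sharing a prefix, which an ordered key-value store would typically serve as a range scan.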

Category: 
Deployed-Systems Paper

FAST '23 Open Access Sponsored by NetApp

BibTeX
@inproceedings {285766,
author = {Xiaobin He and Bin Yang and Jie Gao and Wei Xiao and Qi Chen and Shupeng Shi and Dexun Chen and Weiguo Liu and Wei Xue and Zuo-ning Chen},
title = {{HadaFS}: A File System Bridging the Local and Shared Burst Buffer for Exascale Supercomputers},
booktitle = {21st USENIX Conference on File and Storage Technologies (FAST 23)},
year = {2023},
isbn = {978-1-939133-32-8},
address = {Santa Clara, CA},
pages = {215--230},
url = {https://www.usenix.org/conference/fast23/presentation/he},
publisher = {USENIX Association},
month = feb
}
