Deployed System: Trading More Storage for Less Computation: A KVCache-centric Architecture for Serving LLM Chatbot

Authors:

Ruoyu Qin, Moonshot AI and Tsinghua University; Zheming Li, Weiran He, and Jialei Cui, Moonshot AI; Feng Ren, Mingxing Zhang, Yongwei Wu, and Weimin Zheng, Tsinghua University; Xinran Xu, Moonshot AI