Shadow Puppets: Cloud-level Accurate AI Inference at the Speed and Economy of Edge

Authors: 

Srikumar Venugopal, Michele Gazzetti, Yiannis Gkoufas, and Kostas Katrinis, IBM Research

Abstract: 

Extracting value from unstructured data produced by the Internet of Things and Humans is a major trend in capitalizing on digitization. To date, the design space for AI inference at the edge has been binary: either consume cloud-based inference services through edge APIs or run full-fledged deep models on edge devices. In this paper, we break this duality by proposing the Semantic Cache, an approach that blends the best features of both extremes of the current design space. Early evaluation results from a first prototype implementation of our semantic cache service on object classification tasks show a substantial reduction in inference latency compared to cloud-only inference, and high potential for achieving adequate accuracy across a wide range of AI use-cases.
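The abstract describes the semantic cache only at a high level. As an illustration of the general pattern it names, the sketch below shows one plausible edge-side lookup, assuming the cache keys requests by a lightweight feature embedding and falls back to a cloud classifier on a miss. All names (EdgeSemanticCache, embed, cloud_classify) and the similarity threshold are hypothetical stand-ins, not the paper's actual implementation.

# Minimal sketch of a semantic-cache hit path for edge inference.
# Hypothetical names throughout; the paper's real design may differ.

import numpy as np

def embed(image):
    """Stand-in for a lightweight edge-side feature extractor:
    a flattened, L2-normalised slice of the image."""
    v = image.astype(np.float32).ravel()[:256]
    v = np.resize(v, 256)            # pad/truncate to a fixed length
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def cloud_classify(image):
    """Placeholder for a remote call to a cloud inference service."""
    return "cat"                     # stub result

class EdgeSemanticCache:
    """Cache keyed by embedding similarity rather than exact input bytes."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold   # minimum cosine similarity for a hit
        self.keys = []               # stored embeddings (unit vectors)
        self.labels = []             # labels returned by the cloud

    def classify(self, image):
        q = embed(image)
        if self.keys:
            sims = np.stack(self.keys) @ q     # cosine similarity (unit vectors)
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                return self.labels[best]       # cache hit: edge-speed answer
        label = cloud_classify(image)          # cache miss: go to the cloud
        self.keys.append(q)
        self.labels.append(label)
        return label

cache = EdgeSemanticCache()
frame = np.random.rand(32, 32, 3)
print(cache.classify(frame))   # miss -> cloud round trip
print(cache.classify(frame))   # near-identical input -> local hit

In a real deployment the embedding would come from a compact edge model, and the cache would need eviction and staleness policies; the point here is only the hit-path short-circuit that returns a cloud-quality label at edge latency.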

BibTeX
@inproceedings{216787,
  author    = {Srikumar Venugopal and Michele Gazzetti and Yiannis Gkoufas and Kostas Katrinis},
  title     = {Shadow Puppets: Cloud-level Accurate {AI} Inference at the Speed and Economy of Edge},
  booktitle = {USENIX Workshop on Hot Topics in Edge Computing (HotEdge 18)},
  year      = {2018},
  address   = {Boston, MA},
  url       = {https://www.usenix.org/conference/hotedge18/presentation/venugopal},
  publisher = {USENIX Association},
  month     = jul
}