Publications

PipeEdge: Pipeline Parallelism for Large-Scale Model Inference on Heterogeneous Edge Devices

Abstract

Deep neural networks with large model sizes achieve state-of-the-art results for tasks in computer vision and natural language processing. However, such models are too compute- or memory-intensive for resource-constrained edge devices. Prior works on parallel and distributed execution primarily focus on training, rather than inference, using homogeneous accelerators in data centers. We propose PipeEdge, a distributed framework for edge systems that uses pipeline parallelism to both speed up inference and enable running larger, more accurate models that otherwise cannot fit on a single edge device. PipeEdge uses an optimal partition strategy that considers heterogeneity in compute, memory, and network bandwidth. Our empirical evaluation demonstrates that PipeEdge achieves 11.88× and 12.78× speedup using 16 edge devices for the ViT-Huge and BERT-Large models, respectively, with no accuracy …
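The partition strategy summarized above can be sketched in a few lines: split the model's layers into contiguous stages, one per device, minimizing the slowest stage (compute plus activation transfer) subject to each device's memory limit. This is a minimal illustrative sketch in the spirit of the abstract, not the paper's actual algorithm or cost model; all names, fields, and the brute-force search are assumptions.

```python
# Hypothetical sketch of a heterogeneity-aware pipeline-partition search.
# Layers and devices are plain dicts; the cost model (flops/speed + act/bandwidth)
# is an illustrative assumption, not PipeEdge's actual formulation.
from itertools import combinations

def partition_cost(layers, devices, bandwidth, cuts):
    """Max per-stage time for one assignment of contiguous stages to devices."""
    bounds = [0] + list(cuts) + [len(layers)]
    worst = 0.0
    for d, (lo, hi) in enumerate(zip(bounds, bounds[1:])):
        stage = layers[lo:hi]
        mem = sum(l["mem"] for l in stage)
        if mem > devices[d]["mem"]:
            return float("inf")  # stage violates this device's memory limit
        compute = sum(l["flops"] for l in stage) / devices[d]["speed"]
        # activation of the last layer is sent to the next stage, if any
        xfer = stage[-1]["act"] / bandwidth if hi < len(layers) else 0.0
        worst = max(worst, compute + xfer)
    return worst

def best_partition(layers, devices, bandwidth):
    """Try every way to cut the layer sequence into len(devices) stages."""
    n, k = len(layers), len(devices)
    best_cost, best_cuts = float("inf"), None
    for cuts in combinations(range(1, n), k - 1):
        cost = partition_cost(layers, devices, bandwidth, cuts)
        if cost < best_cost:
            best_cost, best_cuts = cost, cuts
    return best_cost, best_cuts
```

For example, with four identical layers and two devices where the second is twice as fast, the search assigns one layer to the slow device and three to the fast one, balancing the two stages. A real system would replace the exhaustive search with dynamic programming to scale to deep models.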

Metadata

publication: 2022 25th Euromicro Conference on Digital System Design (DSD), 298-307, 2022
year: 2022
publication date: 2022/8/31
authors: Yang Hu, Connor Imes, Xuanang Zhao, Souvik Kundu, Peter A. Beerel, Stephen P. Crago, John Paul Walters
link: https://ieeexplore.ieee.org/abstract/document/9996638/
conference: 2022 25th Euromicro Conference on Digital System Design (DSD)
pages: 298-307
publisher: IEEE