Publications
PipeEdge: Pipeline parallelism for large-scale model inference on heterogeneous edge devices
Abstract
Deep neural networks with large model sizes achieve state-of-the-art results for tasks in computer vision and natural language processing. However, such models are too compute- or memory-intensive for resource-constrained edge devices. Prior works on parallel and distributed execution primarily focus on training, rather than inference, using homogeneous accelerators in data centers. We propose PipeEdge, a distributed framework for edge systems that uses pipeline parallelism to both speed up inference and enable running larger, more accurate models that otherwise cannot fit on a single edge device. PipeEdge uses an optimal partition strategy that considers heterogeneity in compute, memory, and network bandwidth. Our empirical evaluation demonstrates that PipeEdge achieves 11.88× and 12.78× speedup using 16 edge devices for the ViT-Huge and BERT-Large models, respectively, with no accuracy …
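The abstract's core idea, partitioning a model's layers into contiguous pipeline stages across devices with different speeds, can be illustrated with a minimal sketch. This is not the paper's actual algorithm: it assumes a simplified cost model (stage time = summed layer cost divided by device speed, ignoring memory and network constraints) and searches cut points exhaustively to minimize the slowest stage, which bounds pipeline throughput. All names here are hypothetical.

```python
from itertools import combinations

def partition_layers(layer_costs, device_speeds):
    """Find the contiguous layer partition minimizing the slowest stage.

    Hypothetical cost model: stage time = sum(layer costs) / device speed.
    Returns (bottleneck_time, stage_boundaries).
    """
    n, k = len(layer_costs), len(device_speeds)
    best = (float("inf"), None)
    # Choose k-1 cut points among the n-1 gaps between layers.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        stage_times = [
            sum(layer_costs[bounds[i]:bounds[i + 1]]) / device_speeds[i]
            for i in range(k)
        ]
        bottleneck = max(stage_times)  # slowest stage limits throughput
        if bottleneck < best[0]:
            best = (bottleneck, bounds)
    return best

# Example: 4 layers split across a fast device (speed 2) and a slow one (speed 1).
time, bounds = partition_layers([4, 4, 2, 2], [2, 1])
# Best split assigns layers 0-1 to the fast device and 2-3 to the slow one.
```

A real partitioner (as the paper describes) would additionally respect per-device memory limits and inter-device bandwidth, and would use a more scalable search than brute force.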
Metadata
- publication: 2022 25th Euromicro Conference on Digital System Design (DSD), 298-307, 2022
- year: 2022
- publication date: 2022/8/31
- authors: Yang Hu, Connor Imes, Xuanang Zhao, Souvik Kundu, Peter A. Beerel, Stephen P. Crago, John Paul Walters
- link: https://ieeexplore.ieee.org/abstract/document/9996638/
- conference: 2022 25th Euromicro Conference on Digital System Design (DSD)
- pages: 298-307
- publisher: IEEE