Publications
RNA-ViT: reduced-dimension approximate normalized attention vision transformers for latency efficient private inference
Abstract
Concern over data and model privacy in machine learning inference as a service (MLaaS) has led to the development of private inference (PI) techniques. However, existing PI frameworks, especially those designed for large models such as vision transformers (ViTs), suffer from high computational and communication overheads caused by expensive multi-party computation (MPC) protocols. The encrypted attention module, which involves the softmax operation, contributes significantly to this overhead. In this work, we present a family of models, dubbed RNA-ViT, that leverage a novel attention module called reduced-dimension approximate normalized attention and a latency-efficient GeLU-alternative layer. In particular, RNA-ViT uses two novel techniques to improve PI efficiency in ViTs: a reduced-dimension normalized attention (RNA) architecture and a high-order polynomial (HOP) softmax approximation for …
Metadata
- publication: 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 1-9, 2023
- year: 2023
- publication date: 2023/10/28
- authors: Dake Chen, Yuke Zhang, Souvik Kundu, Chenghao Li, Peter A Beerel
- link: https://ieeexplore.ieee.org/abstract/document/10323702/
- resource_link: https://howardli0816.github.io/files/RNA_ViT.pdf
- conference: 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD)
- pages: 1-9
- publisher: IEEE