Publications

RNA-ViT: reduced-dimension approximate normalized attention vision transformers for latency efficient private inference

Abstract

Concern over data and model privacy in machine learning inference as a service (MLaaS) has led to the development of private inference (PI) techniques. However, existing PI frameworks, especially those designed for large models such as vision transformers (ViTs), suffer from high computational and communication overheads caused by expensive multi-party computation (MPC) protocols. The encrypted attention module, which involves the softmax operation, contributes significantly to this overhead. In this work, we present a family of models dubbed RNA-ViT that leverage a novel attention module called reduced-dimension approximate normalized attention and a latency-efficient GeLU-alternative layer. In particular, RNA-ViT uses two novel techniques to improve PI efficiency in ViTs: a reduced-dimension normalized attention (RNA) architecture and a high-order polynomial (HOP) softmax approximation for …

Metadata

Publication: 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 1-9, 2023
Year: 2023
Publication date: 2023/10/28
Authors: Dake Chen, Yuke Zhang, Souvik Kundu, Chenghao Li, Peter A Beerel
Link: https://ieeexplore.ieee.org/abstract/document/10323702/
Resource link: https://howardli0816.github.io/files/RNA_ViT.pdf
Conference: 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD)
Pages: 1-9
Publisher: IEEE