Publications: Information Sciences Institute

SAL-ViT: Towards Latency Efficient Private Inference on ViT using Selective Attention Search with a Learnable Softmax Approximation

Abstract

Recently, private inference (PI) has addressed the rising concern over data and model privacy in machine learning inference as a service. However, existing PI frameworks suffer from high computational and communication overheads due to expensive multi-party computation (MPC) protocols, particularly for large models such as vision transformers (ViTs). The majority of this overhead is due to the encrypted softmax operation in each self-attention layer. In this work, we present SAL-ViT, which uses two novel techniques to boost PI efficiency on ViTs. Our first technique is a learnable, PI-efficient approximation to softmax, namely learnable 2Quad (L2Q), which adds learnable scaling and shifting parameters to the prior 2Quad softmax approximation and thereby improves accuracy. Then, given our observation that external attention (EA) has lower PI latency than the widely adopted self-attention (SA) at the cost of accuracy, we present a selective attention search (SAS) method to combine the strengths of EA and SA. Specifically, for a given lightweight EA ViT, we leverage a constrained optimization procedure to selectively search for EA modules and replace them with SA alternatives to maximize accuracy. Our extensive experiments show that SAL-ViT achieves, on average, 1.28×, 1.28×, and 1.14× lower PI latency with 1.79%, 1.41%, and 2.08% higher accuracy than existing alternatives on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively.
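
To illustrate the idea behind a quadratic softmax approximation with scaling and shifting parameters, here is a minimal plain-Python sketch. It is not the authors' implementation: the function name `quad_softmax_approx`, the exact parameterization `(scale * x + shift)^2`, and the default values are assumptions made for illustration, and the paper defines the actual L2Q form. The sketch only shows why such an approximation can suit MPC: it replaces the exponential with additions, multiplications, and a single division.

```python
def quad_softmax_approx(scores, scale=1.0, shift=5.0, eps=1e-12):
    """Quadratic stand-in for softmax over one row of attention scores.

    Assumed form (hypothetical): w_i = (scale * x_i + shift)^2 / sum_j (scale * x_j + shift)^2.
    In a learnable variant, `scale` and `shift` would be trained parameters;
    here they are plain floats with illustrative defaults.
    """
    squared = [(scale * s + shift) ** 2 for s in scores]
    total = sum(squared) + eps  # eps guards against division by zero
    return [q / total for q in squared]


# Example: one row of raw attention scores.
weights = quad_softmax_approx([0.5, 1.0, -0.25], scale=1.0, shift=5.0)
print(weights)
```

The weights are non-negative and sum to approximately one, like softmax outputs. In this sketch the ordering of the scores is preserved only while `scale * x + shift` stays positive, which is one reason the choice of shift matters.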

Metadata

publication: Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
year: 2023
publication date: 2023
authors: Yuke Zhang, Dake Chen, Souvik Kundu, Chenghao Li, Peter A Beerel
link: http://openaccess.thecvf.com/content/ICCV2023/html/Zhang_SAL-ViT_Towards_Latency_Efficient_Private_Inference_on_ViT_using_Selective_ICCV_2023_paper.html
resource_link: https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_SAL-ViT_Towards_Latency_Efficient_Private_Inference_on_ViT_using_Selective_ICCV_2023_paper.pdf
conference: Proceedings of the IEEE/CVF International Conference on Computer Vision
pages: 5116-5125