Publications
Making Models Shallow Again: Jointly Learning to Reduce Non-Linearity and Depth for Latency-Efficient Private Inference
Abstract
The large number of ReLU and MAC operations in deep neural networks makes them ill-suited for latency- and compute-efficient private inference. In this paper, we present a model optimization method that allows a model to learn to be shallow. In particular, we leverage the ReLU sensitivity of a convolutional block to remove a ReLU layer and merge its succeeding and preceding convolution layers into a shallow block. Unlike existing ReLU reduction methods, our joint reduction method yields models with improved reduction of both ReLUs and linear operations, by up to 1.73× and 1.47×, respectively, evaluated with ResNet18 on CIFAR-100 without any significant accuracy drop.
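The merge step described in the abstract rests on a simple algebraic fact: once the ReLU between two stride-1 convolutions is removed, the pair is a composition of linear maps and collapses into a single equivalent convolution whose kernel is the spatial convolution of the two kernels (size k1 + k2 − 1). The sketch below is a minimal PyTorch illustration of that identity, not the paper's training procedure; `merge_convs` is a hypothetical helper, and bias-free, unpadded, ungrouped, stride-1 convolutions are assumed (ResNet-style convolutions carry no bias, since BatchNorm follows).

```python
# Minimal sketch (assumed implementation, not the paper's code): with the
# ReLU removed, two stacked stride-1 convolutions fuse into one convolution.
import torch
import torch.nn.functional as F

def merge_convs(conv1: torch.nn.Conv2d, conv2: torch.nn.Conv2d) -> torch.nn.Conv2d:
    """Fuse conv2(conv1(x)) into a single Conv2d.
    Assumes stride 1, no groups, no padding, and bias-free layers."""
    k1 = conv1.kernel_size[0]
    k2 = conv2.kernel_size[0]
    merged = torch.nn.Conv2d(conv1.in_channels, conv2.out_channels,
                             kernel_size=k1 + k2 - 1, bias=False)
    # Compose kernels: contract over the middle channels and convolve the
    # spatial dims (the flip turns PyTorch's cross-correlation into a true
    # convolution of the two kernels).
    w = F.conv2d(conv1.weight.permute(1, 0, 2, 3),   # (C_in, C_mid, k1, k1)
                 conv2.weight.flip([2, 3]),          # (C_out, C_mid, k2, k2)
                 padding=k2 - 1)
    merged.weight.data.copy_(w.permute(1, 0, 2, 3))  # (C_out, C_in, K, K)
    return merged

# Sanity check: the shallow fused layer matches the two-layer stack.
c1 = torch.nn.Conv2d(16, 32, 3, bias=False)
c2 = torch.nn.Conv2d(32, 64, 3, bias=False)
x = torch.randn(2, 16, 16, 16)
assert torch.allclose(merge_convs(c1, c2)(x), c2(c1(x)), atol=1e-4)
```

Note that the fused layer trades one layer of depth for a larger kernel, which is why the paper reports reduced linear (MAC) operation counts only when the merge is learned jointly with ReLU reduction rather than applied indiscriminately.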
Metadata
- publication: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
- year: 2023
- publication date: 2023
- authors: Souvik Kundu, Yuke Zhang, Dake Chen, Peter A. Beerel
- link: https://openaccess.thecvf.com/content/CVPR2023W/ECV/html/Kundu_Making_Models_Shallow_Again_Jointly_Learning_To_Reduce_Non-Linearity_and_CVPRW_2023_paper.html
- resource_link: https://openaccess.thecvf.com/content/CVPR2023W/ECV/papers/Kundu_Making_Models_Shallow_Again_Jointly_Learning_To_Reduce_Non-Linearity_and_CVPRW_2023_paper.pdf
- conference: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
- pages: 4685-4689