Publications
Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models
Abstract
Distilling state-of-the-art transformer models into lightweight student models is an effective way to reduce computation cost at inference time. However, the improved inference speed may still be unsatisfactory for certain time-sensitive applications. In this paper, we aim to further push the limit of inference speed by exploring a new area in the design space of the student model. More specifically, we consider distilling a transformer-based text classifier into a billion-parameter, sparsely-activated student model with an embedding-averaging architecture. Our experiments show that the student models retain 97% of the RoBERTa-Large teacher performance on a collection of six text classification tasks. Meanwhile, the student model achieves up to 600x speed-up on both GPUs and CPUs, compared to the teacher models. Further investigation shows that our pipeline is also effective in privacy-preserving and domain …
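The abstract only outlines the student architecture, so the sketch below is one plausible reading of a "sparsely-activated, embedding-averaging" student trained with knowledge distillation, not the paper's exact model. The class name, the use of `nn.EmbeddingBag`, and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingAveragingStudent(nn.Module):
    """Sketch of a sparsely-activated student: only the embedding rows for the
    tokens/n-grams present in an input are looked up, averaged, and classified."""
    def __init__(self, vocab_size: int, embed_dim: int, num_classes: int):
        super().__init__()
        # EmbeddingBag fuses lookup + mean pooling; sparse=True keeps gradient
        # updates restricted to the rows that were actually activated.
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean", sparse=True)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        pooled = self.embed(token_ids, offsets)   # (batch, embed_dim)
        return self.classifier(pooled)            # (batch, num_classes)

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Standard soft-label distillation: match the teacher's temperature-scaled
    output distribution with a KL-divergence objective."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
```

Because inference reduces to a handful of embedding lookups, an average, and one linear layer, only a tiny fraction of the billion parameters is touched per example, which is where the large CPU/GPU speed-ups over a full transformer teacher come from.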
Metadata
- publication: NAACL 2022, 2022
- year: 2022
- publication date: 2022/3/20
- authors: Qinyuan Ye, Madian Khabsa, Mike Lewis, Sinong Wang, Xiang Ren, Aaron Jaech
- link: https://scholar.google.com/scholar?cluster=90543384798526475&hl=en&oi=scholarr
- journal: NAACL 2022