Publications
Token pruning optimization for efficient multi-vector dense retrieval
Abstract
Multi-vector dense retrieval with ColBERT has been shown to strike a good tradeoff between relevance and efficiency for both in-domain and out-of-domain datasets through late interaction between queries and documents. However, the efficiency of ColBERT for a large-scale retrieval dataset is still constrained by its large memory footprint, as one embedding is stored per token; thus, previous work has studied static pruning of less significant tokens to enhance efficiency. To improve the adaptivity of prior work in zero-shot retrieval settings, this paper proposes a neural classification method that learns pruning decisions with Gumbel-Softmax, and provides an extension to adjust pruning decisions and meet memory space reduction requirements. We evaluate the effectiveness of our proposed method against several baseline approaches on the out-of-domain datasets LoTTE and BEIR, and the in-domain MS MARCO passage dataset.
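The ideas named in the abstract (late interaction, Gumbel-Softmax pruning decisions, and a memory budget) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy 2-dimensional embeddings, and the `keep_logits` values are all hypothetical, and the top-k budget rule is only one plausible way to meet a space reduction target.

```python
import math
import random


def maxsim_score(query_vecs, doc_vecs, keep_mask=None):
    """Late-interaction score: for each query token, take the maximum
    dot product over the kept document tokens, then sum over query tokens."""
    if keep_mask is None:
        keep_mask = [True] * len(doc_vecs)
    kept = [d for d, keep in zip(doc_vecs, keep_mask) if keep]
    if not kept:
        return 0.0  # assumption: a fully pruned document scores zero
    return sum(
        max(sum(qi * di for qi, di in zip(q, d)) for d in kept)
        for q in query_vecs
    )


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relaxed categorical sample: add Gumbel(0, 1) noise to the logits,
    divide by the temperature tau, and apply a softmax."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def keep_mask_for_budget(keep_scores, keep_fraction):
    """Keep the highest-scoring tokens so that about keep_fraction of the
    token embeddings remain (hypothetical budget rule, at least one token)."""
    n_keep = max(1, math.ceil(keep_fraction * len(keep_scores)))
    ranked = sorted(range(len(keep_scores)),
                    key=lambda i: keep_scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [i in kept for i in range(len(keep_scores))]


# Toy data: 2 query tokens, 4 document tokens, 2-dimensional embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.1, 0.8], [0.2, 0.2], [0.5, 0.5]]

# Hypothetical per-token "keep" logits, as a pruning classifier might emit.
keep_logits = [2.0, 1.5, -1.0, 0.0]

# Training-time view: a soft [keep, prune] decision for the first token.
soft_decision = gumbel_softmax([keep_logits[0], 0.0], tau=0.5,
                               rng=random.Random(0))

# Indexing-time view: a hard mask that stores only half of the embeddings.
mask = keep_mask_for_budget(keep_logits, keep_fraction=0.5)
full_score = maxsim_score(query, doc)
pruned_score = maxsim_score(query, doc, mask)
```

In this toy case the two pruned tokens never attain the maximum for any query token, so the pruned score matches the full score while storing half as many embeddings. Real documents will generally lose some relevance, which is the tradeoff the abstract describes.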
Metadata
- publication
- year: 2025
- publication date: 2025
- authors: Shanxiu He, Mutasem Al-Darabsah, Suraj Nair, Jonathan May, Tarun Agarwal, Tao Yang, Choon Hui Teo
- link: https://www.amazon.science/publications/token-pruning-optimization-for-efficient-multi-vector-dense-retrieval
- resource_link: https://www.amazon.science/publications/token-pruning-optimization-for-efficient-multi-vector-dense-retrieval