Publications
In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search
Abstract
To effectively use large language models (LLMs) for real-world queries, it is imperative that they generalize to the long-tail distribution, i.e., rare examples where models exhibit low confidence. In this work, we take a first step towards evaluating LLMs on the long-tail distribution of inferential knowledge, exemplified by the Natural Language Inference task. First, we introduce Logic-Induced-Knowledge-Search (LINK), a systematic long-tail data generation framework that produces factually correct yet long-tail inferential statements. LINK uses variable-wise prompting grounded in symbolic rules to seek out low-confidence statements while ensuring factual correctness. We then use LINK to curate Logic-Induced-Long-Tail (LINT), a large-scale long-tail inferential knowledge dataset containing 108K statements spanning four domains. Evaluating popular LLMs on LINT, we find that state-of-the-art models show a significant performance drop on long-tail data compared to data from the head of the distribution (a 21% relative drop for GPT-4), and that smaller models exhibit even weaker generalization. These results further underscore the necessity of long-tail evaluation in developing generalizable LLMs.
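The abstract only names LINK's mechanism (variable-wise prompting over symbolic rules, gated by confidence and factual-correctness checks), so the sketch below illustrates one plausible reading of that generate-then-filter loop. Everything in it is a hypothetical stand-in, not the authors' implementation: `Rule`, `propose_values`, `sequence_logprob`, `verify_fact`, the threshold, and the example rule are all placeholders for the paper's actual LLM calls.

```python
"""Minimal, hypothetical sketch of a LINK-style generate-then-filter loop:
fill a symbolic rule's variable slot one value at a time, keep fillers that
pass a fact check but score LOW model confidence (i.e. long-tail)."""

from dataclasses import dataclass


@dataclass
class Rule:
    premise: str     # template with an {X} slot, e.g. "{X} works as a chef"
    conclusion: str  # implied statement, e.g. "{X} can cook"


def propose_values(slot_template: str, n: int) -> list[str]:
    """Toy stand-in for variable-wise prompting: an LLM would be asked to
    propose rare fillers for the slot; here we return a fixed list."""
    return ["a line cook", "a sous-chef", "a ship's galley steward"][:n]


def sequence_logprob(statement: str) -> float:
    """Toy stand-in for model confidence: an LLM would score the statement's
    average token log-probability; here, a crude length-based proxy."""
    return -0.5 * len(statement.split())


def verify_fact(statement: str) -> bool:
    """Toy stand-in for a critic model that rejects substitutions which
    break factual correctness; here it accepts everything."""
    return True


def link_search(rule: Rule, n_candidates: int = 3,
                tail_threshold: float = -6.0) -> list[str]:
    """Keep fillers that pass the fact check yet fall below the confidence
    threshold: factually correct but long-tail statements."""
    kept = []
    for value in propose_values(rule.premise, n_candidates):
        statement = (rule.premise.format(X=value) + ", therefore "
                     + rule.conclusion.format(X=value))
        if verify_fact(statement) and sequence_logprob(statement) < tail_threshold:
            kept.append(statement)
    return kept


if __name__ == "__main__":
    rule = Rule(premise="{X} works as a chef", conclusion="{X} can cook")
    for s in link_search(rule):
        print(s)
```

The design point this sketch tries to capture is the two opposing gates: a critic keeps the statement factually correct while a low-confidence filter pushes the search away from the head of the distribution, which is what distinguishes long-tail generation from ordinary sampling.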
Metadata
- publication: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
- year: 2024
- publication date: 2023/11/13
- authors: Huihan Li, Yuting Ning, Zeyi Liao, Siyuan Wang, Xiang Lorraine Li, Ximing Lu, Wenting Zhao, Faeze Brahman, Yejin Choi, Xiang Ren
- link: https://arxiv.org/abs/2311.07237
- resource_link: https://arxiv.org/pdf/2311.07237
- journal: arXiv preprint arXiv:2311.07237