Publications

Discover Artificial Intelligence

Abstract

Entity Resolution (ER) is the problem of semi-automatically determining when two entities refer to the same underlying entity, with applications ranging from healthcare to e-commerce. Traditional ER solutions required considerable manual expertise, including domain-specific feature engineering, as well as identification and curation of training data. Recently released large language models (LLMs) provide an opportunity to make ER more seamless and domain-independent. Because of LLMs' pre-trained knowledge, the matching step in ER can be made easier by just prompting. However, it is also well known that LLMs can pose risks, that the quality of their outputs can depend on how prompts are engineered, and that the cost of using LLMs can be significant. Unfortunately, a systematic experimental study on the effects of different prompting methods and their respective cost for solving domain-specific entity matching using LLMs, like ChatGPT, has been lacking thus far. This paper aims to address this gap by conducting such a study. We consider some relatively simple and cost-efficient ER prompt engineering methods and apply them to perform product matching on two real-world datasets widely used in the community. We select two well-known e-commerce datasets and provide extensive experimental results to show that an LLM like GPT-3.5 is viable for high-performing product matching and, interestingly, that more complicated and detailed (and hence, expensive) prompting methods do not necessarily outperform simpler approaches. We provide brief discussions on qualitative and error analysis, including a study of the inter-consistency of …
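As a rough illustration of the prompt-based matching the abstract describes, the sketch below frames product matching as a yes/no question to GPT-3.5 through the OpenAI chat completions API. The prompt wording, the `match_products` helper, and the example records are illustrative assumptions, not the prompts evaluated in the paper.

```python
# Minimal sketch of LLM-based product matching via prompting.
# The prompt template and helper are illustrative assumptions;
# they are not the exact prompts studied in the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT_TEMPLATE = (
    "Do the following two product descriptions refer to the same product?\n"
    "Product A: {a}\n"
    "Product B: {b}\n"
    "Answer with a single word: Yes or No."
)

def match_products(record_a: str, record_b: str) -> bool:
    """Ask GPT-3.5 whether two product records refer to the same entity."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": PROMPT_TEMPLATE.format(a=record_a, b=record_b)}
        ],
        temperature=0,  # keep outputs as deterministic as possible for evaluation
    )
    answer = response.choices[0].message.content.strip().lower()
    return answer.startswith("yes")

if __name__ == "__main__":
    # Hypothetical product records for demonstration only.
    print(match_products(
        "Apple iPhone 13, 128GB, Midnight",
        "iPhone 13 (128 GB) - Midnight by Apple",
    ))
```

More elaborate variants (e.g., adding attribute-by-attribute comparisons or worked examples to the prompt) would increase token usage and thus cost, which is the trade-off the paper examines.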

Metadata

publication
Discover Artificial Intelligence 4, 56, 2024
year
2024
publication date
2024
authors
Navapat Nananukul, Khanin Sisaengsuwanchai, Mayank Kejriwal
link
https://scholar.google.com/scholar?cluster=5013402805236231058&hl=en&oi=scholarr
journal
Discover Artificial Intelligence
volume
4
pages
56