Publications
Exploiting Distant Supervision to Learn Semantic Descriptions of Tables with Overlapping Data
Abstract
Understanding the semantic structure of tabular data is essential for data integration and discovery. Specifically, the goal is to annotate columns in a tabular source with types and relationships between them using classes and predicates of a target ontology. Previous work that exploits the matches between entities in a knowledge graph and the table data does not perform well for tables with noisy or ambiguous data. A key reason for this poor performance is the limited amount of labeled data to train these methods. To address this problem, we propose a novel distant supervision approach that leverages existing Wikipedia tables and hyperlinks to automatically label tables with their semantic descriptions. Then, we use the labeled dataset to train neural network models to predict the semantic description of a new table. Our empirical evaluation shows that using the automatically labeled dataset provides …
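The labeling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each table cell may carry a hyperlink resolved to an entity ID, and that a lookup from entity to ontology class is available. The `ENTITY_TYPES` mapping, the entity IDs, the `min_support` threshold, and the majority-vote rule are all hypothetical choices made for illustration, and the sketch covers only column types, not relationships between columns.

```python
from collections import Counter

# Hypothetical toy knowledge graph: entity ID -> ontology classes.
# In practice this lookup would come from a real knowledge graph.
ENTITY_TYPES = {
    "Q_Paris": ["City"],
    "Q_Lyon": ["City"],
    "Q_Berlin": ["City"],
    "Q_France": ["Country"],
    "Q_Germany": ["Country"],
}


def label_column(linked_entities, min_support=0.5):
    """Return the most frequent class among a column's linked cells.

    `linked_entities` holds one entity ID per cell, or None when the
    cell has no hyperlink. Returns None when no class is supported by
    at least `min_support` of the cells (an assumed threshold).
    """
    counts = Counter()
    for entity in linked_entities:
        if entity is None:
            continue
        for cls in ENTITY_TYPES.get(entity, []):
            counts[cls] += 1
    if not counts:
        return None
    cls, freq = counts.most_common(1)[0]
    return cls if freq / len(linked_entities) >= min_support else None


def label_table(columns):
    """Label every column of a table given as {column name: linked entities}."""
    return {name: label_column(cells) for name, cells in columns.items()}


# Toy table whose cells have already been resolved to entity IDs.
table = {
    "city": ["Q_Paris", "Q_Lyon", None, "Q_Berlin"],
    "country": ["Q_France", "Q_France", "Q_Germany", "Q_Germany"],
    "population": [None, None, None, None],
}
labels = label_table(table)
```

Columns with no linked cells, such as `population` here, stay unlabeled; automatically labeled tables of this kind could then serve as training data for a model that predicts descriptions of tables without links.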
Metadata
- publication: International Semantic Web Conference, 116-134, 2024
- year: 2024
- publication date: 2024/11/11
- authors: Binh Vu, Craig A Knoblock, Basel Shbita, Fandel Lin
- link: https://link.springer.com/chapter/10.1007/978-3-031-77850-6_7
- book: International Semantic Web Conference
- pages: 116-134
- publisher: Springer Nature Switzerland