Publications

Improving Covert Toxicity Detection by Retrieving and Generating References

Abstract

Models for detecting toxic content play an important role in keeping people safe online. There has been much progress in detecting overt toxicity. Covert toxicity, however, remains a challenge because its detection requires an understanding of implicit meaning and subtle connotations. In this paper, we explore the potential of leveraging references, such as external knowledge and textual interpretations, to enhance the detection of covert toxicity. We run experiments on two covert toxicity datasets with two types of references: 1) information retrieved from a search API, and 2) interpretations generated by large language models. We find that both types of references improve detection, with the latter being more useful than the former. We also find that generating interpretations grounded in properties of covert toxicity, such as humor and irony, leads to the largest improvements.
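The reference-augmented detection setting described in the abstract can be sketched as follows. Everything here is a hypothetical stand-in: `generate_interpretation` substitutes for an LLM prompted to explain implicit meaning, and `classify` substitutes for a trained toxicity model; the sketch only illustrates the pipeline shape of augmenting the input with a generated reference before classification.

```python
# Sketch of reference-augmented covert toxicity detection.
# Both helper functions are hypothetical stand-ins, NOT the paper's
# actual components (which use a search API / LLM and a trained detector).

def generate_interpretation(comment: str) -> str:
    """Stand-in for an LLM prompted to surface implicit meaning,
    e.g. grounded in properties such as humor or irony."""
    return f"Possible implied meaning of: {comment}"

def classify(text: str) -> bool:
    """Stand-in toxicity classifier; a real system would use a
    fine-tuned model. Here: a trivial keyword match for illustration."""
    toxic_markers = {"mock", "slur"}
    return any(marker in text.lower() for marker in toxic_markers)

def detect_with_reference(comment: str) -> bool:
    """Append a generated reference to the input before classifying,
    mirroring the 'generated interpretations' setting."""
    reference = generate_interpretation(comment)
    augmented = f"{comment}\n[reference] {reference}"
    return classify(augmented)
```

The key design point is that the detector sees the original comment and the reference together, so signals that are only implicit in the comment can surface through the appended interpretation.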

Metadata

publication
Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024), 266-274, 2024
year
2024
publication date
2024/6
authors
Dong-Ho Lee, Hyundong Cho, Woojeong Jin, Jihyung Moon, Sungjoon Park, Paul Röttger, Jay Pujara, Roy Ka-Wei Lee
link
https://aclanthology.org/2024.woah-1.21/
resource_link
https://aclanthology.org/2024.woah-1.21.pdf
conference
Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)
pages
266-274