Publications

Harmful speech detection by language models exhibits gender-queer dialect bias

Abstract

Trigger Warning: Profane Language, Slurs

Content moderation on social media platforms shapes the dynamics of online discourse, influencing whose voices are amplified and whose are suppressed. Recent studies have raised concerns about the fairness of content moderation practices, particularly the aggressive flagging of posts from transgender and non-binary individuals as toxic. In this study, we investigate bias in harmful speech classification of gender-queer dialect online, focusing specifically on the treatment of reclaimed slurs. We introduce a novel dataset, QueerReclaimLex, based on 109 curated templates exemplifying non-derogatory uses of LGBTQ+ slurs. Dataset instances are scored by gender-queer annotators for potential harm depending on additional context about speaker identity. We systematically evaluate the performance of five off-the-shelf language models in assessing the …
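To make the evaluation setup concrete, below is a minimal sketch of a template-based bias audit in the spirit the abstract describes: fill curated templates with reclaimed slurs and speaker-identity context, then score the instances with an off-the-shelf classifier. The templates, the slur/identity fills, and the choice of `unitary/toxic-bert` are all illustrative assumptions; the paper's actual QueerReclaimLex templates, annotator harm scores, and five evaluated models are not reproduced here.

```python
# Illustrative sketch only; not the paper's pipeline. Assumes the
# HuggingFace `transformers` library and an arbitrary off-the-shelf
# toxicity classifier.
from transformers import pipeline

# Hypothetical templates exemplifying non-derogatory (reclaimed) slur use.
# [SLUR] is filled with an in-group term; [ID] adds speaker-identity context.
templates = [
    "as a [ID], i love being a [SLUR]",
    "me and my fellow [SLUR]s are going out tonight",
]

def instantiate(template: str, slur: str, identity: str = "trans woman") -> str:
    """Fill a template with a slur and speaker-identity context."""
    return template.replace("[SLUR]", slur).replace("[ID]", identity)

# One off-the-shelf classifier stands in for the five models evaluated.
clf = pipeline("text-classification", model="unitary/toxic-bert")

for template in templates:
    text = instantiate(template, "queer")
    pred = clf(text)[0]  # e.g. {'label': 'toxic', 'score': 0.97}
    # A high toxicity score on a non-derogatory, in-group use is a false
    # positive -- the kind of dialect bias such a study would measure.
    print(f"{text!r} -> {pred['label']} ({pred['score']:.2f})")
```

In a full audit, these model scores would be compared against the gender-queer annotators' harm labels, with false-positive rates broken down by whether speaker-identity context is present.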

Metadata

publication: Proceedings of the 4th ACM Conference on Equity and Access in Algorithms …, 2024
year: 2024
publication date: 2024/10/29
authors: Rebecca Dorn, Lee Kezar, Fred Morstatter, Kristina Lerman
link: https://dl.acm.org/doi/abs/10.1145/3689904.3694704
resource_link: https://dl.acm.org/doi/pdf/10.1145/3689904.3694704
book: Proceedings of the 4th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization
pages: 1-12