Publications
Harmful speech detection by language models exhibits gender-queer dialect bias
Abstract
Trigger Warning: Profane Language, Slurs

Content moderation on social media platforms shapes the dynamics of online discourse, influencing whose voices are amplified and whose are suppressed. Recent studies have raised concerns about the fairness of content moderation practices, particularly that they aggressively flag posts from transgender and non-binary individuals as toxic. In this study, we investigate bias in the harmful speech classification of gender-queer dialect online, focusing specifically on the treatment of reclaimed slurs. We introduce a novel dataset, QueerReclaimLex, based on 109 curated templates exemplifying non-derogatory uses of LGBTQ+ slurs. Dataset instances are scored by gender-queer annotators for potential harm depending on additional context about speaker identity. We systematically evaluate the performance of five off-the-shelf language models in assessing the …
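The evaluation setup can be pictured as scoring the same templated utterance under different speaker-identity contexts and comparing classifier outputs. Below is a minimal sketch, assuming the Hugging Face transformers pipeline and the public unitary/toxic-bert model; neither is claimed to be among the five models the paper evaluates, and the template is a hypothetical stand-in for QueerReclaimLex instances, not taken from the dataset.

```python
# Illustrative only: probing an off-the-shelf toxicity classifier with a
# QueerReclaimLex-style template. The model and template are stand-ins,
# not the paper's actual models or data.
from transformers import pipeline

# Any public toxicity classifier works here; unitary/toxic-bert is one example.
classifier = pipeline("text-classification", model="unitary/toxic-bert")

# A template pairs the same term with different speaker contexts, so the
# classifier's scores can be compared across identity contexts.
contexts = {
    "in-group reclaimed use": "As a queer person, I love being queer.",
    "no identity context": "I love being queer.",
}

for label, text in contexts.items():
    result = classifier(text)[0]  # returns [{'label': ..., 'score': ...}]
    print(f"{label}: {result['label']} ({result['score']:.3f}) -- {text!r}")
```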
Metadata
- publication: Proceedings of the 4th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, 2024
- year: 2024
- publication date: 2024/10/29
- authors: Rebecca Dorn, Lee Kezar, Fred Morstatter, Kristina Lerman
- link: https://dl.acm.org/doi/abs/10.1145/3689904.3694704
- resource_link: https://dl.acm.org/doi/pdf/10.1145/3689904.3694704
- book: Proceedings of the 4th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization
- pages: 1-12