Publications : Information Sciences Institute

Data efficient child-adult speaker diarization with simulated conversations

Abstract

Automating child speech analysis is crucial for applications such as neurocognitive assessments. Speaker diarization, which identifies "who spoke when", is an essential component of the automated analysis. However, publicly available child-adult speaker diarization solutions are scarce due to privacy concerns and a lack of annotated datasets, while manually annotating data for each scenario is both time-consuming and costly. To overcome these challenges, we propose a data-efficient solution by creating simulated child-adult conversations using AudioSet. We then train a Whisper Encoder-based model, achieving strong zero-shot performance on child-adult speaker diarization using real datasets. The model performance improves substantially when fine-tuned with only 30 minutes of real training data, with LoRA further improving the transfer learning performance. The source code and the child-adult speaker …
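
The idea of building labeled training conversations from isolated utterances can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 0.8 turn-switch probability, the silence gaps, and the toy sample lists standing in for AudioSet clips are all assumptions made for illustration, and overlapping speech is not modeled.

```python
import random

# Label ids for "who spoke when" (hypothetical encoding).
SILENCE, CHILD, ADULT = 0, 1, 2


def simulate_conversation(child_utts, adult_utts, n_turns, max_gap, rng):
    """Concatenate randomly chosen utterances into one stream.

    Returns (audio, labels), where labels holds one speaker id per sample.
    """
    audio, labels = [], []
    speaker = rng.choice([CHILD, ADULT])
    for _ in range(n_turns):
        pool = child_utts if speaker == CHILD else adult_utts
        utt = rng.choice(pool)
        # Insert a random stretch of silence before each turn.
        gap = rng.randint(0, max_gap)
        audio.extend([0.0] * gap)
        labels.extend([SILENCE] * gap)
        audio.extend(utt)
        labels.extend([speaker] * len(utt))
        # Assumed turn-taking rule: switch speaker with probability 0.8.
        if rng.random() < 0.8:
            speaker = ADULT if speaker == CHILD else CHILD
    return audio, labels


def frame_labels(labels, frame_len):
    """Reduce per-sample labels to one majority label per frame."""
    frames = []
    for start in range(0, len(labels), frame_len):
        chunk = labels[start:start + frame_len]
        frames.append(max(set(chunk), key=chunk.count))
    return frames


# Toy "utterances": constant-valued sample lists standing in for real clips.
rng = random.Random(0)
child_utts = [[0.1] * 40, [0.2] * 25]
adult_utts = [[-0.1] * 60, [-0.2] * 30]

audio, labels = simulate_conversation(
    child_utts, adult_utts, n_turns=6, max_gap=10, rng=rng
)
frames = frame_labels(labels, frame_len=20)
```

The frame-level labels are the kind of target a frame-wise classifier on top of an audio encoder could be trained against; the frame length here is arbitrary and would need to match the encoder's output rate in practice.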

Metadata

publication: arXiv preprint arXiv:2409.08881, 2024
year: 2024
publication date: 2024/9/13
authors: Anfeng Xu, Tiantian Feng, Helen Tager-Flusberg, Catherine Lord, Shrikanth Narayanan
link: https://ieeexplore.ieee.org/abstract/document/10889307/
resource_link: https://arxiv.org/pdf/2409.08881
journal: arXiv preprint arXiv:2409.08881