Publications
Audio-visual activity guided cross-modal identity association for active speaker detection
Abstract
Active speaker detection in videos addresses associating a source face, visible in the video frames, with the underlying speech in the audio modality. The two primary sources of information to derive such a speech-face relationship are i) visual activity and its interaction with the speech signal and ii) co-occurrences of speakers' identities across modalities in the form of face and speech. The two approaches have their limitations: the audio-visual activity models get confused with other frequently occurring vocal activities, such as laughing and chewing, while the speakers' identity-based methods are limited to videos having enough disambiguating information to establish a speech-face association. Since the two approaches are independent, we investigate their complementary nature in this work. We propose a novel unsupervised framework to guide the speakers' cross-modal identity association with the audio …
Metadata
- publication: IEEE Open Journal of Signal Processing 4, 225-232, 2023
- year: 2023
- publication date: 2023/4/14
- authors: Rahul Sharma, Shrikanth Narayanan
- link: https://ieeexplore.ieee.org/abstract/document/10102534/
- resource_link: https://ieeexplore.ieee.org/iel7/8782710/9006934/10102534.pdf
- journal: IEEE Open Journal of Signal Processing
- volume: 4
- pages: 225-232
- publisher: IEEE