Publications – Information Sciences Institute

Wavelet Scattering Network Features for Intensity Category Classification and Prediction of SPL from Speech

Abstract

Speakers change vocal intensity in daily life to communicate over long distances and to express vocal emotions. Humans produce speech in different intensity categories (e.g., soft, normal, and loud voice) and can regulate intensity across a wide sound pressure level (SPL) range. Knowing the intensity category or the SPL of speech is beneficial in speech-based biomarking of health. Recent studies have explored vocal intensity category classification and SPL prediction from speech that has been recorded without SPL calibration information and is presented on an arbitrary amplitude scale. Using speech signals in such a scenario, this study investigates wavelet scattering network (WSN) features in two tasks: (1) classification of speech into four intensity categories (soft, normal, loud, very loud), a multi-class classification task, and (2) prediction of SPL, a regression task. In the former task, the WSN …

Metadata

publication: IEEE International Conference on Acoustics, Speech, and Signal Processing, 2025
year: 2025
publication date: 2025/4/6
authors: Manila Kodali, Sudarsana Reddy Kadiri, Shrikanth Narayanan, Paavo Alku
link: https://ieeexplore.ieee.org/iel8/10887540/10887541/10888824.pdf
journal: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
pages: 1-5
publisher: IEEE