Publications
Speech acoustics to rt-MRI articulatory dynamics inversion with video diffusion model
Abstract
Inverting speech acoustics to articulatory dynamics presents a multidisciplinary challenge, spanning clinical, linguistic, and engineering domains, with applications including speech therapy and second language learning. Despite its significance, existing methods lack a systematic approach for generating articulatory dynamics of a more complete vocal tract from speech acoustics. The availability of spatio-temporally rich video covering the entire oro-pharyngeal and laryngeal region of the vocal tract during speech at high frame rates (83 frames/second) using real-time MRI (rt-MRI), alongside linguistic-theory-guided computational frameworks, offers new possibilities to improve speech-to-articulatory inversion. In this work, we propose a novel system for inverting speech acoustics to articulatory dynamics using an rt-MRI-driven video diffusion model. Additionally, we introduce a new evaluation method, a linguistic knowledge …
- Date: December 31, 2025
- Authors: Xuan Shi, Tiantian Feng, Jay Park, Christina Hagedorn, Louis Goldstein, Shrikanth Narayanan
- Journal: Computer Speech & Language
- Pages: 101928
- Publisher: Academic Press