Publications

Speech acoustics to rt-MRI articulatory dynamics inversion with video diffusion model

Abstract

Inverting speech acoustics to articulatory dynamics presents a multidisciplinary challenge, spanning clinical, linguistic, and engineering domains, with applications including speech therapy and second language learning. Despite its significance, existing methods lack a systematic approach for generating articulatory dynamics of a more complete vocal tract from speech acoustics. The availability of spatio-temporally rich video covering the entire oro-pharynx and laryngeal region of the vocal tract during speech, acquired at high frame rates (83 frames/second) using real-time MRI (rt-MRI), alongside linguistic-theory-guided computational frameworks, offers new possibilities to improve speech-to-articulatory inversion. In this work, we propose a novel system for inverting speech acoustics to articulatory dynamics using an rt-MRI-driven video diffusion model. Additionally, we introduce a new evaluation method, a linguistic knowledge …

Date
December 31, 2025
Authors
Xuan Shi, Tiantian Feng, Jay Park, Christina Hagedorn, Louis Goldstein, Shrikanth Narayanan
Journal
Computer Speech & Language
Pages
101928
Publisher
Academic Press