Publications : Information Sciences Institute

Emotion-aligned contrastive learning between images and music

Abstract

Traditional music search engines rely on retrieval methods that match natural language queries with music metadata. There have been increasing efforts to expand retrieval methods to consider the audio characteristics of music itself, using queries of various modalities including text, video, and speech. While most approaches aim to match general music semantics to the input queries, only a few focus on affective qualities. In this work, we address the task of retrieving emotionally-relevant music from image queries by learning an affective alignment between images and music audio. Our approach focuses on learning an emotion-aligned joint embedding space between images and music. This embedding space is learned via emotion-supervised contrastive learning, using an adapted cross-modal version of the SupCon loss. We evaluate the joint embeddings through cross-modal retrieval tasks (image-to-music …
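
The abstract names an adapted cross-modal version of the SupCon loss but does not give its formulation here. As a rough illustration only, the sketch below shows one plausible image-to-music form: each image embedding acts as an anchor, and every music clip sharing its emotion label counts as a positive. The function name `cross_modal_supcon`, the `temperature` default, and the choice to skip anchors with no positives are assumptions for this example, not details taken from the paper.

```python
import numpy as np

def cross_modal_supcon(img_emb, mus_emb, img_labels, mus_labels, temperature=0.1):
    """Image-to-music supervised contrastive loss (illustrative sketch, not the paper's code)."""
    # L2-normalize so that dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    mus = mus_emb / np.linalg.norm(mus_emb, axis=1, keepdims=True)

    # Similarity of every image anchor to every music candidate.
    logits = img @ mus.T / temperature  # shape: (n_images, n_music)

    # Numerically stable log-softmax over the music candidates.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives: music clips with the same emotion label as the image anchor.
    pos = (img_labels[:, None] == mus_labels[None, :]).astype(float)
    n_pos = pos.sum(axis=1)

    # Assumption: anchors without any positive in the batch are skipped.
    valid = n_pos > 0
    if not valid.any():
        return 0.0

    # Average negative log-probability over each anchor's positives.
    per_anchor = -(pos * log_prob).sum(axis=1)[valid] / n_pos[valid]
    return float(per_anchor.mean())
```

Calling the function a second time with the image and music arguments swapped and averaging the two values would give a symmetric variant; whether the paper does this is not stated in the abstract.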

Metadata

publication: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024
year: 2024
publication date: 2024/4/14
authors: Shanti Stewart, Kleanthis Avramidis, Tiantian Feng, Shrikanth Narayanan
link: https://ieeexplore.ieee.org/abstract/document/10447276/
resource_link: https://arxiv.org/pdf/2308.12610
conference: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
pages: 8135-8139
publisher: IEEE