%0 Journal Article
%A Pan, Jia-Yu
%A Yang, HyungJeong
%A Duygulu, Pinar
%A Faloutsos, Christos
%D 2004
%T Automatic Multimedia Cross-modal Correlation Discovery
%U https://kilthub.cmu.edu/articles/journal_contribution/Automatic_Multimedia_Cross-modal_Correlation_Discovery/6603758
%R 10.1184/R1/6603758.v1
%2 https://kilthub.cmu.edu/ndownloader/files/12094136
%K Design
%K Experimentation
%K Information and Computing Sciences not elsewhere classified
%X Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects, such as video clips with colors, motion, audio, and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-modal correlations. Our "MMG" method requires no tuning, no clustering, and no user-determined constants; it can be applied to any multimedia collection, as long as we have a similarity function for each medium; and it scales linearly with the database size. We report auto-captioning experiments on the "standard" Corel image database of 680 MB, where MMG outperforms domain-specific, fine-tuned methods by up to 10 percentage points in captioning accuracy (50% relative improvement).
%I Carnegie Mellon University