%0 Journal Article
%A Pan, Jia-Yu
%A Yang, HyungJeong
%A Duygulu, Pinar
%A Faloutsos, Christos
%D 1975
%T Automatic Multimedia Cross-modal Correlation Discovery
%U https://kilthub.cmu.edu/articles/journal_contribution/Automatic_Multimedia_Cross-modal_Correlation_Discovery/6603758
%R 10.1184/R1/6603758.v1
%2 https://kilthub.cmu.edu/ndownloader/files/12094136
%K Design
%K Experimentation
%X Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects like video clips, with colors, and/or motion, and/or audio, and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-modal correlations. Our "MMG" method requires no tuning, no clustering, no user-determined constants; it can be applied to any multimedia collection, as long as we have a similarity function for each medium; and it scales linearly with the database size. We report auto-captioning experiments on the "standard" Corel image database of 680 MB, where it outperforms domain-specific, fine-tuned methods by up to 10 percentage points in captioning accuracy (50% relative improvement).
%I Carnegie Mellon University