Automatic Multimedia Cross-modal Correlation Discovery
Jia-Yu Pan, HyungJeong Yang, Pinar Duygulu, Christos Faloutsos
DOI: 10.1184/R1/6603758.v1
https://kilthub.cmu.edu/articles/journal_contribution/Automatic_Multimedia_Cross-modal_Correlation_Discovery/6603758

Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects, such as video clips with colors, and/or motion, and/or audio, and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-modal correlations. Our "MMG" method requires no tuning, no clustering, and no user-determined constants; it can be applied to any multimedia collection, as long as we have a similarity function for each medium; and it scales linearly with the database size. We report auto-captioning experiments on the "standard" Corel image database of 680 MB, where it outperforms domain-specific, fine-tuned methods by up to 10 percentage points in captioning accuracy (a 50% relative improvement).

General Terms: Design, Experimentation
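The abstract states only that MMG is graph-based and needs nothing beyond a per-medium similarity function, without spelling out the graph construction or the correlation-propagation step. The following minimal Python sketch is an illustration of that general idea under stated assumptions, not the authors' exact MMG algorithm: objects are linked to their most similar neighbors in one medium (here, a toy color-feature space with a cosine similarity), and caption words are scored for an uncaptioned query by how often they label its graph neighbors. All names (`build_similarity_edges`, `suggest_caption_words`, `k_neighbors`, the toy data) are hypothetical.

```python
# Hedged sketch of a similarity-graph approach to auto-captioning.
# This is NOT the paper's MMG construction; it only illustrates how a
# per-medium similarity function can induce a graph used for captioning.
from collections import defaultdict
import numpy as np


def build_similarity_edges(features, k_neighbors=3):
    """Connect each object to its k most similar objects in one medium."""
    feats = np.asarray(features, dtype=float)
    # Cosine similarity between all pairs of feature vectors.
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims = unit @ unit.T
    edges = defaultdict(set)
    for i in range(len(feats)):
        order = np.argsort(-sims[i])
        for j in order[1:k_neighbors + 1]:  # position 0 is the object itself
            edges[i].add(int(j))
            edges[int(j)].add(i)
    return edges


def suggest_caption_words(query_idx, edges, captions, top_n=3):
    """Score caption words by how often they label the query's neighbors."""
    scores = defaultdict(float)
    for neighbor in edges[query_idx]:
        for word in captions.get(neighbor, []):
            scores[word] += 1.0
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    # Toy data: 4 captioned images plus 1 uncaptioned query (index 4).
    image_features = [
        [0.9, 0.1], [0.8, 0.2],   # "beach"-like color features
        [0.1, 0.9], [0.2, 0.8],   # "forest"-like color features
        [0.85, 0.15],             # query image, visually near the beaches
    ]
    captions = {0: ["sand", "sea"], 1: ["sea", "sky"],
                2: ["tree", "grass"], 3: ["tree", "sky"]}
    graph = build_similarity_edges(image_features, k_neighbors=2)
    print(suggest_caption_words(4, graph, captions))  # e.g. ['sea', 'sand', 'sky']
```

Because the graph is built solely from a similarity function per medium, the same sketch could mix image, audio, and text nodes in one graph; the actual paper should be consulted for how MMG links media and propagates correlations.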