Automatic Multimedia Cross-modal Correlation Discovery
Jia-Yu Pan, HyungJeong Yang, Pinar Duygulu, Christos Faloutsos
DOI: 10.1184/R1/6603758.v1
https://kilthub.cmu.edu/articles/journal_contribution/Automatic_Multimedia_Cross-modal_Correlation_Discovery/6603758

Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects, such as video clips with colors, and/or motion, and/or audio, and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-modal correlations. Our "MMG" method requires no tuning, no clustering, and no user-determined constants; it can be applied to any multimedia collection, as long as we have a similarity function for each medium; and it scales linearly with the database size. We report auto-captioning experiments on the "standard" Corel image database of 680 MB, where it outperforms domain-specific, fine-tuned methods by up to 10 percentage points in captioning accuracy (a 50% relative improvement).

General Terms: Design, Experimentation
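The abstract states only that MMG is graph-based and needs nothing beyond a per-medium similarity function, without spelling out the graph construction or the correlation-propagation step. The following minimal Python sketch is an illustration of that general idea under stated assumptions, not the authors' exact MMG algorithm: objects are linked to their most similar neighbors in one medium (here, a toy color-feature space with a cosine similarity), and caption words are scored for an uncaptioned query by how often they label its graph neighbors. All names (`build_similarity_edges`, `suggest_caption_words`, `k_neighbors`, the toy data) are hypothetical.

```python
# Hedged sketch of a similarity-graph approach to auto-captioning.
# This is NOT the paper's MMG construction; it only illustrates how a
# per-medium similarity function can induce a graph used for captioning.
from collections import defaultdict
import numpy as np


def build_similarity_edges(features, k_neighbors=3):
    """Connect each object to its k most similar objects in one medium."""
    feats = np.asarray(features, dtype=float)
    # Cosine similarity between all pairs of feature vectors.
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims = unit @ unit.T
    edges = defaultdict(set)
    for i in range(len(feats)):
        order = np.argsort(-sims[i])
        for j in order[1:k_neighbors + 1]:  # position 0 is the object itself
            edges[i].add(int(j))
            edges[int(j)].add(i)
    return edges


def suggest_caption_words(query_idx, edges, captions, top_n=3):
    """Score caption words by how often they label the query's neighbors."""
    scores = defaultdict(float)
    for neighbor in edges[query_idx]:
        for word in captions.get(neighbor, []):
            scores[word] += 1.0
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    # Toy data: 4 captioned images plus 1 uncaptioned query (index 4).
    image_features = [
        [0.9, 0.1], [0.8, 0.2],   # "beach"-like color features
        [0.1, 0.9], [0.2, 0.8],   # "forest"-like color features
        [0.85, 0.15],             # query image, visually near the beaches
    ]
    captions = {0: ["sand", "sea"], 1: ["sea", "sky"],
                2: ["tree", "grass"], 3: ["tree", "sky"]}
    graph = build_similarity_edges(image_features, k_neighbors=2)
    print(suggest_caption_words(4, graph, captions))  # e.g. ['sea', 'sand', 'sky']
```

Because the graph is built solely from a similarity function per medium, the same sketch could mix image, audio, and text nodes in one graph; the actual paper should be consulted for how MMG links media and propagates correlations.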