Automatic Multimedia Cross-modal Correlation Discovery

Pan, Jia-Yu; Yang, HyungJeong; Duygulu, Pinar; Faloutsos, Christos

doi:10.1184/R1/6603758.v1

journal contribution

posted on 1975-01-01, 00:00 authored by Jia-Yu Pan, HyungJeong Yang, Pinar Duygulu, Christos Faloutsos

Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects, such as video clips that may carry colors, motion, audio, and/or text scripts. We propose a novel graph-based approach, "MMG", to discover such cross-modal correlations. Our "MMG" method requires no tuning, no clustering, and no user-determined constants; it can be applied to any multimedia collection, as long as we have a similarity function for each medium; and it scales linearly with the database size. We report auto-captioning experiments on the "standard" Corel image database of 680 MB, where MMG outperforms domain-specific, fine-tuned methods by up to 10 percentage points in captioning accuracy (a 50% relative improvement).
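The abstract does not spell out how MMG turns its graph into captions. As a hedged illustration only, the sketch below assumes one plausible reading: objects and their attribute values (image regions, keywords) become nodes, each medium's similarity function adds nearest-neighbor links between attribute nodes, and a random walk with restarts from an uncaptioned image ranks the keyword nodes. The function names (`knn_links`, `build_graph`, `rwr`, `caption`), the `kw:` prefix, the restart probability 0.2 and the toy 2-D region features are all hypothetical; this is not the authors' code or data.

```python
import math
from collections import defaultdict

def knn_links(features, sim, k=1):
    """Link each attribute node to its k most similar nodes of the same medium,
    using only a per-medium similarity function `sim` (brute force, quadratic)."""
    links = []
    for a, fa in features.items():
        others = sorted((b for b in features if b != a),
                        key=lambda b: sim(fa, features[b]), reverse=True)
        links.extend((a, b) for b in others[:k])
    return links

def build_graph(objects, links):
    """Undirected graph: each object node is joined to its attribute nodes
    (regions, keywords); similarity links join attribute nodes to each other."""
    adj = defaultdict(set)
    for obj, attrs in objects.items():
        for a in attrs:
            adj[obj].add(a)
            adj[a].add(obj)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def rwr(adj, start, restart=0.2, iters=100):
    """Random walk with restarts by power iteration: with probability `restart`
    the walker jumps back to `start`, otherwise it moves to a random neighbor."""
    p = {n: 0.0 for n in adj}
    p[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for n, mass in p.items():
            if mass:
                share = (1 - restart) * mass / len(adj[n])
                for m in adj[n]:
                    nxt[m] += share
        nxt[start] += restart
        p = nxt
    return p

def caption(adj, query, k=2):
    """Rank keyword nodes (hypothetical 'kw:' prefix) by their RWR score."""
    scores = rwr(adj, query)
    keywords = [n for n in adj if n.startswith("kw:")]
    return sorted(keywords, key=lambda n: scores[n], reverse=True)[:k]

# Toy collection: img1 and img2 are captioned; img3 is the uncaptioned query.
objects = {
    "img1": {"r_tiger", "r_grass", "kw:tiger", "kw:grass"},
    "img2": {"r_sea", "r_sky", "kw:sea", "kw:sky"},
    "img3": {"r_tiger2", "r_grass2"},
}
# Hypothetical 2-D region features; similarity = negative Euclidean distance.
regions = {
    "r_tiger": (1.0, 0.0), "r_tiger2": (0.9, 0.1),
    "r_grass": (0.0, 1.0), "r_grass2": (0.1, 0.9),
    "r_sea": (-1.0, 0.0), "r_sky": (-0.9, -0.1),
}
links = knn_links(regions, lambda u, v: -math.dist(u, v), k=1)
graph = build_graph(objects, links)
print(caption(graph, "img3"))
```

Here img3's regions are nearest to img1's, so the walk reaches img1's keywords ("tiger" and "grass") while img2's keywords stay unreachable and score zero. Each power-iteration step touches every edge once, so with a bounded number of similarity links per node the walk itself grows linearly with collection size; the brute-force `knn_links` above is quadratic, and a scalable version would need a faster nearest-neighbor search.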

Date

1975-01-01

Keywords

Design Experimentation

Licence

In Copyright