Multi-modal Information Retrieval from Broadcast Video using OCR and Speech Recognition

Hauptmann, Alexander; Jin, Rong; Ng, Tobun Dorbin

doi:10.1184/R1/6607571.v1

journal contribution

posted on 1988-01-01, 00:00 authored by Alexander Hauptmann, Rong Jin, Tobun Dorbin Ng

We examine multi-modal information retrieval from broadcast video, where on-screen text can be read through OCR and speech recognition can be performed on the audio track. OCR and speech recognition are compared on the 2001 TREC Video Retrieval evaluation corpus. Results show that OCR is more important than speech recognition for video retrieval. OCR retrieval can be further improved through dictionary-based post-processing. We demonstrate how to use imperfect multi-modal metadata to benefit multi-modal information retrieval.
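
The abstract does not describe how the dictionary-based post-processing is done. As an illustration only, the following minimal sketch assumes one simple approach: each OCR token is replaced by its closest dictionary word when a sufficiently similar one exists. The function name `correct_ocr_tokens`, the `cutoff` value, and the sample word list are hypothetical and are not taken from the paper.

```python
from difflib import get_close_matches


def correct_ocr_tokens(tokens, dictionary, cutoff=0.75):
    """Map each OCR token to its closest dictionary word, if one is similar enough.

    Assumption: similarity is difflib's sequence-matching ratio, not the
    method used in the paper. Tokens with no close match are kept unchanged.
    """
    vocab = sorted({word.lower() for word in dictionary})
    vocab_set = set(vocab)
    corrected = []
    for token in tokens:
        word = token.lower()
        if word in vocab_set:
            # Already a dictionary word: nothing to fix.
            corrected.append(word)
            continue
        # Ask for the single best candidate above the similarity cutoff.
        matches = get_close_matches(word, vocab, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return corrected


# Hypothetical dictionary and noisy OCR output for illustration.
dictionary = ["president", "clinton", "weather", "report"]
ocr_tokens = ["PRES1DENT", "CLINTDN", "xqzv"]
print(correct_ocr_tokens(ocr_tokens, dictionary))
```

A lower `cutoff` corrects more tokens but raises the risk of mapping genuine out-of-dictionary words, such as names, onto unrelated entries, so the threshold would need tuning against the retrieval task.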

Date

1988-01-01

Keywords

Multi-modal Video Information Retrieval; Speech Recognition; Optical Character Recognition; OCR

Licence

In Copyright
