Learning Hidden Markov Model Structure for Information Extraction

Seymore, Kristie; Rosenfeld, Roni

doi:10.1184/R1/6606872.v1

journal contribution

Posted on 1994-06-01; authored by Kristie Seymore and Roni Rosenfeld

Statistical machine learning techniques, while well proven in fields such as speech recognition, are just beginning to be applied to the information extraction domain. We explore the use of hidden Markov models for information extraction tasks, specifically focusing on how to learn model structure from data and how to make the best use of labeled and unlabeled data. We show that a manually-constructed model that contains multiple states per extraction field outperforms a model with one state per field, and discuss strategies for learning the model structure automatically from data. We also demonstrate that the use of distantly-labeled data to set model parameters provides a significant improvement in extraction accuracy. Our models are applied to the task of extracting important fields from the headers of computer science research papers, and achieve an extraction accuracy of 90.1%.

Introduction

Hidden Markov modeling is a powerful statistical machine learning technique that is just b...

History

Date

1994-06-01

Keywords

computer sciences

Licence

In Copyright
