Bootstrapping Biomedical Ontologies for Scientific Text using NELL

Movshovitz-Attias, Dana; Cohen, William W.

doi:10.1184/R1/6475493.v1

journal contribution

Posted on 2012-06-01; authored by Dana Movshovitz-Attias and William W. Cohen

We describe an open information extraction system for biomedical text based on NELL (the Never-Ending Language Learner) (Carlson et al., 2010), a system designed for extraction from Web text. NELL uses a coupled semi-supervised bootstrapping approach to learn new facts from text, given an initial ontology and a small number of “seeds” for each ontology category. In contrast to previous applications of NELL, in our task the initial ontology and seeds are automatically derived from existing resources. We show that NELL’s bootstrapping algorithm is susceptible to ambiguous seeds, which are frequent in the biomedical domain. Using NELL to extract facts from biomedical text quickly leads to semantic drift. To address this problem, we introduce a method for assessing seed quality, based on a larger corpus of data derived from the Web. In our method, seed quality is assessed at each iteration of the bootstrapping process. Experimental results show significant improvements over NELL’s original bootstrapping algorithm on two types of tasks: learning terms from biomedical categories, and named-entity recognition for biomedical entities using a learned lexicon.
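
The idea of re-assessing seed quality at every bootstrapping iteration can be illustrated with a minimal sketch. This is not NELL or the authors' implementation: the toy corpus, the `quality` scores (standing in for a score derived from a larger Web corpus), and the `bootstrap` function are all hypothetical, chosen only to show how an ambiguous seed such as "white" can cause semantic drift and how ranking seeds by quality might limit it.

```python
# Toy sketch of bootstrapping with per-iteration seed filtering.
# All data and names here are illustrative assumptions, not taken from the paper.

# (term, context pattern) pairs observed in a small hypothetical corpus.
CORPUS = [
    ("brca1", "mutations in _"),
    ("tp53", "mutations in _"),
    ("kras", "mutations in _"),
    ("white", "the _ house"),   # "white" is a gene name, but also a common word
    ("big", "the _ house"),
]

# Hypothetical seed-quality scores for the category "gene"; in practice these
# would be estimated from a larger external corpus.
QUALITY = {"brca1": 5.0, "tp53": 4.0, "kras": 3.0, "white": 0.1, "big": 0.0}


def bootstrap(corpus, seeds, quality, top_k, iterations=2):
    """Grow a term set from seeds, trusting only the top_k terms per iteration."""
    learned = set(seeds)
    for _ in range(iterations):
        # Re-rank every known term by quality at each iteration (ties by name).
        ranked = sorted(learned, key=lambda t: (-quality.get(t, 0.0), t))
        trusted = set(ranked[:top_k])
        # Contexts in which trusted terms occur become extraction patterns.
        contexts = {ctx for term, ctx in corpus if term in trusted}
        # Promote every term that appears in one of those contexts.
        learned |= {term for term, ctx in corpus if ctx in contexts}
    return learned


seeds = ["brca1", "tp53", "white"]
filtered = bootstrap(CORPUS, seeds, QUALITY, top_k=2)
unfiltered = bootstrap(CORPUS, seeds, QUALITY, top_k=len(CORPUS))
```

In this toy setting, the unfiltered run trusts the ambiguous seed "white", picks up the pattern "the _ house", and promotes the unrelated term "big". The filtered run trusts only the two highest-scoring terms and still learns "kras" without drifting; the initial seed "white" stays in the returned set but is never used to extract patterns.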

History

Date

2012-06-01

Keywords

Machine Learning

Licence

In Copyright
