Posted on 2003-10-01, 00:00; authored by Yan Liu, Jaime G. Carbonell, Rong Jin
Text classification, whether by topic or genre, is an important
task that contributes to text extraction, retrieval, summarization
and question answering. In this paper we present a new pairwise ensemble
approach that uses pairwise Support Vector Machine (SVM) classifiers
as base classifiers and an "input-dependent latent variable" method
for model combination. This approach better captures the characteristics
of genre classification, including its heterogeneous nature. Our
experiments on two multi-genre collections and one topic-based classification
dataset show that the pairwise ensemble method outperforms
both boosting, which has been shown to be a powerful ensemble
approach, and Error-Correcting Output Codes (ECOC), which applies
pairwise-like classifiers to multiclass classification problems.
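The abstract does not spell out the model, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' method. It uses one binary scorer per pair of classes. Plain linear scorers with hand-set weights stand in for trained pairwise SVMs. A hypothetical logistic "gate" per pair turns the input into a trust weight for that pair's vote, loosely standing in for the input-dependent latent-variable combination. The function names, weights, and genre labels (`news`, `review`, `fiction`) are illustrative assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Linear = Tuple[Vector, float]  # (weight vector, bias)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pairwise_ensemble_predict(
    x: Vector,
    classes: List[str],
    pair_models: Dict[Tuple[str, str], Linear],
    gate_models: Dict[Tuple[str, str], Linear],
) -> Tuple[str, Dict[str, float]]:
    """Combine one-vs-one scorers with input-dependent vote weights.

    pair_models[(a, b)]: decision value w.x + bias > 0 favors class a,
    otherwise class b (a stand-in for a trained pairwise SVM).
    gate_models[(a, b)]: sigmoid(v.x + c) is a hypothetical estimate of
    how far to trust the (a, b) scorer on this particular input x.
    """
    scores = {c: 0.0 for c in classes}
    for a, b in combinations(classes, 2):
        w, bias = pair_models[(a, b)]
        v, c = gate_models[(a, b)]
        winner = a if dot(w, x) + bias > 0 else b
        scores[winner] += sigmoid(dot(v, x) + c)  # weighted vote
    best = max(classes, key=lambda k: scores[k])
    return best, scores


# Hypothetical toy setup: 3 genres, 2-dimensional features.
CLASSES = ["news", "review", "fiction"]
PAIRS: Dict[Tuple[str, str], Linear] = {
    ("news", "review"): ((1.0, 0.0), 0.0),
    ("news", "fiction"): ((1.0, -1.0), 0.0),
    ("review", "fiction"): ((0.0, -1.0), 0.0),
}
# Constant gates (weight 0.5 each) reduce to plain majority voting.
UNIFORM_GATES: Dict[Tuple[str, str], Linear] = {p: ((0.0, 0.0), 0.0) for p in PAIRS}
# Input-dependent gates: distrust both "news" scorers when x[0] is large.
INPUT_GATES: Dict[Tuple[str, str], Linear] = {
    ("news", "review"): ((-10.0, 0.0), 0.0),
    ("news", "fiction"): ((-10.0, 0.0), 0.0),
    ("review", "fiction"): ((0.0, 0.0), 0.0),
}

if __name__ == "__main__":
    x = (2.0, 1.0)
    print(pairwise_ensemble_predict(x, CLASSES, PAIRS, UNIFORM_GATES))
    print(pairwise_ensemble_predict(x, CLASSES, PAIRS, INPUT_GATES))
```

With constant gates, the input `(2.0, 1.0)` gets two votes for `news` and one for `fiction`, so `news` wins. With the input-dependent gates, the two `news` votes are weighted almost to zero for this input, and `fiction` wins instead. This shows how the same pairwise classifiers can be combined differently depending on the input.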