Posted on 2002-08-01, 00:00. Authored by Yan Liu, Jaime G. Carbonell, Vanathi Gopalakrishnan, Peter Wiegele
Protein fold recognition is a crucial step in inferring
biological structure and function. This paper
focuses on machine learning methods for predicting
quaternary structural folds, which consist of
multiple protein chains that form chemical bonds
among side chains to reach a structurally stable
domain. The complexity associated with modeling
the quaternary fold poses major theoretical and
computational challenges to current machine learning
methods. We propose methods to address these
challenges and show how (1) domain knowledge
is encoded and utilized to characterize structural
properties using segmentation conditional graphical
models; and (2) model complexity is handled
through efficient inference algorithms. Our
model follows a discriminative approach so that
any informative features, such as those representative
of overlapping or long-range interactions, can
be used conveniently. The model is applied to predict
two important quaternary folds, the triple β-spirals
and double-barrel trimers. Cross-family validation
shows that our method outperforms other
state-of-the-art algorithms.