Learning Generative Models for Protein Fold Families

Balakrishnan, Sivaraman; Kamisetty, Hetunandan; Carbonell, Jaime G.; Langmead, Christopher J.

doi:10.1184/R1/6606866.v1

file.pdf (398.41 kB)

Learning Generative Models for Protein Fold Families

journal contribution

posted on 2005-06-01, 00:00 authored by Sivaraman Balakrishnan, Hetunandan Kamisetty, Jaime G. Carbonell, Christopher J. Langmead

Statistical models of the amino acid composition of the proteins within a fold family are widely used in science and engineering. Existing techniques for learning probabilistic graphical models from multiple sequence alignments either make strong assumptions about the conditional independencies within the model (e.g., HMMs), or else use sub-optimal algorithms to learn the structure and parameters of the model. We introduce an approach to learning the topological structure and parameters of an undirected probabilistic graphical model. The learning algorithm uses block- L1 regularization and solves a convex optimization problem, thus guaranteeing a globally optimal solution at convergence. The resulting model encodes both the position-specific conservation statistics and the correlated mutation statistics between sequential and long-range pairs of residues. Our model is generative, allowing for the design of new proteins that have corresponding statistical properties to those seen in nature. We apply our approach to two widely studied protein families: the WW and the PDZ folds. We demonstrate that our model is able to capture interactions that are important in folding and allostery. Our results additionally indicate that while the network of interactions within a protein is sparse, it is richer than previously believed.

History

Date

2005-06-01

Usage metrics

Keywords

Structure Learning Generative Models Probabilistic Graphical Models Proteins

Licence

In Copyright

Exports

RefWorks

BibTeX

Ref. manager

Endnote

DataCite

NLM

DC

Learning Generative Models for Protein Fold Families

History

Date

Usage metrics

Categories

Keywords

Licence

Exports