Domain architecture comparison for multidomain homology identification.

Song, Nan; Sedgewick, Robert D.; Durand, Dannie

doi:10.1184/R1/6097949.v1

journal contribution

Posted on 2007-05-01; authored by Nan Song, Robert D. Sedgewick, Dannie Durand

Homology identification is the first step in many genomic studies. Current methods, based on sequence comparison, can produce a substantial number of mis-assignments because otherwise unrelated sequences may share similar, homologous domains. Here we propose methods that detect homologs through explicit comparison of protein domain content. We developed several schemes, adapted from methods used in information retrieval, for scoring the homology of a pair of protein sequences. We evaluate the proposed methods, together with methods from the literature, on a benchmark of fifteen sequence families of known evolutionary history. The results demonstrate the effectiveness of comparing domain architectures with these similarity measures. We also demonstrate the importance both of weighting promiscuous domains and of compensating for the statistical effect of a protein having a large number of domains. Using logistic regression, we show the benefit of combining similarity measures based on domain content with sequence similarity measures.
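The abstract does not spell out the individual scoring schemes. As a minimal sketch, assuming a standard information-retrieval formulation (TF-IDF weighting with cosine similarity, which is not necessarily one of the paper's exact schemes), domain-content similarity might be computed as follows. The inverse-document-frequency weight down-weights promiscuous domains that occur in many proteins, and the cosine normalization is one way to compensate for proteins that carry many domains. The function names, domain labels and toy architectures are hypothetical illustrations only.

```python
import math
from collections import Counter

def idf_weights(architectures):
    """IDF weight per domain over a collection of proteins (hypothetical helper).

    A domain found in many proteins (promiscuous) gets a low weight;
    a domain present in every protein gets log(1) = 0.
    """
    n = len(architectures)
    df = Counter()
    for arch in architectures:
        df.update(set(arch))  # count each domain once per protein
    return {d: math.log(n / count) for d, count in df.items()}

def domain_vector(arch, idf):
    """Sparse vector: copy number of each domain times its IDF weight."""
    tf = Counter(arch)
    return {d: c * idf.get(d, 0.0) for d, c in tf.items()}

def cosine(u, v):
    """Cosine similarity; the norms offset the effect of many domains."""
    dot = sum(w * v.get(d, 0.0) for d, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

# Toy domain architectures (illustrative, not from the benchmark).
proteins = {
    "A": ["SH2", "SH3", "Kinase"],
    "B": ["SH2", "SH3", "Kinase"],
    "C": ["SH3", "PH"],
    "D": ["PH", "RhoGEF"],
}
idf = idf_weights(list(proteins.values()))
vecs = {name: domain_vector(arch, idf) for name, arch in proteins.items()}

for a, b in [("A", "B"), ("A", "C"), ("A", "D")]:
    print(a, b, round(cosine(vecs[a], vecs[b]), 3))
```

In this toy set, A and B share an identical architecture and score 1.0, A and D share no domain and score 0.0, and A and C score in between because they share only SH3, which appears in three of the four proteins and is therefore weighted lower than SH2 or Kinase. A score like this could then serve as one feature, alongside a sequence-similarity score, in a logistic regression classifier, as the abstract describes.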

Date

2007-05-01

Keywords

Models, Genetic; Protein Structure, Tertiary; Proteins; Sequence Analysis, Protein; Sequence Homology, Amino Acid

Licence

In Copyright