Combining N-grams and Alignment in G-Protein Coupling Specificity Prediction
G-protein coupled receptors (GPCR) interact with G-proteins to regulate much of the cell’s response to external stimuli; abnormalities in which cause numerous diseases. We developed a new method to predict the families of G-proteins with which it interacts, given its residue sequence. We combine both alignment and n-gram features. The former captures long-range interactions but assumes the linear ordering of conserved segments is preserved. The latter makes no such assumption but cannot capture long-range interactions. By combining alignment and n-gram features, and using the entire GPCR sequence (instead of intracellular regions alone, as was done by others), our method outperformed the current state-of-the-art in precision, recall and F1, attaining 0.753 in F1 and 0.796 in accuracy on the PTbase 2004 dataset. Moreover, analysis of our results shows that the majority of coupling specificity information lies in the beginning of the 2nd intracellular loop and over the length of the 3rd.