Bioinformatics Vol. 18 no. 12 2002
Pages 1658-1665
© 2002 Oxford University Press
Structure-dependent sequence alignment for remotely related proteins
Department of Pharmacology and Columbia Genome Center, Columbia University, 630 West 168th Street, PH 7 W Room 318, New York, NY 10032, USA
Received on January 2, 2002; revised on April 10, 2002; accepted on May 24, 2002
Motivation: The quality of a model structure derived from a comparative modeling procedure is dictated by the accuracy of the predicted sequence-template alignment. As sequence-template pairs become more remotely related, predicting their alignments with sequence alignment methods becomes increasingly problematic. Structural information about the template, used together with the sequence relationship of the sequence-template pair, could significantly improve the accuracy of the sequence-template alignment. In this paper, we describe a sequence-template alignment method that integrates sequence and structural information to enhance the accuracy of sequence-template alignments for distantly related protein pairs.
Results: The structure-dependent sequence alignment (SDSA) procedure was optimized for coverage and accuracy on a training set of 412 protein pairs; the two structures in each training pair are similar (RMSD < 4 Å), but their sequence relationship is undetectable (average pairwise sequence identity = 8%). The optimized SDSA procedure was then applied to extend PSI-BLAST local alignments by calculating global alignments constrained to preserve the residue pairs in the local alignments. This composite alignment procedure was assessed on a testing set of 1421 protein pairs in which the two structures are similar (RMSD < 4 Å) but the sequences are at best marginally related (average pairwise sequence identity = 13%). The assessment showed that the composite alignment procedure aligned more residue pairs than the standard PSI-BLAST alignments, with an average 27% increase in correctly aligned residues for the protein pairs in the testing set.
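The extension step above, in which a PSI-BLAST local alignment is grown into a global alignment that must keep every locally aligned residue pair, can be illustrated with a minimal sketch. It assumes a linear gap penalty and uses a toy identity scorer as a stand-in for the SDSA structure-dependent scoring, which the abstract does not specify; the function names, the 0-based `(i, j)` anchor format and the scoring values are hypothetical and are not the PrISM interface. Because each anchor pair must be aligned, residues lying between two consecutive anchors can only be aligned with each other, so the constrained problem splits into independent Needleman-Wunsch runs on those segments. The `correctly_aligned` helper counts predicted pairs that also occur in a reference (e.g. structural) alignment, which is one plausible reading of "correctly aligned residues".

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[int, int]            # (index in sequence a, index in sequence b), 0-based
Scorer = Callable[[str, str], float]


def toy_identity(x: str, y: str) -> float:
    # Hypothetical placeholder for a structure-dependent score: +2 identical, -1 otherwise.
    return 2.0 if x == y else -1.0


def needleman_wunsch(a: str, b: str, score: Scorer, gap: float,
                     a0: int = 0, b0: int = 0) -> Tuple[float, List[Pair]]:
    """Global alignment with a linear gap penalty; returned pairs are offset by (a0, b0)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -gap * i
    for j in range(1, m + 1):
        F[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          F[i - 1][j] - gap,
                          F[i][j - 1] - gap)
    # Trace back from the corner, preferring the diagonal (aligned pair) on ties
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            pairs.append((a0 + i - 1, b0 + j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return F[n][m], pairs


def anchored_global_alignment(a: str, b: str, anchors: Sequence[Pair],
                              score: Scorer, gap: float) -> Tuple[float, List[Pair]]:
    """Global alignment forced to contain every anchor pair (e.g. from a local alignment)."""
    ordered = sorted(anchors)
    for p, q in ordered:
        if not (0 <= p < len(a) and 0 <= q < len(b)):
            raise ValueError(f"anchor {(p, q)} is out of range")
    for (p1, q1), (p2, q2) in zip(ordered, ordered[1:]):
        if not (p2 > p1 and q2 > q1):
            raise ValueError("anchors must be strictly increasing in both sequences")
    total, pairs = 0.0, []
    prev_p, prev_q = -1, -1
    # Align the stretch between consecutive anchors on its own; the closing
    # bound (len(a), len(b)) covers the segment after the last anchor.
    for k, (p, q) in enumerate(ordered + [(len(a), len(b))]):
        s, seg = needleman_wunsch(a[prev_p + 1:p], b[prev_q + 1:q], score, gap,
                                  prev_p + 1, prev_q + 1)
        total += s
        pairs.extend(seg)
        if k < len(ordered):      # a real anchor, not the closing bound
            total += score(a[p], b[q])
            pairs.append((p, q))
        prev_p, prev_q = p, q
    return total, pairs


def correctly_aligned(predicted: Sequence[Pair], reference: Sequence[Pair]) -> int:
    """Number of predicted residue pairs that also appear in the reference alignment."""
    return len(set(predicted) & set(reference))
```

For example, forcing the anchor `(0, 1)` onto two identical strings, `anchored_global_alignment("ACGT", "ACGT", [(0, 1)], toy_identity, gap=2.0)` returns `(-1.0, [(0, 1), (2, 2), (3, 3)])`: the anchor is kept even though it is a mismatch, and only two of the four reference pairs `(i, i)` are recovered. Splitting at anchors is exact here because no gap can span an aligned pair; a real implementation would substitute its own substitution and gap model.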
Availability: All the computational and assessment procedures have been implemented in the integrated computational system PrISM.1 (Protein Informatics System for Modeling). The system and associated databases for LINUX systems can be downloaded from the website: http://www.columbia.edu/~ay1/
Contact: ay1@columbia.edu