Bioinformatics, Vol 14, 707-714, Copyright © 1998 by Oxford University Press
M Gerstein
MOTIVATION: Transitive sequence matching expands the scope of sequence
comparison by re-running each result of a given query against the databank
as a new query. This sometimes relates the initial query sequence (Q) to a
final match (M) indirectly, through a third, 'intermediate' sequence
(Q --> I --> M). This approach has often been suggested as providing
greater sensitivity in sequence comparison; however, it has not yet been
possible to gauge its improvement precisely.
RESULTS: Here, this improvement is comprehensively measured by seeing what
fraction of the known structural relationships transitive sequence matching
can uncover beyond those found by normal pairwise comparison (i.e. direct
linkage). The structural relationships are taken from a well-characterized
test set, the scop classification of protein structure. Specifically, 2055
known structural similarities (called 'pairs') between distantly related
proteins constitute the basic test set. To measure transitive matching
properly, special data sets, called 'baseline sets', are derived from this
test set. They consist of pairs of sequences that have a clear structural
relationship that cannot be found by normal sequence comparison (i.e. they
cannot be directly linked). Using standard sequence comparison protocols
(FASTA with an e-value cut-off of 0.001), the baseline set is found to
consist of 1742 pairs. A third, intermediate sequence, drawn from the
entire, current universe of protein sequences, can link 86 of these
indirectly (5%). The number of false positives is minimal. Furthermore,
when one considers only the relationships within the test set that
correspond to a close structural alignment, the coverage increases
considerably. In particular, 862 of the baseline set pairs fit to better
than 2.6 Å RMS, and transitive matching can find 62 of these (9%).
AVAILABILITY: All the test data, including precise similarity values
calculated from structural alignment, are available in tabular format over
the Web from http://bioinfo.mbb.yale.edu/align.
CONTACT: Mark.Gerstein@yale.edu
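
The indirect linkage described in the abstract (Q --> I --> M) can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the toy sequence identifiers, the symmetric treatment of hits, and the use of `<=` at the cut-off are all assumptions made for illustration. Only the 0.001 e-value cut-off and the counts 86 and 1742 come from the abstract.

```python
E_VALUE_CUTOFF = 0.001  # e-value cut-off quoted in the abstract


def direct_links(hits, cutoff=E_VALUE_CUTOFF):
    """Build an adjacency map from (seq_a, seq_b, e_value) hits.

    Assumption: a hit at or below the cut-off links both sequences,
    regardless of which one was used as the query.
    """
    links = {}
    for a, b, e_value in hits:
        if e_value <= cutoff:
            links.setdefault(a, set()).add(b)
            links.setdefault(b, set()).add(a)
    return links


def transitive_intermediates(links, query, match):
    """Return every I with Q--I and I--M; empty if Q and M link directly."""
    q_neighbours = links.get(query, set())
    if match in q_neighbours:
        return set()  # directly linked pairs are not baseline pairs
    return (q_neighbours & links.get(match, set())) - {query, match}


def coverage(baseline_pairs, links):
    """Count baseline pairs recovered through at least one intermediate."""
    found = sum(1 for q, m in baseline_pairs
                if transitive_intermediates(links, q, m))
    fraction = found / len(baseline_pairs) if baseline_pairs else 0.0
    return found, fraction


# Hypothetical toy data: identifiers and e-values are invented.
toy_hits = [
    ("Q1", "I1", 1e-5), ("I1", "M1", 1e-4),  # Q1 --> I1 --> M1
    ("Q2", "M2", 0.5),                       # too weak to be a direct link
    ("Q2", "I2", 1e-6), ("I2", "M2", 0.01),  # second hop misses the cut-off
]
toy_baseline = [("Q1", "M1"), ("Q2", "M2")]
toy_links = direct_links(toy_hits)

print(transitive_intermediates(toy_links, "Q1", "M1"))
print(coverage(toy_baseline, toy_links))

# The first coverage figure in the abstract: 86 of 1742 baseline pairs.
print(f"{100 * 86 / 1742:.1f}%")
```

In this toy example only the first baseline pair is recovered, because both hops of a transitive match must independently pass the cut-off.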
Measurement of the effectiveness of transitive sequence comparison, through a third 'intermediate' sequence
Department of Molecular Biophysics and Biochemistry, 266 Whitney Avenue, Yale University, PO Box 208114, New Haven, CT 06520, USA. Mark.Gerstein@yale.ed