Bioinformatics Vol. 18 no. 6 2002
Pages 864-872
© 2002 Oxford University Press
Hybrid alignment: high-performance with universal statistics
1 Department of Physics, Florida Atlantic University,
777 Glades Road, Boca Raton, FL 33431-0991, USA
2 Department of Physics, University of California
at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0319, USA
Received on April 20, 2001; revised on January 8, 2002; accepted on January 22, 2002
The score statistics of a recently introduced 'hybrid
alignment' algorithm are studied in detail numerically. An
extensive survey across the 2216 models of protein domains
contained in the Pfam v5.4 database
(Bateman et al., Nucleic Acids Res.,
28, 263-266, 2000) verifies the theoretical predictions: For the
position-specific scoring functions used in the Pfam models, the
score statistics of hybrid alignment obey the Gumbel
distribution, with the key Gumbel parameter λ taking on
the asymptotic value 1 universally for all models. Thus, the
use of hybrid alignment eliminates the time-consuming computer
simulations normally needed to assign p-values to alignment
scores, freeing the users to experiment with different scoring
parameters and functions. The performance of the hybrid
algorithm in detecting sequence homology is also studied. For
protein sequences from the SCOP database
(Murzin et al., J. Mol. Biol., 247,
536-540, 1995) using uniform scoring functions, the performance is found
to be comparable to the best of the existing methods.
Preliminary results using the PfamA database suggest that the
hybrid algorithm achieves performance similar to that of existing
methods for position-specific scoring systems as well. Hybrid
alignment is thereby established as a high-performance alignment
algorithm with well-characterized, universal statistics.
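
To illustrate why a universal Gumbel parameter removes the need for simulations, the following is a minimal sketch of how a p-value could be read off the standard Gumbel (extreme-value) tail formula once λ is known. It is not the authors' implementation. The abstract states only that λ takes the asymptotic value 1; the function name `gumbel_p_value`, the prefactor `K`, and the sequence and model lengths `m` and `n` are assumptions introduced here for illustration, and any finite-size corrections are ignored.

```python
import math


def gumbel_p_value(score, m, n, K, lam=1.0):
    """Probability of observing an alignment score >= `score` by chance.

    Uses the standard Gumbel tail: p = 1 - exp(-K * m * n * exp(-lam * score)).
    lam defaults to 1.0, the universal asymptotic value reported for hybrid
    alignment. K, m and n are illustrative assumptions, not values from the
    article.
    """
    # Expected number of chance hits scoring at least `score`.
    expected_hits = K * m * n * math.exp(-lam * score)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-expected_hits)


if __name__ == "__main__":
    # Hypothetical example: sequence and model of length 300, assumed K = 0.1.
    for s in (10.0, 15.0, 20.0):
        print(s, gumbel_p_value(s, m=300, n=300, K=0.1))
```

Because λ is fixed, changing the scoring function in this sketch would not require re-estimating the exponential decay rate of the tail; only the prefactor (here the assumed `K`) would remain to be determined.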
Contact: yyu{at}fau.edu
* Present address: Department of Physics, The Ohio State University, 174 West 18th Avenue, Columbus, OH 43210-1106, USA