Bioinformatics Advance Access originally published online on April 7, 2005
Bioinformatics 2005 21(12):2827-2831; doi:10.1093/bioinformatics/bti433

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Convergent Island Statistics: a fast method for determining local alignment score significance

Aleksandar Poleksic*, Joseph F. Danzer, Kevin Hambly and Derek A. Debe

Eidogen-Sertanty Inc. 9381 Judicial Dr., San Diego, CA 92121, USA

*To whom correspondence should be addressed.

Motivation: Background distribution statistics for profile-based sequence alignment algorithms cannot be calculated analytically, and hence such algorithms must resort to measuring the significance of an alignment score by assessing its location among a distribution of background alignment scores. The Gumbel parameters that describe this background distribution are usually pre-computed for a limited number of scoring systems, gap schemes, and sequence lengths and compositions. The use of such look-ups is known to introduce errors, which compromise the significance assessment of a remote homology relationship. One solution is to estimate the background distribution for each pair of interest by generating a large number of sequence shuffles and using the distribution of their scores to approximate the parameters of the underlying extreme value distribution. This is computationally very expensive, as a large number of shuffles are needed to precisely estimate the score statistics.
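The shuffle-based baseline described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the toy match/mismatch/linear-gap scoring scheme, and the method-of-moments Gumbel fit are all assumptions chosen for brevity, and a real profile-based aligner would substitute its own scoring function and typically a more careful estimator.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not the paper's scheme.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


def shuffled_scores(a, b, n_shuffles, rng):
    """Score `a` against `n_shuffles` random permutations of `b`."""
    letters = list(b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        scores.append(local_score(a, "".join(letters)))
    return scores


def fit_gumbel_moments(scores):
    """Method-of-moments estimate of the Gumbel location and scale.

    Requires at least two scores that are not all identical.
    """
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_p_value(score, mu, beta):
    """P(S >= score) under a Gumbel(mu, beta) background distribution."""
    return 1.0 - math.exp(-math.exp(-(score - mu) / beta))
```

A typical call would compute `local_score(a, b)` for the real pair, fit `mu, beta = fit_gumbel_moments(shuffled_scores(a, b, 1000, random.Random(0)))`, and pass both to `gumbel_p_value`. Each shuffle costs a full alignment, so the cost grows linearly with the number of shuffles, which is the expense the abstract refers to.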

Results: Convergent Island Statistics (CIS) is a computationally efficient solution to the problem of calculating the Gumbel distribution parameters for an arbitrary pair of sequences and an arbitrary set of gap and scoring schemes. The basic idea behind our method is to recognize the lack of similarity for any pair of sequences early in the shuffling process and thus save on the search time. The method is particularly useful in the context of profile–profile alignment algorithms where the normalization of alignment scores has traditionally been a challenging task.
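The abstract does not spell out how CIS detects a lack of similarity early, so the following Python sketch should not be read as the CIS algorithm. It shows a generic sequential Monte Carlo stopping rule, swapped in here only to illustrate the general idea of abandoning the shuffling process once a pair is clearly insignificant. The function name, the `score_shuffle` callback, and both thresholds are hypothetical.

```python
def sequential_shuffle_test(real_score, score_shuffle,
                            max_shuffles=1000, max_exceed=10):
    """Estimate an empirical p-value, stopping early for unrelated pairs.

    `score_shuffle` is a caller-supplied function (an assumption of this
    sketch) that returns the alignment score of one fresh shuffle.
    Returns (p_value_estimate, shuffles_used).
    """
    exceed = 0
    for n in range(1, max_shuffles + 1):
        if score_shuffle() >= real_score:
            exceed += 1
            if exceed >= max_exceed:
                # Enough shuffles already match or beat the real score:
                # the pair is clearly not significant, so stop here.
                return exceed / n, n
    # The budget was exhausted without reaching the exceedance limit.
    return (exceed + 1) / (max_shuffles + 1), max_shuffles
```

Under this rule an unrelated pair typically terminates after a few dozen shuffles, while only pairs whose real score sits far in the tail consume the full budget. That asymmetry is where the time saving comes from in this illustration.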

Contact: aleksandar{at}eidogen.com

Supplementary information: http://www.eidogen-sertanty.com/Documents/convergent_island_stats_sup.pdf
