Bioinformatics Vol. 17 no. 12 2001
Pages 1158-1167
© 2001 Oxford University Press

Estimation of P-values for global alignments of protein sequences

Caleb Webber and Geoffrey J. Barton *

EMBL—European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Received on March 21, 2001; revised on May 14, 2001; accepted on May 24, 2001

Motivation: The global alignment of protein sequence pairs is often used in the classification and analysis of full-length sequences. Calculating a Z-score for the comparison gives a length- and composition-corrected measure of the similarity between the sequences. However, the Z-score alone does not indicate the likely biological significance of the similarity. In this paper, all pairs of domains from 250 sequences belonging to different SCOP folds were aligned and Z-scores calculated. The distribution of Z-scores was fitted with a peak distribution, from which the probability of obtaining a given Z-score from the global alignment of two protein sequences of unrelated fold was calculated. A similar analysis was applied to subsequence pairs found by the Smith–Waterman algorithm. These analyses allow the probability that two protein sequences share the same fold to be estimated by global sequence alignment.
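As an illustration of the Z-score idea described above, the following minimal Python sketch scores a global alignment and compares it with the scores obtained after shuffling one sequence. It is not the authors' software: the identity-based scoring (`match`, `mismatch`), the linear `gap` penalty, the number of shuffles and the function names `global_score` and `z_score` are all assumptions chosen for brevity, whereas the paper examines substitution matrix/gap penalty combinations.

```python
import random
from statistics import mean, stdev


def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (score only, no traceback).

    Assumption: toy identity scoring with a linear gap penalty,
    not a real substitution matrix.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def z_score(a, b, n_shuffles=100, seed=0):
    """Z-score of the a/b alignment against shuffled versions of b.

    Shuffling preserves the length and composition of b, so the
    result is corrected for both. n_shuffles and seed are
    illustrative defaults, not values taken from the paper.
    """
    rng = random.Random(seed)
    observed = global_score(a, b)
    letters = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        background.append(global_score(a, "".join(letters)))
    sd = stdev(background)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mean(background)) / sd
```

Calling `z_score(seq1, seq2)` on two sequences returns how many standard deviations the real alignment score lies above the mean score of the shuffled comparisons. Turning that value into a probability still requires a fitted distribution of Z-scores from unrelated pairs, as the abstract describes.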

Results: The relationship between Z-score and probability varied little over the matrix/gap penalty combinations examined. However, an average shift of +4.7 was observed for Z-scores derived from global alignment of locally-aligned subsequences compared with global alignment of the full-length sequences. This shift was shown to be the result of pre-selection by local alignment, rather than any structural similarity in the subsequences. The search ability of both methods was benchmarked against the SCOP superfamily classification. Global alignment Z-scores generated from the entire sequence were as effective as SSEARCH at low error rates and more effective at higher error rates, whereas global alignment Z-scores generated from the best locally-aligned subsequence were significantly less effective than SSEARCH. The method of estimating statistical significance described here was shown to give similar values to SSEARCH and BLAST, providing confidence in the significance estimation.

Availability: Software to apply the statistics to global alignments is available from http://barton.ebi.ac.uk.

Contact: geoff{at}ebi.ac.uk

* To whom correspondence should be addressed. Present address: School of Life Sciences, University of Dundee, Dow St., Dundee, DD1 5EH.






