Bioinformatics Advance Access originally published online on January 22, 2004
Bioinformatics 20(4) © Oxford University Press 2004; all rights reserved.

Fundamentals of massive automatic pairwise alignments of protein sequences: theoretical significance of Z-value statistics

Olivier Bastien 1,2, Jean-Christophe Aude 3, Sylvaine Roy 4 and Eric Maréchal 1,*

1 Laboratoire de Physiologie Cellulaire Végétale, Département Réponse et Dynamique Cellulaire, UMR 5168 CNRS-CEA-INRA-Université J. Fourier, CEA Grenoble, 17 rue des Martyrs, F-38054, Grenoble cedex 09, France, 2 Gene-IT, 147 avenue Paul Doumer, F-92500 Rueil-Malmaison, France, 3 Laboratoire de Bioinformatique, Génomique et Modélisation, Département de Biologie Joliot Curie, CEA Saclay, F-91191 Gif sur Yvette Cedex, France and 4 Service de Développements pour la Bioinformatique Sud-Est, CEA Grenoble, 17 rue des Martyrs, F-38054, Grenoble cedex 09, France

Received on May 23, 2003; revised on July 18, 2003; accepted on August 4, 2003
Advance Access Publication January 22, 2004

Motivation: Different automatic methods of sequence alignment are routinely used as a starting point for homology searches and function inference. Confidence in an alignment probability is one of the major fundamentals of massive automatic genome-scale pairwise comparisons, whether for clustering of putative orthologs and paralogs, annotation of sequenced genomes or construction of multiple-genome trees. Extreme value distributions based on the Karlin–Altschul model, usually advised for large-scale comparisons, are not always valid, particularly when non-biased genomes are compared with nucleotide-biased genomes (such as that of Plasmodium falciparum). Z-value estimates based on Monte Carlo techniques can be calculated experimentally for any alignment output, whatever the method used. Empirically, a Z-value higher than ~8 is considered sufficient to conclude that an alignment score is significant, but this arbitrary figure has never been theoretically justified.
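As an illustration of the Monte Carlo estimate mentioned above, the following minimal Python sketch computes a Z-value as (observed score - mean of shuffled scores) / standard deviation of shuffled scores. It is not the authors' implementation: the function names, the toy match/mismatch/gap scoring scheme, the choice to shuffle only the second sequence, and the number of shuffles are all assumptions made for the example.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score.

    Toy linear-gap scoring scheme (an assumption for this sketch); a real
    analysis would use a substitution matrix and affine gap penalties.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_value(a, b, n_shuffles=200, seed=0):
    """Monte Carlo Z-value of the alignment score of a against b.

    The second sequence is shuffled n_shuffles times (composition preserved),
    and the observed score is standardized against the shuffled scores.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        shuffled_scores.append(local_alignment_score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance; Z-value undefined")
    return (observed - mean) / sd
```

Because the estimate depends only on a scoring function and a shuffling procedure, the same pattern applies to any alignment method, which is the property the abstract highlights.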

Results: In this paper, we used the Bienaymé–Chebyshev inequality to demonstrate a theorem on the upper limit of an alignment score probability (or P-value). This theorem implies that a computed Z-value is both a statistical test and a single-linkage clustering criterion, and that 1/Z-value² is an upper limit to the probability of an alignment score, whatever the actual probability law is. Therefore, this study provides the missing theoretical link between a Z-value cut-off used for automatic clustering of putative orthologs and/or paralogs and the corresponding statistical risk in such genome-scale comparisons (using non-biased or biased genomes).
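To make the bound concrete, the short Python sketch below converts a Z-value cut-off into the upper limit 1/Z² stated in the abstract. The function name is hypothetical, and capping the result at 1 for small or non-positive Z-values is an assumption added here, since a probability cannot exceed 1.

```python
def p_value_upper_bound(z):
    """Upper limit 1/Z^2 on the probability of an alignment score.

    Follows the Chebyshev-type bound stated in the abstract; the cap at 1.0
    for z <= 1 (including non-positive z) is an assumption of this sketch.
    """
    if z <= 0:
        return 1.0
    return min(1.0, 1.0 / (z * z))


# The empirical cut-off Z = 8 corresponds to a bound of 1/64.
print(p_value_upper_bound(8))  # 0.015625
```

Under this reading, the customary cut-off of about 8 guarantees a P-value of at most roughly 1.6%, regardless of the underlying score distribution.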

Contact: emarechal{at}cea.fr

* To whom correspondence should be addressed.




This article has been cited by other articles:

A. Yu. Mitrophanov and M. Borodovsky
Statistical significance in biological sequence analysis
Brief Bioinform, March 1, 2006; 7(1): 2-24.

C. Botte, C. Jeanneau, L. Snajdrova, O. Bastien, A. Imberty, C. Breton, and E. Marechal
Molecular Modeling and Site-directed Mutagenesis of Plant Chloroplast Monogalactosyldiacylglycerol Synthase Reveal Critical Residues for Activity
J. Biol. Chem., October 14, 2005; 280(41): 34691-34701.

R. Petryszak, E. Kretschmann, D. Wieser, and R. Apweiler
The predictive power of the CluSTr database
Bioinformatics, September 15, 2005; 21(18): 3604-3609.

C. Lefebvre, J.-C. Aude, E. Glemet, and C. Neri
Balancing protein similarity and gene co-expression reveals new links between genetic conservation and developmental diversity in invertebrates
Bioinformatics, April 15, 2005; 21(8): 1550-1558.


