Bioinformatics Vol. 17 no. 8 2001
Pages 700-712
© 2001 Oxford University Press
AL2CO: calculation of positional conservation in a protein sequence alignment
1 Howard Hughes Medical Institute
2 Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
Received on September 12, 2000; revised on February 23, 2001; accepted on February 28, 2001
Motivation: Amino acid sequence alignments are widely used to analyze protein structure, function and evolutionary relationships. Proteins within a superfamily usually share the same fold and possess related functions. These structural and functional constraints are reflected in the conservation patterns of the alignment. Positions of functional and/or structural importance tend to be more conserved. Conserved positions are usually clustered in distinct motifs surrounded by sequence segments of low conservation. Poorly conserved regions might also arise from imperfections in multiple alignment algorithms and thus indicate possible alignment errors. Quantifying conservation by attributing a conservation index to each aligned position makes motif detection more convenient. Mapping these conservation indices onto the spatial structure of a protein helps to visualize the spatial conservation features of the molecule and to predict functionally and/or structurally important sites. Analysis of conservation indices could be a useful tool for detecting potentially misaligned regions and will aid in improving multiple alignments.
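The idea that motifs are runs of conserved positions flanked by poorly conserved segments can be made concrete with a short scan over per-position conservation indices. This is a minimal sketch, assuming a list of indices in which larger values mean stronger conservation; the function name `conserved_segments`, the `threshold` and the `min_length` parameter are hypothetical illustration choices, not part of AL2CO.

```python
from __future__ import annotations


def conserved_segments(indices: list[float], threshold: float,
                       min_length: int = 3) -> list[tuple[int, int]]:
    """Return maximal runs of positions whose conservation index is >= threshold.

    Hypothetical helper for illustration. Each run is reported as a
    (start, end) pair with `end` exclusive, and only runs spanning at least
    `min_length` positions are kept. Larger indices are assumed to mean
    stronger conservation.
    """
    segments: list[tuple[int, int]] = []
    start = None  # first position of the run currently being extended
    for i, value in enumerate(indices):
        if value >= threshold:
            if start is None:
                start = i
        else:
            # A run just ended: keep it only if it is long enough.
            if start is not None and i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the last position.
    if start is not None and len(indices) - start >= min_length:
        segments.append((start, len(indices)))
    return segments


indices = [0.1, 0.9, 0.8, 0.95, 0.2, 0.7, 0.3, 0.85, 0.9, 0.9, 0.9]
print(conserved_segments(indices, threshold=0.75))  # [(1, 4), (7, 11)]
```

Negating both the indices and the threshold turns the same scan into a search for poorly conserved stretches, the kind of region that may point to alignment errors.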
Results: We developed a program that calculates a conservation index at each position of a multiple sequence alignment using several methods. Amino acid frequencies are estimated at each position, and the conservation index is calculated from these frequencies. We use both unweighted frequencies and frequencies weighted by two different strategies. Three conceptually different approaches (entropy-based, variance-based and matrix score-based) are implemented in the algorithm to define the conservation index. Calculating conservation indices for 35522 positions in 284 alignments from the SMART database, we demonstrate that the different methods yield highly correlated conservation indices (correlation coefficient above 0.85). Conservation indices show statistically significant correlation between sequentially adjacent positions, and averaging the indices over a window of three positions is optimal for motif detection. Positions with gaps display substantially lower conservation. We compare the conservation properties of SMART alignments and of FSSP structural alignments with those of ClustalW alignments. The results suggest that conservation indices should be a valuable tool for alignment quality assessment and might be used as an objective function for refining multiple alignments.
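The pipeline described in the Results (column frequencies, then an index, then averaging over a three-position window) can be sketched in a few functions. This is an illustration under stated assumptions, not AL2CO's implementation: only unweighted frequencies are used, gaps and non-standard symbols are excluded from the counts, the entropy-based index is taken as the negative Shannon entropy sum(f ln f), and the variance-based index is taken as the Euclidean distance between a column's frequency vector and the alignment-wide amino acid composition. These are common forms assumed here and may differ in detail from the program's definitions; sequence weighting and the matrix score-based index are not shown. All function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter

# The 20 standard amino acids; gaps ('-', '.') and other symbols are ignored (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _composition(residues) -> dict[str, float]:
    """Relative frequencies of standard amino acids in an iterable of characters."""
    counts = Counter(c for c in residues if c in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


def entropy_index(freqs: dict[str, float]) -> float:
    """Negative Shannon entropy, sum(f * ln f): 0 for an invariant column, lower for variable ones."""
    return sum(f * math.log(f) for f in freqs.values())


def variance_index(freqs: dict[str, float], background: dict[str, float]) -> float:
    """Euclidean distance between the column frequencies and the background composition."""
    return math.sqrt(sum((freqs.get(aa, 0.0) - background.get(aa, 0.0)) ** 2
                         for aa in AMINO_ACIDS))


def conservation_indices(alignment: list[str], method: str = "entropy") -> list[float | None]:
    """One index per column from unweighted frequencies; None for columns without residues."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must be non-empty with sequences of equal length")
    seqs = [seq.upper() for seq in alignment]
    # Background composition over every residue in the alignment.
    background = _composition(c for seq in seqs for c in seq)
    indices: list[float | None] = []
    for column in zip(*seqs):
        freqs = _composition(column)
        if not freqs:
            indices.append(None)  # all-gap column: no index assigned
        elif method == "entropy":
            indices.append(entropy_index(freqs))
        elif method == "variance":
            indices.append(variance_index(freqs, background))
        else:
            raise ValueError(f"unknown method: {method!r}")
    return indices


def window_average(indices: list[float | None], half_width: int = 1) -> list[float | None]:
    """Mean over a window of 2*half_width + 1 positions (3 by default), truncated at
    the alignment ends and skipping None entries."""
    smoothed: list[float | None] = []
    for i in range(len(indices)):
        window = [v for v in indices[max(0, i - half_width): i + half_width + 1]
                  if v is not None]
        smoothed.append(sum(window) / len(window) if window else None)
    return smoothed
```

For example, `window_average(conservation_indices(["ACDE", "ACDF", "ACEG", "AKEH"]))` returns smoothed entropy-based indices; the fully conserved first column scores 0.0 before smoothing, and each more variable column scores lower. Truncating the window at the ends is one simple choice; padding or leaving edge positions unsmoothed would be equally valid assumptions.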
Availability: The C code of the AL2CO program and its pre-compiled versions for several platforms as well as the details of the analysis are freely available at ftp://iole.swmed.edu/pub/al2co/.
Contact: grishin{at}chop.swmed.edu
* To whom correspondence should be addressed.