Bioinformatics Advance Access originally published online on May 3, 2005
Bioinformatics 2005 21(14):3166-3167; doi:10.1093/bioinformatics/bti474

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

INTERALIGN: interactive alignment editor for distantly related protein sequences

Olivier Pible *, Gilles Imbert and Jean-Luc Pellequer

Commissariat à l'Energie Atomique, CEA VALRHO, DSV-DIEP-SBTN, Service de Biochimie post-génomique & Toxicologie Nucléaire, Bagnols-sur-Cèze, France

*To whom correspondence should be addressed.


    Abstract

Summary: Improving and ascertaining the quality of a multiple sequence alignment is a very challenging step in protein sequence analysis. This is particularly the case when dealing with sequences in the ‘twilight zone’, i.e. sharing <30% identity. Here we describe INTERALIGN, a dedicated user-friendly alignment editor including a view of secondary structures and a synchronized display of carbon alpha traces of corresponding protein structures. Profile alignment, using CLUSTALW, is implemented to improve the alignment of a sequence of unknown structure with the visually optimized structural alignment as compared with a standard multiple sequence alignment. Tree-based ordering further helps in identifying the structure closest to a given sequence.

Availability: Windows and Linux packages, as well as source files, are available under the CeCILL free software licensing agreement at the following address: http://www-dsv.cea.fr/content/cea/d_dep/d_diep/d_sbtn/download.htm

Contact: olivier.pible{at}cea.fr

Since structure is more conserved than sequence during evolution, similar sequences usually adopt very similar 3D structures. A recurring conclusion from the community initiative CASP (Critical Assessment of Techniques for Structure Prediction) is that, despite the conservation of structure among distantly related proteins, predicting the one-to-one correspondence between residues in the sequence and in the structure remains very challenging. The best results in aligning a protein sequence of unknown structure with a protein sequence whose 3D structure is known are achieved by manually refining automated sequence alignments. Some viewers, such as CHIMERA (Pettersen et al., 2004), can link sequences and structures but do not allow manual curation of detected alignment problems. True editors of sequence alignments are also available, such as JALVIEW (Clamp et al., 2004), STRAP (Gille et al., 2003) and CINEMA5 (Parry-Smith et al., 1998). However, none of them makes full use of 3D structures to help in manually refining a given multiple sequence alignment.

We present INTERALIGN, a dedicated tool to interactively manipulate and refine multiple sequence alignments using 3D structures. The program is written in Java 2 and uses Java 3D for the 3D viewer. INTERALIGN can read and manipulate protein sequences extracted from the Protein Data Bank (PDB) and from files in CLUSTALW or FASTA format, on either a local or a remote file server. The user can move a single residue, a group of residues, or a cluster of residues in the multiple sequence alignment window. If necessary, empty columns can be inserted to introduce gaps. The strength of INTERALIGN is that it automatically superimposes protein structures whose sequences are aligned in the multiple sequence alignment window. Each 3D viewer displays only one pair of superimposed structures, drawn as tubes with Cα atoms represented by balls. The user can interactively select a range of residues in the multiple sequence alignment window; the selected residues are then highlighted and drawn as thicker tubes in the 3D viewer (Fig. 1). Connecting lines between aligned residues can also be displayed, so that a misaligned residue can be picked in the 3D viewer, which in turn highlights the corresponding region in the multiple sequence alignment. An additional feature, based on atom–atom distances, makes it possible to colour residues in white when they are involved in hydrophobic cores, either in α-helices or in β-strands. This function makes it easy to highlight conserved core residues among distantly related proteins. To help improve the alignment quality, we connected INTERALIGN with a secondary structure prediction tool, PSIPRED (Jones, 1999), and the secondary structure assignment program STRIDE (Frishman and Argos, 1995). For comparative modeling purposes, it is often useful to know where secondary structure elements are located, to avoid introducing gaps in the middle of helices or strands. A different colour code is used depending on the origin of the secondary structure: light colours for PSIPRED and dark colours for STRIDE (Fig. 1).
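The link between a selection in the alignment window and the residues drawn in a 3D viewer rests on a mapping from alignment columns to residue positions. The following Python sketch illustrates one way such a mapping, and the insertion of an empty column, could work. It is not taken from the INTERALIGN source (which is written in Java); the function names, the dictionary representation of the alignment and the toy sequences are all assumptions made for illustration.

```python
GAP = "-"  # gaps are shown as a dash character in the alignment


def column_to_residue(aligned_seq):
    """For each alignment column, give the 0-based residue index, or None at a gap."""
    mapping = []
    residue_index = 0
    for symbol in aligned_seq:
        if symbol == GAP:
            mapping.append(None)
        else:
            mapping.append(residue_index)
            residue_index += 1
    return mapping


def selected_residues(aligned_seq, start, end):
    """Residue indices covered by the column selection [start, end)."""
    columns = column_to_residue(aligned_seq)[start:end]
    return [index for index in columns if index is not None]


def insert_gap_column(alignment, column):
    """Return a new alignment with an all-gap column inserted before `column`."""
    return {name: seq[:column] + GAP + seq[column:]
            for name, seq in alignment.items()}


# Toy alignment (hypothetical names and sequences, not real data).
toy_alignment = {"seqA": "MK-LV", "seqB": "MKALV"}
```

With this toy input, selecting columns 1 to 3 of `seqA` yields the residue indices to highlight in the structure, and gaps inside the selection are simply skipped.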



Fig. 1. Four simultaneous 3D viewers (top) and the multiple sequence alignment window (bottom). Structures in the 3D viewers are drawn as thin tubes, yellow for the reference structure 1MAT and purple for other structures. Selected residues in the multiple sequence alignment window are highlighted in yellow and simultaneously drawn as thick tubes in the 3D viewers. The colour code for residues is kept to a minimum: pink for polar, green for hydrophobic and grey for Gly, Tyr and Trp. PSIPRED and STRIDE are run on all sequences and structures by clicking the top left buttons. Secondary structure is colour-coded on the outline of each residue box: red for helices, blue for strands and orange for turns. Gaps in the alignment are represented by a dash character. The RMSD of aligned Cα atoms is shown in the multiple sequence alignment window for the two active structures, 1MAT and 1CHM. The help tab describes the various menus of INTERALIGN in detail.

 
To complement the 3D display capability, the superimposition RMSD of a chosen PDB pair can be used to fine-tune the alignment at the per-residue level. To obtain feedback on a modification in the multiple sequence alignment window, not only for a given PDB pair but for all of the structures involved in the alignment, the RMSD calculation can be performed using a reference structure superimposed on each of the other available PDB structures. This can easily be repeated after each alignment modification, and gives a ‘better’ or ‘worse’ indication depending on the direction in which the selection was moved, which can be confirmed using the 3D displays.
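As a rough illustration of this ‘better’ or ‘worse’ feedback, the Python sketch below computes the RMSD over the Cα pairs that an alignment places in the same column, so that two candidate alignments can be compared. It makes a simplifying assumption that differs from the description above: the coordinates are taken as already superimposed and are not refitted after each edit. The function names and toy coordinates are hypothetical.

```python
import math

GAP = "-"


def aligned_pairs(seq_a, seq_b):
    """Residue-index pairs (i, j) for columns where neither sequence has a gap."""
    pairs = []
    i = j = 0
    for a, b in zip(seq_a, seq_b):
        if a != GAP and b != GAP:
            pairs.append((i, j))
        if a != GAP:
            i += 1
        if b != GAP:
            j += 1
    return pairs


def rmsd(coords_a, coords_b, pairs):
    """Root mean square deviation over the given pairs of C-alpha coordinates."""
    if not pairs:
        raise ValueError("no aligned residue pairs")
    total = sum(math.dist(coords_a[i], coords_b[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))


# Toy C-alpha coordinates (hypothetical, assumed already superimposed).
ca_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
ca_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

before = rmsd(ca_a, ca_b, aligned_pairs("ABC-", "-ABC"))  # shifted by one column
after = rmsd(ca_a, ca_b, aligned_pairs("ABC", "ABC"))     # shift corrected
verdict = "better" if after < before else "worse"
```

A fuller version would recompute the least-squares superposition on the new set of aligned pairs before measuring the RMSD; that step is omitted here to keep the sketch short.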

INTERALIGN offers another useful feature: a ‘one-against-many’ sequence alignment capability. When an alignment is considered acceptable, a sequence of unknown structure can be realigned, or a novel sequence can be added to this alignment, using the profile alignment mode of CLUSTALW (Thompson et al., 1994). This ensures that the structurally optimized alignment remains unchanged. The new alignment can then be reordered using the CLUSTALW dendrogram (.dnd file, PHYLIP format). For example, the sequence ordering will highlight the best structural representative for a protein sequence of unknown structure (probably the protein that shares the largest number of aligned residues and has the best resolution among the closest sequences).
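The reordering step can be pictured as reading the leaf names of the dendrogram from left to right and listing the aligned sequences in that order. The Python sketch below is an illustration under stated assumptions, not the INTERALIGN implementation: it assumes the dendrogram is a plain parenthesised tree string with unquoted names, and the tree, sequence names and sequences shown are invented.

```python
import re


def newick_leaf_order(newick):
    """Leaf names in left-to-right order; assumes unquoted names without spaces."""
    # A leaf name directly follows '(' or ','; labels that follow ')' are skipped.
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


def reorder_alignment(alignment, newick):
    """Return the alignment as a dict ordered by the tree's leaf order."""
    ordered = {name: alignment[name]
               for name in newick_leaf_order(newick) if name in alignment}
    # Sequences absent from the tree keep their relative order at the end.
    for name, seq in alignment.items():
        ordered.setdefault(name, seq)
    return ordered


# Hypothetical dendrogram and alignment for illustration only.
toy_tree = "(\n(target:0.10,\ntemplateA:0.12)\n:0.05,\ntemplateB:0.30);"
toy_alignment = {"templateB": "MK-LV", "target": "MKALV", "templateA": "MRALV"}
reordered = reorder_alignment(toy_alignment, toy_tree)
```

After reordering, the structure listed next to the target sequence is the one the tree groups most closely with it, which is the kind of hint the ordering is meant to give when choosing a template.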

These features make INTERALIGN a suitable tool for quickly improving multiple sequence alignments. As long as 3D structures span the entire range of sequences, the resulting alignment is optimal. INTERALIGN was used in the recent CASP6 campaign (J.-L. Pellequer, personal communication). The benefits were 2-fold: on the one hand it helped in refining the multiple sequence alignment, and on the other hand it allowed us to select the best templates for a given target. As more and more 3D structures become available, our tool will become increasingly useful for aligning and evaluating multiple sequence alignments of distantly related proteins.

Received on January 14, 2005; revised on April 11, 2005; accepted on April 25, 2005

    REFERENCES

    Clamp, M., et al. (2004) The Jalview Java alignment editor. Bioinformatics, 20, 426–427.

    Frishman, D. and Argos, P. (1995) Knowledge-based protein secondary structure assignment. Proteins, 23, 566–579.

    Gille, C., et al. (2003) KISS for STRAP: user extensions for a protein alignment editor. Bioinformatics, 19, 2489–2491.

    Jones, D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195–202.

    Parry-Smith, D.J., et al. (1998) CINEMA—a novel colour INteractive editor for multiple alignments. Gene, 221, GC57–GC63.

    Pettersen, E.F., et al. (2004) UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem., 25, 1605–1612.

    Thompson, J.D., et al. (1994) CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

