Bioinformatics Advance Access originally published online on May 3, 2005
Bioinformatics 2005 21(14):3166-3167; doi:10.1093/bioinformatics/bti474
INTERALIGN: interactive alignment editor for distantly related protein sequences
Commissariat à l'Energie Atomique, CEA VALRHO, DSV-DIEP-SBTN, Service de Biochimie post-génomique & Toxicologie Nucléaire, Bagnols-sur-Cèze, France
*To whom correspondence should be addressed.
Abstract
Summary: Improving and ascertaining the quality of a multiple sequence alignment is a very challenging step in protein sequence analysis. This is particularly the case when dealing with sequences in the twilight zone, i.e. sharing <30% identity. Here we describe INTERALIGN, a dedicated user-friendly alignment editor including a view of secondary structures and a synchronized display of carbon alpha traces of corresponding protein structures. Profile alignment, using CLUSTALW, is implemented to improve the alignment of a sequence of unknown structure with the visually optimized structural alignment as compared with a standard multiple sequence alignment. Tree-based ordering further helps in identifying the structure closest to a given sequence.
Availability: Windows and Linux packages, as well as source files, are available under the CeCILL free software licensing agreement at the following address: http://www-dsv.cea.fr/content/cea/d_dep/d_diep/d_sbtn/download.htm
Contact: olivier.pible{at}cea.fr
Since structure is more conserved than sequence during evolution, similar sequences usually adopt very similar 3D structures. An unchanging conclusion from the community initiative, CASP (critical assessment of techniques for structure prediction) is that, despite the conservation of structure within distantly related proteins, it is a very challenging task to predict the one-to-one correspondence between residues in the sequence and in the structure. Best results in predicting an alignment between a protein sequence with an unknown structure and a protein sequence whose 3D structure is known are achieved by manually refining automated sequence alignments. Some viewers such as CHIMERA (Pettersen et al., 2004) can link sequences and structures without allowing manual curation of detected alignment problems. True editors of sequence alignments are also available, such as JALVIEW (Clamp et al., 2004), STRAP (Gille et al., 2003) and CINEMA5 (Parry-Smith et al., 1998). However, none of them makes full use of 3D structures to help in manually refining a given multiple sequence alignment.
We present INTERALIGN, a dedicated tool to interactively manipulate and refine multiple sequence alignments using 3D structures. The program is written in Java 2 and makes use of Java 3D for the 3D viewer. INTERALIGN is able to read and manipulate protein sequences extracted from the Protein Data Bank (PDB) and from files in CLUSTALW or FASTA format, either from a local or remote file server. The user is able to move a single residue, a group, or a cluster of residues in the multiple sequence alignment window. If necessary, empty columns can be inserted to introduce gaps. The strength of INTERALIGN is to automatically superimpose protein structures whose sequences are aligned in the multiple sequence alignment window. We chose to display in a given 3D viewer only a pair of superimposed structures, drawn as tubes with CA atoms represented by balls. Interactively, a user can select a range of residues in the multiple sequence alignment window. The selected residues are then highlighted and drawn with thicker tubes in the 3D viewer (Fig. 1). Connecting lines between aligned residues can also be displayed to pick a misaligned residue in the 3D viewer which, in turn, will highlight the corresponding region in the multiple sequence alignment. An additional feature, based on atom-atom distances, makes it possible to colour residues in white when they are involved in hydrophobic cores either in α-helices or in β-strands. This function makes it easy to highlight conserved core residues among distantly related proteins. To help improve the alignment quality, we connected INTERALIGN with a secondary structure prediction tool, PSIPRED (Jones, 1999), and the secondary structure assignment program STRIDE (Frishman and Argos, 1995). For comparative modeling purposes, it is often useful to know where secondary structure elements are located, to avoid the introduction of gaps in the middle of helices or strands. A different colour code is used depending on the origin of secondary structure: light and dark colours for PSIPRED and STRIDE, respectively (Fig. 1).
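The point about avoiding gaps in the middle of helices or strands can be illustrated with a small check. The following is a minimal sketch, not INTERALIGN's own code: the function name `gaps_inside_elements` is hypothetical, and the secondary structure is assumed to be given as a three-state string (H for helix, E for strand, C for anything else) with one code per residue, which is a simplification of what PSIPRED or STRIDE actually report.

```python
def gaps_inside_elements(aligned_seq, ss, gap="-"):
    """Return (first_col, last_col) ranges of gap runs that split a helix or strand.

    aligned_seq: one row of the alignment, gaps included.
    ss: assumed three-state string (H, E, C), one code per residue of that row.
    """
    n_res = sum(1 for c in aligned_seq if c != gap)
    if n_res != len(ss):
        raise ValueError("ss must contain exactly one code per residue")

    flagged = []
    res_seen = 0  # number of residues read so far
    col = 0
    while col < len(aligned_seq):
        if aligned_seq[col] != gap:
            res_seen += 1
            col += 1
            continue
        start = col
        while col < len(aligned_seq) and aligned_seq[col] == gap:
            col += 1
        # Leading and trailing gap runs have no residue on one side: skip them.
        if 0 < res_seen < n_res:
            before, after = ss[res_seen - 1], ss[res_seen]
            # A gap run is flagged when both flanking residues are in a helix,
            # or both are in a strand.
            if before == after and before in ("H", "E"):
                flagged.append((start, col - 1))
    return flagged


row = "AAA--AAAA-AA"
states = "HHHHHHCCC"
print(gaps_inside_elements(row, states))
```

Columns are 0-based. In this example the first gap run falls between two helix residues and is reported, while the second sits in a coil region and is ignored.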
To complement the 3D display capability, the superimposition RMSD of a chosen PDB pair can be used to fine-tune the alignment at the per-residue level. To get feedback on a modification in the multiple sequence alignment window, not only for a given PDB pair but for all of the structures involved in the alignment, the RMSD calculation can be performed using a reference structure superimposed with all the other available PDB structures. This can easily be performed after each alignment modification and indicates whether moving the selection in a given direction makes the superimposition better or worse, which can be confirmed using the 3D displays.
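As a rough illustration of how an alignment edit feeds into an RMSD value, the sketch below pairs CA atoms column by column and computes the RMSD over those pairs. It is an assumption-laden example rather than the program's implementation: the function names are hypothetical, the coordinates are assumed to be already superimposed (no fitting step is performed here), and each structure is assumed to provide one CA coordinate per residue of its alignment row.

```python
import math


def aligned_ca_pairs(row_a, row_b, ca_a, ca_b, gap="-"):
    """Pair CA coordinates of residues that share an alignment column."""
    pairs = []
    i = j = 0  # residue indices into ca_a and ca_b
    for x, y in zip(row_a, row_b):
        if x != gap and y != gap:
            pairs.append((ca_a[i], ca_b[j]))
        if x != gap:
            i += 1
        if y != gap:
            j += 1
    return pairs


def rmsd(pairs):
    """Root mean square deviation over coordinate pairs (no superposition done)."""
    if not pairs:
        raise ValueError("no aligned residue pairs")
    total = sum(math.dist(p, q) ** 2 for p, q in pairs)
    return math.sqrt(total / len(pairs))


# Toy data: two rows of an alignment and made-up CA coordinates.
row_a, row_b = "AC-D", "A-GD"
ca_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
ca_b = [(0.0, 0.0, 1.0), (5.0, 5.0, 5.0), (2.0, 0.0, 1.0)]
print(rmsd(aligned_ca_pairs(row_a, row_b, ca_a, ca_b)))
```

Shifting a residue in one row changes which coordinates are paired, so recomputing the value after each edit gives the better-or-worse signal described above. Comparing a reference against several structures would amount to calling these functions once per structure.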
INTERALIGN offers another useful feature: a one-against-many sequence alignment capability. When an alignment is considered acceptable, a sequence of unknown structure can be realigned, or a novel sequence can be added to this alignment, using the profile alignment mode of CLUSTALW (Thompson et al., 1994). This ensures that the structurally optimized alignment remains unchanged. The new alignment can then be reordered using the dendrogram produced by CLUSTALW (PHYLIP format, .dnd file). For example, the sequence ordering will highlight the best structural representative for a protein sequence with unknown structure (probably the protein that shares the largest number of aligned residues and has the best resolution among the closest sequences).
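Tree-based reordering can be sketched as follows. This is an illustrative example only: it assumes the dendrogram is a Newick-style string such as the .dnd file written by CLUSTALW, that sequence names contain no parentheses, commas, colons or whitespace, and that the alignment is held in a plain dictionary. The function names `leaf_order` and `reorder_alignment` are hypothetical.

```python
import re


def leaf_order(newick):
    """Return leaf names in the order they appear in a Newick-style tree string.

    A leaf name is taken to be any label that directly follows '(' or ','.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def reorder_alignment(alignment, newick):
    """Order alignment rows by tree leaf order; rows absent from the tree go last."""
    in_tree = [name for name in leaf_order(newick) if name in alignment]
    seen = set(in_tree)
    rest = [name for name in alignment if name not in seen]
    return [(name, alignment[name]) for name in in_tree + rest]


# Toy dendrogram and alignment (names and rows are made up).
dnd = "(\n(seqB:0.1,\nseqC:0.2)\n:0.05,\nseqA:0.3);"
aln = {"seqA": "AC-D", "seqB": "ACGD", "seqC": "A-GD", "seqX": "----"}
for name, row in reorder_alignment(aln, dnd):
    print(name, row)
```

With rows ordered this way, a sequence of unknown structure ends up next to its closest neighbours in the tree, which is where a candidate structural representative would be looked for.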
These features make INTERALIGN a suitable tool for quickly improving multiple sequence alignments. As long as 3D structures span the entire range of sequences, the resulting alignment is optimal. INTERALIGN proved helpful in the recent CASP6 campaign (J.-L. Pellequer, personal communication). The benefits were two-fold: on the one hand, it helped in refining the multiple sequence alignment and, on the other hand, it allowed us to select the best templates for a given target. As more and more 3D structures become available, our tool will be very useful to align and evaluate multiple sequence alignments of distantly related proteins.
Received on January 14, 2005; revised on April 11, 2005; accepted on April 25, 2005
REFERENCES
Clamp, M., et al. (2004) The Jalview Java alignment editor. Bioinformatics, 20, 426-427.
Frishman, D. and Argos, P. (1995) Knowledge-based protein secondary structure assignment. Proteins, 23, 566-579.
Gille, C., et al. (2003) KISS for STRAP: user extensions for a protein alignment editor. Bioinformatics, 19, 2489-2491.
Jones, D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195-202.
Parry-Smith, D.J., et al. (1998) CINEMA - a novel colour INteractive editor for multiple alignments. Gene, 221, GC57-GC63.
Pettersen, E.F., et al. (2004) UCSF Chimera - a visualization system for exploratory research and analysis. J. Comput. Chem., 25, 1605-1612.
Thompson, J.D., et al. (1994) CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673-4680.
