Bioinformatics Advance Access originally published online on August 23, 2006
Bioinformatics 2006 22(21):2691-2692; doi:10.1093/bioinformatics/btl449

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

MMM: a sequence-to-structure alignment protocol

Brajesh K. Rai, Carlos J. Madrid-Aliste, J. Eduardo Fajardo and András Fiser*

Department of Biochemistry and Seaver Center for Bioinformatics, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA

*To whom correspondence should be addressed.


    ABSTRACT
 

Motivation: Accurate alignment of a target sequence to a template structure continues to be a bottleneck in producing good quality comparative protein structure models.

Results: Multiple Mapping Method (MMM) is a comparative protein structure modeling server with an emphasis on a novel alignment optimization protocol. MMM takes inputs from five profile-to-profile based alignment methods. The alternatively aligned regions from the input alignment set are combined according to their fit in the structural environment of the template structure. The resulting, optimally spliced MMM alignment is used as input to an automated comparative modeling module to produce a full atom model.

Availability: The MMM server is freely accessible at http://www.fiserlab.org/servers/mmm

Contact: andras{at}fiserlab.org


    1 INTRODUCTION
 
Comparative protein structure modeling relies on detectable similarity spanning most of the modeled sequence and at least one known structure (Fiser, 2004). When the structure of one protein in the family has been determined by experiment, the other members of the family can be modeled based on their alignment to the known structure. Accurate alignment of a target sequence to a template structure continues to be a bottleneck in producing good quality homology models. A number of alignment methods have been developed and are publicly available (Edgar, 2004; Madhusudhan et al., 2006). However, none of these alignment methods consistently produces a better solution than the others in all cases (Prasad et al., 2003; Rai and Fiser, 2006). Furthermore, alignments produced by different methods are often better in some regions and worse in others when compared with each other. One possible solution to this problem is to consider several alignment methods and combine the better-aligned parts into a unique solution (Kosinski et al., 2005).

Multiple Mapping Method (MMM) has been developed to minimize errors associated with input alignments. The details of the method and its performance have recently been described elsewhere (Rai and Fiser, 2006). In brief, MMM constructs an optimal and often unique solution from an arbitrary set of input alignments. The two principal components of this method are (i) sampling and (ii) scoring. MMM efficiently explores a limited, but biologically relevant, sampling space, which is defined by the observed differences in a set of alternative alignments of the same template and target sequence produced by different methods (or by the same method with different parameters). The sampled scenarios are evaluated with a composite environment-specific scoring function composed of (i) environment-specific substitution matrices from FUGUE (Shi et al., 2001); (ii) a 3D–1D substitution matrix, H3P2 (Rice and Eisenberg, 1997), that scores the matches of predicted secondary structure of the target sequence to the observed secondary structures and accessibility types of the template residues; and (iii) a statistically derived residue–residue contact energy term, which determines the compatibility of alternative variable segments in the protein environment (Miyazawa and Jernigan, 1996).
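The idea of a sampling space defined by the differences among input alignments can be illustrated with a minimal Python sketch. It assumes that each alignment is stored as a list giving, for every target residue, the index of the matched template residue (or None for a gap). This representation and the function name variable_segments are illustrative assumptions and are not taken from the MMM implementation.

```python
from typing import List, Optional, Tuple

# Assumed representation: target position -> template index, None = gap.
Mapping = List[Optional[int]]


def variable_segments(alignments: List[Mapping]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) target ranges where the inputs disagree."""
    if not alignments:
        return []
    length = len(alignments[0])
    if any(len(a) != length for a in alignments):
        raise ValueError("all mappings must cover the same target sequence")

    segments = []
    start = None
    for pos in range(length):
        # A position is variable if the inputs assign it different partners.
        differs = len({a[pos] for a in alignments}) > 1
        if differs and start is None:
            start = pos
        elif not differs and start is not None:
            segments.append((start, pos))
            start = None
    if start is not None:
        segments.append((start, length))
    return segments


# Two hypothetical alignments of the same 8-residue target to one template.
aln_a = [0, 1, 2, 3, 4, 5, 6, 7]
aln_b = [0, 1, None, 2, 4, 5, 7, 8]
print(variable_segments([aln_a, aln_b]))  # [(2, 4), (6, 8)]
```

Positions outside the reported segments are aligned identically by all inputs, so only the variable segments would need to be evaluated by a scoring function.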

MMM combines the better-aligned parts into a unique solution, which, on average, is more accurate than any of the input alignments alone (Rai and Fiser, 2006).


    2 IMPLEMENTATION
 
The current implementation of the server has several new additions compared with the original publication. One is the expansion of the set of implemented alignment tools; another is the automated building of multiple sequence profiles, which allows profile-to-profile alignments to be performed instead of simple pairwise alignments.

The server takes as input the target sequence (to be modeled) and either the Protein Data Bank (PDB) code of the template structure or a user-uploaded coordinate file that serves as the template in the subsequent modeling exercise. The server currently provides five competitive alternative alignment approaches, of which at least two need to be selected: ClustalW (Thompson et al., 1994), ClustalW with a modified gap penalty function, Align2D (Sali and Blundell, 1993), MUSCLE (Edgar, 2004) and T-Coffee (Notredame et al., 2000). In addition, any number of other alignments can be added from other sources, e.g. manually edited ones.

Next, a newly developed module, BlastProfiler, is run to build sequence profiles for both the target and template sequences. BlastProfiler initiates a PSI-BLAST search on a locally installed and frequently updated NR (Boeckmann et al., 2003) database. The program then parses all iterations of the PSI-BLAST output and locates and stores those pairwise alignments between the query and database sequences that meet the filtering criteria. The filtering values are as follows: (i) the lower and upper cutoffs for percent sequence identity between the hit and the query, as reported in the pairwise BLAST alignment, are 30 and 90%, respectively; (ii) the lower bound for alignment length is 30 residues; (iii) the maximal E-value for each hit is 1E-4; and (iv) the minimal required coverage of the query in the alignment is 30% by default. Typically the PSI-BLAST output contains more than one alignment for the same hit sequence, especially when multiple iterations are performed. Such alternative alignments may include either the same or different regions of the hit sequence. Alignments to different regions of the target are kept as separate entries. Two alignments that involve the same hit sequence are considered redundant if the overlap is >50%. Because alignments produced in later iterations contain more specific information about the sequence profile, they are preferred over earlier ones when alignments overlap.
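The filtering and redundancy rules above can be sketched in Python as follows. This is a minimal illustration and not the BlastProfiler code: the Hit record, the function names, the inclusive treatment of the cutoffs and the definition of overlap (shared range relative to the shorter of the two hit ranges) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise alignment parsed from PSI-BLAST output (hypothetical record)."""
    hit_id: str
    iteration: int
    percent_identity: float  # identity between hit and query, in percent
    alignment_length: int    # aligned residues
    e_value: float
    query_coverage: float    # percent of the query covered by the alignment
    hit_start: int           # 1-based inclusive range on the hit sequence
    hit_end: int


def passes_filters(h, min_id=30.0, max_id=90.0, min_len=30,
                   max_e=1e-4, min_cov=30.0):
    """Apply the four cutoffs listed in the text (treated as inclusive here)."""
    return (min_id <= h.percent_identity <= max_id
            and h.alignment_length >= min_len
            and h.e_value <= max_e
            and h.query_coverage >= min_cov)


def overlap_fraction(a, b):
    """Shared hit range relative to the shorter range (an assumed definition)."""
    shared = min(a.hit_end, b.hit_end) - max(a.hit_start, b.hit_start) + 1
    shorter = min(a.hit_end - a.hit_start, b.hit_end - b.hit_start) + 1
    return max(shared, 0) / shorter


def select_alignments(hits, max_overlap=0.5):
    kept = []
    # Visit later iterations first so they win over redundant earlier ones.
    for h in sorted(filter(passes_filters, hits), key=lambda x: -x.iteration):
        redundant = any(k.hit_id == h.hit_id
                        and overlap_fraction(k, h) > max_overlap
                        for k in kept)
        if not redundant:
            kept.append(h)
    return kept


hits = [
    Hit("A", 1, 45.0, 120, 1e-20, 80.0, 10, 129),   # superseded by iteration 3
    Hit("A", 3, 42.0, 125, 1e-30, 85.0, 8, 132),    # kept
    Hit("A", 2, 35.0, 60, 1e-6, 40.0, 300, 359),    # different region, kept
    Hit("B", 2, 95.0, 200, 1e-50, 100.0, 1, 200),   # identity above 90%
    Hit("C", 1, 33.0, 25, 1e-5, 35.0, 1, 25),       # shorter than 30 residues
]
selected = select_alignments(hits)
print([(h.hit_id, h.iteration) for h in selected])  # [('A', 3), ('A', 2)]
```

Sorting by iteration before the redundancy check is one simple way to express the stated preference for alignments from later iterations.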

The second major step in the selection of a set of representative hit sequences is removing sequence redundancy by the CD-HIT clustering program (Li et al., 2002) at 40% identity level. At the end of this step, alternative profile-to-profile-based sequence alignments are available, which are used as input to the MMM module (Rai and Fiser, 2006). MMM samples all alternatives and splices together an optimal, consensus alignment, which is then used as input to MODELLER (Sali and Blundell, 1993) to generate an all-atom comparative model for the target sequence.

The performance of the original MMM method is discussed in detail in a recent publication (Rai and Fiser, 2006). The algorithm was tested on a dataset of 1400 protein pairs using 11 combinations of 2–5 alignment methods. In all cases, MMM showed a statistically significant improvement, reducing alignment errors by 3–17%. MMM also compared favorably with the two alignment meta-servers tested (Lambert et al., 2002; Prasad et al., 2003). Figure 1 illustrates that the current implementation of MMM, using sequence profiles and an optimized set of input alignments, further improves performance over the previous version of the program.


Fig. 1 Performance of input alignments (CW: ClustalW with default gap penalty; CWM: ClustalW with modified gap penalty; ali2d: Align2D; Muscle) when used in pairwise (PW) and Profile mode, as a function of alignment error. The corresponding MMM performances are also shown; standard errors are indicated with vertical lines. Performance is determined by calculating the percentage of differently aligned positions between the test alignment and the gold-standard STAMP (Russell and Barton, 1992) structural alignment for the same template–target pair.
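The error measure named in the caption can be sketched in Python. The exact counting rule is not spelled out here, so this example assumes that a position is differently aligned when a target residue that is aligned in the reference has a different template partner, or none, in the test alignment. The function names and the gapped-string input format are likewise assumptions made for illustration.

```python
def aligned_pairs(template_row: str, target_row: str) -> dict:
    """Map target residue index -> template residue index for aligned columns."""
    if len(template_row) != len(target_row):
        raise ValueError("alignment rows must have equal length")
    pairs = {}
    t_idx = q_idx = 0
    for t_char, q_char in zip(template_row, target_row):
        if t_char != "-" and q_char != "-":
            pairs[q_idx] = t_idx
        # Advance each residue counter only past non-gap characters.
        t_idx += t_char != "-"
        q_idx += q_char != "-"
    return pairs


def percent_alignment_error(test, reference) -> float:
    """Percent of reference-aligned target residues placed differently in test."""
    ref = aligned_pairs(*reference)
    tst = aligned_pairs(*test)
    if not ref:
        raise ValueError("reference alignment has no aligned positions")
    wrong = sum(1 for q, t in ref.items() if tst.get(q) != t)
    return 100.0 * wrong / len(ref)


# Hypothetical (template_row, target_row) pairs for one template-target pair.
reference_aln = ("ACDEFGHIKL", "ACDEFGHIKL")
test_aln = ("ACDEFGHI-KL", "ACDEFG-HIKL")
print(percent_alignment_error(test_aln, reference_aln))  # 20.0
```

In the example, two of the ten reference-aligned target residues are shifted or unaligned in the test alignment, giving an error of 20%.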

 

    3 USING THE SERVER
 
The MMM server has a straightforward interface. The user only needs to provide a target sequence and a template structure. The target sequence can be entered in a text box or uploaded as a text file; the template is specified either by its PDB code or by uploading a coordinate file in the PDB format. The target sequence must be in plain text containing one-letter amino acid codes (without any header), in the FASTA format (Pearson, 1990) or in the PIR format. The user also needs to supply a return e-mail address.
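A user preparing input may want to check that a sequence is readable in one of the three accepted forms. The following Python sketch normalizes plain, FASTA or PIR text to a bare one-letter sequence. It is not the server's parser: the function name read_target_sequence and the header detection heuristic (a PIR header is assumed to look like ">P1;code" and to be followed by one description line) are assumptions.

```python
def read_target_sequence(text: str) -> str:
    """Return the bare one-letter sequence from plain, FASTA or PIR input."""
    raw = text.strip().splitlines()
    if raw and raw[0].startswith(">"):
        # Heuristic: PIR headers carry a two-character type code and ';'.
        is_pir = len(raw[0]) > 3 and raw[0][3] == ";"
        # PIR has a header plus a description line; FASTA has a header only.
        raw = raw[2:] if is_pir else raw[1:]
    # Drop all whitespace, then the '*' that terminates PIR sequences.
    seq = "".join("".join(line.split()) for line in raw).rstrip("*").upper()
    if not seq or not seq.isalpha():
        raise ValueError("sequence must contain only one-letter amino acid codes")
    return seq


plain = "MKTAYIAKQR\nQISFVKSHFS"
fasta = ">target1 example\nMKTAYIAKQR\nQISFVKSHFS\n"
pir = ">P1;target1\nsequence:target1::::::::\nMKTAYIAKQR\nQISFVKSHFS*\n"

for label, text in (("plain", plain), ("FASTA", fasta), ("PIR", pir)):
    print(label, read_target_sequence(text))
```

All three inputs yield the same 20-residue sequence, MKTAYIAKQRQISFVKSHFS.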

The MMM server returns a full atom model in the PDB format as output. In addition, the alignment used for modeling is sent to the user by e-mail.


    Acknowledgments
 
Financial support for this work was provided by NIH GM62519-04 and the Seaver Foundation.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on June 2, 2006; revised on July 17, 2006; accepted on August 16, 2006

    REFERENCES
 

    Boeckmann, B., et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31, 365.

    Edgar, R.C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5, 113.

    Fiser, A. (2004) Protein structure modeling in the proteomics era. Exp. Rev. Proteom., 1, 97–110.

    Kosinski, J., et al. (2005) FRankenstein becomes a cyborg: the automatic recombination and realignment of fold recognition models in CASP6. Proteins, 61, Suppl. 7, 106–113.

    Lambert, C., et al. (2002) ESyPred3D: prediction of proteins 3D structures. Bioinformatics, 18, 1250–1256.

    Li, W., et al. (2002) Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics, 18, 77–82.

    Madhusudhan, M.S., et al. (2006) Variable gap penalty for protein sequence–structure alignment. Protein Eng. Des. Sel., 19, 129–133.

    Miyazawa, S. and Jernigan, R.L. (1996) Residue–residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J. Mol. Biol., 256, 623.

    Notredame, C., et al. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205.

    Pearson, W.R. (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol., 183, 63.

    Prasad, J.C., et al. (2003) Consensus alignment for reliable framework prediction in homology modeling. Bioinformatics, 19, 1682.

    Rai, B.K. and Fiser, A. (2006) Multiple mapping method: a novel approach to the sequence-to-structure alignment problem in comparative protein structure modeling. Proteins, 63, 644–661.

    Rice, D.W. and Eisenberg, D. (1997) A 3D–1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence. J. Mol. Biol., 267, 1026.

    Russell, R.B. and Barton, G.J. (1992) Multiple protein sequence alignment from tertiary structure comparison: assignment of global and residue confidence levels. Proteins, 14, 309.

    Sali, A. and Blundell, T.L. (1993) Comparative protein modeling by satisfaction of spatial restraints. J. Mol. Biol., 234, 779–815.

    Shi, J., et al. (2001) FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol., 310, 243.

    Thompson, J.D., et al. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673.

