Bioinformatics Advance Access originally published online on August 16, 2005
Bioinformatics 2005 21(19):3803-3805; doi:10.1093/bioinformatics/bti619
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

NdPASA: a pairwise sequence alignment server for distantly related proteins

Wei Li 1, Junwen Wang 2,† and Jin-An Feng 1,2,*

1Department of Chemistry, Temple University, Philadelphia, PA 19122, USA
2Center for Biotechnology, Temple University, Philadelphia, PA 19122, USA

*To whom correspondence should be addressed at Department of Chemistry, Temple University, 1901 N. 13th Street, Philadelphia, PA 19122, USA


    Abstract

Summary: NdPASA is a web server specifically designed to optimize sequence alignment between distantly related proteins. The program integrates structural information of the template sequence into a global alignment algorithm by employing neighbor-dependent propensities of amino acids as a unique parameter for alignment. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching a corresponding residue pair that adopts a particular secondary structure in the template sequence. NdPASA is most effective in aligning homologous proteins that share a low percentage of sequence identity. The server is designed to aid homologous protein structure modeling. A PSI-BLAST search engine was implemented to help users identify the template candidates that are most appropriate for modeling the query sequences.

Availability: http://guanyin.chem.temple.edu

Contact: feng@temple.edu

Protein sequence alignment is an essential component of biomedical research. It is one of the standard approaches for exploring the potential functional activity of a newly discovered protein by identifying sequence homologues that may be evolutionarily related (Pearson and Lipman, 1988). Structural and functional information about a new protein can often be inferred from knowledge of well-characterized homologous proteins. An accurate sequence alignment is critical to comparative protein structure prediction. While closely related protein sequences are in general relatively easy to align using existing sequence-based methods, the success rate of these methods in finding the correct alignment is significantly reduced when the sequence identity between the two aligned sequences is <25%, a threshold often referred to as the twilight zone (Rost, 1999). Recent attempts to improve pairwise sequence alignment have benefited greatly from incorporating sequence-profile and structural information into the alignment algorithms (Marti-Renom et al., 2004; Wang and Feng, 2005).

NdPASA is a web server for pairwise sequence alignment of distantly related homologous proteins. It provides a user-friendly interface for a global sequence alignment algorithm that incorporates neighbor-dependent amino acid propensity. By utilizing the structural information of the template sequence, NdPASA significantly improves on standard PSI-BLAST in aligning sequence pairs with <20% sequence identity (Wang and Feng, 2005). In addition to neighbor-dependent amino acid secondary structure propensities, the algorithm also uses a structure-dependent gap opening and extension penalty scheme: a higher gap penalty is applied to gaps that occur within regular secondary structures than to gaps that occur in loops. The neighbor-dependent amino acid secondary structure propensities were derived from sequence analyses that calculated the effect of the neighboring amino acid type on the propensity of residues to adopt α-helices, β-strands and loops in proteins (Crasto and Feng, 2001; Wang and Feng, 2003). The value of a neighbor-dependent propensity reflects the likelihood of an amino acid pair adopting a particular secondary structure conformation.

The rationale for using neighbor-dependent amino acid propensity in sequence alignment is straightforward. Methods that rely on a sequence-based substitution matrix often have limited success in aligning sequences that share a low percentage of sequence identity. Incorporating the neighbor-dependent amino acid propensities allowed us to estimate the probability that an amino acid pair aligns with a corresponding amino acid pair adopting a specific secondary structure in the template sequence. For example, an amino acid pair in the query sequence with a low neighbor-dependent propensity for the α-helical conformation is less likely to be aligned with an amino acid pair in an α-helix of the template sequence. NdPASA performs most effectively when the structural information of the template sequence is available.
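
The idea of scoring a query residue pair against the secondary structure state of the template can be illustrated with a minimal Python sketch. The propensity values, the table layout, the function name pair_score and the use of a log transform are all assumptions made for illustration; they are not the published propensity tables or the scoring function of NdPASA.

```python
import math

# Hypothetical neighbor-dependent propensities, keyed by
# (residue, next residue, template state). States: H = helix, E = strand, L = loop.
# The numbers are illustrative placeholders, NOT published values.
PAIR_PROPENSITY = {
    ("A", "L", "H"): 1.40,
    ("A", "L", "E"): 0.80,
    ("G", "P", "H"): 0.35,
    ("G", "P", "L"): 1.60,
}


def pair_score(res, next_res, template_state, table=PAIR_PROPENSITY):
    """Score a query residue pair against one template secondary structure state.

    Assumption: a propensity above 1 favors the state and one below 1
    disfavors it, so the natural log gives a signed score. Pairs missing
    from the table are treated as neutral (0.0).
    """
    propensity = table.get((res, next_res, template_state))
    if propensity is None:
        return 0.0
    return math.log(propensity)


if __name__ == "__main__":
    # A pair with low helical propensity is penalized at a helical position.
    print(pair_score("G", "P", "H"))
    # The same pair is rewarded at a loop position.
    print(pair_score("G", "P", "L"))
```

With these placeholder values, the pair G-P receives a negative score at a helical template position and a positive score at a loop position, which mirrors the example given in the text.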

NdPASA incorporates secondary structure propensity information into the Needleman–Wunsch global alignment algorithm with affine gap penalties (Wang and Feng, 2005). A scaling factor was introduced to adjust the relative weight between the neighbor-dependent secondary structure propensity score and the amino acid substitution score. The default substitution matrix is BLOSUM62. The gap opening and extension penalties were introduced as secondary structure-dependent parameters, whose values were estimated by optimizing the alignment accuracy of 500 randomly selected homologous sequence pairs used as a training dataset (Wang and Feng, 2005). Because regular secondary structures are often more conserved than loop regions, the gap opening penalties for helices and strands were assigned higher values than those for loops.
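
A minimal Python sketch of a global alignment score with affine, structure-dependent gap penalties and a weighted propensity term is given below. It is not the NdPASA implementation: the gap values in GAP_OPEN and GAP_EXT, the match/mismatch function simple_sub (a stand-in for BLOSUM62), the choice of which template state governs a gap, and all function names are assumptions made for illustration.

```python
NEG_INF = float("-inf")

# Hypothetical penalties per template state (H = helix, E = strand, L = loop).
# Convention assumed here: a gap of length k costs open + (k - 1) * extend.
GAP_OPEN = {"H": 14.0, "E": 14.0, "L": 11.0}
GAP_EXT = {"H": 2.0, "E": 2.0, "L": 1.0}


def simple_sub(a, b):
    """Stand-in for a substitution matrix such as BLOSUM62."""
    return 2.0 if a == b else -1.0


def no_propensity(res, next_res, state):
    """Placeholder propensity term that contributes nothing."""
    return 0.0


def align_score(query, template, ss, sub=simple_sub, prop=no_propensity,
                weight=1.0, gap_open=GAP_OPEN, gap_ext=GAP_EXT):
    """Return the optimal global alignment score (three-matrix affine-gap DP).

    M: query[i-1] aligned to template[j-1]
    X: query[i-1] aligned to a gap (gap in the template)
    Y: template[j-1] aligned to a gap (gap in the query)
    Gap penalties are looked up from the state of template position j-1;
    before the first template residue the loop penalty is assumed.
    """
    n, m = len(query), len(template)
    if len(ss) != m:
        raise ValueError("ss must have one state per template residue")
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            state = ss[j - 1] if j > 0 else "L"
            go, ge = gap_open[state], gap_ext[state]
            if i > 0:
                X[i][j] = max(M[i - 1][j] - go, Y[i - 1][j] - go,
                              X[i - 1][j] - ge)
            if j > 0:
                Y[i][j] = max(M[i][j - 1] - go, X[i][j - 1] - go,
                              Y[i][j - 1] - ge)
            if i > 0 and j > 0:
                s = sub(query[i - 1], template[j - 1])
                if i < n:  # the residue has a following neighbor in the query
                    s += weight * prop(query[i - 1], query[i], ss[j - 1])
                M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1],
                                  Y[i - 1][j - 1])
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(align_score("ACD", "ACWD", "LLLL"))  # gap falls in a loop
    print(align_score("ACD", "ACWD", "HHHH"))  # same gap inside a helix
```

In this sketch the same one-residue gap costs more when the template position is helical than when it is in a loop, and the weight argument plays the role of the scaling factor between the propensity term and the substitution score.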

A detailed analysis of the performance and benchmarking tests of the NdPASA algorithm is presented elsewhere (Wang and Feng, 2005). Using superpositions of homologous proteins derived from the PSI-BLAST analysis and the SCOP classification of a non-redundant Protein Data Bank (PDB) database as a gold standard, we found that NdPASA improved pairwise alignment. Statistical analyses of the performance of NdPASA indicated that the introduction of sequence patterns of secondary structure derived from the neighbor-dependent sequence analysis clearly improved alignment performance for sequence pairs sharing <20% sequence identity. For sequence pairs sharing 13–21% sequence identity, NdPASA improved the accuracy of alignment over the conventional global alignment (GA) algorithm using BLOSUM62 by an average of 8.6% (Wang and Feng, 2005).
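
One common way to quantify alignment accuracy against a structure-based reference is the fraction of reference residue pairs that the test alignment reproduces. The Python sketch below shows that measure. The exact metric used in the benchmark is not stated here, so this definition and the function names aligned_pairs and alignment_accuracy are assumptions.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue index pairs aligned in two gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def alignment_accuracy(test, reference):
    """Fraction of reference pairs reproduced by the test alignment.

    Both arguments are (row_a, row_b) tuples of gapped sequences.
    """
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(*test) & ref_pairs) / len(ref_pairs)


if __name__ == "__main__":
    reference = ("ACD-E", "ACDWE")  # e.g. from a structural superposition
    test = ("ACDE-", "ACDWE")       # e.g. from a sequence alignment method
    print(alignment_accuracy(test, reference))  # 0.75
```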


    NdPASA SERVER
The NdPASA server is designed mainly to aid homologous protein structure modeling of query sequences. It provides a simple user interface for easy interaction. Figure 1a shows a schematic diagram of the algorithm implemented in the server. Since NdPASA alignment is most effective when the structural information of the template is available, we designed an input page with three options. In addition to entering a query sequence, the user may input either the sequence or the PDB entry-ID of a template protein. If a user-specified template sequence is entered, the NdPASA server performs a PSI-BLAST search against the PDB database and returns the PDB entries that share at least 80% sequence identity with the template (Altschul et al., 1997). The user is asked to select one of the PDB entry-IDs as the desired template for subsequent pairwise sequence alignment with the query sequence. Once the identity of the template is determined, the NdPASA server assigns secondary structure elements to the template using DSSP before applying the NdPASA algorithm to align it with the query sequence (Kabsch and Sander, 1983). If no sequence match is found between the input template and the proteins in the PDB, the user can submit the template sequence to the PSIPRED server for secondary structure prediction (Jones, 1999; http://bioinf.cs.ucl.ac.uk/psipred/psiform.html). NdPASA accepts the returned secondary structure assignments for subsequent alignment.

When the user has no information about the template, the NdPASA server performs a PSI-BLAST search against the non-redundant protein structure database (PDB) for sequences homologous to the query using the BLOSUM62 matrix. The user also has the option to choose a different scoring matrix: PAM250, PAM300, PAM120, BLOSUM35, BLOSUM45, BLOSUM50, BLOSUM60, BLOSUM62 or BLOSUM80. The gap opening and extension penalty parameters can also be changed. The default options of BLOSUM62 and the gap opening (–11) and extension (–1) penalties were selected because they yielded the best overall NdPASA results in experimental tests aligning 2021 pairs of remotely related sequences (Wang and Feng, 2005). The returned results contain essential information that may help the user determine the most appropriate template candidate for the query sequence. All returned results are displayed with their sequence names, PDB entry-IDs, PSI-BLAST scores and percentage sequence identities with the query protein, as determined by PSI-BLAST. To limit the scope of template selection, we specified that the PSI-BLAST output contain only the sequences with either the top 5 or the top 15 ranked scores for inspection. An optional filter was also implemented with which the user may limit the output of the PSI-BLAST search to those sequences whose identity with the query falls within a defined range.

When a template candidate is identified, the user may select the radio button next to the desired sequence and click ‘submit’ for optimized pairwise sequence alignment by NdPASA. Upon receiving a command to align the query sequence against one of the templates identified by PSI-BLAST, the program fetches the template sequence from the PDB and assigns a secondary structure conformation to every residue in the template using DSSP (Kabsch and Sander, 1983). It then performs the NdPASA alignment, incorporating the structural information of the template derived from DSSP. The result of the NdPASA alignment is displayed in a pop-up window with the query and the template sequences aligned (Fig. 1b). The secondary structure information of the template sequence is also displayed for inspection. NdPASA also produces an alignment in the standard FASTA format so that the results can be easily integrated with other bioinformatics tools.
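
Two of the data-handling steps described above, reducing DSSP assignments to helix, strand and loop states and writing an alignment in FASTA format, can be sketched in Python as follows. The eight-to-three state mapping shown is one widely used convention and is an assumption here, as are the function names and the 60-column line width; the server's actual mapping and output layout are not specified in the text.

```python
# Assumed reduction of DSSP codes to three states:
# H, G, I -> helix (H); E, B -> strand (E); everything else -> loop (L).
DSSP_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(codes):
    """Map a string of per-residue DSSP codes to H/E/L states."""
    return "".join(DSSP_TO_THREE.get(code, "L") for code in codes)


def to_fasta(records, width=60):
    """Format (name, gapped_sequence) records as FASTA text.

    Sequences are wrapped at `width` characters per line.
    """
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(reduce_dssp("HHGT SEEB"))
    print(to_fasta([("query", "AC-D"), ("template", "ACWD")]), end="")
```

Keeping the aligned rows as plain gapped strings makes it straightforward to pass the output to other tools that read aligned FASTA.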



Fig. 1 (a) A schematic diagram of the NdPASA server and (b) an example of the NdPASA alignment output.

 

    AVAILABILITY
NdPASA was implemented in Java and compiled on a Linux-based workstation. The web server can be freely accessed at http://guanyin.chem.temple.edu. A brief description of the program and a detailed user guide with examples are also available at the website.


    Acknowledgments
 
The authors would like to thank members of the Feng laboratory for helpful discussions. The authors also thank the American Cancer Society for financial support (PRG9926301GMC) and the Commonwealth of Pennsylvania for the appropriation.

Conflict of Interest: none declared.


    Footnotes
 
†Present address: Department of Genetics, Center for Bioinformatics, University of Pennsylvania, PA 19104, USA

Received on May 9, 2005; revised on July 13, 2005; accepted on August 8, 2005

    REFERENCES
    Altschul, S.F., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

    Crasto, C.J. and Feng, J.A. (2001) Sequence codes for extended conformation: a neighbor-dependent sequence analysis of loops in proteins. Proteins, 42, 399–413.

    Jones, D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195–202.

    Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.

    Marti-Renom, M.A., et al. (2004) Alignment of protein sequences by their profiles. Protein Sci., 13, 1071–1087.

    Pearson, W.R. and Lipman, D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.

    Rost, B. (1999) Twilight zone of protein sequence alignments. Protein Eng., 12, 85–94.

    Wang, J. and Feng, J.A. (2003) Exploring the sequence patterns in the alpha-helices of proteins. Protein Eng., 16, 799–807.

    Wang, J. and Feng, J.A. (2005) NdPASA: a novel pairwise protein sequence alignment algorithm that incorporates neighbor-dependent amino acid propensities. Proteins, 58, 628–637.

