Bioinformatics Advance Access originally published online on April 27, 2006
Bioinformatics 2006 22(13):1656-1657; doi:10.1093/bioinformatics/btl157
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Evolutionary trace report_maker: a new type of service for comparative analysis of proteins

I. Mihalek *, I. Res and O. Lichtarge

Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 IMPLEMENTATION AND DEPENDENCIES
 METHODS
 FEATURES
 CONCLUSION
 REFERENCES
 

Summary: Evolutionary trace report_maker offers a new type of service for researchers investigating the function of novel proteins. It pools, from different sources, information about protein sequence, structure and elementary annotation, and superimposes on that background an inference about the evolutionary behavior of individual residues, using the real-valued evolutionary trace method. As its only input it takes a Protein Data Bank identifier or UniProt accession number, and it returns a human-readable document in PDF format, supplemented by the original data needed to reproduce the results quoted in the report.

Availability: Evolutionary trace reports are freely available for academic users at http://mammoth.bcm.tmc.edu/report-maker

Contact: {imihalek,ires,lichtarge}@bcm.tmc.edu


    INTRODUCTION
 
Evolutionary trace (ET) (Lichtarge et al., 1996) is a method that uses a sequence similarity tree of a family of homologous proteins to highlight residues that are statistically likely to be under evolutionary pressure and, therefore, of functional or structural importance for the family. This enables researchers looking for residues responsible for a detectable phenotype change to focus their search on the individual residues or regions of proteins most likely to produce such a change (e.g. Shenoy et al., 2006). Real-valued ET (Mihalek et al., 2004) is particularly robust and suitable for making blind predictions about the evolutionary behavior of protein residues. When the structure is available, ET results may be mapped onto the structure, thus outlining known as well as putative functional parts of the protein surface.

Several servers are already available to rank protein residues according to the estimated evolutionary pressure they experience (Innis et al., 2000, http://www-cryst.bioc.cam.ac.uk/jiye/evoltrace/evoltrace.html; Valdar, 2002, http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/valdar/scorecons_server.pl; Glaser et al., 2003, http://consurf.tau.ac.il/). They make their results available in the form of tables of residue scores and scripts for a visualization program. In this article, we describe a server geared toward the community of experimental protein scientists, which collects ET-relevant data and presents them as a report-style, human-readable and printable text.


    IMPLEMENTATION AND DEPENDENCIES
 
ET report_maker is implemented as a set of interacting Perl modules, with computationally demanding parts written in C, and visualization of the mapping of results onto primary sequence implemented in Java. Report_maker draws on the work of many researchers outside of our own group. It depends on the following programs, all free for academic users: alistat (statistical profile of a multiple sequence alignment; part of the HMMER package, http://hmmer.wustl.edu), BLAST [sequence database search; (Altschul et al., 1997)], CE [structural alignment of proteins, used in the mapping of geometrically determined ligand binding surfaces between different Protein Data Bank (PDB) entries; (Shindyalov and Bourne, 1998)], ClustalW [multiple sequence alignment for a small number of sequences; (Thompson et al., 1994)], DSSP [determination of protein surface; (Kabsch and Sander, 1993)], LaTeX [typesetting of the final text; (Lamport, 1986)], Muscle [multiple sequence alignment for a large number of sequences; (Edgar, 2004)] and PyMol [structure visualization; (DeLano, 2002, http://www.pymol.org)]. It also relies on the following publicly available databases: HSSP (Sander and Schneider, 1991), PDB (Berman et al., 2000) and UniProt (Boeckmann et al., 2005).


    METHODS
 
Real-valued evolutionary trace. To rank the evolutionary importance of residues, report_maker uses real-valued ET, described in Mihalek et al. (2004).

Heuristic suggestions for disruptive mutations. Report_maker makes some heuristic suggestions for mutations, meant to be disruptive to the interaction of the protein with its ligand. They are based on complementarity of the proposed mutation with the physical and chemical properties of a residue and its substitutes found in the alignment.

An attempt is made to complement the following 11 properties p: small [A V G S T C], medium [L P N Q D E M I K], large [W F Y H R]; hydrophobic [L P V A M W F I], polar [G T C Y]; positively [K H R] or negatively charged [D E]; aromatic [W F Y H]; long aliphatic chain [E K R Q M]; OH-group possession [S D E T Y]; and NH2-group possession [N Q R K]. For a given column i, a score S_i(a) is assigned to each of the 20 amino acid types a:

[Equation (1): expression for S_i(a); the formula image is not preserved in this text.]

where the sum runs over the 11 properties p; s_p(a) is a function that assigns 1 to an amino acid a that has the property p (given in square brackets), and 0 to all other types. q_i is the amino acid type at the i-th position of the query protein, and the averaged term in Equation (1) is the average over all substitution amino acid types found in column i. Thus, S_i(a) is largest when the amino acid type a differs in the maximal number of properties from the amino acid types already seen in the alignment column. Amino acid types with high S_i(a) are listed in report_maker as disruptive mutation suggestions for some top-ranking residues.
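The scoring idea described above can be illustrated with a short Python sketch. The property sets are copied from the text, but the exact form of Equation (1) is not available here, so the score below is only an assumption consistent with the verbal description: for each property it adds the mismatch between the candidate and the query residue to the mismatch between the candidate and the column average. The function names (`s`, `disruption_score`, `suggest_mutations`) are hypothetical and are not part of report_maker.

```python
# Property sets as listed in the text (11 properties).
PROPERTIES = {
    "small": "AVGSTC",
    "medium": "LPNQDEMIK",
    "large": "WFYHR",
    "hydrophobic": "LPVAMWFI",
    "polar": "GTCY",
    "positive": "KHR",
    "negative": "DE",
    "aromatic": "WFYH",
    "long_aliphatic": "EKRQM",
    "oh_group": "SDETY",
    "nh2_group": "NQRK",
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def s(prop, aa):
    """Indicator s_p(a): 1 if amino acid aa has property prop, else 0."""
    return 1 if aa in PROPERTIES[prop] else 0


def disruption_score(candidate, query_aa, column):
    """ASSUMED score form, not the published Equation (1).

    Sums, over all properties, the mismatch with the query residue and
    the mismatch with the average over the distinct amino acid types
    seen in the alignment column (gaps and unknown symbols are ignored).
    """
    types = {c for c in column if c in AMINO_ACIDS} | {query_aa}
    total = 0.0
    for prop in PROPERTIES:
        average = sum(s(prop, b) for b in types) / len(types)
        total += abs(s(prop, candidate) - s(prop, query_aa))
        total += abs(s(prop, candidate) - average)
    return total


def suggest_mutations(query_aa, column, top=3):
    """Return the `top` amino acid types with the highest assumed score."""
    scores = {a: disruption_score(a, query_aa, column) for a in AMINO_ACIDS}
    return sorted(scores, key=lambda a: (-scores[a], a))[:top]
```

Under this assumed score, a fully conserved aspartate column favors arginine, which differs from aspartate in size, charge, side-chain length and functional groups.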

Geometric estimate of functional residues. In report_maker, protein residues are estimated to be involved in ligand interaction if in the co-crystal they have at least one heavy atom within 5 Å of the ligand.
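A minimal Python sketch of this distance criterion follows. It assumes atoms are given as (element, (x, y, z)) pairs with coordinates in Å, that "heavy" means any atom other than hydrogen or deuterium, and that the distance is measured to any ligand atom; the function name `contacts_ligand` is hypothetical.

```python
import math

CUTOFF = 5.0  # distance threshold in angstroms, as stated in the text


def contacts_ligand(residue_atoms, ligand_atoms, cutoff=CUTOFF):
    """Return True if any heavy atom of the residue lies within `cutoff`
    of any ligand atom.

    Both arguments are iterables of (element, (x, y, z)) tuples.
    """
    heavy = [xyz for element, xyz in residue_atoms
             if element.upper() not in ("H", "D")]
    ligand = [xyz for _, xyz in ligand_atoms]
    return any(math.dist(a, b) <= cutoff for a in heavy for b in ligand)
```

The all-pairs loop is quadratic in the number of atoms, which is adequate for a single residue and ligand; a spatial grid or k-d tree would be a natural replacement for whole-structure scans.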

Sequence selection. Report_maker imposes the following requirements on sequences: no sequence should be shorter than 75% of the query length, no pair should be >99% identical, and the final homologue set should consist of at least 10 sequences. In addition, to allow meaningful database searches, the analyzed protein sequence must be at least 20 residues long.
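The selection rules can be sketched in Python as a simple filter over an existing alignment. This is an illustration under stated assumptions, not the server's code: the 75% requirement is read as a length threshold, identity is computed over aligned positions where neither sequence has a gap, and near-duplicates are removed greedily in input order. The names `percent_identity` and `select_sequences` are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over aligned positions where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def select_sequences(query, aligned, min_len_frac=0.75, max_identity=99.0,
                     min_count=10, min_query_len=20):
    """Apply the length, redundancy and set-size requirements.

    `query` and every entry of `aligned` are rows of one alignment,
    with '-' marking gaps.
    """
    query_len = len(query.replace("-", ""))
    if query_len < min_query_len:
        raise ValueError("query shorter than the minimal length")
    kept = []
    for seq in aligned:
        # Length requirement (assumed reading of the 75% rule).
        if len(seq.replace("-", "")) < min_len_frac * query_len:
            continue
        # Redundancy requirement: skip near-duplicates of kept sequences.
        if any(percent_identity(seq, k) > max_identity for k in kept):
            continue
        kept.append(seq)
    if len(kept) < min_count:
        raise ValueError("too few homologues after filtering")
    return kept
```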

When the structure is known but no HSSP alignment is available, report_maker uses the Monte Carlo sequence selection procedure described in Mihalek et al. (2006).


    FEATURES
 
ET report_maker takes as input a PDB identifier or UniProt accession number. The amount of work it does subsequently, as well as the output, depends on the data available for the query protein (or complex).
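As an illustration of how the two input types can be told apart, the Python sketch below classifies a string by its format. It assumes the classic four-character PDB identifier (a digit from 1 to 9 followed by three alphanumeric characters) and the accession pattern published in the UniProt documentation; how report_maker itself validates its input is not described in this article, and the function name `classify_identifier` is hypothetical.

```python
import re

# Classic PDB identifier: one digit (1-9) plus three alphanumerics.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")

# UniProt accession pattern as given in the UniProt documentation.
UNIPROT_AC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def classify_identifier(text):
    """Return 'pdb', 'uniprot' or 'unknown' based on the string format."""
    token = text.strip()
    if PDB_ID.fullmatch(token):
        return "pdb"
    if UNIPROT_AC.fullmatch(token.upper()):
        return "uniprot"
    return "unknown"
```

The two formats cannot collide because PDB identifiers have four characters while UniProt accessions have six or ten, so the order of the checks does not matter.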

The default output is a brief statistical description of the alignment used, in terms of its homologue and taxonomical content, and the estimate of the evolutionary pressure on the protein residues, mapped onto the primary sequence.

If the input is a PDB identifier, or if a related structure can be found, the following additional output is produced: (1) structural map of the evolutionary pressure, (2) discussion of binding sites for known small ligands and protein binding partners, (3) outline of potential novel active sites on the protein surface and (4) suggestions for mutations to block (disrupt) protein function through known and putative binding surfaces.

If the input is a PDB identifier and the related entry consists of several protein chains, all chains are discussed.

The whole analysis is presented in the form of a human-readable and printable document.

Finally, report_maker will produce an accompanying package of data essential for reproducing results given in the report: the alignments in GCG format; a brief description of sequences used; the raw ET output; PyMol scripts with mapping of ET data onto structure (when applicable) and an etvx file for ET Viewer (D. Morgan and O. Lichtarge, manuscript in preparation, http://mammoth.bcm.tmc.edu/traceview).

The list of possible extensions for report_maker is long: a better evaluation of the impact of suggested mutations [see (Capriotti et al., 2005) and references therein], difference analysis (Madabushi et al., 2004), prediction of specificity determinants and an expert input page for interested users, to name a few possibilities. At the same time, it is our hope that the current version of report_maker will serve as an invitation for collaboration with experimental groups, which may be aware of possibilities for customizing the report, such as by providing a model structure or a relevant selection of sequences for the comparative analysis.


    CONCLUSION
 
In the current era of information overflow, ET report_maker provides a new data accumulation and presentation service, allowing users to focus quickly on interesting features of the protein under investigation. The resulting printable report, we hope, will help bridge the time-consuming gap between multiple databases and services for comparative analysis of proteins, and the laboratory bench.


    Acknowledgments
 
The authors thank Mr José Rivera for technical help in setting up the server, and the members of the Lichtarge laboratory for testing and criticizing the reports. This work was supported by grants from the National Institutes of Health (GM066099), the National Science Foundation (DBI #0318415) and the March of Dimes (MOD FY03-93).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on March 7, 2006; revised on April 4, 2006; accepted on April 20, 2006

    REFERENCES
 

    Altschul, S., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

    Berman, H., et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.

    Boeckmann, B., et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res., 33, D154–D159.

    Capriotti, E., et al. (2005) Predicting protein stability changes from sequences using support vector machines. Bioinformatics, 21, ii54–ii58.

    DeLano, W. (2002) The pymol molecular graphics system.

    Edgar, R. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32, 1792–1797.

    Glaser, F., et al. (2003) ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics, 19, 163–164.

    Innis, C., et al. (2000) Evolutionary trace analysis of TGF-β and related growth factors: implications for site-directed mutagenesis. Protein Eng., 13, 839–847.

    Kabsch, W. and Sander, C. (1993) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.

    Lamport, L. (1986) LaTeX: A Document Preparation System. Addison-Wesley, Reading, MA.

    Lichtarge, O., et al. (1996) An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol., 257, 342–358.

    Madabushi, S., et al. (2004) Evolutionary trace of G protein-coupled receptors reveals clusters of residues that determine global and class-specific functions. J. Biol. Chem., 279, 8126–8132.

    Mihalek, I., et al. (2004) A family of evolution-entropy hybrid methods for ranking protein residues by importance. J. Mol. Biol., 336, 1265–1282.

    Mihalek, I., et al. (2006) A structure and evolution guided Monte Carlo sequence selection strategy for multiple alignment-based analysis of proteins. Bioinformatics, 22, 149–156.

    Sander, C. and Schneider, R. (1991) Database of homology derived protein structures and the structural meaning of sequence alignment. Proteins, 9, 56–68.

    Shenoy, S., et al. (2006) Beta-arrestin-dependent, G protein-independent ERK1/2 activation by the beta2 adrenergic receptor. J. Biol. Chem., 281, 261–273.

    Shindyalov, I. and Bourne, P. (1998) Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng., 11, 739–747.

    Thompson, J., et al. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

    Valdar, W. (2002) Scoring residue conservation. Proteins, 48, 227–241.

