Bioinformatics Advance Access originally published online on May 6, 2005
Bioinformatics 2005 21(14):3176-3178; doi:10.1093/bioinformatics/bti486
Published by Oxford University Press 2005
PMUT: a web-based tool for the annotation of pathological mutations on proteins
1Molecular Modeling and Bioinformatics Unit, Institut de Recerca Biomédica, Parc Científic de Barcelona Josep Samitier 1-5, Barcelona 08028, Spain
2Departament de Bioquímica i Biología Molecular, Facultat de Química, Universitat de Barcelona Martí i Franquès 1, Barcelona 08028, Spain
3Instituto Nacional de Bioinformática, Parc Científic de Barcelona Josep Samitier 1-5, Barcelona 08028, Spain
4Institució Catalana per la Recerca i Estudis Avançats (ICREA) Passeig Lluís Companys 23, 08018 Barcelona, Spain
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: PMUT allows the fast and accurate prediction (~80% success rate in humans) of the pathological character of single-point amino acid mutations based on the use of neural networks. The program also allows the fast scanning of mutational hot spots, which are obtained by three procedures: (1) alanine scanning, (2) massive mutation and (3) genetically accessible mutations. A graphical interface for Protein Data Bank (PDB) structures, when available, and a database containing hot spot profiles for all non-redundant PDB structures are also accessible from the PMUT server.
Availability: PMUT is freely accessible at http://mmb2.pcb.ub.es:8080/PMut/
Contact: modesto{at}mmb.pcb.ub.es
Supplementary information: http://mmb2.pcb.ub.es:8080/PMutWeb/methodology.html
| INTRODUCTION |
|---|
Processing the massive amount of data on single nucleotide polymorphisms (SNPs) requires automatic annotation tools that determine the potential pathological character of a given SNP. Some of these programs (e.g. http://pupasnp.bioinfo.cnio.es/) trace the positioning of SNPs in the genome, detecting when they occur in a functionally important region. Others are centered on the study of non-synonymous mutations mapped on proteins; for discussions see Chasman and Adams (2001), Ferrer-Costa et al. (2002, 2004, 2005), Ng and Henikoff (2002), Saunders and Baker (2002), Sunyaev et al. (2001) and Wang and Moult (2001).
Our group has developed an accurate (>80% success rate) and robust methodology to predict disease-associated mutations (DAMUs) (Ferrer-Costa et al., 2002, 2004, 2005). The method is based on the use of neural networks (NNs) trained with a large database of neutral mutations (NEMUs) and pathological mutations. In this paper we present the PMUT server, which implements our predictive models and complementary tools that can help in the annotation of SNPs.
| SERVER STRUCTURE |
|---|
PMUT works at two different levels (Fig. 1S): (1) it retrieves information from a local database of mutational hot spots and (2) it analyzes a given SNP in a specific protein. Results are displayed as text files and, when the structure is experimentally known, 2-D and 3-D plots are also available.
| PMUT PREDICTOR |
|---|
The first input to PMUT is either the sequence of the protein or its SwissProt/trEMBL code. The user then selects the mutation site and chooses whether to analyze a single mutation (default) or to perform a complete mutation scan at this position. The program can also simulate massive single-point mutation along the whole sequence (Mutation Hot-Spot analysis), which helps to detect regions where mutations are expected to have a large pathological impact. Irrespective of the selection, the program retrieves a series of parameters describing the mutation (Ferrer-Costa et al., 2002, 2004) from (1) its internal databases, (2) PHD output (Rost and Sander, 1993) and (3) multiple alignments. The latter are either supplied by the user [e.g. from the PFAM database (Bateman et al., 2002)] or automatically generated by the program from a two-iteration PSI-BLAST (Altschul et al., 1997) run on a non-redundant SwissProt/trEMBL database.
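The two scan modes described above can be illustrated with a minimal Python sketch. It only enumerates the candidate substitutions; it does not reproduce PMUT itself. The function names `site_scan` and `full_scan` and the `K2A`-style labels are assumptions made for illustration, not the server's input format.

```python
# Hypothetical helpers: enumerate the mutants that a site scan or a
# whole-sequence hot-spot scan would need to evaluate.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def site_scan(sequence, position):
    """Return the 19 substitutions at a 1-based position, e.g. 'K2A'."""
    wild_type = sequence[position - 1]
    if wild_type not in AMINO_ACIDS:
        raise ValueError(f"non-standard residue {wild_type!r} at {position}")
    return [f"{wild_type}{position}{aa}" for aa in AMINO_ACIDS if aa != wild_type]


def full_scan(sequence):
    """Return every single-point substitution along the whole sequence."""
    return [
        mutation
        for position in range(1, len(sequence) + 1)
        for mutation in site_scan(sequence, position)
    ]


if __name__ == "__main__":
    print(site_scan("MKT", 2)[:3])   # first three mutants at position 2
    print(len(full_scan("MKT")))     # 19 mutants per residue
```

A whole-sequence scan grows as 19 times the sequence length, which is consistent with running such jobs through a queue rather than interactively.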
Two NNs are implemented as predictor engines: a large one (the default) with 1 hidden layer, 20 nodes and 15 descriptors (Ferrer-Costa et al., 2002, 2004) and a small one (20 nodes, no hidden layer) with 3 parameters (Ferrer-Costa et al., submitted for publication). Both NNs were trained with human mutational data. The final output is always (1) a pathogenicity index ranging from 0 to 1 (indexes >0.5 signal pathological mutations) and (2) a confidence index ranging from 0 (low) to 9 (high). Additionally, the program allows the user to retrieve all the intermediate information (alignments, BLAST and PHD outputs, etc.) used in PMUT predictions.
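As a rough illustration of how a network of this shape produces a pathogenicity index, the following Python sketch runs a forward pass through a feed-forward network with 15 inputs and one hidden layer. Several points are assumptions rather than details from the paper: the 20 nodes are taken to be the hidden layer, sigmoid activations are used, and the weights are random placeholders rather than the trained PMUT parameters. The confidence index is not computed because its formula is not given here.

```python
import math
import random

N_DESCRIPTORS = 15  # inputs of the default network, as stated in the text
N_HIDDEN = 20       # assumption: the 20 nodes form the hidden layer


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_random_network(seed=0):
    """Placeholder weights (bias first); NOT the trained PMUT parameters."""
    rng = random.Random(seed)
    hidden = [
        [rng.uniform(-1, 1) for _ in range(N_DESCRIPTORS + 1)]
        for _ in range(N_HIDDEN)
    ]
    output = [rng.uniform(-1, 1) for _ in range(N_HIDDEN + 1)]
    return hidden, output


def pathogenicity_index(descriptors, network):
    """Forward pass; the sigmoid output always lies strictly between 0 and 1."""
    if len(descriptors) != N_DESCRIPTORS:
        raise ValueError(f"expected {N_DESCRIPTORS} descriptors")
    hidden_weights, output_weights = network
    hidden = [
        sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], descriptors)))
        for w in hidden_weights
    ]
    return sigmoid(
        output_weights[0]
        + sum(wi * hi for wi, hi in zip(output_weights[1:], hidden))
    )


def label(index):
    """Apply the rule from the text: indexes above 0.5 signal pathology."""
    return "pathological" if index > 0.5 else "neutral"


if __name__ == "__main__":
    network = make_random_network()
    index = pathogenicity_index([0.1] * N_DESCRIPTORS, network)
    print(round(index, 3), label(index))
```

Only the 0.5 threshold in `label` comes from the text; everything else is scaffolding to show where that threshold is applied.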
The PMUT server can display the mutation site on the protein structure (when available), using a color code to trace the pathogenicity associated with the mutation. For this purpose, we used BLAST (Altschul et al., 1997) to find highly homologous sequences in the complete PDB (Berman et al., 2000). PDB entries with >70% identity to the query protein were eliminated, as were entries longer than 200 residues with a coverage of less than 70 residues. A Rasmol (Sayle and Milner-White, 1995) script is automatically created to display the mutation site on the protein structure. Visualization can be done remotely using the Chime plug-in (MDL Information Systems, Inc.), or the file can be downloaded for local inspection using Rasmol.
| PMUT DATABASE |
|---|
We have pre-computed the mutation profiles of all the proteins in the 90% identity cluster of the PDB database (Berman et al., 2000). For this purpose we mutated all the residues of each protein to all 19 possible alternative amino acids. The resulting mutation matrix is then summarized to define mutation hot spots in different ways: (1) the maximum, mean and minimum pathogenicity indexes at each mutation site, (2) the pathogenicity index associated with the mutation to Ala (alanine scanning) of all the residues and (3) the maximum, mean and minimum pathogenicity indexes associated with the genetically accessible mutations (i.e. those implying only one nucleotide change) at each position of the protein.
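The three hot-spot summaries can be sketched in Python as follows. This is an illustration under stated assumptions, not the PMUT implementation: the pathogenicity indexes are taken as given, the standard genetic code is used, a codon for each residue is assumed to be known, and substitutions to stop codons are excluded. The names `accessible_substitutions`, `site_profile` and `alanine_scan` are hypothetical.

```python
from statistics import mean

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def accessible_substitutions(codon):
    """Amino acids reachable from `codon` by exactly one nucleotide change."""
    wild_type = CODON_TABLE[codon]
    reachable = set()
    for i in range(3):
        for base in BASES:
            if base == codon[i]:
                continue
            mutant = CODON_TABLE[codon[:i] + base + codon[i + 1:]]
            # Assumption: synonymous changes and stop codons are skipped.
            if mutant != wild_type and mutant != "*":
                reachable.add(mutant)
    return sorted(reachable)


def site_profile(indexes, allowed=None):
    """(max, mean, min) pathogenicity index over the mutants at one site.

    `indexes` maps mutant residue -> index; `allowed` optionally restricts
    the summary, e.g. to the genetically accessible mutants.
    """
    values = [v for aa, v in indexes.items() if allowed is None or aa in allowed]
    return max(values), mean(values), min(values)


def alanine_scan(matrix):
    """Index of the mutation to Ala at each site (None where Ala is wild type)."""
    return [site.get("A") for site in matrix]


if __name__ == "__main__":
    print(accessible_substitutions("ATG"))  # one-step neighbours of Met
    site = {"A": 0.2, "L": 0.8, "K": 0.5}   # made-up indexes for one site
    print(site_profile(site))
    print(site_profile(site, allowed=set(accessible_substitutions("ATG"))))
```

Restricting the profile to accessible mutants usually shrinks the set considerably: a single codon can reach at most nine one-step neighbours, several of which may be synonymous or stop codons.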
All this information can be retrieved from the server in text and graphical formats (Fig. 1). To avoid over-interpretation of the results, the user is alerted when the protein is not human. The help section includes a description of the validity of prediction in non-human proteins using human-trained NNs.
| SOFTWARE IMPLEMENTATION AND USE |
|---|
PMUT is freely accessible through a web interface at the Molecular Modeling and Bioinformatics website (http://mmb2.pcb.ub.es:8080/PMut/). The interface is written as a Java servlet. PMUT has a core written in C, complemented with a series of Perl scripts responsible for the overall workflow, including the execution of auxiliary programs. Two-dimensional graphical output is obtained using the GNUPLOT software (http://www.gnuplot.info/) running on the server and is provided as standard image files. The 3-D outputs are provided as Rasmol scripts, and visualization requires the use of Rasmol or the Chime plug-in on the client side. Calculations are run using a batch queue on the server, and the user is informed of their completion either from the web page or by email. A limited version of PMUT Predictor providing a hot spot analysis is also available as a web service running according to the BioMoby standard (http://www.biomoby.org; http://www.inab.org).
| Acknowledgments |
|---|
This work has been supported by the Instituto Nacional de Bioinformática (INB-Genoma España), Fundación Ramón-Areces and the Spanish Ministry of Education and Science (BIO2003-06848 and GEN2001-4758-C07-07).
Received on March 8, 2005; revised on April 13, 2005; accepted on May 3, 2005
| REFERENCES |
|---|
Altschul, S.F., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
Bateman, A., et al. (2002) The Pfam protein families database. Nucleic Acids Res., 30, 276-280.
Berman, H.M., et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235-242.
Chasman, D. and Adams, R.M. (2001) Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure-based assessment of amino acid variation. J. Mol. Biol., 307, 683-706.
Ferrer-Costa, C., et al. (2002) Characterization of disease-associated single amino acid polymorphisms in terms of sequence and structure properties. J. Mol. Biol., 315, 771-786.
Ferrer-Costa, C., et al. (2004) Sequence-based prediction of pathological mutations. Proteins, 57, 811-819.
Ferrer-Costa, C., et al. (2005) Use of bioinformatics tools for the annotation of disease-associated mutations in animal models. Proteins, in press.
Ng, P.C. and Henikoff, S. (2002) Accounting for human polymorphisms predicted to affect protein function. Genome Res., 12, 436-446.
Rost, B. and Sander, C. (1993) Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol., 232, 584-599.
Saunders, C.T. and Baker, D. (2002) Evaluation of structural and evolutionary contributions to deleterious mutation prediction. J. Mol. Biol., 322, 891-901.
Sayle, R.A. and Milner-White, E.J. (1995) RASMOL: biomolecular graphics for all. Trends Biochem. Sci., 20, 374.
Sunyaev, S., et al. (2001) Prediction of deleterious human alleles. Hum. Mol. Genet., 10, 591-597.
Wang, Z. and Moult, J. (2001) SNPs, protein structure, and disease. Hum. Mutat., 17, 263-270.