Bioinformatics Advance Access originally published online on June 8, 2009
Bioinformatics 2009 25(15):1869-1875; doi:10.1093/bioinformatics/btp342
A corrigendum has been published.

© The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Rapid detection, classification and accurate alignment of up to a million or more related protein sequences

Andrew F. Neuwald

Department of Biochemistry & Molecular Biology and The Institute for Genome Sciences, University of Maryland, School of Medicine, 801 West Baltimore Street, BioPark II, Room 617, Baltimore, MD 21201, USA


Abstract

Motivation: The patterns of sequence similarity and divergence present within functionally diverse, evolutionarily related proteins contain implicit information about corresponding biochemical similarities and differences. A first step toward accessing such information is to analyze these patterns statistically, which in turn requires first identifying and accurately aligning a very large set of protein sequences. Ideally, the set should include many distantly related, functionally divergent subgroups. Because it is extremely difficult, if not impossible, for fully automated methods to align such sequences correctly, researchers often resort to manual curation based on detailed structural and biochemical information. However, multiply aligning vast numbers of sequences in this way is clearly impractical.

Results: This problem is addressed using Multiply-Aligned Profiles for Global Alignment of Protein Sequences (MAPGAPS). The MAPGAPS program uses a set of multiply-aligned profiles both as a query to detect and classify related sequences and as a template to multiply align those sequences. It relies on Karlin–Altschul statistics for sensitivity and on PSI-BLAST (and other) heuristics for speed. Using as input a carefully curated multiple-profile alignment for P-loop GTPases, MAPGAPS correctly aligned weakly conserved sequence motifs within 33 distantly related GTPases of known structure. By comparison, the sequence-based and structure-based alignment methods hmmalign and PROMALS3D misaligned at least 11 and 23 of these regions, respectively. When applied to a dataset of 65 million protein sequences, MAPGAPS identified, classified and aligned (with comparable accuracy) nearly half a million putative P-loop GTPase sequences.
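To make the role of Karlin–Altschul statistics concrete, the following is a minimal Python sketch of the standard E-value formula, E = K·m·n·exp(−λS), together with one simple way a sequence might be assigned to its best-scoring profile. It is not the MAPGAPS implementation, which the abstract does not describe at this level. The function names, the profile labels, the E-value cutoff and the values of λ and K are all assumptions chosen for illustration.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score S to a bit score: (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Karlin-Altschul expected number of chance hits: E = K*m*n*exp(-lam*S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def classify(scores_by_profile, profile_lens, db_len, lam, k, cutoff=0.01):
    """Return (profile, E) for the profile with the lowest E-value,
    or None if even the best profile fails the cutoff.

    Hypothetical helper: the real program's classification rule may differ.
    """
    best = None
    for name, score in scores_by_profile.items():
        e = e_value(score, profile_lens[name], db_len, lam, k)
        if best is None or e < best[1]:
            best = (name, e)
    if best is not None and best[1] <= cutoff:
        return best
    return None


# Illustrative parameters only (assumed, not taken from the article).
LAMBDA, K = 0.267, 0.041
lens = {"profile_A": 200, "profile_B": 200}   # hypothetical profile lengths
hit = classify({"profile_A": 120, "profile_B": 90}, lens, 1e9, LAMBDA, K)
miss = classify({"profile_A": 40, "profile_B": 35}, lens, 1e9, LAMBDA, K)
```

In this sketch a sequence is kept only when its best profile's E-value passes the cutoff, which is one plausible way to combine detection and classification in a single pass; under the assumed parameters, `hit` names `profile_A` and `miss` is `None`.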

Availability: A C++ implementation of MAPGAPS is available at http://mapgaps.igs.umaryland.edu.

Contact: aneuwald{at}som.umaryland.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Associate Editor: John Quackenbush


Received on February 9, 2009; revised on May 27, 2009; accepted on May 27, 2009
