Bioinformatics Advance Access originally published online on September 21, 2009
Bioinformatics 2009 25(23):3093-3098; doi:10.1093/bioinformatics/btp552

© The Author(s) 2009. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reproducing the manual annotation of multiple sequence alignments using a SVM classifier

Christian Blouin 1,2,3,*, Scott Perry 2, Allan Lavell 2, Edward Susko 3,4 and Andrew J. Roger 1,3

1 Department of Biochemistry and Molecular Biology, Dalhousie University, Sir Charles Tupper Medical Building, Halifax NS B3H 1X5, 2 Faculty of Computer Science, Dalhousie University, Halifax NS B3H 5W1, 3 Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University and 4 Department of Mathematics and Statistics, Dalhousie University, Halifax NS B3H 6J3, Canada

* To whom correspondence should be addressed.


Abstract

Motivation: Aligning protein sequences with the best possible accuracy requires sophisticated algorithms. Since the optimal alignment is not guaranteed to be the correct one, it is expected that even the best alignment will contain sites that do not respect the assumption of positional homology. Because formulating rules to identify these sites is difficult, it is common practice to manually remove them. Although considered necessary in some cases, manual editing is time-consuming and not reproducible. We present here an automated editing method based on the classification of ‘valid’ and ‘invalid’ sites.

Results: A support vector machine (SVM) classifier is trained to reproduce the decisions made during manual editing with an accuracy of 95.0%. This implies that manual editing can be made reproducible and applied to large-scale analyses. We further demonstrate that it is possible to retrain/extend the training of the classifier by providing examples of multiple sequence alignment (MSA) annotation. Near-optimal training can be achieved with only 1000 annotated sites, or roughly three samples of protein sequence alignments.

Availability: This method is implemented in the software MANUEL, licensed under the GPL. A web-based application for single and batch jobs is available at http://fester.cs.dal.ca/manuel.

Contact: cblouin{at}cs.dal.ca

Supplementary information: Supplementary data are available at Bioinformatics online.

Associate Editor: Martin Bishop


Received on July 28, 2009; revised on August 31, 2009; accepted on September 16, 2009
