
© Oxford University Press

A hybrid method to cluster protein sequences based on statistics and artificial neural networks

Edgardo A. Ferrán and Bernard Pflugfelder 1

Sanofi Elf Bio Recherches, Labège Innopole BP 137, 31676 Labège Cedex
1Société Nationale Elf Aquitaine, 26 avenue des Lilas, 64018 Pau, France

We have recently proposed a method, based on artificial neural networks (ANNs), to cluster protein sequences into families according to their degree of sequence similarity. The network was trained with an unsupervised learning algorithm, using, as inputs, matrix patterns derived from the bipeptide composition of the protein sequences. We describe here some further improvements to that approach. First, we propose a statistical method to cluster a set of bipeptidic matrices into families. It consists of three stages: (i) principal component analysis, (ii) determination of the optimal number M of clusters and (iii) final classification of the bipeptidic matrices into M clusters. Using a set of 444 protein sequences, we show that the classification given by the statistical method is in agreement with biological knowledge. We also show that the resulting classification is very similar to the one previously obtained with the ANN approach. Finally, we propose a new hybrid method of the statistical and ANN approaches, in which the results of the statistical method are used to choose the number of neurons and inputs of the network. We show that a network built in this way, and fed with a few principal components of the set of bipeptidic matrices as input signals, can be trained in an extremely short computing time. The resulting topological maps do not essentially differ from the ones obtained with the initial ANN approach.
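
As an illustration only, the statistical pipeline described above can be sketched in a few lines of Python with NumPy. This is not the authors' implementation. The abstract does not say how the bipeptidic matrices are normalized, how M is determined, or which algorithm performs the final classification, so the sketch assumes frequency-normalized 20 x 20 matrices, takes M as a given argument `m` (stage (ii) is not attempted), and substitutes a plain k-means with a deterministic farthest-point start for stage (iii). All function names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def bipeptide_matrix(seq):
    """20 x 20 matrix of adjacent-residue pair frequencies (assumed normalization)."""
    counts = np.zeros((20, 20))
    for a, b in zip(seq, seq[1:]):
        if a in INDEX and b in INDEX:  # skip non-standard residue codes
            counts[INDEX[a], INDEX[b]] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def principal_components(X, n_components):
    """Stage (i): project the rows of X onto their leading principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X, m, n_iter=50):
    """Stand-in for stage (iii): k-means with a farthest-point initialization."""
    centers = [X[0]]
    while len(centers) < m:
        # Squared distance of every point to its nearest chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(m):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels


def cluster_sequences(seqs, n_components, m):
    """Flatten each bipeptidic matrix to 400 values, reduce by PCA, then cluster."""
    X = np.array([bipeptide_matrix(s).ravel() for s in seqs])
    return kmeans(principal_components(X, n_components), m)
```

A call such as `cluster_sequences(seqs, n_components=2, m=2)` returns one cluster label per sequence. In the hybrid method, the same principal components would instead serve as the input signals of the network, with the number of neurons chosen from the statistical results.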


Received on February 8, 1993; accepted on May 31, 1993




