Bioinformatics, Vol 14, 439-451, Copyright © 1998 by Oxford University Press
O Trelles, MA Andrade, A Valencia, EL Zapata and JM Carazo
MOTIVATION: The explosive growth of biological sequence databases, stimulated by genome projects, has changed the framework of many applications in biological sequence analysis. In most cases, this new scenario involves studies of large sets of sequences, which calls for effective and automatic methods for clustering them. A more effective clustering of the database could then be followed by applying common family analysis schemes to the resulting groups.

RESULTS: In this work, we present a new strategy to reduce the computational cost of clustering large sets of sequences that are expected to contain several families. The strategy groups the sequences into families by using a dynamic threshold on a pairwise sequence similarity criterion. As a result, routine clustering of large data sets can now be done very efficiently. The method achieves a computational space reduction of about an order of magnitude compared with traditional all-versus-all comparison approaches. The resulting family groupings closely reproduce already accepted biological results. Our work also includes a parallel implementation for distributed-memory multiprocessors, with a dynamic scheduling strategy for performance optimization.

AVAILABILITY: By anonymous ftp at ftp.ac.uma.es (/pub/ots/pCluster directory), or from our Web site http://www.cnb.uam.es/www/software/software_index.html

CONTACT: ots@ac.uma.es
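The abstract only names the idea of grouping sequences by thresholding a pairwise similarity criterion so that not every pair has to be compared. As a rough illustration of why such grouping saves work, the following Python sketch uses a greedy, representative-based scheme in which each new sequence is compared only with one representative per existing cluster. This is not the authors' algorithm: the k-mer Jaccard score `kmer_similarity`, the function `greedy_cluster`, and the fixed `threshold` are hypothetical stand-ins, and the paper's dynamic threshold is not reproduced here.

```python
from typing import Callable, List, Tuple


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Hypothetical similarity score: Jaccard index of the two k-mer sets.

    This is an assumed stand-in, not the similarity criterion used in the paper.
    """
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def greedy_cluster(
    seqs: List[str],
    threshold: float,
    sim: Callable[[str, str], float] = kmer_similarity,
) -> Tuple[List[List[int]], int]:
    """Assign each sequence to the first cluster whose representative is
    similar enough; otherwise the sequence starts a new cluster.

    Returns the clusters (lists of sequence indices) and the number of
    pairwise comparisons performed.
    """
    reps: List[int] = []            # index of each cluster's representative
    clusters: List[List[int]] = []  # member indices of each cluster
    comparisons = 0
    for i, s in enumerate(seqs):
        for c, r in enumerate(reps):
            comparisons += 1
            if sim(s, seqs[r]) >= threshold:
                clusters[c].append(i)
                break
        else:
            # No representative was similar enough: open a new cluster.
            reps.append(i)
            clusters.append([i])
    return clusters, comparisons


if __name__ == "__main__":
    # Toy protein fragments made up for illustration (two obvious families).
    seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "GGSWLLPVAE", "GGSWLLPVAD", "MKTAYIAKQR"]
    clusters, n_cmp = greedy_cluster(seqs, threshold=0.5)
    all_pairs = len(seqs) * (len(seqs) - 1) // 2
    print(clusters, n_cmp, all_pairs)  # [[0, 1, 4], [2, 3]] 5 10
```

On this toy input the sketch performs 5 comparisons instead of the 10 required by an all-versus-all scheme. Note that a greedy single-pass scheme like this one depends on input order and on the choice of representatives, which is one reason a real method would need a more careful criterion than a single fixed cutoff.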
Computational space reduction and parallelization of a new clustering approach for large groups of sequences
Computer Architecture Department, University of Malaga, 29017 Malaga, Spain. ots@ac.uma.es
