Bioinformatics Advance Access published online on October 10, 2006
Bioinformatics, doi:10.1093/bioinformatics/btl511
1 Department of Theoretical Systems Biology, Leibniz Institute for Age Research - Fritz-Lipmann-Institute e. V. (former IMB Jena), Beutenbergstr. 11, D-07745 Jena, Germany
* To whom correspondence should be addressed.
Motivation: Pattern identification in biological sequence data is one of the main objectives of bioinformatics research. However, few methods are available for detecting patterns (substructures) in unordered datasets. Data mining algorithms, mainly developed outside the realm of bioinformatics, have been adapted for that purpose, but they typically do not determine the statistical significance of the identified patterns. Moreover, these algorithms do not exploit the often modular structure of biological data.

Results: We present the algorithm DASS (Discovery of All Significant Substructures), which first identifies all substructures in unordered data (DASSsub) in a manner that is especially efficient for modular data. In addition, DASS calculates the statistical significance of the identified substructures, either for sets with at most one element of each type (DASSPset) or for sets with multiple occurrences of elements (DASSPmset). The power and versatility of DASS are demonstrated by four examples: combinations of protein domains in multi-domain proteins, combinations of proteins in protein complexes (protein subcomplexes), combinations of transcription factor target sites in promoter regions, and evolutionarily conserved protein interaction subnetworks.

Availability: The program code and additional data are available at http://www.fli-leibniz.de/tsb/DASS.

Supplementary information: Supplementary data are available at Bioinformatics online.
Received May 31, 2006
Revised September 28, 2006
Accepted October 1, 2006
Article
DASS: efficient discovery and p-value calculation of substructures in unordered data
Jens Hollunder 1, Maik Friedel 1, Andreas Beyer 2, Christopher T. Workman 3, and Thomas Wilhelm 1 *
2 Department of Theoretical Systems Biology, Leibniz Institute for Age Research - Fritz-Lipmann-Institute e. V. (former IMB Jena), Beutenbergstr. 11, D-07745 Jena, Germany; Department of Bioengineering, University of California San Diego, La Jolla, California, 92093, USA
3 Department of Bioengineering, University of California San Diego, La Jolla, California, 92093, USA
Thomas Wilhelm, E-mail: wilhelm{at}fli-leibniz.de
Associate Editor: Martin Bishop
This article has been cited by other articles:
C. C. Friedel and R. Zimmer. Identifying the topology of protein complexes from affinity purification assays. Bioinformatics, August 15, 2009; 25(16): 2140-2146.
