Bioinformatics Advance Access originally published online on June 27, 2007
Bioinformatics 2007 23(17):2247-2255; doi:10.1093/bioinformatics/btm320

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Penalized and weighted K-means for clustering with scattered objects and prior information in high-throughput biological data

George C. Tseng *

Department of Biostatistics, University of Pittsburgh, Pittsburgh, USA

*To whom correspondence should be addressed.


Abstract

Motivation: Cluster analysis is one of the most important data mining tools for investigating high-throughput biological data. Many scattered objects that should not be clustered have been found to hinder the performance of most traditional clustering algorithms in such high-dimensional, complex settings. Additional prior knowledge from databases or previous experiments is also often available for the analysis. Excluding scattered objects and incorporating existing prior information are therefore desirable to enhance clustering performance.

Results: In this article, a class of loss functions is proposed for cluster analysis and applied to high-throughput genomic and proteomic data. Two major extensions of K-means are involved: penalization and weighting. An additive penalty term allows a set of scattered objects to remain unclustered. Weights are introduced to account for prior information about preferred or prohibited cluster patterns. The relationship of these loss functions to the classification likelihood of Gaussian mixture models is explored. Incorporating good prior information is also shown to alleviate the global optimization problem in clustering. Applications of the proposed method to simulated data, as well as to high-throughput data sets from tandem mass spectrometry (MS/MS) and microarray experiments, are presented. Our results demonstrate its superior performance over most existing methods, and its computational simplicity and extensibility in applications to large, complex biological data sets.
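
As a rough illustration of the penalization idea only, the sketch below runs a K-means-style iteration in which an object is left unclustered whenever assigning it to its nearest cluster would cost more than a fixed penalty. The abstract does not state the loss function, so the form assumed here (within-cluster squared Euclidean distance plus `penalty` times the number of scattered objects) is an assumption, as are the names `penalized_kmeans`, `penalized_loss`, `penalty` and `SCATTERED`. The weighting extension is not sketched, and this is not the author's released software.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
SCATTERED = -1  # hypothetical label for objects left unclustered


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def penalized_kmeans(
    points: List[Point],
    init_centers: List[Point],
    penalty: float,
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    """K-means-style iteration with a per-object penalty (assumed loss form).

    An object joins its nearest cluster only if the squared distance to
    that center is at most `penalty`; otherwise it is marked SCATTERED.
    """
    centers = [list(c) for c in init_centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest center, or scattered if too far away.
        new_labels = []
        for p in points:
            dists = [sq_dist(p, c) for c in centers]
            j = min(range(len(centers)), key=dists.__getitem__)
            new_labels.append(j if dists[j] <= penalty else SCATTERED)
        if new_labels == labels:  # assignments stable: stop
            break
        labels = new_labels
        # Update step: mean of the clustered members; empty clusters keep
        # their previous center.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def penalized_loss(points, labels, centers, penalty) -> float:
    """Within-cluster squared distance plus penalty per scattered object."""
    within = sum(
        sq_dist(p, centers[lab])
        for p, lab in zip(points, labels)
        if lab != SCATTERED
    )
    return within + penalty * labels.count(SCATTERED)


# Toy data: two tight groups and one far-away object.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, -20)]
labels, centers = penalized_kmeans(points, [(0, 0), (10, 10)], penalty=4.0)
loss = penalized_loss(points, labels, centers, penalty=4.0)
# labels -> [0, 0, 0, 1, 1, 1, -1]
```

In this sketch a larger `penalty` forces more objects into clusters, and a sufficiently large value reduces the procedure to ordinary K-means; a smaller value leaves more objects scattered.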

Availability: http://www.pitt.edu/~ctseng/research/software.html

Contact: ctseng{at}pitt.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Associate Editor: Chris Stoeckert


Received on February 27, 2007; revised on May 11, 2007; accepted on June 8, 2007





