Bioinformatics Vol. 18 no. 4 2002
Pages 555-565
© 2002 Oxford University Press
Binary analysis and optimization-based normalization of gene expression data
Cancer Genomics Laboratory, Department of Pathology, University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd, Box 85, Houston, TX 77030, USA
Received on August 3, 2001; revised on October 11, 2001; accepted on November 23, 2001
Motivation: Most approaches to gene expression analysis use real-valued expression data, produced by high-throughput screening technologies, such as microarrays. Often, some measure of similarity must be computed in order to extract meaningful information from the observed data. The choice of this similarity measure frequently has a profound effect on the results of the analysis, yet no standards exist to guide the researcher.
Results: To address this issue, we propose to analyse gene expression data entirely in the binary domain. The natural measure of similarity then becomes the Hamming distance, which reflects the notion of similarity used by biologists. We also develop a novel data-dependent, optimization-based method for normalizing gene expression data, built on Genetic Algorithms (GAs). Normalization is a necessary step before quantizing gene expression data into the binary domain and, more generally, before comparing data across different arrays. We then present an algorithm for binarizing gene expression data and illustrate the use of these methods on two different data sets. Using Multidimensional Scaling, we show that a reasonable degree of separation between different tumor types in each data set can be achieved by working solely in the binary domain. The binary approach offers several advantages, such as noise resilience and computational efficiency, making it a viable approach to extracting meaningful biological information from gene expression data.
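To make the binary-domain idea concrete, the sketch below binarizes each array and compares samples by Hamming distance. It is only an illustration under stated assumptions. The GA-based normalization and the binarization algorithm described in the paper are not reproduced here. Instead, each array is thresholded at its own median, a hypothetical rule that serves as a crude stand-in for normalization. The function names (`binarize_by_median`, `hamming`, `pack_bits`, `hamming_packed`, `pairwise_hamming`) and the toy sample values are likewise hypothetical.

```python
from itertools import combinations
from statistics import median
from typing import Dict, List, Sequence, Tuple


def binarize_by_median(profile: Sequence[float]) -> List[int]:
    """Map each gene to 1 if its value exceeds the array's median, else 0.

    Hypothetical thresholding rule for illustration only; it is not the
    binarization algorithm from the paper.
    """
    threshold = median(profile)
    return [1 if value > threshold else 0 for value in profile]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the genes at which two binary profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genes")
    return sum(x != y for x, y in zip(a, b))


def pack_bits(bits: Sequence[int]) -> int:
    """Pack a binary profile into one integer, most significant bit first."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def hamming_packed(x: int, y: int) -> int:
    """Hamming distance between packed profiles: popcount of the XOR."""
    return bin(x ^ y).count("1")


def pairwise_hamming(
    samples: Dict[str, Sequence[float]],
) -> Dict[Tuple[str, str], int]:
    """Binarize every sample, then compute all pairwise Hamming distances."""
    binary = {name: binarize_by_median(values) for name, values in samples.items()}
    return {
        (p, q): hamming(binary[p], binary[q])
        for p, q in combinations(sorted(binary), 2)
    }


if __name__ == "__main__":
    # Toy, made-up expression values for four genes in three samples.
    samples = {
        "tumor_A1": [5.0, 0.2, 3.1, 0.4],
        "tumor_A2": [4.2, 0.3, 2.8, 0.1],
        "tumor_B1": [0.3, 4.9, 0.2, 3.7],
    }
    print(pairwise_hamming(samples))
    # -> {('tumor_A1', 'tumor_A2'): 0, ('tumor_A1', 'tumor_B1'): 4, ('tumor_A2', 'tumor_B1'): 4}
```

The packed variant suggests one source of the computational efficiency claimed for the binary approach. Once profiles are stored as bit strings, a distance reduces to an XOR and a bit count rather than floating-point arithmetic. A distance matrix like the one returned by `pairwise_hamming` could then be passed to any Multidimensional Scaling routine to visualize how samples separate.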
Contact: is{at}ieee.org