Bioinformatics Advance Access originally published online on September 16, 2004
Bioinformatics 2005 21(5):644-649; doi:10.1093/bioinformatics/bti036

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org

Reliability analysis of microarray data using fuzzy c-means and normal mixture modeling based classification methods

Musa H. Asyali 1,* and Musa Alci 2

1 Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Center PO Box 3354, MBC-03, Riyadh 11211, Saudi Arabia
2 Department of Electrical and Electronics Engineering, Ege University Bornova, Izmir 35100, Turkey

*To whom correspondence should be addressed.


    Abstract
 

Motivation: A serious limitation in microarray analysis is the unreliability of the data generated from low signal intensities. Such data may produce erroneous gene expression ratios and cause unnecessary validation or post-analysis follow-up tasks. Therefore, the elimination of unreliable signal intensities will enhance reproducibility and reliability of gene expression ratios produced from microarray data. In this study, we applied fuzzy c-means (FCM) and normal mixture modeling (NMM) based classification methods to separate microarray data into reliable and unreliable signal intensity populations.

Results: We compared the results of FCM classification with those of classification based on NMM. Both approaches were validated against reference sets of biological data consisting of only true positives and true negatives. We observed that both methods performed equally well in terms of sensitivity and specificity. Although a comparison of the computation times indicated that the fuzzy approach is computationally more efficient, other considerations support the use of NMM for the reliability analysis of microarray data.

Availability: The classification approaches described in this paper and sample microarray data are available as Matlab™ (The MathWorks Inc., Natick, MA) programs (m-files) and text files, respectively, at http://rc.kfshrc.edu.sa/bssc/staff/MusaAsyali/Downloads.asp. The programs can be run and tested on any computer platform where Matlab is available.

Contact: asyali@kfshrc.edu.sa


    1 INTRODUCTION
 
DNA microarray technology is a powerful and efficient means of measuring relative gene activity or expression in a variety of applications. A comprehensive review of the biological and technical aspects of microarray technology can be found in Nguyen et al. (2002) and Golub et al. (1999). In any microarray hybridization experiment, only a small fraction of the genes become expressed as a result of the investigated conditions. Thus, a large portion of microarray data comprises low signal intensities that cause variability or impair reproducibility of the measured ratios between control and experimental samples. There are also other situations that give rise to low signal values, such as the deposition of suboptimal amounts of the probes, poor quality of the probes or incorrect segmentation of the spots. The identification of reliable and unreliable data points before generating the gene expression ratios provides the biologist with an extra layer of protection against false positives. In a recent study, Asyali et al. (2004) described a classification method based on univariate and bivariate normal mixture modeling (NMM) (McLachlan and Gordon, 1989; Symons, 1981; Wolfe, 1970; Duda et al., 2000; Martinez and Martinez, 2001) for the reliability analysis of microarray data. First, the Expectation Maximization (EM) algorithm (Dempster et al., 1977; Redner and Walker, 1984; Moon, 1996) was utilized to estimate the parameters of the mixture model and the class posterior probabilities. Subsequently, Bayesian decision theory (Duda et al., 2000) was applied to find the optimal decision boundary that discriminates between the reliable and the unreliable (low) signal intensity populations, based on the estimated class posterior probabilities.

The fuzzy c-means (FCM) classification has been successfully applied to the clustering analysis of microarray hybridization data for identifying biologically relevant groups of genes (Dembele and Kastner, 2003); however, the use and efficacy of this technique for the purpose of reliability analysis of microarray data have not yet been evaluated. In this study, as an alternative to the classification based on NMM, we proposed the use of FCM classification (Bezdek, 1981; Bezdek et al., 1987; Jang et al., 1997; Wang, 1997; Ross, 1995), which is a non-parametric approach that has recently found widespread biomedical applications (Hall et al., 1992; Karlik et al., 2003; Akay, 2000), and compared the results of both approaches against the reference (or ground truth) sets that have been constructed using our experimental data. We also evaluated the overall agreement between the results of the two approaches and compared their execution times on our experimental data and on a publicly available large dataset.


    2 SYSTEM AND METHODS
 
2.1 Experimental data
We used data from three independent experiments of microarray gene expression from the same cell system (monocytic leukemia cell line, THP-1, induced by the endotoxin, LPS) (Suzuki et al., 2000; Murayama et al., 1997) in order to test and compare different classification approaches. We used a complementary DNA (cDNA) microarray, which contained about 2000 distinct cDNA probes and a total of about 4000 elements (Frevel et al., 2003). The details of microarray preparation, image acquisition and intensity extraction procedures can be found in Asyali et al. (2004). Our data consist of Cy3 (green) and Cy5 (red) channel fluorescence signal intensities. After background subtraction and normalization, both channels were natural log-transformed, as commonly performed in microarray data analysis. In our case, the log-transform also brings the distribution of the data closer to normality, which helps in fitting normal mixture models. In addition to our experimental data, a publicly available dataset from a recent study (Chang et al., 2004) was also used. The microarray procedures of that study involved about 40 000 elements, representing about 36 000 different genes. We downloaded the raw Excel data file No. 17368, which corresponds to the profiling of asynchronous arm fibroblasts versus common reference fibroblasts, from the website (http://genome-www5.stanford.edu). Table 1 shows summary statistics, including mean, SD, median, the correlation between the channels ($\rho_{\mathrm{Cy3,Cy5}}$) and the number of samples (n), for the two-channel data in the four datasets.


 
Table 1 Summary statistics for the four microarray datasets

 
In the classification of microarray data, it is possible to analyze each channel separately and combine the individual classification results (Asyali et al., 2004). However, as indicated by the high correlation values in Table 1, bivariate analysis, where both channels are considered together, is more suitable (Asyali et al., 2004).

We constructed and used reference validation sets from our experimental data to assess and compare how well the different classification methods perform. For each dataset, a reference set of about 50 expressed genes including both endotoxin-induced and housekeeping genes, based on their expression status in human monocytes, was compiled. The expression status was obtained from the literature and from our previous large-scale microarray expression data (Suzuki et al., 2000; Murayama et al., 1997; Frevel et al., 2003). The signal intensities in the reference sets were identified as true positives if the microarray gene expression data agreed with the prior knowledge about the expression status, whereas the microarray gene expression data for true negatives were not consistent with the expected expression or inducibility of these genes. For further details about the construction of the reference datasets and a list of genes used, refer to Asyali et al. (2004). The genes or data points in the reference datasets can be seen in Figure 1A–C. As noted, there are both true positives (i.e. reliable data points), shown with normal triangles, and true negatives (i.e. unreliable data points), shown with inverted triangles, in the reference datasets.



 
Fig. 1 FCM and NMM based classification results. The ‘high’ and ‘low’ refer to the classes of reliable and unreliable data points, respectively. (A–D) correspond to the datasets 1, 2, 3 and 4. The data points classified by FCM as low and high are marked with light-gray circles and dark-gray plus marks, respectively. The FCM decision boundary is a line that passes through the points where circles and plus marks are touching. A sample FCM decision boundary, obtained by visual inspection, is shown in (B). The dashed ellipsoid lines or contours represent the components of the bivariate normal mixture model, i.e. $N(x; \mu_1, \Sigma_1)$ and $N(x; \mu_2, \Sigma_2)$. The contours correspond to the level at which the pdf of the bivariate normal density drops to 60.65% of its peak value. The peak values are attained at the centers (indicated by large black dots) of the pdfs. In other words, the contour is obtained by cutting the two-dimensional pdf at 1 SD away from the center in each direction. The dotted ellipsoid is the NMM decision boundary, obtained by equating the two weighted density components, i.e. the decision boundary is the collection of points x in two dimensions for which $w_1 N(x; \mu_1, \Sigma_1) = w_2 N(x; \mu_2, \Sigma_2)$. In (D), to underline the discrepancy between FCM and NMM classification results, the areas for which there is a disagreement are annotated.
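
The 60.65% level quoted in the caption can be checked directly from the bivariate normal density defined in Section 2.3. As a brief worked step, taking the contour to be the set of points at unit Mahalanobis distance from the center $\mu_i$ (our reading of ‘1 SD away from the center’):

$$\frac{N(x; \mu_i, \Sigma_i)}{N(\mu_i; \mu_i, \Sigma_i)} = \exp\left[-\tfrac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)\right] = e^{-1/2} \approx 0.6065 \quad \text{when } (x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) = 1.$$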

 
2.2 FCM classification
Cluster analysis (Duda et al., 2000; Ross, 1995) is based on partitioning a collection of data points into a number of subgroups, where the objects in a particular group or cluster show a certain degree of closeness or similarity. (If the number of clusters is known a priori, as in our case, the clustering problem turns into a classification problem; we therefore use the terms ‘clustering’ and ‘classification’ interchangeably.) The similarity measure is generally taken as the Euclidean distance between the data points. Hard clustering, also known as k-Means, assigns each data point to one and only one of the clusters; therefore the degree of membership of each data point in a particular class is either 0 or 1.

There are several applications in which the clusters have no clear or well-defined boundaries (Dembele and Kastner, 2003; Hall et al., 1992; Karlik et al., 2003). In fuzzy clustering, each data point may belong to any class with a certain possibility or ‘degree of membership’, a value between 0 and 1. As will be noted shortly, this concept is similar to the posterior probability in the case of mixture models. The rationale behind fuzzy clustering is that an object or data point could reasonably be assigned to different classes. That is, if an object does not clearly fit into any of the clusters, this knowledge, expressed by the degree of membership, can be captured.

The FCM algorithm was first proposed by Bezdek (1981) and is summarized here for convenience. Below, c (the number of clusters) is 2, i = 1, 2 is the class index, k = 1, 2, ..., n is the data point index and l = 1, 2, ..., L is the iteration index. The norm operator $\|\cdot\|$ refers to the Euclidean norm for vectors and the Frobenius norm for matrices. Following common practice (Dembele and Kastner, 2003; Jang et al., 1997; Ross, 1995), we set the exponent parameter m (which must be >1 and is also known as the fuzziness parameter) to 2; the maximum number of iterations (L) and the termination criterion ($\varepsilon$) were taken as 100 and $10^{-5}$, respectively.

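Step 1. For a given dataset $X = \{x_1, x_2, \ldots, x_n\}$, $x_k \in \mathbb{R}^2$, set $l = 1$ and initialize the $n \times 2$ partition or membership matrix $U^{(l)}$ with elements $u_{ki}^{(l)}$, such that $0 \le u_{ki}^{(l)} \le 1$ and $\sum_{i=1}^{c} u_{ki}^{(l)} = 1, \forall k$. (We initialized U with random numbers, normalized to make row sums equal to 1.)

Step 2. Compute the c mean vectors (fuzzy centroids) $v_i^{(l)}$ as follows:

$$v_i^{(l)} = \frac{\sum_{k=1}^{n} \left(u_{ki}^{(l)}\right)^{m} x_k}{\sum_{k=1}^{n} \left(u_{ki}^{(l)}\right)^{m}}, \quad i = 1, \ldots, c$$

Step 3. Compute the degree of membership of all data points for all clusters and update the partition matrix, i.e. obtain $U^{(l+1)}$, as follows:

$$u_{ki}^{(l+1)} = \left[\sum_{j=1}^{c} \left(\frac{\|x_k - v_i^{(l)}\|}{\|x_k - v_j^{(l)}\|}\right)^{2/(m-1)}\right]^{-1}$$

Step 4. Check for convergence: stop if $\|U^{(l+1)} - U^{(l)}\| < \varepsilon$ or $l = L$; otherwise, set $l \leftarrow l + 1$ and go to Step 2.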

The FCM algorithm usually converges to a solution rather rapidly, and convergence is guaranteed in a finite number of iterations (Bezdek et al., 1987); however, the algorithm may converge to a local minimum as well. Since the algorithm runs relatively fast, it is possible to run it with several different initial conditions to check the optimality of the resulting clustering.
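
The four steps of the FCM algorithm can be sketched in a few lines of code. The block below is a minimal illustration in Python using only the standard library; it is not the authors' Matlab program. It assumes the standard update formulas of Bezdek (1981), and the function name `fcm`, its arguments and the `seed` parameter are introduced here for illustration only.

```python
import math
import random

def fcm(points, c=2, m=2.0, max_iter=100, eps=1e-5, seed=0):
    """Fuzzy c-means on a list of (x, y) points; returns (centroids, U)."""
    rng = random.Random(seed)
    n = len(points)
    # Step 1: random membership matrix, rows normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centroids = []
    for _ in range(max_iter):
        # Step 2: fuzzy centroids, each point weighted by u_ki ** m.
        centroids = []
        for i in range(c):
            w = [U[k][i] ** m for k in range(n)]
            total = sum(w)
            centroids.append((
                sum(w[k] * points[k][0] for k in range(n)) / total,
                sum(w[k] * points[k][1] for k in range(n)) / total,
            ))
        # Step 3: update memberships from Euclidean distances to centroids.
        new_U = []
        for k in range(n):
            dist = [math.hypot(points[k][0] - v[0], points[k][1] - v[1])
                    for v in centroids]
            if min(dist) == 0.0:
                # Point sits exactly on a centroid: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                row = [h / sum(hits) for h in hits]
            else:
                power = 2.0 / (m - 1.0)
                row = [1.0 / sum((dist[i] / dist[j]) ** power
                                 for j in range(c))
                       for i in range(c)]
            new_U.append(row)
        # Step 4: Frobenius norm of the change in the partition matrix.
        change = math.sqrt(sum((new_U[k][i] - U[k][i]) ** 2
                               for k in range(n) for i in range(c)))
        U = new_U
        if change < eps:
            break
    return centroids, U
```

With two clusters, the column of `U` that belongs to the centroid with the larger coordinates can be read as the degree of membership in the ‘high’ class. Since the result depends on the random start, calling `fcm` with several `seed` values is a simple way to perform the check for local minima mentioned above.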

2.3 Classification using NMM
Mixture modeling is a widely used technique for probability density function (pdf) estimation (Wolfe, 1970; Martinez and Martinez, 2001) and has found significant applications in various biological problems (McLachlan et al., 2002; McManus, 1983; Shoukri and McLachlan, 1994; McLachlan and Gordon, 1989). We modeled the pdf of microarray data with two bivariate normal pdfs as follows:

$$f(x) = w_1 N(x; \mu_1, \Sigma_1) + w_2 N(x; \mu_2, \Sigma_2)$$

where $N(x; \mu_i, \Sigma_i) = (2\pi)^{-1} \det(\Sigma_i)^{-1/2} \exp\left[-(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)/2\right]$, $i = 1, 2$, is a bivariate normal pdf with mean $\mu_i \in \mathbb{R}^2$ and $2 \times 2$ covariance matrix $\Sigma_i$. The $w_i$ ($\ge 0$) denotes the weight of $N(x; \mu_i, \Sigma_i)$. For each component, we have to estimate two parameters for the mean vector and three parameters for the covariance matrix (because of its symmetry). In addition, we have only one weight to estimate, as $w_1 + w_2$ must be 1 for $f(x)$ to be a proper pdf. The weighted bivariate normal pdfs or components, i.e. $w_1 N(x; \mu_1, \Sigma_1)$ and $w_2 N(x; \mu_2, \Sigma_2)$, correspond, up to normalization by $f(x)$, to the posterior probabilities of the low (unreliable) and high (potentially reliable) intensity classes of data, respectively. Once the mixture model parameters are estimated, we can calculate the posterior probability of any data point x belonging to the i-th class as $f(x; x \in \mathrm{Class}_i) = w_i N(x; \mu_i, \Sigma_i)/f(x)$; to decide which class the point belongs to, the resulting probabilities are compared (Duda et al., 2000). By equating the class posterior probabilities and solving for x, we obtain the decision boundary, which is a hyperquadric that can assume many different forms depending on the parameters of the pdfs (Asyali et al., 2004; Duda et al., 2000). However, in our case, the decision boundaries turn out to be hyper-ellipsoids, because microarray data lie mostly along the Cy3 = Cy5 line, as there is a high correlation between the channels.
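
The posterior computation and the resulting class decision can be illustrated as follows. This is a minimal sketch in Python (standard library only), not the authors' code; the function names `bvn_pdf` and `posteriors` are ours, and the weights, means and covariances are hypothetical values chosen for illustration, not the fitted parameters of Table 2.

```python
import math

def bvn_pdf(x, mu, cov):
    """Bivariate normal density; cov is ((s11, s12), (s12, s22))."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu), using the 2x2 inverse.
    q = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def posteriors(x, weights, mus, covs):
    """Posterior probability of each mixture component for point x."""
    parts = [w * bvn_pdf(x, mu, cov)
             for w, mu, cov in zip(weights, mus, covs)]
    total = sum(parts)  # the mixture density f(x)
    return [p / total for p in parts]

# Hypothetical parameters (NOT the estimates reported in Table 2).
weights = [0.6, 0.4]                      # w1 (low), w2 (high)
mus = [(5.0, 5.0), (8.0, 8.0)]            # component means
covs = [((0.5, 0.3), (0.3, 0.5)),         # low-intensity component
        ((1.5, 1.2), (1.2, 1.5))]         # high-intensity component

# Classify one log-transformed (Cy3, Cy5) pair by comparing posteriors.
p_low, p_high = posteriors((8.5, 8.3), weights, mus, covs)
label = "reliable" if p_high > p_low else "unreliable"
```

Comparing the two posteriors is equivalent to checking on which side of the decision boundary $w_1 N(x; \mu_1, \Sigma_1) = w_2 N(x; \mu_2, \Sigma_2)$ the point falls.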

We used the EM algorithm (Dempster et al., 1977) to estimate the mixture parameters. Following common practice, we started the algorithm with an initial estimate of the parameters obtained from the k-Means algorithm (Duda et al., 2000; Martinez and Martinez, 2001; Jang et al., 1997) and iterated the Expectation and Maximization steps until the changes in the parameters were less than a small preset tolerance (0.0001) or a certain number of iterations (300) was reached.

Similar to FCM, depending on the initial conditions, the EM algorithm may also converge into a local solution, i.e. to a local maximum of the likelihood function. The EM algorithm can be run multiple times starting with different initial guesses; however, this heuristic approach is computationally costly. Fortunately, when initialized by the k-Means algorithm, the EM algorithm will always find a good or acceptable local maximum (McLachlan and Basford, 1989) that is often considered sufficient in practical applications.
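
The EM iteration described above can be sketched as follows. This is an illustrative Python version (standard library only) under several assumptions of ours: the initial hard labels are supplied by the caller (for example from a k-Means run, which is not implemented here), convergence is measured as the largest absolute change in any parameter, and there is no safeguard against degenerate covariance matrices or empty classes. The function names are hypothetical and do not come from the authors' Matlab programs.

```python
import math

def bvn_pdf(x, mu, cov):
    """Bivariate normal density; cov is ((s11, s12), (s12, s22))."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    q = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def weighted_stats(points, r):
    """Total weight, weighted mean and weighted covariance of 2-D points."""
    total = sum(r)
    mx = sum(w * p[0] for w, p in zip(r, points)) / total
    my = sum(w * p[1] for w, p in zip(r, points)) / total
    sxx = sum(w * (p[0] - mx) ** 2 for w, p in zip(r, points)) / total
    syy = sum(w * (p[1] - my) ** 2 for w, p in zip(r, points)) / total
    sxy = sum(w * (p[0] - mx) * (p[1] - my)
              for w, p in zip(r, points)) / total
    return total, (mx, my), ((sxx, sxy), (sxy, syy))

def flatten(params):
    """All scalar parameters of the mixture as one flat list."""
    return [v for w, mu, cov in params for v in (w, *mu, *cov[0], *cov[1])]

def em_two_component(points, init_labels, tol=1e-4, max_iter=300):
    """EM for a two-component bivariate normal mixture.

    init_labels are 0/1 hard labels used as the starting assignment.
    Returns ([(weight, mean, cov), ...], posterior matrix).
    """
    n = len(points)
    resp = [[1.0 - z, float(z)] for z in init_labels]
    params = None
    for _ in range(max_iter):
        # M-step: weight, mean and covariance of each component.
        new = []
        for i in range(2):
            total, mu, cov = weighted_stats(points, [r[i] for r in resp])
            new.append((total / n, mu, cov))
        # E-step: class posterior probabilities of every point.
        resp = []
        for x in points:
            parts = [w * bvn_pdf(x, mu, cov) for w, mu, cov in new]
            f = sum(parts)
            resp.append([p / f for p in parts])
        # Stop when no parameter changed by more than tol.
        converged = params is not None and max(
            abs(a - b) for a, b in zip(flatten(params), flatten(new))) < tol
        params = new
        if converged:
            break
    return params, resp
```

The default values of `tol` and `max_iter` mirror the tolerance (0.0001) and iteration limit (300) stated in the text.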


    3 RESULTS
 
We performed all the computations of FCM and NMM classifications using our in-house programs, developed under Matlab™ (The MathWorks Inc., Natick, MA), on a personal computer with a 1.5 GHz Pentium IV processor and 384 MB of memory, running the Windows 2000™ operating system. Table 2 presents the NMM modeling results, i.e. mean vectors, covariance matrices and the weights, for the four datasets.


 
Table 2 The results of NMM for the four datasets (Comp.: component)

 
The classification performance of both approaches against the reference sets for the first three cases, corresponding to our experimental data, is reported in Table 3. For the fourth dataset, obtained from the study of Chang et al. (2004), we do not have a reference set. The performance of the two approaches is compared using 2 × 2 tables showing the agreement between the true state of nature, i.e. the true positive (reliable) and true negative (unreliable) data points in the reference sets, and the classification results, i.e. the reliable and unreliable decisions obtained for those data points, together with the corresponding sensitivity and specificity rates. For datasets 1 and 2, both methods correctly classify all the true positives and true negatives, signifying sensitivity and specificity rates of 100%. However, for the third dataset, the FCM incorrectly classifies two true positives as low or unreliable (sensitivity ~93%, specificity 100%), while NMM incorrectly classifies one true negative as reliable (sensitivity 100%, specificity ~93%).
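
For readers who want to reproduce rates of this kind, the sketch below shows how sensitivity and specificity follow from the 2 × 2 comparison of reference status and classification result. It is written in Python for illustration; the function name is ours, and the counts in the example (30 true positives, 15 true negatives) are hypothetical and are not the entries of Table 3.

```python
def sensitivity_specificity(truth, predicted):
    """truth and predicted are sequences of booleans (True = reliable)."""
    pairs = list(zip(truth, predicted))
    tp = sum(t and p for t, p in pairs)          # reliable, called reliable
    fn = sum(t and not p for t, p in pairs)      # reliable, called unreliable
    tn = sum(not t and not p for t, p in pairs)  # unreliable, called unreliable
    fp = sum(not t and p for t, p in pairs)      # unreliable, called reliable
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical reference set: 30 true positives followed by 15 true negatives.
truth = [True] * 30 + [False] * 15
# Hypothetical classifier output that misses two of the true positives.
predicted = [True] * 28 + [False] * 2 + [False] * 15

sens, spec = sensitivity_specificity(truth, predicted)
```

In this hypothetical case two missed true positives out of 30 give a sensitivity of 28/30, while the specificity stays at 1.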


 
Table 3 Comparison of the FCM and NMM classification results on the reference sets

 
We also explored the overall agreement between the FCM and NMM classification results, i.e. the agreement between unreliable (low) or reliable (high) decisions made for all the data points in the sets. The 2 × 2 comparison tables along with the corresponding agreement rates are presented in Table 4. For datasets 1 and 2, the overall agreement rate between the FCM and NMM is ~95%, whereas for datasets 3 and 4, it is only ~90%.
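
The overall agreement rate is simply the fraction of data points on which the two methods make the same call. A minimal Python sketch follows; the function name and the label vectors are hypothetical and serve only to illustrate the calculation.

```python
from collections import Counter

def agreement_table(labels_a, labels_b):
    """Cross-tabulate two label sequences and return (table, agreement rate)."""
    table = Counter(zip(labels_a, labels_b))
    agree = sum(count for (a, b), count in table.items() if a == b)
    return table, agree / len(labels_a)

# Hypothetical calls from two classifiers on ten data points.
fcm_calls = ["low", "low", "low", "high", "high",
             "high", "high", "low", "high", "low"]
nmm_calls = ["low", "low", "high", "high", "high",
             "high", "high", "low", "high", "low"]

table, rate = agreement_table(fcm_calls, nmm_calls)
```

The `table` counter holds the four cells of the 2 × 2 comparison, keyed by the pair of calls, for example `("low", "high")` for points that the first method calls low and the second calls high.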


 
Table 4 Comparison of overall agreement between the FCM and NMM classification results

 
The computation times required for executing the algorithms in all four cases are given in Table 5. We observe that the FCM consistently takes less time to run.


 
Table 5 Comparison of the execution times of the FCM and NMM classification algorithms

 
Figure 1A–D (corresponding to datasets 1–4) shows the FCM and NMM classification results pictorially. The data points classified by the FCM as belonging to the low (or unreliable) class are marked with light-gray circles, whereas the data points identified as high (or potentially reliable) are marked with dark-gray plus marks. We observe that in all four cases, the bivariate data lie mostly along the identity line (Cy3 = Cy5 axis), due to the high correlation between the channels. The FCM decision boundary can be visualized as a line passing through the points where circles and plus marks meet. A sample FCM decision boundary, obtained by visual inspection, is shown in Figure 1B. The FCM decision boundary, which appears to be perpendicular to the identity line, divides the data space into two half-spaces; the points above the line have a higher degree of membership for the reliable class. On the other hand, the NMM decision boundary is an ellipse whose major axis is aligned with the Cy3 = Cy5 axis. The data points that fall outside this ellipsoid decision boundary are marked or identified as reliable.

In Figure 1A and B, i.e. for datasets 1 and 2, we observe that the classification boundaries of the FCM and NMM approaches are very close and that both correctly classify all the points in the reference sets. For the third and fourth datasets (Fig. 1C and D), by contrast, the FCM and NMM classification results disagree considerably, supporting the finding obtained from the overall agreement rates in Table 4. In Figure 1D, the areas for which there is a disagreement (i.e. where FCM decides low and NMM decides high, and vice versa) are annotated.
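
Why the FCM boundary looks like a straight line can be seen from the membership update itself. The short derivation below is a sketch that assumes the standard FCM membership update of Bezdek (1981) with c = 2, and writes $d_{ki} = \|x_k - v_i\|$ for the distance of point $x_k$ to centroid $v_i$ (a shorthand introduced here):

$$u_{k1} = \frac{1}{1 + (d_{k1}/d_{k2})^{2/(m-1)}}, \qquad u_{k1} = u_{k2} = \tfrac{1}{2} \iff d_{k1} = d_{k2}$$

$$\|x - v_1\|^2 = \|x - v_2\|^2 \iff (v_2 - v_1)^T x = \tfrac{1}{2}\left(\|v_2\|^2 - \|v_1\|^2\right)$$

Under this assumption the 0.5-membership boundary is the perpendicular bisector of the segment joining the two centroids, for any m > 1. If both centroids lie close to the Cy3 = Cy5 line, as the data in Figure 1 suggest, the bisector is nearly perpendicular to that line.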


    4 DISCUSSION AND CONCLUSION
 
Microarray technology lets the biologist study the expression or the activity of thousands of genes at the same time, but only a fraction of the genes are differentially expressed and low signal intensities constitute a relatively large portion of the data. Such low signal intensities may give rise to erroneous gene expression ratios or false positives. Therefore, careful filtering of such signals before the subsequent steps of the analysis is essential. Various techniques (Brody et al., 2002; Hughes et al., 2000; Bilban et al., 2002; Tran et al., 2002; Fielden et al., 2002) to study the microarray spot accuracy and identify the true array signals have been suggested in the literature. Recently, Asyali et al. (2004) suggested an NMM-based approach and successfully demonstrated its advantages over the existing techniques. The major novelty of their approach was the assignment of a ‘reliability probability’ to the raw data points. In this study, we have explored the possibility of accomplishing the same signal classification goal, i.e. reliable versus unreliable, using the popular FCM classification technique.

The FCM assigns a ‘degree of membership’ to each data point as well, similar to the ‘posterior probability’ assignment of the NMM. This feature is very important because, depending on the characteristics of the data, the biologist may want to change the default cutoffs used to make the ‘reliable’ or ‘unreliable’ calls for the data points. For instance, we assumed that if the degree of membership (FCM) or the posterior probability (NMM) of a data point belonging to the reliable (unreliable) class is >0.5, then the point should be identified as reliable (unreliable). However, suppose this type of reliability-analysis-based filtering of microarray data turns out to be too restrictive. Then, one can relax the reliability constraint slightly and decide for the reliable class if the degree of membership (FCM) or the posterior probability (NMM) of a data point belonging to the reliable class is >0.45. In any case, the estimated ‘degree of membership’ or ‘posterior probability’ can be kept in perspective to assess the reliability of the gene expression ratios. Essentially, making this type of ‘hard’ call or filtering is not even necessary, as the basic idea is to have a ‘degree of reliability’ or ‘probability of reliability’ assigned to each data point so that one can know the reliability of the corresponding gene expression ratios.

Therefore, we thought that a comparison of the FCM and NMM classification approaches, which both seem to be suitable for the reliability analysis of microarray data, would be of interest. To this end, we have applied both algorithms to four datasets and assessed their performance by checking the classification decisions, ‘reliable’ versus ‘unreliable’, against the information in the reference sets where available (Table 3) and also by comparing the overall agreement between the results of the two approaches (Table 4).
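
The effect of relaxing the cutoff can be illustrated with a few lines of Python. The function name and the membership values below are hypothetical and chosen only to show how a point near the boundary changes its call.

```python
def call_reliable(p_reliable, cutoff=0.5):
    """Hard reliable/unreliable calls from reliable-class memberships
    (FCM) or posterior probabilities (NMM)."""
    return [p > cutoff for p in p_reliable]

# Hypothetical reliable-class values for four data points.
p_reliable = [0.30, 0.47, 0.52, 0.91]

default_calls = call_reliable(p_reliable)        # cutoff of 0.5
relaxed_calls = call_reliable(p_reliable, 0.45)  # relaxed cutoff of 0.45
```

With these values only the second point, whose value lies between 0.45 and 0.5, changes from unreliable to reliable when the cutoff is relaxed; the underlying memberships or probabilities are unchanged and remain available alongside the hard calls.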

Based on the performance comparison against the reference sets, which indicates that both algorithms perform equally well (Table 3), and considering the speed advantage of the FCM (Table 5), one may jump to the conclusion that it is advantageous to use FCM. Especially in the case of batch processing of large datasets, the speed advantage of FCM may be an appealing factor. However, a closer look at the classification results, particularly for the third and fourth datasets shown in Figure 1C and D, reveals that the decision boundary (and the corresponding decision region) identified by the NMM has some unique properties. Both FCM and NMM decision regions for the unreliable data lie in the lower left quarter of the two-dimensional data space, which is quite sensible, as in our context ‘unreliability’ is directly related to the ‘lowness’ of signal values. On the other hand, the decision boundary of the NMM is aligned with the Cy3 = Cy5 axis. This means that when both Cy3 and Cy5 channel signals are ‘low’ and ‘unbalanced’ the NMM will most likely identify those points as reliable, whereas the FCM will fail to do so. This point is clearly seen in Figure 1D: the regions annotated as ‘FCM Low, NMM High’ most probably correspond to reliable data points. For the region marked as ‘FCM High, NMM Low’, one may argue that the FCM is reaching a fairer decision than NMM, as the NMM decision boundary reaches too deep into the region of the bivariate normal density component with the higher mean. (The NMM decision boundary is almost touching the center of the component with the higher mean. This is quite possible, depending on the parameters of the density components and the class prior probabilities, i.e. weights.) However, for these points, the gene expression ratio, i.e. Cy5/Cy3 ratio, will not be interesting anyway (a ratio close to 1 does not signify any differential gene expression). This observation, i.e. the alignment of the decision boundary of the NMM along the identity line, led us to conclude that NMM is superior to FCM in terms of identifying or assessing the reliability of microarray data.

Received on May 11, 2004; revised on August 26, 2004; accepted on September 11, 2004

    REFERENCES
 

    Akay, M. (2000) Nonlinear Biomedical Signal Processing, Volume 1: Fuzzy Logic, Neural Networks, and New Algorithms. Wiley-IEEE, NJ.

    Asyali, M.H., Shoukri, M.M., Demirkaya, O., Khabar, K.S.A. (2004) Estimation of signal thresholds for microarray data using mixture modeling. Nucleic Acids Res., 32, 2323–2335.

    Bezdek, J.C. (1981) Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, NY.

    Bezdek, J.C., Hathaway, R.J., Sabin, M.J., Tucker, W.T. (1987) Convergence theory for fuzzy c-means: counterexamples and repairs. IEEE Trans. Syst. Man Cybern., 17, 873–877.

    Bilban, M., Buehler, L., Head, S., Desoye, G., Quaranta, V. (2002) Defining signal thresholds in DNA microarrays: exemplary application for invasive cancer. BMC Genomics, 3, 19.

    Brody, J.P., Williams, B.A., Wold, B.J., Quake, S.R. (2002) Significance and statistical errors in the analysis of DNA microarray data. Proc. Natl Acad. Sci. USA, 99, 12975–12978.

    Chang, H.Y., Sneddon, J.B., Alizadeh, A.A., Sood, R., West, R.B., Montgomery, K., Chi, J.T., van de Rijn, M., Botstein, D., Brown, P.O. (2004) Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds. PLoS Biol., 2, E7.

    Dembele, D. and Kastner, P. (2003) Fuzzy c-means method for clustering microarray data. Bioinformatics, 19, 973–980.

    Dempster, A., Laird, N., Rubin, D. (1977) Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., B39, 1–38.

    Duda, R., Hart, P., Stork, D. (2000) Pattern Classification. Wiley, NY.

    Fielden, M.R., Halgren, R.G., Dere, E., Zacharewski, T.R. (2002) GP3: GenePix post-processing program for automated analysis of raw microarray data. Bioinformatics, 18, 771–773.

    Frevel, M.A., Bakheet, T., Silva, A.M., Hissong, J.G., Khabar, K.S., Williams, B.R. (2003) p38 Mitogen-activated protein kinase-dependent and -independent signaling of mRNA stability of AU-rich element-containing transcripts. Mol. Cell. Biol., 23, 425–436.

    Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.

    Hall, L.O., Bensaid, A.M., Clarke, L.P., Velthuizen, R.P., Silbiger, M.S., Bezdek, J.C. (1992) A comparison of neural network and fuzzy clustering techniques in segmenting magnetic resonance images of the brain. IEEE Trans. Neural Net., 3, 672–682.

    Hughes, T.R., Marton, M.J., Jones, A.R., Roberts, C.J., Stoughton, R., Armour, C.D., Bennett, H.A., Coffey, E., Dai, H., He, Y.D. (2000) Functional discovery via a compendium of expression profiles. Cell, 102, 109–126.

    Jang, J.-S.R., Sun, C.-T., Mizutani, E. (1997) Neuro-Fuzzy and Soft Computing. Prentice-Hall, NJ.

    Karlik, B., Tokhi, O., Alci, M. (2003) Fuzzy clustering neural network architecture for multifunction upper-limb prosthesis. IEEE Trans. Biomed. Eng., 50, 1255–1261.

    Martinez, W.L. and Martinez, A.R. (2001) Computational Statistics Handbook with MATLAB. CRC Press, Boca Raton, FL.

    McLachlan, G.J. and Basford, K.E. (1989) Mixture Models, Inference and Applications to Clustering. Marcel Dekker, NY.

    McLachlan, G.J. and Gordon, R.D. (1989) Mixture models for partially unclassified data: a case study of renal venous renin in hypertension. Stat. Med., 8, 1291–1300.

    McLachlan, G.J., Bean, R.W., Peel, D. (2002) A mixture model-based approach to the clustering of microarray expression data. Bioinformatics, 18, 413–422.

    McManus, I.C. (1983) Bimodality of blood pressure levels. Stat. Med., 2, 253–258.

    Moon, T.K. (1996) The Expectation-maximization algorithm. IEEE Signal Process. Mag., 13, 47–60.

    Murayama, T., Ohara, Y., Obuchi, M., Khabar, K.S., Higashi, H., Mukaida, N., Matsushima, K. (1997) Human cytomegalovirus induces interleukin-8 production by a human monocytic cell line, THP-1, through acting concurrently on AP-1- and NF-kappaB-binding sites of the interleukin-8 gene. J. Virol., 71, 5692–5695.

    Nguyen, D.V., Arpat, A.B., Wang, N., Carroll, R.J. (2002) DNA microarray experiments: biological and technological aspects. Biometrics, 58, 701–717.

    Redner, R. and Walker, H. (1984) Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev., 26, 195–202.

    Ross, T.J. (1995) Fuzzy Logic with Engineering Applications. McGraw-Hill, NY.

    Shoukri, M.M. and McLachlan, G.J. (1994) Parametric estimation in a genetic mixture model with application to nuclear family data. Biometrics, 50, 128–139.

    Suzuki, T., Hashimoto, S., Toyoda, N., Nagai, S., Yamazaki, N., Dong, H.Y., Sakai, J., Yamashita, T., Nukiwa, T., Matsushima, K. (2000) Comprehensive gene expression profile of LPS-stimulated human monocytes by SAGE. Blood, 96, 2584–2591.

    Symons, M. (1981) Clustering criteria and multivariate normal mixtures. Biometrics, 37, 35–43.

    Tran, P.H., Peiffer, D.A., Shin, Y., Meek, L.M., Brody, J.P., Cho, K.W. (2002) Microarray optimizations: increasing spot accuracy and automated identification of true microarray signals. Nucleic Acids Res., 30, e54.

    Wang, L.-X. (1997) A Course in Fuzzy Systems and Control. Prentice Hall, NJ.

    Wolfe, J. (1970) Pattern clustering by multivariate mixture analysis. Multivar. Behav. Res., 5, 329–350.

