Skip Navigation


Bioinformatics Advance Access originally published online on December 22, 2006
Bioinformatics 2007 23(4):473-479; doi:10.1093/bioinformatics/btl640
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
23/4/473    most recent
btl640v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (3)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Li, H.
Right arrow Articles by Zhan, M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Li, H.
Right arrow Articles by Zhan, M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

The discovery of transcriptional modules by a two-stage matrix decomposition approach

Huai Li , Yu Sun and Ming Zhan *

Bioinformatics Unit, Branch of Research Resources, National Institute on Aging NIH, Baltimore, MD 21224, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 SUMMARY
 REFERENCES
 

Motivation: We address the problem of identifying gene transcriptional modules from gene expression data by proposing a new approach. Genes mostly interact with each other to form transcriptional modules for context-specific cellular activities or functions. Unraveling such transcriptional modules is important for understanding biological network, deciphering regulatory mechanisms and identifying biomarkers.

Method: The proposed algorithm is based on two-stage matrix decomposition. We first model microarray data as non-linear mixtures and adopt the non-linear independent component analysis to reduce the non-linear distortion and separate the data into independent latent components. We then apply the probabilistic sparse matrix decomposition approach to model the ‘hidden’ expression profiles of genes across the independent latent components as linear weighted combinations of a small number of transcriptional regulator profiles. Finally, we propose a general scheme for identifying gene modules from the outcomes of the matrix decomposition.

Results: The proposed algorithm partitions genes into non-mutually exclusive transcriptional modules, independent from expression profile similarity measurement. The modules contain genes with not only similar but different expression patterns, and show the highest enrichment of biological functions in comparison with those by other methods. The usefulness of the algorithm was validated by a yeast microarray data analysis.

Availability: The software is available upon request to the authors.

Contact: zhanmi{at}mail.nih.gov


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 SUMMARY
 REFERENCES
 
Genes mostly interact with each other to form transcriptional modules for context-specific cellular activities or functions (Brunet et al., 2004; Hughes et al., 2000; Segal et al., 2004). Unraveling such transcriptional modules is important for understanding biological network, deciphering regulatory mechanisms and identifying biomarkers.

Various methods have been proposed for identifying gene transcriptional modules from microarray data. Most of them are clustering-based, such as hierarchical clustering (Eisen et al., 1998), K-means (Tavazoie et al., 1999) and self-organizing map (SOM) (Tamayo et al., 1999). The clustering-based methods mostly measure correlative and linear relationship among genes, and partition genes into mutually exclusive clusters. The methods identify transcriptional modules by assuming that genes with similar expression profiles share similar functions or the same pathway. However, genes involved in the same biological process or pathway can have different expression patterns (Zhou et al., 2005). It is also important to capture contextual modularity that might result from highly non-linear interactions among genes. Moreover, one gene may be part of several biological processes and thus should belong to different modules. Apparently, the clustering-based methods are not suitable to address these problems.

Recently, matrix decomposition methods have been introduced for uncovering transcriptional modules from microarray data. These methods treat microarray data as a mixture of unknown signals that may correspond to specific biological sources. Since matrix decomposition methods do not cluster genes based on pair-wise similarity measurement, functional related genes showing different expression patterns can be clustered together (Dueck et al., 2005; Lee and Batzoglou, 2003; Zhou et al., 2005). The methods can also partition genes into non-mutually exclusive modules to reflect the fact that genes may have multiple functions or are active in multiple biological processes. A variety of matrix decomposition methods have been proposed for microarray data analysis. Some of them take non-probabilistic approaches, which include singular value decomposition (Alter et al., 2000; Holter et al., 2001), independent components analysis (Chiappetta et al., 2004; Frigyesi et al., 2006; Lee and Batzoglou, 2003; Liebermeister, 2002), non-negative matrix factorizations (Brunet et al., 2004; Carmona-Saez et al., 2006; Gao and Church, 2005; Kim and Tidor, 2003; Lee and Seung, 1999; Wang et al., 2006) and network component analysis (Liao et al., 2003). Others take probabilistic approaches that account for noise, such as probabilistic sparse matrix factorization (Dueck et al., 2005). Most of these methods however use linear models that describe gene expression data as linear combinations of latent biological sources. In fact, non-linear interactions among genes are often observed in gene regulatory networks, such as negative feedback events, two consecutive biological events of threshold and saturation, etc. (Atkinson et al., 2003; Yuh et al., 1998). Thus, non-linear mixtures would provide a more realistic model for gene relationships and transcriptional modules. Recently, progresses have been made for general non-linear mixed models in matrix decomposition (Jutten and Karhunen, 2004). A promising method among these advances is the non-linear independent component analysis (NICA) by variational Bayesian learning (Jutten and Karhunen, 2004; Lappalainen and Honkela, 2000). This method uses a multilayer perceptron network to model the non-linear mapping from sources to observed data, and infers statistically independent sources and parameters of the mapping that most likely generate the observed data (Lappalainen and Honkela, 2000). Other non-linear mappings include kernel-based functions (Muller et al., 2001), radial basis functions (RBF) (Tan et al., 2001), etc. These statistical methods however have not yet been used in microarray data analysis for transcriptional module discovery.

In this study, we introduce a novel approach, which is based on two-stage matrix decomposition, to model gene expression data and uncover the modular structure. Considering that interactions among many genes, particularly between genes and transcription factors, are possibly non-linear, we first model microarray data as non-linear mixtures and adopt a NICA decomposition to reduce the non-linear distortion in the data and separate the data into independent latent components, which may correspond to specific biological sources unknown or hidden. We then apply a probabilistic sparse matrix factorization (PSMF) approach to model the ‘hidden’ expression profile of genes across the independent latent components as linear weighted combinations of profiles of a small number of predominant transcriptional regulators (e.g. transcription factors). The NICA transformation allows capture the non-linear structure of transcriptional profiles in our method. The PSMF procedure is appropriate for modeling gene expression data, in which while many genes are involved in gene regulation, only a small number of regulators (e.g. transcription factors) have predominant impact to the expression of any specific gene. From the model, we propose a general approach for identifying transcriptional modules. The newly proposed algorithm takes into account that the non-linear structure existed in the data. Besides clustering together genes with similar expression profiles and similar function categories, our algorithm can also cluster genes, which show different expression profiles but related functions. One gene can be assigned into multiple modules. Compared with other methods [e.g. hierarchical clustering, K-means clustering, self-organizing maps, PSMF or independent components analysis (ICA)], the newly proposed algorithm appears to have improved performance in identifying biologically meaningful transcriptional modules. The usefulness of our algorithm in transcriptional module discovery was validated by a yeast microarray data analysis.


    2 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 SUMMARY
 REFERENCES
 
Framework of the algorithm
Figure 1 shows the general schema of the proposed algorithm. The algorithm is based on two-stage matrix decomposition. First, the NICA stage de-non-linearizes microarray data into independent latent components. Then, the PSMF stage models the gene ‘hidden’ expression profile across the independent latent components as linear weighted combinations of a small number of regulator profiles.


Figure 1
View larger version (24K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1 A two-stage matrix decomposition of a microarray data set X (N genes and M samples) is obtained by the proposed algorithm. NICA extracts non-linear independent components (columns in Figure 1) from X. At the PSMF stage, Figure 1 is approximated by the product of sparse matrix Y and low-rank Z. The values of all matrices are color coded by using a color heatmap, from dark green (minimum) to dark red (maximum).

 
The NICA stage The NICA method we adopt is based on a variational Bayesian learning approach (Jutten and Karhunen, 2004; Lappalainen and Honkela, 2000). This approach assumes that the microarray data can be described by the noisy non-linear mixing model


Formula 1

(1)
where Formula and Formula denote the observed data and latent source matrices, N is the number of genes in the array, M is the sample size, M' is the number of latent sources. The column dimensions of X and S are in general different. N is the white Gaussian noise matrix and F(·) is a non-linear mixing function. The non-linear mapping F(·) is modeled using a multilayer perceptron (MLP) network (Haykin, 1999) with one non-linear hidden layer as:


Formula 2

(2)
where B and C are the weight matrices of the hidden and output layers and D and E are the corresponding bias matrices. The sigmoid tanh non-linearity is applied component-wise to its each argument.

Assuming that the source signals S at the input layer of the MLP network have simple Gaussian distributions, we obtain a non-linear principal component analysis solution for S based on variational Bayesian learning for blind estimation and separation in non-linear mixture data model in Equation (1). This solution can usually model quite well non-linear mixtures (observed data), but it does not yet provide estimates of the independent source signals. To find independent components from S, we apply a standard linear ICA to S using the FastICA algorithm (Hyvarinen and Oja, 2000). The goal of the linear ICA is to decompose Formula so that columns (components) of Formula are statistically as independent as possible.

The PSMF stage. The PSMF approach is a probabilistic extension of sparse matrix factorization, which can account for uncertainties due to different levels of noise in the data (Dueck et al., 2005). Given a matrix Formula in which element Formula represents the ‘hidden’ expression of gene i under the independent latent component j, PSMF is to find Formula and Formula such that Formula . Each row of Formula has at most K non-zero entries. Row vectors of Z contain unobserved L factor profiles. More specifically, let ki be the number of non-zero entries (ki ≤ K) of the row vector Formula in Y and Formula be the vector that contains column indices of non-zero entries of yi, we model each gene ‘hidden’ expression profile across the independent latent component Formula , as a linear combination of ki of the factor profiles Formula , plus noise:


Formula 3

(3)
Supposing the noise is Gaussian with variance Formula for Formula , then the likelihood of Formula can be written as:


Formula 4

(4)
Assume that zl is normally distributed, li is uniformly distributed and ki is multinomially distributed. Multiplying these priors by Equation (4) forms the joint distribution Formula . From the joint distribution, we estimate elements in Y and Z by utilizing a factorized variational inference method (Jordan et al., 1999). The details about the PSMF model estimation using factorized variational inference are described in Dueck et al. (2005).

Identification of gene modules Given the non-linear independent latent component matrix Formula and its composition matrices Y and Z, we propose a general approach for identifying gene modules. Gene modules are identified by assigning genes with the indices of non-zero element of the column vector Formula of Y (Formula ) into the cluster associated with the factor l, l = 1, ··· , L. Based on the assumption that Z contains unobserved L factor profiles, we compute correlations between row vectors Formula of Formula and row vectors zl of Z and then select the genes that ‘hidden’ expression profiles are significantly correlated with zl as potential transcription factors. The significant level of the observed correlation is dependent on the Pearson correlation value, Ril (i = 1, ··· , N; l = 1, ··· , L) and the sample size M'. The p-value of the correlation is obtained by treating Formula as coming from a t-distribution with M' –2 degrees of freedom (SAS, 2004).

The proposed algorithm is summarized as follows.

Input

  • Microarray data matrix X = [Xij], where the element Xij represents the expression level of gene i associated with the jth sample, i = 1, ··· , N, j = 1, ··· , M.
  • Number of modules (determined by L, the number of possible regulators).
  • Maximum number of effective regulators, K. K should be less or equal to L.

Output

  • Gene set in each of L modules.
  • Potential transcription regulators.

Algorithm

  • De-non-linearize X using the NICA approach.
    • Find the non-linear principle component matrix S from Equation (1) and Equation (2) by variational Bayesian learning.
    • Decompose Formula by the linear FastICA algorithm to obtain linear independent components in Formula .

  • Decompose Formula using the PSMF method with constraint Formula ≥ 0.
    • If prior partial connection information is known, initialize the elements yil = 1 in Y; else set the value of yil in Y randomly.
    • Obtain Y and Z by factorized variational inference.

  • Cluster gene modules and predict key transcription regulators based on Formula , Y and Z.
    • For l = 1, ··· , L, find the indices of non-zero element of column vector Formula in Y where Formula . Genes with these indices are the members in module l.
    • For i = 1, ··· , N and l = 1, ··· , L, compute the correlation coefficient Ril between Formula and zl where Formula and zl isin R1 x M'; sort Ril for a given l and obtain the indices of top Ril's that satisfy cutting threshold and p-value.

Evaluation of the algorithm
We compare our algorithm with other algorithms using the ClusterJudge method based on the Gene Ontology (GO) (Gibbons and Roth, 2002). In the GO, each gene is described by a set of GO attributes of molecular function, biological process, or cellular component that the gene is associated to (www.geneontology.org). Based on the list of genes and associated GO attributes, a contingency table is constructed for each cluster-attribute pair, from which the entropy is computed for each cluster-attribute pair (HAiC), for the clustering result independent of attributes (HC), and for each of the NA attributes in the table independent of clusters (HAi). The total mutual information between the cluster result C and all attributes Ai is computed as:


Formula 5

(5)
where summation is over all attributes i. Two kinds of MI are computed: (1) MIreal, computed from the real clustered data; and (2) MIrandom, computed from a clustering obtained by randomly assigning genes to clusters of uniform size, repeating until a distribution of values is obtained. A z-score is then computed for MIreal and the distribution of MIrandom values (with mean MIrandom and standard deviation Srandom) according to z = (MIrealMIrandom)/Srandom. The z-score represents the standardized distance between MIreal and MIrandom. The larger the z-score is, the greater the distance is, and thus more significantly a clustering result is related to gene functions.

We also examine statistical over-representation of GO terms in each transcriptional module identified by our algorithm. The Fisher's exact test is conducted to calculate the hypergeometric probability of observing a GO term as enriched in a transcriptional module. In specific, the probability p that a GO term is significantly enriched in a module is calculated as:


Formula 6

(6)
where k is the number of genes in the group, G is the total number of genes, n is the number of genes in the module with a given GO term and A is the total number of genes with a given GO term. We further conduct the false discovery rate (FDR) correction, which provides strong control to have less false negatives at the cost of a few more false positives.

Software and experimental validation
We implemented a Matlab-based computational tool, ModulePro, for the algorithm that we developed. We applied ModulePro to a publicly available yeast cDNA microarray dataset for method validation. The yeast dataset contains 6285 genes and 156 samples, including different experimental conditions such as temperature shocks, amino acid starvation, nitrogen source deletion, progression into stationary phase, etc. (Gasch et al., 2000). The normalization of the data was conducted by Zero-Transform (Gollub et al., 2006), and the missing data were filled using the KNNimpute approach (Troyanskaya et al., 2001).


    3 RESULTS AND DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 SUMMARY
 REFERENCES
 
Feature of the algorithm
The essence of our algorithm is to improve the recovery of transcriptional modules in the form of functional links among genes, made possible by incorporating NICA transformation into PSMF decomposition on microarray data. The NICA method we proposed is based on a variational Bayesian learning (Jutten and Karhunen, 2004; Lappalainen and Honkela, 2000). The method uses a multilayer perceptron (MLP) network as a non-linear mapping to model non-linear mixtures of data. Theoretically, given enough nodes in the hidden layer, a MLP network can model any non-linear mapping from sources to observed data with certain accuracy (Haykin, 1999). In addition, a MLP network is a flexible non-linear mapping because its model complexity scales linearly with the dimension of the latent source space (Lappalainen and Honkela, 2000). To our best knowledge, this NICA method has never been used in microarray data analysis.

PSMF decomposition models gene expression as a linear weighted combination of a small number of prototypes that represent the influence of either different biological or experimental factors (Dueck et al., 2005). The PSMF model is based on the assumption that the expression of each gene is influenced by only a small set of the hidden regulatory variables to various degrees. Prior to PSMF decomposition, we apply NICA transformation. This allows non-linear mixtures and thus provides a more realistic model for gene expression data, since many interactions between genes (e.g. between transcription factors and target genes) are not necessarily linear. The proposed algorithm projects the gene expression under different experimental conditions into corresponding meta-expression in the latent independent components that may correspond to specific biological sources. The algorithm can thus overcome the singularity problem, which often causes PSMF to fail. In comparison with other methods, our algorithm can dramatically improve the quality of gene clustering (described later).

When applying our algorithm, the choice of the parameters L and K at the PSMF stage is important for the structure of decomposition. As described in the Methods section, K is the maximum number of ‘effective’ regulators. L is the predefined number of possible regulators that determines the number of modules in our algorithm. L is in general much smaller than the number of variables (genes) N since gene expression is thought to be regulated by a small set of genes (compared with the total number of genes). These genes act in combination as key regulators to maintain the steady-state abundance of specific mRNAs. For the choice K = 1, each row in the data matrix is associated with only a single factor, and the sparse matrix factorization is a clustering of the data rows. For the choice K = L, the factorization is simply a low-rank approximation. Since we assume that the expression of each gene is determined by only a small subset of the possible regulators we heuristically set K = 3 in our study. It is worthy to note that although our algorithm is unsupervised, prior biological information such as binding transcriptional factor data could be incorporated into the initialization of Y (also see Methods section for details).

Performance of the algorithm
To evaluate the performance, we compared our algorithm with average linkage hierarchical clustering (HC), K-means clustering, SOM, ICA (Lee and Batzoglou, 2003) and PSMF (Dueck et al., 2005) methods. We assert that the best clustering method is that with the strongest tendency to bring genes of similar function together when applied to diverse expression datasets, as we claim in our algorithm. For each method, we examined the relationship between identified clusters and the GO attributes of the genes in the clusters, using the z-score. The z-score is based on mutual information between cluster membership and GO attributes, derived from comparing real clustering with random clustering of genes. A higher z-score indicates that a clustering result is further from random so that more biologically meaningful.

Figure 2 shows the z-scores from clustering by our algorithm and the five other clustering methods on a yeast microarray dataset. The yeast data contains 156 samples from 20 different conditions. The clustering by each method was made under different cluster numbers, ranging from 5 to 60, and the calculated z-scores were plotted as a function of cluster numbers (l) (Fig. 2). In our algorithm, we set the number of independent latent components equal to the number of experimental conditions for simplicity. For getting more accurate non-linear mapping, we set the number of hidden neurons in the MLP network as twice as the number of independent latent components. We also set K = 3 in both our algorithm and PSMF. We selected clusters for ICA as: clusters containing 10% of all genes with either high or low expression levels within the independent components (Lee and Batzoglou, 2003). As illustrated, our algorithm consistently performed well with high z-scores at different choices of l, showing the best enrichment of functional attributes in the gene clusters, in comparison with HC, SOM, PSMF and ICA. The performance of K-means clustering, which was based on the Pearson correlation as the distance metric, was comparable with our algorithm. However, our algorithm partition genes into non-mutually exclusive modules, reflecting the fact that genes may have multiple functions and are active in multiple biological processes, while K-means clustering of genes is mutually exclusive. Strikingly, HC and ICA performed significantly poorer than our algorithm, SOM, K-means and PSMF. We also observed that when applied to gene expression data directly, PSMF often failed on the matrix decomposition. This is because of matrix singularity caused by strong correlation among experimental conditions in gene expression data. Our algorithm, by incorporating NICA into PSMF decomposition and projecting the gene expression of different experimental conditions into corresponding meta-expression in independent latent components, overcomes the singularity problem.


Figure 2
View larger version (19K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2 Comparison of clustering results produced by our algorithm, hierarchical clustering, K-means, self-organizing maps, ICA and PSMF. The horizontal axis shows the number of clusters desired, and the vertical axis shows z-scores. The data set examined is from the yeast (Gasch et al., 2000).

 
For further evaluation of our algorithm, we examined the enriched GO categories in transcriptional modules uncovered from the same yeast microarray dataset. We subsequently identified 30 transcriptional modules by setting L = 30 and K = 3 in our algorithm. Among them, eight modules were significantly enriched by GO terms (corrected P-values ≤ 0.001). Each gene module appeared to be dominated by only a few functional categories (Table 1). The proposed approach is effective in clustering together genes with similar expression profiles and similar function categories. Module C16, for example, is strongly associated with the ‘ribosomal subunit assembly’ process (P-value < 1.39E-16). The heatmap indicated similar expression patterns of genes under different experimental conditions (Fig. 3). More significantly, our algorithm can cluster genes which show different expression profiles but similar functions. Module C11, for example, is significantly enriched by ‘proteolysis’ (P-value < 2.45E-33). The heatmap revealed at least three distinctive expression patterns in this module (denoted as A, B and C, Fig. 4). This clearly places our algorithm above similarity-based methods (e.g. HC and SOM) that are under the assumption that genes with similar expression profiles should have similar functions. We also compared statistical significance of common enriched functional categories in transcriptional modules uncovered by our algorithm, HC, SOM, K-means, ICA and PSMF (Table 2). Among those common functional categories detected significantly by all methods, there are five out of eight functional categories that our method produced significantly lower corrected p-values than other methods did.


Figure 3
View larger version (150K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3 A heatmap of expression values of genes in Module C16 analyzed by the proposed algorithm from the yeast dataset. The heatmap shows similar expression patterns of genes under different experimental conditions.

 


Figure 4
View larger version (144K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4 A heatmap of expression values of genes in Module C11 analyzed by the proposed algorithm from the yeast dataset. The heatmap shows different expression patterns of genes.

 


View this table:
[in this window]
[in a new window]

 
Table 1 Enrichment of GO categories in transcriptional modules uncovered from the yeast microarray dataset

 


View this table:
[in this window]
[in a new window]

 
Table 2 Comparison of statistical significance of enriched functional categories in transcriptional modules uncovered by our algorithm, HC, SOM, K-means, ICA and PSMF from the yeast microarray dataset

 

    4 SUMMARY
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 SUMMARY
 REFERENCES
 
In this study, we present a new method for transcriptional module discovery from microarray data. The new approach is based on two-stage decomposition of microarray data, firstly by NICA and then by PSMF. Our method offers several advantages: (1) it takes into account the non-linear structure existed in the data; (2) the approach can identify genes of the similar functions yet without similar expression profiles; (3) the method can assign one gene into different modules; and (4) the method avoids the singularity problem in matrix decomposition. In comparison with other methods, our method leads to a significant improvement in uncovering biologically relevant transcriptional modules.


    Acknowledgments
 
This study is supported by the Intramural Research Program, National Institute on Aging, NIH.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Golan Yona

Received on September 18, 2006; revised on November 14, 2006; accepted on December 14, 2006

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 SUMMARY
 REFERENCES
 

    Alter, O., et al. (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl Acad. Sci. USA, 97, 10101–10106[Abstract/Free Full Text].

    Atkinson, M.R., et al. (2003) Development of genetic circuitry exhibiting toggle switch or oscillatory behavior in Escherichia coli. Cell, 113, 597–607[CrossRef][Web of Science][Medline].

    Brunet, J.P., et al. (2004) Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl Acad. Sci. USA, 101, 4164–4169[Abstract/Free Full Text].

    Carmona-Saez, P., et al. (2006) Biclustering of gene expression data by non-smooth non-negative matrix factorization. BMC Bioinformatics, 7, 78[CrossRef][Medline].

    Chiappetta, P., et al. (2004) Blind source separation and the analysis of microarray data. J. Comput. Biol, . 11, 1090–1109[CrossRef][Web of Science][Medline].

    Dueck, D., et al. (2005) Multi-way clustering of microarray data using probabilistic sparse matrix factorization. Bioinformatics, 21, Suppl 1, i144–i151[Abstract].

    Eisen, M.B., et al. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA, 95, 14863–14868[Abstract/Free Full Text].

    Frigyesi, A., et al. (2006) Independent component analysis reveals new and biologically significant structures in microarray data. BMC Bioinformatics, 7, 290[CrossRef][Medline].

    Gao, Y. and Church, G. (2005) Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics, 21, 3970–3975[Abstract/Free Full Text].

    Gasch, A.P., et al. (2000) Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell, 11, 4241–4257[Abstract/Free Full Text].

    Gibbons, F.D. and Roth, F.P. (2002) Judging the quality of gene expression-based clustering methods using gene annotation. Genome Res, . 12, 1574–1581[Abstract/Free Full Text].

    Gollub, J., Ball, C.A., Sherlock, G. (2006) The Stanford Microarray Database: a user's guide. Methods Mol. Biol, . 338, 191–208[Medline].

    Haykin, S. Neural Networks: A Comprehensive Foundation, (1999) , Prentice Hall McMaster University.

    Holter, N.S., et al. (2001) Dynamic modeling of gene expression data. Proc. Natl Acad. Sci. USA, 98, 1693–1698[Abstract/Free Full Text].

    Hughes, J.D., et al. (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J. Mol. Biol, . 296, 1205–1214[CrossRef][Web of Science][Medline].

    Hyvarinen, A. and Oja, E. (2000) Independent component analysis: algorithms and applications. Neural Networks, 13, 411–430[CrossRef][Web of Science][Medline].

    Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., Saul, L.K. (1999) An introduction to variational methods for graphical models. Learn. Graph. Models, In Jordan, M.I. (Ed.). , Cambridge MIT Press.

    Jutten, C. and Karhunen, J. (2004) Advances in blind source separation (BSS) and independent component analysis (ICA) for nonlinear mixtures. Int. J. Neural. Syst, . 14, 267–292[CrossRef][Medline].

    Kim, P.M. and Tidor, B. (2003) Subsystem identification through dimensionality reduction of large-scale gene expression data. Genome Res, . 13, 1706–1718[Abstract/Free Full Text].

    Lappalainen, H. and Honkela, A. (2000) Bayesian nonlinear independent component analysis by multi-layer perceptrons. In Girolami, M. (Ed.). Advances in Independent Component Analysis, Springer-Verlag, pp. 93–121.

    Lee, D.D. and Seung, H.S. (1999) Learning the parts of objects by non-negative matrix factorization. Nature, 401, 788–791[CrossRef][Medline].

    Lee, S.I. and Batzoglou, S. (2003) Application of independent component analysis to microarrays. Genome Biol, . 4, R76[CrossRef][Medline].

    Liao, J.C., et al. (2003) Network component analysis: reconstruction of regulatory signals in biological systems. Proc. Natl Acad. Sci. USA, 100, 15522–15527[Abstract/Free Full Text].

    Liebermeister, W. (2002) Linear modes of gene expression determined by independent component analysis. Bioinformatics, 18, 51–60[Abstract/Free Full Text].

    Muller, K., et al. (2001) An introduction to kernel-based learning algorithms. IEEE Trans. Neural Networks, 12, 181–201.

    SAS. Base SAS 9.1 Procedures Guide, (2004) SAS Publishing.

    Segal, E., et al. (2004) A module map showing conditional activity of expression modules in cancer. Nat. Genet, . 36, 1090–1098[Web of Science][Medline].

    Tamayo, P., et al. (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl Acad. Sci. USA, 96, 2907–2912[Abstract/Free Full Text].

    Tan, Y., et al. (2001) Nonlinear blind source separation usign a radial basis function network. IEEE Trans Neural Networks, 12, 124–134[CrossRef].

    Tavazoie, S., et al. (1999) Systematic determination of genetic network architecture. Nat. Genet, . 22, 281–285[CrossRef][Web of Science][Medline].

    Troyanskaya, O., et al. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics, 17, 520–525[Abstract/Free Full Text].

    Wang, G, et al. (2006) LS-NMF: a modified non-negative matrix factorization algorithm utilizing uncertainty estimates. BMC Bioinformatics, 7, 175[CrossRef][Medline].

    Yuh, C.H., et al. (1998) Genomic cis-regulatory logic: experimental and computational analysis of a sea urchin gene. Science, 279, 1896–1902[Abstract/Free Full Text].

    Zhou, X.J., et al. (2005) Functional annotation and network reconstruction through cross-platform integration of microarray data. Nat. Biotechnol, . 23, 238–243[CrossRef][Web of Science][Medline].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
H. Li and M. Zhan
Unraveling transcriptional regulatory programs by integrative analysis of microarray and transcription factor binding data
Bioinformatics, September 1, 2008; 24(17): 1874 - 1880.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
C. Chang, Z. Ding, Y. S. Hung, and P. C. W. Fung
Fast network component analysis (FastNCA) for gene regulatory network reconstruction from microarray data
Bioinformatics, June 1, 2008; 24(11): 1349 - 1358.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
23/4/473    most recent
btl640v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (3)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Li, H.
Right arrow Articles by Zhan, M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Li, H.
Right arrow Articles by Zhan, M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?