Bioinformatics Advance Access originally published online on August 9, 2005
Bioinformatics 2005 21(19):3748-3754; doi:10.1093/bioinformatics/bti617
Accounting for probe-level noise in principal component analysis of microarray data
1Department of Computer Science, Regent Court 211 Portobello Road, Sheffield S1 4DP, UK
2School of Computer Science, University of Manchester Oxford Road, Manchester M13 9PL, UK
3Department of Biomedical Science, Addison Building Western Bank, Sheffield S10 2TN, UK
*To whom correspondence should be addressed.
| Abstract |
|---|
Motivation: Principal Component Analysis (PCA) is one of the most popular dimensionality reduction techniques for the analysis of high-dimensional datasets. However, in its standard form, it does not take into account any error measures associated with the data points beyond a standard spherical noise. This indiscriminate nature provides one of its main weaknesses when applied to biological data with inherently large variability, such as expression levels measured with microarrays. Methods now exist for extracting credibility intervals from the probe-level analysis of cDNA and oligonucleotide microarray experiments. These credibility intervals are gene and experiment specific, and can be propagated through an appropriate probabilistic downstream analysis.
Results: We propose a new model-based approach to PCA that takes into account the variances associated with each gene in each experiment. We develop an efficient EM-algorithm to estimate the parameters of our new model. The model provides significantly better results than standard PCA, while remaining computationally reasonable. We show how the model can be used to denoise a microarray dataset leading to improved expression profiles and tighter clustering across profiles. The probabilistic nature of the model means that the correct number of principal components is automatically obtained.
Availability: The software used in the paper is available from http://www.bioinf.man.ac.uk/resources/puma. The microarray data are deposited in the NCBI database.
Contact: neil{at}dcs.shef.ac.uk
| 1 INTRODUCTION |
|---|
The advent of large-scale gene-expression level analysis through cDNA and oligonucleotide-based microarrays has led to an increasing interest in methods of downstream analysis which can succinctly summarize the information content of the data. Different approaches to analysis include hierarchical clustering, the self organizing map (SOM) and support vector machines [for references and a review see Baldi and Brunak (2001, Ch. 12)]. In practice, expression levels obtained through microarray analysis can be very noisy, particularly for genes which have low expression. A common weakness of the analyses mentioned above is that they cannot take advantage of any credibility intervals provided by the low-level analysis that extracts the expression data. Such credibility intervals (or error bars) are becoming more commonly available for both cDNA arrays (Lawrence et al., 2003) and oligonucleotide arrays (Milo et al., 2003; Hein et al., 2005; Liu et al., 2005). Such low-level analyses are often based on probabilistic models, and by continuing our downstream analysis in a probabilistic manner we can propagate the uncertainty through the analysis. In this paper we show how this may be achieved using principal component analysis (PCA).
PCA is one of the most popular techniques for extracting information from high-dimensional datasets. It seeks to explain high-dimensional data by using low-dimensional latent variables. This approach (see footnote 1) was introduced for the analysis of microarray data by Alter et al. (2000) and has subsequently been used in a number of papers [see references in Girolami and Breitling (2004)]. The motivating idea is that variability between gene-expression levels should be explained by the (few) physiological processes taking place (e.g. response to drug treatment). The principal components (sometimes called eigengenes) would then represent a physiological process and the components of the eigengene would represent the relative weight of each gene in the process.
PCA implicitly assumes that the uncertainty associated with each gene under each condition is constant. This is often an unreasonable assumption, particularly when we are considering the logarithm of the expression levels. One popular method for dealing with this problem is to non-linearly transform the gene-expression levels so that variance across experiments is comparable for each gene (Huber et al., 2002). A drawback with this approach is that a global transformation does not adequately account for the fact that the same gene may be measured with different precision in different experiments. For example, a gene which has a lower expression level in one condition will typically be measured with relatively less precision in that condition. Another drawback with this approach is that a complex non-linear transformation of the data complicates the interpretation of measurements when compared with a global log transformation which measures fold-changes on a comparable scale for all genes. Instead of transforming the data, we prefer to propagate variances through the downstream analysis using probabilistic models.
In this paper we propose a modified approach to PCA which can take account of credibility intervals. This leads to a far more robust downstream analysis. Our approach eliminates the need to heuristically reject those genes which are perceived as unreliable before the downstream analysis is applied and allows us to automatically select the latent subspace dimension in a principled way without recourse to complex Bayesian methods. Our model builds on a latent variable model known as probabilistic PCA (Tipping and Bishop, 1999). This model has previously been applied to microarray data analysis and has been extended in order to deal with Bayesian learning (Oba et al., 2003b), missing value estimation (Oba et al., 2003a) and non-Gaussian latent variables (Girolami and Breitling, 2004). However, in all of these previous applications the noise has been considered spherical. Here we provide the first extension to a model with a non-spherical and non-i.i.d. noise distribution. Our approach is quite general and could also be applied to other probabilistic models.
In the next section we review probabilistic PCA and describe how the structure of the model may be modified to account for dimension and measurement-specific noise in the data. We then discuss how the parameters of our model may be optimized in a practical way using an expectation-maximization algorithm. In Section 3 we investigate a dataset derived from oligonucleotide arrays with our method. We show how our approach not only gives a robust version of PCA but also allows us to obtain a denoised version of the dataset and automatically determine the number of principal components to be retained.
| 2 METHODS |
|---|
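We will start by briefly reviewing the latent variable model known as probabilistic PCA (Tipping and Bishop, 1999). In probabilistic PCA it is assumed that each d-dimensional data point $\mathbf{y}_n$ can be reconstructed from a q-dimensional latent point $\mathbf{x}_n$ via a linear transformation $\mathbf{W}$ and a corrupting noise vector $\boldsymbol{\epsilon}_n$,

$$\mathbf{y}_n = \mathbf{W}\mathbf{x}_n + \boldsymbol{\mu} + \boldsymbol{\epsilon}_n,$$

where $\boldsymbol{\mu}$ is a mean vector and the noise is spherical Gaussian, $\boldsymbol{\epsilon}_n \sim N(\mathbf{0}, \sigma^2\mathbf{I})$. If the prior distribution over the latent points is also Gaussian, $\mathbf{x}_n \sim N(\mathbf{0}, \mathbf{I})$, then the marginal distribution over $\mathbf{y}_n$ is found to be

$$\mathbf{y}_n \sim N\left(\boldsymbol{\mu},\; \mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2\mathbf{I}\right). \qquad (1)$$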
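The marginal covariance of probabilistic PCA can be checked numerically. The following is a minimal sketch, not the authors' software: the dimensions, the seed and the helper names `ppca_sample` and `ppca_marginal_cov` are illustrative assumptions. It draws samples from the generative model and compares their empirical covariance with $\mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2\mathbf{I}$.

```python
import numpy as np


def ppca_marginal_cov(W, sigma2):
    """Model covariance of y_n under probabilistic PCA: W W^T + sigma^2 I."""
    return W @ W.T + sigma2 * np.eye(W.shape[0])


def ppca_sample(W, mu, sigma2, n_samples, rng):
    """Draw y_n = W x_n + mu + eps_n with x_n ~ N(0, I), eps_n ~ N(0, sigma^2 I)."""
    d, q = W.shape
    X = rng.standard_normal((n_samples, q))
    eps = np.sqrt(sigma2) * rng.standard_normal((n_samples, d))
    return X @ W.T + mu + eps


rng = np.random.default_rng(0)
W = rng.standard_normal((5, 2))      # hypothetical sizes: d = 5, q = 2
mu = np.arange(5.0)                  # hypothetical mean vector
sigma2 = 0.1                         # hypothetical residual variance

Y = ppca_sample(W, mu, sigma2, 200_000, rng)
empirical_cov = np.cov(Y, rowvar=False)
model_cov = ppca_marginal_cov(W, sigma2)
max_abs_diff = np.max(np.abs(empirical_cov - model_cov))
```

With this many samples `max_abs_diff` should be small compared with the entries of `model_cov`; the remaining gap is Monte Carlo error.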
2.1 Relationship to factor analysis
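Tipping and Bishop (1999) showed that the maximum-likelihood solution for $\mathbf{W}$ recovers the principal subspace of the data; this model is thereby recognized as a probabilistic interpretation of PCA. The model is strongly related to factor analysis (Bartholomew, 1987). In factor analysis, however, the distribution governing the noise $\boldsymbol{\epsilon}_n$ is a diagonal covariance Gaussian distribution,

$$\boldsymbol{\epsilon}_n \sim N\left(\mathbf{0}, \mathbf{B}^{-1}\right),$$

where $\mathbf{B} = \mathrm{diag}(\beta_1, \ldots, \beta_d)$. We refer to the inverse variance, $\beta_i$, as a precision. This leads to a marginal distribution governing $\mathbf{y}_n$ of the form

$$\mathbf{y}_n \sim N\left(\boldsymbol{\mu},\; \mathbf{W}\mathbf{W}^{\mathrm{T}} + \mathbf{B}^{-1}\right).$$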
2.2 Propagating measurement uncertainty
Factor analysis allows different variances (or precisions) for each data dimension but here we want to allow the precision to vary for each data point and dimension. In other words, we wish to allow the variance to be different for every gene in each experiment, i.e. Nd variances where there are d experiments and N different genes. Such a model is far richer than factor analysis but it involves estimation of Nd precision parameters from a data set of typically only Nd separate values. Fortunately, recent advances in probe level analysis techniques have meant that these precisions can be derived directly from the microarray chip (Lawrence et al., 2003; Milo et al., 2003; Hein et al., 2005). For example, this can be achieved, for Affymetrix arrays, by making use of the information about measurement uncertainty provided by multiple probes in each gene's probe-set. The probabilistic models that have been developed associate each gene with a level of uncertainty.
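We will consider the following modified model. Take $\mathbf{y}_n$ to be a d-dimensional vector which represents the true log expression level associated with the n-th gene under d different conditions. Rather than observing $\mathbf{y}_n$ directly we assume that we observe a corrupted form $\hat{\mathbf{y}}_n$, where

$$\hat{\mathbf{y}}_n = \mathbf{y}_n + \boldsymbol{\eta}_n, \qquad (2)$$

and $\boldsymbol{\eta}_n$ is noise which is distributed as

$$\boldsymbol{\eta}_n \sim N\left(\mathbf{0}, \mathbf{B}_n^{-1}\right),$$

with $\mathbf{B}_n = \mathrm{diag}(\beta_{n1}, \ldots, \beta_{nd})$ the diagonal matrix of measurement precisions for the n-th gene.

So far our approach is rather general and we could take the distribution of $\mathbf{y}_n$ to come from any probabilistic model of interest. We will assume a probabilistic PCA model as the marginal distribution for the true expression level $\mathbf{y}_n$, as given in Equation (1), and obtain

$$\hat{\mathbf{y}}_n \mid \mathbf{x}_n \sim N\left(\mathbf{W}\mathbf{x}_n + \boldsymbol{\mu},\; \sigma^2\mathbf{I} + \mathbf{B}_n^{-1}\right), \qquad (3)$$

and, using the prior $\mathbf{x}_n \sim N(\mathbf{0}, \mathbf{I})$ (see footnote 2), we have the following marginalized likelihood,

$$\hat{\mathbf{y}}_n \sim N\left(\boldsymbol{\mu}, \boldsymbol{\Sigma}_n\right),$$

where $\boldsymbol{\Sigma}_n = \mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2\mathbf{I} + \mathbf{B}_n^{-1}$. Traditionally in PCA the data are centred and $\boldsymbol{\mu}$ is taken to be zero. This is reasonable as in probabilistic PCA the maximum-likelihood solution for $\boldsymbol{\mu}$ is the empirical mean of the data. However, in the modified model we present, as we shall see, the maximum-likelihood solution for $\boldsymbol{\mu}$ is no longer the empirical data mean, so it does not make sense to work with a centred dataset. The parameter $\boldsymbol{\mu}$ must be learnt jointly with the rest of the model.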
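The log likelihood for the suggested model takes the form

$$L = -\frac{1}{2}\sum_{n=1}^{N}\left[\ln\left|\boldsymbol{\Sigma}_n\right| + \left(\hat{\mathbf{y}}_n - \boldsymbol{\mu}\right)^{\mathrm{T}}\boldsymbol{\Sigma}_n^{-1}\left(\hat{\mathbf{y}}_n - \boldsymbol{\mu}\right)\right] + \mathrm{const}, \qquad (4)$$

where $\boldsymbol{\Sigma}_n = \mathbf{W}\mathbf{W}^{\mathrm{T}} + \mathbf{A}_n$ and $\mathbf{A}_n = \sigma^2\mathbf{I} + \mathbf{B}_n^{-1}$ collects the residual variance and the measurement variances of the n-th gene (the diagonal matrix $\mathbf{B}_n^{-1}$). There are several important differences from standard probabilistic PCA: first the matrix $\mathbf{A}_n$ is not proportional to the identity, hence we cannot obtain a closed analytical solution for the maximum-likelihood value of the parameters. Second, differentiating Equation (4) with respect to the mean vector $\boldsymbol{\mu}$ (while keeping the other parameters fixed) yields

$$\frac{\partial L}{\partial \boldsymbol{\mu}} = \sum_{n=1}^{N}\boldsymbol{\Sigma}_n^{-1}\left(\hat{\mathbf{y}}_n - \boldsymbol{\mu}\right),$$

which vanishes when

$$\boldsymbol{\mu} = \left(\sum_{n=1}^{N}\boldsymbol{\Sigma}_n^{-1}\right)^{-1}\sum_{n=1}^{N}\boldsymbol{\Sigma}_n^{-1}\hat{\mathbf{y}}_n, \qquad (5)$$

so the maximum-likelihood mean is a precision-weighted average of the data rather than the empirical mean.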
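The difference between the weighted mean and the empirical mean can be illustrated numerically. The sketch below is written under stated assumptions rather than taken from the paper's software: the function names, the synthetic data and the gamma-distributed precisions are all made up for illustration. It evaluates the marginal log likelihood with one precision per gene and experiment, and solves the stationarity condition for $\boldsymbol{\mu}$ with the other parameters held fixed.

```python
import numpy as np


def log_likelihood(Y_hat, prec, W, mu, sigma2):
    """Marginal log likelihood with one precision per gene and experiment.

    Y_hat : (N, d) corrupted log expression levels
    prec  : (N, d) measurement precisions (inverse variances)
    """
    N, d = Y_hat.shape
    S = W @ W.T + sigma2 * np.eye(d)
    total = 0.0
    for n in range(N):
        Sigma_n = S + np.diag(1.0 / prec[n])
        r = Y_hat[n] - mu
        _, logdet = np.linalg.slogdet(Sigma_n)
        total -= 0.5 * (d * np.log(2.0 * np.pi) + logdet
                        + r @ np.linalg.solve(Sigma_n, r))
    return total


def mean_fixed_point(Y_hat, prec, W, sigma2):
    """Solve sum_n Sigma_n^{-1} (y_hat_n - mu) = 0 for mu."""
    N, d = Y_hat.shape
    S = W @ W.T + sigma2 * np.eye(d)
    lhs = np.zeros((d, d))
    rhs = np.zeros(d)
    for n in range(N):
        Sigma_inv = np.linalg.inv(S + np.diag(1.0 / prec[n]))
        lhs += Sigma_inv
        rhs += Sigma_inv @ Y_hat[n]
    return np.linalg.solve(lhs, rhs)


rng = np.random.default_rng(1)
N, d, q = 200, 4, 2                                    # hypothetical sizes
W = rng.standard_normal((d, q))
sigma2 = 0.05
prec = rng.gamma(shape=2.0, scale=2.0, size=(N, d))    # made-up precisions
Y_hat = (rng.standard_normal((N, q)) @ W.T + 3.0
         + rng.standard_normal((N, d)) * np.sqrt(sigma2 + 1.0 / prec))

mu_weighted = mean_fixed_point(Y_hat, prec, W, sigma2)
mu_empirical = Y_hat.mean(axis=0)
ll_weighted = log_likelihood(Y_hat, prec, W, mu_weighted, sigma2)
ll_empirical = log_likelihood(Y_hat, prec, W, mu_empirical, sigma2)
```

For fixed $\mathbf{W}$ and $\sigma^2$ the log likelihood is a concave quadratic in $\boldsymbol{\mu}$, so `ll_weighted` is never below `ll_empirical`; the two means coincide only when every gene has the same precisions.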
2.3 Efficient likelihood optimization
Given the gradients of the likelihood we can optimize the parameters through a non-linear optimization, such as scaled conjugate gradients. Unfortunately, for our model such an optimization will have high computational demands because the likelihood and its gradient contain several multiplications of large matrices. In practice, for a typical sized microarray dataset, such an approach becomes impractical. A more efficient algorithm can be obtained through an expectation-maximization (EM) approach (Dempster et al., 1977).
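Generally EM algorithms lead to a simplified optimization problem (the M-step) by incorporating an additional step (the E-step). For our corrupted data PCA model this additional step is the computation of the posterior distribution for the latent space. This posterior is obtained through Bayes' theorem

$$p\left(\mathbf{x}_n \mid \hat{\mathbf{y}}_n\right) = \frac{p\left(\hat{\mathbf{y}}_n \mid \mathbf{x}_n\right)p\left(\mathbf{x}_n\right)}{p\left(\hat{\mathbf{y}}_n\right)}, \qquad (6)$$

which is Gaussian,

$$\mathbf{x}_n \mid \hat{\mathbf{y}}_n \sim N\left(\langle\mathbf{x}_n\rangle,\; \mathbf{M}_n^{-1}\right), \qquad \mathbf{M}_n = \mathbf{I} + \mathbf{W}^{\mathrm{T}}\mathbf{A}_n^{-1}\mathbf{W},$$

where $\mathbf{A}_n$ denotes the diagonal covariance of $\hat{\mathbf{y}}_n$ given $\mathbf{x}_n$ (the residual variance $\sigma^2$ plus the measurement variances of the n-th gene).

The portion of the lower bound that directly depends on the model parameters is given by

$$\mathcal{L} = -\frac{1}{2}\sum_{n=1}^{N}\left[\ln\left|\mathbf{A}_n\right| + \left\langle\left(\hat{\mathbf{y}}_n - \mathbf{W}\mathbf{x}_n - \boldsymbol{\mu}\right)^{\mathrm{T}}\mathbf{A}_n^{-1}\left(\hat{\mathbf{y}}_n - \mathbf{W}\mathbf{x}_n - \boldsymbol{\mu}\right)\right\rangle\right], \qquad (7)$$

where $\langle\cdot\rangle$ denotes an expectation under the posterior distribution over $\mathbf{x}_n$. The required expectations may be evaluated as

$$\langle\mathbf{x}_n\rangle = \mathbf{M}_n^{-1}\mathbf{W}^{\mathrm{T}}\mathbf{A}_n^{-1}\left(\hat{\mathbf{y}}_n - \boldsymbol{\mu}\right), \qquad \langle\mathbf{x}_n\mathbf{x}_n^{\mathrm{T}}\rangle = \mathbf{M}_n^{-1} + \langle\mathbf{x}_n\rangle\langle\mathbf{x}_n\rangle^{\mathrm{T}}.$$

The gradients of the bound with respect to $\mathbf{W}$, $\boldsymbol{\mu}$ and $\sigma^2$ are

$$\frac{\partial\mathcal{L}}{\partial\mathbf{W}} = \sum_{n=1}^{N}\mathbf{A}_n^{-1}\left[\left(\hat{\mathbf{y}}_n - \boldsymbol{\mu}\right)\langle\mathbf{x}_n\rangle^{\mathrm{T}} - \mathbf{W}\langle\mathbf{x}_n\mathbf{x}_n^{\mathrm{T}}\rangle\right],$$

$$\frac{\partial\mathcal{L}}{\partial\boldsymbol{\mu}} = \sum_{n=1}^{N}\mathbf{A}_n^{-1}\left(\hat{\mathbf{y}}_n - \mathbf{W}\langle\mathbf{x}_n\rangle - \boldsymbol{\mu}\right),$$

$$\frac{\partial\mathcal{L}}{\partial\sigma^2} = -\frac{1}{2}\sum_{n=1}^{N}\left[\mathrm{tr}\left(\mathbf{A}_n^{-1}\right) - \left\langle\left(\hat{\mathbf{y}}_n - \mathbf{W}\mathbf{x}_n - \boldsymbol{\mu}\right)^{\mathrm{T}}\mathbf{A}_n^{-2}\left(\hat{\mathbf{y}}_n - \mathbf{W}\mathbf{x}_n - \boldsymbol{\mu}\right)\right\rangle\right]. \qquad (8)$$

These gradients lead to fixed point equations for $\mathbf{W}$ and $\boldsymbol{\mu}$,

$$\mathbf{w}_i = \left[\sum_{n=1}^{N}a_{ni}^{-1}\langle\mathbf{x}_n\mathbf{x}_n^{\mathrm{T}}\rangle\right]^{-1}\sum_{n=1}^{N}a_{ni}^{-1}\left(\hat{y}_{ni} - \mu_i\right)\langle\mathbf{x}_n\rangle, \qquad (9)$$

$$\boldsymbol{\mu} = \left(\sum_{n=1}^{N}\mathbf{A}_n^{-1}\right)^{-1}\sum_{n=1}^{N}\mathbf{A}_n^{-1}\left(\hat{\mathbf{y}}_n - \mathbf{W}\langle\mathbf{x}_n\rangle\right),$$

where $\mathbf{w}_i^{\mathrm{T}}$ is the i-th row of $\mathbf{W}$ and $a_{ni}$ is the i-th diagonal element of $\mathbf{A}_n$.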
These update equations can be iterated to find the maximum-likelihood solution with respect to $\mathbf{W}$ and $\boldsymbol{\mu}$. Unfortunately a fixed point equation for $\sigma^2$ (which accounts for any residual variance) is not so straightforward as the gradient with respect to $\sigma^2$ is not linear. An efficient update for $\sigma^2$ can be obtained by using Newton's method.
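The E-step and the fixed point updates for $\mathbf{W}$ and $\boldsymbol{\mu}$ can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: $\sigma^2$ is held fixed (the Newton update is omitted), the noise covariance of gene n given its latent point is taken to be diagonal with entries $\sigma^2 + 1/\beta_{ni}$, and all function names and the synthetic data are hypothetical. Because that covariance is diagonal, the update for $\mathbf{W}$ decouples over its rows.

```python
import numpy as np


def e_step(Y_hat, prec, W, mu, sigma2):
    """Posterior moments of x_n; a_inv holds the inverse noise variances."""
    N, d = Y_hat.shape
    q = W.shape[1]
    a_inv = 1.0 / (sigma2 + 1.0 / prec)              # (N, d)
    Ex = np.empty((N, q))
    Exx = np.empty((N, q, q))
    for n in range(N):
        WtA = W.T * a_inv[n]                         # W^T A_n^{-1}, shape (q, d)
        M_inv = np.linalg.inv(np.eye(q) + WtA @ W)   # posterior covariance
        Ex[n] = M_inv @ (WtA @ (Y_hat[n] - mu))
        Exx[n] = M_inv + np.outer(Ex[n], Ex[n])
    return Ex, Exx, a_inv


def m_step(Y_hat, Ex, Exx, a_inv, mu):
    """Fixed point updates: W row by row (old mu), then mu (new W)."""
    N, d = Y_hat.shape
    W_new = np.empty((d, Ex.shape[1]))
    for i in range(d):
        lhs = np.einsum("n,nqr->qr", a_inv[:, i], Exx)
        rhs = (a_inv[:, i] * (Y_hat[:, i] - mu[i])) @ Ex
        W_new[i] = np.linalg.solve(lhs, rhs)
    resid = Y_hat - Ex @ W_new.T
    mu_new = np.sum(a_inv * resid, axis=0) / np.sum(a_inv, axis=0)
    return W_new, mu_new


def log_likelihood(Y_hat, prec, W, mu, sigma2):
    """Marginal log likelihood, used here only to monitor progress."""
    N, d = Y_hat.shape
    S = W @ W.T + sigma2 * np.eye(d)
    total = 0.0
    for n in range(N):
        Sigma_n = S + np.diag(1.0 / prec[n])
        r = Y_hat[n] - mu
        total -= 0.5 * (d * np.log(2.0 * np.pi) + np.linalg.slogdet(Sigma_n)[1]
                        + r @ np.linalg.solve(Sigma_n, r))
    return total


rng = np.random.default_rng(2)
N, d, q = 150, 5, 2                                   # hypothetical sizes
W_true = rng.standard_normal((d, q))
prec = rng.gamma(shape=2.0, scale=2.0, size=(N, d))   # made-up precisions
sigma2 = 0.05
Y_hat = (rng.standard_normal((N, q)) @ W_true.T + 1.0
         + rng.standard_normal((N, d)) * np.sqrt(sigma2 + 1.0 / prec))

W = 0.1 * rng.standard_normal((d, q))
mu = np.zeros(d)
history = [log_likelihood(Y_hat, prec, W, mu, sigma2)]
for _ in range(15):
    Ex, Exx, a_inv = e_step(Y_hat, prec, W, mu, sigma2)
    W, mu = m_step(Y_hat, Ex, Exx, a_inv, mu)
    history.append(log_likelihood(Y_hat, prec, W, mu, sigma2))
```

Updating $\mathbf{W}$ and then $\boldsymbol{\mu}$ never lowers the bound, so the values in `history` should be non-decreasing up to rounding error, which is a convenient sanity check for an implementation.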
When maximizing likelihoods through fixed point equations convergence can be slow if there are strong correlations between the parameters. In the modified PCA model, for example, the solution for W is strongly dependent on the solution for µ. When such correlations occur it can be advantageous to introduce apparently redundant parameters to improve the speed of convergence. In the Appendix we describe a redundant parameterisation of this model that allows us to achieve a significant speed up.
2.4 Processing data
Processing of a microarray dataset is undertaken in the following manner. The expression levels are represented in a matrix $\hat{\mathbf{Y}} = [\hat{\mathbf{y}}_1, \ldots, \hat{\mathbf{y}}_N]^{\mathrm{T}}$. The elements of this matrix are the corrupted log expression levels or, in the case of cDNA arrays, log ratios of expression levels. The uncertainty associated with the data is stored in a separate matrix of the same size in the form of precisions. These precisions are the inverse variances which correspond to the log expression levels (or ratios) in $\hat{\mathbf{Y}}$. The algorithm returns the principal subspace, $\mathbf{W}$, the residual variance, $\sigma^2$, and the inferred mean $\boldsymbol{\mu}$ as well as a set of moments under the posterior: $\langle\mathbf{x}_n\rangle$ and $\langle\mathbf{x}_n\mathbf{x}_n^{\mathrm{T}}\rangle$. Note that the dimensionality of the subspace can be automatically determined owing to the inclusion of the uncertainty information (see Section 2.5).
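We can now recover a posterior estimate for the uncorrupted log expression levels, $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_N]^{\mathrm{T}}$. The mean of this estimate for the n-th gene is given by

$$\langle\mathbf{y}_n\rangle = \boldsymbol{\mu} + \mathbf{S}\boldsymbol{\Sigma}_n^{-1}\left(\hat{\mathbf{y}}_n - \boldsymbol{\mu}\right),$$

and its covariance by

$$\mathbf{S} - \mathbf{S}\boldsymbol{\Sigma}_n^{-1}\mathbf{S},$$

where $\mathbf{S} = \mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2\mathbf{I}$ is the model covariance of the true profile and $\boldsymbol{\Sigma}_n$ is $\mathbf{S}$ plus the diagonal matrix of measurement variances for gene n. As we shall show in Section 3 these cleaned-up expression levels can be used in a further analysis stage, such as hierarchical clustering, where they lead to more consistent clusters and expression profiles with reduced levels of uncertainty.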
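The denoising step amounts to standard Gaussian conditioning, which the sketch below illustrates. It assumes that the true profile has model covariance $\mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2\mathbf{I}$ and that the observation adds independent noise with the stated measurement variances; the function name `denoise` and the toy numbers are hypothetical and chosen only to make the effect visible.

```python
import numpy as np


def denoise(Y_hat, prec, W, mu, sigma2):
    """Posterior mean and variance of the true profiles given noisy ones."""
    N, d = Y_hat.shape
    S = W @ W.T + sigma2 * np.eye(d)            # model covariance of y_n
    means = np.empty((N, d))
    variances = np.empty((N, d))
    for n in range(N):
        Sigma_n = S + np.diag(1.0 / prec[n])    # covariance of y_hat_n
        gain = np.linalg.solve(Sigma_n, S).T    # S Sigma_n^{-1} (both symmetric)
        means[n] = mu + gain @ (Y_hat[n] - mu)
        variances[n] = np.diag(S - gain @ S)
    return means, variances


# Toy example: one latent direction shared by three experiments.
W = np.array([[1.0], [1.0], [1.0]])
mu = np.zeros(3)
sigma2 = 0.01
Y_hat = np.array([[2.0, 2.1, 5.0],
                  [2.0, 2.1, 5.0]])
prec = np.array([[100.0, 100.0, 100.0],   # gene 0: all measurements trusted
                 [100.0, 100.0, 0.01]])   # gene 1: third measurement very uncertain
means, variances = denoise(Y_hat, prec, W, mu, sigma2)
```

In this toy setting the third value of gene 1 is pulled towards the level suggested by its two well-measured neighbours, while the same value for gene 0 stays much closer to the observation. The posterior variances never exceed the corresponding measurement variances.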
2.5 Number of principal components
The usual approach when implementing PCA for microarray data is to retain a reduced number of principal directions, q, and project the log expression levels along these directions before further processing. Typically the directions retained are those associated with the largest q eigenvalues of the data covariance. This approach has a natural interpretation: the directions associated with the smaller eigenvalues are assumed to arise from noise.
In general, the number of principal components retained is predetermined according to the specific problem under consideration. In our model, however, the probabilistic treatment of the probe-level uncertainty allows us to obtain the maximum number of principal components that can be retained. Intuitively, a direction will be discarded if the variation in the data is not statistically significant given the measurement noise; the model will then return an eigenvalue 0 associated with that direction.
| 3 RESULTS |
|---|
To demonstrate the efficacy of our approach we considered a dataset that consisted of a temporal assay of Affymetrix GeneChip arrays that measured the gene-expression profiles of a conditionally immortal cell line, UB/OC-1, from mouse cochlear epithelial cells at embryonic day 13.5 (E13.5), across 14 days of differentiation. The dataset aims to discover gene-expression patterns associated with early differentiation of mammalian auditory hair cells (Rivolta et al., 2002). The experiments consisted of 12 samples obtained at 12 different time points during 14 days of differentiation. Up to day 0 the cells were cultured under proliferating conditions at 33°C. Differentiation was induced by shifting the temperature to 39°C. The dataset is then a temporal profile of 12 data points with no replicates. Of particular interest in this study is the identification of targets regulated by the transcription factor gata-3, which is essential for normal inner ear development. Also important in inner ear development is the protein kinase inhibitor p27kip1. In vivo the expression values of gata-3 and p27kip1 are low before day 4 when they start to rise. They peak on day 8 or 9 and after a couple of days the expression level decreases again to then stabilize around a constant value.
In a probabilistic setting, where gene expression is extracted with a different level of sensitivity (Milo et al., 2003; Liu et al., 2005), it is possible to robustly analyse genes with low expression, such as transcription factors, that are crucial in development and in the regulation of gene networks. For this reason we used the gata-3 expression profile to test our model. Our results show that it is possible to improve, in a principled setting, the detection of the activity of genes that are expressed at a low level but play a crucial role in gene networks. This improvement allows them to be included in any downstream analysis for the identification of related targets.
The raw data were processed using a modified version of the gMOS algorithm (Milo et al., 2003) in which the scales of the gMOS gamma distributions were shared across probe pairs in different experiments, rather than across the probe-set. The means and variances of the log expression levels were then derived from the resulting gamma distribution.
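As an illustration of how the moments of a log expression level follow from a gamma distribution, suppose (as a simplifying assumption, since the exact gMOS parameterization is not given here) that an expression level $s$ has a single gamma distribution with shape $\alpha$ and rate $b$. The mean and variance of $\ln s$ are then available in closed form through the digamma function $\psi$ and the trigamma function $\psi'$:

$$\mathrm{E}[\ln s] = \psi(\alpha) - \ln b, \qquad \mathrm{Var}[\ln s] = \psi'(\alpha).$$

Since $\psi'(\alpha)$ decreases with $\alpha$, a more sharply peaked gamma distribution gives a smaller variance, and hence a larger precision, on the log scale.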
3.1 Profile reconstruction
Our first aim was to assess the sensitivity of the profile reconstruction (as described in Section 2.4) to changes in the associated variances. We therefore considered two sub-sampled datasets containing only 500 of the 13 178 genes. The first dataset contained gata-3 and the second p27kip1. In each case we modelled the data three times with our modified PCA. To show the effect of reduced uncertainty in the data we divided the variances by 1, 4 and 9. The corrected expression profiles are shown in Figures 1 and 2. Note that as the uncertainty in the original profile is decreased the corrected profile tends to stay closer to its original course. As can be seen from the plots, any point with large associated uncertainty (such as the day 1 point for the gata-3 profile) can be significantly changed and this can lead to a large decrease in the associated uncertainty. However, the only data point that is reconstructed at a significantly different level as a result of the reduced uncertainties is the expression level on day 1 for the gata-3 profile, showing a certain robustness to changes in the estimates of the uncertainties.
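The effect of dividing the variances by 1, 4 and 9 can be illustrated with a one-dimensional analogue of the profile reconstruction. This is a sketch only: the prior mean and variance stand in for what the fitted model predicts for a given point, and all numbers are made up rather than taken from the dataset.

```python
def shrink(y_hat, meas_var, prior_mean, prior_var):
    """Posterior mean and variance of a scalar observed with Gaussian noise."""
    gain = prior_var / (prior_var + meas_var)
    post_mean = prior_mean + gain * (y_hat - prior_mean)
    post_var = prior_var * meas_var / (prior_var + meas_var)
    return post_mean, post_var


# Hypothetical point: observed value 8.0 with variance 2.0, model prediction 5.0.
results = {k: shrink(8.0, 2.0 / k, 5.0, 1.0) for k in (1, 4, 9)}
```

As the measurement variance shrinks, the corrected value moves from the model prediction towards the observed value and the posterior variance falls, which mirrors the behaviour described for the corrected profiles.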
3.2 Clustering
Clustering is a widely used technique for summarizing expression levels obtained from gene array data. Because of the large number of genes measured and the complexity of the associated gene networks, identifying groups of genes that behave similarly in the dataset can be a useful exploratory technique for finding functional analogues.
One suggested use of PCA in microarray analysis is as a preprocessing step before cluster analysis. The use of PCA before clustering can be justified by the fact that the larger principal components are expected to capture the structure in the dataset.
In practice, when using standard PCA as a preprocessing step, it is necessary to manually determine the dimension of the latent space, q. Furthermore, the use of standard PCA does not always improve the clustering but often degrades it (Yeung et al., 2001). This is attributable to the fact that the dominant components, which contain most of the variation in the data, are highly influenced by the very noisy data points. Therefore, they do not necessarily capture the data's structure. This is often avoided by introducing arbitrary thresholds in order to reject genes with low expression; the advantage of our method is that it gives a principled, automatic way to select the relevant genes. By accounting for the variance in the log expression levels we can ensure that the components we extract accurately reflect the structure of the data. In Figures 3, 4 and 5 we perform hierarchical clustering on the first fifty genes. Figure 3 uses the reconstructed profiles obtained as described in the previous section. Figure 4 uses the non-reconstructed profiles whereas Figure 5 uses the (non-reconstructed) profiles of the genes obtained by standard PCA. In both cases these components were derived by considering the entire dataset of 13 178 probes. As a result, standard PCA was severely compromised by the genes with low expression. However, by taking the variances into account, our modified model is able to downweight these points, thereby reducing their influence. Furthermore, the model was able to automatically determine the requisite number of retained components for this dataset (q = 3).
Clustering was performed on the selected genes using the Gene Cluster software from the Eisen Lab (available from http://rana.lbl.gov/EisenSoftware.htm). As expected, there was little information in the genes selected for clustering by standard PCA (Fig. 5). However, for the genes selected by the modified model (Figs 3 and 4) there are high functional correlations within the clusters. In particular, the clustering of the genes with corrected profiles produces three very distinct functionally related groups. For example, the two larger groups are related to cell proliferation and cell-cycle regulation. The heat shock proteins (Hsp) cluster together with the chaperonin proteins (Cct2 and Cct3) and are involved in cell growth and survival. Of particular importance is the mortalin mitochondrial heat shock protein, which is highly involved in cell-cycle regulation and cellular senescence specification. The second largest group contains more cell migration and cell proliferation-related genes, such as PTX3 and NAP-1, which seem to be more related to the delamination of these sensory epithelial cells. The groups highlight the main biological processes in this development, namely cell proliferation and migration, which are then followed by a more specific differentiation stage in which the final fate of the cells is determined.
| 4 DISCUSSION |
|---|
We have shown how the noise which is inherent in microarray data may be accounted for, in a principled manner, through a probabilistic model that considers the variances associated with each gene's expression level in each condition. We presented a model that performs a modified form of PCA on the data, automatically determining the required number of components for describing the data structure. We have shown, for an example using Affymetrix GeneChips, how the model can recover an improved estimate of the gene-expression profiles through denoising.
We were also able to show that the improved profiles can lead to tighter and more coherent clusters by applying hierarchical clustering to both the original profiles and the corrected profiles. We expect similar results to be achievable for cDNA arrays where the variances can be extracted through the image processing techniques suggested in Lawrence et al. (2004).
One of the features of the proposed method is that the importance of the genes with large-associated variance is reduced in the downstream analysis. This can obviously be a problem if the estimates of the uncertainties are systematically too large (leading to too many genes being smoothed away) or too small (leading irrelevant genes to dominate the results). It is therefore important that the probe-level analysis required to extract confidence intervals is carried out as accurately as possible.
One aspect of microarray studies is often to provide a list of significant target genes in a given experimental system. In order to provide this, methods such as scoring and cut-off thresholding are normally used. One of the benefits of the proposed method is that it automatically implements a cut-off by downweighting genes with high associated variance. Current work includes how to use the variance information to produce a list of significant targets and assess differential gene expressions under different experimental conditions.
| APPENDIX: FAST IMPLEMENTATION OF THE EM ALGORITHM |
|---|
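The speed of convergence of the EM algorithm can be improved by introducing a redundant parameterization as follows. We introduce non-spherical priors in the latent space so that $\mathbf{x}_n \sim N(\mathbf{m}, \mathbf{C})$. The E-step in the EM algorithm will now be modified giving

$$\mathbf{M}_n = \mathbf{C}^{-1} + \mathbf{W}^{\mathrm{T}}\mathbf{A}_n^{-1}\mathbf{W},$$

$$\langle\mathbf{x}_n\rangle = \mathbf{M}_n^{-1}\left[\mathbf{W}^{\mathrm{T}}\mathbf{A}_n^{-1}\left(\hat{\mathbf{y}}_n - \boldsymbol{\mu}\right) + \mathbf{C}^{-1}\mathbf{m}\right], \qquad \langle\mathbf{x}_n\mathbf{x}_n^{\mathrm{T}}\rangle = \mathbf{M}_n^{-1} + \langle\mathbf{x}_n\rangle\langle\mathbf{x}_n\rangle^{\mathrm{T}},$$

where $\mathbf{A}_n$ is the diagonal noise covariance of the n-th gene (the residual variance $\sigma^2$ plus its measurement variances). The parameters of the prior are updated in the M-step as

$$\mathbf{m} = \frac{1}{N}\sum_{n=1}^{N}\langle\mathbf{x}_n\rangle, \qquad \mathbf{C} = \frac{1}{N}\sum_{n=1}^{N}\langle\mathbf{x}_n\mathbf{x}_n^{\mathrm{T}}\rangle - \mathbf{m}\mathbf{m}^{\mathrm{T}}.$$

An equivalent model with the standard prior $N(\mathbf{0}, \mathbf{I})$ is recovered through

$$\boldsymbol{\mu} \rightarrow \boldsymbol{\mu} + \mathbf{W}\mathbf{m}, \qquad \mathbf{W} \rightarrow \mathbf{U}\boldsymbol{\Lambda}^{1/2}\mathbf{R}^{\mathrm{T}}.$$

Here $\mathbf{R}$ is an arbitrary rotation matrix, $\mathbf{U}$ are the eigenvectors of the matrix $\mathbf{W}\mathbf{C}\mathbf{W}^{\mathrm{T}}$ and $\boldsymbol{\Lambda}$ is a diagonal matrix whose diagonal elements are the corresponding eigenvalues.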
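The mapping from the redundant parameterization back to a model with a standard latent prior can be sketched numerically. The code below is illustrative only: the function name `to_standard_form` and the random test values are assumptions, the rotation $\mathbf{R}$ is taken to be the identity, and the shift of the mean by $\mathbf{W}\mathbf{m}$ is the step that keeps the marginal mean of the data unchanged.

```python
import numpy as np


def to_standard_form(W, C, m, mu):
    """Map a model with prior x ~ N(m, C) to one with prior x ~ N(0, I)."""
    q = W.shape[1]
    eigvals, eigvecs = np.linalg.eigh(W @ C @ W.T)
    top = np.argsort(eigvals)[::-1][:q]       # the q non-zero eigenvalues
    U = eigvecs[:, top]
    lam = np.clip(eigvals[top], 0.0, None)    # guard against tiny negatives
    W_std = U * np.sqrt(lam)                  # U Lambda^{1/2}, taking R = I
    mu_std = mu + W @ m
    return W_std, mu_std


rng = np.random.default_rng(3)
d, q = 6, 2                                   # hypothetical sizes
W = rng.standard_normal((d, q))
L = rng.standard_normal((q, q))
C = L @ L.T + np.eye(q)                       # a positive definite prior covariance
m = rng.standard_normal(q)
mu = rng.standard_normal(d)

W_std, mu_std = to_standard_form(W, C, m, mu)
```

Both parameterizations imply the same marginal distribution for the data, since `W_std @ W_std.T` reproduces `W @ C @ W.T` and the mean is shifted accordingly.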
| Acknowledgments |
|---|
GS, MR and NL gratefully acknowledge support from a BBSRC award Improved processing of microarray data with probabilistic models. MM is supported by an Advanced Training Fellowship from the Wellcome Trust. We thank the anonymous reviewers for useful suggestions.
Conflict of Interest: none declared.
| Footnotes |
|---|
1PCA is sometimes referred to as singular value decomposition which is an algorithm that can be used to solve the eigenvalue problem that underpins PCA.
2Girolami and Breitling (2004) argue that this choice of prior distribution has the problem of implying negative expression levels. This is certainly a problem if one works directly with the expression levels; however when considering log expression levels this is no longer a problem.
Received on June 14, 2005; revised on July 15, 2005; accepted on August 8, 2005
| REFERENCES |
|---|
Alter, O., Brown, P.O., Botstein, D. (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl Acad. Sci. USA, 97, 10101-10106.
Baldi, P. and Brunak, S. (2001) Bioinformatics: The Machine Learning Approach. MIT Press, Cambridge, MA.
Bartholomew, D.J. (1987) Latent Variable Models and Factor Analysis. Charles Griffin & Co. Ltd, London.
Dempster, A.P., Laird, N.M., Rubin, D.B. (1977) Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B, 39, 1-38.
Girolami, M. and Breitling, R. (2004) Biologically valid linear factor models of gene expression. Bioinformatics, 20, 3021-3033.
Hein, A.-M.K., Richardson, S., Causton, H.C., Ambler, G.K., Green, P.J. (2005) BGX: a fully Bayesian gene expression index for Affymetrix GeneChip data. Biostatistics, (to appear).
Huber, W., von Heydebreck, A., Sültmann, H., Poustka, A., Vingron, M. (2002) Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics, 18, S96-S104.
Lawrence, N.D., Milo, M., Niranjan, M., Rashbass, P., Soullier, S. (2003) Bayesian processing of microarray images. In Molina, C., Adali, T., Larsen, J., Hulle, M.V., Douglas, S., Rouat, J. (Eds.), Neural Networks for Signal Processing XIII. IEEE, Toulouse, France, pp. 71-80.
Lawrence, N.D., Milo, M., Niranjan, M., Rashbass, P., Soullier, S. (2004) Reducing the variability in cDNA microarray image processing by Bayesian inference. Bioinformatics, 20, 518-526.
Liu, X., Milo, M., Lawrence, N.D., Rattray, M. (2005) A tractable probabilistic model for Affymetrix probe-level analysis across multiple chips. Bioinformatics, 21, 3637-3644.
Milo, M., Fazeli, A., Niranjan, M., Lawrence, N.D. (2003) A probabilistic model for the extraction of expression levels from oligonucleotide arrays. Biochem. Soc. Trans., 31, 1510-1512.
Oba, S., Sato, M.-A., Ishii, S. (2003a) Prior hyperparameters in Bayesian PCA. ICANN/ICONIP 2003, pp. 271-279.
Oba, S., Sato, M.-A., Takemasa, I., Monden, M., Matsubara, K.-I., Ishii, S. (2003b) A Bayesian missing value estimation method for gene expression data. Bioinformatics, 19, 2088-2096.
Rivolta, M.N., Halsall, A., Johnson, C., Tones, M., Holley, M.C. (2002) Genetic profiling of functionally related groups of genes during conditional differentiation of a mammalian cochlear hair cell line. Genome Research, 12, 1091-1099.
Tipping, M.E. and Bishop, C.M. (1999) Probabilistic principal component analysis. J. Roy. Statist. Soc. B, 61, 611-622.
Yeung, K.Y., Fraley, C., Murua, A., Raftery, A.E., Ruzzo, W.L. (2001) Model-based clustering and data transformations for gene expression data. Bioinformatics, 17, 977-987.

























[Figure caption fragment] The processed profiles appear to be much more tightly clustered than the corrupted profiles.








