Bioinformatics Advance Access originally published online on September 25, 2008
Bioinformatics 2008 24(23):2706-2712; doi:10.1093/bioinformatics/btn508
Improving 2D-DIGE protein expression analysis by two-stage linear mixed models: assessing experimental effects in a melanoma cell study
1School of Engineering, Intelligent Data Analysis Group, Catholic University of Córdoba, 2CONICET, Córdoba, 3Laboratory of Molecular and Cellular Therapy, Fundación Instituto Leloir, Buenos Aires, Argentina, 4Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares, Madrid, Spain, 5Facultad de Agronomía, UBA (University of Buenos Aires), Buenos Aires and 6Biometric Department, National University of Córdoba, Córdoba, Argentina
*To whom correspondence should be addressed.
Abstract
Motivation: Difference in-gel electrophoresis (DIGE)-based protein expression analysis assesses the relative expression of proteins in two biological samples labeled with different dyes (Cy5 and Cy3 CyDyes). A reference sample labeled with Cy2 CyDye is run in the same gel and used for spot matching during image analysis and for volume normalization. The standard statistical techniques for identifying differentially expressed (DE) proteins are the calculation of fold-changes and the comparison of treatment means by the t-test. These analyses rarely account for other experimental effects, such as CyDye and gel effects, which can be important sources of noise when detecting treatment effects.
Results: We propose to identify DIGE DE proteins using a two-stage linear mixed model. The proposal consists of splitting the overall model for the measured intensity into two interconnected models. First, we fit a normalization model that accounts for the general experimental effects, such as gel and CyDye effects, as well as for the features of the associated random term distributions. Second, we fit a model that uses the residuals from the first stage to account for differences between treatments on a protein-by-protein basis. The modeling strategy was evaluated using data from a melanoma cell study. We found that a heteroskedastic model in the first stage, which also accounts for CyDye and gel effects, best normalized the data while allowing for an efficient estimation of the treatment effects. The Cy2 reference channel was used as a covariate in the normalization model to avoid skewness of the residual distribution. Its inclusion improved the detection of DE proteins in the second stage.
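The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' R or SAS code, and it deliberately swaps in a simpler technique: stage 1 is an ordinary least-squares fit with fixed gel and dye effects plus the Cy2 covariate, standing in for the heteroskedastic linear mixed model, and stage 2 is a per-protein Welch-type comparison of the stage-1 residuals. The function names (`dummies`, `two_stage_sketch`, `simulate_dige`), the dye-swap layout, and all simulated values are assumptions made only for this illustration.

```python
import numpy as np


def dummies(labels):
    """Indicator columns for every level except the first (reference) level."""
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, 1:]).astype(float)


def two_stage_sketch(log_int, gel, dye, cy2, protein, treatment):
    """Return {protein: (treatment difference, Welch t statistic)}."""
    y = np.asarray(log_int, dtype=float)
    gel, dye = np.asarray(gel), np.asarray(dye)
    protein, treatment = np.asarray(protein), np.asarray(treatment)

    # Stage 1 (simplified): one global normalization model with gel and dye
    # effects and the Cy2 reference as covariate, fitted by OLS. The paper
    # uses a heteroskedastic mixed model here; OLS is an assumption.
    X = np.column_stack([np.ones(y.size), dummies(gel), dummies(dye),
                         np.asarray(cy2, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Stage 2: protein-by-protein comparison of the stage-1 residuals.
    first, second = np.unique(treatment)[:2]
    out = {}
    for p in np.unique(protein):
        a = resid[(protein == p) & (treatment == first)]
        b = resid[(protein == p) & (treatment == second)]
        diff = b.mean() - a.mean()
        se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
        out[p.item()] = (float(diff), float(diff / se))
    return out


def simulate_dige(n_gels=4, n_proteins=20, effect=1.0, noise=0.05, seed=0):
    """Hypothetical log-intensity data; only protein 0 has a treatment effect."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(10.0, 1.0, n_proteins)   # protein abundance
    gel_eff = rng.normal(0.0, 0.3, n_gels)   # gel-to-gel shifts
    dye_eff = {"Cy3": 0.0, "Cy5": 0.2}       # dye bias
    rows = []
    for g in range(n_gels):
        for p in range(n_proteins):
            ref = mu[p] + gel_eff[g] + rng.normal(0.0, noise)  # Cy2 channel
            for d in ("Cy3", "Cy5"):
                # Dye swap: treated sample is Cy5 on even gels, Cy3 on odd gels.
                treated = (d == "Cy5") == (g % 2 == 0)
                val = (mu[p] + gel_eff[g] + dye_eff[d]
                       + (effect if treated and p == 0 else 0.0)
                       + rng.normal(0.0, noise))
                rows.append((val, g, d, ref, p,
                             "treated" if treated else "control"))
    keys = ("log_int", "gel", "dye", "cy2", "protein", "treatment")
    return dict(zip(keys, zip(*rows)))


effects = two_stage_sketch(**simulate_dige())
top = max(effects, key=lambda p: abs(effects[p][0]))
print("largest estimated treatment difference: protein", top)
```

In this sketch the differences are "treated minus control" because the treatment labels are sorted alphabetically. A closer analogue of the proposed approach would replace the OLS step with a mixed-model fit that treats gel as a random effect and allows group-specific residual variances.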
Contact: elmer.fernandez{at}ucc.edu.ar
Supplementary information: R and SAS code for analyzing DIGE data with the proposed approach is available at http://www.uccor.edu.ar/modelo.php?param=3.8.5.15.2
Received on March 5, 2008; revised on May 14, 2008; accepted on September 22, 2008