Bioinformatics Advance Access originally published online on April 27, 2006
Bioinformatics 2006 22(14):1760-1766; doi:10.1093/bioinformatics/btl162
Identification of humoral immune responses in protein microarrays using DNA microarray data analysis techniques
1 School of Information and Computer Sciences, University of California Irvine, CA, USA
2 Institute for Genomics and Bioinformatics, University of California Irvine, CA, USA
3 Center for Virus Research, University of California Irvine, CA, USA
4 Naval Medical Research Center, Silver Spring MD, USA
5 Department of Molecular Microbiology and Immunology, School of Hygiene and Public Health, Johns Hopkins University Baltimore, MD, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: We present a study of antigen expression signals from a newly developed high-throughput protein microarray technique. These signals are a measure of antibody-antigen binding activity and provide a basis for understanding humoral immune responses to various infectious agents and supporting vaccine and diagnostic development.
Results: We investigate the characteristics of these expression profiles and show that noise models, normalization, variance estimation and differential expression analysis techniques developed in the context of DNA microarray analysis can be adapted and applied to these protein arrays. Using a high-dimensional dataset containing measurements of expression profiles of antibody reactivity against each protein (295 antigens and 9 controls) in 42 malaria (Plasmodium falciparum) protein arrays derived from 22 donors with various clinical presentations of malaria, we present a methodology for the analysis and identification of significantly expressed antigens targeted by immune responses for individual sera, groups of sera and across stages of infection. We also conduct a short study highlighting the top immunoreactive antigens where we identify three novel high priority antigens for future evaluation.
Availability: All software programs (in R) used for the analysis described in this paper are freely available for academic purposes at www.igb.uci.edu/servers/servers.html
Contact: pfbaldi{at}uci.edu
| 1 INTRODUCTION |
|---|
The understanding of humoral immune responses and development of safe and effective vaccines against infectious microorganisms is a worldwide goal and the successful smallpox and polio eradication campaigns proved that widespread and devastating diseases can be eliminated by the implementation of effective vaccines. These and other successes inspire optimism that diseases like malaria and tuberculosis, which affect hundreds of millions of people worldwide, may also be controlled or eliminated by vaccines. Attention toward the effectiveness and broader use of existing vaccines, as well as toward the development of new vaccines, has been heightened by recent concerns related to bioterrorist attacks and the emergence of new infectious strains (Russell, 1999, http://www.cdc.gov/ncidod/EID/vol5no4/russell.htm), such as the Asian flu.
There are two types of vaccines: (1) vaccines that are produced from whole infectious organisms that are killed or attenuated and (2) vaccines produced from a small subset of recombinant proteins derived from the organism, called subunit vaccines. Live attenuated or killed organism vaccines can be difficult to manufacture safely; there are risks that some vaccinated individuals may develop disease and the vaccines can be prone to toxic side effects. Subunit vaccines are a safer alternative, but it can be difficult to identify which proteins to use in the vaccine, particularly when the organism comprises a large number of proteins. For example, the genome of Plasmodium falciparum (the parasite responsible for malaria) encodes approximately 5300 proteins and Mycobacterium tuberculosis encodes approximately 4000 proteins. In order to produce a subunit vaccine against either of these agents, how can one choose the best 5-10 proteins to use?
Accordingly, a high-throughput proteomics technology (Doolan et al., 2003) has been developed in the Felgner laboratory, to systematically screen and identify the subset of antigens expressed by infectious agents, which is preferentially targeted by immune responses associated with infection and protection, and to prioritize the most promising antigens for vaccine development. The general approach was first described in the context of malaria vaccine antigen discovery, but it has since been extended to vaccinia (Davies et al., 2005a,b), Francisella tularensis, Burkholderia pseudomallei and Mycobacterium tuberculosis (unpublished data). The proteome synthesis method takes advantage of a high-throughput cloning and expression approach. The proteins are expressed in a cell-free in vitro transcription/translation system and are printed directly without purification onto microarray chips. The chips are probed with serum from humans or animals that are vaccinated or infected with different microorganisms, developed with Cy3-labeled anti-antibody and read with a confocal laser scanner. The readout is a profile of antibody reactivity against each protein in the infectious agent characteristic of the particular infection, type of vaccine, route of administration, location of infection, species, haplotype, etc. Systematic and reliable identification of significant antigen-antibody binding expression signals from these microarray chips is an important task toward (1) determining and understanding humoral immune responses and (2) aiding in vaccine discovery.
In this paper, we analyze expression profiles obtained from proteome microarray chips. Using the proteomics technology described above that can generate expression profiles for large numbers of proteins in a given pathogen, we present the characteristics of protein array signals obtained from probing serum from individuals infected with or exposed to the malaria parasite as an example. We investigate the extension of existing DNA microarray analysis techniques to the automated identification of humoral immune responses. In recent years, we have seen rapid advances in the analysis of DNA microarray chips for the identification of differentially expressed genes (Baldi and Long, 2001; Baldi and Hatfield, 2002; Tusher et al., 2001; Huber et al., 2002; Durbin et al., 2002) for various conditions or cell cycle time-series. There have also been studies of signals from other proteomic technologies. For example, Kreil et al. (2004) applied a log-variant transformation (Huber et al., 2002; Durbin et al., 2002) of signals obtained from 2D difference gel electrophoresis technology to effectively normalize and remove dye-specific bias. We show that, with appropriate modification in the processing methodology, some of these techniques can also be applied to the signal intensities from these new high-throughput proteome array chips to analyze immune responses. In particular, a recently validated technique in DNA microarray analysis (Choe et al., 2005) that performs t-tests using a Bayesian estimation of variance (Baldi and Long, 2001) can be used to identify significant bindings between antibodies and antigens in protein microarrays and thus identify new important targets for vaccine development against malaria.
| 2 MATERIALS AND METHODS |
|---|
2.1 Immunoblots and microarrays
Plasmid templates used for in vitro transcription/translation are prepared by using QIAprep Spin Miniprep kits (Qiagen), including the optional step, which contains protein denaturants to deplete RNase activity. In vitro transcription/translation reactions (RTS 100 Escherichia coli HY kits from Roche) are set up in 0.2 ml PCR 12-well strip tubes and incubated for 5 h at 30°C, according to the manufacturer's instructions. For immunodot blotting, 0.3 µl of whole RTS reactions is spotted manually onto nitrocellulose and allowed to air dry before blocking in 5% non-fat milk powder in TBS containing 0.05% Tween-20. Dot blots are stained with both mouse anti-poly-His mAb (clone, His-1; H-1029, Sigma) and with rat antihemagglutinin (HA) mAb (clone, 3F10; 1 867 423, Roche). Bound antibodies are detected by incubation in alkaline phosphatase-conjugated goat anti-mouse IgG (H/L) (BioRad) or goat anti-rat IgG (H/L) (Jackson ImmunoResearch) secondary Abs and visualized with nitroblue tetrazolium/BCIP to confirm the presence of recombinant protein. For microarrays, 10 µl of 0.125% Tween-20 is mixed with 15 µl of RTS reaction (to give a final concentration of 0.05% Tween-20), and 15 µl volumes are transferred to 384-well plates. The plates are centrifuged at 1600 g to pellet any precipitate, and supernatant is printed without further purification onto nitrocellulose-coated FAST glass slides (Schleicher & Schuell) by using an OmniGrid 100 microarray printer (Genomic Solutions, Ann Arbor, MI). The production of the microarrays is as described in Davies et al. (2005a). Briefly, a set of open reading frames derived from the P.falciparum (Pf) genomic sequence database (www.PlasmoDB.org) is selected according to several criteria, including their pattern of stage-specific gene or protein expression deduced from genomic, proteomic or cell biology datasets. For genes with introns, primers are designed to amplify each exon separately. 
The exons of genes containing introns are designated with a small letter e, as e1, e2, etc. Large genes and exons >3000 bp are amplified in segments, with each segment overlapping by 150 nt. The segments are designated with a small letter s as s1, s2, etc. DNA template for PCR is obtained from 3D7 genomic DNA. Initially, PCR using regular Taq DNA polymerase was optimized by decreasing the extension temperature from 65-72 to 50°C. Subsequently, PCR products are obtained using a Taq polymerase with improved proof-reading characteristics (Triplemaster from Eppendorf), improving the efficiency of the PCR step to 87%. All other aspects of chip printing and probing with sera are performed as described in Davies et al. (2005a). Proteome chips are probed with sera from 12 semi-immune individuals naturally exposed to malaria [taken from a random subset of 185 individuals in Kenya with different degrees of antibody recognition of Pf sporozoites (SPZ) and parasitized erythrocytes/red blood cells (RBC), by indirect fluorescent antibody test (IFAT)] and 10 individuals experimentally immunized with radiation-attenuated Pf sporozoites at pre-immunization, post-immunization and post-challenge (challenge with infectious sporozoites) time points (provided by the Naval Medical Research Center) (Table 1). Six out of the ten immunized donors were protected against challenge with infectious Pf sporozoites. The pre-bleed samples collected from each of these enrolled subjects prior to immunization represent the appropriate negative/baseline controls for these individuals. Kenyan subjects were exclusively of the Luo ethnic group, and the geographical distribution of this ethnic group combined with the fact that malaria is endemic in Kenya means that ethnically matched malaria-naïve controls do not exist. A pool of hyperimmune sera of the 185 individuals was also evaluated (data not shown).
For all staining, slides are first blocked for 30 min in protein array-blocking buffer (Schleicher & Schuell) before incubation in serum diluted 1:50 in blocking buffer with 10% E.coli lysate for 2 h. Bound antibodies are then visualized with Cy3-conjugated anti-human secondary Abs (Jackson ImmunoResearch) and scanned in a ScanArray ExpressHT Microarray Scanner (PerkinElmer). Fluorescence intensities are quantified by using Proscanarray Express software (PerkinElmer). Human serum has high titers of anti-E.coli Abs that mask any antigen-specific responses when using whole rapid-translation system reactions on microarrays. This masking is overcome by the addition of E.coli lysate to the serum dilutions. E.coli lysate is produced from a 1 l stationary-phase culture of E.coli (DH5α) re-suspended in 25 ml of TBS/Tween-20 and sonicated with a probe of 2 cm in diameter. We store 1 ml aliquots at −80°C.
The quantification software processes spot intensities on the array and determines the mean intensity of pixels within a spot as well as the background pixels around the spot. These local background intensities are subtracted from the raw signals to obtain the local-background corrected antigen expression values. All antigens are spotted at least twice on the array, some of them more than twice. Clones of antigens may be considered separately or as replicates and combined for analysis as a single antigen. In the malaria proteome microarray chip, each slide measurement contains 295 duplicate spots corresponding to 250 antigens, 316 ORFs, which represents approximately 5% of the entire Pf genome (Gardner et al., 2002). In addition, nine true negative controls are spotted on the array in duplicate. An example of the spot intensities is shown in Figure 1.
This dataset with 42 array measurements provides an excellent basis for studying the effects of various analysis techniques for the identification of significant antigens on an individual basis, for specific groups within a population, as well as across time or pre-post conditions. For example, we might be interested in studying a specific donor's humoral immune response to different antigens. We might also want to generalize the results for specific groups, such as pre-immunization, post-immunization and post-challenge taking into account the biological differences within the specific donors in each group. The following sections present a framework for conducting differential immunoinformatics analysis on the corresponding protein array data.
2.2 Identification of significantly bound antigens
Given a set of measurements of antigen signals obtained from the sera of several donors/specimens or groups of donors sharing common characteristics (as seen in the malaria data sera pools), the main computational tasks are identifying positively bound antigens in (1) each individual serum sample and (2) each group/pool of samples. In the first case, the task involves comparison of each of the antigen signals (in each measurement) with the control signal to identify a significant increase in expression of the antigen. In the second case, sera from each donor/specimen group are pooled together and the mean signals of each antigen in the pooled group are compared with the mean control signal. A diagram outlining the steps involved in each of these tasks is shown in Figure 2.
Normalization and transformation. It has been noted that in measurements obtained from DNA microarrays, the standard deviation (SD) of measurements increases with the expression level of genes (Chen et al., 1997). Rocke and Durbin (2001) propose a two-component model to express this relationship as follows:
$$ y = \alpha + \mu e^{\eta} + \varepsilon \qquad (1) $$

where $y$ is the measured signal, $\alpha$ is the background signal and $\mu$ is the actual expression level. $\eta \sim N(0, \sigma_\eta^2)$ is the error term that captures proportional error and $\varepsilon \sim N(0, \sigma_\varepsilon^2)$ is the background error. In this model, the variance in the measurements has a quadratic relationship with the mean signal intensity. We observe that in protein arrays as well, there is evidence to suggest a similar proportional increase in the SD with the mean expression level of the proteins. We illustrate this in Figure 3, which shows a scatter plot of the SD versus the mean signal intensity for the measurements obtained from the sera of the group of pre-immunized donors in the malaria dataset. We observe this variance-mean dependence even for replicate antigens spotted on the same array. For the 42 sera in the malaria dataset, the average intra-array correlation coefficient (r) between the SD and raw mean signal intensity of replicate antigens is found to be 0.47 ± 0.26.
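The quadratic variance-mean relationship can be checked with a short derivation. As a sketch, assume the two-component model of Rocke and Durbin (2001) is written as $y = \alpha + \mu e^{\eta} + \varepsilon$ with independent errors $\eta \sim N(0, \sigma_\eta^2)$ and $\varepsilon \sim N(0, \sigma_\varepsilon^2)$ (the symbol names are assumptions made here for illustration). The standard moments of the lognormal distribution then give:

$$
\begin{aligned}
E[y] &= \alpha + \mu\, e^{\sigma_\eta^2/2},\\
\mathrm{Var}(y) &= \mu^2\, e^{\sigma_\eta^2}\left(e^{\sigma_\eta^2} - 1\right) + \sigma_\varepsilon^2 .
\end{aligned}
$$

Under these assumptions the SD is close to $\sigma_\varepsilon$ when $\mu$ is near zero and grows approximately in proportion to $\mu$ at high intensities, which is consistent with the pattern described for Figure 3.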
In assessing differential expression, there is a need to consider the inherent variance-mean dependence in the data so that changes in intensity signals can be precisely assessed by standard statistical methods for both low- and high-signal intensities. Data can be log-transformed, an operation that has been used (Speed, 2001, http://www.stat.berkeley.edu/users/terry/zarray/Html/log.html) to address this issue. However, several studies (Rocke and Durbin, 2001; Huber et al., 2002) have noted that given the error model shown in Equation (1), the log-transformation shows inflated variances for signals at low intensities. Huber et al. (2002) and Durbin et al. (2002) independently proposed a variant of the log-transform (asinh) that addresses and corrects the inflated variance in low signal intensities. Along with performing the variance stabilization, this method (implemented in a package called vsn which is part of Bioconductor, www.bioconductor.org) calibrates the different measurements in the dataset to be in the same scale to minimize experimental effects (as shown in Kreil et al., 2004, in the context of 2D gel electrophoresis data). The asinh function has the added advantage that, unlike the log function, it is defined for zero and negative signal intensities. Figure 4 illustrates the stabilizing effect of the vsn transformation on the measurements obtained from the group of pre-immunized donors shown in Figure 3. In addition to inter-array variance stabilization, the transformations have a stabilizing effect on the intra-array variance as well. The correlation between the SD and the mean of replicated antigens spotted on the same array is also found to be low for all 42 measurements (correlation mean ± SD: 0.03 ± 0.12 for log data, 0.15 ± 0.09 for vsn transformed data).
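The behavior of the asinh transform relative to the log transform can be illustrated with a few values. The snippet below is a minimal sketch and not the vsn procedure itself: vsn additionally estimates a per-array offset and scale from the data, which is not reproduced here, and the function name `glog` and the `scale` parameter are hypothetical names introduced for illustration.

```python
import math


def glog(x, scale=1.0):
    """asinh-based transform of a background-corrected intensity.

    `scale` is a hypothetical tuning constant; vsn estimates its own
    calibration parameters per array, which this sketch does not do.
    """
    return math.asinh(x / scale)


# Compare asinh with log(2x) over negative, zero, low and high intensities.
for x in (-50.0, 0.0, 0.01, 1.0, 100.0, 10000.0):
    # The logarithm is undefined for zero and negative intensities.
    log_part = math.log(2 * x) if x > 0 else float("nan")
    print(f"{x:>10.2f}  asinh={glog(x):8.4f}  log(2x)={log_part:8.4f}")
```

For small intensities asinh(x) is close to x, and for large intensities it is close to log(2x), so the transform behaves like a logarithm where the proportional error dominates while remaining well defined for background-corrected values at or below zero.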
Issue of small replicates. As in the case of DNA microarray technology, or for that matter, in the case of any measurement, estimation of the variance is a major issue when the degree of measurement replication is small. Even after variance stabilization, we observe that several signals show variances that are artificially too low or too high owing to small number of replicates (e.g. intra-array antigen replication, number of donors per sera pool) or due to outliers in the data. An effective technique to address this issue in DNA microarrays using a Bayesian framework has been described in Baldi and Long (2001). A web implementation called Cyber-T is available at http://www.igb.uci.edu/servers/servers.html and the source code in R is also available for download. In this method, the variance estimate of each gene is regularized by taking into account the variance of neighboring genes, e.g. genes with similar expression levels. More precisely, the variance of a signal is estimated by
$$ \sigma^2 = \frac{\nu_0 \sigma_0^2 + (n - 1) s^2}{\nu_0 + n - 2} \qquad (2) $$

where $\nu_0$ is the confidence in the background variance of the neighboring genes, $\sigma_0^2$ is the average variance of neighboring genes (specified by a window size) or background variance, $s^2$ is the empirical variance and $n$ is the number of measurements. $\nu_0$ can also be thought of as pseudo-counts associated with a particular prior distribution. In turn, this regularized estimate can be used to conduct t-tests for differential analysis. This approach has been used in several studies to successfully infer gene changes in microarray data (Choe et al., 2005; Hatfield et al., 2002; Hung et al., 2002; Long et al., 2001). Choe et al. (2005) recently showed that the regularized t-tests are more effective than other differential expression estimation techniques [standard t-tests and SAM (Tusher et al., 2001)] by analyzing a spiked microarray dataset with known concentrations. Since most of the antigens in the protein chip datasets are spotted in low replicates, we employ a similar approach of using Bayes regularized t-tests to identify significantly expressed antigens on these newly developed protein microarray chips. This method is easily applied to the situation where antigens with variable number of replicates spotted on the same array need to be compared with controls.

True-negative control signal estimation. In order to identify significantly bound antigens, we compare each antigen signal with the true-negative control signal. The most direct way of estimating the control signal is from deliberate introduction of these spots on the array. This was done for the malaria sera where nine separate negative control spots were quantified in duplicate.
In the absence of negative control spots on the array, if the approximate percentage (x) of positive antigens is known or can be estimated (e.g. 10-30%), we propose to take the mean of the antigen signals that comprise the lower (100 − x)% (e.g. 70-90%) of the signals on a given array as the control signal and compute the corresponding pooled SD. When there is no information about the approximate fraction of unexpressed antigens, the technique proposed by Rocke and Durbin (2001) may be applied. This method starts with a set of low intensity measurements and iteratively adds genes/antigens whose intensities are within a certain SD lower and higher from the mean intensity of the set. The set mean and pooled SD are re-computed and the iterations continue until the set of identified genes does not change.
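The first of these surrogate-control estimates can be sketched in a few lines. This is an illustrative sketch only: the function name `control_from_lower_fraction` and its parameters are hypothetical, and for simplicity it uses the plain sample SD of the retained signals rather than an SD pooled over replicate spots.

```python
import statistics


def control_from_lower_fraction(signals, positive_fraction):
    """Estimate a surrogate control mean and SD from the lowest signals.

    signals: transformed antigen signals from one array.
    positive_fraction: assumed share of truly positive antigens (x / 100).
    """
    if not 0.0 <= positive_fraction < 1.0:
        raise ValueError("positive_fraction must be in [0, 1)")
    if len(signals) < 2:
        raise ValueError("need at least two signals")
    ordered = sorted(signals)
    # Keep the lower (100 - x)% of the signals, but never fewer than two.
    keep = max(2, int(round(len(ordered) * (1.0 - positive_fraction))))
    lower = ordered[:keep]
    return statistics.mean(lower), statistics.stdev(lower)


# Toy array in which 2 of 10 antigens (20%) are assumed to be positive.
example = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.0, 5.0, 6.0]
control_mean, control_sd = control_from_lower_fraction(example, 0.2)
print(control_mean, control_sd)
```

The estimate is sensitive to the assumed positive fraction: setting it too low lets positive antigens inflate the control mean and SD, which is one reason to prefer dedicated negative control spots when they are available.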
Individual analysis for malaria data: comparing antigens in each array with controls. In the malaria dataset, nine controls are spotted in duplicate. When the dataset comprises a set of different replicated control spots, we may select the replicated control spot which had the largest (max) mean signal among all the separately probed controls with the reasoning being that all positive antigens would have a signal at least as strong as any of the replicated controls. However, if any of the control spots in the array get contaminated due to spill-over from adjacent positive antigens, this will result in an abnormally high true negative control estimate. An alternative is to consider all the control spots as replicates and use the mean and SD of the group as the true negative signal. We have used this method since it is more robust to outliers among the control intensities even though it results in a lower average true negative signal than the stricter max control approach.
We apply the vsn transformation (Huber et al., 2002) to the data to approximately render the variance independent of the mean intensity of intra-array replicated antigens. The vsn method also calibrates the 42 arrays through scaling and shifting so the resultant ranking/P-values from the analysis may be comparable across arrays. To smooth the artificially low or high sample SD of replicated antigens within an array, we compute the Bayes-regularized variance estimates as shown in Equation (2). In comparison to DNA microarrays which probe thousands of genes, the protein arrays considered here are much smaller, containing on the order of several hundred antigens (selected as described earlier). We therefore use a relatively small window size (31, i.e. 15 neighboring antigens on each side) with a moderate confidence of five pseudo-counts (ν0) to achieve a regularizing effect (e.g. Fig. 5). We next conduct a series of t-tests using the regularized variance and modified degrees of freedom (incorporating the pseudo-counts). We discard antigens with mean signals lower than the control and obtain P-values from the t-tests comparing the remaining antigens with the mean control signal.
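The windowed regularization can be sketched as follows. This is a simplified illustration and not the Cyber-T source: the function names are hypothetical, the background variance is taken as the mean of the empirical variances of antigens with neighboring mean signal, and the conversion of the t statistic to a P-value with modified degrees of freedom is omitted.

```python
import math
import statistics


def regularized_variances(replicates, window=31, nu0=5):
    """Bayes-regularized variance for each antigen, in input order.

    replicates: list of per-antigen lists of replicate signals (n >= 2).
    window: odd number of antigens (ranked by mean signal) used as the
            neighborhood for the background variance.
    nu0: pseudo-counts, i.e. confidence in the background variance.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    means = [statistics.mean(r) for r in replicates]
    s2 = [statistics.variance(r) for r in replicates]
    # Rank antigens by mean signal so that neighbors have similar levels.
    order = sorted(range(len(replicates)), key=means.__getitem__)
    half = window // 2
    result = [0.0] * len(replicates)
    for rank, idx in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        sigma0_sq = statistics.mean([s2[j] for j in order[lo:hi]])
        n = len(replicates[idx])
        # Weighted combination of background and empirical variance.
        result[idx] = (nu0 * sigma0_sq + (n - 1) * s2[idx]) / (nu0 + n - 2)
    return result


def t_statistic(mean_a, var_a, n_a, mean_c, var_c, n_c):
    """Unequal-variance t statistic of an antigen against the control."""
    return (mean_a - mean_c) / math.sqrt(var_a / n_a + var_c / n_c)


# Toy data: the first antigen has identical duplicates (zero variance).
toy = [[1.0, 1.0], [2.0, 2.2], [3.0, 3.4], [10.0, 10.1]]
print(regularized_variances(toy, window=3, nu0=5))
```

In the toy data the first antigen has an empirical variance of zero, which would make an ordinary t statistic unbounded; the regularized estimate replaces it with a value informed by its neighbors, while unusually large empirical variances are shrunk toward the background.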
Pooled analysis for malaria data: comparing antigens in each cohort/group with controls. As indicated in Table 1, the 42 measurements of malaria sera belong to several groups. In addition to determining the positive antigens in each of the individual sera, we may also be interested in combining the sera belonging to each pool/group and determining the positive antigens taking into account the biological variation within the sera. We describe the method of performing the pooled analysis using the malaria sera as an example. Sera pools containing more than one measurement in each pool (Table 1) are included in this analysis. Replicated antigens in each array/measurement are averaged and then grouped according to the different pools. Antigen signals in arrays belonging to a pool are now considered as replicates. Prior to the averaging and pooling, we perform a calibration with variance stabilization using the vsn transformation on the raw expression signals. We estimate the true negative control as the average of the control spots in each pool. A Bayes regularized t-test is performed (window size = 31, ν0 = 5) comparing the signals of each antigen with a higher mean signal than the control, with the average control signal of the measurements in the pool. The results are described in the following section.

Pooled analysis: comparing antigen changes between groups. An aspect of interest is determining whether any of the antigens found positive in one of the pools (after comparing with controls) also shows significant changes compared with the other pools. We apply a modified one-way analysis of variance (ANOVA) which identifies significant between-group effects where the within-group error uses the Bayes-regularized variances of each group (the R source called bayesAnova.R is available for download on Cyber-T). Antigens found significant from the Bayes regularized ANOVA are further analyzed using TukeyHSD post hoc pair-wise comparisons to determine which pairs of groups show changes. The TukeyHSD test uses the same regularized within-group error term computed for the ANOVA.
2.3 Estimation of global false positive error rates
For each of the 42 individual measurements and the 7 pooled groups, we perform t-tests comparing positively expressed antigens to the control signal. Given the large number of hypotheses being tested, there is a need to determine a P-value cutoff below which we will consider antigens to be of interest and a methodology for estimating the experiment-wide false positive and false discovery rates (FPR and FDR). Several approaches have been used in DNA microarray analyses to estimate these error rates, including that of Storey (2002), implemented in the GeneTS package (Ahdesmaki et al., 2005, http://www.statistik.lmu.de/~strimmer/software/genets/) in R (http://www.R-project.org), and that of Allison et al. (2002), available on Cyber-T. Both implementations estimate the proportion of true null hypotheses and yield highly consistent estimates of the P-value cutoff needed to achieve the desired FDR (data not shown).
In the following section, we present antigens that are significantly expressed with a P-value cutoff of 3e-5, which corresponds to an estimated global FDR of 1e-4. We select this strict cutoff to present a subset of the identified antigens as an example of the type of results that can be obtained by such analyses. An extended study with the biological implications of the findings is easily achieved by modifying the selection criteria such as the desired FDR.
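The link between a P-value cutoff and an estimated FDR can be sketched with a simple estimator in the style of Storey (2002). This is an illustrative sketch under stated assumptions rather than the GeneTS or Cyber-T implementation: the function name `estimate_fdr` is hypothetical, and the proportion of true null hypotheses is estimated with a single fixed tuning value `lam` instead of being optimized over a range.

```python
def estimate_fdr(p_values, cutoff, lam=0.5):
    """Return (pi0, fdr) for the rule 'call significant if p <= cutoff'.

    pi0 is the estimated proportion of true null hypotheses, based on
    the share of P-values above `lam`, where mostly nulls are expected.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must be in (0, 1)")
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    rejected = sum(p <= cutoff for p in p_values)
    if rejected == 0:
        return pi0, 0.0
    # Expected number of false positives divided by the number of calls.
    return pi0, min(1.0, pi0 * m * cutoff / rejected)


# Toy example: 20 strong signals plus 80 evenly spread null P-values.
toy_p = [0.001] * 20 + [(i + 0.5) / 80 for i in range(80)]
print(estimate_fdr(toy_p, cutoff=0.01))
```

Lowering the cutoff reduces the expected number of false positives faster than it reduces the number of calls when true signals are strong, which is why a strict cutoff corresponds to a small estimated FDR.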
| 3 RESULTS |
|---|
3.1 Normalized versus raw signals
As an example, we examine the plots of raw and normalized signal intensities obtained from two different arrays: the pre-immunization and post-immunization sera signals of a single donor (Fig. 6, upper panel, left figure). Due to experimental differences we see that the two array intensities are not in the same scale even though most of the antigens are known not to be differentially expressed. A shifting (offset) and scaling operation performed during the vsn transformation appropriately normalizes the signals as shown in the upper right panel in Figure 6. The lower panel shows the effect of the asinh function applied to the raw signals which approximates a linear function of the lower intensities and a logarithmic function of the higher intensities.
3.2 Identification of significantly bound antigens
Among the 42 arrays, 30 measurements comprise antigen-antibody binding intensities of 10 individuals' sera obtained during three stages of infection: pre-immunization (including pre-immune, pre-bleed and pre-boost), post-immunization and post-challenge. As an example, we present the positive antigens of one such individual (immunized and protected) across all three stages. For this individual, the 3 sera (pre-immunization, post-immunization and post-challenge) are quantified separately and signal intensities recorded for the 295 antigens and 9 controls each spotted in duplicate. After performing the Bayes-regularized t-tests on the normalized signals, we examine the significantly expressed antigens (P < 3e-5) of this donor's sera when compared with the control signal. Tables 2 and 3 show the list of significant antigens found in each category and the mean transformed signals.
The top antigens in the immunized groups (post-immunization and post-challenge) are already characterized Pf antigens. None of these antigens shows a significant response in the pre-immunized group. The array signals have been normalized so that these signals are comparable and the differential expression can be viewed across the three stages of immunization. Figure 7 shows the top four post-challenge antigen signals for each stage of immunization. Control signals are similar across all stages. While the individual analysis provides very specific information about the humoral immune response of an individual to a pathogen and highlights individual-to-individual variation, the pooled analysis provides a comprehensive picture of the response in the bigger population. We present the antigens found significant from the pooled analysis in the irradiated sporozoite immunized subcategories in Table 1.
In contrast to the immunized groups, which reacted to only a few antigens, the naturally exposed group was found to react strongly to a broad range of antigens (107 proteins with P < 3e-5, Supplementary Table 1). The top three immunoreactive proteins CSP (Dame et al., 1984), SSP2 (Robson et al., 1988) and AMA1 (Bodescot et al., 2004) are already characterized Pf antigens known to be expressed in the sporozoite and/or liver stage of the parasite life cycle. They are recognized by both the naturally exposed as well as the immunized groups. We see interesting responses from a few hypothetical proteins: MAL7P1.32, PFL2410w-e1, PFI0580c-e2 and PF13_0267a. These antigens also show strong responses in the naturally exposed group.
We observe consistency between the top antigens recognized in the immunized-protected group (pooled analysis) and the single donor belonging to that group (individual analysis). However, specific individual responses to certain antigens can be expected (e.g. PFB0915w-e2s1). The analysis of individual sera helps us investigate the number of donors that responded to various antigens in each of the groups.
Three of the hypothetical antigens, MAL7P1.32, PFL2410w-e1 and PFI0580c-e2, are found to have significantly strong responses in at least ten of the naturally exposed or sporozoite-immunized donor sera (post-immunization and post-challenge) and are stronger in some sets of donors than others. For example, MAL7P1.32 is recognized well by several donors across all groups. PFL2410w-e1 response is mainly prevalent in naturally exposed donors (9/12) and post hoc ANOVA pair-wise comparisons (TukeyHSD) reveal that PFL2410w signals in the naturally exposed group are significantly stronger (P < 3e-5) than each of the other groups. As a start, these can be regarded as high priority antigens for further evaluation. By modifying our criteria for selecting top antigens (e.g. relaxing the desired global FDR and corresponding P-value cutoff), we will be able to, as shown above, study a broader set of antigen responses (as ranked in Supplementary Table 1).
| 4 CONCLUSION |
|---|
In summary, we present a study of antigen expression signals obtained from protein microarray chips that are based on a novel high-throughput proteomics technology (Doolan et al., 2003; Davies et al., 2005a,b). To the best of our knowledge, ours is the first study to extend DNA microarray data analysis techniques to the analysis of protein microarray data. We are addressing a similar problem in both, namely differential analysis of light signals derived from molecular binding events. Protein microarray experiments are subject to measurement errors similar to DNA microarray experiments and we observe a similar variance-mean relationship in both signals. We are therefore able to process these data using essentially the same error model. Furthermore, it is expensive to conduct repeated experiments and hence the Bayes-regularized t-test of Baldi and Long (2001), which addresses the problem of low replication, can be reused. We believe this general data analysis methodology can be applied to the study of expression profiles from other proteome projects. In combination with the protein chips used in this study, these statistical methods can help identify new antigens of relevance in diagnostic and vaccine development applications. This is an important step toward promoting a better understanding of humoral immune responses in subgroups and individuals in the general population.
Future studies validating the effectiveness of using these antigens in sub-unit vaccines will provide a basis for evaluating these and other emerging protein microarray experimental and computational methods. In time, these methods should lead to better datasets for machine learning applications in immunological bioinformatics (Lund et al., 2005).
| Acknowledgments |
|---|
The bioinformatics and primer design components in this work were supported primarily by National Institutes of Health Biomedical Informatics Training Program Grant 5T15LM007743 and National Science Foundation Grant MRI EIA-0321390 to P.B. and the Institute for Genomics and Bioinformatics at UCI. The protein array component was supported primarily by National Institute of Allergy and Infectious Diseases Grants U01AI056464 and 1U01AI061363-01 to P.F. The views expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the Department of Navy, Department of Defense or the US Government.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: David Rocke
Received on January 23, 2006; revised on April 16, 2006; accepted on April 23, 2006
| REFERENCES |
|---|
Ahdesmaki, M., Fokianos, K., Schaefer, J. and Strimmer, K. (2005) GeneTS: Microarray Time Series and Network Analysis. R package version 2.8.0.
Allison, D.B., et al. (2002) A mixture model approach for the analysis of microarray gene expression data. Comput. Stat. Data Anal., 39, 1–20.
Baldi, P. and Long, A.D. (2001) A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics, 17, 509–519.
Baldi, P. and Hatfield, G.W. (2002) DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modeling. Cambridge University Press, Cambridge, UK.
Bodescot, M., et al. (2004) Transcription status of vaccine candidate genes of Plasmodium falciparum during the hepatic phase of its life cycle. Parasitol. Res., 92, 449–452.
Chen, Y., et al. (1997) Ratio-based decisions and the quantitative analysis of cDNA microarray images. J. Biomed. Optics, 2, 364–374.
Choe, S.E., et al. (2005) Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biol., 6, R16.
Dame, J.B., et al. (1984) Structure of the gene encoding the immunodominant surface antigen on the sporozoite of the human malaria parasite Plasmodium falciparum. Science, 225, 593–599.
Davies, D.H., et al. (2005a) Profiling the humoral immune response to infection by using proteome microarrays: high-throughput vaccine and diagnostic antigen discovery. Proc. Natl Acad. Sci. USA, 102, 547–552.
Davies, D.H., et al. (2005b) Vaccinia H3L envelope protein is a target of neutralizing antibodies in humans and elicits protection against lethal challenge in mice. J. Virol., 79, 11724–11733.
Doolan, D.L., et al. (2003) Utilization of genomic sequence information to develop malaria vaccines. J. Exp. Biol., 206, 3789–3802.
Durbin, B., et al. (2002) A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics, 18, S105–S110.
Gardner, M.J., et al. (2002) Genome sequence of the human malaria parasite Plasmodium falciparum. Nature, 419, 498–511.
Hatfield, G.W., et al. (2002) Differential analysis of DNA microarray gene expression data. Mol. Microbiol., 47, 871–877.
Huber, W., et al. (2002) Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics, 18 (Suppl.), S96–S104.
Hung, S.-P., et al. (2002) Global gene expression profiling in Escherichia coli K12: the effects of leucine-responsive regulatory protein. J. Biol. Chem., 277, 40309–40323.
Kreil, D.P., et al. (2004) DNA microarray normalization methods can remove bias from differential protein expression analysis of 2D difference gel electrophoresis results. Bioinformatics, 20, 2026–2034.
Long, A.D., et al. (2001) Improved statistical inference from DNA microarray data using analysis of variance and a Bayesian statistical framework. Analysis of global gene expression in Escherichia coli K12. J. Biol. Chem., 276, 19937–19944.
Lund, O., Nielsen, M., Lundegaard, C., Kesmir, C. and Brunak, S. (2005) Immunological Bioinformatics. MIT Press, Cambridge, MA, USA.
Russell, P.K. (1999) Vaccines in civilian defense against bioterrorism. Emerg. Infect. Dis., 5, 531–533.
Robson, K.J., et al. (1988) A highly conserved amino-acid sequence in thrombospondin, properdin and in proteins from sporozoites and blood stages of a human malaria parasite. Nature, 335, 79–82.
Rocke, D.M. and Durbin, B. (2001) A model for measurement errors for gene expression arrays. J. Comput. Biol., 8, 557–569.
Speed, T. (2001) Always log spot intensities and ratios. Speed Group Microarray Page.
Storey, J.D. (2002) A direct approach to false discovery rates. J. R. Stat. Soc. B, 64, 479–498.
Tusher, V.G., et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116–5121.