

Bioinformatics Advance Access originally published online on May 8, 2008
Bioinformatics 2008 24(13):1547-1548; doi:10.1093/bioinformatics/btn224

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

lumi: a pipeline for processing Illumina microarray

Pan Du*, Warren A. Kibbe and Simon M. Lin

Robert H. Lurie Comprehensive Cancer Center, Northwestern University, Chicago, IL, 60611, USA

*To whom correspondence should be addressed.


    ABSTRACT

Summary: The Illumina microarray is becoming a popular microarray platform. Illumina's BeadArray technology makes its preprocessing and quality control different from those of other microarray technologies. Unfortunately, most other analyses have not taken advantage of the unique properties of the BeadArray system and have simply adopted preprocessing methods originally designed for Affymetrix microarrays. lumi is a Bioconductor package designed specifically to process Illumina microarray data. It includes components for data input, quality control, variance stabilization, normalization and gene annotation. Specifically, the lumi package includes a variance-stabilizing transformation (VST) algorithm that takes advantage of the technical replicates available on every Illumina microarray. The package also provides several normalization options and multiple quality control plots. To better annotate Illumina data, a vendor-independent nucleotide universal identifier (nuID) was devised to identify the probes of Illumina microarrays. The nuID annotation packages and the output of lumi processing can easily be integrated with other Bioconductor packages to construct a statistical data analysis pipeline for Illumina data.

Availability: The lumi Bioconductor package, www.bioconductor.org

Contact: dupan{at}northwestern.edu


    1 INTRODUCTION
Because of its cost-effectiveness and accuracy, the Illumina microarray (BeadArray) is becoming a popular microarray platform (Kuhn et al., 2004). The Illumina BeadArray technology is based on randomly arranged beads, each of which carries many identical copies of a gene-specific probe. The BeadArray is constructed so that each bead type has, on average, roughly 30 randomly positioned replicates. This redundant design yields higher-confidence calls and more robust estimates than other types of microarrays. However, the unique design of the Illumina microarray makes its preprocessing and quality control steps significantly different from those of other microarrays. Unfortunately, until now, most other analyses have not taken advantage of the unique properties of the Illumina BeadArray system and have simply adopted preprocessing methods originally designed for Affymetrix microarrays.
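To make the benefit of roughly 30 replicates per bead type concrete, the Python sketch below collapses one probe's replicate-bead intensities into a mean, a standard error and a bead count, the kind of per-probe summary that a LumiBatch stores as exprs, se.exprs and beadNum. The function names and simulated intensities are hypothetical. Because the standard error of the mean scales as 1/sqrt(n), about 30 beads give an estimate whose standard error is roughly 1/5.5 of the single-bead spread; multiplying a reported standard error by sqrt(n) recovers the between-bead standard deviation.

```python
import math
import random
import statistics


def summarize_bead_type(intensities):
    """Collapse the replicate-bead intensities of one probe into
    (mean, standard error of the mean, bead count)."""
    n = len(intensities)
    if n < 2:
        raise ValueError("need at least two replicate beads")
    mean = statistics.fmean(intensities)
    sd = statistics.stdev(intensities)  # sample SD across replicate beads
    se = sd / math.sqrt(n)              # SE shrinks as 1/sqrt(n)
    return mean, se, n


def se_to_sd(se, bead_num):
    """Recover the between-bead SD from a reported SE and bead count."""
    return se * math.sqrt(bead_num)


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical probe: 30 replicate beads scattered around a signal of 1000.
    beads = [random.gauss(1000.0, 80.0) for _ in range(30)]
    mean, se, n = summarize_bead_type(beads)
    print(f"mean={mean:.1f} se={se:.1f} beads={n} sd={se_to_sd(se, n):.1f}")
```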

Bioconductor is an open-source, open-development software project (written mainly in the R programming language) for the analysis and comprehension of genomic data (Gentleman et al., 2004). To date, there are two other Bioconductor packages (beadarray and BeadExplorer) that extend the functionality of BeadStudio, the Illumina LIMS software. The beadarray package (Dunning et al., 2007) is mainly designed for quality control and bead-level analysis of BeadArrays. BeadExplorer aims to provide data exploration and quality control by leveraging the output of BeadStudio and existing algorithms in the affy package; it does not take advantage of the large number of technical replicates available on the Illumina microarray during preprocessing. The design objectives of the new lumi package are twofold: first, to provide algorithms designed specifically for Illumina data; and second, to make the best use of existing algorithms and frameworks by following the Bioconductor class infrastructure and gene annotation framework.

Comments and contributions from users and researchers around the world have substantially improved the lumi package. The current version includes methods for data import, quality control, preprocessing and gene annotation of Illumina microarray data. Besides supporting existing microarray algorithms, the lumi package includes several unique components: (1) a variance-stabilizing transformation (VST) that utilizes the technical replicates available on the Illumina microarray (Lin et al., 2008); (2) normalization algorithms designed for Illumina microarray data, including robust spline normalization (RSN) and simple scaling normalization (SSN); and (3) the nucleotide universal identifier (nuID) annotation packages (Du et al., 2007). The nuID annotation packages allow version- and vendor-independent annotation of each probe. Each nuID also uniquely and exactly encodes the original probe sequence through an encoding that includes error checking.
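The idea behind a model-based VST can be illustrated with a simplified Python sketch. It assumes, for illustration only, that the variance of a probe with mean intensity u follows sd(u)^2 = c1 + c2 * u^2, estimates c1 and c2 by least squares from per-probe means and between-bead standard deviations (which the technical replicates make available on every array), and applies the transform whose derivative is 1/sd(u); under this model that transform is an inverse hyperbolic sine. This is not the exact model or fitting procedure of Lin et al. (2008), and the function names are hypothetical.

```python
import math


def fit_variance_model(means, sds):
    """Ordinary least squares fit of sd^2 = c1 + c2 * u^2 across probes.

    The quadratic mean-variance model is an assumption of this sketch
    (additive background noise c1 plus multiplicative noise c2 * u^2)."""
    xs = [u * u for u in means]
    ys = [s * s for s in sds]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    c2 = sxy / sxx
    c1 = y_bar - c2 * x_bar
    if c1 <= 0 or c2 <= 0:
        raise ValueError("the model needs positive c1 and c2")
    return c1, c2


def vst(u, c1, c2):
    """Transform with g'(u) = 1 / sqrt(c1 + c2 * u^2), so that
    g'(u) * sd(u) = 1 and the transformed variance is roughly constant."""
    return math.asinh(u * math.sqrt(c2 / c1)) / math.sqrt(c2)


if __name__ == "__main__":
    # Hypothetical per-probe summaries generated exactly from the model.
    true_c1, true_c2 = 400.0, 0.01
    means = [50.0 * k for k in range(1, 201)]
    sds = [math.sqrt(true_c1 + true_c2 * u * u) for u in means]
    c1, c2 = fit_variance_model(means, sds)
    print(f"fitted c1={c1:.1f}, c2={c2:.4f}")
    for u in (100.0, 1000.0, 10000.0):
        print(u, round(vst(u, c1, c2), 3))
```

The asinh form behaves like a logarithm at high intensities and is close to linear near zero, so it avoids the variance blow-up that a plain log transform causes for low-intensity probes.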


    2 IMPLEMENTATION
2.1 Preprocessing and quality control
The lumi package includes one major class, LumiBatch, which inherits from the Bioconductor ExpressionSet class to enable interoperability with other Bioconductor packages. The class diagram is shown in Figure 1. The LumiBatch class provides numerous methods, discussed below. It extends ExpressionSet by adding three elements, se.exprs, beadNum and detection, to the assayData slot to hold information unique to Illumina microarrays. The controlData slot keeps the control probe information, and the QC slot keeps the quality control summary. A new history slot tracks all operations performed on a LumiBatch object, providing a convenient container for data provenance. Users can also choose to keep the Illumina annotation information output by BeadStudio in the featureData slot of the LumiBatch object.
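LumiBatch itself is an R S4 class. As a language-neutral illustration of the layout just described, the hypothetical Python dataclass below keeps the expression matrix together with the three extra assay elements (standard errors, bead counts and detection values), which must share the same probe-by-sample dimensions, plus control data, a QC summary and a history list that logs each operation for provenance. The class name BeadBatch, its field names and the record method are invented for this sketch and are not part of lumi.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List

Matrix = List[List[float]]  # rows are probes, columns are samples


@dataclass
class BeadBatch:
    """Hypothetical Python analogue of the LumiBatch layout (not the lumi API)."""
    exprs: Matrix      # summarized intensities
    se_exprs: Matrix   # standard errors (cf. se.exprs)
    bead_num: Matrix   # beads per probe and sample (cf. beadNum)
    detection: Matrix  # detection values (cf. detection)
    control_data: Dict[str, Matrix] = field(default_factory=dict)  # cf. controlData
    qc: Dict[str, float] = field(default_factory=dict)             # cf. QC
    history: List[Dict[str, str]] = field(default_factory=list)    # cf. history

    def __post_init__(self) -> None:
        # All assay elements must share the probe-by-sample shape of exprs.
        shape = self._shape(self.exprs)
        for name in ("se_exprs", "bead_num", "detection"):
            if self._shape(getattr(self, name)) != shape:
                raise ValueError(f"{name} must have the same dimensions as exprs {shape}")

    @staticmethod
    def _shape(m: Matrix):
        return (len(m), len(m[0]) if m else 0)

    def record(self, command: str) -> None:
        """Append a timestamped provenance entry, in the spirit of the history slot."""
        self.history.append({
            "submitted": datetime.now(timezone.utc).isoformat(),
            "command": command,
        })


if __name__ == "__main__":
    batch = BeadBatch(
        exprs=[[120.0, 98.0], [5400.0, 6100.0]],
        se_exprs=[[8.1, 7.5], [150.0, 170.0]],
        bead_num=[[31, 28], [30, 33]],
        detection=[[0.04, 0.11], [0.0, 0.0]],
    )
    batch.record("hypothetical background adjustment")
    print(batch.history[-1]["command"])
```

A processing function would call record after modifying the object, so the history list accumulates an ordered log of the operations applied.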


Fig. 1. Object model relationships in the lumi package.

 
The lumi package provides several major processing methods. lumiR initializes a LumiBatch object by reading raw data exported from all versions of the Illumina BeadStudio software, and lumiR.batch reads a batch of data files. lumiB adjusts the array background, lumiT performs variance stabilization, lumiN normalizes the variance-stabilized data and lumiQ assesses data quality. Together, these methods constitute a preprocessing pipeline; for convenience, the lumiExpresso method encapsulates these four methods (lumiB, lumiT, lumiN and lumiQ) in a single call. The methods also include options to call other processing methods previously designed for Affymetrix data. For visualization and quality control, the lumi package also provides various plot functions that handle both expression and control probe data. Please refer to the package tutorial and function help files for more details.
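lumiExpresso chains background adjustment, variance stabilization, normalization and quality assessment. The Python sketch below shows only that composition pattern, with deliberately simple hypothetical stand-ins: a min-shift background step, a log2 transform in place of the VST, and quantile normalization (a classic method originally developed for Affymetrix data) in place of RSN. None of these functions reproduces the corresponding lumi method, and all names are illustrative.

```python
import math
import statistics
from typing import Callable, List, Tuple

Samples = List[List[float]]  # one list of probe intensities per array
Step = Tuple[str, Callable[[Samples], Samples]]


def background_adjust(samples: Samples) -> Samples:
    """Hypothetical stand-in for lumiB: shift each array so its minimum is 1."""
    adjusted = []
    for s in samples:
        floor = min(s)
        adjusted.append([v - floor + 1.0 for v in s])
    return adjusted


def log2_transform(samples: Samples) -> Samples:
    """Stand-in for lumiT (log2 instead of VST, purely for brevity)."""
    return [[math.log2(v) for v in s] for s in samples]


def quantile_normalize(samples: Samples) -> Samples:
    """Stand-in for lumiN: give every array the same distribution, namely the
    mean across arrays of each sorted position (ties broken by probe order)."""
    n_probes = len(samples[0])
    sorted_arrays = [sorted(s) for s in samples]
    rank_means = [statistics.fmean([a[r] for a in sorted_arrays]) for r in range(n_probes)]
    normalized = []
    for s in samples:
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


def qc_summary(samples: Samples) -> List[Tuple[float, float]]:
    """Stand-in for lumiQ: per-array mean and standard deviation."""
    return [(statistics.fmean(s), statistics.stdev(s)) for s in samples]


DEFAULT_STEPS: List[Step] = [
    ("background", background_adjust),
    ("transform", log2_transform),
    ("normalize", quantile_normalize),
]


def expresso(samples: Samples, steps: List[Step] = DEFAULT_STEPS):
    """Run the steps in order, log each one, then summarize quality."""
    history = []
    for name, step in steps:
        samples = step(samples)
        history.append(name)
    return samples, history, qc_summary(samples)
```

Because each step is a named function in a list, another method (for example, a different normalization) can be swapped in without changing the driver, and the returned history records what was actually run.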

2.2 Annotation packages
A good annotation package is important for interpreting analysis results. Illumina BeadStudio uses a TargetID or ProbeId to identify individual genes. However, these identifiers are not consistent among different versions of BeadArray chips, or even between different batches, which makes it difficult to combine results obtained with different chip versions. We designed the nuID (Du et al., 2007) to address these issues. A nuID is a lossless compression of the 50mer oligonucleotide probe sequence and contains error-checking and self-identification codes.
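The encoding idea can be illustrated with a simplified Python sketch: each base takes 2 bits, so three bases fit in one character of a 64-symbol alphabet, and a leading header character stores a checksum and the number of padding bases. Under these assumptions a 50mer becomes a 1-character header plus 17 body characters. The alphabet, the sum-mod-16 checksum and the header layout are illustrative assumptions, so the output is not compatible with the real nuIDs defined by Du et al. (2007).

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_."  # assumed
BASES = "ACGT"  # 2-bit codes 0..3
assert len(ALPHABET) == 64


def encode_id(seq: str) -> str:
    """Losslessly pack a DNA sequence into an identifier (illustrative scheme)."""
    seq = seq.upper()
    if not seq or any(b not in BASES for b in seq):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    codes = [BASES.index(b) for b in seq]
    pad = (-len(codes)) % 3               # bases needed to fill the last triplet
    checksum = sum(codes) % 16            # simple checksum over the real bases
    header = ALPHABET[pad * 16 + checksum]
    codes += [0] * pad
    body = "".join(
        ALPHABET[(codes[i] << 4) | (codes[i + 1] << 2) | codes[i + 2]]
        for i in range(0, len(codes), 3)
    )
    return header + body


def decode_id(ident: str) -> str:
    """Recover the sequence and verify the checksum; raise on corruption."""
    if len(ident) < 2 or any(ch not in ALPHABET for ch in ident):
        raise ValueError("malformed identifier")
    pad, checksum = divmod(ALPHABET.index(ident[0]), 16)
    if pad > 2:
        raise ValueError("malformed identifier")
    codes = []
    for ch in ident[1:]:
        v = ALPHABET.index(ch)
        codes += [(v >> 4) & 3, (v >> 2) & 3, v & 3]
    if pad:
        if any(codes[-pad:]):
            raise ValueError("padding bits corrupted")
        codes = codes[:-pad]
    if sum(codes) % 16 != checksum:
        raise ValueError("checksum mismatch: identifier is corrupted")
    return "".join(BASES[c] for c in codes)
```

Because the encoding is lossless, decode_id returns the exact probe sequence, which is what lets a sequence-derived identifier act as a version-independent key. A sum-based checksum is weak (it misses, for instance, bases swapped within one triplet); a production scheme would use a stronger check.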

The Illumina annotation packages were built with Bioconductor annotation tools, using the nuID of each probe as its identifier. The mappings from TargetID or ProbeId to nuID are also included in the annotation packages. Because all Illumina microarrays use 50mer probes, the nuID universal identifier allows us to build one annotation database for different versions of the chips of the same species. Moreover, a nuID can be converted directly back to the probe sequence and used to obtain the most up-to-date RefSeq matches and annotations. Annotation packages for all current Illumina expression chips (package names are prefixed with ‘lumi’ and followed by the species name and version number, e.g. lumiHumanAll.db) can be downloaded from Bioconductor.


    3 A USE CASE
Figure 2 shows the data processing flow chart of the use case, and Figure 3 shows the R source code for preprocessing. Because the LumiBatch class extends the ExpressionSet class, many Bioconductor data analysis packages can be applied directly to the results of lumi methods. Figure 2 illustrates a scenario that combines the lumi package with limma, GOstats and MLInterfaces. More implementation details can be found in the lumi package tutorial.


Fig. 2. Flow chart of the use case.

 

Fig. 3. Example R code for Illumina data preprocessing.

 
In conclusion, the lumi package provides class infrastructure and associated methods for constructing an Illumina analysis workflow that runs from raw data through functional analysis.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Joaquin Dopazo

Received on December 20, 2007; revised on February 3, 2008; accepted on May 5, 2008

    REFERENCES

    Du P, et al. nuID: a universal naming scheme of oligonucleotides for Illumina, Affymetrix, and other microarrays. Biol. Direct (2007) 2:16.

    Dunning MJ, et al. beadarray: R classes and methods for Illumina bead-based data. Bioinformatics (2007) 23:2183–2184.

    Gentleman RC, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. (2004) 5:R80.

    Kuhn K, et al. A novel, high-performance random array platform for quantitative gene expression profiling. Genome Res. (2004) 14:2347–2356.

    Lin SM, et al. Model-based variance-stabilizing transformation for Illumina microarray data. Nucleic Acids Res. (2008) 36:e11.





