Bioinformatics Advance Access originally published online on July 28, 2005
Bioinformatics 2005 21(18):3697-3699; doi:10.1093/bioinformatics/bti600
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

SpA: web-accessible spectratype analysis: data management, statistical analysis and visualization

Min He 1, John K. Tomfohr 1, Blythe H. Devlin 2, Marcella Sarzotti 3, M. Louise Markert 2,3 and Thomas B. Kepler 1,3,*

1Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC 27708, USA
2Department of Pediatrics, Duke University Medical Center, Durham, NC 27708, USA
3Department of Immunology, Duke University Medical Center, Durham, NC 27708, USA

*To whom correspondence should be addressed.


    Abstract

Summary: SpA is a web-accessible system for the management, visualization and statistical analysis of T-cell receptor spectratype data. Users upload data from their spectratype analyzers to SpA, which saves the raw data and user-defined supplementary covariates to a secure database. The statistical engine performs several data analyses and statistical summaries. The visualization engine displays spectratype histograms in a Java applet and in an image file suitable for download. All of these results are also saved to the database and remain accessible to the user. Additional statistical tools specific to the analysis of multiple spectratypes are also available through the SpA interface.

Availability: The service is freely accessible via the web at http://www.duke.edu/~kepler/spa.html. Additional technical support and specialized statistical analysis and consultation are available by arrangement with the authors and, depending on the service requested, may be subject to a fee.

Contact: kepler{at}duke.edu


    1 INTRODUCTION
The vertebrate immune system depends crucially on the generation and maintenance of an enormous diversity of specialized receptors called T-cell receptors (TCRs) and B-cell receptors. They are used to sense the presence of microbial pathogens (see e.g. Janeway et al., 2004). Loss of this diversity can compromise the effectiveness of immune surveillance; it can also signal other underlying deficiencies. Spectratype analysis was developed for gauging TCR diversity by measuring the length distribution of the third complementarity determining region (CDR3) in rearranged T-cell receptor β-chain (TCRB) genes using PCR amplification and size separation of the amplified products (Cochet et al., 1992; Pannetier et al., 1993; Pannetier et al., 1997).

Spectratype data are typically analyzed subjectively, using expert judgment to classify CDR3 length histograms into a small number of categories (see e.g. Sarzotti et al., 2003). Although such analyses have yielded much useful information in both basic biological and clinical settings, we have developed an objective, statistically rigorous approach to quantitative spectratype analysis based on the hierarchical-relative multinomial model (Kepler et al., 2005). Here we describe SpA (Spectratype Analysis), a comprehensive data management system that integrates these statistical tools with tools for the management and visualization of spectratype and covariate data. A system designed with similar purposes in mind was developed by Collette and Six (2002) for use within spreadsheet programs and is available from those authors.


    2 IMPLEMENTATION AND FUNCTIONALITY
SpA is written in Fortran90, C++ and Java, and is connected to an Oracle relational database on a Linux platform.

Raw spectratype data, processed data, results of summary statistical analysis and graphics are stored in an Oracle database. These individual data are available for comparative analyses of multiple datasets, hypothesis testing and statistical modeling using the integrated statistical engine. The system architecture of SpA is briefly diagrammed in Figure 1.



Fig. 1 SpA system architecture.

 
2.1 Data fetching and preprocessing
Software developed for DNA sequence analysis, such as GeneScan® (Applied Biosystems, Foster City, CA) and Genotester® (Amersham, Uppsala), is typically used for peak detection and intensity quantification. The software typically produces 24 spectratype data files (one for each primer pair or, roughly, each TCRBV family). In SpA, the user compresses these 24 files into a single ZIP-formatted archive (zipfile), which is then uploaded through the input interface. After the archive has been uploaded, the decompression module opens the archive and stores the individual files in a temporary directory. The data are then preprocessed to convert the peak locations and sizes into CDR3 lengths and relative abundances. The preprocessed data are stored in the database along with the raw peak calls and a preprocessing log detailing the data conversion.
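The upload and preprocessing flow described above can be sketched roughly as follows. This is a minimal illustration in Python, not SpA's own code (which is written in Fortran90, C++ and Java). The file layout (one tab-separated text file per family with `size` and `area` columns) and all function names are assumptions made for the example.

```python
import csv
import io
import tempfile
import zipfile
from pathlib import Path


def extract_archive(zip_bytes, dest_dir):
    """Unpack an uploaded ZIP archive into dest_dir and return the extracted files."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(dest_dir)
    return sorted(p for p in Path(dest_dir).rglob("*") if p.is_file())


def read_peaks(path):
    """Read (size, area) peak calls from a tab-separated file (hypothetical layout)."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [(float(row["size"]), float(row["area"])) for row in reader]


def relative_abundances(peaks):
    """Normalize peak areas so that they sum to 1 within one family."""
    total = sum(area for _, area in peaks)
    if total <= 0:
        return []
    return [(size, area / total) for size, area in peaks]


def preprocess_upload(zip_bytes):
    """Return {file stem: [(size, relative abundance), ...]} for each file in the archive."""
    with tempfile.TemporaryDirectory() as tmp:
        return {
            path.stem: relative_abundances(read_peaks(path))
            for path in extract_archive(zip_bytes, tmp)
        }
```

The temporary directory is removed once the files have been read, which mirrors the idea of holding the uploaded files only until the preprocessed values have been extracted.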

Registered SpA users (registration is free of charge) can customize the interface for their specific purposes by specifying any number of user-defined covariates (such as subject age and clinical status) to be stored with the spectratype data and used in subsequent data analyses. In subsequent sessions, the interface displays the previously entered covariate descriptors. The interface for adding covariates is shown in Figure 2A. For comparative or integrative analyses, users can add or modify covariates at any time, even after the spectratype data have been uploaded to the SpA system.



Fig. 2 SpA screenshots. (A) Interface for adding covariates; (B) Interface for modifying the size conversion table; (C) Interface for spectratype visualization.

 
Because different users may use different PCR primers to amplify CDR3, spectratype analysis requires the specification of a Size Conversion Table that provides the correspondence between absolute amplicon length and CDR3 length for each receptor family. In SpA, users can select from the available standard Size Conversion Tables or supply their own custom tables, which are stored for subsequent use by that user. The interface for modifying the Size Conversion Table is shown in Figure 2B.
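One way to picture a Size Conversion Table is as a per-family lookup that anchors one amplicon length to one CDR3 length. The Python sketch below is only an illustration: the table layout, the family names and the numbers are hypothetical, and it assumes that in-frame products differ from the reference amplicon by whole codons (multiples of three nucleotides). SpA's actual table format is not described in the text.

```python
from collections import defaultdict

# Hypothetical Size Conversion Table: for each family, the amplicon length (in
# nucleotides) that corresponds to a reference CDR3 length (in amino acids).
# The numbers are made up for illustration.
SIZE_CONVERSION = {
    "BV01": {"amplicon_nt": 180, "cdr3_aa": 10},
    "BV02": {"amplicon_nt": 204, "cdr3_aa": 10},
}


def cdr3_length(family, amplicon_nt, table=SIZE_CONVERSION):
    """Map an amplicon length to a CDR3 length; None if the peak is out of frame."""
    ref = table[family]
    offset = round(amplicon_nt) - ref["amplicon_nt"]
    if offset % 3 != 0:  # assumed: in-frame products differ by whole codons
        return None
    return ref["cdr3_aa"] + offset // 3


def to_cdr3_histogram(family, peaks, table=SIZE_CONVERSION):
    """Collapse (amplicon size, abundance) peaks into {CDR3 length: abundance}."""
    histogram = defaultdict(float)
    for size, abundance in peaks:
        length = cdr3_length(family, size, table)
        if length is not None:
            histogram[length] += abundance
    return dict(histogram)
```

Passing the table as an argument keeps the conversion independent of any one primer set, so a user-supplied table can be swapped in without changing the conversion logic.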

2.2 Statistical data analysis
After preprocessing, the statistical data analysis modules are invoked to perform summary analyses of the histograms for each TCRBV family and for the assay as a whole. This information is made available for immediate viewing and stored for later retrieval.
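The text does not list which summaries are computed, so the following Python sketch uses generic descriptive statistics (peak count, mean and standard deviation of CDR3 length, Shannon entropy) purely as an illustration. These are stand-ins chosen for the example, not the hierarchical-relative multinomial analysis of Kepler et al. (2005), and the function names are hypothetical.

```python
import math


def summarize_histogram(histogram):
    """Descriptive summaries of a {CDR3 length: relative abundance} histogram."""
    total = sum(histogram.values())
    if total <= 0:
        raise ValueError("histogram has no signal")
    # Renormalize and drop empty bins so that log(p) is always defined.
    probs = {length: value / total for length, value in histogram.items() if value > 0}
    mean = sum(length * p for length, p in probs.items())
    variance = sum(p * (length - mean) ** 2 for length, p in probs.items())
    entropy = -sum(p * math.log(p) for p in probs.values())
    return {
        "n_peaks": len(probs),
        "mean_length": mean,
        "sd_length": math.sqrt(variance),
        "shannon_entropy": entropy,
    }


def summarize_assay(histograms_by_family):
    """Summarize every family, plus the mean entropy across families for the assay."""
    per_family = {fam: summarize_histogram(h) for fam, h in histograms_by_family.items()}
    mean_entropy = sum(s["shannon_entropy"] for s in per_family.values()) / len(per_family)
    return {"families": per_family, "mean_entropy": mean_entropy}
```

A family dominated by a single peak has entropy near zero, while a broad histogram gives a larger value, which is one simple way to express the loss of diversity discussed in the Introduction.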

Additional statistical analyses and visualizations are available for comparison and integration of multiple assays. For example, if the user has supplied categorical covariate data, such as presence or absence of a given drug therapy, statistical comparisons of repertoire diversity based on that covariate can be performed. If continuous covariate data are supplied, such as date-post-intervention, regressions of TCR diversity can be computed, relevant hypothesis tests performed and plots of these fits produced. In addition to the specific procedures provided by SpA, we have developed an interface that integrates the powerful, general-purpose data analysis package R (http://www.r-project.org/) into the system for more specialized procedures. These procedures can be designed and carried out by the user (this is free for academic users) or, through special arrangement, by the authors' research team.
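To make the two kinds of covariate analysis concrete, the Python sketch below shows a least-squares regression of a per-assay diversity score on a continuous covariate and a permutation test comparing two groups defined by a categorical covariate. Both are generic textbook procedures chosen for illustration. They are not the tests implemented in SpA, and the diversity score itself is assumed to have been computed elsewhere.

```python
import random
from statistics import mean


def ols_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

For example, `ols_fit(days, diversity)` would describe how a diversity score changes with time after an intervention, and `permutation_test(treated, untreated)` would compare the same score between two therapy groups; the variable names here are placeholders.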

2.3 Visualization
Each TCRBV family produces a CDR3 length histogram, which is rendered along with a curve showing the population mean appropriate to that family (Fig. 2C). The image file is produced using the Java 2D image package and can be saved on the user's machine as a PNG (portable network graphics) file as well as stored in the SpA database.

2.4 Security and access
It is essential to maintain strict data confidentiality for both ethical and scientific reasons. Our system is deployed on a secure server with secure login-based access. Academic users may register free of charge and receive the benefits of the customizable interfaces discussed above, in addition to data security. Unregistered users can gain access through the public interface, but in that case their data are left open to public access. Technical support and assistance with more complex data analyses are available by arrangement with the authors. SpA is publicly available at https://www.duke.edu/~kepler/spa.html.


    3 CONCLUSION
SpA is a web-accessible spectratype statistical analysis and data management system. It allows users to submit spectratype and general covariate data for processing, analysis and visualization. Existing analyses can be interactively retrieved and used in comparative and integrative analyses. A detailed tutorial is available on the SpA site.


    Acknowledgments
 
The authors thank Lindsay Cowell, Shaza Fadel, Jun Lu and Faheem Mitha for helpful discussions, and Bill Zeggert, Dan Ozaki and Jie Li for technical assistance. This work was supported financially by the Duke University Center for Translational Research NIH 5 P30 AI051445-03 and the Southeast Regional Center for Biodefense and Emerging Infections NIH U54 AI057157-02 as well as grants R01 AI 47040 and R01 AI 54843 from the NIH.

Conflict of Interest: none declared.

Received on April 29, 2005; revised on July 10, 2005; accepted on July 26, 2005

    REFERENCES

    Cochet, M., et al. (1992) Molecular detection and in vivo analysis of the specific T cell response to a protein antigen. Eur. J. Immunol., 22, 2639–2647.

    Collette, A. and Six, A. (2002) ISEApeaks: an Excel platform for GeneScan and Immunoscope data retrieval, management and analysis. Bioinformatics, 18, 329–330.

    Janeway, C.A., Travers, P., Walport, M. and Shlomchik, M.J. (2004) Immunobiology, 6th edn. Garland Publishing.

    Kepler, T.B., et al. (2005) Statistical analysis of antigen receptor spectratype data. Bioinformatics, 21, 3394–3400.

    Pannetier, C., et al. (1993) The size of the CDR3 hypervariable regions of the murine T-cell receptor β chains vary as a function of the recombined germ-line segments. Proc. Natl Acad. Sci. USA, 90, 4319.

    Pannetier, C., Levraud, J.-P., Lim, A., Even, J. and Kourilsky, P. (1997) The immunoscope approach for the analysis of T cell repertoires. In Oksenberg, J.R. (ed.), The Antigen T Cell Receptor: Selected Protocols and Applications. Landes, Austin, TX, pp. 287–325.

    Sarzotti, M., et al. (2003) T cell repertoire development in humans with SCID after nonablative allogeneic marrow transplantation. J. Immunol., 170, 2711–2718.



