Bioinformatics Advance Access originally published online on September 16, 2004
Bioinformatics 2005 21(5):669-670; doi:10.1093/bioinformatics/bti030

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

ESTminer: a Web interface for mining EST contig and cluster databases

Yecheng Huang 1, Janie Pumphrey 2 and Alan R. Gingle 1,*

1 Center for Applied Genetic Technologies, University of Georgia 111 Riverbend Road, Athens, GA 30602, USA
2 Janie Pumphrey 2028 Spruce St Apt 3R, Philadelphia, PA 19103, USA

*To whom correspondence should be addressed.


    Abstract

Summary: ESTminer is a Web application and database schema for interactive mining of expressed sequence tag (EST) contig and cluster datasets. The Web interface contains a query frame that allows the selection of contigs/clusters with a specific cDNA library makeup or a threshold number of members. The results are displayed as color-coded tree nodes, in which the colors indicate the fractional size of each cDNA library component. The nodes are expandable, revealing library statistics as well as EST or contig members, with links to sequence data, GenBank records or user-configurable links. The interface also supports ‘queries within queries’, in which the result set of one query is further filtered by a subsequent query.

Availability: ESTminer is implemented in Java/JSP and the package, including MySQL and Oracle schema creation scripts, is available from http://cggc.agtec.uga.edu/Data/download.asp

Contact: agingle{at}uga.edu


    INTRODUCTION
ESTminer is a Web application for interactive mining of expressed sequence tag (EST) contig and cluster datasets. The importance of EST assembly and clustering is well established, as evidenced by the number of data-processing pipelines, such as STACK (Christoffels et al., 2001) and XGI (http://www.ncgr.org/xgi), and database resources, such as UniGene (Wheeler et al., 2003) and the TIGR Gene Indices (Quackenbush et al., 2001), that have been developed for these data types. Mining these datasets is an important component of gene discovery and expression profiling. However, their typically large size makes it difficult to develop compact displays that both provide an overview and support focused queries to identify expressed genes associated with particular tissues or experimental conditions.


    DESCRIPTION
ESTminer was originally developed as a component (http://cggc.agtec.uga.edu/estMiner/estMiner.jsp) of the CGGC (http://cggc.agtec.uga.edu/) sorghum resource, to provide user-friendly querying and visualization of the large volume of EST data on that website. The downloadable interface gives users the same access to their own EST contig, cluster and ‘unigene’ datasets stored in a MySQL or Oracle relational database management system (RDBMS). The installation package includes schema creation scripts and sample data.

Views of the interface components are shown in Figure 1. A query interface (Fig. 1A) allows the selection of contigs/clusters with a specific library makeup or a threshold number of members. The interface also allows nested queries, in which the result of one query is further filtered by a subsequent query, further enhancing its ‘drill-down’ capabilities. In addition, contigs and clusters can be selected from an alphanumerically ordered list (Fig. 1C and D) or by name or GenBank accession ID (Fig. 1E). Query results are displayed in an expandable tree (Fig. 1B) in which each contig or cluster is represented by a dynamic color-coded bar graph showing the relative number of members from each of its cDNA library components. The nodes are expandable, revealing library statistics, sequence data and GenBank records, as well as expandable subnodes that correspond to EST members (for contigs) or to singleton ESTs and contig members (for similarity-based clusters).
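The ‘drill-down’ behaviour of nested queries can be illustrated with a minimal sketch. It makes two assumptions. First, a hypothetical in-memory `Contig` record holds per-library EST counts. Second, a hypothetical `QuerySession` keeps the current result set, so that each new query is applied only to what the previous one returned. ESTminer itself runs its queries as SQL against the RDBMS, and the library names below are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Contig:
    """Hypothetical contig record: a name plus EST member counts per cDNA library."""
    name: str
    library_counts: Dict[str, int] = field(default_factory=dict)

    @property
    def size(self) -> int:
        # Total number of EST members across all libraries.
        return sum(self.library_counts.values())


# A query is any predicate over a contig.
Query = Callable[[Contig], bool]


class QuerySession:
    """Holds the current result set so each query refines the previous one."""

    def __init__(self, contigs: List[Contig]):
        self.results = list(contigs)

    def refine(self, query: Query) -> List[Contig]:
        # 'Query within a query': filter only the current results.
        self.results = [c for c in self.results if query(c)]
        return self.results


def min_members(n: int) -> Query:
    return lambda c: c.size >= n


def contains_library(lib: str) -> Query:
    return lambda c: c.library_counts.get(lib, 0) > 0


contigs = [
    Contig("C1", {"leaf": 5, "root": 2}),
    Contig("C2", {"root": 1}),
    Contig("C3", {"leaf": 1, "flower": 4}),
]
session = QuerySession(contigs)
session.refine(min_members(3))           # keeps C1 (7 members) and C3 (5 members)
session.refine(contains_library("root"))
print([c.name for c in session.results])  # ['C1']
```

Keeping the result set in the session means each refinement is applied to a shrinking set, not to the whole dataset.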



Fig. 1 ESTminer browse, query, search and display interface components. The query interface (A) allows users to filter the result sets based on library composition and contig or cluster size. Contig and cluster library makeup and statistics are displayed in color-coded graphical and text form (B). The browse interface (C and D) allows users to select specific contigs or clusters from an alphanumerically ordered list. The search interface (E) allows users to make selections based on name or GenBank accession ID.
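The bar graph in each tree node encodes the fraction of a contig's or cluster's members that come from each cDNA library. A minimal sketch of that computation follows. The palette, the pixel width and the `bar_segments` helper are hypothetical and are not taken from the ESTminer code. The sketch rounds cumulative boundaries rather than each segment on its own, which keeps the segments summing to exactly the bar width.

```python
from typing import Dict, List, Tuple

# Hypothetical palette: one fixed color per library position, so a given
# library is drawn in the same color in every tree node.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def bar_segments(counts: Dict[str, int], libraries: List[str],
                 width: int = 100) -> List[Tuple[str, str, int]]:
    """Return (library, color, pixel_width) for each library with members.

    `libraries` is the global library order; counts for libraries not in
    that list are ignored.
    """
    total = sum(counts.get(lib, 0) for lib in libraries)
    if total == 0:
        return []
    segments = []
    cumulative = 0
    prev_edge = 0
    for i, lib in enumerate(libraries):
        n = counts.get(lib, 0)
        if n == 0:
            continue  # library absent from this contig: no segment drawn
        cumulative += n
        # Round the cumulative boundary so the segments sum to `width`.
        edge = round(cumulative * width / total)
        segments.append((lib, PALETTE[i % len(PALETTE)], edge - prev_edge))
        prev_edge = edge
    return segments


libraries = ["leaf", "root", "flower"]
segments = bar_segments({"leaf": 5, "root": 2}, libraries, width=70)
print([(lib, px) for lib, _, px in segments])  # [('leaf', 50), ('root', 20)]
```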

 
In the CGGC environment, the interface allows users to search for candidate sorghum genes associated with environmental conditions (e.g. biotic and abiotic stresses), species, tissues and developmental stages. The query interface (Fig. 1A) allows users to filter the results by the presence or absence of any combination of cDNA libraries and by ranges of contig or cluster size. In addition, clustering parameters such as alignment length or percentage identity thresholds for BLAST-based clustering can be selected to suit the needs of an individual study. The downloadable version provides the same flexibility and is compatible with datasets produced by multiple clustering algorithms or methodologies.
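The presence/absence and size-range filters described above amount to a simple predicate on a contig's or cluster's per-library member counts. The sketch below is a hypothetical illustration, not ESTminer's implementation, which expresses these conditions in SQL. The `matches` function and the library names (`drought`, `control`, `cold`) are invented.

```python
from typing import Dict, Iterable, Optional


def matches(counts: Dict[str, int],
            present: Iterable[str] = (),
            absent: Iterable[str] = (),
            min_size: int = 1,
            max_size: Optional[int] = None) -> bool:
    """True if a contig/cluster contains every library in `present`,
    none of the libraries in `absent`, and its member count lies in
    [min_size, max_size] (no upper bound when max_size is None)."""
    size = sum(counts.values())
    if size < min_size or (max_size is not None and size > max_size):
        return False
    if any(counts.get(lib, 0) == 0 for lib in present):
        return False
    return all(counts.get(lib, 0) == 0 for lib in absent)


# Hypothetical example: clusters seen under drought stress but not in the
# control library, with at least two members.
clusters = {
    "CL1": {"drought": 6, "cold": 1},
    "CL2": {"drought": 3, "control": 2},
    "CL3": {"drought": 1},
}
hits = sorted(name for name, counts in clusters.items()
              if matches(counts, present=["drought"], absent=["control"], min_size=2))
print(hits)  # ['CL1']
```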

The ESTminer application was developed for a multi-tier Internet architecture and can be deployed on platforms compatible with the Apache/Tomcat Web/application server and either a MySQL or an Oracle RDBMS. So far, we have tested it successfully on Windows and Linux. The project was developed with JBuilder 7 (Borland) and follows an object-oriented MVC (model-view-controller) design: JSPs and servlets generate the front-end interface components, such as the color-coded bar-graph tree nodes, and Java classes handle all non-database computing functions. All SQL queries are encapsulated in two Java classes, so that the application can be adapted easily to changes in the database schema or RDBMS. The database schema contains tables for cDNA library, EST sequence, contig and cluster data; the Oracle version of the schema uses table partitioning and materialized views to improve performance on large datasets.
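Here is a minimal sketch of the idea of keeping all SQL in a dedicated class. It uses Python's built-in `sqlite3` as a stand-in for MySQL or Oracle. The three-table schema, the `ContigQueries` class and the sample rows are assumptions for illustration only. The real schema is defined by the creation scripts in the installation package, and ESTminer spreads its SQL over two Java classes, not one.

```python
import sqlite3
from typing import Dict, List

# Hypothetical minimal schema (stand-in for the distributed MySQL/Oracle scripts).
SCHEMA = """
CREATE TABLE library (library_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE contig  (contig_id  INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE est (
    est_id     INTEGER PRIMARY KEY,
    accession  TEXT NOT NULL UNIQUE,
    library_id INTEGER NOT NULL REFERENCES library(library_id),
    contig_id  INTEGER REFERENCES contig(contig_id)  -- NULL for singletons
);
"""


class ContigQueries:
    """Keeps every SQL statement in one place, so a schema or RDBMS
    change only requires editing this class."""

    MIN_MEMBERS = """
        SELECT c.name FROM contig c JOIN est e ON e.contig_id = c.contig_id
        GROUP BY c.contig_id, c.name
        HAVING COUNT(e.est_id) >= ?
        ORDER BY c.name
    """
    LIBRARY_MAKEUP = """
        SELECT l.name, COUNT(*) FROM est e
        JOIN library l ON l.library_id = e.library_id
        JOIN contig c ON c.contig_id = e.contig_id
        WHERE c.name = ?
        GROUP BY l.name ORDER BY l.name
    """

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def contigs_with_min_members(self, n: int) -> List[str]:
        return [row[0] for row in self.conn.execute(self.MIN_MEMBERS, (n,))]

    def library_makeup(self, contig_name: str) -> Dict[str, int]:
        return dict(self.conn.execute(self.LIBRARY_MAKEUP, (contig_name,)))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO library VALUES (?, ?)", [(1, "leaf"), (2, "root")])
conn.executemany("INSERT INTO contig VALUES (?, ?)", [(1, "Contig1"), (2, "Contig2")])
conn.executemany("INSERT INTO est VALUES (?, ?, ?, ?)", [
    (1, "EST001", 1, 1), (2, "EST002", 1, 1), (3, "EST003", 2, 1),
    (4, "EST004", 2, 2), (5, "EST005", 1, None),  # EST005 is a singleton
])
q = ContigQueries(conn)
print(q.contigs_with_min_members(2))  # ['Contig1']
print(q.library_makeup("Contig1"))    # {'leaf': 2, 'root': 1}
```

Because callers only see methods such as `contigs_with_min_members`, porting to another RDBMS means rewriting the SQL strings and leaving the rest of the application unchanged.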


    FUTURE PLANS
At the time of writing, we had added ‘fuzzy search’ capabilities to the name-based lookup form (Fig. 1E), a Perl script loader for populating the MySQL schema from a combination of file formats, and popup help tips to supplement the existing documentation. These will be incorporated in upcoming versions of the installation package. We are planning to develop a version compatible with the GMOD Chado schema (http://www.gmod.org/), which will be made available as a separate installation package. We plan to leverage GMOD's developing schema standards to facilitate more seamless data exchange with other databases and integration with related GMOD tools. We are also considering developing an alignment viewer for EST contigs, a feature not currently available in the interface package.
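The paper does not describe how the ‘fuzzy search’ matches names. One possible approach, explicitly a swapped-in technique, ranks identifiers by `difflib` similarity ratio, so that a mistyped contig name still finds its closest matches. The sketch below follows that approach. The `fuzzy_lookup` helper, its cutoff and the example names are hypothetical.

```python
import difflib
from typing import Dict, List


def fuzzy_lookup(query: str, names: List[str], n: int = 5,
                 cutoff: float = 0.6) -> List[str]:
    """Return up to `n` names most similar to `query`, best match first."""
    # Compare case-insensitively, but return each name's original spelling.
    by_lower: Dict[str, str] = {}
    for name in names:
        by_lower.setdefault(name.lower(), name)
    close = difflib.get_close_matches(query.lower(), list(by_lower),
                                      n=n, cutoff=cutoff)
    return [by_lower[m] for m in close]


names = ["Contig0001", "Contig0010", "Cluster0100"]
print(fuzzy_lookup("contg0001", names))  # ['Contig0001', 'Contig0010']
```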


    Acknowledgments
 
The authors wish to thank the collaborating laboratories for providing data, access to Web resources and advice. We are grateful to the National Science Foundation, the Georgia Research Alliance, the National Grain Sorghum Producers and the University of Georgia Research Foundation for financial support.

Received on June 25, 2004; revised on July 30, 2004; accepted on September 9, 2004

    REFERENCES

    Christoffels, A., van Gelder, A., Greyling, G., Miller, R., Hide, T., Hide, W. (2001) STACK: Sequence Tag Alignment and Consensus Knowledgebase. Nucleic Acids Res., 29, 234–238.

    Quackenbush, J., Cho, J., Lee, D., Liang, F., Holt, I., Karamycheva, S., Parvizi, B., Pertea, G., Sultana, R., White, J. (2001) The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res., 29, 159–164.

    Wheeler, D.L., Church, D.M., Federhen, S., Lash, A.E., Madden, T.L., Pontius, J.U., Schuler, G.D., Schriml, L.M., Sequeira, E., Tatusova, T.A., Wagner, L. (2003) Database resources of the National Center for Biotechnology. Nucleic Acids Res., 31, 28–33.

