Bioinformatics 2005 21(17):3459-3460; doi:10.1093/bioinformatics/bti591

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

A STATUS REPORT ON MAGE

Paul Spellman

Lawrence Berkeley National Laboratory 1 Cyclotron Road, MS977-255A, Berkeley, CA 94720, USA

Recently, the Microarray Gene Expression Data Society (MGED, www.mged.org) was awarded funds from NHGRI/NIH to continue development of MAGE, the MGED Ontology (MO) and related technologies for communication and interpretation of microarray data. This award reflects NIH's substantial commitment of resources to microarray experimentation and the recognition that these data should be preserved. In this editorial I discuss MGED's goals, show how MGED's efforts have helped the microarray community and describe MGED's future efforts in the context of the recent award.

What is MGED Trying to Do?

MGED's efforts are 2-fold. First, it is committed to ensuring that microarray experiments are scientifically sound. Second, it is involved in building the infrastructure that will support experimental data from millions of microarrays. This second goal allows microarray data to be treated similarly to DNA sequences, which are repeatedly queried, reanalyzed and amalgamated. This is critical because the data are extremely valuable only if they are computationally interpretable. MGED's solutions to achieve these two goals are the introduction of scientific guidelines (MIAME), data communications standards (MAGE) and biological annotation systems (MO).

Concerning the development of infrastructure for data storage, query and analysis, perhaps it is better to start from what I think MGED is not trying to do. Explicitly, MGED's effort is not to build an interoperability and computational infrastructure for a few hundred or a few thousand experiments. This is the current size of the ArrayExpress and GEO data archives, which hold 700 and 2000 experiments, respectively. If a user requires information on the biology of one of these experiments, it is simply a matter of reading the paper that describes the experiment. Data standards for interpretation and communication are unnecessary in this case.

MGED's efforts are focused squarely on the future: a future where 100 million individual microarrays (or more) are archived and served to the research community. One hundred million publicly available microarrays is an enormous number, but I suspect it will be lower than the number that exists in 20 years, because as costs go down and experimental systems become more automated, microarrays will be increasingly used in biological research, just as sequencing technologies have been. Both ArrayExpress and GEO have doubled in size every six months, which would mean that 100 million microarrays would be available in 2010; assuming a more modest doubling every year, 100 million microarrays will be available by 2015. As a ballpark estimate, 100 million microarrays would account for 2% of public global medical research spending in the next 10 years. In practice, MGED's efforts will become useful well before this point, typically once more data are available than a user can easily access and understand without computer assistance. For some data consumers, data surfeit has already occurred; for some, it may be imminent; for others, it may be several years in the future. No matter what the current strengths of any one research group may be, at some point there will be too much published microarray data for anyone to accurately translate between formats and to hand-annotate by reading research papers.
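
The doubling arithmetic behind these two dates can be checked with a short sketch. It assumes a starting year of 2005 (the publication year of this editorial) and works backwards from the 100 million target; the function names and the derived baseline are illustrative only and are not figures reported by ArrayExpress or GEO.

```python
import math

TARGET_ARRAYS = 100_000_000  # the 100 million figure quoted in the text
START_YEAR = 2005            # assumption: projections start at publication


def implied_baseline(target, start_year, end_year, doubling_years):
    """Archive size at start_year implied by reaching target at end_year."""
    doublings = (end_year - start_year) / doubling_years
    return target / 2 ** doublings


def years_to_reach(target, baseline, doubling_years):
    """Years of steady doubling needed to grow from baseline to target."""
    return math.log2(target / baseline) * doubling_years


# Scenario 1: doubling every six months, target reached in 2010.
fast = implied_baseline(TARGET_ARRAYS, START_YEAR, 2010, 0.5)
# Scenario 2: doubling every year, target reached in 2015.
slow = implied_baseline(TARGET_ARRAYS, START_YEAR, 2015, 1.0)

print(f"implied 2005 baseline (6-month doubling): {fast:,.0f} arrays")
print(f"implied 2005 baseline (annual doubling):  {slow:,.0f} arrays")
print(f"years at annual doubling: {years_to_reach(TARGET_ARRAYS, slow, 1.0):.1f}")
```

Both scenarios involve ten doublings (a factor of 1024), so under these assumptions they imply the same starting point of roughly 100,000 archived arrays in 2005.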

What is a researcher who is interested in viewing or analyzing data from several hundred or several thousand of these experiments supposed to do without a common infrastructure? If each is in a different format with its own meaning, the advantage of public data sharing has been lost. Unlike DNA sequences, where nearly all of the information is encoded in the data stream itself, interpretation of microarray results depends on context: the experimental variables, design and parameters all have a profound impact on what information can be extracted from the data. The result is that sharing microarray data is as much a matter of explaining the biology of the system as it is describing a table of numbers.

How have MIAME, MAGE, and the MO Helped?

The MGED solution to this problem has been the development of a publication standard, a communication model and an annotation system for microarray experiments. MIAME specifies what information is necessary to understand a microarray experiment. MAGE and its associated technologies provide a computational model for describing a microarray experiment. MO provides standardized descriptions of biological and experimental properties to better facilitate interpretation and comparison. These three efforts, in particular, have enormous benefits for the scientific community:

  • MIAME, MAGE and MO provide a concrete framework for software developers to learn about the structure of microarray experiments—one that can evolve and adapt as our needs and the technology develop.
  • MIAME, MAGE and MO have stimulated discussion of what is biologically important, how it should be modeled and how to work together with others in a productive and efficient manner.
  • These efforts have accelerated the pace of software and database development by providing tools and standards for implementation.

What is MGED Doing Now?

The funding provided by NIGMS will allow MGED to further develop MIAME, MAGE and MO as standards and work to bring them into wider acceptance and use. First, we are extending MAGE–MO in three areas that are recognized to be limiting: the organization of MAGE components that limits its application to other technologies; incomplete incorporation of the MO; and limited ability to communicate analysis and interpretation beyond the data. Second, we will provide better maintenance of the MAGE software toolkit (MAGEstk), to ensure that the software is kept up-to-date, to respond to feature requests and to develop new technologies, such as the ability to efficiently accommodate very large datasets. Third, we are building the MO as an integrated extension of the MAGE–OM. Full details of our efforts can be found on our website www.mged.org and are routinely provided at our annual meetings.

Members of MGED have committed their efforts to ensure that the biomedical research community is able to take advantage of the results being generated by investigators utilizing microarrays. We realize that success in this endeavor is closely tied to the availability of software and tools that make these standards useful to the broader research community and to developing standards that are practical and acceptable to researchers generating the data. Our commitment during the coming years remains to involve both commercial and academic partners in developing and implementing these standards and to create open and accessible packages that facilitate biological inquiry. Our hope is that these efforts will facilitate the exponential growth in the public repositories that we expect to see in the coming years. More importantly, we believe that our successes and the lessons we learn will allow similar standards to be developed and implemented in a wide range of functional genomics, proteomics and metabolomics experimentation so that we, as a community, can effectively describe computable biological records of experimentation and enable wide use and analysis of the data.

