Bioinformatics Advance Access originally published online on November 25, 2004
Bioinformatics 2005 21(8):1717-1718; doi:10.1093/bioinformatics/bti152
A correction to this article has been published.

© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

The carbohydrate sequence markup language (CabosML): an XML description of carbohydrate structures

Norihiro Kikuchi 1,*, Akihiko Kameyama 2, Shuuichi Nakaya 2,3, Hiromi Ito 2, Takashi Sato 2, Toshihide Shikanai 1,2, Yoriko Takahashi 1 and Hisashi Narimatsu 2

1Mitsui Knowledge Industry Co., Ltd., Nakano-ku, Tokyo, Japan
2Glycogene Function Team, Research Center for Glycoscience, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
3Shimadzu Corporation, Nakagyo-ku, Kyoto, Japan

*To whom correspondence should be addressed.


    Abstract

Summary: Bioinformatics resources for glycomics are very poor compared with those for genomics and proteomics. The complexity of carbohydrate sequences makes it difficult to define a common language to represent them, and the development of bioinformatics tools for glycomics has not progressed. In this study, we developed a carbohydrate sequence markup language (CabosML), an XML description of carbohydrate structures.

Availability: The language definition (XML Schema) and an experimental database of carbohydrate structures using an XML database management system are available at http://www.phoenix.hydra.mki.co.jp/CabosDemo.html

Contact: kikuchi{at}hydra.mki.co.jp

Research on post-translational modification has become one of the most important areas of the post-genomic era. Glycosylation is recognized as the most common post-translational modification of proteins in vivo, and more than half of all proteins are predicted to be glycosylated (Apweiler et al., 1999).

Bioinformatics resources for glycomics are very poor compared with those for genomics and proteomics, which are widely available to molecular biologists. Carbohydrate sequences are more complex than DNA and protein sequences because of their structural variety, including regio- and stereochemistry, modification and branching. This complexity makes it difficult to define a common language to represent them. Carbohydrate sequence data can also be difficult to parse, and the development of bioinformatics tools for glycomics has not progressed. Several languages for describing carbohydrate structures have been developed, namely LINUCS in SWEET-DB (Bohne-Lang et al., 2001), KCF in KEGG GLYCAN (Kanehisa et al., 2004) and LinearCode in the Glycomics database (http://www.glycominds.com), but no standard has been adopted to date.

Extensible Markup Language (XML) has been promoted as the method of standardization for distributed computing and web applications, and has been applied in the distribution of biological data. In this study, we developed a carbohydrate sequence markup language (CabosML), an XML description of carbohydrate structures for exchanging and storing carbohydrate sequences. CabosML accomplishes five things:

  1. The description of the types of monosaccharides.
  2. The description of the types of linkages and anomers.
  3. The description of oligosaccharides containing branching.
  4. The description of modifications.
  5. The description of repeating and cyclic structures.

CabosML defines the connection between adjacent monosaccharides as parent–child relationships in an XML document, in which a parent element represents the monosaccharide on the reducing-end side and a child element represents the monosaccharide on the non-reducing-end side. A simple example describing the carbohydrate structure of the N-glycan core, Man3GlcNAc2, is the following:

<?xml version="1.0" encoding="UTF-8"?>
<g:Glyco xmlns:g="http://bio.mki.co.jp/glycoinformatics/2003">
  <g:Carb_ID/>
  <g:Carb_structure>
    <g:MS name="GlcNAc">
      <g:MS link="1-4" anom="b" name="GlcNAc">
        <g:MS link="1-4" anom="b" name="Man">
          <g:MS link="1-3" anom="a" name="Man"/>
          <g:MS link="1-6" anom="a" name="Man"/>
        </g:MS>
      </g:MS>
    </g:MS>
  </g:Carb_structure>
</g:Glyco>

where the element ‘g:MS’ represents a monosaccharide. The types of monosaccharides, anomers and linkages are represented by the attributes ‘name’, ‘anom’ and ‘link’, respectively. Modifications are defined by the element ‘g:MOD’, whose attributes ‘name’ and ‘pos’ give the type of the modification and its position on the monosaccharide. For example, GlcNAc with sulfate (S) at position 4 is represented by the following:

<g:MS name="GlcNAc">
  <g:MOD name="S" pos="4"/>
</g:MS>
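
The nesting described above can be walked recursively, starting at the reducing end. The following is a minimal Python sketch, assuming the standard-library xml.etree.ElementTree parser and the namespace URI used in the examples of this note. The names walk and residues are illustrative helpers and are not part of CabosML.

```python
import xml.etree.ElementTree as ET

# Namespace URI as written in the examples of this note; the prefix is arbitrary.
NS = {"g": "http://bio.mki.co.jp/glycoinformatics/2003"}

# The N-glycan core example, given as bytes so the parser handles the encoding.
CORE = b"""<?xml version="1.0" encoding="UTF-8"?>
<g:Glyco xmlns:g="http://bio.mki.co.jp/glycoinformatics/2003">
  <g:Carb_ID/>
  <g:Carb_structure>
    <g:MS name="GlcNAc">
      <g:MS link="1-4" anom="b" name="GlcNAc">
        <g:MS link="1-4" anom="b" name="Man">
          <g:MS link="1-3" anom="a" name="Man"/>
          <g:MS link="1-6" anom="a" name="Man"/>
        </g:MS>
      </g:MS>
    </g:MS>
  </g:Carb_structure>
</g:Glyco>"""


def walk(ms, depth=0):
    """Yield (depth, name, anomer, linkage, modifications) for a residue
    and then for each residue attached to it (non-reducing-end side)."""
    mods = [(m.get("name"), m.get("pos")) for m in ms.findall("g:MOD", NS)]
    yield depth, ms.get("name"), ms.get("anom"), ms.get("link"), mods
    for child in ms.findall("g:MS", NS):
        yield from walk(child, depth + 1)


def residues(xml_bytes):
    """Parse a CabosML document and list its residues in document order."""
    root = ET.fromstring(xml_bytes)
    top = root.find("g:Carb_structure/g:MS", NS)  # reducing-end residue
    return list(walk(top))


if __name__ == "__main__":
    for depth, name, anom, link, mods in residues(CORE):
        label = name if link is None else f"{name} ({anom}{link})"
        print("  " * depth + label, mods if mods else "")
```

The reducing-end residue carries no ‘link’ or ‘anom’ attribute, so both are returned as None for it; modifications are collected per residue as (name, position) pairs.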

In addition to modifications, some carbohydrates contain cyclic or repeating structures. The elements ‘g:COS’ and ‘g:ROS’ represent cyclic and repeating structures, respectively. The number of repeats is expressed by the attribute ‘num’. ‘g:Carb_ID’ is an element that represents the identifier of a carbohydrate; it is used to identify carbohydrates stored in the database.

We also developed graphical user interfaces for editing and searching carbohydrate structures using CabosML, by which the depicted structures can be automatically converted into XML format. Figure 1 shows the editing of the structure of 6-sulfo sialyl Lewis X, whose XML description is the following:



Fig. 1. An editor of carbohydrate structure.

<?xml version="1.0" encoding="UTF-8"?>
<g:Glyco xmlns:g="http://bio.mki.co.jp/glycoinformatics/2003">
  <g:Carb_ID/>
  <g:Carb_structure>
    <g:MS name="GlcNAc">
      <g:MOD pos="6" name="S"/>
      <g:MS link="1-3" anom="a" name="Fuc"/>
      <g:MS link="1-4" anom="b" name="Gal">
        <g:MS link="2-3" anom="a" name="Neu5Ac"/>
      </g:MS>
    </g:MS>
  </g:Carb_structure>
</g:Glyco>
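
How the editor converts a drawn structure into XML is not described in this note. As an illustration only, the following Python sketch serializes a nested dictionary into the same element layout using the standard-library xml.etree.ElementTree module. The dictionary keys ("mods", "children") and the helper names q, add_residue and to_cabosml are assumptions made for this example, not part of CabosML or of the authors' software.

```python
import xml.etree.ElementTree as ET

# Namespace URI as written in the examples of this note.
URI = "http://bio.mki.co.jp/glycoinformatics/2003"
ET.register_namespace("g", URI)  # serialize with the "g:" prefix


def q(tag):
    """Return the namespace-qualified tag name ElementTree expects."""
    return f"{{{URI}}}{tag}"


def add_residue(parent, residue):
    """Append one g:MS element (with its g:MOD and child g:MS elements)."""
    attrs = {k: residue[k] for k in ("link", "anom", "name") if k in residue}
    ms = ET.SubElement(parent, q("MS"), attrs)
    for name, pos in residue.get("mods", []):
        ET.SubElement(ms, q("MOD"), {"pos": str(pos), "name": name})
    for child in residue.get("children", []):
        add_residue(ms, child)
    return ms


def to_cabosml(residue):
    """Wrap a residue tree in g:Glyco / g:Carb_structure and return XML text."""
    root = ET.Element(q("Glyco"))
    ET.SubElement(root, q("Carb_ID"))  # left empty, as in the examples
    structure = ET.SubElement(root, q("Carb_structure"))
    add_residue(structure, residue)
    return ET.tostring(root, encoding="unicode")


# 6-sulfo sialyl Lewis X, written from the reducing end outward.
SIALYL_LEWIS_X_6S = {
    "name": "GlcNAc",
    "mods": [("S", 6)],
    "children": [
        {"link": "1-3", "anom": "a", "name": "Fuc"},
        {"link": "1-4", "anom": "b", "name": "Gal",
         "children": [{"link": "2-3", "anom": "a", "name": "Neu5Ac"}]},
    ],
}

xml_text = to_cabosml(SIALYL_LEWIS_X_6S)

if __name__ == "__main__":
    print(xml_text)
```

The output is a single line without the XML declaration or indentation, but it parses to the same element tree as the listing above.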

Unlike other carbohydrate descriptions such as LINUCS and LinearCode, CabosML will make progress possible in the development of bioinformatics tools for glycobiology because of the simplicity, flexibility and versatility of the XML format. We have already developed an experimental database of carbohydrate structures using an XML database management system, and implemented a function for searching carbohydrate structures using XPath. In addition, many software packages and libraries for processing XML documents already exist for several programming languages. These should help developers build bioinformatics tools that use CabosML. Thus CabosML will facilitate the development of software for glycomics, and so will be the most suitable language to standardize. Moreover, XML descriptions of carbohydrate sequences will be easy to integrate with other bioinformatics resources, because XML has been used for the standardization and distribution of many kinds of biological data, e.g. MIAME (http://www.mged.org/Workgroups/MIAME/miame.html) and EMBL (Wang et al., 2002). The XML description for carbohydrates described here will greatly contribute to the progress of informatics tools for glycomics.
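
The queries used in the experimental database are not given in this note. As a hedged illustration of XPath-style substructure search, the following Python sketch runs three path expressions against the 6-sulfo sialyl Lewis X document. It assumes the standard-library xml.etree.ElementTree module, which supports only a limited subset of XPath; a full XPath engine or an XML database would accept richer expressions.

```python
import xml.etree.ElementTree as ET

# Namespace URI as written in the examples of this note.
NS = {"g": "http://bio.mki.co.jp/glycoinformatics/2003"}

# 6-sulfo sialyl Lewis X, as described in this note.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<g:Glyco xmlns:g="http://bio.mki.co.jp/glycoinformatics/2003">
  <g:Carb_ID/>
  <g:Carb_structure>
    <g:MS name="GlcNAc">
      <g:MOD pos="6" name="S"/>
      <g:MS link="1-3" anom="a" name="Fuc"/>
      <g:MS link="1-4" anom="b" name="Gal">
        <g:MS link="2-3" anom="a" name="Neu5Ac"/>
      </g:MS>
    </g:MS>
  </g:Carb_structure>
</g:Glyco>"""

root = ET.fromstring(DOC)

# Neu5Ac residues attached directly to a Gal residue (a disaccharide motif).
sia_on_gal = root.findall(".//g:MS[@name='Gal']/g:MS[@name='Neu5Ac']", NS)

# Sulfate modifications anywhere in the structure.
sulfates = root.findall(".//g:MOD[@name='S']", NS)

# Names of all residues attached through an alpha anomeric linkage.
alpha_names = [ms.get("name") for ms in root.findall(".//g:MS[@anom='a']", NS)]

if __name__ == "__main__":
    print([ms.get("link") for ms in sia_on_gal])
    print([m.get("pos") for m in sulfates])
    print(alpha_names)
```

Because linkage is encoded as parent–child nesting, a substructure query reduces to a path of ‘g:MS’ steps with attribute predicates, which is what makes generic XML tooling applicable here.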


    ABBREVIATIONS
The abbreviations used are: Gal, galactose; Man, mannose; GlcNAc, N-acetylglucosamine; Fuc, fucose; Neu5Ac, N-acetylneuraminic acid.

Received on July 23, 2004; revised on October 11, 2004; accepted on November 12, 2004

    REFERENCES

    Apweiler, R., Hermjakob, H., Sharon, N. (1999) On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim. Biophys. Acta, 1473, 4–8.

    Bohne-Lang, A., Lang, E., Forster, T., von der Lieth, C.W. (2001) LINUCS: linear notation for unique description of carbohydrate sequences. Carbohydr. Res., 336, 1–11.

    Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y., Hattori, M. (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res., 32, D277–D280.

    Wang, L., Riethoven, J.J., Robinson, A. (2002) XEMBL: distributing EMBL data in XML format. Bioinformatics, 18, 1147–1148.



