Bioinformatics Advance Access originally published online on October 22, 2007
Bioinformatics 2007 23(23):3256-3257; doi:10.1093/bioinformatics/btm516
NAPROC-13: a database for the dereplication of natural product mixtures in bioassay-guided protocols


1Departamento de Química Farmacéutica, Facultad de Farmacia, Campus M. Unamuno, 37007 Salamanca and 2Departamento de Informática y Automática, Facultad de Ciencias, Universidad de Salamanca, Spain
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Although natural products represent a reservoir of molecular diversity, the process of isolating and identifying active compounds is a bottleneck in drug discovery programs. The rapid isolation and identification of the bioactive component(s) of natural product mixtures during the bioassay-guided fractionation have become crucial factors in the competition with chemical compound libraries and combinatorial synthetic efforts. In this respect, the use of spectral databases in identification processes is indispensable.
Results: We have developed a database containing 13C spectral information on over 6000 natural compounds, which allows for the fast identification of known compounds present in crude extracts and provides insight into the structural elucidation of unknown compounds.
Availability: http://c13.usal.es
Contact: theron@usal.es
| 1 INTRODUCTION |
|---|
Natural products have traditionally been a major source of drugs and continue to play a significant role in today's drug discovery environments (Butler, 2004). In fact, in some therapeutic areas, such as oncology, infections and immunomodulation, many of the currently available drugs are derived from natural products (Buss et al., 2004).
For drug discovery and development, natural products represent a reservoir of molecular diversity that may become a complementary resource to combinatorial libraries. Nevertheless, the process of isolating and identifying active compounds is at present a bottleneck in drug discovery programs. These practical difficulties can be overcome thanks to progress in separation technologies and in the speed and sensitivity of structure elucidation (Clarkson et al., 2006). The rapid identification of the bioactive component(s) of natural product mixtures in high-throughput screening programs has become an indispensable factor in competing effectively with chemical compound libraries and combinatorial synthetic methodologies. The isolation, identification and biological profiling of bioactive compounds will require the effective use of automated procedures and databases. To compare and identify known structures, a database can be queried by spectral data or by substructure, and 13C NMR spectroscopy is the most powerful tool for this task. Identifying previously characterized structures through a database search eliminates a tremendous amount of work, so that efforts can instead be redirected towards the characterization of novel compounds. In the area of natural products, where some hundred thousand compounds have been reported in the literature, most compounds are absent from commercially available spectral libraries.
SuperNatural is a public resource containing 3D structures, developed for searches of bioactive natural compounds (Dunkel et al., 2006). NAPROC-13 could serve as a complementary tool to SuperNatural, since it allows for the rapid identification of natural products in phytochemical studies. Once a compound from a plant extract has been identified by means of NAPROC-13, similarity searches in SuperNatural could be performed to predict its likely biological activities.
The existing open-source database on the web (NMRShiftDB; http://nmrshiftdb.ice.mpg.de) contains spectral information on natural compounds, but it also contains a significant number of synthetic compounds that lack drug-like properties. NAPROC-13 has the advantage of containing only natural products and a few related compounds. Unlike NMRShiftDB, NAPROC-13 uses Cartesian coordinates for its graphics, which improves structure representations and allows a better appreciation of the stereocenters in accordance with IUPAC recommendations (Fig. 1b). Given the stereospecificity of most biological targets, stereochemistry determines the biological properties of natural products and drugs. Another characteristic of NAPROC-13 is the homogeneity of the numbering system within each family of compounds, which enables the comparison of spectral data lists across a variety of related structures. For every family of compounds, NAPROC-13 uses the numbering system of the Dictionary of Natural Products (http://www.chemnetbase.com, Chapman & Hall/CRC Press). Finally, it contains collections of compounds and their associated spectroscopic data that are not available in other databases.
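To illustrate why a homogeneous numbering system matters, the following minimal sketch compares two 13C shift lists position by position. It assumes that each spectrum is held as a mapping from carbon number to chemical shift; the function names and all shift values are hypothetical and are not taken from NAPROC-13.

```python
def shift_differences(reference, other):
    """Per-carbon shift difference (other minus reference), in ppm,
    for the carbon positions that both compounds share."""
    shared = sorted(set(reference) & set(other))
    return {pos: round(other[pos] - reference[pos], 1) for pos in shared}


def changed_positions(reference, other, threshold=2.0):
    """Carbon positions whose shift differs by at least `threshold` ppm."""
    diffs = shift_differences(reference, other)
    return [pos for pos, delta in diffs.items() if abs(delta) >= threshold]


# Invented values for two related structures numbered in the same way.
parent = {1: 38.5, 2: 27.3, 3: 78.9, 4: 39.0}
derivative = {1: 38.2, 2: 34.1, 3: 217.5, 4: 47.4, 5: 21.0}

print(shift_differences(parent, derivative))
print(changed_positions(parent, derivative))
```

Because both lists use the same carbon numbers, the positions with large differences point directly at the part of the skeleton where the two structures diverge. Without a shared numbering scheme, the peaks would first have to be assigned to equivalent carbons by hand.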
| 2 METHODS |
|---|
To enhance the drug discovery process for natural products, we have developed NAPROC-13, a tool for complex chemical problems such as structure elucidation, which require the joint efforts of information science and chemistry experts. NAPROC-13 significantly enhances the search for spectroscopic information on the web by integrating structure-based and numerical chemical shift searches. The basic database scheme is relationally organized, and the molecular structures are defined and stored in the database in SMILES code (Weininger et al., 1989). This structural specification format, which uses a one-line notation, is designed for sharing chemical structure information over the internet. Substructure searches are performed with SMARTS code, a variation of the SMILES code. The 13C NMR spectral data, in the form of a numerical list of chemical shifts and carbon multiplicities, are always associated with each compound structure. An applet, JME (Ertl et al., 1997), is used to convert these notations into a graph representing a structure that can be interpreted by organic chemists. The database contains more than 6000 natural and related compounds, mainly terpenoids (triterpenoids, diterpenoids, etc.). At present, other families, such as alkaloids, are also being added. The largest number of heavy atoms in a compound in the database is 99, and its molecular formula is C66H106O33. More than 2000 compounds contain 30 or more carbon atoms, and another 2000 compounds contain 20 or more carbon atoms. The structures and spectral data collected in the database are mainly compiled from papers published in recent issues of the following research journals: Journal of Natural Products, Phytochemistry, Planta Medica, Chemical & Pharmaceutical Bulletin, Chemistry of Natural Compounds, Helvetica Chimica Acta and Magnetic Resonance in Chemistry.
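As a rough illustration of the relational organization described above, the sketch below stores each structure as a SMILES string alongside its list of 13C shifts and multiplicities, using Python's built-in sqlite3 module. The table names, column names and the `compounds_with_shift` helper are assumptions made for this example and do not reproduce the actual NAPROC-13 schema. The two small molecules and their approximate shift values are placeholders, not database records.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not NAPROC-13's own.
SCHEMA = """
CREATE TABLE compound (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    family TEXT,
    smiles TEXT NOT NULL           -- one-line structure notation
);
CREATE TABLE carbon_shift (
    compound_id  INTEGER NOT NULL REFERENCES compound(id),
    atom_number  INTEGER NOT NULL, -- position in the family numbering system
    shift_ppm    REAL NOT NULL,    -- 13C chemical shift
    multiplicity TEXT NOT NULL CHECK (multiplicity IN ('s', 'd', 't', 'q'))
);
"""


def build_demo_db():
    """Create an in-memory database holding two placeholder compounds."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO compound VALUES (1, 'ethanol', 'demo', 'CCO')")
    conn.execute("INSERT INTO compound VALUES (2, 'acetone', 'demo', 'CC(=O)C')")
    # Approximate shift values, for illustration only.
    conn.executemany(
        "INSERT INTO carbon_shift VALUES (?, ?, ?, ?)",
        [(1, 1, 58.0, "t"), (1, 2, 18.0, "q"),
         (2, 1, 30.5, "q"), (2, 2, 206.0, "s"), (2, 3, 30.5, "q")],
    )
    return conn


def compounds_with_shift(conn, ppm, multiplicity, tolerance=1.0):
    """Names of compounds with a carbon of the given multiplicity
    whose shift lies within `tolerance` ppm of `ppm`."""
    rows = conn.execute(
        """SELECT DISTINCT c.name FROM compound c
           JOIN carbon_shift s ON s.compound_id = c.id
           WHERE s.multiplicity = ? AND s.shift_ppm BETWEEN ? AND ?
           ORDER BY c.name""",
        (multiplicity, ppm - tolerance, ppm + tolerance),
    )
    return [name for (name,) in rows]


conn = build_demo_db()
print(compounds_with_shift(conn, 206.0, "s"))
```

Keeping the shifts in their own table, one row per carbon, is what allows a numerical shift query to be combined with a structure query through an ordinary join. Substructure matching with SMARTS is not shown, since it requires a cheminformatics toolkit.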
NAPROC-13 allows for flexible searches by chemical substructure and by spectral features, i.e. chemical shifts and multiplicities. Searches by trivial and semi-systematic names, molecular formulas, families, types and groups of compounds according to the standard classification of natural compounds are also provided through a pull-down list system. An important search type looks for hot-spots of the molecule, i.e. the chemical shifts of connected carbons, which can be deduced from the interpretation of 2D NMR experiments such as HMQC, HMBC, ROESY, etc.
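The two spectral search types described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the NAPROC-13 implementation: the library layout, the function names, the 0.5 ppm default tolerance and the greedy peak pairing are all choices made for this example, and the three entries contain invented shift values.

```python
# Hypothetical mini-library: carbon number -> (shift in ppm, multiplicity),
# plus the pairs of carbon numbers that are bonded to each other.
LIBRARY = {
    "hypothetical_A": {
        "peaks": {1: (38.5, "t"), 2: (27.3, "t"), 3: (78.9, "d"), 4: (39.0, "s")},
        "bonds": [(1, 2), (2, 3), (3, 4)],
    },
    "hypothetical_B": {
        "peaks": {1: (38.7, "t"), 2: (34.1, "t"), 3: (217.5, "s"), 4: (47.4, "s")},
        "bonds": [(1, 2), (2, 3), (3, 4)],
    },
    "hypothetical_C": {
        "peaks": {1: (38.4, "t"), 2: (27.0, "t"), 3: (79.2, "d"), 4: (124.8, "d")},
        "bonds": [(1, 2), (2, 3)],
    },
}


def peak_matches(peak, target, tolerance):
    """Same multiplicity and shift within `tolerance` ppm."""
    return peak[1] == target[1] and abs(peak[0] - target[0]) <= tolerance


def matches_all(peaks, query, tolerance):
    """Greedily pair every query peak with a distinct library peak."""
    unused = list(peaks)
    for target in query:
        hits = [p for p in unused if peak_matches(p, target, tolerance)]
        if not hits:
            return False
        unused.remove(min(hits, key=lambda p: abs(p[0] - target[0])))
    return True


def search_by_shifts(library, query, tolerance=0.5):
    """Compounds whose spectrum contains all of the query peaks."""
    return sorted(name for name, entry in library.items()
                  if matches_all(entry["peaks"].values(), query, tolerance))


def search_by_connected_pair(library, first, second, tolerance=0.5):
    """Compounds in which two bonded carbons match `first` and `second`."""
    def bonded_hit(entry):
        peaks = entry["peaks"]
        return any(
            peak_matches(peaks[x], first, tolerance)
            and peak_matches(peaks[y], second, tolerance)
            for a, b in entry["bonds"] for x, y in ((a, b), (b, a))
        )
    return sorted(name for name, entry in library.items() if bonded_hit(entry))


# Each additional shift in the query narrows the list of candidates.
print(search_by_shifts(LIBRARY, [(38.5, "t")]))
print(search_by_shifts(LIBRARY, [(38.5, "t"), (79.0, "d")]))
print(search_by_shifts(LIBRARY, [(38.5, "t"), (79.0, "d"), (39.0, "s")]))
print(search_by_connected_pair(LIBRARY, (27.0, "t"), (79.0, "d")))
```

The connected-pair search is more selective than a plain shift search because the two peaks must belong to bonded carbons, which is the kind of constraint that 2D correlation experiments provide. A production search would replace the greedy pairing with a proper assignment step, since a greedy choice can miss a valid pairing when several peaks lie close together.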
| 3 CASE STUDY |
|---|
A phytochemical study of Caesalpinia bonduc (L.) Roxb. was described in the September 2007 issue of the Journal of Natural Products (Pudhom et al., 2007). This species is used as a medicinal plant in various regions of the tropics. In fact, metabolites of the same family isolated from this plant show antiviral, antimalarial, antibacterial and antioxidant activities. In that article, the isolation of 3 new natural products together with 13 known diterpenoids is described. 13C NMR data for 11 of the 13 known compounds are available in NAPROC-13 (only -caesalpin and -caesalpin are absent). All of them could be easily and rapidly identified as members of a group of diterpenoids, called Vouacapanes, by using the NAPROC-13 database. A subsequent search by type of compound, Vouacapanes, allowed us to find spectroscopic data for 78 compounds and to visualize their chemical shifts graphically over the structure. Had the authors used an iterative search by chemical shifts in NAPROC-13, they could have rapidly elucidated the structures of the three new compounds. Indeed, as can be seen in Figure 1a, a search using only 4 chemical shifts of the 23 signals in the spectrum of compound 1, Bonducellpin E, allowed us to find 6 compounds (Fig. 1b), all of which belong to the Vouacapanes group. The 11 previously described compounds belong to the same group. These findings indicate that searches carried out with NAPROC-13 are highly efficient and selective. In addition, the results derived from the analysis of the HMBC correlations published in the same paper could be studied by the group searches also implemented in NAPROC-13.
| ACKNOWLEDGEMENTS |
|---|
Financial support came from the Ministerio de Educación y Ciencia, project TIN2006-06313, and the Junta de Castilla y León, projects SAO30A06 and US21/06. The authors thank Dr Peter Ertl for kindly consenting to the non-profit use of JME.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Jonathan Wren
The authors wish it to be known that in their opinion, the first two authors should be regarded as joint First Authors.
Received on August 5, 2007; revised on October 2, 2007; accepted on October 8, 2007
| REFERENCES |
|---|
Buss DB, Butler MS. A new model for utilising chemical diversity from natural sources. Drug Dev. Res. (2004) 62:362–370.
Butler MS. The role of natural product chemistry in drug discovery. J. Nat. Prod. (2004) 67:2141–2153.
Clarkson C, et al. Discovering new natural products directly from crude extracts by HPLC-SPE-NMR: chinane diterpenes in Harpagophytum procumbens. J. Nat. Prod. (2006) 69:527–530.
Dunkel M, et al. SuperNatural: a searchable database of available natural compounds. Nucleic Acids Res. (2006) 34:D678–D683.
Ertl P, Jacob O. WWW-based chemical information system. Theochem (1997) 419:113–120.
Pudhom K, et al. Cassane Furanoditerpenoids from the Seed Kernels of Caesalpinia bonduc from Thailand. J. Nat. Prod. (2007) 70:1542–1544.
Weininger D, et al. Smiles. 2. Algorithm for generation of unique SMILES. J. Chem. Inf. Comput. Sci. (1989) 29:97–101.
