Bioinformatics 2005 21(11):2797-2802; doi:10.1093/bioinformatics/bti399
Erratum
| Design and implementation of a mosquito database through an entomological ontology |
|---|
Guillaume Koum, Augustin Yekel, Bengyella Ndifon and Frédéric Simard
The publisher wishes to apologize for accidentally publishing the wrong version of this paper. The correct version is published below.
Design and implementation of a mosquito database through an entomological ontology
1 Laborima, Ecole Polytechnique, B.P. 8390, Yaounde, Cameroon
2 Oceac, B.P. 288, Yaounde, Cameroon
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: The biology and behavior of the vector and parasite involved in the transmission of malaria change constantly. There is limited interest in developing new technologies and procedures for controlling the underlying factors of this threat, which poses an enormous challenge to health systems. Understanding the various vector species and their interrelations is of prime importance for understanding the transmission mechanisms of malaria and reacting efficiently. To attain this objective, we have used an ontological approach to produce a database, which we consider our contribution to controlling malarial vectors where eradication was unsuccessful in the previous control campaign.
Contact: g_koum{at}yahoo.fr
| 1 INTRODUCTION |
|---|
Fighting malaria (Biology of Plasmodium Parasites and Anopheles Mosquitoes, http://www-cro.msb.le.ac.uk/224/Bradley/Biology.html and MALARIA, http://www.who.int/inf-fs/en/fac094.html), especially in tropical regions, is a permanent preoccupation for the governments of the countries involved. According to World Health Organization (WHO) sources, the disease affects 40% of the world's population, mostly those living in the world's poorest countries, and causes >300 million cases of acute illness and at least one million deaths annually. The disease is the principal cause of morbidity in these regions: it accounts for 50% of hospital consultations and 23% of admissions and consumes >40% of the annual health budget of a family (Cameroon Tribune, April 28, 2003). It is estimated that the illness, transmitted by the female Anopheles mosquito, kills an African child every 30 s by infecting and destroying red blood cells (anaemia) and by clogging the capillaries that carry blood to the brain (cerebral malaria) or other vital organs. Children, pregnant women and their unborn children are particularly vulnerable to malaria, which is a major cause of prenatal mortality.
The purpose of this work is, first, to build an entomological ontology of mosquitoes (in Central Africa) and then to design and implement a mosquito database. This database will later be associated with a geographical information system (GIS) in order to localize all varieties of mosquitoes in the region and to answer queries such as: what types of mosquitoes occur in a given zone, what their characteristics are, what the density of mosquitoes is in a given zone, the density of the human population and of malarial patients, mortality estimates and so on.
This paper is organized as follows: we first describe the materials and methods (Section 2). After presenting database concepts, particularly building the database (Section 3), we discuss ontology (Section 4). In Section 5, we construct an ontology to create a mosquito database, as a basis of a universal relation. Section 6 presents the results obtained from this implementation. In the conclusion (Section 7), we indicate that the database will be associated with a GIS for multiple users.
| 2 MATERIALS AND METHODS |
|---|
2.1 Mosquito collection
Mosquitoes were collected from 19 locations in different ecological situations. Anopheline females were captured at night when they landed on the legs of volunteers for a blood meal, and were collected inside bedrooms in the afternoon using a pyrethrum spray. The Anophelines were identified using the morphological identification key of Gillies and deMeillon (1968). The specimens were stored individually at 20°C in a tube with desiccant for processing in the laboratory at Yaounde.
2.2 Cytogenetic identification
All the mosquitoes analysed by cytogenetic identification were indoor-resting half-gravid females, captured in the afternoon. The abdomens of these mosquitoes were fixed in Carnoy's solution (3 parts pure ethanol: 1 part glacial acetic acid). Chromosomal preparations were processed following the technique of Hunt (1973). Paracentric inversion karyotypes were scored according to the nomenclature of Coluzzi et al. (1979).
2.3 Infection rate of mosquitoes and blood meal identification
The head and thorax of most mosquitoes collected in Simbock were homogenized in blocking buffer before being tested by enzyme-linked immunosorbent assay (ELISA) for the circumsporozoite protein of Plasmodium falciparum, Plasmodium malariae and Plasmodium ovale as described by Burkot et al. (1981) and modified by Wirtz et al. (1987). The details have been given in a previous paper (Fontenille et al., 2001). The blood meal identification technique distinguished human, bovine, ovine (sheep and goat), horse, pig and chicken hosts.
2.4 Areas of study
Areas of study are described in detail in Table 1.
| 3 DATABASE CONCEPTS |
|---|
3.1 Generalities
Many authors have written on databases (Delobel and Adiba, 1982; Martin, 1976). We have retained the following concepts in this domain.
A database is a collection of stored operational data, held on disks, drums or other secondary storage media, used by the application systems of some particular organization. A set of application programs runs against these data, operating on them in all the usual ways (retrieving, updating, inserting and deleting). In addition, there may be a set of on-line users who interact with the database from remote terminals, performing the same functions of retrieving, updating, inserting and deleting, but mostly retrieving.
The database is integrated. This means that the database contains the data of many users, which in turn implies that any one user will be concerned with just a small portion of it. Moreover, the portions of different users will overlap in various ways, i.e. individual pieces of data may be shared by many different users.
Any enterprise or organization must necessarily maintain a great deal of data about its operation: for example, product data for a manufacturing company; account data for a bank; patient data for a hospital; mosquito data for medical researchers; planning data for the government.
A database must be organized in some format to be maintained by a database management system. There are several classical data models: relational, hierarchical or network, entity-relationship and other variants such as object-oriented models. At this stage, we can only say that relational ideas have had, and continue to have, a major impact on many areas of database technology. They form the basis for continuing investigations into such diverse fields as normalization and data semantics, support of natural language and more formal casual user interfaces, deductive inference, security, integrity and concurrency control, database performance and high-level hardware database support.
One of the most popular models is the relational model, which we use in this study. The presentation of this approach is given below.
3.2 The relational approach
The basic data construct of the relational approach is the relation (or table) itself. All information in the database is represented using just this one construct. Most of the research since 1970 into areas such as concurrency, locking, security, integrity, view definition and so on has taken the relational model as a starting point because it provides a clean conceptual base. The relational model also meets the requirements of symmetry and non-redundancy: the normalization discipline (fourth and fifth normal forms) guarantees that the same fact will not appear in two places.
Relations are also easy to manipulate. This is true at both the tuple-at-a-time and set-at-a-time levels; very high-level operators (those of the relational algebra and equivalent languages) are available, as well as the more familiar low-level operators. The number of distinct operators in any given language is small, because there is only one type of data construct to deal with; so we need just one operator for each of the four basic functions: retrieve, insert, delete and update. If we also consider the operators needed for authorization and integrity purposes (e.g. the SQL GRANT and ASSERT operators), we find again that a single set of operators is all that is necessary. Finally, relational languages generally provide what Codd calls symmetric exploitation: the ability to access a relation by specifying known values for any combination of its attributes and seeking the (unknown) values for its other attributes.
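The four basic functions and symmetric exploitation can be illustrated with a minimal sketch, assuming Python's standard sqlite3 module and a hypothetical single relation named mosquito. The table, its columns and the sample rows are invented for illustration and are not taken from the OCEAC data.

```python
import sqlite3

# Hypothetical relation; names and rows are illustrative assumptions only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mosquito (number INTEGER PRIMARY KEY, species TEXT, locality TEXT)"
)

# Insert: several tuples are loaded in one set-at-a-time call.
conn.executemany(
    "INSERT INTO mosquito VALUES (?, ?, ?)",
    [
        (1, "An. gambiae", "Simbock"),
        (2, "An. funestus", "Simbock"),
        (3, "An. gambiae", "Yaounde"),
    ],
)

# Update: change one attribute of the tuple with number 2.
conn.execute("UPDATE mosquito SET locality = 'Yaounde' WHERE number = 2")

# Delete: remove the tuple with number 3.
conn.execute("DELETE FROM mosquito WHERE number = 3")

# Retrieve (symmetric exploitation): the same relation is accessed through
# different known attributes, each time seeking the remaining ones.
by_species = conn.execute(
    "SELECT number, locality FROM mosquito WHERE species = ? ORDER BY number",
    ("An. gambiae",),
).fetchall()
by_locality = conn.execute(
    "SELECT number, species FROM mosquito WHERE locality = ? ORDER BY number",
    ("Yaounde",),
).fetchall()

print(by_species)
print(by_locality)
```

One statement type covers each basic function, and the two SELECT statements differ only in which attribute is treated as known.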
| 4 ONTOLOGY |
|---|
4.1 What is an ontology?
The ontology we are about to elaborate for the malarial vector (the various species of Anopheles mosquito found in Cameroon) is limited to the fields of the database of 40 000 mosquitoes found at OCEAC (Common Central Africa States Organization), Yaounde, Cameroon.
Ontology (Duineveld et al., 1999 http://sern.ucalgary.ca/KSI/KAW/KAW99/; Gruninger and Fox, 1995, http://sunsite.Informatik.Rwth-aachen.de/Publications/CEUR-WS/Vol-18/; Gruber, 1993, ftp.ksl.stanford.edu/pub/KSL_Reports/KSL-983-04.ps; Uschold et al., 1998) is the study concerned with what kinds of things exist: what entities or things there are in the universe. The view of ontology in computer science is somewhat narrower: an ontology is a working model of entities and interactions, either generic (e.g. the Cyc ontology) or in some particular domain of knowledge or practice, such as molecular biology or bioinformatics. The following definition has been given: an ontology may take a variety of forms, but it will necessarily include a vocabulary of terms and some specification of their meaning. This includes definitions and an indication of how concepts are inter-related, which collectively impose a structure on the domain and constrain the possible interpretations of terms. Gruber defines an ontology as the specification of conceptualizations, used to help programs and humans share knowledge.
4.2 Applications of bio-ontologies
A common ideal for ontology is that it should be re-usable. This ambition distinguishes ontology from a database schema, even though both are conceptualizations. For example, a database schema is intended to satisfy only one application, but an ontology could be re-used in many applications. However, an ontology is re-usable only when it is to be used for the same purpose for which it was developed. Not all ontologies have the same intended purpose and may have parts that are re-usable and other parts that are not. They will also vary in their coverage and level of detail. We can divide ontology use into three types:
- Domain-oriented, which is either domain specific (e.g. Escherichia coli) or domain generalized (e.g. gene function or ribosomes).
- Task-oriented, which is task specific (e.g. annotation analysis) or task generalized (e.g. problem solving).
- Generic, which captures common high-level concepts, such as physical, abstract, structure and substance. This can be especially useful when trying to re-use an ontology as it allows concepts to be correctly or more reliably placed. It can also be important when generating or analysing natural language expressions using an ontology. Generic ontologies are also known as upper ontologies, core ontologies or reference ontologies.
Ontologies are used in a wide range of application scenarios:
- A community reference: neutral authoring. The knowledge is authored in a single language and converted into a different form for use in multiple target systems. The benefits include knowledge re-use, improved maintainability and long-term knowledge retention.
- Either defining a database schema or defining a common vocabulary for database annotation: ontology as specification. Describing a mosquito entry by species name ensures that a common vocabulary is available for description, sharing and posing questions (Item 4 in this list). The benefits include documentation, maintenance, reliability, sharing and knowledge re-use.
- Providing common access to information. Information must be shared but is expressed using unfamiliar vocabulary. The ontology helps to render the information intelligible by providing a shared understanding of the terms or the mapping between the terms. The benefits include interoperability and more effective use and re-use of knowledge resources.
- Ontology-based search by forming queries over databases. An ontology is used for searching an information repository. For example, when searching databases for a species name, all and only those mosquitoes will be found, as only exact terms can be used for searching. Whether the user of the terms can be sure of their meaning depends on how the knowledge in the ontology has been represented. For example, is it made explicit using the genus name or the species name?
Queries can be refined by following relationships within the ontology, e.g. following relationships to find those processes in which certain functions of mosquitoes act and gathering the associated mosquitoes. Moving up and down the 'is a kind of' hierarchy within the ontology can also be used to refine queries, e.g. making genus name more specific as species name by moving down the hierarchy when the former gathered many answers. The benefits include more effective access and hence more effective use and re-use of knowledge resources.
- Understanding database annotation and technical literature. These ontologies are designed to support natural language processing (NLP) and not only link domain knowledge but also identify how knowledge is related to linguistic structure, such as grammar and lexicons.
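The ontology-based search scenario above, in which a query is refined by moving down the 'is a kind of' hierarchy, might be sketched as follows. This is only an illustration: the hierarchy, the annotated records and the function names are hypothetical and do not describe the implemented system.

```python
# Hypothetical 'is a kind of' hierarchy: each term maps to its parent term.
IS_A = {
    "Anopheles gambiae": "Anopheles",
    "Anopheles funestus": "Anopheles",
    "Anopheles": "mosquito",
    "mosquito": "insect",
}

# Hypothetical annotated records: (record id, ontology term).
RECORDS = [
    (1, "Anopheles gambiae"),
    (2, "Anopheles funestus"),
    (3, "Anopheles gambiae"),
]


def descendants(term):
    """Return the term together with every term below it in the hierarchy."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in IS_A.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def search(term):
    """Return ids of records annotated with the term or a more specific term."""
    wanted = descendants(term)
    return [record_id for record_id, annotation in RECORDS if annotation in wanted]


broad = search("Anopheles")           # genus-level query gathers many answers
narrow = search("Anopheles gambiae")  # refined by moving down to the species

print(broad, narrow)
```

A genus-level query returns every record annotated with any of its species; replacing the genus name by a species name narrows the answer set without changing the search procedure.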
| 5 AN ONTOLOGICAL APPROACH FOR THE CONCEPTION OF A MALARIA DATABASE |
|---|
5.1 The approach
Much of biology works by applying prior knowledge (what is known) to an unknown entity, rather than by applying a set of axioms that will elicit knowledge. In addition, the complex biological data stored in bioinformatics databases often require the addition of knowledge to specify and constrain the values held in that database. One way of capturing knowledge within bioinformatics applications and databases is the use of ontologies. An ontology is the concrete form of a conceptualization of a community's knowledge of a domain.
The conceptualization is the couching of knowledge about the world in terms of entities (things), the relationships they hold and the constraints between them. The specification is the representation of this conceptualization in a concrete form. One step in this specification is the encoding of the conceptualization in a knowledge representation language. The goal is to create an agreed-upon vocabulary and semantic structure for exchanging information about that domain. The main components of an ontology are concepts, relations, instances and axioms. A concept represents a set or class of entities or things within a domain, e.g. mosquito is a concept within the domain of malaria. Concepts are of two kinds:
- Primitive concepts are those that only have necessary conditions (in terms of their properties) for membership of the class. For example, Anopheles gambiae is a kind of mosquito with a segmented body; mosquitoes must have a segmented body, but there could be others that have a segmented body that are not mosquitoes.
- Defined concepts are those whose description is both necessary and sufficient for a thing to be a member of the class. For example, an Anopheles mosquito is a kind of insect that transmits the human malarial parasite. Not only does every Anopheles transmit the human malarial parasite, but every transmitter of human malaria is an Anopheles mosquito.
Relations between concepts are likewise of two kinds:
- Taxonomies that organize concepts into a sub-concept and super-concept tree structure. The most common forms of these are specialization relationships, commonly known as the 'is a kind of' relationship. For example, A.gambiae is a kind of mosquito, which in turn is a kind of Insect. Partitive relationships describe concepts that are a part of other concepts, e.g. a mosquito has the Component Proboscis.
- Associative relationships that relate concepts across tree structures. Commonly found examples include the following:
  - Nominative relationships describe the names of concepts, e.g. a mosquito has the Accession Number VectorID (in the context of bioinformatics) and a habitat has the name Vector's Refuge.
  - Locative relationships describe the location of one concept with respect to another, e.g. a mosquito has a Captured location Locality Name.
  - Associative relationships that represent, e.g. the functions, processes a concept has or is involved in and other properties of the concept: a mosquito has the Function parasite transmission, a mosquito is associated with process transmission and a mosquito has organism classification species. Many other types of relationships exist, such as causative relationships.
A relation can further be characterized by the following properties:
- Whether it is universally necessary that a relationship must hold on a concept. For example, when describing a mosquito database, we might want to say that 'a mosquito has an Accession Number VectorID' holds universally, i.e. for all mosquitoes in our database.
- Whether a relationship can optionally hold on a concept. For example, 'a vector possesses a parasitic form' describes only the possibility that a vector has a parasite, as not all vectors possess parasites.
- Whether the concept a relationship links to is restricted to certain kinds of concepts. For example, 'a mosquito has Function Parasite Transmission' restricts the has Function relation to only link to concepts that are kinds of Parasite Transmission. 'A mosquito has Function' says that a mosquito has a function but does not restrict what kind of concept the function might be.
- The cardinality of the relationship. For example, a particular Accession Number is the accession number of only one mosquito, but one Vector may have many parasites.
- Whether the relationship is transitive. For example, if a mosquito is Associated with Process Parasite Transmission and Parasite Transmission is Associated with Process Malaria Symptoms Manifestation, then mosquito is Associated with Process Malaria Symptoms Manifestation. Taxonomy relations always have this property.
Finally, axioms are used to constrain values for classes or instances. In this sense, the properties of relations are kinds of axioms. Axioms also, however, include more general rules, such as 'nucleic acids shorter than 20 residues are oligonucleotides'.
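As an illustration of how such properties and axioms could be checked mechanically, the following sketch stores relations as (subject, relation, object) triples, derives the facts implied by transitivity and tests a cardinality constraint. The triple encoding, the relation identifiers (e.g. isAKindOf) and the function names are assumptions made for this example; they are not the knowledge representation language used in this work.

```python
from collections import defaultdict

# Hypothetical triples (subject, relation, object) built from the examples above.
TRIPLES = {
    ("mosquito", "isAssociatedWithProcess", "ParasiteTransmission"),
    ("ParasiteTransmission", "isAssociatedWithProcess", "MalariaSymptomsManifestation"),
    ("A.gambiae", "isAKindOf", "mosquito"),
    ("mosquito", "isAKindOf", "insect"),
}

# Relations declared transitive (an assumption of this sketch).
TRANSITIVE = {"isAssociatedWithProcess", "isAKindOf"}


def transitive_closure(triples, transitive):
    """Add every triple implied by chaining a transitive relation with itself."""
    closed = set(triples)
    added = True
    while added:
        added = False
        for a, rel, b in list(closed):
            if rel not in transitive:
                continue
            for c, rel2, d in list(closed):
                if rel2 == rel and c == b and (a, rel, d) not in closed:
                    closed.add((a, rel, d))
                    added = True
    return closed


def shared_accessions(assignments):
    """Cardinality check: accession numbers held by more than one mosquito."""
    owners = defaultdict(set)
    for mosquito_id, accession in assignments:
        owners[accession].add(mosquito_id)
    return sorted(acc for acc, ids in owners.items() if len(ids) > 1)


def is_oligonucleotide(sequence):
    """General axiom from the text: fewer than 20 residues."""
    return len(sequence) < 20


closure = transitive_closure(TRIPLES, TRANSITIVE)
print(sorted(closure - TRIPLES))
print(shared_accessions([("m1", "V001"), ("m2", "V002"), ("m3", "V001")]))
```

The closure adds exactly the two facts discussed in the text: mosquito is associated with Malaria Symptoms Manifestation, and A.gambiae is a kind of insect.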
5.2 Experimentation
Thus, our presentation has introduced the need for and use of an ontology within the malaria research community. The need for ontologies arises from the need to be able to cope with the size and complexity of biological knowledge and data. Ontologies enable knowledge to be used within systems for communication, specification and other processing tasks. Several ontologies may exist within a particular domain.
If the application genuinely needs an ontology and that ontology will be long lived, then the investment may well be worthwhile. As with many technologies in a discipline such as bioinformatics, it is community effort that makes the use of the technology productive (Fig. 1).
| 6 THE RESULTING MOSQUITO DATABASE |
|---|
For us, using an ontology is a way of replacing the traditional top-down approach in designing databases. Therefore, the universal relation becomes the original table in this study. The above ontology has helped us to come up with four relational tables after decomposing the single table on mosquitoes. Since we had only one table from the beginning, it is as if we had used the hierarchical model (the top-down design method) of the database (Fig. 2).
The single table with 18 fields was decomposed into four different tables to reduce data redundancy while retaining the ability to recreate the original relation without leaving out tuples or adding new tuples. From the single table we therefore derived the following tables (Tables 2–4):
The original table was the flat file MOSQUITO (Number, Species, PCR, Locality, Quarter, Hour, TypeofCapture, Date, IntExt, Case, Gsal, Parturity, CSP, Blood, Harvester, Genus, Genetics, Commentaries).
This flat file contains the various attributes that the entomological staff took into consideration.
The new tables resulting from the ontology are the following:
- Vectors (SpeciesID, SpeciesName, SpeciesPopulation, SpeciesInfectionRate)
- VectorsRefuge (SpeciesID, RefugeIR, RefugeName, SamplePopulation)
- SeasonalSpecies (SeasonID, SpeciesID, SeasonName, SpeciesPopulation)
- LocalStat (LocalityID, Latitude, Longitude, SpeciesID, LocalityName, PopSampleSpecies).
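A minimal sketch of how these four tables might be declared and queried is given below, assuming Python's standard sqlite3 module. The attribute names are those listed above; the column types, primary keys, foreign keys and all sample values (counts, rates and coordinates) are assumptions made for illustration and are not taken from the OCEAC database.

```python
import sqlite3

# Keys and types below are assumptions; only the attribute names come from the text.
SCHEMA = """
CREATE TABLE Vectors (
    SpeciesID            INTEGER PRIMARY KEY,
    SpeciesName          TEXT NOT NULL,
    SpeciesPopulation    INTEGER,
    SpeciesInfectionRate REAL
);
CREATE TABLE VectorsRefuge (
    SpeciesID        INTEGER REFERENCES Vectors(SpeciesID),
    RefugeIR         INTEGER,
    RefugeName       TEXT,
    SamplePopulation INTEGER,
    PRIMARY KEY (SpeciesID, RefugeIR)
);
CREATE TABLE SeasonalSpecies (
    SeasonID          INTEGER,
    SpeciesID         INTEGER REFERENCES Vectors(SpeciesID),
    SeasonName        TEXT,
    SpeciesPopulation INTEGER,
    PRIMARY KEY (SeasonID, SpeciesID)
);
CREATE TABLE LocalStat (
    LocalityID       INTEGER,
    Latitude         REAL,
    Longitude        REAL,
    SpeciesID        INTEGER REFERENCES Vectors(SpeciesID),
    LocalityName     TEXT,
    PopSampleSpecies INTEGER,
    PRIMARY KEY (LocalityID, SpeciesID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder rows, invented for the example.
conn.executemany(
    "INSERT INTO Vectors VALUES (?, ?, ?, ?)",
    [(1, "Anopheles gambiae", 120, 0.05), (2, "Anopheles funestus", 80, 0.03)],
)
conn.executemany(
    "INSERT INTO LocalStat VALUES (?, ?, ?, ?, ?, ?)",
    [
        (10, 3.8, 11.5, 1, "Simbock", 40),
        (10, 3.8, 11.5, 2, "Simbock", 25),
        (11, 3.9, 11.5, 1, "Yaounde", 15),
    ],
)


def species_in_locality(connection, locality_name):
    """Which species were sampled in a locality, and how many of each."""
    return connection.execute(
        """
        SELECT v.SpeciesName, l.PopSampleSpecies
        FROM LocalStat AS l
        JOIN Vectors AS v ON v.SpeciesID = l.SpeciesID
        WHERE l.LocalityName = ?
        ORDER BY v.SpeciesName
        """,
        (locality_name,),
    ).fetchall()


print(species_in_locality(conn, "Simbock"))
```

The join on SpeciesID answers a question of the kind raised in the introduction (which mosquitoes occur in a given zone) without repeating species-level facts in every locality row.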
| 7 CONCLUSION |
|---|
The work presented in this paper is a contribution to the entomological control of malarial vectors through the building of an ontology. Although ontology building is not an entirely new technique, its application in the entomological domain is new. This is why the subject of the paper can lead in many directions. One of the directions explored in this paper is the database field: we used the ontology to design a relational database on mosquitoes that can be queried by technicians working in the health domain. Next, we are going to introduce artificial intelligence concepts into this work in order to obtain an intelligent system that can support knowledge, deductive capacities, etc., so that we have at our disposal a decision support system (DSS). We hope that the information system implemented here will give many users the opportunity to exploit the data contained in the database and to obtain more information on mosquitoes, malaria, etc. in Cameroon. This can be extended to any region in the world.
| Acknowledgments |
|---|
The authors are grateful to OCEAC, which put at our disposal the study on M and S Molecular Forms of A.gambiae in Cameroon, by Charles Wondji [OCEAC (Organisation de Coordination pour la lutte contre les Endémies en Afrique Centrale) and IRD (Institut de Recherche pour le Développement), Yaounde, Cameroon], Josiane Etan (OCEAC, Yaounde, Cameroon), Vincenzo Petrarca (Sezione di Parassitologia, Dipartimento di Scienze di Sanità Publica, Università and Dipartimento di Genetica e Biologia Molecolare, Università La Sapienza, Rome, Italy), Timoleon Tchuinkam (OCEAC, Yaounde, Cameroon; University of Dschang, Dschang, Cameroon), Federica Santolamazza (Sezione di Parassitologia, Dipartimento di Scienze di Sanità Publica, Università La Sapienza, Rome, Italy), Etienne Fondjo (OCEAC, Yaounde, Cameroon; Ministry of Public Health, Yaounde, Cameroon), Alessandra della Torre (Sezione di Parassitologia, Dipartimento di Scienze di Sanità Publica, Università La Sapienza, Rome, Italy) and Didier Fontenille (OCEAC, Yaounde, Cameroon; Sezione di Parassitologia, Dipartimento di Scienze di Sanità Publica, Università La Sapienza, Rome, Italy).
| REFERENCES |
|---|
Burkot, T.R., et al. (1981) Identification of mosquito blood meals by enzyme-linked immunosorbent assay. Am. J. Trop. Med. Hyg., 30, 1336–1441.
Coluzzi, M., et al. (1979) Chromosomal differentiation and adaptation to human environments in the Anopheles gambiae complex. Trans. R. Soc. Trop. Med. Hyg., 73, 483–497.
Delobel, C. and Adiba, M. (1982) Bases de données et systèmes relationnels. DUNOD informatique.
Duineveld, A.J., Storer, R., Weiden, M.R., Kenepa, B. and Benjamins, V.R. (1999) A comparative study of ontological tools. Technical Report, University of Amsterdam, Department of Social Science Informatics, The Netherlands.
Gillies, M.T. and deMeillon, B. (1968) The Anophelinae of Africa south of the Sahara (Afrotropical Region). Publ. S. Afr. Inst. Med. Res., 55, 144.
Gruber, T.R. (1993) Towards principles for the design of ontologies used for knowledge sharing. International Workshop on Formal Ontology, Padova, Italy. Available as Technical Report KSL-93-04, Knowledge Systems Laboratory, Stanford University.
Gruninger, M. and Fox, M.S. (1995) Methodology for the design and evaluation of ontologies. Technical Report, University of Toronto, Department of Industrial Engineering, Canada.
Fontenille, D., et al. (2001) Use of circumsporozoite protein enzyme-linked immunosorbent assay compared with microscopic examination of salivary glands for calculation of malaria infectivity rates in mosquitoes (Diptera: Culicidae) from Cameroon. J. Med. Entomol., 38, 451–454.
Hunt, R.H. (1973) A cytological technique for the study of Anopheles gambiae complex. Parassitologia, 15, 137–139.
Martin, J. (1976) Principles of Database Management. Prentice-Hall, Inc., Englewood Cliffs, NJ.
Uschold, M., King, M., Moralee, S. and Zorgios, Y. (1998) The enterprise ontology. Knowl. Eng. Rev., 13, 31–89.
Wirtz, R.A., et al. (1987) Comparative testing of monoclonal antibodies against P. falciparum sporozoites for ELISA development. Bull. WHO, 65, 39–45.