An environmental perspective on large-scale genome clustering based on metabolic capabilities
1Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Ingolstädter Landstraße 1, D-85764 Neuherberg, 2Computer-Chemie-Centrum, University of Erlangen-Nürnberg, Nägelsbachstraße 25, D-91052 Erlangen, 3Molecular Networks GmbH, Henkestraße 91, D-91052 Erlangen and 4Chair for Genome-Oriented Bioinformatics, Technische Universität München, Life and Food Science Center Weihenstephan, Am Forum 1, D-85354 Freising-Weihenstephan, Germany
*To whom correspondence should be addressed.
ABSTRACT
Motivation: In principle, an organism's ability to survive in a specific environment is an observable result of the organism's regulatory and metabolic capabilities. Nonetheless, current knowledge about the global relation between the metabolisms and the niches of organisms is still limited.
Results: In order to further investigate this relation, we grouped species showing similar metabolic capabilities and systematically mapped their habitats onto these groups. For this purpose, we predicted the metabolic capabilities for 214 sequenced genomes. Based on these predictions, we grouped the genomes by hierarchical clustering. Finally, we mapped different environmental conditions and diseases related to the genomes onto the resulting clusters. This mapping uncovered several conditions and diseases that were unexpectedly enriched in clusters of metabolically similar species. As an example, Encephalitozoon cuniculi, a microsporidian causing a multisystemic disease accompanied by CNS problems in rabbits, occurred in the same metabolism-based cluster as bacteria causing similar symptoms in humans.
Supplementary information: Supplementary data are available at Bioinformatics online.
Contact: g.kastenmueller{at}helmholtz-muenchen.de
1 INTRODUCTION
Any observable phenotype of an organism must be related to its regulatory and metabolic capabilities by some means or other. The ability of an organism to invade and survive in a specific environment represents a special sort of phenotype. Several relations between the metabolism and the niche of organisms have already been shown by both experimental and in silico analyses (Bjursell et al., 2006; Comstock and Coyne, 2003; DeLong and Karl, 2005; Ginger, 2006; Matilla et al., 2007; Pedrós-Alió, 2006; Rediers et al., 2005; Stein et al., 2007; Templeton et al., 2004). Many studies uncovered or confirmed global metabolic tendencies for host-associated lifestyles. In particular, genome analyses provided valuable insights into the inter-relation of niche and metabolism for organisms that are difficult to cultivate in the laboratory such as obligate intracellular organisms (Zhou and Thompson, 2004). As an example, most host-associated species show reduced metabolic capabilities compared to their free-living relatives. Whereas symbionts are able to produce a variety of amino acids, pathogens have lost many of these capabilities. Though highly valuable, most of these studies have been limited to either a specific group of organisms (Bjursell et al., 2006; Comstock and Coyne, 2003; Ginger, 2006; Stein et al., 2007; Templeton et al., 2004) or a specific environmental condition (DeLong and Karl, 2005; Matilla et al., 2007; Pedrós-Alió, 2006).
The growing number of complete genomic sequences facilitates comparing the metabolic capabilities of species on a larger scale. In several phylogenetic studies, genome trees have already been built based on the similarities in the genomes' metabolic features. Thereby, metabolic similarity has been defined by different means: (i) similarity of individual metabolic pathways using various similarity measures (Clemente et al., 2005; Forst and Schulten, 1999, 2001; Heymans and Singh, 2003; Zhang et al., 2006), (ii) similarity of the whole-genome enzyme or reaction content (Aguilar et al., 2004; Ma and Zeng, 2004) and (iii) similarity of the profile of all metabolic pathways (Hong et al., 2004; Liao et al., 2002). However, these studies mainly focused on the evolution of metabolic pathways (Clemente et al., 2005; Forst and Schulten, 1999, 2001; Heymans and Singh, 2003; Liao et al., 2002; Zhang et al., 2006), the investigation of horizontal gene transfer (Ma and Zeng, 2004), and the comparison of metabolism-based phylogenies to 16S rRNA-based genome trees (Aguilar et al., 2004; Hong et al., 2004; Heymans and Singh, 2003; Liao et al., 2002; Ma and Zeng, 2004), respectively. Thus, no systematic comparison of the grouping of genomes to different environmental conditions or diseases has been accomplished so far.
In order to provide an environmental perspective on metabolism-based genome trees, our approach extends previous work as follows:
- We shifted the focus of the analysis to the relation of niches and metabolism by systematically mapping several different environmental conditions and diseases onto the groups of metabolically similar genomes. Thereby, our study comprised 214 genomes and 12 environmental conditions and diseases, respectively.
- For deriving a metabolism-based genome tree, we assessed the metabolic capabilities of the genomes using a method that considers both the completeness and the uniqueness of enzymes in a metabolic pathway. Compared to previous approaches, this method facilitates the comparison of pathways across genomes. On the one hand, the method allows the comparison of incomplete pathways, which are often found for parasitic organisms. On the other hand, it is more robust against gaps in genome annotation. (For details see Methods.)
Here, we show that—by these extensions—our approach was able to uncover several interesting and previously unreported inter-relations of metabolism and niches. In case of disease-related niches, the knowledge about such inter-relations is highly valuable for the development of antibiotic drugs (Ali and Nozaki, 2007; Monaghan and Barrett, 2006).
2 METHODS
Our approach for relating environmental conditions and diseases to the metabolic capabilities of species can be divided into three steps: (i) assessing metabolic capabilities based on annotated genomes, (ii) clustering of genomes described by their metabolic capabilities and (iii) mapping of environmental conditions and diseases related to the genomes onto the clusters of metabolically similar genomes.
2.1 Assessing metabolic capabilities
Analogous to the approaches by Hong et al. (2004) and Liao et al. (2002), we represented the metabolic capabilities of a species by a comprehensive set of entire metabolic pathways. Thus, we focused on the whole metabolism instead of single pathways as used by Clemente et al. (2005), Forst and Schulten (1999, 2001), Heymans and Singh (2003) and Zhang et al. (2006). Similar to the approach by Hong et al. (2004), we represented each metabolic pathway by a score value instead of a binary value, which can indicate only the presence or absence of a pathway (as used by Liao et al., 2002). Compared to a binary value, a pathway score allows assessing the completeness of pathways in species. This is particularly advantageous for pathways of parasitic genomes. Since these organisms often cover only parts of known pathways, a decision about presence or absence is often not appropriate.
In contrast to Hong et al. (2004), we considered not only the ratio of catalysed reactions found for each pathway in a specific genome (annotation), but also the importance of key reactions (enzymes) for assessing pathways based on genomic data (Paley and Karp, 2002). The presence of such key reactions often indicates the presence of the whole pathway. For our score, we therefore weighted the presence of an enzymatic reaction in a pathway p for a genome g by the number of occurrences of this reaction in all reference pathways considered. The sum of these weighted values for all reactions in p is normalized to scores ranging from 0 (no reaction of the pathway is catalysed) to 1 (pathway is complete) by dividing it by the maximum sum, representing the case that all reactions needed for p are available in the genome. This score, hereinafter referred to as the pathway reconstruction score, is formally defined as follows.
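Given a set of reference pathways P, a set R of known biochemical reactions and a set of reactions $R_O$ with $R_O \subseteq R$ containing all reactions identified for the organism o (by mapping of the annotated enzymes onto the corresponding reactions), the pathway reconstruction score score(p) for a reference pathway p (formally defined as a multiset of reactions in R) is determined by Equation 1, in which w(r) denotes the weight that a reaction r receives from its number of occurrences in all reference pathways considered:

$$\mathrm{score}(p) = \frac{\sum_{r \in p,\; r \in R_O} w(r)}{\sum_{r \in p} w(r)} \qquad (1)$$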
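The scoring idea described above (a weighted sum over the reactions found in the genome, divided by the maximum attainable sum) can be illustrated with a minimal Python sketch. The weighting used here is an assumption: each reaction is weighted by the inverse of its number of occurrences across all reference pathways, so that reactions unique to one pathway count more. The pathway and reaction identifiers are hypothetical and do not come from BioPath.

```python
from collections import Counter


def reaction_weights(reference_pathways):
    """Assumed weighting: 1 / (occurrences of the reaction in all reference pathways)."""
    counts = Counter(
        reaction
        for reactions in reference_pathways.values()
        for reaction in reactions
    )
    return {reaction: 1.0 / n for reaction, n in counts.items()}


def reconstruction_score(pathway, organism_reactions, weights):
    """Weighted fraction of the pathway's reactions that are present in the organism.

    Returns 0.0 if no reaction of the pathway is found and 1.0 if all are found.
    """
    maximum = sum(weights[r] for r in pathway)
    if maximum == 0:
        return 0.0
    found = sum(weights[r] for r in pathway if r in organism_reactions)
    return found / maximum


# Hypothetical toy data: two reference pathways sharing reaction "r2".
reference = {"p1": ["r1", "r2", "r3"], "p2": ["r2", "r4"]}
weights = reaction_weights(reference)
organism = {"r1", "r2"}  # reactions mapped from the annotated enzymes

score_p1 = reconstruction_score(reference["p1"], organism, weights)
score_p2 = reconstruction_score(reference["p2"], organism, weights)
```

In this toy case the shared reaction "r2" contributes only half the weight of the pathway-specific reactions, so finding it alone raises the score of "p2" less than finding "r4" would.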
As a resource for reference pathways (P) and biochemical reactions (R), we chose the freely available Biochemical Pathways database (BioPath) (http://www.mol-net.de/databases/biopath.html) (Reitz et al., 2004), an electronic representation of the Roche Applied Science Biochemical Pathways wall chart (Michal, 1999). Similar to KEGG (Kanehisa et al., 2006), BioPath provides a generic, multi-organism view on pathways. In contrast to KEGG, these pathways are defined in smaller sets (5.4 (BioPath) compared to 21 (KEGG) reactions per reference pathway on average). For our purposes, pathway maps in KEGG summarize too many different metabolic functions in one unit (e.g. Arginine and Proline Metabolism). BioPath contains 290 pathways, for which EC numbers are assigned to the underlying reactions.
As a resource for annotated enzymes of sequenced genomes, we used the PEDANT genome database (Frishman et al., 2003). This database provides exhaustive automatic analysis for a huge number of genomic sequences by a large variety of established bioinformatics tools. Out of 269 sequenced genomes in PEDANT (mitochondria, chloroplasts and genomes lacking some of their plasmids were not considered), we chose 214 genomes by (randomly) retaining only one genome (e.g. E.coli K12) for each species (e.g. E.coli). For the mapping of annotated enzymes onto biochemical reactions, we used the EC number predictions of enzymes provided by PEDANT.
Hence, assessing the metabolic capabilities for 214 genomes based on 290 BioPath pathways, using the method delineated above, resulted in a 214 x 290 matrix of pathway reconstruction scores. Each row of this matrix describes the score-based pathway profile (metabolic pattern) of a specific genome. Each column can be interpreted as the phylogenetic profile of a pathway (Fig. 1).
2.2 Clustering of genomes
In order to group genomes that show similar metabolic capabilities, we applied a hierarchical clustering algorithm to the genomes represented by their score-based pathway profiles. For clustering, we used Ward's method as implemented in the R statistical software package (http://www.r-project.org). Ward's clustering criterion minimizes the variance within clusters and tends to create clusters of nearly equal size (Ward, 1963).
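To make the minimum-variance criterion concrete, the following Python sketch merges, at every step, the two clusters whose fusion increases the within-cluster sum of squares the least, and stops when k clusters remain. It is an illustrative reimplementation working directly on coordinate vectors, not the R routine used in this study, and the toy points are hypothetical.

```python
def ward_clusters(points, k):
    """Agglomerate points until k clusters remain, using Ward's criterion.

    The cost of merging clusters A and B is
    |A| * |B| / (|A| + |B|) * squared distance between their centroids,
    which equals the increase in the within-cluster sum of squares.
    """
    clusters = [[i] for i in range(len(points))]
    centroids = [list(p) for p in points]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                sq_dist = sum(
                    (x - y) ** 2 for x, y in zip(centroids[a], centroids[b])
                )
                cost = na * nb / (na + nb) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        # The merged centroid is the size-weighted mean of the two centroids.
        centroids[a] = [
            (na * x + nb * y) / (na + nb)
            for x, y in zip(centroids[a], centroids[b])
        ]
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
        del centroids[b]
    return [sorted(c) for c in clusters]


# Hypothetical toy profiles: two tight pairs and one outlier.
toy_points = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]]
three_groups = ward_clusters(toy_points, 3)
two_groups = ward_clusters(toy_points, 2)
```

Calling the function with decreasing k corresponds to cutting the same dendrogram at successively higher levels.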
Before clustering, the pathway profiles were standardized. For this purpose, we subtracted the mean of all 214 pathway reconstruction scores for a pathway from each score value of this pathway in the matrix and then divided the result by the standard deviation observed for this pathway. We then determined the Euclidean distance matrix of the standardized pathway profiles and used it as input for the clustering algorithm.
After clustering, we tested the confidence level of the genome clustering obtained, using the multiscale bootstrap resampling technique that has been introduced by Shimodaira (2002). This technique is based on the theory described in Efron et al. (1996). For the test of confidence, we used the implementation of Shimodaira's (2002) technique provided within the R package pvclust (http://www.is.titech.ac.jp/~shimo/prog/pvclust). pvclust assigns approximately unbiased confidence values (AU values) between 0 and 1 to all clusters contained in the original clustering, with values near 0 and 1 representing low and high confidence, respectively.
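The standardization and distance computation can be sketched in plain Python as follows. Two details are assumptions not stated in the text: the sample standard deviation (divisor n - 1) is used, and a pathway whose score is identical in all genomes is set to zero instead of being divided by a zero standard deviation. The small score matrix is hypothetical.

```python
import math


def standardize_columns(matrix):
    """Column-wise z-scores: subtract the column mean, divide by the column SD."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        mean = sum(column) / n_rows
        # Assumption: sample standard deviation with divisor (n - 1).
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n_rows - 1))
        for i in range(n_rows):
            # Assumption: constant columns carry no information and become 0.
            result[i][j] = (column[i] - mean) / sd if sd > 0 else 0.0
    return result


def euclidean_distance_matrix(rows):
    """Symmetric matrix of pairwise Euclidean distances between row vectors."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            dist[i][j] = dist[j][i] = d
    return dist


# Hypothetical 3 genomes x 2 pathways score matrix.
scores = [[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
z = standardize_columns(scores)
distances = euclidean_distance_matrix(z)
```

Standardizing per pathway prevents pathways with a wide spread of scores from dominating the Euclidean distance between two genomes.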
For assessing the differences between the metabolism-based genome tree, as produced by our method, and a gene-content-based genome tree, we used the SHOT webserver (http://coot.embl.de/~korbel/SHOT_v2) (Korbel et al., 2002). Within the SHOT server, the similarity between two species is defined as the ratio of the number of shared orthologs and a normalization value reflecting the genome size. We downloaded the respective distance matrix for the 110 genomes provided by the server (using the default parameters). In order to obtain a gene-content-based genome tree, we applied the same clustering method as used for the metabolism-based clustering to this distance matrix.
2.3 Mapping of environmental conditions and diseases onto genome clusters
Based on phenotypic descriptions provided by the NCBI Genome Project (http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj), by the EBI Integr8 database (http://www.ebi.ac.uk/integr8/) and by Karyn's Genomes (http://www.ebi.ac.uk/2can/genomes/genomes.html), we collected information on several habitats, environmental conditions, and diseases for all 214 genomes considered in this analysis (if available). We assigned the preferences of the corresponding organisms with respect to three common habitats (terrestrial, aquatic, intracellular), two environmental conditions (anaerobic and thermophilic), five host-associated (symbionts and pathogens) habitats (plant-associated, gastrointestinal tract of animals, gastrointestinal tract of insects, urogenital tract of animals, and oral cavity of humans) and two diseases (periodontal disease and meningitis/encephalitis). Since part of the phenotypic information assigned to genomes (e.g. diseases) was only available in form of free-text descriptions of the genomes, these assignments might be incomplete.
We systematically investigated the association of each phenotypic feature with each genome cluster formed by the hierarchical clustering procedure. For this purpose, we cut the genome tree at 106 levels, representing partitionings of the genomes into k = 2, 3, 4, ..., 107 clusters, respectively. Based on the hypergeometric distribution, we calculated a P-value for each feature and each cluster in a specific partitioning. In order to correct the P-values for multiple testing of clusters and features, we multiplied each P-value by the number of clusters (k) and the number of features (12) tested (Bonferroni correction).
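The enrichment test can be sketched as follows. The sketch assumes a one-sided (upper-tail) hypergeometric test, i.e. the probability of observing at least the given number of genomes with the feature in a cluster of the given size, and it caps the corrected value at 1; neither detail is stated explicitly in the text. The function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_feature, cluster_size, n_overlap):
    """Upper-tail hypergeometric P-value.

    n_total:      number of genomes in the analysis
    n_feature:    genomes annotated with the phenotypic feature
    cluster_size: genomes in the cluster under test
    n_overlap:    genomes in the cluster that carry the feature
    """
    upper = min(n_feature, cluster_size)
    tail = sum(
        comb(n_feature, x) * comb(n_total - n_feature, cluster_size - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, cluster_size)


def bonferroni(p_value, k_clusters, n_features=12):
    """Multiply by the number of clusters and features tested (capped at 1)."""
    return min(1.0, p_value * k_clusters * n_features)


# Hypothetical toy numbers: 10 genomes, 4 with the feature,
# a cluster of 3 genomes that all carry the feature.
p_raw = hypergeom_enrichment_p(10, 4, 3, 3)
p_corrected = bonferroni(p_raw, k_clusters=2)
```

Because the correction factor grows with k, a cluster must show an increasingly strong enrichment to remain significant in fine-grained partitionings.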
3 RESULTS
We assessed the metabolic capabilities of 214 species based on 290 reference pathways given by BioPath and based on genome annotations given by PEDANT. As a result, we obtained a 214 x 290 matrix of pathway reconstruction scores. Each row of this matrix represents the pathway profile of a specific genome and thus describes its metabolic capabilities. Figure 2 shows the dendrogram that we obtained by clustering the genomes based on the distance of pathway profiles (after standardization). (For confidence values assigned to the clusters, see the Supplement.)
For each habitat, environmental condition and disease, we marked the preferences of each species by colored bars in a specific column next to the dendrogram. In detail, red bars in columns 4–8 denote an anaerobic, obligate intracellular, aquatic, thermophilic and terrestrial lifestyle, respectively. Facultative intracellular species are marked by orange bars. All species that are able to populate one of the following niches as a symbiont or as a pathogen are also marked by red bars in columns 9–13, respectively: the gastrointestinal tract of mammals or insects, the urogenital tract of mammals, a plant-associated niche and the oral cavity of humans. For plant-associated organisms, red bars denote plant symbionts, whereas orange bars denote pathogens. In the last two columns, red bars mark species that are strongly related to periodontal disease or meningitis/encephalitis, respectively. For further comparisons of the genome clusters to the NCBI taxonomy (http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy), the genome size and the gram stain (red: gram-positive), we color-coded these genome features in columns 1–3 of Figure 2. The darker the bar representing a genome's size (in mega base pairs (Mbp)), the smaller its genome is. A genome size above 5 Mbp is represented by a white bar. Each taxonomic group is described by a specific color in the first column in Figure 2 (red: Archaea; orange: Eukaryota; white: Proteobacteria; blue: Firmicutes and pink: Spirochaetes).
Table 1 lists the clusters that showed the most significant association (lowest P-value) to the 12 phenotypic features, respectively. These clusters are also marked in Figure 2 by black rectangles.
By the systematic mapping of habitats, environmental conditions and diseases onto the genome clusters, as shown in Figure 2, one can immediately capture the phenotypic features that are enriched in specific genome clusters. Such enrichments suggest that the corresponding feature is related to a specific metabolic pattern by some means. On the one hand, our clustering confirmed the inter-relations of metabolisms and niches, found by other studies, for a higher number of genomes. On the other hand, we also uncovered inter-relations that have not been reported so far to the best of our knowledge. In the following, we focus on selected examples for such inter-relations.
3.1 Taxonomic aspects
Previous studies on metabolism-based genome trees reported that most bacterial taxonomic groups are widely spread over several clusters in the tree. In contrast, eukaryotes and Archaea are clearly separated into two distinct clusters (Aguilar et al., 2004; Liao et al., 2002; Ma and Zeng, 2004). This separation has also been obtained for the gene-content-based clustering derived from the SHOT genome distance matrix. In our analyses, we found the same tendency for the clustering of these taxonomic groups. However, we observed two exceptions: Nanoarchaeum equitans, an archaeon showing a symbiotic (or parasitic) lifestyle (Waters et al., 2003) and Encephalitozoon cuniculi, a pathogenic microsporidian (Katinka et al., 2001), did not cluster with their taxonomic relatives.
Compared to previous studies, the systematic mapping of environmental conditions onto the genome clusters allows for additional assertions. As an example, the clustering within the Archaea group largely corresponds to three different combinations of environmental conditions: a cluster containing anaerobic aquatic thermophiles, another cluster containing mostly non-anaerobic non-aquatic thermophiles and a cluster consisting of non-thermophilic aquatic Archaea.
3.2 Symbiotic and parasitic lifestyles
In general, Figure 2 demonstrates that most species that exclusively or mainly co-occur together with their hosts are grouped in the same cluster. The clustering of obligate intracellular species, species associated to the gastrointestinal tract of insects or the urogenital tract of animals, and several species associated to the oral cavity of humans becomes obvious by looking at the corresponding columns in Figure 2.
In previous metabolism-based genome trees, obligate bacterial parasites also tended to occur in the same cluster, though they belong to different (bacterial) taxonomic groups (Aguilar et al., 2004; Hong et al., 2004; Liao et al., 2002; Ma and Zeng, 2004; Zhang et al., 2006). Thus, our analysis confirmed this tendency for a higher number of genomes and shows that it also extends to a eukaryotic microsporidian species, namely E.cuniculi, and to the archaeal symbiont (or parasite) N.equitans. Due to their lifestyles, both species show reduced metabolic capabilities compared to their taxonomic neighbors. This might be the reason why N.equitans and E.cuniculi occur in the same cluster as other host-associated species.
3.3 Oral cavity and periodontal disease
Besides the general tendency of clustering parasitic bacteria, the systematic mapping of environmental conditions and diseases revealed an inter-relation of metabolism and niche for genomes residing in the oral cavity of humans. This inter-relation has not been reported in previous studies on metabolism-based genome clustering. Five species (Fusobacterium nucleatum, Porphyromonas gingivalis, Treponema denticola, Lactobacillus johnsonii and Lactobacillus acidophilus) that are related to the human oral cavity form a distinct cluster. With respect to taxonomy, this clustering was unexpected, since these species belong to four completely different taxonomic groups, namely to the Fusobacteria, the Bacteroidetes, the Spirochaetes and the Firmicutes. In the gene-content-based clustering derived from SHOT data, F.nucleatum is clustered with Thermotoga maritima and three genomes of the class Clostridia. The SHOT dataset lacks the remaining four genomes of the metabolism-based cluster.
Among the five oral species grouped in our clustering, there are three out of the four species (within our dataset) that are known to be enhanced in periodontal disease (Socransky et al., 2004). Thereby, the clustering of F.nucleatum and P.gingivalis is most confident showing a confidence value of 0.98.
3.4 Diseases accompanied by central nervous system problems
Analyzing our clustering with respect to meningitis and encephalitis revealed another unexpected, previously unreported inter-relation of metabolism and disease: we found the cluster containing the three Spirochaetes Borrelia burgdorferi, Borrelia garinii, Treponema pallidum, and the eukaryotic microsporidian E.cuniculi to be the most significant for the association of disease and clusters. The two Borrelia species are the causative agents of borreliosis, a tick-borne multisystemic disease that causes problems in the central nervous system (CNS) in the chronic state of the disease (sometimes occurring not until years after infection) (Kaiser, 1998; Wilske et al., 2007). Treponema pallidum is the causative agent of syphilis, which also causes multisystemic disorders and CNS problems in a late chronic state of the disease (Lafond and Lukehart, 2006). Though taxonomically distant, the species E.cuniculi also causes a multisystemic disease in rabbits which is usually accompanied by neurological symptoms (Künzel et al., 2008). In the clustering by gene-content using SHOT data, the two Spirochaetes B.burgdorferi and T.pallidum are also clustered together. However, E.cuniculi is grouped with all other eukaryotes.
4 DISCUSSION
As demonstrated above, our approach facilitated uncovering previously unreported associations of metabolism and niches. In this work, we mainly focused on analysing the associations that showed the highest statistical significance for the respective habitats, environmental conditions and diseases. However, highest statistical significance does not necessarily imply highest biological relevance. Nonetheless, a deeper analysis of metabolic similarities found in our analysis could provide valuable insights in at least some of the biochemical mechanisms that are associated to the corresponding environmental condition or disease.
In addition to analyzing significant associations, further analysis of exceptions, such as the clustering of Coxiella burnetii, could provide new biochemical insights for diseases. Coxiella burnetii is the causative agent of the (tick-borne) Q-fever and shares several parasitic strategies with the Rickettsiales genomes (Seshadri et al., 2003). Though C.burnetii also shows an obligate intracellular lifestyle, it occurs in a cluster that mainly contains free-living bacteria.
Finally, the investigation of specific combinations of phenotypic features with respect to metabolism-based clusters, and the consideration of dependencies among features could further improve the analysis.
5 CONCLUSION
We have presented an approach that provides an environmental perspective on metabolism-based genome clusters. The systematic mapping of environmental conditions and diseases onto these clusters led to new insights into the inter-relation of metabolism and niches for several habitats and diseases. Further analysis of the metabolic similarity of species in the two disease-related clusters found (periodontal disease and multisystemic disease accompanied by CNS problems) could give valuable hints for the development of new drugs.
ACKNOWLEDGEMENT
We thank Carsten Marr and Karsten Suhre for their advice regarding statistical issues and Oliver Sacher for advice regarding the BioPath data.
Conflict of Interest: none declared.
REFERENCES
Aguilar D, et al. Analysis of phenetic trees based on metabolic capabilites across the three domains of life. J. Mol. Biol (2004) 340:491–512.
Ali V, Nozaki T. Current therapeutics, their problems, and sulfur-containing amino-acid metabolism as a novel target against infections by amitochondriate protozoan parasites. Clin. Microbiol. Rev (2007) 20:164–187.
Bjursell MK, et al. Functional genomic and metabolic studies of the adaptations of a prominent adult human gut symbiont, Bacteroides thetaiotaomicron, to the suckling period. J. Biol. Chem (2006) 281:36269–36279.
Clemente JC, et al. Reconstruction of phylogenetic relationships from metabolic pathways based on the enzyme hierarchy and the gene ontology. Genome Inform (2005) 16:45–55.
Comstock LE, Coyne MJ. Bacteroides thetaiotaomicron: a dynamic, niche-adapted human symbiont. Bioessays (2003) 25:926–929.
DeLong EF, Karl DM. Genomic perspectives in microbial oceanography. Nature (2005) 437:336–342.
Efron B, et al. Bootstrap confidence levels for phylogenetic trees. Proc. Natl Acad. Sci. USA (1996) 93:7085–7090.
Forst CV, Schulten K. Evolution of metabolisms: a new method for the comparison of metabolic pathways using genomics information. J. Comput. Biol (1999) 6:343–360.
Forst CV, Schulten K. Phylogenetic analysis of metabolic pathways. J. Mol. Evol (2001) 52:471–489.
Frishman D, et al. The PEDANT genome database. Nucleic Acids Res (2003) 31:207–211.
Ginger ML. Niche metabolism in parasitic protozoa. Philos. Trans. R. Soc. Lond. B Biol. Sci (2006) 361:101–118.
Heymans M, Singh AK. Deriving phylogenetic trees from the similarity analysis of metabolic pathways. Bioinformatics (2003) 19(Suppl. 1):i138–i146.
Hong SH, et al. Phylogenetic analysis based on genome-scale metabolic pathway reaction content. Appl. Microbiol. Biotechnol (2004) 65:203–210.
Kaiser R. Neuroborreliosis. J. Neurol (1998) 245:247–255.
Kanehisa M, et al. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res (2006) 34:D354–D357.
Katinka MD, et al. Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature (2001) 414:450–453.
Korbel JO, et al. SHOT: a web server for the construction of genome phylogenies. Trends. Genet (2002) 18:158–162.
Künzel F, et al. Clinical symptoms and diagnosis of encephalitozoonosis in pet rabbits. Vet. Parasitol (2008) 151:115–124.
Lafond RE, Lukehart SA. Biological basis for syphilis. Clin. Microbiol. Rev (2006) 19:29–49.
Liao L, et al. Genome comparisons based on profiles of metabolic pathways. (2002) In Sixth International Conference on Knowledge-Based Intelligent Information & Engineering Systems, 16–18 September 2002: Crema, Italy.
Ma H-W, Zeng A-P. Phylogenetic comparison of metabolic capacities of organisms at genome level. Mol Phylogenet. Evol (2004) 31:204–213.
Matilla MA, et al. Genomic analysis reveals the major driving forces of bacterial life in the rhizosphere. Genome. Biol (2007) 8:R179.
Michal G. Biochemical Pathways. (1999) Heidelberg: Spektrum Akademischer Verlag GmbH.
Monaghan RL, Barrett JF. Antibacterial drug discovery–then, now and the genomics future. Biochem. Pharmacol (2006) 71:901–909.
Paley SM, Karp PD. Evaluation of computational metabolic pathway predictions for Helicobacter pylori. Bioinformatics (2002) 18:715–724.
Pedrós-Alió C. Genomics and marine microbial ecology. Int. Microbiol (2006) 9:191–197.
Rediers H, et al. Unraveling the secret lives of bacteria: use of in vivo expression technology and differential fluorescence induction promoter traps as tools for exploring niche-specific gene expression. Microbiol. Mol. Biol. Rev (2005) 69:217–261.
Reitz M, et al. Enabling the exploration of biochemical pathways. Org. Biomol. Chem (2004) 2:3226–3237.
Seshadri R, et al. Complete genome sequence of the Q-fever pathogen Coxiella burnetii. Proc. Natl. Acad. Sci. USA (2003) 100:5455–5460.
Shimodaira H. An approximately unbiased test of phylogenetic tree selection. Syst. Biol (2002) 51:492–508.
Socransky SS, et al. Use of checkerboard DNA-DNA hybridization to study complex microbial ecosystems. Oral Microbiol. Immunol (2004) 19:352–362.
Stein LY, et al. Whole-genome analysis of the ammonia-oxidizing bacterium, Nitrosomonas eutropha C91: implications for niche adaptation. Environ. Microbiol (2007) 9:2993–3007.
Templeton TJ, et al. Comparative analysis of apicomplexa and genomic diversity in eukaryotes. Genome Res (2004) 14:1686–1695.
Ward JH. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc (1963) 58:236–244.
Waters E, et al. The genome of Nanoarchaeum equitans: insights into early archaeal evolution and derived parasitism. Proc. Natl. Acad. Sci. USA (2003) 100:12984–12988.
Wilske B, et al. Microbiological and serological diagnosis of lyme borreliosis. FEMS Immunol. Med. Microbiol (2007) 49:13–21.
Zhang Y, et al. Phylophenetic properties of metabolic pathway topologies as revealed by global analysis. BMC Bioinform (2006) 7:1–13.
Zhou J, Thompson DK. Genomic insights into lifestyle evolution. In: Microbial Functional Genomics. Zhou J, et al., eds. (2004) New Jersey: Wiley-IEEE. 67–112.