Bioinformatics Advance Access originally published online on January 9, 2008
Bioinformatics 2008 24(5):606-612; doi:10.1093/bioinformatics/btn005
Reconstructing ancestral genome content based on symmetrical best alignments and Dollo parsimony
1Department of Computer Science, 2Neuroscience Research Institute, 3Department of Molecular, Cellular and Developmental Biology and 4Department of Ecology, Evolution and Marine Biology, University of California, Santa Barbara, CA 93106, USA
*To whom correspondence should be addressed.
Abstract
Motivation: Gene duplications and losses (GDLs) are important events in genome evolution. They result in the expansion or contraction of gene families, with a likely role in phenotypic evolution. As more genomes become available and their annotations improve, software capable of rapidly and accurately identifying the content of ancestral genomes and the timing of GDLs is needed to understand the unique evolution of each lineage.
Results: We report EvolMAP, a new algorithm and software that uses a species tree-based gene clustering method to join all-to-all symmetrical similarity comparisons of multiple gene sets in order to infer the gene composition of multiple ancestral genomes. The algorithm further uses Dollo parsimony-based comparison of the inferred ancestral genes to pinpoint the timing of GDLs onto evolutionary intervals marked by speciation events. Using EvolMAP, we first analyzed the expansion of four families of G-protein coupled receptors (GPCRs) within animal lineages. In addition to demonstrating the unique expansion tree for each family, the results show that the ancestral eumetazoan genome contained many fewer GPCRs than modern animals do, and that these families expanded through concurrent lineage-specific duplications. Second, we analyzed the history of GDLs in mammalian genomes by comparing seven proteomes. In agreement with previous studies, we report that mammalian gene family sizes have changed drastically over the course of their evolution. Interestingly, although we identified a potential source of duplication for 75% of the gained genes, the remaining 25% did not have clear-cut sources, revealing thousands of genes that likely gained their distinct sequence identities during the descent of mammals.
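The Dollo parsimony step can be illustrated with a minimal sketch, assuming a rooted species tree and a presence/absence pattern for one gene across extant species. Under Dollo parsimony a gene is gained at most once, so the gain is placed at the most recent common ancestor (MRCA) of the species that carry it, and every absence below that ancestor is explained by the fewest independent losses. The `Node` class, the `dollo_gain_loss` function and the toy tree are hypothetical illustrations, not EvolMAP's source code or data, and the sketch omits the symmetrical-best-alignment clustering that EvolMAP uses to build its ancestral genes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a rooted species tree (hypothetical helper); leaves are extant species."""
    name: str
    children: list[Node] = field(default_factory=list)

    def leaves(self) -> set[str]:
        """Names of all extant species (leaves) below this node."""
        if not self.children:
            return {self.name}
        result: set[str] = set()
        for child in self.children:
            result |= child.leaves()
        return result


def dollo_gain_loss(root: Node, present: set[str]) -> tuple[str, list[str]]:
    """Place one gain and the minimal set of losses for a gene under Dollo parsimony.

    Returns (gain_node, loss_nodes): the gene is assumed to arise once, on the
    branch leading to the MRCA of all species that carry it; each name in
    loss_nodes roots a maximal subtree without the gene, i.e. one loss on the
    branch leading to that node.
    """
    if not present or not present <= root.leaves():
        raise ValueError("present must be a non-empty subset of the tree's leaves")

    # Walk down from the root while a single child still contains every carrier;
    # the node where this stops is the MRCA of the carriers.
    node = root
    while True:
        deeper = [c for c in node.children if present <= c.leaves()]
        if not deeper:
            break
        node = deeper[0]
    gain = node

    # Below the MRCA, every maximal subtree with no carriers implies one loss.
    losses: list[str] = []

    def collect(n: Node) -> None:
        if not (n.leaves() & present):
            losses.append(n.name)
            return
        for child in n.children:
            collect(child)

    collect(gain)
    return gain.name, losses


# Hypothetical toy species tree: (((human, mouse), dog), chicken)
tree = Node("Amniota", [
    Node("Mammalia", [
        Node("Euarchontoglires", [Node("human"), Node("mouse")]),
        Node("dog"),
    ]),
    Node("chicken"),
])

print(dollo_gain_loss(tree, {"human", "dog"}))  # ('Mammalia', ['mouse'])
```

Here a gene found only in human and dog is placed as gained in the mammalian ancestor and lost once, on the branch leading to mouse. Walking down from the root keeps the gain as recent as possible, which is what makes the number of inferred losses minimal for a single-origin character.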
Availability: Query server, source code and executable are available at http://kosik-web.mcdb.ucsb.edu/evolmap/index.htm
Contact: kosik{at}lifesci.ucsb.edu, oakley{at}lifesci.ucsb.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Alex Bateman
Received on October 3, 2007; revised on December 13, 2007; accepted on January 3, 2008