Bioinformatics Vol. 19 no. 9 2003
Pages 1162-1168
© 2003 Oxford University Press
The small-world dynamics of tree networks and data mining in phyloinformatics

1 Institute of Evolutionary and Ecological Sciences, Kaiserstraat 63, Leiden University, 2311 GP Leiden, The Netherlands
2 Section of Evolution and Ecology, One Shields Ave, University of California, Davis, CA 95616, USA
3 Department of Ecology and Evolutionary Biology, PO Box 208106, Yale University, New Haven, CT 06520-8106, USA
Received on May 11, 2002; revised on November 6, 2002 and January 7, 2003; accepted on January 14, 2003
Motivation: A noble and ultimate objective of phyloinformatic research is to assemble, synthesize, and explore the evolutionary history of life on earth. Data mining methods for performing these tasks are not yet well developed, but one avenue of research suggests that network connectivity dynamics will play an important role in future methods. Analysis of disordered networks, such as small-world networks, has applications as diverse as disease propagation, collaborative networks, and power grids. Here we apply similar analyses to networks of phylogenetic trees in order to understand how synthetic information can emerge from a database of phylogenies.
Results: Analyses of tree network connectivity in TreeBASE show that a collection of phylogenetic trees behaves as a small-world network: on the one hand the trees are clustered, like a non-random lattice; on the other hand they have short characteristic path lengths, like a random graph. Tree connectivities follow a dual-scale power-law distribution (first power-law exponent 1.87; second 4.82). This unusual pattern is due, in part, to the presence of alternative tree topologies that enter the database with each published study. As expected, in small collections of trees connectivity decreases as new trees are added, while in large collections it increases. However, the inflection point is surprisingly low: after about 600 trees the network suddenly jumps to a higher level of coherence. More stringent definitions of 'neighbour' greatly delay the threshold at which a database achieves sufficient maturity for a coherent network to emerge. However, more stringent definitions of 'neighbour' would also likely show improved focus in data mining.
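The two small-world measures named above, clustering and characteristic path length, can be illustrated with a minimal Python sketch. It is not the authors' method: the rule that two trees are neighbours when they share at least `min_shared` taxa is an assumption made for illustration, and the toy trees and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_tree_network(trees, min_shared=1):
    """Link two trees when they share at least `min_shared` taxa (assumed rule)."""
    graph = {name: set() for name in trees}
    for a, b in combinations(trees, 2):
        if len(trees[a] & trees[b]) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering_coefficient(graph):
    """Mean fraction of each node's neighbour pairs that are themselves linked."""
    values = []
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            values.append(0.0)  # fewer than two neighbours: no pairs to check
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        values.append(2 * links / (k * (k - 1)))
    return sum(values) / len(values)


def characteristic_path_length(graph):
    """Mean shortest-path length over all connected ordered pairs (BFS)."""
    total = pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


# Hypothetical toy data: tree name -> set of taxon labels.
toy_trees = {
    "T1": {"A", "B", "C"},
    "T2": {"B", "C", "D"},
    "T3": {"C", "D", "E"},
    "T4": {"E", "F"},
}
loose = build_tree_network(toy_trees, min_shared=1)
strict = build_tree_network(toy_trees, min_shared=2)
print(clustering_coefficient(loose), characteristic_path_length(loose))
print(clustering_coefficient(strict), characteristic_path_length(strict))
```

In this toy case, raising `min_shared` from 1 to 2 disconnects `T4` from the network, which loosely mirrors how a stricter definition of 'neighbour' could delay the emergence of a coherent network.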
Availability: http://treebase.org
Contact: wpiel{at}buffalo.edu
* To whom correspondence should be addressed. Current address: Department of Biological Sciences, University at Buffalo, Buffalo, NY 14260, USA

