Skip Navigation


Bioinformatics Advance Access originally published online on September 24, 2007
Bioinformatics 2007 23(24):3384-3385; doi:10.1093/bioinformatics/btm438
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrowOA All Versions of this Article:
23/24/3384    most recent
btm438v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Google Scholar
Right arrow Articles by Laubach, T.
Right arrow Articles by von Haeseler, A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Laubach, T.
Right arrow Articles by von Haeseler, A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

TreeSnatcher: coding trees from images

Thomas Laubach 1,2,3,4 and Arndt von Haeseler 1,2,3,4,*

1Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, Dr-Bohr-Gasse 9/6, A-1030 Vienna, 2University of Vienna, 3Medical University of Vienna and 4University of Veterinary Medicine Vienna, Vienna, Austria

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 PERFORMANCE
 4 CONCLUSION
 ACKNOWLEDGEMENT
 REFERENCES
 

Summary: TreeSnatcher is a GUI-driven JAVA application for the semi-automatic recognition of multifurcating phylogenetic trees in pixel images. The program accepts an image file as input and analyzes the topology and the metrics of a tree depicted. The analysis is carried out in a multiple-stage process using algorithms from image analysis. In the end, TreeSnatcher produces a Newick tree code that represents the tree structure optionally including branch lengths. TreeSnatcher can process trees with 100 leaves or more in a few seconds.

Availability: TreeSnatcher was developed in JAVA under Mac OS X and is executable on UNIX/Linux, Windows and Mac OS X systems. The application and its documentation can be freely downloaded from http://www.cibiv.at/software/treesnatcher.

Contact: arndt.von.haeseler{at}univie.ac.at


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 PERFORMANCE
 4 CONCLUSION
 ACKNOWLEDGEMENT
 REFERENCES
 
Figures of phylogenetic trees are widely used in publications to illustrate the result of an evolutionary analysis. However, as one cannot effortlessly extract a machine-readable representation, i.e. a Newick expression, of the phylogeny from such images, those are not suited for subsequent reanalysis or easy compilation of tree topologies. Therefore, a computer-readable representation of a published tree has either to be built completely by hand or by using special applications. The program TreeThief (Rambaut, 2000), presumably the first that meets this demand, allows the user to trace a given tree by clicking on each of its nodes in turn. However, TreeThief gets quite tedious if the number of taxa represented in the tree is large.

Here, we present a conceptually advanced program, TreeSnatcher. It identifies the topology of a tree (e.g. a figure from a publication) and its branch lengths almost automatically with only minor user interaction. The application features a sophisticated graphical user interface that is based on the JAVA Swing API.


    2 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 PERFORMANCE
 4 CONCLUSION
 ACKNOWLEDGEMENT
 REFERENCES
 
2.1 Prerequisites
The current version of TreeSnatcher opens tree-image files in the formats PNG, JPG/JPEG or GIF. The PDF format is currently not supported. The user has to provide an image that (a) shows a tree and (b) complies with the following requirements:

  • The image has a light and homogeneous background that is clearly separated from a dark foreground.
  • The tree along with species names and branch captions constitute the foreground. Lettering and branches must not overlap.
  • The branches consist of evenly thin, plain, not necessarily straight continuous lines.
  • The inner nodes are branching points between lines and have no circles, rectangles, etc. inscribed.
  • The outer nodes do not feature leaves: the edges end at a well-defined location.

If the image does not meet those requirements, the tree topology will not be identified correctly. An example of an admissible tree-image is displayed in Figure 1a.


Figure 1
View larger version (17K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. An admissible pixel image displaying a tree with arbitrary branch lengths is shown in (a), an early stage of the tree encoding production chain in (b), and the Newick code for the tree in (c).

 
2.2 WORKFLOW
Working with TreeSnatcher takes place along a mandatory succession of program stages. The program checks whether the user has performed the formal task(s) specified at any given stage and permits advancing to the next stage. On the other hand, the user controls all image manipulations and recognition tasks performed by the program. Figure 1b displays an early stage of the tree encoding production chain.

The entire workflow is as follows:

  • The program converts the specified RGB image into a grayscale image (c.f. ITU-R Recommendation BT.709-3, 1998). As color information is not needed for the tree recognition, all RGB values get expressed as shades of gray.
  • The user trims and cuts the image at will. That is to say, the user can select sub-trees or a subset of taxa from the converted image.
  • The user corrects the image by drawing and erasing pixels. If a line of known length is present in the image, i.e. a scale, the user can mark it and specify a unit length in a dialog box to scale the tree.
  • The program first separates all foreground shades from the background shades in order to extract the tree from the background. Then, it thins out the foreground to prepare the detection of tree nodes and edges. The user marks a location on an edge and instructs the program to color the foreground reachable from this location.
  • The program carries out the detection of nodes and leaves based on the previously colored part of the image. The user rejects or adds tree nodes by drawing and erasing pixels. Then TreeSnatcher repeats recognition step 5 on demand.
  • The program displays the inner nodes and edges of the tree. The user can provide individual edge lengths in a dialog box by clicking on the nodes that are uniquely assigned to an edge. This overrides a previous specification of a unit length for the whole tree.
  • The user clicks on each leaf node in turn in order to type in the corresponding species name in a dialog box.
  • The user either clicks on a tree node to designate it as the root of the tree, or clicks on a position on an edge to mark the root.
  • The program constructs and displays the Newick expression that represents the tree structure. If the user has provided scaling information for the tree, the Newick string contains branch lengths.

Figure 1c displays the Newick code for the tree in Figure 1a. A detailed description of the thinning (skeletonizing) algorithm used in step 4 is given in Zhang and Suen (1984). In steps 4 and 5, several variants on the flood-filling technique are used (Burger and Burge, 2005).


    3 PERFORMANCE
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 PERFORMANCE
 4 CONCLUSION
 ACKNOWLEDGEMENT
 REFERENCES
 
We carried out some authentic tree recognition tasks with TreeSnatcher in which we analyzed images of different size and resolution on a PowerPC equipped Apple iBook 800 MHz with 640 MB main memory. The JAVA SDK 1.5 Virtual Machine running on top of Mac OS X consumes a fairly high amount of heap memory and reserves a huge portion of virtual memory, leaving a comparatively small amount of physical memory for TreeSnatcher. The performance decrease had to be compensated by swapping out images to disk.

The fourth and the fifth steps in the workflow are the most time consuming: The program performs binarizing and skeletonizing of a 300 x 400 RGB grayscale image in 2 s and of a 1400 x 1200 image in 8 s. For the first image representing a tree with 20 species the node and edge detection is done within 2 s. For the second image representing a tree with 100 species the latter is done in 12 s. The response time will be further improved by algorithmic means.


    4 CONCLUSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 PERFORMANCE
 4 CONCLUSION
 ACKNOWLEDGEMENT
 REFERENCES
 
We presented a program that automates the generation of a machine-readable representation of phylogenetic trees contained in images. TreeSnatcher uses established methods from image analysis to preprocess the source image and to detect the tree structure. The user supervises the semi-automatic recognition process on every program stage and makes corrections to the image where necessary. This novel approach capitalizes on the cooperation between user and program and works well in practice. Furthermore, it accelerates inputting huge topologies from scanned images. This suggests that TreeSnatcher is well fitted to support the research on complex phylogenies. TreeSnatcher will be helpful if one wants to combine phylogenies based on a collection of already published trees that are not electronically available (c.f. Bininda-Emonds, 2004).


    ACKNOWLEDGEMENT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 PERFORMANCE
 4 CONCLUSION
 ACKNOWLEDGEMENT
 REFERENCES
 
We thank the Vienna Science and Technology Fund (WWTF) for generous financial support.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Keith Crandall

Received on July 3, 2007; revised on August 15, 2007; accepted on August 17, 2007

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 PERFORMANCE
 4 CONCLUSION
 ACKNOWLEDGEMENT
 REFERENCES
 

    Bininda-Edmonds ORP. Phylogenetic supertrees: combining information to reveal the tree of life. Comput. Biol. Ser (2004) 4:1–550.

    Burger W, Burge MJ. Digitale Bildverarbeitung (2005) Heidelberg, Berlin: Springer Verlag. 196–200.

    International Telecommunications Union: ITU-R Recommendation BT.709-3 (1998) Basic Parameter Values for the HDTV Standard for the Studio and for International Programme Exchange.

    Rambaut A. TreeThief: a tool for manual phylogenetic tree entry. (2000) http://evolve.zoo.ox.ac.uk/software/TreeThief/main.html.

    Zhang TY, Suen CY. A fast parallel algorithm for thinning digital patterns. Image Process. Comput. Vis (1984) 27:236–239.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrowOA All Versions of this Article:
23/24/3384    most recent
btm438v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Google Scholar
Right arrow Articles by Laubach, T.
Right arrow Articles by von Haeseler, A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Laubach, T.
Right arrow Articles by von Haeseler, A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?