Skip Navigation


Bioinformatics Advance Access originally published online on April 18, 2008
Bioinformatics 2008 24(12):1433-1441; doi:10.1093/bioinformatics/btn196
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
24/12/1433    most recent
btn196v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Kojima, K.
Right arrow Articles by Miyano, S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Kojima, K.
Right arrow Articles by Miyano, S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Fast grid layout algorithm for biological networks with sweep calculation

Kaname Kojima , Masao Nagasaki * and Satoru Miyano

Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 FLOW PENALIZATION AND...
 4 EXPERIMENTAL RESULTS
 5 CONCLUSION
 REFERENCES
 

Motivation: Properly drawn biological networks are of great help in the comprehension of their characteristics. The quality of the layouts for retrieved biological networks is critical for pathway databases. However, since it is unrealistic to manually draw biological networks for every retrieval, automatic drawing algorithms are essential. Grid layout algorithms handle various biological properties such as aligning vertices having the same attributes and complicated positional constraints according to their subcellular localizations; thus, they succeed in providing biologically comprehensible layouts. However, existing grid layout algorithms are not suitable for real-time drawing, which is one of requisites for applications to pathway databases, due to their high-computational cost. In addition, they do not consider edge directions and their resulting layouts lack traceability for biochemical reactions and gene regulations, which are the most important features in biological networks.

Results: We devise a new calculation method termed sweep calculation and reduce the time complexity of the current grid layout algorithms through its encoding and decoding processes. We conduct practical experiments by using 95 pathway models of various sizes from TRANSPATH and show that our new grid layout algorithm is much faster than existing grid layout algorithms. For the cost function, we introduce a new component that penalizes undesirable edge directions to avoid the lack of traceability in pathways due to the differences in direction between in-edges and out-edges of each vertex.

Availability: Java implementations of our layout algorithms are available in Cell Illustrator.

Contact: masao{at}ims.u-tokyo.ac.jp

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 FLOW PENALIZATION AND...
 4 EXPERIMENTAL RESULTS
 5 CONCLUSION
 REFERENCES
 
For biological pathways such as signal transduction pathways, gene regulatory networks and metabolic pathways, one of the crucial techniques for understanding their characteristics is to use graph representation. Both publicly available (Kanehisa, 2002) and commercial pathway databases (Schacherer et al. 2001) display retrieved pathways in the form of graphs to facilitate the users' comprehension of them. Considering the large numbers of pathways retrieved with various types of criterion according to biologists' purposes and the tediousness of manually drawing graphs, automatic layout algorithms are strongly required.

There exist rigorous studies in standard graph layout algorithms including circular, orthogonal or planar drawing, and force-directed heuristics (Battista et al., 1994; Battista et al., 1999; Brandenburg et al., 1997). However, none of these algorithms give satisfactory results when applied to pathway networks due to insufficient use of biological properties such as structural characteristics and subcellular localization information (Becker and Rojas, 2001; Li and Kurata, 2005; Wegner and Kummer, 2005).

Thus far, several types of drawing algorithms have been designed for biological pathways and they have been integrated in biological simulation software (Demir et al., 2002; Dogrusoz et al., 2006; Doi et al., 2003; Kurata et al., 2003, 2005; Nagasaki et al., 2003; Shannon et al., 2003).

Karp and Paley (1994) proposed the extraction of biological topologies such as linear, cyclic and branching pathways and their utilization as the backbone of the layout. For chemical reaction networks, the method described by Becker and Rojas (2001) uses the longest directed cycle drawn as the backbone of the layout to capture the flow of reactions. On the other hand, as small cycles are known to participate in important recycling processes, Wegner and Kummer (2005) proposed the use of recursively extracted small cycles as the backbone of the layout and their work has been integrated in Systems Biology Markup Language (SBML) with layout extensions (Gauges et al., 2006).

Several biological properties are considered in force-directed approaches. The use of edge directions and simple positional constraints (Dogrusoz et al., 2004; Genc and Dogrusoz, 2003) have been proposed for more general metabolic pathways. In GOlorize (Garcia et al., 2007), the extra attractive force is applied to vertices belonging to the same Gene Ontology class. An SBML layout extension, SBWAutoLayout (Deckard et al., 2006), employs the force-directed approach as its layout algorithm. However, force-directed approaches are not suitable for generating compact layouts of complex pathways (Li and Kurata, 2005). In addition, force-directed approaches have a difficulty in handling complicated positional constraints such as arranging some vertices only on a tours-shaped region.

Grid layout algorithms were originally proposed by Li and Kurata (2005), in which vertices composing the graph are mapped to grid points, and are arranged to minimize a cost function defined over all possible mappings. Cost functions comprise several components: vertex distances weighed according to the graph structure (Li and Kurata, 2005), edge–edge and vertex–edge crossings (Kato et al., 2005), and biologically preferable criterion such as aligning vertices having the same attribute (Kojima et al., 2007). Since even finding the layout with the minimum edge–edge crossings is NP-hard (Gary and Johnson, 1983) the basic grid layout algorithm repeatedly updates the layout by moving vertices one by one under a greedy search strategy, and a locally optimal layout is obtained after convergence, while its variants consider vertex position swappings (Kojima et al., 2007) or restrict the search space to stochastically chosen adjacents (Barsky et al., 2007). In addition, grid layout algorithms can deal with complicated positional constraints and thus they succeed in generating compact and biologically comprehensible layouts.

In this study, we propose a new grid layout algorithm intended for applications of biological pathway databases. We adopt the approach proposed by Kato et al. (2005), termed CB-grid layout, as the baseline of our algorithm because CB-grid layout first considers the edge–edge and vertex–edge crossings, which strongly influence the understandability of layouts. The CB-grid layout is, however, unsuitable for applications to pathway databases in the following reasons. (i) Although at each update step, all the differences in the cost between a current layout and its candidate adjacents are calculated efficiently using the calculation results from the previous step, current grid layout algorithms still require a high-computational time. In addition, the initial step is calculated inefficiently and has a large computational overhead, due to the unavailability of previous calculation results. Thus, current layout algorithms cannot be used for real-time drawing. (ii) Since the extant grid layout algorithms do not consider the edge directions, the resulting layouts lack traceability for biochemical reactions and gene regulations, which are very important characteristics of biological pathways.

In order to address the first problem, we propose a new calculation method for cost differences termed sweep calculation. In sweep calculation, costs changed by moving a vertex of interest are encoded, and then the cost differences corresponding to the movements are calculated by using the encoded data. Since both the encoding and decoding processes require less-time complexities than that of the method used in CB-grid layout, sweep calculation succeed in reducing the computational time of CB-grid layout, in particular that of the initial step.

For the second problem, we consider that the lack of traceability is caused by the differences in direction between in-edges and out-edges of each vertex. We introduce a new component to the cost function that employs the negative-inner product between the in-edges and out-edges on each vertex as the cost to penalize undesirable directions.

The remainder of this article is organized as follows. In Section 2, after presenting details of the basic grid layout algorithm, we describe sweep calculation and how to use this method for the calculation of the initial step. We also compare the time complexities of the CB-grid layout and our new grid layout algorithm. Section 3 introduces a new component of the cost function, the flow penalizing cost, and describes its efficient calculation. In Section 4, the performance of our approach is evaluated by using various pathway models from a biological pathway database. Section 5 summarizes our study.


    2 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 FLOW PENALIZATION AND...
 4 EXPERIMENTAL RESULTS
 5 CONCLUSION
 REFERENCES
 
2.1 Grid layout and extant search strategy
Given a graph G = (V,E) with vertices V and edges E and a grid of h rows and w columns, a layout L = (V,E,U,P) of G comprises the underlying graph G, w·h grid points U, and a function P:V->U such that Formula for any two distinct vertices v{alpha},vβisinV. Hereafter, we denote the degree of vertex v as Formula , the cardinality of a set as |·|, a set of edges connected to vertex v as Ev, a set of vertices adjacent to vertex v as Vv and an edge between vertices v and v' as e(v,v'). We define the following functions:

  • Formula : a binary function that returns 1 if an edge ei crosses an edge ej and 0 otherwise.
  • Formula : a binary function that returns 1 if an edge ej crosses a vertex vi and 0 otherwise.
  • Formula : a function that returns


    Formula 1

    (1)

where wvi,vj is the weight of a pair of vertices vi and vj according to the graph structure; md(vi,vj), the Manhattan distance between vi and vj; and sd, the saturation distance. For details see Li and Kurata (2005).

By using the above functions, the layout cost C(L) of L is defined as follows:


Formula 2

(2)
where Wee, Wne and Wdc are constant values used to adjust the effect of each component.

At each step, the algorithm calculates the costs of all layouts that can be generated by moving one of all vertices to one of all vacant points. The layout with the minimum cost is selected as a starting layout for the next step. After convergence, the algorithm outputs a locally optimal layout. Since inefficiently calculating all possible adjacent layouts requires a high-time complexity, the CB-grid layout algorithm uses a {Delta} matrix, whose entry {Delta}v,p stores the cost difference by moving vertex v to a vacant point p at the previous step, and reduces the time complexity at each step from O (w·h·(|V|2 + |E|2) to O(|V|2 + |E|2 + w·h·|Evβ| (|V| + |E|)), where vβ is the vertex moved at the previous step.

In addition, when vertices and grid points in the model hold subcellular-localization information as positional constraints, CB-grid layout limits vertices to be mapped to the grid points that satisfy their subcellular localizations.

In the following subsections, we introduce a new calculation method termed sweep calculation, which can efficiently calculate the {Delta} matrix without using the previous calculation results, and describe how to reduce the time complexity of the CB-grid layout.

2.2 Sweep calculation
As mentioned in the previous subsection, at the initial step, the {Delta} matrix is calculated inefficiently and O(w·h·(|V|2 + |E|2)) time is required due to the unavailability of the previous calculation results. Sweep calculation can calculate the same operation in Formula (|V|2 + |E|2) + w·h|V|) time, i.e. Formula times faster than the inefficient method, without increasing the space complexity. Below we show the principle of sweep calculation through Lemmas 1–5, and prove the time complexity of the above case in Proposition 1.

Assume that an edge e and a vertex v are lying on the grid, as shown in Figure 1a. We also assume that there exists an edge f connected to the vertex v. Edge f has a crossing with edge e only when the other end point of edge f is in the shaded region Re,v shown in Figure 1a. We call this shaded region Re,v the crossing region.


Figure 1
View larger version (20K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. (a) Edge e and vertex v are lying on the grid. A crossing occurs only when another end point of f opposite to v lies in the shaded region. The shaded region always forms a convex set. (b) An image of a distance pattern on the grid. The digit on each grid point denotes the distance from a vertex v, which is defined in Equation (1).

 
Except for the case where v is on the extension of e, the crossing region is surrounded by edge e and two half-lines from the end points of e to the counterside of vertex v, as shown in Figure 1a. Since the size of the grid is finite in practice, the crossing region is also surrounded by the boundary of the grid. We define grid points with the same x-coordinate as vertical grid point set, i.e. (x,0),...,(x,h – 1). The crossing region has a convex form, as shown in Figure 1a, and thus the intersection of the vertical grid point set and the crossing region can be represented as (x,y1),...,(x,y2). We call the two bookending points (x,y1) and (x,y2) the upper crossing boundary point and lower crossing boundary point, respectively, and the set of these crossing boundary points for all x crossing boundary. We denote the crossing boundary induced by edge e and the edge connected to vertex v as Be,v. Note that some vertical grid point sets do not intersect the crossing region and hence have no crossing boundary point.

Each crossing boundary point is obtained in constant time because a crossing region is surrounded by at most six lines, and for each line, the y-coordinate value corresponding to a specific x-coordinate value is calculated in constant time. Without loss of generality, we assume that w ≤ h, and by following the above fact, we have the following lemma.

LEMMA 1.
Crossing boundary Be,v can be obtained in Formula time.

We introduce the following two functions associated with the crossing region Re,v and crossing boundary Be,v:

  • re,v(x,y): a binary function [0,w – 1] x [0,h 1]↦{0,1} that returns 1 if (x,y) is in the crossing region Re,v and 0 otherwise.
  • be,v(x,y): a function [0,w – 1] x [0,h – 1]↦{ – 1,0,1} that returns


    Formula


Note that re,v(x,y) and be,v(x,y) represent the same things as crossing region Re,v and crossing boundary Be,v, respectively. We implement these two functions as an h x w matrix.

LEMMA 2.
re,v can be obtained from be,v in O(w·h) time.

PROOF.
Let Formula . Since re,v(x,y) = s(x,y) and s(x,y) = s(x,y – 1) + Be,v(x,y), it is evidenced that re,v(x,0),..., re,v(x,h – 1) can be calculated in O(h) time. Thus, Re,v is obtained from Be,v in O(w·h) time.

LEMMA 3.
A function that returns the same values as Formula can be obtained in Formula time.

PROOF.
From the proof of Lemma 2, Formula Formula . Instead of preparing bei,vi, by preparing a h x w matrix whose every entry is 0, and adding non-trivial values to it, which are supposed to be used for each bei,vi, we obtain a function that returns the same value as Formula in O(w·h) time. Thus, by following Lemma 2 as well, Formula can be obtained in Formula time.

LEMMA 4.
Difference of the number of edge–edge crossings induced by moving a vertex v from (x,y) to (x',y') can be calculated by


Formula

The calculation of all the differences induced by the movement of v requires Formula time.

PROOF.
The former part of this lemma is obvious. We prove the latter part. Since {sum}v'isinVv{sum}eisinE\Evre,v' can be prepared in Formula time from Lemma 3 and each movement can be calculated in O(1) time, Formula time is required in total.

We can consider the case of vertex-edge crossings in a similar manner because in our setting, the regions of vertices are denoted by a rectangle, and thus we can detect the vertex–edge crossing by checking the edge–edge crossings between the edge of interest and lines surrounding the rectangular region of the vertex of interest. Assume that vertex v and edge e are lying on the grid and e is connected to vertex v'. By moving v' around, a crossing between v and e occurs when v' is in some region. On the other hand, by moving v around, a crossing between v and e occurs when v is in some different region. As in the case of edge–edge crossings, we call these regions crossing regions and denote the region in the former case as Formula and the region in the latter case as Formula . Further, we denote the crossing boundary for Formula as Formula and that for Formula as Formula .

Unfortunately, we cannot directly use the same strategy for the distance cost because the value of each grid point for the distance from a vertex v changes gradually with the distance from v until the distance attains the saturation distance, as shown in Figure 1b. Beyond the region bounded by the saturation distance, values corresponding to grid points are set to the same value or the saturation distance. Thus, we focus on the region bounded by the saturation distance, which we call the distance region, and introduce the new strategy. For simplicity, we consider that the distance region is completely contained in the grid. It is evidenced that the distance region is surrounded by four line segments induced by the saturation distance. We call these line segments the distance boundary and denote them as Formula . We define the upper and lower distance boundaries as in the case of crossings. We introduce the following functions, which are also implemented as h x w matrices:

  • drv(x,y): a function Formula that returns the distance between (x,y) and vertex v defined in Equation (1).
  • dbv(x,y): a function Formula that returns


    Formula


where vx and vy are the x and y-coordinate values of the grid point to which vx is mapped, respectively.

The following lemma shows that drv can be decoded from dbv.

LEMMA 5.
drv can be generated from dbv. This operation requires O(w·h) time.

PROOF.
All drv(x,y) are set to sd, which requires O(w·h) time. Let Formula and Formula . When Formula , the distance from v is decremented as y is incremented. On the other hand, when Formula , the distance from v is incremented as y is incremented. Since s(x,y) returns the distance between v and (x,y) subtracted from the distance between v and (x,y – 1), which satisfies this law, Formula . Since s(x,y) = s(x,y – 1) + dbv(x,y), t(x,y) = t(x,y – 1) + s(x,y), and drv(x,y) = drv(x,y – 1) + t(x,y), drv can be generated from dbv in O(w·h) time.

By using the facts from Lemmas 1–5, we present sweep calculation for the following cases and function initialization, which computes the initial step of the CB-grid layout using sweep calculation, as shown in Figure 2 with pseudo-codes.

  • Formula : a function that calculates the costs from the sum of the edge–edge crossings between evisinE';vsubEv and eisine' for all the mappings of v to a point.
  • Formula : a function that calculates the costs from the sum of the vertex–edge crossings between evisinE';vsubEv and v'isinV' for all the mappings of v to a point.
  • Formula : a function that calculates the costs from the sum of the vertex–edge crossings between v and e'isinE' for all the mappings of v to a point.
  • Formula : a function that calculates the sum of the distance costs between v and v'isinV' for all the mappings of v to a point.
In the pseudo-codes, the following matrices and functions are newly introduced:
  • R: A matrix of h rows and w columns. r(j, i) denotes the ith row and jth column of R and corresponds to a grid point of the ith row and jth column. We also use the notation r(p) to denote an element of R corresponding to grid point p.
  • B: A matrix of h – 1 rows and w – 1 columns. b(j, i) denotes the ith row and jth column of B and corresponds to a grid point of the ith row and jth column. We also use the notation b(p) to denote an element of B corresponding to grid point p.
  • Formula : a function that sets all values in matrix M to value. This function requires O(w·h) time.
  • Formula : Let (i,yu) be an upper boundary point and (i,yl – 1) be a lower boundary point in Be,v. This function increments b(i,yu) and decrements b(i,yl). This function requires O(1) time.
  • Formula : Let (i,yu) be an upper boundary point and (i,yl – 1) be a lower boundary point in Formula . Note that e must be connected to v'. This function increments b(i,yu) and decrements b(i,yl). This function requires O(1) time.
  • Formula : Let (i,yu) be an upper boundary point and (i,yl – 1) be a lower boundary point in Formula . This function increments b(i,yu) and decrements b(i,yl + 1). This function requires O(1) time.
  • Formula : Let (i,yu) be an upper boundary point and (i,yl – 1) be a lower boundary point in Formula . This function subtracts wv,v' from b(i,yu) and b(i,yl + 1), and adds 2wv,v' to b(i,vy). This function requires O(1) time.


Figure 2
View larger version (19K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Pseudo-codes of the functions of sweep calculation and the initial step of the CB-grid layout using sweep calculation.

 
In the initialization, for each vertex visinV, all the values in R are set to 0 by Formula and Formula , Formula , Formula and Formula are calculated. In Formula , B is initialized by Formula and the crossing boundaries are recorded in B. The cost from the edge–edge crossings is then calculated from the crossing boundaries by following Lemma 3 and it is summed up to R. In Formula and Formula , the vertex–edge crossings are calculated in a manner similar to that of Formula . In Formula , all values in B are set to sd by Formula and the distance boundaries are recorded in B. The distance cost is then calculated from the crossing boundaries by following Lemma 5 and it is summed up to R. After the calculation of these four functions, the differences of the cost between the current layout and its adjacents are calculated and stored in the {Delta} matrix by using the values in R. During this process, the function selects the best movement of vertex v_min to the grid point q and finally returns the best movement and its corresponding cost difference d_min.

The following proposition concludes the time and space complexity of the initialization.

PROPOSITION 1.
Initialization requires Formula (w·h)|V|) time and O(w·h) space.

PROOF.
First, we focus on vertex v and show the time complexity corresponding to vertex v at each step. Since Formula , Formula , Formula and setDistanceBoundary require constant time and Formula requires O(w·h) time, Formula , Formula , Formula and Formula require Formula , Formula , and Formula time, respectively. In addition, the calculation of all cost differences requires O(w·h) time, and the time complexity required for vertex v amounts to Formula Formula time.Thus, by summing up Formula +w·h) time with respect to v, we observe that the initialization requires Formula time, where we use the fact that Formula .Since O(w·h) space is required to store the data of R and B, sweep calculation requires O(w·h) space.

2.3 Grid layout algorithm with sweep calculation
The previous subsection shows the efficient calculation of the initial step in the CB-grid layout using sweep calculation. At the update step, the time complexity is mainly dependent on the crossings related to a vertex of interest v and a vertex moved at the previous step vβ, which cannot be calculated by the {Delta} matrix:

  • edge–edge crossing between evisinEv and evβisinEvβ: O(|Ev||Evβ|) time for each movement.
  • vertex–edge crossing between evisinEv and vβ: O(|Ev|) time for each movement.
  • vertex–edge crossing between evβisinEvβ and v: O(|Evβ|) time for each movement.
  • edge–edge crossing between edge e(v,vβ) and E\{Ev{cup}Evβ} and vertex-edge crossing between edge e(v,vβ) and V\{v,vβ} if edge e(v,vβ) exists: O(|E|) and O(|V|) time for each movement.

The calculations for these cases require a total time of Formula at each step. Sweep calculation can be applied to these calculations and it reduces the time complexity to Formula .

However, when the vertex is moved to the point to which vβ was mapped at the previous step, the cost difference is calculated inefficiently since the cost difference for the point was not calculated at the previous step. Thus, this case requires a total time of O(|E|2) at each step.

As is done in Kojima et al. (2007) by defining the cost differences corresponding to occupied points, this case can be efficiently calculated as well. This is because when two vertices are mapped to the same point, we can calculate the cost in the same manner as a layout in which no two vertices are mapped to the same point, and the cost differences corresponding to the occupied points are also defined in the same manner as those for vacant points. Note that if an edge passes sufficiently close to a grid point to which two vertices are mapped to and the edge makes crosses with both vertices, the number of crossings is counted as two.

Due to its definition, we observe that the cost differences obtained by sweep calculation include the cases of occupied points, and thus we can use the cost differences obtained by sweep calculation directly to update the cost differences for the occupied points.

By summarizing the above definition and properties, we show the new grid layout algorithm, CBS-grid layout (CB-grid layout with sweep calculation), with the pseudo-code in Figure 3.


Figure 3
View larger version (18K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. Pseudo-code of CBS-grid layout.

 
The next proposition gives the time complexity of the CBS-grid layout.

PROPOSITION 2.
At each step, the CBS-grid layout shown in Figure 3 calculates all the cost differences as in the case of the CB-grid layout in Formula time.

PROOF.
As discussed abov, since sweep calculation requires Formula time and by using the data from the {Delta} matrix and sweep calculation, the calculation of each cost difference requires a constant time, as shown in Figure 3, the CB-grid layout with sweep calculation requires Formula w·h·|V|) time.

2.4 Comparison of time complexities
We compare the time complexities of the CB and CBS-grid layouts. It is assumed that |V| ≤ |E|, w·h is proportional to |V|, and all vertices are equally selected to update the layout on average. At each step, the CB-grid layout requires Formula , where vβ is the vertex moved at the previous step (Kato et al., 2005; Kojima et al., 2007), while the CBS-grid layout requires Formula +(w·h)|V|) time.

Since from the above assumptions, Formula equals to Formula and w·h = C|V|, where C is a constant, the time complexities of the CB and CBS-grid layouts are Formula and Formula , respectively. Thus, for the case where Formula , the second term of the time complexity of the CBS-grid layout dominates and the CBS-grid layout is Formula times faster than the CB-grid layout, and Formula times faster otherwise.

Since the CB-grid layout calculates the cost differences inefficiently at the initial step, its time complexity is O(w·h·|E|2), while CBS-grid layout requires Formula . Arranging these time complexities by using the above assumptions, we observe that the CBS-grid layout is Formula times faster than the CB-grid layout at the initial step.

The method counting edge–edge and vertex–edge crossings in CB-grid layout is naïve one. In the supplementary file, we compare the time complexity of sweep calculation with those of algorithms studied in computational geometry (Akman et al., 1989; Chazelle et al., 1986; Cheng et al., 1991; Palazzi and Snoeyink, 1993; Tunkelang et al., 1994). Here, we just give the conclusion of the comparison. If the average degree of the model can be bounded by Formula , sweep calculation is the most efficient of all them. In general, this assumption is reasonable for biological networks due to their small average degree.


    3 FLOW PENALIZATION AND ITS CALCULATION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 FLOW PENALIZATION AND...
 4 EXPERIMENTAL RESULTS
 5 CONCLUSION
 REFERENCES
 
In biological pathways, edges have directions that represent the biochemical reactions and gene regulations. Consider the layout shown in Figure 4a. In general, we can easily trace the directions of edges in the layout since edges connected to a vertex have similar directions. In contrast, when edges have completely different directions as compared to each other, we may face a difficulty in tracing the directions, as shown in Figure 4b. Thus, considering these properties, as a component for the cost function, we define the new function that penalizes undesirable edge directions, as shown in Figure 4b:


Formula 3

(3)
where Formula and Formula are sets of in-edges and out-edges connected to v, respectively; Formula , a vector associated with edge e; operation Formula , the inner product; and Formula , the length of vector Formula .


Figure 4
View larger version (16K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4. Examples of (a) rational edge directions and (b) undesirable edge directions. In (a) edges connected to vertex v have similar directions, and we may easily capture the flow represented by the directions, while the example shown in (b) lacks the traceability of flows due to the different directions.

 
Note that for edge e of length 0, let Formula be Formula . By using this function, we redefine the cost function given in Equation (2) as:


Formula 4

(4)
where Wfl is a constant value called the flow penalizing weight.

The movement of vertex v changes Formula and Formula related to edge e(v,v'), where v' is each adjacent vertex of v. Since Equation (4) can be rearranged as


Formula 5

(5)
if Formula and Formula are pre-calculated, this type of change requires Formula for Formula and O(1) for Formula . Thus, the cost difference for Formula is calculated in Formula time at each step. However, the calculation method for Formula still increases the time complexity, and thus we cache flow penalizing costs corresponding to each movement of a vertex and change only the costs influenced by the update of the layout to reduce the time complexity. When vβ was moved at the previous step, the cached flow penalizing costs to be updated are corresponding to vβ, vertices adjacent to vβ and vertices adjacent to v'isinVvβ, where Vvβ is the set of vertices adjacent to vβ. Therefore, the time complexity of the update for the cached flow penalizing cost amounts to Formula and hence it does not increase the time complexity of the grid layout algorithm. We call the CBS-grid layout with the cost function given in Equation 4 the CBSF-grid layout.


    4 EXPERIMENTAL RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 FLOW PENALIZATION AND...
 4 EXPERIMENTAL RESULTS
 5 CONCLUSION
 REFERENCES
 
We use 95 pathway models of various sizes from TRANSPATH pathway database (Schacherer et al., 2001) to compare the performance of the CB-grid, CBS-grid and CBSF-grid layouts. The largest model contains 316 vertices and 592 edges. For each model, we set the numbers of rows w and columns h of a grid to Formula and Formula , respectively. These pathway models do not hold subcellular localization information.

From repeated trials, we empirically decided the weights for the cost function: W_dc = 100, W_ee = 100, W_ve = 150, and W_fl = 100.

4.1 Comparison of running time
We generate 10 layouts for each model by randomly mapping vertices to the grid without their overlaps and positional violations. These are preprocessed by the Eades grid layout algorithm (Kojima et al., 2007) with 100 iterations and they are then used as initial layouts for the grid layout algorithms. Note that the Eades grid layout on average requires <1s even for the largest model.

For the comparison of the running time, we evaluate the CB, CBS and CBSF-grid layouts. All the algorithms were implemented in Java and experiments were performed on Xeon 3.6 GHz with 4GB RAM.

The running time of each grid layout algorithm is summarized in Figure 5a. These running time results include the time consumed by the Eades grid layout and are averaged over the 10 layouts for each model. From the comparison, we observe that the CB-grid layout is significantly slower than any other grid layout algorithm. The CBS-grid layout is the fastest among them, but since the CBS and CBSF-grid layouts have the same time complexity, there is no large gap among them in terms of running time.


Figure 5
View larger version (6K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5. Comparison of the running time of each grid layout algorithm. (a) The running time of the CB, CBS and CBSF-grid layouts for pathway models of various sizes. (b) The ratio of the running time of the CB and CBS-grid layouts (CB/CBS) and the ratio of the running time of the CB-grid layout normalized by the squared average degree of each model and that of the CBS-grid layout (CB(norm)/CBS).

 
We show the ratio of the running time of the CB and CBS-grid layouts (CB/CBS in Figure 5b) and the ratio of the running time of the CB-grid layout normalized by the average degree of each model and the running time of the CBS-grid layout (CB(norm)CBS in Figure 5b). For small models containing <30 vertices, the CB and CBS-grid layouts do not have large differences, while the CBS-grid layout is more than 10 times faster than the CB-grid layout for relatively large models containing more than 200 vertices. This is because the overhead time dominates in the case of a short-running time.

For the case of CB(norm)/CBS, we observe that as the size of the model increase, its ratio becomes constant. Since biological pathway models are known to have relatively small degrees, from the discussion in Section 2.4, the running time of the CB-grid layout normalized by the squared average degree of each model is expected to be proportional to that of the CBS-grid layout. This assessment agrees with the experimental results.

4.2 Effect of flow penalization
We say that a vertex has undesirable direction edges if no line passing through the vertex can divide the edges connected to the vertex into in-edges and out-edges. Note that in this definition vertices of degree <4 never have undesirable direction edges. We count the number of vertices having undesirable direction edges in the resulting layouts of CBS and CBSF-grid layout obtained through the previous section process. From the comparison shown in Figure 6, we observe that flow penalizing cost reduces the number of vertices with undesirable direction edges all over the models from TRANSPATH.


Figure 6
View larger version (9K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 6. Comparison of the numbers of vertices having undesirable directions edges between CBS and CBSF-grid layouts. We count the number of vertices having undesirable direction edges in the resulting layouts of CBS and CBSF-grid layouts.

 
4.3 Comparison with a force-directed approach
We compare our approach with a force-directed approach termed Jiggle proposed by Tunkelang (1998). Following settings are used in Jiggle:
  • Conjugate-gradient method is selected as the optimization process due to its efficiency.
  • Iteration count is restricted to 1000 since according to the stopping criterion in Tunkelang (1998) Jiggle iterates around 1000 times for the largest model we use.

Figures 7a, b, respectively shows the layouts of CBSF-grid layout and Jiggle for endothelial cell model (Pober et al., 2002), which consists of 318 vertices and 371 edges. Note that subcellular localizations are removed in the resulting layout of Jiggle since Jiggle, does not use them.


Figure 7
View larger version (27K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 7. Comparison of CBSF-grid layout and Jiggle using the endothelial cell model. (a) A resulting layout of CBSF-grid layout. (b) A resulting layout of Jiggle. Since Jiggle does not use subcellular localizations, they are removed from the layout in (b).

 
As stated in Li and Kurata (2005), due to the definition of distance cost, vertex clusters are formed according to the topological structure of the model in the layout of CBSF-grid layout, thus facilitate to capture the relationship among these topologically decided clusters. In addition, consideration of subcellular localizations and edge directions helps understand the system of the model.

We also compare the running time of CBS and CBSF-grid layouts and Jiggle as in Figure 8 using TRANSPATH models under the same condition of subsection Comparison of the running time. From the comparison our methods are competitive with Jiggle for models of <250 vertices. Although for models of >250 vertices our methods take more running time than Jiggle, their running time is sustainable in practical use.


Figure 8
View larger version (8K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 8. Comparison of the running time of CBS and CBSF-grid layouts and Jiggle.

 
In the supplementary file we also give the layouts of CBSF-grid layout and Jiggle for Caenorhabditis elegans cell-fate model (Saito et al., 2006).


    5 CONCLUSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 FLOW PENALIZATION AND...
 4 EXPERIMENTAL RESULTS
 5 CONCLUSION
 REFERENCES
 
For the purpose of applications to pathway databases, we develop grid layout algorithms and address two issues: computational time and traceability of the flow. We devise sweep calculation and reduce the time complexity of existing grid layout algorithms, although for brevity, our description was limited to the CB-grid layout. Experiments using various models from an actual pathway database prove that our new grid layout algorithm is much faster than the existing grid layout algorithm, and is sufficiently fast for the real-time drawing of models containing <300 vertices. By using flow penalization, we reduce undesirable edge directions that cause a difficulty in tracing the flow of biochemical reactions and gene regulations.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Jonathan Wren

Received on January 17, 2008; revised on March 7, 2008; accepted on April 16, 2008

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 FLOW PENALIZATION AND...
 4 EXPERIMENTAL RESULTS
 5 CONCLUSION
 REFERENCES
 

    Akman V, et al. Geometric computing and uniform grid technique. Comput. Aided Des (1989) 21:410–420.[CrossRef]

    Barsky A, et al. Cerebral: a Cytoscape plugin for layout of and interaction with biological networks using subcellular localization annotation. Bioinformatics (2007) 23:1040–1042.[Abstract/Free Full Text]

    Battista DG, et al. Annotated bibliography on graph drawing algorithms. Comput. Geom. Theor. Appl (1994) 4:235–282.

    Battista DG, et al. Graph Drawing: Algorithms for the Visualization of Graphs. (1999) New Jersey: Prentice Hall.

    Becker MY, Rojas I. A graph layout algorithm for drawing metabolic pathways. Bioinformatics (2001) 17:461–467.[Abstract/Free Full Text]

    Brandenburg F-J, et al. Algorithmenzum automatischen Zeichnen von Graphen. Inform. Spektrum (1997) 20:199–207.[CrossRef]

    Chazelle B. Reporting and counting segment intersections. J. Comput. Syst. Sci (1986) 32:156–182.[CrossRef]

    Cheng SW, Janardan R. Space-efficient ray-shooting and intersection searching: algorithms, dynamization, and applications. In: In Proceedings of the second annual ACM-SIAM symposium on Discrete algorithms. (1991) San Francisco, USA: Society for Industrial & Applied. 7–16.

    Deckard A, et al. Supporting the SBML layout extension. Bioinformatics (2006) 22:2966–2967.[Abstract/Free Full Text]

    Demir E, et al. PATIKA: an integrated visual environment for collaborative construction and analysis of cellular pathways. Bioinformatics (2002) 18:996–1003.[Abstract/Free Full Text]

    Dogrusoz U, et al. A compound graph layout algorithm for biological pathways. In: In Proceedings of the 12th International Symposium on Graph Drawing. (2004) New York City, USA: Springer-Verlag. 442–447.

    Dogrusoz U, et al. PATIKAweb: a Web interface for analyzing biological pathways through advanced querying and visualization. Bioinformatics (2006) 22:374–375.[Abstract/Free Full Text]

    Doi A, et al. Genomic Object Net: II. Modelling biopathways by hybrid functional Petri net with extension. Appl. Bioinformatics (2003) 2:185–188.[Medline]

    Genc B, Dogrusoz U. A constrained, force-directed layout algorithm for biological pathways. In: In Proceedings of the 11th International Symposium on Graph Drawing. (2003) Rerugia, Italy: Springer-Verlag. 314–319.

    Garcia O, et al. GOlorize: a cytoscape plug-in for network visualization with Gene Ontology-based layout and coloring. Bioinformatics (2007) 23:394–396.[Abstract/Free Full Text]

    Gary MR, Johnson DS. Crossing number is NP-complete. SIAM J. Algebra. Discr (1983) 4:312–316.[CrossRef]

    Gauges R, et al. A model diagram layout extension for SBML. Bioinformatics (2006) 22:1879–1885.[Abstract/Free Full Text]

    Kanehisa M. The KEGG database. Novartis Found Symp (2002) 247:91–101.[Web of Science][Medline]

    Karp PD, Paley SM. Automated drawing of metabolic pathways. In: In Proceedings of the 3rd International Conference on Bioinformatics and Genome Research. (1994) Florida, USA: World Scientific Pub. Co. Inc. 225–238.

    Kato M, et al. Automatic drawing of biological networks using cross cost and subcomponent data. Genome Inform (2005) 16:22–31.

    Kojima K, et al. An efficient grid layout algorithm for biological networks utilizing various biological attributes. BMC Bioinformatics (2007) 8:1–16.[CrossRef][Medline]

    Kurata H, et al. CADLIVE for constructing a large-scale biochemical network based on a simulation-directed notation and its application to yeast cell cycle. Nucleic Acids Res (2003) 31:4071–4084.[Abstract/Free Full Text]

    Kurata H, et al. CADLIVE dynamic simulator: direct link of biochemical networks to dynamic models. Genome Res (2005) 15:590–600.[Abstract/Free Full Text]

    Li W, Kurata H. A grid layout algorithm for automatic drawing of biochemical networks. Bioinformatics (2005) 21:2036–2042.[Abstract/Free Full Text]

    Nagasaki M, et al. Genomic Object Net: I. A platform for modelling and simulating biopathways. In: Appl. Bioinformatics. (2003) 2:181–184.[Medline]

    Palazzi L, Snoeyink J. Counting and reporting red/blue segment intersections. CVGIP: Graph. Model. Im (1993) 56:304–310.

    Pober JS. Endothelial activation: intercellular signalizng pathways. Arthritis Res (2002) 4:S109–S116.[CrossRef][Medline]

    Saito A, et al. Cell fate simulation model of gustatory nuerons with microRNAs double-negative feedback loop by hybrid functional Petri net with extension. Genome Inform (2006) 17:100–111.

    Schacherer F, et al. The TRANSPATH signal transduction database: a knowledge base on signal transduction networks. Bioinformatics (2001) 17:1053–1057.[Abstract/Free Full Text]

    Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res (2003) 13:2498–2504.[Abstract/Free Full Text]

    Tunkelang D. A practical appraoch to drawing undirected graphs. In: Technical Report CMUCS-94-161. (1994) Carnegie Mellon University, School of Computer Science.

    Tunkelang D. JIGGLE: Java interactive graph layout environment. In: In Proceedings of the 6th International Symposium on Graph Drawing. (1998) Montréal, Canada: Springer-Verlag. 412–422.

    Wegner K, Kummer U. A new dynamical layout algorithm for complex biochemical reaction networks. BMC Bioinformatics (2005) 6:1–12.[CrossRef][Medline]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
24/12/1433    most recent
btn196v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Kojima, K.
Right arrow Articles by Miyano, S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Kojima, K.
Right arrow Articles by Miyano, S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?