Skip Navigation


Bioinformatics Advance Access originally published online on February 15, 2005
Bioinformatics 2005 21(10):2171-2176; doi:10.1093/bioinformatics/bti327
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/10/2171    most recent
bti327v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (6)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Nguyen, C. T.
Right arrow Articles by Zhang, L.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Nguyen, C. T.
Right arrow Articles by Zhang, L.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Divide-and-conquer approach for the exemplar breakpoint distance

C. Thach Nguyen 1, Y. C. Tay 1,2 and Louxin Zhang 2,*

1School of Computing, National University of Singapore Singapore 117543
2Department of Mathematics, National University of Singapore Singapore 117543

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 THE EXEMPLAR DISTANCE...
 3 INDEPENDENCE OF GENE...
 4 ALGORITHMS
 5 EXPERIMENTAL TESTS
 6 CONCLUSION
 REFERENCES
 

Motivation: A one-to-one correspondence between the sets of genes in the two genomes being compared is necessary for the notions of breakpoint and reversal distances. To compare genomes where there are paralogous genes, Sankoff formulated the exemplar distance problem as a general version of the genome rearrangement problem. Unfortunately, the problem is NP-hard even for the breakpoint distance.

Results: This paper proposes a divide-and-conquer approach for calculating the exemplar breakpoint distance between two genomes with multiple gene families. The combination of our approach and Sankoff's branch-and-bound technique leads to a practical program to answer this question. Tests with both simulated and real datasets show that our program is much more efficient than the existing program that is based only on the branch-and-bound technique.

Availability: Code for the program is available from the authors.

Contact: matzlx{at}nus.edu.sg


    1 INTRODUCTION
 TOP
 Abstract
 1 INTRODUCTION
 2 THE EXEMPLAR DISTANCE...
 3 INDEPENDENCE OF GENE...
 4 ALGORITHMS
 5 EXPERIMENTAL TESTS
 6 CONCLUSION
 REFERENCES
 
In the theory of genome rearrangement, a genome is usually modeled as a permutation (or equivalently a linear order) on a set of genes and the breakpoint or reversal distances are used for evolutionary inference (Nadeau and Taylor, 1984; Watterson et al., 1982; International Mouse Genome Consortium, 2002; Moret et al., 2005). The breakpoint and signed reversal distances were introduced by Watterson et al. (1982) and Sankoff (1989) respectively. Since then, efficient algorithms for these and other measures have been studied extensively [see Kececioglu and Sankoff (1994); Sankoff and Blanchette (1998) and El-Mabrouk (2005) for a complete bibliography]. The breakpoint distance is easily calculated in linear time. However, the permutations is polynomial-time computable only for the signed reversal distance (Hannenhalli and Pevzner, 1999; Caprara, 1997; Kaplan et al., 2000).

A one-to-one correspondence between the sets of genes in the two genomes is necessary for the notions of both breakpoint and reversal distances. While this may be appropriate for small viruses and mitochondria genomes, it becomes problematic when applied to eukaryotic genomes where paralogous genes exist. Hence, several approaches have been proposed for comparison of genomic sequences with gene families. In inferring phylogeny from genome rearrangement data, Tang and Moret (2003) assumed that the size of each gene family is the same in each genome and simply treated each gene member in a gene family as a distinct gene. Similarly, Chen et al. (2005) also assumed that each gene family has the same size in each genome by removing from a gene family those members that were the least homologous to the members of the counterpart family. Under the assumption of equal content, they proposed a graph-based heuristic algorithm for computing the assignment of orthologous genes that minimizes the reversal distance between two genomes. Finally, a more generalized version of the genome rearrangement problem was formulated by Sankoff (1999) where the restriction of the identical size of each gene family is removed. His idea is to delete all but one member of a gene family in each of the two genomes, so as to minimize the breakpoint or reversal distance between the two reduced genomes, called exemplars. Such an approach is clearly relevant in the phylogenetic context where the object is to reconstruct the ancestral gene order and the size of each gene family had been changed due to gene duplication/deletion (Sankoff and El-Mabrouk, 2000; El-Mabrouk, 2005).

In the current paper, we study efficient algorithms for the exemplar problem for genomes with gene families. Formally, the exemplar problem with the breakpoint distance is, given two genomes and , to find the minimum breakpoint distance between g and h overall choices of examplars g and h of and respectively. Even though it is almost trivial to calculate the breakpoint distance between two genomes with single-gene families, the exemplar breakpoint distance problem is not only NP-hard (Bryant, 2000), but also unlikely to have polynomial-time constant-ratio approximation algorithm unless NP = P (C.T.Nguyen, Unpublished manuscript). By observing the monotonicity of the exemplar distances, Sankoff proposed a branch-and-bound method to tackle this problem. The idea is to work on one gene family separately, choosing the pair of exemplars that least increases the distance when inserted into the partial exemplar genomes already constructed.

Motivated by the above idea, we propose to calculate the exemplar breakpoint distance by partitioning the gene families into disjoint subsets such that two gene families in different subsets are ‘independent’. Then, we find the pair of exemplars of gene families in each subset at a time, and finally merge all the exemplars together to obtain two optimal exemplars of the given genomes. Tests with both simulated and real datasets show that our divide-and-conquer approach is much more efficient than the branch-and-bound approach.

The paper is organized into six parts. Section 2 provides the needed definitions and notations. Section 3 introduces the concept of independence of gene families and proves a theorem that characterizes the relationship between the independence of gene families and the breakpoint distance. Section 4 presents our divide-and-conquer approach that is based on the characterization theorem established in Section 3, and Section 5 reports our test results. Section 6 concludes with a few remarks on future research.


    2 THE EXEMPLAR DISTANCE PROBLEM
 TOP
 Abstract
 1 INTRODUCTION
 2 THE EXEMPLAR DISTANCE...
 3 INDEPENDENCE OF GENE...
 4 ALGORITHMS
 5 EXPERIMENTAL TESTS
 6 CONCLUSION
 REFERENCES
 
In the study of genome rearrangement with gene duplication, a genome is modeled as a string of signed (+ or –) symbols from an alphabet A, which represents the gene families. For a symbol a A, all its occurrences in genome constitute a gene family; and the number of its occurrences is the size of the gene family.

If for all a A, then is a ‘permutation’ on the symbols in A. Consider two permutation genomes G' = g1,g2,...,gn and H' = h1,h2,...,hn. We say gi precedes gi+1 in G' and hi precedes hi+1 in H' for i ≤ n – 1. If gene a precedes gene b in G', but neither a precedes b nor –b precedes –a in H', they determine a ‘breakpoint’ in G'. Additional breakpoints are posited if g1 != h1 and if gn != hn. The number of breakpoints, dB(G', H'), in G' is called the breakpoint distance between G' and H' since dB(G', H') = dB(H', G').

A reversal operation on a segment ab···cd of a genome ···ab···cd··· transforms the genome to ···dc···ba···. For example, genome 3214 can be transformed into 4 –3 –1 –2 in two reversals as follows: 3214 -> 3 –4 –1 –2 -> 4 –3 –1 –2. The reversal distance} between G' and H', denoted by dR(G', H'), is the minimum number of reversals necessary to transform G' into H' or vice versa.

Given two genomes and (on alphabet A) satisfying and for all a A. An exemplar of a genome is obtained by deleting all but one occurrence of each gene family. Note that an exemplar string is a permutation. Let denote the set of all exemplars of . Then, we define the exemplar breakpoint distance (EBD) between and as

Similarly, the exemplar reversal distance (ERD) is

In the rest of the paper, we shall focus on deriving an efficient algorithm for finding given and over an alphabet.


    3 INDEPENDENCE OF GENE FAMILIES
 TOP
 Abstract
 1 INTRODUCTION
 2 THE EXEMPLAR DISTANCE...
 3 INDEPENDENCE OF GENE...
 4 ALGORITHMS
 5 EXPERIMENTAL TESTS
 6 CONCLUSION
 REFERENCES
 
Let and be two genome strings with gene family set A. For gene family a, each occurrence in and each occurrence in form an occurrence pair. If , a is called a single-pair family; otherwise, it is called a multi-pair family. For example, in genomes

there are three multi-pair families: x, y and z. We shall number the occurrences of a gene family g from left to right in a genome and use and to denote the i-th copy of g in and respectively.

A permutation on a subset A' of gene families is called a partial exemplar of if it can be obtained by deleting all the occurrences of each gene family in A\A' and deleting all but one occurrence of each family in A'. For instance, in the above example, the six single-pair gene families a, b, c, d, e, l and r form a partial exemplar to labcder of and lb aedcr of . For an occurrence of a gene a A\A' in , we use to denote the resulting partial exemplar of after is inserted into . Similarly, we define for a set S of occurrences of different gene families. We let if a A'. In the rest of this section, we assume that and are partial exemplars of and , respectively.

For two gene families a and b, we say their consecutive occurrences and in are adjacent in and if and only if they do not form a breakpoint in and .

DEFINITION 1
Assume a and b are two gene families at least one of which does not occur in and .
  1. Two occurrence pairs and in and are said to be possibly adjacent with respect to and if and only if and are adjacent in and and are adjacent in .
  2. Let c and d be an adjacent pair in and . An occurrence pair of a kills (c, d) if and only if (c, d) is no longer adjacent in and .
  3. Let r and s be two possibly adjacent pairs with respect to and . A pair blocks r and s with respect to and if and only if r and s are not possibly adjacent with respect to exemplars and .

Take the partial exemplars ab c d e r of and bae dc r of as an example. Pairs and are possibly adjacent with respect to and ; and kill adjacent pair (c,–d) in ; and blocks and (–d, d) with respect to and .

DEFINITION
Assume a and b are two different gene families that do not occur in and .
  1. Two occurrence pairs of a and b are independent with respect to and if and only if (1) they are not possibly adjacent with respect to and , (2) they do not block each other from forming an adjacency with another pair with respect to and and (3) they do not kill the same adjacenct pair in and .
  2. Two gene families a and b are independent with respect to and if and only if any occurrence pair of a and occurrence pair of b are independent with respect to and .

For a gene pair of a gene family a that does not occur in and , we use and to denote and , respectively. Similarly, we can define and for set P of occurrence pairs that are from different gene families.

LEMMA 1
Let P and Q be sets of occurrence pairs of different gene families that are not in and . If any p P and q Q are independent with respect to and ,then

where , i.e. it denotes the change in the number of breakpoints after all the occurrence pairs in P are inserted.

PROOF
Let a and b be two consecutive signed symbols in . Let {delta}ab(X) denote the change in the number of breakpoints between a and b after all the occurrence pairs in X are inserted in and for X = P, Q, P {cup} Q. Obviously, for each X, {Delta}(X) = {Sigma}(a,b){delta}ab(X). Thus we only need to prove the following equation:

(1)
for any consecutive symbols a and b in . To this end, we consider the following two cases.

Case 1. The pair (a,b) is adjacent in . After inserting all pairs in P {cup} Q, (a, b) is still adjacent in both and or separated by some occurrence pairs in or . In the former case, we have {delta}ab(P {cup} Q) = {delta}ab(P) = {delta}ab(Q) = 0 and hence Equation (1) holds. In the latter case, we have the following facts.

CLAIM 1
a and b can only be separated by either some pairs in P or some pairs in Q, but not both in and

PROOF
Assume that a and b are separated by some pairs in P and some pairs in Q. Suppose and such that and are inserted between a and b in in the following configuration: . Then, both p and q kill the same adjacency (a, b) in and , contradicting the independence hypothesis.

Suppose and such that only is inserted between a and b in and is inserted between a and b (or –b and –a) in . Again, both p and q kill the same adjacency (a, b) in , contradicting the independence hypothesis. This finishes the proof of Claim 1.

We may assume that a and b are separated by k pairs (1 ≤ i ≤ k) in P after P {cup} Q are inserted: a b is a substring of . For simplicity, we define and . Then, we have the following.

CLAIM 2
and are adjacent in and if and only if they are adjacent in and .

PROOF
Assume that and are adjacent in and . If they are not adjacent in and , there must be an occurrence pair in Q such that separates and . In other words, blocks and . Since at least one of and is in P, this contradicts the independence hypothesis. Thus and are adjacent in and .

On the other hand, it is obvious that if and are adjacent in and , they are adjacent in and . This finishes the proof of Claim 2.

By Claim 2, {delta}ab(Q) = 0 and {delta}ab(P {cup} Q) = {delta}ab(P). Hence, Equation (1) holds.

Case 2. The pair (a,b) is a breakpoint in . Denote by tX the number of gene occurences between a and b in , and by uX the number of adjacent pairs between a and b in . Obviously, tPQ = tP + tQ. Moreover, uPQ = uP + uQ since a pair in P can neither block a pair in Q to form adjacency with other pairs nor form an adjacency with a pair in Q and vice versa. Therefore, we have that {delta}ab(X) = (tX + 1 – uX) – 1 = tX uX, and thus

This concludes the proof.

Using the above lemma, we obtain the following.

THEOREM 1
Let A' and A'' be sets of gene families that do not occur in and such that any a' A' and a'' A'' are independent with respect to and . If {pi|i ≤ | A'|} and {qi|i ≤ |A''|} are occurrence pairs of gene families in A' and A'' such that and reach the minimum values over all the choices of occurrence pairs, then has the minimum value over all the occurrence pairs of A' and A''.

The occurrence pairs pi are called the optimal representative pairs of A' with respect to and .


    4 ALGORITHMS
 TOP
 Abstract
 1 INTRODUCTION
 2 THE EXEMPLAR DISTANCE...
 3 INDEPENDENCE OF GENE...
 4 ALGORITHMS
 5 EXPERIMENTAL TESTS
 6 CONCLUSION
 REFERENCES
 
Divide-and-Conquer approach. Following the discussion in the previous section, we propose to calculate in two steps:

  1. Form partial exemplars and with all the single-pair gene families, and then divide multi-pair gene families into disjoint subsets that are independent with respect to and .
  2. For each gene family subset found in step (1), find its optimal representative pairs. Obtain the optimal exemplars of and by taking the union of all the optimal representative pairs.

To partition the gene families into disjoint subsets that are independent, we define a graph , called the dependence graph of the gene families. Each vertex in represents a multi-pair family. Two vertices are joined by an edge if and only if they are not independent with respect to and . By Definition, one can efficiently determine whether two families are ‘independent’ or not. Noting that gene families in different connected components are independent, we just partition the gene families according to the connected components of in Step (1).

In Step (2), we use Sankoff's program (1999) to find optimal exemplar occurrence pairs for each gene family subset and call it SimplBB.

Approximating algorithm. Dividing multigene families into disjoint subsets does not always solve the problem. For large genomes, these subsets are still too large and finding their optimal representative pairs is forbidding.

To speed up the procedure, we delete a small subset X of vertices to split each connected component into smaller ones with an appropriate size in the graph. By these deletions, we trade accuracy for efficient computation. Hence, we only obtain tight lower and upper bounds of . Assume that , are the partial genomes obtained by removing the gene families corresponding to vertices in X. Since the breakpoint distance is monotone (Sankoff, 1999), is a lower bound of . On the other hand, we also obtain an upper bound of by finding the breakpoint distance between and , where and are the optimal exemplars of and , respectively. When X is small, these bounds are tight.

The deletion can be done solely on since removing a vertex (or a family) does not affect the others. However, the problem of deleting the smallest subset of vertices to reduce all connected components to a desired size is NP-hard since it contains the independent set problem as a special case. Hence, we simply delete the vertices with the largest degree until the size of each component is small enough for SimplBB to run fast. On the other hand, the size should not be too small to make the gap between the upper and lower bounds large. Because of this, we determine the value of the size parameter according to the input dataset.


    5 EXPERIMENTAL TESTS
 TOP
 Abstract
 1 INTRODUCTION
 2 THE EXEMPLAR DISTANCE...
 3 INDEPENDENCE OF GENE...
 4 ALGORITHMS
 5 EXPERIMENTAL TESTS
 6 CONCLUSION
 REFERENCES
 
In Sankoff (1999) the computational cost of SimplBB is measured in the number of calls to subroutine BD which calculates the breakpoint distance between two partial exemplar strings. In our program, most of the computation is done by calling SimplBB on each independent subset of gene families. Thus, we also measure the computational cost of our program FastEBD in the number of calls to BD for comparing it with SimplBB. We implemented both SimplBB and FastEBD in C++ and ran them on a PC with 1.6 GHz Pentium processor and 512 MB RAM.

5.1 Results on simulated genomes
We use the simulation method proposed by Sankoff (1999) to generate genome pairs for our validation test. Each genome pair is characterized by two parameters: the number of (both single-pair and multi-pair) gene families and their configuration, which indicates the number of copies of each multi-pair gene family in each of the two genomes. In the configurations given in Table 1, each row corresponds to a genome, each column corresponds to a multi-pair gene family and each entry gives the number of genes of the corresponding gene family in the corresponding genome. In each genome pair generated according to the configuration ‘threes’, a genome contains roughly about three copies of each multi-pair family. However, in each genome pair generated according to the second configuration, the number of gene copies in each multi-pair family varies. Some multi-pair families contain as many as 10 gene copies in each genome while others contain only one or two copies in each genome. Hence, the second configuration is named unbalanced.


View this table:
[in this window]
[in a new window]
 
Table 1 The two configurations from Sankoff (1999) used to compare the cost of SimplBB and FastEBD

 
We run both SimplBB and FastEBD to find the exemplar breakpoint distance between two genomes with the same configuration described in Table 1. In Figure 1, we plot the numbers of calls to the subroutine BD made by SimplBB and FastEBD in log scale against the length of exemplars. The graph indicates that FastEBD is much more efficient than SimplBB.



View larger version (23K):
[in this window]
[in a new window]
 
Fig. 1 Calling cost of SimplBB and FastEBD on genomes with each of the two configurations given in Table 1. Each point represents the average number of calls of 100 tests on different genome pairs generated with the same parameters.

 
We also note that when the number of single-pair gene families increases, the running cost of SimplBB increases while the cost of FastEBD increases for a narrow range of length and then decreases. The reason for FastEBD running faster when there is a large number of gene families is that when more single-pair gene families are included, the multi-pair gene families become less dependent on each other and hence the large set of gene families could be partitioned into small unrelated subsets.

Another property of our program FastEBD is that it allows a trade-off between accuracy and computational efficiency. This is vital when working with real genome datasets that contain hundreds or thousands of genes. Thus we conducted experiments to examine the accuracy of the approximating version of our program. By setting the threshold T on the number of calls to BD, our algorithm outputs the lower and upper bounds on the exemplar breakpoint distance between the input genomes. If T is small, we have to remove more gene families so that the remaining gene families can be partitioned into ‘independent’ subsets on which SimplBB can finish by calling BD at most T times. Symmetrically, if T is large, we just need to remove less gene families and hence obtain better lower and upper bounds on the exemplar breakpoint distance between the input genomes. We measured accuracy using the ratio of the difference of the lower and upper bounds to the upper bound of the exemplar breakpoint distance output from the program.

To generate genomic datasets that are close to biological data, we adopt a common approach in benchmarking algorithms for calculating genomic distance. Starting with two identical genomes, we let them undergo a series of mutations such as duplication, transposition and reversal with certain probabilities. Since duplication, transposition and reversal of long segments happen less frequently than those of short segments (Pinter and Skiena, 2002), we use a Poisson distribution P{lambda}(k) to calculate the probabilities of mutations on a segment of length k with the mean {lambda} as a parameter.

We generated 30 genome pairs in which each genome contains 3000 genes and undergoes 500 cycles of mutations. The mean of probabilities for each type of mutation is 0.1. The resulting genome pairs contain about 3–8 copies in each gene family, giving rise to about 10250–10350 exemplar pairs. We ran FastEBD on each pair of genomes with T taking values 104, 2x 104, 3x 104, 4x 104 and 5x 104. The accuracy ratios in these tests are summarized in Figure 2. From this graph, it is clear that accuracy is improved when T is bigger. It also shows that even with T=104, the error is <5% in most of our experiments.



View larger version (16K):
[in this window]
[in a new window]
 
Fig. 2 Error of the results returned by FastEBD against number of calls SimplBB is allowed to make.

 
5.2 Bacterial genomes
We ran the algorithms on five bacteria genome pairs studied in Earnst-De Young et al. (2004): Buchnera aphidicola and W. brevipalpis (Baphi–Wigg), Pasteurella multocida and Hamophilus influenzae (Pmult–Hinfl), Escherechia coli and Salmonella typhimurium (Ecoli–Styphi), Xanthomonas axonopodis and Xanthomonas campestris (Xaxo–Xcamp) and Yersinia pestis-KIM and Yersinia pestis-CO92 (Ypes).

Each genome in the five genome pairs above contains gene families that are not in the other. There are two ways to deal with these genes for our purpose. The easy way is to delete all such genes and consider the reduced genomes. Alternatively, we first put a ‘pivot’ gene x at the right end of both genomes. Then, we construct two genomes from and by doing the following: (1) for each block B of consecutive occurrences of gene families that are in but not in , we introduce a new gene family gB, replace B with a copy of gB and append a copy at the right end of . (2) We deal with the gene families that appear in , but not in , in the same way. The purpose of this preprocessing method is to take gene deletions into accounts. But it may introduce redundant hypothetical gene deletion events. Our computation results show that we might underestimate the exemplar breakpoint distance between two given genomes by using the first method and overestimate it by using the second method. By using these two methods to deal with gene losses, we obtain two genomic datasets from each of the five bacteria genome pairs, denoted by Baphi–Wigg1, Baphi–Wigg2, Pmult–Hinfl1, Pmult–Hinfl2, Ecoli–Styphi1, Ecoli–Styphi2, Xaxo–Xcamp1, Xaxo–Xcamp2, Ypes1 and Ypes2 respectively.

We ran both SimplBB and FastEBD on these datasets. SimplBB could finish only on Baphi–Wigg1 and made about 105 calls to BD. FastEBD with T=104 finished on all the genome pairs, returning either an exact EBD or lower and upper bounds. (See Section 5.1 for threshold T.) In the latter case, the bounds differ by only 1. Furthermore, we are able to obtain the exact EBD for all pairs by increasing T. For example, with T=3x 106, FastEBD output 119 given Xaxo-Xcamp1 as input. The results are summarized in Table 2.


View this table:
[in this window]
[in a new window]
 
Table 2 Experimental results on 10 pairs of bacteria genomes

 

    6 CONCLUSION
 TOP
 Abstract
 1 INTRODUCTION
 2 THE EXEMPLAR DISTANCE...
 3 INDEPENDENCE OF GENE...
 4 ALGORITHMS
 5 EXPERIMENTAL TESTS
 6 CONCLUSION
 REFERENCES
 
In this paper, we have proposed an efficient algorithm for calculating the exemplar breakpoint distance between two genomes with gene families. It is not only fast, but also outputs accurate breakpoint distances. Our idea can be incorporated into the existing methods to derive an efficient algorithm for computing the exemplar reversal distance because of the close relation of the reversal distance and the breakpoint distance. Future research topics also include how to incorporate our idea into the existing methods such as the branch-and-bound approach to derive efficient algorithms for the exemplar problem with other distance measures such as translocation distance (Hannenhalli and Pevzner, 1995) or syntenic distance (Feretti et al., 1996).

As for applications, investigating whether our approach can be extended into an efficient method for genomic comparison or for reconstructing phylogeny from gene-content data is also interesting.


    Acknowledgments
 
The authors would like to thank the reviewers for the careful reading of the manuscript and the useful suggestions for revising it. They would also like to thank D. Sankoff for the helpful discussions on the exemplar problem, E. Lerat for sending us her genomic datasets, and J. Tang for the comments on the first draft of this manuscript. L.Z. was partially supported by BMRC Research Grant BMRC01/1/21/19/140.

Received on December 3, 2004; revised on February 5, 2005; accepted on February 10, 2005

    REFERENCES
 TOP
 Abstract
 1 INTRODUCTION
 2 THE EXEMPLAR DISTANCE...
 3 INDEPENDENCE OF GENE...
 4 ALGORITHMS
 5 EXPERIMENTAL TESTS
 6 CONCLUSION
 REFERENCES
 

    Bryant, D. (2000) The complexity of calculating exemplar distances. In Sankoff, D. and Nadeau, J.H. (Eds.). Comparative Genomics, , Dordrecht Kluwer Academic Publishers, pp. 207–212.

    Caprara, A. (1997) Sorting by reversals is difficult. Proceedings of the First International Conference on Computational Molecular Biology (RECOM97) CM Press, pp. , pp. 75–83.

    Chen, X., Zheng, J., Fu, Z., Nan, P., Zhong, Y., Lonardi, S., Jiang, T. (2005) Computing the assignment of orthologous genes via genome rearrangement. Proceedings of the Third Asia-Pacific Bioinformatics Conference Imperial College Press, pp. , pp. 363–378.

    Earnst-De Young, J., Lerat, E., Moret, B. (2004) Reversing gene erosion—reconstructing ancestral bacterial genomes from gene content and order data. Proceedings of the 4th Annual Workshop on Algorithms in Bioinformatics (WABI'04) , Berlim Springer, pp. , pp. 1–13.

    El-Mabrouk, N. (2005) Genome rearrangements with gene families. In Gascuel, O. (Ed.). Mathematics of Evolution and Phylogeny, (In press) Oxford University Press.

    Feretti, V., Nadeau, J.H., Sankoff, D. (1996) Original synteny. Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching , Berlin Lecture Notes in Computer Science, Vol. 1075 Springer, pp. , pp. 78–87.

    Hannenhalli, S. and Pevzner, P. (1995) Transforming men into mice (polynomial algorithm for genomic distance problem). Proceedings of IEEE 36th Annual Symposium on Foundations of Computer Science The IEEE Computer Society, pp. , pp. 581–592.

    Hannenhalli, S. and Pevzner, P. (1999) Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. Journal of the Association for Computing Machinery, 46, 1–27.

    International Mouse Genome Consortium. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520–562[CrossRef][Medline].

    Kaplan, H., et al. (2000) Faster and simpler algorithm for sorting signed permutations by reversals. SIAM J. Comput., 29, 880–892[CrossRef].

    Kececioglu, J. and Sankoff, D. (1994) Efficient bounds for oriented chromosome-inversion distance. Proceedings of the Fifth Symposium Combinatorial Pattern Matching , Berlin Lecture Notes in Computer Science, Vol. 807 Springer, pp. 307–325.

    Moret, B., et al. (2005) Reconstructing phylogenies from gene-content and gene-order data. In Gascuel, O. (Ed.). Mathematics Of Evolution and Phylogeny, Oxford University Press.

    Nadeau, J.H. and Taylor, B.A. (1984) Lengths of chromosomal segments conserved since divergence of man and mouse. Proc. Natl. Acad. Sci. USA, 81, 814–818[Abstract/Free Full Text].

    Pinter, R.Y. and Skiena, S. (2002) Genomic sorting with length-weighted reversals. Proceedings of the Thirteenth International Conference on Genomics Informatics , Tokyo, Japan Universal Academy Press, pp. 103–111.

    Sankoff, D. (1989) Mechanisms of genome evolution: models and inference. Bull. Int. Stat. Institut., 47, 461–475.

    Sankoff, D. and Blanchette, M. (1998) Multiple genome rearrangement and breakpoint phylogeny. J. Comput. Biol., 5, 555–570[Web of Science][Medline].

    Sankoff, D. and El-Mabrouk, N. (2000) Duplication, rearrangement and reconcilliation. In Sankoff, D. and Nadeau, J.H. (Eds.). Comparative Genomics, , Dordrecht Kluwer Academic Press, pp. 537–550.

    Sankoff, D. (1999) Genome rearrangement with gene families. Bioinformatics, 15, 909–917[Abstract/Free Full Text].

    Tang, J. and Moret, B. (2003) Phylogenetic reconstruction from gene rearrangement data with unequal gene contents. Proc. 8th Workshop on Algorithms and Data Structures (WADS'03) , Berlin Lecture Notes in Computer Science, vol. 2748 Springer Verlag, pp. 37–46.

    Watterson, G.A., et al. (1982) The chromosome inversion problem. J. Theor. Biol., 99, 1–7[CrossRef].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/10/2171    most recent
bti327v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (6)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Nguyen, C. T.
Right arrow Articles by Zhang, L.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Nguyen, C. T.
Right arrow Articles by Zhang, L.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?