Bioinformatics Advance Access originally published online on March 1, 2006
Bioinformatics 2006 22(10):1166-1171; doi:10.1093/bioinformatics/btl073

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org

Family specific rates of protein evolution

Hannes Luz * and Martin Vingron

Max Planck Institute for Molecular Genetics Ihnestrasse 73, 14195 Berlin, Germany

*To whom correspondence should be addressed.


    ABSTRACT

Motivation: Amino acid changing mutations in proteins are constrained by purifying selection and accumulate at different rates. We estimate evolutionary rates on multiple alignments of eukaryotic protein families in a maximum likelihood framework and identify sets of slowly and rapidly evolving proteins.

Results: We find that the evolution of indispensable proteins is constrained by selection and that protein secretion is coupled to an increased evolutionary rate.

Contact: luz@molgen.mpg.de

Supplementary information: http://speeds.molgen.mpg.de


    1 INTRODUCTION
Proteins evolve at different rates because the mutational process acting on each protein is subject to a specific selective pressure. For example, an alignment of histones from man, fugu, fly and worm shows only a few amino acid exchanges. Here almost all amino acid changing substitutions are deleterious; the selective regime is rather stringent. At the other extreme, orthologous receptor tyrosine kinases from the same organisms can be non-trivial even to align, and only a few residues are under strong purifying selection.

The rates of protein evolution are routinely quantified by comparing the coding nucleotide sequences of orthologous gene pairs between closely related organisms. Several authors apply the codon substitution model implemented in PAML (Yang, 1997) to estimate dN, the expected number of non-synonymous substitutions, i.e. substitutions that change the amino acid sequence (Davis and Petrov, 2004; Castillo-Davis et al., 2004; Hahn and Kern, 2005). The molecular clock model assumes that the number of substitutions is proportional to the divergence time, with substitutions accumulating at a constant rate. Since orthologous sequences have diverged by speciation, estimates of dN for individual orthologous gene pairs constitute a rate distribution across the proteome. Still, when dN measured between two closely related lineages is transferred to distantly related protein coding genes, as in Davis and Petrov (2004), rate variation among diverse lineages is not taken into account. Here it becomes worthwhile to estimate evolutionary rates by measuring the degree of sequence divergence within larger sets of orthologous proteins, i.e. within orthologous families. Koonin et al. (2004) apply a measure of evolutionary rate to sets of distantly related orthologs by averaging the distances from the outgroup sequence to the other sequences. In our study, evolutionary rates of orthologous families are estimated in a maximum likelihood (ML) framework. We require every orthologous family to contain members of the same defined set of organisms. Thus, the total time that has passed since the sequences diverged is the same for all orthologous families, and different levels of sequence divergence can be compared and related to historical time.

A central goal of determining rate distributions is to uncover global principles that influence selection. For example, purifying selection is expected to act more weakly on dispensable than on indispensable proteins, and some authors have correlated a protein's degree of dispensability with its evolutionary rate (Hirsh and Fraser, 2001; Hahn and Kern, 2005). Others relate evolutionary rates to sequence length, tissue specificity, secretion or the membership of proteins in functional categories (Lipman et al., 2002; Winter et al., 2004; Koonin et al., 2004; Castillo-Davis et al., 2004).


    2 METHODS
2.1 The data, orthologous families and alignments
We derive orthologous families containing members of the primate Homo sapiens, the pufferfish Fugu rubripes, the arthropod Drosophila melanogaster and the nematode Caenorhabditis elegans. These completely sequenced and divergent model organisms are chosen such that pairs of orthologous amino acid sequences exhibit a substantial and informative amount of sequence divergence.

The peptide sequences were downloaded from the Ensembl database (version 16) (Hubbard et al., 2002). We first apply the INPARANOID software to obtain orthologous groups for each pair of organisms by requiring a high confidence for orthologous assignments and setting the INPARANOID confidence value to 95% (Remm et al., 2001). Since orthologous relationships are transitive, the orthologous groups derived for pairs of organisms are merged into orthologous families if they have a sequence in common.
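Merging pairwise orthologous groups that share a sequence amounts to finding connected components, which a union-find (disjoint-set) structure does efficiently. The sketch below is a minimal illustration, assuming each INPARANOID group is given as a set of sequence identifiers; the identifiers and the function name `merge_groups` are hypothetical and not part of INPARANOID.

```python
from typing import Dict, Iterable, List, Set


def merge_groups(groups: Iterable[Set[str]]) -> List[Set[str]]:
    """Merge pairwise orthologous groups that share at least one sequence."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Follow parent pointers to the root, halving the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for group in groups:
        members = list(group)
        if members:
            find(members[0])  # register the group even if it has one member
        for other in members[1:]:
            union(members[0], other)

    families: Dict[str, Set[str]] = {}
    for seq in parent:
        families.setdefault(find(seq), set()).add(seq)
    return list(families.values())


# Hypothetical pairwise groups (human-fugu, fugu-fly, fly-worm, ...)
pairwise_groups = [{"h1", "f1"}, {"f1", "d1"}, {"d1", "c1"}, {"h2", "f2"}]
for family in merge_groups(pairwise_groups):
    print(sorted(family))
```

Groups chained through shared members (h1-f1, f1-d1, d1-c1) end up in one family, while unconnected groups stay separate.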

We select the orthologous families that contain at least one representative of each organism. When an orthologous family contains more than one sequence per organism, we select four sequences of similar length. Sequences are filtered for low complexity regions (Wootton and Federhen, 1993), and for each family a multiple alignment of the four orthologous sequences is generated using DCA (Stoye et al., 1997). The recursion stop size in DCA is set to 400; that is, multiple alignments with fewer than 400 sites are guaranteed to be optimal with respect to the sum-of-pairs score. Finally, we discard orthologous families whose alignments contain fewer than 80 gapless sites, as well as some families with spurious alignments containing large numbers of gaps. We end up with a set of 3640 orthologous families and multiple alignments. For the ML tree computations described below, we consider only the gapless sites of the alignments.
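The selection and site filtering steps can be sketched as follows. This is an illustration under stated assumptions: sequences are plain strings, gaps are written as `-`, and "similar length" is interpreted (hypothetically) as the one-per-organism combination with the smallest length spread. The helper names are hypothetical; low-complexity filtering and the DCA alignment itself are not reproduced.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

ORGANISMS = ("H", "F", "D", "C")   # human, fugu, fly, worm
MIN_GAPLESS_SITES = 80


def pick_four(family: Dict[str, List[str]]) -> Optional[Tuple[str, ...]]:
    """Pick one sequence per organism with lengths as similar as possible.

    Returns None if some organism has no representative in the family.
    """
    if any(not family.get(org) for org in ORGANISMS):
        return None
    combos = product(*(family[org] for org in ORGANISMS))
    # Assumption: "similar length" = smallest (max length - min length).
    return min(combos, key=lambda combo: max(map(len, combo)) - min(map(len, combo)))


def gapless_columns(alignment: List[str]) -> List[str]:
    """Return the alignment columns that contain no gap character."""
    columns = ("".join(col) for col in zip(*alignment))
    return [col for col in columns if "-" not in col]


def keep_alignment(alignment: List[str]) -> bool:
    """Keep a family only if its alignment has enough gapless sites."""
    return len(gapless_columns(alignment)) >= MIN_GAPLESS_SITES


family = {"H": ["MKV", "MKVLA"], "F": ["MKVLT"], "D": ["MRVLS"], "C": ["MKILS", "MK"]}
print(pick_four(family))
print(gapless_columns(["MK-A", "MKLA"]))
```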

2.2 Estimating family specific rates of protein evolution
We apply two approaches to estimate family-specific evolutionary rates using standard ML phylogenetic tree estimation procedures. Under the assumption that point mutations accumulate according to a stochastic process that acts independently on the sites of a sequence, a reversible Markov process with a stationary distribution is commonly used as a probabilistic model of sequence evolution (Müller and Vingron, 2000; Müller et al., 2002). As amino acid replacement model we choose the Müller–Vingron model, whose replacement frequencies were estimated from alignments of varying degrees of divergence (Müller et al., 2002). Further, the Markov process is calibrated to PAM units: along an edge of length t = 1 PAM in a model tree, one substitution event is expected to occur in a sequence of 100 residues evolving according to the Markov process.
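The PAM calibration can be made concrete: a rate matrix $Q$ with stationary distribution $\pi$ is scaled so that the expected number of substitutions per site and time unit, $-\sum_a \pi_a Q_{aa}$, equals 0.01; one time unit is then 1 PAM. The sketch below uses a 20-state equal-input (F81-type) model with uniform $\pi$ as a stand-in, not the Müller–Vingron matrix, whose entries are not reproduced here. For this stand-in, $P_{ab}(t) = e^{-\mu t}\delta_{ab} + (1 - e^{-\mu t})\pi_b$ in closed form.

```python
import math
from typing import List

N_STATES = 20
PI: List[float] = [1.0 / N_STATES] * N_STATES   # stand-in: uniform amino acid frequencies


def pam_scaled_mu(pi: List[float]) -> float:
    """Rate mu of the equal-input model (Q_ab = mu * pi_b for a != b) in PAM units.

    The expected substitution rate is -sum_a pi_a Q_aa = mu * (1 - sum_a pi_a^2);
    setting it to 0.01 gives one expected substitution per 100 sites per PAM.
    """
    return 0.01 / (1.0 - sum(p * p for p in pi))


def transition_prob(a: int, b: int, t: float, mu: float, pi: List[float]) -> float:
    """P(state b at time t | state a at time 0); closed form for the equal-input model."""
    decay = math.exp(-mu * t)
    return (decay if a == b else 0.0) + (1.0 - decay) * pi[b]


MU = pam_scaled_mu(PI)
rate_per_site = MU * (1.0 - sum(p * p for p in PI))   # substitutions per site per PAM
print(f"expected substitutions in 100 residues over t = 1 PAM: {100 * rate_per_site:.3f}")
print(f"probability that a site differs after 1 PAM: {1 - transition_prob(0, 0, 1.0, MU, PI):.5f}")
```

The second line shows that the probability of an observed difference is slightly below the expected number of events per site, because some events hit the same site or revert.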

For the species under study here, the unrooted tree topology $T$ of the species phylogeny is known. Consider the likelihood $L_i$ of the phylogenetic tree (Felsenstein, 1981) for orthologous family $i$, where four orthologous sequences of length $n(i)$ are placed at the leaves of the tree (Fig. 1b):

$$L_i(\mathbf{l}_i) = P\left(A^{(i)} \mid T, \mathbf{l}_i, Q\right) = \prod_{s=1}^{n(i)} P\left(A^{(i)}_s \mid T, \mathbf{l}_i, Q\right) \tag{1}$$


Figure 1
Fig. 1 (a) Species tree and (b) a gene tree for the species under study. Edge lengths $\tau_j$, $j = 1, \dots, 5$, of the species tree are estimates of divergence times in millions of years (My) (Wang et al., 1999; Hedges, 2002). Species names are abbreviated by C (C.elegans), D (D.melanogaster), F (F.rubripes) and H (H.sapiens). Gene names of family $i$ are abbreviated by c, d, f and h. The ultrametricity implies $\tau_1 = \tau_2$ and $\tau_3 = \tau_4 + \tau_2$.

 
$L_i(\mathbf{l}_i)$ is the probability of observing the alignment $A^{(i)}$, with aligned positions $A^{(i)}_s$, $s \in \{1, \dots, n(i)\}$, that has evolved according to the tree $T$ with edge lengths $\mathbf{l}_i = (l_{i1}, \dots, l_{i5})$ under our evolutionary Markov process $Q$. (In the following, we omit $T$ and $Q$ from the notation, as they remain fixed in our likelihood computations.)
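For four sequences, the per-site probability in Equation (1) can be computed by summing over the unobserved amino acids at the two internal nodes of the unrooted tree; Felsenstein's pruning algorithm reduces to this double sum. The sketch assumes, hypothetically, that edges 0 to 3 lead to the leaves h, f, d and c and that edge 4 is the internal edge separating {h, f} from {d, c}. It uses the equal-input stand-in model scaled to PAM, not the Müller–Vingron matrix. Because the model is reversible, placing the root at the {h, f} node does not change the likelihood.

```python
import math
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
PI = [1.0 / 20] * 20                          # stand-in stationary distribution (uniform)
MU = 0.01 / (1.0 - sum(p * p for p in PI))    # equal-input model scaled to PAM units


def p_matrix(t: float) -> List[List[float]]:
    """Transition matrix P(t) of the equal-input stand-in model; t in PAM."""
    decay = math.exp(-MU * t)
    return [[(decay if a == b else 0.0) + (1.0 - decay) * PI[b] for b in range(20)]
            for a in range(20)]


def log_likelihood(alignment: Sequence[str], lengths: Sequence[float]) -> float:
    """log L_i(l) of Equation (1) for a gapless alignment of four sequences.

    alignment = (h, f, d, c); lengths = (l_h, l_f, l_d, l_c, l_internal) in PAM.
    Hypothetical edge order; the internal edge separates {h, f} from {d, c}.
    """
    P = [p_matrix(t) for t in lengths]
    total = 0.0
    for h, f, d, c in zip(*alignment):          # independent sites: sum of log terms
        ih, i_f, i_d, i_c = INDEX[h], INDEX[f], INDEX[d], INDEX[c]
        site = 0.0
        for x in range(20):                     # amino acid at the node joining h and f
            left = PI[x] * P[0][x][ih] * P[1][x][i_f]
            # y: amino acid at the node joining d and c, reached via the internal edge
            right = sum(P[4][x][y] * P[2][y][i_d] * P[3][y][i_c] for y in range(20))
            site += left * right
        total += math.log(site)
    return total


aln = ["MKVLA", "MKVLS", "MRVLS", "MKILS"]     # hypothetical gapless alignment (h, f, d, c)
print(log_likelihood(aln, [20.0, 20.0, 60.0, 90.0, 15.0]))
```

In practice this function would be maximized over the five edge lengths to obtain the unconstrained ML tree.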

The literature provides estimates of divergence times for the species under study here (Wang et al., 1999; Hedges, 2002). Figure 1a shows the species tree with edge lengths $\tau_j$, $j = 1, \dots, 5$, representing times. We use these divergence times to relate measures of amino acid replacements in a phylogenetic tree to historical times.

First, we do not assume rate constancy among lineages, and no constraints are imposed on the edge lengths of the phylogenetic tree. The edge length estimates $\hat{l}_{ij}$, $j = 1, \dots, 5$, are the values at which the likelihood function $L_i$ attains its maximum. The tree length $\hat{l}_i = \sum_j \hat{l}_{ij}$ of the ML tree measures the total amount of substitution that has accumulated along the evolutionary paths. The time over which these substitutions accumulated is given by the tree length of the species tree, $\sum_j \tau_j$. Thus, a natural measure of a family-specific evolutionary rate is the tree length ratio $\hat{r}_i$:

$$\hat{r}_i = \frac{\sum_{j=1}^{5} \hat{l}_{ij}}{\sum_{j=1}^{5} \tau_j} \tag{2}$$

At the other extreme, we assume that the sequences have evolved at a constant rate along the edges of the species tree in Figure 1a. With the parametrization

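$$l_{ij} = \lambda_i \tau_j, \quad j = 1, \dots, 5 \tag{3}$$

(suggested, e.g., in Yang, 1996), the likelihood depends only on the parameter $\lambda_i$:

$$L_\lambda(\lambda_i) = L_i(\lambda_i \tau_1, \dots, \lambda_i \tau_5) \tag{4}$$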

We call the scaling factor $\hat{\lambda}_i$ that maximizes $L_\lambda(\lambda_i)$ the family specific rate (FSR) of family $i$.
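Because Equation (4) has a single free parameter, the FSR can be found with any one-dimensional optimizer. The sketch below uses golden-section search on $\log L_\lambda$. As a hypothetical stand-in for the real tree likelihood it uses independent Poisson substitution counts per edge, for which the unconstrained ML edge lengths equal the counts; under this toy model the FSR and the tree length ratio of Equation (2) coincide exactly. With a real substitution model they need not, although the text reports them to be highly correlated. All numbers are hypothetical, not the divergence times used in the paper.

```python
import math
from typing import Callable, Sequence


def golden_section_max(f: Callable[[float], float], lo: float, hi: float,
                       tol: float = 1e-10) -> float:
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                      # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def family_specific_rate(loglik: Callable[[Sequence[float]], float],
                         tau: Sequence[float], lam_max: float = 1.0) -> float:
    """FSR: the lambda maximizing log L(lambda * tau_1, ..., lambda * tau_5), Eq. (4)."""
    def profile(lam: float) -> float:
        return loglik([lam * t for t in tau])
    return golden_section_max(profile, 1e-12, lam_max)


def tree_length_ratio(ml_edge_lengths: Sequence[float], tau: Sequence[float]) -> float:
    """Eq. (2): total ML tree length divided by total species-tree time."""
    return sum(ml_edge_lengths) / sum(tau)


# Hypothetical stand-in for log L_i: Poisson substitution counts per edge
# (constant log-factorial terms dropped); its unconstrained maximum is l_j = COUNTS[j].
COUNTS = [12.0, 14.0, 60.0, 75.0, 9.0]          # hypothetical counts (PAM)
TAU = [450.0, 450.0, 100.0, 1000.0, 100.0]      # hypothetical edge times (My)


def toy_loglik(lengths: Sequence[float]) -> float:
    return sum(k * math.log(length) - length for k, length in zip(COUNTS, lengths))


lam_hat = family_specific_rate(toy_loglik, TAU)
r_hat = tree_length_ratio(COUNTS, TAU)
print(f"FSR = {1000 * lam_hat:.2f} PAM/BYr, tree length ratio = {1000 * r_hat:.2f} PAM/BYr")
```

The factor 1000 converts PAM per million years into PAM per billion years, the unit used for the rates reported below.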

2.3 Is the model adequate?
How well does the FSR model fit the data? This question can be addressed either for the individual orthologous families or for the whole dataset containing all families. In the latter case, we consider the alignments of all 3640 orthologous families as one large concatenated alignment $\mathcal{A}$ with 1 476 454 sites. Competing model assumptions can be tested by likelihood ratio tests (LRT), the Akaike Information Criterion (AIC) and the Bayes Information Criterion (BIC) (Felsenstein, 2004). Under the assumption that the sequences in $\mathcal{A}$ have evolved according to our species tree and at FSRs, the total ML of the FSR model equals the product of the MLs $L_\lambda(\hat{\lambda}_i)$ of the individual orthologous families, and the likelihood function depends on 3640 rate parameters, one for each family. Under these tests, the FSR model performs well in comparison with other models. For example, using PAML (Yang, 1997), we fit a model in which the rate at each alignment position of $\mathcal{A}$ is drawn from a discretized gamma distribution with 40 rate categories. This model has six free parameters: the five edge lengths of the tree and the shape parameter of the gamma distribution. When this model is compared with the FSR model, the AIC and the BIC penalize the far larger number of rate parameters of the FSR model. Nevertheless, the AIC and BIC values of the two models are in the same range, and even slightly better for the FSR model (Table 1).
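The size of the penalty the FSR model has to overcome can be computed from the numbers given above alone, using $\mathrm{AIC} = 2k - 2\log L$ and $\mathrm{BIC} = k \log n - 2\log L$. The sketch assumes, as is common, that the number of alignment sites serves as the sample size $n$ in the BIC; the log-likelihood values in Table 1 are not reproduced here.

```python
import math


def aic(log_lik: float, n_params: int) -> float:
    """Akaike Information Criterion (smaller is better)."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik: float, n_params: int, n_sites: int) -> float:
    """Bayes Information Criterion with the number of sites as sample size (assumption)."""
    return n_params * math.log(n_sites) - 2.0 * log_lik


N_SITES = 1_476_454   # gapless sites in the concatenated alignment
K_FSR = 3640          # one rate parameter per family
K_GAMMA = 6           # five edge lengths + one gamma shape parameter

# Extra penalty paid by the FSR model; halved, this is the log-likelihood gain
# the FSR model needs before it scores better under each criterion.
aic_penalty_gap = 2.0 * (K_FSR - K_GAMMA)
bic_penalty_gap = (K_FSR - K_GAMMA) * math.log(N_SITES)
print(f"required log-likelihood gain: AIC > {aic_penalty_gap / 2:.0f}, "
      f"BIC > {bic_penalty_gap / 2:.0f}")
```

Since the FSR model scores slightly better on both criteria, its log-likelihood advantage over the gamma model must exceed these thresholds.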


Table 1 Goodness of fit for the whole dataset

 
For the individual families, the likelihood function $L_\lambda(\lambda_i)$ is obtained from the likelihood function $L_i(\mathbf{l}_i)$ through the parametrization in Equation (3). Hence the models used to estimate $\lambda_i$ and $\mathbf{l}_i$ are nested, and an LRT can check whether the simpler FSR model is preferable. The test statistic $\Delta_i$ for orthologous family $i$ is the difference in log likelihoods between the model without constraints on edge lengths and the FSR model (the null hypothesis). The two models differ by 4 parameters, so under the null hypothesis $2\Delta_i$ is expected to be approximately $\chi^2$ distributed with 4 degrees of freedom. Since different families possess different characteristics, we do not assume that the distribution of $\Delta_i$ under the null hypothesis is the same for all families. Instead of applying the $\chi^2$ statistic, we make the $\Delta_i$ values comparable across families by simulating family-specific null distributions. For family $i$, we repeatedly (500 times) simulate sequence families according to our evolutionary model and the species tree using Rose (Stoye et al., 1998). For each simulated family we calculate the corresponding difference $\Delta_i^{*}$ in log likelihoods, and we obtain a p-value $p_i$ as the fraction of $\Delta_i^{*}$ values larger than $\Delta_i$. We reject the null hypothesis if $p_i < 0.05$. For 888 orthologous families the null hypothesis is not rejected, i.e. these families are consistent with evolution at an approximately constant rate along our species tree. We call them rate constant families. Further, we calculate and compare $\hat{\lambda}_i$ and $\hat{r}_i$ for all orthologous families. Interestingly, both tree models yield almost the same rate estimates: over the whole set of families, $\hat{\lambda}_i$ and $\hat{r}_i$ are highly correlated, with a correlation coefficient of 0.982.
A scatter plot comparing $\hat{\lambda}_i$ and $\hat{r}_i$ is available in the Supplementary Material. The two rate estimates are close even for the 218 families for which a topology other than that of the species tree yields a higher likelihood in the ML tree computation.
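The simulation-based p-value and the asymptotic $\chi^2$ alternative it replaces can be contrasted in a few lines. For 4 degrees of freedom the $\chi^2$ upper tail has the closed form $e^{-x/2}(1 + x/2)$. In this sketch the null sample is hypothetical: it is drawn directly from the asymptotic distribution ($\Delta = \chi^2_4/2$, a Gamma variable with shape 2 and scale 1), whereas in the paper the null values come from families simulated with Rose and refitted under both models.

```python
import math
import random
from typing import Sequence


def simulated_p_value(delta_obs: float, delta_null: Sequence[float]) -> float:
    """Fraction of simulated null statistics Delta* that exceed the observed Delta."""
    return sum(d > delta_obs for d in delta_null) / len(delta_null)


def chi2_sf_4df(x: float) -> float:
    """Upper tail P(X > x) of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def asymptotic_p_value(delta_obs: float) -> float:
    """Wilks approximation: 2 * Delta is approximately chi-square(4) under H0."""
    return chi2_sf_4df(2.0 * delta_obs)


# Hypothetical null sample of 500 values, standing in for Rose simulations.
rng = random.Random(42)
delta_null = [rng.gammavariate(2.0, 1.0) for _ in range(500)]
delta_obs = 4.0
print(simulated_p_value(delta_obs, delta_null), asymptotic_p_value(delta_obs))
```

With a null sample drawn from the asymptotic distribution both p-values agree up to Monte Carlo error; the point of the family-specific simulation is that real families may deviate from it.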

We conclude that the rates of protein evolution are driven by family specific effects. In the following, we refer to the rate of an orthologous family by its FSR $\hat{\lambda}_i$. Figure 2 shows the overall distribution of FSRs, which range from 1 to 162 PAM per billion years (PAM/BYr). The mean rate is 52 PAM/BYr and the median rate 50 PAM/BYr. To assess the uncertainty of the rate estimates, we apply non-parametric bootstrapping to the alignments of individual families. The 95% bootstrap confidence interval (Efron and Tibshirani, 1993) for $\hat{\lambda}_i$ is given by the 2.5 and 97.5% points of the ranked list of rates estimated for 1000 bootstrap replicates of family $i$. If the estimates of $\lambda_i$ are normally distributed around their true values, we can calculate their variances from the second derivative of $\log L_\lambda(\lambda_i)$ at $\hat{\lambda}_i$. Intervals derived from these variances are generally smaller than the 95% bootstrap confidence intervals. The rate estimates with confidence intervals and p-values of the likelihood ratio tests are available in the Supplementary Material.
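Both uncertainty measures can be sketched generically. The percentile bootstrap resamples alignment columns with replacement and reads off the 2.5% and 97.5% points; the curvature-based variance uses $\mathrm{Var}(\hat{\lambda}) \approx -1 / \frac{d^2}{d\lambda^2}\log L_\lambda$ at the maximum, approximated here by central finite differences. The estimator in the usage example (fraction of variable columns) and the Poisson-type log-likelihood are hypothetical stand-ins for re-running the FSR estimation on each replicate.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def bootstrap_ci(columns: Sequence[str], estimate: Callable[[List[str]], float],
                 n_boot: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """95% percentile interval from resampling alignment columns with replacement."""
    rng = random.Random(seed)
    reps = [estimate(rng.choices(columns, k=len(columns))) for _ in range(n_boot)]
    cuts = statistics.quantiles(reps, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


def curvature_variance(loglik: Callable[[float], float], lam_hat: float,
                       h: float = 1e-4) -> float:
    """Var(lambda_hat) ~ -1 / (d^2 log L / d lambda^2), by central finite differences."""
    second = (loglik(lam_hat + h) - 2.0 * loglik(lam_hat) + loglik(lam_hat - h)) / h ** 2
    return -1.0 / second


def variable_fraction(cols: List[str]) -> float:
    """Hypothetical stand-in estimator: fraction of columns with more than one residue."""
    return sum(len(set(c)) > 1 for c in cols) / len(cols)


columns = ["AAAA"] * 70 + ["AAAS"] * 30       # hypothetical gapless columns (h, f, d, c)
print(bootstrap_ci(columns, variable_fraction))


def toy_loglik(lam: float) -> float:          # hypothetical: 170 * log(lambda) - 2100 * lambda
    return 170.0 * math.log(lam) - 2100.0 * lam


print(curvature_variance(toy_loglik, 170.0 / 2100.0))
```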


Figure 2
Fig. 2 Histogram of all FSRs $\hat{\lambda}_i$ and of the subset of rates for families in the non-viable class (see Section 3.1).

 

    3 RESULTS
3.1 Indispensable proteins are more conserved
Essentiality of a gene can be identified by knock-out experiments. If the absence of a gene results in a lethal or sterile phenotype, the gene is considered essential. Such genes are expected to be subject to stringent purifying selection (Cutter et al., 2003). Small double-stranded RNA molecules can interfere with the translation of mRNA molecules of matching sequence. The small RNA molecules are called short interfering RNAs (siRNAs), and the mechanism is known as RNA interference (RNAi). A ‘genome-wide’ RNAi loss-of-function analysis covered 86% of C.elegans genes (Kamath et al., 2003). According to the observed phenotype, the genes were grouped into three classes: ‘the non-viable class (Nonv), consisting of embryonic or larval lethality or sterility (with or without associated post-embryonic defects); the growth defects (Gro) class, consisting of slow or arrested post-embryonic growth and the viable post-embryonic phenotype (Vpep) class, consisting of defects in post-embryonic development (e.g. in movement or body shape) without any associated lethality or slowed growth’ (Kamath et al., 2003).

We compare the rate distributions of orthologous families whose worm proteins fall into specific phenotypic classes with the overall rate distribution. Table 2 summarizes the results. The non-viable class is the only phenotypic class for which significant differences in rate distributions are observed. Of the 1170 worm genes in the non-viable class, 502 are found in our alignments of orthologous families. The histograms in Figure 2 illustrate that the families of the non-viable class have lower rates than the set of all orthologous families. The mean rate of the C.elegans non-viable set is 39.8 PAM/BYr. The Wilcoxon two-sample test comparing the overall rate distribution with the non-viable rate distribution yields a p-value of $1.29 \times 10^{-28}$ (Table 2).
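The Wilcoxon two-sample (rank-sum, equivalently Mann-Whitney) test compares two rate distributions through the ranks of the pooled values. The pure-Python sketch below uses the normal approximation with a tie correction and no continuity correction; the text does not state which variant was used, so this is an assumption. For production use, SciPy's `scipy.stats.mannwhitneyu` provides a maintained implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..n, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test, normal approximation with tie correction.

    Returns (U statistic of sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical rates (PAM/BYr): a slow subset versus a faster background
u, p = wilcoxon_rank_sum([30.0, 35.0, 41.0, 38.0], [52.0, 49.0, 60.0, 45.0, 71.0])
print(u, p)
```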


Table 2 FSRs of essential genes

 
Further, we investigate the relation between our evolutionary rates and a large fraction of the C.elegans interactome. Li et al. (2004) obtained more than 4000 interactions through carefully performed high-throughput two-hybrid analysis. Previously known interactions and further potential interactions predicted from orthologs in other organisms were added, and altogether 5534 interactions among 2898 proteins were combined into the Worm Interactome version 5 (WI5). Like other biological networks, WI5 exhibits scale-free properties.

Does the number of interactions within WI5 correlate with evolutionary rates? In the following, the number of interaction partners of a protein is referred to as its degree $k$. Of the 2898 worm genes in WI5, 765 are found in our dataset. Rates and degrees are weakly negatively correlated. A clearer relation emerges when the set of 765 proteins is partitioned with respect to the degrees of the worm proteins and with respect to the rates.

First, we split the 765 proteins into three sets with degrees $k \in \{1\}$, $k \in \{2, 3\}$ and $k \in \{4, \dots, 89\}$ and compare the rate distributions of the three sets. We find 380 proteins with degree $k \in \{1\}$, 199 with $k \in \{2, 3\}$ and 186 with $k \in \{4, \dots, 89\}$. Indeed, the mean rate of the three sets decreases with growing $k$, suggesting that purifying selection acts more strongly on hubs of the interactome. The rate distributions of the sets with $k \in \{2, 3\}$ and $k \in \{4, \dots, 89\}$ do not differ significantly. However, comparing either of them, individually or combined, with the rate distribution of families with $k \in \{1\}$ yields significant p-values (e.g. for the sets with $k \in \{1\}$ and $k \in \{4, \dots, 89\}$, $p = 1.04 \times 10^{-4}$).
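The degree partition above can be sketched as a simple binning step followed by per-bin summaries. The degrees and rates in the usage example are hypothetical; only the bin boundaries come from the text.

```python
from statistics import mean
from typing import Dict, List, Sequence

# Degree categories used in the text (k = number of WI5 interaction partners).
DEGREE_BINS = {"k=1": range(1, 2), "k=2-3": range(2, 4), "k=4-89": range(4, 90)}


def rates_by_degree(degrees: Sequence[int], rates: Sequence[float]) -> Dict[str, List[float]]:
    """Group family rates (PAM/BYr) by the degree of the worm protein in WI5."""
    groups: Dict[str, List[float]] = {name: [] for name in DEGREE_BINS}
    for k, rate in zip(degrees, rates):
        for name, bin_range in DEGREE_BINS.items():
            if k in bin_range:
                groups[name].append(rate)
                break
    return groups


# Hypothetical degrees and FSRs for eight families
degrees = [1, 1, 2, 3, 5, 12, 1, 4]
rates = [70.0, 65.0, 50.0, 48.0, 40.0, 30.0, 60.0, 45.0]
for name, group in rates_by_degree(degrees, rates).items():
    print(name, len(group), round(mean(group), 1))
```

Each resulting group can then be compared pairwise with a two-sample rank test, as done in the text.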

Second, we split the set of 765 orthologous families into four sets of approximately equal size, with rates in four non-overlapping rate intervals. The bar chart in Figure 3 compares the frequencies of families for a given rate interval and degree category. For $k \in \{1\}$, most of the families belong to the fastest rate interval. For $k \in \{4, \dots, 89\}$, the reverse holds. Our results support the view that interactions impose additional constraints on the replacement of amino acid residues.
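A cross-tabulation like the one behind Figure 3 can be sketched by assigning rate classes by rank and counting families per (degree category, rate class) cell. The exact interval boundaries used in the paper are not given, so the rank-based quartile split here is an assumption; families with identical rates may land in different classes.

```python
from collections import Counter
from typing import List, Sequence


def degree_category(k: int) -> str:
    """Degree categories from the text; WI5 proteins have at least one partner."""
    if k == 1:
        return "k=1"
    if k <= 3:
        return "k=2-3"
    return "k>=4"


def rate_quartiles(rates: Sequence[float]) -> List[int]:
    """Assign each family to one of four rate classes of (almost) equal size; 0 = slowest."""
    order = sorted(range(len(rates)), key=lambda i: rates[i])
    classes = [0] * len(rates)
    for position, i in enumerate(order):
        classes[i] = position * 4 // len(rates)
    return classes


def cross_tabulate(degrees: Sequence[int], rates: Sequence[float]) -> Counter:
    """Count families per (degree category, rate class) cell."""
    return Counter(zip(map(degree_category, degrees), rate_quartiles(rates)))


# Hypothetical data: weakly connected proteins evolving fast, hubs evolving slowly
table = cross_tabulate([1, 1, 5, 5], [80.0, 70.0, 10.0, 20.0])
print(sorted(table.items()))
```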


Figure 3
Fig. 3 The bar chart compares FSRs with numbers of interaction partners in the WI5 dataset. Each of the 765 orthologous families is assigned to one of four rate categories and to one of three degree categories. Numbers in parentheses indicate the numbers of orthologous families in each rate category. For degree $k \in \{1\}$, most of the orthologous families are fast evolving. For proteins with $k \in \{4, \dots, 89\}$, small rates are over-represented.

 
3.2 Extra-cellular proteins are fast evolving
We combine different in silico approaches to derive a set of extra-cellular families. Namely we check putative extra-cellular localization due to Swiss-Prot annotations (Nair and Rost, 2002), the detection of extra-cellular SMART domains (Mott et al., 2002; Letunic et al., 2004) and the existence either of a predicted signal peptide (Nielsen et al., 1997) or a transmembrane helix (Sonnhammer et al., 1998). If the proteins of an orthologous family meet at least two of those criteria, we call the respective family extra-cellular.
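The classification rule, at least two of three criteria, is easy to state in code. This sketch assumes, as one reading of the text, that a predicted signal peptide or a transmembrane helix together counts as a single criterion, and that the flags have already been aggregated per family; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LocalizationEvidence:
    """Per-family evidence flags (hypothetical representation)."""
    swissprot_extracellular: bool      # Swiss-Prot annotation suggests extra-cellular localization
    extracellular_smart_domain: bool   # an extra-cellular SMART domain was detected
    signal_peptide: bool               # predicted signal peptide
    transmembrane_helix: bool          # predicted transmembrane helix

    def criteria_met(self) -> int:
        # Signal peptide and transmembrane helix together count as one criterion.
        return (int(self.swissprot_extracellular)
                + int(self.extracellular_smart_domain)
                + int(self.signal_peptide or self.transmembrane_helix))

    def is_extracellular(self) -> bool:
        return self.criteria_met() >= 2


print(LocalizationEvidence(True, False, False, True).is_extracellular())
```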

We end up with a set of 241 orthologous families with extra-cellular proteins. This set also includes transmembrane proteins that follow the secretory pathway but are only partly extra-cellular. The rate distribution of the extra-cellular families is significantly shifted towards larger rates: the mean and median rates are 67.2 PAM/BYr and 64 PAM/BYr, respectively. A Wilcoxon two-sample test comparing the rate distribution of all orthologous families with that of the extra-cellular families yields $p = 1.80 \times 10^{-19}$.

Protein tyrosine kinases (PTKs) are involved in cellular signalling pathways and regulate key cell functions such as proliferation, cell growth, immune response and differentiation. We examine the large multigene family of PTKs in greater detail and analyze the 14 domain architectures shown in Figure 4. Each domain architecture is present within a distinct orthologous family. Comparing PTKs is interesting with regard to a putative relation between a protein's extra-cellular localization and its evolutionary rate. While non-receptor PTKs are purely cytoplasmic, receptor PTKs are membrane anchored and contain an extra-cellular ligand binding domain. The 14 orthologous families divide into 9 families with receptor PTKs, shown above the dashed line in Figure 4, and 5 families with non-receptor PTKs, shown below it.


Figure 4
Fig. 4 Receptor PTKs (above dashed line) and non-receptor PTKs (below dashed line). Representations of the proteins by their domain architectures were downloaded from the SMART web server (Letunic et al., 2004) and comprise predicted SMART domains (bubbles), PFAM domains (rectangles), transmembrane helices (narrow vertically oriented rectangles) and signal peptides (at the N-terminus). Domain symbols and names are listed in an online appendix (Supplementary Material). FSRs of orthologous families are written above gene names. Numbers below domains are domain rates in PAM/BYr.

 
FSRs of orthologous families are written above gene names in Figure 4. While the rates of purely cytoplasmic PTKs range from 33 to 67 PAM/BYr, the rates of receptor PTKs range from 56 to 84 PAM/BYr. This suggests a general trend for receptor PTKs to evolve faster than non-receptor PTKs.

We further disentangle the evolutionary rates by assessing the rates of the proteins' constituent domains. For this purpose, we align the domains to the domain models using ‘hmmalign’ (Eddy, 1998) and then apply the FSR estimator to the domain alignments. Domain rates are written below the domain symbols in Figure 4. The extra-cellular domains turn out to be more divergent than their cytoplasmic counterparts: the largest rate observed for the tyrosine kinase domain is 44 PAM/BYr, whereas each of the extra-cellular domains evolves at a higher rate. We conclude that the high rates of the receptor PTKs are indeed due to the extra-cellular portions of the proteins.


    4 SUMMARY AND CONCLUSION
We analyze evolutionary rates of protein families that comprise orthologs from man, fugu, fly and worm. The assumption that the number of mutations per time unit is constant, the so-called molecular clock hypothesis, allows the evolution of a family to be represented by a rooted ultrametric phylogenetic tree in which all leaves are equally distant from the root. In such a tree the edge lengths are proportional to the estimated number of mutation events and can be scaled with a rate parameter. The fact that a protein's evolutionary rate differs among lineages, i.e. that the molecular clock does not hold in general, is accounted for by reconstructing an additive rather than an ultrametric tree. We apply both tree models in an ML framework to estimate family specific evolutionary rates. A given set of divergence times is used to relate measures of amino acid replacements to historical times.

We consider publicly available data from RNAi knock-out experiments and high-throughput two-hybrid screens in C.elegans and relate family specific rates to the essentiality and the connectivity of proteins. We find that indispensable proteins are subject to strong purifying selection.

Finally, we analyze fast evolving extra-cellular proteins and the large multigene family of protein tyrosine kinases. For the latter, we show that extra-cellular domains are more divergent than their cytoplasmic counterparts.


    SUPPLEMENTARY DATA
Supplementary Material is available at http://speeds.molgen.mpg.de. It includes the whole set of orthologous families together with alignments and ML trees, FSRs, tree length ratios, p-values of LRTs, 95% bootstrap confidence intervals and variances of FSRs, a scatter plot comparing FSRs with tree length ratios, a large color version of Figure 4, and a list of the names of the domains in Figure 4.


    Acknowledgments
 
The authors thank Tobias Müller, Anja von Heydebreck, Sven Rahmann, Eike Staub, Antje Krause and Thomas Manke for valuable discussions. They thank Georg Füllen for a comment on the manuscript. Part of this work was supported by grants from the Bundesministerium für Bildung und Forschung, Germany, through its contribution to the Helmholtz Network for Bioinformatics.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Thomas Lengauer

Received on November 30, 2005; revised on January 26, 2006; accepted on February 23, 2006

    REFERENCES

Castillo-Davis, C.I., et al. (2004) The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint. Genome Res., 14, 802–811.

Cutter, A.D., et al. (2003) Molecular correlates of genes exhibiting RNAi phenotypes in Caenorhabditis elegans. Genome Res., 13, 2651–2657.

Davis, J.C. and Petrov, D.A. (2004) Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol., 2, E55.

Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.

Efron, B. and Tibshirani, R. (1993) An Introduction to the Bootstrap. Chapman and Hall, Cambridge.

Felsenstein, J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol., 17, 368–376.

Felsenstein, J. (2004) Inferring Phylogenies. Sinauer Associates, Inc., Sunderland, MA.

Hahn, M.W. and Kern, A.D. (2005) Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol. Biol. Evol., 22, 803–805.

Hedges, S.B. (2002) The origin and evolution of model organisms. Nat. Rev. Genet., 3, 838–849.

Hirsh, A.E. and Fraser, H.B. (2001) Protein dispensability and rate of evolution. Nature, 411, 1046–1049.

Hubbard, T., et al. (2002) The Ensembl genome database project. Nucleic Acids Res., 30, 38–41.

Kamath, R.S., et al. (2003) Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature, 421, 231–237.

Koonin, E.V., et al. (2004) A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol., 5, R7.

Letunic, I., et al. (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Res., 32, D142–D144.

Lipman, D.J., et al. (2002) The relationship of protein conservation and sequence length. BMC Evol. Biol., 2, 20.

Li, S., et al. (2004) A map of the interactome network of the metazoan C.elegans. Science, 303, 540–543.

Mott, R., et al. (2002) Predicting protein cellular localization using a domain projection method. Genome Res., 12, 1168–1174.

Müller, T., et al. (2002) Estimating amino acid substitution models: a comparison of Dayhoff's estimator, the resolvent approach and a maximum likelihood method. Mol. Biol. Evol., 19, 8–13.

Müller, T. and Vingron, M. (2000) Modeling amino acid replacement. J. Comput. Biol., 7, 761–776.

Nair, R. and Rost, B. (2002) Inferring sub-cellular localization through automated lexical analysis. Bioinformatics, 18, S78–S86.

Nielsen, H., et al. (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng., 10, 1–6.

Remm, M., et al. (2001) Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol., 314, 1041–1052.

Sonnhammer, E.L.L., von Heijne, G. and Krogh, A. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. In Glasgow, J. et al. (eds), Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology. AAAI Press, pp. 175–182.

Stoye, J., et al. (1997) DCA: an efficient implementation of the divide-and-conquer approach to simultaneous multiple sequence alignment. Comput. Appl. Biosci., 13, 625–626.

Stoye, J., et al. (1998) Rose: generating sequence families. Bioinformatics, 14, 157–163.

Wang, D.Y., Kumar, S. and Hedges, S.B. (1999) Divergence time estimates for the early history of animal phyla and the origin of plants, animals and fungi. Proc. Biol. Sci., 266, 163–171.

Winter, E.E., et al. (2004) Elevated rates of protein secretion, evolution, and disease among tissue-specific genes. Genome Res., 14, 54–61.

Wootton, J.C. and Federhen, S. (1993) Statistics of local complexity in amino acid sequences and sequence databases. Comput. Chem., 17, 149–163.

Yang, Z. (1996) Maximum-likelihood models for combined analyses of multiple sequence data. J. Mol. Evol., 42, 587–596.

Yang, Z. (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci., 13, 555–556.

