Bioinformatics Advance Access originally published online on November 5, 2004
Bioinformatics 2005 21(7):951-960; doi:10.1093/bioinformatics/bti125
A correction to this article has been published.

© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Protein homology detection by HMM–HMM comparison

Johannes Söding

Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Spemannstrasse 35, D-72076 Tübingen, Germany


    Abstract

Motivation: Protein homology detection and sequence alignment are at the basis of protein structure prediction, function prediction and evolution.

Results: We have generalized the alignment of protein sequences with a profile hidden Markov model (HMM) to the case of pairwise alignment of profile HMMs. We present a method for detecting distant homologous relationships between proteins based on this approach. The method (HHsearch) is benchmarked together with BLAST, PSI-BLAST, HMMER and the profile–profile comparison tools PROF_SIM and COMPASS, in an all-against-all comparison of a database of 3691 protein domains from SCOP 1.63 with pairwise sequence identities below 20%.

Sensitivity: When the predicted secondary structure is included in the HMMs, HHsearch is able to detect between 2.7 and 4.2 times more homologs than PSI-BLAST or HMMER and between 1.44 and 1.9 times more than COMPASS or PROF_SIM for a rate of false positives of 10%. Approximately half of the improvement over the profile–profile comparison methods is attributable to the use of profile HMMs in place of simple profiles.

Alignment quality: Higher sensitivity is mirrored by an increased alignment quality. HHsearch produced 1.2, 1.7 and 3.3 times more good alignments (‘balanced’ score >0.3) than the next best method (COMPASS), and 1.6, 2.9 and 9.4 times more than PSI-BLAST, at the family, superfamily and fold level, respectively.

Speed: HHsearch scans a query of 200 residues against 3691 domains in 33 s on an AMD64 2GHz PC. This is 10 times faster than PROF_SIM and 17 times faster than COMPASS.

Availability: HHsearch can be downloaded from http://www.protevo.eb.tuebingen.mpg.de/download/ together with up-to-date versions of SCOP and PFAM. A web server is available at http://www.protevo.eb.tuebingen.mpg.de/toolkit/index.php?view=hhpred

Contact: johannes.soeding{at}tuebingen.mpg.de


    INTRODUCTION
Homology detection and sequence alignment are central themes in bioinformatics because of their manifold applications in areas such as protein function prediction, 3D protein structure prediction and protein evolution (Bork and Koonin, 1998; Kinch et al., 2003; Henn-Sax et al., 2001). But often no close homolog with known function or structure can be found that would allow inferences to be made about the protein of interest. In many of these cases, new and highly sensitive methods could detect and align remotely homologous sequences that provide information about the protein's function, structure or evolution. Extending the limits of sensitivity is therefore of great practical importance.

The development of profile–sequence comparison methods such as PSI-BLAST (Altschul et al., 1997) has led to a great improvement in sensitivity over sequence–sequence comparison methods such as FASTA or BLAST (Pearson and Lipman, 1988; Altschul et al., 1990). This is because a sequence profile, which is built from a multiple alignment of homologous sequences, contains more information about the sequence family than a single sequence. The profile allows one to distinguish between conserved positions that are important for defining members of the family and non-conserved positions that are variable among the members of the family. More than that, it describes exactly what variation in amino acids is possible at each position by recording the probability for the occurrence of each amino acid along the multiple alignment.
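As a minimal sketch of what such a profile records, the following Python snippet computes per-column amino acid frequencies from a toy multiple alignment. The function name `column_frequencies` and the toy alignment are illustrative assumptions; real profile builders such as PSI-BLAST additionally apply sequence weighting and pseudocounts, which are omitted here.

```python
from collections import Counter

def column_frequencies(alignment):
    """Return one {amino acid: frequency} dict per alignment column.

    `alignment` is a list of equal-length aligned sequences; '-' marks a gap.
    Gaps are ignored, and sequences are left unweighted (a simplification).
    """
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile

# Hypothetical toy alignment of four sequences and three columns.
toy_alignment = ["ACD", "ACE", "AGD", "A-D"]
toy_profile = column_frequencies(toy_alignment)
# Column 0 is fully conserved; columns 1 and 2 are variable.
```

In this toy case the first column is conserved (only A occurs), whereas the other two columns record which substitutions were observed and how often.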

A significant improvement over profile–sequence based methods was made possible by comparing profiles to profiles. Several programs for homology recognition have recently been developed that are based on profile–profile comparison: LAMA by Pietrokovski (1996), PROF_SIM by Yona and Levitt (2002) and COMPASS by Sadreyev and Grishin (2003). These programs were shown to be significantly more sensitive than PSI-BLAST and have been applied for identifying evolutionary links between protein families previously thought to be unrelated (Pietrokovski, 1996; Kunin et al., 2001; Sadreyev et al., 2003). LAMA is part of the BLOCKS database software suite and was developed to compare a sequence alignment with a database of conserved, ungapped alignments (blocks) that characterize protein families. PROF_SIM and COMPASS allow for gaps and use the Smith–Waterman local alignment algorithm. PROF_SIM employs a column score based on Jensen–Shannon entropy. Statistical significance is reported as a P-value that is calculated directly from the raw score. COMPASS uses a column score based on the relative entropy between the two amino acid distributions. It estimates E-values analytically by generalizing the approach of PSI-BLAST to the profile–profile case. Before profile–profile comparison was applied to homology detection it was routinely employed in multiple sequence alignment methods, for example in the popular tool CLUSTAL by Thompson et al. (1994). Programs for multiple sequence alignment that incorporate recent advances in profile–profile comparison are PCMA by Pei et al. (2003) and SATCHMO by Edgar and Sjölander (2003).

A number of structure prediction servers exist that rely on profile–profile comparison (Rychlewski et al., 2000; Ginalski et al., 2003; Tang et al., 2003; von Öhsen et al., 2003; Tomii and Akiyama, 2004). They build a profile from the query sequence and search for homologous templates of known structure. In general, these templates are similar in structure because structures diverge much more slowly than sequences, and proteins may remain structurally very similar long after their sequence similarity has disappeared (Kinch and Grishin, 2002). These servers are among the best-performing present-day methods for fold recognition, as can be seen from the results of the blind, automated structure prediction contests CAFASP, LIVEBENCH and EVA (Fischer et al., 2003; Rychlewski et al., 2003; Koh et al., 2003).

Profile HMMs are similar to simple sequence profiles, but in addition to the amino acid frequencies in the columns of a multiple sequence alignment they contain the position-specific probabilities for inserts and deletions along the alignment (Fig. 1a). The logarithms of these probabilities are in fact equivalent to position-specific gap penalties (Durbin et al., 1998). Profile HMMs perform better than sequence profiles in the detection of homologs and in the quality of alignments (Krogh et al., 1994; Eddy, 1998; Karplus et al., 2001) albeit at the price of a decrease in computational speed. The higher sensitivity is due to the fact that the position-specific gap penalties penalize chance hits much more than true positives which tend to have insertions or deletions at the same positions as the sequences from which the HMM was built. Lyngsø et al. (1999) developed an algorithm for the alignment of two HMMs based on the exact maximization of the co-emission probability. They compared several score variants with each other but did not benchmark their method against others.



Fig. 1 (a) The alignment of a sequence to a profile HMM can be represented by a path through the HMM (bold arrows). (b) Alignment of two HMMs by maximization of log-sum-of-odds score. The path through the two HMMs corresponds to a sequence that is co-emitted by both HMMs. With dynamic programming one finds the path which maximizes the log-sum-of-odds score [Equation (2)]. (c) Allowed transitions between pair states. Other transitions are possible but can be neglected.

 
Here, we generalize the log-odds score maximized in HMM–sequence alignment to the case of HMM–HMM alignment. We present a novel algorithm for HMM–HMM alignment that is based on this theory and that makes some simplifications for increased efficiency. We show that by aligning profile HMMs instead of simple sequence profiles we are able to improve both sensitivity and alignment quality significantly.

Even sequences that are only distantly homologous will have secondary structures more similar to each other than what is to be expected by chance. For this reason the predicted secondary structure can help to distinguish real homologs from chance hits. Several methods score secondary structure to improve homology recognition. Kelley et al. (2000) score secondary structure with a simple +1/–1 scoring function. Hargbo and Elofsson (1999) include predicted probabilities to emit one of three secondary structure states in their profile HMMs.1 Kawabata and Nishikawa (2000) developed a statistical approach using a 3 × 3 substitution matrix for secondary structure states in their structure comparison program Matras. For HHsearch we developed a statistical method which aims at exploiting all available information, including the confidence values and the full seven-state secondary structure determined by DSSP2 (Kabsch and Sander, 1983). Like Kawabata and Nishikawa, we score pairs of aligned secondary structure states with substitution matrices, but we use ten 3 × 7 matrices, one for each confidence value, that we derive from a statistical analysis of PSIPRED predictions.

Our motivation in developing HHsearch was to provide the scientific community with a powerful tool for remote homology detection which maximizes sensitivity while ensuring reliability, speed and ease of use. To achieve maximum sensitivity we include as much information about query and database sequences as possible. We use HMM–HMM comparison instead of profile–profile comparison and we score predicted secondary structure. Reliability is crucial for a tool that is to be applied to detect evolutionary links. It was found that E-values reported by most tools, including ours, can be very unreliable. HHsearch therefore reports, in addition to E-values, the probability for each match to be a true positive, based on the 1.4 × 10^7 pairwise comparisons of our benchmark.


    THEORY
In the first subsection we show how to generalize the log-odds score to the case of pairwise comparison of profile HMMs. In the next subsection we then derive an efficient method to find the alignment between two profile HMMs that maximizes the log-sum-of-odds score.

Log-sum-of-odds score
The log-odds score for sequence–profile or sequence–HMM comparison has proven to be highly successful in homology recognition. This is underscored by the fact that virtually all sequence–profile and sequence–HMM comparison methods are based on it (Barrett et al., 1997). The log-odds score is a measure for how much more probable it is that a sequence is emitted by an HMM rather than by a random null model. More specifically, we write the probability for emitting the sequence x1,...,xL along the path through an HMM (Fig. 1a) by P(x1,...,xL|emission on path). This probability is a product of the amino acid emission probabilities for each state on the path and the transition probabilities between states. The log-odds score for the sequence to be emitted along the path by the HMM is (Durbin et al., 1998)

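$$
S_{\mathrm{LO}} = \log \frac{P(x_1,\ldots,x_L \mid \text{emission on path})}{P(x_1,\ldots,x_L \mid \text{Null})} \tag{1}
$$
The denominator is the probability of the standard null model, $P(x_1,\ldots,x_L \mid \text{Null}) = \prod_{l=1}^{L} f(x_l)$, where f(a) are the fixed amino acid background frequencies.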

We would like to generalize the log-odds score for sequence–HMM comparison to the case of HMM–HMM comparison. Suppose we are given an alignment of two profile HMMs, q and p (Fig. 1b). This alignment corresponds to a certain path through the two HMMs along which the HMMs emit amino acid residues. A natural generalization of Equation (1) to the case of HMM–HMM comparison is the log-sum-of-odds (LSO) score

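$$
S_{\mathrm{LSO}} = \log \sum_{x_1,\ldots,x_L} \frac{P(x_1,\ldots,x_L \mid \text{co-emission on path})}{P(x_1,\ldots,x_L \mid \text{Null})} \tag{2}
$$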
The sum over x1,...,xL runs over all sequences of L residues that can be emitted along the alignment path through the HMMs (e.g. L = 6 in Fig. 1b). The numerator is the probability that x1,...,xL is co-emitted by both HMMs along the alignment path and the denominator is the same null model probability as before.

The log-sum-of-odds score generalizes the log-odds score: when we use one of the HMMs to represent a single sequence, i.e. the HMM can emit only this single sequence, only one term can contribute in the sum and we get the same result as with Equation (1). Note that the omission of the null model probability in the denominator would yield the logarithm of the co-emission probability.
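The reduction to the log-odds score can be checked numerically. The sketch below is an illustration under simplifying assumptions, not the HHsearch implementation: it uses a hypothetical three-letter alphabet, considers only a path through match states, ignores transition probabilities, and evaluates the sum over all co-emitted sequences by brute force. The names `log_odds` and `log_sum_of_odds` are chosen for this example.

```python
import itertools
import math

def log_odds(seq, profile, f):
    """Log-odds score (in bits) of `seq` emitted along match states only.

    Transition probabilities are ignored in this sketch.
    """
    return sum(math.log2(profile[l][x] / f[x]) for l, x in enumerate(seq))

def log_sum_of_odds(q, p, f):
    """Brute-force log-sum-of-odds score (in bits) of two aligned profiles.

    Sums the co-emission odds over every sequence of length len(q);
    column l of q is taken to be aligned with column l of p.
    """
    total = 0.0
    for seq in itertools.product(sorted(f), repeat=len(q)):
        term = 1.0
        for l, x in enumerate(seq):
            term *= q[l][x] * p[l][x] / f[x]
        total += term
    return math.log2(total)

# Hypothetical background frequencies and a two-column profile p.
f = {"A": 0.5, "B": 0.3, "C": 0.2}
p = [{"A": 0.6, "B": 0.3, "C": 0.1}, {"A": 0.1, "B": 0.2, "C": 0.7}]
# A profile that can emit only the single sequence "AC".
q_single = [{"A": 1.0, "B": 0.0, "C": 0.0}, {"A": 0.0, "B": 0.0, "C": 1.0}]

lso = log_sum_of_odds(q_single, p, f)
lo = log_odds("AC", p, f)
```

Because `q_single` assigns zero probability to every sequence except "AC", only one term of the sum survives and `lso` agrees with `lo` up to rounding.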

In order to apply the Viterbi algorithm (i.e. dynamic programming) to find the path through the two HMMs with the maximum log-sum-of-odds score, we need to be more explicit about what Equation (2) means in terms of HMM probabilities. Let the two HMMs q and p have probabilities $q_i(a)$ and $p_j(a)$ to emit amino acid a in match state i or j and transition probabilities $q_i(X,X')$ and $p_j(Y,Y')$ to go from state X or Y ∈ {M, I, D} in column i or j to a state X' or Y' ∈ {M, I, D}. Insert states emit amino acids according to the fixed amino acid background frequencies f(a). Suppose we are given an alignment of q with p, or rather the path P through the two HMMs (Fig. 1b). We define K as the number of columns of the alignment of q with p (e.g. K = 7 in Fig. 1b). Let the $X_k, Y_k$ ∈ {M, I, D} be the states in q and p in the kth column of the pairwise alignment of q and p and let i(k) and j(k) be the respective columns from q and p. For the residues $x_l$ (l = 1,...,L) emitted along the path, we define $q^P_l(a)$ and $p^P_l(a)$ as the emission probabilities from q and p. More explicitly, if $x_l$ is emitted in alignment column k, then $q^P_l(a) = q_{i(k)}(a)$ for $X_k$ = M and $q^P_l(a) = f(a)$ for $X_k$ = I, and likewise for $p^P_l(a)$. Finally, we define $P_{\mathrm{tr}}$ as the product of all transition probabilities for the path through p and q. With these definitions, we can rewrite the log-sum-of-odds score as

$$
\begin{aligned}
S_{\mathrm{LSO}} &= \log \sum_{x_1,\ldots,x_L} \left( \prod_{l=1}^{L} \frac{q^P_l(x_l)\, p^P_l(x_l)}{f(x_l)} \right) P_{\mathrm{tr}} \\
&= \sum_{l=1}^{L} \log \sum_{a=1}^{20} \frac{q^P_l(a)\, p^P_l(a)}{f(a)} + \log P_{\mathrm{tr}} \\
&= \sum_{k:\, X_k Y_k = \mathrm{MM}} S_{aa}\!\left(q_{i(k)}, p_{j(k)}\right) + \log P_{\mathrm{tr}}
\end{aligned}
\tag{3}
$$
In the last line we have introduced the column score,

$$
S_{aa}(q_i, p_j) = \log \sum_{a=1}^{20} \frac{q_i(a)\, p_j(a)}{f(a)} \tag{4}
$$
by which we compare the amino acid distributions from the two HMMs. If we omitted the factor 1/f(a), we would obtain the logarithm of the co-emission probability as the total score. In this respect, the 1/f(a) can be interpreted as weight factors to the co-emission probability. They increase the weight of the rare amino acids with respect to the more common ones. This makes sense since co-emission of rare amino acids is harder to produce by chance. When one profile column contains only amino acid xi, i.e. qi(xi) = 1, the 1/f(a) ensure that we retrieve the log-odds score Saa(qi,pj) = log(pj(xi)/f(xi)). Furthermore, when one of the columns is completely non-conserved, pj(a) = f(a), the column score vanishes. For the same reason, insert states have vanishing column scores. The column score is positive when the two distributions are similar and negative otherwise, a property that makes local alignment possible. The column score is symmetric and furthermore fast to evaluate since it contains only one logarithm.
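The properties listed above can be illustrated with a few lines of Python. This is a sketch that assumes the column score has the form log Σ_a q_i(a) p_j(a) / f(a), measured in bits, and uses a hypothetical three-letter alphabet in place of the 20 amino acids; the name `column_score` is chosen for this example.

```python
import math

def column_score(q_col, p_col, f):
    """Column score in bits: log2 of sum_a q(a) * p(a) / f(a).

    All three arguments are dicts keyed by residue; every f[a] must be > 0.
    """
    return math.log2(sum(q_col[a] * p_col[a] / f[a] for a in f))

# Hypothetical background frequencies and profile columns.
f = {"A": 0.5, "B": 0.3, "C": 0.2}
q_col = {"A": 0.1, "B": 0.1, "C": 0.8}          # favours the rare residue C
p_similar = {"A": 0.2, "B": 0.1, "C": 0.7}      # also favours C
p_different = {"A": 0.9, "B": 0.05, "C": 0.05}  # favours A instead
q_one_hot = {"A": 0.0, "B": 0.0, "C": 1.0}      # column containing only C

similar = column_score(q_col, p_similar, f)       # positive
different = column_score(q_col, p_different, f)   # negative
unconserved = column_score(q_col, f, f)           # p_j(a) = f(a): vanishes
single = column_score(q_one_hot, p_similar, f)    # log2(p(C) / f(C))
```

The score is positive for the two similar columns, negative for the dissimilar pair, zero when one column equals the background, and equal to the ordinary log-odds score when one column contains a single residue.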

Pairwise alignment of HMMs
A profile HMM contains in each column a match state M, a delete state D and an insert state I (Fig. 1a). Match states and insert states emit amino acids whereas delete states do not. Therefore a match or insert state in one HMM can only be aligned with a match or insert state in the other HMM. Conversely, a delete state can only be aligned with a delete state or with a Gap G (Fig. 1b). A gap in a pairwise alignment of HMMs is completely analogous to a gap in a pairwise sequence alignment. It signifies that the column of the other HMM that is aligned with the gap does not have a homologous partner.3 We denote the alignment pair states as MM, MI, IM, II, DD, DG and GD. Figure 1b shows an example of two aligned profile HMMs. In the third column HMM q emits a residue from its M state and HMM p emits a residue from the I state. The pair state for this alignment column is MI. In column six of the alignment HMM q does not emit anything since it passes through the D state. HMM p does not emit anything either since it has a gap in the alignment. The corresponding pair state is DG. In principle, pair states MI and DG can be interchanged without changing the alignment of the two HMMs. The reason why we distinguish between them is that changing the path through the two HMMs changes the transition probabilities that contribute to the total score [Equation (2)].

At this point, we make two simplifications that speed up the algorithm and that can be argued to have a negligible or even positive effect on its performance: First, we exclude pair states II and DD, and second, we only allow transitions between a pair state and itself and between pair state MM and pair states MI, IM, DG or GD (Fig. 1c). The reasoning is very similar to the case of neglecting the I -> D and D -> I transitions in profile HMMs (Durbin et al., 1998).

To calculate the log-sum-of-odds score according to Equation (3), we need five dynamic programming matrices S_XY, one for each pair state XY ∈ {MM, MI, IM, DG, GD}. They contain the score of the best partial alignment which ends in column i of q and column j of p in pair state XY. These matrices are calculated recursively,

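$$
S_{MM}(i,j) = S_{aa}(q_i, p_j) + \max
\begin{cases}
S_{MM}(i-1,j-1) + \log\left[ q_{i-1}(M,M)\, p_{j-1}(M,M) \right] \\
S_{MI}(i-1,j-1) + \log\left[ q_{i-1}(M,M)\, p_{j-1}(I,M) \right] \\
S_{IM}(i-1,j-1) + \log\left[ q_{i-1}(I,M)\, p_{j-1}(M,M) \right] \\
S_{DG}(i-1,j-1) + \log\left[ q_{i-1}(D,M)\, p_{j-1}(M,M) \right] \\
S_{GD}(i-1,j-1) + \log\left[ q_{i-1}(M,M)\, p_{j-1}(D,M) \right]
\end{cases}
\tag{5}
$$

$$
S_{MI}(i,j) = \max
\begin{cases}
S_{MM}(i-1,j) + \log\left[ q_{i-1}(M,M)\, p_{j}(M,I) \right] \\
S_{MI}(i-1,j) + \log\left[ q_{i-1}(M,M)\, p_{j}(I,I) \right]
\end{cases}
\tag{6}
$$

$$
S_{DG}(i,j) = \max
\begin{cases}
S_{MM}(i-1,j) + \log\left[ q_{i-1}(M,D) \right] \\
S_{DG}(i-1,j) + \log\left[ q_{i-1}(D,D) \right]
\end{cases}
\tag{7}
$$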
and similarly for SIM(i,j) and SGD(i,j). Note that in the last equation no transition probabilities for HMM p appear. The pair state DG that is joined to the best partial alignment by this equation has a gap in HMM p and therefore no new transition is added to the path through p (Fig. 1b).

We have implemented both a semi-global and a local alignment version in HHsearch. For semi-global alignment the terminal gaps are not scored, so we set S_MM(i, 0) = S_MM(0, j) = 0. The other four matrices are initialized to −∞ to forbid any pair state except MM as the first state. The total score S_LSO is the maximum over the last column and last row of S_MM. For local alignment a zero is added as a sixth case to the maximization in Equation (5) to permit the HMM–HMM alignment to start at any MM pair state without penalty. The total score S_LSO is found as the maximum over the whole matrix S_MM. The optimal alignment is constructed as usual by backtracing from the cell with maximum score.
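The local alignment recursion can be sketched in Python as follows. This is an illustrative reading of the five-matrix scheme described above, not the HHsearch source code: the data layout (`q_emit`, `q_tr` and the two-letter transition keys such as `"MI"` for M to I), the function names, the four-letter toy alphabet and the requirement that emissions overlap (so the column score is finite) are all assumptions of this example. Backtracing is omitted; only the best score is returned.

```python
import math

NEG_INF = float("-inf")

def log2(x):
    """log2 that maps a zero probability to -inf instead of raising."""
    return math.log2(x) if x > 0 else NEG_INF

def column_score(q_col, p_col, f):
    """Column score in bits for two emission vectors and background f."""
    return math.log2(sum(qa * pa / fa for qa, pa, fa in zip(q_col, p_col, f)))

def local_hmm_hmm_score(q_emit, q_tr, p_emit, p_tr, f):
    """Best local HMM-HMM alignment score (bits), without backtrace.

    q_emit[i-1] holds the match emissions of column i (i = 1..n);
    q_tr[i] holds the transitions leaving column i (index 0 = begin state).
    """
    n, m = len(q_emit), len(p_emit)
    S = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)]
         for s in ("MM", "MI", "IM", "DG", "GD")}
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            qt, pt = q_tr[i - 1], p_tr[j - 1]
            # Match-match state; the leading 0.0 is the local-alignment case.
            S["MM"][i][j] = column_score(q_emit[i - 1], p_emit[j - 1], f) + max(
                0.0,
                S["MM"][i - 1][j - 1] + log2(qt["MM"] * pt["MM"]),
                S["MI"][i - 1][j - 1] + log2(qt["MM"] * pt["IM"]),
                S["IM"][i - 1][j - 1] + log2(qt["IM"] * pt["MM"]),
                S["DG"][i - 1][j - 1] + log2(qt["DM"] * pt["MM"]),
                S["GD"][i - 1][j - 1] + log2(qt["MM"] * pt["DM"]),
            )
            # q emits from M in column i while p stays in its insert state.
            S["MI"][i][j] = max(
                S["MM"][i - 1][j] + log2(qt["MM"] * p_tr[j]["MI"]),
                S["MI"][i - 1][j] + log2(qt["MM"] * p_tr[j]["II"]))
            S["IM"][i][j] = max(
                S["MM"][i][j - 1] + log2(q_tr[i]["MI"] * pt["MM"]),
                S["IM"][i][j - 1] + log2(q_tr[i]["II"] * pt["MM"]))
            # q passes through D while p has a gap: no transition in p.
            S["DG"][i][j] = max(S["MM"][i - 1][j] + log2(qt["MD"]),
                                S["DG"][i - 1][j] + log2(qt["DD"]))
            S["GD"][i][j] = max(S["MM"][i][j - 1] + log2(pt["MD"]),
                                S["GD"][i][j - 1] + log2(pt["DD"]))
            best = max(best, S["MM"][i][j])
    return best

# Toy example: a three-column HMM aligned with itself.
f = [0.25, 0.25, 0.25, 0.25]
cols = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1]]
tr = {"MM": 0.9, "MI": 0.05, "MD": 0.05,
      "IM": 0.5, "II": 0.5, "DM": 0.5, "DD": 0.5}
trs = [tr] * (len(cols) + 1)
score = local_hmm_hmm_score(cols, trs, cols, trs, f)
```

For this toy input the best path is the ungapped diagonal, so the score is three column scores plus two M to M transition terms. A semi-global variant would instead initialize the first row and column of the MM matrix to zero, drop the zero case, and take the maximum over the last row and column.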

Score offset
Most profile–profile methods add a score offset to the column score Saa in order to adjust how greedily the alignments will be constructed (Wang and Dunbrack, 2004), negative offsets producing shorter alignments. We found that adding a small offset of –0.1 bits indeed improves the performance of HHsearch and we use it as a default parameter. We think that this suppresses false matches caused by compositional bias, i.e. by a global similarity in amino acid composition. This compositional bias can lead to per-column scores slightly above zero (but in general below 0.1) which can add up to appreciable total scores over long proteins. We have also experimented with more refined methods for compositional bias correction. For example, we replaced the background frequencies f(a) in Equation (4) by the average amino acid frequencies in the query or target protein, but found the simple offset method to work best.

Sequence weighting and pseudocounts
For sequence weighting, we use the scheme of PSI-BLAST (Altschul et al. 1997) which is a modified version of Henikoff and Henikoff's scheme (1994). We add amino acid pseudocounts to both HMMs with a substitution matrix method similar to PSI-BLAST (Altschul et al., 1997), employing the Gonnet matrix (Gonnet et al., 1992) in place of the BLOSUM62 matrix as default. In contrast to the scheme of PSI-BLAST, the pseudocount admixture depends on the position in the multiple alignment. The modification ensures that, as in the sequence weighting scheme, alignments that are composed of several (sub)domains and which contain many sequences that cover only parts of the alignment get transformed to profiles in the same way as if the alignment was first cut into (sub)domains and the profiles calculated separately. Transition pseudocounts are added in a way analogous to amino acid pseudocounts.

Scoring correlations
Pei and Grishin (2001) showed that in alignments of homologous sequences conserved columns tend to occur in clusters along the sequence. When applied to the alignment of homologous HMMs, conserved columns of the underlying super-alignment should also occur in clusters. The conservation score of the super-alignment constructed from the two alignments will be higher wherever the distributions in the two aligned columns are similar, or, in other words, wherever the column score Saa [Equation (4)] is high. To sum up, in an alignment of two homologous HMMs we expect high column scores to occur in clusters along the sequence whereas in an alignment of non-homologous HMMs we do not expect any clustering.

This observation can help to distinguish homologous from non-homologous alignments. Suppose the lth pair state of the optimum path aligns columns i(l) from q and j(l) from p. We write $S_l$ for the column score of the lth pair state, i.e. $S_l = S_{aa}(q_{i(l)}, p_{j(l)})$ if the lth pair state is an MM state and zero otherwise. The autocorrelation function

$$
g(d) = \sum_{l} S_l\, S_{l+d} \tag{8}
$$
where the sum runs over all l for which the pair states l and l + d both lie on the path, describes the correlation of $S_l$ at a fixed sequence separation d. When the two HMMs are homologous we expect g(d) to be positive for small d. We therefore add

$$
S_{\mathrm{corr}} = w_{\mathrm{corr}} \sum_{d=1}^{4} g(d) \tag{9}
$$
to the total score, after the best alignment is found.4 The weight w_corr = 0.1 was determined empirically on a small test set of 317 × 317 pairwise alignments.
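A short Python sketch shows how such a correlation term rewards clustered column scores. It assumes g(d) is the sum of products S_l · S_(l+d) along the path and that the correction sums g(d) over small separations; the upper limit `d_max=4` and the function names are assumptions of this example, while w_corr = 0.1 is the value quoted above.

```python
def autocorrelation(scores, d):
    """g(d): sum of S_l * S_(l+d) over all l with both states on the path."""
    return sum(scores[l] * scores[l + d] for l in range(len(scores) - d))

def correlation_score(scores, w_corr=0.1, d_max=4):
    """Correlation term added to the total score after alignment.

    `d_max` (the largest separation summed over) is an assumed value.
    """
    return w_corr * sum(autocorrelation(scores, d) for d in range(1, d_max + 1))

# Hypothetical per-state column scores along two alignment paths.
clustered = [1, 1, 1, 1, -1, -1, -1, -1]    # high scores occur together
alternating = [1, -1, 1, -1, 1, -1, 1, -1]  # no clustering

bonus_clustered = correlation_score(clustered)
bonus_alternating = correlation_score(alternating)
```

Both toy paths have the same sum of column scores, but only the clustered one receives a positive correlation term.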

Scoring secondary structure
HHsearch can score a predicted secondary structure either against a predicted secondary structure or against a known secondary structure. We first treat the latter case which is applicable to 3D structure prediction. The goal is a statistical score for aligning a pair of secondary structure states that takes the confidence values of the secondary structure prediction into account. Intuitively, the confidence values contain very valuable information since, for example, an H aligned to a predicted E should be penalized much more when the confidence value is 9 instead of 0.

We use DSSP (Kabsch and Sander, 1983) to assign one of seven states of observed secondary structure. PSIPRED is employed to predict secondary structure states H, E and C (Jones, 1999). We predicted the secondary structure for all domains in SCOP (version 1.63, filtered to a maximum sequence identity of 20%) and compared the PSIPRED predictions for each residue with the DSSP assignments. We counted how often each combination (σ; ρ, c) occurred in which a DSSP state σ ∈ {H, E, C, G, B, S, T} was predicted by PSIPRED as state ρ ∈ {H, E, C} with confidence value c ∈ {0, 1, ..., 9}. From this we calculated the probability P(σ; ρ, c) for σ, ρ and c to occur together, as well as the probability P(σ) for σ to occur and the probability P(ρ, c) for the pair (ρ, c) to occur. In this way we derived ten 3 × 7 substitution matrices, one for each value of c:

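$$
M_{SS}(\sigma; \rho, c) = \log \frac{P(\sigma; \rho, c)}{P(\sigma)\, P(\rho, c)} \tag{10}
$$
Now suppose column i of HMM q has predicted secondary structure $\rho_i^q$ and confidence value $c_i^q$ and column j of HMM p has known secondary structure5 $\sigma_j^p$. The secondary structure score for qi and pj is obtained by multiplying the log-odds in MSS with a weight wSS,

$$
S_{SS}(q_i, p_j) = w_{SS}\, M_{SS}(\sigma_j^p; \rho_i^q, c_i^q) \tag{11}
$$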
This score is added to the amino acid column score Saa(qi, pj) in Equation (5). The weight coefficient wSS accounts for the fact that the secondary structure states are not independent of their neighbors. Since the average length of stretches of identical states of predicted secondary structure is ~7 we expect an optimum weight wSS ≈ 1/7. Empirically we indeed find a broad optimum around wSS = 0.15, the value we use in this benchmark.
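The derivation of the substitution matrices from counts can be sketched as follows. The counts below are invented toy numbers over two DSSP states and two confidence values, not the SCOP statistics used in the benchmark, and the function names are assumptions of this example. The log-odds form log P(σ; ρ, c) / (P(σ) P(ρ, c)) is taken from the description above, evaluated in bits.

```python
import math
from collections import defaultdict

def ss_matrices(counts):
    """Log-odds (bits) for each observed (dssp_state, predicted_state, conf).

    `counts[(sigma, rho, c)]` is how often DSSP state sigma was predicted
    as rho with confidence c. Combinations with zero counts are skipped.
    """
    total = sum(counts.values())
    p_sigma = defaultdict(float)   # P(sigma)
    p_pred = defaultdict(float)    # P(rho, c)
    for (sigma, rho, c), n in counts.items():
        p_sigma[sigma] += n / total
        p_pred[(rho, c)] += n / total
    return {
        (sigma, rho, c): math.log2((n / total) / (p_sigma[sigma] * p_pred[(rho, c)]))
        for (sigma, rho, c), n in counts.items() if n > 0
    }

def ss_score(matrices, sigma_p, rho_q, c_q, w_ss=0.15):
    """Weighted secondary structure score for one pair of aligned columns."""
    return w_ss * matrices[(sigma_p, rho_q, c_q)]

# Invented toy counts: confident predictions (c = 9) are mostly right,
# unconfident ones (c = 0) only slightly better than chance.
toy_counts = {
    ("H", "H", 9): 45, ("E", "H", 9): 5, ("H", "E", 9): 5, ("E", "E", 9): 45,
    ("H", "H", 0): 30, ("E", "H", 0): 20, ("H", "E", 0): 20, ("E", "E", 0): 30,
}
M = ss_matrices(toy_counts)
penalty_confident = ss_score(M, "E", "H", 9)   # known E vs predicted H, c = 9
penalty_unsure = ss_score(M, "E", "H", 0)      # same mismatch, c = 0
```

With these toy counts a mismatch at confidence 9 is penalized far more strongly than the same mismatch at confidence 0, which is the behaviour the confidence-dependent matrices are meant to capture.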

Note that matrix MSS only quantifies how actual secondary structure is correlated with predicted secondary structure for one profile HMM. What is missing is a matrix that quantifies the mapping of the actual secondary structure from one HMM to another distantly related HMM. We derived such a 7 x 7 matrix and provided it with a variable exponent, but we find that an exponent of zero represents the optimum case, which is why this matrix was omitted in the following.6

We now come to the case of scoring predicted against predicted secondary structure. This time we need to account for the mapping of DSSP secondary structure states to predicted states twice, once for q and once for p. Again we can omit the mapping of the DSSP seven-state secondary structure from one profile HMM to a homologous HMM. The substitution matrix for the alignment of states $(\rho_i^q, c_i^q)$ and $(\rho_j^p, c_j^p)$ is

$$
M_{SS}(\rho_i^q, c_i^q; \rho_j^p, c_j^p) = \log \sum_{\sigma} \frac{P(\sigma; \rho_i^q, c_i^q)\, P(\sigma; \rho_j^p, c_j^p)}{P(\rho_i^q, c_i^q)\, P(\sigma)\, P(\rho_j^p, c_j^p)} \tag{12}
$$
where the sum runs over all seven DSSP states. This matrix tells us how much more probable it is to obtain predictions $(\rho_i^q, c_i^q)$ and $(\rho_j^p, c_j^p)$ for a pair of aligned homologous residues than to obtain them independently of each other, whatever the actual secondary structure state σ may be. The secondary structure score calculated from this matrix by $S_{SS}(q_i, p_j) = w_{SS}\, M_{SS}(\rho_i^q, c_i^q; \rho_j^p, c_j^p)$ is added to the column score with the same weight wSS = 0.15 as before.
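The predicted-versus-predicted matrix can be sketched in the same way. The snippet assumes that both predictions are conditionally independent given the true DSSP state σ, so that the joint probability of two predictions is Σ_σ P(σ; a) P(σ; b) / P(σ); this is one way of writing the log-odds described above, and the toy counts and function name are invented for illustration.

```python
import math
from collections import defaultdict

def pred_pred_matrix(counts):
    """Log-odds (bits) for aligning two predicted states a = (rho, c), b.

    Sums over the unobserved true state sigma. Assumes every pair (a, b)
    shares at least one sigma with non-zero counts, so the log is finite.
    """
    total = sum(counts.values())
    joint = {key: n / total for key, n in counts.items()}
    p_sigma = defaultdict(float)   # P(sigma)
    p_pred = defaultdict(float)    # P(rho, c)
    for (sigma, rho, c), pr in joint.items():
        p_sigma[sigma] += pr
        p_pred[(rho, c)] += pr
    matrix = {}
    for a in p_pred:
        for b in p_pred:
            co_occur = sum(
                joint.get((sigma,) + a, 0.0) * joint.get((sigma,) + b, 0.0) / p_sigma[sigma]
                for sigma in p_sigma)
            matrix[(a, b)] = math.log2(co_occur / (p_pred[a] * p_pred[b]))
    return matrix

# Invented toy counts over two DSSP states and two confidence values.
toy_counts = {
    ("H", "H", 9): 45, ("E", "H", 9): 5, ("H", "E", 9): 5, ("E", "E", 9): 45,
    ("H", "H", 0): 30, ("E", "H", 0): 20, ("H", "E", 0): 20, ("E", "E", 0): 30,
}
M2 = pred_pred_matrix(toy_counts)
agree = M2[(("H", 9), ("H", 9))]      # two confident, agreeing predictions
disagree = M2[(("H", 9), ("E", 9))]   # two confident, conflicting predictions
```

Agreeing confident predictions score positively and conflicting ones negatively, and the matrix is symmetric in its two arguments.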


    RESULTS AND DISCUSSION
We have performed an all-against-all comparison with various similarity search tools to test their ability to detect remote homologs and to produce high-quality alignments below the twilight zone (Doolittle, 1981) of sequence similarity. We compared BLAST and PSI-BLAST (version 2.2.9) as popular representatives of sequence–sequence and profile–sequence methods, the HMM–sequence comparison package HMMER (2.2g), the profile–profile alignment tools PROF_SIM (obtained 04/02/2004) and COMPASS (1.24), and our method HHsearch (1.0). All tools except COMPASS were run with default parameters.7

In order to pinpoint the source of improvements, we benchmarked five versions of HHsearch. HHsearch 0 uses simple profile–profile comparison by setting all gap opening penalties to –3.5 bits and all gap extension penalties to –0.2 bits and using these instead of the logarithms of the transition probabilities in Equations (5)–(7). HHsearch 1 is the basic HMM–HMM version, HHsearch 2 includes the correlation score [Equation (9)]; in addition to this HHsearch 3 compares predicted with predicted secondary structure [Equation (12)] and HHsearch 4 uses predicted versus known secondary structure [Equation (10)].

The 3691 sequences of the SCOP database (Murzin et al., 1995) (version 1.63) filtered to a maximum sequence identity of 20% (‘SCOP-20’) were obtained from the ASTRAL server (Chandonia et al., 2004). Each sequence corresponds to a single structural domain, except for 73 sequences from the SCOP class of multi-domain proteins. An alignment was built from each seed sequence by PSI-BLAST with up to eight iterations. An inclusion threshold of 10^-4 in the last iteration and 10^-5 in previous iterations was used. We used several filters in order to make sure that only homologous sequences enter the alignments. All methods in the benchmark (except BLAST) were tested with this same set of alignments.

Detection of homologs
Each domain in SCOP is classified into a hierarchy of family, superfamily, fold and class (Murzin et al., 1995). Domains within one family are clearly homologous, based either on a sequence identity >30% or on a very similar structure and function at lower sequence similarity. Domains from the same superfamily but different family are likely to be homologous based on an expert analysis of structural and sequence similarity, location of binding sites, functional groups, etc. Domains in the same fold but different superfamilies share the same spatial arrangement and connectivity of secondary structure elements. They may be similar either by common descent or by convergence. Following SCOP, we classify each pair of domains as homologous if they are members of the same superfamily. Domains from different classes are classified as non-homologous. All other pairs are considered as ‘unknown’ in the benchmark since their evolutionary relationship cannot be ascertained.
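The three-way classification of domain pairs can be written down compactly. The sketch below assumes that each domain is labelled with a SCOP identifier of the form class.fold.superfamily.family (for example `c.1.2.1`); the function name and the example identifiers are illustrative.

```python
def classify_pair(sccs_a, sccs_b):
    """Classify a pair of domains from their SCOP identifiers.

    Same superfamily (first three fields equal)  -> "homologous"
    Different class (first field differs)        -> "non-homologous"
    Anything else (same class, other superfamily) -> "unknown"
    """
    a, b = sccs_a.split("."), sccs_b.split(".")
    if a[:3] == b[:3]:
        return "homologous"
    if a[0] != b[0]:
        return "non-homologous"
    return "unknown"

# Illustrative identifiers.
same_superfamily = classify_pair("c.1.2.1", "c.1.2.4")
different_class = classify_pair("c.1.2.1", "b.66.1.1")
same_fold_only = classify_pair("c.1.2.1", "c.1.8.3")
```

Pairs labelled "unknown" are excluded from both the true-positive and the false-positive counts.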

Figure 2 is a classical chart with the number of true positives (TP) versus the number of false positives (FP). True positives are homologous pairs and false positives are non-homologous pairs with a score above a certain threshold. By varying the threshold score the curve of TP versus FP is traced out. The ideal method would detect all homologs before the first non-homologous pair is reported. The curve would rise up vertically from zero until it reached the total number of homologous pairs.
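Tracing out such a curve amounts to sorting all scored pairs and counting. The Python sketch below assumes that pairs classified as unknown have already been removed, uses the error rate FP/(TP + FP) to pick an operating point, and reads "TP at a given error rate" as the largest TP count along the curve whose error rate does not exceed the limit; the function names and the toy hit list are assumptions of this example.

```python
def tp_fp_curve(hits):
    """Return (FP, TP) points obtained by lowering the score threshold.

    `hits` is an iterable of (score, is_homolog) pairs.
    """
    tp = fp = 0
    points = []
    for _score, is_homolog in sorted(hits, key=lambda h: h[0], reverse=True):
        if is_homolog:
            tp += 1
        else:
            fp += 1
        points.append((fp, tp))
    return points

def tp_at_error_rate(hits, max_error=0.10):
    """Largest TP count whose error rate FP / (TP + FP) is <= max_error."""
    best = 0
    for fp, tp in tp_fp_curve(hits):
        if fp / (tp + fp) <= max_error:
            best = max(best, tp)
    return best

# Toy hit list: nine high-scoring homologs, then a mix of both kinds.
toy_hits = [(10 - k, True) for k in range(9)]
toy_hits += [(1.5, False), (1.0, True), (0.5, False), (0.4, False)]
detected = tp_at_error_rate(toy_hits)
```

In the toy list ten homologous pairs are found before the error rate rises above 10%.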



Fig. 2 Sensitivity of various homology detection tools, measured by how many true positives are detected at varying numbers of false positives in an all-against-all benchmark on SCOP-20. True positives are pairs from the same superfamily, false positives are pairs from different classes. Dashed straight line: error rate 10%. There are 41 505 true positives and 1.08 × 10^7 false positives in total. For definitions of HHsearch 0–4, please refer to the main text.

 
Starting from the bottom, we see that BLAST is obviously inadequate to search for homologies in such a difficult dataset. At a rate of false positives (‘error rate’) FP/(TP + FP) = 10% (dashed line) it finds only 908 homologous pairs, or 2.2% out of a total of 41 505. PSI-BLAST detects 17.7% and HMMER finds 18.7%. PROF_SIM and COMPASS find 24.9% and 34.0%, respectively. Next in performance is HHsearch 0 with 40.0% and the basic HMM–HMM version HHsearch 1 with 44.2%. Inclusion of the correlation score [Equation (9)] improves this value to 46.7% (HHsearch 2). When, in addition, the predicted secondary structure is used for both HMMs, a value of 48.8% is achieved (HHsearch 3). Finally, HHsearch 4 uses actual secondary structure from DSSP in one of the two HMMs and finds 50.0% of the 41 505 homologs. This is a factor of 23 more than BLAST, 2.8 and 2.7 times more than PSI-BLAST and HMMER, 2.0 times more than PROF_SIM and 1.47 times more than COMPASS.

All of the HHsearch versions in Figure 2 use local alignment. We found that the semi-global version did not perform nearly as well (data not shown). We believe that this is owing to the fact that distant homologs are often not alignable over their entire length but only over a core that defines their superfamily. The semi-global algorithm aligns these non-homologous regions by force which leads to random noise added to the score of the aligned homologous regions.

In an analysis of the complete data we found many pairs of sequences from different superfamilies and sometimes even different folds that HHsearch predicts as homologs with high confidence. In most cases their structures are also very similar, either in parts or globally. This convinced us that many superfamilies that are classified by SCOP into different folds are in fact homologous. We name just two examples, the TIM barrels (Henn-Sax et al., 2001) (SCOP superfamilies c.1.1 – c.1.25) and the beta propellers (SCOP folds b.66 – b.70). To test how well the various methods detect these cases of structural similarity and putative homology, we analyze the data with a second, alternative definition of true and false positives. A pair is now defined as true positive if the domains belong to the same SCOP superfamily or if the sequence-based alignment yields a structural alignment with a MaxSub score (Siew et al., 2000) of at least 0.1. Pairs of sequences from different classes and with zero MaxSub score are classified as non-homologous. All other relationships are classified as unknown. Roughly speaking, the MaxSub score tells us what fraction of the query residues can be structurally superposed with the aligned residues from the other structure. It is defined such that a score >0 occurs rarely by chance.8

Figure 3 plots true versus false positives for the new definition. The overall picture is similar to the previous figure with a few noteworthy differences. First, all tools except BLAST find more true positives at a fixed error rate. Second, the more sensitive tools improve more than the less sensitive ones, even in relative terms. This indicates that the new definition of true and false positives comes closer to defining homology than the previous, more rigid definition by SCOP superfamilies. The improvement is particularly conspicuous for HHsearch 3 and 4 that use secondary structure. The reason is that the ‘new’ true positives which come from different superfamilies are on average harder to detect than the ‘old’ true positives used in the previous figure. Since the less sensitive tools are not likely to detect them as homologs it is mainly the most sensitive tools which profit from their reclassification as true positives. Third, a notable exception to the above remark is the improvement of PROF_SIM and COMPASS in relation to the basic profile–profile version of HHsearch (HHsearch 0). Whereas HHsearch 0 was 18% more sensitive than COMPASS in Figure 2 it is only 6% more sensitive now (at 10% error rate). This could mean that they reach their peak performance at more remote relationships than HHsearch.



Fig. 3 Same as previous figure, but for a broader definition of true positives: True positives are pairs from the same superfamily or with MaxSub score of at least 0.1; false positives are pairs from different classes and zero MaxSub score.

 
To test this hypothesis we plot the number of true versus false positives again (Fig. 4), but this time we keep only the true positive pairs from different families. Indeed PROF_SIM further improves with respect to HHsearch 0 and COMPASS even draws equal. Remarkably, HHsearch 3 and 4 which use secondary structure information are now much more sensitive (~50%) than HHsearch 2. At an error rate of 10%, HHsearch 3 detects a factor of 190 more true positives than BLAST, 7.5 and 7.2 times more than PSI-BLAST and HMMER, 4.0 times more than PROF_SIM and 2.2 times more than COMPASS. Note that the improvement in sensitivity due to inclusion of secondary structure grows quickly with increasing evolutionary divergence.



Fig. 4 Same as in previous figure, but all pairs at family level are ignored. This leaves as closest homologs only the pairs related at the superfamily level.

 
Interestingly, the sensitivity for HHsearch in Figures 3 and 4 decreases slightly when known instead of predicted secondary structure is used in one HMM. The likely reason is that the way in which we score predicted versus predicted secondary structure makes it better optimized for remote homologies for which the secondary structures have diverged more: The scoring matrix for this case [Equation (12)] embodies twice as much uncertainty as the scoring matrix for known versus predicted secondary structure [Equation (10)].

Alignment quality
The quality of an alignment between a query protein and a distant homolog is critical to its usefulness for structure prediction, evolutionary studies and functional analysis. In comparative modeling, for example, the alignment between query and template is the key determinant of model quality (Venclovas, 2003). The quality of sequence alignments can be assessed by comparing them with reference alignments generated by structural alignment algorithms (‘the gold standard’). Here we employ a more direct approach, developed for the automatic assessment of structure prediction servers (Siew et al., 2000) where the generation of a structure-based sequence alignment is omitted as an intermediate step. One thus avoids the arbitrariness involved in transforming a structural superposition into a sequence alignment (see also O'Sullivan et al., 2003). Instead, the sequence alignment is assessed directly by looking at the spatial distances between aligned pairs of residues upon superposition of their 3D structures.

We use two scores for alignment quality. The first is the plain MaxSub score. A drawback of this score is that it does not penalize overprediction: pairs of residues that are wrongly predicted to be superposable are not penalized at all. A method optimized for this score will generate alignments of maximal length even when only a few residues can be reliably aligned. Similar to the MaxSub score is the developer's score, SDev = Ncorrect/min(Lq, Lp), where Ncorrect is the number of residue pairs that are present in the maximum subset identified by MaxSub, and Lq and Lp denote the number of residues in the two sequences to be aligned. At the other extreme, the so-called modeler's score does not penalize underprediction of residues. It is defined as SMod = Ncorrect/Lali, where Lali is the number of aligned residue pairs in the sequence alignment. A method optimized for this score alone would always predict just one pair of aligned residues (see footnote 9).

As the golden mean between these two extremes we define a ‘balanced’ score which penalizes both overprediction and underprediction:

Sbalanced = (SDev + SMod)/2    (13)
We set Sbalanced to zero when the maximum subset contains less than 40 residue pairs. As for the MaxSub score, this ensures that a score larger than zero is unlikely to occur by chance. Other balanced scores have been proposed by Cline et al. (2002) and Yona and Levitt (2002).
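
The three alignment-quality scores can be sketched as follows. The developer's and modeler's scores follow the definitions in the text; the balanced score is taken here to be the arithmetic mean of the two, which is an assumption of this sketch, as are the function and parameter names. The cutoff of 40 residue pairs is the one stated above.

```python
def developer_score(n_correct: int, len_q: int, len_p: int) -> float:
    """SDev = Ncorrect / min(Lq, Lp); does not penalize overprediction."""
    return n_correct / min(len_q, len_p)


def modeler_score(n_correct: int, n_aligned: int) -> float:
    """SMod = Ncorrect / Lali; does not penalize underprediction."""
    return n_correct / n_aligned


def balanced_score(
    n_correct: int, len_q: int, len_p: int, n_aligned: int, min_subset: int = 40
) -> float:
    """Balanced score, assumed here to be the mean of SDev and SMod.

    Set to zero when the maximum subset (n_correct pairs) has fewer than
    min_subset residue pairs.
    """
    if n_correct < min_subset:
        return 0.0
    s_dev = developer_score(n_correct, len_q, len_p)
    s_mod = modeler_score(n_correct, n_aligned)
    return 0.5 * (s_dev + s_mod)


# Two alignments of the same pair (Lq = 100, Lp = 120), both with 50 correct pairs:
# a short one with 50 aligned pairs and an overpredicting one with 100.
short_alignment = balanced_score(50, 100, 120, 50)
long_alignment = balanced_score(50, 100, 120, 100)
```

Both alignments have the same developer's score, but the overpredicting one receives a lower modeler's score and therefore a lower balanced score.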

Figure 5a–c plots the binned distribution of MaxSub scores for all pairs related at the family level (10 223 pairs), superfamily level (31 282 pairs) and fold level (66 813 pairs). First of all, note that PSI-BLAST produces much better alignments than BLAST. Second, the group of profile–profile methods PROF_SIM, COMPASS and HHsearch 0 perform clearly better than PSI-BLAST, especially at the superfamily level. Third, within this group COMPASS is a little better than PROF_SIM and HHsearch 0 is a little better than COMPASS at all levels of relationship. Fourth, aligning profile HMMs (HHsearch 2) instead of simple profiles (HHsearch 0) improves the alignment quality significantly, especially for the difficult alignments on the superfamily and fold levels. Fifth, adding predicted secondary structure greatly improves alignment quality on the superfamily and fold levels. Sixth, as a general trend the good methods get even better relative to the others with increasing difficulty of the alignments, the same as was observed for the sensitivity in Figures 2–4. Last, HMMER alignments have better MaxSub scores than the simple profile–profile methods because HMMER is run in its default global mode and MaxSub does not penalize overpredicted residues.
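
A binned score distribution of the kind plotted in Figure 5 can be computed along the following lines. This is a minimal sketch: the number of bins, the equal bin width and the function name are assumptions, since the text does not specify the binning. Pairs with a score of exactly zero are left out of the bins but still count towards the total, so that the percentages refer to all pairs at that level of relationship.

```python
from typing import List


def binned_percent(scores: List[float], n_bins: int = 10) -> List[float]:
    """Percentage of all pairs whose score falls into each bin on (0, 1].

    Bin k covers [k/n_bins, (k+1)/n_bins); a score of 1.0 goes into the
    last bin. Scores of exactly zero are not binned.
    """
    counts = [0] * n_bins
    for s in scores:
        if s <= 0.0:
            continue
        index = min(int(s * n_bins), n_bins - 1)
        counts[index] += 1
    total = len(scores)
    if total == 0:
        return [0.0] * n_bins
    return [100.0 * c / total for c in counts]


# Eight toy pairs, three of them with a score of zero.
toy_scores = [0.0, 0.0, 0.05, 0.15, 0.15, 0.95, 1.0, 0.0]
toy_distribution = binned_percent(toy_scores)
```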



Fig. 5 Distribution of MaxSub scores for alignments of domain pairs related at the family, superfamily and fold level in percent of the total number of homologous sequence pairs at that level of relationship. Counts with MaxSub score of exactly zero are not shown.

 
Figure 6a–c plots the binned score distribution for the balanced score defined in Equation (13). The points discussed with the previous figure are borne out here, with the exception that HMMER now comes out as inferior to the profile–profile methods, as it should. HHsearch 3 is the clear winner. At the family level, it aligns 58% of all pairs with a balanced score of 0.3 or larger. This is 1.23 times more than COMPASS, 1.28 times more than PROF_SIM, 1.34 times more than HMMER, 1.57 times more than PSI-BLAST and 4.4 times more than BLAST. At the superfamily level, where 27% of HHsearch 3 alignments have a score of 0.3 or above, the improvement over the other tools is by a factor 1.7 (COMPASS), 1.9 (PROF_SIM), 2.2 (HMMER), 2.9 (PSI-BLAST) and 14 (BLAST). At the fold level, where 4.5% of HHsearch 3 alignments have a score of 0.3 or above, the factors are 3.3 (COMPASS), 6.0 (PROF_SIM), 7.3 (HMMER), 9.4 (PSI-BLAST) and 63 (BLAST) (see footnote 10).



Fig. 6 Distribution of balanced scores [Equation (13)] for alignments of domain pairs related at the family, superfamily and fold level. Counts with zero score are not shown.

 
In several recent benchmark studies, column scores for profile–profile alignment were compared for their ability to produce alignments similar to structure-based alignments (Mittelman et al., 2003; Panchenko, 2003; Marti-Renom et al., 2004; Edgar and Sjölander, 2004). In these studies differences in performance between the tested column scores are generally small and no clear winner has emerged. Indeed, the quality of alignments produced by HHsearch 0, COMPASS and PROF_SIM is rather similar in our benchmark. In this light the improvements by HMM–HMM alignment (HHsearch 2) and secondary structure scoring (HHsearch 3 or 4) are all the more remarkable and show that they matter much more than the choice of column score.

Structure prediction
When predicting structure we are allowed to use the best match with a sequence in the database, whereas Figures 5 and 6 show the score distribution for all pairs at a given level. Figure 7a shows the score distribution of the best match in each of the 3691 database scans. Figure 7b plots the alignment score distribution of the best matches at or below the superfamily level, i.e. where members from the same family have been excluded as templates. Similarly, Figure 7c shows the score distribution for pairs at or below the fold level. For structure prediction the true secondary structure of the templates is available and HHsearch 4 can be used. We also show the results for HHsearch 4g, which is the same as HHsearch 4 except that the alignments have all been realigned with the semi-global algorithm.



Fig. 7 Distribution of MaxSub scores for the best match in each of 3691 database scans, where the best match is related to the query at most (a) at the family level, (b) at the superfamily level or (c) at the fold level. For HHsearch 4g all HHsearch 4 alignments are realigned with the semi-global algorithm.

 
As expected, the performance in Figure 7 depends on a combination of alignment quality and sensitivity per database scan because the more sensitive a method is, the better it will be able to rank the best 3D template at the top. HHsearch 4 is again much better than COMPASS and PROF_SIM. COMPASS and PROF_SIM are much better than PSI-BLAST and HMMER due to their much higher sensitivities. A bit surprisingly, PROF_SIM is better than COMPASS on the superfamily level and particularly so at the fold level. We think that the method of calculating its P-values is the cause for PROF_SIM's rather sub-optimal sensitivity in Figures 2–4. On a per-scan basis it seems to be even better than COMPASS in ranking the best structural templates at the top, at least below the family level. Finally, HHsearch 4g with its semi-global alignments fares a bit better than HHsearch 4.

What chances does one have to get a structural template with a usable alignment? If a template from the same family as the query is available in the database HHsearch 4 will produce a usable alignment with MaxSub score ≥ 0.1 in 66% of all cases, and COMPASS and PROF_SIM in 56% of all cases. When the closest relative in the structure database is from the same superfamily, a usable alignment is produced in 44% (HHsearch 4), 29% (COMPASS) and 31% (PROF_SIM) of all cases. When the most closely related structure has the same fold, HHsearch 4 can still come up with an alignment with a score of at least 0.1 in 19% of all cases, COMPASS in 7.3% and PROF_SIM in 9.7%.
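
The per-scan evaluation behind Figure 7 and the percentages above can be sketched as follows. The data layout, the names and the handling of edge cases are assumptions for illustration: hits unrelated to the query are kept in the candidate pool, and a scan without any admissible hit counts as not usable. The threshold of 0.1 for a usable alignment is the one used in the text.

```python
from typing import Dict, List, NamedTuple, Optional

# Smaller number = closer relationship to the query.
LEVELS = {"family": 0, "superfamily": 1, "fold": 2}


class Hit(NamedTuple):
    template: str
    search_score: float    # ranking score reported by the search tool
    level: Optional[str]   # closest shared SCOP level, None if unrelated
    maxsub: float          # MaxSub score of the query-template alignment


def best_match(hits: List[Hit], closest_level: str) -> Optional[Hit]:
    """Top-ranked hit after excluding templates closer than closest_level."""
    allowed = [
        h for h in hits
        if h.level is None or LEVELS[h.level] >= LEVELS[closest_level]
    ]
    return max(allowed, key=lambda h: h.search_score, default=None)


def usable_fraction(
    scans: Dict[str, List[Hit]], closest_level: str, min_maxsub: float = 0.1
) -> float:
    """Fraction of scans whose best match has a MaxSub score >= min_maxsub."""
    if not scans:
        return 0.0
    usable = 0
    for hits in scans.values():
        top = best_match(hits, closest_level)
        if top is not None and top.maxsub >= min_maxsub:
            usable += 1
    return usable / len(scans)


# Two toy database scans.
toy_scans = {
    "q1": [
        Hit("a", 50.0, "family", 0.6),
        Hit("b", 30.0, "superfamily", 0.2),
        Hit("c", 10.0, None, 0.0),
    ],
    "q2": [Hit("d", 5.0, "fold", 0.05)],
}
fraction_superfamily = usable_fraction(toy_scans, "superfamily")
```

Excluding family-level templates for query q1 promotes the superfamily-level hit to best match, which mimics the situation where the closest relative in the structure database is from the same superfamily.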


    CONCLUSION
We have generalized HMM–sequence alignment to the pairwise alignment of profile HMMs and presented a fast algorithm that maximizes the log-sum-of-odds score, the generalization of the well-known log-odds score. A novel correlation score was derived which increases the sensitivity by 5–10% at no cost and which can easily be applied to other similarity search methods. Moreover, we have proposed a statistical method to score predicted versus known secondary structure as well as predicted versus predicted secondary structure that exploits the confidence values of the secondary structure prediction. Based on these methods, we have developed the homology detection tool HHsearch which we benchmarked together with five other homology detection tools on a hard dataset below the twilight zone of sequence similarity (20% sequence identity). HHsearch represents a significant improvement over existing methods, both in terms of sensitivity and alignment quality, and the contributions to this improvement were analyzed.

Two servers (HHpred.2/3) that use HHsearch have been registered for the blind structure prediction contests CAFASP4 (Fischer et al., 2003) and LiveBench (Rychlewski et al., 2003). Preliminary results are below our expectations and indicate that the multiple alignment construction method rather than HHsearch limited the performance, since it was geared too much to high selectivity at the cost of sensitivity. We plan to improve this by using separate alignment databases for structure prediction and homology detection.

We hope that HHsearch will be a useful tool for functional annotation, structure prediction and protein evolution. We have set up a web server for homology detection and structure prediction that we plan to extend into a structure and function prediction pipeline with maximum flexibility for manual use. But the speed of HHsearch should also allow an application to large-scale automatic annotation projects and any requests in this direction are welcome.


    Acknowledgments
 
I am grateful to Andrei Lupas for many fruitful discussions, mentoring and encouragement. Many thanks go to Daniel Huson for critically reading the manuscript, to Ruslan Sadreyev and Golan Yona for making their tools COMPASS and PROF_SIM available, and to an anonymous referee for his helpful comments.


    Footnotes
 
1 A problem with this approach is that the probabilities given for example by PSIPRED (Jones, 1999) do not represent a probability distribution since they do not sum to one.

2 The π-helix is very rare and is mapped to the coil state.

3 Residues or columns from multiple alignments are homologous if they evolve from the same residue in an ancestral sequence.

4 We devised an alignment algorithm that maximizes the total score including the correlation score but the performance was only marginally better at approximately twice the computation time.

5 The known secondary structure of p is the secondary structure of the seed sequence of the alignment.

6 One explanation is that the PSIPRED prediction is calculated from alignments that may be quite diverse. Therefore the matrix MSS already contains some contribution of evolution in time.

7 We obtained significantly better results by changing the default setting ‘–g 0.5’ to ‘–g 1.0’ and building the profiles from those columns of the multiple alignment that have a residue in the seed sequence instead of using the 50% gaps rule.

8 More specifically, MaxSub equals the weighted number of aligned pairs that can be superimposed with a maximum distance per pair of 3.5 Å, divided by the number of residues in the query sequence. Pairs with 0 Å deviation carry weight 1 and pairs with 3.5 Å deviation have weight 0.5. If no subset with 40 or more aligned residue pairs can be found that are within 3.5 Å and if no more than 25 such pairs can be found with score ≥ 0.125 the MaxSub score is set to 0.

9 The developer's score and the modeler's score were first defined by Sauder et al. (2000) in a slightly different way using the structure-based alignments as gold standard. In their definition, min(Lq,Lp) is replaced by the number of aligned residue pairs in the structural alignment and Ncorrect refers to the number of residues which are aligned in the same way as in the structure-based sequence alignment.

10 Note that 4.5% of all alignments at the fold level is quite a lot. Domain pairs related at the fold level are deemed non-homologous by SCOP and we might not expect any reasonably good alignments at all. This relatively high number suggests that many sequences classified into different superfamilies by SCOP are in fact homologous.

Received on July 7, 2004; revised on October 18, 2004; accepted on November 2, 2004

    REFERENCES

    Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

    Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

    Barrett, C., Hughey, R., Karplus, K. (1997) Scoring hidden Markov models. Comput. Appl. Biosci., 13, 191–199.

    Bork, P. and Koonin, E.V. (1998) Predicting functions from protein sequences – where are the bottlenecks. Nat. Genet., 18, 313–318.

    Chandonia, J.M., Hon, G., Walker, N.S., Lo Conte, L., Koehl, P., Levitt, M., Brenner, S. (2004) The ASTRAL compendium in 2004. Nucleic Acids Res., 32, D189–D192.

    Cline, M., Hughey, R., Karplus, K. (2002) Predicting reliable regions in protein sequence alignments. Bioinformatics, 18, 306–314.

    Doolittle, R.F. (1981) Similar amino acid sequences: chance or common ancestry. Science, 214, 149–159.

    Durbin, R., Eddy, S., Krogh, A., Mitchison, G. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge.

    Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.

    Edgar, R.C. and Sjölander, K. (2003) SATCHMO: sequence alignment and tree construction using hidden Markov models. Bioinformatics, 19, 1404–1411.

    Edgar, R.C. and Sjölander, K. (2004) A comparison of scoring functions for protein sequence profile alignment. Bioinformatics, 20, 1301–1308.

    Fischer, D., Rychlewski, L., Dunbrack, R.L.J., Ortiz, A.R., Elofsson, A. (2003) CAFASP3: the third critical assessment of fully automated structure prediction methods. Proteins, 53, 503–516.

    Ginalski, K., Pas, J., Wyrwicz, L.S., von Grotthus, M., Bujnicki, J.M., Rychlewski, L. (2003) ORFeus: detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res., 31, 3804–3807.

    Gonnet, G.H., Cohen, M.A., Brenner, S.A. (1992) Exhaustive matching of the entire protein sequence database. Science, 256, 1443–1445.

    Hargbo, J. and Elofsson, A. (1999) Hidden Markov models that use predicted secondary structures for fold recognition. Proteins, 36, 68–76.

    Henikoff, S. and Henikoff, J.G. (1994) Position-based sequence weights. J. Mol. Biol., 243, 574–578.

    Henn-Sax, H.B., Wilmanns, M., Sterner, R. (2001) Divergent evolution of (βα)8-barrel enzymes. Biol. Chem., 382, 1315–1320.

    Jones, D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195–202.

    Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.

    Karplus, K., Karchin, R., Barrett, C., Tu, S., Cline, M., Diekhans, M., Grate, L., Casper, J., Hughey, R. (2001) What is the value added by human intervention in protein structure prediction. Proteins, 45, Suppl. 5, 86–91.

    Kawabata, T. and Nishikawa, K. (2000) Protein structure comparison using the Markov transition model of evolution. Proteins, 41, 108–122.

    Kelley, L.A., MacCallum, R.M., Sternberg, M.J. (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol., 299, 499–520.

    Kinch, L. and Grishin, N. (2002) Evolution of protein structures and functions. Curr. Opin. Struct. Biol., 12, 400–408.

    Kinch, L.N., Wrabl, J.O., Krishna, S.S., Majumdar, I., Sadreyev, R.I., Qi, Y., Pei, C.H.J., Grishin, N.V. (2003) CASP5 assessment of fold recognition target predictions. Proteins, 53, 395–409.

    Koh, I., Eyrich, V.A., Marti-Renom, M.A., Przybylski, D., Madhusudhan, M.S., Eswar, N., Grana, O., Pazos, F., Valencia, A., Sali, A., Rost, B. (2003) EVA: evaluation of protein structure prediction servers. Nucleic Acids Res., 31, 3311–3315.

    Krogh, A., Brown, M., Mian, I.S., Sjölander, K., Haussler, D. (1994) Hidden Markov models in computational biology. Applications to protein modeling. J. Mol. Biol., 235, 1501–1531.

    Kunin, V., Chan, B., Sitbon, E., Lithwick, G., Pietrokovski, S. (2001) Consistency analysis of similarity between multiple alignments: prediction of protein function and fold structure from analysis of local sequence motifs. J. Mol. Biol., 307, 939–949.

    Lyngsø, R.B., Pedersen, C.N.S., Nielsen, H. (1999) Metrics and similarity measures for hidden Markov models. Proc. Int. Conf. Intell. Syst. Mol. Biol., 178–186.

    Marti-Renom, M.A., Madhusudhan, M.S., Sali, A. (2004) Alignment of protein sequences by their profiles. Protein Sci., 13, 1071–1087.

    Mittelman, D., Sadreyev, R., Grishin, N.V. (2003) Probabilistic scoring measures for profile–profile comparison yields more accurate short seed alignments. Bioinformatics, 19, 1531–1539.

    Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536–540.

    O'Sullivan, O., Zehnder, M., Higgins, D., Bucher, P., Grosdidier, A., Notredame, C. (2003) APDB: a novel measure for benchmarking sequence alignment methods without reference alignments. Bioinformatics, 19, i215–i221.

    Panchenko, A.R. (2003) Finding weak similarities between proteins by sequence profile comparison. Nucleic Acids Res., 31, 683–689.

    Pearson, W.R. and Lipman, D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.

    Pei, J. and Grishin, N.V. (2001) AL2CO: calculation of positional conservation in a protein sequence alignment. Bioinformatics, 17, 700–712.

    Pei, J., Sadreyev, R., Grishin, N.V. (2003) PCMA: fast and accurate multiple sequence alignment based on profile consistency. Bioinformatics, 19, 427–428.

    Pietrokovski, S. (1996) Searching databases of conserved sequence regions by aligning protein multiple-alignments. Nucleic Acids Res., 24, 3836–3845.

    Rychlewski, L., Fischer, D., Elofsson, A. (2003) LiveBench–6: large-scale automated evaluation of protein structure prediction servers. Proteins, 53, 542–547.

    Rychlewski, L., Jaroszewski, L., Li, W., Godzik, A. (2000) Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci., 9, 232–241.

    Sadreyev, R.I., Baker, D., Grishin, N.V. (2003) Profile–profile comparisons by COMPASS predict intricate homologies between protein families. Protein Sci., 12, 2262–2272.

    Sadreyev, R.I. and Grishin, N.V. (2003) COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. J. Mol. Biol., 326, 317–336.

    Sauder, J.M., Arthur, J.W., Dunbrack, R.L.J. (2000) Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins, 40, 6–22.

    Siew, N., Elofsson, A., Rychlewski, L., Fischer, D. (2000) MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics, 16, 776–785.

    Tang, C.L., Xie, L., Koh, I.Y., Posy, S., Alexov, E., Honig, B. (2003) On the role of structural information in remote homology detection and sequence alignment: new methods using hybrid sequence profiles. J. Mol. Biol., 334, 1043–1062.

    Thompson, J.D., Higgins, D.G., Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

    Tomii, K. and Akiyama, Y. (2004) FORTE: a profile–profile comparison tool for protein fold recognition. Bioinformatics, 20, 594–595.

    Venclovas, C. (2003) Comparative modeling in CASP5: progress is evident, but alignment errors remain a significant hindrance. Proteins, 53, 380–388.

    von Öhsen, N., Sommer, I., Zimmer, R. (2003) Profile–profile alignment: a powerful tool for protein structure prediction. Pac. Symp. Biocomput., 252–263.

    Wang, G. and Dunbrack, R.L.J. (2004) Scoring profile–profile sequence alignments. Protein Sci., 13, 1612–1626.

    Yona, G. and Levitt, M. (2002) Within the twilight zone: a sensitive profile–profile comparison tool based on information theory. J. Mol. Biol., 315, 1257–1275.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
J. Bacteriol.Home page
O. Tsoy, D. Ravcheev, and A. Mushegian
Comparative Genomics of Ethanolamine Utilization
J. Bacteriol., December 1, 2009; 191(23): 7157 - 7164.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
M. Dlakic
HHsvm: fast and accurate classification of profile-profile matches identified by HHsearch
Bioinformatics, December 1, 2009; 25(23): 3071 - 3076.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
R. D. Finn, J. Mistry, J. Tate, P. Coggill, A. Heger, J. E. Pollington, O. L. Gavin, P. Gunasekaran, G. Ceric, K. Forslund, et al.
The Pfam protein families database
Nucleic Acids Res., November 17, 2009; (2009) gkp985v1.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
T. A. Leski, C. C. Caswell, M. Pawlowski, D. J. Klinke, J. M. Bujnicki, S. J. Hart, and S. Lukomski
Identification and Classification of bcl Genes and Proteins of Bacillus cereus Group Organisms and Their Application in Bacillus anthracis Detection and Fingerprinting
Appl. Envir. Microbiol., November 15, 2009; 75(22): 7163 - 7172.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
M. Sanchez, M. Drechsler, H. Stark, and G. Lipps
DNA translocation activity of the multifunctional replication protein ORF904 from the archaeal plasmid pRN1
Nucleic Acids Res., November 1, 2009; 37(20): 6831 - 6848.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
J. M. Thevelein and K. Voordeckers
Functioning and Evolutionary Significance of Nutrient Transceptors
Mol. Biol. Evol., November 1, 2009; 26(11): 2407 - 2414.
[Abstract] [Full Text] [PDF]


Home page
J. Gen. Virol.Home page
M. Rastgou, M. K. Habibi, K. Izadpanah, V. Masenga, R. G. Milne, Y. I. Wolf, E. V. Koonin, and M. Turina
Molecular characterization of the plant virus genus Ourmiavirus and evidence of inter-kingdom reassortment of viral genome segments as its possible route of origin
J. Gen. Virol., October 1, 2009; 90(10): 2525 - 2535.
[Abstract] [Full Text] [PDF]


Home page
J. Biol. Chem.Home page
S. Broer, H.-P. Schneider, A. Broer, and J. W. Deitmer
Mutation of Asparagine 76 in the Center of Glutamine Transporter SNAT3 Modulates Substrate-induced Conductances and Na+ Binding
J. Biol. Chem., September 18, 2009; 284(38): 25823 - 25831.
[Abstract] [Full Text] [PDF]


Home page
J. Biol. Chem.Home page
Z. Zhang, T. Albers, H. L. Fiumera, A. Gameiro, and C. Grewer
A Conserved Na+ Binding Site of the Sodium-coupled Neutral Amino Acid Transporter 2 (SNAT2)
J. Biol. Chem., September 11, 2009; 284(37): 25314 - 25323.
[Abstract] [Full Text] [PDF]


Home page
Infect. Immun.Home page
S. Balasubramanian, T. R. Kannan, P. J. Hart, and J. B. Baseman
Amino Acid Changes in Elongation Factor Tu of Mycoplasma pneumoniae and Mycoplasma genitalium Influence Fibronectin Binding
Infect. Immun., September 1, 2009; 77(9): 3533 - 3541.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
Y. Wang, R. I. Sadreyev, and N. V. Grishin
PROCAIN server for remote protein sequence similarity search
Bioinformatics, August 15, 2009; 25(16): 2076 - 2077.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
J. H. Fong and A. Marchler-Bauer
CORAL: aligning conserved core regions across domain families
Bioinformatics, August 1, 2009; 25(15): 1862 - 1868.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
A. Lobley, M. I. Sadowski, and D. T. Jones
pGenTHREADER and pDomTHREADER: new methods for improved protein fold recognition and superfamily discrimination
Bioinformatics, July 15, 2009; 25(14): 1761 - 1767.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
B. W. Brandt and J. Heringa
webPRC: the Profile Comparer for alignment-based searching of public domain databases
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W48 - W52.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
R. I. Sadreyev, M. Tang, B.-H. Kim, and N. V. Grishin
COMPASS server for homology detection: improved statistical accuracy, speed and functionality
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W90 - W94.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
B.-H. Kim, H. Cheng, and N. V. Grishin
HorA web server to infer homology between proteins using sequence and structural similarity
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W532 - W538.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
M. Remmert, D. Linke, A. N. Lupas, and J. Soding
HHomp--prediction and classification of outer membrane proteins
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W446 - W451.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
J.-L. Pons and G. Labesse
@TOME-2: a new pipeline for comparative modeling of protein-ligand complexes
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W485 - W491.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
O. Fornes, R. Aragues, J. Espadaler, M. A. Marti-Renom, A. Sali, and B. Oliva
ModLink+: improving fold recognition by using protein-protein interactions
Bioinformatics, June 15, 2009; 25(12): 1506 - 1512.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
Y. Wang, R. I. Sadreyev, and N. V. Grishin
PROCAIN: protein profile comparison with assisting information
Nucleic Acids Res., June 1, 2009; 37(11): 3522 - 3530.
[Abstract] [Full Text] [PDF]


R. de Groot, A. Bardhan, N. Ramroop, D. A. Lane, and J. T. B. Crawley
Essential role of the disintegrin-like domain in ADAMTS13 function
Blood, May 28, 2009; 113(22): 5609 - 5616.


F. I. Andersson, A. Tryggvesson, M. Sharon, A. V. Diemand, M. Classen, C. Best, R. Schmidt, J. Schelin, T. M. Stanne, B. Bukau, et al.
Structure and Function of a Novel Type of ATP-dependent Clp Protease
J. Biol. Chem., May 15, 2009; 284(20): 13519 - 13532.


S. Shi, J. Pei, R. I. Sadreyev, L. N. Kinch, I. Majumdar, J. Tong, H. Cheng, B.-H. Kim, and N. V. Grishin
Analysis of CASP8 targets, predictions and assessment methods
Database, April 28, 2009; 2009(0): bap003 - bap003.


J. Stuttmann, E. Lechner, R. Guerois, J. E. Parker, L. Nussaume, P. Genschik, and L. D. Noel
COP9 Signalosome- and 26S Proteasome-dependent Regulation of SCFTIR1 Accumulation in Arabidopsis
J. Biol. Chem., March 20, 2009; 284(12): 7920 - 7930.


P. G. Leiman, M. Basler, U. A. Ramagopal, J. B. Bonanno, J. M. Sauder, S. Pukatzki, S. K. Burley, S. C. Almo, and J. J. Mekalanos
From the Cover: Type VI secretion apparatus and phage tail-associated protein complexes share a common evolutionary origin
PNAS, March 17, 2009; 106(11): 4154 - 4159.


I. Jung and D. Kim
SIMPRO: simple protein homology detection method by using indirect signals
Bioinformatics, March 15, 2009; 25(6): 729 - 735.


A. Biegert and J. Söding
Sequence context-specific profiles for homology searching
PNAS, March 10, 2009; 106(10): 3770 - 3775.


A. Bateman, R. D. Finn, P. J. Sims, T. Wiedmer, A. Biegert, and J. Söding
Phospholipid scramblases and Tubby-like proteins belong to a new superfamily of membrane tethered transcription factors
Bioinformatics, January 15, 2009; 25(2): 159 - 162.


J. Nakonieczna, T. Kaczorowski, A. Obarska-Kosinska, and J. M. Bujnicki
Functional Analysis of MmeI from Methanol Utilizer Methylophilus methylotrophus, a Subtype IIC Restriction-Modification Enzyme Related to Type I Enzymes
Appl. Envir. Microbiol., January 1, 2009; 75(1): 212 - 223.


F. Kiefer, K. Arnold, M. Kunzli, L. Bordoli, and T. Schwede
The SWISS-MODEL Repository and associated resources
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D387 - D392.


S. Aviram, E. Simon, T. Gildor, F. Glaser, and D. Kornitzer
Autophosphorylation-Induced Degradation of the Pho85 Cyclin Pcl5 Is Essential for Response to Amino Acid Limitation
Mol. Cell. Biol., November 15, 2008; 28(22): 6858 - 6869.


M. Madera
Profile Comparer: a program for scoring and aligning profile hidden Markov models
Bioinformatics, November 15, 2008; 24(22): 2630 - 2631.


X. Guo and X. Gao
A novel hierarchical ensemble classifier for protein fold recognition
Protein Eng. Des. Sel., November 1, 2008; 21(11): 659 - 664.


J. Meisner, X. Wang, M. Serrano, A. O. Henriques, and C. P. Moran Jr
A channel connecting the mother cell and forespore during bacterial endospore formation
PNAS, September 30, 2008; 105(39): 15100 - 15105.


Y. Loewenstein and M. Linial
Connect the dots: exposing hidden protein family connections from the entire sequence tree
Bioinformatics, August 15, 2008; 24(16): i193 - i199.


S. Eisenbeis, S. Lohmiller, M. Valdebenito, S. Leicht, and V. Braun
NagA-Dependent Uptake of N-Acetyl-Glucosamine and N-Acetyl-Chitin Oligosaccharides across the Outer Membrane of Caulobacter crescentus
J. Bacteriol., August 1, 2008; 190(15): 5230 - 5238.


B. E. Dutilh, B. Snel, T. J. G. Ettema, and M. A. Huynen
Signature Genes as a Phylogenomic Tool
Mol. Biol. Evol., August 1, 2008; 25(8): 1659 - 1667.


M. Roovers, K. H. Kaminska, K. L. Tkaczuk, D. Gigot, L. Droogmans, and J. M. Bujnicki
The YqfN protein of Bacillus subtilis is the tRNA: m1A22 methyltransferase (TrmK)
Nucleic Acids Res., June 1, 2008; 36(10): 3252 - 3262.


J. Orlowski and J. M. Bujnicki
Structural and evolutionary classification of Type II restriction enzymes based on theoretical and experimental analyses
Nucleic Acids Res., June 1, 2008; 36(11): 3552 - 3569.


J. White, Z. Li, R. Sardana, J. M. Bujnicki, E. M. Marcotte, and A. W. Johnson
Bud23 Methylates G1575 of 18S rRNA and Is Required for Efficient Nuclear Export of Pre-40S Subunits
Mol. Cell. Biol., May 15, 2008; 28(10): 3151 - 3161.


P. Szczesny and A. Lupas
Domain annotation of trimeric autotransporter adhesins--daTAA
Bioinformatics, May 15, 2008; 24(10): 1251 - 1256.


A. Poleksic and M. Fienup
Optimizing the size of the sequence profiles to increase the accuracy of protein sequence alignments generated by profile-profile algorithms
Bioinformatics, May 1, 2008; 24(9): 1145 - 1153.


R. I. Sadreyev and N. V. Grishin
Accurate statistical model of comparison between multiple sequence alignments
Nucleic Acids Res., April 1, 2008; 36(7): 2240 - 2248.


S. Wu and Y. Zhang
A comprehensive assessment of sequence-based and template-based methods for protein contact prediction
Bioinformatics, April 1, 2008; 24(7): 924 - 931.


A. R. Shah, C. S. Oehmen, and B.-J. Webb-Robertson
SVM-HUSTLE--an iterative semi-supervised machine learning approach for pairwise protein remote homology detection
Bioinformatics, March 15, 2008; 24(6): 783 - 790.


A. Biegert and J. Söding
De novo identification of highly diverged protein repeats by probabilistic consistency
Bioinformatics, March 15, 2008; 24(6): 807 - 814.


S. Sundaram, B. Rathinasabapathi, L. Q. Ma, and B. P. Rosen
An Arsenate-activated Glutaredoxin from the Arsenic Hyperaccumulator Fern Pteris vittata L. Regulates Intracellular Arsenite
J. Biol. Chem., March 7, 2008; 283(10): 6095 - 6101.


R. D. Finn, J. Tate, J. Mistry, P. C. Coggill, S. J. Sammut, H.-R. Hotz, G. Ceric, K. Forslund, S. R. Eddy, E. L. L. Sonnhammer, et al.
The Pfam protein families database
Nucleic Acids Res., January 11, 2008; 36(suppl_1): D281 - D288.


K. Michelsen, V. Schmid, J. Metz, K. Heusser, U. Liebel, T. Schwede, A. Spang, and B. Schwappach
Novel cargo-binding site in the β and δ subunits of coatomer
J. Cell Biol., October 22, 2007; 179(2): 209 - 217.


A. J. Reid, C. Yeats, and C. A. Orengo
Methods of remote homology detection can be combined to increase coverage by 10% in the midnight zone
Bioinformatics, September 15, 2007; 23(18): 2353 - 2360.


A. Heger, S. Mallick, C. Wilton, and L. Holm
The global trace graph, a novel paradigm for searching protein sequence databases
Bioinformatics, September 15, 2007; 23(18): 2361 - 2367.


C. Dez, M. Dlakic, and D. Tollervey
Roles of the HEAT repeat proteins Utp10 and Utp20 in 40S ribosome maturation
RNA, September 1, 2007; 13(9): 1516 - 1527.


R. I. Sadreyev, M. Tang, B.-H. Kim, and N. V. Grishin
COMPASS server for remote homology inference
Nucleic Acids Res., July 13, 2007; 35(suppl_2): W653 - W658.


J. Cheng
DOMAC: an accurate, hybrid protein domain prediction server
Nucleic Acids Res., July 13, 2007; 35(suppl_2): W354 - W356.


P. Smits, J. A. M. Smeitink, L. P. van den Heuvel, M. A. Huynen, and T. J. G. Ettema
Reconstructing the evolution of the mitochondrial ribosomal proteome
Nucleic Acids Res., July 9, 2007; 35(14): 4686 - 4703.


D. Westphal, E. C. Ledgerwood, M. H. Hibma, S. B. Fleming, E. M. Whelan, and A. A. Mercer
A Novel Bcl-2-Like Inhibitor of Apoptosis Is Encoded by the Parapoxvirus Orf Virus
J. Virol., July 1, 2007; 81(13): 7178 - 7188.


Z. Yu, P.-A. Genest, B. ter Riet, K. Sweeney, C. DiPaolo, R. Kieft, E. Christodoulou, A. Perrakis, J. M. Simmons, R. P. Hausinger, et al.
The protein that binds to DNA base J in trypanosomatids has features of a thymidine hydroxylase
Nucleic Acids Res., April 1, 2007; 35(7): 2107 - 2115.


R. Sukackaite, A. Lagunavicius, K. Stankevicius, C. Urbanke, C. Venclovas, and V. Siksnys
Restriction endonuclease BpuJI specific for the 5'-CCCGT sequence is related to the archaeal Holliday junction resolvase family
Nucleic Acids Res., April 1, 2007; 35(7): 2377 - 2389.


J. Pei and N. V. Grishin
PROMALS: towards accurate multiple sequence alignments of distantly related proteins
Bioinformatics, April 1, 2007; 23(7): 802 - 808.


A. Bateman and R. D. Finn
SCOOP: a simple method for identification of novel protein superfamily relationships
Bioinformatics, April 1, 2007; 23(7): 809 - 814.


C. A. S. Banks, S. E. Kong, H. Spahr, L. Florens, S. Martin-Brown, M. P. Washburn, J. W. Conaway, A. Mushegian, and R. C. Conaway
Identification and Characterization of a Schizosaccharomyces pombe RNA Polymerase II Elongation Factor with Similarity to the Metazoan Transcription Factor ELL
J. Biol. Chem., February 23, 2007; 282(8): 5761 - 5769.


M. D. Silva, L. Shen, V. Tcherepanov, C. Watson, and C. Upton
Predicted function of the vaccinia virus G5R protein
Bioinformatics, December 1, 2006; 22(23): 2846 - 2850.


M. Dlakic
DUF283 domain of Dicer proteins has a double-stranded RNA-binding fold
Bioinformatics, November 15, 2006; 22(22): 2711 - 2714.


J. Boekhorst, M. Wels, M. Kleerebezem, and R. J. Siezen
The predicted secretome of Lactobacillus plantarum WCFS1 sheds light on interactions with its environment.
Microbiology, November 1, 2006; 152(Pt 11): 3175 - 3183.


M. Liao and M. Kielian
Site-Directed Antibodies against the Stem Region Reveal Low pH-Induced Conformational Changes of the Semliki Forest Virus Fusion Protein
J. Virol., October 1, 2006; 80(19): 9599 - 9607.


W. S. A. Maaty, A. C. Ortmann, M. Dlakic, K. Schulstad, J. K. Hilmer, L. Liepold, B. Weidenheft, R. Khayat, T. Douglas, M. J. Young, et al.
Characterization of the Archaeal Thermophile Sulfolobus Turreted Icosahedral Virus Validates an Evolutionary Link among Double-Stranded DNA Viruses from All Domains of Life.
J. Virol., August 1, 2006; 80(15): 7625 - 7635.


J. Söding, M. Remmert, and A. Biegert
HHrep: de novo protein repeat detection and the origin of TIM barrels.
Nucleic Acids Res., July 1, 2006; 34(Web Server issue): W137 - W142.


A. Biegert, C. Mayer, M. Remmert, J. Söding, and A. N. Lupas
The MPI Bioinformatics Toolkit for protein sequence analysis.
Nucleic Acids Res., July 1, 2006; 34(Web Server issue): W335 - W339.


J. Söding, M. Remmert, A. Biegert, and A. N. Lupas
HHsenser: exhaustive transitive profile search using HMM-HMM comparison.
Nucleic Acids Res., July 1, 2006; 34(Web Server issue): W374 - W378.


J. Cheng and P. Baldi
A machine learning information retrieval approach to protein fold recognition
Bioinformatics, June 15, 2006; 22(12): 1456 - 1463.


E. Becker, V. Meyer, H. Madaoui, and R. Guerois
Detection of a tandem BRCT in Nbs1 and Xrs2 with functional implications in the DNA damage response
Bioinformatics, June 1, 2006; 22(11): 1289 - 1292.


S. W. Ginzinger and J. Fischer
SimShift: Identifying structural similarities from NMR chemical shifts
Bioinformatics, February 15, 2006; 22(4): 460 - 465.


D. Devos, S. Dokudovskaya, R. Williams, F. Alber, N. Eswar, B. T. Chait, M. P. Rout, and A. Sali
Simple fold composition and modular architecture of the nuclear pore complex
PNAS, February 14, 2006; 103(7): 2172 - 2177.


J. Boekhorst, Q. Helmer, M. Kleerebezem, and R. J. Siezen
Comparative analysis of proteins with a mucus-binding domain found exclusively in lactic acid bacteria
Microbiology, January 1, 2006; 152(1): 273 - 280.


J. Jin, Y. Cai, T. Yao, A. J. Gottschalk, L. Florens, S. K. Swanson, J. L. Gutierrez, M. K. Coleman, J. L. Workman, A. Mushegian, et al.
A Mammalian Chromatin Remodeling Complex with Similarities to the Yeast INO80 Complex
J. Biol. Chem., December 16, 2005; 280(50): 41207 - 41212.


H. Neugebauer, C. Herrmann, W. Kammer, G. Schwarz, A. Nordheim, and V. Braun
ExbBD-Dependent Transport of Maltodextrins through the Novel MalA Protein across the Outer Membrane of Caulobacter crescentus
J. Bacteriol., December 15, 2005; 187(24): 8300 - 8311.


K. Suhre
Gene and Genome Duplication in Acanthamoeba polyphaga Mimivirus
J. Virol., November 15, 2005; 79(22): 14095 - 14101.


J. Tilburn, J. C. Sanchez-Ferrero, E. Reoyo, H. N. Arst Jr., and M. A. Penalva
Mutational Analysis of the pH Signal Transduction Component PalC of Aspergillus nidulans Supports Distant Similarity to BRO1 Domain Family Members
Genetics, September 1, 2005; 171(1): 393 - 401.


A. Gibson, A. P. Lewis, K. Affleck, A. J. Aitken, E. Meldrum, and N. Thompson
hCLCA1 and mCLCA3 Are Secreted Non-integral Membrane Proteins and Therefore Are Not Ion Channels
J. Biol. Chem., July 22, 2005; 280(29): 27205 - 27212.


J. Söding, A. Biegert, and A. N. Lupas
The HHpred interactive server for protein homology detection and structure prediction
Nucleic Acids Res., July 1, 2005; 33(suppl_2): W244 - W248.


B. Schuster-Böckler and A. Bateman
Visualizing profile-profile alignment: pairwise HMM logos
Bioinformatics, June 15, 2005; 21(12): 2912 - 2913.

