Protein structure alignment considering phenotypic plasticity
Practical Informatics and Bioinformatics Group, Department of Informatics, Ludwig-Maximilians-University, Amalienstr. 17, D-80333, Munich, Germany
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: Protein structure comparison exhibits differences and similarities of proteins and protein families and may help to elucidate protein sequence and structure evolution. Despite many methods to score protein structure similarity with and without flexibility and to align proteins accurately based on their structures, a meaningful evolutionary distance measure and alignment method which models the cost of mutations, insertions and deletions occurring in protein sequences on the structure level is still missing.
Results: Here, we introduce a new measure for protein structure similarity and propose a novel method called phenotypic plasticity method (PPM) which explicitly tries to model the evolutionary distance of two proteins on the structure level by measuring the cost of morphing one structure into the other one. PPM aligns protein structures taking variations naturally observed in groups of structures (phenotypic plasticity) into account while preserving the overall topological arrangement of the structures. The performance of PPM in detecting similarities between protein structures is evaluated against well-known structure classification methods on two benchmark sets. The larger set consists of more than 3.6 million structure pairs from the SCOP database which are also consistently classified in CATH. In the current parameterization, PPM already performs comparable or better than other methods such as TM-Align and Vorolign on those two sets according to various evaluation criteria showing that the method is able to reliably classify known protein structures, to detect their similarities and to compute accurate alignments despite phenotypic plasticity.
Availability: Executables are available upon request. Datasets and supplementary data (datasets and superpositions) can be accessed on http://www.bio.ifi.lmu.de/PPM
Contact: ralf.zimmer{at}bio.ifi.lmu.de
| 1 INTRODUCTION |
|---|
|
|
|---|
The comparison of protein structures and the detection of their similarities and differences is one of the major steps in understanding the interplay of protein sequence and protein structure evolution.
To date, the gold standard for protein structure classification are the (manually) curated databases SCOP (Andreeva et al., 2007) and CATH (Greene et al., 2007). Despite large efforts over many years, these gold standards are known to disagree to some extent (Hadley and Jones, 1999), which indicates that defining and identifying structural similarity is difficult and still controversial. For benchmarking purposes of new measures and methods it is therefore suggestive to use consensus sets defined from both classifications. SCOP classifies structures into classes, folds, superfamilies and families in a hierarchic manner. But with the growing number of protein structures in the PDB (Berman et al., 2000) and the arrival of structural genomics projects (Grabowski et al., 2007) there has been a growing need to model expert knowledge in automatic methods for protein structure classification and protein structure analysis. Currently (February 2008), out of the 45 000 protein structures in the PDB, more than 10 000 are not classified in the most recent SCOP version 1.73.
A crucial step in structure analysis is the computation of a structural alignment of two or more protein structures and, in contrast to sequence alignment methods, structure alignment methods aim directly at optimizing the structural similarity of the input proteins.
Sequence alignments are commonly computed using amino acid exchange matrices as well as gap penalties. Those parameters explicitly model the process of sequence evolution by mutations as well as insertions and deletions. It is difficult to propose an equivalent model for protein structure evolution which could be used for structure alignment.
Therefore, to date, all structure alignment methods capture protein structure similarity not by the evolutionary cost of mutating (morphing) one sequence and structural conformation into the other but by a similarity of the 3D objects. The methods differ in their score for protein structure similarity as well as in the algorithms used to optimize this score. Different similarity measures may be differently well suited for specific applications.
Many methods measure structural similarity based on a rigid body superposition of the input structures (given the alignment) and for that purpose several scores have been proposed. Among them are CE (Shindyalov and Bourne, 1998) (optimizing the RMSD) and TM-Align (Zhang and Skolnick, 2005) optimizing the TM-score (Zhang and Skolnick, 2004).
Though being a valid approach for many applications, rigid superposition neglects the fact that proteins are dynamic and flexible objects that allow for (significant) structural variation even within families and much more so within superfamilies and folds. Therefore, in recent years, also methods for flexible protein structure alignment have been published that account for large scale structural flexibilities and can align protein structures even in cases that show large conformational changes. A well known method is FATCAT (Ye and Godzik, 2003).
While the two approaches discussed so far explicitly use the coordinates of the protein structure in the scoring and optimization process, other methods work on different representations like contact matrices (Holm and Sander, 1993), graphs (Harrison et al., 2003) or Voronoi cells (Birzele et al., 2007). Those methods also allow for a certain amount of intrinsic flexibility of the protein structures as long as the contact networks are conserved.
The method proposed here is based on the observation that, despite conformational changes and large scale flexibilities, protein structure families exhibit a high flexibility on a smaller scale which we call phenotypic plasticity (Fig. 1). Phenotypic plasticity of a protein structure comprises the changes in the actual 3D structure, which are to be expected for the given protein sequence (the genotype) or within a given genotype population (i.e. a group of proteins related according to a family, superfamily or fold relationship). Further, phenotypic plasticity is grounded in evolutionary events that happen on the sequence level, namely mutations, insertions and deletions.
|
In contrast to existing approaches, our method, in the following called phenotypic plasticity method (PPM), explicitly tries to model, score and optimize the changes which naturally occur in protein structure evolution. Under the hypothesis that two protein structures are similar, we assume that they share locally similar substructures (not necessarily restricted to secondary structure elements). For those, single amino acid exchanges (if they occurred) have not led to major structure rearrangements. Nevertheless, some point mutations as well as most insertions and deletions require certain topological rearrangements (movement of elements) on the structure level. In PPM we now try to measure the structural cost of mutating (or morphing) one structure into the other one. During this process, we score the similarity of locally similar substructures as well as the conservation of their topological arrangement. This approach explicitly allows to map corresponding structural elements of the two structures onto each other and to observe their phenotypic plasticity in a population. This feature of PPM will be especially helpful to study the interplay of sequence and structure evolution in the future.
In the following, we will first describe our model of protein structure similarity based on phenotypic plasticity and an algorithm to optimize this similarity score. We will then evaluate and discuss in detail the performance of PPM in recognizing similar structures on two benchmark sets and will compare PPM against other well known structure classification methods. We will conclude with an outlook on future work and extensions of PPM.
| 2 METHODS |
|---|
|
|
|---|
PPM scores the similarity of the topology of locally similar substructures (core blocks), and we determine possible core blocks and mappings in a first step. Having identified core blocks, we need some measure to quantify their topological similarities/differences (i.e. the implied structural mutations). We address this problem in three steps: (i) we introduce a measure quantifying topological difference of core block pairs in two structures, (ii) we introduce the graph (the PPM graph) of topological changes of pairs of core blocks, and define their topological difference on this graph and (iii) we introduce an algorithm that identifies the consistent subgraph with maximal score in the PPM graph. The PPM method is summarized in Figure 2. Please note that all similarity functions used are only presented in the form of examples in the text for space reasons. Details on those functions are available in the supplementary material on http://www.bio.ifi.lmu.de/PPM.
|
2.1 Phenotypic plasticity measure
2.1.1 Determining possible core blocks
In Figure 2, we show the result of the core block detection and the subsequent filter step. Given two protein structures p1 and p2, let B(p1) and B(p2) be sets of blocks (subsequences of the respective proteins) which are structurally similar to at least one block in the other protein.
DEFINITION [Block]
A subpart (subsequence) b1 in protein p1 is called a block if it has at least length Tl (set to 6 in our experiments) and is rigidly superposable to some subsequence b2 in protein p2, such that all associated distances in the superposition are below Td Å.The threshold Td is chosen in a length dependent manner and represents a tradeoff between the length of the block pair and the the maximally allowed pairwise distance of the residues. The longer the fragment, the higher the variability allowed. In our experiments Td is for example set to 1.24, 2.37, 2.91 and 3.3 Å for blocks of length 6, 12, 18 and 24, respectively. Please note that using pairwise distances instead of e.g. RMSD results in highly similar local substructures. Each block mapping represents a local, gapless alignment with locally similar structures. In the following steps, we will search for the maximal set of consistent blocks inducing similar topologies in the input protein structures.
In order to reduce the complexity of subsequent steps and to avoid shift errors in the core block alignment, we filter the core block mappings found using global topology information. Therefore, we calculate the superposition of the two proteins induced by a core block mapping and compute the maximum number of residue pairs under 6 Å. When applying this step to every residue of the query protein we can drop many incorrect core block mappings. An example for this approach is shown in Figure 3, while the effect of the filter step is shown in Figure 2b. The resulting set of possible core blocks is denoted by V={vj=(b1,b2)|bi
B(pi)=B(p1) x B(p2).
The similarity of a mapping of two core blocks sim(b1,b2)=sim(vj) is determined by a empirical function representing a tradeoff between the length of the pairs and the corresponding RMSD. The score for a pair of blocks lies between 0 and length with higher scores indicating better pairs. Examples of the score are: length = 10, RMSD = 1.2 leads to a score of 10.0, length = 10, RMSD = 2.0 to a score of 5.46 and length = 20, RMSD = 2.0 results in a score of 17.05.
|
2.1.2 Similarity of topologies implied by pairs of core block mappings
Having identified possible mappings of core blocks, we can compare pairs of such mappings using their topological relationships. In general, we represent the topology of a protein structure by the 3D geometry of the blocks. In more detail, having assigned b1,1 from protein one to b2,1 in protein two and b1,2 to b2,2, we calculate the implied structural mutation assuming that the topology of (b1,1, b1,2) changed into the topology (b2,1, b2,2). For this we compute the superposition of the two mappings (b1,1, b1,2) and (b2,1, b2,2) and calculate their RMSD in this superposition. The maximal value of the two RMSDs obtained (b1,1 on b2,1 and b1,2 on b2,2) is used as a structural cost, of a mutation i.e. a cost of the topological change implied by the joint mapping of the blocks. The maximal RMSD is used in order to avoid effects of large blocks on the mutation cost. A schematic example of the procedure is shown in Figure 2c.
Please note that, while the similarity of a core block pair is measured by an RMSD-based score, the final cost of the structural mutations observed is measured on the PPM graph (Section 2.2). The PPM graph explicitly tries to score the topological costs of a structural mutation.
The function to convert the lengths of the two mappings and their maximal RMSD into a cost function for structural mutations is empirical and represents again a tradeoff between length and allowed RMSD, i.e. a maximal RMSD of 4.3 Å across 33 residues leads to costs of –2.79, while an RMSD of 5.1 on 65 residues is less expensive (due to it's larger size) and costs –2.09.
2.2 The PPM-graph: similarity of the topologies
To measure the similarity of topologies we define the PPM graph. The set of core blocks mappings (Section 2.1.1) defines the nodes V of the graph. In the first step, the graph is fully connected and the edges are weighted by the pairwise similarity of pairs of core blocks as defined in Section 2.1.2. The schematic idea of the scoring process is shown in Figure 2d.
DEFINITION [PPM-graph, GPPM]
Given two proteins p1 and p2, we define the phenotypic plasticity graph as a labeled graph GPPM(p1, p2)=(V, E) as follows: V=B(p1) x B(p2) and E=V x V. Please note that nodes vj represent aligned block pairs, i.e. mappings of two core blocks in an alignment of proteins one and two.To reduce the number of edges, we only use the coordinating set of edges in our experiment, i.e. we only use edges if the distance of the core block pairs is <30 Å in both proteins.
DEFNITION [Linear Phenotypic Plasticity Measure of an alignment, LPPM(A)]
Having defined the similarity of nodes and edges, let A be an alignment of protein structures p1 and p2, A=(v1, e1,...,vn–1, en–1, vn), i.e. an alignment is simply a subset of core blocks connected by topological edges which form a spanning tree of the core blocks in the alignment. The linear LPMM(A) connects the blocks in a linear fashion from the N- to the C-terminus and is defined as: LPPM(A)=sim(vi)+
weight(ej).
The LPPM(A) score defined above always links a block to the alignment by the edge connecting it to the previous block in the sequence. Therefore, the edge weight does not take the overall topology of the protein into account. To overcome that problem, we define a variant of score as follows (see also Fig. 2d). For each aligned block node vi, the graph GPPM contains n–1 incident edges and each could be used to connect vi to the alignment.
Consider the sorted list of edges weights ei1, ei2,...,ein, with ei1 having the smallest weight. A tree T where the nodes are the n block pairs connected by n–1 edges represents a structural alignment/superposition of p1 and p2, T=(v1,..., vn; e1,..., en–1). The level allows to control the plasticity of the alignment: Level 1 connects a node vi via the edge of smallest weight ei1 to the tree, Level k uses the edge eik to score connecting node vi.
The higher the level the more edges are penalized in the alignment. Therefore, Level 1 allows for the largest plasticity of the protein structures, since adding a node (block pairs) to the alignment can always be done at the lowest possible cost. Using level n–1 enforces the largest rigidity onto the protein structures, since always using the worst edge for connecting the nodes (the pair with the largest inconsistency) increases the cost of the alignment and therefore decreases the score.
DEFNITION [Level k Tree Phenotypic Plasticity Measure of an alignment, TkPPM(A)]
Let the tree T represent an alignment of protein structures p1 and 2, T=(v1,..., vn; e1,..., en–1), where ej=ejk (the k-th best edge connecting the nodes) TkPPM(A)=(vi)+
weight(ej)
DEFNITION Phenotypic Plasticity Measure of an alignment PPM(A)]
In our experiments the level has been set to 3 which accounts for a reasonable tradeoff between flexibility and rigidity allowed for the input structures. Therefore, we define PPM(A)=T3PPM(A).
2.3 Optimizing and Normalizing the PPM score
Having calculated all possible core blocks and all topological edges, we need to identify the consistent subgraph with maximal score in the PPM graph. This subgraph represents the structural alignment of the two input structures. We solve this problem by using the A* algorithm with an appropriate heuristic function. The A* algorithm searches for the path in the graph maximizing the similarity of the two structures. A* guarantees to find the optimal solution (if the corresponding heuristic function is admissible which means that the function always overestimates the best possible similarity resulting from a path through the graph). Since it is possible to find such a heuristic function, the algorithm is able to find the optimal solution to our problem in reasonable time (less than a second on average). Please note that, the algorithm does not incorporate sequence gap costs at any time in the optimization procedure. However, for measuring the overall structural similarity it is useful to penalize non-aligned (na) residues and to normalize the score. In our experiment, we simply penalize every non-aligned residue with a constant value of –0.1 and the similarity of a query and target protein is normalized by the length of the query protein. The normalized PPM score is defined as:
DEFNITION [Normalized Phenotypic Plasticity Measure of two proteins]
| 3 RESULTS AND DISCUSSION |
|---|
|
|
|---|
In the following, we will evaluate the performance of PPM according to its capability to recognize structural similarities of a query protein in a template database of protein structures. Therefore, we will first show the results of PPM on the Vorolign testset (Birzele et al., 2007) and will then perform a larger, more difficult and more detailed evaluation on a comprehensive set of domains from the most recent SCOP version 1.73. PPM is evaluated against the two state-of-the-art structure alignment methods Vorolign and TM-Align. While TM-Align represents a method optimizing a rigid-body similarity criterion, Vorolign represents a method which uses a different, more flexible and more sequence-based representation of the protein structure, namely Voronoi neighborhoods, and allows to compute alignments even in cases where large flexibilities exist.
3.1 Family recognition on the Vorolign set
In order to test the ability of PPM to detect the correct protein family in a database of representative protein structures, we first use the family recognition set used by Birzele et al. (2007). This set contains 979 non-trivial test proteins from the ASTRAL compendium that are contained in version 1.67 (February 2005) but not in the previous version 1.65 (December 2003). Simulating the Vorolign test scenario, those proteins were scanned against the ASTRAL25 database (version 1.65).
The performance of all methods is measured as the fraction of proteins that can be assigned to their correct family, superfamily or fold according to the SCOP classification. Only the hit with the best score as computed by the respective method is taken into account for the evaluation.
PPM is compared against the structure alignment methods Vorolign, CE and TM-Align as well as a sequence-based method, profile–profile alignment (von Ohsen et al., 2003). The results are shown in Table 1 (the performances of further, sequence-based methods are additionally discussed in Birzele et al. (2007).
|
In this test scenario, all structure-based methods clearly outperform sequence-based methods like profile–profile alignment. Further, PPM outperforms the second best method (Vorolign) by about 2% on the family and the superfamily level and performs similar to Vorolign on the fold level. The test shows that PPM is able to predict the SCOP classification for a protein structure with a very high accuracy and can improve the family prediction rate on this set by another 2% compared to Vorolign.
3.2 Large-scale evaluation
To evaluate the performance of PPM, Vorolign and TM-Align (the three top-scoring methods on the Vorolign set) on a larger and also more recent test set, we used a comprehensive set of 4933 domains from SCOP 1.73. The domains are a representative subset of all domains classified in SCOP 1.73 which are similarly defined in CATH. We require an overlap of the residues in the two-domain definitions for a protein of >80%. All domains share a sequence identity of <50%. Every domain is used as query and compared against all domains in the set which belong to the same family, superfamily, fold (positive examples) and the same class but not the same fold (negative examples). Additionally, we require a pair of domains to be classified consistently in the CATH database, i.e. pairs from the same family/superfamily in SCOP need to be classified into the same homologous superfamily in CATH and pairs belonging to the same SCOP fold into the same CATH topology. This approach assures a high quality of the structural similarity relationships defined by the gold standards. Templates from other classes are not taken into account to limit the number of pairs to be evaluated. In total, we compute >3.6 million pairwise structure comparisons (the set can be obtained from http://www.bio.ifi.lmu.de/PPM).
When evaluating the same scenario as above, namely the family, superfamily and fold-recognition rate of the methods according to the first hit (Table 1) the methods turn out to perform very similar. Vorolign slightly outperforms PPM and TM-Align on family and superfamily level while PPM performs better than the other two methods on fold level. The somewhat different results of the two benchmarks in Tables 1 and 2 on family and superfamily as compared to fold level may be due to a smaller sequence identity allowed for pairs in the Vorolign set (<30%) compared to the larger set (50% at maximum).
In contrast to Table 2, Table 3 evaluates the performance of the methods in a different and more difficult scenario. Here, the performance is evaluated in a situation where the family of the target has been removed from the benchmark set to test a methods ability to detect similarities in the absence of clear sequence similarity. This set-up becomes for example important for folds which contain only one family and need to be extended by a new family (protein structure threading).
Therefore, all hits from the query domain's own fold are regarded as correct hits while hits from the same class but from a different fold are regarded to be errors. The table shows the number of query proteins where templates from wrong folds score better than templates from the own family, superfamily or fold, respectively. The error rates obtained allow for a detailed analysis of the performance of a method in the absence of the true family or superfamily since in those cases wrong folds would be predicted. In this more detailed and also more difficult test scenario (compared to pure family recognition), PPM significantly improves on both, Vorolign and TM-Align, on the superfamily and fold level, significantly reducing the number of errors. On the family level, it performs slightly worse than Vorolign. On this level, Vorolign's more sequence based scoring function seems to be an advantage.
|
|
3.3 Detailed analysis of examples
We further analyzed the errors made by PPM on the family level (cases where family members are not identified correctly) on both sets. The results for all errors made can be visually inspected in 3D at http://www.bio.ifi.lmu.de/PPM. The errors may be partitioned into three different classes. For some cases, the true family and the other fold found by PPM cannot be distinguished on the structure level. One such example is shown in Figure 4. The figure allows for a fast and detailed analysis of the differences of the two hits found. When analyzing the similarities (i.e. the aligned parts) and differences (i.e. the unaligned parts) of the proteins in the two superpositions one can see that both families share the same common core (though being classified in different folds) of three helices and differ in some other, mostly unstructured, parts. It will be difficult for any purely structure-based method to distinguish those two folds (if they turn out to be truly different).
Further, we find examples where PPM overestimates the cost of phenotypic plasticity in cases where an unexpectedly large plasticity can be observed. Those errors may be solved in future versions by a better understanding of the costs of structural mutations and, may be, even family or fold-specific cost functions.
The third class of errors comprises cases, where e.g. length differences of template and query lead to problems in the score normalization and in the ranking process.
|
|
To exemplify the ability of PPM to compute high quality alignments of core regions and to deal with phenotypic plasticity, we show the example of a.39.2.1 which has already been briefly discussed in Figure 1. This family consists of six helices and is stabilized by six cystein bridges. The family shows a high level of phenotypic plasticity which makes it hard for a rigid body method to align the helices correctly, i.e. without shifts and without adding gaps to the helices. Also, TM-Align aligns a number of single residues which are not related but are close to each other in space in a rigid body superposition. In contrast, such errors are avoided by PPM and the core regions (the six helices) are correctly identified and aligned (w.r.t. to the cystein bridges) by our PPM method as shown in Figure 5.
| 4 CONCLUSION |
|---|
|
|
|---|
We presented a novel idea and a new method to address the protein structure alignment and protein structure classification problem in the presence of phenotypic plasticity. In contrast to existing approaches, PPM explicitely tries to model the evolutionary cost of the mutations, insertions and deletions that occurred on the structure level during the transformation of one structure to another as implied by the sequence changes. Thereby, it accounts for the natural variance observed in protein structure families, a phenomenon we call phenotypic plasticity. This plasticity can lead to problems and artifacts in the alignment process (e.g. shifting helices or random matches in space as shown in Fig. 5) and those are avoided by PPM since it only aligns highly similar local substructures in a first step and then optimizes the alignment of those core blocks using the A* algorithm. The process takes the overall topology of the structures into account and avoids overestimating the structural similarity due to a too large level of flexibility.
The performance of PPM has been evaluated on two comprehensive benchmark sets. On the Vorolign set, PPM outperforms all other methods in recognizing the correct family and superfamily of a query protein. On the more recent benchmark set of almost 5000 SCOP domains and the corresponding 3.6 million structure pairs PPM performs comparable to Vorolign and TM-Align in the family/superfamily and fold-recognition scenario but significantly outperforms the other methods in a different test set-up where the ability to recognize at least a member from the correct fold in the absence of family or superfamily members is evaluated.
Future development of PPM will be into different directions. First, the method is completely modular according to which scoring functions are used to score the similarity of core blocks as well as their topologies. Our final goal of finding an evolutionary measure for structural mutations similar to the idea behind accepted mutations in matrices like PAM (Dayhoff et al., 1978) is not reached yet. So far, the scoring functions are purely structure based and due to our current lack of knowledge on the true costs of mutations and insertions/deletions on the structure level can only be optimized and parameterized empirically. It is an on-going and iterative process to further refine those parameters to capture the real costs of mutations and insertions/deletions on the structure level, may be even in a family or fold-specific manner.
Second, we would like to extend the currently purely structure-based scoring functions used in the method to a combination of sequence and structure scores to account for both criteria and further improve on the quality of the protein alignments. High-quality alignments with respect to sequence and structure form the basis for a detailed analysis of the interplay of protein sequence and structure evolution.
Third, conserved pairwise core blocks identified by PPM will further be used to identify conserved cores of protein structure families with interesting applications in protein structure prediction and sequence–structure alignment (e.g. threading). For this task, PPM will be extended in order to compute multiple structure alignments or to combine sets of pairwise alignments to a multiple alignment.
Finally, the model of structure evolution underlying PPM may be an interesting starting point toward a novel definition of protein structure similarity and protein structure flexibility within and between groups of structures, a feature that will also help to detect similarities across different folds which may be explored in nature for example by alternative splicing (Birzele et al., 2008) or during structure evolution.
| REFERENCES |
|---|
|
|
|---|
Andreeva A, et al. Data growth and its impact on the scop database: new developments. Nucleic Acids Res (2007) 36(Database issue):D419–D425.[CrossRef][Web of Science][Medline]
Berman H, et al. The protein data bank. Nucleic Acids Res (2000) 28:235–242.
Birzele F, et al. Vorolign–fast structural alignment using Voronoi contacts. Bioinformatics (2007) 23:e205–e211.
Birzele F, et al. Alternative splicing and protein structure evolution. Nucleic Acids Res (2008) 36:550–558.
Dayhoff MO, et al. Amodel of evolutionary change in proteins. Atlas of Protein Sequence and Structure (1978) 5:345–352.
Grabowski M, et al. Structural genomics: keeping up with expanding knowledge of the protein universe. Curr. Opin. Struct. Biol (2007) 17:347–353.[CrossRef][Web of Science][Medline]
Greene LH, et al. The cath domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res (2007) 35(Database issue):D291–D297.
Hadley C, Jones D. A systematic comparison of protein structure classifications: SCOP, CATH and FSSP. Structure (1999) 7:1099–1112.[Medline]
Harrison A, et al. Recognizing the fold of a protein structure. Bioinformatics (2003) 19:1748–1759.
Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J. Mol. Biol (1993) 233:123–138.[CrossRef][Web of Science][Medline]
Shindyalov IN, Bourne PE. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng (1998) 11:739–747.
von Ohsen N, et al. Profile-profile alignment: a powerful tool for protein structure prediction. Pac. Symp. Biocomput (2003) 252–263.
Ye Y, Godzik A. Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics (2003) 19(Suppl. 2):II246–II255.[Medline]
Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins (2004) 57:702–710.[CrossRef][Web of Science][Medline]
Zhang Y, Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res (2005) 33:2302–2309.
This article has been cited by other articles:
![]() |
A. Poleksic Algorithms for optimal protein structure alignment Bioinformatics, November 1, 2009; 25(21): 2751 - 2756. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||


B(pi)=B(p1) x B(p2).

sim(vi)+
weight(ej).
(vi)+

