

Bioinformatics Advance Access originally published online on June 25, 2008
Bioinformatics 2008 24(16):1798-1804; doi:10.1093/bioinformatics/btn326

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Intrinsic disorder prediction from the analysis of multiple protein fold recognition models

Liam J. McGuffin

School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: Intrinsic protein disorder is functionally implicated in numerous biological roles and is, therefore, ubiquitous in proteins from all three kingdoms of life. Determining the disordered regions in proteins presents a challenge for experimental methods and so recently there has been much focus on the development of improved predictive methods. In this article, a novel technique for disorder prediction, called DISOclust, is described, which is based on the analysis of multiple protein fold recognition models. The DISOclust method is rigorously benchmarked against the top five methods from the CASP7 experiment. In addition, the optimal consensus of the tested methods is determined and the added value from each method is quantified.

Results: The DISOclust method is shown to add the most value to a simple consensus of methods, even in the absence of target sequence homology to known structures. A simple consensus of methods that includes DISOclust can significantly outperform all of the previous individual methods tested.

Availability: http://www.reading.ac.uk/bioinf/DISOclust/

Contact: l.j.mcguffin@reading.ac.uk

Supplementary information: Supplementary data are available at http://www.reading.ac.uk/bioinf/DISOclust/suppl.pdf


    1 INTRODUCTION
It is well known that numerous proteins, from most organisms, contain regions that lack a fixed 3D structure and are therefore said to be in a state of intrinsic disorder. In fact, in a previous study of several organisms we found that as many as 33% of the proteins encoded within a typical eukaryotic genome, over 4% of those from eubacteria and ~2% from archaea are likely to contain long regions of native disorder (Ward et al., 2004). The reason for this ubiquity is that intrinsic protein disorder is implicated in many functional roles that are essential to all life, such as molecular recognition, cell signaling, transcription and translation (Radivojac et al., 2007; Ward et al., 2004).

The presence of native disorder in proteins complicates protein structure determination using X-ray crystallography, and often disordered residues only become ordered upon binding. Due to the technical challenges of elucidating native disorder experimentally, there has been much focus on the development of novel predictive methods (Bordoli et al., 2007). The first method for disorder prediction from local sequence was developed over a decade ago (Romero et al., 1997). Over the past 10 years a number of different methods have been developed for the prediction of native disorder, most of which use machine learning methods that have been trained to recognize various sequence features.

Recently, Bordoli et al. (2007) remarked that the field of disorder prediction had converged, due to the similarity in the performance of methods between the CASP6 and CASP7 experiments. Radivojac et al. (2007) have suggested that new techniques, other than those purely derived from sequence properties, must be developed in order to progress the field further and to reach the upper theoretical limit for disorder prediction.

In this article, a novel structure-based approach to disorder prediction, DISOclust, is investigated. The DISOclust method is based on the simple premise that the ordered residues within a protein target should be conserved in 3D space within multiple models, whereas the residues that vary or are consistently missing may be correlated with disorder. The idea of structurally comparing multiple models on a per-residue basis has also been used for model quality assessment. Indeed, currently the most successful technique for prediction of the local error in a model is to measure the variation in residue positions by structurally comparing multiple models (Cozzetto et al., 2007). This study extends this idea for the prediction of intrinsic protein disorder—if the predicted per-residue error is conserved across models, is this correlated with protein disorder? In order to answer this question, the DISOclust method is benchmarked against the top five disorder prediction methods that were assessed in the CASP7 experiment. In addition, the further enhancement of disorder prediction is investigated by simply combining the DISOclust method with the current sequence-based methods in order to form consensus predictions.


    2 METHODS
2.1 The DISOclust method
DISOclust is an unsupervised method composed of two steps: the prediction of the per-residue error in multiple fold recognition models, followed by a simple analysis of the conservation of per-residue error across all models. The per-residue error in each model was calculated using a score based on the average S-score (Levitt and Gerstein, 1998). The S-score was originally used for model evaluation in the 3D-SHOTGUN method (Fischer, 2003); it was recently benchmarked using the Pcons server (Wallner and Elofsson, 2006) and has been incorporated into the ModFOLD server (McGuffin, 2008). Similarly, in this study pairwise superpositions of the CASP7 server models were carried out in order to evaluate the local structural conservation of each residue in each model. For each CASP7 target, a number of fold recognition server models were available (N). The per-residue quality of each server model was calculated by carrying out structural alignments with every other model using the TM-score program (Zhang and Skolnick, 2004), with the ‘-d’ option set to 3.9 Å. In a pairwise structural alignment, if the overall TM-score was found to be >0.2, then S-scores were calculated for each residue (i) in the model. If a pair of residues were structurally aligned within the TM-score distance cut-off (as indicated by a ‘:’ in the TM-score alignment output), then the S-score was calculated as below:


$$S_i = \frac{1}{1 + \left(\frac{d_i}{d_0}\right)^2}$$

Where di was the distance between aligned residues and d0 was the distance threshold (3.9 Å). Unaligned residues in the model were given an Si score of 0. The mean S-score (Sr) was then calculated for each residue (r) in the target sequence:


$$S_r = \frac{1}{N-1} \sum_{a \in A} S_{ia}$$

Where Sr was the predicted residue accuracy for the model (McGuffin, 2008), N–1 was the number of pairwise structural alignments carried out for that model, A was the set of alignments and Sia was the Si score for a residue in a structural alignment (a). An Sr score of 0 was given to any residues that were missing from the model, so that all residues in the target sequence were scored. Finally, the DISOclust score for each residue in a target sequence was calculated as 1 minus the per-residue accuracy across all models:


$$P_d = 1 - \frac{1}{N} \sum_{m \in M} S_{rm}$$

Where Pd was the approximate posterior probability of a residue being in a disordered state, N was the number of models, M was the set of models and Srm was the Sr score for a model (m).
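The three scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the DISOclust implementation: it assumes the S-score takes the Levitt and Gerstein form 1/(1 + (d/d0)^2), that the per-residue distances have already been extracted from the pairwise TM-score alignments, and that unaligned or missing residues are passed in as `None`. The function names and input layout are hypothetical, and the TM-score >0.2 filter is left outside the sketch.

```python
from typing import List, Optional

D0 = 3.9  # distance threshold in angstroms, as stated in the text


def s_score(distance: Optional[float], d0: float = D0) -> float:
    """S-score for one residue in one pairwise alignment.

    `distance` is None when the residue is unaligned or missing,
    in which case the score is 0.
    """
    if distance is None:
        return 0.0
    return 1.0 / (1.0 + (distance / d0) ** 2)


def mean_s_scores(alignments: List[List[Optional[float]]]) -> List[float]:
    """Mean S-score (Sr) per residue for one model.

    `alignments` holds one list of per-residue distances for each of the
    N-1 pairwise structural alignments carried out for that model.
    """
    n_align = len(alignments)
    length = len(alignments[0])
    return [sum(s_score(a[r]) for a in alignments) / n_align
            for r in range(length)]


def disoclust_scores(per_model_sr: List[List[float]]) -> List[float]:
    """Pd per residue: 1 minus the mean Sr over all N models."""
    n_models = len(per_model_sr)
    length = len(per_model_sr[0])
    return [1.0 - sum(m[r] for m in per_model_sr) / n_models
            for r in range(length)]


if __name__ == "__main__":
    # Toy target with three residues: perfectly superposed, 3.9 A apart, unaligned.
    toy_alignments = [[0.0, 3.9, None], [0.0, 3.9, None]]
    sr = mean_s_scores(toy_alignments)
    print(disoclust_scores([sr, sr]))
```

In the toy input, a residue that superposes exactly in every alignment receives a low disorder score, while a residue that is never aligned receives the maximum score of 1.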

The DISOclust predictions were made for each target using the CASP7 fold recognition server models, which were downloaded from the CASP7 website:

http://www.predictioncenter.org/download_area/CASP7/server_predictions/

The DISOclust web server is available at the following URL: http://www.reading.ac.uk/bioinf/DISOclust/

In addition, the DISOclust method has been integrated into the ModFOLD server as an additional output option for the ModFOLDclust method (McGuffin, 2007, 2008).

2.2 Baseline methods
Two baseline disorder prediction methods were evaluated in order to gauge the added value of combining simple filtering methods with established methods, compared with using the DISOclust method. The first baseline method, Baseline1, is purely a coil predictor, as disordered regions may be more likely to occur in regions where helices and strands are not confidently predicted. For each target, the probability of each residue being in a coil state was taken from the PSIPRED ss2 output file and all coil residues were predicted to be disordered. The Baseline2 method focuses purely on identifying the residues that are consistently missing in multiple fold recognition models, as these residues may be indicative of those which are missing in the template structures. For each residue in a target, the probability of disorder was taken as the number of times the residue was found to be missing in a model divided by the total number of models.
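The Baseline2 calculation is simple enough to illustrate directly. The sketch below assumes each model is represented as the set of target residue indices for which it contains coordinates; that representation and the function name are hypothetical choices made for the example, not details taken from the study.

```python
from typing import List, Set


def baseline2_scores(target_length: int, models: List[Set[int]]) -> List[float]:
    """Fraction of models in which each target residue is missing.

    `models` is a list of sets; each set holds the (0-based) indices of the
    target residues that are present in one fold recognition model.
    """
    n_models = len(models)
    return [sum(1 for present in models if r not in present) / n_models
            for r in range(target_length)]


if __name__ == "__main__":
    # Four toy models of a four-residue target with increasingly ragged C-termini.
    toy_models = [{0, 1, 2}, {0, 1}, {0, 1, 2, 3}, {0}]
    print(baseline2_scores(4, toy_models))
```

Unlike the DISOclust score, this baseline ignores how well the residues that are present agree between models; it only counts absences.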

2.3 The top five disorder prediction methods according to the CASP7 assessment
The official CASP7 assessment of disorder prediction (Bordoli et al., 2007) ranked the following prediction groups as the top five: ISTZORAN, CBRC-DR, Fais, DISOPRED and DISpro. Although each group mostly employed automated methods, the first three of these groups were registered at CASP7 as human expert groups and the last two as automated servers. The two automated servers, DISOPRED (Ward et al., 2004) and DISpro (Cheng et al., 2005), both used machine learning methods for the analysis of sequence profiles; however, the DISpro method also incorporated solvent accessibility and predicted secondary structure scoring. From the CASP7 abstracts it is understood that the Fais prediction group used a similar technique based on the support vector regression (SVR) of position specific scoring matrices. The CBRC-DR group also used a number of SVM-based prediction methods, which varied depending on the length of the target disorder [POODLE-S (Shimizu et al., 2007) as well as two other variations of the method]. Finally, the ISTZORAN group used a combination of two specialized length-dependent SVM-based disorder prediction methods, VSL2-L and VSL2-S (Peng et al., 2006). The disorder predictions made by each of these groups for each CASP7 target were downloaded from the CASP7 website:

http://predictioncenter.org/download_area/CASP7/predictions/

2.4 Gauging the added value of individual methods to a simple consensus of methods
The value of adding each disorder prediction method into a consensus of methods was investigated by systematically benchmarking different combinations of methods. Each of the disorder prediction methods tested here produced single continuous scores or ‘P-values’, which related to the likelihood of each residue being in a state of disorder [These values are not strictly P-values, rather they are approximations of posterior probabilities. The terminology ‘P-value’ has been used here for consistency with other benchmarking studies (Jin and Dunbrack, 2005)]. The consensus value for each combination of methods was simply taken as the average P-value for each residue and no optimization or weighting was used.

The highest scoring automated consensus method was obtained by identifying the maximal AUC score that could be achieved by simply taking the averages of the outputs from the DISOPRED, DISpro and DISOclust methods. The highest scoring overall consensus was measured by considering different combinations of all six individual methods tested. In the initial round, the scores from all six methods were averaged and then each method was left out in turn in order to gauge its added value. The same methodology was used in each subsequent round to find the highest scoring consensus of 4, 3 and 2 methods. Using this approach to measure the added value of individual methods to each consensus was efficient and gave equivalent results to exhaustively searching every combination.
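The unweighted consensus and the leave-one-out procedure described above can be sketched as follows. The sketch assumes per-residue scores for each method are held in equal-length lists and that labels are 1 for disordered and 0 for ordered residues; the function names are hypothetical. The AUC is computed here with a simple pairwise (Mann-Whitney style) count, which is adequate for small examples but would be slow on the full residue sets.

```python
from typing import Dict, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a disordered residue outscores an ordered one.

    Ties between a disordered and an ordered residue count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def consensus(methods: Dict[str, List[float]], names: Sequence[str]) -> List[float]:
    """Unweighted mean of the per-residue scores from the named methods."""
    columns = [methods[name] for name in names]
    return [sum(values) / len(columns) for values in zip(*columns)]


def leave_one_out(methods: Dict[str, List[float]],
                  labels: Sequence[int]) -> Dict[str, float]:
    """Drop in consensus AUC when each method is removed in turn."""
    names = sorted(methods)
    full = auc(consensus(methods, names), labels)
    return {left_out: full - auc(consensus(methods,
                                           [n for n in names if n != left_out]),
                                 labels)
            for left_out in names}


if __name__ == "__main__":
    toy_labels = [1, 1, 0, 0]
    toy_methods = {"A": [0.9, 0.4, 0.5, 0.1], "B": [0.2, 0.8, 0.3, 0.6]}
    print(leave_one_out(toy_methods, toy_labels))
```

A large drop for a given method indicates that it contributes information the remaining methods lack; repeating the procedure on the best remaining subset gives the successive rounds described in the text.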

2.5 Benchmarking data
In order to analyze the accuracy of each prediction method, the disorder definitions for 95 targets were downloaded from the CASP7 website (http://predictioncenter.org/download_area/CASP7/). Although 96 CASP7 targets were identified to have regions of disorder (Bordoli et al., 2007), a common subset of 95 targets was used here in order to ensure that the assessment scores for each method were directly comparable. (The ISTZORAN results reported here are identical to the official analysis (Bordoli et al., 2007), as only 95 targets were predicted; the results for the other methods vary insignificantly, due to the use here of a common subset.) In addition, for comparison with the official assessment, only segments of disorder >3 residues in length were assessed. Therefore, 19 429 residues were considered in this assessment (1073 disordered and 18 356 ordered).

The methods were also tested on subsets of the CASP7 data in order to determine whether the DISOclust method is able to provide added value even in the absence of sequence homology. Firstly the high accuracy-template-based modeling (HA-TBM) targets were removed leaving 71 targets with 14 244 residues (923 disordered and 13 321 ordered). Finally, only the free modeling (FM) targets were considered leaving 18 targets with 3473 residues (252 disordered and 3221 ordered).

In order to test whether using DISOclust can be of practical benefit outside the context of the CASP experiment, the method was also benchmarked using a set of targets obtained from the DisProt database (Vucetic et al., 2005) and models obtained from the LOMETS server (Wu and Zhang, 2007). The dataset of 14 934 PDB chains with disordered regions assigned from missing residues was downloaded from the DisProt supplemental datasets: http://www.disprot.org/data/missingxray/missingXray.080503.zip.

This dataset was reduced to 1036 sequences using a redundancy reduction procedure, so that all pairs of sequences had FASTA E-values >0.001 and <25% sequence identity. These 1036 sequences were then aligned against sequences in the PDB (as of February 11, 2008). Any targets with obvious homologs to other structures (E-values ≤0.001 and ≥25% sequence identity) were removed, leaving 199 sequences with 54 606 residues (4466 disordered and 50 140 ordered). Of these 199 sequences, 23 (~11.6%) contained long regions of disorder (>30 residues).

All 199 sequences were then submitted to the LOMETS web server, which uses a number of alternative profile–profile-based methods in order to align sequences to structures. Any models built using the target structure as a template were discarded and only models built from alignments to templates with <25% sequence identity to the target sequence were considered for the DISOclust prediction. The CASP7 version of the DISOPRED method was also run in-house on all 199 targets for comparison.


    3 RESULTS
The disorder prediction methods were compared using standard ROC analysis. The ROC curves for each method are plotted in Figure 1. Supplementary Table 1 includes several standard benchmarking scores for assessing binary predictions (disorder/order). The scores in Supplementary Table 1 are ranked by the Sw score (Jin and Dunbrack, 2005), but all of the scores in this table are dependent on the P-value cut-offs used to assign disorder. The P-value cut-offs used for DISOclust and the consensus methods were determined by maximizing the MCC, Acc, Prod and Sw scores and then taking the mean P-values at these maximal values. The maximal scores and the associated P-value cut-offs for every method are also tabulated in Supplementary Table 2. This analysis is similar to that carried out by the CASP6 assessors of disorder prediction (Jin and Dunbrack, 2005). From the data in Supplementary Table 2, it is clear that the optimal P-value cut-off varies between methods and from score to score. If maximal scores are considered, then the DISOclust method is similar in performance to the other top server methods, with lower scores than DISOPRED but higher scores than DISpro. The consensus of the DISOclust and DISOPRED scores outperforms all of the methods tested in CASP7, including the human expert groups, according to all scores. Furthermore, a consensus of the human expert groups and DISOclust produces the highest scores overall, but these scores are not necessarily the best indicators of performance, as they are dependent on the P-value cut-off.
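The way a cut-off can be chosen by maximizing a binary score is illustrated below for the MCC only, using its standard definition. The grid of candidate cut-offs and the function names are assumptions made for the example; the same scan could be repeated for the Acc, Prod and Sw scores, whose exact definitions are given by Jin and Dunbrack (2005) and are not reproduced here, before averaging the resulting cut-offs.

```python
import math
from typing import Sequence, Tuple


def confusion(scores: Sequence[float], labels: Sequence[int],
              cutoff: float) -> Tuple[int, int, int, int]:
    """Counts (tp, fp, tn, fn) when residues scoring >= cutoff are called disordered."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_cutoff(scores: Sequence[float], labels: Sequence[int],
                cutoffs: Sequence[float]) -> float:
    """Cut-off from the candidate grid that gives the highest MCC."""
    return max(cutoffs, key=lambda c: mcc(*confusion(scores, labels, c)))


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.3, 0.2]
    toy_labels = [1, 1, 0, 0]
    print(best_cutoff(toy_scores, toy_labels, [0.1, 0.5, 0.95]))
```

Because the best cut-off is selected on the same data that is then scored, maximal scores obtained this way are optimistic, which is one reason the cut-off-independent AUC is preferred for comparing methods.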


 
Fig. 1. ROC curves for each disorder prediction method. DISOclust+DISOPRED, a consensus method that takes the average of the P-values from DISOclust and DISOPRED; DISOclust+human, a consensus method that takes the average of the P-values from DISOclust, ISTZORAN, CBRC-DR and Fais.

 


 
Table 1. The assessment of disorder prediction using the areas under ROC curves (AUC)

 


 
Table 2. The significance test results for the comparison of AUC scores (95 CASP7 targets)

 
The AUC and associated SE (Hanley and McNeil, 1982) were calculated for each method shown in Figure 1 and the baseline methods (Table 1). The AUC score is arguably the best indicator of the overall performance of methods that produce continuous scores because it is independent of the P-value cut-off. The results in Table 1, panel A again show that the DISOclust method is similar in performance to the other automated methods and that the consensus methods outperform all methods (reflecting the results shown in Supplementary Table 2). Despite the fact that the DISOclust method falls behind the other automated methods in the absence of sequence homology and at low false positive rates, there is still value to be gained from combining the method with DISOPRED (Table 1, panels B and C). Table 2 shows the data for the significance test for the comparison of AUC scores, calculated using the method of Hanley and McNeil (1983). The results indicate that there is no significant difference between DISOclust and the two server methods (DISOPRED and DISpro) when all 95 targets are considered. All three human expert groups significantly outperform DISpro, confirming the result in the official evaluation carried out by Bordoli et al. (2007). ISTZORAN has a significantly higher AUC score than all three automated methods (DISOPRED, DISpro and DISOclust). In addition, CBRC-DR and Fais have a significantly higher score than DISOclust alone. Importantly, the simple consensus of DISOPRED and DISOclust significantly outperforms all of the top five methods tested in CASP7. Furthermore, the consensus of DISOclust plus the human expert groups significantly outperforms all methods tested here.
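For readers unfamiliar with the Hanley and McNeil statistics, the sketch below shows the standard closed-form approximation for the SE of an AUC (1982) and the z statistic for comparing two AUCs derived from the same cases (1983). In the 1983 method the correlation term r is looked up from a published table; here it is simply passed in as an argument, and the value used in the example is illustrative only. The function names and the example AUC of 0.84 paired with the CASP7 residue counts are assumptions for demonstration, not a reproduction of Table 1 or Table 2.

```python
import math


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley and McNeil, 1982).

    n_pos and n_neg are the numbers of disordered and ordered residues.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(variance)


def z_correlated(auc1: float, se1: float, auc2: float, se2: float,
                 r: float) -> float:
    """z statistic for two AUCs measured on the same cases (Hanley and McNeil, 1983).

    r is the correlation between the two areas; setting r = 0 treats the
    two AUCs as independent.
    """
    return (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)


if __name__ == "__main__":
    # Residue counts from the 95-target CASP7 set; the AUC values are illustrative.
    se_a = hanley_mcneil_se(0.84, 1073, 18356)
    se_b = hanley_mcneil_se(0.88, 1073, 18356)
    print(se_a, se_b, z_correlated(0.88, se_b, 0.84, se_a, r=0.5))
```

Because the residue set is large, the resulting SEs are small, so even modest differences in AUC can reach significance.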

The Baseline2 method also appears to be similar in AUC performance to the automated methods (Table 1, panel A) when homologous templates are available. However, do the baseline methods add as much value as DISOclust when used in combination with established disorder prediction methods? The results in Table 3 show that the Baseline2 method does provide added value to some methods; however, it is clear that the DISOclust method adds the most value to all methods by increasing both the overall AUC score and the AUC score for false positive rates between 0% and 10%.
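The AUC restricted to low false positive rates can be computed by integrating the ROC curve only up to a chosen false positive rate. The sketch below builds the ROC points from scores and labels and applies the trapezoidal rule, interpolating linearly at the boundary. It returns the raw area (so the maximum for a 10% limit is 0.1); whether the study rescaled this value is not stated, so no normalization is applied. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               labels: Sequence[int]) -> List[Tuple[float, float]]:
    """ROC points (FPR, TPR) from (0, 0) to (1, 1), one per distinct score."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Residues with tied scores are added together as a single ROC step.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def partial_auc(points: Sequence[Tuple[float, float]],
                max_fpr: float = 0.1) -> float:
    """Trapezoidal area under the ROC curve for FPR between 0 and max_fpr."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the final segment so that it ends exactly at max_fpr.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    toy_points = roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
    print(partial_auc(toy_points, 0.1), partial_auc(toy_points, 1.0))
```

Setting `max_fpr` to 1.0 recovers the full AUC, so the same two functions cover both quantities reported in Table 3.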



 
Table 3. The added value of using DISOclust versus the baseline methods in combination with each disorder prediction method

 
The results in Table 4 indicate that the most value to a consensus of automated methods is obtained by the inclusion of the DISOclust method, as significantly higher scores are produced than those from the individual methods. Furthermore, a consensus of DISOPRED and DISpro produces an AUC score similar to that of just using DISOPRED alone (~0.84), indicating little added value when DISOclust is excluded.



 
Table 4. Excluding DISOclust from the consensus of automated methods results in a significant drop in the AUC score

 
Table 5 shows the results obtained from different combinations of methods using a systematic analysis. The highest AUC score overall is from the consensus of DISOclust plus the three human group methods. However, it must be noted that there is no significant difference in AUC scores between the highest scoring consensus and the simple consensus of DISOclust and ISTZORAN (round 4). In each ‘round’, the greatest decrease in AUC performance is seen when the DISOclust method is removed from the consensus and in each case the loss is significant (i.e. greater than the combined SE in AUC). The DISOclust method clearly provides the most added value to any consensus of methods.



 
Table 5. The DISOclust method provides the most added value to any consensus of methods

 
The results in Tables 6, 7 and 8 show the analysis of the DISOclust and DISOPRED predictions using the DisProt supplementary dataset. The results indicate that the DISOclust method can be used to improve predictions outside the context of CASP, even when remote homologues are the only available templates for building models. Using the DISOclust method to analyze multiple models obtained from the LOMETS server is shown to add significant value to DISOPRED predictions. The results for the subsets of DisProt targets without/with long regions of disorder (>30 residues) are shown in Tables 7 and 8, respectively. All methods perform better on the subset of targets containing only short regions of disorder (Table 7) than they do on the subset containing long regions (Table 8). However, on both subsets there is still significant value to be gained from simply taking the mean value of outputs from DISOclust and DISOPRED.



 
Table 6. Can the DISOclust method be useful outside the context of CASP? The assessment of disorder prediction using 199 targets from the DisProt database

 


 
Table 7. The performance of methods on the subset of 176 DisProt targets containing only short regions of disorder (<30 residues)

 


 
Table 8. The performance of methods on the subset of 23 DisProt targets containing long regions of disorder (>30 residues)

 
Supplementary Table 3 shows the assessment of disorder prediction methods using the DisProt dataset and standard benchmarking scores, which are dependent on P-value thresholds. For the DISOclust and DISOclust+DISOPRED methods, the P-value thresholds used for assigning residues as disordered (the D cut-off column) were obtained from Supplementary Table 2 (the pMeanmax column).


    4 DISCUSSION
The conserved error in fold recognition models can be used as an effective predictor of intrinsic protein disorder and is the basis for the new method described in this article, DISOclust. Additionally, significant improvements in disorder prediction accuracy can be achieved by simply taking the average of the P-values from the structure-based approach used in the DISOclust method and those from a leading automated sequence-based method, such as DISOPRED, even in the absence of sequence homology.

The results in Table 2 show that there is no significant difference in the AUC performance between the DISOclust method and DISOPRED or DISpro, when all CASP7 targets are considered. However, combining the DISOclust method with DISOPRED significantly outperforms the DISOPRED method alone, even on the FM subset of targets (Table 1). The consensus approach that combines the P-values from DISOclust and the human expert groups is shown to outperform all of the top performing methods assessed at CASP7, on all subsets of targets and at low false positive rates. Furthermore, using the DISOclust method in combination with each established method leads to a significant increase in AUC performance (Table 3).

The Baseline1 filter does not add value to methods, which may be due to the fact that many methods already include optimized secondary structure filtering. Simply considering the missing residues within models, using the Baseline2 method, does lead to an increase in AUC score for some methods; however, the DISOclust approach clearly adds the most value to all methods (Table 3). The analysis of the results in Tables 4 and 5 indicates that most of the added value to a simple consensus of methods arises from the addition of the DISOclust method. This is evidenced by the significant drop in AUC performance when the DISOclust method is removed in each round and by the fact that the DISOclust method appears in the highest scoring consensus of methods in each round. A possible explanation for this may be that alternative ‘flavors’ of disorder (Vucetic et al., 2003) are being identified by the DISOclust method, whereas the mostly sequence-based methods may be identifying similar flavors. Future work would be to extend the study of Vucetic et al. in order to compare the types of disorder most accurately predicted by DISOclust with those from each method tested here.

The ROC curves in Figure 1 indicate that the DISOclust method alone achieves a higher rate of true positives overall than all previous methods at false positive rates >0.4. Nevertheless, using the DISOclust method alone achieves lower true positive rates than previous methods at lower false positive rates (<0.1). The preference for higher sensitivity or specificity is dependent on the proposed application and so different disorder prediction methods could be applied in different circumstances (Bordoli et al., 2007). However, in most applications we are interested in maximizing both the specificity and sensitivity and this can be achieved by using the DISOclust method in combination with a method such as DISOPRED and by selecting an appropriate P-value cut-off. This is evidenced by the maximal Acc and Prod scores achieved by the consensus approaches (Supplementary Table 2). In this study, the consensus predictions were made by simply taking the mean per-residue P-values, yet it may be the case that further accuracy could be achieved by combining outputs from each method using optimized weightings.

The results in Tables 6, 7 and 8 indicate that the DISOclust method can provide added value outside the context of the CASP experiment, even when remote homologues are the only available templates for building models and when targets contain long regions of disorder. A simple analysis of the variation in multiple models from the LOMETS server is shown to add significant value to DISOPRED predictions, verifying the results shown using the CASP7 datasets (Table 3).

The LOMETS server provides a convenient source of multiple fold recognition models for the DISOclust method, although several other meta-servers are available, which could also provide multiple models, for example, Pcons (Lundstrom et al., 2001) and 3D-JURY (Ginalski et al., 2003). The results in Supplementary Table 4 show that the performance of DISOclust increases with the number of models. However, significant added value can still be achieved using only the top few models from just a single fold recognition method, if available CPU resources are limited.

A DISOclust server has been developed that obtains models from the latest version of the nFOLD method (Jones et al., 2005) and combines the DISOclust results with those from DISOPRED. In addition, an option is available to obtain DISOclust predictions via the ModFOLD server (McGuffin, 2008), whereby users can upload multiple models of their own choice along with their target sequence.

Fold recognition models can be built for the majority of proteins occurring within most organisms. In a previous study, we found that up to 84% of proteins encoded within a genome can be confidently assigned to known folds with 72% of the residues aligned using standard sequence profile-based fold recognition (McGuffin et al., 2006). Additionally, the coverage of proteome annotations can be further increased using more intensive profile–profile-based fold recognition (McGuffin et al., 2006). The DISOclust method should, therefore, be a useful supplement to disorder prediction for the majority of new protein sequences. In cases where models cannot be built from templates using fold recognition methods, the DISOclust method should still be of added value if more sophisticated free modeling servers are used to generate models (Table 1, panel C).


    ACKNOWLEDGEMENTS
I wish to thank Dr Kevin Bryson and Prof. David T. Jones for making the DISOPRED code available.

Funding: This work was supported by an RCUK Academic Fellowship.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Burkhard Rost

Received on March 9, 2008; revised on June 4, 2008; accepted on June 20, 2008

    REFERENCES

    Bordoli L, et al. Assessment of disorder predictions in CASP7. Proteins (2007) 69(Suppl 8):129–136.

    Cheng J, et al. Accurate prediction of protein disordered regions by mining protein structure data. Data Mining and Knowledge Discovery (2005) 11:213–222.

    Cozzetto D, et al. Assessment of predictions in the model quality assessment category. Proteins (2007) 69(Suppl 8):175–183.

    Fischer D. 3D-SHOTGUN: a novel, cooperative, fold-recognition metapredictor. Proteins (2003) 51:434–441.

    Ginalski K, et al. 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics (2003) 19:1015–1018.

    Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology (1982) 143:29–36.

    Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology (1983) 148:839–843.

    Jin Y, Dunbrack RL Jr. Assessment of disorder predictions in CASP6. Proteins (2005) 61(Suppl 7):167–175.

    Jones DT, et al. Prediction of novel and analogous folds using fragment assembly and fold recognition. Proteins (2005) 61(Suppl 7):143–151.

    Levitt M, Gerstein M. A unified statistical framework for sequence comparison and structure comparison. Proc. Natl Acad. Sci. USA (1998) 95:5913–5920.

    Lundstrom J, et al. Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci (2001) 10:2354–2362.

    McGuffin LJ. Benchmarking consensus model quality assessment for protein fold recognition. BMC Bioinformatics (2007) 8:345.

    McGuffin LJ. The ModFOLD server for the quality assessment of protein structural models. Bioinformatics (2008) 24:586–587.

    McGuffin LJ, et al. The Genomic Threading Database: a comprehensive resource for structural annotations of the genomes from key organisms. Nucleic Acids Res (2004) 32:D196–D199.

    McGuffin LJ, et al. High throughput profile-profile based fold recognition for the entire human proteome. BMC Bioinformatics (2006) 7:288.

    Peng K, et al. Length-dependent prediction of protein intrinsic disorder. BMC Bioinformatics (2006) 7:208.

    Radivojac P, et al. Intrinsic disorder and functional proteomics. Biophys. J (2007) 92:1439–1456.

    Romero P, et al. Sequence data analysis for long disordered regions prediction in the calcineurin family. Genome Inform. Ser. Workshop Genome Inform (1997) 8:110–124.

    Shimizu K, et al. POODLE-S: web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix. Bioinformatics (2007) 23:2337–2338.

    Vucetic S, et al. Flavors of protein disorder. Proteins (2003) 52:573–584.

    Vucetic S, et al. DisProt: a database of protein disorder. Bioinformatics (2005) 21:137–140.

    Wallner B, Elofsson A. Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Sci (2006) 15:900–913.

    Ward JJ, et al. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol (2004) 337:635–645.

    Wu S, Zhang Y. LOMETS: a local meta-threading-server for protein structure prediction. Nucleic Acids Res (2007) 35:3375–3382.

    Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins (2004) 57:702–710.

