Skip Navigation


Bioinformatics Advance Access originally published online on May 26, 2006
Bioinformatics 2006 22(13):1658-1659; doi:10.1093/bioinformatics/btl158
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrowOA All Versions of this Article:
22/13/1658    most recent
btl158v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (159)
Google Scholar
Right arrow Articles by Li, W.
Right arrow Articles by Godzik, A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Li, W.
Right arrow Articles by Godzik, A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org

Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences

Weizhong Li * and Adam Godzik

Burnham Institute for Medical Research La Jolla, CA 92037, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 ALGORITHMS
 3 RESULTS
 REFERENCES
 

Motivation: In 2001 and 2002, we published two papers (Bioinformatics, 17, 282–283, Bioinformatics, 18, 77–82) describing an ultrafast protein sequence clustering program called cd-hit. This program can efficiently cluster a huge protein database with millions of sequences. However, the applications of the underlying algorithm are not limited to only protein sequences clustering, here we present several new programs using the same algorithm including cd-hit-2d, cd-hit-est and cd-hit-est-2d. Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database and cd-hit-est-2d compares two nucleotide datasets. All these programs can handle huge datasets with millions of sequences and can be hundreds of times faster than methods based on the popular sequence comparison and database search tools, such as BLAST.

Availability: http://cd-hit.org

Contact: liwz{at}sdsc.edu


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 ALGORITHMS
 3 RESULTS
 REFERENCES
 
In recent years, the amount of biological sequence data has been growing explosively, which has imposed growing difficulties on analyzing them. The complexity of many sequence analyses is of the order of n2, where n is the number of sequences to be considered. One such example is protein sequence clustering, which groups similar proteins into clusters based on their sequence similarities. To address this computational challenging problem, we developed a novel method and published a program, cd-hit (Li et al., 2001, 2002a), which can efficiently handle huge databases. For example, it takes only a few hours to cluster the NCBI-nr with ~3 million proteins on a single high-end workstation.

Since its release, cd-hit has been used by many groups, including Uniprot (Apweiler et al., 2004) and PDB (Bourne et al., 2004), in various research fields. In our group, we applied it to generate non-redundant protein datasets to reduce the database search efforts and to improve the homology detection sensitivity (Li et al., 2002a).

The algorithm behind cd-hit is short word filtering, which can determine that the similarity between two sequences is below a certain value without performing an actual sequence alignment. This algorithm is not limited to protein sequence clustering; it can also be applied to many other analyses that involve a large amount of sequence comparisons.

Here, we present several new programs based on cd-hit algorithm including cd-hit-2d, cd-hit-est and cd-hit-est-2d. Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA database and cd-hit-est-2d compares two nucleotide datasets. The common advantages of these programs are ultrahigh speed and the ability to handle huge databases.


    2 ALGORITHMS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 ALGORITHMS
 3 RESULTS
 REFERENCES
 
2.1 Short word filtering
The details of the algorithm for short word filtering were described in our earlier papers (Li et al., 2001, 2002a). In short, the minimum number of identical short substrings, called ‘words’, such as dipeptides, tripeptides and so on, shared by two proteins is a function of their sequence similarity. We calculated this function by analytical and large-scale statistical analyses. Therefore, we can effectively estimate that the similarity of two sequences is below a certain threshold by simple word counting and without an actual sequence alignment. For nucleotide sequences, we can also obtain such a short word requirement by a similar combination of analytical and statistical analyses.

We implemented this idea using an index table. For instance, the total number of possible pentapeptides is only 215 (each position has 21 possibilities, 20 amino acids plus ‘X’), and such an index table requires only 4 million entries, which just matches the RAM size of current computers. Index tables maximize the speed of short word counting. Details regarding how to choose an appropriate short word are documented in the cd-hit user's guide.

2.2 Cd-hit and cd-hit-est clustering
The original cd-hit program clusters a protein database, and its variant, cd-hit-est, clusters a DNA/RNA database. For eukaryotic genes, long introns can cause long gaps in sequence alignments, which significantly reduces the efficiency of short word filtering. So, practically, cd-hit-est can be applied only for non-intron-containing sequences, such as ESTs.

The clustering algorithm in both cd-hit and cd-hit-est is a greedy incremental clustering algorithm. Briefly, sequences are first sorted in order of decreasing length. The longest sequence becomes the representative of the first cluster. Then, each remaining sequence is compared with the representatives of existing clusters. If the similarity with any representative is above a given threshold, it is grouped into that cluster. Otherwise, a new cluster is defined with that sequence as the representative. For each sequence comparison, short word filtering is applied to the sequences to confirm whether the similarity is below the clustering threshold. If this cannot be confirmed, an actual sequence alignment is performed.

2.3 Cd-hit-2d and cd-hit-est-2d
Program cd-hit-2d compares two protein databases and identifies similar sequences between them above a certain threshold. Cd-hit-est-2d works for two DNA/RNA databases. For the same reason that we mentioned earlier, cd-hit-est-2d is a practical choice only for non-intron-containing sequences.

Given two databases, db1 and db2, cd-hit-2d or cd-hit-est-2d works in a straightforward way. Sequences in db1 are first sorted in order of decreasing length. Then, each sequence in db2 is compared with db1 from the top (the longest one), and if the similarity to any one in db1 is above the threshold, this sequence is attached to the matched one in db1. At the end of comparing, the program reports matches between db1 and db2 and also outputs a list of proteins in db2 that is not similar to any sequence in db1.


    3 RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 ALGORITHMS
 3 RESULTS
 REFERENCES
 
The cd-hit package was written in C++ and was tested on Linux systems. It is distributed as an open source package and can be run on almost all systems that support C++ with little or no modification.

Some example runs are listed in Table 1. All these examples were performed on a Linux workstation with dual 3.0 GHz Xeon processors and 4 GB RAM. The programs used only one processor. For example, cd-hit took <8 h to cluster the NCBI-nr with more than 3.2 million proteins at 90% sequence identity level. Cd-hit-est-2d took a similar amount of time to identify the matches above 95% identity in both strands between human ESTs with ~6 million sequences and ~30 thousand human mRNAs.


View this table:
[in this window]
[in a new window]
 
Table 1 Example runs of cd-hit programs

 
Many options and functions were implemented for the users to control the clustering or comparing process. For example, a useful function is the incremental clustering that offers not only a higher speed but also a stable clustering structure for regularly updated databases. Full set of options are described in the documentation for the program.

In addition to the programs described above, the package contains several utility tools. Some tools help analyze, sort and format the clustering results. One script runs clustering in parallel mode by distributing jobs on a computer cluster (details can be found in the user's guide). The cd-hit package will be under regular maintenance and further development, which will focus on the efficiency at low sequence similarity thresholds. We are also open to adding new functionalities as suggested by users.


    Acknowledgments
 
Funding to pay the Open Access publication charges was provided by the institutional funds of Burnham Institute for Medical Research.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Golan Yona

Received on March 23, 2006; revised on April 20, 2006; accepted on April 20, 2006

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 ALGORITHMS
 3 RESULTS
 REFERENCES
 

    Apweiler, R., et al. (2004) UniProt: the Universal Protein knowledgebase. Nucleic Acids Res, . 32, (Database issue) D115–D119[Abstract/Free Full Text].

    Bourne, P.E., et al. (2004) The distribution and query systems of the RCSB Protein Data Bank. Nucleic Acids Res, . 32, (Database issue) D223–D225[Abstract/Free Full Text].

    Li, W., et al. (2001) Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics, 17, 282–283[Abstract/Free Full Text].

    Li, W., et al. (2002a) Sequence clustering strategies improve remote homology recognitions while reducing search times. Bioinformatics, 15, 643–649.

    Li, W., et al. (2002b) Tolerating some redundancy significantly speeds up clustering of large protein databases. Protein Eng, . 18, 77–82.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Appl. Environ. Microbiol.Home page
P. D. Schloss, S. L. Westcott, T. Ryabin, J. R. Hall, M. Hartmann, E. B. Hollister, R. A. Lesniewski, B. B. Oakley, D. H. Parks, C. J. Robinson, et al.
Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities
Appl. Envir. Microbiol., December 1, 2009; 75(23): 7537 - 7541.
[Abstract] [Full Text] [PDF]


Home page
DNA ResHome page
K. Mochida, T. Yoshida, T. Sakurai, K. Yamaguchi-Shinozaki, K. Shinozaki, and L.-S. P. Tran
In silico Analysis of Transcription Factor Repertoire and Prediction of Stress Responsive Transcription Factors in Soybean
DNA Res, December 1, 2009; 16(6): 353 - 369.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
S. Thomas, S. Karnik, R. S. Barai, V. K. Jayaraman, and S. Idicula-Thomas
CAMP: a useful resource for research on antimicrobial peptides
Nucleic Acids Res., November 18, 2009; (2009) gkp1021v1.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
D. A. Lee, R. Rentzsch, and C. Orengo
GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains
Nucleic Acids Res., November 18, 2009; (2009) gkp1049v1.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
V. K. Sharma, N. Kumar, T. Prakash, and T. D. Taylor
MetaBioME: a database to explore commercially useful enzymes in metagenomic datasets
Nucleic Acids Res., November 11, 2009; (2009) gkp1001v1.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
W. Li, H. McWilliam, A. R. de la Torre, A. Grodowski, I. Benediktovich, M. Goujon, S. Nauche, and R. Lopez
Non-redundant patent sequence databases with value-added annotations at two levels
Nucleic Acids Res., November 1, 2009; (2009) gkp960v1.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
P. Vanhee, J. Reumers, F. Stricher, L. Baeten, L. Serrano, J. Schymkowitz, and F. Rousseau
PepX: a structural database of non-redundant protein-peptide complexes
Nucleic Acids Res., October 30, 2009; (2009) gkp893v1.
[Abstract] [Full Text] [PDF]


Home page
J. Biol. Chem.Home page
M. Giannakis, H. K. Backhed, S. L. Chen, J. J. Faith, M. Wu, J. L. Guruge, L. Engstrand, and J. I. Gordon
Response of Gastric Epithelial Progenitors to Helicobacter pylori Isolates Obtained from Swedish Patients with Chronic Atrophic Gastritis
J. Biol. Chem., October 30, 2009; 284(44): 30383 - 30394.
[Abstract] [Full Text] [PDF]


Home page
Protein Eng Des SelHome page
S. Lata and G.P.S. Raghava
Prediction and classification of chemokines and their receptors
Protein Eng. Des. Sel., July 1, 2009; 22(7): 441 - 444.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
A. Bernsel, H. Viklund, A. Hennerdal, and A. Elofsson
TOPCONS: consensus prediction of membrane protein topology
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W465 - W468.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
A. Pavelka, E. Chovancova, and J. Damborsky
HotSpot Wizard: a web server for identification of hot spots in protein engineering
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W376 - W383.
[Abstract] [Full Text] [PDF]


Home page
Plant Physiol.Home page
K. Mochida, T. Yoshida, T. Sakurai, Y. Ogihara, and K. Shinozaki
TriFLDB: A Database of Clustered Full-Length Coding Sequences from Triticeae with Applications to Comparative Grass Genomics
Plant Physiology, July 1, 2009; 150(3): 1135 - 1146.
[Abstract] [Full Text] [PDF]


Home page
J. Biol. Chem.Home page
A. Hernandez, M. J. Mate, P. C. Sanchez-Diaz, A. Romero, F. Rojo, and J. L. Martinez
Structural and Functional Analysis of SmeT, the Repressor of the Stenotrophomonas maltophilia Multidrug Efflux Pump SmeDEF
J. Biol. Chem., May 22, 2009; 284(21): 14428 - 14438.
[Abstract] [Full Text] [PDF]


Home page
Brief BioinformHome page
B. Andreopoulos, A. An, X. Wang, and M. Schroeder
A roadmap of clustering algorithms: finding a match for a biomedical application
Brief Bioinform, May 1, 2009; 10(3): 297 - 314.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
L. Childs, Z. Nikoloski, P. May, and D. Walther
Identification and classification of ncRNA molecules using graph properties
Nucleic Acids Res., May 1, 2009; 37(9): e66 - e66.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
A. Lo, Y.-Y. Chiu, E. A. Rodland, P.-C. Lyu, T.-Y. Sung, and W.-L. Hsu
Predicting helix-helix interactions from residue contacts in membrane proteins
Bioinformatics, April 15, 2009; 25(8): 996 - 1003.
[Abstract] [Full Text] [PDF]


Home page
Plant CellHome page
Y. Okazaki, M. Shimojima, Y. Sawada, K. Toyooka, T. Narisawa, K. Mochida, H. Tanaka, F. Matsuda, A. Hirai, M. Y. Hirai, et al.
A Chloroplastic UDP-Glucose Pyrophosphorylase from Arabidopsis Is the Committed Enzyme for the First Step of Sulfolipid Biosynthesis
PLANT CELL, March 1, 2009; 21(3): 892 - 909.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
S. Gunther, J. von Eichborn, P. May, and R. Preissner
JAIL: a structure-based interface library for macromolecules
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D338 - D341.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
D. Gerlach, E. V. Kriventseva, N. Rahman, C. E. Vejnar, and E. M. Zdobnov
miROrtho: computational survey of microRNA genes
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D111 - D117.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
O. Goldenberg, E. Erez, G. Nimrod, and N. Ben-Tal
The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D323 - D327.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
I. Letunic, T. Doerks, and P. Bork
SMART 6: recent updates and new developments
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D229 - D232.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
M. H. Saier Jr, M. R. Yen, K. Noto, D. G. Tamang, and C. Elkan
The Transporter Classification Database: recent advances
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D274 - D278.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
A. J. Bordner
Predicting small ligand binding sites in proteins using backbone structure
Bioinformatics, December 15, 2008; 24(24): 2865 - 2871.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
L. Holm, S. Kaariainen, P. Rosenstrom, and A. Schenkel
Searching protein structure databases with DaliLite v.3
Bioinformatics, December 1, 2008; 24(23): 2780 - 2781.
[Abstract] [Full Text] [PDF]


Home page
Protein Eng Des SelHome page
J. Ren, L. Wen, X. Gao, C. Jin, Y. Xue, and X. Yao
CSS-Palm 2.0: an updated software for palmitoylation sites prediction
Protein Eng. Des. Sel., November 1, 2008; 21(11): 639 - 644.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
T. Zhang, H. Zhang, K. Chen, S. Shen, J. Ruan, and L. Kurgan
Accurate sequence-based prediction of catalytic residues
Bioinformatics, October 15, 2008; 24(20): 2329 - 2338.
[Abstract] [Full Text] [PDF]


Home page
Sci SignalHome page
M. L. Miller, L. J. Jensen, F. Diella, C. Jorgensen, M. Tinti, L. Li, M. Hsiung, S. A. Parker, J. Bordeaux, T. Sicheritz-Ponten, et al.
Linear Motif Atlas for Phosphorylation-Dependent Signaling
Sci. Signal., September 2, 2008; 1(35): ra2 - ra2.
[Abstract] [Full Text] [PDF]


Home page
Plant CellHome page
M. Watanabe, K. Mochida, T. Kato, S. Tabata, N. Yoshimoto, M. Noji, and K. Saito
Comparative Genomics and Reverse Genetics Analysis Reveal Indispensable Functions of the Serine Acetyltransferase Gene Family in Arabidopsis
PLANT CELL, September 1, 2008; 20(9): 2484 - 2496.
[Abstract] [Full Text] [PDF]


Home page
Phil Trans R Soc BHome page
F. R. Tabita, T. E Hanson, S. Satagopan, B. H Witte, and N. E Kreel
Phylogenetic and evolutionary relationships of RubisCO and the RubisCO-like proteins and the functional lessons provided by diverse molecular forms
Phil Trans R Soc B, August 27, 2008; 363(1504): 2629 - 2640.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
D. P. Brown
Efficient functional clustering of protein sequences using the Dirichlet process
Bioinformatics, August 15, 2008; 24(16): 1765 - 1771.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
E. Capriotti and M. A. Marti-Renom
RNA structure alignment by a unit-vector approach
Bioinformatics, August 15, 2008; 24(16): i112 - i118.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
C. S. Miller and D. Eisenberg
Using inferred residue contacts to distinguish between correct and incorrect protein models
Bioinformatics, July 15, 2008; 24(14): 1575 - 1582.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
M. Gao and J. Skolnick
DBD-Hunter: a knowledge-based method for the prediction of DNA-protein interactions
Nucleic Acids Res., July 1, 2008; 36(12): 3978 - 3992.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
A. Bernsel, H. Viklund, J. Falk, E. Lindahl, G. von Heijne, and A. Elofsson
Prediction of membrane-protein topology from first principles
PNAS, May 20, 2008; 105(20): 7177 - 7181.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
Y. Park and V. Helms
Prediction of the translocon-mediated membrane insertion free energies of protein sequences
Bioinformatics, May 15, 2008; 24(10): 1271 - 1277.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
P. May, S. Wienkoop, S. Kempa, B. Usadel, N. Christian, J. Rupprecht, J. Weiss, L. Recuenco-Munoz, O. Ebenhoh, W. Weckwerth, et al.
Metabolomics- and Proteomics-Assisted Genome Annotation and Analysis of the Draft Metabolic Network of Chlamydomonas reinhardtii
Genetics, May 1, 2008; 179(1): 157 - 166.
[Abstract] [Full Text] [PDF]


Home page
Protein Eng Des SelHome page
Y.-R. Tang, Z.-Y. Sheng, Y.-Z. Chen, and Z. Zhang
An improved prediction of catalytic residues in enzyme structures
Protein Eng. Des. Sel., May 1, 2008; 21(5): 295 - 302.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
X. Xu, J. Wu, J. Xiao, Y. Tan, Q. Bao, F. Zhao, and X. Li
PlasmoGF: an integrated system for comparative genomics and phylogenetic analysis of Plasmodium gene families
Bioinformatics, May 1, 2008; 24(9): 1217 - 1220.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
C. G. Artieri, W. Haerty, B. P. Gupta, and R. S. Singh
Sexual Selection and Maintenance of Sex: Evidence from Comparisons of Rates of Genomic Accumulation of Mutations and Divergence of Sex-Related Genes in Sexual and Hermaphroditic Species of Caenorhabditis
Mol. Biol. Evol., May 1, 2008; 25(5): 972 - 979.
[Abstract] [Full Text] [PDF]


Home page
NEJMHome page
G. Palacios, J. Druce, L. Du, T. Tran, C. Birch, T. Briese, S. Conlan, P.-L. Quan, J. Hui, J. Marshall, et al.
A New Arenavirus in a Cluster of Fatal Transplant-Associated Diseases
N. Engl. J. Med., March 6, 2008; 358(10): 991 - 998.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
R. Rubinstein and A. Fiser
Predicting disulfide bond connectivity in proteins by correlated mutations analysis
Bioinformatics, February 15, 2008; 24(4): 498 - 504.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
O. J. Jabado, Y. Liu, S. Conlan, P. L. Quan, H. Hegyi, Y. Lussier, T. Briese, G. Palacios, and W. I. Lipkin
Comprehensive viral oligonucleotide probe design using conserved protein regions
Nucleic Acids Res., January 17, 2008; 36(1): e3 - e3.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
E. V. Kriventseva, N. Rahman, O. Espinosa, and E. M. Zdobnov
OrthoDB: the hierarchical catalog of eukaryotic orthologs
Nucleic Acids Res., January 11, 2008; 36(suppl_1): D271 - D275.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
A. M. R. Davila, P. N. Mendes, G. Wagner, D. A. Tschoeke, R. R. C. Cuadrat, F. Liberman, L. Matos, T. Satake, K. A. C. S. Ocana, O. Triana, et al.
ProtozoaDB: dynamic visualization and exploration of protozoan genomes
Nucleic Acids Res., January 11, 2008; 36(suppl_1): D547 - D552.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
C. Yeats, J. Lees, A. Reid, P. Kellam, N. Martin, X. Liu, and C. Orengo
Gene3D: comprehensive structural and functional annotation of genomes
Nucleic Acids Res., January 11, 2008; 36(suppl_1): D414 - D418.
[Abstract] [Full Text] [PDF]


Home page
J. Bacteriol.Home page
S. Moslavac, K. Nicolaisen, O. Mirus, F. Al Dehni, R. Pernil, E. Flores, I. Maldener, and E. Schleiff
A TolC-Like Protein Is Required for Heterocyst Development in Anabaena sp. Strain PCC 7120
J. Bacteriol., November 1, 2007; 189(21): 7887 - 7895.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
K. Chen and L. Kurgan
PFRES: protein fold classification by using evolutionary information and predicted secondary structure
Bioinformatics, November 1, 2007; 23(21): 2843 - 2850.
[Abstract] [Full Text] [PDF]


Home page
J. Biol. Chem.Home page
M. Islinger, G. H. Luers, K. W. Li, M. Loos, and A. Volkl
Rat Liver Peroxisomes after Fibrate Treatment: A SURVEY USING QUANTITATIVE MASS SPECTROMETRY
J. Biol. Chem., August 10, 2007; 282(32): 23055 - 23069.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
C. A. Innis
siteFiNDER|3D: a web-based tool for predicting the location of functional sites in proteins
Nucleic Acids Res., July 13, 2007; 35(suppl_2): W489 - W494.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
G. Lopez, A. Valencia, and M. L. Tress
firestar--prediction of functionally important residues using structural templates and alignment reliability
Nucleic Acids Res., July 13, 2007; 35(suppl_2): W573 - W577.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
B. E. Suzek, H. Huang, P. McGarvey, R. Mazumder, and C. H. Wu
UniRef: comprehensive and non-redundant UniProt reference clusters
Bioinformatics, May 15, 2007; 23(10): 1282 - 1288.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
S. Feng and E. R.M. Tillier
A fast and flexible approach to oligonucleotide probe design for genomes and gene families
Bioinformatics, May 15, 2007; 23(10): 1195 - 1202.
[Abstract] [Full Text] [PDF]


Home page
RNAHome page
S. NG Kwang Loong and S. K. Mishra
Unique folding of precursor microRNAs: Quantitative evidence and implications for de novo identification
RNA, February 1, 2007; 13(2): 170 - 187.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
G. Lopez, A. Valencia, and M. Tress
FireDB--a database of functionally important residues from proteins of known structure
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D219 - D223.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrowOA All Versions of this Article:
22/13/1658    most recent
btl158v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (159)
Google Scholar
Right arrow Articles by Li, W.
Right arrow Articles by Godzik, A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Li, W.
Right arrow Articles by Godzik, A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?