Bioinformatics Vol. 17 no. 3 2001
Pages 282-283
© 2001 Oxford University Press


Applications Note

Clustering of highly homologous sequences to reduce the size of large protein databases

Weizhong Li 1, Lukasz Jaroszewski 2 and Adam Godzik 2,*

1 San Diego Supercomputer Center, La Jolla, CA 92093, USA
2 The Burnham Institute, La Jolla, CA 92037, USA

Received on October 4, 2000; revised on November 1, 2000; accepted on November 6, 2000

Summary: We present a fast and flexible program for clustering large protein databases at different sequence identity levels. It takes less than 2 h for the all-against-all sequence comparison and clustering of the non-redundant protein database of over 560000 sequences on a high-end PC. The output database, including only the representative sequences, can be used for more efficient and sensitive database searches.
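The idea of reducing a database to representative sequences at a chosen identity level can be illustrated with a minimal sketch. It assumes a simple greedy scheme (longest sequences first, each sequence joining the first representative it matches) and a naive, alignment-free identity measure. The function names `naive_identity` and `cluster_representatives` and the 0.9 threshold are hypothetical and chosen for illustration only; this is not the program's actual implementation, which the summary does not describe.

```python
from typing import Callable, List, Tuple


def naive_identity(a: str, b: str) -> float:
    """Fraction of identical positions over the shorter sequence.

    Assumption: a stand-in for a real alignment-based identity score;
    positions are compared directly, with no gaps or alignment.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def cluster_representatives(
    sequences: List[str],
    threshold: float = 0.9,
    identity: Callable[[str, str], float] = naive_identity,
) -> Tuple[List[str], List[List[str]]]:
    """Greedy clustering sketch: return (representatives, clusters)."""
    # Process longest sequences first, so each representative is the
    # longest member of its cluster.
    ordered = sorted(sequences, key=len, reverse=True)
    representatives: List[str] = []
    clusters: List[List[str]] = []
    for seq in ordered:
        for i, rep in enumerate(representatives):
            if identity(rep, seq) >= threshold:
                clusters[i].append(seq)  # redundant: joins an existing cluster
                break
        else:
            # No representative is similar enough: start a new cluster.
            representatives.append(seq)
            clusters.append([seq])
    return representatives, clusters


if __name__ == "__main__":
    toy_db = ["MKTAYIAKQR", "MKTAYIAKQ", "GGGGGGGGGG", "MKTAYIAKQRQISFVK"]
    reps, groups = cluster_representatives(toy_db, threshold=0.9)
    print(reps)
    print(groups)
```

In this sketch only the list of representatives would be written to the reduced database; lowering the threshold merges more sequences into each cluster and shrinks the output further.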

Availability: The program is available from http://bioinformatics.burnham-inst.org/cd-hi

Contact: liwz@sdsc.edu or adam@burnham-inst.org

* To whom correspondence should be addressed.

