Bioinformatics Vol. 19 no. 11 2003
Pages 1391-1396
© 2003 Oxford University Press
Alignment of BLAST high-scoring segment pairs based on the longest increasing subsequence algorithm
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA
Received on September 16, 2002; revised on January 2, 2003; accepted on February 14, 2003
Motivation: The popular BLAST algorithm is based on a local similarity search strategy, so its high-scoring segment pairs (HSPs) carry no global alignment information. When scientists use BLAST to search for a target protein or DNA sequence in a huge database such as the human genome map, repeated fragments, homologues and pseudogenes in the genome often fill the BLAST result with redundant HSPs. A computational strategy is therefore needed to alleviate this problem.
Results: In the gene discovery group of Celera Genomics, I developed a two-step method, i.e. a BLAST step followed by an LIS step, to align thousands of cDNA and protein sequences to the human genome map. The LIS step is based on a mature computational algorithm, the Longest Increasing Subsequence (LIS) algorithm. The idea is to use the LIS algorithm to find the longest series of consecutive HSPs in the BLAST output. This BLAST+LIS strategy can be used as an independent alignment tool or as a complementary tool for other alignment programs such as Sim4 and GenWise. It can also serve as a general-purpose BLAST result processor for all sorts of BLAST searches. Two examples from Celera are shown in this paper.
Contact: me{at}hongyu.org
* Present address: Ceres Inc., 3007 Malibu Canyon Road, Malibu, CA 90265, USA.
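The LIS step described in the Results can be illustrated with a minimal sketch. It is not the author's implementation: it assumes that "consecutive HSPs" means HSPs whose order along the query is matched by strictly increasing start positions on the subject, it ignores strand and HSP overlap, and the `HSP` record and the function name `longest_increasing_hsp_chain` are hypothetical names introduced only for this example.

```python
from bisect import bisect_left
from typing import List, NamedTuple


class HSP(NamedTuple):
    """Hypothetical, simplified HSP record (not a BLAST output format)."""
    q_start: int   # start position on the query (cDNA or protein)
    s_start: int   # start position on the subject (genome)
    score: float   # BLAST score, carried along but not used for chaining


def longest_increasing_hsp_chain(hsps: List[HSP]) -> List[HSP]:
    """Return one longest chain of HSPs that is collinear in query and subject.

    HSPs are ordered by query start, then a standard O(n log n) longest
    increasing subsequence is computed over their subject starts.
    """
    # Ties on q_start are sorted by descending s_start so that two HSPs
    # sharing a query start can never both enter the same chain.
    ordered = sorted(hsps, key=lambda h: (h.q_start, -h.s_start))

    tail_vals: List[int] = []  # smallest s_start ending a chain of length k+1
    tail_idx: List[int] = []   # index in `ordered` of the HSP holding that tail
    prev = [-1] * len(ordered)  # back-pointers for reconstructing the chain

    for i, h in enumerate(ordered):
        k = bisect_left(tail_vals, h.s_start)  # strict increase
        if k == len(tail_vals):
            tail_vals.append(h.s_start)
            tail_idx.append(i)
        else:
            tail_vals[k] = h.s_start
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1

    # Walk the back-pointers from the end of the longest chain.
    chain: List[HSP] = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    chain.reverse()
    return chain


# Made-up coordinates: two HSPs fall out of order on the subject
# (as a repeat or pseudogene hit might) and are left out of the chain.
example = [
    HSP(1, 100, 50.0),
    HSP(40, 5000, 30.0),
    HSP(60, 180, 45.0),
    HSP(120, 260, 60.0),
    HSP(150, 90, 20.0),
]
chain = longest_increasing_hsp_chain(example)
```

In this sketch every HSP counts equally, so the chain maximizes the number of HSPs; a practical variant might instead weight HSPs by score or length, which would require a different chaining algorithm than plain LIS.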


