Bioinformatics Vol. 18 no. 1 2002
Pages 83-91
© 2002 Oxford University Press
Improved database searches for orthologous sequences by conditioning on outgroup sequences*
Department of Clinical Pharmacology, Royal College of Surgeons in Ireland, 123 Stephens Green, Dublin 2, Ireland
Received on December 8, 2000; revised on July 18, 2001; accepted on August 15, 2001
Motivation: Searches of biological sequence databases are usually focussed on distinguishing significant from random matches. However, the increasing abundance of related sequences in databases presents a second challenge: to distinguish the evolutionarily most closely related sequences (often orthologues) from more distantly related homologues. This is particularly important when searching a database of partial sequences, where short orthologous sequences from a non-conserved region will score much more poorly than non-orthologous (outgroup) sequences from a conserved region.
Results: Such inferences are shown to be improved by conditioning the search results on the scores of an outgroup sequence. The log-odds score of the outgroup sequence is subtracted from the log-odds score of each target sequence identified in the database. A test group of Caenorhabditis elegans kinase sequences and their identified C.elegans outgroups were searched against a test database of human Expressed Sequence Tag (EST) sequences, where the sets of true target sequences were known in advance. The outgroup-conditioned method identified 58% more true positives ahead of the first false positive than the straightforward search without an outgroup. In a test dataset of 151 proteins drawn from the C.elegans genome, where the putative outgroup was assigned automatically, outgroup conditioning similarly identified 50% more true positives. Thus, outgroup conditioning provides a means to improve the results of database searching with little increase in the search computation time.
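The scoring and evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' Perl implementation. It assumes that each target has one log-odds score against the query and one against the outgroup, and that a target with no outgroup hit receives an outgroup score of 0; the function names, the `missing_outgroup_score` parameter and the toy EST identifiers and scores are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def outgroup_conditioned_scores(
    query_scores: Dict[str, float],
    outgroup_scores: Dict[str, float],
    missing_outgroup_score: float = 0.0,  # assumption: no outgroup hit scores 0
) -> List[Tuple[str, float]]:
    """Subtract the outgroup log-odds score from the query log-odds score
    for every target, and return targets ranked by the difference."""
    conditioned = [
        (target, score - outgroup_scores.get(target, missing_outgroup_score))
        for target, score in query_scores.items()
    ]
    conditioned.sort(key=lambda item: item[1], reverse=True)
    return conditioned


def true_positives_before_first_false(
    ranked_targets: Iterable[str], true_targets: Iterable[str]
) -> int:
    """Count true targets ranked ahead of the first false positive."""
    true_set = set(true_targets)
    count = 0
    for target in ranked_targets:
        if target not in true_set:
            break
        count += 1
    return count


# Toy data (invented for illustration only).
# est_A: orthologue covering a conserved region
# est_B: short orthologue from a non-conserved region
# est_C: non-orthologous homologue from a conserved region
query_scores = {"est_A": 60.0, "est_B": 20.0, "est_C": 45.0}
outgroup_scores = {"est_A": 30.0, "est_B": 5.0, "est_C": 50.0}
true_targets = {"est_A", "est_B"}

# Plain search: rank by the query score alone.
raw_ranking = sorted(query_scores, key=query_scores.get, reverse=True)
# Outgroup-conditioned search: rank by the score difference.
ocs_ranking = [t for t, _ in outgroup_conditioned_scores(query_scores, outgroup_scores)]

raw_hits = true_positives_before_first_false(raw_ranking, true_targets)
ocs_hits = true_positives_before_first_false(ocs_ranking, true_targets)
print(raw_ranking, raw_hits)
print(ocs_ranking, ocs_hits)
```

In this toy case the short orthologue `est_B` ranks below the conserved-region homologue `est_C` on raw score, but above it once the outgroup score is subtracted, which is the situation the Motivation describes for partial sequences.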
Availability: Perl scripts for the Outgroup Conditioned Score (OCS) method are available without charge for non-profit academic use from http://www.bioinf.org/vibe/software/OCS/. Scripts have been optimized for Linux or OSF with a Perl v5 interpreter.
Contact: dshields{at}rcsi.ie.
* Non-standard abbreviations: OCS: Outgroup Conditioned Score, HOCS: Heuristic Outgroup Conditioned Score, HOCP: Heuristic Outgroup Conditioned P-value ratio, EST: Expressed Sequence Tag.
1 To whom correspondence should be addressed.