Bioinformatics Advance Access originally published online on December 20, 2008
Bioinformatics 2009 25(7):845-852; doi:10.1093/bioinformatics/btn649
The analysis of inconsistencies between cytogenetic annotations and sequence mapping by defining the imprecision zones of cytogenetic banding
1Department of Computer Science and Information Engineering, 2Institute of Molecular Medicine, National Cheng Kung University, No.1 Da-Shueh Road and 3Department of Pathology, National Cheng Kung University Hospital, No.138 Sheng-Li Road, Tainan, Taiwan
*To whom correspondence should be addressed.
Abstract
Motivation: In current databases, many genes have cytogenetic annotations that are inconsistent with their sequence map positions. However, not all inconsistencies are alike. Some are likely errors that should be corrected, while others may result from the inherently imprecise nature of chromosomal banding and may be tolerable. It is therefore important to stratify cytogenetic position information into confidence groups that take the imprecision of cytogenetic banding into account.
Results: When cytogenetic annotations are plotted against sequence map positions on a 2D plane, genes with consistent positions tend to form a compact linear distribution, while genes with inconsistent positions are more scattered. The areas where these two groups overlap are defined as the tolerable imprecision zones by linear regression and distance analysis. The system was implemented using sequence information from NCBI Map Viewer Build 36.3 and cytogenetic annotations from NCBI Entrez Gene. Each gene's position information is classified into one of five confidence groups: inconsistent-intolerable, inconsistent-tolerable, consistent-imprecise, consistent-precise and consistent-rough. With NCBI Map Viewer Build 36.3 and NCBI Entrez Gene, the percentages of these groups are 1.4%, 7.0%, 54.0%, 35.4% and 2.2%, respectively. With NCBI Map Viewer Build 36.3 and NCBI Online Mendelian Inheritance in Man (OMIM), the percentages are 3.7%, 16.9%, 49.0%, 19.0% and 11.4%, respectively. The two results were combined to construct a confidence table of gene position information.
Availability: The detailed results are accessible over the Internet at http://centrallab.hosp.ncku.edu.tw/imz.
Contact: clh9{at}mail.ncku.edu.tw
Associate Editor: John Quackenbush
Received on August 28, 2008; revised on December 11, 2008; accepted on December 16, 2008
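
The Results paragraph of the abstract mentions linear regression and distance analysis on a 2D plot of cytogenetic annotation against sequence map position. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes that each cytogenetic band has already been converted to a numeric coordinate (here in Mb), and all gene names, coordinates and the `tolerance` value are hypothetical. It fits an ordinary least-squares line through genes assumed to be consistent, then labels other genes by their perpendicular distance from that line.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def distance_to_line(x, y, slope, intercept):
    """Perpendicular distance from the point (x, y) to y = slope * x + intercept."""
    return abs(slope * x - y + intercept) / math.hypot(slope, 1.0)


def label_genes(genes, slope, intercept, tolerance):
    """Label each (name, x, y) gene by whether it lies within `tolerance` of the line."""
    labels = {}
    for name, x, y in genes:
        d = distance_to_line(x, y, slope, intercept)
        labels[name] = "within_tolerance" if d <= tolerance else "outside_tolerance"
    return labels


# Hypothetical genes assumed to be consistent: (sequence position, cytogenetic coordinate), both in Mb.
consistent = [(10.0, 10.5), (20.0, 19.5), (30.0, 30.5), (40.0, 39.5)]
slope, intercept = fit_line([x for x, _ in consistent], [y for _, y in consistent])

# Hypothetical genes to assess against the fitted line.
candidates = [("GENE_A", 25.0, 26.0), ("GENE_B", 25.0, 45.0)]
labels = label_genes(candidates, slope, intercept, tolerance=2.0)  # tolerance is an arbitrary choice

if __name__ == "__main__":
    for name, label in labels.items():
        print(name, label)
```

In this toy data, `GENE_A` lies close to the fitted line and `GENE_B` lies far from it, so only `GENE_B` is labelled `outside_tolerance`. The two labels are illustrative only and do not correspond to the five confidence groups in the abstract, whose boundaries are not stated there.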