Bioinformatics Vol. 19 no. 1 2003
Pages 30-36
© 2003 Oxford University Press
Representation of DNA sequences with virtual potentials and their processing by (SEQREP) Kohonen self-organizing maps
1 Departamento de Química, CQFB,
campus Faculdade de Ciências e Tecnologia,
Universidade Nova de Lisboa, Quinta da Torre, 2829-516 Monte de Caparica
2 Clínica Universitária de Pediatria,
Hospital de Santa Maria, Av. Prof. Egas Moniz, 1699 Lisboa Codex,
Portugal
Received on May 29, 2002; revised on July 22, 2002; accepted on July 25, 2002
Motivation: We propose representing individual positions in DNA sequences by virtual potentials generated by the other bases of the same sequence. This gives a compact representation of the neighbourhood of a base. The distribution of the virtual potentials over the whole sequence can then be used as a representation of the entire sequence (SEQREP code). The code is flexible: its length is independent of the sequence size, it requires no prior alignment, and it is convenient for processing by neural networks or statistical techniques.
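The idea of a per-position potential summarised as a fixed-length distribution can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything below is an assumption made for illustration only: the inverse-distance potential, the `window` and `bins` parameters, and the function names `virtual_potentials` and `seqrep_like_code` are hypothetical and are not taken from the SEQREP software.

```python
BASES = "ACGT"


def virtual_potentials(seq, window=10):
    """Return, for each position, one potential per base type (A, C, G, T).

    Assumed form (not from the paper): each neighbouring base within
    `window` positions contributes 1/distance to the potential of its type.
    """
    seq = seq.upper()
    n = len(seq)
    result = []
    for i in range(n):
        pot = dict.fromkeys(BASES, 0.0)
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i and seq[j] in pot:
                pot[seq[j]] += 1.0 / abs(i - j)
        result.append(tuple(pot[b] for b in BASES))
    return result


def seqrep_like_code(seq, window=10, bins=5):
    """Summarise a sequence as a vector of fixed length 4 * bins.

    For each base type, the potentials observed over all positions are
    binned into a normalised histogram, so the code length does not
    depend on the sequence length and no alignment is needed.
    """
    pots = virtual_potentials(seq, window)
    if not pots:
        return [0.0] * (len(BASES) * bins)
    # Largest potential reachable under the assumed formula.
    vmax = 2.0 * sum(1.0 / d for d in range(1, window + 1))
    code = []
    for k in range(len(BASES)):
        hist = [0] * bins
        for p in pots:
            idx = min(int(p[k] / vmax * bins), bins - 1)
            hist[idx] += 1
        code.extend(h / len(pots) for h in hist)
    return code
```

Under these assumptions, `seqrep_like_code("ACGTACGTAC")` and `seqrep_like_code("ACGT" * 50)` both return vectors of 20 values, which is the property that makes such a code directly usable as neural network input.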
Results: To evaluate its biological significance, the SEQREP code was used for training Kohonen self-organizing maps (SOMs) in two applications: (a) detection of Alu sequences, and (b) classification of sequences encoding the HIV-1 envelope glycoprotein (env) into subtypes A-G. The SOMs clustered sequences belonging to different classes into distinct regions of the map. For independent test sets, the rates of correct predictions were high (97% in the first application, 91% in the second). Possible areas of application of SEQREP codes include functional genomics, phylogenetic analysis, detection of repetitions, database retrieval, and automatic alignment.
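How a Kohonen SOM places similar input vectors in nearby map regions can be sketched with a generic textbook implementation. This is not the authors' software: the map size, learning-rate schedule, Gaussian neighbourhood, and the names `train_som` and `best_matching_unit` are assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the neuron whose weights are closest to x."""
    return min(
        weights,
        key=lambda rc: sum((wk - xk) ** 2 for wk, xk in zip(weights[rc], x)),
    )


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)    # shrinking neighbourhood
        for x in rng.sample(data, len(data)):        # shuffled presentation
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights
```

In a setting like the one described, each input vector would be a sequence code, and a trained map could be inspected by checking which neurons win for sequences of each known class; the labelling and prediction steps are omitted here.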
Availability: Software for representing sequences by SEQREP code and for training Kohonen SOMs is freely available from http://www.dq.fct.unl.pt/qoa/jas/seqrep
Contact: jas{at}fct.unl.pt
Supplementary Information: Supplementary material is available at http://www.dq.fct.unl.pt/qoa/jas/seqrep/bioinf2002
* To whom correspondence should be addressed.