Bioinformatics Advance Access originally published online on September 30, 2004
Bioinformatics 2005 21(5):608-616; doi:10.1093/bioinformatics/bti050
Biological sequence analysis through the one-dimensional percolation transform and its enhanced version
Cybernetic Vision Research Group, GII-IFSC, Universidade de São Paulo, São Carlos, SP, Caixa Postal 369, 13560-970, Brazil
Motivation: The spatial uniformity (or lack thereof) of symbols in biological sequences needs to be characterized, given its implications for identifying the properties of the structures associated with those sequences.
Methods: A one-dimensional version of a recently introduced percolation-based approach is presented, which allows the accurate quantification of symbol distributions even in the presence of co-existing densities. An enhanced version of this methodology, which uses an agglomerative process to organize the sequence hierarchically into subsequences, is also proposed and illustrated.
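As a rough illustration of the idea behind a percolation-style analysis in one dimension, the sketch below grows every occurrence of a chosen symbol and records when neighbouring occurrences merge. It is a minimal sketch under stated assumptions, not the transform defined in the paper: the function names `merge_histogram` and `cluster_counts` are hypothetical, and the merging scale is simply taken to be the gap between consecutive occurrences. Under these assumptions, a uniformly spaced symbol yields a single merging scale, whereas co-existing densities appear as several distinct scales.

```python
from collections import Counter


def symbol_gaps(sequence, symbol):
    """Return the occurrence positions of `symbol` and the gaps between neighbours."""
    positions = [i for i, s in enumerate(sequence) if s == symbol]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return positions, gaps


def merge_histogram(sequence, symbol):
    """Count how many neighbouring occurrences merge at each scale.

    Assumption: two consecutive occurrences merge at a scale equal to
    the gap between them (illustrative choice, not the paper's definition).
    """
    _, gaps = symbol_gaps(sequence, symbol)
    return dict(Counter(gaps))


def cluster_counts(sequence, symbol):
    """Number of remaining clusters at each scale d = 1, 2, ..., largest gap."""
    positions, gaps = symbol_gaps(sequence, symbol)
    gaps.sort()
    counts = []
    merged = 0  # number of gaps already closed at the current scale
    for d in range(1, (gaps[-1] if gaps else 0) + 1):
        while merged < len(gaps) and gaps[merged] <= d:
            merged += 1
        counts.append(len(positions) - merged)
    return counts


if __name__ == "__main__":
    # Toy sequence: two dense groups of 'A' separated by a long empty stretch.
    seq = "AXAXAXXXXXXAXA"
    print(merge_histogram(seq, "A"))
    print(cluster_counts(seq, "A"))
```

In the toy sequence, the occurrences of A inside each group merge at a small scale, and the two groups merge only at a much larger one, which is the kind of multiscale signature that a merging-based measure is meant to expose.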
Results: The potential of the proposed methodology is illustrated with respect to synthetic and real data (1881 zebrafish and 1200 Xenopus proteins) and compared to two alternative multiscale methodologies, with encouraging results, including the possibility of identifying particularly remarkable amino acid arrangements in proteins.
Contact: luciano{at}if.sc.usp.br