Bioinformatics Vol. 19 no. 10 2003
Pages 1208-1215
© 2003 Oxford University Press
Characteristic substructures and properties in chemical carcinogens studied by the cascade model
Center for Information & Media Studies, Kwansei Gakuin University, 1-1-155 Uegahara, Nishinomiya, 662-8501, Japan
Received on December 29, 2001; revised on October 31, 2002; accepted on November 6, 2002
Motivation: Chemical carcinogenicity is an important subject in health and environmental sciences, and a reliable method for identifying the factors characteristic of carcinogenicity is needed. The Predictive Toxicology Challenge (PTC) 2000-2001 provided an opportunity to evaluate the performance of various data mining methods. The cascade model, a data mining method developed by the author, can mine local correlations in data sets with a large number of attributes. This paper explores the effectiveness of the method on the problem of chemical carcinogenicity.
Results: The rodent carcinogenicity of 417 compounds examined by the National Toxicology Program (NTP) was used as the training set. Analysis by the cascade model yielded, for example, the rule 'Highly flexible molecules are carcinogenic, if they have no hydrogen bond acceptors in halogenated alkanes and alkenes'. The resulting rules were applied to predict the activity of 185 compounds examined by the FDA. The ROC analysis performed by the PTC organizers showed that the method has excellent predictive power for the female rat data.
Availability: The DISCAS 2.1 binary program and sample input data sets for Windows PCs are available at http://www.clab.kwansei.ac.jp/mining/discas/discas.html
Supplementary information: A summary of the prediction results and cross-validations is accessible via http://www.clab.kwansei.ac.jp/~okada/BIJ/BIJsupple.htm. The rules used and the prediction results for each molecule are also provided.
Contact: okada-office{at}ksc.kwansei.ac.jp