Benchmark data set for in silico prediction of Ames mutagenicity

Katja Hansen, Sebastian Mika, Timon Schroeter, Andreas Sutter, Antonius ter Laak, Thomas Steger-Hartmann, Nikolaus Heinrich, Klaus-Robert Müller

Research output: Contribution to journal, Article, peer-review

219 Citations (Scopus)


Up to now, publicly available data sets to build and evaluate Ames mutagenicity prediction tools have been very limited in terms of size and chemical space covered. In this report we describe a new unique public Ames mutagenicity data set comprising about 6500 nonconfidential compounds (available as SMILES strings and SDF) together with their biological activity. Three commercial tools (DEREK, MultiCASE, and an off-the-shelf Bayesian machine learner in Pipeline Pilot) are compared with four noncommercial machine learning implementations (Support Vector Machines, Random Forests, k-Nearest Neighbors, and Gaussian Processes) on the new benchmark data set.

Original language: English
Pages (from-to): 2077-2081
Number of pages: 5
Journal: Journal of Chemical Information and Modeling
Issue number: 9
Publication status: Published - 2009 Sep 28

ASJC Scopus subject areas

  • Chemistry (all)
  • Chemical Engineering (all)
  • Computer Science Applications
  • Library and Information Sciences

