Benchmark data set for in silico prediction of Ames mutagenicity

Katja Hansen, Sebastian Mika, Timon Schroeter, Andreas Sutter, Antonius ter Laak, Thomas Steger-Hartmann, Nikolaus Heinrich, Klaus-Robert Müller

Research output: Contribution to journal (article)

161 Citations (Scopus)

Abstract

Up to now, publicly available data sets to build and evaluate Ames mutagenicity prediction tools have been very limited in terms of size and chemical space covered. In this report we describe a new unique public Ames mutagenicity data set comprising about 6500 nonconfidential compounds (available as SMILES strings and SDF) together with their biological activity. Three commercial tools (DEREK, MultiCASE, and an off-the-shelf Bayesian machine learner in Pipeline Pilot) are compared with four noncommercial machine learning implementations (Support Vector Machines, Random Forests, k-Nearest Neighbors, and Gaussian Processes) on the new benchmark data set.

Original language: English
Pages (from-to): 2077-2081
Number of pages: 5
Journal: Journal of Chemical Information and Modeling
Volume: 49
Issue number: 9
DOI: https://doi.org/10.1021/ci900161g
Publication status: Published - 2009 Sep 28
Externally published: Yes

Fingerprint

Bioactivity
Support vector machines
Learning systems
Pipelines
learning

ASJC Scopus subject areas

  • Chemistry(all)
  • Chemical Engineering(all)
  • Computer Science Applications
  • Library and Information Sciences

Cite this

Hansen, K., Mika, S., Schroeter, T., Sutter, A., ter Laak, A., Steger-Hartmann, T., ... Müller, K.-R. (2009). Benchmark data set for in silico prediction of Ames mutagenicity. Journal of Chemical Information and Modeling, 49(9), 2077-2081. https://doi.org/10.1021/ci900161g

@article{914186a5c8b941bbb2cc32cb106996e7,
title = "Benchmark data set for in silico prediction of Ames mutagenicity",
abstract = "Up to now, publicly available data sets to build and evaluate Ames mutagenicity prediction tools have been very limited in terms of size and chemical space covered. In this report we describe a new unique public Ames mutagenicity data set comprising about 6500 nonconfidential compounds (available as SMILES strings and SDF) together with their biological activity. Three commercial tools (DEREK, MultiCASE, and an off-the-shelf Bayesian machine learner in Pipeline Pilot) are compared with four noncommercial machine learning implementations (Support Vector Machines, Random Forests, k-Nearest Neighbors, and Gaussian Processes) on the new benchmark data set.",
author = "Katja Hansen and Sebastian Mika and Timon Schroeter and Andreas Sutter and ter Laak, Antonius and Steger-Hartmann, Thomas and Nikolaus Heinrich and Klaus-Robert M{\"u}ller",
year = "2009",
month = "9",
day = "28",
doi = "10.1021/ci900161g",
language = "English",
volume = "49",
pages = "2077--2081",
journal = "Journal of Chemical Information and Modeling",
issn = "1549-9596",
publisher = "American Chemical Society",
number = "9",

}

TY - JOUR

T1 - Benchmark data set for in silico prediction of Ames mutagenicity

AU - Hansen, Katja

AU - Mika, Sebastian

AU - Schroeter, Timon

AU - Sutter, Andreas

AU - ter Laak, Antonius

AU - Steger-Hartmann, Thomas

AU - Heinrich, Nikolaus

AU - Müller, Klaus-Robert

PY - 2009/9/28

Y1 - 2009/9/28

AB - Up to now, publicly available data sets to build and evaluate Ames mutagenicity prediction tools have been very limited in terms of size and chemical space covered. In this report we describe a new unique public Ames mutagenicity data set comprising about 6500 nonconfidential compounds (available as SMILES strings and SDF) together with their biological activity. Three commercial tools (DEREK, MultiCASE, and an off-the-shelf Bayesian machine learner in Pipeline Pilot) are compared with four noncommercial machine learning implementations (Support Vector Machines, Random Forests, k-Nearest Neighbors, and Gaussian Processes) on the new benchmark data set.

UR - http://www.scopus.com/inward/record.url?scp=70349910465&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70349910465&partnerID=8YFLogxK

U2 - 10.1021/ci900161g

DO - 10.1021/ci900161g

M3 - Article

VL - 49

SP - 2077

EP - 2081

JO - Journal of Chemical Information and Modeling

JF - Journal of Chemical Information and Modeling

SN - 1549-9596

IS - 9

ER -