Machine learning models for lipophilicity and their domain of applicability

Timon Schroeter, Anton Schwaighofer, Sebastian Mika, Antonius Ter Laak, Detlev Suelzle, Ursula Ganzer, Nikolaus Heinrich, Klaus Robert Müller

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

Unfavorable lipophilicity and water solubility cause many drug failures; these properties therefore have to be taken into account early in lead discovery. Commercial tools for predicting lipophilicity have usually been trained on small, neutral molecules and are thus often unable to accurately predict in-house data. Using a modern Bayesian machine learning algorithm, a Gaussian process model, this study constructs a log D7 model based on 14556 drug discovery compounds of Bayer Schering Pharma. Its performance is compared with support vector machines, decision trees, ridge regression, and four commercial tools. In a blind test on 7013 new measurements from recent months (including compounds from new projects), 81% were predicted correctly within 1 log unit, compared to only 44% achieved by commercial software. Additional evaluations using public data are presented. We consider error bars for each method (model-based, ensemble-based, and distance-based approaches) and investigate how well they quantify the domain of applicability of each model.
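As a rough illustration of two ideas named in the abstract, the sketch below computes the share of predictions that fall within 1 log unit of the measured value and derives an ensemble-based error bar as the spread of predictions across ensemble members. It is not the authors' implementation: the function names `fraction_within` and `ensemble_error_bars`, the choice of "at most 1 log unit" as the tolerance rule, the use of the sample standard deviation as the error bar, and all numbers are assumptions made for this example.

```python
from statistics import mean, stdev


def fraction_within(y_true, y_pred, tol=1.0):
    """Share of predictions whose absolute error is at most `tol` log units."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(t - p) <= tol for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def ensemble_error_bars(member_predictions):
    """Per-compound (mean, sample standard deviation) across ensemble members.

    member_predictions[m][i] is the prediction of member m for compound i.
    At least two members are needed for the standard deviation.
    """
    per_compound = zip(*member_predictions)
    return [(mean(col), stdev(col)) for col in per_compound]


# Hypothetical toy values, not data from the study.
measured = [1.2, 3.4, 0.5, 2.8]
members = [
    [1.0, 3.0, 2.0, 2.5],
    [1.4, 3.2, 1.0, 2.9],
    [1.2, 3.4, 3.0, 2.7],
]

bars = ensemble_error_bars(members)
predicted = [m for m, _ in bars]

print("fraction within 1 log unit:", fraction_within(measured, predicted))
for (m, s), y in zip(bars, measured):
    print(f"predicted {m:.2f} +/- {s:.2f}, measured {y:.2f}")
```

In this toy data, three of the four compounds are predicted within 1 log unit, and the one compound that misses is also the one on which the ensemble members disagree most. That is the behaviour a useful error bar should show when a compound lies outside a model's domain of applicability.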

Original language: English
Pages (from-to): 524-538
Number of pages: 15
Journal: Molecular Pharmaceutics
Volume: 4
Issue number: 4
DOIs: https://doi.org/10.1021/mp0700413
Publication status: Published - 2007 Jul 1

Keywords

  • Bayesian
  • Decision tree
  • Distance
  • Domain of applicability
  • Drug discovery
  • Ensemble
  • Error bar
  • Error estimation
  • Gaussian process
  • Machine learning
  • Modeling
  • Random forest
  • Support vector machine
  • Support vector regression

ASJC Scopus subject areas

  • Molecular Medicine
  • Pharmaceutical Science
  • Drug Discovery

Cite this

Schroeter, T., Schwaighofer, A., Mika, S., Ter Laak, A., Suelzle, D., Ganzer, U., Heinrich, N., & Müller, K. R. (2007). Machine learning models for lipophilicity and their domain of applicability. Molecular Pharmaceutics, 4(4), 524-538. https://doi.org/10.1021/mp0700413