Wasserstein training of restricted Boltzmann machines

Grégoire Montavon, Klaus-Robert Müller, Marco Cuturi

Research output: Contribution to journal › Article

18 Citations (Scopus)

Abstract

Boltzmann machines are able to learn highly complex, multimodal, structured and multiscale real-world data distributions. Parameters of the model are usually learned by minimizing the Kullback-Leibler (KL) divergence from training samples to the learned model. In this work, we propose a novel approach to Boltzmann machine training that assumes a meaningful metric between observations is known. This metric can then be used to define the Wasserstein distance between the distribution induced by the Boltzmann machine and the empirical distribution of the training sample. We derive the gradient of this distance with respect to the model parameters. Minimizing this new objective leads to generative models with different statistical properties. We demonstrate their practical potential on data completion and denoising, tasks for which the metric between observations plays a crucial role.
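For orientation only, the sketch below illustrates (it is not the authors' algorithm) the central ingredient the abstract refers to: an entropy-regularized (Sinkhorn) estimate of the Wasserstein distance between a batch of training data and a batch of samples drawn from the model, computed from a user-supplied ground metric between observations. The names sinkhorn_distance, hamming_cost, gamma and n_iters are illustrative assumptions, and the random binary arrays merely stand in for real data and for Gibbs samples from a trained RBM; the paper additionally derives the gradient of the smoothed distance with respect to the RBM parameters, which is not shown here.

import numpy as np

def sinkhorn_distance(X_data, X_model, cost_fn, gamma=1.0, n_iters=200):
    """Entropy-regularized Wasserstein (Sinkhorn) estimate between two
    empirical distributions, given a ground metric between observations.

    X_data:  array of shape (n, d), one training observation per row.
    X_model: array of shape (m, d), one model sample per row.
    cost_fn: callable returning the (n, m) pairwise ground-cost matrix.
    gamma:   entropic regularization strength.
    """
    n, m = len(X_data), len(X_model)
    a = np.full(n, 1.0 / n)          # uniform weights on data samples
    b = np.full(m, 1.0 / m)          # uniform weights on model samples
    C = cost_fn(X_data, X_model)     # ground metric between observations
    K = np.exp(-C / gamma)           # Gibbs kernel
    u = np.ones(n)
    # Sinkhorn fixed-point iterations for the regularized transport plan
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # approximate optimal transport plan
    return np.sum(P * C)             # transport cost under that plan

def hamming_cost(X, Y):
    # Example ground metric for binary visible units: Hamming distance
    return np.sum(X[:, None, :] != Y[None, :, :], axis=2).astype(float)

# Usage: compare a data batch against samples drawn from the RBM
rng = np.random.default_rng(0)
X_data = rng.integers(0, 2, size=(64, 20))    # stand-in for training data
X_model = rng.integers(0, 2, size=(64, 20))   # stand-in for Gibbs samples
print(sinkhorn_distance(X_data, X_model, hamming_cost))

Choosing gamma on the scale of typical ground costs keeps the Gibbs kernel well conditioned; smaller values approach the unregularized Wasserstein distance but make the fixed-point iterations slower and less numerically stable.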

Original language: English
Pages (from-to): 3718-3726
Number of pages: 9
Journal: Advances in Neural Information Processing Systems
ISSN: 1049-5258
Publication status: Published - 2016

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
