Going metric: Denoising pairwise data

Volker Roth, Julian Laub, Joachim M. Buhmann, Klaus Muller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

24 Citations (Scopus)

Abstract

Pairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we study ways to work around this problem. First, we present an alternative embedding to multi-dimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. The class of pairwise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step. Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation. We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence databases, visibly improving performance upon existing automatic methods.
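
The two steps named in the abstract (embedding, then denoising) can be illustrated with a short NumPy sketch. It is an assumption-laden illustration, not the authors' implementation: it assumes the embedding is a constant shift of the off-diagonal dissimilarities by twice the smallest eigenvalue of the centered similarity matrix, and it assumes denoising means keeping only the leading eigen-directions. The function name `constant_shift_embedding` and the parameter `n_components` are hypothetical, and the input `D` is treated as squared dissimilarities.

```python
import numpy as np

def constant_shift_embedding(D, n_components=None):
    """Embed a (possibly non-metric) dissimilarity matrix into Euclidean space.

    Assumption: D holds squared dissimilarities; the shift rule below is one
    reading of the shift-invariance idea, not a confirmed reproduction.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    D = 0.5 * (D + D.T)                      # enforce symmetry
    np.fill_diagonal(D, 0.0)                 # enforce zero self-dissimilarity

    ones = np.ones((n, n))
    Q = np.eye(n) - ones / n                 # centering projection
    S_c = -0.5 * Q @ D @ Q                   # centered similarity matrix

    # Smallest eigenvalue is <= 0 because S_c always has eigenvalue 0.
    lam_min = min(np.linalg.eigvalsh(S_c)[0], 0.0)

    # Add the same constant to every off-diagonal dissimilarity.
    D_shift = D - 2.0 * lam_min * (ones - np.eye(n))

    # Centered Gram matrix of the shifted data; positive semidefinite.
    S = -0.5 * Q @ D_shift @ Q
    S = 0.5 * (S + S.T)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]        # sort descending
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]

    # Denoising step (assumed): keep only the leading p directions.
    p = n - 1 if n_components is None else n_components
    X = eigvecs[:, :p] * np.sqrt(eigvals[:p])
    return X, D_shift

# Usage: a random symmetric matrix that need not satisfy metric axioms.
rng = np.random.default_rng(0)
A = rng.random((6, 6))
D = A + A.T
np.fill_diagonal(D, 0.0)
X_full, D_shift = constant_shift_embedding(D)          # exact embedding
X_denoised, _ = constant_shift_embedding(D, n_components=2)
```

With all `n - 1` components kept, the squared Euclidean distances between rows of `X_full` reproduce `D_shift`; choosing a smaller `n_components` trades that exactness for a lower-dimensional representation.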

Original language: English
Title of host publication: Advances in Neural Information Processing Systems
Publisher: Neural information processing systems foundation
ISBN (Print): 0262025507, 9780262025508
Publication status: Published - 2003 Jan 1
Externally published: Yes
Event: 16th Annual Neural Information Processing Systems Conference, NIPS 2002 - Vancouver, BC, Canada
Duration: 2002 Dec 9 to 2002 Dec 14

Other

Other: 16th Annual Neural Information Processing Systems Conference, NIPS 2002
Country: Canada
City: Vancouver, BC
Period: 02/12/9 to 02/12/14

Fingerprint

Learning systems
Invariance
Signal processing
Proteins

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Roth, V., Laub, J., Buhmann, J. M., & Muller, K. (2003). Going metric: Denoising pairwise data. In Advances in Neural Information Processing Systems Neural information processing systems foundation.

@inproceedings{2ba63ce0178546e1a988956afdee8b9c,
title = "Going metric: Denoising pairwise data",
abstract = "Pairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we therefore study ways to work around this problem. First, we present an alternative embedding to multi-dimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. The class of pair-wise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step. Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation. We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence data bases, visibly improving performance upon existing automatic methods.",
author = "Volker Roth and Julian Laub and Buhmann, {Joachim M.} and Klaus Muller",
year = "2003",
month = "1",
day = "1",
language = "English",
isbn = "0262025507",
booktitle = "Advances in Neural Information Processing Systems",
publisher = "Neural information processing systems foundation",

}

TY - GEN

T1 - Going metric

T2 - Denoising pairwise data

AU - Roth, Volker

AU - Laub, Julian

AU - Buhmann, Joachim M.

AU - Muller, Klaus

PY - 2003/1/1

Y1 - 2003/1/1

N2 - Pairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we therefore study ways to work around this problem. First, we present an alternative embedding to multi-dimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. The class of pair-wise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step. Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation. We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence data bases, visibly improving performance upon existing automatic methods.

UR - http://www.scopus.com/inward/record.url?scp=84898938392&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84898938392&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84898938392

SN - 0262025507

SN - 9780262025508

BT - Advances in Neural Information Processing Systems

PB - Neural information processing systems foundation

ER -