Feature discovery in non-metric pairwise data

Julian Laub, Klaus Muller

Research output: Contribution to journal › Article

44 Citations (Scopus)

Abstract

Pairwise proximity data, given as a similarity or dissimilarity matrix, can violate metricity. This occurs either because of noise or fallible estimates, or because of intrinsic non-metric features such as those arising from human judgments. So far, the problem of non-metric pairwise data has been tackled essentially by omitting the negative eigenvalues or by shifting the spectrum of the associated (pseudo-)covariance matrix for a subsequent embedding. However, little attention has been paid to the negative part of the spectrum itself. In particular, it has remained open whether the directions associated with the negative eigenvalues encode any variance other than noise. We show by a simple exploratory analysis that the negative eigenvalues can code for relevant structure in the data, thus leading to the discovery of new features that were lost by conventional data analysis techniques. The information hidden in the negative eigenvalue part of the spectrum is illustrated and discussed for three data sets: USPS handwritten digits, text mining, and data from cognitive psychology.
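The abstract refers to eigendecomposing the (pseudo-)covariance matrix of pairwise data and to the two conventional remedies, omitting the negative eigenvalues and shifting the spectrum. The NumPy sketch below illustrates these ideas on a tiny example; it is not the authors' code. It assumes the classical-scaling convention C = -1/2 Q D² Q (squaring the dissimilarities before centering is an assumption of this sketch), and the function names `centered_gram`, `split_embedding`, `clip_spectrum` and `shift_spectrum` are hypothetical.

```python
import numpy as np


def centered_gram(D):
    """(Pseudo-)covariance matrix C = -1/2 Q (D**2) Q of a symmetric dissimilarity matrix.

    Assumption: classical-scaling convention, i.e. entries of D are squared
    before double centering. C is positive semidefinite iff D is Euclidean.
    """
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return -0.5 * Q @ (D ** 2) @ Q


def sorted_eigh(C):
    """Eigendecomposition of symmetric C, eigenvalues in descending order."""
    w, V = np.linalg.eigh(C)  # eigh returns eigenvalues in ascending order
    idx = np.argsort(w)[::-1]
    return w[idx], V[:, idx]


def split_embedding(C, tol=1e-10):
    """Coordinates from the positive and from the negative part of the spectrum.

    Each direction is scaled by sqrt(|lambda|), so that
    C ~= X_pos @ X_pos.T - X_neg @ X_neg.T (a pseudo-Euclidean embedding).
    """
    w, V = sorted_eigh(C)
    pos, neg = w > tol, w < -tol
    X_pos = V[:, pos] * np.sqrt(w[pos])
    X_neg = V[:, neg] * np.sqrt(-w[neg])
    return X_pos, X_neg, w


def clip_spectrum(C):
    """Conventional remedy 1: set the negative eigenvalues to zero."""
    w, V = np.linalg.eigh(C)
    return (V * np.clip(w, 0.0, None)) @ V.T


def shift_spectrum(C):
    """Conventional remedy 2: shift the spectrum by |lambda_min| on the centered subspace."""
    n = C.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n
    lam_min = min(np.linalg.eigvalsh(C).min(), 0.0)
    return C - lam_min * Q


# Toy non-metric data: d(0,2) = 3 > d(0,1) + d(1,2) = 2 violates the triangle inequality.
D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0],
              [3.0, 1.0, 0.0]])
C = centered_gram(D)
X_pos, X_neg, w = split_embedding(C)
print("eigenvalues:", np.round(w, 4))
print("negative-eigenvalue coordinates:\n", X_neg)
```

For this toy matrix the spectrum of C is {4.5, 0, -5/6}, and the single negative direction is proportional to (1, -2, 1), contrasting the middle object with the two outer ones. In this example both remedies leave that direction with zero variance: clipping sets its eigenvalue to 0, and the shift by |λ_min| maps λ_min itself to 0, which is one way to see how such structure can be lost.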

Original language: English
Pages (from-to): 801-818
Number of pages: 18
Journal: Journal of Machine Learning Research
Volume: 5
Publication status: Published - 2004 Jul 1
Externally published: Yes

Keywords

  • Embedding
  • Exploratory data analysis
  • Feature discovery
  • Non-metric
  • Pairwise data
  • Unsupervised learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Cite this

Feature discovery in non-metric pairwise data. / Laub, Julian; Muller, Klaus.

In: Journal of Machine Learning Research, Vol. 5, 01.07.2004, p. 801-818.

Laub, Julian ; Muller, Klaus. / Feature discovery in non-metric pairwise data. In: Journal of Machine Learning Research. 2004 ; Vol. 5. pp. 801-818.
@article{9028a8bdd9ce44aa92c8ec697d8869a3,
title = "Feature discovery in non-metric pairwise data",
keywords = "Embedding, Exploratory data analysis, Feature discovery, Non-metric, Pairwise data, Unsupervised learning",
author = "Julian Laub and Klaus Muller",
year = "2004",
month = "7",
day = "1",
language = "English",
volume = "5",
pages = "801--818",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",

}

TY - JOUR

T1 - Feature discovery in non-metric pairwise data

AU - Laub, Julian

AU - Muller, Klaus

PY - 2004/7/1

Y1 - 2004/7/1

KW - Embedding

KW - Exploratory data analysis

KW - Feature discovery

KW - Non-metric

KW - Pairwise data

KW - Unsupervised learning

UR - http://www.scopus.com/inward/record.url?scp=33745428531&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745428531&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:33745428531

VL - 5

SP - 801

EP - 818

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

ER -