Doubly supervised embedding based on class labels and intrinsic clusters for high-dimensional data visualization

Hannah Kim, Jaegul Choo, Chandan K. Reddy, Haesun Park

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Visualization of data can assist decision-making processes by presenting the underlying information in a perceptible manner. Many dimension reduction techniques have been proposed to generate faithful visualization snapshots given high-dimensional data. When class labels associated with the data are already provided, supervised dimension reduction methods, which utilize such pre-given label information as well as the data, have been effective in revealing the overall structure of data with respect to their pre-given class labels. However, the main principle of most of these supervised methods has been to enhance class separability, which generally leads to significant distortion of original relationships. To compensate for such distortion, we propose a novel doubly supervised dimension reduction approach that highlights both natural groupings conforming to original relationships and classes determined by pre-given labels. Our method imposes minimal supervision on the pre-given class information depending on their original distributions while imposing additional supervision on natural groupings to better preserve them in reduced feature space. Specifically, we apply the notion of doubly supervised dimension reduction to a state-of-the-art method called t-distributed stochastic neighbor embedding and present a new formulation and an algorithm. By performing both quantitative and qualitative analyses, we demonstrate the effectiveness of our method using various visualization examples on real-world data. Our results show that, compared to other existing methods, the proposed method better preserves the original high-dimensional relationships while simultaneously maintaining class separability and preserving cluster structures. In addition, due to the characteristics of preserving natural groupings, the visualization results generated by our method reveal interesting sub-groups that cohesively preserve the original relationships in the data.
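The abstract does not give the formulation itself, so the following is only a minimal sketch of the general idea. It assumes supervision is applied by rescaling pairwise distances before they are handed to an embedding method: pairs sharing a class label are pulled together slightly, and pairs sharing a cluster are pulled together more strongly. The function name, the parameters `class_shrink` and `cluster_shrink`, and the multiplicative rule are all illustrative assumptions, not the paper's algorithm.

```python
import math


def doubly_supervised_distances(points, class_labels, cluster_labels,
                                class_shrink=0.9, cluster_shrink=0.7):
    """Return a symmetric pairwise distance matrix with two kinds of supervision.

    Hypothetical rule (an assumption, not taken from the paper):
    - same class label   -> distance multiplied by class_shrink (mild pull)
    - same cluster label -> distance multiplied by cluster_shrink (stronger pull)
    """
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])  # Euclidean distance
            if class_labels[i] == class_labels[j]:
                d *= class_shrink
            if cluster_labels[i] == cluster_labels[j]:
                d *= cluster_shrink
            dist[i][j] = dist[j][i] = d
    return dist


# Toy data: three 2-D points with made-up class and cluster labels.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
class_labels = [0, 0, 1]      # pre-given labels
cluster_labels = [0, 1, 1]    # e.g. from a clustering step run beforehand
adjusted = doubly_supervised_distances(points, class_labels, cluster_labels)
```

A matrix like `adjusted` could then be passed to any embedding routine that accepts precomputed distances, for example a t-SNE implementation with a precomputed-metric option. Choosing a milder factor for classes than for clusters mirrors the abstract's description of minimal supervision on class information and additional supervision on natural groupings.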

Original language: English
Pages (from-to): 570-582
Number of pages: 13
Journal: Neurocomputing
Volume: 150
Issue number: PB
DOI: 10.1016/j.neucom.2014.09.064
Publication status: Published - 2015 Jan 1
Externally published: Yes

Fingerprint

Data visualization
Labels
Visualization
Decision making

Keywords

  • Clustering
  • Multidimensional projection
  • Scatter plot
  • Supervised dimension reduction
  • T-distributed stochastic neighbor embedding
  • Visualization

ASJC Scopus subject areas

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence

Cite this

Doubly supervised embedding based on class labels and intrinsic clusters for high-dimensional data visualization. / Kim, Hannah; Choo, Jaegul; Reddy, Chandan K.; Park, Haesun.

In: Neurocomputing, Vol. 150, No. PB, 01.01.2015, p. 570-582.

@article{769faf3e75a34d8983a0d6d428fcb344,
title = "Doubly supervised embedding based on class labels and intrinsic clusters for high-dimensional data visualization",
keywords = "Clustering, Multidimensional projection, Scatter plot, Supervised dimension reduction, T-distributed stochastic neighbor embedding, Visualization",
author = "Hannah Kim and Jaegul Choo and Reddy, {Chandan K.} and Haesun Park",
year = "2015",
month = "1",
day = "1",
doi = "10.1016/j.neucom.2014.09.064",
language = "English",
volume = "150",
pages = "570--582",
journal = "Neurocomputing",
issn = "0925-2312",
publisher = "Elsevier",
number = "PB",

}

TY - JOUR
T1 - Doubly supervised embedding based on class labels and intrinsic clusters for high-dimensional data visualization
AU - Kim, Hannah
AU - Choo, Jaegul
AU - Reddy, Chandan K.
AU - Park, Haesun
PY - 2015/1/1
Y1 - 2015/1/1
KW - Clustering
KW - Multidimensional projection
KW - Scatter plot
KW - Supervised dimension reduction
KW - T-distributed stochastic neighbor embedding
KW - Visualization
UR - http://www.scopus.com/inward/record.url?scp=84922598421&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84922598421&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2014.09.064
DO - 10.1016/j.neucom.2014.09.064
M3 - Article
VL - 150
SP - 570
EP - 582
JO - Neurocomputing
JF - Neurocomputing
SN - 0925-2312
IS - PB
ER -