Density-based geodesic distance for identifying the noisy and nonlinear clusters

Jaehong Yu, Seoung Bum Kim

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Clustering analysis can facilitate the extraction of implicit patterns in a dataset and elicit its natural groupings without requiring prior classification information. For superior clustering analysis results, a number of distance measures have been proposed. Recently, geodesic distance has been widely applied to clustering algorithms for nonlinear groupings. However, geodesic distance is sensitive to noise; hence, geodesic distance-based clustering may fail to discover nonlinear clusters in noisy regions. In this study, we propose a density-based geodesic distance that can identify clusters in nonlinear and noisy situations. Experiments on various simulation and benchmark datasets are conducted to examine the properties of the proposed geodesic distance and to compare its performance with that of existing distance measures. The experimental results confirm that a clustering algorithm with the proposed distance measure outperformed its competitors, especially when the cluster structures in the data were inherently noisy and nonlinearly patterned.
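The idea summarized in the abstract can be illustrated with a minimal sketch. It computes graph-based geodesic distances as shortest paths over a k-nearest-neighbor graph, and optionally inflates edges that touch low-density points so that paths through likely noise become more expensive. The density coefficient used here (the share of a point's k neighbors that are mutual neighbors) and the edge weighting are assumptions made for illustration only; they are not the definitions used in the paper, and all function names are hypothetical.

```python
import heapq
import math


def knn_indices(points, k):
    """Return, for each point, the indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(order[:k])
    return neighbors


def mutual_density(neighbors):
    """Hypothetical density coefficient (an assumption, not the paper's):
    the share of a point's k neighbors that also list the point among
    their own k neighbors. Values lie in [0, 1]."""
    return [
        sum(1 for j in nbrs if i in neighbors[j]) / len(nbrs)
        for i, nbrs in enumerate(neighbors)
    ]


def geodesic_distances(points, k=3, use_density=True, eps=1e-3):
    """All-pairs shortest-path distances over an undirected k-NN graph.

    With use_density=True, each edge length is divided by the mean density
    of its endpoints (floored at eps), so edges through sparse points cost
    more. This weighting is an illustrative assumption.
    """
    n = len(points)
    neighbors = knn_indices(points, k)
    density = mutual_density(neighbors) if use_density else [1.0] * n

    # Build the undirected weighted graph as adjacency dictionaries.
    graph = [dict() for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            scale = max((density[i] + density[j]) / 2.0, eps)
            weight = math.dist(points[i], points[j]) / scale
            graph[i][j] = weight
            graph[j][i] = weight

    # Run Dijkstra's algorithm from every source point.
    dist = [[math.inf] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[src][u]:
                continue  # stale heap entry
            for v, w in graph[u].items():
                nd = d + w
                if nd < dist[src][v]:
                    dist[src][v] = nd
                    heapq.heappush(heap, (nd, v))
    return dist
```

The resulting matrix could be passed to any clustering method that accepts precomputed distances. Pairs left at infinity belong to different connected components of the neighbor graph.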

Original language: English
Pages (from-to): 231-243
Number of pages: 13
Journal: Information Sciences
Volume: 360
DOIs: https://doi.org/10.1016/j.ins.2016.04.032
Publication status: Published - 2016 Sep 10

Fingerprint

Geodesic Distance
Clustering algorithms
Distance Measure
Clustering Analysis
Grouping
Experiments
Clustering
Benchmark
Experimental Results
Simulation

Keywords

  • Geodesic distance
  • Mutual neighborhood-based density coefficient
  • Noisy data clustering
  • Nonlinearity

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management

Cite this

Density-based geodesic distance for identifying the noisy and nonlinear clusters. / Yu, Jaehong; Kim, Seoung Bum.

In: Information Sciences, Vol. 360, 10.09.2016, p. 231-243.

@article{ef3f2084d8444e96b2041cf0fb518d3a,
title = "Density-based geodesic distance for identifying the noisy and nonlinear clusters",
abstract = "Clustering analysis can facilitate the extraction of implicit patterns in a dataset and elicit its natural groupings without requiring prior classification information. For superior clustering analysis results, a number of distance measures have been proposed. Recently, geodesic distance has been widely applied to clustering algorithms for nonlinear groupings. However, geodesic distance is sensitive to noise and hence, geodesic distance-based clustering may fail to discover nonlinear clusters in the region of the noise. In this study, we propose a density-based geodesic distance that can identify clusters in nonlinear and noisy situations. Experiments on various simulation and benchmark datasets are conducted to examine the properties of the proposed geodesic distance and to compare its performance with that of existing distance measures. The experimental results confirm that a clustering algorithm with the proposed distance measure demonstrated superior performance compared to the competitors; this was especially true when the cluster structures in the data were inherently noisy and nonlinearly patterned.",
keywords = "Geodesic distance, Mutual neighborhood-based density coefficient, Noisy data clustering, Nonlinearity",
author = "Yu, Jaehong and Kim, {Seoung Bum}",
year = "2016",
month = "9",
day = "10",
doi = "10.1016/j.ins.2016.04.032",
language = "English",
volume = "360",
pages = "231--243",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",
}

TY - JOUR

T1 - Density-based geodesic distance for identifying the noisy and nonlinear clusters

AU - Yu, Jaehong

AU - Kim, Seoung Bum

PY - 2016/9/10

Y1 - 2016/9/10

N2 - Clustering analysis can facilitate the extraction of implicit patterns in a dataset and elicit its natural groupings without requiring prior classification information. For superior clustering analysis results, a number of distance measures have been proposed. Recently, geodesic distance has been widely applied to clustering algorithms for nonlinear groupings. However, geodesic distance is sensitive to noise and hence, geodesic distance-based clustering may fail to discover nonlinear clusters in the region of the noise. In this study, we propose a density-based geodesic distance that can identify clusters in nonlinear and noisy situations. Experiments on various simulation and benchmark datasets are conducted to examine the properties of the proposed geodesic distance and to compare its performance with that of existing distance measures. The experimental results confirm that a clustering algorithm with the proposed distance measure demonstrated superior performance compared to the competitors; this was especially true when the cluster structures in the data were inherently noisy and nonlinearly patterned.

KW - Geodesic distance

KW - Mutual neighborhood-based density coefficient

KW - Noisy data clustering

KW - Nonlinearity

UR - http://www.scopus.com/inward/record.url?scp=84969833537&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84969833537&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2016.04.032

DO - 10.1016/j.ins.2016.04.032

M3 - Article

AN - SCOPUS:84969833537

VL - 360

SP - 231

EP - 243

JO - Information Sciences

JF - Information Sciences

SN - 0020-0255

ER -