Integrating cluster validity indices based on data envelopment analysis

Boseop Kim, Hakyeon Lee, Pilsung Kang

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Because clustering is an unsupervised learning task, a number of different validity indices have been proposed to measure the quality of the clustering results. However, there is no single best validity measure for all types of clustering tasks because individual clustering validity indices have both advantages and shortcomings. Because each validity index has demonstrated its effectiveness in particular cases, it is reasonable to expect that a more generalized clustering validity index can be developed, if individually effective cluster validity indices are appropriately integrated. In this paper, we propose a new cluster validity index, named Charnes, Cooper & Rhodes − cluster validity (CCR-CV), by integrating eight internal clustering efficiency measures based on data envelopment analysis (DEA). The proposed CCR-CV can be used for purposes that are more general because it extends the coverage of a single validity index by adaptively adjusting the combining weights of different validity indices for different datasets. Based on the experimental results on 12 artificial and 30 real datasets, the proposed clustering validity index demonstrates superior ability to determine the optimal and plausible cluster structures compared to benchmark individual validity indices.
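As a rough illustration of the DEA idea in the abstract, the sketch below scores candidate clusterings with a CCR-style linear program. It is not the authors' exact formulation. It assumes each candidate clustering is a decision-making unit with a single constant input of 1, and that the validity index values are its outputs, already rescaled to be non-negative with larger meaning better. The function name `ccr_scores` and the sample matrix are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def ccr_scores(index_values):
    """CCR-style efficiency score for each candidate clustering.

    index_values: (n_candidates, n_indices) array of validity index values,
    assumed non-negative and oriented so that larger is better.
    Illustrative assumption: every candidate has one constant input of 1.
    """
    Y = np.asarray(index_values, dtype=float)
    n_candidates, n_indices = Y.shape
    scores = np.empty(n_candidates)
    for k in range(n_candidates):
        # Choose non-negative weights u that are most favourable to candidate k:
        #   maximize u @ Y[k]   subject to   u @ Y[j] <= 1 for every candidate j.
        # linprog minimizes, so the objective is negated.
        res = linprog(
            c=-Y[k],
            A_ub=Y,
            b_ub=np.ones(n_candidates),
            bounds=[(0, None)] * n_indices,
            method="highs",
        )
        if not res.success:
            raise RuntimeError(f"LP failed for candidate {k}: {res.message}")
        scores[k] = -res.fun
    return scores


if __name__ == "__main__":
    # Hypothetical values: rows are candidate clusterings, columns are two indices.
    sample = [[0.9, 0.2], [0.5, 0.5], [0.2, 0.9], [0.3, 0.3]]
    print(ccr_scores(sample))
```

Because each candidate gets its own weight vector, the combining weights adapt to the data: a candidate that no weighted combination of the others can dominate receives a score of 1, and the rest score below 1.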

Original language: English
Pages (from-to): 94-108
Number of pages: 15
Journal: Applied Soft Computing Journal
Volume: 64
DOI: 10.1016/j.asoc.2017.11.052
Publication status: Published - 2018 Mar 1

Fingerprint

Unsupervised learning
Data envelopment analysis

Keywords

  • Clustering validity
  • Data envelopment analysis
  • Internal measure
  • Linear programming

ASJC Scopus subject areas

  • Software

Cite this

Integrating cluster validity indices based on data envelopment analysis. / Kim, Boseop; Lee, Hakyeon; Kang, Pilsung.

In: Applied Soft Computing Journal, Vol. 64, 01.03.2018, p. 94-108.

@article{8233039f3f76464dbdac908a0f29e77e,
title = "Integrating cluster validity indices based on data envelopment analysis",
abstract = "Because clustering is an unsupervised learning task, a number of different validity indices have been proposed to measure the quality of the clustering results. However, there is no single best validity measure for all types of clustering tasks because individual clustering validity indices have both advantages and shortcomings. Because each validity index has demonstrated its effectiveness in particular cases, it is reasonable to expect that a more generalized clustering validity index can be developed, if individually effective cluster validity indices are appropriately integrated. In this paper, we propose a new cluster validity index, named Charnes, Cooper \& Rhodes − cluster validity (CCR-CV), by integrating eight internal clustering efficiency measures based on data envelopment analysis (DEA). The proposed CCR-CV can be used for purposes that are more general because it extends the coverage of a single validity index by adaptively adjusting the combining weights of different validity indices for different datasets. Based on the experimental results on 12 artificial and 30 real datasets, the proposed clustering validity index demonstrates superior ability to determine the optimal and plausible cluster structures compared to benchmark individual validity indices.",
keywords = "Clustering validity, Data envelopment analysis, Internal measure, Linear programming",
author = "Boseop Kim and Hakyeon Lee and Pilsung Kang",
year = "2018",
month = "3",
day = "1",
doi = "10.1016/j.asoc.2017.11.052",
language = "English",
volume = "64",
pages = "94--108",
journal = "Applied Soft Computing",
issn = "1568-4946",
publisher = "Elsevier BV",
}

TY - JOUR

T1 - Integrating cluster validity indices based on data envelopment analysis

AU - Kim, Boseop

AU - Lee, Hakyeon

AU - Kang, Pilsung

PY - 2018/3/1

Y1 - 2018/3/1

N2 - Because clustering is an unsupervised learning task, a number of different validity indices have been proposed to measure the quality of the clustering results. However, there is no single best validity measure for all types of clustering tasks because individual clustering validity indices have both advantages and shortcomings. Because each validity index has demonstrated its effectiveness in particular cases, it is reasonable to expect that a more generalized clustering validity index can be developed, if individually effective cluster validity indices are appropriately integrated. In this paper, we propose a new cluster validity index, named Charnes, Cooper & Rhodes − cluster validity (CCR-CV), by integrating eight internal clustering efficiency measures based on data envelopment analysis (DEA). The proposed CCR-CV can be used for purposes that are more general because it extends the coverage of a single validity index by adaptively adjusting the combining weights of different validity indices for different datasets. Based on the experimental results on 12 artificial and 30 real datasets, the proposed clustering validity index demonstrates superior ability to determine the optimal and plausible cluster structures compared to benchmark individual validity indices.

KW - Clustering validity

KW - Data envelopment analysis

KW - Internal measure

KW - Linear programming

UR - http://www.scopus.com/inward/record.url?scp=85037975496&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85037975496&partnerID=8YFLogxK

U2 - 10.1016/j.asoc.2017.11.052

DO - 10.1016/j.asoc.2017.11.052

M3 - Article

AN - SCOPUS:85037975496

VL - 64

SP - 94

EP - 108

JO - Applied Soft Computing

JF - Applied Soft Computing

SN - 1568-4946

ER -