An Ensemble Feature Ranking Algorithm for Clustering Analysis

Jaehong Yu, Hua Zhong, Seoung Bum Kim

Research output: Contribution to journal › Article

Abstract

Feature ranking is a widely used feature selection method. It uses importance scores to evaluate features and selects those with high scores. Conventional unsupervised feature ranking methods do not consider the information on cluster structures; therefore, these methods may be unable to select the relevant features for clustering analysis. To address this limitation, we propose a feature ranking algorithm based on silhouette decomposition. The proposed algorithm calculates the ensemble importance scores by decomposing the average silhouette widths of random subspaces. By doing so, the contribution of a feature in generating cluster structures can be represented more clearly. Experiments on different benchmark data sets examined the properties of the proposed algorithm and compared it with the existing ensemble-based feature ranking methods. The experiments demonstrated that the proposed algorithm outperformed its existing counterparts.
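
The abstract describes scoring features by decomposing average silhouette widths over random subspaces, but it does not give the formulas. The following is a minimal sketch of one possible reading, not the authors' implementation. It assumes squared Euclidean distance, which splits additively over features so that the per-feature shares sum to the average silhouette width of the subspace. The small k-means routine, the function names, and the parameters `n_subspaces` and `subspace_size` are all hypothetical choices made for illustration.

```python
import random

def kmeans(X, feats, k, rng, iters=20):
    """Plain Lloyd's k-means on the feature subset `feats` (stand-in clusterer)."""
    centers = [[X[i][f] for f in feats] for i in rng.sample(range(len(X)), k)]
    labels = [0] * len(X)
    for _ in range(iters):
        for i, x in enumerate(X):
            labels[i] = min(
                range(k),
                key=lambda c: sum((x[f] - centers[c][j]) ** 2 for j, f in enumerate(feats)),
            )
        for c in range(k):
            members = [X[i] for i in range(len(X)) if labels[i] == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(m[f] for m in members) / len(members) for f in feats]
    return labels

def silhouette_contributions(X, labels, feats):
    """Per-feature shares of the average silhouette width.

    Assumption: squared Euclidean distance, so a(i) and b(i) are sums of
    per-feature terms and each feature gets (b_f - a_f) / max(a, b).
    """
    n = len(X)
    contrib = {f: 0.0 for f in feats}
    for i in range(n):
        per_cluster = {}  # cluster -> mean per-feature squared distance from point i
        for c in set(labels):
            idx = [j for j in range(n) if labels[j] == c and j != i]
            if idx:
                per_cluster[c] = [
                    sum((X[i][f] - X[j][f]) ** 2 for j in idx) / len(idx) for f in feats
                ]
        own = labels[i]
        if own not in per_cluster or len(per_cluster) < 2:
            continue  # singleton or single cluster: silhouette taken as 0
        a_f = per_cluster[own]
        nearest = min((c for c in per_cluster if c != own), key=lambda c: sum(per_cluster[c]))
        b_f = per_cluster[nearest]
        denom = max(sum(a_f), sum(b_f))
        if denom == 0:
            continue
        for f, af, bf in zip(feats, a_f, b_f):
            contrib[f] += (bf - af) / denom / n
    return contrib

def ensemble_feature_scores(X, k=2, n_subspaces=50, subspace_size=2, seed=0):
    """Average each feature's silhouette share over the random subspaces containing it."""
    rng = random.Random(seed)
    p = len(X[0])
    totals, counts = [0.0] * p, [0] * p
    for _ in range(n_subspaces):
        feats = rng.sample(range(p), subspace_size)
        labels = kmeans(X, feats, k, rng)
        for f, s in silhouette_contributions(X, labels, feats).items():
            totals[f] += s
            counts[f] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]

# Synthetic data: feature 0 separates two groups, features 1 and 2 are noise.
data_rng = random.Random(1)
X = [
    [data_rng.gauss(0 if i < 30 else 6, 1), data_rng.gauss(0, 1), data_rng.gauss(0, 1)]
    for i in range(60)
]
scores = ensemble_feature_scores(X)
ranking = sorted(range(len(scores)), key=lambda f: -scores[f])
print(ranking, [round(s, 3) for s in scores])
```

In this sketch a feature that carries the cluster structure receives a large positive share in every subspace it appears in, while a noise feature receives a share near zero whenever it is paired with an informative one. The published algorithm may differ in its distance measure, clustering method, and aggregation rule.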

Original language: English
Journal: Journal of Classification
DOI: 10.1007/s00357-019-09330-8
Publication status: Published - 2019 Jan 1

Keywords

  • Ensemble importance score
  • Random subspace method
  • Silhouette decomposition
  • Unsupervised feature ranking

ASJC Scopus subject areas

  • Mathematics (miscellaneous)
  • Psychology (miscellaneous)
  • Statistics, Probability and Uncertainty
  • Library and Information Sciences

Cite this

An Ensemble Feature Ranking Algorithm for Clustering Analysis. / Yu, Jaehong; Zhong, Hua; Kim, Seoung Bum.

In: Journal of Classification, 01.01.2019.

@article{88247e8ba7a0468cb8b0be123f684b37,
title = "An Ensemble Feature Ranking Algorithm for Clustering Analysis",
abstract = "Feature ranking is a widely used feature selection method. It uses importance scores to evaluate features and selects those with high scores. Conventional unsupervised feature ranking methods do not consider the information on cluster structures; therefore, these methods may be unable to select the relevant features for clustering analysis. To address this limitation, we propose a feature ranking algorithm based on silhouette decomposition. The proposed algorithm calculates the ensemble importance scores by decomposing the average silhouette widths of random subspaces. By doing so, the contribution of a feature in generating cluster structures can be represented more clearly. Experiments on different benchmark data sets examined the properties of the proposed algorithm and compared it with the existing ensemble-based feature ranking methods. The experiments demonstrated that the proposed algorithm outperformed its existing counterparts.",
keywords = "Ensemble importance score, Random subspace method, Silhouette decomposition, Unsupervised feature ranking",
author = "Yu, Jaehong and Zhong, Hua and Kim, {Seoung Bum}",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/s00357-019-09330-8",
language = "English",
journal = "Journal of Classification",
issn = "0176-4268",
publisher = "Springer New York",
}

TY - JOUR
T1 - An Ensemble Feature Ranking Algorithm for Clustering Analysis
AU - Yu, Jaehong
AU - Zhong, Hua
AU - Kim, Seoung Bum
PY - 2019/1/1
Y1 - 2019/1/1
N2 - Feature ranking is a widely used feature selection method. It uses importance scores to evaluate features and selects those with high scores. Conventional unsupervised feature ranking methods do not consider the information on cluster structures; therefore, these methods may be unable to select the relevant features for clustering analysis. To address this limitation, we propose a feature ranking algorithm based on silhouette decomposition. The proposed algorithm calculates the ensemble importance scores by decomposing the average silhouette widths of random subspaces. By doing so, the contribution of a feature in generating cluster structures can be represented more clearly. Experiments on different benchmark data sets examined the properties of the proposed algorithm and compared it with the existing ensemble-based feature ranking methods. The experiments demonstrated that the proposed algorithm outperformed its existing counterparts.

KW - Ensemble importance score
KW - Random subspace method
KW - Silhouette decomposition
KW - Unsupervised feature ranking
UR - http://www.scopus.com/inward/record.url?scp=85069647345&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069647345&partnerID=8YFLogxK
U2 - 10.1007/s00357-019-09330-8
DO - 10.1007/s00357-019-09330-8
M3 - Article
JO - Journal of Classification
JF - Journal of Classification
SN - 0176-4268
ER -