Stability approach to selecting the number of principal components

Jiyeon Song, Seung Jun Shin

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Principal component analysis (PCA) is a canonical tool that reduces data dimensionality by finding linear transformations that project the data into a lower-dimensional subspace while preserving the variability of the data. Selecting the number of principal components (PCs) is essential but challenging for PCA since it represents an unsupervised learning problem without a clear target label at the sample level. In this article, we propose a new method to determine the optimal number of PCs based on the stability of the space spanned by the PCs. A series of analyses with both synthetic and real data demonstrates the superior performance of the proposed method.
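
The abstract, together with the "Stability selection" and "Subsampling" keywords below, suggests choosing the number of PCs by checking how reproducible the PC-spanned subspace is across random subsamples of the data. The Python sketch below is only a rough illustration of that general idea, not the estimator proposed in the paper; the helper names (pc_subspace, stability_path), the principal-angle-based subspace similarity, the 50% subsample fraction, and the "largest drop" selection rule are all assumptions made for this example.

# Hedged sketch of subsampling-based stability for choosing the number of PCs.
# Not the authors' method: similarity measure, subsample fraction, and the
# selection rule below are illustrative assumptions.
import numpy as np


def pc_subspace(X, k):
    """Orthonormal basis (p x k) of the top-k principal component subspace of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are PC loadings
    return Vt[:k].T


def subspace_similarity(V1, V2):
    """Mean squared cosine of the principal angles between two k-dim subspaces (1 = identical)."""
    s = np.linalg.svd(V1.T @ V2, compute_uv=False)  # cosines of principal angles
    return float(np.mean(s ** 2))


def stability_path(X, k_max, n_pairs=50, frac=0.5, seed=None):
    """Average subspace similarity over random subsample pairs, for k = 1, ..., k_max."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(frac * n)
    scores = np.zeros(k_max)
    for _ in range(n_pairs):
        idx1 = rng.choice(n, size=m, replace=False)
        idx2 = rng.choice(n, size=m, replace=False)
        for k in range(1, k_max + 1):
            scores[k - 1] += subspace_similarity(pc_subspace(X[idx1], k),
                                                 pc_subspace(X[idx2], k))
    return scores / n_pairs


if __name__ == "__main__":
    # Toy data: three strong directions plus isotropic noise in ten dimensions.
    rng = np.random.default_rng(0)
    n, p, d = 300, 10, 3
    B = rng.normal(size=(d, p)) * np.array([[5.0], [4.0], [3.0]])
    X = rng.normal(size=(n, d)) @ B + rng.normal(size=(n, p))
    path = stability_path(X, k_max=6, seed=1)
    # One plausible rule: choose the k just before the largest drop in stability.
    k_hat = int(np.argmax(path[:-1] - path[1:])) + 1
    print("stability by k:", np.round(path, 3))
    print("selected k:", k_hat)

In this toy setting the stability stays close to 1 for k up to the true structural dimension and then drops once arbitrary noise directions enter the subspace, which is the kind of pattern a stability-based criterion exploits.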

Original language: English
Pages (from-to): 1923-1938
Number of pages: 16
Journal: Computational Statistics
Volume: 33
Issue number: 4
DOI: 10.1007/s00180-018-0826-7
Publication status: Published - 2018 Dec 1

Keywords

  • Principal component analysis
  • Stability selection
  • Structural dimension
  • Subsampling

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
  • Computational Mathematics

Cite this

Song, Jiyeon; Shin, Seung Jun. Stability approach to selecting the number of principal components. In: Computational Statistics, Vol. 33, No. 4, 01.12.2018, p. 1923-1938. DOI: 10.1007/s00180-018-0826-7.
