Probability-enhanced sufficient dimension reduction for binary classification

Seung Jun Shin, Yichao Wu, Hao Helen Zhang, Yufeng Liu

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

In high-dimensional data analysis, it is of primary interest to reduce the data dimensionality without loss of information. Sufficient dimension reduction (SDR) arises in this context, and many successful SDR methods have been developed since the introduction of sliced inverse regression (SIR) [Li (1991) Journal of the American Statistical Association 86, 316-327]. Despite this rapid progress, however, most existing methods target regression problems with a continuous response. For binary classification problems, SIR suffers from the limitation of estimating at most one direction because only two slices are available. In this article, we develop a new and flexible probability-enhanced SDR method for binary classification problems using the weighted support vector machine (WSVM). The key idea is to slice the data based on the conditional class probabilities of the observations rather than on their binary responses. We first show that the central subspace based on the conditional class probability is the same as that based on the binary response. This important result justifies the proposed slicing scheme from a theoretical perspective and ensures no loss of information. In practice, the true conditional class probability is generally unavailable, and probability estimation can be challenging for data with high-dimensional inputs. We observe that, to implement the new slicing scheme, exact probability values are not needed; the only required information is the relative order of the probability values. Motivated by this fact, our new SDR procedure bypasses the probability estimation step and employs the WSVM to estimate the order of the probability values directly, and the slicing is then performed based on this order. The performance of the proposed probability-enhanced SDR scheme is evaluated on both simulated and real data examples.
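
To make the slicing idea concrete, below is a minimal sketch (not the authors' implementation) of the two steps described in the abstract, run on simulated data: ranking observations by their conditional class probability through a sequence of weighted SVM fits, and then applying SIR to slices formed from that ranking. The scikit-learn SVC calls, the weight grid, the number of slices, and the single-index simulation model are all illustrative assumptions made for this example.

```python
# Illustrative sketch only: probability-order slicing followed by SIR.
# The weight grid, slice count, and simulation model are hypothetical choices.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# --- Simulated binary data from a single-index model (assumed example) -------
n, p = 400, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[0], beta[1] = 1.0, -1.0                               # true direction
prob = 1.0 / (1.0 + np.exp(-X @ beta))                     # P(Y = 1 | X)
y = (rng.uniform(size=n) < prob).astype(int)               # labels in {0, 1}

# --- Step 1: order observations by conditional class probability -------------
# Fit weighted SVMs over a grid of class weights pi and count, for each
# observation, how often it is classified as the positive class. Larger counts
# correspond to larger P(Y = 1 | x), so sorting by this count recovers the
# probability *order* without estimating the probabilities themselves.
pis = np.linspace(0.05, 0.95, 19)                          # weight grid (assumed)
votes = np.zeros(n)
for pi in pis:
    clf = SVC(kernel="linear", C=1.0, class_weight={0: 1.0 - pi, 1: pi})
    clf.fit(X, y)
    votes += (clf.decision_function(X) > 0)
order = np.argsort(votes)                                  # ascending probability order

# --- Step 2: slice by the probability order and run SIR ----------------------
H = 5                                                      # number of slices (assumed)
Xc = X - X.mean(axis=0)
Sigma = np.cov(Xc, rowvar=False)
M = np.zeros((p, p))                                       # between-slice covariance
for s in np.array_split(order, H):
    m = Xc[s].mean(axis=0)
    M += (len(s) / n) * np.outer(m, m)
# SIR directions: leading eigenvectors of Sigma^{-1} M
evals, evecs = np.linalg.eig(np.linalg.solve(Sigma, M))
b_hat = np.real(evecs[:, np.argmax(np.real(evals))])

cos = abs(b_hat @ beta) / (np.linalg.norm(b_hat) * np.linalg.norm(beta))
print(f"|cos| between estimated and true direction: {cos:.3f}")
```

Note that the ranking step uses only the order induced by counting positive classifications across the weight grid; no explicit probability estimates are formed, which mirrors the point made in the abstract that the relative order of the probabilities is all the slicing scheme requires.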

Original language: English
Pages (from-to): 546-555
Number of pages: 10
Journal: Biometrics
Volume: 70
Issue number: 3
DOI: 10.1111/biom.12174
Publication status: Published - 2014 Sep 1
Externally published: Yes

Keywords

  • Binary classification
  • Conditional class probability
  • Fisher consistency
  • Sufficient dimension reduction
  • Weighted support vector machines (WSVMs)

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry, Genetics and Molecular Biology (all)
  • Immunology and Microbiology (all)
  • Agricultural and Biological Sciences (all)
  • Applied Mathematics

Cite this

Probability-enhanced sufficient dimension reduction for binary classification. / Shin, Seung Jun; Wu, Yichao; Zhang, Hao Helen; Liu, Yufeng.

In: Biometrics, Vol. 70, No. 3, 01.09.2014, p. 546-555.

@article{752ac7e3c3ef490da46175b7f89a832b,
title = "Probability-enhanced sufficient dimension reduction for binary classification",
abstract = "In high-dimensional data analysis, it is of primary interest to reduce the data dimensionality without loss of information. Sufficient dimension reduction (SDR) arises in this context, and many successful SDR methods have been developed since the introduction of sliced inverse regression (SIR) [Li (1991) Journal of the American Statistical Association 86, 316-327]. Despite their fast progress, though, most existing methods target on regression problems with a continuous response. For binary classification problems, SIR suffers the limitation of estimating at most one direction since only two slices are available. In this article, we develop a new and flexible probability-enhanced SDR method for binary classification problems by using the weighted support vector machine (WSVM). The key idea is to slice the data based on conditional class probabilities of observations rather than their binary responses. We first show that the central subspace based on the conditional class probability is the same as that based on the binary response. This important result justifies the proposed slicing scheme from a theoretical perspective and assures no information loss. In practice, the true conditional class probability is generally not available, and the problem of probability estimation can be challenging for data with large-dimensional inputs. We observe that, in order to implement the new slicing scheme, one does not need exact probability values and the only required information is the relative order of probability values. Motivated by this fact, our new SDR procedure bypasses the probability estimation step and employs the WSVM to directly estimate the order of probability values, based on which the slicing is performed. The performance of the proposed probability-enhanced SDR scheme is evaluated by both simulated and real data examples.",
keywords = "Binary classification, Conditional class probability, Fisher consistency, Sufficient dimension reduction, Weighted support vector machines (WSVMs)",
author = "Shin, {Seung Jun} and Yichao Wu and Zhang, {Hao Helen} and Yufeng Liu",
year = "2014",
month = "9",
day = "1",
doi = "10.1111/biom.12174",
language = "English",
volume = "70",
pages = "546--555",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "3",

}

TY - JOUR

T1 - Probability-enhanced sufficient dimension reduction for binary classification

AU - Shin, Seung Jun

AU - Wu, Yichao

AU - Zhang, Hao Helen

AU - Liu, Yufeng

PY - 2014/9/1

Y1 - 2014/9/1

KW - Binary classification

KW - Conditional class probability

KW - Fisher consistency

KW - Sufficient dimension reduction

KW - Weighted support vector machines (WSVMs)

UR - http://www.scopus.com/inward/record.url?scp=84927697004&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84927697004&partnerID=8YFLogxK

U2 - 10.1111/biom.12174

DO - 10.1111/biom.12174

M3 - Article

C2 - 24779683

AN - SCOPUS:84927697004

VL - 70

SP - 546

EP - 555

JO - Biometrics

JF - Biometrics

SN - 0006-341X

IS - 3

ER -