Oversampling method using outlier detectable generative adversarial network

Joo Hyuk Oh, Jae Yeol Hong, Jun-Geol Baek

Research output: Contribution to journal › Article

Abstract

A class imbalance problem occurs when one class contains significantly more or fewer samples than another. The problem is difficult to solve, but oversampling methods based on the synthetic minority oversampling technique (SMOTE) or the conditional generative adversarial network (cGAN) have recently been proposed to address it. SMOTE and its variations can generate biased artificial data because they do not consider the entire minority class. Oversampling with cGAN was proposed to overcome this limitation; however, it does not consider the majority class, which affects the classification boundary. In particular, an outlier in the majority class may bias the classification boundary. This paper presents an oversampling method using an outlier detectable generative adversarial network (OD-GAN) to solve this problem. The discriminator, which cGAN uses only for training, serves as an outlier detector that quantifies the difference between the distributions of the majority and minority classes. The discriminator can detect and remove outliers, which prevents them from distorting the classification boundary. The generator imitates the distribution of the minority class and generates artificial data to balance the dataset. We experiment with various datasets, oversampling techniques, and classifiers. The empirical results show that OD-GAN performs better than other oversampling methods on imbalanced datasets with outliers.
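
The abstract's remark that SMOTE does not consider the entire minority class can be made concrete with a minimal sketch of SMOTE-style interpolation. This is not the paper's OD-GAN and not a reference SMOTE implementation; the function name `smote_like_oversample` and its parameters are assumptions chosen for illustration. Each synthetic point lies on the segment between one minority sample and one of its `k` nearest minority neighbours, so only local structure is used.

```python
import numpy as np


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points built by SMOTE-style interpolation.

    Hypothetical helper for illustration only: each new point is placed
    between a minority sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class only.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # starting sample for each new point
    pick = rng.integers(0, k, size=n_new)   # which of its neighbours to move toward
    partner = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation factor in [0, 1)
    return minority[base] + gap * (minority[partner] - minority[base])


if __name__ == "__main__":
    pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(smote_like_oversample(pts, n_new=3, k=2))
```

Because every synthetic point depends on just two nearby samples, the global shape of the minority distribution never enters the computation, which is the limitation the abstract attributes to SMOTE and its variations.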

Original language: English
Pages (from-to): 1-8
Number of pages: 8
Journal: Expert Systems with Applications
Volume: 133
DOI: 10.1016/j.eswa.2019.05.006
Publication status: Published - 2019 Nov 1

Fingerprint

Discriminators
Classifiers
Detectors
Experiments

Keywords

  • Class imbalance problem
  • Generative adversarial network
  • Outlier detection
  • Oversampling

ASJC Scopus subject areas

  • Engineering(all)
  • Computer Science Applications
  • Artificial Intelligence

Cite this

Oversampling method using outlier detectable generative adversarial network. / Oh, Joo Hyuk; Hong, Jae Yeol; Baek, Jun-Geol.

In: Expert Systems with Applications, Vol. 133, 01.11.2019, p. 1-8.

@article{b80076bfe8ae46adb3a84d93d3db83c2,
title = "Oversampling method using outlier detectable generative adversarial network",
abstract = "A class imbalance problem occurs when a particular class of data is significantly more or less than another class of data. This problem is difficult to solve; however, solutions such as the oversampling method using synthetic minority oversampling technique (SMOTE) or conditional generative adversarial network (cGAN) have been suggested recently to solve this problem. In the case of SMOTE and their variations, it is possible to generate biased artificial data because it does not consider the entire data in the minority class. To overcome this problem, an oversampling method using cGAN has been proposed. However, such a method does not consider the majority class that affects the classification boundary. In particular, if there is an outlier in the majority class, the classification boundary may be biased. This paper presents an oversampling method using outlier detectable generative adversarial network (OD-GAN) to solve this problem. We use a discriminator, which is used only for training purposes in cGAN, as an outlier detector to quantify the difference between the distributions of the majority and minority classes. The discriminator can detect and remove outliers. This prevents the distortion of the classification boundary caused by outliers. The generator imitates the distribution of the minority class and generates artificial data to balance the dataset. We experiment with various datasets, oversampling techniques, and classifiers. The empirical results show that the performance of OD-GAN is better than those of other oversampling methods for imbalanced datasets with outliers.",
keywords = "Class imbalance problem, Generative adversarial network, Outlier detection, Oversampling",
author = "Oh, {Joo Hyuk} and Hong, {Jae Yeol} and Baek, Jun-Geol",
year = "2019",
month = "11",
day = "1",
doi = "10.1016/j.eswa.2019.05.006",
language = "English",
volume = "133",
pages = "1--8",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Limited",
}

TY - JOUR

T1 - Oversampling method using outlier detectable generative adversarial network

AU - Oh, Joo Hyuk

AU - Hong, Jae Yeol

AU - Baek, Jun-Geol

PY - 2019/11/1

Y1 - 2019/11/1

AB - A class imbalance problem occurs when a particular class of data is significantly more or less than another class of data. This problem is difficult to solve; however, solutions such as the oversampling method using synthetic minority oversampling technique (SMOTE) or conditional generative adversarial network (cGAN) have been suggested recently to solve this problem. In the case of SMOTE and their variations, it is possible to generate biased artificial data because it does not consider the entire data in the minority class. To overcome this problem, an oversampling method using cGAN has been proposed. However, such a method does not consider the majority class that affects the classification boundary. In particular, if there is an outlier in the majority class, the classification boundary may be biased. This paper presents an oversampling method using outlier detectable generative adversarial network (OD-GAN) to solve this problem. We use a discriminator, which is used only for training purposes in cGAN, as an outlier detector to quantify the difference between the distributions of the majority and minority classes. The discriminator can detect and remove outliers. This prevents the distortion of the classification boundary caused by outliers. The generator imitates the distribution of the minority class and generates artificial data to balance the dataset. We experiment with various datasets, oversampling techniques, and classifiers. The empirical results show that the performance of OD-GAN is better than those of other oversampling methods for imbalanced datasets with outliers.

KW - Class imbalance problem

KW - Generative adversarial network

KW - Outlier detection

KW - Oversampling

UR - http://www.scopus.com/inward/record.url?scp=85065580340&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065580340&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2019.05.006

DO - 10.1016/j.eswa.2019.05.006

M3 - Article

AN - SCOPUS:85065580340

VL - 133

SP - 1

EP - 8

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

ER -