Active semi-supervised learning with multiple complementary information

Sung Ho Park, Seoung Bum Kim

Research output: Contribution to journal › Article

Abstract

In many practical machine learning problems, the acquisition of labeled data is often expensive and time-consuming. To reduce this labeling cost, active learning has been introduced in many scientific fields. This study considers the problem of active learning of a regression model in the context of an optimal experimental design. Classical optimal experimental design approaches are based on the least-squares errors of labeled samples. Recently, a couple of active learning approaches that take advantage of both labeled and unlabeled data have been developed based on Laplacian regularized regression models with a single criterion. However, these approaches are susceptible to selecting undesirable samples when the number of initially labeled samples is small. To address this susceptibility, this study proposes an active learning method that considers multiple complementary criteria. These criteria include sample representativeness, diversity information, and variance reduction of the Laplacian regularization model. Specifically, we developed novel density and diversity criteria based on a clustering algorithm to identify the samples that are representative of their distributions, while minimizing their redundancy. Experiments were conducted on synthetic and benchmark data to compare the performance of the proposed method with that of existing methods. Experimental results demonstrate that the proposed active learning algorithm outperforms its existing counterparts.
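
The abstract describes a selection rule that scores unlabeled samples on three complementary criteria: representativeness (density), diversity with respect to the already-labeled samples, and variance reduction of a Laplacian-regularized regression model. The sketch below is only a minimal illustration of that idea under stated assumptions, not the authors' implementation: the use of k-means for the density criterion, the RBF similarity graph, the particular variance expression, and the equal weighting of the normalized criteria are choices made here for concreteness.

# Hypothetical sketch of one query-selection step combining the three criteria
# named in the abstract. Not the authors' algorithm; the clustering method,
# kernel, regularization values, and equal weighting are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import euclidean_distances, rbf_kernel

def select_query(X, labeled_idx, n_clusters=10, gamma=1.0, lam=1e-2, mu=1e-2):
    labeled_idx = np.asarray(labeled_idx)
    n, d = X.shape
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    # Representativeness (density): closeness of each sample to its k-means centroid.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    dist_to_centroid = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    density = 1.0 / (1.0 + dist_to_centroid)

    # Diversity: distance to the nearest already-labeled sample.
    diversity = euclidean_distances(X, X[labeled_idx]).min(axis=1)

    # Variance reduction: predictive variance x^T A^{-1} x of a Laplacian-regularized
    # least-squares model, with A = X_L^T X_L + lam*I + mu*X^T L X and graph
    # Laplacian L built from an RBF similarity graph over all samples.
    W = rbf_kernel(X, gamma=gamma)
    L = np.diag(W.sum(axis=1)) - W
    A = X[labeled_idx].T @ X[labeled_idx] + lam * np.eye(d) + mu * X.T @ L @ X
    A_inv = np.linalg.inv(A)
    variance = np.einsum("ij,jk,ik->i", X, A_inv, X)

    # Combine the min-max-normalized criteria with equal weights (an assumption)
    # and return the best-scoring unlabeled sample.
    def minmax(v):
        return (v - v.min()) / (v.max() - v.min() + 1e-12)
    score = minmax(density) + minmax(diversity) + minmax(variance)
    return unlabeled_idx[np.argmax(score[unlabeled_idx])]

if __name__ == "__main__":
    # Toy usage: query one point from synthetic data, given five labeled samples.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    print(select_query(X, labeled_idx=[0, 1, 2, 3, 4]))

In an active learning loop, the returned index would be labeled, appended to labeled_idx, and the selection repeated until the labeling budget is exhausted.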

Original language: English
Pages (from-to): 30-40
Number of pages: 11
Journal: Expert Systems with Applications
Volume: 126
DOI: 10.1016/j.eswa.2019.02.017
Publication status: Published - 2019 Jul 15

Keywords

  • Active learning
  • Diversity
  • Optimal experimental design
  • Representativeness
  • Semi-supervised learning

ASJC Scopus subject areas

  • Engineering (all)
  • Computer Science Applications
  • Artificial Intelligence

Cite this

Active semi-supervised learning with multiple complementary information. / Park, Sung Ho; Kim, Seoung Bum.

In: Expert Systems with Applications, Vol. 126, 15.07.2019, p. 30-40.

@article{481e66c76380416fbed6a11956d08e48,
title = "Active semi-supervised learning with multiple complementary information",
abstract = "In many practical machine learning problems, the acquisition of labeled data is often expensive and time consuming. To reduce this labeling cost, active learning has been introduced in many scientific fields. This study considers the problem of active learning of a regression model in the context of an optimal experimental design. Classical optimal experimental design approaches are based on the least square errors of labeled samples. Recently, a couple of active learning approaches that take advantage of both labeled and unlabeled data have been developed based on Laplacian regularized regression models with a single criterion. However, these approaches are susceptible to selecting undesirable samples when the number of initially labeled samples is small. To address this susceptibility, this study proposes an active learning method that considers multiple complementary criteria. These criteria include sample representativeness, diversity information, and variance reduction of the Laplacian regularization model. Specifically, we developed novel density and diversity criteria based on a clustering algorithm to identify the samples that are representative of their distributions, while minimizing their redundancy. Experiments were conducted on synthetic and benchmark data to compare the performance of the proposed method with that of existing methods. Experimental results demonstrate that the proposed active learning algorithm outperforms its existing counterparts.",
keywords = "Active learning, Diversity, Optimal experimental design, Representativeness, Semi-supervised learning",
author = "Park, {Sung Ho} and Kim, {Seoung Bum}",
year = "2019",
month = "7",
day = "15",
doi = "10.1016/j.eswa.2019.02.017",
language = "English",
volume = "126",
pages = "30--40",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Limited",
}
