Semi-supervised support vector regression based on self-training with label uncertainty: An application to virtual metrology in semiconductor manufacturing

Pilsung Kang, Dongil Kim, Sungzoon Cho

Research output: Contribution to journal (Article)

25 Citations (Scopus)

Abstract

Dataset size continues to increase and data are being collected from numerous applications. Because collecting labeled data is expensive and time consuming, the amount of unlabeled data is increasing. Semi-supervised learning (SSL) has been proposed to improve conventional supervised learning methods by training from both unlabeled and labeled data. In contrast to classification problems, the estimation of labels for unlabeled data presents added uncertainty for regression problems. In this paper, a semi-supervised support vector regression (SS-SVR) method based on self-training is proposed. The proposed method addresses the uncertainty of the estimated labels for unlabeled data. To measure labeling uncertainty, the label distribution of the unlabeled data is estimated with two probabilistic local reconstruction (PLR) models. Then, the training data are generated by oversampling from the unlabeled data and their estimated label distribution. The sampling rate is different based on uncertainty. Finally, expected margin-based pattern selection (EMPS) is employed to reduce training complexity. We verify the proposed method with 30 regression datasets and a real-world problem: virtual metrology (VM) in semiconductor manufacturing. The experiment results show that the proposed method improves the accuracy by 8% compared with conventional supervised SVR, and the training time for the proposed method is 20% shorter than that of the benchmark methods.
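The data-generation step described in the abstract (estimate a label distribution for each unlabeled point, then oversample at a rate that depends on its uncertainty) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: a k-nearest-neighbor mean and variance stands in for the two PLR models, the rule that more uncertain points receive fewer samples is an assumed direction that the abstract does not specify, and the names `knn_label_distribution`, `generate_training_data`, and `max_copies` are hypothetical. The SVR training and EMPS steps are omitted.

```python
import math
import random


def knn_label_distribution(x, labeled, k=3):
    """Return (mean, variance) of the labels of the k labeled points nearest to x.

    Assumption: a simple k-NN estimate used as a stand-in for the paper's
    PLR models, which are not reproduced here.
    """
    nearest = sorted(labeled, key=lambda p: math.dist(x, p[0]))[:k]
    ys = [y for _, y in nearest]
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / len(ys)
    return mean, var


def generate_training_data(labeled, unlabeled, k=3, max_copies=5, seed=0):
    """Augment the labeled set with pseudo-labeled samples for each unlabeled point."""
    rng = random.Random(seed)
    dists = [knn_label_distribution(x, labeled, k) for x in unlabeled]
    max_var = max((v for _, v in dists), default=0.0)
    augmented = list(labeled)
    for x, (mean, var) in zip(unlabeled, dists):
        # Assumed rule: the most uncertain points get one sample,
        # the least uncertain get up to max_copies.
        confidence = 1.0 - (var / max_var if max_var > 0 else 0.0)
        copies = max(1, round(max_copies * confidence))
        for _ in range(copies):
            # Draw a pseudo-label from a Gaussian with the estimated mean and variance.
            augmented.append((x, rng.gauss(mean, math.sqrt(var))))
    return augmented


# Toy data: points are 1-D feature tuples paired with a numeric label.
labeled = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 2.0), ((3.0,), 3.0)]
unlabeled = [(0.5,), (2.5,)]
train = generate_training_data(labeled, unlabeled, k=2)
```

The resulting `train` list could then be passed to any SVR implementation; in the paper's pipeline a pattern-selection step (EMPS) would first reduce its size.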

Original language: English
Pages (from-to): 85-106
Number of pages: 22
Journal: Expert Systems with Applications
Volume: 51
DOI: 10.1016/j.eswa.2015.12.027
Publication status: Published - 2016 Jun 1

Fingerprint

Labels
Semiconductor materials
Supervised learning
Labeling
Sampling
Uncertainty
Experiments

Keywords

  • Data generation
  • Probabilistic local reconstruction
  • Semi-supervised learning
  • Semiconductor manufacturing
  • Support vector regression
  • Virtual metrology

ASJC Scopus subject areas

  • Engineering(all)
  • Computer Science Applications
  • Artificial Intelligence

Cite this

@article{dd32d7e5ab6c43eda60b7eae6afcaa18,
title = "Semi-supervised support vector regression based on self-training with label uncertainty: An application to virtual metrology in semiconductor manufacturing",
abstract = "Dataset size continues to increase and data are being collected from numerous applications. Because collecting labeled data is expensive and time consuming, the amount of unlabeled data is increasing. Semi-supervised learning (SSL) has been proposed to improve conventional supervised learning methods by training from both unlabeled and labeled data. In contrast to classification problems, the estimation of labels for unlabeled data presents added uncertainty for regression problems. In this paper, a semi-supervised support vector regression (SS-SVR) method based on self-training is proposed. The proposed method addresses the uncertainty of the estimated labels for unlabeled data. To measure labeling uncertainty, the label distribution of the unlabeled data is estimated with two probabilistic local reconstruction (PLR) models. Then, the training data are generated by oversampling from the unlabeled data and their estimated label distribution. The sampling rate is different based on uncertainty. Finally, expected margin-based pattern selection (EMPS) is employed to reduce training complexity. We verify the proposed method with 30 regression datasets and a real-world problem: virtual metrology (VM) in semiconductor manufacturing. The experiment results show that the proposed method improves the accuracy by 8{\%} compared with conventional supervised SVR, and the training time for the proposed method is 20{\%} shorter than that of the benchmark methods.",
keywords = "Data generation, Probabilistic local reconstruction, Semi-supervised learning, Semiconductor manufacturing, Support vector regression, Virtual metrology",
author = "Pilsung Kang and Dongil Kim and Sungzoon Cho",
year = "2016",
month = "6",
day = "1",
doi = "10.1016/j.eswa.2015.12.027",
language = "English",
volume = "51",
pages = "85--106",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Limited",

}

TY - JOUR

T1 - Semi-supervised support vector regression based on self-training with label uncertainty

T2 - An application to virtual metrology in semiconductor manufacturing

AU - Kang, Pilsung

AU - Kim, Dongil

AU - Cho, Sungzoon

PY - 2016/6/1

Y1 - 2016/6/1

AB - Dataset size continues to increase and data are being collected from numerous applications. Because collecting labeled data is expensive and time consuming, the amount of unlabeled data is increasing. Semi-supervised learning (SSL) has been proposed to improve conventional supervised learning methods by training from both unlabeled and labeled data. In contrast to classification problems, the estimation of labels for unlabeled data presents added uncertainty for regression problems. In this paper, a semi-supervised support vector regression (SS-SVR) method based on self-training is proposed. The proposed method addresses the uncertainty of the estimated labels for unlabeled data. To measure labeling uncertainty, the label distribution of the unlabeled data is estimated with two probabilistic local reconstruction (PLR) models. Then, the training data are generated by oversampling from the unlabeled data and their estimated label distribution. The sampling rate is different based on uncertainty. Finally, expected margin-based pattern selection (EMPS) is employed to reduce training complexity. We verify the proposed method with 30 regression datasets and a real-world problem: virtual metrology (VM) in semiconductor manufacturing. The experiment results show that the proposed method improves the accuracy by 8% compared with conventional supervised SVR, and the training time for the proposed method is 20% shorter than that of the benchmark methods.

KW - Data generation

KW - Probabilistic local reconstruction

KW - Semi-supervised learning

KW - Semiconductor manufacturing

KW - Support vector regression

KW - Virtual metrology

UR - http://www.scopus.com/inward/record.url?scp=84955137019&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84955137019&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2015.12.027

DO - 10.1016/j.eswa.2015.12.027

M3 - Article

AN - SCOPUS:84955137019

VL - 51

SP - 85

EP - 106

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

ER -