Improved response modeling based on clustering, under-sampling, and ensemble

Pilsung Kang, Sungzoon Cho, Douglas L. MacLachlan

Research output: Contribution to journal › Article

16 Citations (Scopus)

Abstract

The purpose of response modeling for direct marketing is to identify those customers who are likely to purchase a campaigned product, based upon customers' behavioral history and other information available. Contrary to mass marketing strategy, well-developed response models used for targeting specific customers can contribute profits to firms by not only increasing revenues, but also lowering marketing costs. Endemic in customer data used for response modeling is a class imbalance problem: the proportion of respondents is small relative to non-respondents. In this paper, we propose a novel data balancing method based on clustering, under-sampling, and ensemble to deal with the class imbalance problem, and thus improve response models. Using publicly available response modeling data sets, we compared the proposed method with other data balancing methods in terms of prediction accuracy and profitability. To investigate the usability of the proposed algorithm, we also employed various prediction algorithms when building the response models. Based on the response rate and profit analysis, we found that our proposed method (1) improved the response model by increasing response rate as well as reducing performance variation, and (2) increased total profit by significantly boosting revenue.
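The abstract names three ingredients (clustering, under-sampling, and ensemble) but does not spell out how they are combined. The sketch below shows one plausible reading and should not be taken as the authors' procedure: the non-respondents are clustered, each ensemble member is trained on all respondents plus a cluster-stratified random draw of about as many non-respondents, and the members' votes are averaged. Every name here (`kmeans`, `balanced_subsets`, `fit_ensemble`, `response_score`, `make_toy_data`) is hypothetical, the nearest-centroid base learner is only a stand-in for whatever prediction algorithm is actually used, and the data are synthetic.

```python
import random
from math import dist


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centers[j]))
            clusters[nearest].append(p)
        # An emptied cluster keeps its previous center.
        centers = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
    return [c for c in clusters if c]


def balanced_subsets(majority, minority, k, n_models, rng):
    """Build n_models training sets, each pairing all minority points with a
    cluster-stratified random draw of roughly len(minority) majority points."""
    clusters = kmeans(majority, k, rng)
    subsets = []
    for _ in range(n_models):
        drawn = []
        for c in clusters:
            # Assumption: each cluster contributes in proportion to its size.
            quota = max(1, round(len(minority) * len(c) / len(majority)))
            drawn.extend(rng.sample(c, min(quota, len(c))))
        subsets.append((drawn, list(minority)))
    return subsets


def fit_ensemble(subsets):
    """Stand-in base learner: one (negative, positive) centroid pair per subset."""
    return [(centroid(neg), centroid(pos)) for neg, pos in subsets]


def response_score(models, x):
    """Fraction of ensemble members that place x closer to the respondent centroid."""
    votes = [dist(x, pos_c) < dist(x, neg_c) for neg_c, pos_c in models]
    return sum(votes) / len(votes)


def make_toy_data(rng):
    """Synthetic 2-D data: 200 non-respondents in two blobs, 20 respondents."""
    def blob(cx, cy, n):
        return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]
    return blob(0, 0, 100) + blob(0, 4, 100), blob(5, 5, 20)
```

With `rng = random.Random(0)` and `majority, minority = make_toy_data(rng)`, calling `fit_ensemble(balanced_subsets(majority, minority, k=2, n_models=5, rng=rng))` yields five models whose averaged vote can be read through `response_score`. Drawing from every cluster rather than from the pooled majority class is meant to keep each reduced training set representative of the different kinds of non-respondents, and averaging several draws is meant to dampen the variation that a single random under-sample would introduce.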

Original language: English
Pages (from-to): 6738-6753
Number of pages: 16
Journal: Expert Systems with Applications
Volume: 39
Issue number: 8
DOI: 10.1016/j.eswa.2011.12.028
Publication status: Published - 2012 Jun 15
Externally published: Yes

Fingerprint

Profitability
Sampling
Marketing
Data structures
History
Costs

Keywords

  • Class imbalance
  • Clustering
  • CRM
  • Data balancing
  • Direct marketing
  • Ensemble
  • Response modeling

ASJC Scopus subject areas

  • Engineering(all)
  • Computer Science Applications
  • Artificial Intelligence

Cite this

Improved response modeling based on clustering, under-sampling, and ensemble. / Kang, Pilsung; Cho, Sungzoon; MacLachlan, Douglas L.

In: Expert Systems with Applications, Vol. 39, No. 8, 15.06.2012, p. 6738-6753.

Kang, Pilsung ; Cho, Sungzoon ; MacLachlan, Douglas L. / Improved response modeling based on clustering, under-sampling, and ensemble. In: Expert Systems with Applications. 2012 ; Vol. 39, No. 8. pp. 6738-6753.
@article{3b357d278b454420bfd960ba0621b2c1,
title = "Improved response modeling based on clustering, under-sampling, and ensemble",
abstract = "The purpose of response modeling for direct marketing is to identify those customers who are likely to purchase a campaigned product, based upon customers' behavioral history and other information available. Contrary to mass marketing strategy, well-developed response models used for targeting specific customers can contribute profits to firms by not only increasing revenues, but also lowering marketing costs. Endemic in customer data used for response modeling is a class imbalance problem: the proportion of respondents is small relative to non-respondents. In this paper, we propose a novel data balancing method based on clustering, under-sampling, and ensemble to deal with the class imbalance problem, and thus improve response models. Using publicly available response modeling data sets, we compared the proposed method with other data balancing methods in terms of prediction accuracy and profitability. To investigate the usability of the proposed algorithm, we also employed various prediction algorithms when building the response models. Based on the response rate and profit analysis, we found that our proposed method (1) improved the response model by increasing response rate as well as reducing performance variation, and (2) increased total profit by significantly boosting revenue.",
keywords = "Class imbalance, Clustering, CRM, Data balancing, Direct marketing, Ensemble, Response modeling",
author = "Pilsung Kang and Sungzoon Cho and MacLachlan, {Douglas L.}",
year = "2012",
month = "6",
day = "15",
doi = "10.1016/j.eswa.2011.12.028",
language = "English",
volume = "39",
pages = "6738--6753",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Limited",
number = "8"
}

TY - JOUR

T1 - Improved response modeling based on clustering, under-sampling, and ensemble

AU - Kang, Pilsung

AU - Cho, Sungzoon

AU - MacLachlan, Douglas L.

PY - 2012/6/15

Y1 - 2012/6/15

AB - The purpose of response modeling for direct marketing is to identify those customers who are likely to purchase a campaigned product, based upon customers' behavioral history and other information available. Contrary to mass marketing strategy, well-developed response models used for targeting specific customers can contribute profits to firms by not only increasing revenues, but also lowering marketing costs. Endemic in customer data used for response modeling is a class imbalance problem: the proportion of respondents is small relative to non-respondents. In this paper, we propose a novel data balancing method based on clustering, under-sampling, and ensemble to deal with the class imbalance problem, and thus improve response models. Using publicly available response modeling data sets, we compared the proposed method with other data balancing methods in terms of prediction accuracy and profitability. To investigate the usability of the proposed algorithm, we also employed various prediction algorithms when building the response models. Based on the response rate and profit analysis, we found that our proposed method (1) improved the response model by increasing response rate as well as reducing performance variation, and (2) increased total profit by significantly boosting revenue.

KW - Class imbalance

KW - Clustering

KW - CRM

KW - Data balancing

KW - Direct marketing

KW - Ensemble

KW - Response modeling

UR - http://www.scopus.com/inward/record.url?scp=84857642451&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84857642451&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2011.12.028

DO - 10.1016/j.eswa.2011.12.028

M3 - Article

AN - SCOPUS:84857642451

VL - 39

SP - 6738

EP - 6753

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

IS - 8

ER -