Improved response modeling based on clustering, under-sampling, and ensemble

Pilsung Kang, Sungzoon Cho, Douglas L. MacLachlan

Research output: Contribution to journal, Article, peer-review

21 Citations (Scopus)

Abstract

The purpose of response modeling for direct marketing is to identify those customers who are likely to purchase a campaigned product, based upon customers' behavioral history and other information available. Contrary to mass marketing strategy, well-developed response models used for targeting specific customers can contribute profits to firms by not only increasing revenues, but also lowering marketing costs. Endemic in customer data used for response modeling is a class imbalance problem: the proportion of respondents is small relative to non-respondents. In this paper, we propose a novel data balancing method based on clustering, under-sampling, and ensemble to deal with the class imbalance problem, and thus improve response models. Using publicly available response modeling data sets, we compared the proposed method with other data balancing methods in terms of prediction accuracy and profitability. To investigate the usability of the proposed algorithm, we also employed various prediction algorithms when building the response models. Based on the response rate and profit analysis, we found that our proposed method (1) improved the response model by increasing response rate as well as reducing performance variation, and (2) increased total profit by significantly boosting revenue.
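The abstract names three ingredients (clustering, under-sampling, and ensemble) without giving algorithmic details, so the following is only a minimal sketch of one plausible way to combine them, not the authors' procedure. It assumes non-respondents are grouped with plain k-means, that each ensemble member is trained on all respondents plus a cluster-proportional under-sample of non-respondents, and that a nearest-centroid scorer stands in for the prediction algorithm. All function names and parameters (`kmeans`, `balanced_subsets`, `fit_ensemble`, `score`, `k`, `n_models`) are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def balanced_subsets(pos, neg, k, n_models, rng):
    """Yield (respondents, under-sampled non-respondents) training pairs.

    Assumption: every under-sample draws from each non-respondent cluster
    in proportion to the cluster's size, so the subset stays representative.
    """
    assign = kmeans(neg, k, rng)
    clusters = [[p for p, a in zip(neg, assign) if a == c] for c in range(k)]
    clusters = [members for members in clusters if members]
    for _ in range(n_models):
        sample = []
        for members in clusters:
            quota = max(1, round(len(pos) * len(members) / len(neg)))
            sample.extend(rng.sample(members, min(quota, len(members))))
        yield pos, sample


def fit_ensemble(pos, neg, k=3, n_models=5, seed=0):
    """Train one nearest-centroid model per balanced subset."""
    rng = random.Random(seed)
    return [(centroid(p), centroid(n))
            for p, n in balanced_subsets(pos, neg, k, n_models, rng)]


def score(models, x):
    """Average ensemble score; positive means x lies closer to respondents."""
    return sum(dist(x, n) - dist(x, p) for p, n in models) / len(models)
```

In this sketch, customers would be ranked by `score` and the top-scoring ones targeted. Averaging over several balanced subsets is the part intended to reduce the variation that a single random under-sample would introduce.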

Original language: English
Pages (from-to): 6738-6753
Number of pages: 16
Journal: Expert Systems With Applications
Volume: 39
Issue number: 8
Publication status: Published - 2012 Jun 15
Externally published: Yes

Keywords

  • CRM
  • Class imbalance
  • Clustering
  • Data balancing
  • Direct marketing
  • Ensemble
  • Response modeling

ASJC Scopus subject areas

  • Engineering (all)
  • Computer Science Applications
  • Artificial Intelligence
