TY - GEN
T1 - A Comparison of Oversampling Methods for Constructing a Prognostic Model in the Patient with Heart Failure
AU - Kim, Young Tak
AU - Kim, Dong Kyu
AU - Kim, Hakseung
AU - Kim, Dong Joo
N1 - Funding Information:
This research was supported by the National Research Foundation of Korea (NRF) grant (2019R1A2C1003399, 2020R1C1C1006773); the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2020-2016-0-00464) supervised by the IITP (Institute for Information & communications Technology Promotion). *Asterisk denotes the corresponding author.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/10/21
Y1 - 2020/10/21
N2 - Heart failure (HF) is the terminal stage of all heart disease and the leading cause of mortality. A reliable prognostic model for predicting mortality in patients with HF can support better decisions in clinical practice. Many attempts have been made to increase the reliability of prognostic models using electronic health record (EHR) data, but it is still not known which oversampling method is effective for imbalanced and insufficient EHR datasets. This study performed a comparative analysis of well-known oversampling methods (i.e., the synthetic minority oversampling technique (SMOTE), borderline-SMOTE, and adaptive synthetic (ADASYN) sampling) in constructing prognostic models for HF patients. All 299 patients had left ventricular systolic dysfunction and belonged to New York Heart Association class III or IV (Survived = 203, Deceased = 96). Follow-up time was 4-285 days, with an average of 130 days. The three oversampling methods were compared using random forest prognostic models built to predict mortality in patients with HF. The baseline model without oversampling showed an F-score of 0.55. Each oversampling method improved the F-score by 0.05 or more compared to the baseline model. SMOTE showed the highest prognostic capacity (F-score = 0.63) among the oversampling methods (borderline-SMOTE = 0.60, ADASYN = 0.62). With all three oversampling methods, ejection fraction, serum creatinine, and age consistently showed high feature importance. Consequently, SMOTE is the most suitable algorithm for oversampling EHR data to predict mortality in HF patients.
KW - Electronic Health Record
KW - Heart Failure
KW - Oversampling
KW - Prognostic Model
UR - http://www.scopus.com/inward/record.url?scp=85099006627&partnerID=8YFLogxK
U2 - 10.1109/ICTC49870.2020.9289522
DO - 10.1109/ICTC49870.2020.9289522
M3 - Conference contribution
AN - SCOPUS:85099006627
T3 - International Conference on ICT Convergence
SP - 379
EP - 383
BT - ICTC 2020 - 11th International Conference on ICT Convergence
PB - IEEE Computer Society
T2 - 11th International Conference on Information and Communication Technology Convergence, ICTC 2020
Y2 - 21 October 2020 through 23 October 2020
ER -