Machine learning for the prediction of new-onset diabetes mellitus during 5-year follow-up in non-diabetic patients with cardiovascular risks

Byoung Geol Choi, Seung-Woon Rha, Suhng Wook Kim, Jun Hyuk Kang, Ji Young Park, Yung Kyun Noh

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Purpose: Many studies have proposed predictive models for type 2 diabetes mellitus (T2DM). However, these models have several limitations in areas such as user convenience and reproducibility. The purpose of this study was to develop a T2DM predictive model using electronic medical records (EMRs) and machine learning and to compare the performance of this model with that of traditional statistical methods.

Materials and Methods: A total of 8454 available patients who had no history of diabetes and were treated at the cardiovascular center of Korea University Guro Hospital were enrolled. All subjects completed 5 years of follow-up. The prevalence of T2DM during follow-up was 4.78% (404/8454). A total of 28 variables were extracted from the EMRs. To evaluate the prediction models by cross-validation, logistic regression (LR), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and K-nearest neighbor (KNN) models were generated. The LR model was treated as the existing statistical analysis method.

Results: All predictive models showed a standard deviation of the area under the curve (AUC) below 0.01 after the 10-fold cross-validation test. Among all predictive models, the LR model showed the highest prediction performance, with an AUC of 0.78. However, the LDA, QDA, and KNN models did not differ from the LR model to a statistically significant degree.

Conclusion: We successfully developed and verified a T2DM prediction system using machine learning and an EMR database, and it predicted the 5-year occurrence of T2DM similarly to a traditional prediction model. Further study is needed to apply and verify the prediction model through clinical research.
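The evaluation quantities named in the abstract (the 404/8454 event rate, AUC, and the spread of AUC over 10 folds) can be illustrated with a minimal sketch. It is not the authors' pipeline: the scores below are synthetic stand-ins for held-out model predictions, no model is fitted, and the names `auc`, `stratified_folds`, and the Gaussian score shift are assumptions made only for illustration.

```python
import random
import statistics

# Cohort size and event count as reported in the abstract.
N_PATIENTS, N_EVENTS = 8454, 404
incidence_pct = 100 * N_EVENTS / N_PATIENTS


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds with a near-equal class ratio per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


# Synthetic scores (assumption): positives are shifted upward by one unit.
rng = random.Random(42)
labels = [1] * N_EVENTS + [0] * (N_PATIENTS - N_EVENTS)
scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

# Per-fold AUC, then the mean and standard deviation across the 10 folds.
folds = stratified_folds(labels, k=10)
fold_aucs = [
    auc([labels[i] for i in fold], [scores[i] for i in fold]) for fold in folds
]
mean_auc = statistics.mean(fold_aucs)
sd_auc = statistics.stdev(fold_aucs)

print(f"event rate: {incidence_pct:.2f}%")
print(f"AUC over 10 folds: mean={mean_auc:.3f}, SD={sd_auc:.3f}")
```

In a real analysis, each fold's scores would come from a model (for example LR, LDA, QDA, or KNN) trained on the other nine folds; stratifying the folds keeps roughly 40 events in each, which matters when events make up under 5% of the cohort.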

Original language: English
Pages (from-to): 191-199
Number of pages: 9
Journal: Yonsei Medical Journal
Volume: 60
Issue number: 2
DOI: 10.3349/ymj.2019.60.2.191
Publication status: Published - 2019 Feb 1

Fingerprint

Diabetes Mellitus
Type 2 Diabetes Mellitus
Logistic Models
Discriminant Analysis
Electronic Health Records
Area Under Curve
Korea
Machine Learning
Learning
Databases
Research

Keywords

  • Big data
  • Diabetes
  • Machine learning
  • Prediction
  • Type 2 diabetes mellitus

ASJC Scopus subject areas

  • Medicine(all)

Cite this

Machine learning for the prediction of new-onset diabetes mellitus during 5-year follow-up in non-diabetic patients with cardiovascular risks. / Choi, Byoung Geol; Rha, Seung-Woon; Kim, Suhng Wook; Kang, Jun Hyuk; Park, Ji Young; Noh, Yung Kyun.

In: Yonsei medical journal, Vol. 60, No. 2, 01.02.2019, p. 191-199.

@article{b6c887917dd24825bbbe9404ec34b860,
title = "Machine learning for the prediction of new-onset diabetes mellitus during 5-year follow-up in non-diabetic patients with cardiovascular risks",
abstract = "Purpose: Many studies have proposed predictive models for type 2 diabetes mellitus (T2DM). However, these models have several limitations in areas such as user convenience and reproducibility. The purpose of this study was to develop a T2DM predictive model using electronic medical records (EMRs) and machine learning and to compare the performance of this model with that of traditional statistical methods. Materials and Methods: A total of 8454 available patients who had no history of diabetes and were treated at the cardiovascular center of Korea University Guro Hospital were enrolled. All subjects completed 5 years of follow-up. The prevalence of T2DM during follow-up was 4.78{\%} (404/8454). A total of 28 variables were extracted from the EMRs. To evaluate the prediction models by cross-validation, logistic regression (LR), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and K-nearest neighbor (KNN) models were generated. The LR model was treated as the existing statistical analysis method. Results: All predictive models showed a standard deviation of the area under the curve (AUC) below 0.01 after the 10-fold cross-validation test. Among all predictive models, the LR model showed the highest prediction performance, with an AUC of 0.78. However, the LDA, QDA, and KNN models did not differ from the LR model to a statistically significant degree. Conclusion: We successfully developed and verified a T2DM prediction system using machine learning and an EMR database, and it predicted the 5-year occurrence of T2DM similarly to a traditional prediction model. Further study is needed to apply and verify the prediction model through clinical research.",
keywords = "Big data, Diabetes, Machine learning, Prediction, Type 2 diabetes mellitus",
author = "Choi, {Byoung Geol} and Rha, Seung-Woon and Kim, {Suhng Wook} and Kang, {Jun Hyuk} and Park, {Ji Young} and Noh, {Yung Kyun}",
year = "2019",
month = "2",
day = "1",
doi = "10.3349/ymj.2019.60.2.191",
language = "English",
volume = "60",
pages = "191--199",
journal = "Yonsei Medical Journal",
issn = "0513-5796",
publisher = "Yonsei University College of Medicine",
number = "2",

}

TY - JOUR

T1 - Machine learning for the prediction of new-onset diabetes mellitus during 5-year follow-up in non-diabetic patients with cardiovascular risks

AU - Choi, Byoung Geol

AU - Rha, Seung-Woon

AU - Kim, Suhng Wook

AU - Kang, Jun Hyuk

AU - Park, Ji Young

AU - Noh, Yung Kyun

PY - 2019/2/1

Y1 - 2019/2/1

AB - Purpose: Many studies have proposed predictive models for type 2 diabetes mellitus (T2DM). However, these models have several limitations in areas such as user convenience and reproducibility. The purpose of this study was to develop a T2DM predictive model using electronic medical records (EMRs) and machine learning and to compare the performance of this model with that of traditional statistical methods. Materials and Methods: A total of 8454 available patients who had no history of diabetes and were treated at the cardiovascular center of Korea University Guro Hospital were enrolled. All subjects completed 5 years of follow-up. The prevalence of T2DM during follow-up was 4.78% (404/8454). A total of 28 variables were extracted from the EMRs. To evaluate the prediction models by cross-validation, logistic regression (LR), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and K-nearest neighbor (KNN) models were generated. The LR model was treated as the existing statistical analysis method. Results: All predictive models showed a standard deviation of the area under the curve (AUC) below 0.01 after the 10-fold cross-validation test. Among all predictive models, the LR model showed the highest prediction performance, with an AUC of 0.78. However, the LDA, QDA, and KNN models did not differ from the LR model to a statistically significant degree. Conclusion: We successfully developed and verified a T2DM prediction system using machine learning and an EMR database, and it predicted the 5-year occurrence of T2DM similarly to a traditional prediction model. Further study is needed to apply and verify the prediction model through clinical research.

KW - Big data

KW - Diabetes

KW - Machine learning

KW - Prediction

KW - Type 2 diabetes mellitus

UR - http://www.scopus.com/inward/record.url?scp=85060210284&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85060210284&partnerID=8YFLogxK

U2 - 10.3349/ymj.2019.60.2.191

DO - 10.3349/ymj.2019.60.2.191

M3 - Article

C2 - 30666841

AN - SCOPUS:85060210284

VL - 60

SP - 191

EP - 199

JO - Yonsei Medical Journal

JF - Yonsei Medical Journal

SN - 0513-5796

IS - 2

ER -