Longitudinal clinical score prediction in Alzheimer's disease with soft-split sparse regression based random forest

Lei Huang, Yan Jin, Yaozong Gao, Kim Han Thung, Dinggang Shen

Research output: Contribution to journal (Article)

43 Citations (Scopus)

Abstract

Alzheimer's disease (AD) is an irreversible neurodegenerative disease that affects a large population worldwide. Cognitive scores at multiple time points can be reliably used to evaluate the progression of the disease clinically. In recent studies, machine learning techniques have shown promising results on the prediction of AD clinical scores. However, current models have several limitations, such as the linearity assumption and the exclusion of missing data. Here, we present a nonlinear supervised sparse regression–based random forest (RF) framework to predict a variety of longitudinal AD clinical scores. Furthermore, we propose a soft-split technique to assign probabilistic paths to a test sample in RF for more accurate predictions. To benefit from the longitudinal scores in the study, unlike previous studies that often removed the subjects with missing scores, we first estimate those missing scores with our proposed soft-split sparse regression–based RF and then use the estimated longitudinal scores at all previous time points to predict the scores at the next time point. The experimental results demonstrate that our proposed method is superior to the traditional RF and outperforms other state-of-the-art regression models. Our method can also be extended into a general regression framework to predict other disease scores.
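
The abstract does not say how the soft split is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the probability of taking the right branch is a sigmoid of the signed distance between the feature value and the split threshold. The `Node` layout, the `sharpness` parameter, and all function names are hypothetical, and the sparse regression step is omitted.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    # A leaf carries only `value`; an internal node carries a split and two children.
    value: Optional[float] = None
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hard_predict(node: Node, x: Sequence[float]) -> float:
    # Conventional routing: the sample follows exactly one path to a leaf.
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def soft_predict(node: Node, x: Sequence[float], sharpness: float = 1.0) -> float:
    # Soft routing: the sample follows both branches, each weighted by a probability.
    if node.value is not None:
        return node.value
    # ASSUMPTION: branch probability is a sigmoid of the distance to the threshold.
    p_right = _sigmoid(sharpness * (x[node.feature] - node.threshold))
    return ((1.0 - p_right) * soft_predict(node.left, x, sharpness)
            + p_right * soft_predict(node.right, x, sharpness))


def forest_predict(trees: Sequence[Node], x: Sequence[float], sharpness: float = 1.0) -> float:
    # Average the soft predictions of all trees in the forest.
    return sum(soft_predict(t, x, sharpness) for t in trees) / len(trees)


# A single stump: split feature 0 at 0.5, leaves predict 10.0 (left) and 20.0 (right).
stump = Node(feature=0, threshold=0.5, left=Node(value=10.0), right=Node(value=20.0))
on_boundary = soft_predict(stump, [0.5])
```

Under this assumed form, a sample lying exactly on the threshold receives an equal blend of both leaves, while a large `sharpness` makes the soft prediction approach the hard one.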

Original language: English
Pages (from-to): 180-191
Number of pages: 12
Journal: Neurobiology of Aging
Volume: 46
DOIs: 10.1016/j.neurobiolaging.2016.07.005
Publication status: Published - 2016 Oct 1

Keywords

  • Alzheimer's disease
  • Clinical scores
  • Longitudinal study
  • Random forest
  • Soft-split
  • Sparse representation

ASJC Scopus subject areas

  • Neuroscience(all)
  • Medicine(all)
  • Ageing
  • Developmental Biology
  • Geriatrics and Gerontology
  • Clinical Neurology

Cite this

Longitudinal clinical score prediction in Alzheimer's disease with soft-split sparse regression based random forest. / Huang, Lei; Jin, Yan; Gao, Yaozong; Thung, Kim Han; Shen, Dinggang.

In: Neurobiology of Aging, Vol. 46, 01.10.2016, p. 180-191.

@article{f267fbed646b46c5a42d05f858b06487,
title = "Longitudinal clinical score prediction in Alzheimer's disease with soft-split sparse regression based random forest",
abstract = "Alzheimer's disease (AD) is an irreversible neurodegenerative disease and affects a large population in the world. Cognitive scores at multiple time points can be reliably used to evaluate the progression of the disease clinically. In recent studies, machine learning techniques have shown promising results on the prediction of AD clinical scores. However, there are multiple limitations in the current models such as linearity assumption and missing data exclusion. Here, we present a nonlinear supervised sparse regression–based random forest (RF) framework to predict a variety of longitudinal AD clinical scores. Furthermore, we propose a soft-split technique to assign probabilistic paths to a test sample in RF for more accurate predictions. In order to benefit from the longitudinal scores in the study, unlike the previous studies that often removed the subjects with missing scores, we first estimate those missing scores with our proposed soft-split sparse regression–based RF and then utilize those estimated longitudinal scores at all the previous time points to predict the scores at the next time point. The experiment results demonstrate that our proposed method is superior to the traditional RF and outperforms other state-of-art regression models. Our method can also be extended to be a general regression framework to predict other disease scores.",
keywords = "Alzheimer's disease, Clinical scores, Longitudinal study, Random forest, Soft-split, Sparse representation",
author = "Lei Huang and Yan Jin and Yaozong Gao and Thung, {Kim Han} and Dinggang Shen",
year = "2016",
month = "10",
day = "1",
doi = "10.1016/j.neurobiolaging.2016.07.005",
language = "English",
volume = "46",
pages = "180--191",
journal = "Neurobiology of Aging",
issn = "0197-4580",
publisher = "Elsevier Inc.",
}

TY - JOUR
T1 - Longitudinal clinical score prediction in Alzheimer's disease with soft-split sparse regression based random forest
AU - Huang, Lei
AU - Jin, Yan
AU - Gao, Yaozong
AU - Thung, Kim Han
AU - Shen, Dinggang
PY - 2016/10/1
Y1 - 2016/10/1

AB - Alzheimer's disease (AD) is an irreversible neurodegenerative disease and affects a large population in the world. Cognitive scores at multiple time points can be reliably used to evaluate the progression of the disease clinically. In recent studies, machine learning techniques have shown promising results on the prediction of AD clinical scores. However, there are multiple limitations in the current models such as linearity assumption and missing data exclusion. Here, we present a nonlinear supervised sparse regression–based random forest (RF) framework to predict a variety of longitudinal AD clinical scores. Furthermore, we propose a soft-split technique to assign probabilistic paths to a test sample in RF for more accurate predictions. In order to benefit from the longitudinal scores in the study, unlike the previous studies that often removed the subjects with missing scores, we first estimate those missing scores with our proposed soft-split sparse regression–based RF and then utilize those estimated longitudinal scores at all the previous time points to predict the scores at the next time point. The experiment results demonstrate that our proposed method is superior to the traditional RF and outperforms other state-of-art regression models. Our method can also be extended to be a general regression framework to predict other disease scores.

KW - Alzheimer's disease
KW - Clinical scores
KW - Longitudinal study
KW - Random forest
KW - Soft-split
KW - Sparse representation
UR - http://www.scopus.com/inward/record.url?scp=84982735820&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84982735820&partnerID=8YFLogxK
U2 - 10.1016/j.neurobiolaging.2016.07.005
DO - 10.1016/j.neurobiolaging.2016.07.005
M3 - Article
C2 - 27500865
AN - SCOPUS:84982735820
VL - 46
SP - 180
EP - 191
JO - Neurobiology of Aging
JF - Neurobiology of Aging
SN - 0197-4580
ER -