## Abstract

An artificial neural network with multiple hidden layers (known as a deep neural network, or DNN) was employed as a predictive model (DNN_{p}) for the first time to predict emotional responses using whole-brain functional magnetic resonance imaging (fMRI) data from individual subjects. During fMRI data acquisition, 10 healthy participants listened to 80 International Affective Digital Sound stimuli and rated the emotions evoked by each sound stimulus in terms of the arousal, dominance, and valence dimensions. The whole-brain spatial patterns from a general linear model (i.e., beta-valued maps) for each sound stimulus and the emotional response ratings were used as the input and output for the DNN_{p}, respectively. Based on a nested five-fold cross-validation scheme, the paired input and output data were divided into training (three folds), validation (one fold), and test (one fold) data. The DNN_{p} was trained and optimized using the training and validation data and was tested using the test data. The Pearson's correlation coefficients between the rated and predicted emotional responses from our DNN_{p} model with weight sparsity optimization (mean ± standard error 0.52 ± 0.02 for arousal, 0.51 ± 0.03 for dominance, and 0.51 ± 0.03 for valence, with an input denoising level of 0.3 and a mini-batch size of 1) were significantly greater than those of DNN models with conventional regularization schemes including elastic net regularization (0.15 ± 0.05, 0.15 ± 0.06, and 0.21 ± 0.04 for arousal, dominance, and valence, respectively), those of shallow models including logistic regression (0.11 ± 0.04, 0.10 ± 0.05, and 0.17 ± 0.04 for arousal, dominance, and valence, respectively; average of logistic regression and sparse logistic regression), and those of support vector machine-based predictive models (SVM_{p}s; 0.12 ± 0.06, 0.06 ± 0.06, and 0.10 ± 0.06 for arousal, dominance, and valence, respectively; average of linear and non-linear SVM_{p}s).
These differences were confirmed to be significant with a Bonferroni-corrected p-value of less than 0.001 from a one-way analysis of variance (ANOVA) and subsequent paired t-tests. The weights of the trained DNN_{p}s were interpreted, and input patterns that maximized or minimized the output of the DNN_{p}s (i.e., the emotional responses) were estimated. Based on a binary classification of each emotion category (e.g., high arousal vs. low arousal), the error rates for the DNN_{p} (31.2% ± 1.3% for arousal, 29.0% ± 1.7% for dominance, and 28.6% ± 3.0% for valence) were significantly lower than those for the linear SVM_{p} (44.7% ± 2.0%, 50.7% ± 1.7%, and 47.4% ± 1.9% for arousal, dominance, and valence, respectively) and the non-linear SVM_{p} (48.8% ± 2.3%, 52.2% ± 1.9%, and 46.4% ± 1.3% for arousal, dominance, and valence, respectively), as confirmed by a one-way ANOVA (Bonferroni-corrected p < 0.001). Our study demonstrates that the DNN_{p} model is able to reveal neuronal circuitry associated with human emotional processing, including structures in the limbic and paralimbic areas, such as the amygdala, prefrontal areas, anterior cingulate cortex, insula, and caudate. Our DNN_{p} model was also able to use activation patterns in these structures to predict and classify emotional responses to stimuli.
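
The data split and the evaluation metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `nested_fold_splits` and `pearson_r`, the random shuffling, and the way the validation fold rotates within each test fold are assumptions made for illustration only. The sketch assumes 80 samples (one per sound stimulus) split into five folds, with three folds for training, one for validation, and one for testing.

```python
import numpy as np


def nested_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx, test_idx) index arrays.

    Assumption: each fold serves as the test fold in turn, and each of the
    remaining folds serves as the validation fold in turn, leaving three
    folds for training.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for test_fold in range(n_folds):
        for val_fold in range(n_folds):
            if val_fold == test_fold:
                continue
            train_folds = [f for f in range(n_folds)
                           if f not in (test_fold, val_fold)]
            train_idx = np.concatenate([folds[f] for f in train_folds])
            yield train_idx, folds[val_fold], folds[test_fold]


def pearson_r(rated, predicted):
    """Pearson's correlation coefficient between rated and predicted scores."""
    rated = np.asarray(rated, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.corrcoef(rated, predicted)[0, 1])


# 80 stimuli -> 5 folds of 16 samples each (hypothetical usage).
splits = list(nested_fold_splits(80))
```

Under these assumptions, each split holds 48 training, 16 validation, and 16 test samples. A model would be fitted on the training indices, tuned on the validation indices, and scored once on the test indices with `pearson_r`.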

Original language | English |
---|---|
Pages (from-to) | 607-627 |
Number of pages | 21 |
Journal | NeuroImage |
Volume | 186 |
DOIs | |
Publication status | Published - 2019 Feb 1 |

## Keywords

- Deep learning
- Deep neural network
- Emotion
- fMRI
- Machine learning
- Prediction
- Regression
- Support vector machine

## ASJC Scopus subject areas

- Neurology
- Cognitive Neuroscience