A Novel Computational Method for Biomedical Binary Data Analysis: Development of a Thyroid Disease Index Using a Brute-Force Search with MLR Analysis

Jin Kak Lee, Won Seok Han, Jun Seok Lee, Chang No Yoon

Research output: Contribution to journal › Article › peer-review


We developed the thyroid disease index (TDI), which estimates thyroid disease progress from hormone concentration measurements and changes in hormone patterns. We measured the concentrations of hormones in the androgen and estrogen metabolic pathways in 23 patients with thyroid disease and in 20 unaffected people. Using t-tests, we showed that the hormones 2-hydroxyestrone (2-OH-E1), 2-hydroxyestradiol (2-OH-E2), 2-methoxyestrone (2-MeO-E1), 2-methoxyestradiol (2-MeO-E2), and 2-methoxyestradiol-3-methylether (2-MeO-E2-3-methylether) are related to the development of thyroid disease. Although the concentrations of these hormones generally increase as the disease progresses, large fluctuations make it difficult to determine disease progress by measuring hormone levels. The differing patterns between the correlation matrices of the disease and control groups possibly indicate changes in hormone-releasing patterns as thyroid disease progresses. Because progressive experimental data on thyroid disease were lacking, binary data for the two categories (thyroid disease patients and the control group) were used. Binary logistic regression was used to analyze five risk factors associated with thyroid disease, and the highest overall accuracy, 97.7%, was obtained with three risk factors. Logistic regression models, however, cannot describe disease progress. Hence, the TDI was developed to estimate thyroid disease progress. An arbitrary ranking of disease progress was generated for the TDI equation. The ranking contained a total of 29 030 400 entries, with six stages from the control group and eight stages from the disease group. Multiple linear regression (MLR) analysis was performed with a brute-force search. The best result among the MLR runs showed a strong correlation (r² of 0.840 and q² of 0.663) between the selected hormones and the disease progress values in the training set. The overall accuracy of our novel method was 90.7%, which is lower than the 97.7% of the logistic regression models. Brute-force search with MLR analysis might distinguish different types of thyroid disease progress, such as thyroid mass (0.8055), goiter (0.8806), thyroid mass that was a thyroid cancer before operation (0.8951 and 0.9112), and cancer (1.001–2.144). The results show that the TDI is a good indicator of thyroid disease progress and that brute-force search with MLR analysis is useful for biomedical binary data analysis.
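The general idea of a brute-force search over candidate progress rankings, each scored by an MLR fit, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the data are synthetic, the names `fit_r2_q2` and `brute_force_ranking` and the stage scores are hypothetical, and it assumes that candidate rankings are permutations of stage values within each group and that q² is a leave-one-out statistic (1 − PRESS/SS_tot). The abstract states neither of these. The sketch does note that the quoted 29 030 400 entries equals 6! × 8!, which is consistent with, but does not confirm, independent orderings of six and eight stages.

```python
import itertools
import math

import numpy as np


def fit_r2_q2(X, y):
    """Ordinary least-squares fit of y on X (with intercept).

    Returns (r2, q2), where q2 is assumed here to be the
    leave-one-out statistic 1 - PRESS / SS_tot.
    """
    A = np.column_stack([np.ones(len(y)), X])      # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / ss_tot
    # Exact leave-one-out residuals for OLS: e_i / (1 - h_ii)
    hat_diag = np.diag(A @ np.linalg.pinv(A))
    loo_resid = resid / (1.0 - hat_diag)
    q2 = 1.0 - np.sum(loo_resid ** 2) / ss_tot
    return r2, q2


def brute_force_ranking(X, control_scores, disease_scores):
    """Try every assignment of stage scores within each group.

    Rows of X are ordered: control samples first, then disease samples.
    Returns (r2, q2, y) for the assignment with the highest q2.
    """
    best = None
    for c_perm in itertools.permutations(control_scores):
        for d_perm in itertools.permutations(disease_scores):
            y = np.array(c_perm + d_perm, dtype=float)
            r2, q2 = fit_r2_q2(X, y)
            if best is None or q2 > best[1]:
                best = (r2, q2, y)
    return best


# Size of the search space if six and eight stages are ordered independently.
n_rankings = math.factorial(6) * math.factorial(8)   # 29 030 400

# Tiny synthetic demo (hypothetical data): 4 control + 4 disease samples,
# 2 predictor "hormones", so only 4! * 4! = 576 fits are needed.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(8, 2))
control_scores = (0.0, 0.2, 0.4, 0.6)    # hypothetical stage values
disease_scores = (1.0, 1.4, 1.8, 2.2)    # hypothetical stage values

best_r2, best_q2, best_y = brute_force_ranking(X_demo, control_scores, disease_scores)
print(f"candidate rankings for 6 and 8 stages: {n_rankings}")
print(f"best demo fit: r2={best_r2:.3f}, q2={best_q2:.3f}")
```

Selecting on q² rather than r² is a design choice of this sketch: because every candidate ranking is fitted to the same samples, a cross-validated score penalizes rankings that fit only by chance. At the full scale quoted in the abstract, the same loop would require tens of millions of fits.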

Original language: English
Pages (from-to): 1392-1397
Number of pages: 6
Journal: Bulletin of the Korean Chemical Society
Issue number: 12
Publication status: Published - 2017 Dec
Externally published: Yes


  • Androgen
  • Brute-force search
  • Estrogen
  • Multiple linear regression
  • Thyroid disease index

ASJC Scopus subject areas

  • Chemistry(all)

