Abstract. Background: Molecular classification of tumors can be achieved by global gene expression profiling. Most machine learning classification algorithms furnish global error rates for the entire population. A few algorithms provide an estimate of probability of malignancy for each queried patient but the degree of accuracy of these estimates is unknown. On the other hand local minimax learning provides such probability estimates with best finite sample bounds on expected mean squared error on an individual basis for each queried patient. This allows a significant percentage of the patients to be identified as confidently predictable, a condition that ensures that the machine learning algorithm possesses an error rate below the tolerable level when applied to the confidently predictable patients. Results: We devise a new learning method that implements: (i) feature selection using the k-TSP algorithm and (ii) classifier construction by local minimax kernel learning. We test our method on three publicly available gene expression datasets and achieve significantly lower error rate for a substantial identifiable subset of patients. Our final classifiers are simple to interpret and they can make prediction on an individual basis with an individualized confidence level. Conclusions: Patients that were predicted confidently by the classifiers as cancer can receive immediate and appropriate treatment whilst patients that were predicted confidently as healthy will be spared from unnecessary treatment. We believe that our method can be a useful tool to translate the gene expression signatures into clinical practice for personalized medicine.
ASJC Scopus subject areas