An MLP-based feature subset selection for HIV-1 protease cleavage site analysis

Gilhan Kim, Yeonjoo Kim, Heui Seok Lim, Hyeoncheol Kim

Research output: Contribution to journal, Article

22 Citations (Scopus)

Abstract

Objective: In recent years, several machine learning approaches have been applied to modeling the specificity of the human immunodeficiency virus type 1 (HIV-1) protease cleavage domain. However, the high-dimensional domain dataset contains only a small number of samples, which can mislead classification modeling and its interpretation. Appropriate feature selection can alleviate the problem by eliminating irrelevant and redundant features, and thus improve prediction performance.

Methods: We introduce a new feature subset selection method, FS-MLP, that selects relevant features using multi-layered perceptron (MLP) learning. The method first trains an MLP on a training dataset and then selects a feature subset by analyzing the trained MLP with a decompositional approach. Our method is able to select a subset of relevant features in high-dimensional, multi-variate and non-linear domains.

Results: Using five artificial datasets that represent four data types, we compared the performance of FS-MLP with that of seven other feature selection methods. Experimental results showed that FS-MLP is superior in high-dimensional, multi-variate and non-linear domains. In experiments with the HIV-1 protease cleavage dataset, FS-MLP selected a set of 14 highly relevant features from among the 160 original features. On a validation set of 131 test instances, classifiers that used the 14 features showed about 95% accuracy, outperforming the other seven methods in terms of both accuracy and number of features.

Conclusions: Our experimental results indicate that FS-MLP is effective in analyzing multi-variate, non-linear and high-dimensional datasets such as the HIV-1 protease cleavage dataset. The 14 relevant features selected by FS-MLP also provide useful insights into the HIV-1 cleavage site domain. FS-MLP is a useful method for computational sequence analysis in general.
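The abstract describes selecting features by analyzing a trained MLP but does not give the procedure. The following is only a minimal sketch of one simple stand-in heuristic, weight-magnitude relevance scoring, and should not be read as the paper's decompositional FS-MLP method. The function names, the one-hidden-layer single-output architecture, and the weight values are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def input_relevance(w_hidden: Sequence[Sequence[float]],
                    w_out: Sequence[float]) -> List[float]:
    """Score each input of a one-hidden-layer, single-output MLP.

    w_hidden[j][i] is the weight from input i to hidden unit j.
    w_out[j] is the weight from hidden unit j to the output.
    The score of input i is sum_j |w_hidden[j][i]| * |w_out[j]|.
    This is an illustrative heuristic, not the FS-MLP procedure.
    """
    n_inputs = len(w_hidden[0])
    scores = [0.0] * n_inputs
    for hidden_row, out_weight in zip(w_hidden, w_out):
        for i, w in enumerate(hidden_row):
            scores[i] += abs(w) * abs(out_weight)
    return scores


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring inputs, in index order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical weights of an already-trained MLP: 4 inputs, 2 hidden units.
W_HIDDEN = [
    [2.0, -0.1, 0.0, 1.5],
    [-1.0, 0.2, 0.1, 3.0],
]
W_OUT = [1.0, -2.0]

scores = input_relevance(W_HIDDEN, W_OUT)
selected = select_top_k(scores, k=2)
print("relevance scores:", scores)
print("selected feature indices:", selected)
```

In this toy setting, inputs 0 and 3 carry the largest weights through both hidden units, so they are the ones kept. A real analysis would train the MLP on data first and would need a principled rule for choosing the subset size.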

Original language: English
Pages (from-to): 83-89
Number of pages: 7
Journal: Artificial Intelligence in Medicine
Volume: 48
Issue number: 2-3
DOI: 10.1016/j.artmed.2009.07.010
Publication status: Published - 2010 Feb 1

Fingerprint

Neural Networks (Computer)
Viruses
HIV-1
Peptide Hydrolases
Neural networks
Feature extraction
Learning
Sequence Analysis
Learning systems
Datasets
Classifiers

Keywords

  • Dimension reduction
  • Feature selection
  • HIV-1 protease cleavage site prediction
  • Multi-layered perceptron

ASJC Scopus subject areas

  • Artificial Intelligence
  • Medicine (miscellaneous)

Cite this

An MLP-based feature subset selection for HIV-1 protease cleavage site analysis. / Kim, Gilhan; Kim, Yeonjoo; Lim, Heui Seok; Kim, Hyeoncheol.

In: Artificial Intelligence in Medicine, Vol. 48, No. 2-3, 01.02.2010, p. 83-89.

@article{ae25c02acaa94de989f63352344e264c,
title = "An MLP-based feature subset selection for HIV-1 protease cleavage site analysis",
keywords = "Dimension reduction, Feature selection, HIV-1 protease cleavage site prediction, Multi-layered perceptron",
author = "Gilhan Kim and Yeonjoo Kim and Lim, {Heui Seok} and Hyeoncheol Kim",
year = "2010",
month = "2",
day = "1",
doi = "10.1016/j.artmed.2009.07.010",
language = "English",
volume = "48",
pages = "83--89",
journal = "Artificial Intelligence in Medicine",
issn = "0933-3657",
publisher = "Elsevier",
number = "2-3",

}

TY - JOUR

T1 - An MLP-based feature subset selection for HIV-1 protease cleavage site analysis

AU - Kim, Gilhan

AU - Kim, Yeonjoo

AU - Lim, Heui Seok

AU - Kim, Hyeoncheol

PY - 2010/2/1

Y1 - 2010/2/1

AB - Objective: In recent years, several machine learning approaches have been applied to modeling the specificity of the human immunodeficiency virus type 1 (HIV-1) protease cleavage domain. However, the high dimensional domain dataset contains a small number of samples, which could misguide classification modeling and its interpretation. Appropriate feature selection can alleviate the problem by eliminating irrelevant and redundant features, and thus improve prediction performance. Methods: We introduce a new feature subset selection method, FS-MLP, that selects relevant features using multi-layered perceptron (MLP) learning. The method includes MLP learning with a training dataset and then feature subset selection using decompositional approach to analyze the trained MLP. Our method is able to select a subset of relevant features in high dimensional, multi-variate and non-linear domains. Results: Using five artificial datasets that represent four data types, we verified the FS-MLP performance with seven other feature selection methods. Experimental results showed that the FS-MLP is superior at high dimensional, multi-variate and non-linear domains. In experiments with HIV-1 protease cleavage dataset, the FS-MLP selected a set of 14 highly relevant features among 160 original features. On a validation set of 131 test instances, classifiers that used the 14 features showed about 95% accuracy which outperformed other seven methods in terms of accuracy and the number of features. Conclusions: Our experimental results indicate that the FS-MLP is effective in analyzing multi-variate, non-linear and high dimensional datasets such as HIV-1 protease cleavage dataset. The 14 relevant features which were selected by the FS-MLP provide us with useful insights into the HIV-1 cleavage site domain as well. The FS-MLP is a useful method for computational sequence analysis in general.

KW - Dimension reduction

KW - Feature selection

KW - HIV-1 protease cleavage site prediction

KW - Multi-layered perceptron

UR - http://www.scopus.com/inward/record.url?scp=77951634085&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77951634085&partnerID=8YFLogxK

U2 - 10.1016/j.artmed.2009.07.010

DO - 10.1016/j.artmed.2009.07.010

M3 - Article

VL - 48

SP - 83

EP - 89

JO - Artificial Intelligence in Medicine

JF - Artificial Intelligence in Medicine

SN - 0933-3657

IS - 2-3

ER -