A heuristic method for selecting support features from large datasets

Hong Seo Ryoo, In Yong Jang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

For feature selection in machine learning, set covering (SC) is well suited, as it selects support features for the data under analysis based on both the individual and the collective roles of the candidate features. However, SC-based feature selection requires complete pair-wise comparisons between the members of the different classes in a dataset, which renders the otherwise meritorious SC principle impracticable for selecting support features from large datasets. Introducing the notion of implicit SC-based feature selection, this paper presents a feature selection procedure that is equivalent to the standard SC-based procedure in supervised learning but whose memory requirement is multiple orders of magnitude smaller. Through experiments on six large machine learning datasets, we demonstrate the usefulness of the proposed implicit SC-based feature selection scheme in large-scale supervised data analysis.
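A minimal sketch of the standard, explicit SC-based feature selection that the abstract refers to (and whose memory footprint the paper's implicit scheme is designed to avoid) is given below. The greedy covering heuristic, the binary data representation, and the toy arrays are illustrative assumptions for exposition, not the authors' procedure.

import numpy as np

def greedy_sc_feature_selection(X_pos, X_neg):
    """Greedy set covering over explicitly enumerated pairwise comparisons.

    A feature "covers" a (positive, negative) pair if the two observations
    differ on that feature; the goal is a small feature subset that covers
    every pair, i.e. can separate every positive example from every negative one.
    """
    n_features = X_pos.shape[1]
    # Explicitly materialize all pairwise comparisons; this is the memory
    # bottleneck that motivates the implicit scheme described in the paper.
    uncovered = [(i, j) for i in range(len(X_pos)) for j in range(len(X_neg))]
    selected = []
    while uncovered:
        # Count how many still-uncovered pairs each feature separates.
        counts = np.zeros(n_features, dtype=int)
        for i, j in uncovered:
            counts += (X_pos[i] != X_neg[j])
        best = int(np.argmax(counts))
        if counts[best] == 0:
            break  # remaining pairs cannot be separated by any feature
        selected.append(best)
        # Keep only the pairs the chosen feature does not separate.
        uncovered = [(i, j) for i, j in uncovered
                     if X_pos[i][best] == X_neg[j][best]]
    return selected

# Toy usage on binarized data (hypothetical values).
X_pos = np.array([[1, 0, 1, 0], [1, 1, 1, 0]])
X_neg = np.array([[0, 0, 1, 0], [0, 1, 0, 1]])
print(greedy_sc_feature_selection(X_pos, X_neg))  # -> [0]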

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 411-423
Number of pages: 13
Volume: 4508 LNCS
Publication status: Published - 2007 Dec 1
Event: 3rd International Conference on Algorithmic Aspects in Information and Management, AAIM 2007 - Portland, OR, United States
Duration: 2007 Jun 6 - 2007 Jun 8

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4508 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 3rd International Conference on Algorithmic Aspects in Information and Management, AAIM 2007
Country: United States
City: Portland, OR
Period: 07/6/6 - 07/6/8

Keywords

  • Combinatorial optimization
  • Feature selection
  • Large datasets
  • Supervised learning

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Ryoo, H. S., & Jang, I. Y. (2007). A heuristic method for selecting support features from large datasets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4508 LNCS, pp. 411-423). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4508 LNCS).

@inproceedings{31af37d163cd42ebbeb79a6e903c8c93,
title = "A heuristic method for selecting support features from large datasets",
abstract = "For feature selection in machine learning, set covering (SC) is most suited, for it selects support features for data under analysis based on the individual and the collective roles of the candidate features. However, the SC-based feature selection requires the complete pair-wise comparisons of the members of the different classes in a dataset, and this renders the meritorious SC principle impracticable for selecting support features from a large number of data. Introducing the notion of implicit SC-based feature selection, this paper presents a feature selection procedure that is equivalent to the standard SC-based feature selection procedure in supervised learning but with the memory requirement that is multiple orders of magnitude less than the counterpart. With experiments on six large machine learning datasets, we demonstrate the usefulness of the proposed implicit SC-based feature selection scheme in large-scale supervised data analysis.",
keywords = "Combinatorial optimization, Feature selection, Large datasets, Supervised learning",
author = "Ryoo, {Hong Seo} and Jang, {In Yong}",
year = "2007",
month = "12",
day = "1",
language = "English",
isbn = "9783540728689",
volume = "4508 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "411--423",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
url = "http://www.scopus.com/inward/record.url?scp=38149104596&partnerID=8YFLogxK",

}
