
Relevant, irredundant feature selection and noisy example elimination

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Relevant, irredundant feature selection and noisy example elimination. / Lashkia, George V.; Anthony, Laurence.
In: IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, Vol. 34, No. 2, 04.2004, p. 888-897.


Harvard

Lashkia, GV & Anthony, L 2004, 'Relevant, irredundant feature selection and noisy example elimination', IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, vol. 34, no. 2, pp. 888-897. https://doi.org/10.1109/TSMCB.2003.817106

APA

Lashkia, G. V., & Anthony, L. (2004). Relevant, irredundant feature selection and noisy example elimination. IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, 34(2), 888-897. https://doi.org/10.1109/TSMCB.2003.817106

Vancouver

Lashkia GV, Anthony L. Relevant, irredundant feature selection and noisy example elimination. IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics. 2004 Apr;34(2):888-897. doi: 10.1109/TSMCB.2003.817106

Author

Lashkia, George V. ; Anthony, Laurence. / Relevant, irredundant feature selection and noisy example elimination. In: IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics. 2004 ; Vol. 34, No. 2. pp. 888-897.

Bibtex

@article{f86b4240b471466e9e5e576511ce4d61,
title = "Relevant, irredundant feature selection and noisy example elimination",
abstract = "In many real-world situations, the method for computing the desired output from a set of inputs is unknown. One strategy for solving these types of problems is to learn the input-output functionality from examples in a training set. However, in many situations it is difficult to know what information is relevant to the task at hand. Subsequently, researchers have investigated ways to deal with the so-called problem of consistency of attributes, i.e., attributes that can distinguish examples from different classes. In this paper, we first prove that the notion of relevance of attributes is directly related to the consistency of attributes, and show how relevant, irredundant attributes can be selected. We then compare different relevant attribute selection algorithms, and show the superiority of algorithms that select irredundant attributes over those that select relevant attributes. We also show that searching for an {"}optimal{"} subset of attributes, which is considered to be the main purpose of attribute selection, is not the best way to improve the accuracy of classifiers. Employing sets of relevant, irredundant attributes improves classification accuracy in many more cases. Finally, we propose a new method for selecting relevant examples, which is based on filtering the so-called pattern frequency domain. By identifying examples that are nontypical in the determination of relevant, irredundant attributes, irrelevant examples can be eliminated prior to the learning process. Empirical results using artificial and real databases show the effectiveness of the proposed method in selecting relevant examples leading to improved performance even on greatly reduced training sets.",
author = "Lashkia, {George V.} and Laurence Anthony",
year = "2004",
month = apr,
doi = "10.1109/TSMCB.2003.817106",
language = "English",
volume = "34",
pages = "888--897",
journal = "IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics",
issn = "1083-4419",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "2",
}

RIS

TY  - JOUR
T1  - Relevant, irredundant feature selection and noisy example elimination
AU  - Lashkia, George V.
AU  - Anthony, Laurence
PY  - 2004/4
Y1  - 2004/4
N2  - In many real-world situations, the method for computing the desired output from a set of inputs is unknown. One strategy for solving these types of problems is to learn the input-output functionality from examples in a training set. However, in many situations it is difficult to know what information is relevant to the task at hand. Subsequently, researchers have investigated ways to deal with the so-called problem of consistency of attributes, i.e., attributes that can distinguish examples from different classes. In this paper, we first prove that the notion of relevance of attributes is directly related to the consistency of attributes, and show how relevant, irredundant attributes can be selected. We then compare different relevant attribute selection algorithms, and show the superiority of algorithms that select irredundant attributes over those that select relevant attributes. We also show that searching for an "optimal" subset of attributes, which is considered to be the main purpose of attribute selection, is not the best way to improve the accuracy of classifiers. Employing sets of relevant, irredundant attributes improves classification accuracy in many more cases. Finally, we propose a new method for selecting relevant examples, which is based on filtering the so-called pattern frequency domain. By identifying examples that are nontypical in the determination of relevant, irredundant attributes, irrelevant examples can be eliminated prior to the learning process. Empirical results using artificial and real databases show the effectiveness of the proposed method in selecting relevant examples leading to improved performance even on greatly reduced training sets.
AB  - In many real-world situations, the method for computing the desired output from a set of inputs is unknown. One strategy for solving these types of problems is to learn the input-output functionality from examples in a training set. However, in many situations it is difficult to know what information is relevant to the task at hand. Subsequently, researchers have investigated ways to deal with the so-called problem of consistency of attributes, i.e., attributes that can distinguish examples from different classes. In this paper, we first prove that the notion of relevance of attributes is directly related to the consistency of attributes, and show how relevant, irredundant attributes can be selected. We then compare different relevant attribute selection algorithms, and show the superiority of algorithms that select irredundant attributes over those that select relevant attributes. We also show that searching for an "optimal" subset of attributes, which is considered to be the main purpose of attribute selection, is not the best way to improve the accuracy of classifiers. Employing sets of relevant, irredundant attributes improves classification accuracy in many more cases. Finally, we propose a new method for selecting relevant examples, which is based on filtering the so-called pattern frequency domain. By identifying examples that are nontypical in the determination of relevant, irredundant attributes, irrelevant examples can be eliminated prior to the learning process. Empirical results using artificial and real databases show the effectiveness of the proposed method in selecting relevant examples leading to improved performance even on greatly reduced training sets.
U2  - 10.1109/TSMCB.2003.817106
DO  - 10.1109/TSMCB.2003.817106
M3  - Journal article
VL  - 34
SP  - 888
EP  - 897
JO  - IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics
JF  - IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics
SN  - 1083-4419
IS  - 2
ER  -
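
The abstract's central notion can be illustrated with a minimal sketch. This is not the authors' algorithm, only an assumption-laden toy: a subset of attributes is "consistent" if no two training examples that agree on all selected attributes belong to different classes, and a greedy pass drops any attribute whose removal preserves consistency, leaving a relevant, irredundant subset. All function and variable names here are illustrative.

```python
# Illustrative sketch (not the paper's method): consistency-based
# selection of a relevant, irredundant attribute subset.

def is_consistent(examples, labels, attrs):
    """True if no two examples with identical values on `attrs`
    carry different class labels."""
    seen = {}
    for x, y in zip(examples, labels):
        key = tuple(x[a] for a in attrs)
        if key in seen and seen[key] != y:
            return False
        seen.setdefault(key, y)
    return True

def irredundant_subset(examples, labels):
    """Greedily remove attributes whose removal keeps the subset
    consistent; the result distinguishes all pairs of examples
    from different classes with no redundant attribute."""
    attrs = list(range(len(examples[0])))
    for a in list(attrs):
        trial = [b for b in attrs if b != a]
        if is_consistent(examples, labels, trial):
            attrs = trial
    return attrs

# Toy data: attribute 0 duplicates attribute 1; attribute 2 is noise.
X = [(0, 0, 1), (0, 0, 0), (1, 1, 1), (1, 1, 0)]
y = [0, 0, 1, 1]
print(irredundant_subset(X, y))  # → [1]
```

The greedy pass keeps attribute 1 alone: attribute 0 is redundant (a duplicate) and attribute 2 is irrelevant (removing it preserves consistency). Note that the result depends on the scan order; the paper's actual selection procedure should be consulted for the precise criteria.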