Practical pattern classification and knowledge discovery problems require selection of a subset of attributes or features (from a much larger set) to represent the patterns to be classified. This paper presents an approach to the multi-criteria optimization problem of feature subset selection using a genetic algorithm. Our experiments demonstrate the feasibility of this approach for feature subset selection in the automated design of neural networks for pattern classification and knowledge discovery.
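As a rough illustration of the idea summarized above, the following is a minimal sketch of a genetic algorithm that searches over bit-string feature masks. It is not the procedure used in the paper: the function names (`make_fitness`, `select_features`), the toy fitness function (a reward for including a hypothetical set of "relevant" features, minus a weighted sum of per-feature costs), and all parameter values are assumptions made for this example. In a real setting the fitness would come from training and evaluating a classifier on the selected features.

```python
import random


def make_fitness(relevant, costs, cost_weight=0.01):
    """Build a toy two-criterion fitness function (an assumption, not the paper's).

    `relevant` is a hypothetical set of feature indices that carry information;
    `costs` gives a hypothetical measurement cost per feature.
    """
    def fitness(mask):
        # Stand-in for classifier accuracy: fraction of relevant features kept.
        accuracy = sum(1 for i in relevant if mask[i]) / len(relevant)
        # Total cost of measuring every selected feature.
        cost = sum(c for bit, c in zip(mask, costs) if bit)
        return accuracy - cost_weight * cost
    return fitness


def select_features(n_features, fitness, pop_size=30, generations=40,
                    mutation_rate=0.05, seed=0):
    """Evolve a 0/1 mask of length n_features (n_features must be at least 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament(population):
        # Binary tournament: pick two individuals, keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each position.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


if __name__ == "__main__":
    toy_fitness = make_fitness(relevant={0, 3, 7}, costs=[1.0] * 10)
    mask = select_features(10, toy_fitness)
    print("selected feature indices:", [i for i, bit in enumerate(mask) if bit])
```

Folding the two criteria into one weighted score is only one possible design choice; the weight `cost_weight` is arbitrary here and controls how strongly extra features are penalized.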
* This report is a revised version of TR97-02.
† This research was partially supported by the National Science Foundation (through grants IRI-9409580 and IRI-9643299) and the John Deere Foundation.

1 Introduction

Many practical pattern classification tasks (e.g., medical diagnosis) require learning of an appropriate classification function that assigns a given input pattern (typically represented using a vector of attribute or feature values) to one of a finite set of classes. The choice of features, attributes, or measurements used to represent patterns that are presented to a classifier affect (among other things):

• The accuracy of the classification function that can be learned using an inductive learning algorithm (e.g., a decision tree induction algorithm or a neural network learning algorithm): The attributes used to describe the patterns implicitly define a pattern language. If the language is not expressive enough, it would fail to capture the information that is necessary for classification and hence, regardless of the learning algorithm used, the accuracy of the classification function learned would be limited by this lack of information.

• The time needed for learning a sufficiently accurate classification function: For a given representation of the classification function, the attributes used to describe the patterns implicitly determine the search space that needs to be explored by the learning algorithm. An abundance of irrelevant attributes can unnecessarily increase the size of the search space, and hence the time needed for learning a sufficiently accurate classification function.

• The number of examples needed for learning a sufficiently accurate classification function: All other things being equal, the larger the number of attributes used to describe the patterns in a domain of interest, the larger is the number of examples needed to learn a classification function to a desired accuracy.
• The cost of performing classification using the learned classification function: In many practical applications (e.g., medical diagnosis), patterns are described using observable symptoms as well as results of diagnostic tests. Different diagnostic tests might have different costs as well as risks associated with them. For instance, an invasive ex-...
Website: archives.cs.iastate.edu | Filesize: -
No of Page(s): 13
Download Feature Subset Selection Using A Genetic Algorithm.pdf