Articial intelligence models may be used to improve performance of information retrieval (IR) systems and the genetic algorithms (GAs) are an example of such a model. This paper presents an application of GAs as a relevance feedback method aiming to improve the document representation and indexing. In this par- ticular form of GAs, various document descriptions compete with each other and a better collection indexing is sought through reproduction, crossover and mutation operations. In this paradigm, we are searching for
the optimal balance between two genetic parameters: the population size and the number of generations. We try to discover the optimal parameter choice both by experiments using the CACM and CISI collections, and by a theoretical analysis providing explanation of the exper- imental results. The general conclusion tends to be that larger populations have better chance of signi cantly improving the e ectiveness of retrieval. 1 Introduction Probabilistic algorithms are relatively recent in computer science but their range of applications has increased rapidly. They present the advantage of being able to take di erent decisions at di erent moments while solving the same problem (Brassard and Bratley 1994 [2]). If they do not nd the solution to a problem the rst time, they can still nd it in another trial. The GAs are a special case of such algorithms. Since their development (Holland 1975 [10]), they have been applied to various problems, and information retrieval is an example. Inspired by the natural selection of living organisms, the GAs are adapt- able to a large number of problems because they o er a very general paradigm, where the domain-speci c knowledge can easily be plugged in. Their robust- ness, simplicity, and variety of solutions they can nd make them attractive in various elds and especially for problems di cult to solve by more traditional approaches (De Jong and Spears 1989 [4], Sushil and Gong 1997 [21]). The GAs work within a space of possible solutions to a given problem. Starting with a number of such potential solutions, they will seek better ones by operations of reproduction, crossover and mutation. For the GAs to be 2 Dana Vrajitoru e cient, the user needs to provide a good representation of their own problem and a tness function describing how ‘close’ a solution guess is to the goal of the search, departing ‘good’ from ‘bad’ solutions. These two aspects represent the main di culty of the GAs. Information retrieval researchers have suggested these algorithms to im- prove the performance of their systems. Gordon (1988 [8]), and Blair (1990 [1]) have used them to improve document indexing. Chen (1995 [3]), Petry et al. (1993 [14]), Yang et al. (1992 [24], 1992 [24]), Kraft et al. (1992 [11]) and Sanchez et al. (1992 [19]) present an approach based on GAs to enhance the query description. Finally, Gordon (1991 [9]) has employed them to build document clusters....
Website: www.cs.iusb.edu | Filesize: -
No of Page(s): 24
Download Large Population or Many Generations for Genetic Algorithms ....pdf
No comments:
Post a Comment