Monday, October 1, 2012

Learning Fuzzy Logic Controller for Reactive Robot Behaviours

This paper presents a learning approach to the development of a fuzzy logic controller based on delayed rewards from the real world. The delayed rewards are apportioned to individual fuzzy rules by reinforcement Q-learning. Efficient exploration of the solution space is one of the key issues in reinforcement learning, so a specific genetic algorithm is developed in this paper to trade off the exploration of learning spaces against the exploitation of learned experience. The proposed approach is evaluated on reactive behaviours of football-playing robots.

Keywords: robot learning, Q-learning, fuzzy logic controller, genetic algorithms.
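To make the idea concrete, here is a minimal Python sketch of how a delayed reward might be apportioned to individual fuzzy rules with a Q-learning-style update. The rule names, membership functions, epsilon-greedy exploration, and learning constants below are illustrative assumptions, not the paper's exact formulation (the paper uses a genetic algorithm, not epsilon-greedy, to manage exploration).

import random

ACTIONS = ["turn_left", "go_straight", "turn_right"]  # candidate consequents (assumed)
ALPHA, GAMMA = 0.1, 0.9                               # learning rate, discount (assumed)

# Each fuzzy rule keeps one q-value per candidate consequent action.
q = {rule: {a: 0.0 for a in ACTIONS} for rule in ["near_left", "near_right", "far"]}

def firing_strengths(obstacle_angle):
    """Toy membership functions returning each rule's truth degree in [0, 1]."""
    left = max(0.0, min(1.0, -obstacle_angle))    # obstacle on the left
    right = max(0.0, min(1.0, obstacle_angle))    # obstacle on the right
    far = max(0.0, 1.0 - abs(obstacle_angle))     # obstacle roughly ahead
    return {"near_left": left, "near_right": right, "far": far}

def select_actions(strengths, epsilon=0.1):
    """Each firing rule picks a consequent, epsilon-greedy over its own q-values."""
    chosen = {}
    for rule, w in strengths.items():
        if w == 0.0:
            continue
        if random.random() < epsilon:
            chosen[rule] = random.choice(ACTIONS)
        else:
            chosen[rule] = max(q[rule], key=q[rule].get)
    return chosen

def update(strengths, chosen, reward, next_strengths):
    """Apportion the delayed reward to each contributing rule in proportion to
    its normalised firing strength, then apply a Q-learning update."""
    total = sum(strengths[r] for r in chosen) or 1.0
    tot_next = sum(next_strengths.values()) or 1.0
    # Value of the next state: firing-strength-weighted best q over the rules.
    v_next = sum(w * max(q[r].values()) for r, w in next_strengths.items()) / tot_next
    for rule, action in chosen.items():
        share = strengths[rule] / total
        target = share * reward + GAMMA * v_next
        q[rule][action] += ALPHA * (target - q[rule][action])

# One interaction step (illustrative values):
s = firing_strengths(-0.4)
acts = select_actions(s)
# ... execute the chosen consequents on the robot, observe the delayed reward ...
update(s, acts, reward=1.0, next_strengths=firing_strengths(0.1))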

I. INTRODUCTION

Fuzzy Logic Controllers (FLCs) have been widely adopted to design reactive robot behaviours [2][8][17], mainly because of the uncertainty in both sensory information and motor execution. Behaviours designed with FLCs can map real-valued sensory information to real-valued motor commands through fuzzy inference. FLCs are also known as Fuzzy Classifier Systems (FCSs), since they are rule-based and similar in structure to crisp classifier systems [2][19]. Following the successful application of Genetic Algorithms (GAs) to classifier systems [3], the use of GAs or other Evolutionary Algorithms (EAs) to evolve FLCs has been investigated in many applications [7][10][11][13][19]. In these applications, the parameters of the membership functions of fuzzy sets were learned, and so were the fuzzy rules themselves, including choosing input variables for fuzzy rules, determining the number of fuzzy sets for input or output variables, and so on. However, due to the large search spaces, most complex encoding schemes in robot applications have been implemented only in simulation environments [9][12].

Most evolution scenarios for FLCs fall into two categories: the Michigan approach and the Pitt approach [4][13]. In the Pitt approach, an FLC, i.e. an entire fuzzy rule set, is encoded as an individual. The GA maintains a population of FLCs and evolves them with genetic operators; an optimal or sub-optimal FLC at the end of evolution is found by choosing the individual with the highest fitness value. The Michigan approach, on the other hand, encodes one rule as an individual. The GA maintains a population of rules, and an FLC is represented by the entire population [7][13]. By evolving rules, an optimal or sub-optimal FLC is composed of the rules with higher fitness values. Many researchers call this approach symbolic evolution [10].

Fitness values in GAs always reflect the cumulative rewards received by the learning algorithm over the whole course of interaction with the environment. They indicate the quality of a sequence of actions rather than of any individual action. To evaluate individual actions, the credit assignment problem must be solved. In the evolution of crisp classifier systems, the Michigan approach relies on rule strengths: the bucket brigade algorithm apportions the rewards to individual rules according to their strengths [3], and the classifiers, or rules, compete with each other through a bidding mechanism. In the Pitt approach, credit assignment is believed to be made implicitly [14]: poor individual actions have less chance to survive. In FLCs, the situation of the Michigan approach becomes more complex since the...
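The contrast between the two encodings can be sketched in a few lines of Python. The rule representation below (fuzzy-set indices for the antecedent plus a discrete consequent) and the population sizes are assumed toy values, not the encoding used in the paper.

from dataclasses import dataclass, field
from typing import List
import random

N_INPUT_SETS = 3        # fuzzy sets per input variable (assumed)
N_ACTIONS = 5           # discrete consequent choices (assumed)

@dataclass
class Rule:
    antecedent: List[int]            # one fuzzy-set index per input variable
    action: int                      # consequent
    strength: float = 0.0            # credit assigned to this single rule

def random_rule(n_inputs=2):
    return Rule([random.randrange(N_INPUT_SETS) for _ in range(n_inputs)],
                random.randrange(N_ACTIONS))

# Michigan approach: each GA individual is ONE rule; the whole population
# together forms the controller, and per-rule strengths carry the credit.
michigan_population = [random_rule() for _ in range(50)]

# Pitt approach: each GA individual is an ENTIRE rule base; fitness is the
# cumulative reward earned by that controller as a whole, so credit
# assignment to individual rules stays implicit.
@dataclass
class RuleBase:
    rules: List[Rule] = field(default_factory=lambda: [random_rule() for _ in range(20)])
    fitness: float = 0.0

pitt_population = [RuleBase() for _ in range(30)]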

Website: cswww.essex.ac.uk | Filesize: -
No. of pages: 6
Download Learning Fuzzy Logic Controller for Reactive Robot Behaviours.pdf
