Based on an algorithm for pattern matching in character strings, a pattern matching machine is implemented that searches for occurrences of patterns in multidimensional time series. Before the search takes place, the time series data are encoded in user-designed alphabets. The patterns, in turn, are formulated as regular expressions composed of operators and letters from these alphabets. Furthermore, a genetic algorithm is developed to breed patterns that maximize a user-defined fitness function. In an application to financial data, it is shown that patterns
bred to predict high exchange-rate volatility in training samples retain statistically significant predictive power in validation samples.

1. Introduction

This work is a contribution to the rapidly developing research area of data mining, a family of methods that aim to reveal hidden relationships and regularities in large data sets. Of particular importance is the class of data mining problems concerned with discovering frequently occurring patterns in sequential data.

We propose a versatile nonparametric technique for representing multidimensional data by encoding it in alphabets defined by the analyst. The encoded data is then explored by means of patterns, which are composed of operators and letters from these alphabets. Because patterns are regular expressions, they can be manipulated, combined, and evaluated automatically. These operations lie at the heart of our genetic algorithm (GA), which evolves patterns in order to breed progressively better descriptors and predictors of the data. A concise and flexible pattern description language is, therefore, a powerful data mining tool that serves two purposes: on the one hand, as a language in which theories about the underlying data-generating process can be formulated and tested and, on the other, as a forecasting instrument.

The present approach is especially strong when dealing with multidimensional data that can be analyzed under multiple criteria and/or characterized by several indicators. Usually, each criterion (indicator) forms the basis of an alphabet. Preprocessing the data by encoding it in alphabets ensures that the subsequent search for patterns runs efficiently. Furthermore, because alphabets can be designed for the data at hand, the method is applicable to highly diverse record sets and allows each researcher to analyze the (same) data in an idiosyncratic language. Here we stress an important departure from the more traditional techniques for forecasting complex systems.
Many methods, such as kernel regression, neural networks, or reinforcement learning (see [1] for recent …
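The encoding-and-search idea from the abstract can be illustrated with a minimal Python sketch. It is not the authors' pattern matching machine: Python's built-in `re` module stands in for it, and the two alphabets (`u`/`d`/`f` for the sign of a return, `H`/`L` for volatility above or below a threshold), the thresholds, and the helper names `encode` and `find_occurrences` are hypothetical choices made for illustration. Each time step of a two-dimensional series becomes a fixed-width token with one letter per alphabet, so a pattern such as `u.u.dH` reads "two up moves with any volatility, then a down move with high volatility".

```python
import re
from typing import List, Sequence


def encode_return(r: float, eps: float = 0.001) -> str:
    """Hypothetical alphabet for returns: u = up, d = down, f = flat (|r| <= eps)."""
    if r > eps:
        return "u"
    if r < -eps:
        return "d"
    return "f"


def encode_vol(v: float, threshold: float = 0.01) -> str:
    """Hypothetical alphabet for volatility: H = above threshold, L = otherwise."""
    return "H" if v > threshold else "L"


def encode(returns: Sequence[float], vols: Sequence[float]) -> str:
    """Encode a two-dimensional series as one 2-letter token per time step."""
    return "".join(encode_return(r) + encode_vol(v) for r, v in zip(returns, vols))


def find_occurrences(pattern: str, encoded: str, width: int = 2) -> List[int]:
    """Time indices where the regex `pattern` matches, trying only start positions
    aligned to token boundaries (overlapping occurrences are all reported)."""
    rx = re.compile(pattern)
    return [t for t in range(len(encoded) // width) if rx.match(encoded, t * width)]


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the paper.
    returns = [0.002, 0.003, -0.004, 0.0, 0.005, 0.006, -0.002]
    vols = [0.005, 0.012, 0.015, 0.008, 0.009, 0.011, 0.02]
    encoded = encode(returns, vols)
    print(encoded)                               # uLuHdHfLuLuHdH
    # Two up moves (any volatility), then a down move with high volatility.
    print(find_occurrences(r"u.u.dH", encoded))  # [0, 4]
```

Matching only at token-aligned positions keeps a wildcard `.` from straddling two time steps, and trying every start position reports overlapping occurrences as well.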
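The genetic search over patterns can be sketched in the same spirit. Everything below is an assumption for illustration rather than the GA described in the paper: patterns are sequences of one-step genes drawn from a small hypothetical vocabulary (`GENES`), the hypothetical fitness function rewards patterns whose occurrences are immediately followed by a high-volatility step, and the operators are standard tournament selection, one-point crossover, per-gene mutation, and elitism. The block works on an already encoded string with two letters per time step (a return letter `u`/`d`/`f` followed by a volatility letter `H`/`L`).

```python
import random
import re
from typing import List, Tuple

# Hypothetical gene vocabulary: each gene matches exactly one 2-letter time step
# (return letter u/d/f, then volatility letter H/L; "." is a wildcard).
GENES = ["u.", "d.", "f.", ".H", ".L", "uH", "dH", "uL", "dL", ".."]
Genome = List[str]  # a pattern is a sequence of one-step genes


def to_regex(genome: Genome) -> str:
    return "".join(genome)


def fitness(genome: Genome, encoded: str, width: int = 2) -> float:
    """Hypothetical user-defined fitness: smoothed share of occurrences that are
    immediately followed by a high-volatility ("H") step."""
    rx = re.compile(to_regex(genome))
    steps = len(encoded) // width
    matches = hits = 0
    for t in range(steps - len(genome)):  # leave room for the following step
        if rx.match(encoded, t * width):
            matches += 1
            nxt = t + len(genome)
            if encoded[nxt * width + 1] == "H":
                hits += 1
    return hits / (matches + 1)  # the +1 penalizes patterns that rarely occur


def random_genome(rng: random.Random, max_len: int) -> Genome:
    return [rng.choice(GENES) for _ in range(rng.randint(1, max_len))]


def crossover(a: Genome, b: Genome, rng: random.Random, max_len: int) -> Genome:
    """One-point crossover: a head of `a` joined to a tail of `b`, capped at max_len."""
    child = a[: rng.randint(1, len(a))] + b[rng.randint(0, len(b) - 1):]
    return child[:max_len]


def mutate(genome: Genome, rng: random.Random, rate: float) -> Genome:
    """Replace each gene with a random one with probability `rate`."""
    return [rng.choice(GENES) if rng.random() < rate else g for g in genome]


def evolve(encoded: str, pop_size: int = 30, generations: int = 20, max_len: int = 3,
           mutation_rate: float = 0.2, seed: int = 0) -> Tuple[Genome, float, List[float]]:
    rng = random.Random(seed)
    pop = [random_genome(rng, max_len) for _ in range(pop_size)]
    history: List[float] = []  # best fitness per generation
    for _ in range(generations):
        scored = sorted(pop, key=lambda g: fitness(g, encoded), reverse=True)
        history.append(fitness(scored[0], encoded))

        def pick() -> Genome:  # tournament selection of size 3
            return max(rng.sample(scored, 3), key=lambda g: fitness(g, encoded))

        pop = [scored[0]] + [  # elitism: the current best survives unchanged
            mutate(crossover(pick(), pick(), rng, max_len), rng, mutation_rate)
            for _ in range(pop_size - 1)
        ]
    best = max(pop, key=lambda g: fitness(g, encoded))
    return best, fitness(best, encoded), history


if __name__ == "__main__":
    # Synthetic random encoded series for demonstration only (not real data).
    demo = random.Random(1)
    encoded = "".join(demo.choice("udf") + demo.choice("HL") for _ in range(200))
    best, score, history = evolve(encoded)
    print(to_regex(best), round(score, 3))
```

Because of elitism, the best fitness never decreases from one generation to the next. On real data the fitness would be computed on a training sample and the bred patterns checked on a separate validation sample, as the abstract describes.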
Source: Genetic Algorithm Search for Predictive Patterns ..., Complex Systems (www.complex-systems.com), 15 pages, PDF.