Abstract

In this paper, we propose QUANTMINER, a system for mining quantitative association rules. The system is based on a genetic algorithm that dynamically discovers “good” intervals in association rules by optimizing both the support and the confidence. Experiments on real and artificial databases have shown the usefulness of QUANTMINER as an interactive data mining tool.

Keywords: Association rules, quantitative (numeric) attributes, unsupervised discretization, genetic algorithm

1 Introduction

Mining association rules, which aims at finding strong relations between attributes, has long been studied (see, for instance, [Agrawal et al., 1993]). In that context, an association rule is an expression C1 ⇒ C2, where C1 and C2 express conditions on the attributes describing the objects. The strength of such a rule is usually evaluated by means of statistical measures, for instance the support and the confidence, defined as follows:

- Support(C), where C expresses conditions on attributes, is the number of tuples in the database that satisfy C.
- Support(C1 ⇒ C2) = Support(C1 ∧ C2)
- Confidence(C1 ⇒ C2) = Support(C1 ∧ C2) / Support(C1)

Given two thresholds MinSupp and MinConf, a rule is strong when its support is greater than MinSupp and its confidence is greater than MinConf. Discovering strong rules is usually a two-step process. The first step consists in mining the set of all frequent itemsets w.r.t. MinSupp. The second step relies on this set to discover strong rules w.r.t. MinConf.

Mining association rules on both categorical and numeric attributes, also called quantitative association rules, has been less studied. This task involves several problems. First, numeric attributes are usually defined on a wide range of different values. It is useless to work on all possible numeric values, as is done for categorical values, because in most cases a given numeric value will not appear frequently. A classical way to deal with numeric attributes is to divide their domains into intervals; this is called discretization. But, as already pointed out in [Srikant and Agrawal, 1996], choosing intervals for numeric attributes is quite sensitive to the support and the confidence measures: small intervals lead to low supports, and for a rule A ⇒ B, enlarging B leads to a higher confidence, whereas enlarging A can either increase or decrease the confidence. A tradeoff has to be found to discretize numeric attributes “correctly” with respect to MinSupp and MinConf.
For instance, Weight ∈ [50, 70] can be frequent, but the rule Weight ∈ [55, 65] ⇒ Height ∈ [1.60, 1.70] may have a higher confidence than the rule Weight ∈ [50, 70] ⇒ Height ∈ [1.60, 1.70]. This shows that it is difficult to find a “good” discretization of numeric attributes when considering them individually. As a consequence, the classical two-step process for mining association rules, namely...
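
The definitions of support and confidence, and the Weight/Height example above, can be illustrated with a minimal Python sketch. It is not QUANTMINER's implementation: the helper names (in_interval, support, confidence, is_strong), the ten (weight, height) tuples, and the threshold values are all hypothetical and chosen only so that the narrower weight interval yields the higher confidence.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Condition = Callable[[Row], bool]


def in_interval(attr: str, low: float, high: float) -> Condition:
    """Condition 'attr ∈ [low, high]' (closed interval)."""
    return lambda row: low <= row[attr] <= high


def support(rows: List[Row], *conditions: Condition) -> int:
    """Number of tuples satisfying the conjunction of all conditions."""
    return sum(1 for row in rows if all(c(row) for c in conditions))


def confidence(rows: List[Row], c1: Condition, c2: Condition) -> float:
    """Support(C1 ∧ C2) / Support(C1); 0.0 when no tuple satisfies C1."""
    denom = support(rows, c1)
    return support(rows, c1, c2) / denom if denom else 0.0


def is_strong(rows: List[Row], c1: Condition, c2: Condition,
              min_supp: int, min_conf: float) -> bool:
    """A rule is strong when both measures exceed their thresholds."""
    return (support(rows, c1, c2) > min_supp
            and confidence(rows, c1, c2) > min_conf)


# Made-up data for illustration only: (weight in kg, height in m).
pairs = [(52, 1.75), (54, 1.58), (56, 1.62), (58, 1.65), (60, 1.68),
         (62, 1.66), (64, 1.72), (66, 1.78), (68, 1.80), (75, 1.85)]
rows = [{"Weight": w, "Height": h} for w, h in pairs]

wide = in_interval("Weight", 50, 70)
narrow = in_interval("Weight", 55, 65)
height = in_interval("Height", 1.60, 1.70)

conf_wide = confidence(rows, wide, height)      # 4 of 9 tuples
conf_narrow = confidence(rows, narrow, height)  # 4 of 5 tuples
print(conf_wide, conf_narrow)

# Hypothetical thresholds: MinSupp = 3 tuples, MinConf = 0.6.
print(is_strong(rows, wide, height, 3, 0.6),
      is_strong(rows, narrow, height, 3, 0.6))
```

On this toy data both rules have the same support (4 tuples), but only the rule with the narrower weight interval passes the confidence threshold, which is the effect the example above describes.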
Website: www.eecs.wsu.edu | Filesize: -
No of Page(s): 6
Download QuantMiner: A Genetic Algorithm for Mining Quantitative Association ....pdf