Tuesday, October 2, 2012

A Hierarchical Clustering Algorithm Based on the Hungarian Method

Jacob Goldberger, School of Engineering, Bar-Ilan University, Israel
Tamir Tassa, Division of Computer Science, The Open University, Israel

Abstract

We propose a novel hierarchical clustering algorithm for data-sets in which only pairwise distances between the points are provided. The classical Hungarian method is an efficient algorithm for solving the problem of minimal-weight cycle cover. We utilize the Hungarian method as the basic building block of our clustering algorithm. The disjoint cycles produced by the Hungarian method are viewed as a partition of the data-set, and the clustering algorithm is formed by hierarchically merging these cycles. The proposed algorithm can handle data that is arranged in non-convex sets, and the number of clusters is found automatically as part of the clustering process. We report improved performance of our algorithm on a variety of examples and compare it to the spectral clustering algorithm.

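The paper itself is not reproduced on this page, but the building block described in the abstract can be sketched roughly as follows: solving the assignment problem on the pairwise-distance matrix, with self-assignments forbidden, yields a permutation of minimal total weight, and the disjoint cycles of that permutation form an initial partition of the data-set. The sketch below uses scipy.optimize.linear_sum_assignment, which solves the same assignment problem as the classical Hungarian method; the function names and the toy data are illustrative only, and the paper's hierarchical merging stage is not included.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import pdist, squareform


def min_weight_cycle_cover(dist):
    """Permutation pi minimizing sum_i d(i, pi(i)), i.e. a minimal-weight cycle cover.

    Self-assignments are forbidden (large diagonal cost), so every cycle has
    length at least two.
    """
    cost = dist.astype(float).copy()
    big = cost.max() * len(cost) + 1.0        # finite stand-in for "infinity" on the diagonal
    np.fill_diagonal(cost, big)
    rows, cols = linear_sum_assignment(cost)  # solves the linear assignment problem
    perm = np.empty(len(cost), dtype=int)
    perm[rows] = cols
    return perm


def cycles_as_partition(perm):
    """Decompose the permutation into its disjoint cycles; each cycle is one part."""
    seen = np.zeros(len(perm), dtype=bool)
    cycles = []
    for start in range(len(perm)):
        i, cycle = start, []
        while not seen[i]:
            seen[i] = True
            cycle.append(i)
            i = perm[i]
        if cycle:
            cycles.append(cycle)
    return cycles


if __name__ == "__main__":
    points = np.random.rand(20, 2)            # toy data: 20 random 2-D points
    dist = squareform(pdist(points))          # pairwise Euclidean distances
    partition = cycles_as_partition(min_weight_cycle_cover(dist))
    print(partition)                          # the cycles, viewed as an initial partition
```

According to the abstract, these cycles are then merged hierarchically, which also determines the number of clusters; that stage is beyond this sketch.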
1 Introduction

Automatic grouping of objects into clusters is one of the fundamental problems in machine learning and in other fields of study. In many approaches, the first step toward clustering the data-set is to extract a feature vector from each object; the clustering is then performed by applying various clustering algorithms to these feature vectors. Two commonly used methods are K-means and applying the EM algorithm to learn a Mixture of Gaussians density (see e.g. [1]). Although these iterative methods may suffer from the problem of local optima, they provide high-quality results when the data is organized in convex sets (blobs). This situation is characterized by the fact that the members of the same cluster are all similar to a single point, the cluster centroid, and are all far from the centroids of the other clusters. When the data is arranged in non-convex sets (e.g. concentric circles), these algorithms tend to fail. In such cases, two points belong to the same cluster, even though they are far apart, if there is a path of locally similar points that connects them. To find clusters in such cases, other approaches, essentially global ones, are required. Another problem with the above-mentioned methods is that they require an explicit feature-vector representation of the objects. However, sometimes such information is not available and, instead, only pairwise similarity information is provided.

An alternative clustering approach that has attracted much attention in recent years is spectral clustering [10, 7, 13]. The spectral clustering algorithm can handle non-convex arrangements of clusters and is based on pairwise similarity information. It first obtains a low-dimensional representation of the data and then uses the K-means algorithm to carry out the clustering. More explicitly, spectral clustering proceeds as follows. First, it translates the pairwise distance information, d_{i,j}, into an affinity matrix W whose entries are given by

W_{i,j} = \begin{cases} e^{-d_{i,j}^2 / (2\sigma^2)} & i \neq j, \\ 0 & i = j; \end{cases}

here, \sigma is a scaling factor. Letting D...
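For concreteness, here is a minimal sketch of the spectral pipeline just described. The affinity matrix follows the formula above; since the excerpt is cut off at "Letting D", the remaining steps follow a standard normalized spectral clustering recipe rather than this particular text: D is the diagonal degree matrix, the low-dimensional representation is taken from the leading eigenvectors of D^{-1/2} W D^{-1/2}, and K-means is run on the embedded points. The value of sigma, the helper names, and the concentric-circles example are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.spatial.distance import pdist, squareform


def affinity_matrix(dist, sigma):
    """W[i, j] = exp(-d(i, j)^2 / (2 sigma^2)) for i != j, and W[i, i] = 0."""
    W = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def spectral_embedding(W, k):
    """k-dimensional representation from the normalized affinity D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)                              # degrees (row sums of W)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                    # eigenvalues in ascending order
    X = vecs[:, -k:]                               # k leading eigenvectors as columns
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    return X


if __name__ == "__main__":
    # Two concentric circles: the non-convex example mentioned above, handled
    # from pairwise distances alone.
    angles = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
    points = np.vstack([np.c_[np.cos(angles), np.sin(angles)],           # radius-1 ring
                        np.c_[3 * np.cos(angles), 3 * np.sin(angles)]])  # radius-3 ring
    dist = squareform(pdist(points))
    X = spectral_embedding(affinity_matrix(dist, sigma=0.5), k=2)
    _, labels = kmeans2(X, 2, minit="++")          # final K-means on the embedding
    print(labels)                                  # the two rings should form the two clusters
```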

Website: www.openu.ac.il | Filesize: -
No of Page(s): 15
Download A Hierarchical Clustering Algorithm Based on the Hungarian Method ....pdf
