Thursday, October 18, 2012

A Fast and Efficient Algorithm to Identify Clusters

A characteristic feature of many relevant real-life networks, such as the WWW, the Internet, transportation and communication networks, and even biological and social networks, is their clustering structure. We discuss in this paper a novel algorithm to identify clusters (sets of densely interconnected nodes) in a network. The algorithm is based on local information and is therefore very fast compared with other proposed methods, while it achieves a similar performance in detecting the clusters.

Key words: graphs, clusters, networks, complex systems

Research supported by the Ministerio de Educación y Ciencia, Spain, and the European Regional Development Fund under project TEC2005-03575 and by the Catalan Research Council under project 2005SGR00256.
∗Corresponding author. Tel. +34 934 134 109 Fax +34 934 137 007
Email addresses: comellas@ma4.upc.edu (Francesc Comellas), almirall@ma4.upc.edu (Alicia Miralles)
URL: http://www-ma4.upc.edu/ comellas/ (Francesc Comellas)
Avda. Canal Olímpic s/n, 08860, Castelldefels, Catalonia, Spain
Preprint submitted to Applied Mathematics and Computation, August 2, 2008

1. Introduction

Many real-life networks, such as the WWW, the Internet, transportation and communication networks, and even biological and social networks, have a strong clustering structure: they contain groups of vertices that are highly interconnected, that is, vertices with many mutual neighbors. Here we consider the notion of cluster in a general way. Therefore, depending on the context, it can be synonymous with community, class, module, etc. The problem of detecting clusters in a given network is an important issue in social studies, biology (epidemiology, ecological webs, metabolic networks), and computer science (WWW, Internet, distributed systems, cluster computing). Clusters are also interesting because they reflect hierarchical aspects and are related to classification issues in information retrieval. Clusters also play an important role when executing most communication algorithms and should be considered to improve their performance.

The construction of efficient and fast algorithms for the identification of the clustering structure in a generic network is a nontrivial task. The first problem is the lack of a precise definition of a cluster. Intuitively, a network can be said to have cluster structure if it consists of subsets of nodes with many connections within each subset but few links between subsets; see, for example, [1, 2].
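The intuitive notion of cluster structure described above can be illustrated with a small Python sketch. It is not part of the paper: the toy graph, the cluster labels and the helper name edge_counts are all hypothetical, chosen only to show what "many connections inside, few links between" looks like when counted.

```python
from collections import Counter


def edge_counts(edges, membership):
    """Count edges inside clusters versus edges between clusters."""
    counts = Counter()
    for u, v in edges:
        # An edge is "intra" when both endpoints carry the same cluster label.
        kind = "intra" if membership[u] == membership[v] else "inter"
        counts[kind] += 1
    return counts["intra"], counts["inter"]


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
# Assumed cluster labels, one per vertex.
membership = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

intra, inter = edge_counts(edges, membership)
print(intra, inter)  # 6 edges inside the clusters, 1 edge between them
```

A partition with a high ratio of internal to external edges matches the intuitive description; the sketch only measures a given partition and does not find one.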
Algorithms to detect these subsets have appeared in the literature and can be classified into two main groups (see the above two references for more details). The first group comprises hierarchical clustering methods (also known as agglomerative), which generate a tree (dendrogram) from a complete graph with as many vertices as the original network and where each edge has a weight measuring how close the corresponding vertices are. Starting from the set of all vertices with no edges between them, edges are iteratively added between pairs of vertices in order of decreasing weight. From the tree one can then infer the different clusters. To obtain the weights, some algorithms consider the spectrum of the adjacency matrix of the graph representing the network. The other class of algorithms is called divisive. From the whole graph, by iteratively cutting the edges, one...
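The agglomerative procedure described above (add vertex pairs in order of decreasing weight and read clusters off the resulting tree) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm proposed in the paper: the closeness weight used here (number of shared neighbours, plus one if the pair is adjacent) is an assumption standing in for whatever measure a given method uses, and the function name agglomerate and the stopping rule n_clusters are hypothetical.

```python
from itertools import combinations


def agglomerate(adj, n_clusters):
    """Join vertex pairs in order of decreasing weight until n_clusters remain."""
    nodes = sorted(adj)

    # Assumed closeness weight: shared neighbours, plus 1 if the pair is adjacent.
    def weight(u, v):
        return len(adj[u] & adj[v]) + (1 if v in adj[u] else 0)

    # All pairs of the complete graph, heaviest first (ties keep their order).
    pairs = sorted(combinations(nodes, 2), key=lambda p: weight(*p), reverse=True)

    # Union-find structure tracking which vertices are already joined.
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    merges = []  # the sequence of joins, i.e. the dendrogram read bottom-up
    remaining = len(nodes)
    for u, v in pairs:
        if remaining <= n_clusters:
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            merges.append((u, v, weight(u, v)))
            remaining -= 1

    clusters = {}
    for v in nodes:
        clusters.setdefault(find(v), set()).add(v)
    return sorted(clusters.values(), key=min), merges


# Hypothetical toy graph: two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
clusters, merges = agglomerate(adj, 2)
print(clusters)  # [{0, 1, 2}, {3, 4, 5}]
```

Stopping at a fixed number of clusters is only one way to cut the tree; recording every join in merges keeps the full hierarchy available for other cut rules.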

Website: upcommons.upc.edu
No. of pages: 15
