Saturday, October 20, 2012

Clustering algorithm for determining community structure

We propose a novel algorithm to find the community structure in complex networks based on the combination of spectral analysis and modularity optimization. The clustering produced by our algorithm is as accurate as the best algorithms in the literature on modularity optimization; however, the main asset of the algorithm is its efficiency. The most comparable algorithm is Newman’s fast algorithm, which is the reference algorithm for clustering in large networks due to its efficiency. When both algorithms are compared, our algorithm outperforms the fast algorithm in both efficiency and accuracy of the clustering, in terms of modularity. Thus, the results suggest that the proposed algorithm is a good choice for analyzing the community structure of medium and large networks, in the range of tens to hundreds of thousands of vertices.

1. INTRODUCTION

Clustering plays a key role in the analysis and exploration of data. In short, clustering is the method by which meaningful clusters, or groups, within collections of data are created. These clusters are intended to group individuals (or samples) that are similar to each other so that the hidden structure within the collection of data is revealed, resulting in a valuable acquisition of knowledge. Data mining and machine learning are disciplines that work extensively with clustering, in particular with datasets composed of individuals and attributes. The aim is to identify groups of individuals that are similar based on their attributes. However, thanks to the recent collective effort on analyzing and compiling very large networks, there is a growing interest in methods based on the structure (topology) of the networks rather than on the individuals’ attributes.

This new method for clustering is possible thanks to the characterization of many systems as networks. Despite the very different nature of the modelled systems (the Web [1], sexual relations [2], scientific collaboration [3, 4], protein interactions [5], the Internet [6], phone calls [7]), they all exhibit a non-trivial pattern of interactions. One of the regularities found in complex networks [8, 9] is the high cliquishness of the network [10]: there are groups of vertices that are densely interconnected among themselves and have few interactions outside the group. Therefore, there is an implicit community structure within complex networks.

Girvan and Newman [11] proposed an algorithm to extract the community structure from complex networks that has become one of the most widely used among researchers in this community. Following that important work, a branch of research on complex networks has focused on clustering algorithms able to discover the community structure in those networks. To evaluate the accuracy, or quality, of a community structure yielded by a clustering algorithm, Newman and Girvan devised a quantitative measure called modularity Q. Although there are other quantitative measures [12], modularity is widely accepted in the physics community. Q is defined in [13] as Q...

∗Electronic address: jmpujol@lsi.upc.edu
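
The definition of Q is cut off in this excerpt. As a rough illustration only, the sketch below computes modularity in the form it is commonly stated for undirected, unweighted graphs: the sum over communities of (e_cc - a_c^2), where e_cc is the fraction of edges with both ends in community c and a_c is the fraction of edge ends attached to c. This is an assumption about the formula meant here, not a transcription of [13], and the names `modularity`, `edges`, and `community` are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every vertex to a community label.
    Assumed form: Q = sum over c of (e_cc - a_c**2).
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    internal = defaultdict(int)  # edges with both ends inside community c
    ends = defaultdict(int)      # edge ends (degree sum) attached to community c

    for u, v in edges:
        cu, cv = community[u], community[v]
        ends[cu] += 1
        ends[cv] += 1
        if cu == cv:
            internal[cu] += 1

    return sum(internal[c] / m - (ends[c] / (2 * m)) ** 2 for c in ends)


# Toy graph (illustrative data): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Partition that matches the two triangles.
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
# Trivial partition with every vertex in one community.
single = {v: "A" for v in range(6)}

q_split = modularity(edges, split)
q_single = modularity(edges, single)
print(q_split, q_single)
```

On this toy graph the partition into the two triangles scores higher than putting all vertices in a single community, which is the behaviour expected of a quality measure for community structure.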

Website: www.lsi.upc.edu | Filesize: -
No of Page(s): 10
Download Clustering algorithm for determining community structure in large ....pdf
