Tuesday, October 16, 2012

Using Complex Network Features for Fast Clustering in the Web

Applying graph clustering algorithms in real world networks needs to overcome two main challenges: the lack of prior knowledge and the scalability issue. This paper proposes a novel method based on the topological features of complex networks to optimize the clustering algorithms in real-world networks. More specifically, the features are used for parameter

estimation and performance optimization. The proposed method is evaluated on real-world networks extracted from the web. Experimental results show improvement both in terms of Adjusted Rand index values as well as runtime efficiency. Categories and Subject Descriptors H.3.3 [Information Systems]: Information Search and Retrieval - Clustering; General Terms Algorithms, Performance, Experimentation Keywords Graph clustering, complex networks, parameter estimation. 1. INTRODUCTION Graph clustering is an important technology in social network analysis. With the rapid development of the Internet, network applications must deal with very large networks, often comprising millions of nodes. How to find a good tradeoff between speed and accuracy becomes an open challenge. Furthermore, how to discover a priori knowledge of real-world networks to determine parameters is crucial to the performance of clustering algorithms. Most real-world networks are complex networks which have some topological features that are not apparent in simple networks. These topological features reveal additional information of the communities in complex networks. For instance, the small world [1] hypothesis suggests that the diameter of the complex networks is small; the high clustering coefficient [2] suggests that most of the nodes are only connected to some neighbors in the same cluster; the low diameter mainly depends on a few “weak long ties” among the clusters. Scale-free feature [3] indicates that the distribution of degrees follow a power-law to suggests that a few active nodes consume a lot of edges, and other common nodes only consume very few edges. From these works in complex networks we can deduce two hypothesis: (1) most of the common nodes are connected to its neighbors to form some clusters towards the so called active nodes, and (2) a few weak ties among clusters make greater contribution to network connections. In this paper, we employ the hypothesis to optimize the graph clustering algorithms for large-scale complex networks such as link-based data in the web. The first hypothesis is also used to determine clustering parameters. Two well-known algorithms are used to evaluate the performance of the proposed methods. The selected algorithms include the k-medoids algorithm [4] and the Girvan-Newman algorithm [5], which are widely studied in social networks analysis. Experiments show that the performance of clustering algorithms is improved by approximating the shortest paths algorithm based on the hypothesis. 2. COMPLEX NETWORK CLUSTERING Many graph-clustering algorithms are time-consuming because of the bottleneck of all-pairs shortest paths problem (APSP). Since the aforementioned hypothesis says that the...

Website: www2011india.com | Filesize: -
No of Page(s): 2
Download Using Complex Network Features for Fast Clustering in the Web.pdf

No comments:

Post a Comment