Wednesday, October 17, 2012

An Algorithm to Find Overlapping Community Structure in Networks

Recent years have seen the development of many graph clustering algorithms, which can identify community structure in networks. The vast ma- jority of these only find disjoint communities, but in many real-world networks communities overlap to some extent. We present a new algorithm for discovering overlapping communities in networks, by extending Girvan and Newman’s well-known algorithm based on the betweenness centrality measure. Like the original algorithm, ours performs hierarchical clustering — partitioning a network into any desired number of clusters — but allows them to overlap. Experiments confirm good performance on randomly generated networks

based on a known overlapping community structure, and interesting results have also been obtained on a range of real-world networks. 1 Introduction and Motivation Many complex systems in the real world can be represented abstractly as networks (or graphs). Recently, with increasing availability of data about large networks and the need to understand them, the study of networks has become an important topic. A property that has been extensively studied is the existence of community structure in networks. A cluster (or community or module) is a subgraph such that the density of edges within it (intracluster edges) is greater than the density of edges between its vertices and those outside it (intercluster edges). A wide range of algorithms have been developed to discover communities in a network, including [4, 6, 11, 12, 13, 14]. Probably the best-known algorithm for finding community structure is Girvan and Newman’s algorithm [6, 14], based on the betweenness centrality measure [5]. The betweenness (strictly, the shortest-path betweenness) of edge e, cB(e), is defined as the number of shortest paths, between all pairs of vertices, that pass along e. A high be- tweenness means that the edge acts as a bottleneck between a large number of vertex pairs and suggests that it is an intercluster edge. Although the algorithm is quite slow and is no longer the most effective clustering algorithm, it does give relatively good results. The algorithm works as follows: 1. Calculate edge betweenness of all edges in network. 2. Find edge with highest edge betweenness and remove it. 3. Recalculate edge betweenness for all remaining edges. 4. Repeat from step 2 until no edges remain. This is a hierarchical, divisive, clustering algorithm. Initially, the n-vertex network (if connected) forms a single cluster. After one or more iterations, removing an edge (step 2) causes the network to split into two components (clusters). As further edges are removed, each component again splits, until n singleton clusters remain. The result is a dendrogram: a binary tree in which the distance of nodes from the root shows the order in which clusters were split. A cross-section of the dendrogram at any level represents a division of the network into any desired number of clusters. In step 3, edge betweenness need not be recalculated for the whole network, but only for the component containing the edge removed in step 2, or for both components if removing...

Website: www.cs.bris.ac.uk | Filesize: -
No of Page(s): 12
Download An Algorithm to Find Overlapping Community Structure in Networks.pdf

No comments:

Post a Comment