The discovery and analysis of community structure in networks is a topic of considerable recent interest within the physics community, but most methods proposed so far are unsuitable for very large networks because of their computational cost. Here we present a hierarchical agglomeration algorithm for detecting community structure which is faster than many competing algorithms: its running time on a network with n
vertices and m edges is O(mdlog n) where d is the depth of the dendrogram describing the community structure. Many real-world networks are sparse and hierarchical, with m n and d log n, in which case our algorithm runs in essentially linear time, O(nlog2 n). As an example of the application of this algorithm we use it to analyze a network of items for sale on the web-site of a large online retailer, items in the network being linked if they are frequently purchased by the same buyer. The network has more than 400 000 vertices and 2 million edges. We show that our algorithm can extract meaningful communities from this network, revealing large-scale patterns present in the purchasing habits of customers. I. INTRODUCTION Many systems of current interest to the scienti c com- munity can usefully be represented as networks [1{4]. Examples include the Internet [5] and the world-wide web [6, 7], social networks [8], citation networks [9, 10], food webs [11], and biochemical networks [12, 13]. Each of these networks consists of a set of nodes or vertices representing, for instance, computers or routers on the Internet or people in a social network, connected together by links or edges, representing data connections between computers, friendships between people, and so forth. One network feature that has been emphasized in recent work is community structure, the gathering of vertices into groups such that there is a higher den- sity of edges within groups than between them [14]. The problem of detecting such communities within net- works has been well studied. Early approaches such as the Kernighan{Lin algorithm [15], spectral partition- ing [16, 17], or hierarchical clustering [18] work well for speci c types of problems (particularly graph bisection or problems with well de ned vertex similarity measures), but perform poorly in more general cases [19]. To combat this problem a number of new algorithms have been proposed in recent years. Girvan and New- man [20, 21] proposed a divisive algorithm that uses edge betweenness as a metric to identify the boundaries of communities. This algorithm has been applied suc- cessfully to a variety of networks, including networks of email messages, human and animal social networks, net- works of collaborations between scientists and musicians, metabolic networks and gene networks [20, 22{30]. How- ever, as noted in [21], the algorithm makes heavy de- mands on computational resources, running in O(m2n) time on...
Website: www.ece.unm.edu | Filesize: -
No of Page(s): 6
Download Finding community structure in very large networks.pdf
No comments:
Post a Comment