Wednesday, October 17, 2012

Identifying and evaluating community structure in complex networks

Pattern Recognition Letters xxx (2009) xxx–xxx, article in press. doi:10.1016/j.patrec.2009.11.001
0167-8655/$ - see front matter © 2009 Elsevier B.V.
Corresponding author: N.V. Chawla. Tel.: +1 574 631 8716; fax: +1 574 631 9260. E-mail address: nchawla@nd.edu

1. Introduction

Modern data mining is often confronted with problems arising from complex relationships in data. Recently, datasets that can be represented as graphs, or interaction networks, have received considerable attention in various domains. [...] include the analysis of social networks (Girvan [...]; Wasserman and Faust, 1994), chemical interactions [...] proteins (Asur et al., 2007; Enright et al., 2002), [...] and services (Clauset et al., 2004), and many [...]. [...] increasing availability of rich network data, there is also a need for effective and efficient analysis methods.

One problem of great interest for pattern recognition in complex networks is community detection, or the unsupervised discovery of densely connected subgroups which are known to exist in many real-world networks. On the surface the concept of communities appears intuitive, and if properly arranged their structure can be identified by visual inspection as illustrated in Fig. 1. Of course this method is infeasible for networks larger than a handful of nodes, prompting the development of automated detection techniques.

The formulation of an algorithm and, more importantly, the validation of its output requires a more concise definition of a community. Newman and Girvan (2004) were among the first to address this issue and proposed modularity to quantify the strength of community structure. This metric, based on the intuition that nodes within the same community should be more tightly connected than they would be by chance, has been adopted for a variety of uses including the validation and comparison of community structures (Newman and Girvan, 2004; Pons and Latapy, 2006), but also as an objective function for optimization algorithms to identify communities (Clauset et al., 2004; Do- [...] modularity does not necessarily coincide with the correct division of the network; in this case algorithms that maximize modularity converge on a suboptimal solution, that is, miss the discovery of the actual and meaningful communities. We demonstrate this using a variety of metrics on diverse datasets for which the actual communities are known as ground truth.

Another issue with community detection algorithms is their computational complexity. These methods, rooted in graph theory, are often confounded by large networks and become fragile as datasets approach 10^5–10^6 (or more) nodes. We believe that in order to achieve this level of scalability, a method much simpler than an optimization algorithm must be employed. To this end, we outline an intuitive approach to community detection based on random walks and compare it to several published algorithms using a variety of metrics. Our experimental results show that this simple method is as good or better at discovering the true communities than other more complex algorithms. Finally, we discuss several possible extensions to the approach, and demonstrate its scalability on a network of over 1 million nodes, where other methods falter.
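The modularity measure mentioned in the excerpt can be illustrated with a small sketch. This is not code from the paper; it assumes the commonly cited Newman and Girvan formulation for an undirected, unweighted graph, Q = sum over communities of (fraction of edges inside the community) minus (fraction of edge ends attached to the community) squared. The function name, the edge-list input, and the toy graph are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition (assumed Newman-Girvan form).

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    """
    m = 0                           # total number of edges
    internal = defaultdict(int)     # edges with both ends in the community
    degree_sum = defaultdict(int)   # edge ends attached to the community
    for u, v in edges:
        m += 1
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    if m == 0:
        return 0.0
    # Observed internal edge fraction minus the fraction expected by chance.
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "all" for n in range(6)}

print(modularity(edges, by_triangle))
print(modularity(edges, one_group))
```

On this toy graph, splitting by triangle gives Q = 5/14 (about 0.357), while putting every node into one community gives Q = 0, which matches the intuition that a good partition has more internal edges than chance would predict. As the excerpt notes, the partition with the highest Q is not guaranteed to be the true community structure.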

Website: icensa.nd.edu
No. of pages: 9
