Friday, October 19, 2012

Community Detection and Author Disambiguation

Research is never carried out in isolation. Scholars build on each other’s findings, work together on problems, work on problems in parallel, and often verify each other’s results. The success of the academic enterprise depends on all its participants communicating their ideas, techniques, and findings with each other and with the public. Given the public importance of scholarly research, we believe that more attention must be given to understanding the dynamics of scholarly communication and to addressing the problems facing the scholarly community.

In this project, we seek to explore and understand the types of communities that form within a specific discipline, high energy physics. Although work has been done to detect communities in ‘science’, communities and community detection algorithms have not been carefully explored for specific sub-disciplines such as high energy physics phenomenology. We explore the applicability of different community detection algorithms and try to evaluate their effectiveness at detecting ‘meaningful’ communities. Our approach, however, is not aimed at detecting meaningful communities simply for the sake of identifying them. We propose a novel approach that uses the detected communities as a tool for disambiguating author names.

Most bibliographic databases contain author names, but they rarely record a unique identifier for each individual. Since multiple people can share the same name, even within a single discipline, there is a need for algorithms that can computationally determine which papers are authored by the same individual. By determining which duplicate names belong to different individuals, it will be possible to show readers work by the same author, provide a means for authors to track their publications, and enable universities to better determine their scholarly output. Identifying individuals and correctly attributing bibliographic records to real people is an important problem for academics, academic institutions, and funding/granting agencies alike. From their different perspectives, each of these groups has an interest in identifying their scholarly output in order to evaluate and understand their academic impact. When looking for citation impact in existing databases, scholars want to identify papers they have written, universities want to identify papers written by their professors, and granting agencies want to identify the work of their grantees.
Through computer-aided author disambiguation, it will be possible to provide services to study the scholarly output of an individual or an institution, as well as provide important services for the discovery of research, such as listing works by the same author.

2. Related Work

Author Disambiguation

Author disambiguation is a special case of entity resolution: discovering underlying entities and mapping database references to these entities (in our case, a specific person). Traditionally, entity resolution is done by looking at pair-wise comparisons of attributes for each record and resolving the entity based on attribute similarity. Famously, Bhattacharya & Getoor (2007) extend the traditional model by using relational data to aid in resolution. There have been many other proposed methods of author disambiguation in recent literature. Some methods rely on building statistical models, such as those in Zhang et...
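The traditional pair-wise approach to entity resolution mentioned in the excerpt can be sketched as follows. This is only a sketch under stated assumptions: the attributes (`coauthors`, `keywords`), the weights, the 0.4 threshold, and the choice of Jaccard similarity are all illustrative and are not taken from the paper or from Bhattacharya & Getoor (2007).

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def record_similarity(r1, r2, weights):
    """Weighted sum of per-attribute similarities between two records."""
    return sum(w * jaccard(r1[attr], r2[attr]) for attr, w in weights.items())


def same_entity(r1, r2, weights, threshold=0.4):
    """Resolve two records to one entity if they are similar enough."""
    return record_similarity(r1, r2, weights) >= threshold


# Hypothetical bibliographic records that share an author name.
r1 = {"coauthors": {"A. Lee", "B. Chen"}, "keywords": {"higgs", "lhc"}}
r2 = {"coauthors": {"B. Chen", "C. Diaz"}, "keywords": {"higgs", "lhc", "susy"}}
r3 = {"coauthors": {"X. Wu"}, "keywords": {"lattice", "qcd"}}
weights = {"coauthors": 0.5, "keywords": 0.5}

print(same_entity(r1, r2, weights), same_entity(r1, r3, weights))
```

Each decision here uses only the attributes of the two records being compared. A relational approach would additionally draw on how records are linked to one another.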

Website: snap.stanford.edu
No of Page(s): 15
Download Community Detection and Author Disambiguation in a High ... - SNAP.pdf
