The increasing availability of interaction graphs requires new resource-efficient tools capable of extracting valuable biological knowledge from these networks. In this paper we report on a novel parallel implementation of Girvan and Newman’s clustering algorithm that is capable of running on clusters of computers. Our parallel implementation achieves almost linear speed-up on up to 32 processors and allows us to run this computationally intensive algorithm on large protein-protein interaction networks. Preliminary experiments show that the algorithm has very high accuracy in
identifying functionally related protein modules. Software will be made available in the public domain at http://www.cs.ucr.edu/~qyang/

1 Introduction

Recent advances in proteomics, such as yeast two-hybrid, phage display and mass spectrometry, have resulted in several genome-scale protein-protein interaction (PPI) map projects. The identification of functionally related proteins is among the most urgent computational challenges facing the scientific community. In the literature, the problem has been approached by analyzing the topological properties of interaction networks (see, e.g., [2, 11]) or by comparing networks from several model organisms (see, e.g., [8, 9, 12]).

In [2], Bader and Hogue described a graph-theoretic clustering algorithm for finding potential protein complexes in large PPI networks. The method weights each vertex by its local neighborhood density and then traverses outward from a locally dense seed protein to isolate the dense regions according to given parameters. Rives and Galitski [11] studied the modular organization of cellular networks by developing a network clustering method. An all-pairs shortest-path distance matrix is constructed and transformed into an association matrix in which each entry is $1/d^2$, where $d$ is the shortest-path distance between two vertices in the network. Hierarchical agglomerative average-linkage clustering, with the uncentered correlation coefficient as the distance metric, is then applied to the association matrix.

With the availability of PPI networks from several model organisms, recent research on identifying functionally related proteins has focused on comparative approaches. Kelley et al. [8] adapted ideas from sequence alignment and applied them to the PPI networks of the budding yeast S. cerevisiae and the bacterial pathogen H. pylori. Conserved pathways are identified by globally aligning the two PPI networks.
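The Rives and Galitski pipeline described above (all-pairs shortest paths, an association matrix, then average-linkage clustering) can be sketched in a few functions. This is a minimal sketch under stated assumptions, not their implementation: the association value is taken as 1/d², diagonal entries (d = 0) are set to 1, unreachable pairs are set to 0, the clustering distance is taken as 1 minus the uncentered correlation, and clustering stops at a fixed target number of clusters. All function names here are hypothetical.

```python
from collections import deque
import math


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def association_matrix(adj):
    """All-pairs shortest paths mapped entrywise to 1/d^2.

    Assumptions: d = 0 (the diagonal) maps to 1.0, unreachable pairs to 0.0.
    """
    nodes = sorted(adj)
    matrix = []
    for u in nodes:
        dist = bfs_distances(adj, u)  # one BFS per source gives all pairs
        row = []
        for v in nodes:
            d = dist.get(v)
            if d is None:
                row.append(0.0)
            elif d == 0:
                row.append(1.0)
            else:
                row.append(1.0 / d ** 2)
        matrix.append(row)
    return nodes, matrix


def uncentered_correlation(x, y):
    """Pearson-style correlation without mean-centering (cosine similarity)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den if den else 0.0


def average_linkage(nodes, matrix, num_clusters):
    """Naive average-linkage agglomeration of matrix rows until num_clusters remain.

    Distance between two rows is 1 - uncentered correlation (an assumption).
    """
    n = len(nodes)
    dist = [[1.0 - uncentered_correlation(matrix[i], matrix[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # mean pairwise distance between members of the two clusters
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(nodes[i] for i in c) for c in clusters]


# Two triangles (1-2-3 and 4-5-6) joined by the single edge 3-4.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
nodes, matrix = association_matrix(adj)
print(average_linkage(nodes, matrix, 2))  # [[1, 2, 3], [4, 5, 6]]
```

On this toy graph the two triangles come out as the two clusters. The naive merge loop is cubic or worse in the number of vertices and is meant only to show the technique, not to scale to genome-wide networks.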
To perform the alignment, the two networks are combined into a global alignment graph in which each vertex represents a pair of proteins, one from each network, that are similar at the sequence level, and each edge represents a conserved interaction, a gap, or a mismatch. A log-probability score is defined over the vertices and edges of a path in the global alignment graph, and the highest-scoring path is found by dynamic programming after decomposing the global alignment graph into acyclic subgraphs. A similar approach was adopted by Sharan et al. [12] to identify protein complexes by comparing two PPI networks. An orthology graph, similar to the global alignment graph, is constructed, and its edges are assigned weights so that high-weight subgraphs correspond to conserved...
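The dynamic-programming step described above, finding the highest-scoring path once the alignment graph is acyclic, can be illustrated with a longest-path computation on a directed acyclic graph. This is a simplified sketch, not Kelley et al.'s implementation. It assumes the graph has already been decomposed into a DAG. It also assumes that each vertex and edge carries a precomputed log-probability term and that a path's score is the sum of those terms. The example numbers are made up, and any limit on path length is ignored. The function name `highest_scoring_path` is hypothetical.

```python
from collections import defaultdict, deque


def highest_scoring_path(vertex_score, edges):
    """Maximum-score path in a DAG.

    vertex_score: dict vertex -> score (assumed log-probability term).
    edges: dict (u, v) -> edge score; every endpoint must be in vertex_score.
    A path's score is the sum of its vertex and edge scores (an assumption).
    """
    succ = defaultdict(list)
    indeg = {v: 0 for v in vertex_score}
    for (u, v), w in edges.items():
        succ[u].append((v, w))
        indeg[v] += 1

    # Kahn's algorithm: a topological order exists only if the graph is acyclic.
    order = []
    queue = deque(v for v in vertex_score if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(vertex_score):
        raise ValueError("graph contains a cycle")

    # best[v]: best score of any path ending at v; prev[v] for backtracking.
    best = dict(vertex_score)
    prev = {v: None for v in vertex_score}
    for u in order:
        for v, w in succ[u]:
            cand = best[u] + w + vertex_score[v]
            if cand > best[v]:
                best[v] = cand
                prev[v] = u

    end = max(best, key=best.get)
    score = best[end]
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = prev[node]
    return list(reversed(path)), score


scores = {"a": 1.0, "b": -0.5, "c": 2.0, "d": 0.5}
edges = {("a", "b"): 0.5, ("b", "c"): 1.0, ("a", "c"): -2.0, ("c", "d"): 0.5}
print(highest_scoring_path(scores, edges))  # (['a', 'b', 'c', 'd'], 5.0)
```

Processing vertices in topological order guarantees that the best score of a path ending at a vertex is final before that vertex's outgoing edges are relaxed. The computation is therefore linear in the size of the graph, which is why acyclicity is required before the dynamic program can run.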
Website: www.cs.ucr.edu | Pages: 4
Source: A Parallel Algorithm for Clustering Protein-Protein ... (CiteSeer PDF)