Wednesday, October 17, 2012

Network Structure of the Web

Network Structure of the Web Part 2 Web Surfing as a Markov Chain Process, continued • Adjacency Matrix A: – If there is a hyperlink from page i to page j, then Aij = 1, otherwise Aij = 0. • Transition Matrix P: – If a row of A has no 1s (i.e., no out-links), then insert 1/N for each element in that row in P (uniform teleporting probability) – Otherwise, divide each 1 in the row in A by the number of 1s in its row. (uniform probability of going to out-link) – Multiply the resulting matrix by (1- α) (probability of going to that linked page by not teleporting) – Add α /N to every entry

of the resulting matrix (probability of going to that each by teleporting) 6/3/2010 2 • Exercise 21.6: Consider the following web graph. What are the transition matrices for α = 0 and 0.5? 1 2 3 Computing PageRank, continued • Suppose alpha = 0.5. Let xt be the probability distribution over the states at time t. Suppose surfer starts in state 1. I.e., x0 = (1 0 0). After one time step, we have x1 = x0 P = (1/6 2/3 1/6) After two time steps, x2 = x1 P = (1/3 1/3 1/3). Keep going. Finally reach steady state of (5/18 4/9 5/18). [Show this is a steady state] 1 2 3           = 6/13/26/1 12/56/112/5 6/13/26/1 P 6/3/2010 3 Questions • What is the minimum possible PageRank of a page? • How does varying α affect PageRank? From http://www.geek.com/articles/chips/googles-pagerank-algorithm-traced- back-to-the-1940s-20100217/ Earlier forerunner to PageRank in the work of the Harvard economist Wassily Leontief: “In 1941, Leontief published a paper in which he divides a country's economy into sectors that both supply and receive resources from each other, although not in equal measure. One important question is: what is the value of each sector when they are so tightly integrated? Leontief's answer was to develop an iterative method of valuing each sector based on the importance of the sectors that supply it. Sound familiar? In 1973, Leontief was awarded the Nobel Prize in economics for this work.” 6/3/2010 4 Other Uses for Page Rank • Ranking journal impact (nodes are journals, links are citations in articles in one journal to articles in the other journal -- e.g., see http://www.eigenfactor.org) • Ranking doctoral programs (departments are nodes, one node links to another if it hires faculty from that dept.) • Food webs – species that are essential to an ecosystem 6/3/2010 5 “Here we show that an algorithm adapted from the one Google uses to rank web-pages can order species according to their importance for coextinctions, providing the sequence of losses that results in the fastest collapse of the network.” Hubs and Authorities (HITS Algorithm) • Proposed by Jon Kleinberg (Cornell) at same time Brin and Page were developing PageRank • HITS: “Hyperlinked-induced topic search” • Supposedly used by Teoma and Ask.com 6/3/2010 6 Hubs and Authorities Main ideas Each node has a hub score and an authority score Hub: Web...

Website: web.cecs.pdx.edu | Filesize: -
No of Page(s): 22
Download Network Structure of the Web.pdf

No comments:

Post a Comment