Focus

Distributed computation of PageRank

On its website, Google describes PageRank as follows: "PageRank reflects our view of the importance of web pages by considering more than 500 million variables and 2 billion terms. Pages that are considered important receive a higher PageRank and are more likely to appear at the top of the search results."

The PageRank algorithm was introduced by the cofounders of Google in the late 1990s and has been implemented on the search engine of Google from the time of its launching. It continues to be part of the search engine at Google and is said to be one of the 200 factors used for narrowing down the search results. PageRank indicates the importance of a web page determined by the hyperlink structure of the web. Specifically, it is determined by the number of hyperlinks pointing to the page as well as the importance of the pages where those hyperlinks originate. Related ideas for ranking objects had been previously used in other contexts, such as sociometry and bibliometrics.

The PageRank problem recently attracted the attention of the systems and control community. In this context, a randomized decentralized approach based on a so-called "gossip" communication protocol has been developed at IEIIT. Such an approach is meaningful and necessary in view of the computational difficulties due to the size of the web (currently consisting of more than 8 billion pages) and of the available computational resources. The convergence properties of the gossip communication protocol and the fluctuations of the PageRank values in the presence of fragile and uncertain links have been analyzed. Furthermore, other distributed algorithms, based on web aggregation techniques with intermittent communication in the randomized computation, for solving huge-scale problems with underlying sparse network structures have been proposed.

We believe that the results that have been obtained are of great interest for other "big data" problems, and algorithms derived from PageRank can be successfully extended to rank different objects in order of importance, such as scientific papers linked by citations, authors related by co-authorship, proteins in system biology and professional athletes. Therefore, the PageRank problem is attracting the attention of many researchers working in a diverse set of fields, such as systems and control, computer engineering, communications, physics, numerical analysis, linear algebra, and graph theory.

For further details see http://goo.gl/F8Gm0m