Engineering a Topological Sorting Algorithm for Massive Graphs Deepak Ajwani∗ Adan Cosgaya-Lozano† Norbert Zeh‡ Abstract We present an I/O-efficient algorithm for topologically sorting directed acyclic graphs (DAGs). No provably I/O-efficient algorithm for this problem is known. Sim- ilarly, the performance of our algorithm, which we call IterTS, may be poor in the worst case. However, our experiments show that IterTS achieves good perfor- mance in practise. The strategy of IterTS can be summarized as follows. We call an edge satisfied
if its tail has a smaller number than its head. A numbering satisfying at least half the edges in the DAG is easy to find: a random numbering is expected to have this property. IterTS starts with such a numbering and then iteratively corrects the numbering to satisfy more and more edges until all edges are satisfied. To evaluate IterTS, we compared its running time to those of three competitors: PeelTS, an I/O-efficient implementation of the standard strategy of iteratively removing sources and sinks; ReachTS, an I/O-efficient implementation of a recent parallel divide-and-conquer algorithm based on reachability queries; and SeTS, standard DFS-based topological sorting built on top of a semi-external DFS algorithm. In our evaluation on various types of input graphs, IterTS consistently outperformed PeelTS and ReachTS, by at least an order of magnitude in most cases. SeTS outperformed IterTSonmost graphswhosevertexsetsfit in memory. However, IterTS often came close to the running time of SeTS on these inputs and, more importantly, SeTS was not able to process graphs whose vertex sets were beyond the size of main memory, while IterTS was able to process such inputs efficiently. ∗MADALGO Center for Massive Data Algorithmics, Depart- ment of Computer Science, Aarhus University, Aarhus, Denmark. Email: ajwani@madalgo.au.dk. Research was supported in part by the Danish National Research Foundation. Travel to the con- ference was supported by IRCSET/IBM. †Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada. Email: acosgaya@cs.dal.ca. Supported in part by NSERC. ‡Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada. Email: nzeh@cs.dal.ca. This research was supported in part by NSERC and the Canada Research Chairs programme. 1 Introduction Let G = (V,E) be a directed acyclic graph (DAG) with n := |V| vertices and m := |E| edges. Topological sorting is the problem of finding a linear ordering of the vertices in V such that the tail of each edge in E precedes its head in the ordering. Linear-time algorithms for this problem are covered in standard undergraduate texts, as topological sorting captures the problem of finding a linear order of items or activities consistent with a set of pairwise ordering constraints, which arises in a number of applications. The problem of topologically sorting large DAGs arises, for example, in the application of recent multiple sequence alignment algorithms[20,21]to largecollectionsofDNA sequences. Topologically sorting large DAGs is also an impor- tant building block for other I/O-efficient algorithms, mostly due to a technique called time-forward...
Website: www.siam.org | Filesize: -
No of Page(s): 12
Download Engineering a Topological Sorting Algorithm for Massive ... - SIAM.pdf
No comments:
Post a Comment