Genetic algorithms (GAs) play a major role in many artificial-life systems, but there is often little detailed understanding of why the GA performs as it does, and little theoretical basis on which to characterize the types of fitness landscapes that lead to successful GA performance. In this paper we propose a strategy for addressing these issues. Our strategy consists of defining a set of features of fitness landscapes that are particularly relevant to the GA, and experimentally studying how various configurations of these features affect the GA’s performance along a number of dimensions. In this paper we informally describe an initial set of proposed feature classes, describe in detail one such class (“Royal Road” functions), and present some initial experimental results concerning the role of crossover and “building blocks” on landscapes constructed from features of this class.
In Toward a Practice of Autonomous Systems: Proceedings of the First European Conference on Artificial Life. Cambridge, MA: MIT Press, 1992.
1 Introduction
Evolutionary processes are central to our understanding of natural living systems, and will play an equally central role in attempts to create and study artificial life. Genetic algorithms (GAs) [13, 9] are an idealized computational model of Darwinian evolution based on the principles of genetic variation and natural selection. GAs have been employed in many artificial-life systems as a means of evolving artificial organisms, simulating ecologies, and modeling population evolution. In these and other applications, the GA’s task is to search a fitness landscape for high values (where fitness can be either explicitly or implicitly defined), and GAs have been demonstrated to be efficient and powerful search techniques for a range of such problems (e.g., there are several examples in [19]). However, the details of how the GA goes about searching a given landscape are not well understood. Consequently, there is little general understanding of what makes a problem hard or easy for a GA, and in particular, of the effects of various landscape features on the GA’s performance. In this paper we propose some new methods for addressing these fundamental issues concerning GAs, and present some initial experimental results.
Our strategy involves defining a set of landscape features that are of particular relevance to GAs, constructing classes of landscapes containing these features in varying degrees, and studying in detail the effects of these features on the GA’s behavior. The idea is that this strategy will lead to a better understanding of how the GA works, and a better ability to predict the GA’s likely performance on a given landscape. Such long-term results would be of great importance to all researchers who use GAs in their models; we hope that they will also shed light on natural evolutionary systems. To date, several properties of fitness landscapes have been identified that can make the search for high-fitness values easy or hard for the GA. These include deception, sampling error, and the number of local optima in the landscape...
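The excerpt names the “Royal Road” functions but does not define them, so the following is only a minimal sketch under an assumed form: a bit string split into contiguous blocks, where each block consisting entirely of ones adds its length to the fitness. The block size, the string length, and the names `royal_road` and `one_point_crossover` are illustrative assumptions, not definitions taken from the paper.

```python
BLOCK = 8  # assumed block ("building block") length

def royal_road(bits, block=BLOCK):
    """Add `block` to the score for every contiguous block that is all ones."""
    score = 0
    for start in range(0, len(bits), block):
        chunk = bits[start:start + block]
        if len(chunk) == block and all(chunk):
            score += block
    return score

def one_point_crossover(a, b, cut):
    """Swap the tails of two parents at position `cut`."""
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

# Two parents that each carry a complementary half of the optimum.
left = [1] * 32 + [0] * 32
right = [0] * 32 + [1] * 32
child, other = one_point_crossover(left, right, 32)
```

Here each parent scores 32, while `child` combines the completed blocks of both parents and scores 64. This is the kind of recombination of building blocks whose role the paper sets out to examine experimentally.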
Website: web.cecs.pdx.edu | Filesize: -
No of Page(s): 11
Download The Royal Road for Genetic Algorithms: Fitness Landscapes and GA ....pdf
Wednesday, October 31, 2012
Genetic Algorithm Search for Predictive Patterns
Based on an algorithm for pattern matching in character strings, a pattern matching machine is implemented that searches for occurrences of patterns in multidimensional time series. Before the search process takes place, time series data is encoded in user-designed alphabets. The patterns, on the other hand, are formulated as regular expressions that are composed of letters from these alphabets and operators. Furthermore, a genetic algorithm is developed to breed patterns that maximize a user-defined fitness function. In an application to financial data, it is shown that patterns bred to predict high exchange rates volatility in training samples retain statistically significant predictive power in validation samples.
1. Introduction
This work is a contribution to the rapidly developing research area of data mining, a host of methods that aim at revealing hidden relationships and regularities in large sets of data. Of particular importance is the class of data mining problems concerned with discovering frequently occurring patterns in sequential data. We propose a versatile nonparametric technique for representing multidimensional data by encoding it in alphabets that are defined by an analyst user. The encoded data is explored by means of patterns, which are composed of operators and letters from these alphabets. Since patterns are regular expressions, they can be automatically manipulated, combined, and evaluated. These operations lie at the heart of our genetic algorithm (GA), which evolves patterns in order to breed ever better descriptors and predictors of the data. A concise and flexible pattern description language is, therefore, a powerful tool for data mining that serves two purposes: on the one hand, as a language in which theories concerned with the underlying data generating process are formulated and tested and, on the other, as a forecasting instrument.
The present approach shows its special strength when dealing with multidimensional data that can be analyzed under multiple criteria and/or characterized by several indicators. Usually, each criterion (indicator) forms the base of an alphabet. Preprocessing the data by encoding it in alphabets ensures that the search for patterns unfolds efficiently. Furthermore, the possibility to design data-specific alphabets makes the method applicable not only to highly diverse record sets but also allows each researcher to analyze the (same) data with an idiosyncratic language. We stress here an important departure from the more traditional techniques of forecasting complex systems. Many methods, like kernel regression, neural networks, or reinforcement learning (see [1] for recent...
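To make the encode-then-match idea concrete, here is a small sketch using Python's standard `re` module. The three-letter alphabet (`u`, `d`, `f` for up, down, flat), the 1% threshold, and the function names are hypothetical choices for illustration; the paper's own alphabets and pattern language are not given in this excerpt.

```python
import re

def encode(series, threshold=0.01):
    """Map each step-to-step relative change to a letter: u (up), d (down), f (flat)."""
    letters = []
    for prev, cur in zip(series, series[1:]):
        change = (cur - prev) / prev
        if change > threshold:
            letters.append("u")
        elif change < -threshold:
            letters.append("d")
        else:
            letters.append("f")
    return "".join(letters)

def find_pattern(encoded, pattern):
    """Return the start index of every non-overlapping match of a regular expression."""
    return [m.start() for m in re.finditer(pattern, encoded)]

prices = [100, 102, 104, 104, 101, 103, 106, 106]
encoded = encode(prices)                 # "uufduuf"
hits = find_pattern(encoded, r"uu+f")    # two or more rises followed by a flat step
```

Because the pattern is an ordinary string, a genetic algorithm could in principle mutate and recombine such expressions and score each one with a user-defined fitness function, which is the role the abstract assigns to its GA.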
Website: www.complex-systems.com | Filesize: -
No of Page(s): 15
Download Genetic Algorithm Search for Predictive Patterns ... - Complex Systems.pdf
Report - Distributed Computing Group
Many systems in the world can be usefully represented as networks. Examples are social networks, link structures in encyclopedias or the world wide web, or data structures in computer science. Embedding these networks into Euclidean space provides several advantages, such as visualization, fast distance calculations, and a notion of direction, and enables new applications. First, a generic and extensible framework, together with a graphical user interface, for embedding, analyzing, and visualizing graphs is presented. It offers a simple way to define and plug in layout measurements that determine the quality of embeddings and allow different embeddings to be compared. It helps in finding the right configuration, consisting of algorithm, parameters, and heuristics, to obtain an adequate layout for a designated application. As another advantage, it provides a basic infrastructure for testing and evaluating new embedding ideas and algorithms. Second, a novel forecast-based incremental embedding method, in which new nodes are added to the existing layout, is presented and evaluated with the framework. Based on our preliminary experiments, we evaluate and analyze strengths and shortcomings of the approach.
Contents
1 Introduction
1.1 Embedding Graphs
2 Part I: Framework
2.1 Problem Description
2.2 Requirements
2.3 Related Work
2.4 Choosing a Library
2.5 Extensions for JUNG ...
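As an illustration of what a pluggable layout measurement could look like, the sketch below computes a simple stress value: the squared difference between Euclidean distance in the embedding and hop distance in the graph, summed over all connected node pairs. Choosing stress as the measure, and the names `graph_distances` and `stress`, are assumptions made for this example; the report's actual measurements and its JUNG-based interfaces are not shown in the excerpt.

```python
import math
from collections import deque

def graph_distances(adj, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def stress(adj, pos):
    """Sum of squared differences between embedded and graph distances."""
    total = 0.0
    nodes = sorted(adj)
    for i, u in enumerate(nodes):
        dist = graph_distances(adj, u)
        for v in nodes[i + 1:]:
            if v in dist:  # skip pairs in different components
                total += (math.dist(pos[u], pos[v]) - dist[v]) ** 2
    return total

path = {0: [1], 1: [0, 2], 2: [1]}
good = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0)}
bad = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 0.0)}
```

A lower value means the embedding preserves graph distances better, so `stress(path, good)` is zero while `stress(path, bad)` is penalized for placing the two end nodes on top of each other.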
Website: disco.ethz.ch | Filesize: -
No of Page(s): 45
Download Report - Distributed Computing Group.pdf
Online Dynamic Graph Drawing
This paper presents an algorithm for drawing a sequence of graphs online. The algorithm strives to maintain the global structure of the graph and, thus, the user’s mental map while allowing arbitrary modifications between consecutive layouts. The algorithm works online and uses various execution culling methods in order to reduce the layout time and handle large dynamic graphs. Techniques for representing graphs on the GPU allow a speedup by a factor of up to 17 compared to the CPU implementation. The scalability of the algorithm across GPU generations is demonstrated. Applications of the algorithm to the visualization of discussion threads in Internet sites and to the visualization of social networks are provided.
Index Terms: Graph layout, GPU.
1 INTRODUCTION
Graph drawing addresses the problem of constructing geometric representations of graphs [1]. It has applications in a variety of areas, including software engineering, software visualization, databases, information systems, decision support systems, biology, chemistry, and social networks. Many applications require the ability of dynamic graph drawing, that is, the ability to modify the graph [1], [2], [3], as illustrated in Fig. 1. Sample applications include financial analysis, network visualization, security, social networks, and software visualization. The challenge in dynamic graph drawing is to compute a new layout that is both aesthetically pleasing as it stands and fits well into the sequence of drawings of the evolving graph. The latter criterion has been termed preserving the mental map [4] or dynamic stability [2].
Most existing algorithms address the problem of offline dynamic graph drawing, where the entire sequence of graphs to be drawn is known in advance [3], [5], [6]. This gives the layout algorithm information about future changes in the graph, which allows it to optimize the layouts generated across the entire sequence. For instance, the algorithm can leave place in order to accommodate a node that appears later in the sequence. In contrast, very little research has addressed the problem of online dynamic graph drawing, where the graph sequence to be laid out is not known in advance [7], [8].
This paper proposes an online algorithm for dynamic layout of graphs. It attempts to maintain the user’s mental map while computing fast layouts that take the global graph structure into account. The algorithm, which is based on force-directed layout techniques, controls the displacement of nodes according to the structure and changes performed on the graph. By taking special care in order to represent the graph in a GPU-efficient manner, the algorithm is able to make use of the GPU to significantly accelerate the layout.
This paper makes the following contributions: First, a novel, efficient algorithm for online dynamic graph drawing is presented. It spends most of the execution time on the parts of the graph being modified. Second, it is shown how the heaviest part of the algorithm, performing force-directed layout, can be implemented in a manner suitable for execution on the GPU. This allows us to significantly shorten the layout time. For example, the incremental drawing of a graph of 32,000 nodes takes 0.704 second per layout. Finally, two information visualization applications of the algorithm are presented. The first is the visualization of the evolution over time of discussion threads in Internet sites. In this application, illustrated in Fig. 1, nodes represent users, and edges represent messages sent between users in discussion forums. The second application is the visualization of the growth of a social network,...
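The idea of restricting node displacement during a force-directed update can be sketched on the CPU as follows. This is not the paper's algorithm: it is a single iteration with generic spring-electrical forces (repulsion k²/d between all pairs, attraction d²/k along edges), and the `movable` set is a hypothetical stand-in for whatever displacement control the authors actually use.

```python
import math

def layout_step(pos, edges, movable, k=1.0, step=0.1):
    """Run one force-directed iteration, displacing only nodes in `movable`."""
    disp = {v: [0.0, 0.0] for v in pos}
    nodes = list(pos)
    # Repulsion between every pair of nodes, magnitude k^2 / d.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = k * k / d
            disp[u][0] += f * dx / d
            disp[u][1] += f * dy / d
            disp[v][0] -= f * dx / d
            disp[v][1] -= f * dy / d
    # Attraction along each edge, magnitude d^2 / k.
    for u, v in edges:
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        d = math.hypot(dx, dy) or 1e-9
        f = d * d / k
        disp[u][0] -= f * dx / d
        disp[u][1] -= f * dy / d
        disp[v][0] += f * dx / d
        disp[v][1] += f * dy / d
    # Pinned nodes keep their old coordinates, which preserves the mental map.
    new_pos = {}
    for v, (x, y) in pos.items():
        if v in movable:
            new_pos[v] = (x + step * disp[v][0], y + step * disp[v][1])
        else:
            new_pos[v] = (x, y)
    return new_pos

pos = {"a": (0.0, 0.0), "b": (2.0, 0.0)}
result = layout_step(pos, [("a", "b")], movable={"b"})
```

With only `b` movable, `a` stays put and `b` is pulled toward it along the edge. The all-pairs repulsion loop is the expensive part of such a step, and is the kind of work the paper reports moving to the GPU.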
Website: www.cse.ohio-state.edu | Filesize: -
No of Page(s): 14
Download Online Dynamic Graph Drawing - Computer Science & Engineering ....pdf
Visualization of Bibliographic Networks with a Reshaped Landscape
We describe a novel approach to visualize bibliographic networks that facilitates the simultaneous identification of clusters (e.g., topic areas) and prominent entities (e.g., surveys or landmark papers). While employing the landscape metaphor proposed in several earlier works, we introduce new means to determine relevant parameters of the landscape. Moreover, we are able to compute prominent entities, clustering of entities, and the landscape’s surface in a surprisingly simple and uniform way. The effectiveness of our network visualizations is illustrated on data from the graph drawing literature.
Categories and Subject Descriptors (according to ACM CCS): H.3.3 [Information Search and Retrieval]: Information filtering
1. Introduction
Bibliographic analysis [24] uses publication data to structure and summarize a scientific field. These data are often given in the form of networks, with nodes representing authors, journals, or publications, and edges representing relations between these entities such as authorship, collaboration, or citation. We present an approach to analyze and visualize bibliographic networks using uniform algorithms to determine the prominent entities in the network, to spatially represent the clustering of the network, and to compute a surface for a landscape visualization of results.
Since we propose an integrated method of analysis and visualization directed at particular aspects of bibliographic analysis, it may serve as a specialized component in more elaborate systems [10, 5, 9], and in particular as a communication/exploration back-end for systems that specialize in extracting and presenting network data [7, 23].
This paper is organized as follows. In Sect. 2 we recall the definition of Kleinberg’s hubs & authorities indices [15] and sketch their use in the analysis of bibliographic data. Based on similar principles, a new method for two-dimensional layout of bibliographic networks preserving the scientific topography is presented in Sect. 3. In Sect. 4, index and layout are turned into a landscape visualization, again using the same algorithmic principles. An illustrative example comprised of publications in proceedings of Graph Drawing Symposia is given in Sect. 5.
2. Landmark Papers
To identify prominent entities in bibliographic networks, we determine the structural importance of vertices according to their position in the graph. Many concepts formalizing this notion are in use, but the concept of hubs & authorities [15], though originally conceived to improve relevance ranking in Web search engines, appears to be particularly suitable for bibliographic networks. In this section, we present an alternative derivation of these indices to emphasize the similarity of their computation with those in later sections. We assume familiarity with basic matrix properties and computations [12].
A straightforward notion of prominence in undirected graphs, commonly applied in the analysis of social networks [22], is the idea that the importance of a vertex is determined by the importance of its neighbors. According to the following definition, the importance assigned to a vertex is proportional to the total importance of its neighbors.
Definition 1 (eigenvector centrality [4]) Let A be the adjacency...
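The notion that a vertex's importance is proportional to the total importance of its neighbors can be computed with power iteration. The sketch below is an assumption-laden illustration rather than the paper's derivation: it represents the graph as neighbor lists, multiplies by A + I instead of A (the shift keeps the same eigenvectors but avoids oscillation on bipartite graphs), and normalizes so the largest entry is 1.

```python
def eigenvector_centrality(adj, iterations=200):
    """Approximate the dominant eigenvector of the adjacency matrix.

    `adj[v]` lists the neighbors of vertex v in an undirected, connected graph.
    """
    n = len(adj)
    x = [1.0] * n
    for _ in range(iterations):
        # One multiplication by (A + I): own score plus the neighbors' scores.
        y = [x[v] + sum(x[u] for u in adj[v]) for v in range(n)]
        norm = max(y)
        x = [value / norm for value in y]
    return x

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 0.
graph = [[1, 2, 3], [0, 2], [0, 1], [0]]
scores = eigenvector_centrality(graph)
```

Vertex 0 receives the highest score, vertices 1 and 2 tie, and the pendant vertex 3 scores lowest, which matches the intuition behind the definition.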
Website: www.informatik.uni-konstanz.de | Filesize: -
No of Page(s): 7
Download Visualization of Bibliographic Networks with a Reshaped Landscape ....pdf
“expected” number of edges between two vertices?
• Overview of past work on community structure.
• How to determine the “best” number of communities.
• Fast linear algebra based method.
• Bringing in statistics.
A Brief History of Methods
• Spectral methods, graph partitioning problems.
• A well known example is spectral bisection, which uses the graph/network Laplacian.
• In the special case of a network having only two communities, Fiedler proposed a method for identifying the member nodes.
Graph Laplacian: $L_{ij} = \delta_{ij} k_i - A_{ij}$
A Brief History of Methods
• Hierarchical clustering: groups nodes into communities such that nodes within a community are similar to each other in some sense; widely used in sociology.
• Technique: 1) calculate a weight $W_{ij}$ for every pair of nodes in the network; 2) then take the $n$ nodes with no edges between them and add edges between pairs one by one in order of their weights, from strongest to weakest.
• Many ways exist for calculating the values $W_{ij}$.
• The entire process is frequently represented as a dendrogram, a visualization of the vertices coalescing into communities.
A Brief History of Methods
• Girvan-Newman Algorithm: a divisive method for determining community structure that focuses on the betweenness of edges.
• Edge betweenness: the number of shortest paths between pairs of vertices that run along an edge.
• Removing edges of high betweenness breaks up the connected network into communities.
GN Algorithm
1. Calculate the betweenness for all edges in the network.
2. Remove the edge with the highest betweenness.
3. Recalculate betweenness for all edges affected by the removal.
4. Repeat from step 2 until no edges remain.
M. Girvan and M.E.J. Newman, “Community structure in social and biological networks” Proc. Natl. Acad. Sci. USA 99, 7821–7826 (2002).
[Figure: GN Algorithm, the classic “Karate Club” example]
Modularity
• Introduced by Newman and Girvan to quantify which division of a network into communities/groups was the best.
• Related to Newman’s work on assortativity in networks, “Mixing patterns in networks” Phys. Rev. E 67, 026126 (2003)
• Modularity: the fraction of edges falling within communities minus the expected fraction of such edges.
• $e_{ij}$: the fraction of all edges in the network that link vertices in community $i$ to vertices in community $j$.
• $a_i = \sum_j e_{ij}$: the fraction of edges that connect to vertices in community $i$.
• $Q = \sum_i (e_{ii} - a_i^2) = \mathrm{Tr}\, e - \lVert e^2 \rVert$
M.E.J. Newman and M. Girvan, “Finding and evaluating community structure in networks” Phys. Rev. E 69, 026113 (2004)
[Figure: Modularity, again the “Karate Club”; axis: modularity, 0 to 0.5]
We love to study ourselves . . .
A New-New Approach
- Newman later returned to the subject of community structure and modularity with a new-new approach.
-...
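The modularity formula on the slides can be checked on a tiny example. The sketch below builds the matrix e from an edge list and a community assignment, then evaluates Q as the sum over communities of e_ii minus a_i squared. Splitting each undirected edge evenly between e_ij and e_ji is the convention assumed here so that e is symmetric and sums to 1; the function name and the test graph are illustrative.

```python
def modularity(edges, community):
    """Compute Q = sum_i (e_ii - a_i^2) for an undirected graph.

    `edges` is a list of (u, v) pairs and `community` maps each vertex to a label.
    """
    groups = sorted(set(community.values()))
    index = {g: i for i, g in enumerate(groups)}
    k = len(groups)
    m = len(edges)
    e = [[0.0] * k for _ in range(k)]
    for u, v in edges:
        i, j = index[community[u]], index[community[v]]
        # Half of the edge's weight goes to e[i][j] and half to e[j][i].
        e[i][j] += 0.5 / m
        e[j][i] += 0.5 / m
    a = [sum(row) for row in e]
    return sum(e[i][i] - a[i] ** 2 for i in range(k))

# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q_split = modularity(edges, split)                    # 5/14
q_single = modularity(edges, {v: "A" for v in range(6)})
```

Dividing the graph into its two triangles gives Q = 5/14, while placing every vertex in one community gives Q = 0, since the observed within-community fraction then equals the expected one.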
Website: mae.ucdavis.edu | Filesize: -
No of Page(s): 36
Download “expected” number of edges between two vertices?.pdf
Tuesday, October 30, 2012
Task Matching and Scheduling in Heterogeneous Computing
Task Matching and Scheduling in Heterogeneous Computing Environments Using a Genetic-Algorithm-Based Approach
Lee Wang,* Howard Jay Siegel,* Vwani P. Roychowdhury,† and Anthony A. Maciejewski*
*Parallel Processing Laboratory, School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47907-1285; and †Electrical Engineering Department, UCLA, Los Angeles, California 90095-1594
E-mail: {lwang,hj,maciejew}@ecn.purdue.edu (Wang, Siegel, Maciejewski); vwani@ee.ucla.edu (Roychowdhury).
This research was supported in part by NRaD under Subcontract 20-950001-70 and by the DARPA/ITO Quorum Program under NPS Subcontract N62271-97-M-0900.
To exploit a heterogeneous computing (HC) environment, an application task may be decomposed into subtasks that have data dependencies. Subtask matching and scheduling consists of assigning subtasks to machines, ordering subtask execution for each machine, and ordering intermachine data transfers. The goal is to achieve the minimal completion time for the task. A heuristic approach based on a genetic algorithm is developed to do matching and scheduling in HC environments. It is assumed that the matcher/scheduler is in control of a dedicated HC suite of machines. The characteristics of this genetic-algorithm-based approach include: separation of the matching and the scheduling representations, independence of the chromosome structure from the details of the communication subsystem, and consideration of overlap among all computations and communications that obey subtask precedence constraints. It is applicable to the static scheduling of production jobs and can be readily used to collectively schedule a set of tasks that are decomposed into subtasks. Some parameters and the selection scheme of the genetic algorithm were chosen experimentally to achieve the best performance. Extensive simulation tests were conducted. For small-sized problems (e.g., a small number of subtasks and a small number of machines), exhaustive searches were used to verify that this genetic-algorithm-based approach found the optimal solutions. Simulation results for larger-sized problems showed that this genetic-algorithm-based approach outperformed two nonevolutionary heuristics and a random search. © 1997 Academic Press
1. INTRODUCTION
Different portions of an application task often require different types of computation. In general, it is impossible for a single machine architecture with its associated compiler, operating system, and programming tools to satisfy all the computational requirements in such an application equally well. However, a heterogeneous computing (HC) environment that consists of a heterogeneous suite of machines, high-speed interconnections, interfaces, operating systems, communication protocols, and programming environments provides a variety of architectural capabilities, which can be orchestrated to perform an application that has diverse execution requirements [Fre89, FrS93, KhP93, SiA96, Sun92]. In the HC environment considered here, an application task can be decomposed into subtasks, where each subtask is computationally homogeneous (well suited to a single machine), and different subtasks may have different machine architectural requirements. These subtasks can have data dependences among them. Once the application task is decomposed into subtasks, the following decisions have to be made: matching, i.e., assigning subtasks to machines, and scheduling, i.e., ordering subtask execution for each machine and ordering intermachine data transfers. In this context, the goal of HC is to achieve the minimal completion time, i.e., the minimal overall execution...
Website: www.cisr.us | Filesize: -
No of Page(s): 15
Download Task Matching and Scheduling in Heterogeneous Computing - CISR.pdf
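To make the matching and scheduling terms in the excerpt above concrete, the sketch below evaluates the completion time of one candidate solution: a matching (subtask to machine) plus an execution order that respects the data dependencies. It is only an illustration of the objective, not the genetic algorithm from the paper. The function name `completion_time`, the data layout, and the example numbers are assumptions made here, and the model simplifies the problem by charging a fixed transfer delay between machines rather than ordering intermachine transfers.

```python
from typing import Dict, List, Tuple


def completion_time(
    exec_time: Dict[str, Dict[str, float]],  # exec_time[subtask][machine]
    deps: Dict[str, List[str]],              # deps[subtask] = list of predecessors
    transfer: Dict[Tuple[str, str], float],  # (producer, consumer) -> transfer delay
    matching: Dict[str, str],                # subtask -> machine
    order: List[str],                        # execution order consistent with deps
) -> float:
    """Return the overall completion time (makespan) of one matching/schedule."""
    machine_free: Dict[str, float] = {}  # time at which each machine becomes idle
    finish: Dict[str, float] = {}        # finish time of each scheduled subtask
    for s in order:
        m = matching[s]
        ready = 0.0
        for p in deps.get(s, []):
            # Assumption: data produced on the same machine is available at once;
            # otherwise a fixed transfer delay is added (no network contention).
            delay = 0.0 if matching[p] == m else transfer.get((p, s), 0.0)
            ready = max(ready, finish[p] + delay)
        start = max(ready, machine_free.get(m, 0.0))
        finish[s] = start + exec_time[s][m]
        machine_free[m] = finish[s]
    return max(finish.values())


# Hypothetical three-subtask example: c needs the results of a and b.
exec_time = {
    "a": {"m1": 2.0, "m2": 4.0},
    "b": {"m1": 3.0, "m2": 1.0},
    "c": {"m1": 2.0, "m2": 5.0},
}
deps = {"c": ["a", "b"]}
transfer = {("a", "c"): 1.0, ("b", "c"): 2.0}

split = completion_time(exec_time, deps, transfer,
                        {"a": "m1", "b": "m2", "c": "m1"}, ["a", "b", "c"])
single = completion_time(exec_time, deps, transfer,
                         {"a": "m1", "b": "m1", "c": "m1"}, ["a", "b", "c"])
print(split, single)
```

In this toy instance, running b on the second machine shortens the schedule even after paying the transfer delay, which is the kind of trade-off a matcher/scheduler has to search over.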
Financial Forecasting Using Genetic Algorithms
A new genetic-algorithm-based system is presented and applied to the task of predicting the future performances of individual stocks. The system, in its most general form, can be applied to any inductive machine-learning problem: given a database of examples, the system will return a general description applicable to examples both within and outside the database. This differs from traditional genetic algorithms, which perform optimization. The genetic algorithm system is compared to an established neural network system in the domain of financial forecasting, using the results from over 1600 stocks and roughly 5000 experiments. Synergy between the two systems is also examined. This study presents a new system that utilizes
genetic algorithms (GAs) to predict the future performances of individual stocks. More generally, the system extends GAs from their traditional domain of optimization to inductive machine learning or classification. The overall learning system incorporates a GA, a niching method (for finding multiple solutions), and several other components (discussed in the section entitled Genetic Algorithms for Inductive Learning). Time-series forecasting is a special type of classification on which this study concentrates. Specifically, for any financial time series related to the performance of an individual stock, the goal is to forecast the value of the time series k steps into the future. The experiments of this study forecast the relative return of a stock 12 weeks into the future. We define a stock’s relative return as the stock’s return minus the average return of the over 1600 stocks we model. We make predictions for all 1600+ stocks at three different points in time and summarize the results.
As a benchmark, the GA system is compared to an established neural network (NN) system (Mani & Barr, 1994) using the same 1600+ stocks and three points in time. (We have used the NN system and its predecessors to forecast stock prices and manage portfolios for approximately 3 years.) We examine the potential synergy from combining the GA and NN forecasts, as well as other ways in which the two algorithms complement each other.
The remainder of this article discusses inductive machine learning, casting financial forecasting as an inductive machine-learning problem; reviews genetic algorithms; examines genetic algorithms in inductive machine learning and financial forecasting; explains the GA-based system of this study; discusses the chosen applications domain, predicting the performances of individual stocks; presents two sets of experiments and their associated results; examines the results as well as experimental biases; and presents paths for future research.
FINANCIAL FORECASTING AS INDUCTIVE MACHINE LEARNING
This section briefly reviews inductive machine learning and proceeds to cast the problem of financial time-series forecasting as a specific type of inductive machine learning. Inductive learning (Michalski, 1983) can be defined as acquiring concepts through examining data items. It is the similarities among various data...
Applied Artificial Intelligence, 10:543-565, 1996. Copyright © 1996 Taylor & Francis. 0883-9514/96 $12.00 +.00
The authors thank Dean Barr, K. K. Quah, and Doug Case for their advice and assistance, and Steve Ward of Ward Systems Group for help with neural network implementations. The authors also thank the referees for their suggestions.
Address correspondence to Sam Mahfoud, LBS Capital Management, Inc., 311 Park Place Boulevard, Suite 330, Clearwater, FL 34619, USA. E-mail: sam@lbs.com
Website: sce.uhcl.edu | Filesize: -
No of Page(s): 24
Download FINANCIAL FORECASTING USING GENETIC ALGORITHMS.pdf
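The excerpt above defines a stock's relative return as its return minus the average return of all modeled stocks. A minimal sketch of that arithmetic follows. The function name `relative_returns`, the ticker symbols, and the return values are made up for illustration and do not come from the study.

```python
from typing import Dict


def relative_returns(returns: Dict[str, float]) -> Dict[str, float]:
    """Subtract the cross-sectional average return from each stock's return."""
    average = sum(returns.values()) / len(returns)
    return {ticker: r - average for ticker, r in returns.items()}


# Hypothetical 12-week returns for three stocks (the study models 1600+).
example = {"AAA": 0.10, "BBB": 0.02, "CCC": -0.03}
rel = relative_returns(example)
for ticker, value in rel.items():
    print(f"{ticker}: {value:+.2f}")
```

By construction the relative returns sum to zero across the modeled stocks, so a positive value means the stock beat the average of the group over the period.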
FNV: Light-weight Flash-based network
Motivation: Network diagrams are commonly used to visualize biochemical pathways by displaying the relationships between genes, proteins, mRNAs, microRNAs, metabolites, regulatory DNA elements, diseases, viruses, and drugs. While there are several currently available web-based pathway viewers, there is still room for improvement. To this end, we have developed a Flash-based network viewer (FNV) for the visualization of small to moderately sized biological networks and pathways. Summary: Written in Adobe ActionScript 3.0,
the viewer accepts simple Extensible Markup Language (XML) formatted input files to display pathways in vector graphics on any web page, providing flexible layout options, interactivity with the user through tool tips, hyperlinks, and the ability to rearrange nodes on the screen. FNV was utilized as a component in several web-based systems, namely Genes2Networks, Lists2Networks, KEA, ChEA and PathwayGenerator. In addition, FNV can be used to embed pathways inside PDF files for the communication of pathways in soft publication materials. Availability: FNV is available for use and download along with supporting documentation and sample networks at http://www.maayanlab.net/FNV. Contact: avi.maayan@mssm.edu
1 INTRODUCTION
Pathway databases such as KEGG (Ogata et al., 1999), BioCarta (http://www.biocarta.com), WikiPathways (Pico et al., 2008), Science Signaling Connection Maps (Gough, 2002), and UCSD-Nature Signaling Gateway (Saunders et al., 2008) communicate cell signaling, transcriptional, and metabolic pathways over the web as diagrams made of nodes and links. Such diagrams are visualized using different layout algorithms embedded in network viewers implemented with a variety of technologies. The majority of web-based network viewers make use of the Java web technologies. For example, PATIKAweb (Dogrusoz et al., 2006) uses Java Server Pages (JSP) to retrieve information stored in the manually curated PATIKA database, or passed through a file, to generate pathway diagrams using a force-directed algorithm to arrange static images of nodes and edges. Tools such as WebInterViewer (Han et al., 2004) and VisANT (Hu et al., 2008) are useful for large protein-protein interaction networks utilizing JavaWebStart. However, JavaWebStart runs in a sandbox and does not easily communicate with the browser. Other tools such as jSquid (Klammer et al., 2007) are powerful, but since they utilize Java Applets they are slow to start and are inconsistent across browsers. Several web-based pathway viewers have been implemented without the use of Java. For example, CellDesigner (Funahashi et al., 2003), used by KEGG, creates static network images with hyperlinks. It was used, for example, by BioPP (Viswanathan, 2007) with Perl and CGI, displaying static images with annotations mapped to the nodes as hyperlinks. AVIS (Berger et al., 2007), a network drawing tool that we developed, uses Asynchronous JavaScript and XML (AJAX) and underlying Perl libraries to draw static networks that are rendered using GraphViz. GraphViz (Gansner and North, 1999) is one of the most commonly used graph drawing tools for displaying...
* To whom correspondence should be addressed.
Website: bioinformatics.oxfordjournals.org | Filesize: -
No of Page(s): 2
Download FNV: Light-weight Flash-based network and ... - Bioinformatics.pdf
A Spectral Clustering Approach To Finding
Clustering nodes in a graph is a useful general technique in data mining of large network data sets. In this context, Newman and Girvan [9] recently proposed an objective function for graph clustering called the Q function which allows automatic selection of the number of clusters. Empirically, higher values of the Q function have been shown to correlate well with good graph clusterings. In this paper we show how optimizing the Q function can be reformulated as a spectral relaxation problem and propose two new spectral clustering algorithms that seek to maximize Q. Experimental results indicate that the new algorithms are efficient and effective at finding both good clusterings and the appropriate number of clusters across a variety of real-world graph data sets. In addition, the spectral algorithms are much faster for large sparse graphs, scaling roughly linearly with the number of nodes n in the graph, compared to O(n^2) for previous clustering algorithms using the Q function.
1 Introduction
Large complex graphs representing relationships among sets of entities are an increasingly common focus of scientific inquiry. Examples include social networks, Web graphs, telecommunication networks, semantic networks, and biological networks. One of the key questions in understanding such data is "How many communities are there and what are the community memberships"? Algorithms for finding such communities, or automatically grouping nodes in a graph into clusters, have been developed in a variety of different areas, including VLSI design, parallel computing, computer vision, social networks, and more recently in machine learning. Good algorithms for graph clustering hinge on the quality of the objective function being used. A variety of different objective functions and clustering algorithms have been proposed for this problem, ranging from hierarchical clustering to max-flow/min-cut methods to methods based on truncating the eigenspace of a suitably-defined matrix. In recent years, much attention has been paid to spectral clustering algorithms (e.g., [11],[12],[14]) that, explicitly or implicitly, attempt to globally optimize cost functions such as the Normalized Cut measure [12]. The majority of these approaches attempt to balance the size of the clusters while minimizing the interaction between dissimilar nodes. However, for the types of complex heterogeneous networks that arise naturally in many domains, the bias that these approaches have towards clusters of equal size can be seen as a drawback. Furthermore, many of these measures, such as Normalized Cut, cannot be used directly for selecting the number of clusters, k, since they increase (or decrease) monotonically as k is varied. Recently, a new approach was developed by Newman and Girvan [9] to overcome limitations of previous measures for measuring community structure. They proposed the "modularity function" Q, which directly measures the quality of a particular clustering of...
* The research in this paper was supported by the National Science Foundation under Grant IRI-9703120 as part of the Knowledge Discovery and Dissemination program. SW was also supported by a National Defense Science and Engineering Graduate Fellowship.
† Department of Computer Science, University of California, Irvine
Website: www-stat.stanford.edu | Filesize: -
No of Page(s): 12
Download A Spectral Clustering Approach To Finding ... - Department of Statistics.pdf
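The excerpt above names the Newman-Girvan Q function without spelling it out. As a hedged illustration, the sketch below computes Q for an unweighted, undirected graph using the standard form: for each cluster, the fraction of edges that fall inside it minus the square of the fraction of edge endpoints attached to it. The function name `modularity` and the toy graph are assumptions made here, and this is a direct evaluation of Q, not the spectral algorithms proposed in the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def modularity(edges: List[Tuple[Hashable, Hashable]],
               membership: Dict[Hashable, Hashable]) -> float:
    """Newman-Girvan Q for an undirected, unweighted graph without self-loops."""
    m = len(edges)
    internal = defaultdict(int)  # number of edges with both ends in the cluster
    degree = defaultdict(int)    # number of edge endpoints attached to the cluster
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_cluster = {node: "A" for node in range(6)}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
print(round(q_two, 4), round(q_one, 4))
```

Putting every node in one cluster gives Q = 0, while splitting at the bridge gives a positive value, which is why comparing Q across different numbers of clusters can guide the choice of k.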
Document Structure and Layout Analysis
A document image is composed of a variety of physical entities or regions such as text blocks, lines, words, figures, tables, and background. We could also assign functional or logical labels such as sentences, titles, captions, author names, and addresses to some of these regions. The process of document structure and layout analysis tries to decompose a given document image into its component regions and understand their functional roles and relationships. The processing is carried out in multiple steps, such as preprocessing, page decomposition, structure understanding, etc. We will look into each of these steps in detail in the following sections.
Document images are often generated from physical documents by digitization using scanners or digital cameras. Many documents, such as newspapers, magazines and brochures, contain very complex layout due to the placement of figures, titles, and captions, complex backgrounds, artistic text formatting, etc. (see Figure 1). A human reader uses a variety of additional cues such as context, conventions and information about language/script, along with a complex reasoning process to decipher the contents of a document. Automatic analysis of an arbitrary document with complex layout is an extremely difficult task and is beyond the capabilities of the state-of-the-art document structure and layout analysis systems. This is interesting since documents are designed to be effective and clear to human interpretation, unlike natural images.
Fig. 1. Examples of document images with complex layouts.
As mentioned before, we distinguish between the physical layout of a document and its logical structure [4]. One could also divide the document analysis process into two parts accordingly.
1.1 Physical Layout and Logical Structure
The physical layout of a document refers to the physical location and boundaries of various regions in the document image. The process of Document Layout Analysis aims to decompose a document image into a hierarchy of homogeneous regions, such as figures, background, text blocks, text lines, words, characters, etc. The algorithms for layout analysis could be classified primarily into two groups depending on their approach. Bottom-up algorithms start with the smallest components of a document (pixels or connected components) and repeatedly group them to form larger, homogeneous regions. In contrast, top-down algorithms start with the complete document image and divide it repeatedly to form smaller and smaller regions. Each approach has its own advantages, and each works well in specific situations. In addition, one could also employ a hybrid approach that uses a combination of top-down and bottom-up strategies.
In addition to the physical layout, documents contain additional information about their contents, such as titles, paragraphs, captions, etc. Such labels are logical or functional in nature as opposed to the structural labels of regions assigned by layout analysis. Most documents also contain the notion of reading order, which is a sequencing of the textual contents that makes comprehension of the document easier. Languages such as Arabic, Chinese, etc. can have different reading directions as well (right-to-left, top-to-bottom). The set of logical or functional entities in a document, along with their inter-relationships, is referred to as the Logical Structure of the document. The analysis...
Website: cvit.iiit.ac.in | Filesize: -
No of Page(s): 17
Download Document Structure and Layout Analysis.pdf
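As an illustration of the top-down strategy described in the excerpt above, the sketch below recursively splits a binary page image along empty bands, alternating between horizontal and vertical cuts. This is a simplified variant of the recursive X-Y cut idea chosen here for illustration; the excerpt does not name a specific algorithm. The function names, the `min_gap` threshold, and the tiny test image are assumptions.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def _ink_spans(profile: List[bool], min_gap: int) -> List[Tuple[int, int]]:
    """Return [start, end) runs of ink, merging runs separated by gaps < min_gap."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, has_ink in enumerate(profile):
        if has_ink and start is None:
            start = i
        elif not has_ink and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def xy_cut(img: List[List[int]], box: Optional[Box] = None, min_gap: int = 2,
           horizontal: bool = True, tried_other: bool = False) -> List[Box]:
    """Split the region top-down into boxes separated by empty bands."""
    if box is None:
        box = (0, 0, len(img), len(img[0]))
    top, left, bottom, right = box
    if horizontal:  # look for empty rows
        profile = [any(img[r][left:right]) for r in range(top, bottom)]
        parts = [(top + s, left, top + e, right) for s, e in _ink_spans(profile, min_gap)]
    else:           # look for empty columns
        profile = [any(img[r][c] for r in range(top, bottom)) for c in range(left, right)]
        parts = [(top, left + s, bottom, left + e) for s, e in _ink_spans(profile, min_gap)]
    if not parts:
        return []  # region contains no ink
    if parts == [box]:
        # No cut in this direction: try the other one once, otherwise stop.
        return [box] if tried_other else xy_cut(img, box, min_gap, not horizontal, True)
    result: List[Box] = []
    for part in parts:
        result.extend(xy_cut(img, part, min_gap, not horizontal, False))
    return result


# Hypothetical page: a full-width title row above two columns of text.
page = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 1, 1, 1],
]
regions = xy_cut(page)
print(regions)
```

A bottom-up method would instead start from connected components of ink pixels and merge neighbours; the top-down version above is shorter but depends on the layout having clean horizontal and vertical gaps.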
Multi-scale Community Detection using Stability
Whether biological, social or technical, many real systems are represented as networks whose structure can be very informative regarding the original system’s organisation. In this respect the field of community detection has received a lot of attention in the past decade. Most of the approaches rely on the notion of modularity to assess the quality of a partition and use this measure as an optimisation criterion. Recently stability was introduced as a new partition quality measure encompassing former partition quality measures such as modularity. The work presented here assesses stability as an optimisation criterion in a greedy approach similar to modularity optimisation techniques and enables multi-scale analysis using Markov time as resolution parameter. The method is validated and compared with other popular approaches against synthetic and various real data networks and the results show that the method enables accurate multi-scale network analysis.
1 INTRODUCTION
In biology, sociology, engineering and beyond, many systems are represented and studied as graphs, or networks (e.g. protein networks, social networks, web). In the past decade the field of community detection attracted a lot of interest, considering community structures as important features of real-world networks (Fortunato, 2010). Given a network of any kind, looking for communities refers to finding groups of nodes that are more densely connected internally than with the rest of the network. The concept considers the inhomogeneity within the connections between nodes to derive a partitioning of the network. As opposed to clustering methods, which commonly involve a given number of clusters, communities are usually unknown, can be of unequal size and density and often have hierarchies (Fortunato, 2010). Finding such a partitioning can provide information about the underlying structure of a network and its functioning. It can also be used as a more compact representation of the network, for instance for visualisations. Detecting community structure in networks can be split into two subtasks: how to partition a graph, and how to measure the quality of a partition. The latter is commonly done using modularity (Newman and Girvan, 2004). Partitioning graphs is an NP-hard task (Fortunato, 2010) and heuristic-based algorithms have thus been devised to reduce the complexity while still providing acceptable solutions.
Considering the size of some real-world networks, much effort is put into finding efficient algorithms able to deal with larger and larger networks, such as modularity optimisation methods. However, it has been shown that networks often have several levels of organisation (Simon, 1962), leading to different partitions for each level, which modularity optimisation alone cannot handle (Fortunato, 2010). Methods have been provided to adapt modularity optimisation to multi-scale (multi-resolution) analysis using a tuning parameter (Reichardt and Bornholdt, 2006; Arenas et al., 2008). Yet the search for a partition quality function that acknowledges the multi-resolution nature...
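To make the notion of a partition quality measure concrete, here is a minimal Python sketch of the standard Newman-Girvan modularity, Q = sum over communities c of [L_c/m - (d_c/(2m))^2], where m is the number of edges, L_c the number of edges inside c and d_c the total degree of the nodes in c. The function name, the edge-list representation and the example graph are illustrative assumptions, not taken from the paper, and the stability measure the paper studies is not implemented here.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition (undirected, unweighted graph).

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # sum of node degrees in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical example: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {node: "all" for node in range(6)}

q_split = modularity(edges, two_triangles)  # 2 * (3/7 - (7/14)**2) = 5/14
q_single = modularity(edges, one_block)     # a single community scores 0
```

Splitting the graph into its two triangles scores higher than keeping everything in one community, which is the behaviour a modularity optimiser exploits.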
Website: www.elemartelot.org | Filesize: -
No of Page(s): 10
Download Multi-scale Community Detection using Stability ... - Erwan Le Martelot.pdf
Monday, October 29, 2012
A Hybrid Genetic Algorithm for Multi-Depot and Periodic
A Hybrid Genetic Algorithm for Multi-Depot and Periodic Vehicle Routing Problems † Thibaut Vidal 1,2 , Teodor Gabriel Crainic 1,3,* , Michel Gendreau 1,4 , Nadia Lahrichi 1,3 , Walter Rei 1,3 1 Interuniversity Research Centre on Enterprise Networks, Logistics and Transportation
(CIRRELT) 2 Department of Computer Science and Operations Research, Université de Montréal, P.O. Box 6128, Station Centre-ville, Montréal, Canada H3C 3J7 3 Department of Management and Technology, Université du Québec à Montréal, P.O. Box 8888, Station Centre-Ville, Montréal, Canada H3C 3P8 4 Department of Mathematics and Industrial Engineering, École Polytechnique de Montréal, P.O. Box 6079, Station Centre-ville, Montréal, Canada H3C 3A7 Abstract. We propose an algorithmic framework that successfully addresses three vehicle routing problems: the multi-depot VRP, the periodic VRP, and the multi-depot periodic VRP with capacitated vehicles and constrained route duration. The meta-heuristic combines the exploration breadth of population-based evolutionary search, the aggressive improvement capabilities of neighborhood-based meta-heuristics, and advanced population diversity management schemes. Extensive computational experiments show that the method performs impressively, in terms of computational efficiency and solution quality, identifying either the best known solutions, including the optimal ones, or new best solutions for all currently available benchmark instances for the three problem classes. The proposed method also proves extremely competitive for the capacitated VRP. Keywords. Multi-depot, multi-period, vehicle routing problems, hybrid populations-based meta-heuristics, adaptive population, diversity management. Acknowledgements. Partial funding for this project has been provided by the Natural Sciences and Engineering Research Council of Canada (NSERC), through its Industrial Research Chair and Discovery Grants programs, and by the partners of the Chair, CN, Rona, Alimentation Couche-Tard and the Ministry of Transportation of Québec, and by the Fonds québécois de la recherche sur la nature et les technologies (FQRNT) through its Team Research Project Program. 
† This version updates the CIRRELT-2010-34. Results and views expressed in this publication are the sole responsibility of the authors and do not necessarily reflect those of CIRRELT. * Corresponding author: Teodor.Gabriel@cirrelt.ca Legal deposit: Bibliothèque et Archives nationales du Québec, Bibliothèque et Archives Canada, 2011 © Copyright Vidal, Crainic, Gendreau, Lahrichi, Rei and CIRRELT, 2011 1 Introduction Vehicle Routing Problem (VRP) formulations are used to model an extremely broad range of issues in many application fields: transportation, supply chain management, production planning, and telecommunications, to name but a few (Toth and Vigo, 2002; Ho et al., 2010). Not surprisingly, starting with the seminal work of Dantzig and Ramser (1959), routing problems make up an extensively and continuously studied field, as illustrated by numerous conferences, survey articles (e.g., Christo...
Website: www.cirrelt.ca | Filesize: -
No of Page(s): 44
Download A Hybrid Genetic Algorithm for Multi-Depot and Periodic ... - CIRRELT.pdf
Feature Subset Selection Using A Genetic Algorithm
Practical pattern classification and knowledge discovery problems require selection of a subset of attributes or features (from a much larger set) to represent the patterns to be classified. This paper presents an approach to the multi-criteria optimization problem of feature subset selection using a genetic algorithm. Our experiments demonstrate the feasibility of this approach for feature subset selection in the automated design of neural networks for pattern classification and knowledge discovery.
* This report is a revised version of TR97-02. † This research was partially supported by the National Science Foundation (through grants IRI-9409580 and IRI-9643299) and the John Deere Foundation.
1 Introduction Many practical pattern classification tasks (e.g., medical diagnosis) require learning of an appropriate classification function that assigns a given input pattern (typically represented using a vector of attribute or feature values) to one of a finite set of classes. The choice of features, attributes, or measurements used to represent patterns that are presented to a classifier affects (among other things):
• The accuracy of the classification function that can be learned using an inductive learning algorithm (e.g., a decision tree induction algorithm or a neural network learning algorithm): The attributes used to describe the patterns implicitly define a pattern language. If the language is not expressive enough, it would fail to capture the information that is necessary for classification and hence, regardless of the learning algorithm used, the accuracy of the classification function learned would be limited by this lack of information.
• The time needed for learning a sufficiently accurate classification function: For a given representation of the classification function, the attributes used to describe the patterns implicitly determine the search space that needs to be explored by the learning algorithm. An abundance of irrelevant attributes can unnecessarily increase the size of the search space, and hence the time needed for learning a sufficiently accurate classification function.
• The number of examples needed for learning a sufficiently accurate classification function: All other things being equal, the larger the number of attributes used to describe the patterns in a domain of interest, the larger is the number of examples needed to learn a classification function to a desired accuracy.
• The cost of performing classification using the learned classification function: In many practical applications (e.g., medical diagnosis), patterns are described using observable symptoms as well as results of diagnostic tests. Different diagnostic tests might have different costs as well as risks associated with them. For instance, an invasive ex-...
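The trade-off between accuracy and feature cost described above can be sketched as a bit-mask search with a small genetic algorithm. The Python below is a toy illustration only: the operators (truncation selection, one-point crossover, single-bit mutation), the function names, and the fitness function with its `RELEVANT` set and `COST` vector are all assumptions made for this example and are not the paper's method, which evaluates subsets by training neural networks.

```python
import random


def select_features(n_features, fitness, generations=30, pop_size=20, seed=0):
    """Toy GA over 0/1 feature masks; requires n_features >= 2."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]      # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]       # one-point crossover
            child[rng.randrange(n_features)] ^= 1  # flip one random bit
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Hypothetical stand-in for "classifier accuracy minus measurement cost".
RELEVANT = {0, 2, 5}
COST = [1, 4, 1, 2, 3, 1, 5, 2]


def toy_fitness(mask):
    hits = sum(1 for i in RELEVANT if mask[i])
    cost = sum(c for c, bit in zip(COST, mask) if bit)
    return hits - 0.1 * cost


best_mask = select_features(len(COST), toy_fitness)
```

In a real setting `toy_fitness` would be replaced by a function that trains and evaluates a classifier on the selected columns, which is where most of the running time would go.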
Website: archives.cs.iastate.edu | Filesize: -
No of Page(s): 13
Download Feature Subset Selection Using A Genetic Algorithm.pdf
Customization and usability study of general purpose software tools
The ability to draw very large networks, such as large computer networks, is of great significance in visualizing the evolution of stochastic models for evolving networks. One focuses on designing and implementing new algorithms and innovative software systems that display a large graph at different abstraction levels. For example, there is an increasing need for systems that show maps of the Web and support the user during her navigation, for systems that display and monitor the traffic on the Internet, and for systems that draw portions of the Internet as a graph.
Until now, the vast majority of graph drawing algorithms that have been deeply studied and experimentally tested in the literature, for instance those for database schemes, can efficiently handle graphs of only hundreds of vertices. We aim at devising general algorithmic techniques for drawing large graphs and at experimenting with their use in new visualization systems, thus contributing to the technology transfer from algorithmic research on graph drawing to its application in network visualization. As part of this goal, we developed analysis-enhancing layouts and created, in cooperation with CR8 Université de Paris Sud, a novel technique that preserves the readability of abstract visualizations while showing all elements. On a different level we also report the activity that has been done in collaboration with Université de Paris Sud and C01 Istituto Nazionale Fisica per la Materia in Rome. In the first case, through k-core decomposition we can significantly reduce the complexity of the graph while still preserving some information. In the latter case the reduction is made by selecting the vertices to preserve according to their betweenness value.
1 General Techniques for Visualizing Large Graphs
In order to design a new tool that can handle larger networks, several issues have to be clarified first. Among many are the suitable representation of networks and precise descriptions for the connection between the analysis and the layouts. For this deliverable, the problem of finding a concise description is already tackled.
Figure 1: Reduced view of the Autonomous System graph. Nodes represent groups having similar connectivity and sizes are proportional to the number of elements in the group. Edges show the connections between groups and their thickness is proportional to the number of connections.
Several visualization techniques have been established that support drawing large graphs, like the collapse of subgraphs or groups, masking irrelevant parts or multi-views. However, these procedures are useless as long as they are not...
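The k-core reduction mentioned above can be illustrated with a short peeling procedure: repeatedly delete vertices of degree below k until none remain. This Python sketch assumes a simple undirected graph stored as a dict of neighbour sets; the function name and the example graph are hypothetical and not taken from the deliverable.

```python
def k_core(adjacency, k):
    """Return the vertex set of the k-core of a simple undirected graph.

    adjacency: dict mapping each vertex to the set of its neighbours
    (symmetric, no self-loops). The k-core is the largest subgraph in
    which every vertex has at least k neighbours inside the subgraph.
    """
    alive = set(adjacency)
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    stack = [v for v in alive if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already peeled via an earlier stack entry
        alive.discard(v)
        for nb in adjacency[v]:
            if nb in alive:
                degree[nb] -= 1
                if degree[nb] < k:
                    stack.append(nb)
    return alive


# Hypothetical example: a triangle {0, 1, 2} with a tail 2 - 3 - 4.
graph = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4},
    4: {3},
}
```

Calling `k_core(graph, 2)` strips the tail and keeps only the triangle, which is the kind of size reduction that makes a large graph easier to draw.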
Website: pil.phys.uniroma1.it | Filesize: -
No of Page(s): 17
Download Customization and usability study of general purpose software tools ....pdf
GPU Accelerated Lanczos Algorithm with Applications
Graphics Processing Units provide a large computational power at a very low price, which positions them as a ubiquitous accelerator. GPGPU is the acceleration of general-purpose computations using GPUs. GPUs have been used to accelerate many linear algebra routines and numerical methods. Lanczos is an iterative method well suited for finding the extreme eigenvalues and the corresponding eigenvectors of large sparse symmetric matrices. In this paper, we present an implementation of the Lanczos algorithm on a GPU using the CUDA programming model and apply it to two important problems: graph bisection using spectral methods, and image segmentation. Our GPU implementation of spectral bisection performs better when compared to both an Intel Math Kernel Library implementation and a Matlab implementation. Our GPU implementation shows a speedup of up to 97.3 times over the Matlab implementation and 2.89 times over the Intel Math Kernel Library implementation on an Intel Core i7 920 processor, which is a quad-core CPU. Similarly, our image segmentation implementation achieves a speedup of 3.27 compared to a multicore CPU based implementation using Intel Math Kernel Library and OpenMP. Through this work, we therefore wish to establish that the GPU may still be a better platform even for highly irregular and computationally intensive applications.
I. INTRODUCTION
The computational power of GPUs has been increasing rapidly in the last few years. Coupled with the availability of programming environments such as CUDA, GPUs are also being used for general purpose computations. This trend is called GPGPU and presently several fundamental applications are available on GPUs. Examples include sparse matrix vector multiplication [1], SVD [7], and others. There are many problems which can be modeled as graphs and can be solved by formulating them as discrete combinatorial optimization problems. Some of them include spectral graph partitioning, spectral image segmentation, spectral clustering, spectral graph layout and the like. Most of the aforementioned problems have a lot of practical applications, but posed as discrete optimization problems, these are hard to solve as they generally tend to be NP-complete problems. But real approximations to these problems can be solved using linear algebra methods like finding the spectrum of the Laplacian or adjacency matrix. These solutions require one to compute the extreme eigenvalues and corresponding eigenvectors of the Laplacian matrix of an underlying matrix. This approach of using the extreme eigenvalues, and the corresponding eigenvectors, also finds applications to many other problems from various settings such as computing page ranks [9], and latent semantic indexing. In general, the matrices involved in these computations are large, sparse, symmetric, and real valued.
The Lanczos method is well suited for such problems. The Lanczos method involves partial tridiagonalization of the given matrix, say A. Important information about the extremal eigenvalues of a matrix tends to emerge long before the tridiagonalization is complete. This makes the Lanczos algorithm particularly useful in cases where a few of the largest or smallest eigenvalues of A are desired. Further, the only large-scale operation involved is sparse matrix-vector multiplication, which can be implemented as a black box. In this work, we implement the Lanczos algorithm on a GPU and study two applications of the Lanczos algorithm: graph bisection with spectral methods, and image segmentation. Given an undirected graph G, with vertex set V(G) and edge set E(G), and a positive integer k, the graph partitioning problem is to partition the graph G into k partitions. The partitions have to satisfy two conditions. Firstly, each partition has an equal number of vertices....
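As a rough illustration of the tridiagonalization described above, the following CPU-only Python sketch runs the basic three-term Lanczos recurrence while treating the matrix purely as a black-box matrix-vector product. It is not the paper's CUDA implementation: the function names, the dict-of-rows sparse format and the 3-vertex path-graph Laplacian are assumptions for this example, and no reorthogonalization is performed, so it is only suitable for small, well-behaved inputs.

```python
import math


def lanczos(matvec, v0, k):
    """Run up to k Lanczos steps for a symmetric operator.

    matvec: function returning A @ x for a list x (A is never formed).
    Returns (alpha, beta): the diagonal and off-diagonal entries of the
    tridiagonal matrix T whose extreme eigenvalues approximate those of A.
    """
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]
    v_prev = [0.0] * len(v)
    alpha, beta = [], []
    b = 0.0
    for j in range(k):
        w = matvec(v)
        a = sum(wi * vi for wi, vi in zip(w, v))
        # Remove the components along the current and previous basis vectors.
        w = [wi - a * vi - b * pi for wi, vi, pi in zip(w, v, v_prev)]
        alpha.append(a)
        b = math.sqrt(sum(x * x for x in w))
        if j == k - 1 or b < 1e-12:  # finished, or invariant subspace found
            break
        beta.append(b)
        v_prev, v = v, [x / b for x in w]
    return alpha, beta


# Hypothetical sparse matrix: Laplacian of the path graph 0 - 1 - 2,
# stored as {row: {column: value}}.
ROWS = {
    0: {0: 1.0, 1: -1.0},
    1: {0: -1.0, 1: 2.0, 2: -1.0},
    2: {1: -1.0, 2: 1.0},
}


def path_laplacian_matvec(x):
    return [sum(val * x[col] for col, val in ROWS[row].items())
            for row in range(len(ROWS))]


alpha, beta = lanczos(path_laplacian_matvec, [1.0, 0.0, 0.0], 3)
```

For this input the recurrence yields a diagonal of 1, 2, 1 and off-diagonals of 1, 1; that tridiagonal matrix has the same eigenvalues (0, 1 and 3) as the original Laplacian. On larger matrices one would stop after far fewer steps than the matrix dimension and hand the small tridiagonal matrix to a dense eigenvalue solver.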
Website: cstar.iiit.ac.in | Filesize: -
No of Page(s): 6
Download GPU Accelerated Lanczos Algorithm with Applications.pdf
A multilevel layout algorithm for visualizing
Graph drawing is an integral part of many systems biology studies, enabling visual exploration and mining of large-scale biological networks. While a number of layout algorithms are available in popular network analysis platforms, such as Cytoscape, it remains poorly understood how well their solutions reflect the underlying biological processes that give rise to the network connectivity structure. Moreover, visualizations obtained using conventional layout algorithms, such as those based on the force-directed drawing approach, may become uninformative when applied to larger networks with dense or clustered connectivity structure.
Methods: We implemented a modified layout plug-in, named Multilevel Layout, which applies the conventional layout algorithms within a multilevel optimization framework to better capture the hierarchical modularity of many biological networks. Using a wide variety of real life biological networks, we carried out a systematic evaluation of the method in comparison with other layout algorithms in Cytoscape.
Results: The multilevel approach provided both biologically relevant and visually pleasant layout solutions in most network types, hence complementing the layout options available in Cytoscape. In particular, it could improve drawing of large-scale networks of yeast genetic interactions and human physical interactions. In more general terms, the biological evaluation framework developed here enables one to assess the layout solutions from any existing or future graph drawing algorithm as well as to optimize their performance for a given network type or structure.
Conclusions: By making use of the multilevel modular organization when visualizing biological networks, together with the biological evaluation of the layout solutions, one can generate convenient visualizations for many network biology applications.
Tuikkala et al. BioData Mining 2012, 5:2 http://www.biodatamining.org/content/5/1/2 © 2012 Tuikkala et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background
Network graphs provide a valuable conceptual framework for representing and mining high-throughput experimental datasets, as well as for extracting and interpreting their biological information by the means of graph-based analysis approaches [1-8]. In cellular systems, network nodes typically refer to biomolecules, such as genes or proteins, and the edge connections the type of relationships the network is encoding, including physical or functional information.
Network visualization aims to organize the complex network structures in a way that provides the user with readily apparent insights into the most interesting biological patterns and relationships within the data, such as components of biological pathways, processes or complexes, which can be further investigated by follow-up computational and/or experimental analyses [4-6,9,10]. Owing to the developments in biotechnologies, experimental datasets are steadily increasing in their size and complexity, posing many challenges to the network-centric data...
Website: www.biodatamining.org | Filesize: -
No of Page(s): 16
Download A multilevel layout algorithm for visualizing ... - BioData Mining.pdf
Graph Clustering with Network Structure Indices
Graph clustering has become ubiquitous in the study of relational data sets. We examine two simple algorithms: a new graphical adaptation of the k-medoids algorithm and the Girvan-Newman method based on edge betweenness centrality. We show that they can be effective at discovering the latent groups or communities that are defined by the link structure of a graph. However, both approaches rely on prohibitively expensive computations, given the size of modern relational data sets. Network structure indices (NSIs) are a proven technique for indexing network structure and efficiently finding short paths. We show how incorporating NSIs into these graph clustering algorithms can overcome these complexity limitations. We also present promising quantitative and qualitative evaluations of the modified algorithms on synthetic and real data sets.
Appearing in Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, 2007. Copyright 2007 by the author(s)/owner(s).
1. Introduction Clustering data is a fundamental task in machine learning. Given a set of data instances, the goal is to group them in a meaningful way, with the interpretation of the grouping dictated by the domain. In the context of relational data sets (that is, data whose instances are connected by a link structure representing domain-specific relationships or statistical dependency) the clustering task becomes a means for identifying communities within networks. For example, in the bibliographic domain, we find networks of scientific papers. Interpreted as a graph, vertices (papers) are connected by an edge when one cites the other. Given a specific paper (or group of papers), one may try to find out more about the subject matter by poring through the works cited, and perhaps the works they cite as well. However, for a sufficiently large network, the number of papers to investigate quickly becomes overwhelming. By clustering the graph, we can identify the community of relevant works surrounding the paper in question. In the sections that follow, we discuss methods for clustering such graphs into groups that are solely determined by the network structure (e.g., co-star relations between actors or citations among scientific papers). Some of the simplest approaches to graph clustering are also very effective. We consider two algorithms: a graphical version of the k-medoids data clustering algorithm (Kaufman & Rousseeuw, 1990) and the Girvan-Newman algorithm (2002).
While both techniques perform well, they are computationally expensive to the point of intractability when run on even moderate-size relational data sets. Using the indexing methods described by Rattigan, Maier, and Jensen (2006), we can drastically reduce the computational complexity of these algorithms. Surprisingly, this increase in scalability does not hinder performance. 2. Graph clustering algorithms 2.1. Evaluating clustering performance Before examining the details of the graph clustering algorithms, we introduce a framework for analyzing and evaluating clustering performance. We evaluate candidate algorithms on randomly generated uniform clustered graphs...
Website: machinelearning.org | Filesize: -
No of Page(s): 8
Download Graph Clustering with Network Structure Indices - The International ....pdf
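The abstract above names a graphical adaptation of k-medoids without giving its details. The following is only a minimal sketch of the general idea, assuming hop count (breadth-first search distance) as the dissimilarity between vertices and caller-supplied initial medoids. The function names `bfs_distances` and `graph_k_medoids` are hypothetical and are not taken from the paper. The all-pairs distance table it builds is the expensive step that, according to the abstract, network structure indices are meant to avoid.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop-count distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def graph_k_medoids(adj, medoids, max_iter=20):
    """Cluster vertices around medoids using shortest-path distance.

    adj maps each vertex to its neighbours; medoids is a list of distinct
    starting vertices (an assumption of this sketch, not of the paper).
    """
    inf = float("inf")
    # All-pairs distances: simple, but costly on large graphs.
    dist = {u: bfs_distances(adj, u) for u in adj}
    medoids = list(medoids)
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each vertex joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for v in sorted(adj):
            nearest = min(medoids, key=lambda m: dist[m].get(v, inf))
            clusters[nearest].append(v)
        # Update step: the new medoid minimises total distance to its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c].get(v, inf) for v in members))
            for members in clusters.values()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters


# Two triangles joined by a single edge (2-3).
example = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(graph_k_medoids(example, medoids=[0, 5]))
```

On this toy graph the two triangles come out as the two clusters. Ties are broken by medoid order, so a different choice of starting medoids can give a different split.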
Sunday, October 28, 2012
Fuzzy Tracking Methods for Mobile Robots
This chapter deals with the application of fuzzy logic to autonomous navigation of mobile robots. Two different tracking problems are considered: tracking of previously computed explicit paths, and tracking of walls and other natural features in the robot environment. Fuzzy logic allows the management of heuristic rule base knowledge, imprecise information from sensors, and the uncertainties in the knowledge about the environment. The chapter includes the application of the tracking methods to the mobile robots RAM-1 and AURORA.
A. Ollero*, A. García-Cerezo**, J. L. Martínez** and A. Mandow**. * Departamento de Ingeniería de Sistemas y Automática. Escuela Superior de Ingenieros. Universidad de Sevilla. Avenida Reina Mercedes s/n, 41012 Sevilla (Spain). Fax: (+34) 5 455-68-49; Tel: (+34) 5 455-68-71; E-mail: aollero@cartuja.us.es ** Departamento de Ingeniería de Sistemas y Automática. Universidad de Málaga. Plaza el Ejido s/n, 29013 Málaga (Spain). Fax: (+34) 5 213-14-13; Tel: (+34) 5 213-14-18; E-mail: gcerezo@tecma1.ctima.uma.es
Key words: fuzzy control, autonomous vehicles, path tracking, navigation behaviors.
25.1 INTRODUCTION The number of applications of fuzzy logic to mobile robot and autonomous vehicle control has increased significantly in recent years. The best-known arguments supporting fuzzy control are the ability to cope with imprecise information in heuristic rule-based knowledge and sensor measurements, the interpolative nature of fuzzy controllers, and the flexibility in the definition of nonlinear control laws. Motion planning and control of autonomous vehicles involves several problems related to environment perception and representation, path planning, path tracking, velocity control and low-level motion control. This chapter deals with the application of fuzzy logic to reference tracking by mobile robots, which includes: a) tracking of explicit previously defined paths, using dead reckoning and navigation sensors; b) tracking of walls and other natural features in the environment using external sensors. In a) the tracking objective is to generate the control commands for the vehicle to follow the previously defined path by taking into account the actual position and the constraints imposed by the vehicle and its lower-level motion controllers. Fuzzy logic can be used both to supervise conventional path trackers, as proposed in [1], and for direct fuzzy control. This chapter considers the second case.
That is, the fuzzy controller directly generates the steering command from the sensor inputs [2]. Approach b) is related to fuzzy reactive approaches for mobile robot navigation. In this case the controller inputs are provided by the external sensors. The chapter is organized as follows. The following section is devoted to fuzzy direct explicit path tracking. The section includes the description of the method and the application to the mobile robot RAM-1. The next section is related to reactive navigation. Fuzzy tracking is considered in the framework of a behavior-based architecture for mobile robot control. The application of this method to the AURORA mobile robot is considered in the next section. The...
Website: www8.cs.umu.se | Filesize: -
No of Page(s): 21
Download FUZZY TRACKING METHODS FOR MOBILE ROBOTS.pdf.pdf
A Hybrid of Genetic Algorithm and Particle Swarm
An evolutionary recurrent network which automates the design of recurrent neural/fuzzy networks using a new evolutionary learning algorithm is proposed in this paper. This new evolutionary learning algorithm is based on a hybrid of genetic algorithm (GA) and particle swarm optimization (PSO), and is thus called HGAPSO. In HGAPSO, individuals in a new generation are created not only by crossover and mutation operations as in GA, but also by PSO. The concept of elite strategy is adopted in HGAPSO, where the upper half of the best-performing individuals in a population are regarded as elites. However, instead of being reproduced directly to the next generation, these elites are first enhanced. The group constituted by the elites is regarded as a swarm, and each elite corresponds to a particle within it. In this regard, the elites are enhanced by PSO, an operation which mimics the maturing phenomenon in nature. These enhanced elites constitute half of the population in the new generation, whereas the other half is generated by performing crossover and mutation operations on these enhanced elites. HGAPSO is applied to recurrent neural/fuzzy network design as follows. For recurrent neural network design, a fully connected recurrent neural network is designed and applied to a temporal sequence production problem. For recurrent fuzzy network design, a Takagi–Sugeno–Kang-type recurrent fuzzy network is designed and applied to dynamic plant control. The performance of HGAPSO is compared to both GA and PSO in these recurrent network design problems, demonstrating its superiority.
Index Terms: dynamic plant control, elite strategy, recurrent neural/fuzzy network, temporal sequence production.
I. INTRODUCTION The advent of evolutionary computation has inspired new resources for optimization problem solving, such as the optimal design of neural networks and fuzzy systems. In contrast to traditional computation systems, which may be good at accurate and exact computation but have brittle operations, evolutionary computation provides a more robust and efficient approach for solving complex real-world problems [1]–[3]. Many evolutionary algorithms, such as genetic algorithm (GA) [4], genetic programming [5], evolutionary programming [6], and evolution strategies [7], have been proposed. Since they are heuristic and stochastic, they are less likely to get stuck in local minima, and they are based on populations made up of individuals with a specified behavior similar to biological phenomena.
These common characteristics led to the development of evolutionary computation as an increasingly important field. Manuscript received February 14, 2003; revised May 20, 2003. This work was supported by the National Science Council, Taiwan, R.O.C., under Grant NSC 91-2213-E-055-022. This paper was recommended by Associate Editor L. O. Hall. The author is with the Department of Electrical Engineering, National Chung Hsing University, Taichung, 402 Taiwan, R.O.C. Digital Object Identifier 10.1109/TSMCB.2003.818557 Among existing evolutionary algorithms, the best-known branch is GA. GAs are stochastic search procedures based on the mechanics of natural selection, genetics, and evolution [4]. Since they simultaneously evaluate...
Website: bitbucket.org | Filesize: -
No of Page(s): 10
Download A Hybrid of Genetic Algorithm and Particle Swarm ... - Bitbucket.pdf
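The generation scheme described in the HGAPSO abstract (rank the population, enhance the better half with PSO, breed the other half from the enhanced elites) can be sketched roughly as below. This is not the author's implementation: the toy `sphere` cost, the single PSO update per generation, the one-point crossover, the Gaussian mutation, and every parameter value are assumptions made only for illustration.

```python
import random


def sphere(x):
    """Toy cost to minimise (an assumption; the paper evaluates recurrent networks)."""
    return sum(v * v for v in x)


def hgapso(cost, dim=4, pop_size=20, generations=60, seed=0,
           w=0.7, c1=1.5, c2=1.5, mut_rate=0.1, mut_scale=0.1):
    """Return (best position found, best cost recorded after each generation)."""
    rng = random.Random(seed)
    half = pop_size // 2
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    vel = [[0.0] * dim for _ in range(pop_size)]
    pbest = [p[:] for p in pop]
    best = min(pop, key=cost)[:]
    history = []
    for _ in range(generations):
        # Rank by cost; the better half of the population are the elites.
        order = sorted(range(pop_size), key=lambda i: cost(pop[i]))
        pop = [pop[i] for i in order]
        vel = [vel[i] for i in order]
        pbest = [pbest[i] for i in order]
        gbest = min(pbest[:half], key=cost)
        # Enhance the elites with one PSO velocity and position update.
        for i in range(half):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pop[i][d])
                             + c2 * r2 * (gbest[d] - pop[i][d]))
                pop[i][d] += vel[i][d]
            if cost(pop[i]) < cost(pbest[i]):
                pbest[i] = pop[i][:]
        # Fill the other half by crossover and mutation of the enhanced elites.
        for i in range(half, pop_size):
            a, b = rng.sample(range(half), 2)
            cut = rng.randrange(1, dim)
            child = pop[a][:cut] + pop[b][cut:]
            child = [g + rng.gauss(0.0, mut_scale) if rng.random() < mut_rate else g
                     for g in child]
            pop[i], vel[i], pbest[i] = child, [0.0] * dim, child[:]
        # Keep the best position seen so far, so history never increases.
        candidate = min(pbest, key=cost)
        if cost(candidate) < cost(best):
            best = candidate[:]
        history.append(cost(best))
    return best, history


best, history = hgapso(sphere)
print(history[0], history[-1])
```

The sketch needs `dim >= 2` and `pop_size >= 4` so that a crossover point and two distinct parents exist. The best-so-far bookkeeping is a convenience of the sketch and is not something the abstract specifies.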
Selection in Massively Parallel Genetic Algorithms
The availability of massively parallel computers makes it possible to apply genetic algorithms to large populations and very complex applications. Among these applications are studies of natural evolution in the emerging field of artificial life, which place special demands on the genetic algorithm. In this paper, we characterize the difference between panmictic and local selection/mating schemes in terms of diversity of alleles, diversity of genotypes, the inbreeding coefficient, and the speed and robustness of the genetic algorithm. Based on these metrics, local mating appears to be superior to panmictic mating not only for artificial evolutionary simulations, but also for more traditional applications of genetic algorithms. In Proceedings of the Fourth International Conference on Genetic Algorithms. Morgan Kaufmann, 1991.
1 Introduction The availability of powerful supercomputers such as the Connection Machine (Hillis 1985) means that genetic algorithms are now applied to larger and more difficult optimization problems (e.g. (Collins and Jefferson 1991a), where the search space consists of 2^25590 points). Some of our recent artificial life work (Jefferson et al. 1991; Collins and Jefferson 1991b; Collins and Jefferson 1991c; Collins and Jefferson 1991a) has involved massively parallel genetic algorithms characterized by large populations, enormous search spaces, and fitness functions that change through time. These simulated evolution applications place special demands on the genetic algorithm. The simulations generally attempt to model the evolution of populations of tens of thousands of artificial organisms in a simulated environment over a period of thousands of generations. The ecosystem in which the fitness of each individual is evaluated can potentially include both direct and indirect interactions with other members of the population, members of coevolving populations, the background environment, etc. In addition, the environment and selection criteria may change both during a generation and over a period of many generations (and may be different in different parts of the simulated world). Such applications require a genetic algorithm that is able to simultaneously explore a wide range of genotypes and can maintain enough genetic diversity to respond to changing conditions. Genetic algorithms that use panmictic selection and mating (where any individual can potentially mate with any other) typically converge on a single peak of multimodal functions, even when several solutions of equal quality exist (Deb and Goldberg 1989).
Genetic convergence is a serious problem when the adaptive landscape is constantly changing, as it does in both natural and artificial ecosystems. Crowding, sharing, and restrictive mating are modifications to panmictic selection schemes that have been proposed to deal with the problem of convergence, and thus allow the population to simultaneously contain individuals on more than one peak in the adaptive landscape (De Jong 1975; Goldberg and Richardson 1987; Deb and Goldberg 1989). These modifications are motivated by the natural phenomena of niches, species, and assortative mating, but they make use of global knowledge of the population, phenotypic distance measures, and global selection and mating, and thus are not well suited for parallel implementation. Rather than attempting to directly implement these natural phenomena, we...
Website: reference.kfupm.edu.sa | Filesize: -
No of Page(s): 15
Download Selection in Massively Parallel Genetic Algorithms - Science ....pdf
Journal of Graph Algorithms and Applications
Methods for ranking World Wide Web resources according to their position in the link structure of the Web are receiving considerable attention, because they provide the first effective means for search engines to cope with the explosive growth and diversification of the Web. Closely related methods have been used in other disciplines for quite some time. We propose a visualization method that supports the simultaneous exploration of a link structure and a ranking of its nodes by showing the result of the ranking algorithm in one dimension and using graph drawing techniques in the remaining one or two dimensions to show the underlying structure. We suggest using a simple spectral layout algorithm, because it does not add to the complexity of an implementation already used for ranking, but nevertheless produces meaningful layouts. The effectiveness of our visualizations is demonstrated with example applications, in which they provide valuable insight into the link structure and the ranking mechanism alike. We consider them useful for the analysis of query results, maintenance of search engines, and evaluation of Web graph models.
Communicated by Giuseppe Liotta and Ioannis G. Tollis: submitted October 2001; revised December 2002. Research supported in part by the Deutsche Forschungsgemeinschaft (DFG) under grant Br 2158/1-1 and the European Commission within FET Open Project COSIN (IST-2001-33555). U. Brandes and S. Cornelsen, Visual Ranking, JGAA, 7(2) 181–201 (2003)
1 Introduction The directed graph induced by the hyperlink structure of the Web has been recognized as a rich source of information. Understanding and exploiting this structure has a proven potential to help deal with the explosive growth and diversification of the Web. Probably the most widely recognized example of this kind is the PageRank index employed by the Google search engine [9]. PageRank is but one of many models and algorithms to rank Web resources according to their position in a hyperlink structure (see, e.g., [36, 29, 13, 1, 8, 12]). We propose a method to complement rankings with a meaningful visualization of the graph they are computed on. While graph visualization is an active area of research as well [14, 28], its integration with quantitative network analyses is only beginning to receive attention.
It is, however, rather difficult to understand the determinants of, say, a particular ranking if its results do not influence the way in which the structure is visualized. A design for graph visualizations showing a vertex valuation in its structural context is introduced in [6, 5]. In two-dimensional diagrams of social networks, the vertical dimension of the layout area is used to represent exactly the value assigned to each actor (a constraint), and a layout of the horizontal dimension is determined to make the diagram readable (an objective). Since the networks in question are relatively small (no more than a hundred vertices), an adaptation of the Sugiyama framework...
Website: 147.91.102.6 | Filesize: -
No of Page(s): 21
Download Journal of Graph Algorithms and Applications - Mathematical ....pdf
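The excerpt above mentions PageRank as one ranking whose result could occupy one dimension of a drawing. As a rough illustration only, the sketch below computes a PageRank-style score by power iteration. The function name `pagerank`, the damping value 0.85, the fixed iteration count, and the uniform handling of vertices without out-links are assumptions of this sketch. It does not reproduce the paper's spectral layout for the remaining dimensions.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict mapping each node to its out-links.

    Every link target must also appear as a key. A node with no out-links
    spreads its score evenly over all nodes (an assumption of this sketch).
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            targets = out_links[u] or nodes
            share = damping * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "a" and "b" link to "hub", "hub" links back to "a".
scores = pagerank({"a": ["hub"], "b": ["hub"], "hub": ["a"]})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In a drawing of the kind the abstract describes, each score could serve directly as the vertical coordinate of its vertex, leaving the horizontal coordinate to a layout algorithm.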
New spectral methods for ratio cut partitioning
Partitioning of circuit netlists is important in many phases of VLSI design, ranging from layout to testing and hardware simulation. The ratio cut objective function [29] has received much attention since it naturally captures both min-cut and equipartition, the two traditional goals of partitioning. In this paper, we show that the second smallest eigenvalue of a matrix derived from the netlist gives a provably good approximation of the optimal ratio cut partition cost. We also demonstrate that fast Lanczos-type methods for the sparse symmetric eigenvalue problem are a robust basis for computing heuristic ratio cuts based on the eigenvector of this second eigenvalue. Effective clustering methods are an immediate by-product of the second eigenvector computation, and are very successful on the “difficult” input classes proposed in the CAD literature. Finally, we discuss the very natural intersection graph representation of the circuit netlist as a basis for partitioning, and propose a heuristic based on spectral ratio cut partitioning of the netlist intersection graph. Our partitioning heuristics were tested on industry benchmark suites, and the results compare favorably with those of Wei and Cheng [29], [32] in terms of both solution quality and runtime. This paper concludes by describing several types of algorithmic speedups and directions for future work.
Manuscript received June 9, 1991; revised December 12, 1991. This work was supported by the National Science Foundation under Grant MIP-9110696. A. B. Kahng is also supported by a National Science Foundation Young Investigator Award. This paper was recommended by Editor A. Dunlop. The authors are with the Department of Computer Science, University of California at Los Angeles, Los Angeles, CA 90024-1596. IEEE Log Number 9200832.
I. PRELIMINARIES As system complexity increases, a divide-and-conquer approach is used to keep the circuit design process tractable. This recursive decomposition of the synthesis problem is reflected in the hierarchical organization of boards, multi-chip modules, integrated circuits, and macro cells. As we move downward in the design hierarchy, signal delays typically decrease; for example, on-chip communication is faster than inter-chip communication. Therefore, the traditional metric for the decomposition is the number of signal nets which cross between layout subproblems. Minimizing this number is the essence of partitioning. Any decision made early in the layout synthesis procedure will constrain succeeding decisions, and hence good solutions to the placement, global routing, and detailed routing problems depend on the quality of the partitioning algorithm. As noted by such authors as Donath [7] and Wei and Cheng [32], partitioning is basic to many fundamental CAD problems, including the following: Packaging of designs: Logic is partitioned into blocks, subject to I/O bounds and constraints on block area; this is the canonical partitioning application at all levels of design, arising whenever technology improves and existing designs must be repackaged onto higher-capacity blocks. Clustering analysis:...
Website: w.sliponline.org | Filesize: -
No of Page(s): 12
Download New spectral methods for ratio cut partitioning and - UCSD VLSI CAD.pdf
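The spectral ratio cut idea described in the abstract above can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: the netlist is replaced by a plain undirected, unweighted graph, a dense eigensolver (`numpy.linalg.eigh`) stands in for the Lanczos-type sparse methods the authors mention, and the function names are hypothetical. The vertices are sorted by their entry in the eigenvector of the second smallest Laplacian eigenvalue, and every split point along that ordering is scored with the ratio cut cost.

```python
import numpy as np


def laplacian(n, edges):
    """Graph Laplacian L = D - A of an undirected, unweighted graph."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1
        L[v, v] += 1
        L[u, v] -= 1
        L[v, u] -= 1
    return L


def ratio_cut(n, edges, U):
    """Ratio cut cost: (number of crossing edges) / (|U| * |W|), W = V minus U."""
    crossing = sum((u in U) != (v in U) for u, v in edges)
    return crossing / (len(U) * (n - len(U)))


def spectral_ratio_cut(n, edges):
    """Sweep over the ordering induced by the second eigenvector.

    Returns (second smallest eigenvalue, best cost found, one side U).
    Illustrative heuristic only, not the paper's implementation.
    """
    eigvals, eigvecs = np.linalg.eigh(laplacian(n, edges))  # ascending order
    order = np.argsort(eigvecs[:, 1])  # sort vertices by second eigenvector
    best_cost, best_side = None, None
    for k in range(1, n):  # try every prefix of the ordering as one side
        U = set(order[:k].tolist())
        cost = ratio_cut(n, edges, U)
        if best_cost is None or cost < best_cost:
            best_cost, best_side = cost, U
    return eigvals[1], best_cost, best_side
```

On two triangles joined by a single edge, the sweep separates the triangles. For any partition, the Rayleigh quotient of a suitable two-valued vector shows that the cost is at least the second eigenvalue divided by n, which gives a quick sanity check on the output.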
Simple and Efficient Mesh Layout with Space-Filling Curves
We present a simple and efficient algorithm to compute cache-friendly layouts of unstructured geometric data. Coherent mesh layouts minimize cache misses and page faults by laying out vertices, triangles or tetrahedra in a spatially structured manner. Recently, Yoon et al. have shown that it is possible to construct an optimal cache-oblivious mesh layout (COML) for surface and volume data. However, their approach is based on an NP-Hard optimization problem, and is thus very computationally expensive. We present a mesh layout based on space-filling curves that has comparable performance to COML and is orders of magnitude faster to compute. We also discuss extending our algorithm to handle extremely large datasets through an out-of-core approach. Finally, we include an analysis that examines a number of different mesh layouts, highlighting their strengths and weaknesses. Our evaluation indicates that space-filling curve layouts can be an order of magnitude faster and less memory-intensive to compute while, in every application, being able to maintain a performance within 5% of the best layout, including those that are specifically tuned for GPU hardware vertex caches in [Lin and Yu 06, Sander et al. 07]. © A K Peters, Ltd. 1086-7651/06 $0.50 per page. Journal of Graphics Tools.
1. Introduction In the past few years, advances in 3D data acquisition technology, as well as improvements in simulation algorithms, have made very large datasets available to the computer graphics community. Currently, the size of these models can vary from a few tens of thousands to hundreds of millions of polygons. Many challenges arise from this notable increase in size and complexity, and recently much attention has been given to the problem of computing high quality memory layouts for geometric datasets. Such layouts aim to minimize the cache miss and page fault penalty incurred by applications. This can be done by defining a cache coherence metric and then optimizing the layout according to this metric. Yoon et al. [Yoon and Lindstrom 06, Yoon et al. 05] have investigated this problem and proposed a cache-oblivious mesh layout (COML) that generates near optimal results for any cache configuration.
One of this work’s major findings is that by optimizing a dataset’s layout in memory, it is possible to improve the performance of many mesh processing applications without modifying the applications themselves. Our technique is similar to COML in the sense that it can also naturally improve the performance of other applications, but it is based on a space-filling curve approach, and is thus much faster to compute. Space-filling curves are well known for their memory coherence characteristics and high spatial locality...
Website: vgc.poly.edu | Filesize: -
No of Page(s): 15
Download Simple and Efficient Mesh Layout with Space-Filling Curves.pdf
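As a rough illustration of the space-filling curve layout described in the excerpt above, the sketch below reorders 2D vertices along a Z-order (Morton) curve. The choice of curve, the restriction to two dimensions, the 16-bit quantization, and the function names are all assumptions made for brevity; the excerpt does not say which curve the authors use.

```python
def interleave_bits(x, y, bits=16):
    """Morton code: interleave the low `bits` bits of x (even) and y (odd)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_order(points, bits=16):
    """Return vertex indices sorted along a Z-order curve.

    `points` is a list of (x, y) floats; coordinates are quantized to a
    2^bits grid over the bounding box before the bits are interleaved.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    scale = (1 << bits) - 1

    def quantize(value, lo, hi):
        # Degenerate extent: put everything in cell 0 along this axis.
        return 0 if hi == lo else int((value - lo) / (hi - lo) * scale)

    keys = [
        interleave_bits(
            quantize(x, min(xs), max(xs)), quantize(y, min(ys), max(ys)), bits
        )
        for x, y in points
    ]
    return sorted(range(len(points)), key=keys.__getitem__)
```

Storing vertices in the returned order places points that are close in space close in memory, which is the property the excerpt attributes to space-filling curve layouts. Triangle or tetrahedron arrays could be reordered the same way using, for example, element centroids.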
Spectral methods and network communities
Communities in Complex Networks
COMMUNITY (or modulus):
• Subset of nodes more densely linked among themselves than to the rest of the network.
COMMUNITY STRUCTURE:
• Allows coarse-graining of network structure, provides reduced complexity and simplified visualization.
• Important for search engines.
• In social networks reflects social structure and is related to opinion dynamics or rumor propagation.
• In biochemical and neural networks is related to functional units [Ravasz et al. and Guimerá-Amaral].
After Song, Havlin, Makse: WWW schematic representation.
Methods for community detection
Some methods are divisive while others are agglomerative
• Girvan-Newman: removal of links with high betweenness.
• Radicchi et al. divisive algorithm, based on triangles.
• Super-paramagnetic clustering. Reichardt-Bornholdt.
• Guimerá-Amaral, and Danon et al. simulated annealing modularity optimization. Very good results but slow.
• Greedy algorithm by Newman. Very fast.
• Arenas-Duch. Extremal optimization. Works very well.
• Pons-Latapy. Walk-trap algorithm. Trapped random-walks.
• Newman. Spectral method with a “modularity matrix”. Very fast and reliable.
• ... (this list is non-exhaustive).
...
Website: ergodic.ugr.es | Filesize: -
No of Page(s): 87
Download Spectral methods and network communities - Universidad de Granada.pdf
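The slides above list Newman's spectral method with a "modularity matrix" without giving details. As a hedged sketch of the basic idea, the code below builds the modularity matrix B = A - k kᵀ / (2m) for an undirected graph and splits the nodes by the sign of the leading eigenvector. It covers a single bisection only, uses a dense eigensolver, and the function name is hypothetical; the slides may present the method differently.

```python
import numpy as np


def modularity_bisection(A):
    """Split a graph into two groups using the modularity matrix.

    A is a symmetric adjacency matrix. Returns (s, Q): s holds +1/-1
    group labels and Q is the modularity of that two-way split.
    """
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)            # node degrees
    two_m = k.sum()              # 2m, twice the number of edges
    B = A - np.outer(k, k) / two_m
    eigvals, eigvecs = np.linalg.eigh(B)   # ascending eigenvalues
    leading = eigvecs[:, -1]               # eigenvector of the largest one
    s = np.where(leading >= 0, 1, -1)
    Q = s @ B @ s / (2 * two_m)            # Q = s^T B s / (4m)
    return s, Q
```

For two triangles connected by one edge, the sign pattern separates the triangles and the modularity of the split is 5/14.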
Saturday, October 27, 2012
A genetic algorithm approach to the integrated inventory-distribution
We introduce a new genetic algorithm (GA) approach for the integrated inventory distribution problem (IIDP). We present the developed genetic representation and use a randomized version of a previously developed construction heuristic to generate the initial random population. We design suitable crossover and mutation operators for the GA improvement phase. The comparison of results shows the significance of the designed GA over the construction heuristic
and demonstrates the capability of reaching solutions within 20% of the optimum on sets of randomly generated test problems. Keywords: Inventory routing; Inventory management; Vehicle routing; GA; Lot sizing. 1. Introduction In the last few years, new ideas of centralized supply chain management, such as vendor managed inventory (VMI), have been widely accepted in many supply chain environments. The idea of centralized supply chain management is that suppliers get direct access to the customers’ inventory positions and make the necessary replenishment decisions. This has led to interest in studying integrated models that combine transportation and inventory decisions. Such an integrated model is intended to optimize the replenishment decisions made by the supplier in order to minimize the overall inventory and transportation costs. In their literature review article, Baita et al. (1998) use the term ‘dynamic routing and inventory (DRAI)’ to refer to the class of problems in which simultaneous vehicle routing, as a transportation problem, and inventory decisions are present in a dynamic framework. They classify the approaches used for DRAI problems into two categories. The first one operates in the frequency domain, where the decision variables are replenishment frequencies, or headways between shipments. Examples in the literature include the work of Blumenfeld et al. (1985), Hall (1985), Daganzo (1987), and Ernst and Pyke (1993) (for more references see Daganzo, 1999). The second category, referred to as the time domain approach, uses discrete time models to determine delivery quantities and vehicle routes at fixed time intervals. Within this category the most famous problem is the inventory routing problem (IRP), which arises in the distribution of industrial gases. The main concern for this kind of application is to maintain an adequate level of inventory for all the customers and to avoid any stockout.
In the IRP, it is assumed that each customer has a fixed demand rate and the focus is on minimizing the total transportation cost, while inventory costs are generally not of concern. Examples of this application in the literature include Bell et al. (1983), Golden et al. (1984), Dror et al. (1985), Dror and Ball (1987) and recently Campbell et al. (2002). In the literature, the integration of vehicle routing and inventory decisions with the consideration of inventory costs in the time domain approaches of the DRAI problems has taken different forms. In a few cases a single period planning problem has been addressed as...
Website: www-rcf.usc.edu | Filesize: -
No of Page(s): 20
Download A genetic algorithm approach to the integrated inventory- distribution ....pdf
Dynamic Spectral Layout of Small Worlds
Spectral methods are naturally suited for dynamic graph layout, because moderate changes of a graph yield moderate changes of the layout under weak assumptions. We discuss some general principles for dynamic graph layout and derive a dynamic spectral layout approach for the animation of small-world models. 1 Introduction The main problem in dynamic graph layout is the balance of layout quality and mental-map preservation [17]. Typically, the problem is addressed by adapting a static layout method such that it produces similar layouts for successive graphs. While these adaptations are typically ad-hoc [8], others [2,1] are based
on the formally derived method [3] of integrating difference metrics [5] into the static method. See [4] for an overview of the dynamic graph drawing problem. Spectral layout denotes the use of eigenvectors of graph-related matrices such as the adjacency or Laplacian matrix as coordinate vectors. See, e.g., [15] for an introduction. We argue that spectral methods are particularly suited for dynamic graph layout both from a theoretical and practical point of view, because moderate changes in the graph naturally translate into moderate changes of the layout, and updates can be computed efficiently. This paper is organized as follows. In Sect. 2, we define some basic notation and recall the principles of spectral graph layout. The dynamic graph layout problem is reviewed briefly in Sect. 3, and methods for updates between layouts of consecutive graphs are treated in more detail in Sect. 4. In Sect. 5, our approach for small worlds is introduced, and we conclude with a brief discussion in Sect. 6. 2 Preliminaries For ease of exposition we consider only two-dimensional straight-line representations of simple, undirected graphs $G = (V, E)$ with positive edge weights $\omega : E \to \mathbb{R}^+$, although most techniques and results in this paper easily carry over to other classes of graphs. In straight-line representations, a two-dimensional layout is determined by a vector $(p_v)_{v \in V}$ of positions $p_v = (x_v, y_v)$. Most of the time we will reason about one-dimensional layouts $x$ that represent the projection of $p$ onto one component. For any graph-related matrix $M(G)$, a spectral layout of $G$ is defined by two eigenvectors $x$ and $y$ of $M(G)$. For simplicity, we will only consider layouts derived from the Laplacian matrix $L(G)$ of $G$, which is defined by elements
$$\ell_{v,w} = \begin{cases} \sum_{u \in V} \omega(u,v), & v = w, \\ -\omega(v,w), & v \neq w. \end{cases}$$
The rows of $L(G)$ add up to 0; thus, the vector $\mathbf{1} = (1, \ldots, 1)^T$ is a trivial eigenvector for eigenvalue 0.
Since $L(G)$ is symmetric, all eigenvalues are real, and the theorem of Gershgorin [13] yields that the spectrum is contained in the interval $[0, g]$ for an upper bound $g \geq 0$. Hence, the spectrum can be written as $0 = \lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_n \leq g$ with corresponding unit eigenvectors $\mathbf{1}/\sqrt{n} = v_1, \ldots, v_n$. Based on the Laplacian, a spectral layout is defined as $p = (v_2, v_3)$, where $v_2$ and $v_3$ are unit eigenvectors to the...
Website: www.informatik.uni-konstanz.de | Filesize: -
No of Page(s): 12
Download Dynamic Spectral Layout of Small Worlds - CiteSeer.pdf
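The static spectral layout defined in the Preliminaries of the excerpt above can be sketched in a few lines. This is an illustration under assumptions rather than the authors' implementation: the graph is given as a hypothetical list of `(u, v, weight)` triples, it is assumed to be connected, and a dense eigensolver is used. The layout takes the unit eigenvectors belonging to the second and third smallest Laplacian eigenvalues as x and y coordinates.

```python
import numpy as np


def weighted_laplacian(n, weighted_edges):
    """Laplacian with l_vv = sum of incident weights and l_vw = -w(v, w)."""
    L = np.zeros((n, n))
    for u, v, w in weighted_edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def spectral_layout(n, weighted_edges):
    """Return (eigenvalues, coords); coords[v] = (x_v, y_v) from v_2 and v_3."""
    L = weighted_laplacian(n, weighted_edges)
    eigvals, eigvecs = np.linalg.eigh(L)  # ascending eigenvalues, unit vectors
    return eigvals, eigvecs[:, 1:3]       # skip the trivial eigenvector v_1
```

Because the trivial eigenvector is constant, both coordinate vectors sum to zero for a connected graph, so the drawing is centered at the origin. The sign and, for repeated eigenvalues, the rotation of the layout are not determined by the definition.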
The Development of Social Network Analysis
26.1 Social Network Analysis. 26.2 Visualization Principles: Illustrative Example; Substance, Design, Algorithm. 26.3 Substance-based Designs: Prominence; Cohesion; Two-mode networks; Dynamics. 26.4 Trends and Challenges.
Social networks provide a rich source of graph drawing problems, because they appear in an incredibly wide variety of forms and contexts. After sketching the scope of social network analysis, we establish some general principles for social network visualization before finally reviewing applications of, and challenges for, graph drawing methods in this area. Other accounts more generally relating to the status of visualization in social network analysis are given, e.g., in [Klo81, BKR+99, Fre00, Fre05, BKR06]. Surveys that are more comprehensive on information visualization approaches, interaction, and network applications from social media are given in [CM11, RF10, CY10]. 26.1 Social Network Analysis The fundamental assumption underlying social network theory is the idea that seemingly autonomous individuals and organizations are in fact embedded in social relations and interactions [BMBL09]. The term social network was coined to delineate the relational perspective from other research traditions on social groups and social categories [Bar54]. In general, a social network consists of actors (e.g., persons, organizations) and some form of (often, but not necessarily: social) relation among them. The network structure is usually modeled as a graph, in which vertices represent actors, and edges represent ties, i.e., the existence of a relation between two actors. Since traits of actors and ties may be important, both vertices and edges can have a multitude of attributes. We will use graph terminology for everything relating to the data model, and social network terminology when referring to substantive aspects. While attributed graph models are indeed at the heart of formal treatments, it is worth noting that theoretically justified data models are not as obvious as it may seem [But09]. In fact, social network analysis is maturing into a paradigm of distinct structural theories and associated relational methods. General introductions and methodological overviews can be found in [WB88, WF94, Sco00, CSW05, BE05], a historic account in [Fre04a], and a comprehensive collection of influential articles in [Fre08]. © 2005 by CRC Press. Figure 26.1 A sociogram from [Mor53, p. 422] showing a graph with fourteen highlighted vertices and four clusters. In social network research it is important to clarify whether the networks are considered dependent or explanatory variables.
In the former case the interest is in why and how networks form the way they do, and in the latter case the interest is in why and how networks influence other outcomes. For convenience, we will refer to the former as network theory (studying network formation) and to the latter as network analysis (studying network effects). A major distinction from non-network approaches is that the unit of analysis is the dyad, i.e. a pair of actors (may they be linked or not) rather than a monad (a singleton actor). The methodological toolbox can be organized into the following main compartments. Indexing The assignment of values to predetermined substructures of any size. Most common...
Website: www.informatik.uni-konstanz.de | Filesize: -
No of Page(s): 34
Download The Development of Social Network Analysis—with an.pdf
Graph Drawing Using Spectral Distance Embedding
We present a novel algorithm for drawing undirected connected graphs, by using a spectral decomposition of the distance matrix to approximate the graph theoretical distances. The main advantages of our algorithm are that it is "exact" (as opposed to iterative), and it gives results that preserve symmetry and uniform node density, i.e., the drawings are aesthetically pleasing. Our approach has the benefits of fast spectral techniques, but at the same time it produces drawings of a quality comparable to or better than the much slower force-directed approaches. The computational complexity of our algorithm is governed by its two main steps: distance matrix computation using an all-pairs shortest path algorithm, which is O(|V||E|); and low-order spectral decomposition, which is O(|V|^2). The runtime for typical 20,000 node graphs ranges from 100 to 150 seconds.

1 Introduction

A graph G = (V, E) is a pair where V is the vertex set and E is the edge set, which is a binary relation over V. The graph drawing problem is to compute an aesthetically pleasing layout of vertices and edges so that it is easy to grasp visually the inherent structure of the graph. Depending on the aesthetic criteria of interest, various approaches have been developed, and a general survey can be found in [11, 16]. We consider only the straight-line edge drawings of graphs, which reduces the problem to finding the coordinates of the vertices in two dimensions.

A popular approach is to define an energy function or a force-directed model with respect to vertex positions, and to iteratively compute a local minimum of the energy function. The positions of the vertices at the local minimum produce the final layout. This approach is generally simple and easy to extend to new energy functions. Various energy functions and force models have been studied [4–6, 10] and there exist several improvements to handle large graphs, most of them concentrating on a multi-scale paradigm. This involves laying out a coarser level of the graph first, and then taking advantage of this coarse layout to compute the vertex positions at a finer level (e.g., [15, 18]).

Spectral graph drawing approaches have become popular recently. We use the term spectral graph drawing to refer to any approach that produces a final layout using the spectral decomposition of some matrix derived from the vertex and edge sets. In this paper, we present a spectral graph drawing algorithm, SDE (Spectral Distance Embedding), in which we use the spectral decomposition of the graph theoretical distance matrix to produce the final layout of the vertices. In the final layout, the pair-wise Euclidean distances of the vertices approximate the graph theoretical distances. SDE consists of two main steps: (i) all-pairs shortest path computation, which takes O(|V||E|) time; (ii) spectral decomposition of the distance matrix, in which we find the optimal rank-d reconstruction to embed in d dimensions. The complexity of this step is O(d|V|^2)....
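The two steps described in the excerpt can be illustrated with a small sketch. This is not the authors' implementation: it assumes an unweighted graph (so breadth-first search gives the shortest paths), it assumes the decomposition is applied to the double-centered matrix of squared distances as in classical multidimensional scaling, and it swaps in plain power iteration with deflation in place of a library eigensolver. The names `bfs_distances` and `sde_layout` are hypothetical.

```python
from collections import deque
import math


def bfs_distances(adj, source):
    """Hop-count distances from source in an unweighted, connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sde_layout(adj, d=2, iters=200):
    """Embed a connected undirected graph in d dimensions (illustrative only)."""
    nodes = sorted(adj)
    n = len(nodes)

    # Step (i): all-pairs shortest paths, one BFS per vertex, O(|V||E|) overall.
    sq = []
    for u in nodes:
        dist = bfs_distances(adj, u)
        sq.append([float(dist[v]) ** 2 for v in nodes])

    # Assumption: double-center the squared distances (classical MDS style).
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]

    # Step (ii): d dominant eigenpairs via power iteration with deflation.
    # Power iteration finds the eigenvalue of largest magnitude; a negative
    # one (possible for non-Euclidean distances) yields a zero coordinate.
    coords = {u: [0.0] * d for u in nodes}
    for k in range(d):
        vec = [float(i + 1) for i in range(n)]  # fixed, non-constant start
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _ in range(iters):
            nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:  # remaining matrix is numerically zero
                break
            vec = [x / norm for x in nxt]
        b_vec = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        lam = sum(vec[i] * b_vec[i] for i in range(n))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i, u in enumerate(nodes):
            coords[u][k] = scale * vec[i]
        # Deflate so the next pass converges to the next eigenpair.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * vec[i] * vec[j]
    return coords


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1]}  # the path graph 0 - 1 - 2
    print(sde_layout(path, d=1))
```

On a three-vertex path the graph distances are exactly realizable on a line, so the one-dimensional layout places the end vertices one unit on either side of the middle vertex. The dense matrix-vector products here cost O(|V|^2) per iteration, which is in line with the O(d|V|^2) figure quoted above only if the number of iterations is treated as a constant.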
Website: www.cs.rpi.edu | Filesize: -
No of Page(s): 12
Download SDE: Graph Drawing Using Spectral Distance Embedding.pdf