-
Parallel Set Cover and Hypergraph Matching via Uniform Random Sampling
Authors:
Laxman Dhulipala,
Michael Dinitz,
Jakub Łącki,
Slobodan Mitrović
Abstract:
The SetCover problem has been extensively studied in many different models of computation, including parallel and distributed settings. From an approximation point of view, there are two standard guarantees: an $O(\log Δ)$-approximation (where $Δ$ is the maximum set size) and an $O(f)$-approximation (where $f$ is the maximum number of sets containing any given element).
In this paper, we introduce a new, surprisingly simple, model-independent approach to solving SetCover in unweighted graphs. We obtain multiple improved algorithms in the MPC and CRCW PRAM models. First, in the MPC model with sublinear space per machine, our algorithms can compute an $O(f)$-approximation to SetCover in $\hat{O}(\sqrt{\log Δ} + \log f)$ rounds, where we use the $\hat{O}(x)$ notation to suppress $\mathrm{poly} \log x$ and $\mathrm{poly} \log \log n$ terms, and an $O(\log Δ)$-approximation in $O(\log^{3/2} n)$ rounds. Moreover, in the PRAM model, we give an $O(f)$-approximate algorithm using linear work and $O(\log n)$ depth. All these bounds improve the existing round complexity/depth bounds by a $\log^{Ω(1)} n$ factor.
Moreover, our approach leads to many other new algorithms, including improved algorithms for the HypergraphMatching problem in the MPC model, as well as simpler SetCover algorithms that match the existing bounds.
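To make the two parameters in the abstract concrete, the following is a minimal sequential sketch, not the parallel algorithm from the paper. It computes $f$ and $Δ$ for a small instance and runs the classic textbook $O(f)$-approximation, which takes every set containing each still-uncovered element. The function names and the example instance are hypothetical, and elements are assumed to be sortable integers.

```python
from typing import Dict, List, Set, Tuple


def frequency_and_max_size(sets: Dict[str, Set[int]]) -> Tuple[int, int]:
    """Return (f, Delta): the maximum element frequency and the maximum set size."""
    freq: Dict[int, int] = {}
    for members in sets.values():
        for e in members:
            freq[e] = freq.get(e, 0) + 1
    f = max(freq.values(), default=0)
    delta = max((len(m) for m in sets.values()), default=0)
    return f, delta


def f_approx_set_cover(sets: Dict[str, Set[int]]) -> List[str]:
    """Textbook sequential f-approximation (an illustration, not the paper's method).

    For each still-uncovered element, add every set that contains it.
    The elements that trigger a step share no set pairwise, so an optimal
    cover needs a distinct set for each of them, while this routine adds
    at most f sets per such element.
    """
    containing: Dict[int, List[str]] = {}
    for name, members in sets.items():
        for e in members:
            containing.setdefault(e, []).append(name)

    covered: Set[int] = set()
    chosen: List[str] = []
    for e in sorted(containing):
        if e in covered:
            continue
        # e is uncovered, so none of its sets has been chosen yet.
        for name in containing[e]:
            chosen.append(name)
            covered |= sets[name]
    return chosen


if __name__ == "__main__":
    example = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}, "D": {1, 5}}
    print(frequency_and_max_size(example))
    print(f_approx_set_cover(example))
```

On this instance an optimal cover has two sets (A and C), so the sketch's answer is within the factor $f$ promised by the analysis in the docstring.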
Submitted 23 August, 2024;
originally announced August 2024.
-
Massively Parallel Minimum Spanning Tree in General Metric Spaces
Authors:
Amir Azarmehr,
Soheil Behnezhad,
Rajesh Jayaram,
Jakub Łącki,
Vahab Mirrokni,
Peilin Zhong
Abstract:
We study the minimum spanning tree (MST) problem in the massively parallel computation (MPC) model. Our focus is particularly on the *strictly sublinear* regime of MPC where the space per machine is $O(n^δ)$. Here $n$ is the number of vertices and constant $δ\in (0, 1)$ can be made arbitrarily small. The MST problem admits a simple and folklore $O(\log n)$-round algorithm in the MPC model. When the weights can be arbitrary, this matches a conditional lower bound of $Ω(\log n)$ which follows from the well-known 1vs2-Cycle conjecture. As such, much of the literature focuses on breaking the logarithmic barrier in more structured variants of the problem, such as when the vertices correspond to points in low- [ANOY14, STOC'14] or high-dimensional Euclidean spaces [JMNZ24, SODA'24].
In this work, we focus more generally on metric spaces. Namely, all pairwise weights are provided and guaranteed to satisfy the triangle inequality, but are otherwise unconstrained. We show that for any $\varepsilon > 0$, a $(1+\varepsilon)$-approximate MST can be found in $O(\log \frac{1}{\varepsilon} + \log \log n)$ rounds, which is the first $o(\log n)$-round algorithm for finding any constant approximation in this setting. Other than being applicable to more general weight functions, our algorithm also slightly improves the $O(\log \log n \cdot \log \log \log n)$ round-complexity of [JMNZ24, SODA'24] and significantly improves its approximation from a large constant to $1+\varepsilon$.
On the lower bound side, we prove that under the 1vs2-Cycle conjecture, $Ω(\log \frac{1}{\varepsilon})$ rounds are needed for finding a $(1+\varepsilon)$-approximate MST in general metrics. It is worth noting that while many existing lower bounds in the MPC model under the 1vs2-Cycle conjecture only hold against "component stable" algorithms, our lower bound applies to *all* algorithms.
Submitted 12 August, 2024;
originally announced August 2024.
-
Efficient Centroid-Linkage Clustering
Authors:
MohammadHossein Bateni,
Laxman Dhulipala,
Willem Fletcher,
Kishen N Gowda,
D Ellis Hershkowitz,
Rajesh Jayaram,
Jakub Łącki
Abstract:
We give an efficient algorithm for Centroid-Linkage Hierarchical Agglomerative Clustering (HAC), which computes a $c$-approximate clustering in roughly $n^{1+O(1/c^2)}$ time. We obtain our result by combining a new Centroid-Linkage HAC algorithm with a novel fully dynamic data structure for nearest neighbor search which works under adaptive updates.
We also evaluate our algorithm empirically. By leveraging a state-of-the-art nearest-neighbor search library, we obtain a fast and accurate Centroid-Linkage HAC algorithm. Compared to an existing state-of-the-art exact baseline, our implementation maintains the clustering quality while delivering up to a $36\times$ speedup due to performing fewer distance comparisons.
Submitted 7 June, 2024;
originally announced June 2024.
-
BYO: A Unified Framework for Benchmarking Large-Scale Graph Containers
Authors:
Brian Wheatman,
Xiaojun Dong,
Zheqi Shen,
Laxman Dhulipala,
Jakub Łącki,
Prashant Pandey,
Helen Xu
Abstract:
A fundamental building block in any graph algorithm is a graph container - a data structure used to represent the graph. Ideally, a graph container enables efficient access to the underlying graph, has low space usage, and supports updating the graph efficiently. In this paper, we conduct an extensive empirical evaluation of graph containers designed to support running algorithms on large graphs. To our knowledge, this is the first apples-to-apples comparison of graph containers rather than overall systems, which include confounding factors such as differences in algorithm implementations and infrastructure.
We measure the running time of 10 highly-optimized algorithms across over 20 different containers and 10 graphs. Somewhat surprisingly, we find that the average algorithm running time does not differ much across containers, especially those that support dynamic updates. Specifically, a simple container based on an off-the-shelf B-tree is only 1.22x slower on average than a highly optimized static one. Moreover, we observe that simplifying a graph-container Application Programming Interface (API) to only a few simple functions incurs a mere 1.16x slowdown compared to a complete API. Finally, we also measure batch-insert throughput in dynamic-graph containers for a full picture of their performance.
To perform the benchmarks, we introduce BYO, a unified framework that standardizes evaluations of graph-algorithm performance across different graph containers. BYO extends the Graph Based Benchmark Suite (Dhulipala et al. 18), a state-of-the-art graph algorithm benchmark, to easily plug into different dynamic graph containers and enable fair comparisons between them on a large suite of graph algorithms. While several graph algorithm benchmarks have been developed to date, to the best of our knowledge, BYO is the first system designed to benchmark graph containers.
Submitted 19 May, 2024;
originally announced May 2024.
-
Dynamic PageRank: Algorithms and Lower Bounds
Authors:
Rajesh Jayaram,
Jakub Łącki,
Slobodan Mitrović,
Krzysztof Onak,
Piotr Sankowski
Abstract:
We consider the PageRank problem in the dynamic setting, where the goal is to explicitly maintain an approximate PageRank vector $π\in \mathbb{R}^n$ for a graph under a sequence of edge insertions and deletions. Our main result is a complete characterization of the complexity of dynamic PageRank maintenance for both multiplicative and additive ($L_1$) approximations.
First, we establish matching lower and upper bounds for maintaining additive approximate PageRank in both incremental and decremental settings. In particular, we demonstrate that in the worst-case $(1/α)^{Θ(\log \log n)}$ update time is necessary and sufficient for this problem, where $α$ is the desired additive approximation. On the other hand, we demonstrate that the commonly employed ForwardPush approach performs substantially worse than this optimal runtime. Specifically, we show that ForwardPush requires $Ω(n^{1-δ})$ time per update on average, for any $δ> 0$, even in the incremental setting.
For multiplicative approximations, however, we demonstrate that the situation is significantly more challenging. Specifically, we prove that any algorithm that explicitly maintains a constant factor multiplicative approximation of the PageRank vector of a directed graph must have amortized update time $Ω(n^{1-δ})$, for any $δ> 0$, even in the incremental setting, thereby resolving a 13-year-old open question of Bahmani et al. (VLDB 2010). This sharply contrasts with the undirected setting, where we show that $\rm{poly}\ \log n$ update time is feasible, even in the fully dynamic setting under an oblivious adversary.
Submitted 16 May, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
It's Hard to HAC with Average Linkage!
Authors:
MohammadHossein Bateni,
Laxman Dhulipala,
Kishen N Gowda,
D Ellis Hershkowitz,
Rajesh Jayaram,
Jakub Łącki
Abstract:
Average linkage Hierarchical Agglomerative Clustering (HAC) is an extensively studied and applied method for hierarchical clustering. Recent applications to massive datasets have driven significant interest in near-linear-time and efficient parallel algorithms for average linkage HAC.
We provide hardness results that rule out such algorithms. On the sequential side, we establish a runtime lower bound of $n^{3/2-ε}$ on $n$-node graphs for sequential combinatorial algorithms under standard fine-grained complexity assumptions. This essentially matches the best-known running time for average linkage HAC. On the parallel side, we prove that average linkage HAC likely cannot be parallelized even on simple graphs by showing that it is CC-hard on trees of diameter $4$. On the possibility side, we demonstrate that average linkage HAC can be efficiently parallelized (i.e., it is in NC) on paths and can be solved in near-linear time when the height of the output cluster hierarchy is small.
Submitted 23 April, 2024;
originally announced April 2024.
-
Unleashing Graph Partitioning for Large-Scale Nearest Neighbor Search
Authors:
Lars Gottesbüren,
Laxman Dhulipala,
Rajesh Jayaram,
Jakub Lacki
Abstract:
We consider the fundamental problem of decomposing a large-scale approximate nearest neighbor search (ANNS) problem into smaller sub-problems. The goal is to partition the input points into neighborhood-preserving shards, so that the nearest neighbors of any point are contained in only a few shards. When a query arrives, a routing algorithm is used to identify the shards which should be searched for its nearest neighbors. This approach forms the backbone of distributed ANNS, where the dataset is so large that it must be split across multiple machines.
In this paper, we design simple and highly efficient routing methods, and prove strong theoretical guarantees on their performance. A crucial characteristic of our routing algorithms is that they are inherently modular, and can be used with any partitioning method. This addresses a key drawback of prior approaches, where the routing algorithms are inextricably linked to their associated partitioning method. In particular, our new routing methods enable the use of balanced graph partitioning, which is a high-quality partitioning method without a naturally associated routing algorithm. Thus, we provide the first methods for routing using balanced graph partitioning that are extremely fast to train, admit low latency, and achieve high recall. We provide a comprehensive evaluation of our full partitioning and routing pipeline on billion-scale datasets, where it outperforms existing scalable partitioning methods by significant margins, achieving up to 2.14x higher QPS at 90% recall$@10$ than the best competitor.
Submitted 4 March, 2024;
originally announced March 2024.
-
Open Problems in (Hyper)Graph Decomposition
Authors:
Deepak Ajwani,
Rob H. Bisseling,
Katrin Casel,
Ümit V. Çatalyürek,
Cédric Chevalier,
Florian Chudigiewitsch,
Marcelo Fonseca Faraj,
Michael Fellows,
Lars Gottesbüren,
Tobias Heuer,
George Karypis,
Kamer Kaya,
Jakub Lacki,
Johannes Langguth,
Xiaoye Sherry Li,
Ruben Mayer,
Johannes Meintrup,
Yosuke Mizutani,
François Pellegrini,
Fabrizio Petrini,
Frances Rosamond,
Ilya Safro,
Sebastian Schlag,
Christian Schulz,
Roohani Sharma
, et al. (4 additional authors not shown)
Abstract:
Large networks are useful in a wide range of applications. Sometimes problem instances are composed of billions of entities. Decomposing and analyzing these structures helps us gain new insights about our surroundings. Even if the final application concerns a different problem (such as traversal, finding paths, trees, and flows), decomposing large graphs is often an important subproblem for complexity reduction or parallelization. This report is a summary of discussions that happened at Dagstuhl seminar 23331 on "Recent Trends in Graph Decomposition" and presents currently open problems and future directions in the area of (hyper)graph decomposition.
Submitted 18 October, 2023;
originally announced October 2023.
-
TeraHAC: Hierarchical Agglomerative Clustering of Trillion-Edge Graphs
Authors:
Laxman Dhulipala,
Jason Lee,
Jakub Łącki,
Vahab Mirrokni
Abstract:
We introduce TeraHAC, a $(1+ε)$-approximate hierarchical agglomerative clustering (HAC) algorithm which scales to trillion-edge graphs. Our algorithm is based on a new approach to computing $(1+ε)$-approximate HAC, which is a novel combination of the nearest-neighbor chain algorithm and the notion of $(1+ε)$-approximate HAC. Our approach allows us to partition the graph among multiple machines and make significant progress in computing the clustering within each partition before any communication with other partitions is needed.
We evaluate TeraHAC on a number of real-world and synthetic graphs of up to 8 trillion edges. We show that TeraHAC requires over 100x fewer rounds compared to previously known approaches for computing HAC. It is up to 8.3x faster than SCC, the state-of-the-art distributed algorithm for hierarchical clustering, while achieving 1.16x higher quality. In fact, TeraHAC essentially retains the quality of the celebrated HAC algorithm while significantly improving the running time.
Submitted 11 June, 2024; v1 submitted 7 August, 2023;
originally announced August 2023.
-
Fully Dynamic Consistent $k$-Center Clustering
Authors:
Jakub Łącki,
Bernhard Haeupler,
Christoph Grunau,
Václav Rozhoň,
Rajesh Jayaram
Abstract:
We study the consistent k-center clustering problem. In this problem, the goal is to maintain a constant factor approximate $k$-center solution during a sequence of $n$ point insertions and deletions while minimizing the recourse, i.e., the number of changes made to the set of centers after each point insertion or deletion. Previous works by Lattanzi and Vassilvitskii [ICML '12] and Fichtenberger, Lattanzi, Norouzi-Fard, and Svensson [SODA '21] showed that in the incremental setting, where deletions are not allowed, one can obtain $k \cdot \textrm{polylog}(n) / n$ amortized recourse for both $k$-center and $k$-median, and demonstrated a matching lower bound. However, no algorithm for the fully dynamic setting achieves less than the trivial $O(k)$ changes per update, which can be obtained by simply reclustering the full dataset after every update.
In this work, we give the first algorithm for consistent $k$-center clustering in the fully dynamic setting, i.e., when both point insertions and deletions are allowed, that improves upon the trivial $O(k)$ recourse bound. Specifically, our algorithm maintains a constant factor approximate solution while ensuring worst-case constant recourse per update, which is optimal in the fully dynamic setting. Moreover, our algorithm is deterministic and is therefore correct even if an adaptive adversary chooses the insertions and deletions.
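The trivial baseline mentioned in the abstract, reclustering from scratch after every update, can be sketched as follows. This is an illustration only and not the paper's algorithm: it uses the classic farthest-first traversal (a known 2-approximation for static $k$-center) and measures recourse as the size of the symmetric difference between consecutive center sets, which is one possible convention. The function names and the sample points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_first_centers(points: Sequence[Point], k: int) -> List[Point]:
    """Static farthest-first traversal: start at points[0], then repeatedly
    add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=dist.__getitem__)
        if dist[i] == 0.0:
            break  # every remaining point coincides with a center
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers


def recourse(old: Sequence[Point], new: Sequence[Point]) -> int:
    """Number of centers that appear in exactly one of the two solutions."""
    return len(set(old) ^ set(new))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    before = farthest_first_centers(pts, k=2)
    after = farthest_first_centers(pts + [(100.0, 0.0)], k=2)
    print(before, after, recourse(before, after))
```

In this small run a single insertion replaces one center with another, so the recomputed solution differs in two entries of the symmetric difference even though only one point was added.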
Submitted 25 July, 2023;
originally announced July 2023.
-
Phase transitions in the Prisoner's Dilemma game on scale-free networks
Authors:
Jacek Miękisz,
Javad Mohamadichamgavi,
Jakub Łącki
Abstract:
We study stochastic dynamics of the Prisoner's Dilemma game on random Erdős-Rényi and Barabási-Albert networks with a cost of maintaining a link between interacting players. Stochastic simulations show that when the cost increases, the population of players located on the Barabási-Albert network undergoes a sharp transition from an ordered state, where almost all players cooperate, to a state in which both cooperators and defectors coexist. At the critical cost, the population oscillates in time between these two states. Such a situation is not present in the Erdős-Rényi network. We provide some heuristic analytical arguments for the phase transition and the value of the critical cost in the Barabási-Albert network.
Submitted 12 February, 2024; v1 submitted 6 April, 2023;
originally announced April 2023.
-
Adaptive Massively Parallel Connectivity in Optimal Space
Authors:
Rustam Latypov,
Jakub Łącki,
Yannic Maus,
Jara Uitto
Abstract:
We study the problem of finding connected components in the Adaptive Massively Parallel Computation (AMPC) model. We show that when we require the total space to be linear in the size of the input graph the problem can be solved in $O(\log^* n)$ rounds in forests (with high probability) and $2^{O(\log^* n)}$ expected rounds in general graphs. This improves upon an existing $O(\log \log_{m/n} n)$ round algorithm. For the case when the desired number of rounds is constant we show that both problems can be solved using $Θ(m + n \log^{(k)} n)$ total space in expectation (in each round), where $k$ is an arbitrarily large constant and $\log^{(k)}$ is the $k$-th iterate of the $\log_2$ function. This improves upon existing algorithms requiring $Ω(m + n \log n)$ total space.
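The two slowly growing functions used in these bounds can be illustrated with a short numeric sketch. It assumes base-2 logarithms, as in the abstract's definition of $\log^{(k)}$, and takes $\log^* n$ to be the number of times $\log_2$ must be applied before the value drops to at most 1. The helper names are hypothetical.

```python
import math


def iterated_log2(n: float, k: int) -> float:
    """Apply log2 to n exactly k times (the k-th iterate of log2).

    Assumes every intermediate value stays positive; otherwise
    math.log2 raises ValueError.
    """
    x = float(n)
    for _ in range(k):
        x = math.log2(x)
    return x


def log_star(n: float) -> int:
    """Number of log2 applications needed to bring n down to at most 1."""
    x = float(n)
    count = 0
    while x > 1.0:
        x = math.log2(x)
        count += 1
    return count


if __name__ == "__main__":
    n = 65536
    print(iterated_log2(n, 1), iterated_log2(n, 2), log_star(n))
```

Even for very large inputs `log_star` stays in the single digits, which gives a feel for how small the $O(\log^* n)$ and $n \log^{(k)} n$ overheads are.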
Submitted 14 April, 2023; v1 submitted 8 February, 2023;
originally announced February 2023.
-
Constant Approximation for Normalized Modularity and Associations Clustering
Authors:
Jakub Łącki,
Vahab Mirrokni,
Christian Sohler
Abstract:
We study the problem of graph clustering under a broad class of objectives in which the quality of a cluster is defined based on the ratio between the number of edges in the cluster, and the total weight of vertices in the cluster. We show that our definition is closely related to popular clustering measures, namely normalized associations, which is a dual of the normalized cut objective, and normalized modularity. We give a linear time constant-approximate algorithm for our objective, which implies the first constant-factor approximation algorithms for normalized modularity and normalized associations.
Submitted 29 December, 2022;
originally announced December 2022.
-
Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth
Authors:
Laxman Dhulipala,
David Eisenstat,
Jakub Łącki,
Vahab Mirrokni,
Jessica Shi
Abstract:
Obtaining scalable algorithms for hierarchical agglomerative clustering (HAC) is of significant interest due to the massive size of real-world datasets. At the same time, efficiently parallelizing HAC is difficult due to the seemingly sequential nature of the algorithm. In this paper, we address this issue and present ParHAC, the first efficient parallel HAC algorithm with sublinear depth for the widely-used average-linkage function. In particular, we provide a $(1+ε)$-approximation algorithm for this problem on $m$ edge graphs using $\tilde{O}(m)$ work and poly-logarithmic depth. Moreover, we show that obtaining similar bounds for exact average-linkage HAC is not possible under standard complexity-theoretic assumptions.
We complement our theoretical results with a comprehensive study of the ParHAC algorithm in terms of its scalability, performance, and quality, and compare with several state-of-the-art sequential and parallel baselines. On a broad set of large publicly-available real-world datasets, we find that ParHAC obtains a 50.1x speedup on average over the best sequential baseline, while achieving quality similar to the exact HAC algorithm. We also show that ParHAC can cluster one of the largest publicly available graph datasets with 124 billion edges in a little over three hours using a commodity multicore machine.
Submitted 23 June, 2022;
originally announced June 2022.
-
Hierarchical Clustering in Graph Streams: Single-Pass Algorithms and Space Lower Bounds
Authors:
Sepehr Assadi,
Vaggos Chatziafratis,
Jakub Łącki,
Vahab Mirrokni,
Chen Wang
Abstract:
The Hierarchical Clustering (HC) problem consists of building a hierarchy of clusters to represent a given dataset. Motivated by modern large-scale applications, we study the problem in the streaming model, in which the memory is heavily limited and only a single or very few passes over the input are allowed. Specifically, we investigate whether a good hierarchical clustering can be obtained, or at least whether we can approximately estimate the value of the optimal hierarchy. To measure the quality of a hierarchy, we use the HC minimization objective introduced by Dasgupta. Assuming that the input is an $n$-vertex weighted graph whose edges arrive in a stream, we derive the following results on space-vs-accuracy tradeoffs:
* With $O(n\cdot \text{polylog}\,{n})$ space, we develop a single-pass algorithm, whose approximation ratio matches the currently best offline algorithm.
* When the space is more limited, namely, $n^{1-o(1)}$, we prove that no algorithm can even estimate the value of optimum HC tree to within an $o(\frac{\log{n}}{\log\log{n}})$ factor, even when allowed $\text{polylog}{\,{n}}$ passes over the input.
* In the most stringent setting of $\text{polylog}\,{n}$ space, we rule out algorithms that can even distinguish between "highly"-vs-"poorly" clusterable graphs, namely, graphs that have an $n^{1/2-o(1)}$ factor gap between their HC objective value.
* Finally, we prove that any single-pass streaming algorithm that computes an optimal HC tree must store almost the entire input, even if allowed exponential time.
Our algorithmic results establish a general structural result that proves that cut sparsifiers of the input graph can preserve the cost of "balanced" HC trees to within a constant factor. Our lower bound results include a new streaming lower bound for a novel problem "One-vs-Many-Expanders", which can be of independent interest.
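As a point of reference for the objective used above, Dasgupta's cost of a hierarchy charges each edge its weight times the number of leaves under the lowest common ancestor of its endpoints. The following offline sketch evaluates that cost directly. It is not a streaming algorithm, and its tree encoding (nested tuples whose leaves are non-tuple vertex ids) and function name are assumptions made for illustration.

```python
from typing import Dict, Tuple


def dasgupta_cost(tree, edges: Dict[Tuple[int, int], float]) -> float:
    """Sum over edges (u, v) of w(u, v) times the number of leaves below lca(u, v).

    `tree` is a nested tuple; any non-tuple value is a leaf (a vertex id).
    Runs in O(m * number of internal nodes) time, which is fine for a sketch.
    """
    total = 0.0

    def visit(node):
        nonlocal total
        if not isinstance(node, tuple):
            return {node}
        child_leaves = [visit(child) for child in node]
        all_leaves = set().union(*child_leaves)
        # Record which child subtree each leaf belongs to.
        owner = {}
        for index, leaves in enumerate(child_leaves):
            for v in leaves:
                owner[v] = index
        # An edge has its lca here iff its endpoints lie in different children.
        for (u, v), w in edges.items():
            if u in owner and v in owner and owner[u] != owner[v]:
                total += w * len(all_leaves)
        return all_leaves

    visit(tree)
    return total


if __name__ == "__main__":
    path = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
    print(dasgupta_cost(((0, 1), (2, 3)), path))
    print(dasgupta_cost((((0, 1), 2), 3), path))
```

On the unit-weight path with four vertices, the balanced tree has lower cost than the caterpillar-shaped one, matching the intuition that heavy edges should be separated as low in the tree as possible.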
Submitted 15 June, 2022;
originally announced June 2022.
-
Optimal Decremental Connectivity in Non-Sparse Graphs
Authors:
Anders Aaman,
Adam Karczmarz,
Jakub Łącki,
Nikos Parotsidis,
Peter M. R. Rasmussen,
Mikkel Thorup
Abstract:
We present a dynamic algorithm for maintaining the connected and 2-edge-connected components in an undirected graph subject to edge deletions. The algorithm is Monte-Carlo randomized and processes any sequence of edge deletions in $O(m + n \operatorname{polylog} n)$ total time. Interspersed with the deletions, it can answer queries to whether any two given vertices currently belong to the same (2-edge-)connected component in constant time. Our result is based on a general Monte-Carlo randomized reduction from decremental $c$-edge-connectivity to a variant of fully-dynamic $c$-edge-connectivity on a sparse graph. While being Monte-Carlo, our reduction supports a certain final self-check that can be used in Las Vegas algorithms for static problems such as Unique Perfect Matching.
For non-sparse graphs with $Ω(n \operatorname{polylog} n)$ edges, our connectivity and $2$-edge-connectivity algorithms handle all deletions in optimal linear total time, using existing algorithms for the respective fully-dynamic problems. This improves upon an $O(m \log (n^2 / m) + n \operatorname{polylog} n)$-time algorithm of Thorup [J.Alg. 1999], which runs in linear time only for graphs with $Ω(n^2)$ edges.
Our constant amortized cost for edge deletions in decremental connectivity in non-sparse graphs should be contrasted with an $Ω(\log n/\log\log n)$ worst-case time lower bound in the decremental setting [Alstrup, Thore Husfeldt, FOCS'98] as well as an $Ω(\log n)$ amortized time lower-bound in the fully-dynamic setting [Patrascu and Demaine STOC'04].
Submitted 19 November, 2021; v1 submitted 17 November, 2021;
originally announced November 2021.
-
Scalable Community Detection via Parallel Correlation Clustering
Authors:
Jessica Shi,
Laxman Dhulipala,
David Eisenstat,
Jakub Łącki,
Vahab Mirrokni
Abstract:
Graph clustering and community detection are central problems in modern data mining. The increasing need for analyzing billion-scale data calls for faster and more scalable algorithms for these problems. There are certain trade-offs between the quality and speed of such clustering algorithms. In this paper, we design scalable algorithms that achieve high quality when evaluated based on ground truth. We develop a generalized sequential and shared-memory parallel framework based on the LambdaCC objective (introduced by Veldt et al.), which encompasses modularity and correlation clustering. Our framework consists of highly-optimized implementations that scale to large data sets of billions of edges and that obtain high-quality clusters compared to ground-truth data, on both unweighted and weighted graphs. Our empirical evaluation shows that this framework improves the state-of-the-art trade-offs between speed and quality of scalable community detection. For example, on a 30-core machine with two-way hyper-threading, our implementations achieve orders of magnitude speedups over other correlation clustering baselines, and up to 28.44x speedups over our own sequential baselines while maintaining or improving quality.
Submitted 27 July, 2021;
originally announced August 2021.
-
Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time
Authors:
Laxman Dhulipala,
David Eisenstat,
Jakub Łącki,
Vahab Mirrokni,
Jessica Shi
Abstract:
We study the widely used hierarchical agglomerative clustering (HAC) algorithm on edge-weighted graphs. We define an algorithmic framework for hierarchical agglomerative graph clustering that provides the first efficient $\tilde{O}(m)$ time exact algorithms for classic linkage measures, such as complete- and WPGMA-linkage, as well as other measures. Furthermore, for average-linkage, arguably the most popular variant of HAC, we provide an algorithm that runs in $\tilde{O}(n\sqrt{m})$ time. For this variant, this is the first exact algorithm that runs in subquadratic time, as long as $m=n^{2-ε}$ for some constant $ε> 0$. We complement this result with a simple $ε$-close approximation algorithm for average-linkage in our framework that runs in $\tilde{O}(m)$ time. As an application of our algorithms, we consider clustering points in a metric space by first using $k$-NN to generate a graph from the point set, and then running our algorithms on the resulting weighted graph. We validate the performance of our algorithms on publicly available datasets, and show that our approach can speed up clustering of point datasets by a factor of 20.7--76.5x.
Submitted 10 June, 2021;
originally announced June 2021.
-
Parallel Graph Algorithms in Constant Adaptive Rounds: Theory meets Practice
Authors:
Soheil Behnezhad,
Laxman Dhulipala,
Hossein Esfandiari,
Jakub Łącki,
Vahab Mirrokni,
Warren Schudy
Abstract:
We study fundamental graph problems such as graph connectivity, minimum spanning forest (MSF), and approximate maximum (weight) matching in a distributed setting. In particular, we focus on the Adaptive Massively Parallel Computation (AMPC) model, which is a theoretical model that captures MapReduce-like computation augmented with a distributed hash table.
We show the first AMPC algorithms for all of the studied problems that run in a constant number of rounds and use only $O(n^ε)$ space per machine, where $0 < ε< 1$. Our results improve both upon the previous results in the AMPC model, as well as the best-known results in the MPC model, which is the theoretical model underpinning many popular distributed computation frameworks, such as MapReduce, Hadoop, Beam, Pregel and Giraph.
Finally, we provide an empirical comparison of the algorithms in the MPC and AMPC models in a fault-tolerant distributed computation environment. We empirically evaluate our algorithms on a set of large real-world graphs and show that our AMPC algorithms can achieve improvements in both running time and round-complexity over optimized MPC baselines.
Submitted 24 September, 2020;
originally announced September 2020.
-
Near-Optimal Decremental Hopsets with Applications
Authors:
Jakub Łącki,
Yasamin Nazari
Abstract:
Given a weighted undirected graph $G=(V,E,w)$, a hopset $H$ of hopbound $β$ and stretch $(1+ε)$ is a set of edges such that for any pair of nodes $u, v \in V$, there is a path in $G \cup H$ of at most $β$ hops, whose length is within a $(1+ε)$ factor from the distance between $u$ and $v$ in $G$.
We show the first efficient decremental algorithm for maintaining hopsets with a polylogarithmic hopbound. The update time of our algorithm matches the best known static algorithm up to polylogarithmic factors. All the previous decremental hopset constructions had a superpolylogarithmic (but subpolynomial) hopbound of $2^{\log^{Ω(1)} n}$ [Bernstein, FOCS'09; HKN, FOCS'14; Chechik, FOCS'18].
By applying our decremental hopset construction, we get improved or near optimal bounds for several distance problems. Most importantly, we show how to decrementally maintain $(2k-1)(1+ε)$-approximate all-pairs shortest paths (for any constant $k \geq 2$), in $\tilde{O}(n^{1/k})$ amortized update time and $O(k)$ query time. This improves (by a polynomial factor) over the update-time of the best previously known decremental algorithm in the constant query time regime. Moreover, it improves over the result of [Chechik, FOCS'18] that has a query time of $O(\log \log(nW))$, where $W$ is the aspect ratio, and the amortized update time is $n^{1/k}\cdot(\frac{1}{ε})^{\tilde{O}(\sqrt{\log n})}$. For sparse graphs our construction nearly matches the best known static running time / query time tradeoff.
We also obtain near-optimal bounds for maintaining approximate multi-source shortest paths and distance sketches, and get improved bounds for approximate single-source shortest paths. Our algorithms are randomized and our bounds hold with high probability against an oblivious adversary.
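The hopset definition in this abstract can be made concrete with a small brute-force checker. The sketch below is only an illustration of the definition, not the decremental algorithm from the paper; the function names `hop_limited_dist` and `is_hopset` and the edge-list representation `(u, v, w)` are assumptions introduced here.

```python
import math

def hop_limited_dist(n, edges, src, max_hops):
    """Shortest distances from src using paths of at most max_hops edges.

    Bellman-Ford style: after k rounds, dist[v] is the length of the
    shortest path from src to v that uses at most k edges.
    """
    dist = [math.inf] * n
    dist[src] = 0.0
    for _ in range(max_hops):
        nxt = dist[:]
        for u, v, w in edges:  # undirected: relax both directions
            if dist[u] + w < nxt[v]:
                nxt[v] = dist[u] + w
            if dist[v] + w < nxt[u]:
                nxt[u] = dist[v] + w
        if nxt == dist:  # no improvement, further rounds change nothing
            break
        dist = nxt
    return dist

def is_hopset(n, graph_edges, hopset_edges, beta, eps):
    """Check the definition: for every connected pair (s, t), some path in
    G ∪ H with at most beta hops has length <= (1 + eps) * dist_G(s, t)."""
    combined = graph_edges + hopset_edges
    for s in range(n):
        exact = hop_limited_dist(n, graph_edges, s, n - 1)  # true distances in G
        bounded = hop_limited_dist(n, combined, s, beta)    # beta-hop distances in G ∪ H
        for t in range(n):
            if exact[t] < math.inf and bounded[t] > (1 + eps) * exact[t] + 1e-12:
                return False
    return True
```

For example, on the unit-weight path 0-1-2-3, adding the single shortcut edge `(0, 3, 3.0)` is enough for hopbound 2 with exact distances, whereas the empty set is not.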
Submitted 4 August, 2022; v1 submitted 17 September, 2020;
originally announced September 2020.
-
Faster DBSCAN via subsampled similarity queries
Authors:
Heinrich Jiang,
Jennifer Jang,
Jakub Łącki
Abstract:
DBSCAN is a popular density-based clustering algorithm. It computes the $ε$-neighborhood graph of a dataset and uses the connected components of the high-degree nodes to decide the clusters. However, the full neighborhood graph may be too costly to compute with a worst-case complexity of $O(n^2)$. In this paper, we propose a simple variant called SNG-DBSCAN, which clusters based on a subsampled $ε$-neighborhood graph, only requires access to similarity queries for pairs of points and in particular avoids any complex data structures which need the embeddings of the data points themselves. The runtime of the procedure is $O(sn^2)$, where $s$ is the sampling rate. We show under some natural theoretical assumptions that $s \approx \log n/n$ is sufficient for statistical cluster recovery guarantees leading to an $O(n\log n)$ complexity. We provide an extensive experimental analysis showing that on large datasets, one can subsample as little as $0.1\%$ of the neighborhood graph, leading to as much as over 200x speedup and 250x reduction in RAM consumption compared to scikit-learn's implementation of DBSCAN, while still maintaining competitive clustering performance.
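The idea of clustering on a subsampled $ε$-neighborhood graph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sng_dbscan`, the per-point sampling scheme (each point queries roughly $s \cdot n$ random other points), the unscaled `min_pts` core threshold, and the border-point rule are all choices made here for the example. It uses only pairwise distance queries through the caller-supplied `dist` function.

```python
import math
import random

def sng_dbscan(points, dist, eps, min_pts, s, seed=0):
    """Cluster using a subsampled eps-neighborhood graph.

    Returns a list of labels; -1 marks noise. Only pairwise queries
    dist(p, q) are used, never the coordinates themselves.
    """
    rng = random.Random(seed)
    n = len(points)
    k = min(n - 1, math.ceil(s * n))  # similarity queries per point (assumed scheme)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        # Draw k + 1 candidates so that roughly k remain after skipping i itself.
        for j in rng.sample(range(n), min(n, k + 1)):
            if j != i and dist(points[i], points[j]) <= eps:
                neighbors[i].add(j)
                neighbors[j].add(i)

    # Core points: enough neighbors in the subsampled graph.
    core = [len(neighbors[i]) >= min_pts for i in range(n)]

    # Union-find over edges between core points.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        if core[i]:
            for j in neighbors[i]:
                if core[j]:
                    parent[find(i)] = find(j)

    labels = [-1] * n
    for i in range(n):
        if core[i]:
            labels[i] = find(i)
        else:
            # Border point: join the cluster of any sampled core neighbor.
            for j in neighbors[i]:
                if core[j]:
                    labels[i] = find(j)
                    break
    return labels
```

With `s = 1.0` the sketch examines every pair and behaves like DBSCAN on the full neighborhood graph; smaller `s` trades queries for accuracy.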
Submitted 21 October, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Fully Dynamic Matching: Beating 2-Approximation in $Δ^ε$ Update Time
Authors:
Soheil Behnezhad,
Jakub Łącki,
Vahab Mirrokni
Abstract:
In fully dynamic graphs, we know how to maintain a 2-approximation of maximum matching extremely fast, that is, in polylogarithmic update time or better. In a sharp contrast and despite extensive studies, all known algorithms that maintain a $2-Ω(1)$ approximate matching are much slower. Understanding this gap and, in particular, determining the best possible update time for algorithms providing a better-than-2 approximate matching is a major open question.
In this paper, we show that for any constant $ε> 0$, there is a randomized algorithm that with high probability maintains a $2-Ω(1)$ approximate maximum matching of a fully-dynamic general graph in worst-case update time $O(Δ^ε+\text{polylog } n)$, where $Δ$ is the maximum degree.
Previously, the fastest fully dynamic matching algorithm providing a better-than-2 approximation had $O(m^{1/4})$ update-time [Bernstein and Stein, SODA 2016]. A faster algorithm with update-time $O(n^ε)$ was known, but worked only for maintaining the size (and not the edges) of the matching in bipartite graphs [Bhattacharya, Henzinger, and Nanongkai, STOC 2016].
Submitted 5 November, 2019;
originally announced November 2019.
-
Near-Optimal Massively Parallel Graph Connectivity
Authors:
Soheil Behnezhad,
Laxman Dhulipala,
Hossein Esfandiari,
Jakub Łącki,
Vahab Mirrokni
Abstract:
Identifying the connected components of a graph, apart from being a fundamental problem with countless applications, is a key primitive for many other algorithms. In this paper, we consider this problem in parallel settings. Particularly, we focus on the Massively Parallel Computations (MPC) model, which is the standard theoretical model for modern parallel frameworks such as MapReduce, Hadoop, or Spark. We consider the truly sublinear regime of MPC for graph problems where the space per machine is $n^δ$ for some desirably small constant $δ\in (0, 1)$.
We present an algorithm that for graphs with diameter $D$ in the wide range $[\log^ε n, n]$, takes $O(\log D)$ rounds to identify the connected components and takes $O(\log \log n)$ rounds for all other graphs. The algorithm is randomized, succeeds with high probability, does not require prior knowledge of $D$, and uses an optimal total space of $O(m)$. We complement this by showing a conditional lower-bound based on the widely believed TwoCycle conjecture that $Ω(\log D)$ rounds are indeed necessary in this setting.
Studying parallel connectivity algorithms received a resurgence of interest after the pioneering work of Andoni et al. [FOCS 2018] who presented an algorithm with $O(\log D \cdot \log \log n)$ round-complexity. Our algorithm improves this result for the whole range of values of $D$ and almost settles the problem due to the conditional lower-bound.
Additionally, we show that with minimal adjustments, our algorithm can also be implemented in a variant of the (CRCW) PRAM in asymptotically the same number of rounds.
Submitted 11 March, 2020; v1 submitted 11 October, 2019;
originally announced October 2019.
-
Walking Randomly, Massively, and Efficiently
Authors:
Jakub Łącki,
Slobodan Mitrović,
Krzysztof Onak,
Piotr Sankowski
Abstract:
We introduce a set of techniques that allow for efficiently generating many independent random walks in the Massive Parallel Computation (MPC) model with space per machine strongly sublinear in the number of vertices. In this space-per-machine regime, many natural approaches to graph problems struggle to overcome the $Θ(\log n)$ MPC round complexity barrier. Our techniques enable breaking this barrier for PageRank---one of the most important applications of random walks---even in more challenging directed graphs, and for approximate bipartiteness and expansion testing.
In the undirected case, we start our random walks from the stationary distribution, which implies that we approximately know the empirical distribution of their next steps. This allows for preparing continuations of random walks in advance and applying a doubling approach. As a result we can generate multiple random walks of length $l$ in $Θ(\log l)$ rounds on MPC. Moreover, we show that under the popular 1-vs.-2-Cycles conjecture, this round complexity is asymptotically tight.
For directed graphs, our approach stems from our treatment of the PageRank Markov chain. We first compute the PageRank for the undirected version of the input graph and then slowly transition towards the directed case, considering convex combinations of the transition matrices in the process.
For PageRank, we achieve the following round complexities for damping factor equal to $1 - ε$:
* in $O(\log \log n + \log 1 / ε)$ rounds for undirected graphs (with $\tilde O(m / ε^2)$ total space),
* in $\tilde O(\log^2 \log n + \log^2 1/ε)$ rounds for directed graphs (with $\tilde O((m+n^{1+o(1)}) / poly\, ε)$ total space).
Submitted 5 November, 2019; v1 submitted 11 July, 2019;
originally announced July 2019.
-
Reliable Hubs for Partially-Dynamic All-Pairs Shortest Paths in Directed Graphs
Authors:
Adam Karczmarz,
Jakub Łącki
Abstract:
We give new partially-dynamic algorithms for the all-pairs shortest paths problem in weighted directed graphs. Most importantly, we give a new deterministic incremental algorithm for the problem that handles updates in $\widetilde{O}(mn^{4/3}\log{W}/ε)$ total time (where the edge weights are from $[1,W]$) and explicitly maintains a $(1+ε)$-approximate distance matrix. For a fixed $ε>0$, this is the first deterministic partially dynamic algorithm for all-pairs shortest paths in directed graphs, whose update time is $o(n^2)$ regardless of the number of edges. Furthermore, we also show how to improve the state-of-the-art partially dynamic randomized algorithms for all-pairs shortest paths [Baswana et al. STOC'02, Bernstein STOC'13] from Monte Carlo randomized to Las Vegas randomized without increasing the running time bounds (with respect to the $\widetilde{O}(\cdot)$ notation).
Our results are obtained by giving new algorithms for the problem of dynamically maintaining hubs, that is a set of $\widetilde{O}(n/d)$ vertices which hit a shortest path between each pair of vertices, provided it has hop-length $Ω(d)$. We give new subquadratic deterministic and Las Vegas algorithms for maintenance of hubs under either edge insertions or deletions.
Submitted 4 July, 2019;
originally announced July 2019.
-
Massively Parallel Computation via Remote Memory Access
Authors:
Soheil Behnezhad,
Laxman Dhulipala,
Hossein Esfandiari,
Jakub Łącki,
Warren Schudy,
Vahab Mirrokni
Abstract:
We introduce the Adaptive Massively Parallel Computation (AMPC) model, which is an extension of the Massively Parallel Computation (MPC) model. At a high level, the AMPC model strengthens the MPC model by storing all messages sent within a round in a distributed data store. In the following round, all machines are provided with random read access to the data store, subject to the same constraints on the total amount of communication as in the MPC model. Our model is inspired by the previous empirical studies of distributed graph algorithms using MapReduce and a distributed hash table service.
This extension allows us to give new graph algorithms with much lower round complexities compared to the best known solutions in the MPC model. In particular, in the AMPC model we show how to solve maximal independent set in $O(1)$ rounds and connectivity/minimum spanning tree in $O(\log\log_{m/n} n)$ rounds both using $O(n^δ)$ space per machine for constant $δ< 1$. In the same memory regime for MPC, the best known algorithms for these problems require polylog $n$ rounds. Our results imply that the 2-Cycle conjecture, which is widely believed to hold in the MPC model, does not hold in the AMPC model.
Submitted 18 May, 2019;
originally announced May 2019.
-
Connected Components at Scale via Local Contractions
Authors:
Jakub Łącki,
Vahab Mirrokni,
Michał Włodarczyk
Abstract:
As a fundamental tool in hierarchical graph clustering, computing connected components has been a central problem in large-scale data mining. While many known algorithms have been developed for this problem, they are either not scalable in practice or lack strong theoretical guarantees on the parallel running time, that is, the number of communication rounds. So far, the best proven guarantee is $O(\log n)$, which matches the running time in the PRAM model.
In this paper, we aim to design a distributed algorithm for this problem that works well in theory and practice. In particular, we present a simple algorithm based on contractions and provide a scalable implementation of it in MapReduce. On the theoretical side, in addition to showing $O(\log n)$ convergence for all graphs, we prove an $O(\log \log n)$ parallel running time with high probability for a certain class of random graphs. We work in the MPC model that captures popular parallel computing frameworks, such as MapReduce, Hadoop or Spark.
On the practical side, we show that our algorithm outperforms the state-of-the-art MapReduce algorithms. To confirm its scalability, we report empirical results on graphs with several trillions of edges.
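To give a feel for what a contraction-based connectivity algorithm looks like, here is a sequential simulation of one generic scheme: in each round every vertex points to the lowest-priority vertex in its closed neighborhood, pointer trees are contracted to their roots, and the process repeats until no edges remain. This is a sketch of the general technique under assumptions made here (random priorities redrawn each round, the name `contraction_components`); the abstract does not specify the paper's exact contraction rule, and the code makes no claim about round bounds.

```python
import random

def contraction_components(n, edges, seed=0):
    """Label connected components by repeated local contractions.

    Returns (label, rounds): label[v] is a representative vertex id of
    v's component, rounds is the number of contraction rounds performed.
    """
    rng = random.Random(seed)
    label = list(range(n))  # current supervertex of each original vertex
    live = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    rounds = 0
    while live:
        rounds += 1
        nodes = {x for e in live for x in e}
        prio = {x: rng.random() for x in nodes}  # fresh random priorities

        # Each supervertex points to the minimum-priority vertex
        # in its closed neighborhood (itself included).
        target = {x: x for x in nodes}
        for u, v in live:
            if prio[v] < prio[target[u]]:
                target[u] = v
            if prio[u] < prio[target[v]]:
                target[v] = u

        # Priorities strictly decrease along pointers, so chains end at a root.
        def root(x):
            while target[x] != x:
                x = target[x]
            return x

        rep = {x: root(x) for x in nodes}

        # Contract: relabel vertices and edges, dropping self-loops and duplicates.
        label = [rep.get(l, l) for l in label]
        live = {(min(rep[u], rep[v]), max(rep[u], rep[v]))
                for u, v in live if rep[u] != rep[v]}
    return label, rounds
```

Each round makes progress because every neighbor of the globally lowest-priority vertex points to it, so at least one merge happens while edges remain.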
Submitted 27 July, 2018;
originally announced July 2018.
-
Decremental SPQR-trees for Planar Graphs
Authors:
Jacob Holm,
Giuseppe F. Italiano,
Adam Karczmarz,
Jakub Łącki,
Eva Rotenberg
Abstract:
We present a decremental data structure for maintaining the SPQR-tree of a planar graph subject to edge contractions and deletions. The update time, amortized over $Ω(n)$ operations, is $O(\log^2 n)$.
Via SPQR-trees, we give a decremental data structure for maintaining $3$-vertex connectivity in planar graphs. It answers queries in $O(1)$ time and processes edge deletions and contractions in $O(\log^2 n)$ amortized time. This is an exponential improvement over the previous best bound of $O(\sqrt{n}\,)$ that has stood for over 20 years. In addition, the previous data structures only supported edge deletions.
Submitted 28 June, 2018;
originally announced June 2018.
-
Round Compression for Parallel Matching Algorithms
Authors:
Artur Czumaj,
Jakub Łącki,
Aleksander Mądry,
Slobodan Mitrović,
Krzysztof Onak,
Piotr Sankowski
Abstract:
For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is: can we leverage this additional power to obtain even faster parallel algorithms?
A prominent example here is the maximum matching problem---one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in $O(\log{n})$ rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. showed that if each machine has $n^{1+Ω(1)}$ memory, this problem can also be solved $2$-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow-up work, seem, though, to get stuck in a fundamental way at roughly $O(\log{n})$ rounds once we enter the near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power.
In this paper, we finally refute that perplexing possibility. That is, we break the above $O(\log n)$ round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is almost exponential: we are able to deliver a $(2+ε)$-approximation to maximum matching, for any fixed constant $ε>0$, in $O((\log \log n)^2)$ rounds.
Submitted 1 February, 2018; v1 submitted 11 July, 2017;
originally announced July 2017.
-
Contracting a Planar Graph Efficiently
Authors:
Jacob Holm,
Giuseppe F. Italiano,
Adam Karczmarz,
Jakub Łącki,
Eva Rotenberg,
Piotr Sankowski
Abstract:
We present a data structure that can maintain a simple planar graph under edge contractions in linear total time. The data structure supports adjacency queries and provides access to neighbor lists in $O(1)$ time. Moreover, it can report all the arising self-loops and parallel edges.
By applying the data structure, we can achieve optimal running times for decremental bridge detection, 2-edge connectivity, maximal 3-edge connected components, and the problem of finding a unique perfect matching for a static planar graph. Furthermore, we improve the running times of algorithms for several planar graph problems, including decremental 2-vertex and 3-edge connectivity, and we show that using our data structure in a black-box manner, one obtains conceptually simple optimal algorithms for computing MST and 5-coloring in planar graphs.
Submitted 30 June, 2017;
originally announced June 2017.
-
Decremental Single-Source Reachability in Planar Digraphs
Authors:
Giuseppe F. Italiano,
Adam Karczmarz,
Jakub Łącki,
Piotr Sankowski
Abstract:
In this paper we show a new algorithm for the decremental single-source reachability problem in directed planar graphs. It processes any sequence of edge deletions in $O(n\log^2{n}\log\log{n})$ total time and explicitly maintains the set of vertices reachable from a fixed source vertex. Hence, if all edges are eventually deleted, the amortized time of processing each edge deletion is only $O(\log^2 n \log \log n)$, which improves upon a previously known $O(\sqrt{n})$ solution. We also show an algorithm for decremental maintenance of strongly connected components in directed planar graphs with the same total update time. These results constitute the first almost optimal (up to polylogarithmic factors) algorithms for both problems.
To the best of our knowledge, these are the first dynamic algorithms with polylogarithmic update times on general directed planar graphs for non-trivial reachability-type problems, for which only polynomial bounds are known in general graphs.
Submitted 31 May, 2017;
originally announced May 2017.
-
Optimal Dynamic Strings
Authors:
Paweł Gawrychowski,
Adam Karczmarz,
Tomasz Kociumaka,
Jakub Łącki,
Piotr Sankowski
Abstract:
In this paper we study the fundamental problem of maintaining a dynamic collection of strings under the following operations: concat - concatenates two strings, split - splits a string into two at a given position, compare - finds the lexicographical order (less, equal, greater) between two strings, LCP - calculates the longest common prefix of two strings. We present an efficient data structure for this problem, where an update requires only $O(\log n)$ worst-case time with high probability, with $n$ being the total length of all strings in the collection, and a query takes constant worst-case time. On the lower bound side, we prove that even if the only possible query is checking equality of two strings, either updates or queries take amortized $Ω(\log n)$ time; hence our implementation is optimal.
Such operations can be used as a basic building block to solve other string problems. We provide two examples. First, we can augment our data structure to provide pattern matching queries that may locate occurrences of a specified pattern $p$ in the strings in our collection in optimal $O(|p|)$ time, at the expense of increasing update time to $O(\log^2 n)$. Second, we show how to maintain a history of an edited text, processing updates in $O(\log t \log \log t)$ time, where $t$ is the number of edits, and how to support pattern matching queries against the whole history in $O(|p| \log t \log \log t)$ time.
Finally, we note that our data structure can be applied to test dynamic tree isomorphism and to compare strings generated by dynamic straight-line grammars.
Submitted 8 April, 2016; v1 submitted 9 November, 2015;
originally announced November 2015.
-
Algorithmic Complexity of Power Law Networks
Authors:
Paweł Brach,
Marek Cygan,
Jakub Łącki,
Piotr Sankowski
Abstract:
It was experimentally observed that the majority of real-world networks follow a power law degree distribution. The aim of this paper is to study the algorithmic complexity of such "typical" networks. The contribution of this work is twofold.
First, we define a deterministic condition for checking whether a graph has a power law degree distribution and experimentally validate it on real-world networks. This definition allows us to derive interesting properties of power law networks. We observe that for exponents of the degree distribution in the range $[1,2]$ such networks exhibit the double power law phenomenon that was observed for several real-world networks. Our observation indicates that this phenomenon could be explained by purely graph-theoretical properties.
The second aim of our work is to give a novel theoretical explanation why many algorithms run faster on real-world data than what is predicted by algorithmic worst-case analysis. We show how to exploit the power law degree distribution to design faster algorithms for a number of classical P-time problems including transitive closure, maximum matching, determinant, PageRank and matrix inverse. Moreover, we deal with the problems of counting triangles and finding maximum clique. Previously, it has been only shown that these problems can be solved very efficiently on power law graphs when these graphs are random, e.g., drawn at random from some distribution. However, it is unclear how to relate such a theoretical analysis to real-world graphs, which are fixed. Instead of that, we show that the randomness assumption can be replaced with a simple condition on the degrees of adjacent vertices, which can be used to obtain similar results. As a result, in some range of power law exponents, we are able to solve the maximum clique problem in polynomial time, although in general power law networks the problem is NP-complete.
Submitted 9 July, 2015;
originally announced July 2015.
-
Fast and simple connectivity in graph timelines
Authors:
Adam Karczmarz,
Jakub Łącki
Abstract:
In this paper we study the problem of answering connectivity queries about a \emph{graph timeline}. A graph timeline is a sequence of undirected graphs $G_1,\ldots,G_t$ on a common set of vertices of size $n$ such that each graph is obtained from the previous one by an addition or a deletion of a single edge. We present data structures, which preprocess the timeline and can answer the following queries:
- forall$(u,v,a,b)$ -- does the path $u\to v$ exist in each of $G_a,\ldots,G_b$?
- exists$(u,v,a,b)$ -- does the path $u\to v$ exist in any of $G_a,\ldots,G_b$?
- forall2$(u,v,a,b)$ -- do there exist two edge-disjoint paths connecting $u$ and $v$ in each of $G_a,\ldots,G_b$?
We show data structures that can answer forall and forall2 queries in $O(\log n)$ time after preprocessing in $O(m+t\log n)$ time. Here by $m$ we denote the number of edges that remain unchanged in each graph of the timeline. For the case of exists queries, we show how to extend an existing data structure to obtain a preprocessing/query trade-off of $\langle O(m+\min(nt, t^{2-α})), O(t^α)\rangle$ and show a matching conditional lower bound.
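To make the query semantics concrete, here is a minimal brute-force sketch of the forall and exists queries. It is not the data structure from the paper: it simply stores every graph of the timeline and runs a breadth-first search per graph, so it is far slower than the stated bounds. The function names, the 0-indexed vertices, and the `("add" | "del", u, v)` update format are assumptions made for illustration.

```python
from collections import deque

def build_timeline(initial_edges, updates):
    """Return edge sets for G_1..G_t: G_1 is the initial graph and each
    update adds or deletes a single undirected edge (hypothetical format)."""
    current = {(min(a, b), max(a, b)) for a, b in initial_edges}
    graphs = [set(current)]
    for op, a, b in updates:
        edge = (min(a, b), max(a, b))
        if op == "add":
            current.add(edge)
        else:
            current.discard(edge)
        graphs.append(set(current))
    return graphs

def connected(n, edges, u, v):
    """Breadth-first search: is v reachable from u in this undirected graph?"""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen = [False] * n
    seen[u] = True
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj[x]:
            if not seen[y]:
                seen[y] = True
                queue.append(y)
    return False

def forall(graphs, n, u, v, a, b):
    """True if u and v are connected in every one of G_a..G_b (1-indexed)."""
    return all(connected(n, graphs[i], u, v) for i in range(a - 1, b))

def exists(graphs, n, u, v, a, b):
    """True if u and v are connected in at least one of G_a..G_b (1-indexed)."""
    return any(connected(n, graphs[i], u, v) for i in range(a - 1, b))

# Timeline on 3 vertices: G_1 = {0-1}, G_2 = {0-1, 1-2}, G_3 = {1-2}.
timeline = build_timeline([(0, 1)], [("add", 1, 2), ("del", 0, 1)])
```

A baseline like this is mainly useful as a reference for testing a faster implementation against small timelines.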
Submitted 9 June, 2015;
originally announced June 2015.
-
Optimal decremental connectivity in planar graphs
Authors:
Jakub Łącki,
Piotr Sankowski
Abstract:
We show an algorithm for dynamic maintenance of connectivity information in an undirected planar graph subject to edge deletions. Our algorithm can answer connectivity queries of the form `Are vertices $u$ and $v$ connected by a path?' in constant time. The queries can be intermixed with any sequence of edge deletions, and the algorithm handles all updates in $O(n)$ total time. This result improves over the previously known $O(n \log n)$ time algorithm.
Submitted 25 September, 2014;
originally announced September 2014.
-
The Power of Dynamic Distance Oracles: Efficient Dynamic Algorithms for the Steiner Tree
Authors:
Jakub Łącki,
Jakub Oćwieja,
Marcin Pilipczuk,
Piotr Sankowski,
Anna Zych
Abstract:
In this paper we study the Steiner tree problem over a dynamic set of terminals. We consider the model where we are given an $n$-vertex graph $G=(V,E,w)$ with positive real edge weights, and our goal is to maintain a tree which is a good approximation of the minimum Steiner tree spanning a terminal set $S \subseteq V$, which changes over time. The changes applied to the terminal set are either terminal additions (incremental scenario), terminal removals (decremental scenario), or both (fully dynamic scenario). Our task here is twofold. We want to support updates in sublinear $o(n)$ time, and keep the approximation factor of the algorithm as small as possible. We show that we can maintain a $(6+\varepsilon)$-approximate Steiner tree of a general graph in $\tilde{O}(\sqrt{n} \log D)$ time per terminal addition or removal. Here, $D$ denotes the stretch of the metric induced by $G$. For planar graphs we achieve the same running time and the approximation ratio of $(2+\varepsilon)$. Moreover, we show faster algorithms for incremental and decremental scenarios. Finally, we show that if we allow higher approximation ratio, even more efficient algorithms are possible. In particular we show a polylogarithmic time $(4+\varepsilon)$-approximate algorithm for planar graphs.
One of the main building blocks of our algorithms are dynamic distance oracles for vertex-labeled graphs, which are of independent interest. We also improve and use the online algorithms for the Steiner tree problem.
Submitted 24 June, 2016; v1 submitted 15 August, 2013;
originally announced August 2013.
-
Faster Algorithms for Markov Decision Processes with Low Treewidth
Authors:
Krishnendu Chatterjee,
Jakub Łącki
Abstract:
We consider two core algorithmic problems for probabilistic verification: the maximal end-component decomposition and the almost-sure reachability set computation for Markov decision processes (MDPs). For MDPs with treewidth $k$, we present two improved static algorithms for these problems that run in time $O(n \cdot k^{2.38} \cdot 2^k)$ and $O(m \cdot \log n \cdot k)$, respectively, where $n$ is the number of states and $m$ is the number of edges, significantly improving the previously known $O(n\cdot k \cdot \sqrt{n\cdot k})$ bound for low treewidth. We also present decremental algorithms for both problems for MDPs with constant treewidth that run in amortized logarithmic time, which is a huge improvement over the previously known algorithms that require amortized linear time.
Submitted 30 October, 2013; v1 submitted 30 March, 2013;
originally announced April 2013.
-
Single Source - All Sinks Max Flows in Planar Digraphs
Authors:
Jakub Łącki,
Yahav Nussbaum,
Piotr Sankowski,
Christian Wulff-Nilsen
Abstract:
Let G = (V,E) be a planar n-vertex digraph. Consider the problem of computing max st-flow values in G from a fixed source s to all sinks t in V\{s}. We show how to solve this problem in near-linear O(n log^3 n) time. Previously, no better solution was known than running a single-source single-sink max flow algorithm n-1 times, giving a total time bound of O(n^2 log n) with the algorithm of Borradaile and Klein.
An important implication is that all-pairs max st-flow values in G can be computed in near-quadratic time. This is close to optimal as the output size is Theta(n^2). We give a quadratic lower bound on the number of distinct max flow values and an Omega(n^3) lower bound for the total size of all min cut-sets. This distinguishes the problem from the undirected case where the number of distinct max flow values is O(n).
Prior to our result, no algorithm was known that could solve the all-pairs max flow values problem for every planar digraph faster than the time of Theta(n^2) max-flow computations.
This result is accompanied by a data structure that reports min cut-sets. For fixed s and all t, after O(n^{3/2} log^{3/2} n) preprocessing time, it can report the set of arcs C crossing a min st-cut in time roughly proportional to the size of C.
Submitted 17 October, 2012;
originally announced October 2012.
-
Min-cuts and Shortest Cycles in Planar Graphs in O(n log log n) Time
Authors:
Jakub Łącki,
Piotr Sankowski
Abstract:
We present a deterministic O(n log log n) time algorithm for finding shortest cycles and minimum cuts in planar graphs. The algorithm improves the previously known fastest algorithm by Italiano et al. in STOC'11 by a factor of log n. This speedup is obtained through the use of dense distance graphs combined with a divide-and-conquer approach.
Submitted 26 April, 2011;
originally announced April 2011.
-
The Road to Stueckelberg's Covariant Perturbation Theory as Illustrated by Successive Treatments of Compton Scattering
Authors:
J. Lacki,
H. Ruegg,
V. Telegdi
Abstract:
We review the history of the road to a manifestly covariant perturbative calculus within quantum electrodynamics from the early semi-classical results of the mid-twenties to the complete formalism of Stueckelberg in 1934. We chose as our case study the calculation of the cross-section of the Compton effect. We analyse Stueckelberg's paper extensively. This is our first contribution to a study of his fundamental contributions to the theoretical physics of the twentieth century.
Submitted 15 March, 1999;
originally announced March 1999.
-
Partition Function Zeros of an Ising Spin Glass
Authors:
P. H. Damgaard,
J. Lacki
Abstract:
We study the pattern of zeros emerging from exact partition function evaluations of Ising spin glasses on conventional finite lattices of varying sizes. A large number of random bond configurations are probed in the framework of quenched averages. This study is motivated by the relationship between hierarchical lattice models whose partition function zeros fall on Julia sets and chaotic renormalization flows in such models with frustration, and by the possible connection of the latter with spin glass behaviour. In any finite volume, the simultaneous distribution of the zeros of all partition functions can be viewed as part of the more general problem of finding the location of all the zeros of a certain class of random polynomials with positive integer coefficients. Some aspects of this problem have been studied in various branches of mathematics, and we show how polynomial mappings which are used in graph theory to classify graphs, may help in characterizing the distribution of zeros. We finally discuss the possible limiting set as the volume is sent to infinity.
Submitted 29 September, 1994; v1 submitted 29 July, 1994;
originally announced July 1994.
-
Series expansions without diagrams
Authors:
Gyan Bhanot,
Michael Creutz,
Ivan Horvath,
Jan Lacki,
John Weckel
Abstract:
We discuss the use of recursive enumeration schemes to obtain low and high temperature series expansions for discrete statistical systems. Using linear combinations of generalized helical lattices, the method is competitive with diagrammatic approaches and is easily generalizable. We illustrate the approach using the Ising model and generate low temperature series in up to five dimensions and high temperature series in three dimensions. The method is general and can be applied to any discrete model. We describe how it would work for Potts models.
Submitted 4 March, 1993;
originally announced March 1993.
-
Low Temperature Expansions for Potts Models
Authors:
G. Bhanot,
M. Creutz,
U. Glassner,
I. Horvath,
J. Lacki,
K. Schilling,
J. Weckel
Abstract:
On simple cubic lattices, we compute low temperature series expansions for the energy, magnetization and susceptibility of the three-state Potts model in D=2 and D=3 to 45 and 39 excited bonds respectively, and the eight-state Potts model in D=2 to 25 excited bonds. We use a recursive procedure which enumerates states explicitly. We analyze the series using Dlog Pade analysis and inhomogeneous differential approximants.
Submitted 25 January, 1993;
originally announced January 1993.
-
Low temperature expansion for the 3-d Ising Model
Authors:
Gyan Bhanot,
Michael Creutz,
Jan Lacki
Abstract:
We compute the weak coupling expansion for the energy of the three dimensional Ising model through 48 excited bonds. We also compute the magnetization through 40 excited bonds. This was achieved via a recursive enumeration of states of fixed energy on a set of finite lattices. We use a linear combination of lattices with a generalization of helical boundary conditions to eliminate finite volume effects.
Submitted 24 June, 1992;
originally announced June 1992.