-
Optimal Neighborhood Exploration for Dynamic Independent Sets
Authors:
Jannick Borowitz,
Ernestine Großmann,
Christian Schulz
Abstract:
A dynamic graph algorithm is a data structure that supports edge insertions, deletions, and specific problem queries. While extensive research exists on dynamic algorithms for graph problems solvable in polynomial time, most of these algorithms have not been implemented or empirically evaluated.
This work addresses the NP-complete maximum weight and cardinality independent set problems in a dynamic setting, applicable to areas like dynamic map-labeling and vehicle routing. Real-world instances can be vast, with millions of vertices and edges, making it challenging to find near-optimal solutions quickly. Exact solvers can find optimal solutions but have exponential worst-case runtimes. Conversely, heuristic algorithms use local search techniques to improve solutions by optimizing vertices.
In this work, we introduce a novel local search technique called optimal neighborhood exploration. This technique creates independent subproblems that are solved to optimality, leading to improved overall solutions. Through numerous experiments, we assess the effectiveness of our approach and compare it with other state-of-the-art dynamic solvers. Our algorithm features a parameter, the subproblem size, that balances running time and solution quality. With this parameter, our configuration matches state-of-the-art performance for the cardinality independent set problem. By increasing the parameter, we significantly enhance solution quality.
Submitted 9 July, 2024;
originally announced July 2024.
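For readers unfamiliar with the problem in the entry above: a vertex set is independent when no edge has both endpoints in the set, and the weighted variant asks for such a set of maximum total weight. The following minimal sketch only checks a candidate solution. The function names and the edge-list and weight-dictionary representations are assumptions made for illustration; they are not taken from the paper, and this is not the optimal neighborhood exploration technique itself.

```python
def is_independent_set(edges, candidate):
    """Return True if no edge has both endpoints in `candidate`."""
    chosen = set(candidate)
    return not any(u in chosen and v in chosen for u, v in edges)


def total_weight(weights, candidate):
    """Sum the weights of the distinct vertices in `candidate`."""
    return sum(weights[v] for v in set(candidate))


# Hypothetical example: a path 0 - 1 - 2 - 3 with vertex weights.
path_edges = [(0, 1), (1, 2), (2, 3)]
path_weights = {0: 5, 1: 1, 2: 2, 3: 4}
```

In a dynamic setting, a check like this would have to be repeated (or maintained incrementally) after each edge insertion or deletion.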
-
Engineering Edge Orientation Algorithms
Authors:
H. Reinstädtler,
C. Schulz,
B. Uçar
Abstract:
Given an undirected graph G, the edge orientation problem asks for assigning a direction to each edge to convert G into a directed graph. The aim is to minimize the maximum out-degree of a vertex in the resulting directed graph. This problem, which is solvable in polynomial time, arises in many applications. An ongoing challenge in edge orientation algorithms is their scalability, particularly in handling large-scale networks with millions or billions of edges efficiently. We propose a novel algorithmic framework based on finding and manipulating simple paths to face this challenge. Our framework is based on an existing algorithm and allows many algorithmic choices. By carefully exploring these choices and engineering the underlying algorithms, we obtain an implementation which is more efficient and scalable than the current state-of-the-art. Our experiments demonstrate significant performance improvements compared to state-of-the-art solvers. On average our algorithm is 6.59 times faster when compared to the state-of-the-art.
Submitted 22 April, 2024;
originally announced April 2024.
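The objective in the entry above can be made concrete with a short sketch: given one orientation of the edges, the quantity to minimize is the largest out-degree of any vertex. The function name and the representation of an orientation as a list of (tail, head) pairs are assumptions for illustration only; the sketch evaluates an orientation and does not attempt to reproduce the path-based framework described in the abstract.

```python
from collections import Counter


def max_out_degree(oriented_edges):
    """Return the largest out-degree over all vertices.

    `oriented_edges` is a list of (tail, head) pairs, one per edge.
    An empty orientation has maximum out-degree 0.
    """
    out_degree = Counter(tail for tail, _head in oriented_edges)
    return max(out_degree.values(), default=0)


# Two hypothetical orientations of the same triangle on vertices 0, 1, 2.
cyclic = [(0, 1), (1, 2), (2, 0)]      # every vertex has out-degree 1
lopsided = [(0, 1), (0, 2), (1, 2)]    # vertex 0 has out-degree 2
```

Comparing the two orientations shows why flipping edges along a path can help: reversing the edge between 0 and 2 in the second orientation yields the first one and lowers the maximum out-degree.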
-
Buffered Streaming Edge Partitioning
Authors:
Adil Chhabra,
Marcelo Fonseca Faraj,
Christian Schulz,
Daniel Seemaier
Abstract:
Addressing the challenges of processing massive graphs, which are prevalent in diverse fields such as social, biological, and technical networks, we introduce HeiStreamE and FreightE, two innovative (buffered) streaming algorithms designed for efficient edge partitioning of large-scale graphs. HeiStreamE utilizes an adapted Split-and-Connect graph model and a Fennel-based multilevel partitioning scheme, while FreightE partitions a hypergraph representation of the input graph. Besides ensuring superior solution quality, these approaches also overcome the limitations of existing algorithms by maintaining linear dependency on the graph size in both time and memory complexity with no dependence on the number of blocks of partition. Our comprehensive experimental analysis demonstrates that HeiStreamE outperforms current streaming algorithms and the re-streaming algorithm 2PS in partitioning quality (replication factor), and is more memory-efficient for real-world networks where the number of edges is far greater than the number of vertices. Further, FreightE is shown to produce fast and efficient partitions, particularly for higher numbers of partition blocks.
Submitted 19 February, 2024;
originally announced February 2024.
-
Consecutive Power Occurrences in Sturmian Words
Authors:
Jason Bell,
Chris Schulz,
Jeffrey Shallit
Abstract:
We show that every Sturmian word has the property that the distance between consecutive ending positions of cubes occurring in the word is always bounded by $10$ and this bound is optimal, extending a result of Rampersad, who proved that the bound $9$ holds for the Fibonacci word. We then give a general result showing that for every $e \in [1,(5+\sqrt{5})/2)$ there is a natural number $N$, depending only on $e$, such that every Sturmian word has the property that the distance between consecutive ending positions of $e$-powers occurring in the word is uniformly bounded by $N$.
Submitted 14 February, 2024;
originally announced February 2024.
-
Engineering Weighted Connectivity Augmentation Algorithms
Authors:
Marcelo Fonseca Faraj,
Ernestine Großmann,
Felix Joos,
Thomas Möller,
Christian Schulz
Abstract:
Increasing the connectivity of a graph is a pivotal challenge in robust network design. The weighted connectivity augmentation problem is a common version of the problem that takes link costs into consideration. The problem is then to find a minimum cost subset of a given set of weighted links that increases the connectivity of a graph by one when the links are added to the edge set of the input instance. In this work, we give a first implementation of recently discovered better-than-2 approximations. Furthermore, we propose three new heuristics and one exact approach. These include a greedy algorithm considering link costs and the number of unique cuts covered, an approach based on minimum spanning trees, and a local search algorithm that may improve a given solution by swapping links of paths. Our exact approach uses an ILP formulation with efficient cut enumeration as well as a fast initialization routine. We then perform an extensive experimental evaluation which shows that our algorithms are faster and yield the best solutions compared to the current state-of-the-art as well as the recently discovered better-than-2 approximation algorithms. Our novel local search algorithm can improve solution quality even further.
Submitted 12 February, 2024;
originally announced February 2024.
-
The Right Model for the Job: An Evaluation of Legal Multi-Label Classification Baselines
Authors:
Martina Forster,
Claudia Schulz,
Prudhvi Nokku,
Melicaalsadat Mirsafian,
Jaykumar Kasundra,
Stavroula Skylaki
Abstract:
Multi-Label Classification (MLC) is a common task in the legal domain, where more than one label may be assigned to a legal document. A wide range of methods can be applied, ranging from traditional ML approaches to the latest Transformer-based architectures. In this work, we perform an evaluation of different MLC methods using two public legal datasets, POSTURE50K and EURLEX57K. By varying the amount of training data and the number of labels, we explore the comparative advantage offered by different approaches in relation to the dataset properties. Our findings highlight DistilRoBERTa and LegalBERT as performing consistently well in legal MLC with reasonable computational demands. T5 also demonstrates comparable performance while offering advantages as a generative model in the presence of changing label sets. Finally, we show that the CrossEncoder exhibits potential for notable macro-F1 score improvements, albeit with increased computational costs.
Submitted 22 January, 2024;
originally announced January 2024.
-
FLASH-TB: Integrating Arc-Flags and Trip-Based Public Transit Routing
Authors:
Ernestine Großmann,
Jonas Sauer,
Christian Schulz,
Patrick Steil,
Sascha Witt
Abstract:
We present FLASH-TB, a journey planning algorithm for public transit networks that combines Trip-Based Public Transit Routing (TB) with the Arc-Flags speedup technique. The basic idea is simple: The network is partitioned into a configurable number of cells. For each cell and each possible transfer between two vehicles, the algorithm precomputes a flag that indicates whether the transfer is required to reach the cell. During a query, only flagged transfers are explored. Our algorithm improves upon previous attempts to apply Arc-Flags to public transit networks, which saw limited success due to conflicting rules for pruning the search space. We show that these rules can be reconciled while still producing correct results. Because the number of cells is configurable, FLASH-TB offers a tradeoff between query time and memory consumption. It is significantly more space-efficient than existing techniques with a comparable preprocessing time, which store generalized shortest-path trees: to match their query performance, it requires up to two orders of magnitude less memory. The fastest configuration of FLASH-TB achieves a speedup of more than two orders of magnitude over TB, offering sub-millisecond query times even on large countrywide networks.
Submitted 20 December, 2023;
originally announced December 2023.
-
A Framework for Monitoring and Retraining Language Models in Real-World Applications
Authors:
Jaykumar Kasundra,
Claudia Schulz,
Melicaalsadat Mirsafian,
Stavroula Skylaki
Abstract:
In the Machine Learning (ML) model development lifecycle, training candidate models using an offline holdout dataset and identifying the best model for the given task is only the first step. After the deployment of the selected model, continuous model monitoring and model retraining are required in many real-world applications. There are multiple reasons for retraining, including data or concept drift, which may be reflected in the model performance as monitored by an appropriate metric. Another motivation for retraining is the acquisition of increasing amounts of data over time, which may be used to retrain and improve the model performance even in the absence of drifts. We examine the impact of various retraining decision points on crucial factors, such as model performance and resource utilization, in the context of Multilabel Classification models. We explain our key decision points and propose a reference framework for designing an effective model retraining strategy.
Submitted 17 November, 2023; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Open Problems in (Hyper)Graph Decomposition
Authors:
Deepak Ajwani,
Rob H. Bisseling,
Katrin Casel,
Ümit V. Çatalyürek,
Cédric Chevalier,
Florian Chudigiewitsch,
Marcelo Fonseca Faraj,
Michael Fellows,
Lars Gottesbüren,
Tobias Heuer,
George Karypis,
Kamer Kaya,
Jakub Lacki,
Johannes Langguth,
Xiaoye Sherry Li,
Ruben Mayer,
Johannes Meintrup,
Yosuke Mizutani,
François Pellegrini,
Fabrizio Petrini,
Frances Rosamond,
Ilya Safro,
Sebastian Schlag,
Christian Schulz,
Roohani Sharma
, et al. (4 additional authors not shown)
Abstract:
Large networks are useful in a wide range of applications. Sometimes problem instances are composed of billions of entities. Decomposing and analyzing these structures helps us gain new insights about our surroundings. Even if the final application concerns a different problem (such as traversal, finding paths, trees, and flows), decomposing large graphs is often an important subproblem for complexity reduction or parallelization. This report is a summary of discussions that happened at Dagstuhl seminar 23331 on "Recent Trends in Graph Decomposition" and presents currently open problems and future directions in the area of (hyper)graph decomposition.
Submitted 18 October, 2023;
originally announced October 2023.
-
Scalable Algorithms for 2-Packing Sets on Arbitrary Graphs
Authors:
Jannick Borowitz,
Ernestine Großmann,
Christian Schulz,
Dominik Schweisgut
Abstract:
A 2-packing set for an undirected graph $G=(V,E)$ is a subset $\mathcal{S} \subset V$ such that any two vertices $v_1,v_2 \in \mathcal{S}$ have no common neighbors. Finding a 2-packing set of maximum cardinality is an NP-hard problem. We develop a new approach to solve this problem on arbitrary graphs using its close relation to the independent set problem. Thereby, our algorithm red2pack uses new data reduction rules specific to the 2-packing set problem as well as a graph transformation. Our experiments show that we outperform the state-of-the-art for arbitrary graphs with respect to solution quality and also are able to compute solutions multiple orders of magnitude faster than previously possible. For example, we are able to solve 63% of the graphs in the tested data set to optimality in less than a second while the competitor for arbitrary graphs can only solve 5% of these graphs to optimality even with a 10 hour time limit. Moreover, our approach can solve a wide range of large instances that have previously been unsolved.
Submitted 12 February, 2024; v1 submitted 29 August, 2023;
originally announced August 2023.
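The condition stated in the entry above, that any two chosen vertices have no common neighbors, can be checked directly. The sketch below does exactly that and nothing more; the helper names and the edge-list input are assumptions for illustration and are unrelated to the red2pack implementation. Many formulations of 2-packing sets additionally require the chosen vertices to be pairwise non-adjacent, which this sketch deliberately does not test.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build a vertex -> set-of-neighbors map from an undirected edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def has_no_common_neighbors(adjacency, candidate):
    """Return True if no two distinct vertices in `candidate` share a neighbor."""
    return all(
        adjacency.get(a, set()).isdisjoint(adjacency.get(b, set()))
        for a, b in combinations(set(candidate), 2)
    )


# Hypothetical example: a path 0 - 1 - 2 - 3.
path_adjacency = build_adjacency([(0, 1), (1, 2), (2, 3)])
```

On this path, vertices 0 and 3 share no neighbor, while 0 and 2 both neighbor vertex 1 and therefore cannot be chosen together.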
-
Evolution of plasmon excitations across the phase diagram of the cuprate superconductor La$_{2-x}$Sr$_{x}$CuO$_4$
Authors:
M. Hepting,
T. D. Boyko,
V. Zimmermann,
M. Bejas,
Y. E. Suyolcu,
P. Puphal,
R. J. Green,
L. Zinni,
J. Kim,
D. Casa,
M. H. Upton,
D. Wong,
C. Schulz,
M. Bartkowiak,
K. Habicht,
E. Pomjakushina,
G. Cristiani,
G. Logvenov,
M. Minola,
H. Yamase,
A. Greco,
B. Keimer
Abstract:
We use resonant inelastic x-ray scattering (RIXS) at the O $K$- and Cu $K$-edges to investigate the doping- and temperature dependence of low-energy plasmon excitations in La$_{2-x}$Sr$_{x}$CuO$_4$. We observe a monotonic increase of the energy scale of the plasmons with increasing doping $x$ in the underdoped regime, whereas a saturation occurs above optimal doping $x \gtrsim 0.16$ and persists at least up to $x = 0.4$. Furthermore, we find that the plasmon excitations show only a marginal temperature dependence, and possible effects due to the superconducting transition and the onset of strange metal behavior are either absent or below the detection limit of our experiment. Taking into account the strongly correlated character of the cuprates, we show that layered $t$-$J$-$V$ model calculations accurately capture the increase of the plasmon energy in the underdoped regime. However, the computed plasmon energy continues to increase even for doping levels above $x \gtrsim 0.16$, which is distinct from the experimentally observed saturation, and reaches a broad maximum around $x = 0.55$. We discuss whether possible lattice disorder in overdoped samples, a renormalization of the electronic correlation strength at high dopings, or an increasing relevance of non-planar Cu and O orbitals could be responsible for the discrepancy between experiment and theory for doping levels above $x = 0.16$.
Submitted 27 June, 2023;
originally announced June 2023.
-
Dimensionality dependent electronic structure of the exfoliated van der Waals antiferromagnet NiPS$_3$
Authors:
M. F. DiScala,
D. Staros,
A. de la Torre,
A. Lopez,
D. Wong,
C. Schulz,
M. Bartkowiak,
B. Rubenstein,
K. W. Plumb
Abstract:
Resonant Inelastic X-ray Scattering (RIXS) was used to measure the local electronic structure in few-layer exfoliated flakes of the van der Waals antiferromagnet NiPS$_3$. The resulting spectra show a systematic softening and broadening of NiS$_6$ multiplet excitations with decreasing layer count from the bulk to three atomic layers (3L). These trends are driven by a decrease in the transition metal-ligand and ligand-ligand hopping integrals, and in the charge-transfer energy: $Δ$ = 0.60 eV in the bulk and 0.22 eV in 3L NiPS$_3$. Relevant intralayer magnetic exchange integrals computed from the electronic parameters exhibit a systematic decrease in the average interaction strength with thickness and place 2D NiPS$_3$ close to the phase boundary between stripy and spiral antiferromagnetic order, which may explain the apparent vanishing of long-range order in the 2D limit. This study explicitly demonstrates the influence of $inter$layer electronic interactions on $intra$layer ones in insulating magnets. As a consequence, the magnetic Hamiltonian in few-layer insulating magnets can be significantly different from that in the bulk.
Submitted 15 February, 2023;
originally announced February 2023.
-
Arc-Flags Meet Trip-Based Public Transit Routing
Authors:
Ernestine Großmann,
Jonas Sauer,
Christian Schulz,
Patrick Steil
Abstract:
We present Arc-Flag TB, a journey planning algorithm for public transit networks which combines Trip-Based Public Transit Routing (TB) with the Arc-Flags speedup technique. Compared to previous attempts to apply Arc-Flags to public transit networks, which saw limited success, our approach uses stronger pruning rules to reduce the search space. Our experiments show that Arc-Flag TB achieves a speedup of up to two orders of magnitude over TB, offering query times of less than a millisecond even on large countrywide networks. Compared to the state-of-the-art speedup technique Trip-Based Public Transit Routing Using Condensed Search Trees (TB-CST), our algorithm achieves similar query times but requires significantly less additional memory. Other state-of-the-art algorithms which achieve even faster query times, e.g., Public Transit Labeling, require enormous memory usage. In contrast, Arc-Flag TB offers a tradeoff between query performance and memory usage due to the fact that the number of regions in the network partition required by our algorithm is a configurable parameter. We also identify an issue in the transfer precomputation of TB that affects both TB-CST and Arc-Flag TB, leading to incorrect answers for some queries. This has not been previously recognized by the author of TB-CST. We provide discussion on how to resolve this issue in the future. Currently, Arc-Flag TB answers 1-6% of queries incorrectly, compared to over 20% for TB-CST on some networks.
Submitted 14 February, 2023;
originally announced February 2023.
-
FREIGHT: Fast Streaming Hypergraph Partitioning
Authors:
Kamal Eyubov,
Marcelo Fonseca Faraj,
Christian Schulz
Abstract:
Partitioning the vertices of a (hyper)graph into k roughly balanced blocks such that few (hyper)edges run between blocks is a key problem for large-scale distributed processing. A current trend for partitioning huge (hyper)graphs using low computational resources is the use of streaming algorithms. In this work, we propose FREIGHT: a Fast stREamInG Hypergraph parTitioning algorithm which is an adaptation of the widely-known graph-based algorithm Fennel. By using an efficient data structure, we make the overall running time of FREIGHT linearly dependent on the pin-count of the hypergraph and the memory consumption linearly dependent on the numbers of nets and blocks. The results of our extensive experimentation showcase the promising performance of FREIGHT as a highly efficient and effective solution for streaming hypergraph partitioning. Our algorithm demonstrates competitive running time with the Hashing algorithm, with a difference of a maximum factor of four observed on three fourths of the instances. Significantly, our findings highlight the superiority of FREIGHT over all existing (buffered) streaming algorithms and even the in-memory algorithm HYPE, with respect to both cut-net and connectivity measures. This indicates that our proposed algorithm is a promising hypergraph partitioning tool to tackle the challenge posed by large-scale and dynamic data processing.
Submitted 13 February, 2023;
originally announced February 2023.
-
Improved Exact and Heuristic Algorithms for Maximum Weight Clique
Authors:
Roman Erhardt,
Kathrin Hanauer,
Nils Kriege,
Christian Schulz,
Darren Strash
Abstract:
We propose improved exact and heuristic algorithms for solving the maximum weight clique problem, a well-known problem in graph theory with many applications. Our algorithms interleave successful techniques from related work with novel data reduction rules that use local graph structure to identify and remove vertices and edges while retaining the optimal solution. We evaluate our algorithms on a range of synthetic and real-world graphs, and find that they outperform the current state of the art on most inputs. Our data reductions always produce smaller reduced graphs than existing data reductions alone. As a result, our exact algorithm, MWCRedu, finds solutions orders of magnitude faster on naturally weighted, medium-sized map labeling graphs and random hyperbolic graphs. Our heuristic algorithm, MWCPeel, outperforms its competitors on these instances, but is slightly less effective on extremely dense or large instances.
Submitted 1 February, 2023;
originally announced February 2023.
-
Faster Local Motif Clustering via Maximum Flows
Authors:
Adil Chhabra,
Marcelo Fonseca Faraj,
Christian Schulz
Abstract:
Local clustering aims to identify a cluster within a given graph that includes a designated seed node or a significant portion of a group of seed nodes. This cluster should be well-characterized, i.e., it has a high number of internal edges and a low number of external edges. In this work, we propose SOCIAL, a novel algorithm for local motif clustering which optimizes for motif conductance based on a local hypergraph model representation of the problem and an adapted version of the max-flow quotient-cut improvement algorithm (MQI). In our experiments with the triangle motif, SOCIAL produces local clusters with an average motif conductance lower than the state-of-the-art, while being up to multiple orders of magnitude faster.
Submitted 17 January, 2023;
originally announced January 2023.
-
Engineering Fully Dynamic $Δ$-Orientation Algorithms
Authors:
Jannick Borowitz,
Ernestine Großmann,
Christian Schulz
Abstract:
A (fully) dynamic graph algorithm is a data structure that supports edge insertions, edge deletions, and answers certain queries that are specific to the problem under consideration. There has been a lot of research on dynamic algorithms for graph problems that are solvable in polynomial time by a static algorithm. However, while there is a large body of theoretical work on efficient dynamic graph algorithms, a lot of these algorithms were never implemented and empirically evaluated. In this work, we consider the fully dynamic edge orientation problem, also called fully dynamic $Δ$-orientation problem, which is to maintain an orientation of the edges of an undirected graph such that the out-degree is low. If edges are inserted or deleted, one may have to flip the orientation of some edges in order to avoid vertices having a large out-degree. While there has been theoretical work on dynamic versions of this problem, currently there is no experimental evaluation available. In this work, we close this gap and engineer a range of new dynamic edge orientation algorithms as well as algorithms from the current literature. Moreover, we evaluate these algorithms on real-world dynamic graphs. The best algorithm considered in this paper in terms of quality, based on a simple breadth-first search, computes the optimum result on more than 90% of the instances and is on average only 2.4% worse than the optimum solution.
Submitted 18 January, 2023; v1 submitted 17 January, 2023;
originally announced January 2023.
-
Attention-based Multiple Instance Learning for Survival Prediction on Lung Cancer Tissue Microarrays
Authors:
Jonas Ammeling,
Lars-Henning Schmidt,
Jonathan Ganz,
Tanja Niedermair,
Christoph Brochhausen-Delius,
Christian Schulz,
Katharina Breininger,
Marc Aubreville
Abstract:
Attention-based multiple instance learning (AMIL) algorithms have proven to be successful in utilizing gigapixel whole-slide images (WSIs) for a variety of different computational pathology tasks such as outcome prediction and cancer subtyping problems. We extended an AMIL approach to the task of survival prediction by utilizing the classical Cox partial likelihood as a loss function, converting the AMIL model into a nonlinear proportional hazards model. We applied the model to tissue microarray (TMA) slides of 330 lung cancer patients. The results show that AMIL approaches can handle very small amounts of tissue from a TMA and reach similar C-index performance compared to established survival prediction methods trained with highly discriminative clinical factors such as age, cancer grade, and cancer stage.
Submitted 22 February, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.
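The Cox partial likelihood mentioned in the entry above can be written as a loss over per-patient risk scores: each patient with an observed event contributes their own score minus the log of the summed exponentiated scores of everyone still at risk at that time. The sketch below is a plain-Python illustration under stated assumptions: the function name, the list-based inputs, and the simple treatment of tied event times (each tied patient uses the full risk set) are choices made here, not details from the paper, where the scores would come from the AMIL model.

```python
import math


def neg_log_cox_partial_likelihood(risk_scores, times, events):
    """Negative log Cox partial likelihood for one batch of patients.

    risk_scores: predicted log-risk per patient (higher means higher hazard)
    times:       follow-up time per patient
    events:      1 if the event was observed, 0 if the patient was censored
    """
    log_likelihood = 0.0
    for i, (time_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored patients only appear in other patients' risk sets
        # Risk set: everyone whose follow-up time is at least time_i.
        at_risk = [risk_scores[j] for j, time_j in enumerate(times) if time_j >= time_i]
        log_denominator = math.log(sum(math.exp(score) for score in at_risk))
        log_likelihood += risk_scores[i] - log_denominator
    return -log_likelihood
```

With two patients who have equal scores and both experience the event, the first event contributes log 2 to the loss and the second contributes nothing, since the last patient is alone in the risk set.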
-
Spectroscopy of the frustrated quantum antiferromagnet Cs$_2$CuCl$_4$
Authors:
Adolfo O. Fumega,
D. Wong,
C. Schulz,
F. Rodríguez,
S. Blanco-Canosa
Abstract:
We investigate the electronic structure of Cs$_2$CuCl$_4$, a material discussed in the framework of a frustrated quantum antiferromagnet, by means of resonant inelastic x-ray scattering (RIXS) and Density Functional Theory (DFT). From the non-dispersive highly localized dd excitations, we resolve the crystal field splitting of the Cu$^{2+}$ ions in a strongly distorted tetrahedral coordination. This allows us to model the RIXS spectrum within the Crystal Field Theory (CFT), assign the dd orbital excitations and retrieve experimentally the values of the crystal field splitting parameters D$_q$, D$_s$ and D$_τ$. The electronic structure obtained ab initio agrees with the measured RIXS spectrum and with the spectrum modelled by CFT, highlighting the potential of combined spectroscopic, cluster and DFT calculations to determine the electronic ground state of complex materials.
Submitted 17 November, 2022;
originally announced November 2022.
-
Undefinability of multiplication in Presburger arithmetic with sets of powers
Authors:
Christian Schulz
Abstract:
We begin by proving that any Presburger-definable image of one or more sets of powers has zero natural density. Then, by adapting the proof of a dichotomy result on o-minimal structures by Friedman and Miller, we produce a similar dichotomy for expansions of Presburger arithmetic on the integers. Combining these two results, we obtain that the expansion of the ordered group of integers by any number of sets of powers does not define multiplication.
Submitted 23 September, 2022;
originally announced September 2022.
-
Finding Near-Optimal Weight Independent Sets at Scale
Authors:
Ernestine Großmann,
Sebastian Lamm,
Christian Schulz,
Darren Strash
Abstract:
Computing maximum weight independent sets in graphs is an important NP-hard optimization problem. The problem is particularly difficult to solve in large graphs for which data reduction techniques do not work well. To be more precise, state-of-the-art branch-and-reduce algorithms can solve many large-scale graphs if reductions are applicable. Otherwise, their performance quickly degrades due to branching requiring exponential time. In this paper, we develop an advanced memetic algorithm to tackle the problem, which incorporates recent data reduction techniques to compute near-optimal weighted independent sets in huge sparse networks. More precisely, we use a memetic approach to recursively choose vertices that are likely to be in a large-weight independent set. We include these vertices into the solution, and further reduce the graph. We show that identifying and removing vertices likely to be in large-weight independent sets opens up the reduction space and speeds up the computation of large-weight independent sets remarkably. Our experimental evaluation indicates that we are able to outperform state-of-the-art algorithms. For example, our two algorithm configurations compute the best results among all competing algorithms for 205 out of 207 instances. Thus, our approach can be seen as a useful tool when large-weight independent sets need to be computed in practice.
Submitted 21 April, 2023; v1 submitted 29 August, 2022;
originally announced August 2022.
-
Scalable Multilevel and Memetic Signed Graph Clustering
Authors:
Felix Hausberger,
Marcelo Fonseca Faraj,
Christian Schulz
Abstract:
In this study, we address the complex issue of graph clustering in signed graphs, which are characterized by positive and negative weighted edges representing attraction and repulsion among nodes, respectively. The primary objective is to efficiently partition the graph into clusters, ensuring that nodes within a cluster are closely linked by positive edges while minimizing negative edge connections between them. To tackle this challenge, we first develop a scalable multilevel algorithm based on label propagation and FM local search. Then we develop a memetic algorithm that incorporates a multilevel strategy. This approach meticulously combines elements of evolutionary algorithms with local refinement techniques, aiming to explore the search space more effectively than repeated executions. Our experimental analysis reveals that our new algorithms significantly outperform existing state-of-the-art algorithms. For example, our memetic algorithm can reach the solution quality of the previous state-of-the-art algorithm up to four orders of magnitude faster.
Submitted 9 July, 2024; v1 submitted 29 August, 2022;
originally announced August 2022.
-
More Recent Advances in (Hyper)Graph Partitioning
Authors:
Ümit V. Çatalyürek,
Karen D. Devine,
Marcelo Fonseca Faraj,
Lars Gottesbüren,
Tobias Heuer,
Henning Meyerhenke,
Peter Sanders,
Sebastian Schlag,
Christian Schulz,
Daniel Seemaier,
Dorothea Wagner
Abstract:
In recent years, significant advances have been made in the design and evaluation of balanced (hyper)graph partitioning algorithms. We survey trends of the last decade in practical algorithms for balanced (hyper)graph partitioning together with future research directions. Our work serves as an update to a previous survey on the topic. In particular, the survey extends the previous survey by also covering hypergraph partitioning and streaming algorithms, and has an additional focus on parallel algorithms.
Submitted 30 June, 2022; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Local Motif Clustering via (Hyper)Graph Partitioning
Authors:
Adil Chhabra,
Marcelo Fonseca Faraj,
Christian Schulz
Abstract:
A widely-used operation on graphs is local clustering, i.e., extracting a well-characterized community around a seed node without the need to process the whole graph. Recently local motif clustering has been proposed: it looks for a local cluster based on the distribution of motifs. Since this local clustering perspective is relatively new, most approaches proposed for it are extensions of statistical and numerical methods previously used for edge-based local clustering, while the available combinatorial approaches are still few and relatively simple. In this work, we build a hypergraph and a graph model which both represent the motif-distribution around the seed node. We solve these models using sophisticated combinatorial algorithms designed for (hyper)graph partitioning. In extensive experiments with the triangle motif, we observe that our algorithm computes communities whose motif conductance is on average one third of that of the communities computed by the state-of-the-art tool MAPPR, while being 6.3 times faster on average.
Submitted 11 May, 2022;
originally announced May 2022.
-
Fractal dimensions of $k$-automatic sets
Authors:
Alexi Block Gorman,
Christian Schulz
Abstract:
This paper seeks to build on the extensive connections that have arisen between automata theory, combinatorics on words, fractal geometry, and model theory. Results in this paper establish a characterization for the behavior of the fractal geometry of "$k$-automatic" sets, subsets of $[0,1]^d$ that are recognized by Büchi automata. The primary tools for building this characterization include the entropy of a regular language and the digraph structure of an automaton. Via an analysis of the strongly connected components of such a structure, we give an algorithmic description of the box-counting dimension, Hausdorff dimension, and Hausdorff measure of the corresponding subset of the unit box. Applications to definability in model-theoretic expansions of the real additive group are laid out as well.
Submitted 5 May, 2022;
originally announced May 2022.
-
Electronic Excitations of Hematite Heteroepitaxial Films Measured by Resonant Inelastic X-Ray Scattering at the Fe L-edge
Authors:
David S. Ellis,
Ru-Pan Wang,
Deniz Wong,
Jason K. Cooper,
Christian Schulz,
Yi-De Chuang,
Yifat Piekner,
Daniel A. Grave,
Markus Schleuning,
Dennis Friedrich,
Frank M. F. de Groot,
Avner Rothschild
Abstract:
Resonant Inelastic X-Ray Scattering (RIXS) spectra of hematite were measured at the Fe L3-edge for heteroepitaxial thin films which were undoped and doped with 1% Ti, Sn or Zn, in the energy loss range in excess of 1 eV to study electronic transitions. The spectra were measured for several momentum transfers (q), conducted at both low temperature (T=14 K) and room temperature. While we cannot rule out dispersive features possibly owing to propagating excitations, the coarse envelopes of the general spectra did not appreciably change shape with q, implying that the bulk of the observed L-edge RIXS intensity originates from (mostly) non-dispersive ligand field (LF) excitations. Summing the RIXS spectra over q and comparing the results at T=14 K to those at T=300 K revealed pronounced temperature effects, including an intensity change and energy shift of the 1.4 eV peak, a broadband intensity increase in the 3-4 eV range, and higher energy features. The q-summed spectra and their temperature dependences are virtually identical for nearly all of the samples with different dopants, save for the temperature dependence of the Ti-doped sample's spectrum, which we attribute to being affected by a large number of free charge carriers. Comparing with magnetization measurements for different temperatures and dopings likewise did not show a clear correlation between the RIXS spectra and the magnetic ordering states. To clarify the excited states, we performed spin multiplet calculations which were in excellent agreement with the RIXS spectra over a wide energy range and provide detailed electronic descriptions of the excited states. The implications of these findings for the photoconversion efficiency of hematite photoanodes are discussed.
Submitted 15 February, 2022;
originally announced February 2022.
-
Recursive Multi-Section on the Fly: Shared-Memory Streaming Algorithms for Hierarchical Graph Partitioning and Process Mapping
Authors:
Marcelo Fonseca Faraj,
Christian Schulz
Abstract:
Partitioning a graph into balanced blocks such that few edges run between blocks is a key problem for large-scale distributed processing. A current trend for partitioning huge graphs is the use of streaming algorithms, which use low computational resources. In this work, we present a shared-memory streaming multi-recursive partitioning scheme that performs recursive multi-sections on the fly without knowing the overall input graph. Our approach has a considerably lower running time complexity in comparison with state-of-the-art non-buffered one-pass partitioning algorithms for the standard graph partitioning case. Moreover, if the topology of a distributed system is known, it is possible to further optimize the communication costs by mapping partitions onto processing elements. Our experiments indicate that our algorithm is both faster and produces better process mappings than competing tools. In the case of graph partitioning, our framework is up to two orders of magnitude faster at the cost of 5% more cut edges compared to Fennel.
Submitted 1 February, 2022;
originally announced February 2022.
-
Weakly Supervised Semantic Segmentation of Remote Sensing Images for Tree Species Classification Based on Explanation Methods
Authors:
Steve Ahlswede,
Nimisha Thekke-Madam,
Christian Schulz,
Birgit Kleinschmit,
Begüm Demir
Abstract:
The collection of a high number of pixel-based labeled training samples for tree species identification is time-consuming and costly in operational forestry applications. To address this problem, in this paper we investigate the effectiveness of explanation methods for deep neural networks in performing weakly supervised semantic segmentation using only image-level labels. Specifically, we consider four methods: i) class activation maps (CAM); ii) gradient-based CAM; iii) pixel correlation module; and iv) self-enhancing maps (SEM). We compare these methods with each other using both quantitative and qualitative measures of their segmentation accuracy, as well as their computational requirements. Experimental results obtained on an aerial image archive show that: i) the considered explanation techniques are highly relevant for the identification of tree species with weak supervision; and ii) the SEM outperforms the other considered methods. The code for this paper is publicly available at https://git.tu-berlin.de/rsim/rs_wsss.
Submitted 19 January, 2022;
originally announced January 2022.
-
A strong version of Cobham's theorem
Authors:
Philipp Hieronymi,
Chris Schulz
Abstract:
Let $k,\ell\geq 2$ be two multiplicatively independent integers. Cobham's famous theorem states that a set $X\subseteq \mathbb{N}$ is both $k$-recognizable and $\ell$-recognizable if and only if it is definable in Presburger arithmetic. Here we show the following strengthening: let $X\subseteq \mathbb{N}^m$ be $k$-recognizable, let $Y\subseteq \mathbb{N}^n$ be $\ell$-recognizable such that both $X$ and $Y$ are not definable in Presburger arithmetic. Then the first-order logical theory of $(\mathbb{N},+,X,Y)$ is undecidable. This is in contrast to a well-known theorem of Büchi that the first-order logical theory of $(\mathbb{N},+,X)$ is decidable.
Submitted 1 September, 2023; v1 submitted 22 October, 2021;
originally announced October 2021.
-
Generation and characterisation of isolated attosecond pulses at 100kHz repetition rate
Authors:
Tobias Witting,
Mikhail Osolodkov,
Felix Schell,
Felipe Morales,
Serguei Patchkovskii,
Peter Susnjar,
Fabio Cavalcante,
Carmen S. Menoni,
Claus P. Schulz,
Federico J. Furch,
Marc J. J. Vrakking
Abstract:
The generation of coherent light pulses in the extreme ultraviolet (XUV) spectral region with attosecond pulse durations constitutes the foundation of the field of attosecond science. Twenty years after the first demonstration of isolated attosecond pulses, they continue to be a unique tool enabling the observation and control of electron dynamics in atoms, molecules and solids. It has long been identified that an increase in the repetition rate of attosecond light sources is necessary for many applications in atomic and molecular physics, surface science, and imaging. Although high harmonic generation (HHG) at repetition rates exceeding 100 kHz, showing a continuum in the cut-off region of the XUV spectrum, was already demonstrated in 2013, the number of photons per pulse was insufficient to perform pulse characterisation via attosecond streaking, let alone to perform a pump-probe experiment. Here we report on the generation and full characterisation of XUV attosecond pulses via HHG driven by near-single-cycle pulses at a repetition rate of 100 kHz. The high number of $10^6$ XUV photons per pulse on target enables attosecond electron streaking experiments through which the XUV pulses are determined to consist of a dominant single attosecond pulse. These results open the door for attosecond pump-probe spectroscopy studies at a repetition rate one or two orders of magnitude above current implementations.
Submitted 26 August, 2021;
originally announced August 2021.
-
An MPI-based Algorithm for Mapping Complex Networks onto Hierarchical Architectures
Authors:
Maria Predari,
Charilaos Tzovas,
Christian Schulz,
Henning Meyerhenke
Abstract:
Processing massive application graphs on distributed memory systems requires mapping the graphs onto the system's processing elements (PEs). This task becomes all the more important when PEs have non-uniform communication costs or the input is highly irregular. Typically, mapping is addressed using partitioning, in a two-step approach or an integrated one. Parallel partitioning tools do exist; yet, corresponding mapping algorithms or their public implementations all have major sequential parts or other severe scaling limitations. In this paper, we propose a parallel algorithm that maps graphs onto the PEs of a hierarchical system. Our solution integrates partitioning and mapping; it models the system hierarchy in a concise way as an implicit labeled tree. The vertices of the application graph are labeled as well, and these vertex labels induce the mapping. The mapping optimization follows the basic idea of parallel label propagation, but we tailor the gain computations of label changes to quickly account for the induced communication costs. Our MPI-based code is the first public implementation of a parallel graph mapping algorithm; to this end, we extend the partitioning library ParHIP. To evaluate our algorithm's implementation, we perform comparative experiments with complex networks in the million- and billion-scale range. In general, our mapping tool shows good scalability on up to a few thousand PEs. Compared to other MPI-based competitors, our algorithm achieves the best speed-to-quality trade-off, and our quality results are even better than those of non-parallel mapping tools.
Submitted 6 July, 2021;
originally announced July 2021.
-
High-Quality Hypergraph Partitioning
Authors:
Sebastian Schlag,
Tobias Heuer,
Lars Gottesbüren,
Yaroslav Akhremtsev,
Christian Schulz,
Peter Sanders
Abstract:
This paper considers the balanced hypergraph partitioning problem, which asks for partitioning the vertices into $k$ disjoint blocks of bounded size while minimizing an objective function over the hyperedges. Here, we consider the most commonly used connectivity metric. We describe our open source hypergraph partitioner KaHyPar which is based on the successful multi-level approach -- driving it to the extreme of one level for (almost) every vertex. Using carefully designed data structures and dynamic update techniques, this approach offers a very good time-quality tradeoff. We present two preprocessing techniques -- pin sparsification using locality sensitive hashing and community detection based on the Louvain algorithm. The community structure is used to guide the coarsening process that incrementally contracts vertices. Portfolio-based partitioning of the contracted hypergraph already achieves good initial solutions. While reversing the contractions, a combination of highly-localized direct $k$-way local search and flow-based techniques that take a more global view refines the partition to achieve high quality. Optionally, a memetic algorithm evolves a pool of solution candidates to obtain even higher quality.
We evaluate KaHyPar on a large set of instances from a wide range of application domains. With respect to quality, KaHyPar outperforms all previously considered systems that can handle large hypergraphs such as hMETIS, PaToH, Mondriaan, or Zoltan. KaHyPar is also faster than most of these systems except for PaToH which represents a different speed-quality tradeoff. The results even extend to the special case of graph partitioning, where specialized systems such as KaHIP should have an advantage.
Submitted 16 June, 2021;
originally announced June 2021.
-
Deep Multilevel Graph Partitioning
Authors:
Lars Gottesbüren,
Tobias Heuer,
Peter Sanders,
Christian Schulz,
Daniel Seemaier
Abstract:
Partitioning a graph into blocks of "roughly equal" weight while cutting only few edges is a fundamental problem in computer science with a wide range of applications. In particular, the problem is a building block in applications that require parallel processing. While the number of available cores in parallel architectures has significantly increased in recent years, state-of-the-art graph partitioning algorithms do not work well if the input needs to be partitioned into a large number of blocks. Often currently available algorithms compute highly imbalanced solutions, solutions of low quality, or have excessive running time for this case. This is because most high-quality general-purpose graph partitioners are multilevel algorithms which perform graph coarsening to build a hierarchy of graphs, initial partitioning to compute an initial solution, and local improvement to improve the solution throughout the hierarchy. However, for a large number of blocks, the smallest graph in the hierarchy that is used for initial partitioning still has to be large.
In this work, we substantially mitigate these problems by introducing deep multilevel graph partitioning and a shared-memory implementation thereof. Our scheme continues the multilevel approach deep into initial partitioning -- integrating it into a framework where recursive bipartitioning and direct k-way partitioning are combined such that they can operate with high performance and quality. Our approach is stronger, more flexible, arguably more elegant, and reduces bottlenecks for parallelization compared to other multilevel approaches. For example, for a large number of blocks our algorithm is on average an order of magnitude faster than competing algorithms while computing balanced partitions with comparable solution quality. For a small number of blocks, our algorithms are the fastest among competing systems with comparable quality.
Submitted 5 May, 2021;
originally announced May 2021.
-
Fully-dynamic Weighted Matching Approximation in Practice
Authors:
Eugenio Angriman,
Henning Meyerhenke,
Christian Schulz,
Bora Uçar
Abstract:
Finding large or heavy matchings in graphs is a ubiquitous combinatorial optimization problem. In this paper, we engineer the first non-trivial implementations for approximating the dynamic weighted matching problem. Our first algorithm is based on random walks/paths combined with dynamic programming. The second algorithm has been introduced by Stubbs and Williams without an implementation. Roughly speaking, their algorithm uses dynamic unweighted matching algorithms as a subroutine (within a multilevel approach); this allows us to use previous work on dynamic unweighted matching algorithms as a black box in order to obtain a fully-dynamic weighted matching algorithm. We empirically study the algorithms on an extensive set of dynamic instances and compare them with optimal weighted matchings. Our experiments show that the random walk algorithm typically fares much better than Stubbs/Williams (regarding the time/quality tradeoff), and its results are often not far from the optimum.
Submitted 27 April, 2021;
originally announced April 2021.
-
Recent Advances in Fully Dynamic Graph Algorithms
Authors:
Kathrin Hanauer,
Monika Henzinger,
Christian Schulz
Abstract:
In recent years, significant advances have been made in the design and analysis of fully dynamic algorithms. However, these theoretical results have received very little attention from the practical perspective. Few of the algorithms are implemented and tested on real datasets, and their practical potential is far from understood. Here, we present a quick reference guide to recent engineering and theory results in the area of fully dynamic graph algorithms.
Submitted 17 November, 2022; v1 submitted 22 February, 2021;
originally announced February 2021.
-
Buffered Streaming Graph Partitioning
Authors:
Marcelo Fonseca Faraj,
Christian Schulz
Abstract:
Partitioning graphs into blocks of roughly equal size is widely used when processing large graphs. Currently there is a gap in the space of available partitioning algorithms. On the one hand, there are streaming algorithms that have been adopted to partition massive graph data on small machines. In the streaming model, vertices arrive one at a time including their neighborhood and then have to be assigned directly to a block. These algorithms can partition huge graphs quickly with little memory, but they produce partitions with low solution quality. On the other hand, there are offline (shared-memory) multilevel algorithms that produce partitions with high quality but also need a machine with enough memory. We make a first step to close this gap by presenting an algorithm that computes significantly improved partitions of huge graphs using a single machine with little memory in a streaming setting. First, we adopt the buffered streaming model which is a more reasonable approach in practice. In this model, a processing element can store a buffer, or batch, of nodes before making assignment decisions. When our algorithm receives a batch of nodes, we build a model graph that represents the nodes of the batch and the already present partition structure. This model enables us to apply multilevel algorithms and in turn compute much higher quality solutions of huge graphs on cheap machines than previously possible. To partition the model, we develop a multilevel algorithm that optimizes an objective function that has previously been shown to be effective for the streaming setting. This also removes the dependency on the number of blocks k from the running time compared to the previous state-of-the-art. Overall, our algorithm computes, on average, 75.9% better solutions than Fennel using a very small buffer size. In addition, for large values of k our algorithm becomes faster than Fennel.
Submitted 22 December, 2021; v1 submitted 18 February, 2021;
originally announced February 2021.
-
Decidability for Sturmian words
Authors:
Philipp Hieronymi,
Dun Ma,
Reed Oei,
Luke Schaeffer,
Christian Schulz,
Jeffrey Shallit
Abstract:
We show that the first-order theory of Sturmian words over Presburger arithmetic is decidable. Using a general adder recognizing addition in Ostrowski numeration systems by Baranwal, Schaeffer and Shallit, we prove that the first-order expansions of Presburger arithmetic by a single Sturmian word are uniformly $ω$-automatic, and then deduce the decidability of the theory of the class of such structures. Using an implementation of this decision algorithm called Pecan, we automatically reprove classical theorems about Sturmian words in seconds, and are able to obtain new results about antisquares and antipalindromes in characteristic Sturmian words.
Submitted 5 March, 2024; v1 submitted 16 February, 2021;
originally announced February 2021.
-
Pecan: An Automated Theorem Prover for Automatic Sequences using Büchi Automata
Authors:
Reed Oei,
Dun Ma,
Christian Schulz,
Philipp Hieronymi
Abstract:
Pecan is an automated theorem prover for reasoning about properties of Sturmian words, an important object in the field of combinatorics on words. It is capable of efficiently proving non-trivial mathematical theorems about all Sturmian words.
Submitted 2 February, 2021;
originally announced February 2021.
-
Practical Fully Dynamic Minimum Cut Algorithms
Authors:
Monika Henzinger,
Alexander Noe,
Christian Schulz
Abstract:
We present a practically efficient algorithm for maintaining a global minimum cut in large dynamic graphs under both edge insertions and deletions. While there has been theoretical work on this problem, our algorithm is the first implementation of a fully-dynamic algorithm. The algorithm uses the theoretical foundation and combines it with efficient and finely-tuned implementations to give an algorithm that can maintain the global minimum cut of a graph with rapid update times. We show that our algorithm gives up to multiple orders of magnitude speedup compared to static approaches both on edge insertions and deletions.
Submitted 13 January, 2021;
originally announced January 2021.
-
Recent Advances in Practical Data Reduction
Authors:
Faisal Abu-Khzam,
Sebastian Lamm,
Matthias Mnich,
Alexander Noe,
Christian Schulz,
Darren Strash
Abstract:
Over the last two decades, significant advances have been made in the design and analysis of fixed-parameter algorithms for a wide variety of graph-theoretic problems. This has resulted in an algorithmic toolbox that is by now well-established. However, these theoretical algorithmic ideas have received very little attention from the practical perspective. We survey recent trends in data reduction engineering results for selected problems. Moreover, we describe concrete techniques that may be useful for future implementations in the area and give open problems and research questions.
Submitted 31 December, 2020; v1 submitted 23 December, 2020;
originally announced December 2020.
-
The Future is Big Graphs! A Community View on Graph Processing Systems
Authors:
Sherif Sakr,
Angela Bonifati,
Hannes Voigt,
Alexandru Iosup,
Khaled Ammar,
Renzo Angles,
Walid Aref,
Marcelo Arenas,
Maciej Besta,
Peter A. Boncz,
Khuzaima Daudjee,
Emanuele Della Valle,
Stefania Dumbrava,
Olaf Hartig,
Bernhard Haslhofer,
Tim Hegeman,
Jan Hidders,
Katja Hose,
Adriana Iamnitchi,
Vasiliki Kalavri,
Hugo Kapp,
Wim Martens,
M. Tamer Özsu,
Eric Peukert,
Stefan Plantikow,
et al. (16 additional authors not shown)
Abstract:
Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue to succeed?
Submitted 11 December, 2020;
originally announced December 2020.
-
Biomedical Concept Relatedness -- A large EHR-based benchmark
Authors:
Claudia Schulz,
Josh Levy-Kramer,
Camille Van Assel,
Miklos Kepes,
Nils Hammerla
Abstract:
A promising application of AI to healthcare is the retrieval of information from electronic health records (EHRs), e.g. to aid clinicians in finding relevant information for a consultation or to recruit suitable patients for a study. This requires search capabilities far beyond simple string matching, including the retrieval of concepts (diagnoses, symptoms, medications, etc.) related to the one in question. The suitability of AI methods for such applications is tested by predicting the relatedness of concepts with known relatedness scores. However, all existing biomedical concept relatedness datasets are notoriously small and consist of hand-picked concept pairs. We open-source a novel concept relatedness benchmark overcoming these issues: it is six times larger than existing datasets and concept pairs are chosen based on co-occurrence in EHRs, ensuring their relevance for the application of interest. We present an in-depth analysis of our new dataset and compare it to existing ones, highlighting that it is not only larger but also complements existing datasets in terms of the types of concepts included. Initial experiments with state-of-the-art embedding methods show that our dataset is a challenging new benchmark for testing concept relatedness models.
Submitted 30 October, 2020;
originally announced October 2020.
-
O'Reach: Even Faster Reachability in Large Graphs
Authors:
Kathrin Hanauer,
Christian Schulz,
Jonathan Trummer
Abstract:
One of the most fundamental problems in computer science is the reachability problem: Given a directed graph and two vertices s and t, can s reach t via a path? We revisit existing techniques and combine them with new approaches to support a large portion of reachability queries in constant time using a linear-sized reachability index. Our new algorithm O'Reach can be easily combined with previously developed solutions for the problem or run standalone.
In a detailed experimental study, we compare a variety of algorithms with respect to their index-building and query times as well as their memory footprint on a diverse set of instances. Our experiments indicate that the query performance often depends strongly not only on the type of graph, but also on the result, i.e., reachable or unreachable. Furthermore, we show that previous algorithms are significantly sped up when combined with our new approach in almost all scenarios. Surprisingly, due to cache effects, a higher investment in space doesn't necessarily pay off: Reachability queries can often be answered even faster than single memory accesses in a precomputed full reachability matrix.
Submitted 1 February, 2021; v1 submitted 25 August, 2020;
originally announced August 2020.
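For readers unfamiliar with the reachability problem stated in the O'Reach abstract above, the following is a minimal sketch of the index-free baseline: answering a single query with a breadth-first search. It is not the O'Reach algorithm; the function name `can_reach` and the adjacency-dictionary representation are assumptions made only for illustration.

```python
from collections import deque


def can_reach(adj, s, t):
    """Return True if vertex t is reachable from vertex s.

    adj is a hypothetical adjacency dictionary mapping each vertex
    to an iterable of its out-neighbors in a directed graph.
    """
    if s == t:
        return True
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v == t:
                return True
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


# Small directed example graph: d -> a -> b -> c
graph = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
```

Each such query costs time linear in the size of the graph in the worst case, which is the cost that a precomputed reachability index is meant to avoid.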
-
Boosting Data Reduction for the Maximum Weight Independent Set Problem Using Increasing Transformations
Authors:
Alexander Gellner,
Sebastian Lamm,
Christian Schulz,
Darren Strash,
Bogdán Zaválnij
Abstract:
Given a vertex-weighted graph, the maximum weight independent set problem asks for a pair-wise non-adjacent set of vertices such that the sum of their weights is maximum. The branch-and-reduce paradigm is the de facto standard approach to solve the problem to optimality in practice. In this paradigm, data reduction rules are applied to decrease the problem size. These data reduction rules ensure that given an optimum solution on the new (smaller) input, one can quickly construct an optimum solution on the original input.
We introduce new generalized data reduction and transformation rules for the problem. A key feature of our work is that some transformation rules can increase the size of the input. Surprisingly, these so-called increasing transformations can simplify the problem and also open up the reduction space to yield even smaller irreducible graphs later throughout the algorithm. In experiments, our algorithm computes significantly smaller irreducible graphs on all except one instance, solves more instances to optimality than previously possible, is up to two orders of magnitude faster than the best state-of-the-art solver, and finds higher-quality solutions than heuristic solvers DynWVC and HILS on many instances. While the increasing transformations are only efficient enough for preprocessing at this time, we see this as a critical initial step towards a new branch-and-transform paradigm.
Submitted 13 August, 2020; v1 submitted 12 August, 2020;
originally announced August 2020.
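The problem definition in the abstract above can be made concrete with a small brute-force sketch. This is not the branch-and-reduce approach the paper describes; it only enumerates all vertex subsets, so it is usable on tiny graphs alone. The names `is_independent` and `max_weight_independent_set`, and the weight-dictionary and edge-list inputs, are assumptions chosen for illustration.

```python
from itertools import combinations


def is_independent(vertices, edges):
    """Check that no edge has both endpoints among the chosen vertices."""
    chosen = set(vertices)
    return not any(u in chosen and v in chosen for u, v in edges)


def max_weight_independent_set(weights, edges):
    """Exhaustively search all subsets; exponential time, tiny inputs only.

    weights: hypothetical dict mapping vertex -> non-negative weight
    edges:   list of (u, v) pairs of an undirected graph
    """
    best_set, best_weight = (), 0
    nodes = list(weights)
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            if is_independent(subset, edges):
                w = sum(weights[v] for v in subset)
                if w > best_weight:
                    best_set, best_weight = subset, w
    return set(best_set), best_weight


# Path a - b - c: the two endpoints together outweigh the middle vertex.
example_weights = {"a": 3, "b": 5, "c": 3}
example_edges = [("a", "b"), ("b", "c")]
solution, total = max_weight_independent_set(example_weights, example_edges)
```

On this path the search selects the two endpoints with total weight 6 rather than the single heaviest vertex, which illustrates why greedy selection by weight alone is not sufficient.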
-
Efficient Process-to-Node Mapping Algorithms for Stencil Computations
Authors:
Sascha Hunold,
Konrad von Kirchbach,
Markus Lehr,
Christian Schulz,
Jesper Larsson Träff
Abstract:
Good process-to-compute-node mappings can be decisive for well-performing HPC applications. A special, important class of process-to-node mapping problems is the problem of mapping processes that communicate in a sparse stencil pattern to Cartesian grids. By thoroughly exploiting the structure inherent in this type of problem, we devise three novel distributed algorithms that are able to handle arbitrary stencil communication patterns effectively. We analyze the expected performance of our algorithms based on an abstract model of inter- and intra-node communication. An extensive experimental evaluation on several HPC machines shows that our algorithms are up to two orders of magnitude faster in running time than a (sequential) high-quality general graph mapping tool, while obtaining similar results in communication performance. Furthermore, our algorithms also achieve significantly better mapping quality compared to previous state-of-the-art Cartesian grid mapping algorithms. This results in up to a threefold performance improvement of an MPI_Neighbor_alltoall exchange operation. Our new algorithms can be used to implement the MPI_Cart_create functionality.
Submitted 20 May, 2020; v1 submitted 19 May, 2020;
originally announced May 2020.
-
Faster Parallel Multiterminal Cuts
Authors:
Monika Henzinger,
Alexander Noe,
Christian Schulz
Abstract:
We give an improved branch-and-bound solver for the multiterminal cut problem, based on the recent work of Henzinger et al. We contribute new, highly effective data reduction rules to transform the graph into a smaller equivalent instance. In addition, we present a local search algorithm that can significantly improve a given solution to the multiterminal cut problem. Our exact algorithm is able to give exact solutions to more and harder problems compared to the state-of-the-art algorithm by Henzinger et al., and gives better solutions for more than two thirds of the problems that are too large to be solved to optimality. Additionally, we give an inexact heuristic algorithm that computes high-quality solutions for very hard instances in reasonable time.
Submitted 24 April, 2020;
originally announced April 2020.
-
Engineering Data Reduction for Nested Dissection
Authors:
Lara Ost,
Christian Schulz,
Darren Strash
Abstract:
Many applications rely on time-intensive matrix operations, such as factorization, which can be sped up significantly for large sparse matrices by interpreting the matrix as a sparse graph and computing a node ordering that minimizes the so-called fill-in. In this paper, we engineer new data reduction rules for the minimum fill-in problem, which significantly reduce the size of the graph while producing an equivalent (or near-equivalent) instance. By applying both new and existing data reduction rules exhaustively before nested dissection, we obtain improved quality and at the same time large improvements in running time on a variety of instances. Our overall algorithm outperforms the state-of-the-art significantly: it not only yields better elimination orders, but it does so significantly faster than previously possible. For example, on road networks, where nested dissection algorithms are typically used as a preprocessing step for shortest path computations, our algorithms are on average six times faster than Metis while computing orderings with less fill-in.
Submitted 23 April, 2020;
originally announced April 2020.
-
Dynamic Matching Algorithms in Practice
Authors:
Monika Henzinger,
Shahbaz Khan,
Richard Paul,
Christian Schulz
Abstract:
In recent years, significant advances have been made in the design and analysis of fully dynamic maximal matching algorithms. However, these theoretical results have received very little attention from the practical perspective. Few of the algorithms are implemented and tested on real datasets, and their practical potential is far from understood. In this paper, we attempt to bridge the gap between theory and practice that is currently observed for the fully dynamic maximal matching problem. We engineer several algorithms and empirically study those algorithms on an extensive set of dynamic instances.
Submitted 20 April, 2020;
originally announced April 2020.
-
Can Embeddings Adequately Represent Medical Terminology? New Large-Scale Medical Term Similarity Datasets Have the Answer!
Authors:
Claudia Schulz,
Damir Juric
Abstract:
A large number of embeddings trained on medical data have emerged, but it remains unclear how well they represent medical terminology, in particular whether the close relationship of semantically similar medical terms is encoded in these embeddings. To date, only small datasets for testing medical term similarity are available, which does not allow conclusions to be drawn about the generalisability of embeddings to the enormous number of medical terms used by doctors. We present multiple automatically created large-scale medical term similarity datasets and confirm their high quality in an annotation study with doctors. We evaluate state-of-the-art word and contextual embeddings on our new datasets, comparing multiple vector similarity metrics and word vector aggregation techniques. Our results show that current embeddings are limited in their ability to adequately encode medical terms. The novel datasets thus form a challenging new benchmark for the development of medical embeddings able to accurately represent the whole medical terminology.
Submitted 24 March, 2020;
originally announced March 2020.
-
Molecular interpretation of the non-Newtonian viscoelastic behavior of liquid water at high frequencies
Authors:
Julius C. F. Schulz,
Alexander Schlaich,
Matthias Heyden,
Roland R. Netz,
Julian Kappler
Abstract:
Using classical as well as ab-initio molecular dynamics simulations, we calculate the frequency-dependent shear viscosity of pure water and water-glycerol mixtures. In agreement with recent experiments, we find deviations from Newtonian-fluid behavior in the THz regime. Based on an extension of the Maxwell model, we introduce a viscoelastic model to describe the observed viscosity spectrum of pure water. We find four relaxation modes in the spectrum which we attribute to i) hydrogen-bond network topology changes, ii) hydrogen-bond stretch vibrations of water pairs, iii) collective vibrations of water molecule triplets, and iv) librational excitations of individual water molecules. Our model quantitatively describes the viscoelastic response of liquid water on short timescales, where the hydrodynamic description via a Newtonian-fluid model breaks down.
Submitted 18 March, 2020;
originally announced March 2020.