-
A TVD neural network closure and application to turbulent combustion
Authors:
Seung Won Suh,
Jonathan F MacArt,
Luke N Olson,
Jonathan B Freund
Abstract:
Trained neural networks (NN) have attractive features for closing governing equations, but in the absence of additional constraints, they can stray from physical reality. A NN formulation is introduced to preclude spurious oscillations that violate solution boundedness or positivity. It is embedded in the discretized equations as a machine learning closure and strictly constrained, inspired by total variation diminishing (TVD) methods for hyperbolic conservation laws. The constraint is exactly enforced during gradient-descent training by rescaling the NN parameters, which maps them onto an explicit feasible set. Demonstrations show that the constrained NN closure model usefully recovers linear and nonlinear hyperbolic phenomena and anti-diffusion while enforcing the non-oscillatory property. Finally, the model is applied to subgrid-scale (SGS) modeling of a turbulent reacting flow, for which it suppresses spurious oscillations in scalar fields that otherwise violate the solution boundedness. It outperforms a simple penalization of oscillations in the loss function.
Submitted 6 August, 2024;
originally announced August 2024.
-
Monolithic Multigrid Preconditioners for High-Order Discretizations of Stokes Equations
Authors:
Alexey Voronin,
Graham Harper,
Scott MacLachlan,
Luke N. Olson,
Raymond S. Tuminaro
Abstract:
This work introduces and assesses the efficiency of a monolithic $ph$MG multigrid framework designed for high-order discretizations of stationary Stokes systems using Taylor-Hood and Scott-Vogelius elements. The proposed approach integrates coarsening in both approximation order ($p$) and mesh resolution ($h$), to address the computational and memory efficiency challenges that are often encountered in conventional high-order numerical simulations. Our numerical results reveal that $ph$MG offers significant improvements over traditional spatial-coarsening-only multigrid ($h$MG) techniques for problems discretized with Taylor-Hood elements across a variety of problem sizes and discretization orders. In particular, the $ph$MG method exhibits superior performance in reducing setup and solve times, particularly when dealing with higher discretization orders and unstructured problem domains. For Scott-Vogelius discretizations, while monolithic $ph$MG delivers low iteration counts and competitive solve phase timings, it exhibits a discernibly slower setup phase when compared to a multilevel (non-monolithic) full-block-factorization (FBF) preconditioner where $ph$MG is employed only for the velocity unknowns. This is primarily due to the setup costs of the larger mixed-field relaxation patches with monolithic $ph$MG versus the patch setup costs with a single unknown type for FBF.
Submitted 9 July, 2024;
originally announced July 2024.
-
Monolithic Algebraic Multigrid Preconditioners for the Stokes Equations
Authors:
Alexey Voronin,
Scott MacLachlan,
Luke N. Olson,
Raymond Tuminaro
Abstract:
We investigate a novel monolithic algebraic multigrid (AMG) preconditioner for the Taylor-Hood ($\pmb{\mathbb{P}}_2/\mathbb{P}_1$) and Scott-Vogelius ($\pmb{\mathbb{P}}_2/\mathbb{P}_1^{disc}$) discretizations of the Stokes equations. The algorithm is based on the use of the lower-order $\pmb{\mathbb{P}}_1\text{iso}\kern1pt\pmb{\mathbb{P}}_2/\mathbb{P}_1$ operator within a defect-correction setting, in combination with AMG construction of interpolation operators for velocities and pressures. The preconditioning framework is primarily algebraic, though the $\pmb{\mathbb{P}}_1\text{iso}\kern1pt\pmb{\mathbb{P}}_2/\mathbb{P}_1$ operator must be provided. We investigate two relaxation strategies in this setting. Specifically, a novel block factorization approach is devised for Vanka patch systems, which significantly reduces storage requirements and computational overhead, and a Chebyshev adaptation of the LSC-DGS relaxation is developed to improve parallelism. The preconditioner demonstrates robust performance across a variety of 2D and 3D Stokes problems, often matching or exceeding the effectiveness of an inexact block-triangular (or Uzawa) preconditioner, especially in challenging scenarios such as elongated-domain problems.
Submitted 31 August, 2024; v1 submitted 11 June, 2023;
originally announced June 2023.
-
Optimized Sparse Matrix Operations for Reverse Mode Automatic Differentiation
Authors:
Nicolas Nytko,
Ali Taghibakhshi,
Tareq Uz Zaman,
Scott MacLachlan,
Luke N. Olson,
Matt West
Abstract:
Sparse matrix representations are ubiquitous in computational science and machine learning, leading to significant reductions in compute time, in comparison to dense representation, for problems that have local connectivity. The adoption of sparse representation in leading ML frameworks such as PyTorch is incomplete, however, with support for both automatic differentiation and GPU acceleration missing. In this work, we present an implementation of a CSR-based sparse matrix wrapper for PyTorch with CUDA acceleration for basic matrix operations, as well as automatic differentiability. We also present several applications of the resulting sparse kernels to optimization problems, demonstrating ease of implementation and performance measurements versus their dense counterparts.
Submitted 9 November, 2023; v1 submitted 9 December, 2022;
originally announced December 2022.
-
Characterizing the Performance of Node-Aware Strategies for Irregular Point-to-Point Communication on Heterogeneous Architectures
Authors:
Shelby Lockhart,
Amanda Bienz,
William D. Gropp,
Luke N. Olson
Abstract:
Supercomputer architectures are trending toward higher computational throughput due to the inclusion of heterogeneous compute nodes. These multi-GPU nodes increase on-node computational efficiency, while also increasing the amount of data to be communicated and the number of potential data flow paths. In this work, we characterize the performance of irregular point-to-point communication with MPI on heterogeneous compute environments through performance modeling, demonstrating the limitations of standard communication strategies for both device-aware and staging-through-host communication techniques. Presented models suggest staging communicated data through host processes then using node-aware communication strategies for high inter-node message counts. Notably, the models also predict that node-aware communication utilizing all available CPU cores to communicate inter-node data leads to the most performant strategy when communicating with a high number of nodes. Model validation is provided via a case study of irregular point-to-point communication patterns in distributed sparse matrix-vector products. Importantly, we include a discussion on the implications model predictions have on communication strategy design for emerging supercomputer architectures.
Submitted 13 September, 2022;
originally announced September 2022.
-
On Computing Coercivity Constants in Linear Variational Problems Through Eigenvalue Analysis
Authors:
Peter Sentz,
Jehanzeb Hameed Chaudhry,
Luke N. Olson
Abstract:
In this work, we investigate the convergence of numerical approximations to coercivity constants of variational problems. These constants are essential components of rigorous error bounds for reduced-order modeling; extension of these bounds to the error with respect to exact solutions requires an understanding of convergence rates for discrete coercivity constants. The results are obtained by characterizing the coercivity constant as a spectral value of a self-adjoint linear operator; for several differential equations, we show that the coercivity constant is related to the eigenvalue of a compact operator. For these applications, convergence rates are derived and verified with numerical examples.
Submitted 23 May, 2022;
originally announced May 2022.
-
Reduced Basis Approximations of Parameterized Dynamical Partial Differential Equations via Neural Networks
Authors:
Peter Sentz,
Kristian Beckwith,
Eric C. Cyr,
Luke N. Olson,
Ravi Patel
Abstract:
Projection-based reduced order models are effective at approximating parameter-dependent differential equations that are parametrically separable. When parametric separability is not satisfied, which occurs in both linear and nonlinear problems, projection-based methods fail to adequately reduce the computational complexity. Devising alternative reduced order models is crucial for obtaining efficient and accurate approximations to expensive high-fidelity models. In this work, we develop a time-stepping procedure for dynamical parameter-dependent problems, in which a neural-network is trained to propagate the coefficients of a reduced basis expansion. This results in an online stage with a computational cost independent of the size of the underlying problem. We demonstrate our method on several parabolic partial differential equations, including a problem that is not parametrically separable.
Submitted 20 October, 2021;
originally announced October 2021.
-
Performance of Low Synchronization Orthogonalization Methods in Anderson Accelerated Fixed Point Solvers
Authors:
Shelby Lockhart,
David J. Gardner,
Carol S. Woodward,
Stephen Thomas,
Luke N. Olson
Abstract:
Anderson Acceleration (AA) is a method to accelerate the convergence of fixed point iterations for nonlinear, algebraic systems of equations. Due to the requirement of solving a least squares problem at each iteration and a reliance on modified Gram-Schmidt for updating the iteration space, AA requires extra costly synchronization steps for global reductions. Moreover, the number of reductions in each iteration depends on the size of the iteration space. In this work, we introduce three low synchronization orthogonalization algorithms into AA within SUNDIALS that reduce the total number of global reductions per iteration to a constant of 2 or 3, independent of the size of the iteration space. A performance study demonstrates the reduced time required by the new algorithms at large processor counts with CPUs and demonstrates the predicted performance on multi-GPU architectures. Most importantly, we provide convergence and timing data for multiple numerical experiments to demonstrate reliability of the algorithms within AA and improved performance at parallel strong-scaling limits.
Submitted 18 October, 2021;
originally announced October 2021.
-
Coarse-Grid Selection Using Simulated Annealing
Authors:
Tareq U. Zaman,
Scott P. MacLachlan,
Luke N. Olson,
Matt West
Abstract:
Multilevel techniques are efficient approaches for solving the large linear systems that arise from discretized partial differential equations and other problems. While geometric multigrid requires detailed knowledge about the underlying problem and its discretization, algebraic multigrid aims to be less intrusive, requiring less knowledge about the origin of the linear system. A key step in algebraic multigrid is the choice of the coarse/fine partitioning, aiming to balance the convergence of the iteration with its cost. In work by MacLachlan and Saad, a constrained combinatorial optimization problem is used to define the ``best'' coarse grid within the setting of a two-level reduction-based algebraic multigrid method and is shown to be NP-complete. Here, we develop a new coarsening algorithm based on simulated annealing to approximate solutions to this problem, which yields improved results over the greedy algorithm developed previously. We present numerical results for test problems on both structured and unstructured meshes, demonstrating the ability to exploit knowledge about the underlying grid structure if it is available.
Submitted 19 January, 2023; v1 submitted 27 May, 2021;
originally announced May 2021.
-
Low-order preconditioning of the Stokes equations
Authors:
Alexey Voronin,
Yunhui He,
Scott MacLachlan,
Luke N. Olson,
Ray Tuminaro
Abstract:
A well-known strategy for building effective preconditioners for higher-order discretizations of some PDEs, such as Poisson's equation, is to leverage effective preconditioners for their low-order analogs. In this work, we show that high-quality preconditioners can also be derived for the Taylor-Hood discretization of the Stokes equations in much the same manner. In particular, we investigate the use of geometric multigrid based on the $\boldsymbol{\mathbb{Q}}_1\text{iso}\,\boldsymbol{\mathbb{Q}}_2/\mathbb{Q}_1$ discretization of the Stokes operator as a preconditioner for the $\boldsymbol{\mathbb{Q}}_2/\mathbb{Q}_1$ discretization of the Stokes system. We utilize local Fourier analysis to optimize the damping parameters for Vanka and Braess-Sarazin relaxation schemes and to achieve robust convergence. These results are then verified and compared against the measured multigrid performance. While geometric multigrid can be applied directly to the $\boldsymbol{\mathbb{Q}}_2/\mathbb{Q}_1$ system, our ultimate motivation is to apply algebraic multigrid within solvers for $\boldsymbol{\mathbb{Q}}_2/\mathbb{Q}_1$ systems via the $\boldsymbol{\mathbb{Q}}_1\text{iso}\,\boldsymbol{\mathbb{Q}}_2/\mathbb{Q}_1$ discretization, which will be considered in a companion paper.
Submitted 21 April, 2021; v1 submitted 22 March, 2021;
originally announced March 2021.
-
Modeling Data Movement Performance on Heterogeneous Architectures
Authors:
Amanda Bienz,
Luke N. Olson,
William D. Gropp,
Shelby Lockhart
Abstract:
The cost of data movement on parallel systems varies greatly with machine architecture, job partition, and nearby jobs. Performance models that accurately capture the cost of data movement provide a tool for analysis, allowing for communication bottlenecks to be pinpointed. Modern heterogeneous architectures yield increased variance in data movement as there are a number of viable paths for inter-GPU communication. In this paper, we present performance models for the various paths of inter-node communication on modern heterogeneous architectures, including the trade-off between GPUDirect communication and copying to CPUs. Furthermore, we present a novel optimization for inter-node communication based on these models, utilizing all available CPU cores per node. Finally, we show associated performance improvements for MPI collective operations.
Submitted 16 July, 2021; v1 submitted 20 October, 2020;
originally announced October 2020.
-
A Least-Squares Finite Element Reduced Basis Method
Authors:
Jehanzeb Hameed Chaudhry,
Luke N. Olson,
Peter Sentz
Abstract:
We present a reduced basis (RB) method for parametrized linear elliptic partial differential equations (PDEs) in a least-squares finite element framework. A rigorous and reliable error estimate is developed, and is shown to bound the error with respect to the exact solution of the PDE, in contrast to estimates that measure error with respect to a finite-dimensional (high-fidelity) approximation. It is shown that the first-order formulation of the least-squares finite element is a key ingredient. The method is demonstrated using numerical examples.
Submitted 23 September, 2020; v1 submitted 10 March, 2020;
originally announced March 2020.
-
Node-Aware Improvements to Allreduce
Authors:
Amanda Bienz,
Luke N. Olson,
William D. Gropp
Abstract:
The \texttt{MPI\_Allreduce} collective operation is a core kernel of many parallel codebases, particularly for reductions over a single value per process. The commonly used allreduce recursive-doubling algorithm obtains the lower bound message count, yielding optimality for small reduction sizes based on node-agnostic performance models. However, this algorithm yields duplicate messages between sets of nodes. Node-aware optimizations in MPICH remove duplicate messages through use of a single master process per node, yielding a large number of inactive processes at each inter-node step. In this paper, we present an algorithm that uses the multiple processes available per node to reduce the maximum number of inter-node messages communicated by a single process, improving the performance of allreduce operations, particularly for small message sizes.
Submitted 21 October, 2019;
originally announced October 2019.
-
Improving Performance Models for Irregular Point-to-Point Communication
Authors:
Amanda Bienz,
William D. Gropp,
Luke N. Olson
Abstract:
Parallel applications are often unable to take full advantage of emerging parallel architectures due to scaling limitations, which arise due to inter-process communication. Performance models are used to analyze the sources of communication costs. However, traditional models for point-to-point communication fail to capture the full cost of many irregular operations, such as sparse matrix methods. In this paper, a node-aware based model is presented. Furthermore, the model is extended to include communication queue search time as well as an additional parameter estimating network contention. The resulting model is applied to a variety of irregular communication patterns throughout matrix operations, displaying improved accuracy over traditional models.
Submitted 6 June, 2018;
originally announced June 2018.
-
High-order Finite Element--Integral Equation Coupling on Embedded Meshes
Authors:
Natalie N. Beams,
Andreas Klöckner,
Luke N. Olson
Abstract:
This paper presents a high-order method for solving an interface problem for the Poisson equation on embedded meshes through a coupled finite element and integral equation approach. The method is capable of handling homogeneous or inhomogeneous jump conditions without modification and retains high-order convergence close to the embedded interface. We present finite element-integral equation (FE-IE) formulations for interior, exterior, and interface problems. The treatments of the exterior and interface problems are new. The resulting linear systems are solved through an iterative approach exploiting the second-kind nature of the IE operator combined with algebraic multigrid preconditioning for the FE part. Assuming smooth continuations of coefficients and right-hand-side data, we show error analysis supporting high-order accuracy. Numerical evidence further supports our claims of efficiency and high-order accuracy for smooth data.
Submitted 16 August, 2018; v1 submitted 8 April, 2018;
originally announced April 2018.
-
Scaling Structured Multigrid to 500K+ Cores through Coarse-Grid Redistribution
Authors:
Andrew Reisner,
Luke N. Olson,
J. David Moulton
Abstract:
The efficient solution of sparse, linear systems resulting from the discretization of partial differential equations is crucial to the performance of many physics-based simulations. The algorithmic optimality of multilevel approaches for common discretizations makes them a good candidate for an efficient parallel solver. Yet, modern architectures for high-performance computing systems continue to challenge the parallel scalability of multilevel solvers. While algebraic multigrid methods are robust for solving a variety of problems, the increasing importance of data locality and cost of data movement in modern architectures motivates the need to carefully exploit structure in the problem.
Robust logically structured variational multigrid methods, such as Black Box Multigrid (BoxMG), maintain structure throughout the multigrid hierarchy. This avoids indirection and increased coarse-grid communication costs typical in parallel algebraic multigrid. Nevertheless, the parallel scalability of structured multigrid is challenged by coarse-grid problems where the overhead in communication dominates computation. In this paper, an algorithm is introduced for redistributing coarse-grid problems through incremental agglomeration. Guided by a predictive performance model, this algorithm provides robust redistribution decisions for structured multilevel solvers.
A two-dimensional diffusion problem is used to demonstrate the significant gain in performance of this algorithm over the previous approach that used agglomeration to one processor. In addition, the parallel scalability of this approach is demonstrated on two large-scale computing systems, with solves on up to 500K+ cores.
Submitted 6 March, 2018;
originally announced March 2018.
-
Node Aware Sparse Matrix-Vector Multiplication
Authors:
Amanda Bienz,
William D. Gropp,
Luke N. Olson
Abstract:
The sparse matrix-vector multiply (SpMV) operation is a key computational kernel in many simulations and linear solvers. The large communication requirements associated with a reference implementation of a parallel SpMV result in poor parallel scalability. The cost of communication depends on the physical locations of the send and receive processes: messages injected into the network are more costly than messages sent between processes on the same node. In this paper, a node aware parallel SpMV (NAPSpMV) is introduced to exploit knowledge of the system topology, specifically the node-processor layout, to reduce costs associated with communication. The values of the input vector are redistributed to minimize both the number and the size of messages that are injected into the network during a SpMV, leading to a reduction in communication costs. A variety of computational experiments that highlight the efficiency of this approach are presented.
Submitted 15 November, 2017; v1 submitted 23 December, 2016;
originally announced December 2016.
-
A Root-Node Based Algebraic Multigrid Method
Authors:
Thomas A. Manteuffel,
Luke N. Olson,
Jacob B. Schroder,
Ben S. Southworth
Abstract:
This paper provides a unified and detailed presentation of root-node style algebraic multigrid (AMG). Algebraic multigrid is a popular and effective iterative method for solving large, sparse linear systems that arise from discretizing partial differential equations. However, while AMG is designed for symmetric positive definite matrices (SPD), certain SPD problems, such as anisotropic diffusion, are still not adequately addressed by existing methods. Non-SPD problems pose an even greater challenge, and in practice AMG is often not considered as a solver for such problems.
The focus of this paper is on so-called root-node AMG, which can be viewed as a combination of classical and aggregation-based multigrid. An algorithm for root-node is outlined and a filtering strategy is developed, which is able to control the cost of using root-node AMG, particularly on difficult problems. New theoretical motivation is provided for root-node and energy-minimization as applied to symmetric as well as non-symmetric systems. Numerical results are then presented demonstrating the robust ability of root-node to solve non-symmetric problems, systems-based problems, and difficult SPD problems, including strongly anisotropic diffusion, convection-diffusion, and upwind steady-state transport, in a scalable manner. New, detailed estimates of the computational cost of the setup and solve phase are given for each example, providing additional support for root-node AMG over alternative methods.
Submitted 28 January, 2018; v1 submitted 10 October, 2016;
originally announced October 2016.
-
Reducing Parallel Communication in Algebraic Multigrid through Sparsification
Authors:
Amanda Bienz,
Robert D. Falgout,
William Gropp,
Luke N. Olson,
Jacob B. Schroder
Abstract:
Algebraic multigrid (AMG) is an $\mathcal{O}(n)$ solution process for many large sparse linear systems. A hierarchy of progressively coarser grids is constructed that utilizes complementary relaxation and interpolation operators. High-energy error is reduced by relaxation, while low-energy error is mapped to coarse grids and reduced there. However, large parallel communication costs often limit parallel scalability. As the multigrid hierarchy is formed, each coarse matrix is formed through a triple matrix product. The resulting coarse grids often have significantly more nonzeros per row than the original fine-grid operator, thereby generating high parallel communication costs on coarse levels. In this paper, we introduce a method that systematically removes entries in coarse-grid matrices after the hierarchy is formed, leading to reduced communication costs. We sparsify by removing weakly connected or unimportant entries in the matrix, leading to improved solve time. The main trade-off is that if the heuristic identifying unimportant entries is used too aggressively, then AMG convergence can suffer. To counteract this, the original hierarchy is retained, allowing entries to be reintroduced into the solver hierarchy if convergence is too slow. This enables a balance between communication cost and convergence, as necessary. In this paper we present new algorithms for reducing communication and present a number of computational experiments in support.
Submitted 14 December, 2015;
originally announced December 2015.
-
A Finite Element Based P3M Method for N-body Problems
Authors:
Natalie N. Beams,
Luke N. Olson,
Jonathan B. Freund
Abstract:
We introduce a fast mesh-based method for computing N-body interactions that is both scalable and accurate. The method is founded on a particle-particle--particle-mesh (P3M) approach, which decomposes a potential into rapidly decaying short-range interactions and smooth, mesh-resolvable long-range interactions. However, in contrast to the traditional approach of using Gaussian screen functions to accomplish this decomposition, our method employs specially designed polynomial bases to construct the screened potentials. Because of this form of the screen, the long-range component of the potential is then solved exactly with a finite element method, leading ultimately to a sparse matrix problem that is solved efficiently with standard multigrid methods. Moreover, since this system represents an exact discretization, the optimal resolution properties of the FFT are unnecessary, though the short-range calculation is now more involved than P3M/PME methods. We introduce the method, analyze its key properties, and demonstrate the accuracy of the algorithm.
Submitted 29 March, 2015;
originally announced March 2015.