-
A matrix-free ILU realization based on surrogates
Authors:
Daniel Drzisga,
Andreas Wagner,
Barbara Wohlmuth
Abstract:
Matrix-free techniques play an increasingly important role in large-scale simulations. Schur complement techniques and massively parallel multigrid solvers for second-order elliptic partial differential equations can significantly benefit from reduced memory traffic and consumption. The matrix-free approach often restricts solver components to purely local operations, for instance, the Jacobi- or…
▽ More
Matrix-free techniques play an increasingly important role in large-scale simulations. Schur complement techniques and massively parallel multigrid solvers for second-order elliptic partial differential equations can significantly benefit from reduced memory traffic and consumption. The matrix-free approach often restricts solver components to purely local operations, for instance, the Jacobi- or Gauss--Seidel-Smoothers in multigrid methods. An incomplete LU (ILU) decomposition cannot be calculated from local information and is therefore not amenable to an on-the-fly computation which is typically needed for matrix-free calculations. It generally requires the storage and factorization of a sparse matrix which contradicts the low memory requirements in large scale scenarios. In this work, we propose a matrix-free ILU realization. More precisely, we introduce a memory-efficient, matrix-free ILU(0)-Smoother component for low-order conforming finite elements on tetrahedral hybrid grids. Hybrid grids consist of an unstructured macro-mesh which is subdivided into a structured micro-mesh. The ILU(0) is used for degrees-of-freedom assigned to the interior of macro-tetrahedra. This ILU(0)-Smoother can be used for the efficient matrix-free evaluation of the Steklov-Poincare operator from domain-decomposition methods. After introducing and formally defining our smoother, we investigate its performance on refined macro-tetrahedra. Secondly, the ILU(0)-Smoother on the macro-tetrahedrons is implemented via surrogate matrix polynomials in conjunction with a fast on-the-fly evaluation scheme resulting in an efficient matrix-free algorithm. The polynomial coefficients are obtained by solving a least-squares problem on a small part of the factorized ILU(0) matrices to stay memory efficient. The convergence rates of this smoother with respect to the polynomial order are thoroughly studied.
△ Less
Submitted 27 October, 2022;
originally announced October 2022.
-
Semi matrix-free twogrid shifted Laplacian preconditioner for the Helmholtz equation with near optimal shifts
Authors:
Daniel Drzisga,
Tobias Köppl,
Barbara Wohlmuth
Abstract:
Due to its significance in terms of wave phenomena a considerable effort has been put into the design of preconditioners for the Helmholtz equation. One option to derive a preconditioner is to apply a multigrid method on a shifted operator. In such an approach, the wavenumber is shifted by some imaginary value. This step is motivated by the observation that the shifted problem can be more efficien…
▽ More
Due to its significance in terms of wave phenomena a considerable effort has been put into the design of preconditioners for the Helmholtz equation. One option to derive a preconditioner is to apply a multigrid method on a shifted operator. In such an approach, the wavenumber is shifted by some imaginary value. This step is motivated by the observation that the shifted problem can be more efficiently handled by iterative solvers when compared to the standard Helmholtz equation. However, up to now, it is not obvious what the best strategy for the choice of the shift parameter is. It is well known that a good shift parameter depends sensitively on the wavenumber and the discretization parameters such as the order and the mesh size. Therefore, we study the choice of a near optimal complex shift such that an FGMRES solver converges with fewer iterations. Our goal is to provide a map which returns the near optimal shift for the preconditioner depending on the wavenumber and the mesh size. In order to compute this map, a data driven approach is considered: We first generate many samples, and in a second step, we perform a nonlinear regression on this data. With this representative map, the near optimal shift can be obtained by a simple evaluation. Our preconditioner is based on a twogrid V-cycle applied to the shifted problem, allowing us to implement a semi matrix-free method. The performance of our preconditioned FGMRES solver is illustrated by several benchmark problems with heterogeneous wavenumbers in two and three space dimensions.
△ Less
Submitted 3 April, 2021;
originally announced April 2021.
-
Resiliency in Numerical Algorithm Design for Extreme Scale Simulations
Authors:
Emmanuel Agullo,
Mirco Altenbernd,
Hartwig Anzt,
Leonardo Bautista-Gomez,
Tommaso Benacchio,
Luca Bonaventura,
Hans-Joachim Bungartz,
Sanjay Chatterjee,
Florina M. Ciorba,
Nathan DeBardeleben,
Daniel Drzisga,
Sebastian Eibl,
Christian Engelmann,
Wilfried N. Gansterer,
Luc Giraud,
Dominik Goeddeke,
Marco Heisig,
Fabienne Jezequel,
Nils Kohl,
Xiaoye Sherry Li,
Romain Lion,
Miriam Mehl,
Paul Mycek,
Michael Obersteiner,
Enrique S. Quintana-Orti
, et al. (11 additional authors not shown)
Abstract:
This work is based on the seminar titled ``Resiliency in Numerical Algorithm Design for Extreme Scale Simulations'' held March 1-6, 2020 at Schloss Dagstuhl, that was attended by all the authors.
Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to backgr…
▽ More
This work is based on the seminar titled ``Resiliency in Numerical Algorithm Design for Extreme Scale Simulations'' held March 1-6, 2020 at Schloss Dagstuhl, that was attended by all the authors.
Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated.
More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements? One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in the case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors.
The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge.
△ Less
Submitted 26 October, 2020;
originally announced October 2020.
-
The surrogate matrix methodology: Accelerating isogeometric analysis of waves
Authors:
Daniel Drzisga,
Brendan Keith,
Barbara Wohlmuth
Abstract:
The surrogate matrix methodology delivers low-cost approximations of matrices (i.e., surrogate matrices) which are normally computed in Galerkin methods via element-scale quadrature formulas. In this paper, the methodology is applied to a number of model problems in wave mechanics treated in the Galerkin isogeometic setting. Herein, the resulting surrogate methods are shown to significantly reduce…
▽ More
The surrogate matrix methodology delivers low-cost approximations of matrices (i.e., surrogate matrices) which are normally computed in Galerkin methods via element-scale quadrature formulas. In this paper, the methodology is applied to a number of model problems in wave mechanics treated in the Galerkin isogeometic setting. Herein, the resulting surrogate methods are shown to significantly reduce the assembly time in high frequency wave propagation problems. In particular, the assembly time is reduced with negligible loss in solution accuracy. This paper also extends the scope of previous articles in its series by considering multi-patch discretizations of time-harmonic, transient, and nonlinear PDEs as particular use cases of the methodology. Our a priori error analysis for the Helmholtz equation demonstrates that the additional consistency error introduced by the presence of surrogate matrices is independent of the wave number. In addition, our floating point analysis establishes that the computational complexity of the methodology compares favorably to other contemporary fast assembly techniques for isogeometric methods. Our numerical experiments demonstrate clear performance gains for time-harmonic problems, both with and without the presence of perfectly matched layers. Notable speed-ups are also presented for a transient problem with a compressible neo-Hookean material.
△ Less
Submitted 1 July, 2020; v1 submitted 10 April, 2020;
originally announced April 2020.
-
The surrogate matrix methodology: A reference implementation for low-cost assembly in isogeometric analysis
Authors:
Daniel Drzisga,
Brendan Keith,
Barbara Wohlmuth
Abstract:
A reference implementation of a new method in isogeometric analysis (IGA) is presented. It delivers low-cost variable-scale approximations (surrogates) of the matrices which IGA conventionally requires to be computed by element-scale quadrature. To generate surrogate matrices, quadrature must only be performed on a fraction of the elements in the computational domain. In this way, quadrature deter…
▽ More
A reference implementation of a new method in isogeometric analysis (IGA) is presented. It delivers low-cost variable-scale approximations (surrogates) of the matrices which IGA conventionally requires to be computed by element-scale quadrature. To generate surrogate matrices, quadrature must only be performed on a fraction of the elements in the computational domain. In this way, quadrature determines only a subset of the entries in the final matrix. The remaining matrix entries are computed by a simple B-spline interpolation procedure. We present the modifications and extensions required for a reference implementation in the open-source IGA software library GeoPDEs. The exposition is fashioned to help facilitate similar modifications in other contemporary software libraries.
△ Less
Submitted 8 September, 2019;
originally announced September 2019.
-
Stencil scaling for vector-valued PDEs on hybrid grids with applications to generalized Newtonian fluids
Authors:
Daniel Drzisga,
Ulrich Rüde,
Barbara Wohlmuth
Abstract:
Matrix-free finite element implementations for large applications provide an attractive alternative to standard sparse matrix data formats due to the significantly reduced memory consumption. Here, we show that they are also competitive with respect to the run time in the low order case if combined with suitable stencil scaling techniques. We focus on variable coefficient vector-valued partial dif…
▽ More
Matrix-free finite element implementations for large applications provide an attractive alternative to standard sparse matrix data formats due to the significantly reduced memory consumption. Here, we show that they are also competitive with respect to the run time in the low order case if combined with suitable stencil scaling techniques. We focus on variable coefficient vector-valued partial differential equations as they arise in many physical applications. The presented method is based on scaling constant reference stencils originating from a linear finite element discretization instead of evaluating the bilinear forms on-the-fly. This method assumes the usage of hierarchical hybrid grids, and it may be applied to vector-valued second-order elliptic partial differential equations directly or as a part of more complicated problems. We provide theoretical and experimental performance estimates showing the advantages of this new approach compared to the traditional on-the-fly integration and stored matrix approaches. In our numerical experiments, we consider two specific mathematical models. Namely, linear elastostatics and incompressible Stokes flow. The final example considers a non-linear shear-thinning generalized Newtonian fluid. For this type of non-linearity, we present an efficient approach to compute a regularized strain rate which is then used to define the node-wise viscosity. Depending on the compute architecture, we could observe maximum speedups of 64% and 122% compared to the on-the-fly integration. The largest considered example involved solving a Stokes problem with 12288 compute cores on the state of the art supercomputer SuperMUC-NG.
△ Less
Submitted 18 March, 2020; v1 submitted 23 August, 2019;
originally announced August 2019.
-
The surrogate matrix methodology: Low-cost assembly for isogeometric analysis
Authors:
Daniel Drzisga,
Brendan Keith,
Barbara Wohlmuth
Abstract:
A new methodology in isogeometric analysis (IGA) is presented. This methodology delivers low-cost variable-scale approximations (surrogates) of the matrices which IGA conventionally requires to be computed from element-scale quadrature formulas. To generate surrogate matrices, quadrature must only be performed on certain elements in the computational domain. This, in turn, determines only a subset…
▽ More
A new methodology in isogeometric analysis (IGA) is presented. This methodology delivers low-cost variable-scale approximations (surrogates) of the matrices which IGA conventionally requires to be computed from element-scale quadrature formulas. To generate surrogate matrices, quadrature must only be performed on certain elements in the computational domain. This, in turn, determines only a subset of the entries in the final matrix. The remaining matrix entries are computed by a simple B-spline interpolation procedure. Poisson's equation, membrane vibration, plate bending, and Stokes' flow problems are studied. In these problems, the use of surrogate matrices has a negligible impact on solution accuracy. Because only a small fraction of the original quadrature must be performed, we are able to report beyond a fifty-fold reduction in overall assembly time in the same software. The capacity for even further speed-ups is clearly demonstrated. The implementation used here was achieved by a small number of modifications to the open-source IGA software library GeoPDEs. Similar modifications could be made to other present-day software libraries.
△ Less
Submitted 17 September, 2019; v1 submitted 15 April, 2019;
originally announced April 2019.
-
The surrogate matrix methodology: a priori error estimation
Authors:
Daniel Drzisga,
Brendan Keith,
Barbara Wohlmuth
Abstract:
We give the first mathematically rigorous analysis of an emerging approach to finite element analysis (see, e.g., Bauer et al. [Appl. Numer. Math., 2017]), which we hereby refer to as the surrogate matrix methodology. This methodology is based on the piece-wise smooth approximation of the matrices involved in a standard finite element discretization. In particular, it relies on the projection of s…
▽ More
We give the first mathematically rigorous analysis of an emerging approach to finite element analysis (see, e.g., Bauer et al. [Appl. Numer. Math., 2017]), which we hereby refer to as the surrogate matrix methodology. This methodology is based on the piece-wise smooth approximation of the matrices involved in a standard finite element discretization. In particular, it relies on the projection of smooth so-called stencil functions onto high-order polynomial subspaces. The performance advantage of the surrogate matrix methodology is seen in constructions where each stencil function uniquely determines the values of a significant collection of matrix entries. Such constructions are shown to be widely achievable through the use of locally-structured meshes. Therefore, this methodology can be applied to a wide variety of physically meaningful problems, including nonlinear problems and problems with curvilinear geometries. Rigorous a priori error analysis certifies the convergence of a novel surrogate method for the variable coefficient Poisson equation. The flexibility of the methodology is also demonstrated through the construction of novel methods for linear elasticity and nonlinear diffusion problems. In numerous numerical experiments, we demonstrate the efficacy of these new methods in a matrix-free environment with geometric multigrid solvers. In our experiments, up to a twenty-fold decrease in computation time is witnessed over the classical method with an otherwise identical implementation.
△ Less
Submitted 19 February, 2019;
originally announced February 2019.
-
A Scalable and Modular Software Architecture for Finite Elements on Hierarchical Hybrid Grids
Authors:
Nils Kohl,
Dominik Thönnes,
Daniel Drzisga,
Dominik Bartuschat,
Ulrich Rüde
Abstract:
In this article, a new generic higher-order finite-element framework for massively parallel simulations is presented. The modular software architecture is carefully designed to exploit the resources of modern and future supercomputers. Combining an unstructured topology with structured grid refinement facilitates high geometric adaptability and matrix-free multigrid implementations with excellent…
▽ More
In this article, a new generic higher-order finite-element framework for massively parallel simulations is presented. The modular software architecture is carefully designed to exploit the resources of modern and future supercomputers. Combining an unstructured topology with structured grid refinement facilitates high geometric adaptability and matrix-free multigrid implementations with excellent performance. Different abstraction levels and fully distributed data structures additionally ensure high flexibility, extensibility, and scalability. The software concepts support sophisticated load balancing and flexibly combining finite element spaces. Example scenarios with coupled systems of PDEs show the applicability of the concepts to performing geophysical simulations.
△ Less
Submitted 25 May, 2018;
originally announced May 2018.
-
A stencil scaling approach for accelerating matrix-free finite element implementations
Authors:
Simon Bauer,
Daniel Drzisga,
Marcus Mohr,
Ulrich Ruede,
Christian Waluga,
Barbara Wohlmuth
Abstract:
We present a novel approach to fast on-the-fly low order finite element assembly for scalar elliptic partial differential equations of Darcy type with variable coefficients optimized for matrix-free implementations. Our approach introduces a new operator that is obtained by appropriately scaling the reference stiffness matrix from the constant coefficient case. Assuming sufficient regularity, an a…
▽ More
We present a novel approach to fast on-the-fly low order finite element assembly for scalar elliptic partial differential equations of Darcy type with variable coefficients optimized for matrix-free implementations. Our approach introduces a new operator that is obtained by appropriately scaling the reference stiffness matrix from the constant coefficient case. Assuming sufficient regularity, an a priori analysis shows that solutions obtained by this approach are unique and have asymptotically optimal order convergence in the $H^1$- and the $L^2$-norm on hierarchical hybrid grids. For the pre-asymptotic regime, we present a local modification that guarantees uniform ellipticity of the operator. Cost considerations show that our novel approach requires roughly one third of the floating-point operations compared to a classical finite element assembly scheme employing nodal integration. Our theoretical considerations are illustrated by numerical tests that confirm the expectations with respect to accuracy and run-time. A large scale application with more than a hundred billion ($1.6\cdot10^{11}$) degrees of freedom executed on 14,310 compute cores demonstrates the efficiency of the new scaling approach.
△ Less
Submitted 23 July, 2018; v1 submitted 20 September, 2017;
originally announced September 2017.
-
Scheduling massively parallel multigrid for multilevel Monte Carlo methods
Authors:
Björn Gmeiner,
Daniel Drzisga,
Ulrich Ruede,
Robert Scheichl,
Barbara Wohlmuth
Abstract:
The computational complexity of naive, sampling-based uncertainty quantification for 3D partial differential equations is extremely high. Multilevel approaches, such as multilevel Monte Carlo (MLMC), can reduce the complexity significantly, but to exploit them fully in a parallel environment, sophisticated scheduling strategies are needed. Often fast algorithms that are executed in parallel are es…
▽ More
The computational complexity of naive, sampling-based uncertainty quantification for 3D partial differential equations is extremely high. Multilevel approaches, such as multilevel Monte Carlo (MLMC), can reduce the complexity significantly, but to exploit them fully in a parallel environment, sophisticated scheduling strategies are needed. Often fast algorithms that are executed in parallel are essential to compute fine level samples in 3D, whereas to compute individual coarse level samples only moderate numbers of processors can be employed efficiently. We make use of multiple instances of a parallel multigrid solver combined with advanced load balancing techniques. In particular, we optimize the concurrent execution across the three layers of the MLMC method: parallelization across levels, across samples, and across the spatial grid. The overall efficiency and performance of these methods will be analyzed. Here the scalability window of the multigrid solver is revealed as being essential, i.e., the property that the solution can be computed with a range of process numbers while maintaining good parallel efficiency. We evaluate the new scheduling strategies in a series of numerical tests, and conclude the paper demonstrating large 3D scaling experiments.
△ Less
Submitted 12 July, 2016;
originally announced July 2016.