Search | arXiv e-print repository

Showing 1–13 of 13 results for author: Bienz, A

  1. arXiv:2309.07337 [pdf, other]

    cs.DC

    MPI Advance : Open-Source Message Passing Optimizations

    Authors: Amanda Bienz, Derek Schafer, Anthony Skjellum

    Abstract: The large variety of production implementations of the message passing interface (MPI) each provide unique and varying underlying algorithms. Each emerging supercomputer supports one or a small number of system MPI installations, tuned for the given architecture. Performance varies with MPI version, but application programmers are typically unable to achieve optimal performance with local MPI inst…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Available on conference website : https://eurompi23.github.io/assets/papers/EuroMPI23_paper_33.pdf

  2. arXiv:2308.13869 [pdf, other]

    cs.DC

    A More Scalable Sparse Dynamic Data Exchange

    Authors: Andrew Geyko, Gerald Collom, Derek Schafer, Patrick Bridges, Amanda Bienz

    Abstract: Parallel architectures are continually increasing in performance and scale, while underlying algorithmic infrastructure often fails to take full advantage of available compute power. Within the context of MPI, irregular communication patterns create bottlenecks in parallel applications. One common bottleneck is the sparse dynamic data exchange, often required when forming communication patterns wit…

    Submitted 3 April, 2024; v1 submitted 26 August, 2023; originally announced August 2023.
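
    In a sparse dynamic data exchange, each process knows which processes it will send to but not which processes will send to it. The sketch below is a serial Python simulation of one simple, well-known baseline, not the algorithm proposed in this paper: every rank contributes a 0/1 flag vector over all possible destinations, and an elementwise sum (the result an MPI sum reduction such as MPI_Reduce_scatter_block would hand each rank) tells each rank how many messages to expect. The function names and the communication pattern are hypothetical.

```python
from typing import Dict, List, Set


def incoming_counts(dests: Dict[int, Set[int]], nprocs: int) -> List[int]:
    """Simulate the reduction-based census: entry r is the number of ranks sending to r."""
    totals = [0] * nprocs
    for src in range(nprocs):
        # Rank `src` builds a flag vector of length nprocs: 1 where it sends, else 0.
        flags = [1 if d in dests.get(src, set()) else 0 for d in range(nprocs)]
        # Elementwise sum over all ranks, as an MPI sum reduction would compute.
        totals = [t + f for t, f in zip(totals, flags)]
    return totals


def incoming_sources(dests: Dict[int, Set[int]], nprocs: int) -> List[Set[int]]:
    """Reference answer for checking: the actual set of senders for each rank."""
    sources: List[Set[int]] = [set() for _ in range(nprocs)]
    for src, targets in dests.items():
        for d in targets:
            sources[d].add(src)
    return sources


if __name__ == "__main__":
    # Hypothetical pattern on 4 ranks: rank -> set of ranks it sends to.
    pattern = {0: {1, 3}, 1: {2}, 2: {0, 3}, 3: set()}
    print(incoming_counts(pattern, 4))  # [1, 1, 1, 2]
```

    Once a rank knows its count, it can post that many receives with MPI_ANY_SOURCE. Because every flag vector has one entry per process, this baseline moves O(p) data per rank no matter how sparse the pattern is, which is the kind of cost that more scalable exchanges aim to avoid.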

  3. arXiv:2306.16589 [pdf, other]

    cs.MS cs.PF

    Collective-Optimized FFTs

    Authors: Evelyn Namugwanya, Amanda Bienz, Derek Schafer, Anthony Skjellum

    Abstract: This paper measures the impact of the various alltoallv methods. Results are analyzed within Beatnik, a Z-model solver that is bottlenecked by HeFFTe and representative of applications that rely on FFTs.

    Submitted 4 July, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

  4. arXiv:2306.01876 [pdf, other]

    cs.DC

    Optimizing Irregular Communication with Neighborhood Collectives and Locality-Aware Parallelism

    Authors: Gerald Collom, Rui Peng Li, Amanda Bienz

    Abstract: Irregular communication often limits both the performance and scalability of parallel applications. Typically, applications individually implement irregular messages using point-to-point communications, and any optimizations are added directly into the application. As a result, these optimizations lack portability. There is no easy way to optimize point-to-point messages within MPI, as the interfa…

    Submitted 2 June, 2023; originally announced June 2023.

  5. arXiv:2209.06141 [pdf, other]

    cs.DC

    Characterizing the Performance of Node-Aware Strategies for Irregular Point-to-Point Communication on Heterogeneous Architectures

    Authors: Shelby Lockhart, Amanda Bienz, William D. Gropp, Luke N. Olson

    Abstract: Supercomputer architectures are trending toward higher computational throughput due to the inclusion of heterogeneous compute nodes. These multi-GPU nodes increase on-node computational efficiency, while also increasing the amount of data to be communicated and the number of potential data flow paths. In this work, we characterize the performance of irregular point-to-point communication with MPI…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 14 pages, 13 figures

  6. arXiv:2206.03564 [pdf, other]

    cs.DC

    A Locality-Aware Bruck Allgather

    Authors: Amanda Bienz, Shreeman Gautam, Amun Kharel

    Abstract: Collective algorithms are an essential part of MPI, allowing application programmers to utilize underlying optimizations of common distributed operations. The MPI_Allgather gathers data, which is originally distributed across all processes, so that all data is available to each process. For small data sizes, the Bruck algorithm is commonly implemented to minimize the maximum number of messages com…

    Submitted 23 August, 2022; v1 submitted 7 June, 2022; originally announced June 2022.
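
    For small messages, the Bruck allgather completes in ⌈log2 p⌉ communication steps for any process count p. The sketch below is a serial Python simulation of the standard Bruck algorithm, not the locality-aware variant this paper proposes, assuming one block per process: at step k, each rank i receives the first min(2^k, p - 2^k) blocks held by rank (i + 2^k) mod p, and a final local rotation puts the blocks in rank order. No MPI calls are made, and the function name is hypothetical.

```python
from typing import List, Tuple


def bruck_allgather(data: List[int]) -> Tuple[List[List[int]], int]:
    """Simulate a Bruck allgather; data[i] is rank i's block. Returns (buffers, steps)."""
    p = len(data)
    # Invariant: bufs[i][j] == data[(i + j) % p] for every j < len(bufs[i]).
    bufs = [[data[i]] for i in range(p)]
    dist, steps = 1, 0
    while dist < p:
        count = min(dist, p - dist)  # number of blocks exchanged this step
        # Rank i receives from rank (i + dist) % p (equivalently, sends to (i - dist) % p).
        incoming = [bufs[(i + dist) % p][:count] for i in range(p)]
        for i in range(p):
            bufs[i].extend(incoming[i])
        dist *= 2
        steps += 1
    # Rank i holds blocks i, i+1, ..., i+p-1 (mod p); rotate them into rank order.
    result = [bufs[i][p - i:] + bufs[i][:p - i] for i in range(p)]
    return result, steps


if __name__ == "__main__":
    out, steps = bruck_allgather([10, 20, 30, 40, 50])
    print(steps)   # 3, i.e. ceil(log2(5))
    print(out[3])  # [10, 20, 30, 40, 50]
```

    The min(2^k, p - 2^k) cap is what lets the algorithm handle a non-power-of-two p: in the last step, only the blocks a rank is still missing are sent.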

  7. arXiv:2203.06144 [pdf, other]

    cs.DC

    Performance Analysis and Optimal Node-Aware Communication for Enlarged Conjugate Gradient Methods

    Authors: Shelby Lockhart, Amanda Bienz, William Gropp, Luke Olson

    Abstract: Krylov methods are a key way of solving large sparse linear systems of equations, but suffer from poor strong scalability on distributed memory machines. This is due to high synchronization costs from large numbers of collective communication calls alongside a low computational workload. Enlarged Krylov methods address this issue by decreasing the total iterations to convergence, an artifact of spl…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: 24 pages, 21 figures

  8. arXiv:2010.10378 [pdf, other]

    cs.DC

    Modeling Data Movement Performance on Heterogeneous Architectures

    Authors: Amanda Bienz, Luke N. Olson, William D. Gropp, Shelby Lockhart

    Abstract: The cost of data movement on parallel systems varies greatly with machine architecture, job partition, and nearby jobs. Performance models that accurately capture the cost of data movement provide a tool for analysis, allowing for communication bottlenecks to be pinpointed. Modern heterogeneous architectures yield increased variance in data movement as there are a number of viable paths for inter-…

    Submitted 16 July, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: 7 pages, 6 Figures, Preprint

  9. arXiv:1910.09650 [pdf, other]

    cs.DC

    Node-Aware Improvements to Allreduce

    Authors: Amanda Bienz, Luke N. Olson, William D. Gropp

    Abstract: The MPI_Allreduce collective operation is a core kernel of many parallel codebases, particularly for reductions over a single value per process. The commonly used allreduce recursive-doubling algorithm obtains the lower bound message count, yielding optimality for small reduction sizes based on node-agnostic performance models. However, this algorithm yields duplicate messages between se…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: 10 pages, 11 figures, ExaMPI Workshop at SC19
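
    The recursive-doubling allreduce pairs rank i with rank i XOR 2^k at step k, so p ranks finish in log2 p steps. The sketch below is a serial Python simulation of that standard algorithm (not the node-aware improvement from this paper), restricted to a power-of-two p. The helper internode_messages is hypothetical: it assumes a contiguous placement of ppn ranks per node, with ppn a power of two, and counts the messages per step that cross node boundaries. In the later steps, every rank on a node sends to ranks on the same partner node, which illustrates the duplicate inter-node messages the abstract refers to.

```python
from typing import List, Tuple


def recursive_doubling_allreduce(values: List[int]) -> Tuple[List[int], int]:
    """Simulate a sum allreduce by recursive doubling; returns (per-rank results, steps)."""
    p = len(values)
    if p < 1 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two process count")
    vals = list(values)
    mask, steps = 1, 0
    while mask < p:
        # Each rank exchanges its partial sum with partner (rank XOR mask) and adds.
        vals = [vals[i] + vals[i ^ mask] for i in range(p)]
        mask <<= 1
        steps += 1
    return vals, steps


def internode_messages(p: int, ppn: int) -> List[int]:
    """Per step, count messages whose endpoints lie on different nodes, assuming
    ranks are placed contiguously with ppn ranks per node (hypothetical mapping)."""
    counts = []
    mask = 1
    while mask < p:
        counts.append(sum(1 for i in range(p) if i // ppn != (i ^ mask) // ppn))
        mask <<= 1
    return counts


if __name__ == "__main__":
    print(recursive_doubling_allreduce([1, 2, 3, 4, 5, 6, 7, 8]))  # ([36, 36, 36, 36, 36, 36, 36, 36], 3)
    print(internode_messages(8, 4))  # [0, 0, 8]
```

    With 8 ranks on 2 nodes, the final step sends 8 messages, and all of them travel between the same two nodes (4 in each direction); this redundancy is what a node-aware scheme can exploit.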

  10. arXiv:1904.05838 [pdf, other]

    cs.DC cs.MS cs.PF math.NA

    Reducing Communication in Algebraic Multigrid with Multi-step Node Aware Communication

    Authors: Amanda Bienz, Luke Olson, William Gropp

    Abstract: Algebraic multigrid (AMG) is often viewed as a scalable O(n) solver for sparse linear systems. Yet, parallel AMG lacks scalability due to increasingly large costs associated with communication, both in the initial construction of a multigrid hierarchy as well as the iterative solve phase. This work introduces a parallel implementation of AMG to reduce the cost of communication, yieldin…

    Submitted 24 April, 2019; v1 submitted 11 April, 2019; originally announced April 2019.

    Comments: 11 pages, 21 figures

  11. arXiv:1806.02030 [pdf, other]

    cs.DC

    Improving Performance Models for Irregular Point-to-Point Communication

    Authors: Amanda Bienz, William D. Gropp, Luke N. Olson

    Abstract: Parallel applications are often unable to take full advantage of emerging parallel architectures due to scaling limitations, which arise due to inter-process communication. Performance models are used to analyze the sources of communication costs. However, traditional models for point-to-point communication fail to capture the full cost of many irregular operations, such as sparse matrix methods.…

    Submitted 6 June, 2018; originally announced June 2018.

    Comments: 8 pages, 11 figures

  12. arXiv:1612.08060 [pdf, other]

    cs.DC cs.MS

    Node Aware Sparse Matrix-Vector Multiplication

    Authors: Amanda Bienz, William D. Gropp, Luke N. Olson

    Abstract: The sparse matrix-vector multiply (SpMV) operation is a key computational kernel in many simulations and linear solvers. The large communication requirements associated with a reference implementation of a parallel SpMV result in poor parallel scalability. The cost of communication depends on the physical locations of the send and receive processes: messages injected into the network are more cost…

    Submitted 15 November, 2017; v1 submitted 23 December, 2016; originally announced December 2016.

    Comments: 27 pages, 16 figures
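
    The abstract's point that messages injected into the network cost more than on-node messages can be made concrete with a postal (latency-bandwidth) model, cost = alpha + beta * bytes, using separate parameters for the two cases. The sketch below assumes hypothetical parameter values and a contiguous placement of ppn ranks per node, and it shows one simple node-aware idea: combining all inter-node messages between the same pair of nodes into one. It is an illustration under those assumptions, not the paper's exact communication scheme or performance model.

```python
from typing import Dict, List, Tuple

# Hypothetical postal-model parameters: latency (s) and inverse bandwidth (s/byte).
ALPHA_INTRA, BETA_INTRA = 0.5e-6, 1e-10
ALPHA_INTER, BETA_INTER = 2.0e-6, 1e-9

Message = Tuple[int, int, int]  # (source rank, destination rank, bytes)


def node_of(rank: int, ppn: int) -> int:
    """Node index under a contiguous placement of ppn ranks per node."""
    return rank // ppn


def message_cost(src: int, dst: int, nbytes: int, ppn: int) -> float:
    """alpha + beta * bytes, with parameters chosen by whether the message leaves the node."""
    if node_of(src, ppn) == node_of(dst, ppn):
        return ALPHA_INTRA + BETA_INTRA * nbytes
    return ALPHA_INTER + BETA_INTER * nbytes


def aggregate_by_node(msgs: List[Message], ppn: int) -> Dict[Tuple[int, int], int]:
    """Combine inter-node messages per (source node, destination node) pair."""
    agg: Dict[Tuple[int, int], int] = {}
    for s, d, n in msgs:
        a, b = node_of(s, ppn), node_of(d, ppn)
        if a != b:
            agg[(a, b)] = agg.get((a, b), 0) + n
    return agg


if __name__ == "__main__":
    # Hypothetical pattern: 4 ranks, 2 per node; ranks 0 and 1 (node 0) send to 2 and 3 (node 1).
    msgs = [(0, 2, 800), (0, 3, 800), (1, 2, 800), (1, 3, 800), (0, 1, 800)]
    agg = aggregate_by_node(msgs, ppn=2)
    before = sum(message_cost(s, d, n, 2) for s, d, n in msgs
                 if node_of(s, 2) != node_of(d, 2))
    after = sum(ALPHA_INTER + BETA_INTER * n for n in agg.values())
    print(agg)                           # {(0, 1): 3200}
    print(f"{before:.2e} {after:.2e}")   # 1.12e-05 5.20e-06
```

    In this toy pattern, aggregation replaces four inter-node latencies with one while moving the same 3200 bytes. A real node-aware scheme must also pay for the intra-node steps that gather and redistribute the combined data, which this sketch omits.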

  13. arXiv:1512.04629 [pdf, other]

    cs.DC math.NA

    Reducing Parallel Communication in Algebraic Multigrid through Sparsification

    Authors: Amanda Bienz, Robert D. Falgout, William Gropp, Luke N. Olson, Jacob B. Schroder

    Abstract: Algebraic multigrid (AMG) is an O(n) solution process for many large sparse linear systems. A hierarchy of progressively coarser grids is constructed that utilize complementary relaxation and interpolation operators. High-energy error is reduced by relaxation, while low-energy error is mapped to coarse grids and reduced there. However, large parallel communication costs often limit par…

    Submitted 14 December, 2015; originally announced December 2015.

    Comments: 27 pages, 19 figures, submitted to SISC, multigrid, algebraic multigrid, non-Galerkin multigrid, high performance computing

    MSC Class: 65F50
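
    The abstract's division of labor, in which relaxation reduces high-energy error and coarse grids handle low-energy error, can be seen on a small model problem. The sketch below assumes the 1D Poisson matrix A = tridiag(-1, 2, -1) with zero boundary values and weighted Jacobi relaxation with omega = 2/3 (both chosen for illustration, not taken from the paper), and applies a few sweeps of the error update e <- e - omega * D^{-1} A e to a smooth Fourier mode and to an oscillatory one. It illustrates standard multigrid smoothing only; it does not implement AMG or the sparsification studied in this paper.

```python
import math
from typing import List


def jacobi_error_sweeps(e: List[float], omega: float, sweeps: int) -> List[float]:
    """Weighted Jacobi on the error of A x = b with A = tridiag(-1, 2, -1) and zero
    Dirichlet boundaries: e <- e - omega * D^{-1} A e, where D = diag(A) = 2I."""
    n = len(e)
    for _ in range(sweeps):
        Ae = [2.0 * e[i]
              - (e[i - 1] if i > 0 else 0.0)
              - (e[i + 1] if i < n - 1 else 0.0) for i in range(n)]
        e = [e[i] - omega * Ae[i] / 2.0 for i in range(n)]
    return e


def fourier_mode(n: int, k: int) -> List[float]:
    """Eigenvector of A: entries sin(k * pi * j / (n + 1)) for j = 1..n."""
    return [math.sin(k * math.pi * j / (n + 1)) for j in range(1, n + 1)]


def norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


if __name__ == "__main__":
    n, omega, sweeps = 63, 2.0 / 3.0, 5
    for k in (1, 48):  # k = 1 is smooth (low energy); k = 48 is oscillatory (high energy)
        e0 = fourier_mode(n, k)
        ratio = norm(jacobi_error_sweeps(e0, omega, sweeps)) / norm(e0)
        print(f"k={k}: {ratio:.2e}")  # k=1: 9.96e-01, k=48: 5.02e-05
```

    Each mode is an eigenvector of A with eigenvalue 4 sin^2(k pi / (2(n + 1))), so one sweep scales it by 1 - 2 omega sin^2(k pi / (2(n + 1))). The smooth mode is almost untouched while the oscillatory mode nearly vanishes, which is why the remaining smooth error is passed to a coarser grid, where it looks more oscillatory relative to the grid spacing.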