-
Enabling High-Throughput Parallel I/O in Particle-in-Cell Monte Carlo Simulations with openPMD and Darshan I/O Monitoring
Authors:
Jeremy J. Williams,
Daniel Medeiros,
Stefan Costea,
David Tskhakaya,
Franz Poeschel,
René Widera,
Axel Huebl,
Scott Klasky,
Norbert Podhorszki,
Leon Kos,
Ales Podolnik,
Jakub Hromadka,
Tapish Narwal,
Klaus Steiniger,
Michael Bussmann,
Erwin Laure,
Stefano Markidis
Abstract:
Large-scale HPC simulations of plasma dynamics in fusion devices require efficient parallel I/O to avoid slowing down the simulation and to enable the post-processing of critical information. Such complex simulations lacking parallel I/O capabilities may encounter performance bottlenecks, hindering their effectiveness in data-intensive computing tasks. In this work, we focus on introducing and enhancing the efficiency of parallel I/O operations in Particle-in-Cell Monte Carlo simulations. We first evaluate the scalability of BIT1, a massively-parallel electrostatic PIC MC code, determining its initial write throughput capabilities and performance bottlenecks using an HPC I/O performance monitoring tool, Darshan. We design and develop an adaptor to the openPMD I/O interface that allows us to stream PIC particle and field information to I/O using the BP4 backend, aggressively optimized for I/O efficiency, including the highly efficient ADIOS2 interface. Next, we explore advanced optimization techniques such as data compression, aggregation, and Lustre file striping, achieving write throughput improvements while enhancing data storage efficiency. Finally, we analyze the enhanced high-throughput parallel I/O and storage capabilities achieved through the integration of openPMD with rapid metadata extraction in BP4 format. Our study demonstrates that the integration of openPMD and advanced I/O optimizations significantly enhances BIT1's I/O performance and storage capabilities, successfully introducing high throughput parallel I/O and surpassing the capabilities of traditional file I/O.
Submitted 5 August, 2024;
originally announced August 2024.
-
Uncertainty Visualization of Critical Points of 2D Scalar Fields for Parametric and Nonparametric Probabilistic Models
Authors:
Tushar M. Athawale,
Zhe Wang,
David Pugmire,
Kenneth Moreland,
Qian Gong,
Scott Klasky,
Chris R. Johnson,
Paul Rosen
Abstract:
This paper presents a novel end-to-end framework for closed-form computation and visualization of critical point uncertainty in 2D uncertain scalar fields. Critical points are fundamental topological descriptors used in the visualization and analysis of scalar fields. The uncertainty inherent in data (e.g., observational and experimental data, approximations in simulations, and compression), however, creates uncertainty regarding critical point positions. Uncertainty in critical point positions, therefore, cannot be ignored, given their impact on downstream data analysis tasks. In this work, we study uncertainty in critical points as a function of uncertainty in data modeled with probability distributions. Although Monte Carlo (MC) sampling techniques have been used in prior studies to quantify critical point uncertainty, they are often expensive and are infrequently used in production-quality visualization software. We, therefore, propose a new end-to-end framework to address these challenges that comprises a threefold contribution. First, we derive the critical point uncertainty in closed form, which is more accurate and efficient than the conventional MC sampling methods. Specifically, we provide the closed-form and semianalytical (a mix of closed-form and MC methods) solutions for parametric (e.g., uniform, Epanechnikov) and nonparametric models (e.g., histograms) with finite support. Second, we accelerate critical point probability computations using a parallel implementation with the VTK-m library, which is platform portable. Finally, we demonstrate the integration of our implementation with the ParaView software system to demonstrate near-real-time results for real datasets.
Submitted 25 July, 2024;
originally announced July 2024.
-
Machine Learning Techniques for Data Reduction of Climate Applications
Authors:
Xiao Li,
Qian Gong,
Jaemoon Lee,
Scott Klasky,
Anand Rangarajan,
Sanjay Ranka
Abstract:
Scientists conduct large-scale simulations to compute derived quantities-of-interest (QoI) from primary data. Often, QoI are linked to specific features, regions, or time intervals, such that data can be adaptively reduced without compromising the integrity of QoI. For many spatiotemporal applications, these QoI are binary in nature and represent presence or absence of a physical phenomenon. We present a pipelined compression approach that first uses neural-network-based techniques to derive regions where QoI are highly likely to be present. Then, we employ a Guaranteed Autoencoder (GAE) to compress data with differential error bounds. GAE uses QoI information to apply low-error compression to only these regions. This results in overall high compression ratios while still achieving downstream goals of simulation or data collections. Experimental results are presented for climate data generated from the E3SM Simulation model for downstream quantities such as tropical cyclone and atmospheric river detection and tracking. These results show that our approach is superior to comparable methods in the literature.
Submitted 1 May, 2024;
originally announced May 2024.
-
Machine Learning Techniques for Data Reduction of CFD Applications
Authors:
Jaemoon Lee,
Ki Sung Jung,
Qian Gong,
Xiao Li,
Scott Klasky,
Jacqueline Chen,
Anand Rangarajan,
Sanjay Ranka
Abstract:
We present an approach called guaranteed block autoencoder that leverages Tensor Correlations (GBATC) for reducing the spatiotemporal data generated by computational fluid dynamics (CFD) and other scientific applications. It uses a multidimensional block of tensors (spanning in space and time) for both input and output, capturing the spatiotemporal and interspecies relationship within a tensor. The tensor consists of species that represent different elements in a CFD simulation. To guarantee the error bound of the reconstructed data, principal component analysis (PCA) is applied to the residual between the original and reconstructed data. This yields a basis matrix, which is then used to project the residual of each instance. The resulting coefficients are retained to enable accurate reconstruction. Experimental results demonstrate that our approach can deliver two orders of magnitude in reduction while still keeping the errors of primary data under scientifically acceptable bounds. Compared to reduction-based approaches based on SZ, our method achieves a substantially higher compression ratio for a given error bound or a better error for a given compression ratio.
Submitted 28 April, 2024;
originally announced April 2024.
-
MGARD: A multigrid framework for high-performance, error-controlled data compression and refactoring
Authors:
Qian Gong,
Jieyang Chen,
Ben Whitney,
Xin Liang,
Viktor Reshniak,
Tania Banerjee,
Jaemoon Lee,
Anand Rangarajan,
Lipeng Wan,
Nicolas Vidal,
Qing Liu,
Ana Gainaru,
Norbert Podhorszki,
Richard Archibald,
Sanjay Ranka,
Scott Klasky
Abstract:
We describe MGARD, a software providing MultiGrid Adaptive Reduction for floating-point scientific data on structured and unstructured grids. With exceptional data compression capability and precise error control, MGARD addresses a wide range of requirements, including storage reduction, high-performance I/O, and in-situ data analysis. It features a unified application programming interface (API) that seamlessly operates across diverse computing architectures. MGARD has been optimized with highly-tuned GPU kernels and efficient memory and device management mechanisms, ensuring scalable and rapid operations.
Submitted 11 January, 2024;
originally announced January 2024.
-
Spatiotemporally adaptive compression for scientific dataset with feature preservation -- a case study on simulation data with extreme climate events analysis
Authors:
Qian Gong,
Chengzhu Zhang,
Xin Liang,
Viktor Reshniak,
Jieyang Chen,
Anand Rangarajan,
Sanjay Ranka,
Nicolas Vidal,
Lipeng Wan,
Paul Ullrich,
Norbert Podhorszki,
Robert Jacob,
Scott Klasky
Abstract:
Scientific discoveries are increasingly constrained by limited storage space and I/O capacities. For time-series simulations and experiments, their data often need to be decimated over timesteps to accommodate storage and I/O limitations. In this paper, we propose a technique that addresses storage costs while improving post-analysis accuracy through spatiotemporal adaptive, error-controlled lossy compression. We investigate the trade-off between data precision and temporal output rates, revealing that reducing data precision and increasing timestep frequency lead to more accurate analysis outcomes. Additionally, we integrate spatiotemporal feature detection with data compression and demonstrate that performing adaptive error-bounded compression in higher dimensional space enables greater compression ratios, leveraging the error propagation theory of a transformation-based compressor.
To evaluate our approach, we conduct experiments using the well-known E3SM climate simulation code and apply our method to compress variables used for cyclone tracking. Our results show a significant reduction in storage size while enhancing the quality of cyclone tracking analysis, both quantitatively and qualitatively, in comparison to the prevalent timestep decimation approach. Compared to three state-of-the-art lossy compressors lacking feature preservation capabilities, our adaptive compression framework improves perfectly matched cases in TC tracking by 26.4-51.3% at medium compression ratios and by 77.3-571.1% at large compression ratios, with a computational overhead of merely 5-11%.
Submitted 6 January, 2024;
originally announced January 2024.
-
Unraveling Diffusion in Fusion Plasma: A Case Study of In Situ Processing and Particle Sorting
Authors:
Junmin Gu,
Paul Lin,
Kesheng Wu,
Seung-Hoe Ku,
C. S. Chang,
R. Michael Churchill,
Jong Choi,
Norbert Podhorszki,
Scott Klasky
Abstract:
This work starts an in situ processing capability to study a certain diffusion process in magnetic confinement fusion. This diffusion process involves plasma particles that are likely to escape confinement. Such particles carry a significant amount of energy from the burning plasma inside the tokamak to the divertor, damaging the divertor plate. This study requires in situ processing because of the fast-changing nature of the particle diffusion process. However, the in situ processing approach is challenging because the amount of data to be retained for the diffusion calculations increases over time, unlike in other in situ processing cases where the amount of data to be processed is constant over time. Here we report our preliminary efforts to control the memory usage while ensuring the necessary analysis tasks are completed in a timely manner. Compared with an earlier naive attempt to directly compute the same diffusion displacements in the simulation code, this in situ version reduces the memory usage from particle information by nearly 60% and computation time by about 20%.
Submitted 2 November, 2023;
originally announced November 2023.
-
Scalable Hybrid Learning Techniques for Scientific Data Compression
Authors:
Tania Banerjee,
Jong Choi,
Jaemoon Lee,
Qian Gong,
Jieyang Chen,
Scott Klasky,
Anand Rangarajan,
Sanjay Ranka
Abstract:
Data compression is becoming critical for storing scientific data because many scientific applications need to store large amounts of data and post process this data for scientific discovery. Unlike image and video compression algorithms that limit errors to primary data, scientists require compression techniques that accurately preserve derived quantities of interest (QoIs). This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression that addresses this requirement. Our hybrid compression technique combines machine learning techniques and standard compression methods. Specifically, we combine an autoencoder, an error-bounded lossy compressor to provide guarantees on raw data error, and a constraint satisfaction post-processing step to preserve the QoIs within a minimal error (generally less than floating point error).
The effectiveness of the data compression pipeline is demonstrated by compressing nuclear fusion simulation data generated by a large-scale fusion code, XGC, which produces hundreds of terabytes of data in a single day. Our approach works within the ADIOS framework and results in compression by a factor of more than 150 while requiring only a few percent of the computational resources necessary for generating the data, making the overall approach highly effective for practical scenarios.
Submitted 20 December, 2022;
originally announced December 2022.
-
2022 Review of Data-Driven Plasma Science
Authors:
Rushil Anirudh,
Rick Archibald,
M. Salman Asif,
Markus M. Becker,
Sadruddin Benkadda,
Peer-Timo Bremer,
Rick H. S. Budé,
C. S. Chang,
Lei Chen,
R. M. Churchill,
Jonathan Citrin,
Jim A Gaffney,
Ana Gainaru,
Walter Gekelman,
Tom Gibbs,
Satoshi Hamaguchi,
Christian Hill,
Kelli Humbird,
Sören Jalas,
Satoru Kawaguchi,
Gon-Ho Kim,
Manuel Kirchen,
Scott Klasky,
John L. Kline,
Karl Krushelnick
, et al. (38 additional authors not shown)
Abstract:
Data science and technology offer transformative tools and methods to science. This review article highlights the latest developments and progress in the interdisciplinary field of data-driven plasma science (DDPS). A large amount of data and machine learning algorithms go hand in hand. Most plasma data, whether experimental, observational or computational, are generated or collected by machines today. It is now becoming impractical for humans to analyze all the data manually. Therefore, it is imperative to train machines to analyze and interpret (eventually) such data as intelligently as humans but far more efficiently in quantity. Despite the recent impressive progress in applications of data science to plasma science and technology, the emerging field of DDPS is still in its infancy. Fueled by some of the most challenging problems such as fusion energy, plasma processing of materials, and fundamental understanding of the universe through observable plasma phenomena, DDPS is expected to continue to benefit significantly from the interdisciplinary marriage between plasma science and data science into the foreseeable future.
Submitted 31 May, 2022;
originally announced May 2022.
-
Near real-time streaming analysis of big fusion data
Authors:
Ralph Kube,
R. Michael Churchill,
CS Chang,
Jong Choi,
Jason Wang,
Scott Klasky,
Laurie Stephey,
Minjun Choi,
Eli Dart
Abstract:
While experiments on fusion plasmas produce high-dimensional data time series with ever increasing magnitude and velocity, data analysis has been lagging behind this development. For example, many data analysis tasks are often performed in a manual, ad-hoc manner some time after an experiment. In this article we introduce the DELTA framework that facilitates near real-time streaming analysis of big and fast fusion data. By streaming measurement data from fusion experiments to a high-performance compute center, DELTA makes it possible to perform demanding data analysis tasks in between plasma pulses. This article describes the modular and expandable software architecture of DELTA and presents performance benchmarks of its individual components as well as of entire workflows. Our focus is on the streaming analysis of ECEi data measured at KSTAR on NERSC's supercomputers, and we routinely achieve data transfer rates of about 500 megabytes per second. We show that a demanding turbulence analysis workload can be distributed among multiple GPUs and executes in under 5 minutes. We further discuss how DELTA uses modern database systems and container orchestration services to provide web-based real-time data visualization. For the case of ECEi data we demonstrate how data visualizations can be augmented with outputs from machine learning models. By providing session leaders and physics operators with results of higher order data analysis using live visualization, DELTA lets them monitor the evolution of a long-pulse discharge in near real-time and make more informed decisions on how to configure the machine for the next shot.
Submitted 19 August, 2021;
originally announced August 2021.
-
Improving I/O Performance for Exascale Applications through Online Data Layout Reorganization
Authors:
Lipeng Wan,
Axel Huebl,
Junmin Gu,
Franz Poeschel,
Ana Gainaru,
Ruonan Wang,
Jieyang Chen,
Xin Liang,
Dmitry Ganyushin,
Todd Munson,
Ian Foster,
Jean-Luc Vay,
Norbert Podhorszki,
Kesheng Wu,
Scott Klasky
Abstract:
The applications being developed within the U.S. Exascale Computing Project (ECP) to run on imminent Exascale computers will generate scientific results with unprecedented fidelity and record turn-around time. Many of these codes are based on particle-mesh methods and use advanced algorithms, especially dynamic load-balancing and mesh-refinement, to achieve high performance on Exascale machines. Yet, as such algorithms improve parallel application efficiency, they raise new challenges for I/O logic due to their irregular and dynamic data distributions. Thus, while the enormous data rates of Exascale simulations already challenge existing file system write strategies, the need for efficient read and processing of generated data introduces additional constraints on the data layout strategies that can be used when writing data to secondary storage. We review these I/O challenges and introduce two online data layout reorganization approaches for achieving good tradeoffs between read and write performance. We demonstrate the benefits of using these two approaches for the ECP particle-in-cell simulation WarpX, which serves as a motif for a large class of important Exascale applications. We show that by understanding application I/O patterns and carefully designing data layouts we can increase read performance by more than 80%.
Submitted 15 July, 2021;
originally announced July 2021.
-
Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2
Authors:
Franz Poeschel,
Juncheng E,
William F. Godoy,
Norbert Podhorszki,
Scott Klasky,
Greg Eisenhauer,
Philip E. Davis,
Lipeng Wan,
Ana Gainaru,
Junmin Gu,
Fabian Koller,
René Widera,
Michael Bussmann,
Axel Huebl
Abstract:
This paper aims to create a transition path from file-based IO to streaming-based workflows for scientific applications in an HPC environment. By using the openPMD-api, traditional workflows limited by filesystem bottlenecks can be overcome and flexibly extended for in situ analysis. The openPMD-api is a library for the description of scientific data according to the Open Standard for Particle-Mesh Data (openPMD). Its approach towards recent challenges posed by hardware heterogeneity lies in the decoupling of data description in domain sciences, such as plasma physics simulations, from concrete implementations in hardware and IO. The streaming backend is provided by the ADIOS2 framework, developed at Oak Ridge National Laboratory. This paper surveys two openPMD-based loosely-coupled setups to demonstrate flexible applicability and to evaluate performance. In loose coupling, as opposed to tight coupling, two (or more) applications are executed separately, e.g. in individual MPI contexts, yet cooperate by exchanging data. This way, a streaming-based workflow allows for standalone codes instead of tightly-coupled plugins, using a unified streaming-aware API and leveraging high-speed communication infrastructure available in modern compute clusters for massive data exchange. We determine new challenges in resource allocation and in the need for strategies for a flexible data distribution, demonstrating their influence on efficiency and scaling on the Summit compute system. The presented setups show the potential for a more flexible use of compute resources brought by streaming IO as well as the ability to increase throughput by avoiding filesystem bottlenecks.
Submitted 19 January, 2022; v1 submitted 13 July, 2021;
originally announced July 2021.
-
Scalable Multigrid-based Hierarchical Scientific Data Refactoring on GPUs
Authors:
Jieyang Chen,
Lipeng Wan,
Xin Liang,
Ben Whitney,
Qing Liu,
Qian Gong,
David Pugmire,
Nicholas Thompson,
Jong Youl Choi,
Matthew Wolf,
Todd Munson,
Ian Foster,
Scott Klasky
Abstract:
Rapid growth in scientific data and a widening gap between computational speed and I/O bandwidth make it increasingly infeasible to store and share all data produced by scientific simulations. Instead, we need methods for reducing data volumes: ideally, methods that can scale data volumes adaptively so as to enable negotiation of performance and fidelity tradeoffs in different situations. Multigrid-based hierarchical data representations hold promise as a solution to this problem, allowing for flexible conversion between different fidelities so that, for example, data can be created at high fidelity and then transferred or stored at lower fidelity via logically simple and mathematically sound operations. However, the effective use of such representations has been hindered until now by the relatively high costs of creating, accessing, reducing, and otherwise operating on such representations. We describe here highly optimized data refactoring kernels for GPU accelerators that enable efficient creation and manipulation of data in multigrid-based hierarchical forms. We demonstrate that our optimized design can achieve up to 264 TB/s aggregated data refactoring throughput -- 92% of theoretical peak -- on 1024 nodes of the Summit supercomputer. We showcase our optimized design by applying it to a large-scale scientific visualization workflow and the MGARD lossy compression software.
Submitted 26 May, 2021;
originally announced May 2021.
-
MGARD+: Optimizing Multilevel Methods for Error-bounded Scientific Data Reduction
Authors:
Xin Liang,
Ben Whitney,
Jieyang Chen,
Lipeng Wan,
Qing Liu,
Dingwen Tao,
James Kress,
Dave Pugmire,
Matthew Wolf,
Norbert Podhorszki,
Scott Klasky
Abstract:
Data management is becoming increasingly important in dealing with the large amounts of data produced by large-scale scientific simulations and instruments. Existing multilevel compression algorithms offer a promising way to manage scientific data at scale, but may suffer from relatively low performance and reduction quality. In this paper, we propose MGARD+, a multilevel data reduction and refactoring framework drawing on previous multilevel methods, to achieve high-performance data decomposition and high-quality error-bounded lossy compression. Our contributions are four-fold: 1) We propose a level-wise coefficient quantization method, which uses different error tolerances to quantize the multilevel coefficients. 2) We propose an adaptive decomposition method which treats the multilevel decomposition as a preconditioner and terminates the decomposition process at an appropriate level. 3) We leverage a set of algorithmic optimization strategies to significantly improve the performance of multilevel decomposition/recomposition. 4) We evaluate our proposed method using four real-world scientific datasets and compare with several state-of-the-art lossy compressors. Experiments demonstrate that our optimizations improve the decomposition/recomposition performance of the existing multilevel method by up to 70X, and the proposed compression method can improve compression ratio by up to 2X compared with other state-of-the-art error-bounded lossy compressors under the same level of data distortion.
Submitted 10 November, 2020; v1 submitted 12 October, 2020;
originally announced October 2020.
-
Accelerating Multigrid-based Hierarchical Scientific Data Refactoring on GPUs
Authors:
Jieyang Chen,
Lipeng Wan,
Xin Liang,
Ben Whitney,
Qing Liu,
David Pugmire,
Nicholas Thompson,
Matthew Wolf,
Todd Munson,
Ian Foster,
Scott Klasky
Abstract:
Rapid growth in scientific data and a widening gap between computational speed and I/O bandwidth make it increasingly infeasible to store and share all data produced by scientific simulations. Instead, we need methods for reducing data volumes: ideally, methods that can scale data volumes adaptively so as to enable negotiation of performance and fidelity tradeoffs in different situations. Multigrid-based hierarchical data representations hold promise as a solution to this problem, allowing for flexible conversion between different fidelities so that, for example, data can be created at high fidelity and then transferred or stored at lower fidelity via logically simple and mathematically sound operations. However, the effective use of such representations has been hindered until now by the relatively high costs of creating, accessing, reducing, and otherwise operating on such representations. We describe here highly optimized data refactoring kernels for GPU accelerators that enable efficient creation and manipulation of data in multigrid-based hierarchical forms. We demonstrate that our optimized design can achieve up to 250 TB/s aggregated data refactoring throughput -- 83% of theoretical peak -- on 1024 nodes of the Summit supercomputer. We showcase our optimized design by applying it to a large-scale scientific visualization workflow and the MGARD lossy compression software.
Submitted 27 February, 2021; v1 submitted 8 July, 2020;
originally announced July 2020.
-
Towards 1ULP evaluation of Daubechies Wavelets
Authors:
Nicholas Thompson,
John Maddock,
George Ostrouchov,
Jeremy Logan,
David Pugmire,
Scott Klasky
Abstract:
We present algorithms to numerically evaluate Daubechies wavelets and scaling functions to high relative accuracy. These algorithms refine the suggestion of Daubechies and Lagarias to evaluate functions defined by two-scale difference equations using splines; carefully choosing amongst a family of rapidly convergent interpolators which effectively capture all the smoothness present in the function and whose error term admits a small asymptotic constant. We are also able to efficiently compute derivatives, though with a smoothness-induced reduction in accuracy. An implementation is provided in the Boost Software Library.
Submitted 11 May, 2020;
originally announced May 2020.
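As a rough illustration of what "functions defined by two-scale difference equations" means in the abstract above, the sketch below evaluates the 4-tap Daubechies scaling function at dyadic rationals by applying the two-scale relation recursively. This is the plain textbook recursion, not the spline-based interpolation scheme the authors describe, and it does not target 1 ULP accuracy. The names `C`, `INTEGER_VALUES`, and `phi` are illustrative and do not come from the paper or from Boost.

```python
from fractions import Fraction
from functools import lru_cache
from math import sqrt

S3 = sqrt(3.0)

# Two-scale coefficients of the 4-tap Daubechies scaling function,
# normalised so that they sum to 2: phi(x) = sum_k C[k] * phi(2x - k).
C = ((1 + S3) / 4, (3 + S3) / 4, (3 - S3) / 4, (1 - S3) / 4)

# Known values at the integers inside the support [0, 3].
INTEGER_VALUES = {0: 0.0, 1: (1 + S3) / 2, 2: (1 - S3) / 2, 3: 0.0}


@lru_cache(maxsize=None)
def phi(x):
    """Evaluate the scaling function at a dyadic rational x (a Fraction)."""
    d = x.denominator
    if d & (d - 1):
        # The recursion only terminates when the denominator is a power of two.
        raise ValueError("x must be a dyadic rational")
    if x <= 0 or x >= 3:
        return 0.0
    if d == 1:
        return INTEGER_VALUES[int(x)]
    # Each term 2x - k has half the denominator of x, so the depth is log2(d).
    return sum(c * phi(2 * x - k) for k, c in enumerate(C))


if __name__ == "__main__":
    for n in range(0, 25):
        x = Fraction(n, 8)
        print(float(x), phi(x))
```

Using `Fraction` keeps the arguments exact, so only the function values carry floating-point rounding; the error still grows with recursion depth, which is one reason a more careful scheme is needed for high relative accuracy.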
-
A tight-coupling scheme sharing minimum information across a spatial interface between gyrokinetic turbulence codes
Authors:
Julien Dominski,
Seung-Hoe Ku,
Choong-Seock Chang,
Jong Choi,
Eric Suchyta,
Scott Parker,
Scott Klasky,
Amitava Bhattacharjee
Abstract:
A new scheme that tightly couples kinetic turbulence codes across a spatial interface is introduced. This scheme evolves from considerations of competing strategies and down-selection. It is found that the use of a composite kinetic distribution function and fields with global boundary conditions, as if the coupled code were one, makes the coupling problem tractable. In contrast, coupling the two solutions from each code across the overlap region is found to be more difficult due to numerical dephasing of the turbulent solutions between two solvers. Another advantage of the new scheme is that the data movement can be limited to the 3D fluid quantities, instead of higher-dimensional kinetic information, which is computationally more efficient for large-scale simulations on leadership-class computers.
Submitted 20 July, 2018; v1 submitted 13 June, 2018;
originally announced June 2018.
-
On the Scalability of Data Reduction Techniques in Current and Upcoming HPC Systems from an Application Perspective
Authors:
Axel Huebl,
Rene Widera,
Felix Schmitt,
Alexander Matthes,
Norbert Podhorszki,
Jong Youl Choi,
Scott Klasky,
Michael Bussmann
Abstract:
We implement and benchmark parallel I/O methods for the fully-manycore driven particle-in-cell code PIConGPU. Identifying throughput and overall I/O size as a major challenge for applications on today's and future HPC systems, we present a scaling law characterizing performance bottlenecks in state-of-the-art approaches for data reduction. Consequently, we propose, implement and verify multi-threaded data-transformations for the I/O library ADIOS as a feasible way to trade underutilized host-side compute potential on heterogeneous systems for reduced I/O latency.
Submitted 1 June, 2017;
originally announced June 2017.
-
Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma
Authors:
Lingfei Wu,
Kesheng Wu,
Alex Sim,
Michael Churchill,
Jong Y. Choi,
Andreas Stathopoulos,
Cs Chang,
Scott Klasky
Abstract:
A novel algorithm and implementation of real-time identification and tracking of blob-filaments in fusion reactor data is presented. Similar spatio-temporal features are important in many other applications, for example, ignition kernels in combustion and tumor cells in a medical image. This work presents an approach for extracting these features by dividing the overall task into three steps: local identification of feature cells, grouping feature cells into extended features, and tracking the movement of features through overlap in space. Through our extensive work in parallelization, we demonstrate that this approach can effectively make use of a large number of compute nodes to detect and track blob-filaments in real time in fusion plasma. On a set of 30GB fusion simulation data, we observed linear speedup on 1024 processes and completed blob detection in less than three milliseconds using Edison, a Cray XC30 system at NERSC.
Submitted 2 July, 2016; v1 submitted 13 May, 2015;
originally announced May 2015.
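The three steps named in the abstract above (local identification, grouping, and tracking through spatial overlap) can be sketched serially on a small 2D grid. This is only a minimal illustration under assumed choices: a fixed threshold for identification, 4-connectivity for grouping, and any shared cell counting as overlap. The function names and these criteria are hypothetical and are not taken from the paper, which describes a parallel implementation.

```python
from collections import deque


def feature_cells(frame, threshold):
    """Step 1: local identification of cells whose value exceeds a threshold."""
    return {
        (r, c)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if value > threshold
    }


def group_cells(cells):
    """Step 2: group 4-connected feature cells into extended features."""
    remaining = set(cells)
    blobs = []
    while remaining:
        seed = remaining.pop()
        blob = {seed}
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    blob.add(nb)
                    queue.append(nb)
        blobs.append(frozenset(blob))
    # Sort by smallest cell so the blob order is deterministic.
    return sorted(blobs, key=min)


def track(prev_blobs, curr_blobs):
    """Step 3: link features in consecutive frames that share at least one cell."""
    return [
        (i, j)
        for i, p in enumerate(prev_blobs)
        for j, q in enumerate(curr_blobs)
        if p & q
    ]


if __name__ == "__main__":
    frame0 = [[0, 5, 5, 0], [0, 0, 0, 0], [0, 0, 0, 7]]
    frame1 = [[0, 0, 5, 5], [0, 0, 0, 0], [0, 0, 0, 0]]
    blobs0 = group_cells(feature_cells(frame0, 1))
    blobs1 = group_cells(feature_cells(frame1, 1))
    print(track(blobs0, blobs1))
```

In this toy input the two-cell feature in the first frame shifts one column to the right and is linked to its successor, while the isolated cell in the corner disappears and gets no link.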
-
Region Templates: Data Representation and Management for Large-Scale Image Analysis
Authors:
George Teodoro,
Tony Pan,
Tahsin Kurc,
Jun Kong,
Lee Cooper,
Scott Klasky,
Joel Saltz
Abstract:
Distributed memory machines equipped with CPUs and GPUs (hybrid computing nodes) are hard to program because of the multiple layers of memory and heterogeneous computing configurations. In this paper, we introduce a region template abstraction for the efficient management of common data types used in analysis of large datasets of high resolution images on clusters of hybrid computing nodes. The region template provides a generic container template for common data structures, such as points, arrays, regions, and object sets, within a spatial and temporal bounding box. The region template abstraction enables different data management strategies and data I/O implementations, while providing a homogeneous, unified interface to the application for data storage and retrieval. The execution of region templates applications is coordinated by a runtime system that supports efficient execution in hybrid machines. Region templates applications are represented as hierarchical dataflow in which each computing stage may be represented as another dataflow of finer-grain tasks. A number of optimizations for hybrid machines are available in our runtime system, including performance-aware scheduling for maximizing utilization of computing devices and techniques to reduce impact of data transfers between CPUs and GPUs. An experimental evaluation on a state-of-the-art hybrid cluster using a microscopy imaging study shows that this abstraction adds negligible overhead (about 3%) and achieves good scalability.
Submitted 30 May, 2014;
originally announced May 2014.
-
Stable characteristic evolution of generic 3-dimensional single-black-hole spacetimes
Authors:
The Binary Black Hole Grand Challenge Alliance,
R. Gomez,
L. Lehner,
R. Marsa,
J. Winicour,
A. Abrahams,
A. Anderson,
P. Anninos,
T. Baumgarte,
N. Bishop,
S. Brandt,
J. Browne,
K. Camarda,
M. Choptuik,
R. Correl,
G. Cook,
C. Evans,
L. Finn,
G. Fox,
T. Haupt,
M. Huq,
L. Kidder,
S. Klasky,
P. Laguna,
W. Landry
, et al. (20 additional authors not shown)
Abstract:
We report new results which establish that the accurate 3-dimensional numerical simulation of generic single-black-hole spacetimes has been achieved by characteristic evolution with unlimited long term stability. Our results cover a selection of distorted, moving and spinning single black holes, with evolution times up to 60,000M.
Submitted 20 January, 1998;
originally announced January 1998.
-
Boosted three-dimensional black-hole evolutions with singularity excision
Authors:
The Binary Black Hole Grand Challenge Alliance,
G. B. Cook,
M. F. Huq,
S. A. Klasky,
M. A. Scheel,
A. M. Abrahams,
A. Anderson,
P. Anninos,
T. W. Baumgarte,
N. T. Bishop,
S. R. Brandt,
J. C. Browne,
K. Camarda,
M. W. Choptuik,
C. R. Evans,
L. S. Finn,
G. C. Fox,
R. Gomez,
T. Haupt,
L. E. Kidder,
P. Laguna,
W. Landry,
L. Lehner,
J. Lenaghan
, et al. (21 additional authors not shown)
Abstract:
Binary black hole interactions provide potentially the strongest source of gravitational radiation for detectors currently under development. We present some results from the Binary Black Hole Grand Challenge Alliance three-dimensional Cauchy evolution module. These constitute essential steps towards modeling such interactions and predicting gravitational radiation waveforms. We report on single black hole evolutions and the first successful demonstration of a black hole moving freely through a three-dimensional computational grid via a Cauchy evolution: a hole moving ~6M at 0.1c during a total evolution of duration ~60M.
Submitted 26 November, 1997;
originally announced November 1997.
-
Gravitational wave extraction and outer boundary conditions by perturbative matching
Authors:
The Binary Black Hole Grand Challenge Alliance,
A. M. Abrahams,
L. Rezzolla,
M. E. Rupright,
A. Anderson,
P. Anninos,
T. W. Baumgarte,
N. T. Bishop,
S. R. Brandt,
J. C. Browne,
K. Camarda,
M. W. Choptuik,
G. B. Cook,
C. R. Evans,
L. S. Finn,
G. Fox,
R. Gomez,
T. Haupt,
M. F. Huq,
L. E. Kidder,
S. Klasky,
P. Laguna,
W. Landry,
L. Lehner
, et al. (20 additional authors not shown)
Abstract:
We present a method for extracting gravitational radiation from a three-dimensional numerical relativity simulation and, using the extracted data, to provide outer boundary conditions. The method treats dynamical gravitational variables as nonspherical perturbations of Schwarzschild geometry. We discuss a code which implements this method and present results of tests which have been performed with a three-dimensional numerical relativity code.
Submitted 30 September, 1997;
originally announced September 1997.