
arXiv:2408.02869

Enabling High-Throughput Parallel I/O in Particle-in-Cell Monte Carlo Simulations with openPMD and Darshan I/O Monitoring

Authors: Jeremy J. Williams, Daniel Medeiros, Stefan Costea, David Tskhakaya, Franz Poeschel, René Widera, Axel Huebl, Scott Klasky, Norbert Podhorszki, Leon Kos, Ales Podolnik, Jakub Hromadka, Tapish Narwal, Klaus Steiniger, Michael Bussmann, Erwin Laure, Stefano Markidis

Abstract: Large-scale HPC simulations of plasma dynamics in fusion devices require efficient parallel I/O to avoid slowing down the simulation and to enable the post-processing of critical information. Such complex simulations lacking parallel I/O capabilities may encounter performance bottlenecks, hindering their effectiveness in data-intensive computing tasks. In this work, we focus on introducing and enhancing the efficiency of parallel I/O operations in Particle-in-Cell Monte Carlo simulations. We first evaluate the scalability of BIT1, a massively parallel electrostatic PIC MC code, determining its initial write throughput capabilities and performance bottlenecks using the HPC I/O performance monitoring tool Darshan. We design and develop an adaptor to the openPMD I/O interface that allows us to stream PIC particle and field information to I/O using the BP4 backend of the highly efficient ADIOS2 interface, which is aggressively optimized for I/O efficiency. Next, we explore advanced optimization techniques such as data compression, aggregation, and Lustre file striping, achieving write throughput improvements while enhancing data storage efficiency. Finally, we analyze the enhanced high-throughput parallel I/O and storage capabilities achieved through the integration of openPMD with rapid metadata extraction in BP4 format.
Our study demonstrates that the integration of openPMD and advanced I/O optimizations significantly enhances BIT1's I/O performance and storage capabilities, successfully introducing high-throughput parallel I/O and surpassing the capabilities of traditional file I/O.

Submitted 5 August, 2024; originally announced August 2024.

Comments: Accepted by IEEE Cluster workshop 2024 (REX-IO 2024), prepared in the standardized IEEE conference format and consists of 10 pages, which includes the main text, references, and figures
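Lustre file striping, one of the optimizations listed above, spreads a file across object storage targets (OSTs) in fixed-size stripes assigned round-robin. The sketch below is not taken from BIT1 or the paper; it assumes a plain RAID-0 style layout (no composite or progressive layouts), uses hypothetical function names, and reports stripe-relative OST indices rather than the physical OST numbers the file system would pick. It maps a byte offset to its OST and splits a contiguous write into per-OST pieces.

```python
def ost_index(offset: int, stripe_size: int, stripe_count: int) -> int:
    """Return the stripe-relative OST index holding byte `offset`.

    Assumes round-robin striping: stripe k of the file lives on
    OST k mod stripe_count.
    """
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; sizes and counts must be > 0")
    return (offset // stripe_size) % stripe_count


def split_write(offset: int, length: int, stripe_size: int, stripe_count: int):
    """Split the byte range [offset, offset + length) into per-OST pieces.

    Returns a list of (ost, start, nbytes) tuples, one per stripe touched.
    """
    pieces = []
    pos, end = offset, offset + length
    while pos < end:
        # End of the stripe that contains `pos`.
        stripe_end = (pos // stripe_size + 1) * stripe_size
        nbytes = min(end, stripe_end) - pos
        pieces.append((ost_index(pos, stripe_size, stripe_count), pos, nbytes))
        pos += nbytes
    return pieces


if __name__ == "__main__":
    MiB = 1 << 20
    # Hypothetical layout: 1 MiB stripes over 4 OSTs; a 3 MiB write at 0.5 MiB.
    for ost, start, nbytes in split_write(MiB // 2, 3 * MiB, MiB, 4):
        print(f"OST {ost}: offset {start}, {nbytes} bytes")
```

With more stripes, a large parallel write is served by more OSTs concurrently; suitable stripe sizes and counts are workload-dependent.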

arXiv:2407.18015

Uncertainty Visualization of Critical Points of 2D Scalar Fields for Parametric and Nonparametric Probabilistic Models

Authors: Tushar M. Athawale, Zhe Wang, David Pugmire, Kenneth Moreland, Qian Gong, Scott Klasky, Chris R. Johnson, Paul Rosen

Abstract: This paper presents a novel end-to-end framework for closed-form computation and visualization of critical point uncertainty in 2D uncertain scalar fields. Critical points are fundamental topological descriptors used in the visualization and analysis of scalar fields. The uncertainty inherent in data (e.g., observational and experimental data, approximations in simulations, and compression), however, creates uncertainty regarding critical point positions. Uncertainty in critical point positions, therefore, cannot be ignored, given their impact on downstream data analysis tasks. In this work, we study uncertainty in critical points as a function of uncertainty in data modeled with probability distributions. Although Monte Carlo (MC) sampling techniques have been used in prior studies to quantify critical point uncertainty, they are often expensive and are infrequently used in production-quality visualization software. We therefore propose a new end-to-end framework that addresses these challenges through a threefold contribution. First, we derive the critical point uncertainty in closed form, which is more accurate and efficient than the conventional MC sampling methods. Specifically, we provide the closed-form and semianalytical (a mix of closed-form and MC methods) solutions for parametric (e.g., uniform, Epanechnikov) and nonparametric models (e.g., histograms) with finite support. Second, we accelerate critical point probability computations using a parallel implementation with the VTK-m library, which is platform portable.
Finally, we integrate our implementation with the ParaView software system and demonstrate near-real-time results for real datasets.

Submitted 25 July, 2024; originally announced July 2024.

Comments: 9 pages paper + 2 page references, 8 figures, IEEE VIS 2024 paper to be published as a special issue of IEEE Transactions on Visualization and Computer Graphics (TVCG)
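To make the computed quantity concrete: if vertex values are modeled as independent random variables, the probability that a vertex is a local minimum is P = integral of f_c(x) * prod_i (1 - F_i(x)) dx, where f_c is the density at the vertex and F_i are the neighbors' CDFs. The sketch below assumes independent uniform distributions, evaluates this integral with the midpoint rule, and compares it with Monte Carlo sampling. It illustrates the quantity only; it is not the paper's closed-form derivation or its VTK-m implementation, and the function names are hypothetical.

```python
import random


def uniform_pdf(x, lo, hi):
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


def uniform_cdf(x, lo, hi):
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def prob_local_min_quadrature(center, neighbors, n=10_000):
    """P(center value < every neighbor value), independent uniforms assumed.

    center: (lo, hi) support of the vertex; neighbors: list of (lo, hi).
    Integrates f_c(x) * prod_i (1 - F_i(x)) over the center's support
    using the midpoint rule with n subintervals.
    """
    lo, hi = center
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        integrand = uniform_pdf(x, lo, hi)
        for a, b in neighbors:
            integrand *= 1.0 - uniform_cdf(x, a, b)
        total += integrand * h
    return total


def prob_local_min_monte_carlo(center, neighbors, samples=100_000, seed=0):
    """Monte Carlo estimate of the same probability, for comparison."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        c = rng.uniform(*center)
        if all(c < rng.uniform(a, b) for a, b in neighbors):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    center = (0.0, 1.0)
    neighbors = [(0.0, 1.0)] * 4  # 4-connected neighborhood, i.i.d. U(0, 1)
    print(prob_local_min_quadrature(center, neighbors))  # close to 1/5
    print(prob_local_min_monte_carlo(center, neighbors))
```

For five i.i.d. values the exact answer is 1/5, which the quadrature matches closely while the sampling estimate fluctuates around it; closed-form results like those in the paper avoid both the sampling noise and the cost.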

arXiv:2405.00879

Machine Learning Techniques for Data Reduction of Climate Applications

Authors: Xiao Li, Qian Gong, Jaemoon Lee, Scott Klasky, Anand Rangarajan, Sanjay Ranka

Abstract: Scientists conduct large-scale simulations to compute derived quantities of interest (QoI) from primary data. Often, QoI are linked to specific features, regions, or time intervals, such that data can be adaptively reduced without compromising the integrity of QoI. For many spatiotemporal applications, these QoI are binary in nature and represent the presence or absence of a physical phenomenon. We present a pipelined compression approach that first uses neural-network-based techniques to derive regions where QoI are highly likely to be present. Then, we employ a Guaranteed Autoencoder (GAE) to compress data with differential error bounds. GAE uses QoI information to apply low-error compression to only these regions. This results in overall high compression ratios while still achieving the downstream goals of simulation or data collection. Experimental results are presented for climate data generated from the E3SM simulation model for downstream quantities such as tropical cyclone and atmospheric river detection and tracking. These results show that our approach is superior to comparable methods in the literature.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 7 pages. arXiv admin note: text overlap with arXiv:2404.18063
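The idea of applying a tight error bound only where QoI are likely present can be illustrated without any neural network. The sketch below substitutes a plain uniform scalar quantizer for the Guaranteed Autoencoder and takes a hypothetical boolean QoI mask as given (in the paper, the regions come from neural-network-based detection). Values inside the mask get a small error bound, all others a loose one, and every reconstructed value stays within the bound of its region. No entropy coding is shown.

```python
def quantize(value: float, error_bound: float) -> int:
    """Uniform quantizer with |value - dequantize(q, eb)| <= eb."""
    return round(value / (2.0 * error_bound))


def dequantize(q: int, error_bound: float) -> float:
    return q * 2.0 * error_bound


def compress_with_mask(data, mask, tight_eb, loose_eb):
    """Quantize each value with the bound selected by its (hypothetical) QoI mask."""
    return [quantize(x, tight_eb if m else loose_eb) for x, m in zip(data, mask)]


def decompress_with_mask(codes, mask, tight_eb, loose_eb):
    return [dequantize(q, tight_eb if m else loose_eb) for q, m in zip(codes, mask)]


if __name__ == "__main__":
    data = [0.10, 0.52, 3.71, 3.69, 0.33]
    mask = [False, False, True, True, False]  # True where a QoI is likely
    codes = compress_with_mask(data, mask, tight_eb=0.001, loose_eb=0.5)
    print(codes)
    print(decompress_with_mask(codes, mask, 0.001, 0.5))
```

Outside the mask the codes collapse to a few small integers, which is what a downstream lossless coder could exploit; inside the mask the values are kept to within the tight bound.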

arXiv:2404.18063

Machine Learning Techniques for Data Reduction of CFD Applications

Authors: Jaemoon Lee, Ki Sung Jung, Qian Gong, Xiao Li, Scott Klasky, Jacqueline Chen, Anand Rangarajan, Sanjay Ranka

Abstract: We present an approach called guaranteed block autoencoder that leverages Tensor Correlations (GBATC) for reducing the spatiotemporal data generated by computational fluid dynamics (CFD) and other scientific applications. It uses a multidimensional block of tensors (spanning space and time) for both input and output, capturing the spatiotemporal and interspecies relationships within a tensor. The tensor consists of species that represent different elements in a CFD simulation. To guarantee the error bound of the reconstructed data, principal component analysis (PCA) is applied to the residual between the original and reconstructed data. This yields a basis matrix, which is then used to project the residual of each instance. The resulting coefficients are retained to enable accurate reconstruction. Experimental results demonstrate that our approach can deliver two orders of magnitude of reduction while still keeping the errors of primary data under scientifically acceptable bounds. Compared to SZ-based reduction approaches, our method achieves a substantially higher compression ratio for a given error bound or a lower error for a given compression ratio.

Submitted 28 April, 2024; originally announced April 2024.

Comments: 10 pages, 8 figures

arXiv:2401.05994

doi 10.1016/j.softx.2023.101590

MGARD: A multigrid framework for high-performance, error-controlled data compression and refactoring

Authors: Qian Gong, Jieyang Chen, Ben Whitney, Xin Liang, Viktor Reshniak, Tania Banerjee, Jaemoon Lee, Anand Rangarajan, Lipeng Wan, Nicolas Vidal, Qing Liu, Ana Gainaru, Norbert Podhorszki, Richard Archibald, Sanjay Ranka, Scott Klasky

Abstract: We describe MGARD, a software package providing MultiGrid Adaptive Reduction for floating-point scientific data on structured and unstructured grids. With exceptional data compression capability and precise error control, MGARD addresses a wide range of requirements, including storage reduction, high-performance I/O, and in-situ data analysis. It features a unified application programming interface (API) that seamlessly operates across diverse computing architectures. MGARD has been optimized with highly tuned GPU kernels and efficient memory and device management mechanisms, ensuring scalable and rapid operations.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 20 pages, 8 figures

Journal ref: SoftwareX, 24(2023), 101590
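As a rough illustration of what a multigrid-style hierarchical decomposition does, the sketch below splits a 1D signal of length 2^L + 1 into coarse-node values plus, at each level, the deviation of the newly added nodes from the linear interpolant of their coarser neighbors, and then inverts the transform exactly. This is only the interpolation part: MGARD additionally applies an L2-projection correction to the coarse values and handles multidimensional and unstructured grids, none of which is shown. The function names are hypothetical and are not MGARD's API.

```python
def decompose(values):
    """Hierarchical (interpolation-only) decomposition of a 1D signal.

    len(values) must be 2**L + 1. Working from fine to coarse, each node
    at an odd multiple of the spacing h is replaced by its deviation from
    the average of its neighbors at distance h, which are still original
    values because they lie on the coarser grid.
    """
    n = len(values)
    if n < 2 or (n - 1) & (n - 2) != 0:
        raise ValueError("length must be 2**L + 1")
    c = list(values)
    h = 1
    while 2 * h <= n - 1:
        for i in range(h, n - 1, 2 * h):
            c[i] -= 0.5 * (c[i - h] + c[i + h])
        h *= 2
    return c


def recompose(coeffs):
    """Exact inverse of decompose(): rebuild from coarse to fine."""
    n = len(coeffs)
    v = list(coeffs)
    h = (n - 1) // 2
    while h >= 1:
        for i in range(h, n - 1, 2 * h):
            v[i] += 0.5 * (v[i - h] + v[i + h])
        h //= 2
    return v


if __name__ == "__main__":
    linear = [0.0, 1.0, 2.0, 3.0, 4.0]
    print(decompose(linear))  # [0.0, 0.0, 0.0, 0.0, 4.0]
    x = [float(i * i) for i in range(9)]
    print(recompose(decompose(x)) == x)
```

For smooth data most coefficients are small (zero for linear data), which is what makes it effective to quantize them coarsely or drop fine levels to trade fidelity for size.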

arXiv:2401.03317

doi 10.1109/e-Science58273.2023.10254796

Spatiotemporally adaptive compression for scientific dataset with feature preservation -- a case study on simulation data with extreme climate events analysis

Authors: Qian Gong, Chengzhu Zhang, Xin Liang, Viktor Reshniak, Jieyang Chen, Anand Rangarajan, Sanjay Ranka, Nicolas Vidal, Lipeng Wan, Paul Ullrich, Norbert Podhorszki, Robert Jacob, Scott Klasky

Abstract: Scientific discoveries are increasingly constrained by limited storage space and I/O capacities. Data from time-series simulations and experiments often need to be decimated over timesteps to accommodate storage and I/O limitations. In this paper, we propose a technique that addresses storage costs while improving post-analysis accuracy through spatiotemporally adaptive, error-controlled lossy compression. We investigate the trade-off between data precision and temporal output rates, revealing that reducing data precision and increasing timestep frequency lead to more accurate analysis outcomes. Additionally, we integrate spatiotemporal feature detection with data compression and demonstrate that performing adaptive error-bounded compression in higher-dimensional space enables greater compression ratios, leveraging the error propagation theory of a transformation-based compressor. To evaluate our approach, we conduct experiments using the well-known E3SM climate simulation code and apply our method to compress variables used for cyclone tracking. Our results show a significant reduction in storage size while enhancing the quality of cyclone tracking analysis, both quantitatively and qualitatively, in comparison to the prevalent timestep decimation approach.
Compared to three state-of-the-art lossy compressors lacking feature preservation capabilities, our adaptive compression framework improves perfectly matched cases in tropical cyclone (TC) tracking by 26.4-51.3% at medium compression ratios and by 77.3-571.1% at large compression ratios, with only 5-11% computational overhead.

Submitted 6 January, 2024; originally announced January 2024.

Comments: 10 pages, 13 figures, 2023 IEEE International Conference on e-Science and Grid Computing

Journal ref: 2023 IEEE 19th International Conference on e-Science, Limassol, Cyprus, 2023, pp. 1-10

arXiv:2311.01288

Unraveling Diffusion in Fusion Plasma: A Case Study of In Situ Processing and Particle Sorting

Authors: Junmin Gu, Paul Lin, Kesheng Wu, Seung-Hoe Ku, C. S. Chang, R. Michael Churchill, Jong Choi, Norbert Podhorszki, Scott Klasky

Abstract: This work begins developing an in situ processing capability to study a particular diffusion process in magnetic confinement fusion. This diffusion process involves plasma particles that are likely to escape confinement. Such particles carry a significant amount of energy from the burning plasma inside the tokamak to the divertor and can damage the divertor plate. This study requires in situ processing because of the fast-changing nature of the particle diffusion process. However, the in situ processing approach is challenging because the amount of data to be retained for the diffusion calculations increases over time, unlike other in situ processing cases where the amount of data to be processed is constant over time. Here we report our preliminary efforts to control the memory usage while ensuring that the necessary analysis tasks are completed in a timely manner. Compared with an earlier naive attempt to compute the same diffusion displacements directly in the simulation code, this in situ version reduces the memory usage from particle information by nearly 60% and the computation time by about 20%.

Submitted 2 November, 2023; originally announced November 2023.
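Computing diffusion displacements requires matching each particle across time steps, even though particles arrive in arbitrary order. A minimal sketch, assuming each particle carries a unique integer ID and a radial coordinate (both hypothetical fields here), sorts two snapshots by ID, merges them in one linear pass, and computes per-particle displacements and their mean square. It is not the in situ implementation from the paper and ignores the distribution of particles across ranks.

```python
def sort_by_id(snapshot):
    """snapshot: list of (particle_id, r) tuples in arbitrary order."""
    return sorted(snapshot, key=lambda p: p[0])


def displacements(before, after):
    """Return {id: r_after - r_before} for particles present in both snapshots.

    Both snapshots are sorted by ID and merged in a single linear pass;
    particles missing from either snapshot (e.g. ones that left the
    domain) are skipped.
    """
    a, b = sort_by_id(before), sort_by_id(after)
    i = j = 0
    out = {}
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            out[a[i][0]] = b[j][1] - a[i][1]
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return out


def mean_square_displacement(disp):
    return sum(d * d for d in disp.values()) / len(disp) if disp else 0.0


if __name__ == "__main__":
    before = [(7, 0.50), (2, 0.30), (5, 0.80)]
    after = [(5, 0.86), (7, 0.48)]  # particle 2 is no longer present
    d = displacements(before, after)
    print(d, mean_square_displacement(d))
```

Keeping only the ID-sorted arrays needed for the next comparison, rather than a full history, is one way to bound the memory that grows with the analysis window.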

arXiv:2212.10733

Scalable Hybrid Learning Techniques for Scientific Data Compression

Authors: Tania Banerjee, Jong Choi, Jaemoon Lee, Qian Gong, Jieyang Chen, Scott Klasky, Anand Rangarajan, Sanjay Ranka

Abstract: Data compression is becoming critical for storing scientific data because many scientific applications need to store large amounts of data and post-process this data for scientific discovery. Unlike image and video compression algorithms that limit errors to primary data, scientists require compression techniques that accurately preserve derived quantities of interest (QoIs). This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression that addresses this requirement. Our hybrid compression technique combines machine learning techniques and standard compression methods. Specifically, we combine an autoencoder, an error-bounded lossy compressor to provide guarantees on raw data error, and a constraint satisfaction post-processing step to preserve the QoIs within a minimal error (generally less than floating point error). The effectiveness of the data compression pipeline is demonstrated by compressing nuclear fusion simulation data generated by a large-scale fusion code, XGC, which produces hundreds of terabytes of data in a single day. Our approach works within the ADIOS framework and results in compression by a factor of more than 150 while requiring only a few percent of the computational resources necessary for generating the data, making the overall approach highly effective for practical scenarios.

Submitted 20 December, 2022; originally announced December 2022.
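The constraint-satisfaction post-processing step can be illustrated for the simplest case: a single linear quantity of interest w . x = target, such as a weighted sum standing in for a conserved moment (the paper's actual QoIs are not reproduced here). The smallest L2 change that satisfies the constraint is the orthogonal projection onto the hyperplane, x' = x + (target - w . x) / (w . w) * w, the closed-form solution of the corresponding Lagrange-multiplier problem. Nonlinear QoIs would need an iterative solver instead; the function name is hypothetical.

```python
def project_onto_linear_qoi(x, weights, target):
    """Smallest L2 change to x such that dot(weights, x') == target.

    Closed form: x' = x + (target - w.x) / (w.w) * w, the orthogonal
    projection onto the hyperplane {y : w.y = target}.
    """
    if len(x) != len(weights):
        raise ValueError("x and weights must have the same length")
    ww = sum(w * w for w in weights)
    if ww == 0.0:
        raise ValueError("weights must not be all zero")
    gap = target - sum(w * xi for w, xi in zip(weights, x))
    scale = gap / ww
    return [xi + scale * w for xi, w in zip(x, weights)]


if __name__ == "__main__":
    original = [1.0, 2.0, 3.0, 4.0]
    reconstructed = [1.1, 1.9, 3.2, 3.7]  # e.g. output of a lossy compressor
    qoi = sum(original)                    # QoI: plain sum, uniform weights
    fixed = project_onto_linear_qoi(reconstructed, [1.0] * 4, qoi)
    print(fixed, sum(fixed))
```

With uniform weights the correction simply shifts every value by the same amount, here (10.0 - 9.9) / 4 = 0.025.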

arXiv:2205.15832

doi 10.1109/TPS.2023.3268170

2022 Review of Data-Driven Plasma Science

Authors: Rushil Anirudh, Rick Archibald, M. Salman Asif, Markus M. Becker, Sadruddin Benkadda, Peer-Timo Bremer, Rick H. S. Budé, C. S. Chang, Lei Chen, R. M. Churchill, Jonathan Citrin, Jim A Gaffney, Ana Gainaru, Walter Gekelman, Tom Gibbs, Satoshi Hamaguchi, Christian Hill, Kelli Humbird, Sören Jalas, Satoru Kawaguchi, Gon-Ho Kim, Manuel Kirchen, Scott Klasky, John L. Kline, Karl Krushelnick, et al. (38 additional authors not shown)

Abstract: Data science and technology offer transformative tools and methods to science. This review article highlights the latest developments and progress in the interdisciplinary field of data-driven plasma science (DDPS). Large amounts of data and machine learning algorithms go hand in hand. Most plasma data, whether experimental, observational or computational, are generated or collected by machines today. It is now becoming impractical for humans to analyze all the data manually. Therefore, it is imperative to train machines to analyze and (eventually) interpret such data as intelligently as humans but far more efficiently in quantity. Despite the recent impressive progress in applications of data science to plasma science and technology, the emerging field of DDPS is still in its infancy. Fueled by some of the most challenging problems such as fusion energy, plasma processing of materials, and fundamental understanding of the universe through observable plasma phenomena, DDPS is expected to continue benefiting significantly from the interdisciplinary marriage between plasma science and data science into the foreseeable future.

Submitted 31 May, 2022; originally announced May 2022.

Comments: 112 pages (including 700+ references), 44 figures, submitted to IEEE Transactions on Plasma Science as a part of the IEEE Golden Anniversary Special Issue

Report number: Los Alamos Report number LA-UR-22-24834

Journal ref: IEEE Transactions on Plasma Science 51, 1750 - 1838 (2023)

arXiv:2108.08896

doi 10.1088/1361-6587/ac3f42

Near real-time streaming analysis of big fusion data

Authors: Ralph Kube, R. Michael Churchill, CS Chang, Jong Choi, Jason Wang, Scott Klasky, Laurie Stephey, Minjun Choi, Eli Dart

Abstract: While experiments on fusion plasmas produce high-dimensional data time series with ever-increasing magnitude and velocity, data analysis has been lagging behind this development. For example, many data analysis tasks are often performed in a manual, ad-hoc manner some time after an experiment. In this article we introduce the DELTA framework that facilitates near real-time streaming analysis of big and fast fusion data. By streaming measurement data from fusion experiments to a high-performance compute center, DELTA makes it possible to perform demanding data analysis tasks in between plasma pulses. This article describes the modular and expandable software architecture of DELTA and presents performance benchmarks of its individual components as well as of entire workflows. Our focus is on the streaming analysis of ECEi data measured at KSTAR on NERSC's supercomputers, and we routinely achieve data transfer rates of about 500 megabytes per second. We show that a demanding turbulence analysis workload can be distributed among multiple GPUs and executes in under 5 minutes. We further discuss how DELTA uses modern database systems and container orchestration services to provide web-based real-time data visualization. For the case of ECEi data we demonstrate how data visualizations can be augmented with outputs from machine learning models.
By providing session leaders and physics operators with the results of higher-order data analysis through live visualization, DELTA lets them monitor the evolution of a long-pulse discharge in near real-time and make more informed decisions on how to configure the machine for the next shot.

Submitted 19 August, 2021; originally announced August 2021.

arXiv:2107.07108

doi 10.1109/TPDS.2021.3100784

Improving I/O Performance for Exascale Applications through Online Data Layout Reorganization

Authors: Lipeng Wan, Axel Huebl, Junmin Gu, Franz Poeschel, Ana Gainaru, Ruonan Wang, Jieyang Chen, Xin Liang, Dmitry Ganyushin, Todd Munson, Ian Foster, Jean-Luc Vay, Norbert Podhorszki, Kesheng Wu, Scott Klasky

Abstract: The applications being developed within the U.S. Exascale Computing Project (ECP) to run on imminent Exascale computers will generate scientific results with unprecedented fidelity and record turn-around time. Many of these codes are based on particle-mesh methods and use advanced algorithms, especially dynamic load-balancing and mesh-refinement, to achieve high performance on Exascale machines. Yet, as such algorithms improve parallel application efficiency, they raise new challenges for I/O logic due to their irregular and dynamic data distributions. Thus, while the enormous data rates of Exascale simulations already challenge existing file system write strategies, the need for efficient reading and processing of generated data introduces additional constraints on the data layout strategies that can be used when writing data to secondary storage. We review these I/O challenges and introduce two online data layout reorganization approaches for achieving good tradeoffs between read and write performance. We demonstrate the benefits of using these two approaches for the ECP particle-in-cell simulation WarpX, which serves as a motif for a large class of important Exascale applications. We show that by understanding application I/O patterns and carefully designing data layouts we can increase read performance by more than 80%.

Submitted 15 July, 2021; originally announced July 2021.

Comments: 12 pages, 15 figures, accepted by IEEE Transactions on Parallel and Distributed Systems

Journal ref: IEEE Transactions on Parallel and Distributed Systems, 2021
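One building block for writing irregular, per-rank particle data into a contiguous global layout is an exclusive prefix sum over per-rank counts, which tells each rank where its block starts in the global array. The sketch below computes those offsets and shows one simple reorganization: greedily merging consecutive small per-rank blocks into fewer, larger contiguous chunks of roughly a target size. It illustrates the general idea only; it is not either of the two reorganization approaches evaluated in the paper, the function names are hypothetical, and in an MPI code the prefix sum would typically come from `MPI_Exscan`.

```python
from itertools import accumulate


def exclusive_offsets(counts):
    """Global start offset of each rank's block (exclusive prefix sum)."""
    return [0] + list(accumulate(counts))[:-1] if counts else []


def group_into_chunks(counts, target_chunk):
    """Greedily merge consecutive rank blocks into chunks of ~target_chunk items.

    Returns a list of (first_rank, last_rank, global_offset, size) tuples.
    Each chunk is contiguous in the global array because ranks are merged
    in order.
    """
    offsets = exclusive_offsets(counts)
    chunks = []
    start, size = 0, 0
    for rank, c in enumerate(counts):
        size += c
        if size >= target_chunk:
            chunks.append((start, rank, offsets[start], size))
            start, size = rank + 1, 0
    if start < len(counts):
        # Leftover ranks form a final, possibly smaller, chunk.
        chunks.append((start, len(counts) - 1, offsets[start], size))
    return chunks


if __name__ == "__main__":
    counts = [3, 0, 5, 2, 7, 1]  # hypothetical particles per rank
    print(exclusive_offsets(counts))     # [0, 3, 3, 8, 10, 17]
    print(group_into_chunks(counts, 6))  # [(0, 2, 0, 8), (3, 4, 8, 9), (5, 5, 17, 1)]
```

Fewer, larger contiguous chunks reduce the number of separate pieces a reader must locate, at the cost of moving data between ranks before the write.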

arXiv:2107.06108

doi 10.1007/978-3-030-96498-6_6

Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2

Authors: Franz Poeschel, Juncheng E, William F. Godoy, Norbert Podhorszki, Scott Klasky, Greg Eisenhauer, Philip E. Davis, Lipeng Wan, Ana Gainaru, Junmin Gu, Fabian Koller, René Widera, Michael Bussmann, Axel Huebl

Abstract: This paper aims to create a transition path from file-based IO to streaming-based workflows for scientific applications in an HPC environment. By using the openPMD-api, traditional workflows limited by filesystem bottlenecks can be overcome and flexibly extended for in situ analysis. The openPMD-api is a library for the description of scientific data according to the Open Standard for Particle-Mesh Data (openPMD). Its approach towards recent challenges posed by hardware heterogeneity lies in the decoupling of data description in domain sciences, such as plasma physics simulations, from concrete implementations in hardware and IO. The streaming backend is provided by the ADIOS2 framework, developed at Oak Ridge National Laboratory. This paper surveys two openPMD-based loosely-coupled setups to demonstrate flexible applicability and to evaluate performance. In loose coupling, as opposed to tight coupling, two (or more) applications are executed separately, e.g. in individual MPI contexts, yet cooperate by exchanging data. This way, a streaming-based workflow allows for standalone codes instead of tightly-coupled plugins, using a unified streaming-aware API and leveraging high-speed communication infrastructure available in modern compute clusters for massive data exchange. We identify new challenges in resource allocation and the need for flexible data distribution strategies, demonstrating their influence on efficiency and scaling on the Summit compute system.
The presented setups show the potential for a more flexible use of compute resources brought by streaming IO as well as the ability to increase throughput by avoiding filesystem bottlenecks.

Submitted 19 January, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

Comments: 18 pages, 9 figures, SMC2021, supplementary material at https://zenodo.org/record/4906276
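The control flow of a loosely coupled, step-based stream can be mimicked in-process: a writer publishes one step at a time, a separately running reader consumes each step as it arrives and performs its own analysis, and neither touches the file system. The sketch below uses a bounded `queue.Queue` between two threads as a stand-in for the ADIOS2 streaming transport; it does not use the openPMD-api or ADIOS2, and the step payload and function names are hypothetical. The bounded queue also illustrates back-pressure: if the reader falls behind, the writer blocks instead of buffering without limit.

```python
import queue
import threading

END_OF_STREAM = None  # sentinel marking that the writer has finished


def writer(stream: queue.Queue, n_steps: int, n_particles: int) -> None:
    """Producer: emit one step at a time, like a running simulation."""
    for step in range(n_steps):
        # Hypothetical payload: particle x-positions for this step.
        positions = [step + 0.1 * i for i in range(n_particles)]
        stream.put((step, positions))  # blocks while the queue is full
    stream.put(END_OF_STREAM)


def reader(stream: queue.Queue, results: list) -> None:
    """Consumer: analyze each step as soon as it arrives (here, the mean)."""
    while True:
        item = stream.get()
        if item is END_OF_STREAM:
            break
        step, positions = item
        results.append((step, sum(positions) / len(positions)))


def run_pipeline(n_steps: int = 5, n_particles: int = 4, depth: int = 2):
    """Run writer and reader concurrently; `depth` bounds in-flight steps."""
    stream: queue.Queue = queue.Queue(maxsize=depth)
    results: list = []
    t_read = threading.Thread(target=reader, args=(stream, results))
    t_write = threading.Thread(target=writer, args=(stream, n_steps, n_particles))
    t_read.start()
    t_write.start()
    t_write.join()
    t_read.join()
    return results


if __name__ == "__main__":
    for step, mean_x in run_pipeline():
        print(step, round(mean_x, 3))
```

In a real setup the two sides would be separate applications, potentially with different numbers of processes, which is where the data-distribution questions raised above come in.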

arXiv:2105.12764

Scalable Multigrid-based Hierarchical Scientific Data Refactoring on GPUs

Authors: Jieyang Chen, Lipeng Wan, Xin Liang, Ben Whitney, Qing Liu, Qian Gong, David Pugmire, Nicholas Thompson, Jong Youl Choi, Matthew Wolf, Todd Munson, Ian Foster, Scott Klasky

Abstract: Rapid growth in scientific data and a widening gap between computational speed and I/O bandwidth make it increasingly infeasible to store and share all data produced by scientific simulations. Instead, we need methods for reducing data volumes: ideally, methods that can scale data volumes adaptively so as to enable negotiation of performance and fidelity tradeoffs in different situations. Multigrid-based hierarchical data representations hold promise as a solution to this problem, allowing for flexible conversion between different fidelities so that, for example, data can be created at high fidelity and then transferred or stored at lower fidelity via logically simple and mathematically sound operations. However, the effective use of such representations has been hindered until now by the relatively high costs of creating, accessing, reducing, and otherwise operating on such representations. We describe here highly optimized data refactoring kernels for GPU accelerators that enable efficient creation and manipulation of data in multigrid-based hierarchical forms. We demonstrate that our optimized design can achieve up to 264 TB/s aggregated data refactoring throughput (92% of theoretical peak) on 1024 nodes of the Summit supercomputer. We showcase our optimized design by applying it to a large-scale scientific visualization workflow and the MGARD lossy compression software.

Submitted 26 May, 2021; originally announced May 2021.

Comments: arXiv admin note: text overlap with arXiv:2007.04457

arXiv:2010.05872

MGARD+: Optimizing Multilevel Methods for Error-bounded Scientific Data Reduction

Authors: Xin Liang, Ben Whitney, Jieyang Chen, Lipeng Wan, Qing Liu, Dingwen Tao, James Kress, Dave Pugmire, Matthew Wolf, Norbert Podhorszki, Scott Klasky

Abstract: Data management is becoming increasingly important in dealing with the large amounts of data produced by large-scale scientific simulations and instruments. Existing multilevel compression algorithms offer a promising way to manage scientific data at scale, but may suffer from relatively low performance and reduction quality. In this paper, we propose MGARD+, a multilevel data reduction and refactoring framework drawing on previous multilevel methods, to achieve high-performance data decomposition and high-quality error-bounded lossy compression. Our contributions are four-fold: 1) We propose a level-wise coefficient quantization method, which uses different error tolerances to quantize the multilevel coefficients. 2) We propose an adaptive decomposition method which treats the multilevel decomposition as a preconditioner and terminates the decomposition process at an appropriate level. 3) We leverage a set of algorithmic optimization strategies to significantly improve the performance of multilevel decomposition/recomposition. 4) We evaluate our proposed method using four real-world scientific datasets and compare it with several state-of-the-art lossy compressors. Experiments demonstrate that our optimizations improve the decomposition/recomposition performance of the existing multilevel method by up to 70X, and the proposed compression method can improve compression ratio by up to 2X compared with other state-of-the-art error-bounded lossy compressors under the same level of data distortion.

Submitted 10 November, 2020; v1 submitted 12 October, 2020; originally announced October 2020.
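Level-wise coefficient quantization (contribution 1 above) can be sketched independently of the decomposition. Given multilevel coefficients grouped by level, each level is quantized with its own tolerance. The rule used below for choosing tolerances, splitting a total budget in proportion to hypothetical per-level weights, is an assumption for illustration only and is not MGARD+'s actual rule; how per-level errors combine in the reconstructed data also depends on the decomposition, which is not shown.

```python
def split_tolerance(total_eb, weights):
    """Divide a total error budget among levels in proportion to `weights`."""
    s = sum(weights)
    return [total_eb * w / s for w in weights]


def quantize_levels(levels, tolerances):
    """Quantize each level's coefficients with that level's own tolerance.

    levels: list of lists of floats (coarsest first); tolerances: one per level.
    Uniform quantization guarantees |c - c_hat| <= tol for every coefficient.
    """
    return [[round(c / (2.0 * tol)) for c in coeffs]
            for coeffs, tol in zip(levels, tolerances)]


def dequantize_levels(codes, tolerances):
    return [[q * 2.0 * tol for q in level] for level, tol in zip(codes, tolerances)]


if __name__ == "__main__":
    # Hypothetical coefficients: large coarse values, small fine-level details.
    levels = [[10.0, 4.0], [0.31, -0.12], [0.02, -0.04, 0.01, 0.005]]
    tols = split_tolerance(0.7, [1, 2, 4])  # looser tolerance on finer levels
    codes = quantize_levels(levels, tols)
    print(tols)
    print(codes)
```

Giving finer levels a looser tolerance drives many of their small coefficients to zero (the whole finest level in this example), which a subsequent lossless stage can store very compactly.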

arXiv:2007.04457

Accelerating Multigrid-based Hierarchical Scientific Data Refactoring on GPUs

Authors: Jieyang Chen, Lipeng Wan, Xin Liang, Ben Whitney, Qing Liu, David Pugmire, Nicholas Thompson, Matthew Wolf, Todd Munson, Ian Foster, Scott Klasky

Abstract: Rapid growth in scientific data and a widening gap between computational speed and I/O bandwidth make it increasingly infeasible to store and share all data produced by scientific simulations. Instead, we need methods for reducing data volumes: ideally, methods that can scale data volumes adaptively so as to enable negotiation of performance and fidelity tradeoffs in different situations. Multigrid-based hierarchical data representations hold promise as a solution to this problem, allowing for flexible conversion between different fidelities so that, for example, data can be created at high fidelity and then transferred or stored at lower fidelity via logically simple and mathematically sound operations. However, the effective use of such representations has been hindered until now by the relatively high costs of creating, accessing, reducing, and otherwise operating on such representations. We describe here highly optimized data refactoring kernels for GPU accelerators that enable efficient creation and manipulation of data in multigrid-based hierarchical forms. We demonstrate that our optimized design can achieve up to 250 TB/s aggregated data refactoring throughput -- 83% of theoretical peak -- on 1024 nodes of the Summit supercomputer. We showcase our optimized design by applying it to a large-scale scientific visualization workflow and the MGARD lossy compression software.

Submitted 27 February, 2021; v1 submitted 8 July, 2020; originally announced July 2020.
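A multigrid-based hierarchical representation can be pictured with a simplified one-dimensional sketch: on a grid of 2^k + 1 nodes, each level keeps the even nodes as the coarser grid and stores, for each odd node, its deviation from the linear interpolant of its two neighbours. MGARD, as usually described, also applies an L2-projection correction to the coarse nodes, which this sketch omits, and nothing here reflects the GPU kernels the paper describes; `decompose` and `recompose` are hypothetical names.

```python
import math


def decompose(u):
    """Split nodal values on 2**k + 1 points into a coarse grid plus per-level details.

    Details are deviations of odd nodes from the linear interpolant of their
    neighbours (a plain hierarchical basis; MGARD's L2-projection correction is omitted).
    """
    assert len(u) >= 2 and (len(u) - 1) & (len(u) - 2) == 0, "need 2**k + 1 points"
    details = []
    cur = list(u)
    while len(cur) > 2:
        # Odd nodes: store how far each lies from the midpoint of its neighbours.
        detail = [cur[i] - 0.5 * (cur[i - 1] + cur[i + 1]) for i in range(1, len(cur), 2)]
        details.append(detail)
        cur = cur[0::2]  # even nodes form the next coarser grid
    return cur, details[::-1]  # details ordered coarsest level first


def recompose(coarse, details):
    """Invert decompose(): refine level by level, coarsest details first."""
    cur = list(coarse)
    for detail in details:
        fine = [0.0] * (2 * len(cur) - 1)
        fine[0::2] = cur
        for j, d in enumerate(detail):
            i = 2 * j + 1
            fine[i] = d + 0.5 * (fine[i - 1] + fine[i + 1])
        cur = fine
    return cur


if __name__ == "__main__":
    u = [math.sin(0.3 * x) for x in range(17)]
    coarse, details = decompose(u)
    # Zeroing the finest detail level yields a lower-fidelity version of u.
    approx = recompose(coarse, details[:-1] + [[0.0] * len(details[-1])])
    print(max(abs(a - b) for a, b in zip(approx, u)))
```

Keeping all detail levels reproduces the data up to rounding; discarding the finest ones is the "stored at lower fidelity" operation mentioned in the abstract, in its simplest form.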

arXiv:2005.05424 [pdf]

Towards 1ULP evaluation of Daubechies Wavelets

Authors: Nicholas Thompson, John Maddock, George Ostrouchov, Jeremy Logan, David Pugmire, Scott Klasky

Abstract: We present algorithms to numerically evaluate Daubechies wavelets and scaling functions to high relative accuracy. These algorithms refine the suggestion of Daubechies and Lagarias to evaluate functions defined by two-scale difference equations using splines, carefully choosing amongst a family of rapidly convergent interpolators which effectively capture all the smoothness present in the function and whose error term admits a small asymptotic constant. We are also able to efficiently compute derivatives, though with a smoothness-induced reduction in accuracy. An implementation is provided in the Boost Software Library.

Submitted 11 May, 2020; originally announced May 2020.

Comments: 16 pages, 5 figures
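The two-scale difference equation referred to above can be illustrated with a short sketch. For the four-tap Daubechies filter (db2, often called D4), the scaling function φ is fixed at the integers by φ(1) = (1+√3)/2 and φ(2) = (1−√3)/2, and the relation φ(x) = Σ_k c_k φ(2x − k) then gives values on successively finer dyadic grids. This shows only the dyadic refinement step in the spirit of Daubechies and Lagarias, not the authors' spline-interpolation scheme or their Boost implementation; the function name `db2_scaling_dyadic` is hypothetical.

```python
import math


def db2_scaling_dyadic(levels):
    """Values of the db2 (D4) scaling function on the grid i / 2**levels, 0 <= x <= 3."""
    s3 = math.sqrt(3.0)
    # Two-scale coefficients c_k with phi(x) = sum_k c_k * phi(2x - k); they sum to 2.
    c = [(1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4]
    # Exact values at the integers 0, 1, 2, 3 (the support of phi is [0, 3]).
    vals = [0.0, (1 + s3) / 2, (1 - s3) / 2, 0.0]
    for j in range(levels):
        step = 2 ** j  # index offset of a unit shift on the current grid
        new = [0.0] * (3 * 2 ** (j + 1) + 1)
        for m in range(len(new)):
            # x = m / 2**(j+1), so 2x - k = (m - k * step) / 2**j lies on the old grid.
            total = 0.0
            for k, ck in enumerate(c):
                idx = m - k * step
                if 0 <= idx < len(vals):  # outside [0, 3] phi is zero
                    total += ck * vals[idx]
            new[m] = total
        vals = new
    return vals


if __name__ == "__main__":
    vals = db2_scaling_dyadic(3)
    for i, v in enumerate(vals):
        print(f"phi({i / 8:.3f}) = {v:+.6f}")
```

Values between dyadic points still require an interpolator; choosing that interpolator so that the error constant stays small is the subject of the paper.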

arXiv:1806.05251 [pdf, ps, other]

doi 10.1063/1.5044707

A tight-coupling scheme sharing minimum information across a spatial interface between gyrokinetic turbulence codes

Authors: Julien Dominski, Seung-Hoe Ku, Choong-Seock Chang, Jong Choi, Eric Suchyta, Scott Parker, Scott Klasky, Amitava Bhattacharjee

Abstract: A new scheme that tightly couples kinetic turbulence codes across a spatial interface is introduced. This scheme evolves from considerations of competing strategies and down-selection. It is found that the use of a composite kinetic distribution function and fields with global boundary conditions, as if the coupled code were one, makes the coupling problem tractable. In contrast, coupling the two solutions from each code across the overlap region is found to be more difficult due to numerical dephasing of the turbulent solutions between two solvers. Another advantage of the new scheme is that the data movement can be limited to the 3D fluid quantities, instead of higher-dimensional kinetic information, which is computationally more efficient for large-scale simulations on leadership-class computers.

Submitted 20 July, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

Comments: 8 pages, 4 figures

Journal ref: Physics of Plasmas 25, 072308 (2018)

arXiv:1706.00522 [pdf, other]

doi 10.1007/978-3-319-67630-2_2

On the Scalability of Data Reduction Techniques in Current and Upcoming HPC Systems from an Application Perspective

Authors: Axel Huebl, Rene Widera, Felix Schmitt, Alexander Matthes, Norbert Podhorszki, Jong Youl Choi, Scott Klasky, Michael Bussmann

Abstract: We implement and benchmark parallel I/O methods for the fully-manycore driven particle-in-cell code PIConGPU. Identifying throughput and overall I/O size as a major challenge for applications on today's and future HPC systems, we present a scaling law characterizing performance bottlenecks in state-of-the-art approaches for data reduction. Consequently, we propose, implement and verify multi-threaded data-transformations for the I/O library ADIOS as a feasible way to trade underutilized host-side compute potential on heterogeneous systems for reduced I/O latency.

Submitted 1 June, 2017; originally announced June 2017.

Comments: 15 pages, 5 figures, accepted for DRBSD-1 in conjunction with ISC'17

ACM Class: D.4.8; B.4.3; I.6.6

Journal ref: J.M. Kunkel et al. (Eds.): ISC High Performance Workshops 2017, LNCS 10524, pp. 15-29, 2017
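The idea of spending otherwise idle host cores on a data transformation before I/O can be sketched with a thread pool that compresses independent chunks concurrently. This uses Python's standard `zlib` and `concurrent.futures` modules as stand-ins; it is not the ADIOS transform interface, and the chunk size, worker count, and compression level are arbitrary assumptions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(data: bytes, chunk_size: int = 1 << 16, workers: int = 4, level: int = 1):
    """Compress fixed-size chunks of `data` on a pool of worker threads.

    Chunks are independent, so each can be compressed (and later written) on its own;
    pool.map preserves chunk order in the returned list.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda chunk: zlib.compress(chunk, level), chunks))


def decompress_chunks(blobs):
    """Reassemble the original byte stream from the ordered compressed chunks."""
    return b"".join(zlib.decompress(blob) for blob in blobs)


if __name__ == "__main__":
    payload = bytes(range(256)) * 4000  # stand-in for a repetitive field dump
    blobs = compress_chunks(payload)
    print(len(payload), "->", sum(len(b) for b in blobs), "bytes in", len(blobs), "chunks")
```

Independent chunks keep the transformation embarrassingly parallel, at the price of some compression ratio compared with compressing the whole buffer at once.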

arXiv:1505.03532 [pdf, other]

Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma

Authors: Lingfei Wu, Kesheng Wu, Alex Sim, Michael Churchill, Jong Y. Choi, Andreas Stathopoulos, Cs Chang, Scott Klasky

Abstract: A novel algorithm and implementation of real-time identification and tracking of blob-filaments in fusion reactor data is presented. Similar spatio-temporal features are important in many other applications, for example, ignition kernels in combustion and tumor cells in a medical image. This work presents an approach for extracting these features by dividing the overall task into three steps: local identification of feature cells, grouping feature cells into extended features, and tracking the movement of features through overlapping in space. Through our extensive work in parallelization, we demonstrate that this approach can effectively make use of a large number of compute nodes to detect and track blob-filaments in real time in fusion plasma. On a set of 30 GB of fusion simulation data, we observed linear speedup on 1024 processes and completed blob detection in less than three milliseconds using Edison, a Cray XC30 system at NERSC.

Submitted 2 July, 2016; v1 submitted 13 May, 2015; originally announced May 2015.

Comments: 14 pages, 40 figures
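The three steps listed in the abstract (local identification of feature cells, grouping them into extended features, tracking by spatial overlap) can be sketched serially on a 2D grid. The simple threshold criterion, the 4-connectivity, and the function names are assumptions for illustration; the paper's actual feature criterion and its parallel implementation are not reproduced here.

```python
from collections import deque


def find_blobs(frame, threshold):
    """Steps 1-2: mark cells above `threshold` (hypothetical criterion), then group
    4-connected marked cells into blobs with a breadth-first search."""
    rows, cols = len(frame), len(frame[0])
    seen = set()
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if frame[r][c] > threshold and (r, c) not in seen:
                blob = set()
                queue = deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    blob.add((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and (ny, nx) not in seen and frame[ny][nx] > threshold):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                blobs.append(blob)
    return blobs


def track(prev_blobs, curr_blobs):
    """Step 3: link blob i in the previous frame to blob j in the current one
    whenever their cell sets overlap in space."""
    return [(i, j)
            for i, a in enumerate(prev_blobs)
            for j, b in enumerate(curr_blobs)
            if a & b]


if __name__ == "__main__":
    f0 = [[5, 5, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    f1 = [[0, 5, 5, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5]]
    print(track(find_blobs(f0, 1), find_blobs(f1, 1)))
```

Overlap-based linking assumes a blob moves less than its own extent between frames; a new blob with no overlap (like the one at the bottom-right corner above) simply starts a new track.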

arXiv:1405.7958 [pdf, other]

Region Templates: Data Representation and Management for Large-Scale Image Analysis

Authors: George Teodoro, Tony Pan, Tahsin Kurc, Jun Kong, Lee Cooper, Scott Klasky, Joel Saltz

Abstract: Distributed-memory machines equipped with CPUs and GPUs (hybrid computing nodes) are hard to program because of the multiple layers of memory and heterogeneous computing configurations. In this paper, we introduce a region template abstraction for the efficient management of common data types used in analysis of large datasets of high-resolution images on clusters of hybrid computing nodes. The region template provides a generic container template for common data structures, such as points, arrays, regions, and object sets, within a spatial and temporal bounding box. The region template abstraction enables different data management strategies and data I/O implementations, while providing a homogeneous, unified interface to the application for data storage and retrieval. The execution of region template applications is coordinated by a runtime system that supports efficient execution in hybrid machines. Region template applications are represented as hierarchical dataflow in which each computing stage may be represented as another dataflow of finer-grain tasks. A number of optimizations for hybrid machines are available in our runtime system, including performance-aware scheduling for maximizing utilization of computing devices and techniques to reduce the impact of data transfers between CPUs and GPUs. An experimental evaluation on a state-of-the-art hybrid cluster using a microscopy imaging study shows that this abstraction adds negligible overhead (about 3%) and achieves good scalability.

Submitted 30 May, 2014; originally announced May 2014.

Comments: 43 pages, 17 figures
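As a rough illustration of the container described above, a region template can be modeled as a spatial-temporal bounding box that holds named data items whose own extents must lie inside it. The class and method names are hypothetical, and the runtime system, hierarchical dataflow scheduling, and CPU/GPU transfer optimizations from the paper are not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class BoundingBox:
    lo: Tuple[float, ...]  # lower corner, e.g. (x, y, t)
    hi: Tuple[float, ...]  # upper corner, same dimensionality as lo

    def contains(self, other: "BoundingBox") -> bool:
        """True if `other` lies entirely inside this box."""
        return (all(s <= o for s, o in zip(self.lo, other.lo))
                and all(o <= s for s, o in zip(self.hi, other.hi)))


@dataclass
class RegionTemplate:
    """Hypothetical container: named data items (points, arrays, masks, ...)
    each tagged with its own extent inside the template's bounding box."""
    bbox: BoundingBox
    items: Dict[str, Tuple[BoundingBox, Any]] = field(default_factory=dict)

    def insert(self, name: str, bbox: BoundingBox, data: Any) -> None:
        if not self.bbox.contains(bbox):
            raise ValueError(f"{name!r} extends outside the region template")
        self.items[name] = (bbox, data)

    def get(self, name: str) -> Any:
        return self.items[name][1]


if __name__ == "__main__":
    rt = RegionTemplate(BoundingBox((0, 0, 0), (100, 100, 10)))
    rt.insert("nuclei_mask", BoundingBox((10, 10, 0), (20, 20, 1)), [[0, 1], [1, 0]])
    print(rt.get("nuclei_mask"))
```

Separating the extent bookkeeping from the payload is what would let different storage or I/O back ends sit behind the same `insert`/`get` interface.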

arXiv:gr-qc/9801069 [pdf, ps, other]

doi 10.1103/PhysRevLett.80.3915

Stable characteristic evolution of generic 3-dimensional single-black-hole spacetimes

Authors: The Binary Black Hole Grand Challenge Alliance: R. Gomez, L. Lehner, R. Marsa, J. Winicour, A. Abrahams, A. Anderson, P. Anninos, T. Baumgarte, N. Bishop, S. Brandt, J. Browne, K. Camarda, M. Choptuik, R. Correl, G. Cook, C. Evans, L. Finn, G. Fox, T. Haupt, M. Huq, L. Kidder, S. Klasky, P. Laguna, W. Landry, et al. (20 additional authors not shown)

Abstract: We report new results which establish that the accurate 3-dimensional numerical simulation of generic single-black-hole spacetimes has been achieved by characteristic evolution with unlimited long-term stability. Our results cover a selection of distorted, moving and spinning single black holes, with evolution times up to 60,000M.

Submitted 20 January, 1998; originally announced January 1998.

Comments: 4 pages, 3 figures

Journal ref: Phys.Rev.Lett.80:3915-3918,1998

arXiv:gr-qc/9711078 [pdf, ps, other]

doi 10.1103/PhysRevLett.80.2512

Boosted three-dimensional black-hole evolutions with singularity excision

Authors: The Binary Black Hole Grand Challenge Alliance: G. B. Cook, M. F. Huq, S. A. Klasky, M. A. Scheel, A. M. Abrahams, A. Anderson, P. Anninos, T. W. Baumgarte, N. T. Bishop, S. R. Brandt, J. C. Browne, K. Camarda, M. W. Choptuik, C. R. Evans, L. S. Finn, G. C. Fox, R. Gomez, T. Haupt, L. E. Kidder, P. Laguna, W. Landry, L. Lehner, J. Lenaghan, et al. (21 additional authors not shown)

Abstract: Binary black hole interactions provide potentially the strongest source of gravitational radiation for detectors currently under development. We present some results from the Binary Black Hole Grand Challenge Alliance three-dimensional Cauchy evolution module. These constitute essential steps towards modeling such interactions and predicting gravitational radiation waveforms. We report on single black hole evolutions and the first successful demonstration of a black hole moving freely through a three-dimensional computational grid via a Cauchy evolution: a hole moving ~6M at 0.1c during a total evolution of duration ~60M.

Submitted 26 November, 1997; originally announced November 1997.

Journal ref: Phys.Rev.Lett.80:2512-2516,1998

arXiv:gr-qc/9709082 [pdf, ps, other]

doi 10.1103/PhysRevLett.80.1812

Gravitational wave extraction and outer boundary conditions by perturbative matching

Authors: The Binary Black Hole Grand Challenge Alliance: A. M. Abrahams, L. Rezzolla, M. E. Rupright, A. Anderson, P. Anninos, T. W. Baumgarte, N. T. Bishop, S. R. Brandt, J. C. Browne, K. Camarda, M. W. Choptuik, G. B. Cook, C. R. Evans, L. S. Finn, G. Fox, R. Gomez, T. Haupt, M. F. Huq, L. E. Kidder, S. Klasky, P. Laguna, W. Landry, L. Lehner, et al. (20 additional authors not shown)

Abstract: We present a method for extracting gravitational radiation from a three-dimensional numerical relativity simulation and, using the extracted data, to provide outer boundary conditions. The method treats dynamical gravitational variables as nonspherical perturbations of Schwarzschild geometry. We discuss a code which implements this method and present results of tests which have been performed with a three-dimensional numerical relativity code.

Submitted 30 September, 1997; originally announced September 1997.

Journal ref: Phys.Rev.Lett.80:1812-1815,1998

Showing 1–23 of 23 results for author: Klasky, S