Search | arXiv e-print repository

Ookami: An A64FX Computing Resource

Authors: A. C. Calder, E. Siegmann, C. Feldman, S. Chheda, D. C. Smolarski, F. D. Swesty, A. Curtis, J. Dey, D. Carlson, B. Michalowicz, R. J. Harrison

Abstract: We present a look at Ookami, a project providing community access to a testbed supercomputer with the ARM-based A64FX processors developed by a collaboration between RIKEN and Fujitsu and deployed in the Japanese supercomputer Fugaku. We describe the project, provide details about the user base and education/training program, and present highlights from performance studies of two astrophysical sim… ▽ More We present a look at Ookami, a project providing community access to a testbed supercomputer with the ARM-based A64FX processors developed by a collaboration between RIKEN and Fujitsu and deployed in the Japanese supercomputer Fugaku. We describe the project, provide details about the user base and education/training program, and present highlights from performance studies of two astrophysical simulation codes. △ Less

Submitted 7 November, 2023; originally announced November 2023.

Comments: 9 pages, 3 figures, submitted to the Proceedings of 15th International Conference on Numerical Modeling of Space Plasma Flows

arXiv:2309.04652 [pdf, other]

doi 10.1145/3569951.3597583

A Further Study of Linux Kernel Hugepages on A64FX with FLASH, an Astrophysical Simulation Code

Authors: Catherine Feldman, Smeet Chheda, Alan C. Calder, Eva Siegmann, John Dey, Tony Curtis, Robert J. Harrison

Abstract: We present an expanded study of the performance of FLASH when using Linux Kernel Hugepages on Ookami, an HPE Apollo 80 A64FX platform. FLASH is a multi-scale, multi-physics simulation code written principally in modern Fortran and makes use of the PARAMESH library to manage a block-structured adaptive mesh. Our initial study used only the Fujitsu compiler to utilize standard hugepages (hp), but fu… ▽ More We present an expanded study of the performance of FLASH when using Linux Kernel Hugepages on Ookami, an HPE Apollo 80 A64FX platform. FLASH is a multi-scale, multi-physics simulation code written principally in modern Fortran and makes use of the PARAMESH library to manage a block-structured adaptive mesh. Our initial study used only the Fujitsu compiler to utilize standard hugepages (hp), but further investigation allowed us to utilize hp for multiple compilers by linking to the Fujitsu library libmpg and transparent hugepages (thp) by enabling it at the node level. By comparing the results of hardware counters and in-code timers, we found that hp and thp do not significantly impact the runtime performance of FLASH. Interestingly, there is a significant reduction in the TLB misses, differences in cache and memory access counters, and strange behavior is observed when using thp. △ Less

Submitted 8 September, 2023; originally announced September 2023.

Comments: 10 pages, 2 figures, 7 tables. Proceedings for Practice and Experience in Advanced Research Computing (PEARC '23), July 23--27, 2023, Portland, OR, USA

ACM Class: C.1.4; I.6.0; J.2

Journal ref: Practice and Experience in Advanced Research Computing (PEARC '23). Association for Computing Machinery, New York, NY, USA, 186-195. (July 2023)

arXiv:2207.13685 [pdf, ps, other]

On Using Linux Kernel Huge Pages with FLASH, an Astrophysical Simulation Code

Authors: Alan C. Calder, Catherine Feldman, Eva Siegmann, John Dey, Anthony Curtis, Smeet Chheda, Robert J. Harrison

Abstract: We present efforts at improving the performance of FLASH, a multi-scale, multi-physics simulation code principally for astrophysical applications, by using huge pages on Ookami, an HPE Apollo 80 A64FX platform. FLASH is written principally in modern Fortran and makes use of the PARAMESH library to manage a block-structured adaptive mesh. We explored options for enabling the use of huge pages with… ▽ More We present efforts at improving the performance of FLASH, a multi-scale, multi-physics simulation code principally for astrophysical applications, by using huge pages on Ookami, an HPE Apollo 80 A64FX platform. FLASH is written principally in modern Fortran and makes use of the PARAMESH library to manage a block-structured adaptive mesh. We explored options for enabling the use of huge pages with several compilers, but we were only able to successfully use huge pages when compiling with the Fujitsu compiler. The use of huge pages substantially reduced the number of translation lookaside buffer misses, but overall performance gains were marginal. △ Less

Submitted 27 July, 2022; originally announced July 2022.

Comments: 6 pages, 1 figure, accepted to Embracing Arm for HPC, An IEEE Cluster 2022 Workshop

arXiv:2205.08067 [pdf]

Robust Perception Architecture Design for Automotive Cyber-Physical Systems

Authors: Joydeep Dey, Sudeep Pasricha

Abstract: In emerging automotive cyber-physical systems (CPS), accurate environmental perception is critical to achieving safety and performance goals. Enabling robust perception for vehicles requires solving multiple complex problems related to sensor selection/ placement, object detection, and sensor fusion. Current methods address these problems in isolation, which leads to inefficient solutions. We pres… ▽ More In emerging automotive cyber-physical systems (CPS), accurate environmental perception is critical to achieving safety and performance goals. Enabling robust perception for vehicles requires solving multiple complex problems related to sensor selection/ placement, object detection, and sensor fusion. Current methods address these problems in isolation, which leads to inefficient solutions. We present PASTA, a novel framework for global co-optimization of deep learning and sensing for dependable vehicle perception. Experimental results with the Audi-TT and BMW-Minicooper vehicles show how PASTA can find robust, vehicle-specific perception architecture solutions. △ Less

Submitted 16 May, 2022; originally announced May 2022.

arXiv:2201.13001 [pdf, other]

Deep Discriminative to Kernel Density Graph for In- and Out-of-distribution Calibrated Inference

Authors: Jayanta Dey, Haoyin Xu, Will LeVine, Ashwin De Silva, Tyler M. Tomita, Ali Geisa, Tiffany Chu, Jacob Desman, Joshua T. Vogelstein

Abstract: Deep discriminative approaches like random forests and deep neural networks have recently found applications in many important real-world scenarios. However, deploying these learning algorithms in safety-critical applications raises concerns, particularly when it comes to ensuring confidence calibration for both in-distribution and out-of-distribution data points. Many popular methods for in-distr… ▽ More Deep discriminative approaches like random forests and deep neural networks have recently found applications in many important real-world scenarios. However, deploying these learning algorithms in safety-critical applications raises concerns, particularly when it comes to ensuring confidence calibration for both in-distribution and out-of-distribution data points. Many popular methods for in-distribution (ID) calibration, such as isotonic and Platt's sigmoidal regression, exhibit excellent ID calibration performance. However, these methods are not calibrated for the entire feature space, leading to overconfidence in the case of out-of-distribution (OOD) samples. On the other end of the spectrum, existing out-of-distribution (OOD) calibration methods generally exhibit poor in-distribution (ID) calibration. In this paper, we address ID and OOD calibration problems jointly. We leveraged the fact that deep models, including both random forests and deep-nets, learn internal representations which are unions of polytopes with affine activation functions to conceptualize them both as partitioning rules of the feature space. We replace the affine function in each polytope populated by the training data with a Gaussian kernel. Our experiments on both tabular and vision benchmarks show that the proposed approaches obtain well-calibrated posteriors while mostly preserving or improving the classification accuracy of the original algorithm for ID region, and extrapolate beyond the training data to handle OOD inputs appropriately. △ Less

Submitted 7 June, 2024; v1 submitted 31 January, 2022; originally announced January 2022.

arXiv:2201.07372 [pdf, other]

Prospective Learning: Principled Extrapolation to the Future

Authors: Ashwin De Silva, Rahul Ramesh, Lyle Ungar, Marshall Hussain Shuler, Noah J. Cowan, Michael Platt, Chen Li, Leyla Isik, Seung-Eon Roh, Adam Charles, Archana Venkataraman, Brian Caffo, Javier J. How, Justus M Kebschull, John W. Krakauer, Maxim Bichuch, Kaleab Alemayehu Kinfu, Eva Yezerets, Dinesh Jayaraman, Jong M. Shin, Soledad Villar, Ian Phillips, Carey E. Priebe, Thomas Hartung, Michael I. Miller , et al. (18 additional authors not shown)

Abstract: Learning is a process which can update decision rules, based on past experience, such that future performance improves. Traditionally, machine learning is often evaluated under the assumption that the future will be identical to the past in distribution or change adversarially. But these assumptions can be either too optimistic or pessimistic for many problems in the real world. Real world scenari… ▽ More Learning is a process which can update decision rules, based on past experience, such that future performance improves. Traditionally, machine learning is often evaluated under the assumption that the future will be identical to the past in distribution or change adversarially. But these assumptions can be either too optimistic or pessimistic for many problems in the real world. Real world scenarios evolve over multiple spatiotemporal scales with partially predictable dynamics. Here we reformulate the learning problem to one that centers around this idea of dynamic futures that are partially learnable. We conjecture that certain sequences of tasks are not retrospectively learnable (in which the data distribution is fixed), but are prospectively learnable (in which distributions may be dynamic), suggesting that prospective learning is more difficult in kind than retrospective learning. We argue that prospective learning more accurately characterizes many real world problems that (1) currently stymie existing artificial intelligence solutions and/or (2) lack adequate explanations for how natural intelligences solve them. Thus, studying prospective learning will lead to deeper insights and solutions to currently vexing challenges in both natural and artificial intelligences. △ Less

Submitted 13 July, 2023; v1 submitted 18 January, 2022; originally announced January 2022.

Comments: Accepted at the 2nd Conference on Lifelong Learning Agents (CoLLAs), 2023

arXiv:2110.08483 [pdf, other]

Simplest Streaming Trees

Authors: Haoyin Xu, Jayanta Dey, Sambit Panda, Joshua T. Vogelstein

Abstract: Decision forests, including random forests and gradient boosting trees, remain the leading machine learning methods for many real-world data problems, especially on tabular data. However, most of the current implementations only operate in batch mode, and therefore cannot incrementally update when more data arrive. Several previous works developed streaming trees and ensembles to overcome this lim… ▽ More Decision forests, including random forests and gradient boosting trees, remain the leading machine learning methods for many real-world data problems, especially on tabular data. However, most of the current implementations only operate in batch mode, and therefore cannot incrementally update when more data arrive. Several previous works developed streaming trees and ensembles to overcome this limitation. Nonetheless, we found that those state-of-the-art algorithms suffer from a number of drawbacks, including low accuracy on some problems and high memory usage on others. We therefore developed the simplest possible extension of decision trees: given new data, simply update existing trees by continuing to grow them, and replace some old trees with new ones to control the total number of trees. In a benchmark suite containing 72 classification problems (the OpenML-CC18 data suite), we illustrate that our approach, Stream Decision Forest (SDF), does not suffer from either of the aforementioned limitations. On those datasets, we also demonstrate that our approach often performs as well, and sometimes even better, than conventional batch decision forest algorithm. Thus, SDFs establish a simple standard for streaming trees and forests that could readily be applied to many real-world problems. △ Less

Submitted 24 October, 2023; v1 submitted 16 October, 2021; originally announced October 2021.

arXiv:2109.14501 [pdf, other]

Towards a theory of out-of-distribution learning

Authors: Jayanta Dey, Ali Geisa, Ronak Mehta, Tyler M. Tomita, Hayden S. Helm, Haoyin Xu, Eric Eaton, Jeffery Dick, Carey E. Priebe, Joshua T. Vogelstein

Abstract: Learning is a process wherein a learning agent enhances its performance through exposure of experience or data. Throughout this journey, the agent may encounter diverse learning environments. For example, data may be presented to the leaner all at once, in multiple batches, or sequentially. Furthermore, the distribution of each data sample could be either identical and independent (iid) or non-iid… ▽ More Learning is a process wherein a learning agent enhances its performance through exposure of experience or data. Throughout this journey, the agent may encounter diverse learning environments. For example, data may be presented to the leaner all at once, in multiple batches, or sequentially. Furthermore, the distribution of each data sample could be either identical and independent (iid) or non-iid. Additionally, there may exist computational and space constraints for the deployment of the learning algorithms. The complexity of a learning task can vary significantly, depending on the learning setup and the constraints imposed upon it. However, it is worth noting that the current literature lacks formal definitions for many of the in-distribution and out-of-distribution learning paradigms. Establishing proper and universally agreed-upon definitions for these learning setups is essential for thoroughly exploring the evolution of ideas across different learning scenarios and deriving generalized mathematical bounds for these learners. In this paper, we aim to address this issue by proposing a chronological approach to defining different learning tasks using the provably approximately correct (PAC) learning framework. We will start with in-distribution learning and progress to recently proposed lifelong or continual learning. We employ consistent terminology and notation to demonstrate how each of these learning frameworks represents a specific instance of a broader, more generalized concept of learnability. Our hope is that this work will inspire a universally agreed-upon approach to quantifying different types of learning, fostering greater understanding and progress in the field. △ Less

Submitted 7 June, 2024; v1 submitted 29 September, 2021; originally announced September 2021.

arXiv:2108.13637 [pdf, other]

When are Deep Networks really better than Decision Forests at small sample sizes, and how?

Authors: Haoyin Xu, Kaleab A. Kinfu, Will LeVine, Sambit Panda, Jayanta Dey, Michael Ainsworth, Yu-Chung Peng, Madi Kusmanov, Florian Engert, Christopher M. White, Joshua T. Vogelstein, Carey E. Priebe

Abstract: Deep networks and decision forests (such as random forests and gradient boosted trees) are the leading machine learning methods for structured and tabular data, respectively. Many papers have empirically compared large numbers of classifiers on one or two different domains (e.g., on 100 different tabular data settings). However, a careful conceptual and empirical comparison of these two strategies… ▽ More Deep networks and decision forests (such as random forests and gradient boosted trees) are the leading machine learning methods for structured and tabular data, respectively. Many papers have empirically compared large numbers of classifiers on one or two different domains (e.g., on 100 different tabular data settings). However, a careful conceptual and empirical comparison of these two strategies using the most contemporary best practices has yet to be performed. Conceptually, we illustrate that both can be profitably viewed as "partition and vote" schemes. Specifically, the representation space that they both learn is a partitioning of feature space into a union of convex polytopes. For inference, each decides on the basis of votes from the activated nodes. This formulation allows for a unified basic understanding of the relationship between these methods. Empirically, we compare these two strategies on hundreds of tabular data settings, as well as several vision and auditory settings. Our focus is on datasets with at most 10,000 samples, which represent a large fraction of scientific and biomedical datasets. In general, we found forests to excel at tabular and structured data (vision and audition) with small sample sizes, whereas deep nets performed better on structured data with larger sample sizes. This suggests that further gains in both scenarios may be realized via further combining aspects of forests and networks. We will continue revising this technical report in the coming months with updated results. △ Less

Submitted 2 November, 2021; v1 submitted 31 August, 2021; originally announced August 2021.

arXiv:2004.12908 [pdf, other]

A Simple Lifelong Learning Approach

Authors: Joshua T. Vogelstein, Jayanta Dey, Hayden S. Helm, Will LeVine, Ronak D. Mehta, Tyler M. Tomita, Haoyin Xu, Ali Geisa, Qingyang Wang, Gido M. van de Ven, Chenyu Gao, Weiwei Yang, Bryan Tower, Jonathan Larson, Christopher M. White, Carey E. Priebe

Abstract: In lifelong learning, data are used to improve performance not only on the present task, but also on past and future (unencountered) tasks. While typical transfer learning algorithms can improve performance on future tasks, their performance on prior tasks degrades upon learning new tasks (called forgetting). Many recent approaches for continual or lifelong learning have attempted to maintain perf… ▽ More In lifelong learning, data are used to improve performance not only on the present task, but also on past and future (unencountered) tasks. While typical transfer learning algorithms can improve performance on future tasks, their performance on prior tasks degrades upon learning new tasks (called forgetting). Many recent approaches for continual or lifelong learning have attempted to maintain performance on old tasks given new tasks. But striving to avoid forgetting sets the goal unnecessarily low. The goal of lifelong learning should be to use data to improve performance on both future tasks (forward transfer) and past tasks (backward transfer). In this paper, we show that a simple approach -- representation ensembling -- demonstrates both forward and backward transfer in a variety of simulated and benchmark data scenarios, including tabular, vision (CIFAR-100, 5-dataset, Split Mini-Imagenet, and Food1k), and speech (spoken digit), in contrast to various reference algorithms, which typically failed to transfer either forward or backward, or both. Moreover, our proposed approach can flexibly operate with or without a computational budget. △ Less

Submitted 11 June, 2024; v1 submitted 27 April, 2020; originally announced April 2020.

arXiv:1912.00519 [pdf, other]

Location Forensics of Media Recordings Utilizing Cascaded SVM and Pole-matching Classifiers

Authors: Jayanta Dey, Mohammad Ariful Haque

Abstract: Information regarding the location of power distribution grid can be extracted from the power signature embedded in the multimedia signals (e.g., audio, video data) recorded near electrical activities. This implicit mechanism of identifying the origin-of-recording can be a very promising tool for multimedia forensics and security applications. In this work, we have developed a novel grid-of-origin… ▽ More Information regarding the location of power distribution grid can be extracted from the power signature embedded in the multimedia signals (e.g., audio, video data) recorded near electrical activities. This implicit mechanism of identifying the origin-of-recording can be a very promising tool for multimedia forensics and security applications. In this work, we have developed a novel grid-of-origin identification system from media recording that consists of a number of support vector machine (SVM) followed by pole-matching (PM) classifiers. First, we determine the nominal frequency of the grid (50 or 60 Hzへるつ) based on the spectral observation. Then an SVM classifier, trained for the detection of a grid with a particular nominal frequency, narrows down the list of possible grids on the basis of different discriminating features extracted from the electric network frequency (ENF) signal. The decision of the SVM classifier is then passed to the PM classifier that detects the final grid based on the minimum distance between the estimated poles of test and training grids. Thus, we start from the problem of classifying grids with different nominal frequencies and simplify the problem of classification in three stages based on nominal frequency, SVM and finally using PM classifier. This cascaded system of classification ensures better accuracy (15.57% higher) compared to traditional ENF-based SVM classifiers described in the literature. △ Less

Submitted 1 December, 2019; originally announced December 2019.

arXiv:1902.01544 [pdf, ps, other]

An Ensemble SVM-based Approach for Voice Activity Detection

Authors: Jayanta Dey, Md Sanzid Bin Hossain, Mohammad Ariful Haque

Abstract: Voice activity detection (VAD), used as the front end of speech enhancement, speech and speaker recognition algorithms, determines the overall accuracy and efficiency of the algorithms. Therefore, a VAD with low complexity and high accuracy is highly desirable for speech processing applications. In this paper, we propose a novel training method on large dataset for supervised learning-based VAD sy… ▽ More Voice activity detection (VAD), used as the front end of speech enhancement, speech and speaker recognition algorithms, determines the overall accuracy and efficiency of the algorithms. Therefore, a VAD with low complexity and high accuracy is highly desirable for speech processing applications. In this paper, we propose a novel training method on large dataset for supervised learning-based VAD system using support vector machine (SVM). Despite of high classification accuracy of support vector machines (SVM), trivial SVM is not suitable for classification of large data sets needed for a good VAD system because of high training complexity. To overcome this problem, a novel ensemble-based approach using SVM has been proposed in this paper.The performance of the proposed ensemble structure has been compared with a feedforward neural network (NN). Although NN performs better than single SVM-based VAD trained on a small portion of the training data, ensemble SVM gives accuracy comparable to neural network-based VAD. Ensemble SVM and NN give 88.74% and 86.28% accuracy respectively whereas the stand-alone SVM shows 57.05% accuracy on average on the test dataset. △ Less

Submitted 4 February, 2019; originally announced February 2019.

Showing 1–12 of 12 results for author: Dey, J