Search | arXiv e-print repository

Showing 1–18 of 18 results for author: Hayot-Sasson, V

Searching in archive cs.

  1. arXiv:2407.11432  [pdf, other]

    cs.DC

    Octopus: Experiences with a Hybrid Event-Driven Architecture for Distributed Scientific Computing

    Authors: Haochen Pan, Ryan Chard, Sicheng Zhou, Alok Kamatar, Rafael Vescovi, Valerie Hayot-Sasson, André Bauer, Maxime Gonthier, Kyle Chard, Ian Foster

    Abstract: Scientific research increasingly relies on distributed computational resources, storage systems, networks, and instruments, ranging from HPC and cloud systems to edge devices. Event-driven architecture (EDA) benefits applications targeting distributed research infrastructures by enabling the organization, communication, processing, reliability, and security of events generated from many sources. T…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 12 pages, 8 figures

  2. arXiv:2407.01764  [pdf, other]

    cs.DC

    Object Proxy Patterns for Accelerating Distributed Applications

    Authors: J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Alexander Brace, André Bauer, Kyle Chard, Ian Foster

    Abstract: Workflow and serverless frameworks have empowered new approaches to distributed application design by abstracting compute resources. However, their typically limited or one-size-fits-all support for advanced data flow patterns leaves optimization to the application programmer -- optimization that becomes more difficult as data become larger. The transparent object proxy, which provides wide-area r…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2406.17710  [pdf, other]

    cs.DC

    GreenFaaS: Maximizing Energy Efficiency of HPC Workloads with FaaS

    Authors: Alok Kamatar, Valerie Hayot-Sasson, Yadu Babuji, Andre Bauer, Gourav Rattihalli, Ninad Hogade, Dejan Milojicic, Kyle Chard, Ian Foster

    Abstract: Application energy efficiency can be improved by executing each application component on the compute element that consumes the least energy while also satisfying time constraints. In principle, the function as a service (FaaS) paradigm should simplify such optimizations by abstracting away compute location, but existing FaaS systems do not provide for user transparency over application energy cons…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages, 10 figures

  4. Performance comparison of Dask and Apache Spark on HPC systems for Neuroimaging

    Authors: Mathieu Dugré, Valérie Hayot-Sasson, Tristan Glatard

    Abstract: The general increase in data size and data sharing motivates the adoption of Big Data strategies in several scientific disciplines. However, while several options are available, no particular guidelines exist for selecting a Big Data engine. In this paper, we compare the runtime performance of two popular Big Data engines with Python APIs, Apache Spark, and Dask, in processing neuroimaging pipelin…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 16 pages, 10 figures, 2 tables

    Journal ref: Concurrency and Computation: Practice and Experience (2023) 35(21):e7635

  5. arXiv:2404.11556  [pdf, other]

    cs.DC

    Hierarchical storage management in user space for neuroimaging applications

    Authors: Valérie Hayot-Sasson, Tristan Glatard

    Abstract: Neuroimaging open-data initiatives have led to increased availability of large scientific datasets. While these datasets are shifting the processing bottleneck from compute-intensive to data-intensive, current standardized analysis tools have yet to adopt strategies that mitigate the costs associated with large data transfers. A major challenge in adapting neuroimaging applications for data-intens…

    Submitted 17 April, 2024; originally announced April 2024.

  6. arXiv:2403.06077  [pdf, other]

    cs.DC

    Steering a Fleet: Adaptation for Large-Scale, Workflow-Based Experiments

    Authors: Jim Pruyne, Valerie Hayot-Sasson, Weijian Zheng, Ryan Chard, Justin M. Wozniak, Tekin Bicer, Kyle Chard, Ian T. Foster

    Abstract: Experimental science is increasingly driven by instruments that produce vast volumes of data and thus a need to manage, compute, describe, and index this data. High performance and distributed computing provide the means of addressing the computing needs; however, in practice, the variety of actions required and the distributed set of resources involved, requires sophisticated "flows" defining the…

    Submitted 9 March, 2024; originally announced March 2024.

  7. arXiv:2305.09593  [pdf, other]

    cs.DC

    Accelerating Communications in Federated Applications with Transparent Object Proxies

    Authors: J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Nathaniel Hudson, Charlie Sabino, Matt Baughman, Kyle Chard, Ian Foster

    Abstract: Advances in networks, accelerators, and cloud services encourage programmers to reconsider where to compute -- such as when fast networks make it cost-effective to compute on remote accelerators despite added latency. Workflow and cloud-hosted serverless computing frameworks can manage multi-step computations spanning federated collections of cloud, high-performance computing (HPC), and edge syste…

    Submitted 29 August, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: Accepted for publication at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC23)

  8. arXiv:2304.03210  [pdf, other]

    q-bio.MN cs.DC

    Causal Discovery and Optimal Experimental Design for Genome-Scale Biological Network Recovery

    Authors: Ashka Shah, Arvind Ramanathan, Valerie Hayot-Sasson, Rick Stevens

    Abstract: Causal discovery of genome-scale networks is important for identifying pathways from genes to observable traits - e.g. differences in cell function, disease, drug resistance and others. Causal learners based on graphical models rely on interventional samples to orient edges in the network. However, these models have not been shown to scale up the size of the genome, which are on the order of 1e3-1…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: To be published in Platform for Advanced Scientific Computing 2023 (PASC23) conference proceedings

  9. Workflows Community Summit 2022: A Roadmap Revolution

    Authors: Rafael Ferreira da Silva, Rosa M. Badia, Venkat Bala, Debbie Bard, Peer-Timo Bremer, Ian Buckley, Silvina Caino-Lores, Kyle Chard, Carole Goble, Shantenu Jha, Daniel S. Katz, Daniel Laney, Manish Parashar, Frederic Suter, Nick Tyler, Thomas Uram, Ilkay Altintas, Stefan Andersson, William Arndt, Juan Aznar, Jonathan Bader, Bartosz Balis, Chris Blanton, Kelly Rosa Braghetto, Aharon Brodutch, et al. (80 additional authors not shown)

    Abstract: Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and t…

    Submitted 31 March, 2023; originally announced April 2023.

    Report number: ORNL/TM-2023/2885

  10. Cloud Services Enable Efficient AI-Guided Simulation Workflows across Heterogeneous Resources

    Authors: Logan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson, Ryan Chard, Yadu Babuji, Ganesh Sivaraman, Sutanay Choudhury, Kyle Chard, Rajeev Thakur, Ian Foster

    Abstract: Applications that fuse machine learning and simulation can benefit from the use of multiple computing resources, with, for example, simulation codes running on highly parallel supercomputers and AI training and inference tasks on specialized accelerators. Here, we present our experiences deploying two AI-guided simulation workflows across such heterogeneous systems. A unique aspect of our approach…

    Submitted 15 March, 2023; originally announced March 2023.

  11. arXiv:2207.01737  [pdf, other]

    cs.DC

    Sea: A lightweight data-placement library for Big Data scientific computing

    Authors: Valérie Hayot-Sasson, Mathieu Dugré, Tristan Glatard

    Abstract: The recent influx of open scientific data has contributed to the transitioning of scientific computing from compute intensive to data intensive. Whereas many Big Data frameworks exist that minimize the cost of data transfers, few scientific applications integrate these frameworks or adopt data-placement strategies to mitigate the costs. Scientific applications commonly rely on well-established com…

    Submitted 4 July, 2022; originally announced July 2022.

  12. arXiv:2108.10496  [pdf, other]

    cs.DC

    The benefits of prefetching for large-scale cloud-based neuroimaging analysis workflows

    Authors: Valerie Hayot-Sasson, Tristan Glatard, Ariel Rokem

    Abstract: To support the growing demands of neuroscience applications, researchers are transitioning to cloud computing for its scalable, robust and elastic infrastructure. Nevertheless, large datasets residing in object stores may result in significant data transfer overheads during workflow execution. Prefetching, a method to mitigate the cost of reading in mixed workloads, masks data transfer costs withi…

    Submitted 23 August, 2021; originally announced August 2021.

  13. arXiv:2101.01335  [pdf, other]

    cs.DC cs.PF

    Modeling the Linux page cache for accurate simulation of data-intensive applications

    Authors: Hoang-Dung Do, Valerie Hayot-Sasson, Rafael Ferreira da Silva, Christopher Steele, Henri Casanova, Tristan Glatard

    Abstract: The emergence of Big Data in recent years has resulted in a growing need for efficient data processing solutions. While infrastructures with sufficient compute power are available, the I/O bottleneck remains. The Linux page cache is an efficient approach to reduce I/O overheads, but few experimental studies of its interactions with Big Data applications exist, partly due to limitations of real-wor…

    Submitted 4 January, 2021; originally announced January 2021.

    Comments: 10 pages, 8 figures, CCGrid

  14. arXiv:1912.11794  [pdf, other]

    cs.PF

    Performance benefits of Intel(R) Optane(TM) DC persistent memory for the parallel processing of large neuroimaging data

    Authors: Valerie Hayot-Sasson, Shawn T Brown, Tristan Glatard

    Abstract: Open-access neuroimaging datasets have reached petabyte scale, and continue to grow. The ability to leverage the entirety of these datasets is limited to a restricted number of labs with both the capacity and infrastructure to process the data. Whereas Big Data engines have significantly reduced application performance penalties with respect to data movement, their applied strategies (e.g. data lo…

    Submitted 26 December, 2019; originally announced December 2019.

  15. arXiv:1907.13030  [pdf, other]

    cs.DC cs.PF

    A performance comparison of Dask and Apache Spark for data-intensive neuroimaging pipelines

    Authors: Mathieu Dugré, Valérie Hayot-Sasson, Tristan Glatard

    Abstract: In the past few years, neuroimaging has entered the Big Data era due to the joint increase in image resolution, data sharing, and study sizes. However, no particular Big Data engines have emerged in this field, and several alternatives remain available. We compare two popular Big Data engines with Python APIs, Apache Spark and Dask, for their runtime performance in processing neuroimaging pipeline…

    Submitted 5 October, 2019; v1 submitted 30 July, 2019; originally announced July 2019.

    Comments: 10 pages, 15 figures, 1 table. To appear in the proceedings of the 14th WORKS Workshop on Topics in Workflows in Support of Large-Scale Science, 17 November 2019, Denver, CO, USA

  16. arXiv:1905.12720  [pdf, other]

    cs.DC cs.PF

    Evaluation of pilot jobs for Apache Spark applications on HPC clusters

    Authors: Valerie Hayot-Sasson, Tristan Glatard

    Abstract: Big Data has become prominent throughout many scientific fields and, as a result, scientific communities have sought out Big Data frameworks to accelerate the processing of their increasingly data-intensive pipelines. However, while scientific communities typically rely on High-Performance Computing (HPC) clusters for the parallelization of their pipelines, many popular Big Data frameworks such as…

    Submitted 29 May, 2019; originally announced May 2019.

  17. arXiv:1812.06492  [pdf, other]

    cs.DC

    Performance Evaluation of Big Data Processing Strategies for Neuroimaging

    Authors: Valérie Hayot-Sasson, Shawn T Brown, Tristan Glatard

    Abstract: Neuroimaging datasets are rapidly growing in size as a result of advancements in image acquisition methods, open-science and data sharing. However, the adoption of Big Data processing strategies by neuroimaging processing engines remains limited. Here, we evaluate three Big Data processing strategies (in-memory computing, data locality and lazy evaluation) on typical neuroimaging use cases, repres…

    Submitted 2 April, 2019; v1 submitted 16 December, 2018; originally announced December 2018.

  18. arXiv:1711.09713  [pdf, other]

    cs.SE cs.DC

    Boutiques: a flexible framework for automated application integration in computing platforms

    Authors: Tristan Glatard, Gregory Kiar, Tristan Aumentado-Armstrong, Natacha Beck, Pierre Bellec, Rémi Bernard, Axel Bonnet, Sorina Camarasu-Pop, Frédéric Cervenansky, Samir Das, Rafael Ferreira da Silva, Guillaume Flandin, Pascal Girard, Krzysztof J. Gorgolewski, Charles R. G. Guttmann, Valérie Hayot-Sasson, Pierre-Olivier Quirion, Pierre Rioux, Marc-Etienne Rousseau, Alan C. Evans

    Abstract: We present Boutiques, a system to automatically publish, integrate and execute applications across computational platforms. Boutiques applications are installed through software containers described in a rich and flexible JSON language. A set of core tools facilitate the construction, validation, import, execution, and publishing of applications. Boutiques is currently supported by several distinc…

    Submitted 7 November, 2017; originally announced November 2017.

    Comments: 10 pages