-
Training and Onboarding initiatives in High Energy Physics experiments
Authors:
S. Hageboeck,
A. Reinsvold Hall,
N. Skidmore,
G. A. Stewart,
G. Benelli,
B. Carlson,
C. David,
J. Davies,
W. Deconinck,
D. DeMuth, Jr.,
P. Elmer,
R. B. Garg,
K. Lieret,
V. Lukashenko,
S. Malik,
A. Morris,
H. Schellman,
J. Veatch,
M. Hernandez Villanueva
Abstract:
In this paper we document the current analysis software training and onboarding activities in several High Energy Physics (HEP) experiments: ATLAS, CMS, LHCb, Belle II and DUNE. Fast and efficient onboarding of new collaboration members is increasingly important for HEP experiments as analyses and the related software become ever more complex with growing datasets. A meeting series was held by the HEP Software Foundation (HSF) in 2022 for experiments to showcase their initiatives. Here we document and analyse these in an attempt to determine a set of key considerations for future experiments.
Submitted 23 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Speeding up Madgraph5 aMC@NLO through CPU vectorization and GPU offloading: towards a first alpha release
Authors:
Andrea Valassi,
Taylor Childers,
Laurence Field,
Stephan Hageböck,
Walter Hopkins,
Olivier Mattelaer,
Nathan Nichols,
Stefan Roiser,
David Smith,
Jorgen Teig,
Carl Vuosalo,
Zenny Wettersten
Abstract:
The matrix element (ME) calculation in any Monte Carlo physics event generator is an ideal fit for implementing data parallelism with lockstep processing on GPUs and vector CPUs. For complex physics processes where the ME calculation is the computational bottleneck of event generation workflows, this can lead to large overall speedups by efficiently exploiting these hardware architectures, which are now largely underutilized in HEP. In this paper, we present the status of our work on the reengineering of the Madgraph5_aMC@NLO event generator at the time of the ACAT2022 conference. The progress achieved since our previous publication in the ICHEP2022 proceedings is discussed, for our implementations of the ME calculations in vectorized C++, in CUDA and in the SYCL framework, as well as in their integration into the existing MadEvent framework. The outlook towards a first alpha release of the software supporting QCD LO processes usable by the LHC experiments is also discussed.
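As a minimal Python illustration of lockstep data parallelism (not the project's C++, CUDA or SYCL code), the sketch below stores events in a structure-of-arrays layout and applies one branch-free kernel to every event. The kernel `toy_me2_kernel` is hypothetical: it uses the textbook spin-averaged leading-order QED result $\langle|M|^2\rangle = e^4(1+\cos^2\theta)$ for massless $e^+e^-\to\mu^+\mu^-$ as a stand-in for a generated matrix element.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EventBatch:
    """Structure-of-arrays layout: one contiguous list per kinematic variable."""
    cos_theta: list[float]


def toy_me2_kernel(cos_theta: float, e2: float) -> float:
    """Hypothetical stand-in for a generated ME: <|M|^2> = e^4 (1 + cos^2 theta)."""
    return e2 * e2 * (1.0 + cos_theta * cos_theta)


def compute_batch(batch: EventBatch, e2: float) -> list[float]:
    # Every event runs exactly the same instruction sequence with no
    # event-dependent branching, which is what allows GPU threads or SIMD
    # lanes to process many events in lockstep.
    return [toy_me2_kernel(c, e2) for c in batch.cos_theta]
```

For example, `compute_batch(EventBatch([0.0, 1.0, -1.0]), e2=1.0)` returns `[1.0, 2.0, 2.0]`. In a compiled implementation the list would be a contiguous array, so that neighbouring events map onto neighbouring SIMD lanes or GPU threads.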
Submitted 9 December, 2023; v1 submitted 31 March, 2023;
originally announced March 2023.
-
Challenges and opportunities integrating LLAMA into AdePT
Authors:
Bernhard Manfred Gruber,
Guilherme Amadio,
Stephan Hageböck
Abstract:
Particle transport simulations are a cornerstone of high-energy physics (HEP), constituting a substantial part of the computing workload performed in HEP. To boost the simulation throughput and energy efficiency, GPU accelerators have been explored in recent years, further driven by the increasing use of GPUs in HPC systems. The Accelerated demonstrator of electromagnetic Particle Transport (AdePT) is an advanced prototype for offloading the simulation of electromagnetic showers in Geant4 to GPUs, and is still under continuous development and optimization. Improving memory layout and data access is vital for using modern, massively parallel GPU hardware efficiently, and is part of the challenge of migrating traditional CPU-based data structures to GPUs in AdePT. The low-level abstraction of memory access (LLAMA) is a C++ library that provides a zero-runtime-overhead data structure abstraction layer, focusing on multidimensional arrays of nested, structured data. It provides a framework for defining and switching custom memory mappings at compile time to define data layouts and instrument data access, making LLAMA an ideal tool to tackle the memory-related optimization challenges in AdePT. Our contribution shares insights gained with LLAMA when instrumenting data access inside AdePT, complementing traditional GPU profiler outputs. We demonstrate traces of read/write counts to data structure elements as well as memory heatmaps. The acquired knowledge allowed for subsequent data layout optimizations.
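LLAMA does this at compile time in C++ through template mappings. The Python below is only a conceptual sketch of the underlying idea: access code is written once against a view, while the mapping (array of structs versus struct of arrays) and the access instrumentation can be swapped without touching that code. All class, function and field names are hypothetical and do not reproduce LLAMA's API.

```python
from __future__ import annotations

from collections import Counter

FIELDS = ("x", "y", "z", "energy")  # hypothetical track record fields


class AoS:
    """Array of structs: the fields of record i are adjacent in memory."""

    def __init__(self, n: int, fields: tuple[str, ...]):
        self.n, self.fields = n, fields

    def offset(self, i: int, field: str) -> int:
        return i * len(self.fields) + self.fields.index(field)


class SoA:
    """Struct of arrays: each field is contiguous across all records."""

    def __init__(self, n: int, fields: tuple[str, ...]):
        self.n, self.fields = n, fields

    def offset(self, i: int, field: str) -> int:
        return self.fields.index(field) * self.n + i


class CountingView:
    """Accesses records through any mapping and counts reads/writes per field."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.blob = [0.0] * (mapping.n * len(mapping.fields))  # flat "memory"
        self.reads = Counter()
        self.writes = Counter()

    def get(self, i: int, field: str) -> float:
        self.reads[field] += 1
        return self.blob[self.mapping.offset(i, field)]

    def set(self, i: int, field: str, value: float) -> None:
        self.writes[field] += 1
        self.blob[self.mapping.offset(i, field)] = value


def push_tracks(view, n: int) -> None:
    """Toy kernel: advance x and attenuate energy; y and z are never touched."""
    for i in range(n):
        view.set(i, "x", view.get(i, "x") + 1.0)
        view.set(i, "energy", view.get(i, "energy") * 0.5)
```

Running `push_tracks` with either mapping gives identical results and identical counts, while the memory offsets differ. Zero counts for `y` and `z` flag fields that a layout optimisation could move into a separate array, which is the kind of insight the access traces above provide.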
Submitted 16 February, 2023;
originally announced February 2023.
-
Second Analysis Ecosystem Workshop Report
Authors:
Mohamed Aly,
Jackson Burzynski,
Bryan Cardwell,
Daniel C. Craik,
Tal van Daalen,
Tomas Dado,
Ayanabha Das,
Antonio Delgado Peris,
Caterina Doglioni,
Peter Elmer,
Engin Eren,
Martin B. Eriksen,
Jonas Eschle,
Giulio Eulisse,
Conor Fitzpatrick,
José Flix Molina,
Alessandra Forti,
Ben Galewsky,
Sean Gasiorowski,
Aman Goel,
Loukas Gouskos,
Enrico Guiraud,
Kanhaiya Gupta,
Stephan Hageboeck,
Allison Reinsvold Hall, et al. (44 additional authors not shown)
Abstract:
The second workshop on the HEP Analysis Ecosystem took place 23-25 May 2022 at IJCLab in Orsay, to look at progress and continuing challenges in scaling up HEP analysis to meet the needs of HL-LHC and DUNE, as well as the very pressing needs of LHC Run 3 analysis.
The workshop was themed around six particular topics, which were felt to capture key questions, opportunities and challenges. Each topic arranged a plenary session introduction, often with speakers summarising the state of the art and the next steps for analysis. This was then followed by parallel sessions, which were much more discussion-focused, and where attendees could grapple with the challenges and propose solutions that could be tried. Where there was significant overlap between topics, a joint discussion between them was arranged.
In the weeks following the workshop the session conveners wrote this document, which is a summary of the main discussions, the key points raised and the conclusions and outcomes. The document was circulated amongst the participants for comments before being finalised here.
Submitted 9 December, 2022;
originally announced December 2022.
-
Developments in Performance and Portability for MadGraph5_aMC@NLO
Authors:
Andrea Valassi,
Taylor Childers,
Laurence Field,
Stefan Hageböck,
Walter Hopkins,
Olivier Mattelaer,
Nathan Nichols,
Stefan Roiser,
David Smith
Abstract:
Event generators simulate particle interactions using Monte Carlo techniques, providing the primary connection between experiment and theory in experimental high energy physics. These software packages, which are the first step in the simulation workflow of collider experiments, represent approximately 5 to 20% of the annual WLCG usage for the ATLAS and CMS experiments. With computing architectures becoming more heterogeneous, it is important to ensure that these key software frameworks can be run on future systems, large and small. In this contribution, recent progress on porting and speeding up the Madgraph5_aMC@NLO event generator on hybrid architectures, i.e. CPUs with GPU accelerators, is discussed. The main focus of this work has been on the calculation of scattering amplitudes and "matrix elements", which is the computational bottleneck of an event generation application. For physics processes limited to QCD leading order, the code generation toolkit has been expanded to produce matrix element calculations using C++ vector instructions on CPUs and using CUDA for NVIDIA GPUs, as well as using Alpaka, Kokkos and SYCL for multiple CPU and GPU architectures. Performance is reported in terms of matrix element calculations per unit time on NVIDIA, Intel, and AMD devices. The status and outlook for the integration of this work into a production release usable by the LHC experiments, with the same functionalities and very similar user interfaces as the current Fortran version, is also described.
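Because only the matrix-element part of the job is accelerated, the gain for a complete event-generation run is bounded by Amdahl's law. The sketch below makes that arithmetic explicit; the runtime fractions and speedups used here are illustrative assumptions, not measurements from this work.

```python
def overall_speedup(me_fraction: float, me_speedup: float) -> float:
    """Amdahl's law: whole-job speedup when only the ME part is accelerated.

    me_fraction: share of the original runtime spent in the ME calculation.
    me_speedup:  factor by which the ME calculation itself is made faster.
    """
    if not 0.0 <= me_fraction <= 1.0:
        raise ValueError("me_fraction must be in [0, 1]")
    if me_speedup <= 0.0:
        raise ValueError("me_speedup must be positive")
    # The non-ME part keeps its original cost; the ME part shrinks by me_speedup.
    return 1.0 / ((1.0 - me_fraction) + me_fraction / me_speedup)
```

For example, if the ME takes 90% of the runtime and is made 10 times faster, the whole job speeds up by only about 5.3 times, and never by more than 10 times however fast the ME becomes. This is why the remaining, non-ME parts of the workflow also matter once the ME has been offloaded.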
Submitted 20 October, 2022;
originally announced October 2022.
-
Offloading electromagnetic shower transport to GPUs
Authors:
G. Amadio,
J. Apostolakis,
P. Buncic,
G. Cosmo,
D. Dosaru,
A. Gheata,
S. Hageboeck,
J. Hahnfeld,
M. Hodgkinson,
B. Morgan,
M. Novak,
A. A. Petre,
W. Pokorski,
A. Ribon,
G. A. Stewart,
P. M. Vila
Abstract:
Making general particle transport simulation for high-energy physics (HEP) single-instruction-multiple-thread (SIMT) friendly, to take advantage of accelerator hardware, is an important alternative for boosting the throughput of simulation applications. To date, this challenge has not been resolved, owing to the difficulty of mapping the complexity of Geant4 components and workflow onto the massive parallelism exposed by graphics processing units (GPUs). The AdePT project is one of the R&D initiatives tackling this limitation and exploring GPUs as potential accelerators for offloading part of the CPU simulation workload. Our main target is to implement a complete electromagnetic shower demonstrator working on the GPU. The project is the first to create a full prototype of a realistic electron, positron, and gamma electromagnetic shower simulation on GPU, implemented either as a standalone application or as an extension of the standard Geant4 CPU workflow. Our prototype currently provides a platform to explore many optimisations and different approaches. We present the most recent results and initial conclusions of our work, using both a standalone GPU performance analysis and a first implementation of a hybrid workflow based on Geant4 on the CPU and AdePT on the GPU.
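The hybrid workflow can be pictured as a dispatcher that keeps most particles on the CPU and buffers electrons, positrons and gammas for batched GPU processing. The sketch below is hypothetical: it does not use the Geant4 or AdePT APIs, the `in_gpu_region` flag stands in for whatever criterion selects the offloaded part of the detector, and the "GPU" is simulated by recording each flushed batch.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Particle types handled by the (simulated) GPU backend in this sketch.
EM_PARTICLES = {"e-", "e+", "gamma"}


@dataclass
class Track:
    particle: str        # hypothetical particle label, e.g. "e-"
    energy: float        # arbitrary units
    in_gpu_region: bool  # stand-in for a geometric offload criterion


@dataclass
class HybridDispatcher:
    """Keeps most tracks on the CPU; buffers EM tracks for batched GPU work."""
    batch_size: int
    gpu_buffer: list[Track] = field(default_factory=list)
    cpu_tracks: list[Track] = field(default_factory=list)
    flushed_batches: list[list[Track]] = field(default_factory=list)

    def submit(self, track: Track) -> None:
        if track.particle in EM_PARTICLES and track.in_gpu_region:
            self.gpu_buffer.append(track)
            if len(self.gpu_buffer) >= self.batch_size:
                self.flush()
        else:
            self.cpu_tracks.append(track)

    def flush(self) -> None:
        # Stand-in for copying the buffer to the device and launching kernels;
        # batching amortises transfer and kernel-launch overheads.
        if self.gpu_buffer:
            self.flushed_batches.append(list(self.gpu_buffer))
            self.gpu_buffer.clear()
```

In such a design, `flush()` would also be called at the end of each event so that a partially filled buffer is not left behind. The batch size trades GPU occupancy against latency and memory use.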
Submitted 30 September, 2022;
originally announced September 2022.
-
Constraints on future analysis metadata systems in High Energy Physics
Authors:
T. J. Khoo,
A. Reinsvold Hall,
N. Skidmore,
S. Alderweireldt,
J. Anders,
C. Burr,
W. Buttinger,
P. David,
L. Gouskos,
L. Gray,
S. Hageboeck,
A. Krasznahorkay,
P. Laycock,
A. Lister,
Z. Marshall,
A. B. Meyer,
T. Novak,
S. Rappoccio,
M. Ritter,
E. Rodrigues,
J. Rumsevicius,
L. Sexton-Kennedy,
N. Smith,
G. A. Stewart,
S. Wertz
Abstract:
In High Energy Physics (HEP), analysis metadata comes in many forms -- from theoretical cross-sections, to calibration corrections, to details about file processing. Correctly applying metadata is a crucial and often time-consuming step in an analysis, but designing analysis metadata systems has historically received little direct attention. Among other considerations, an ideal metadata tool should be easy to use by new analysers, should scale to large data volumes and diverse processing paradigms, and should enable future analysis reinterpretation. This document, which is the product of community discussions organised by the HEP Software Foundation, categorises types of metadata by scope and format and gives examples of current metadata solutions. Important design considerations for metadata systems, including sociological factors, analysis preservation efforts, and technical factors, are discussed. A list of best practices and technical requirements for future analysis metadata systems is presented. These best practices could guide the development of a future cross-experimental effort for analysis metadata tools.
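As a small illustration of why scope matters when applying metadata, the sketch below normalises a simulated sample using a dataset-level cross-section and file-level sums of generator weights. The `Scope` categories, keys and units are hypothetical examples chosen to mirror the kinds of metadata named above; they are not taken from any experiment's system.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    DATASET = "dataset"  # e.g. theoretical cross-section of a simulated sample
    RUN = "run"          # e.g. calibration corrections valid for a range of runs
    FILE = "file"        # e.g. details of how an individual file was processed


@dataclass(frozen=True)
class MetadataItem:
    key: str
    value: float
    scope: Scope
    unit: str = ""


def normalisation_weight(items: list[MetadataItem], lumi_inv_pb: float) -> float:
    """Per-event weight scaling a simulated sample to a luminosity in pb^-1.

    Assumes the cross-section is stored in pb.
    """
    xsecs = [m.value for m in items
             if m.key == "cross_section" and m.scope is Scope.DATASET]
    if len(xsecs) != 1:
        raise ValueError("expected exactly one dataset-level cross-section")
    # Generator-weight sums are file-scoped: they must be summed over all
    # processed files, otherwise the normalisation is silently wrong.
    sum_w = sum(m.value for m in items
                if m.key == "sum_gen_weights" and m.scope is Scope.FILE)
    if sum_w <= 0.0:
        raise ValueError("sum of generator weights must be positive")
    return xsecs[0] * lumi_inv_pb / sum_w
```

Dropping one of the file-level entries silently changes the weight, which is exactly the kind of error a well-designed metadata system should make hard to commit.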
Submitted 19 May, 2022; v1 submitted 1 March, 2022;
originally announced March 2022.
-
Design and engineering of a simplified workflow execution for the MG5aMC event generator on GPUs and vector CPUs
Authors:
Andrea Valassi,
Stefan Roiser,
Olivier Mattelaer,
Stephan Hageboeck
Abstract:
Physics event generators are essential components of the data analysis software chain of high energy physics experiments, and important consumers of their CPU resources. Improving the software performance of these packages on modern hardware architectures, such as those deployed at HPC centers, is essential in view of the upcoming HL-LHC physics programme. In this paper, we describe an ongoing activity to reengineer the Madgraph5_aMC@NLO physics event generator, primarily to port it and allow its efficient execution on GPUs, but also to modernize it and optimize its performance on vector CPUs. We describe the motivation, engineering process and software architecture design of our developments, as well as the current challenges and future directions for this project. This paper is based on our submission to vCHEP2021 in March 2021, complemented with a few preliminary results that we presented during the conference. Further details and updated results will be given in later publications.
Submitted 13 July, 2021; v1 submitted 23 June, 2021;
originally announced June 2021.
-
Software Training in HEP
Authors:
Sudhir Malik,
Samuel Meehan,
Kilian Lieret,
Meirin Oan Evans,
Michel H. Villanueva,
Daniel S. Katz,
Graeme A. Stewart,
Peter Elmer,
Sizar Aziz,
Matthew Bellis,
Riccardo Maria Bianchi,
Gianluca Bianco,
Johan Sebastian Bonilla,
Angela Burger,
Jackson Burzynski,
David Chamont,
Matthew Feickert,
Philipp Gadow,
Bernhard Manfred Gruber,
Daniel Guest,
Stephan Hageboeck,
Lukas Heinrich,
Maximilian M. Horzela,
Marc Huwiler,
Clemens Lange, et al. (22 additional authors not shown)
Abstract:
Long-term sustainability of the high energy physics (HEP) research software ecosystem is essential for the field. With upgrades and new facilities coming online throughout the 2020s, this will only become more relevant over the course of the decade. Meeting this sustainability challenge requires a workforce with a combination of HEP domain knowledge and advanced software skills. The required software skills fall into three broad groups. The first is fundamental and generic software engineering (e.g. Unix, version control, C++, continuous integration). The second is knowledge of domain-specific HEP packages and practices (e.g. the ROOT data format and analysis framework). The third is more advanced knowledge involving more specialized techniques. These include parallel programming, machine learning and data science tools, and techniques to preserve software projects at all scales. This paper discusses the collective software training program in HEP and its activities led by the HEP Software Foundation (HSF) and the Institute for Research and Innovation in Software in HEP (IRIS-HEP). The program equips participants with an array of software skills that serve as ingredients from which solutions to the computing challenges of HEP can be formed. Beyond serving the community by ensuring that members are able to pursue research goals, this program serves individuals by providing intellectual capital and transferable skills that are becoming increasingly important to careers in the realm of software and computing, whether inside or outside HEP.
Submitted 6 August, 2021; v1 submitted 28 February, 2021;
originally announced March 2021.
-
What the new RooFit can do for your analysis
Authors:
Stephan Hageboeck
Abstract:
RooFit is a toolkit for statistical modelling and fitting, and together with RooStats it is used for measurements and statistical tests by most experiments in particle physics. RooFit has been undergoing modernisation for the past year. In this talk, improvements already released with ROOT will be discussed, such as faster data loading, vectorised computations and interfaces that follow C++ standard-library conventions more closely. These speed up unbinned fits several-fold and make RooFit easier to use from both C++ and Python.
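For reference, the quantity minimised in an unbinned maximum-likelihood fit is the negative log-likelihood of the $N$ observed events $x_i$ under a normalised model $f(x \mid \vec{\theta})$ with parameters $\vec{\theta}$:

```latex
-\ln L(\vec{\theta}) = -\sum_{i=1}^{N} \ln f(x_i \mid \vec{\theta})
```

Each term depends on a single event, so $f$ can be evaluated for a whole batch of events with the same instructions before the logarithms are summed; this is the kind of computation that vectorisation accelerates.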
Submitted 22 February, 2021; v1 submitted 4 December, 2020;
originally announced December 2020.
-
HL-LHC Computing Review: Common Tools and Community Software
Authors:
HEP Software Foundation,
Thea Aarrestad,
Simone Amoroso,
Markus Julian Atkinson,
Joshua Bendavid,
Tommaso Boccali,
Andrea Bocci,
Andy Buckley,
Matteo Cacciari,
Paolo Calafiura,
Philippe Canal,
Federico Carminati,
Taylor Childers,
Vitaliano Ciulli,
Gloria Corti,
Davide Costanzo,
Justin Gage Dezoort,
Caterina Doglioni,
Javier Mauricio Duarte,
Agnieszka Dziurda,
Peter Elmer,
Markus Elsing,
V. Daniel Elvira,
Giulio Eulisse, et al. (85 additional authors not shown)
Abstract:
Common and community software packages, such as ROOT, Geant4 and event generators have been a key part of the LHC's success so far and continued development and optimisation will be critical in the future. The challenges are driven by an ambitious physics programme, notably the LHC accelerator upgrade to high-luminosity, HL-LHC, and the corresponding detector upgrades of ATLAS and CMS. In this document we address the issues for software that is used in multiple experiments (usually even more widely than ATLAS and CMS) and maintained by teams of developers who are either not linked to a particular experiment or who contribute to common software within the context of their experiment activity. We also give space to general considerations for future software and projects that tackle upcoming challenges, no matter who writes it, which is an area where community convergence on best practice is extremely useful.
Submitted 31 August, 2020;
originally announced August 2020.
-
A Faster, More Intuitive RooFit
Authors:
Stephan Hageboeck
Abstract:
RooFit and RooStats, the toolkits for statistical modelling in ROOT, are used in most searches and measurements at the Large Hadron Collider as well as at $B$ factories. Larger datasets to be collected at, e.g., the High-Luminosity LHC will enable measurements with higher precision, but will require faster data processing to keep fitting times stable. In this work, a simplification of RooFit's interfaces and a redesign of its internal dataflow are presented. Interfaces are being extended to look and feel more STL-like, making them more accessible from both C++ and Python and improving interoperability and ease of use, while maintaining compatibility with old code. The redesign of the dataflow improves cache locality and data loading, and can be used to process batches of data with vectorised SIMD computations. This reduces the time for computing unbinned likelihoods by a factor of four to 16. This will allow the larger datasets of the future to be fitted in the same time as, or faster than, today's fits.
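The dataflow change can be sketched in Python under the assumption of a single Gaussian model: instead of evaluating the full expression event by event, each batch of observable values is processed by one function call, and per-batch constants such as the normalisation are computed once. This only illustrates the idea; RooFit's actual implementation is C++ with SIMD kernels, and none of the function names below are RooFit API.

```python
import math


def gauss_batch(xs, mu, sigma):
    """Evaluate a normalised Gaussian for a whole batch of x values."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # once per batch, not per event
    inv_2s2 = 1.0 / (2.0 * sigma * sigma)
    return [norm * math.exp(-(x - mu) ** 2 * inv_2s2) for x in xs]


def nll_batched(xs, mu, sigma, batch_size=1024):
    """Unbinned negative log-likelihood, computed batch by batch."""
    total = 0.0
    for start in range(0, len(xs), batch_size):
        probs = gauss_batch(xs[start:start + batch_size], mu, sigma)
        total -= math.fsum(math.log(p) for p in probs)
    return total


def nll_scalar(xs, mu, sigma):
    """Reference: evaluate the full expression event by event."""
    total = 0.0
    for x in xs:
        p = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total -= math.log(p)
    return total
```

Both functions return the same value up to floating-point rounding. The speedup in a compiled implementation comes from contiguous memory access and SIMD execution over each batch, which plain Python does not exhibit.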
Submitted 27 July, 2020; v1 submitted 28 March, 2020;
originally announced March 2020.
-
Making RooFit Ready for Run 3
Authors:
Stephan Hageboeck,
Lorenzo Moneta
Abstract:
RooFit and RooStats, the toolkits for statistical modelling in ROOT, are used in most searches and measurements at the Large Hadron Collider. The data to be collected in Run 3 will enable measurements with higher precision and models with greater complexity, but will also require faster data processing. In this work, first results on modernising RooFit's collections, restructuring data flow and vectorising likelihood fits in RooFit will be discussed. Because fitting times would otherwise increase significantly with the large datasets expected in Run 3, these improvements will enable the LHC experiments to process larger datasets without having to compromise on model complexity.
Submitted 28 March, 2020;
originally announced March 2020.