
Measurement of groomed event shape observables in deep-inelastic electron-proton scattering at HERA

Authors: The H1 collaboration, V. Andreev, M. Arratia, A. Baghdasaryan, A. Baty, K. Begzsuren, A. Bolz, V. Boudry, G. Brandt, D. Britzger, A. Buniatyan, L. Bystritskaya, A. J. Campbell, K. B. Cantun Avila, K. Cerny, V. Chekelian, Z. Chen, J. G. Contreras, J. Cvach, J. B. Dainton, K. Daum, A. Deshpande, C. Diaconu, A. Drees, G. Eckerlin , et al. (123 additional authors not shown)

Abstract: The H1 Collaboration at HERA reports the first measurement of groomed event shape observables in deep-inelastic electron-proton scattering (DIS) at $\sqrt{s}=319$ GeV, using data recorded between 2003 and 2007 with an integrated luminosity of $351$ pb$^{-1}$. Event shapes provide incisive probes of perturbative and non-perturbative QCD. Grooming techniques have been used for jet measurements in hadronic collisions; this paper presents the first application of grooming to DIS data. The analysis is carried out in the Breit frame, using the novel Centauro jet clustering algorithm, which is designed for DIS event topologies. Events are required to have squared momentum transfer $Q^2 > 150$ GeV$^2$ and inelasticity $0.2 < y < 0.7$. We report measurements of the production cross sections of groomed event 1-jettiness and groomed invariant mass for several choices of the grooming parameter. Monte Carlo model calculations and analytic calculations based on Soft Collinear Effective Theory are compared to the measurements.
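Grooming removes soft, wide-angle radiation from a clustered event before an observable is computed. The sketch below is a generic soft-drop-style declustering with angular exponent zero (the modified mass-drop limit), assuming a binary clustering tree with energies stored at the nodes. The `Node` class, the energy-fraction definition and the `z_cut` values are illustrative assumptions, not the Centauro-based grooming procedure or the grooming parameters used in the H1 analysis.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A node of a binary clustering tree (hypothetical structure).

    Leaves carry particle energies; internal nodes have two children and
    an energy equal to the sum of their children's energies.
    """
    energy: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def groom(root: Node, z_cut: float) -> Node:
    """Decluster from the root, dropping the softer branch while
    min(E_a, E_b) / (E_a + E_b) < z_cut (soft drop with beta = 0)."""
    node = root
    while not node.is_leaf():
        a, b = node.left, node.right
        z = min(a.energy, b.energy) / (a.energy + b.energy)
        if z >= z_cut:
            return node  # both branches are hard enough: stop grooming
        node = a if a.energy >= b.energy else b  # keep the harder branch
    return node


def leaves(node: Node) -> List[float]:
    """Energies of the particles that survive grooming."""
    if node.is_leaf():
        return [node.energy]
    return leaves(node.left) + leaves(node.right)
```

For a tree `Node(11.0, Node(1.0), Node(10.0, Node(4.0), Node(6.0)))`, the first split has an energy fraction of 1/11, so with `z_cut = 0.1` the 1 GeV branch is groomed away and the 4 and 6 GeV particles survive; raising `z_cut` grooms more aggressively.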

Submitted 1 August, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 32 pages, 17 tables, 7 figures, version as accepted by EPJ C

Report number: DESY-24-036

Journal ref: EPJC 84 (2024), 718

arXiv:2403.10109

Measurement of the 1-jettiness event shape observable in deep-inelastic electron-proton scattering at HERA

Authors: The H1 collaboration, V. Andreev, M. Arratia, A. Baghdasaryan, A. Baty, K. Begzsuren, A. Bolz, V. Boudry, G. Brandt, D. Britzger, A. Buniatyan, L. Bystritskaya, A. J. Campbell, K. B. Cantun Avila, K. Cerny, V. Chekelian, Z. Chen, J. G. Contreras, J. Cvach, J. B. Dainton, K. Daum, A. Deshpande, C. Diaconu, A. Drees, G. Eckerlin , et al. (124 additional authors not shown)

Abstract: The H1 Collaboration reports the first measurement of the 1-jettiness event shape observable $τたう_1^b$ in neutral-current deep-inelastic electron-proton scattering (DIS). The observable $τたう_1^b$ is equivalent to a thrust observable defined in the Breit frame. The data sample was collected at the HERA $ep$ collider in the years 2003-2007 at a center-of-mass energy of $\sqrt{s}=319\,\text{GeV}$, corresponding to an integrated luminosity of $351.1\,\text{pb}^{-1}$. Triple-differential cross sections are provided as a function of $τたう_1^b$, event virtuality $Q^2$, and inelasticity $y$, in the kinematic region $Q^2>150\,\text{GeV}^{2}$. Single-differential cross sections are provided as a function of $τたう_1^b$ in a limited kinematic range. Double-differential cross sections, in contrast, are integrated over $τたう_1^b$ and represent the inclusive neutral-current DIS cross section measured as a function of $Q^2$ and $y$. The data are compared to a variety of predictions, which include classical and modern Monte Carlo event generators, fixed-order perturbative QCD predictions, where calculations up to $\mathcal{O}(αあるふぁ_s^3)$ are available for $τたう_1^b$ or inclusive DIS, and resummed predictions at next-to-leading logarithmic accuracy matched to fixed-order predictions at $\mathcal{O}(αあるふぁ_s^2)$. These comparisons reveal the sensitivity of the 1-jettiness observable to QCD parton shower and resummation effects, as well as to the modeling of hadronization and fragmentation.
Within their range of validity, the fixed-order predictions provide a good description of the data. Monte Carlo event generators are predictive over the full measured range, and hence their underlying models and parameters can be constrained by comparison with the presented data.
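The abstract does not spell out the definition of $τたう_1^b$. A minimal sketch, assuming the form introduced by Kang, Lee and Stewart, $τたう_1^b = \frac{2}{Q^2}\sum_i \min(q_B\cdot p_i,\, q_J\cdot p_i)$ with $q_B = xP$ and $q_J = q + xP$, evaluated in a Breit frame where the proton moves along $+z$ so that $q_B = \frac{Q}{2}(1,0,0,1)$ and $q_J = \frac{Q}{2}(1,0,0,-1)$. The function names and the $(E, p_x, p_y, p_z)$ input format are illustrative assumptions; detector effects and particle selection are ignored.

```python
from typing import Iterable, Tuple

FourVector = Tuple[float, float, float, float]  # (E, px, py, pz) in GeV


def minkowski_dot(a: FourVector, b: FourVector) -> float:
    """Four-vector product with metric (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def tau1b(particles: Iterable[FourVector], Q: float) -> float:
    """1-jettiness tau_1^b for final-state particles given in the Breit frame.

    Assumes the proton travels along +z, so the beam reference vector is
    q_B = x P = (Q/2)(1, 0, 0, 1) and the jet reference vector is
    q_J = q + x P = (Q/2)(1, 0, 0, -1).
    """
    q_B = (Q / 2, 0.0, 0.0, Q / 2)
    q_J = (Q / 2, 0.0, 0.0, -Q / 2)
    # Each particle is assigned to whichever reference direction it is closer to.
    total = sum(min(minkowski_dot(q_B, p), minkowski_dot(q_J, p)) for p in particles)
    return 2.0 * total / Q**2
```

Under these conventions a Born-level struck quark, $(Q/2, 0, 0, -Q/2)$, gives $τたう_1^b = 0$, and so does radiation collinear to the proton; only radiation away from both reference directions makes $τたう_1^b$ positive.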

Submitted 15 March, 2024; originally announced March 2024.

Comments: 45 pages, 38 tables, 13 figures

Report number: DESY-24-035

arXiv:2403.08982

doi 10.1140/epjc/s10052-024-13003-1

Observation and differential cross section measurement of neutral current DIS events with an empty hemisphere in the Breit frame

Authors: The H1 collaboration, V. Andreev, M. Arratia, A. Baghdasaryan, A. Baty, K. Begzsuren, A. Bolz, V. Boudry, G. Brandt, D. Britzger, A. Buniatyan, L. Bystritskaya, A. J. Campbell, K. B. Cantun Avila, K. Cerny, V. Chekelian, Z. Chen, J. G. Contreras, J. Cvach, J. B. Dainton, K. Daum, A. Deshpande, C. Diaconu, A. Drees, G. Eckerlin , et al. (124 additional authors not shown)

Abstract: The Breit frame provides a natural frame in which to analyze lepton-proton scattering events. In this reference frame, the hard interaction between a quark and the exchanged boson in the parton model defines the coordinate system such that the struck quark is back-scattered along the virtual photon momentum direction. In Quantum Chromodynamics (QCD), higher-order perturbative or non-perturbative effects can change this picture drastically. As Bjorken-$x$ decreases below one half, a rather peculiar event signature is predicted with increasing probability, in which no radiation is present in one of the two Breit-frame hemispheres and all emissions are found in the other hemisphere. At higher orders in $αあるふぁ_s$ or in the presence of soft QCD effects, predicting the rate of these events is far from trivial, which motivates measurements with real data. We report the first observation of events with an empty current hemisphere in electron-proton collisions at the HERA collider, using data recorded with the H1 detector at a center-of-mass energy of 319 GeV. The fraction of inclusive neutral-current DIS events with an empty hemisphere is found to be $0.0112 \pm 3.9\,\%_\text{stat} \pm 4.5\,\%_\text{syst} \pm 1.6\,\%_\text{mod}$ in the selected kinematic region of $150< Q^2<1500$ GeV$^2$ and inelasticity $0.14< y<0.7$. The data sample corresponds to an integrated luminosity of 351.1 pb$^{-1}$, sufficient to enable differential cross section measurements of these events.
The results show an enhanced discriminating power at lower Bjorken-$x$ among different Monte Carlo event generator predictions.
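Once particles are expressed in the Breit frame, the empty-hemisphere classification itself is simple. A minimal sketch, assuming the convention in which the proton moves along $+z$, so the current hemisphere (the direction of the virtual photon and of the Born-level struck quark) is $p_z < 0$. The function names and the choice to assign particles with exactly $p_z = 0$ to the remnant hemisphere are illustrative assumptions; detector-level selection, acceptance and unfolding are ignored.

```python
from typing import List, Sequence, Tuple

FourVector = Tuple[float, float, float, float]  # (E, px, py, pz) in the Breit frame


def current_hemisphere_empty(particles: Sequence[FourVector]) -> bool:
    """True if no particle lies in the current hemisphere.

    Assumes the proton moves along +z in the Breit frame, so the current
    hemisphere is pz < 0; particles with pz == 0 count as remnant side.
    """
    return all(p[3] >= 0.0 for p in particles)


def empty_fraction(events: List[Sequence[FourVector]]) -> float:
    """Fraction of events with an empty current hemisphere."""
    if not events:
        raise ValueError("no events")
    return sum(current_hemisphere_empty(ev) for ev in events) / len(events)
```

A Born-level event, with the struck quark at $(Q/2, 0, 0, -Q/2)$, is never classified as empty; at parton level an empty current hemisphere therefore requires additional emissions that push all radiation into the remnant hemisphere.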

Submitted 1 August, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: 13 pages, 5 figures, 2 Tables. This version as accepted for publication

Report number: DESY-24-034

Journal ref: EPJC 84 (2024), 720

arXiv:2303.13620

doi 10.1016/j.physletb.2023.138101

Unbinned Deep Learning Jet Substructure Measurement in High $Q^2$ ep collisions at HERA

Authors: The H1 collaboration, V. Andreev, M. Arratia, A. Baghdasaryan, A. Baty, K. Begzsuren, A. Bolz, V. Boudry, G. Brandt, D. Britzger, A. Buniatyan, L. Bystritskaya, A. J. Campbell, K. B. Cantun Avila, K. Cerny, V. Chekelian, Z. Chen, J. G. Contreras, J. Cvach, J. B. Dainton, K. Daum, A. Deshpande, C. Diaconu, A. Drees, G. Eckerlin , et al. (120 additional authors not shown)

Abstract: The radiation pattern within high-energy quark- and gluon-initiated jets (jet substructure) is used extensively as a precision probe of the strong force and as an environment for optimizing event generators, with numerous applications in high energy particle and nuclear physics. Electron-proton collisions are of particular interest, as many of the complications present at hadron colliders are absent. A detailed study of modern jet substructure observables, jet angularities, in electron-proton collisions is presented using data recorded with the H1 detector at HERA. The measurement is unbinned and multi-dimensional, using machine learning to correct for detector effects. All of the available reconstructed object information of the respective jets is interpreted by a graph neural network, achieving superior precision on a selected set of jet angularities. Training these networks was enabled by the use of a large number of GPUs in the Perlmutter supercomputer at Berkeley Lab. The particle jets are reconstructed in the laboratory frame using the $k_{\mathrm{T}}$ jet clustering algorithm. Results are reported at high squared momentum transfer $Q^2>150$ GeV${}^2$ and inelasticity $0.2 < y < 0.7$. The analysis is also performed in sub-regions of $Q^2$, thus probing scale dependencies of the substructure variables. The data are compared with a variety of predictions and point towards possible improvements of such models.
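Jet angularities are weighted sums over a jet's constituents. A minimal sketch of the generalized angularity $λらむだ^κかっぱ_βべーた = \sum_i z_i^κかっぱ (ΔでるたR_i/R)^βべーた$, where $z_i$ is a constituent's share of the jet transverse momentum and $ΔでるたR_i$ its distance from the jet axis in the $(ηいーた, φふぁい)$ plane. This is the common form in the jet-substructure literature, not necessarily the exact set of observables or normalization used by H1; the jet axis is passed in rather than computed, and normalizing $z_i$ to the scalar sum of constituent $p_T$ is an assumption.

```python
import math
from typing import Sequence, Tuple

Constituent = Tuple[float, float, float]  # (pT, eta, phi)


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Distance in the (eta, phi) plane, with delta-phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def angularity(constituents: Sequence[Constituent], axis: Tuple[float, float],
               kappa: float, beta: float, R: float = 1.0) -> float:
    """Generalized angularity: sum_i z_i**kappa * (dR_i / R)**beta.

    z_i is taken relative to the scalar sum of constituent pT (an assumption).
    """
    pt_jet = sum(c[0] for c in constituents)
    eta_axis, phi_axis = axis
    return sum((pt / pt_jet) ** kappa * (delta_r(eta, phi, eta_axis, phi_axis) / R) ** beta
               for pt, eta, phi in constituents)
```

Special cases illustrate the family: $(κかっぱ, βべーた) = (0, 0)$ counts constituents, $(2, 0)$ gives the squared momentum dispersion $(p_T^D)^2$, and $(1, 1)$ is a girth-like width. Python evaluates `0.0 ** 0` as `1.0`, which is what makes the multiplicity case work for a constituent sitting exactly on the axis.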

Submitted 14 September, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: 25 pages, 10 figures, 8 tables, version accepted by Physics Letters B

Report number: DESY-23-034

Journal ref: PLB 844 (2023) 138101

arXiv:2206.10369

The State of Sparse Training in Deep Reinforcement Learning

Authors: Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro

Abstract: The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision. Their appeal stems largely from the reduced number of parameters required to train and store, as well as from an increase in learning efficiency. Somewhat surprisingly, there have been very few efforts exploring their use in Deep Reinforcement Learning (DRL). In this work we perform a systematic investigation into applying a number of existing sparse training techniques to a variety of DRL agents and environments. Our results corroborate, in the DRL domain, the finding from sparse training in computer vision that sparse networks perform better than dense networks with the same parameter count. We provide detailed analyses of how the various components in DRL are affected by the use of sparse networks, and conclude by suggesting promising avenues for improving the effectiveness of sparse training methods, as well as for advancing their use in DRL.
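One sparse training technique commonly used as a baseline is gradual magnitude pruning, in which the smallest-magnitude weights are removed according to a sparsity schedule. A minimal sketch, assuming the cubic schedule popularized by Zhu and Gupta (2017) and a flat list of weights; the abstract does not say which schedule or hyperparameters the paper uses, and all names and values here are illustrative.

```python
from typing import List


def sparsity_at(step: int, start: int, end: int, s_init: float, s_final: float) -> float:
    """Cubic schedule: sparsity ramps from s_init at `start` to s_final at `end`."""
    if step <= start:
        return s_init
    if step >= end:
        return s_final
    frac = (step - start) / (end - start)
    return s_final + (s_init - s_final) * (1.0 - frac) ** 3


def magnitude_mask(weights: List[float], sparsity: float) -> List[int]:
    """Binary mask (1 = keep, 0 = pruned); the smallest |w| are pruned first."""
    n_prune = int(round(sparsity * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]
```

The cubic form prunes aggressively early, while the network still has redundant weights, and slows down as the target sparsity is approached; in a training loop the mask would be recomputed every few steps with `sparsity_at(step, ...)`.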

Submitted 17 June, 2022; originally announced June 2022.

Comments: Proceedings of the 39th International Conference on Machine Learning (ICML'22)

arXiv:2203.15556

Training Compute-Optimal Large Language Models

Authors: Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre

Abstract: We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled. We test this hypothesis by training a predicted compute-optimal model, Chinchilla, which uses the same compute budget as Gopher but with 70B parameters and 4$\times$ more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, more than a 7% improvement over Gopher.
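The equal-scaling rule translates into a quick allocation calculation. A minimal sketch, assuming the widely used approximation $C \approx 6ND$ training FLOPs for $N$ parameters and $D$ tokens, and a fixed tokens-per-parameter ratio $r = D/N$. Neither assumption is stated in the abstract; the default $r = 20$ corresponds to Chinchilla's reported configuration of 70B parameters trained on 1.4T tokens.

```python
import math
from typing import Tuple


def compute_optimal_allocation(flops: float, tokens_per_param: float = 20.0) -> Tuple[float, float]:
    """Return (params N, tokens D) satisfying C = 6 * N * D and D = r * N.

    Solving 6 * N * (r * N) = C gives N = sqrt(C / (6 r)); both N and D then
    grow as sqrt(C), i.e. model size and data are scaled equally.
    """
    n_params = math.sqrt(flops / (6.0 * tokens_per_param))
    return n_params, tokens_per_param * n_params
```

With `flops = 6 * 70e9 * 1.4e12` this returns roughly 70B parameters and 1.4T tokens. Quadrupling the budget doubles both $N$ and $D$, which is the "double the tokens when you double the model" rule in the abstract.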

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2203.07622

The International Linear Collider: Report to Snowmass 2021

Authors: Alexander Aryshev, Ties Behnke, Mikael Berggren, James Brau, Nathaniel Craig, Ayres Freitas, Frank Gaede, Spencer Gessner, Stefania Gori, Christophe Grojean, Sven Heinemeyer, Daniel Jeans, Katja Kruger, Benno List, Jenny List, Zhen Liu, Shinichiro Michizono, David W. Miller, Ian Moult, Hitoshi Murayama, Tatsuya Nakada, Emilio Nanni, Mihoko Nojiri, Hasan Padamsee, Maxim Perelstein , et al. (487 additional authors not shown)

Abstract: The International Linear Collider (ILC) is now under consideration as a new global energy-frontier accelerator laboratory that would take data in the 2030s. The ILC addresses key questions in our current understanding of particle physics. It is based on proven accelerator technology. Its experiments will challenge the Standard Model of particle physics and will provide a new window to look beyond it. This document brings the story of the ILC up to date, emphasizing its strong physics motivation, its readiness for construction, and the opportunity it presents to the US and the global particle physics community.

Submitted 16 January, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: 356 pages, Large pdf file (40 MB) submitted to Snowmass 2021; v2 references to Snowmass contributions added, additional authors; v3 references added, some updates, additional authors

Report number: DESY-22-045, IFT--UAM/CSIC--22-028, KEK Preprint 2021-61, PNNL-SA-160884, SLAC-PUB-17662

arXiv:2202.01169

Unified Scaling Laws for Routed Language Models

Authors: Aidan Clark, Diego de las Casas, Aurelia Guy, Arthur Mensch, Michela Paganini, Jordan Hoffmann, Bogdan Damoc, Blake Hechtman, Trevor Cai, Sebastian Borgeaud, George van den Driessche, Eliza Rutherford, Tom Hennigan, Matthew Johnson, Katie Millican, Albin Cassirer, Chris Jones, Elena Buchatskaya, David Budden, Laurent Sifre, Simon Osindero, Oriol Vinyals, Jack Rae, Erich Elsen, Koray Kavukcuoglu , et al. (1 additional authors not shown)

Abstract: The performance of a language model has been shown to be effectively modeled as a power law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditionally use only a subset of their parameters while processing an input. For these models, parameter count and computational requirement form two independent axes along which an increase leads to better performance. In this work we derive and justify scaling laws defined on these two variables which generalize those known for standard language models and describe the performance of a wide range of routing architectures trained via three different techniques. We then provide two applications of these laws: first, deriving an Effective Parameter Count along which all models scale at the same rate, and second, using the scaling coefficients to give a quantitative comparison of the three routing techniques considered. Our analysis derives from an extensive evaluation of Routing Networks across five orders of magnitude of size, including models with hundreds of experts and hundreds of billions of parameters.
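The starting point of the abstract, loss as a power law in parameter count, can be illustrated with an ordinary least-squares fit in log-log space. This sketch fits only the single-variable form $L(N) = a N^{-αあるふぁ}$ to synthetic data; it is not the paper's two-variable law in parameter count and number of experts, and the function name and data are illustrative.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(n_params: Sequence[float], losses: Sequence[float]) -> Tuple[float, float]:
    """Fit L = a * N**(-alpha) by least squares on log L = log a - alpha * log N.

    Returns (a, alpha).
    """
    xs = [math.log(n) for n in n_params]
    ys = [math.log(loss) for loss in losses]
    k = len(xs)
    mean_x, mean_y = sum(xs) / k, sum(ys) / k
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

A power law is a straight line in log-log coordinates, so a linear fit recovers the exponent as minus the slope. Extending the model with a second variable (such as expert count) and an interaction term turns this into a multivariate linear regression on the logs.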

Submitted 9 February, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

Comments: Fixing typos and affiliation clarity

arXiv:2112.11446

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Authors: Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor , et al. (55 additional authors not shown)

Abstract: Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales, from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and the model's behaviour, covering the intersection of model scale with bias and toxicity. Finally, we discuss the application of language models to AI safety and the mitigation of downstream harms.

Submitted 21 January, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: 120 pages

arXiv:2112.06749

Step-unrolled Denoising Autoencoders for Text Generation

Authors: Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, Aaron van den Oord

Abstract: In this paper we propose a new generative model of text, Step-unrolled Denoising Autoencoder (SUNDAE), that does not rely on autoregressive models. Similarly to denoising diffusion techniques, SUNDAE is repeatedly applied to a sequence of tokens, starting from random inputs and improving them each time until convergence. We present a simple new improvement operator that converges in fewer iterations than diffusion methods, while qualitatively producing better samples on natural language datasets. SUNDAE achieves state-of-the-art results (among non-autoregressive methods) on the WMT'14 English-to-German translation task and good qualitative results on unconditional language modeling on the Colossal Cleaned Common Crawl dataset and a dataset of Python code from GitHub. The non-autoregressive nature of SUNDAE opens up possibilities beyond left-to-right prompted generation, by filling in arbitrary blank patterns in a template.
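The sampling procedure described in the abstract (start from random tokens, apply the same operator repeatedly until the sequence stops changing) can be sketched generically. Here `denoise` is a hypothetical stand-in for the trained improvement operator, which in SUNDAE is a network that proposes all tokens in parallel; the toy denoiser in the usage note is purely illustrative.

```python
import random
from typing import Callable, List

Tokens = List[int]


def iterative_refine(denoise: Callable[[Tokens], Tokens], length: int, vocab_size: int,
                     max_steps: int = 100, seed: int = 0) -> Tokens:
    """Start from uniformly random tokens and apply `denoise` until a fixed
    point (or `max_steps`) is reached. All positions update in parallel."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size) for _ in range(length)]
    for _ in range(max_steps):
        new_tokens = denoise(tokens)
        if new_tokens == tokens:
            break  # converged: the operator no longer changes the sequence
        tokens = new_tokens
    return tokens
```

For example, a toy operator that moves every token one step towards a fixed target sequence converges to that target within `vocab_size` steps. Because every position is rewritten at each step, the number of steps is decoupled from the sequence length, unlike left-to-right decoding.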

Submitted 19 April, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

Comments: Accepted to ICLR 2022

arXiv:2112.04426

Improving language models by retrieving from trillions of tokens

Authors: Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan , et al. (3 additional authors not shown)

Abstract: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\times$ fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
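The retrieval step can be sketched in three parts: split the input into fixed-size chunks, embed each chunk, and look up its nearest neighbours in a pre-embedded database. In RETRO the embeddings come from the frozen BERT retriever and the search is approximate. Here the `embed` function (a normalized bag-of-tokens vector), the brute-force L2 search, the chunk size and the toy database are all illustrative assumptions, and the chunked cross-attention that consumes the neighbours is not shown.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def chunk(tokens: Sequence[int], size: int) -> List[List[int]]:
    """Split a token sequence into consecutive chunks of `size` tokens."""
    return [list(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def embed(tokens: Sequence[int], vocab_size: int) -> Vector:
    """Hypothetical stand-in for a frozen encoder: normalized token counts."""
    counts = [0.0] * vocab_size
    for t in tokens:
        counts[t] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def nearest_neighbours(query: Vector, database: Dict[str, Vector], k: int) -> List[str]:
    """Keys of the k database entries closest to `query` in squared L2 distance."""
    def dist(v: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(query, v))
    return sorted(database, key=lambda key: dist(database[key]))[:k]
```

Because the encoder is frozen, database embeddings can be computed once offline; at a 2-trillion-token scale the linear scan above would be replaced by an approximate nearest-neighbour index.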

Submitted 7 February, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: Fix incorrect reported numbers in Table 14

arXiv:2112.01120

Impact of jet-production data on the next-to-next-to-leading-order determination of HERAPDF2.0 parton distributions

Authors: H1 and ZEUS Collaborations, I. Abt, R. Aggarwal, V. Andreev, M. Arratia, V. Aushev, A. Baghdasaryan, A. Baty, K. Begzsuren, O. Behnke, A. Belousov, A. Bertolin, I. Bloch, V. Boudry, G. Brandt, I. Brock, N. H. Brook, R. Brugnera, A. Bruni, A. Buniatyan, P. J. Bussey, L. Bystritskaya, A. Caldwell , et al. (212 additional authors not shown)

Abstract: The HERAPDF2.0 ensemble of parton distribution functions (PDFs) was introduced in 2015. The final stage is presented: a next-to-next-to-leading-order (NNLO) analysis of the HERA data on inclusive deep-inelastic $ep$ scattering together with jet data as published by the H1 and ZEUS collaborations. A perturbative QCD fit, simultaneously of $αあるふぁ_s(M_Z^2)$ and the PDFs, was performed with the result $αあるふぁ_s(M_Z^2) = 0.1156 \pm 0.0011~{\rm (exp)}~^{+0.0001}_{-0.0002}~{\rm (model~+~parameterisation)}~\pm 0.0029~{\rm (scale)}$. The PDF sets of HERAPDF2.0Jets NNLO were determined with separate fits using two fixed values of $αあるふぁ_s(M_Z^2)$, $αあるふぁ_s(M_Z^2)=0.1155$ and $0.118$, since the latter value was already chosen for the published HERAPDF2.0 NNLO analysis based on HERA inclusive DIS data only. The different sets of PDFs are presented, evaluated and compared. The consistency of the PDFs determined with and without the jet data demonstrates the consistency of HERA inclusive and jet-production cross-section data. The inclusion of the jet data reduced the uncertainty on the gluon PDF. Predictions based on the PDFs of HERAPDF2.0Jets NNLO give an excellent description of the jet-production data used as input.

Submitted 2 December, 2021; originally announced December 2021.

Comments: 43 pages, 24 figures, to be submitted to Eur. Phys. J. C

Report number: DESY-21-206

arXiv:2108.12376

Measurement of lepton-jet correlation in deep-inelastic scattering with the H1 detector using machine learning for unfolding

Authors: H1 Collaboration, V. Andreev, M. Arratia, A. Baghdasaryan, A. Baty, K. Begzsuren, A. Belousov, A. Bolz, V. Boudry, G. Brandt, D. Britzger, A. Buniatyan, L. Bystritskaya, A. J. Campbell, K. B. Cantun Avila, K. Cerny, V. Chekelian, Z. Chen, J. G. Contreras, L. Cunqueiro Mendez, J. Cvach, J. B. Dainton, K. Daum, A. Deshpande, C. Diaconu , et al. (120 additional authors not shown)

Abstract: The first measurement of lepton-jet momentum imbalance and azimuthal correlation in lepton-proton scattering at high momentum transfer is presented. These data, taken with the H1 detector at HERA, are corrected for detector effects using OmniFold, an unbinned machine learning algorithm, which in this first application considers eight observables simultaneously. The unfolded cross sections are compared to calculations performed within the context of collinear or transverse-momentum-dependent (TMD) factorization in Quantum Chromodynamics (QCD) as well as to Monte Carlo event generators. The measurement probes a wide range of QCD phenomena, including TMD parton distribution functions and their evolution with energy in previously unexplored kinematic regions.
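OmniFold iteratively reweights simulated events using classifiers; in the binned, single-observable limit it reduces to iterative Bayesian unfolding (IBU). The sketch below implements that binned analogue, not OmniFold itself. It assumes a response matrix whose columns (true bins) each sum to one, i.e. no efficiency losses, and a flat starting prior; the function name and the numbers in the usage note are synthetic.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def ibu(measured: Sequence[float], response: Matrix, iterations: int = 100) -> List[float]:
    """Iterative Bayesian unfolding.

    response[j][i] = P(reconstructed bin j | true bin i); each column is
    assumed to sum to 1 (full efficiency). Starts from a flat prior.
    """
    n_true = len(response[0])
    truth = [sum(measured) / n_true] * n_true
    for _ in range(iterations):
        # Fold the current truth estimate to detector level.
        folded = [sum(response[j][i] * truth[i] for i in range(n_true))
                  for j in range(len(measured))]
        # Bayes' theorem: redistribute each measured bin over the true bins.
        truth = [truth[i] * sum(response[j][i] * measured[j] / folded[j]
                                for j in range(len(measured)) if folded[j] > 0)
                 for i in range(n_true)]
    return truth
```

With a 2x2 response `[[0.8, 0.2], [0.2, 0.8]]` and measured counts `[90, 60]` (the folding of a true spectrum `[100, 50]`), the iterations move from the flat prior towards `[100, 50]`. In practice the iteration count acts as a regularization parameter, and OmniFold replaces the histogram ratios with classifier-derived per-event weights.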

Submitted 1 April, 2022; v1 submitted 27 August, 2021; originally announced August 2021.

Comments: 17 pages, 7 figures, 4 tables, version accepted by PRL

Report number: DESY 21-130

arXiv:2106.03517

Top-KAST: Top-K Always Sparse Training

Authors: Siddhant M. Jayakumar, Razvan Pascanu, Jack W. Rae, Simon Osindero, Erich Elsen

Abstract: Sparse neural networks are becoming increasingly important as the field seeks to improve the performance of existing models by scaling them up, while simultaneously trying to reduce power consumption and computational footprint. Unfortunately, most existing methods for inducing performant sparse models still entail the instantiation of dense parameters, or dense gradients in the backward-pass, during training. For very large models this requirement can be prohibitive. In this work we propose Top-KAST, a method that preserves constant sparsity throughout training (in both the forward and backward-passes). We demonstrate the efficacy of our approach by showing that it performs comparably to or better than previous works when training models on the established ImageNet benchmark, whilst fully maintaining sparsity. In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling where the current best performing architectures tend to have tens of billions of parameters and scaling up does not yet seem to have saturated performance. Sparse versions of these architectures can be run with significantly fewer resources, making them more widely accessible and applicable. Furthermore, in addition to being effective, our approach is straightforward and can easily be implemented in a wide range of existing machine learning frameworks with only a few additional lines of code.
We therefore hope that our contribution will help enable the broader community to explore the potential held by massive models, without incurring massive computational cost.
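At the heart of Top-KAST is a per-step choice, made by weight magnitude, of which parameters take part in the forward pass and which receive gradient updates. A simplified sketch, assuming a flat list of weights: the forward set A keeps the largest-magnitude fraction `forward_density`, and gradients are applied to a larger superset B of density `backward_density`. The densities and function names are illustrative, and any additional regularization the method applies to weights outside A is omitted.

```python
from typing import List, Set, Tuple


def top_k_indices(weights: List[float], density: float) -> Set[int]:
    """Indices of the round(density * n) largest-magnitude weights."""
    k = int(round(density * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return set(order[:k])


def top_kast_sets(weights: List[float], forward_density: float,
                  backward_density: float) -> Tuple[Set[int], Set[int]]:
    """Return (forward set A, backward set B) with A a subset of B.

    Only A is used to compute activations; gradients are applied to B,
    so weights in B but not in A can grow and later enter A.
    """
    if backward_density < forward_density:
        raise ValueError("backward set must be at least as dense as the forward set")
    return top_k_indices(weights, forward_density), top_k_indices(weights, backward_density)
```

Both sets are prefixes of the same magnitude ordering, which guarantees that A is contained in B. Neither step ever materializes a dense gradient, which is the property the abstract emphasizes for very large models.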

Submitted 7 June, 2021; originally announced June 2021.

Journal ref: Advances in Neural Information Processing Systems, 33, 20744-20754

arXiv:2006.15081

On the Generalization Benefit of Noise in Stochastic Gradient Descent

Authors: Samuel L. Smith, Erich Elsen, Soham De

Abstract: It has long been argued that minibatch stochastic gradient descent can generalize better than large batch gradient descent in deep neural networks. However, recent papers have questioned this claim, arguing that this effect is simply a consequence of suboptimal hyperparameter tuning or insufficient compute budgets when the batch size is large. In this paper, we perform carefully designed experiments and rigorous hyperparameter sweeps on a range of popular models, which verify that small or moderately large batch sizes can substantially outperform very large batches on the test set. This occurs even when both models are trained for the same number of iterations and large batches achieve smaller training losses. Our results confirm that the noise in stochastic gradients can enhance generalization. We study how the optimal learning rate schedule changes as the epoch budget grows, and we provide a theoretical account of our observations based on the stochastic differential equation perspective of SGD dynamics.
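The noise in question is the sampling noise of the minibatch gradient. Its variance shrinks roughly as $1/B$ for batch size $B$, with a finite-population correction when sampling without replacement. A small simulation on synthetic scalar per-example gradients makes this concrete. It illustrates only the noise scale, not the paper's experiments or its SDE analysis, and the function name and values are illustrative.

```python
import random
from statistics import mean, pvariance
from typing import List


def minibatch_gradient_variance(per_example_grads: List[float], batch_size: int,
                                trials: int = 2000, seed: int = 0) -> float:
    """Empirical variance of the minibatch-mean gradient, sampling without replacement."""
    rng = random.Random(seed)
    estimates = [mean(rng.sample(per_example_grads, batch_size)) for _ in range(trials)]
    return pvariance(estimates)
```

With 1024 unit-variance per-example gradients, batch size 1 gives a variance near 1, while batch size 128 gives roughly $(1/128)(1 - 128/1024) \approx 0.007$. Larger batches therefore follow the full-batch gradient much more closely, which removes exactly the noise the abstract argues can aid generalization.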

Submitted 26 June, 2020; originally announced June 2020.

Comments: Camera-ready version of ICML 2020

arXiv:2006.10901

Sparse GPU Kernels for Deep Learning

Authors: Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen

Abstract: Scientific workloads have traditionally exploited high levels of sparsity to accelerate computation and reduce memory requirements. While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because these applications have relatively moderate levels of sparsity that are not sufficient for existing sparse kernels to outperform their dense counterparts. In this work, we study sparse matrices from deep learning applications and identify favorable properties that can be exploited to accelerate computation. Based on these insights, we develop high-performance GPU kernels for two sparse matrix operations widely applicable in neural networks: sparse matrix-dense matrix multiplication and sampled dense-dense matrix multiplication. Our kernels reach 27% of single-precision peak on Nvidia V100 GPUs. Using our kernels, we demonstrate sparse Transformer and MobileNet models that achieve 1.2-2.1x speedups and up to 12.8x memory savings without sacrificing accuracy.

Submitted 31 August, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

Comments: Updated to match camera-ready for SC20
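The two operations named in the abstract can be stated as plain reference loops. This is a sketch assuming a CSR (compressed sparse row) layout for the sparse operand; the helper names are hypothetical, the code makes no attempt at GPU performance, and some formulations of SDDMM additionally scale each sampled entry by the sparse matrix's stored value.

```python
from typing import List, Tuple

Matrix = List[List[float]]
CSR = Tuple[List[float], List[int], List[int]]  # (values, column indices, row pointers)


def dense_to_csr(a: Matrix) -> CSR:
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # row i owns values[row_ptr[i]:row_ptr[i+1]]
    return values, col_idx, row_ptr


def spmm(a: CSR, b: Matrix) -> Matrix:
    """Sparse (CSR) x dense: C = A @ B, touching only A's nonzeros."""
    values, col_idx, row_ptr = a
    n = len(b[0])
    c = []
    for i in range(len(row_ptr) - 1):
        out = [0.0] * n
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a_ij, b_row = values[p], b[col_idx[p]]
            for t in range(n):
                out[t] += a_ij * b_row[t]
        c.append(out)
    return c


def sddmm(a: CSR, x: Matrix, y: Matrix) -> List[float]:
    """Sampled dense-dense: entries of X @ Y^T computed only at A's nonzero positions."""
    _, col_idx, row_ptr = a
    out = []
    for i in range(len(row_ptr) - 1):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[p]
            out.append(sum(xv * yv for xv, yv in zip(x[i], y[j])))
    return out


A = dense_to_csr([[1.0, 0.0, 2.0], [0.0, 0.0, 3.0]])
C = spmm(A, [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])  # [[7.0, 7.0], [9.0, 9.0]]
S = sddmm(A, [[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # [1.0, 3.0, 7.0]
```

In training, SpMM typically appears in the forward pass of a sparse layer, while SDDMM computes the weight gradient restricted to the existing sparsity pattern.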

arXiv:2006.07360

AlgebraNets

Authors: Jordan Hoffmann, Simon Schmitt, Simon Osindero, Karen Simonyan, Erich Elsen

Abstract: Neural networks have historically been built layerwise from the set of functions in $\{f: \mathbb{R}^n \to \mathbb{R}^m\}$, i.e., with activations and weights/parameters represented by real numbers, $\mathbb{R}$. Our work considers a richer set of objects for activations and weights, and undertakes a comprehensive study of alternative algebras as number representations by studying their performance on two challenging problems: large-scale image classification using the ImageNet dataset and language modeling using the enwiki8 and WikiText-103 datasets. We denote this broader class of models as AlgebraNets. Our findings indicate that the conclusions of prior work, which explored neural networks constructed from $\mathbb{C}$ (complex numbers) and $\mathbb{H}$ (quaternions) on smaller datasets, do not always transfer to these challenging settings. However, our results demonstrate that there are alternative algebras which deliver better parameter and computational efficiency compared with $\mathbb{R}$. We consider $\mathbb{C}$, $\mathbb{H}$, $M_{2}(\mathbb{R})$ (the set of $2\times2$ real-valued matrices), $M_{2}(\mathbb{C})$, $M_{3}(\mathbb{R})$ and $M_{4}(\mathbb{R})$. Additionally, we note that multiplication in these algebras has higher compute density than real multiplication, a useful property in situations with inherently limited parameter reuse such as auto-regressive inference and sparse neural networks. We therefore investigate how to induce sparsity within AlgebraNets.
We hope that our strong results on large-scale, practical benchmarks will spur further exploration of these unconventional architectures which challenge the default choice of using real numbers for neural network weights and activations.

Submitted 16 June, 2020; v1 submitted 12 June, 2020; originally announced June 2020.
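To make the compute-density remark concrete, the sketch below uses quaternions ($\mathbb{H}$): one Hamilton product costs 16 real multiplies for 4 real weight parameters, versus 1 multiply per parameter for $\mathbb{R}$. The layer `quaternion_linear` is a hypothetical illustration; placing the weight on the left of each product is an assumption, not necessarily the convention used in AlgebraNets.

```python
from typing import List, Tuple

Quat = Tuple[float, float, float, float]  # (real, i, j, k)


def hamilton(p: Quat, q: Quat) -> Quat:
    """Quaternion product p*q: 16 real multiplies and 12 additions."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def quaternion_linear(weights: List[List[Quat]], inputs: List[Quat]) -> List[Quat]:
    """Hypothetical quaternion dense layer: y_o = sum_i W[o][i] * x_i (weight on the left)."""
    out = []
    for row in weights:
        acc = (0.0, 0.0, 0.0, 0.0)
        for w, x in zip(row, inputs):
            prod = hamilton(w, x)
            acc = tuple(s + t for s, t in zip(acc, prod))
        out.append(acc)
    return out


# Multiplies per real-valued weight parameter loaded from memory.
REAL_MULS_PER_PARAM = 1 / 1   # one real weight, one multiply
QUAT_MULS_PER_PARAM = 16 / 4  # four real parameters, sixteen multiplies
```

The fourfold ratio is what makes such algebras attractive when weights are loaded once and reused little, as in batch-1 autoregressive inference.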

arXiv:2006.07232

A Practical Sparse Approximation for Real Time Recurrent Learning

Authors: Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

Abstract: Current methods for training recurrent neural networks are based on backpropagation through time, which requires storing a complete history of network states, and prohibits updating the weights 'online' (after every timestep). Real Time Recurrent Learning (RTRL) eliminates the need for history storage and allows for online weight updates, but does so at the expense of computational costs that are quartic in the state size. This renders RTRL training intractable for all but the smallest networks, even ones that are made highly sparse. We introduce the Sparse n-step Approximation (SnAp) to the RTRL influence matrix, which only keeps entries that are nonzero within n steps of the recurrent core. SnAp with n=1 is no more expensive than backpropagation, and we find that it substantially outperforms other RTRL approximations with comparable costs such as Unbiased Online Recurrent Optimization. For highly sparse networks, SnAp with n=2 remains tractable and can outperform backpropagation through time in terms of learning speed when updates are done online. SnAp becomes equivalent to RTRL when n is large.

Submitted 12 June, 2020; originally announced June 2020.
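A sketch of which influence-matrix entries SnAp-n keeps, assuming a simplified RNN in which every parameter belongs to one hidden unit's row (so in one step it changes only that unit) and its influence then spreads along the recurrent connections for n-1 further steps. It computes only the sparsity pattern, not the RTRL update itself; `snap_pattern` and the `deps` encoding are hypothetical.

```python
from typing import Dict, List, Set


def snap_pattern(deps: List[Set[int]], n: int) -> Dict[int, Set[int]]:
    """Map each unit u to the units whose state a parameter of u can affect within n steps.

    deps[k] is the set of units whose previous state unit k reads.
    """
    num_units = len(deps)
    # consumers[i]: units whose next state reads unit i.
    consumers: List[Set[int]] = [set() for _ in range(num_units)]
    for k in range(num_units):
        for i in deps[k]:
            consumers[i].add(k)
    pattern = {}
    for u in range(num_units):
        reached = {u}  # step 1: the parameter directly changes its own unit
        frontier = {u}
        for _ in range(n - 1):  # each further step follows one recurrent hop
            frontier = {k for i in frontier for k in consumers[i]} - reached
            reached |= frontier
        pattern[u] = reached
    return pattern


# Ring of 4 units: each unit reads itself and its predecessor.
ring = [{0, 3}, {1, 0}, {2, 1}, {3, 2}]
snap1 = snap_pattern(ring, 1)
snap2 = snap_pattern(ring, 2)
```

With large n the reachable set fills the whole connected network, which matches the abstract's remark that SnAp becomes equivalent to full RTRL; sparse recurrent connectivity is what keeps small n cheap.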

arXiv:2006.03575

End-to-End Adversarial Text-to-Speech

Authors: Jeff Donahue, Sander Dieleman, Mikołaj Bińkowski, Erich Elsen, Karen Simonyan

Abstract: Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each of which is designed or learnt independently from the rest. In this work, we take on the challenging task of learning to synthesise speech from normalised text or phonemes in an end-to-end manner, resulting in models which operate directly on character or phoneme input sequences and produce raw speech audio outputs. Our proposed generator is feed-forward and thus efficient for both training and inference, using a differentiable alignment scheme based on token length prediction. It learns to produce high fidelity audio through a combination of adversarial feedback and prediction losses constraining the generated audio to roughly match the ground truth in terms of its total duration and mel-spectrogram. To allow the model to capture temporal variation in the generated audio, we employ soft dynamic time warping in the spectrogram-based prediction loss. The resulting model achieves a mean opinion score exceeding 4 on a 5-point scale, which is comparable to the state-of-the-art models relying on multi-stage training and additional supervision.

Submitted 17 March, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

Comments: 23 pages. In proceedings of ICLR 2021
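Soft dynamic time warping replaces the hard minimum in the classic DTW recursion with a smoothed minimum, which makes the alignment cost differentiable. This is a minimal sketch on 1-D sequences with squared-difference costs; the function names are hypothetical, and the spectrogram-based loss in the paper may add further terms or modifications not shown here.

```python
import math
from typing import List


def soft_min(values: List[float], gamma: float) -> float:
    """-gamma * log(sum(exp(-v / gamma))), computed stably around the smallest value."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x: List[float], y: List[float], gamma: float = 1.0) -> float:
    n, m = len(x), len(y)
    inf = float("inf")
    # r[i][j]: soft alignment cost of x[:i] against y[:j]; row/column 0 are borders.
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            r[i][j] = cost + soft_min([r[i - 1][j - 1], r[i - 1][j], r[i][j - 1]], gamma)
    return r[n][m]


same = soft_dtw([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], gamma=0.01)     # close to 0
shifted = soft_dtw([0.0, 1.0, 2.0], [1.0, 2.0, 3.0], gamma=0.01)  # close to hard DTW = 2
```

As `gamma` shrinks the value approaches hard DTW from below; larger `gamma` gives smoother gradients at the cost of a looser approximation.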

arXiv:1911.11134

Rigging the Lottery: Making All Tickets Winners

Authors: Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

Abstract: Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the sparse network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50, MobileNets on ImageNet-2012, and RNNs on WikiText-103. Finally, we provide some insights into why allowing the topology to change during the optimization can overcome local minima encountered when the topology remains static. Code used in our work can be found in github.com/google-research/rigl.

Submitted 23 July, 2021; v1 submitted 25 November, 2019; originally announced November 2019.

Comments: Published in Proceedings of the 37th International Conference on Machine Learning. Code can be found in github.com/google-research/rigl

Journal ref: Proceedings of the 37th International Conference on Machine Learning (2020) 471-481
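One connectivity update as described in the abstract (magnitudes decide what to drop, gradients decide what to grow) can be sketched on flat lists. Assumptions: the drop fraction is supplied by the caller, growth is restricted to connections that were inactive before the update, and newly grown weights start at zero. `rigl_update` is a hypothetical name, not the API of the linked repository.

```python
from typing import List, Tuple


def rigl_update(weights: List[float], mask: List[int], grads: List[float],
                drop_fraction: float) -> Tuple[List[float], List[int]]:
    """Drop the smallest-|w| active connections, grow the largest-|grad| inactive ones."""
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    k = min(int(drop_fraction * len(active)), len(inactive))
    drop = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Dense gradients are only needed at these infrequent update steps.
    grow = sorted(inactive, key=lambda i: -abs(grads[i]))[:k]
    new_w, new_mask = list(weights), list(mask)
    for i in drop:
        new_w[i], new_mask[i] = 0.0, 0
    for i in grow:
        new_w[i], new_mask[i] = 0.0, 1  # grown connections are initialised to zero
    return new_w, new_mask


w = [0.5, -0.1, 0.0, 0.8, 0.0, 0.0]
m = [1, 1, 0, 1, 0, 0]
g = [0.0, 0.2, 0.9, 0.1, 0.3, 0.05]
new_w, new_m = rigl_update(w, m, g, drop_fraction=1 / 3)
```

Because exactly `k` connections are dropped and `k` grown, the parameter count (and hence the FLOP budget) stays fixed across updates.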

arXiv:1911.09723

Fast Sparse ConvNets

Authors: Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan

Abstract: Historically, the pursuit of efficient inference has been one of the driving forces behind research into new deep learning architectures and building blocks. Some recent examples include: the squeeze-and-excitation module, depthwise separable convolutions in Xception, and the inverted bottleneck in MobileNet v2. Notably, in all of these cases, the resulting building blocks enabled not only higher efficiency, but also higher accuracy, and found wide adoption in the field. In this work, we further expand the arsenal of efficient building blocks for neural network architectures; but instead of combining standard primitives (such as convolution), we advocate for the replacement of these dense primitives with their sparse counterparts. While the idea of using sparsity to decrease the parameter count is not new, the conventional wisdom is that this reduction in theoretical FLOPs does not translate into real-world efficiency gains. We aim to correct this misconception by introducing a family of efficient sparse kernels for ARM and WebAssembly, which we open-source for the benefit of the community as part of the XNNPACK library. Equipped with our efficient implementation of sparse primitives, we show that sparse versions of MobileNet v1, MobileNet v2 and EfficientNet architectures substantially outperform strong dense baselines on the efficiency-accuracy curve. On Snapdragon 835 our sparse networks outperform their dense equivalents by $1.3-2.4\times$, equivalent to approximately one entire generation of MobileNet-family improvement.
We hope that our findings will facilitate wider adoption of sparsity as a tool for creating efficient and accurate deep learning architectures.

Submitted 21 November, 2019; originally announced November 2019.

arXiv:1909.11646

High Fidelity Speech Synthesis with Adversarial Networks

Authors: Mikołaj Bińkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan

Abstract: Generative adversarial networks have seen rapid development in recent years and have led to remarkable improvements in generative modelling of images. However, their application in the audio domain has received limited attention, and autoregressive models, such as WaveNet, remain the state of the art in generative modelling of audio signals such as human speech. To address this paucity, we introduce GAN-TTS, a Generative Adversarial Network for Text-to-Speech. Our architecture is composed of a conditional feed-forward generator producing raw speech audio, and an ensemble of discriminators which operate on random windows of different sizes. The discriminators analyse the audio both in terms of general realism, as well as how well the audio corresponds to the utterance that should be pronounced. To measure the performance of GAN-TTS, we employ both subjective human evaluation (MOS - Mean Opinion Score), as well as novel quantitative metrics (Fréchet DeepSpeech Distance and Kernel DeepSpeech Distance), which we find to be well correlated with MOS. We show that GAN-TTS is capable of generating high-fidelity speech with naturalness comparable to the state-of-the-art models, and unlike autoregressive models, it is highly parallelisable thanks to an efficient feed-forward generator. Listen to GAN-TTS reading this abstract at https://storage.googleapis.com/deepmind-media/research/abstract.wav.

Submitted 26 September, 2019; v1 submitted 25 September, 2019; originally announced September 2019.

arXiv:1906.10732

The Difficulty of Training Sparse Neural Networks

Authors: Utku Evci, Fabian Pedregosa, Aidan Gomez, Erich Elsen

Abstract: We investigate the difficulties of training sparse neural networks and make new observations about optimization dynamics and the energy landscape within the sparse regime. Recent work by Gale et al. (2019) and Liu et al. (2018) has shown that sparse ResNet-50 architectures trained on the ImageNet-2012 dataset converge to solutions that are significantly worse than those found by pruning. We show that, despite the failure of optimizers, there is a linear path with a monotonically decreasing objective from the initialization to the "good" solution. Additionally, our attempts to find a decreasing objective path from "bad" solutions to the "good" ones in the sparse subspace fail. However, if we allow the path to traverse the dense subspace, then we consistently find a path between two solutions. These findings suggest traversing extra dimensions may be needed to escape stationary points found in the sparse subspace.

Submitted 7 October, 2020; v1 submitted 25 June, 2019; originally announced June 2019.

Comments: sparse networks, pruning, energy landscape, sparsity
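The linear-path observation amounts to evaluating the loss at $\theta(\alpha) = (1-\alpha)\,\theta_0 + \alpha\,\theta_1$ for $\alpha \in [0, 1]$ and checking that it never increases. The sketch below does this for a hypothetical two-parameter convex loss standing in for a real network's objective.

```python
from typing import Callable, List

Params = List[float]


def interpolate(theta0: Params, theta1: Params, alpha: float) -> Params:
    """Point on the straight line from theta0 (alpha=0) to theta1 (alpha=1)."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(theta0, theta1)]


def loss_along_path(loss_fn: Callable[[Params], float], theta0: Params,
                    theta1: Params, steps: int = 10) -> List[float]:
    return [loss_fn(interpolate(theta0, theta1, s / steps)) for s in range(steps + 1)]


def is_monotonically_decreasing(values: List[float], tol: float = 0.0) -> bool:
    return all(b <= a + tol for a, b in zip(values, values[1:]))


def toy_loss(theta: Params) -> float:
    """Hypothetical convex loss with its minimum at (1, -2)."""
    return (theta[0] - 1) ** 2 + (theta[1] + 2) ** 2


init, solution = [3.0, 3.0], [1.0, -2.0]
losses = loss_along_path(toy_loss, init, solution)
```

For a real sparse network, `theta0` and `theta1` would be the initialization and the pruned solution restricted to the same mask, so every point on the path stays in the sparse subspace.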

arXiv:1906.03139

Non-Differentiable Supervised Learning with Evolution Strategies and Hybrid Methods

Authors: Karel Lenc, Erich Elsen, Tom Schaul, Karen Simonyan

Abstract: In this work we show that Evolution Strategies (ES) are a viable method for learning non-differentiable parameters of large supervised models. ES are black-box optimization algorithms that estimate distributions of model parameters; however, they have only been used for relatively small problems so far. We show that it is possible to scale ES to more complex tasks and models with millions of parameters. While using ES for differentiable parameters is computationally impractical (although possible), we show that a hybrid approach is practically feasible in the case where the model has both differentiable and non-differentiable parameters. In this approach we use standard gradient-based methods for learning differentiable weights, while using ES for learning non-differentiable parameters: in our case, sparsity masks of the weights. This proposed method is surprisingly competitive, and when parallelized over multiple devices has only negligible training time overhead compared to training with gradient descent. Additionally, this method allows sparse models to be trained from the first training step, so they can be much larger than when using methods that require training dense models first. We present results and analysis of supervised feed-forward models (such as MNIST and CIFAR-10 classification), as well as recurrent models, such as SparseWaveRNN for text-to-speech.

Submitted 7 June, 2019; originally announced June 2019.
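A minimal sketch of an Evolution Strategies update with antithetic Gaussian perturbations, which estimates the gradient of a Gaussian-smoothed objective using only function evaluations. The toy reward (negative L1 distance to a hypothetical target) stands in for the paper's setting, where ES learns sparsity masks; the hyperparameters are illustrative assumptions.

```python
import random
from typing import Callable, List


def es_gradient(f: Callable[[List[float]], float], theta: List[float],
                sigma: float, num_pairs: int, rng: random.Random) -> List[float]:
    """Antithetic ES estimate of the gradient of E[f(theta + sigma * eps)]."""
    grad = [0.0] * len(theta)
    for _ in range(num_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        f_plus = f([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = f([t - sigma * e for t, e in zip(theta, eps)])
        for i, e in enumerate(eps):
            grad[i] += (f_plus - f_minus) * e / (2.0 * sigma * num_pairs)
    return grad


TARGET = [1.0, -1.0, 0.5]  # hypothetical optimum


def reward(theta: List[float]) -> float:
    """Toy objective: negative L1 distance (not differentiable at the target)."""
    return -sum(abs(t - c) for t, c in zip(theta, TARGET))


rng = random.Random(0)
theta = [0.0, 0.0, 0.0]
initial = reward(theta)
for _ in range(200):
    g = es_gradient(reward, theta, sigma=0.1, num_pairs=20, rng=rng)
    theta = [t + 0.05 * gi for t, gi in zip(theta, g)]  # gradient ascent on the reward
final = reward(theta)
```

Each perturbation pair only needs two independent forward evaluations, which is why the abstract notes that ES parallelises well across devices.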

arXiv:1902.09574

The State of Sparsity in Deep Neural Networks

Authors: Trevor Gale, Erich Elsen, Sara Hooker

Abstract: We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet. Across thousands of experiments, we demonstrate that complex techniques (Molchanov et al., 2017; Louizos et al., 2017b) shown to yield high compression rates on smaller datasets perform inconsistently, and that simple magnitude pruning approaches achieve comparable or better results. Additionally, we replicate the experiments performed by (Frankle & Carbin, 2018) and (Liu et al., 2018) at scale and show that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization. Together, these results highlight the need for large-scale benchmarks in the field of model compression. We open-source our code, top performing model checkpoints, and results of all hyperparameter configurations to establish rigorous baselines for future work on compression and sparsification.

Submitted 25 February, 2019; originally announced February 2019.
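Magnitude pruning, the simple baseline the abstract finds competitive, zeroes the smallest-magnitude weights. The sketch below pairs it with a cubic sparsity ramp, one commonly used gradual schedule; treat the schedule and both helper names as assumptions rather than the exact configuration used in the study.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    n_prune = int(round(sparsity * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def cubic_sparsity(step: int, s_init: float, s_final: float, begin: int, end: int) -> float:
    """Hypothetical gradual schedule: sparsity ramps from s_init to s_final over [begin, end]."""
    if step <= begin:
        return s_init
    if step >= end:
        return s_final
    frac = (step - begin) / (end - begin)
    return s_final + (s_init - s_final) * (1.0 - frac) ** 3


w = [0.3, -0.05, 0.8, 0.01, -0.6, 0.2]
print(magnitude_prune(w, 0.5))  # [0.3, 0.0, 0.8, 0.0, -0.6, 0.0]
```

During training one would call `magnitude_prune` every few steps with the sparsity returned by the schedule, so the network is sparsified jointly with optimization rather than pruned once at the end.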

arXiv:1810.12247

Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

Authors: Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, Douglas Eck

Abstract: Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling structure at many different timescales. Fortunately, most music is also highly structured and can be represented as discrete note events played on musical instruments. Herein, we show that by using notes as an intermediate representation, we can train a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude (~0.1 ms to ~100 s), a process we call Wave2Midi2Wave. This large advance in the state of the art is enabled by our release of the new MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) dataset, composed of over 172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. The networks and the dataset together present a promising approach toward creating new expressive and interpretable neural models of music.

Submitted 17 January, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

Comments: Examples available at https://goo.gl/magenta/maestro-examples

arXiv:1802.08435

Efficient Neural Audio Synthesis

Authors: Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron van den Oord, Sander Dieleman, Koray Kavukcuoglu

Abstract: Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating high-quality samples. Efficient sampling for this class of models has however remained an elusive problem. With a focus on text-to-speech synthesis, we describe a set of general techniques for reducing sampling time while maintaining high output quality. We first describe a single-layer recurrent neural network, the WaveRNN, with a dual softmax layer that matches the quality of the state-of-the-art WaveNet model. The compact form of the network makes it possible to generate 24kHz 16-bit audio 4x faster than real time on a GPU. Second, we apply a weight pruning technique to reduce the number of weights in the WaveRNN. We find that, for a constant number of parameters, large sparse networks perform better than small dense networks and this relationship holds for sparsity levels beyond 96%. The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time. Finally, we propose a new generation scheme based on subscaling that folds a long sequence into a batch of shorter sequences and allows one to generate multiple samples at once. The Subscale WaveRNN produces 16 samples per step without loss of quality and offers an orthogonal method for increasing sampling efficiency.

Submitted 25 June, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

Comments: 10 pages
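Two of the ideas above reduce to simple bookkeeping. The dual softmax predicts a 16-bit sample as a coarse part (high 8 bits) and a fine part (low 8 bits), and subscaling folds a length-L sequence into B interleaved sub-sequences. This sketch assumes samples are already offset to the unsigned range [0, 65535] and that L is divisible by B; it omits the networks and the dependency ordering that Subscale WaveRNN uses between sub-sequences.

```python
from typing import List, Tuple


def split_16bit(sample: int) -> Tuple[int, int]:
    """Split an unsigned 16-bit sample into (coarse, fine) 8-bit parts."""
    if not 0 <= sample < 2 ** 16:
        raise ValueError("expected an unsigned 16-bit sample")
    return sample >> 8, sample & 0xFF


def join_16bit(coarse: int, fine: int) -> int:
    return (coarse << 8) | fine


def subscale_fold(seq: List[int], batch: int) -> List[List[int]]:
    """Sub-sequence j holds samples j, j + batch, j + 2*batch, ..."""
    if len(seq) % batch != 0:
        raise ValueError("sequence length must be divisible by batch")
    return [seq[j::batch] for j in range(batch)]


def subscale_unfold(subseqs: List[List[int]]) -> List[int]:
    batch = len(subseqs)
    out = [0] * (batch * len(subseqs[0]))
    for j, sub in enumerate(subseqs):
        out[j::batch] = sub  # put sub-sequence j back at positions j, j + batch, ...
    return out


coarse, fine = split_16bit(0xABCD)         # (171, 205)
folded = subscale_fold(list(range(8)), 4)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Two 256-way softmaxes replace a single 65536-way output, and generating one sample from each of the B sub-sequences per step is what lets the model emit several samples at once.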

arXiv:1711.10433

Parallel WaveNet: Fast High-Fidelity Speech Synthesis

Authors: Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis

Abstract: The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today's massively parallel computers, and therefore hard to deploy in a real-time production setting. This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. The resulting system is capable of generating high-fidelity speech samples at more than 20 times faster than real-time, and is deployed online by Google Assistant, including serving multiple English and Japanese voices.

Submitted 28 November, 2017; originally announced November 2017.

arXiv:1710.11153

Onsets and Frames: Dual-Objective Piano Transcription

Authors: Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, Douglas Eck

Abstract: We advance the state of the art in polyphonic piano music transcription by using a deep convolutional and recurrent neural network which is trained to jointly predict onsets and frames. Our model predicts pitch onset events and then uses those predictions to condition framewise pitch predictions. During inference, we restrict the predictions from the framewise detector by not allowing a new note to start unless the onset detector also agrees that an onset for that pitch is present in the frame. We focus on improving onsets and offsets together instead of either in isolation as we believe this correlates better with human musical perception. Our approach results in over a 100% relative improvement in note F1 score (with offsets) on the MAPS dataset. Furthermore, we extend the model to predict relative velocities of normalized audio which results in more natural-sounding transcriptions.

Submitted 5 June, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

Comments: Examples available at https://goo.gl/magenta/onsets-frames-examples
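The inference rule described above (no new note unless the onset detector agrees) can be sketched as a per-pitch decoder over thresholded probabilities. The 0.5 threshold, the `[num_frames][num_pitches]` layout, and `decode_notes` are assumptions for illustration, not the released Magenta implementation.

```python
from typing import List, Tuple


def decode_notes(onset_probs: List[List[float]], frame_probs: List[List[float]],
                 threshold: float = 0.5) -> List[Tuple[int, int, int]]:
    """Return (pitch, start_frame, end_frame) notes, with end_frame exclusive."""
    num_frames, num_pitches = len(frame_probs), len(frame_probs[0])
    notes = []
    for p in range(num_pitches):
        start = None
        for t in range(num_frames):
            onset = onset_probs[t][p] >= threshold
            frame = frame_probs[t][p] >= threshold
            if start is None:
                if onset:  # a note may only begin where the onset detector agrees
                    start = t
            elif not (frame or onset):  # the note ends when the frame detector stops
                notes.append((p, start, t))
                start = None
        if start is not None:
            notes.append((p, start, num_frames))
    return notes


frames = [[0.9], [0.9], [0.9], [0.1], [0.8], [0.8]]
onsets = [[0.1], [0.8], [0.2], [0.1], [0.1], [0.1]]
notes = decode_notes(onsets, frames)  # [(0, 1, 3)]
```

In this example the frame detector fires at frames 0 and 4 to 5 without an onset, and those activations are suppressed; only the onset-supported note from frame 1 to 3 survives.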

arXiv:1710.03740

Mixed Precision Training

Authors: Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu

Abstract: Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models also increase. We introduce a technique to train deep neural networks using half-precision floating-point numbers. In our technique, weights, activations and gradients are stored in IEEE half-precision format. Half-precision floating-point numbers have limited numerical range compared to single-precision numbers. We propose two techniques to handle this loss of information. Firstly, we recommend maintaining a single-precision copy of the weights that accumulates the gradients after each optimizer step. This single-precision copy is rounded to half-precision format during training. Secondly, we propose scaling the loss appropriately to handle the loss of information with half-precision gradients. We demonstrate that this approach works for a wide variety of models including convolutional neural networks, recurrent neural networks and generative adversarial networks. This technique works for large-scale models with more than 100 million parameters trained on large datasets. Using this approach, we can reduce the memory consumption of deep learning models by nearly 2x. In future processors, we can also expect a significant computation speedup using half-precision hardware units.

Submitted 15 February, 2018; v1 submitted 10 October, 2017; originally announced October 2017.

Comments: Published as a conference paper at ICLR 2018
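Both techniques can be demonstrated numerically with Python's `struct` module, whose `'e'` format packs IEEE 754 half-precision values. This is a sketch: Python floats (double precision) stand in for the FP32 master copy, the gradient is scaled directly rather than by backpropagating a scaled loss (the chain rule makes the two equivalent up to rounding), and the scale factor 1024 is an arbitrary choice.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# 1) Loss scaling: a tiny gradient underflows in fp16 unless it is scaled first.
grad = 1e-8
unscaled_fp16 = to_fp16(grad)        # 0.0: below half the smallest fp16 subnormal
scale = 1024.0                       # hypothetical static loss scale
scaled_fp16 = to_fp16(grad * scale)  # survives as an fp16 subnormal
recovered = scaled_fp16 / scale      # unscale in higher precision before the update

# 2) Master weights: small updates vanish if accumulated directly in fp16.
update = 1e-4
w_fp16 = 1.0
w_master = 1.0  # Python float (fp64) stands in for the fp32 master copy
for _ in range(10):
    w_fp16 = to_fp16(w_fp16 + update)  # 1.0001 rounds back to 1.0 every step
    w_master += update
w_from_master = to_fp16(w_master)      # about 1.001 rounds to 1.0009765625
```

Near 1.0 the fp16 spacing is $2^{-10} \approx 0.000977$, so any single update smaller than half of that is rounded away; the higher-precision master copy lets many such updates accumulate until they become visible after rounding.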

arXiv:1709.07251

Determination of the strong coupling constant $\alpha_s(M_Z)$ in next-to-next-to-leading order QCD using H1 jet cross section measurements

Authors: H1 collaboration, V. Andreev, A. Baghdasaryan, K. Begzsuren, A. Belousov, V. Bertone, A. Bolz, V. Boudry, G. Brandt, V. Brisson, D. Britzger, A. Buniatyan, A. Bylinkin, L. Bystritskaya, A. J. Campbell, K. B. Cantun Avila, K. Cerny, V. Chekelian, J. G. Contreras, J. Cvach, J. Currie, J. B. Dainton, K. Daum, C. Diaconu, M. Dobre , et al. (123 additional authors not shown)

Abstract: The strong coupling constant $\alpha_s(M_Z)$ is determined from inclusive jet and dijet cross sections in neutral-current deep-inelastic $ep$ scattering (DIS) measured at HERA by the H1 collaboration using next-to-next-to-leading order (NNLO) QCD predictions. The dependence of the NNLO predictions and of the resulting value of $\alpha_s(M_Z)$ at the $Z$-boson mass $M_Z$ on the choice of the renormalisation and factorisation scales is studied. Using inclusive jet and dijet data together, the strong coupling constant is determined to be $\alpha_s(M_Z)=0.1166\,(19)_{\rm exp}\,(24)_{\rm th}$. In a complementary approach, $\alpha_s(M_Z)$ is determined together with parton distribution functions of the proton (PDFs) from jet and inclusive DIS data measured by the H1 experiment. The value $\alpha_s(M_Z)=0.1147\,(25)_{\rm tot}$ obtained is consistent with the determination from jet data alone. The impact of the jet data on the PDFs is studied. The running of the strong coupling is tested at different values of the renormalisation scale and the results are found to be in agreement with expectations.

Submitted 16 June, 2021; v1 submitted 21 September, 2017; originally announced September 2017.

Comments: 45 pages, 17 figures, with changes discussed in an erratum submitted to EPJ C

Report number: DESY17-137
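For intuition about the running of the coupling, the one-loop solution of the renormalisation group equation is $\alpha_s(\mu) = \alpha_s(M_Z) / [1 + b_0\,\alpha_s(M_Z) \ln(\mu^2/M_Z^2)]$ with $b_0 = (33 - 2n_f)/(12\pi)$. The sketch below evaluates it starting from the jet-only value quoted above, assuming a fixed $n_f = 5$ and $M_Z = 91.1876$ GeV; it ignores flavour thresholds and higher orders and is not the NNLO treatment used in the analysis.

```python
import math


def alpha_s_one_loop(mu: float, alpha_mz: float = 0.1166,
                     m_z: float = 91.1876, n_f: int = 5) -> float:
    """One-loop running coupling at scale mu (GeV), fixed number of flavours."""
    b0 = (33 - 2 * n_f) / (12 * math.pi)
    return alpha_mz / (1 + b0 * alpha_mz * math.log(mu ** 2 / m_z ** 2))


for mu in (10.0, 91.1876, 1000.0):
    print(f"mu = {mu:8.2f} GeV  alpha_s = {alpha_s_one_loop(mu):.4f}")
```

The coupling grows towards low scales and shrinks at high scales (asymptotic freedom), which is the qualitative behaviour the scale-dependent test in the abstract probes.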

arXiv:1705.08863

doi 10.1016/j.physletb.2017.11.002

Running of the Charm-Quark Mass from HERA Deep-Inelastic Scattering Data

Authors: A. Gizhko, A. Geiser, S. Moch, I. Abt, O. Behnke, A. Bertolin, J. Blümlein, D. Britzger, R. Brugnera, A. Buniatyan, P. J. Bussey, R. Carlin, A. M. Cooper-Sarkar, K. Daum, S. Dusini, E. Elsen, L. Favart, J. Feltesse, B. Foster, A. Garfagnini, M. Garzelli, J. Gayler, D. Haidt, J. Hladky, A. W. Jung, et al. (25 additional authors not shown)

Abstract: Combined HERA data on charm production in deep-inelastic scattering have previously been used to determine the charm-quark running mass $m_c(m_c)$ in the MSbar renormalisation scheme. Here, the same data are used as a function of the photon virtuality $Q^2$ to evaluate the charm-quark running mass at different scales to one-loop order, in the context of a next-to-leading order QCD analysis. The scale dependence of the mass is found to be consistent with QCD expectations.

Submitted 24 May, 2017; originally announced May 2017.

Comments: 12 pages, 4 figures

Report number: DESY-17-048

arXiv:1704.05119 [pdf, other]

Exploring Sparsity in Recurrent Neural Networks

Authors: Sharan Narang, Erich Elsen, Gregory Diamos, Shubho Sengupta

Abstract: Recurrent Neural Networks (RNNs) are widely used to solve a variety of problems, and as the quantity of data and the amount of available compute have increased, so have model sizes. The number of parameters in recent state-of-the-art networks makes them hard to deploy, especially on mobile phones and embedded devices. The challenge is due to both the size of the model and the time it takes to evaluate it. In order to deploy these RNNs efficiently, we propose a technique to reduce the parameters of a network by pruning weights during the initial training of the network. At the end of training, the parameters of the network are sparse while accuracy is still close to the original dense neural network. The network size is reduced by 8x and the time required to train the model remains constant. Additionally, we can prune a larger dense network to achieve better than baseline performance while still reducing the total number of parameters significantly. Pruning RNNs reduces the size of the model and can also help achieve significant inference time speed-up using sparse matrix multiply. Benchmarks show that, using our technique, model size can be reduced by 90% and the speed-up is around 2x to 7x.

Submitted 6 November, 2017; v1 submitted 17 April, 2017; originally announced April 2017.

Comments: Published as a conference paper at ICLR 2017

arXiv:1607.04381 [pdf, other]

DSD: Dense-Sparse-Dense Training for Deep Neural Networks

Authors: Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally

Abstract: Modern deep neural networks have a large number of parameters, making them very hard to train. We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance. In the first D (Dense) step, we train a dense network to learn connection weights and importance. In the S (Sparse) step, we regularize the network by pruning the unimportant connections with small weights and retraining the network given the sparsity constraint. In the final D (re-Dense) step, we increase the model capacity by removing the sparsity constraint, re-initialize the pruned parameters from zero and retrain the whole dense network. Experiments show that DSD training can improve the performance for a wide range of CNNs, RNNs and LSTMs on the tasks of image classification, caption generation and speech recognition. On ImageNet, DSD improved the Top1 accuracy of GoogLeNet by 1.1%, VGG-16 by 4.3%, ResNet-18 by 1.2% and ResNet-50 by 1.1%, respectively. On the WSJ'93 dataset, DSD improved DeepSpeech and DeepSpeech2 WER by 2.0% and 1.1%. On the Flickr-8K dataset, DSD improved the NeuralTalk BLEU score by over 1.7. DSD is easy to use in practice: at training time, DSD incurs only one extra hyper-parameter: the sparsity ratio in the S step. At testing time, DSD doesn't change the network architecture or incur any inference overhead.
The consistent and significant performance gain of DSD experiments shows the inadequacy of the current training methods for finding the best local optimum, while DSD effectively achieves superior optimization performance for finding a better solution. DSD models are available to download at https://songhan.github.io/DSD.

Submitted 21 February, 2017; v1 submitted 15 July, 2016; originally announced July 2016.

Comments: Published as a conference paper at ICLR 2017

arXiv:1512.02595 [pdf, other]

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Authors: Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, et al. (9 additional authors not shown)

Abstract: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, resulting in a 7x speedup over our previous system. Because of this efficiency, experiments that previously took weeks now run in days. This enables us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

Submitted 8 December, 2015; originally announced December 2015.

arXiv:1511.09032 [pdf, other]

doi 10.1016/j.nima.2015.12.050

Path to AWAKE: Evolution of the concept

Authors: A. Caldwell, E. Adli, L. Amorim, R. Apsimon, T. Argyropoulos, R. Assmann, A. -M. Bachmann, F. Batsch, J. Bauche, V. K. Berglyd Olsen, M. Bernardini, R. Bingham, B. Biskup, T. Bohl, C. Bracco, P. N. Burrows, G. Burt, B. Buttenschon, A. Butterworth, M. Cascella, S. Chattopadhyay, E. Chevallay, S. Cipiccia, H. Damerau, L. Deacon, et al. (96 additional authors not shown)

Abstract: This report describes the conceptual steps in reaching the design of the AWAKE experiment currently under construction at CERN. We start with an introduction to plasma wakefield acceleration and the motivation for using proton drivers. We then describe the self-modulation instability, a key to an early realization of the concept. This is then followed by the historical development of the experimental design, where the critical issues that arose and their solutions are described. We conclude with the design of the experiment as it is being realized at CERN and some words on the future outlook. A summary of the AWAKE design and construction status as presented at this conference is given in [1].

Submitted 29 November, 2015; originally announced November 2015.

Comments: 15 pages, 24 figures, 1 table, 111 references, 121 authors from 36 organizations

arXiv:1508.03192 [pdf]

doi 10.1016/j.nima.2015.10.005

The FLASHForward Facility at DESY

Authors: A. Aschikhin, C. Behrens, S. Bohlen, J. Dale, N. Delbos, L. di Lucchio, E. Elsen, J. -H. Erbe, M. Felber, B. Foster, L. Goldberg, J. Grebenyuk, J. -N. Gruse, B. Hidding, Zhanghu Hu, S. Karstensen, A. Knetsch, O. Kononenko, V. Libov, K. Ludwig, A. R. Maier, A. Martinez de la Ossa, T. Mehrling, C. A. J. Palmer, F. Pannek, et al. (13 additional authors not shown)

Abstract: The FLASHForward project at DESY is a pioneering plasma-wakefield acceleration experiment that aims to produce, in a few centimetres of ionised hydrogen, beams with energy of order GeV that are of quality sufficient to be used in a free-electron laser. The plasma wave will be driven by high-current density electron beams from the FLASH linear accelerator and will explore both external and internal witness-beam injection techniques. The plasma is created by ionising a gas in a gas cell with a multi-TW laser system, which can also be used to provide optical diagnostics of the plasma and electron beams due to the <30 fs synchronisation between the laser and the driving electron beam. The operation parameters of the experiment are discussed, as well as the scientific program.

Submitted 18 August, 2015; v1 submitted 13 August, 2015; originally announced August 2015.

Comments: 19 pages, 9 figures

Report number: DESY 15-143

arXiv:1412.5567 [pdf, other]

Deep Speech: Scaling up end-to-end speech recognition

Authors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, Andrew Y. Ng

Abstract: We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.

Submitted 19 December, 2014; v1 submitted 17 December, 2014; originally announced December 2014.

arXiv:1306.6353 [pdf]

The International Linear Collider Technical Design Report - Volume 3.I: Accelerator R&D in the Technical Design Phase

Authors: Chris Adolphsen, Maura Barone, Barry Barish, Karsten Buesser, Philip Burrows, John Carwardine, Jeffrey Clark, Hélène Mainaud Durand, Gerry Dugan, Eckhard Elsen, Atsushi Enomoto, Brian Foster, Shigeki Fukuda, Wei Gai, Martin Gastal, Rongli Geng, Camille Ginsburg, Susanna Guiducci, Mike Harrison, Hitoshi Hayano, Keith Kershaw, Kiyoshi Kubo, Victor Kuchler, Benno List, Wanming Liu, et al. (19 additional authors not shown)

Abstract: The International Linear Collider Technical Design Report (TDR) describes in four volumes the physics case and the design of a 500 GeV centre-of-mass energy linear electron-positron collider based on superconducting radio-frequency technology using Niobium cavities as the accelerating structures. The accelerator can be extended to 1 TeV and also run as a Higgs factory at around 250 GeV and on the Z0 pole. A comprehensive value estimate of the accelerator is given, together with associated uncertainties. It is shown that no significant technical issues remain to be solved. Once a site is selected and the necessary site-dependent engineering is carried out, construction can begin immediately. The TDR also gives baseline documentation for two high-performance detectors that can share the ILC luminosity by being moved into and out of the beam line in a "push-pull" configuration. These detectors, ILD and SiD, are described in detail. They form the basis for a world-class experimental programme that promises to increase significantly our understanding of the fundamental processes that govern the evolution of the Universe.

Submitted 26 June, 2013; originally announced June 2013.

Comments: See also http://www.linearcollider.org/ILC/TDR . The full list of signatories is inside the Report

Report number: ILC-REPORT-2013-040; ANL-HEP-TR-13-20; BNL-100603-2013-IR; IRFU-13-59; CERN-ATS-2013-037; Cockcroft-13-10; CLNS 13/2085; DESY 13-062; FERMILAB TM-2554; IHEP-AC-ILC-2013-001; INFN-13-04/LNF; JAI-2013-001; JINR E9-2013-35; JLAB-R-2013-01; KEK Report 2013-1; KNU/CHEP-ILC-2013-1; LLNL-TR-635539; SLAC-R-1004; ILC-HiGrade-Report-2013-003

arXiv:1306.6328 [pdf]

The International Linear Collider Technical Design Report - Volume 3.II: Accelerator Baseline Design

Authors: Chris Adolphsen, Maura Barone, Barry Barish, Karsten Buesser, Philip Burrows, John Carwardine, Jeffrey Clark, Hélène Mainaud Durand, Gerry Dugan, Eckhard Elsen, Atsushi Enomoto, Brian Foster, Shigeki Fukuda, Wei Gai, Martin Gastal, Rongli Geng, Camille Ginsburg, Susanna Guiducci, Mike Harrison, Hitoshi Hayano, Keith Kershaw, Kiyoshi Kubo, Victor Kuchler, Benno List, Wanming Liu, et al. (19 additional authors not shown)

Abstract: The International Linear Collider Technical Design Report (TDR) describes in four volumes the physics case and the design of a 500 GeV centre-of-mass energy linear electron-positron collider based on superconducting radio-frequency technology using Niobium cavities as the accelerating structures. The accelerator can be extended to 1 TeV and also run as a Higgs factory at around 250 GeV and on the Z0 pole. A comprehensive value estimate of the accelerator is given, together with associated uncertainties. It is shown that no significant technical issues remain to be solved. Once a site is selected and the necessary site-dependent engineering is carried out, construction can begin immediately. The TDR also gives baseline documentation for two high-performance detectors that can share the ILC luminosity by being moved into and out of the beam line in a "push-pull" configuration. These detectors, ILD and SiD, are described in detail. They form the basis for a world-class experimental programme that promises to increase significantly our understanding of the fundamental processes that govern the evolution of the Universe.

Submitted 26 June, 2013; originally announced June 2013.

Comments: See also http://www.linearcollider.org/ILC/TDR . The full list of signatories is inside the Report

Report number: ILC-REPORT-2013-040; ANL-HEP-TR-13-20; BNL-100603-2013-IR; IRFU-13-59; CERN-ATS-2013-037; Cockcroft-13-10; CLNS 13/2085; DESY 13-062; FERMILAB TM-2554; IHEP-AC-ILC-2013-001; INFN-13-04/LNF; JAI-2013-001; JINR E9-2013-35; JLAB-R-2013-01; KEK Report 2013-1; KNU/CHEP-ILC-2013-1; LLNL-TR-635539; SLAC-R-1004; ILC-HiGrade-Report-2013-003

arXiv:1207.1334 [pdf]

doi 10.1103/PhysRevSTAB.13.042801

Present status and first results of the final focus beam line at the KEK Accelerator Test Facility

Authors: P. Bambade, M. Alabau Pons, J. Amann, D. Angal-Kalinin, R. Apsimon, S. Araki, A. Aryshev, S. Bai, P. Bellomo, D. Bett, G. Blair, B. Bolzon, S. Boogert, G. Boorman, P. N. Burrows, G. Christian, P. Coe, B. Constance, Jean-Pierre Delahaye, L. Deacon, E. Elsen, A. Faus-Golfe, M. Fukuda, J. Gao, N. Geffroy, et al. (69 additional authors not shown)

Abstract: ATF2 is a final-focus test beam line which aims to focus the low emittance beam from the ATF damping ring to a vertical size of about 37 nm and to demonstrate nanometer level beam stability. Several advanced beam diagnostics and feedback tools are used. In December 2008, construction and installation were completed and beam commissioning started, supported by an international team of Asian, European, and U.S. scientists. The present status and first results are described.

Submitted 5 July, 2012; originally announced July 2012.

Comments: 10 pp

Report number: FERMILAB-PUB-10-290-AD

Journal ref: Phys.Rev.ST Accel.Beams 13 (2010) 042801

arXiv:0806.0529 [pdf]

doi 10.1016/j.nima.2008.05.054

Simulation study of fast ion instability in the ILC damping ring and PETRA III

Authors: G. Xia, K. Ohmi, E. Elsen

Abstract: The fast ion instability is simulated for different gas pressures and fill patterns for the damping ring of the International Linear Collider (ILC) and for PETRA III. Beam size variation due to changes in the beta function and dispersion function is taken into account. Feedback is also applied in the simulation.

Submitted 3 June, 2008; originally announced June 2008.

Comments: 11 pages, 2 tables, 16 figures

Journal ref: Nucl.Instrum.Meth.A593:183-187,2008

arXiv:0709.2248 [pdf, ps, other]

Update on Ion Studies

Authors: Guoxing Xia, Eckhard Elsen

Abstract: The effect of ions has received one of the highest priorities in R&D for the damping rings of the International Linear Collider (ILC), since ions are detrimental to the performance of the electron damping ring. In this note, an update on the ion studies for the ILC damping ring is given. We investigate the role of gaps and of irregular fill patterns in the ring. The ion density reduction for different fill patterns is calculated analytically. Simulation results are also presented.

Submitted 14 September, 2007; originally announced September 2007.

Comments: 6 pages, 9 figures, 1 table. For the proceedings of the LCWS07 Workshop

Journal ref: ECONF C0705302:DR002,2007

arXiv:0706.3060 [pdf, ps, other]

N-Body Simulations on GPUs

Authors: Erich Elsen, V. Vishal, Mike Houston, Vijay Pande, Pat Hanrahan, Eric Darve

Abstract: Commercial graphics processors (GPUs) have high compute capacity at very low cost, which makes them attractive for general purpose scientific computing. In this paper we show how graphics processors can be used for N-body simulations to obtain improvements in performance over current generation CPUs. We have developed a highly optimized algorithm for performing the O(N^2) force calculations that constitute the major part of stellar and molecular dynamics simulations. In some of the calculations, we achieve sustained performance of nearly 100 GFlops on an ATI X1900XTX. The performance on GPUs is comparable to specialized processors such as GRAPE-6A and MDGRAPE-3, but at a fraction of the cost. Furthermore, the wide availability of GPUs has significant implications for cluster computing and distributed computing efforts like Folding@Home.

Submitted 20 June, 2007; originally announced June 2007.

arXiv:hep-ph/0201085 [pdf, ps, other]

doi 10.1142/9789812778048_0067

Introduction to Diffraction and low x Dynamics

Authors: E. Elsen

Abstract: An attempt is made to illustrate the relation between low x processes and diffraction. ep scattering provides a unique laboratory, a single hadronic target probed by a point-like lepton, where one can try to understand diffraction in terms of a colourless exchange in QCD. Low x processes eventually involve aspects of QCD which cannot be described perturbatively. The HERA inclusive measurements are examined in this respect and compared to results of the Tevatron.

Submitted 10 January, 2002; originally announced January 2002.

Comments: 8 pages, 7 figures in eps. Talk given at the XXXI International Symposium on Multiparticle Dynamics, Sep. 1-7, 2001, Datong, China. See http://ismd31.ccnu.edu.cn/

arXiv:hep-ex/0104010 [pdf, ps, other]

doi 10.1109/23.958765

A Fast High Resolution Track Trigger for the H1 Experiment

Authors: A. Baird, E. Elsen, Y. H. Fleming, M. Kolander, S. Kolya, D. Meer, D. Mercer, J. Naumann, P. R. Newman, D. Sankey, A. Schoening, H. -C. Schultz-Coulon, Ch. Wissing

Abstract: After 2001 the upgraded ep collider HERA will provide about five times higher luminosity for the two experiments H1 and ZEUS. In order to cope with the expected higher event rates, the H1 collaboration is building a track-based trigger system, the Fast Track Trigger (FTT). It will be integrated into the first three levels (L1-L3) of the H1 trigger scheme to provide higher selectivity for events with charged particles. The FTT will allow the reconstruction of 3-dimensional tracks in the central drift chamber down to 100 MeV/c within the L2 latency of ~23 μs. To reach the necessary momentum resolution of ~5% (at 1 GeV/c), sophisticated reconstruction algorithms have to be implemented using high-density Field Programmable Gate Arrays (FPGAs) and their embedded Content Addressable Memories (CAMs). The final track parameter optimization will be done using non-iterative fits implemented in DSPs. While rough track information will be provided at the first trigger level, high-resolution tracks are available at L2 to form trigger decisions on topological and other track-based criteria such as multiplicities and momenta. At the third trigger level, a farm of commercial processor boards will be used to compute physics quantities such as invariant masses.

Submitted 6 April, 2001; originally announced April 2001.

Comments: 6 pages, 7 figures, submitted to TNS

Journal ref: IEEE Trans.Nucl.Sci.48:1276-1285,2001

arXiv:hep-ph/9610251 [pdf, ps, other]

Electroweak Physics at HERA: Introduction and Summary

Authors: R. J. Cashmore, E. Elsen, B. A. Kniehl, H. Spiesberger

Abstract: A high luminosity upgrade of HERA will allow the measurement of standard model parameters and the neutral current couplings of quarks. These results will have to be consistent with other precision measurements or indicate traces of new physics. The analysis of $W$ production will complement future results of LEP 2 and the Tevatron. We summarize the main results and conclusions obtained by the working group on Electroweak Physics concerning the potential of future experimentation at HERA.

Submitted 4 October, 1996; originally announced October 1996.

Comments: 12 pages (Latex), 4 figures (Postscript), to appear in the Proceedings of the Workshop on Future Physics at HERA. The complete report by the working group on electroweak physics at HERA is available from http://www.desy.de/~heraws96/proceedings

Report number: MPI/PhT/96-105

Showing 1–47 of 47 results for author: Elsen, E