Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference
Authors:
Benjamin Hawks,
Javier Duarte,
Nicholas J. Fraser,
Alessandro Pappalardo,
Nhan Tran,
Yaman Umuroglu
Abstract:
Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular techniques for reducing computation in neural networks are pruning (removing insignificant synapses) and quantization (reducing the precision of the calculations). In this work, we explore the interplay between pruning and quantization during the training of neural networks for ultra-low-latency applications targeting high energy physics use cases. Techniques developed for this study have potential applications across many other domains. We study various configurations of pruning during quantization-aware training, which we term quantization-aware pruning, and the effect of techniques like regularization, batch normalization, and different pruning schemes on performance, computational complexity, and information content metrics. We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task. Further, quantization-aware pruning typically performs similarly to or better than other neural architecture search techniques like Bayesian optimization in terms of computational efficiency. Surprisingly, while networks with different training configurations can have similar performance for the benchmark application, the information content in the network can vary significantly, affecting its generalizability.
Submitted 19 July, 2021; v1 submitted 22 February, 2021;
originally announced February 2021.
Experimental setup and procedure for the measurement of the 7Be(n,α)α reaction at n_TOF
Authors:
L. Cosentino,
A. Musumarra,
M. Barbagallo,
A. Pappalardo,
N. Colonna,
L. Damone,
M. Piscopo,
P. Finocchiaro,
E. Maugeri,
S. Heinitz,
D. Schumann,
R. Dressler,
N. Kivel,
O. Aberle,
J. Andrzejewski,
L. Audouin,
M. Ayranov,
M. Bacak,
S. Barros,
J. Balibrea-Correa,
V. Beecares,
F. Becvar,
C. Beinrucker,
E. Berthoumieux,
J. Billowes,
et al. (107 additional authors not shown)
Abstract:
The newly built second experimental area, EAR2, of the n_TOF spallation neutron source at CERN makes it possible to perform (n, charged particle) experiments on short-lived, highly radioactive targets. This paper describes a detection apparatus and the experimental procedure for determining the cross-section of the 7Be(n,α) reaction, which is one of the focal points toward the solution of the cosmological lithium abundance problem and whose only measurement, at thermal energy, dates back to 1963. The seemingly insurmountable experimental difficulties stemming from the huge 7Be γ-activity, along with the lack of a suitable neutron beam facility, had so far prevented further measurements. The detection system is subject to considerable radiation damage but is capable of disentangling the rare reaction signals from the very high background. This newly developed setup could also prove useful for studying other challenging reactions that require the detectors to be installed directly in the neutron beam.
Submitted 1 April, 2016;
originally announced April 2016.