-
Using Cooperative Game Theory to Prune Neural Networks
Authors:
Mauricio Diaz-Ortiz Jr,
Benjamin Kempinski,
Daphne Cornelisse,
Yoram Bachrach,
Tal Kachman
Abstract:
We show how solution concepts from cooperative game theory can be used to tackle the problem of pruning neural networks.
The ever-growing size of deep neural networks (DNNs) increases their performance, but also their computational requirements. We introduce a method called Game Theory Assisted Pruning (GTAP), which reduces the neural network's size while preserving its predictive accuracy. GTAP is based on eliminating neurons in the network based on an estimation of their joint impact on the prediction quality through game theoretic solutions. Specifically, we use a power index akin to the Shapley value or Banzhaf index, tailored using a procedure similar to Dropout (commonly used to tackle overfitting problems in machine learning).
Empirical evaluation of both feedforward networks and convolutional neural networks shows that this method outperforms existing approaches in the achieved tradeoff between the number of parameters and model accuracy.
Submitted 17 November, 2023;
originally announced November 2023.
-
Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees
Authors:
Alexia Jolicoeur-Martineau,
Kilian Fatras,
Tal Kachman
Abstract:
Tabular data is hard to acquire and is subject to missing values. This paper introduces a novel approach for generating and imputing mixed-type (continuous and categorical) tabular data utilizing score-based diffusion and conditional flow matching. In contrast to prior methods that rely on neural networks to learn the score function or the vector field, we adopt XGBoost, a widely used Gradient-Boosted Tree (GBT) technique. To test our method, we build one of the most extensive benchmarks for tabular data generation and imputation, containing 27 diverse datasets and 9 metrics. Through empirical evaluation across the benchmark, we demonstrate that our approach outperforms deep-learning generation methods in data generation tasks and remains competitive in data imputation. Notably, it can be trained in parallel using CPUs without requiring a GPU. Our Python and R code is available at https://github.com/SamsungSAILMontreal/ForestDiffusion.
Submitted 19 February, 2024; v1 submitted 18 September, 2023;
originally announced September 2023.
-
Explainability Techniques for Chemical Language Models
Authors:
Stefan Hödl,
William Robinson,
Yoram Bachrach,
Wilhelm Huck,
Tal Kachman
Abstract:
Explainability techniques are crucial in gaining insights into the reasons behind the predictions of deep learning models, which have not yet been applied to chemical language models. We propose an explainable AI technique that attributes the importance of individual atoms towards the predictions made by these models. Our method backpropagates the relevance information towards the chemical input string and visualizes the importance of individual atoms. We focus on self-attention Transformers operating on molecular string representations and leverage a pretrained encoder for finetuning. We showcase the method by predicting and visualizing solubility in water and organic solvents. We achieve competitive model performance while obtaining interpretable predictions, which we use to inspect the pretrained model.
Submitted 25 May, 2023;
originally announced May 2023.
-
Diffusion models with location-scale noise
Authors:
Alexia Jolicoeur-Martineau,
Kilian Fatras,
Ke Li,
Tal Kachman
Abstract:
Diffusion Models (DMs) are powerful generative models that add Gaussian noise to the data and learn to remove it. We wanted to determine which noise distribution (Gaussian or non-Gaussian) led to better generated data in DMs. Since DMs do not work by design with non-Gaussian noise, we built a framework that allows reversing a diffusion process with non-Gaussian location-scale noise. We use that framework to show that the Gaussian distribution performs the best over a wide range of other distributions (Laplace, Uniform, t, Generalized-Gaussian).
Submitted 12 April, 2023;
originally announced April 2023.
-
Charting the Topography of the Neural Network Landscape with Thermal-Like Noise
Authors:
Theo Jules,
Gal Brener,
Tal Kachman,
Noam Levi,
Yohai Bar-Sinai
Abstract:
The training of neural networks is a complex, high-dimensional, non-convex and noisy optimization problem whose theoretical understanding is interesting both from an applicative perspective and for fundamental reasons. A core challenge is to understand the geometry and topography of the landscape that guides the optimization. In this work, we employ standard Statistical Mechanics methods, namely, phase-space exploration using Langevin dynamics, to study this landscape for an over-parameterized fully connected network performing a classification task on random data. Analyzing the fluctuation statistics, in analogy to thermal dynamics at a constant temperature, we infer a clear geometric description of the low-loss region. We find that it is a low-dimensional manifold whose dimension can be readily obtained from the fluctuations. Furthermore, this dimension is controlled by the number of data points that reside near the classification decision boundary. Importantly, we find that a quadratic approximation of the loss near the minimum is fundamentally inadequate due to the exponential nature of the decision boundary and the flatness of the low-loss region. This causes the dynamics to sample regions with higher curvature at higher temperatures, while producing quadratic-like statistics at any given temperature. We explain this behavior by a simplified loss model which is analytically tractable and reproduces the observed fluctuation statistics.
Submitted 18 April, 2023; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation
Authors:
Yannick Hogewind,
Thiago D. Simao,
Tal Kachman,
Nils Jansen
Abstract:
We address the problem of safe reinforcement learning from pixel observations. Inherent challenges in such settings are (1) a trade-off between reward optimization and adhering to safety constraints, (2) partial observability, and (3) high-dimensional observations. We formalize the problem in a constrained, partially observable Markov decision process framework, where an agent obtains distinct reward and safety signals. To address the curse of dimensionality, we employ a novel safety critic using the stochastic latent actor-critic (SLAC) approach. The latent variable model predicts rewards and safety violations, and we use the safety critic to train safe policies. Using well-known benchmark environments, we demonstrate competitive performance over existing approaches with respect to computational requirements, final reward return, and satisfying the safety constraints.
Submitted 2 October, 2022;
originally announced October 2022.
-
Neural Payoff Machines: Predicting Fair and Stable Payoff Allocations Among Team Members
Authors:
Daphne Cornelisse,
Thomas Rood,
Mateusz Malinowski,
Yoram Bachrach,
Tal Kachman
Abstract:
In many multi-agent settings, participants can form teams to achieve collective outcomes that may far surpass their individual capabilities. Measuring the relative contributions of agents and allocating them shares of the reward that promote long-lasting cooperation are difficult tasks. Cooperative game theory offers solution concepts identifying distribution schemes, such as the Shapley value, that fairly reflect the contribution of individuals to the performance of the team or the Core, which reduces the incentive of agents to abandon their team. Applications of such methods include identifying influential features and sharing the costs of joint ventures or team formation. Unfortunately, using these solutions requires tackling a computational barrier as they are hard to compute, even in restricted settings. In this work, we show how cooperative game-theoretic solutions can be distilled into a learned model by training neural networks to propose fair and stable payoff allocations. We show that our approach creates models that can generalize to games far from the training distribution and can predict solutions for more players than observed during training. An important application of our framework is Explainable AI: our approach can be used to speed up Shapley value computations on many instances.
Submitted 18 August, 2022;
originally announced August 2022.
-
Lyapunov Exponents for Diversity in Differentiable Games
Authors:
Jonathan Lorraine,
Paul Vicol,
Jack Parker-Holder,
Tal Kachman,
Luke Metz,
Jakob Foerster
Abstract:
Ridge Rider (RR) is an algorithm for finding diverse solutions to optimization problems by following eigenvectors of the Hessian ("ridges"). RR is designed for conservative gradient systems (i.e., settings involving a single loss function), where it branches at saddles - easy-to-find bifurcation points. We generalize this idea to non-conservative, multi-agent gradient systems by proposing a method - denoted Generalized Ridge Rider (GRR) - for finding arbitrary bifurcation points. We give theoretical motivation for our method by leveraging machinery from the field of dynamical systems. We construct novel toy problems where we can visualize new phenomena while giving insight into high-dimensional problems of interest. Finally, we empirically evaluate our method by finding diverse solutions in the iterated prisoners' dilemma and relevant machine learning problems including generative adversarial networks.
Submitted 24 December, 2021;
originally announced December 2021.
-
Gradients are Not All You Need
Authors:
Luke Metz,
C. Daniel Freeman,
Samuel S. Schoenholz,
Tal Kachman
Abstract:
Differentiable programming techniques are widely used in the community and are responsible for the machine learning renaissance of the past several decades. While these methods are powerful, they have limits. In this short report, we discuss a common chaos based failure mode which appears in a variety of differentiable circumstances, ranging from recurrent neural networks and numerical physics simulation to training learned optimizers. We trace this failure to the spectrum of the Jacobian of the system under study, and provide criteria for when a practitioner might expect this failure to spoil their differentiation based optimization algorithms.
Submitted 20 January, 2022; v1 submitted 10 November, 2021;
originally announced November 2021.
-
Anomalous Diffusion: Fractional Brownian Motion vs. Fractional Ito Motion
Authors:
Iddo Eliazar,
Tal Kachman
Abstract:
Generalizing Brownian motion (BM), fractional Brownian motion (FBM) is a paradigmatic selfsimilar model for anomalous diffusion. Specifically, varying its Hurst exponent, FBM spans: sub-diffusion, regular diffusion, and super-diffusion. As BM, also FBM is a symmetric and Gaussian process, with a continuous trajectory, and with a stationary velocity. In contrast to BM, FBM is neither a Markov process nor a martingale, and its velocity is correlated. Based on a recent study of selfsimilar Ito diffusions, we explore an alternative selfsimilar model for anomalous diffusion: fractional Ito motion (FIM). The FIM model exhibits the same Hurst-exponent behavior as FBM, and it is also a symmetric process with a continuous trajectory. In sharp contrast to FBM, we show that FIM: is not a Gaussian process; is a Markov process; is a martingale; and its velocity is not stationary and is not correlated. On the one hand, FBM is hard to simulate, its analytic tractability is limited, and it generates only a Gaussian dissipation pattern. On the other hand, FIM is easy to simulate, it is analytically tractable, and it generates non-Gaussian dissipation patterns. Moreover, we show that FIM has an intimate linkage to diffusion in a logarithmic potential. With its compelling properties, FIM offers researchers and practitioners a highly workable analytic model for anomalous diffusion.
Submitted 11 November, 2021; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Gotta Go Fast When Generating Data with Score-Based Models
Authors:
Alexia Jolicoeur-Martineau,
Ke Li,
Rémi Piché-Taillefer,
Tal Kachman,
Ioannis Mitliagkas
Abstract:
Score-based (denoising diffusion) generative models have recently gained a lot of success in generating realistic and diverse data. These approaches define a forward diffusion process for transforming data to noise and generate data by reversing it (thereby going from noise to data). Unfortunately, current score-based models generate data very slowly due to the sheer number of score network evaluations required by numerical SDE solvers.
In this work, we aim to accelerate this process by devising a more efficient SDE solver. Existing approaches rely on the Euler-Maruyama (EM) solver, which uses a fixed step size. We found that naively replacing it with other SDE solvers fares poorly - they either result in low-quality samples or become slower than EM. To get around this issue, we carefully devise an SDE solver with adaptive step sizes tailored to score-based generative models piece by piece. Our solver requires only two score function evaluations, rarely rejects samples, and leads to high-quality samples. Our approach generates data 2 to 10 times faster than EM while achieving better or equal sample quality. For high-resolution images, our method leads to significantly higher quality samples than all other methods tested. Our SDE solver has the benefit of requiring no step size tuning.
Submitted 28 May, 2021;
originally announced May 2021.
-
Novel Uncertainty Framework for Deep Learning Ensembles
Authors:
Tal Kachman,
Michal Moshkovitz,
Michal Rosen-Zvi
Abstract:
Deep neural networks have become the default choice for many machine learning tasks such as classification and regression. Dropout, a method commonly used to improve the convergence of deep neural networks, generates an ensemble of thinned networks with extensive weight sharing. Recent studies show that dropout can be viewed as approximate variational inference in Gaussian processes and used as a practical tool to obtain uncertainty estimates of the network. We propose a novel statistical-mechanics-based framework for dropout and use this framework to propose a new generic algorithm that focuses on estimates of the variance of the loss as measured by the ensemble of thinned networks. Our approach can be applied to a wide range of deep neural network architectures and machine learning tasks. In classification, this algorithm allows a don't-know answer to be generated, which can increase the reliability of the classifier. Empirically we demonstrate state-of-the-art AUC results on publicly available benchmarks.
Submitted 9 April, 2019;
originally announced April 2019.
-
Learning multiple non-mutually-exclusive tasks for improved classification of inherently ordered labels
Authors:
Vadim Ratner,
Yoel Shoshan,
Tal Kachman
Abstract:
Medical image classification involves thresholding of labels that represent malignancy risk levels. Usually, a task defines a single threshold, and when developing computer-aided diagnosis tools, a single network is trained per such threshold, e.g. as screening out healthy (very low risk) patients to leave possibly sick ones for further analysis (low threshold), or trying to find malignant cases among those marked as non-risk by the radiologist ("second reading", high threshold). We propose a way to rephrase the classification problem in a manner that yields several problems (corresponding to different thresholds) to be solved simultaneously. This allows the use of Multiple Task Learning (MTL) methods, significantly improving the performance of the original classifier, by facilitating effective extraction of information from existing data.
Submitted 21 November, 2018; v1 submitted 30 May, 2018;
originally announced May 2018.
-
Numerical implementation of the multiscale and averaging methods for quasi periodic systems
Authors:
Tal Kachman,
Shmuel Fishman,
Avy Soffer
Abstract:
We consider the problem of numerically solving the Schrödinger equation with a potential that is quasi periodic in space and time. We introduce a numerical scheme based on a newly developed multi-time scale and averaging technique. We demonstrate that with this novel method we can solve such an equation for long times efficiently and with rigorous control of the error. A comparison with the standard split-step method shows substantial improvement in computation times, besides the controlled errors. We apply this method to a free particle driven by a quasi-periodic potential with many frequencies. The new method makes it possible to evolve the Schrödinger equation for times much longer than was possible so far and to conclude that there are regimes where the energy growth stops in spite of the driving.
Submitted 25 July, 2016; v1 submitted 13 January, 2015;
originally announced January 2015.
-
Dynamics of a Classical Particle in a Quasi Periodic Potential
Authors:
Yaniv Tenenbaum Katan,
Tal Kachman,
Shmuel Fishman,
Avy Soffer
Abstract:
We study the dynamics of a one-dimensional classical particle in a space and time dependent potential with randomly chosen parameters. The focus of this work is a quasi-periodic potential, which only includes a finite number of Fourier components. The momentum is calculated analytically for short time within a self-consistent approximation, under certain conditions.
We find that the dynamics can be described by a model of a random walk between the Chirikov resonances, which are resonances between the particle momentum and the Fourier components of the potential. We use numerical methods to test these results and to evaluate the important properties, such as the characteristic hopping time between the resonances. This work sheds light on the short time dynamics induced by potentials which are relevant for optics and atom optics.
Submitted 4 January, 2015; v1 submitted 22 May, 2014;
originally announced May 2014.
-
Computer vision-based recognition of liquid surfaces and phase boundaries in transparent vessels, with emphasis on chemistry applications
Authors:
Sagi Eppel,
Tal Kachman
Abstract:
The ability to recognize the liquid surface and the liquid level in transparent containers is perhaps the most commonly used evaluation method when dealing with fluids. Such recognition is essential in determining the liquid volume, fill level, phase boundaries and phase separation in various fluid systems. The recognition of liquid surfaces is particularly important in solution chemistry, where it is essential to many laboratory techniques (e.g., extraction, distillation, titration). A general method for the recognition of interfaces between liquid and air or between phase-separating liquids could have a wide range of applications and contribute to the understanding of the visual properties of such interfaces. This work examines a computer vision method for the recognition of liquid surfaces and liquid levels in various transparent containers. The method can be applied to recognition of both liquid-air and liquid-liquid surfaces. No prior knowledge of the number of phases is required. The method receives the image of the liquid container and the boundaries of the container in the image and scans all possible curves that could correspond to the outlines of liquid surfaces in the image. The method then compares each curve to the image to rate its correspondence with the outline of the real liquid surface by examining various image properties in the area surrounding each point of the curve. The image properties that were found to give the best indication of the liquid surface are the relative intensity change, the edge density change and the gradient direction relative to the curve normal.
Submitted 6 November, 2014; v1 submitted 28 April, 2014;
originally announced April 2014.