Search | arXiv e-print repository

Showing 1–16 of 16 results for author: Vicol, P

  1. arXiv:2309.17400  [pdf, other]

    cs.CV cs.LG

    Directly Fine-Tuning Diffusion Models on Differentiable Rewards

    Authors: Kevin Clark, Paul Vicol, Kevin Swersky, David J Fleet

    Abstract: We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-tuning diffusion models to maximize differentiable reward functions, such as scores from human preference models. We first show that it is possible to backpropagate the reward function gradient through the full sampling procedure, and that doing so achieves strong performance on a variety of rewards, outperforming…

    Submitted 21 June, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Published at ICLR 2024

  2. arXiv:2304.11153  [pdf, other]

    cs.LG cs.NE stat.ML

    Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single

    Authors: Paul Vicol, Zico Kolter, Kevin Swersky

    Abstract: We propose an evolution strategies-based algorithm for estimating gradients in unrolled computation graphs, called ES-Single. Similarly to the recently-proposed Persistent Evolution Strategies (PES), ES-Single is unbiased, and overcomes chaos arising from recursive function applications by smoothing the meta-loss landscape. ES-Single samples a single perturbation per particle, that is kept fixed o…

    Submitted 21 April, 2023; originally announced April 2023.

  3. arXiv:2212.14032  [pdf, other]

    cs.LG

    On Implicit Bias in Overparameterized Bilevel Optimization

    Authors: Paul Vicol, Jonathan Lorraine, Fabian Pedregosa, David Duvenaud, Roger Grosse

    Abstract: Many problems in machine learning involve bilevel optimization (BLO), including hyperparameter optimization, meta-learning, and dataset distillation. Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively. In practice, often at least one of these sub-problems is overparameterized. In this case, there are many ways to choose among optima that achieve…

    Submitted 28 December, 2022; originally announced December 2022.

    Comments: ICML 2022

  4. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  5. arXiv:2203.00089  [pdf, other]

    cs.LG math.OC stat.ML

    Amortized Proximal Optimization

    Authors: Juhan Bae, Paul Vicol, Jeff Z. HaoChen, Roger Grosse

    Abstract: We propose a framework for online meta-optimization of parameters that govern optimization, called Amortized Proximal Optimization (APO). We first interpret various existing neural network optimizers as approximate stochastic proximal point methods which trade off the current-batch loss with proximity terms in both function space and weight space. The idea behind APO is to amortize the minimizatio…

    Submitted 28 February, 2022; originally announced March 2022.

    Comments: 37 pages, 30 figures

  6. arXiv:2112.14754  [pdf, other]

    cs.LG cs.CV stat.ML

    Disentanglement and Generalization Under Correlation Shifts

    Authors: Christina M. Funke, Paul Vicol, Kuan-Chieh Wang, Matthias Kümmerer, Richard Zemel, Matthias Bethge

    Abstract: Correlations between factors of variation are prevalent in real-world data. Exploiting such correlations may increase predictive performance on noisy data; however, often correlations are not robust (e.g., they may change between domains, datasets, or applications) and models that exploit them do not generalize when correlations shift. Disentanglement methods aim to learn representations which cap…

    Submitted 23 December, 2022; v1 submitted 29 December, 2021; originally announced December 2021.

    Comments: CoLLAs 2022

  7. arXiv:2112.14570  [pdf, other]

    cs.GT cs.LG cs.MA

    Lyapunov Exponents for Diversity in Differentiable Games

    Authors: Jonathan Lorraine, Paul Vicol, Jack Parker-Holder, Tal Kachman, Luke Metz, Jakob Foerster

    Abstract: Ridge Rider (RR) is an algorithm for finding diverse solutions to optimization problems by following eigenvectors of the Hessian ("ridges"). RR is designed for conservative gradient systems (i.e., settings involving a single loss function), where it branches at saddles - easy-to-find bifurcation points. We generalize this idea to non-conservative, multi-agent gradient systems by proposing a method…

    Submitted 24 December, 2021; originally announced December 2021.

    Comments: AAMAS2022, 24 pages

  8. arXiv:2112.13835  [pdf, other]

    cs.LG stat.ML

    Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies

    Authors: Paul Vicol, Luke Metz, Jascha Sohl-Dickstein

    Abstract: Unrolled computation graphs arise in many scenarios, including training RNNs, tuning hyperparameters through unrolled optimization, and training learned optimizers. Current approaches to optimizing parameters in such computation graphs suffer from high variance gradients, bias, slow updates, or large memory usage. We introduce a method called Persistent Evolution Strategies (PES), which divides th…

    Submitted 27 December, 2021; originally announced December 2021.

    Comments: ICML 2021

  9. arXiv:2102.08431  [pdf, other]

    cs.LG cs.GT

    Complex Momentum for Optimization in Games

    Authors: Jonathan Lorraine, David Acuna, Paul Vicol, David Duvenaud

    Abstract: We generalize gradient descent with momentum for optimization in differentiable games to have complex-valued momentum. We give theoretical motivation for our method by proving convergence on bilinear zero-sum games for simultaneous and alternating updates. Our method gives real-valued parameter updates, making it a drop-in replacement for standard optimizers. We empirically demonstrate that comple…

    Submitted 1 June, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

  10. arXiv:2006.09347  [pdf, other]

    cs.LG stat.ML

    Understanding and Mitigating Exploding Inverses in Invertible Neural Networks

    Authors: Jens Behrmann, Paul Vicol, Kuan-Chieh Wang, Roger Grosse, Jörn-Henrik Jacobsen

    Abstract: Invertible neural networks (INNs) have been used to design generative models, implement memory-saving gradient computation, and solve inverse problems. In this work, we show that commonly-used INN architectures suffer from exploding inverses and are thus prone to becoming numerically non-invertible. Across a wide range of INN use-cases, we reveal failures including the non-applicability of the cha…

    Submitted 24 December, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: AISTATS 2021

  11. arXiv:1911.02590  [pdf, other]

    cs.LG stat.ML

    Optimizing Millions of Hyperparameters by Implicit Differentiation

    Authors: Jonathan Lorraine, Paul Vicol, David Duvenaud

    Abstract: We propose an algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations. We present results about the relationship between the IFT and differentiating through optimization, motivating our algorithm. We use the proposed approach to train modern network architectures with millions of weights an…

    Submitted 6 November, 2019; originally announced November 2019.

    Comments: Submitted to AISTATS 2020

  12. arXiv:1903.03088  [pdf, other]

    cs.LG stat.ML

    Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions

    Authors: Matthew MacKay, Paul Vicol, Jon Lorraine, David Duvenaud, Roger Grosse

    Abstract: Hyperparameter optimization can be formulated as a bilevel optimization problem, where the optimal parameters on the training set depend on the hyperparameters. We aim to adapt regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases. We show how to construct scalable best-response a…

    Submitted 7 March, 2019; originally announced March 2019.

    Comments: Published as a conference paper at ICLR 2019

  13. arXiv:1810.10999  [pdf, other]

    cs.LG stat.ML

    Reversible Recurrent Neural Networks

    Authors: Matthew MacKay, Paul Vicol, Jimmy Ba, Roger Grosse

    Abstract: Recurrent neural networks (RNNs) provide state-of-the-art performance in processing sequential data but are memory intensive to train, limiting the flexibility of RNN models which can be trained. Reversible RNNs---RNNs for which the hidden-to-hidden transition can be reversed---offer a path to reduce the memory requirements of training, as hidden states need not be stored and instead can be recomp…

    Submitted 25 October, 2018; originally announced October 2018.

    Comments: Published as a conference paper at NIPS 2018

  14. arXiv:1806.10317  [pdf, other]

    cs.LG stat.ML

    Adversarial Distillation of Bayesian Neural Network Posteriors

    Authors: Kuan-Chieh Wang, Paul Vicol, James Lucas, Li Gu, Roger Grosse, Richard Zemel

    Abstract: Bayesian neural networks (BNNs) allow us to reason about uncertainty in a principled way. Stochastic Gradient Langevin Dynamics (SGLD) enables efficient BNN learning by drawing samples from the BNN posterior using mini-batches. However, SGLD and its extensions require storage of many copies of the model parameters, a potentially prohibitive cost, especially for large neural networks. We propose a…

    Submitted 27 June, 2018; originally announced June 2018.

    Comments: accepted at ICML 2018

  15. arXiv:1803.04386  [pdf, other]

    cs.LG stat.ML

    Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches

    Authors: Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse

    Abstract: Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies. Unfortunately, due to the large number of weights, all the examples in a mini-batch typically share the same weight perturbation, thereby limiting the variance reduction effect of large mini-batches. We introduce flipout,…

    Submitted 2 April, 2018; v1 submitted 12 March, 2018; originally announced March 2018.

    Comments: Published as a conference paper at ICLR 2018

  16. arXiv:1712.06761  [pdf, other]

    cs.CV

    MovieGraphs: Towards Understanding Human-Centric Situations from Videos

    Authors: Paul Vicol, Makarand Tapaswi, Lluis Castrejon, Sanja Fidler

    Abstract: There is growing interest in artificial intelligence to build socially intelligent robots. This requires machines to have the ability to "read" people's emotions, motivations, and other factors that affect behavior. Towards this goal, we introduce a novel dataset called MovieGraphs which provides detailed, graph-based annotations of social situations depicted in movie clips. Each graph consists of…

    Submitted 15 April, 2018; v1 submitted 18 December, 2017; originally announced December 2017.

    Comments: Spotlight at CVPR 2018. Webpage: http://moviegraphs.cs.toronto.edu