Search | arXiv e-print repository

Showing 1–27 of 27 results for author: Rae, J W

  1. arXiv:2403.05530 [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805 [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2306.04348 [pdf, other]

    cond-mat.mes-hall

    Non-Hermitian Topological Magnonics

    Authors: Tao Yu, Ji Zou, Bowen Zeng, J. W. Rao, Ke Xia

    Abstract: Dissipation in mechanics, optics, acoustics, and electronic circuits is nowadays recognized to be not always detrimental but can be exploited to achieve non-Hermitian topological phases or properties with functionalities for potential device applications. As elementary excitations of ordered magnetic moments that exist in various magnetic materials, magnons are the information carriers in magnonic…

    Submitted 9 November, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: 101 pages, 35 figures

  4. arXiv:2304.11948 [pdf]

    cond-mat.mes-hall

    Perspective on non-Hermitian physics in magnetic systems

    Authors: Tao Yu, J. W. Rao

    Abstract: A perspective on non-Hermitian physics in magnetic systems is addressed in this short article, including exceptional points, exceptional nodal phases, the non-Hermitian SSH model, and the non-Hermitian skin effect.

    Submitted 23 August, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: 5 pages. Submitted as a section of Magnonic Roadmap 2024

  5. arXiv:2302.08904 [pdf, other]

    cond-mat.mes-hall cond-mat.mtrl-sci nlin.AO physics.optics

    Coherent Microwave Emission of a Gain-Driven Polariton

    Authors: Bimu Yao, Y. S. Gui, J. W. Rao, Y. H. Zhang, Wei Lu, C. -M. Hu

    Abstract: By developing a gain-embedded cavity magnonics platform, we create gain-driven polariton (GDP) that is activated by an amplified electromagnetic field. Distinct effects of gain-driven light-matter interaction, such as polariton auto-oscillations, polariton phase singularity, self-selection of a polariton bright mode, and gain-induced magnon-photon synchronization, are theoretically studied and exp…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: 6 pages, 4 figures

  6. arXiv:2204.04590 [pdf, other]

    cond-mat.mes-hall cond-mat.other

    Unveiling a Pump-Induced Magnon Mode via its Strong Interaction with Walker Modes

    Authors: J. W. Rao, Bimu Yao, C. Y. Wang, C. Zhang, Tao Yu, Wei Lu

    Abstract: We observe a power-dependent anticrossing of Walker spin-wave modes under microwave pumping when a ferrimagnet is placed in a microwave waveguide that does not support any discrete photon mode. We interpret this unexpected anticrossing as the generation of a pump-induced magnon mode that couples strongly to the Walker modes of the ferrimagnet. This anticrossing inherits an excellent tunability fro…

    Submitted 5 August, 2023; v1 submitted 9 April, 2022; originally announced April 2022.

    Comments: 7 pages, 4 figures

  7. arXiv:2203.15556 [pdf, other]

    cs.CL cs.LG

    Training Compute-Optimal Large Language Models

    Authors: Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre

    Abstract: We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion…

    Submitted 29 March, 2022; originally announced March 2022.

  8. arXiv:2112.11446 [pdf, other]

    cs.CL cs.AI

    Scaling Language Models: Methods, Analysis & Insights from Training Gopher

    Authors: Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, et al. (55 additional authors not shown)

    Abstract: Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gop…

    Submitted 21 January, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: 120 pages

  9. arXiv:2112.04426 [pdf, other]

    cs.CL cs.LG

    Improving language models by retrieving from trillions of tokens

    Authors: Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, et al. (3 additional authors not shown)

    Abstract: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\times$ fewer parameters. After fine-tuning, RETRO performance translates to d…

    Submitted 7 February, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: Fix incorrect reported numbers in Table 14

  10. arXiv:2106.03517 [pdf, other]

    cs.LG stat.ML

    Top-KAST: Top-K Always Sparse Training

    Authors: Siddhant M. Jayakumar, Razvan Pascanu, Jack W. Rae, Simon Osindero, Erich Elsen

    Abstract: Sparse neural networks are becoming increasingly important as the field seeks to improve the performance of existing models by scaling them up, while simultaneously trying to reduce power consumption and computational footprint. Unfortunately, most existing methods for inducing performant sparse models still entail the instantiation of dense parameters, or dense gradients in the backward-pass, dur…

    Submitted 7 June, 2021; originally announced June 2021.

    Journal ref: Advances in Neural Information Processing Systems, 33, 20744-20754

  11. arXiv:2009.03950 [pdf, other]

    cond-mat.mes-hall quant-ph

    Unconventional Singularity in Anti-Parity-Time Symmetric Cavity Magnonics

    Authors: Y. Yang, Yi-Pu Wang, J. W. Rao, Y. S. Gui, B. M. Yao, W. Lu, C. -M. Hu

    Abstract: By engineering an anti-parity-time (anti-PT) symmetric cavity magnonics system with precise eigenspace controllability, we observe two different singularities in the same system. One type of singularity, the exceptional point (EP), is produced by tuning the magnon damping. Between two EPs, the maximal coherent superposition of photon and magnon states is robustly sustained by the preserved anti-PT…

    Submitted 8 September, 2020; originally announced September 2020.

    Comments: 6 pages, 4 figures

    Journal ref: Phys. Rev. Lett. 125, 147202 (2020)

  12. arXiv:2007.03356 [pdf, other]

    cs.LG cs.CL stat.ML

    Do Transformers Need Deep Long-Range Memory

    Authors: Jack W. Rae, Ali Razavi

    Abstract: Deep attention models have advanced the modelling of sequential data across many domains. For language modelling in particular, the Transformer-XL -- a Transformer augmented with a long-range memory of past activations -- has been shown to be state-of-the-art across a variety of well-studied benchmarks. The Transformer-XL incorporates a long-range memory at every layer of the network, which render…

    Submitted 7 July, 2020; originally announced July 2020.

    Comments: Published at the 58th Annual Meeting of the Association for Computational Linguistics. 6 pages, 4 figures, 1 table

  13. Electrical detection of magnon-photon interaction via auxiliary spin wave mode

    Authors: Peng-Chao Xu, J. W. Rao, Y. Wang, Y. S. Gui, John Q. Xiao, Xiaofeng Jin, C. -M. Hu

    Abstract: We report on the electrical detection of a hybrid magnon-photon system, which is comprised of a magnetic sample coupled to a planar cavity. While the uniform Kittel mode has the largest coupling strength among all the magnon modes, it only generates a modest voltage signal by means of inverse spin-Hall effect. We have found that the generated voltage can be significantly enhanced by introducing a…

    Submitted 8 May, 2020; originally announced May 2020.

    Journal ref: Phys. Rev. B 102, 014453 (2020)

  14. arXiv:1912.05478 [pdf, other]

    cond-mat.mes-hall quant-ph

    Travelling photons mediated interactions between a magnon mode and a cavity photon mode

    Authors: J. W. Rao, Y. P. Wang, Y. Yang, T. Yu, Y. S. Gui, X. L. Fan, D. S. Xue, C. -M. Hu

    Abstract: We systematically study the indirect interaction between a magnon mode and a cavity photon mode mediated by travelling photons of a waveguide. From a general Hamiltonian, we derive the effective coupling strength between two separated modes, and obtain the theoretical expression of system's transmission. Accordingly, we design an experimental set-up consisting of a shield cavity photon mode, micro…

    Submitted 11 December, 2019; originally announced December 2019.

    Comments: 6 pages and 4 figures

    Journal ref: Phys. Rev. B 101, 064404 (2020)

  15. arXiv:1911.05507 [pdf, other]

    cs.LG stat.ML

    Compressive Transformers for Long-Range Sequence Modelling

    Authors: Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap

    Abstract: We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results in the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory me…

    Submitted 13 November, 2019; originally announced November 2019.

    Comments: 19 pages, 6 figures, 10 tables

  16. arXiv:1910.06764 [pdf, other]

    cs.LG cs.AI stat.ML

    Stabilizing Transformers for Reinforcement Learning

    Authors: Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew M. Botvinick, Nicolas Heess, Raia Hadsell

    Abstract: Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in natural language processing (NLP), achieving state-of-the-art results in domains such as language modeling and machine translation. Harnessing the transformer's ability to process long time horizons o…

    Submitted 13 October, 2019; originally announced October 2019.

  17. arXiv:1910.02720 [pdf, other]

    stat.ML cs.LG cs.NE

    Meta-Learning Deep Energy-Based Memory Models

    Authors: Sergey Bartunov, Jack W Rae, Simon Osindero, Timothy P Lillicrap

    Abstract: We study the problem of learning associative memory -- a system which is able to retrieve a remembered pattern based on its distorted or incomplete version. Attractor networks provide a sound model of associative memory: patterns are stored as attractors of the network dynamics and associative retrieval is performed by running the dynamics starting from a query pattern until it converges to an att…

    Submitted 20 April, 2021; v1 submitted 7 October, 2019; originally announced October 2019.

    Comments: ICLR 2020

  18. arXiv:1909.12238 [pdf, other]

    cs.AI cs.LG

    V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control

    Authors: H. Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin Riedmiller, Matthew M. Botvinick

    Abstract: Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradie…

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: * equal contribution

  19. arXiv:1908.07907 [pdf, other]

    cond-mat.mes-hall quant-ph

    Nonreciprocity and Unidirectional Invisibility in Cavity Magnonics

    Authors: Yi-Pu Wang, J. W. Rao, Y. Yang, Peng-Chao Xu, Y. S. Gui, B. M. Yao, J. Q. You, C. -M. Hu

    Abstract: We reveal the cooperative effect of coherent and dissipative magnon-photon couplings in an open cavity magnonic system, which leads to nonreciprocity with a considerably large isolation ratio and flexible controllability. Furthermore, we discover unidirectional invisibility for microwave propagation, which appears at the zero-damping condition for hybrid magnon-photon modes. A simple model is deve…

    Submitted 21 August, 2019; originally announced August 2019.

    Comments: 6 pages, 4 figures

  20. Cavity mediated dissipative coupling of distant magnetic moments: theory and experiment

    Authors: Peng-Chao Xu, J. W. Rao, Y. S. Gui, Xiaofeng Jin, C. -M. Hu

    Abstract: We investigate long-range coherent and dissipative coupling between two spatially separated magnets while both are coupled to a microwave cavity. A careful examination of the system shows that the indirect interaction between two magnon modes is dependent on their individual mechanisms of direct coupling to the cavity. If both magnon modes share the same form of coupling to the cavity (either cohe…

    Submitted 15 July, 2019; originally announced July 2019.

    Journal ref: Phys. Rev. B 100, 094415 (2019)

  21. arXiv:1906.04304 [pdf, other]

    cs.LG cs.DB cs.DS stat.ML

    Meta-Learning Neural Bloom Filters

    Authors: Jack W Rae, Sergey Bartunov, Timothy P Lillicrap

    Abstract: There has been a recent trend in training neural networks to replace data structures that have been crafted by hand, with an aim for faster execution, better accuracy, or greater compression. In this setting, a neural data structure is instantiated by training a network over many epochs of its inputs until convergence. In applications where inputs arrive at high throughput, or are ephemeral, train…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: International Conference on Machine Learning 2019

  22. arXiv:1902.06795 [pdf, other]

    cond-mat.mes-hall

    Coherent control of magnon radiative damping with local photon states

    Authors: B. M. Yao, T. Yu, Y. S. Gui, J. W. Rao, Y. T. Zhao, W. Lu, C. -M. Hu

    Abstract: The collective excitation of ordered spins, known as spin waves or magnons, can in principle radiate by emitting travelling photons to an open system when decaying to the ground state. However, in contrast to the electric dipoles, magnetic dipoles contributed by magnons are more isolated from electromagnetic environment with negligible radiation in the vacuum, limiting their application in coheren…

    Submitted 10 September, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

    Comments: 9+7 pages, 4+2 figures

  23. arXiv:1901.07633 [pdf, other]

    cond-mat.mes-hall cond-mat.other physics.app-ph quant-ph

    Control of the magnon-photon level attraction in a planar cavity

    Authors: Y. Yang, J. W. Rao, Y. S. Gui, B. M. Yao, W. Lu, C. -M. Hu

    Abstract: A resistive coupling circuit is used to model the recently discovered dissipative coupling in a hybridized cavity photon-magnon system. With this model as a basis we have designed a planar cavity in which a controllable transition between level attraction and level repulsion can be achieved. This behaviour can be quantitatively understood using an LCR circuit model with a complex coupling strength…

    Submitted 8 April, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

    Comments: 7 pages, 4 figures and 2 additional figures in the appendix

    Journal ref: Phys. Rev. Applied 11, 054023 (2019)

  24. Level Attraction Due to Dissipative Magnon-Photon Coupling

    Authors: M. Harder, Y. Yang, B. M. Yao, C. H. Yu, J. W. Rao, Y. S. Gui, R. L. Stamps, C. -M. Hu

    Abstract: We report dissipative magnon-photon coupling caused by cavity Lenz effect, where the magnons in a magnet induce a rf current in the cavity, leading to a cavity back action that impedes the magnetization dynamics. This effect is revealed in our experiment as level attraction with a coalescence of hybridized magnon-photon modes, which is distinctly different from level repulsion with mode anticrossi…

    Submitted 4 September, 2018; originally announced September 2018.

    Comments: 5 pages, 4 figures

  25. arXiv:1803.10049 [pdf, other]

    cs.LG stat.ML

    Fast Parametric Learning with Activation Memorization

    Authors: Jack W Rae, Chris Dyer, Peter Dayan, Timothy P Lillicrap

    Abstract: Neural networks trained with backpropagation often struggle to identify classes that have been observed a small number of times. In applications where most class labels are rare, such as language modelling, this can become a performance bottleneck. One potential remedy is to augment the network with a fast-learning non-parametric model which stores recent activations and class labels into an exter…

    Submitted 27 March, 2018; originally announced March 2018.

  26. arXiv:1802.10542 [pdf, other]

    stat.ML cs.LG

    Memory-based Parameter Adaptation

    Authors: Pablo Sprechmann, Siddhant M. Jayakumar, Jack W. Rae, Alexander Pritzel, Adrià Puigdomènech Badia, Benigno Uria, Oriol Vinyals, Demis Hassabis, Razvan Pascanu, Charles Blundell

    Abstract: Deep neural networks have excelled on a wide range of problems, from vision to language and game playing. Neural networks very gradually incorporate information into weights as they process data, requiring very low learning rates. If the training distribution shifts, the network is slow to adapt, and when it does adapt, it typically performs badly on the training distribution before the shift. Our…

    Submitted 28 February, 2018; originally announced February 2018.

    Comments: Published as a conference paper at ICLR 2018

  27. arXiv:1610.09027 [pdf, other]

    cs.LG

    Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes

    Authors: Jack W Rae, Jonathan J Hunt, Tim Harley, Ivo Danihelka, Andrew Senior, Greg Wayne, Alex Graves, Timothy P Lillicrap

    Abstract: Neural networks augmented with external memory have the ability to learn algorithmic solutions to complex tasks. These models appear promising for applications such as language modeling and machine translation. However, they scale poorly in both space and time as the amount of memory grows --- limiting their applicability to real-world domains. Here, we present an end-to-end differentiable memory…

    Submitted 27 October, 2016; originally announced October 2016.

    Comments: In 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain