Search | arXiv e-print repository

Showing 1–50 of 255 results for author: Dean, J

  1. arXiv:2404.18416  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Capabilities of Gemini Models in Medicine

    Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, et al. (42 additional authors not shown)

    Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  2. arXiv:2403.08295  [pdf, other]

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  3. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2401.09718  [pdf]

    econ.GN

    AI and the Opportunity for Shared Prosperity: Lessons from the History of Technology and the Economy

    Authors: Guy Ben-Ishai, Jeff Dean, James Manyika, Ruth Porat, Hal Varian, Kent Walker

    Abstract: Recent progress in artificial intelligence (AI) marks a pivotal moment in human history. It presents the opportunity for machines to learn, adapt, and perform tasks that have the potential to assist people, from everyday activities to their most creative and ambitious projects. It also has the potential to help businesses and organizations harness knowledge, increase productivity, innovate, transf…

    Submitted 1 February, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: 37 pages

  5. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  6. arXiv:2311.01135  [pdf, other]

    cs.LG physics.chem-ph

    Generating QM1B with PySCF$_{\text{IPU}}$

    Authors: Alexander Mathiasen, Hatem Helal, Kerstin Klaser, Paul Balanca, Josef Dean, Carlo Luschi, Dominique Beaini, Andrew Fitzgibbon, Dominic Masters

    Abstract: The emergence of foundation models in Computer Vision and Natural Language Processing have resulted in immense progress on downstream tasks. This progress was enabled by datasets with billions of training examples. Similar benefits are yet to be unlocked for quantum chemistry, where the potential of deep learning is constrained by comparatively small datasets with 100k to 20M training examples. Th…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 15 pages, 7 figures. NeurIPS 2023 Track Datasets and Benchmarks

    ACM Class: I.2.6; J.2

  7. arXiv:2310.12954  [pdf, other]

    quant-ph

    Single-Mode Squeezed Light Generation and Tomography with an Integrated Optical Parametric Oscillator

    Authors: Taewon Park, Hubert S. Stokowski, Vahid Ansari, Samuel Gyger, Kevin K. S. Multani, Oguz Tolga Celik, Alexander Y. Hwang, Devin J. Dean, Felix M. Mayor, Timothy P. McKenna, Martin M. Fejer, Amir H. Safavi-Naeini

    Abstract: Quantum optical technologies promise advances in sensing, computing, and communication. A key resource is squeezed light, where quantum noise is redistributed between optical quadratures. We introduce a monolithic, chip-scale platform that exploits the $\chi^{(2)}$ nonlinearity of a thin-film lithium niobate (TFLN) resonator device to efficiently generate squeezed states of light. Our system integrat…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 21 pages; 4 figures in main body, 8 supplementary figures

  8. arXiv:2310.04292  [pdf, other]

    cs.LG

    Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets

    Authors: Dominique Beaini, Shenyang Huang, Joao Alex Cunha, Zhiyi Li, Gabriela Moisescu-Pareja, Oleksandr Dymov, Samuel Maddrell-Mander, Callum McLean, Frederik Wenkel, Luis Müller, Jama Hussein Mohamud, Ali Parviz, Michael Craig, Michał Koziarski, Jiarui Lu, Zhaocheng Zhu, Cristian Gabellini, Kerstin Klaser, Josef Dean, Cas Wognum, Maciej Sypetkowski, Guillaume Rabusseau, Reihaneh Rabbany, Jian Tang, Christopher Morris, et al. (10 additional authors not shown)

    Abstract: Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by…

    Submitted 18 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  9. arXiv:2307.16397  [pdf, other]

    physics.optics physics.app-ph

    Arbitrary electro-optic bandwidth and frequency control in lithium niobate optical resonators

    Authors: Jason F. Herrmann, Devin J. Dean, Christopher J. Sarabalis, Vahid Ansari, Kevin Multani, E. Alex Wollack, Timothy P. McKenna, Jeremy D. Witmer, Amir H. Safavi-Naeini

    Abstract: In situ tunable photonic filters and memories are important for emerging quantum and classical optics technologies. However, most photonic devices have fixed resonances and bandwidths determined at the time of fabrication. Here we present an in situ tunable optical resonator on thin-film lithium niobate. By leveraging the linear electro-optic effect, we demonstrate widely tunable control over reso…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: 22 pages, 11 figure, 2 tables

  10. arXiv:2307.04200  [pdf, other]

    physics.optics quant-ph

    Integrated frequency-modulated optical parametric oscillator

    Authors: Hubert S. Stokowski, Devin J. Dean, Alexander Y. Hwang, Taewon Park, Oguz Tolga Celik, Marc Jankowski, Carsten Langrock, Vahid Ansari, Martin M. Fejer, Amir H. Safavi-Naeini

    Abstract: Optical frequency combs have revolutionized precision measurement, time-keeping, and molecular spectroscopy. A substantial effort has developed around "microcombs": integrating comb-generating technologies into compact, reliable photonic platforms. Current approaches for generating these microcombs involve either the electro-optic (EO) or Kerr mechanisms. Despite rapid progress, maintaining high e…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: 8 pages, 4 figures main text; another 19 pages and 9 figures in methods and supplementary

  11. Many-objective Optimization via Voting for Elites

    Authors: Jackson Dean, Nick Cheney

    Abstract: Real-world problems are often comprised of many objectives and require solutions that carefully trade-off between them. Current approaches to many-objective optimization often require challenging assumptions, like knowledge of the importance/difficulty of objectives in a weighted-sum single-objective paradigm, or enormous populations to overcome the curse of dimensionality in multi-objective Paret…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: Accepted at the Genetic and Evolutionary Computation Conference 2023 complex systems track as a poster

  12. arXiv:2306.09360  [pdf, other]

    nucl-ex hep-ex hep-ph nucl-th

    Strong Interaction Physics at the Luminosity Frontier with 22 GeV Electrons at Jefferson Lab

    Authors: A. Accardi, P. Achenbach, D. Adhikari, A. Afanasev, C. S. Akondi, N. Akopov, M. Albaladejo, H. Albataineh, M. Albrecht, B. Almeida-Zamora, M. Amaryan, D. Androić, W. Armstrong, D. S. Armstrong, M. Arratia, J. Arrington, A. Asaturyan, A. Austregesilo, H. Avagyan, T. Averett, C. Ayerbe Gayoso, A. Bacchetta, A. B. Balantekin, N. Baltzell, L. Barion, et al. (419 additional authors not shown)

    Abstract: This document presents the initial scientific case for upgrading the Continuous Electron Beam Accelerator Facility (CEBAF) at Jefferson Lab (JLab) to 22 GeV. It is the result of a community effort, incorporating insights from a series of workshops conducted between March 2022 and April 2023. With a track record of over 25 years in delivering the world's most intense and precise multi-GeV electron…

    Submitted 24 August, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Updates to the list of authors; Preprint number changed from theory to experiment; Updates to sections 4 and 6, including additional figures

    Report number: JLAB-PHY-23-3840

  13. Design and analysis of an exactly divergence-free hybridized discontinuous Galerkin method for incompressible flows on meshes with quadrilateral cells

    Authors: Joseph P. Dean, Sander Rhebergen, Garth N. Wells

    Abstract: We generalise a hybridized discontinuous Galerkin method for incompressible flow problems to non-affine cells, showing that with a suitable element mapping the generalised method preserves a key invariance property that eludes most methods, namely that any irrotational component of the prescribed force is exactly balanced by the pressure gradient and does not affect the velocity field. This invari…

    Submitted 26 September, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    MSC Class: 65F08; 65M15; 65N12; 65N30; 76D07

  14. arXiv:2306.00008  [pdf, other]

    cs.LG cs.CL

    Brainformers: Trading Simplicity for Efficiency

    Authors: Yanqi Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, Da Huang, Siamak Shakeri, David So, Andrew Dai, Yifeng Lu, Zhifeng Chen, Quoc Le, Claire Cui, James Laudon, Jeff Dean

    Abstract: Transformers are central to recent successes in natural language processing and computer vision. Transformers have a mostly uniform backbone where layers alternate between feed-forward and self-attention in order to build a deep network. Here we investigate this design choice and find that more complex blocks that have different permutations of layer primitives can be more efficient. Using this in…

    Submitted 25 April, 2024; v1 submitted 29 May, 2023; originally announced June 2023.

  15. arXiv:2303.02579  [pdf, other]

    hep-ph hep-ex nucl-ex nucl-th

    The Present and Future of QCD

    Authors: P. Achenbach, D. Adhikari, A. Afanasev, F. Afzal, C. A. Aidala, A. Al-bataineh, D. K. Almaalol, M. Amaryan, D. Androić, W. R. Armstrong, M. Arratia, J. Arrington, A. Asaturyan, E. C. Aschenauer, H. Atac, H. Avakian, T. Averett, C. Ayerbe Gayoso, X. Bai, K. N. Barish, N. Barnea, G. Basar, M. Battaglieri, A. A. Baty, I. Bautista, et al. (378 additional authors not shown)

    Abstract: This White Paper presents the community inputs and scientific conclusions from the Hot and Cold QCD Town Meeting that took place September 23-25, 2022 at MIT, as part of the Nuclear Science Advisory Committee (NSAC) 2023 Long Range Planning process. A total of 424 physicists registered for the meeting. The meeting highlighted progress in Quantum Chromodynamics (QCD) nuclear physics since the 2015…

    Submitted 4 March, 2023; originally announced March 2023.

    Comments: QCD Town Meeting White Paper, as submitted to 2023 NSAC LRP committee on Feb. 28, 2023

    Journal ref: Nucl.Phys.A 1047 (2024) 122874

  16. arXiv:2302.02947  [pdf, other]

    cs.LG

    GPS++: Reviving the Art of Message Passing for Molecular Property Prediction

    Authors: Dominic Masters, Josef Dean, Kerstin Klaser, Zhiyi Li, Sam Maddrell-Mander, Adam Sanders, Hatem Helal, Deniz Beker, Andrew Fitzgibbon, Shenyang Huang, Ladislav Rampášek, Dominique Beaini

    Abstract: We present GPS++, a hybrid Message Passing Neural Network / Graph Transformer model for molecular property prediction. Our model integrates a well-tuned local message passing component and biased global attention with other key ideas from prior literature to achieve state-of-the-art results on large-scale molecular dataset PCQM4Mv2. Through a thorough ablation study we highlight the impact of indi…

    Submitted 12 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: text overlap with arXiv:2212.02229

  17. arXiv:2212.09717  [pdf, other]

    quant-ph physics.optics

    Integrated Quantum Optical Phase Sensor

    Authors: Hubert S. Stokowski, Timothy P. McKenna, Taewon Park, Alexander Y. Hwang, Devin J. Dean, Oguz Tolga Celik, Vahid Ansari, Martin M. Fejer, Amir H. Safavi-Naeini

    Abstract: The quantum noise of light fundamentally limits optical phase sensors. A semiclassical picture attributes this noise to the random arrival time of photons from a coherent light source such as a laser. An engineered source of squeezed states suppresses this noise and allows sensitivity beyond the standard quantum limit (SQL) for phase detection. Advanced gravitational wave detectors like LIGO have…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: 14 pages, 3+3 figures

  18. arXiv:2212.02229  [pdf, other]

    q-bio.QM cs.LG

    GPS++: An Optimised Hybrid MPNN/Transformer for Molecular Property Prediction

    Authors: Dominic Masters, Josef Dean, Kerstin Klaser, Zhiyi Li, Sam Maddrell-Mander, Adam Sanders, Hatem Helal, Deniz Beker, Ladislav Rampášek, Dominique Beaini

    Abstract: This technical report presents GPS++, the first-place solution to the Open Graph Benchmark Large-Scale Challenge (OGB-LSC 2022) for the PCQM4Mv2 molecular property prediction task. Our approach implements several key principles from the prior literature. At its core our GPS++ method is a hybrid MPNN/Transformer model that incorporates 3D atom positions and an auxiliary denoising task. The effectiv…

    Submitted 6 December, 2022; v1 submitted 18 November, 2022; originally announced December 2022.

  19. arXiv:2211.05102  [pdf, other]

    cs.LG cs.CL

    Efficiently Scaling Transformer Inference

    Authors: Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, Jeff Dean

    Abstract: We study the problem of efficient generative inference for Transformer models, in one of its most challenging settings: large deep models, with tight latency targets and long sequence lengths. Better understanding of the engineering tradeoffs for inference for large Transformer-based models is important as use cases of these models are growing rapidly throughout application areas. We develop a sim…

    Submitted 9 November, 2022; originally announced November 2022.

  20. arXiv:2210.11416  [pdf, other]

    cs.LG cs.CL

    Scaling Instruction-Finetuned Language Models

    Authors: Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, et al. (10 additional authors not shown)

    Abstract: Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects d…

    Submitted 6 December, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: Public checkpoints: https://huggingface.co/docs/transformers/model_doc/flan-t5

  21. arXiv:2209.01667  [pdf, other]

    cs.LG cs.CL

    A Review of Sparse Expert Models in Deep Learning

    Authors: William Fedus, Jeff Dean, Barret Zoph

    Abstract: Sparse expert models are a thirty-year old concept re-emerging as a popular architecture in deep learning. This class of architecture encompasses Mixture-of-Experts, Switch Transformers, Routing Networks, BASE layers, and others, all with the unifying idea that each example is acted on by a subset of the parameters. By doing so, the degree of sparsity decouples the parameter count from the compute…

    Submitted 4 September, 2022; originally announced September 2022.

    Comments: 23 pages

  22. arXiv:2208.08719  [pdf, ps, other]

    math.CT

    Computads for weak $\omega$-categories as an inductive type

    Authors: Christopher J. Dean, Eric Finster, Ioannis Markakis, David Reutter, Jamie Vicary

    Abstract: We give a new description of computads for weak globular $\omega$-categories by giving an explicit inductive definition of the free words. This yields a new understanding of computads, and allows a new definition of $\omega$-category that avoids the technology of globular operads. Our framework permits direct proofs of important results via structural induction, and we use this to give new proofs that every…

    Submitted 20 March, 2024; v1 submitted 18 August, 2022; originally announced August 2022.

  23. arXiv:2207.04147  [pdf, other]

    physics.optics physics.app-ph

    High quantum efficiency parametric amplification via hybridized nonlinear optics

    Authors: Noah Flemens, Dylan Heberle, Jiaoyang Zheng, Devin J. Dean, Connor Davis, Kevin Zawilski, Peter G. Schunemann, Jeffrey Moses

    Abstract: Parametric amplifiers have allowed breakthroughs in ultrafast, strong-field, and high-energy density laser science and are an essential tool for extending the frequency range of powerful emerging diode-pumped solid-state laser technology. However, their impact is limited by inherently low quantum efficiency due to nonuniform light extraction. Here we demonstrate a new type of parametric amplifier…

    Submitted 8 July, 2022; originally announced July 2022.

  24. arXiv:2206.07682  [pdf, other]

    cs.CL

    Emergent Abilities of Large Language Models

    Authors: Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus

    Abstract: Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot…

    Submitted 26 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: Transactions on Machine Learning Research (TMLR), 2022

  25. arXiv:2205.12755  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE

    An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems

    Authors: Andrea Gesmundo, Jeff Dean

    Abstract: Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer, a key feature of human learning. Though, state of the art ML models rely on high customization for each task and leverage size and data scale rather than scaling the number of tasks. Also, continual learning, that adds the temporal aspect to multitask, is…

    Submitted 15 November, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  26. arXiv:2205.10937  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE

    muNet: Evolving Pretrained Deep Neural Networks into Scalable Auto-tuning Multitask Systems

    Authors: Andrea Gesmundo, Jeff Dean

    Abstract: Most uses of machine learning today involve training a model from scratch for a particular task, or sometimes starting with a model pretrained on a related task and then fine-tuning on a downstream task. Both approaches offer limited knowledge transfer between different tasks, time-consuming human-driven customization to individual tasks and high computational costs especially when starting from r…

    Submitted 25 May, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

  27. arXiv:2204.05149  [pdf]

    cs.LG cs.AI cs.GL

    The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink

    Authors: David Patterson, Joseph Gonzalez, Urs Hölzle, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, Jeff Dean

    Abstract: Machine Learning (ML) workloads have rapidly grown in importance, but raised concerns about their carbon footprint. Four best practices can reduce ML training energy by up to 100x and CO2 emissions up to 1000x. By following best practices, overall ML energy use (across research, development, and production) held steady at <15% of Google's total energy use for the past three years. If the whole ML…

    Submitted 11 April, 2022; originally announced April 2022.

  28. arXiv:2204.02311  [pdf, other]

    cs.CL

    PaLM: Scaling Language Modeling with Pathways

    Authors: Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, et al. (42 additional authors not shown)

    Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Tran…

    Submitted 5 October, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

  29. arXiv:2203.12533  [pdf, other]

    cs.DC cs.LG

    Pathways: Asynchronous Distributed Dataflow for ML

    Authors: Paul Barham, Aakanksha Chowdhery, Jeff Dean, Sanjay Ghemawat, Steven Hand, Dan Hurt, Michael Isard, Hyeontaek Lim, Ruoming Pang, Sudip Roy, Brennan Saeta, Parker Schuh, Ryan Sepassi, Laurent El Shafey, Chandramohan A. Thekkath, Yonghui Wu

    Abstract: We present the design of a new large scale orchestration layer for accelerators. Our system, Pathways, is explicitly designed to enable exploration of new systems and ML research ideas, while retaining state of the art performance for current models. Pathways uses a sharded dataflow graph of asynchronous operators that consume and produce futures, and efficiently gang-schedules heterogeneous paral…

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: MLSys 2022

  30. arXiv:2202.08906  [pdf, other]

    cs.CL cs.LG

    ST-MoE: Designing Stable and Transferable Sparse Expert Models

    Authors: Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, William Fedus

    Abstract: Scale has opened new frontiers in natural language processing -- but at a high cost. In response, Mixture-of-Experts (MoE) and Switch Transformers have been proposed as an energy efficient path to even larger and more capable language models. But advancing the state-of-the-art across a broad set of natural language tasks has been hindered by training instabilities and uncertain quality during fine…

    Submitted 29 April, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: 25 pages main text, 39 pages overall

  31. arXiv:2202.01105  [pdf, other]

    nucl-th hep-lat hep-ph

    Nuclear Forces for Precision Nuclear Physics -- a collection of perspectives

    Authors: Ingo Tews, Zohreh Davoudi, Andreas Ekström, Jason D. Holt, Kevin Becker, Raúl Briceño, David J. Dean, William Detmold, Christian Drischler, Thomas Duguet, Evgeny Epelbaum, Ashot Gasparyan, Jambul Gegelia, Jeremy R. Green, Harald W. Grießhammer, Andrew D. Hanlon, Matthias Heinz, Heiko Hergert, Martin Hoferichter, Marc Illa, David Kekejian, Alejandro Kievsky, Sebastian König, Hermann Krebs, Kristina D. Launey, et al. (20 additional authors not shown)

    Abstract: This is a collection of perspective pieces contributed by the participants of the Institute of Nuclear Theory's Program on Nuclear Physics for Precision Nuclear Physics which was held virtually from April 19 to May 7, 2021. The collection represents the reflections of a vibrant and engaged community of researchers on the status of theoretical research in low-energy nuclear physics, the challenges…

    Submitted 2 February, 2022; originally announced February 2022.

    Comments: Perspective pieces of the virtual INT program 21-1b "Nuclear Forces for Precision Nuclear Physics", 107 pages

    Report number: INT-PUB-22-002, LA-UR-22-20419, UMD-PP-022-02, FERMILAB-PUB-22-090-T

    Journal ref: Few-Body Systems 63, 67 (2022)

  32. arXiv:2112.00239  [pdf, other]

    cond-mat.mtrl-sci

    Interpretable Machine Learning for Materials Design

    Authors: James Dean, Matthias Scheffler, Thomas A. R. Purcell, Sergey V. Barabash, Rahul Bhowmik, Timur Bazhirov

    Abstract: Fueled by the widespread adoption of Machine Learning (ML) and the high-throughput screening of materials, the data-centric approach to materials design has asserted itself as a robust and powerful tool for the in-silico prediction of materials properties. When training models to predict material properties, researchers often face a difficult choice between a model's interpretability or its perfor…

    Submitted 30 November, 2021; originally announced December 2021.

    Comments: 38 pages, 14 figures, 12 tables

  33. Femtojoule, femtosecond all-optical switching in lithium niobate nanophotonics

    Authors: Qiushi Guo, Ryoto Sekine, Luis Ledezma, Rajveer Nehra, Devin J. Dean, Arkadev Roy, Robert M. Gray, Saman Jahani, Alireza Marandi

    Abstract: Optical nonlinear functions are crucial for various applications in integrated photonics, such as all-optical information processing, photonic neural networks and on-chip ultrafast light sources. Due to the weak nonlinearities in most integrated photonic platforms, realizing optical nonlinear functions typically requires large driving energies in the picojoules level or beyond, thus imposing a bar…

    Submitted 21 July, 2021; originally announced July 2021.

  34. arXiv:2104.10350  [pdf]

    cs.LG cs.CY

    Carbon Emissions and Large Neural Network Training

    Authors: David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, Jeff Dean

    Abstract: The computation demand for machine learning (ML) has grown rapidly recently, which comes with a number of costs. Estimating the energy cost helps measure its environmental impact and finding greener strategies, yet it is challenging without detailed information. We calculate the energy use and carbon footprint of several recent large models-T5, Meena, GShard, Switch Transformer, and GPT-3-and refi…

    Submitted 23 April, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

  35. Overscreening and Underscreening in Solid-Electrolyte Grain Boundary Space-Charge Layers

    Authors: Jacob M. Dean, Samuel W. Coles, William R. Saunders, Andrew R. McCluskey, Matthew J. Wolf, Alison B. Walker, Benjamin J. Morgan

    Abstract: Polycrystalline solids can exhibit material properties that differ significantly from those of equivalent single-crystal samples, in part, because of a spontaneous redistribution of mobile point defects into so-called space-charge regions adjacent to grain boundaries. The general analytical form of these space-charge regions is known only in the dilute limit, where defect-defect correlations can b…

    Submitted 1 April, 2021; originally announced April 2021.

    Journal ref: Phys. Rev. Lett. 127, 135502 (2021)

  36. arXiv:2102.07364  [pdf, other]

    cs.LG

    Intermediate Layer Optimization for Inverse Problems using Deep Generative Models

    Authors: Giannis Daras, Joseph Dean, Ajil Jalal, Alexandros G. Dimakis

    Abstract: We propose Intermediate Layer Optimization (ILO), a novel optimization algorithm for solving inverse problems with deep generative models. Instead of optimizing only over the initial latent code, we progressively change the input layer obtaining successively more expressive generators. To explore the higher dimensional spaces, our method searches for latent codes that lie within a small $l_1$ ball…

    Submitted 15 February, 2021; originally announced February 2021.

  37. arXiv:2010.04116  [pdf, other]

    cs.LG cs.AI

    Interlocking Backpropagation: Improving depthwise model-parallelism

    Authors: Aidan N. Gomez, Oscar Key, Kuba Perlin, Stephen Gou, Nick Frosst, Jeff Dean, Yarin Gal

    Abstract: The number of parameters in state-of-the-art neural networks has drastically increased in recent years. This surge of interest in large scale neural networks has motivated the development of new distributed training strategies enabling such models. One such strategy is model-parallel distributed training. Unfortunately, model-parallelism can suffer from poor resource utilisation, which leads to wa…

    Submitted 7 July, 2022; v1 submitted 8 October, 2020; originally announced October 2020.

  38. arXiv:2008.02875  [pdf, other]

    cond-mat.soft cond-mat.stat-mech physics.chem-ph physics.comp-ph

    Phase transitions on non-uniformly curved surfaces: Coupling between phase and location

    Authors: Jack O. Law, Jacob M. Dean, Mark A. Miller, Halim Kusumaatmaja

    Abstract: For particles confined to two dimensions, any curvature of the surface affects the structural, kinetic and thermodynamic properties of the system. If the curvature is non-uniform, an even richer range of behaviours can emerge. Using a combination of bespoke Monte Carlo, molecular dynamics and basin-hopping methods, we show that the stable states of attractive colloids confined to non-uniformly cur…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: 9 pages, 9 figures, accepted in Soft Matter, SI can be requested from the authors

  39. arXiv:2007.13261  [pdf]

    cs.CY physics.soc-ph

    From climate change to pandemics: decision science can help scientists have impact

    Authors: Christopher M. Baker, Patricia T. Campbell, Iadine Chades, Angela J. Dean, Susan M. Hester, Matthew H. Holden, James M. McCaw, Jodie McVernon, Robert Moss, Freya M. Shearer, Hugh P. Possingham

    Abstract: Scientific knowledge and advances are a cornerstone of modern society. They improve our understanding of the world we live in and help us navigate global challenges including emerging infectious diseases, climate change and the biodiversity crisis. For any scientist, whether they work primarily in fundamental knowledge generation or in the applied sciences, it is important to understand how scienc…

    Submitted 21 October, 2021; v1 submitted 26 July, 2020; originally announced July 2020.

    MSC Class: 00A69 (Primary); 92B05 (Secondary)

  40. arXiv:2005.14104  [pdf, ps, other]

    math.CT cs.LO math.LO

    Globular Multicategories with Homomorphism Types

    Authors: Christopher J. Dean

    Abstract: We introduce a notion of globular multicategory with homomorphism types. These structures arise when organizing collections of "higher category-like" objects such as type theories with identity types. We show how these globular multicategories can be used to construct various weak higher categorical structures of types and terms.

    Submitted 28 May, 2020; originally announced May 2020.

    Comments: 25 pages

  41. arXiv:2004.10746  [pdf, other]

    cs.LG cs.AI

    Chip Placement with Deep Reinforcement Learning

    Authors: Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Sungmin Bae, Azade Nazi, Jiwoo Pak, Andy Tong, Kavya Srinivasa, William Hang, Emre Tuncer, Anand Babu, Quoc V. Le, James Laudon, Richard Ho, Roger Carpenter, Jeff Dean

    Abstract: In this work, we present a learning-based approach to chip placement, one of the most complex and time-consuming stages of the chip design process. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously…

    Submitted 22 April, 2020; originally announced April 2020.

  42. arXiv:1912.01054  [pdf, other]

    eess.IV cs.CV cs.LG

    The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 Challenge

    Authors: Nicholas Heller, Fabian Isensee, Klaus H. Maier-Hein, Xiaoshuai Hou, Chunmei Xie, Fengyi Li, Yang Nan, Guangrui Mu, Zhiyong Lin, Miofei Han, Guang Yao, Yaozong Gao, Yao Zhang, Yixin Wang, Feng Hou, Jiawei Yang, Guangwei Xiong, Jiang Tian, Cheng Zhong, Jun Ma, Jack Rickman, Joshua Dean, Bethany Stai, Resha Tejpaul, Makinna Oestreich , et al. (16 additional authors not shown)

    Abstract: There is a large body of literature linking anatomic and geometric characteristics of kidney tumors to perioperative and oncologic outcomes. Semantic segmentation of these tumors and their host kidneys is a promising tool for quantitatively characterizing these lesions, but its adoption is limited due to the manual effort required to produce high-quality 3D segmentations of these structures. Recen…

    Submitted 7 August, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: 24 pages, 11 figures

  43. arXiv:1911.05289  [pdf]

    cs.LG cs.AR stat.ML

    The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design

    Authors: Jeffrey Dean

    Abstract: The past decade has seen a remarkable series of advances in machine learning, and in particular deep learning approaches based on artificial neural networks, to improve our abilities to build more accurate systems across a broad range of areas, including computer vision, speech recognition, language translation, and natural language understanding tasks. This paper is a companion paper to a keynote…

    Submitted 12 November, 2019; originally announced November 2019.

    Comments: Companion paper to accompany a keynote talk at ISSCC 2020

  44. arXiv:1910.00762  [pdf, other]

    cs.LG stat.ML

    Accelerating Deep Learning by Focusing on the Biggest Losers

    Authors: Angela H. Jiang, Daniel L.-K. Wong, Giulio Zhou, David G. Andersen, Jeffrey Dean, Gregory R. Ganger, Gauri Joshi, Michael Kaminsky, Michael Kozuch, Zachary C. Lipton, Padmanabhan Pillai

    Abstract: This paper introduces Selective-Backprop, a technique that accelerates the training of deep neural networks (DNNs) by prioritizing examples with high loss at each iteration. Selective-Backprop uses the output of a training example's forward pass to decide whether to use that example to compute gradients and update parameters, or to skip immediately to the next example. By reducing the number of co…

    Submitted 1 October, 2019; originally announced October 2019.

  45. arXiv:1904.03257  [pdf, ps, other]

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood , et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  46. arXiv:1904.00445  [pdf, other]

    q-bio.QM cs.LG stat.ML

    The KiTS19 Challenge Data: 300 Kidney Tumor Cases with Clinical Context, CT Semantic Segmentations, and Surgical Outcomes

    Authors: Nicholas Heller, Niranjan Sathianathen, Arveen Kalapara, Edward Walczak, Keenan Moore, Heather Kaluzniak, Joel Rosenberg, Paul Blake, Zachary Rengel, Makinna Oestreich, Joshua Dean, Michael Tradewell, Aneri Shah, Resha Tejpaul, Zachary Edgerton, Matthew Peterson, Shaneabbas Raza, Subodh Regmi, Nikolaos Papanikolopoulos, Christopher Weight

    Abstract: The morphometry of a kidney tumor revealed by contrast-enhanced Computed Tomography (CT) imaging is an important factor in clinical decision making surrounding the lesion's diagnosis and treatment. Quantitative study of the relationship between kidney tumor morphology and clinical outcomes is difficult due to data scarcity and the laborious nature of manually quantifying imaging predictors. Automa…

    Submitted 15 March, 2020; v1 submitted 31 March, 2019; originally announced April 2019.

    Comments: 13 pages, 2 figures

  47. The Apache Point Observatory Galactic Evolution Experiment (APOGEE) Spectrographs

    Authors: J. C. Wilson, F. R. Hearty, M. F. Skrutskie, S. R. Majewski, J. A. Holtzman, D. Eisenstein, J. Gunn, B. Blank, C. Henderson, S. Smee, M. Nelson, D. Nidever, J. Arns, R. Barkhouser, J. Barr, S. Beland, M. A. Bershady, M. R. Blanton, S. Brunner, A. Burton, L. Carey, M. Carr, J. P. Colque, J. Crane, G. J. Damke , et al. (64 additional authors not shown)

    Abstract: We describe the design and performance of the near-infrared (1.51--1.70 micron), fiber-fed, multi-object (300 fibers), high resolution (R = lambda/delta lambda ~ 22,500) spectrograph built for the Apache Point Observatory Galactic Evolution Experiment (APOGEE). APOGEE is a survey of ~ 10^5 red giant stars that systematically sampled all Milky Way populations (bulge, disk, and halo) to study the Ga…

    Submitted 3 February, 2019; originally announced February 2019.

    Comments: 81 pages, 67 figures, PASP, accepted

  48. arXiv:1812.00825  [pdf]

    cs.CV cs.AI cs.LG

    Microscope 2.0: An Augmented Reality Microscope with Real-time Artificial Intelligence Integration

    Authors: Po-Hsuan Cameron Chen, Krishna Gadepalli, Robert MacDonald, Yun Liu, Kunal Nagpal, Timo Kohlberger, Jeffrey Dean, Greg S. Corrado, Jason D. Hipp, Martin C. Stumpe

    Abstract: The brightfield microscope is instrumental in the visual examination of both biological and physical samples at sub-millimeter scales. One key clinical application has been in cancer histopathology, where the microscopic assessment of the tissue samples is used for the diagnosis and staging of cancer and thus guides clinical therapy. However, the interpretation of these samples is inherently subje…

    Submitted 4 December, 2018; v1 submitted 21 November, 2018; originally announced December 2018.

    Journal ref: Nature Medicine (2019)

  49. arXiv:1808.04347  [pdf, other]

    math.PR

    Functional Large Deviations for Cox Processes and $Cox/G/\infty$ Queues, with a Biological Application

    Authors: Justin Dean, Ayalvadi Ganesh, Edward Crane

    Abstract: We consider an infinite-server queue into which customers arrive according to a Cox process and have independent service times with a general distribution. We prove a functional large deviations principle for the equilibrium queue length process. The model is motivated by a linear feed-forward gene regulatory network, in which the rate of protein synthesis is modulated by the number of RNA molecul…

    Submitted 27 March, 2020; v1 submitted 13 August, 2018; originally announced August 2018.

    Comments: 36 pages, 2 figures, to appear in Annals of Applied Probability

  50. arXiv:1806.04618  [pdf, other]

    cs.CV

    Imperfect Segmentation Labels: How Much Do They Matter?

    Authors: Nicholas Heller, Joshua Dean, Nikolaos Papanikolopoulos

    Abstract: Labeled datasets for semantic segmentation are imperfect, especially in medical imaging where borders are often subtle or ill-defined. Little work has been done to analyze the effect that label errors have on the performance of segmentation methodologies. Here we present a large-scale study of model performance in the presence of varying types and degrees of error in training data. We trained U-Ne…

    Submitted 23 September, 2018; v1 submitted 12 June, 2018; originally announced June 2018.

    Comments: 9 pages, 3 figures, Accepted at MICCAI LABELS 2018