-
On scalable oversight with weak LLMs judging strong LLMs
Authors:
Zachary Kenton,
Noah Y. Siegel,
János Kramár,
Jonah Brown-Cohen,
Samuel Albanie,
Jannis Bulian,
Rishabh Agarwal,
David Lindner,
Yunhao Tang,
Noah D. Goodman,
Rohin Shah
Abstract:
Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AIs compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and compare to a baseline of direct question-answering, where the judge just answers outright without the AI. We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models. We benchmark on a diverse range of asymmetries between judges and agents, extending previous work on a single extractive QA task with information asymmetry, to also include mathematics, coding, logic and multimodal reasoning asymmetries. We find that debate outperforms consultancy across all tasks when the consultant is randomly assigned to argue for the correct/incorrect answer. Comparing debate to direct question answering, the results depend on the type of task: in extractive QA tasks with information asymmetry debate outperforms direct question answering, but in other tasks without information asymmetry the results are mixed. Previous work assigned debaters/consultants an answer to argue for. When we allow them to instead choose which answer to argue for, we find judges are less frequently convinced by the wrong answer in debate than in consultancy. Further, we find that stronger debater models increase judge accuracy, though more modestly than in previous studies.
Submitted 5 July, 2024;
originally announced July 2024.
-
AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale
Authors:
Keenon Werling,
Janelle Kaneda,
Alan Tan,
Rishi Agarwal,
Six Skov,
Tom Van Wouwe,
Scott Uhlrich,
Nicholas Bianco,
Carmichael Ong,
Antoine Falisse,
Shardul Sapkota,
Aidan Chandra,
Joshua Carter,
Ezio Preatoni,
Benjamin Fregly,
Jennifer Hicks,
Scott Delp,
C. Karen Liu
Abstract:
While reconstructing human poses in 3D from inexpensive sensors has advanced significantly in recent years, quantifying the dynamics of human motion, including the muscle-generated joint torques and external forces, remains a challenge. Prior attempts to estimate physics from reconstructed human poses have been hampered by a lack of datasets with high-quality pose and force data for a variety of movements. We present the AddBiomechanics Dataset 1.0, which includes physically accurate human dynamics of 273 human subjects, over 70 hours of motion and force plate data, totaling more than 24 million frames. To construct this dataset, novel analytical methods were required, which are also reported here. We propose a benchmark for estimating human dynamics from motion using this dataset, and present several baseline results. The AddBiomechanics Dataset is publicly available at https://addbiomechanics.org/download_data.html.
Submitted 16 May, 2024;
originally announced June 2024.
-
SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning
Authors:
Matthias Weissenbacher,
Rishabh Agarwal,
Yoshinobu Kawahara
Abstract:
An open challenge in reinforcement learning (RL) is the effective deployment of a trained policy to new or slightly different situations as well as semantically-similar environments. We introduce Symmetry-Invariant Transformer (SiT), a scalable vision transformer (ViT) that leverages both local and global data patterns in a self-supervised manner to improve generalisation. Central to our approach is Graph Symmetric Attention, which refines the traditional self-attention mechanism to preserve graph symmetries, resulting in invariant and equivariant latent representations. We showcase SiT's superior generalization over ViTs on MiniGrid and Procgen RL benchmarks, and its sample efficiency on Atari 100k and CIFAR10.
Submitted 21 June, 2024;
originally announced June 2024.
-
Strong Chirality Suppression in 1-D correlated Weyl Semimetal (TaSe4)2I
Authors:
Utkarsh Khandelwal,
Harshvardhan Jog,
Shupeng Xu,
Yicong Chen,
Kejian Qu,
Chengxi Zhao,
Eugene Mele,
Daniel P. Shoemaker,
Ritesh Agarwal
Abstract:
The interaction of light with correlated Weyl semimetals (WSMs) provides a unique platform for exploring non-equilibrium phases and fundamental properties such as chirality. Here, we investigate the structural chirality of (TaSe4)2I, a correlated WSM, under weak optical pumping using Circular Photogalvanic Effect (CPGE) measurements and Raman spectroscopy. Surprisingly, we find that there is a loss of chirality in (TaSe4)2I above a threshold light intensity. We suggest that the loss of chirality is due to an optically driven phase transition into an achiral structure distinct from the ground state. This structural transformation is supported by fluence-dependent Raman spectra, revealing a new peak at low pump fluences that disappears above the threshold fluence. The loss of chirality even at low optical powers suggests that the system quickly transitions into a non WSM phase, and also highlights the importance of considering light-induced structural interactions in understanding the behavior of correlated systems. These studies showcase that even low excitation powers can be used to control the properties of correlated topological systems, opening up new avenues for low power optical devices.
Submitted 28 May, 2024;
originally announced May 2024.
-
Object-Oriented Architecture: A Software Engineering-Inspired Shape Grammar for Durands Plates
Authors:
Rohan Agarwal
Abstract:
Addressing the challenge of modular architectural design, this study presents a novel approach through the implementation of a shape grammar system using functional and object-oriented programming principles from computer science. The focus lies on the modular generation of plates in the style of French Neoclassical architect Jean-Nicolas-Louis Durand, known for his modular rule-based method to architecture, demonstrating the system's capacity to articulate intricate architectural forms systematically. By leveraging computer programming principles, the proposed methodology allows for the creation of diverse designs while adhering to the inherent logic of Durand's original plates. The integration of Shape Machine allows a flexible framework for architects and designers, enabling the generation of complex structures in a modular fashion in existing CAD software. This research contributes to the exploration of computational tools in architectural design, offering a versatile solution for the synthesis of historically significant architectural elements.
Submitted 20 April, 2024;
originally announced April 2024.
-
Many-Shot In-Context Learning
Authors:
Rishabh Agarwal,
Avi Singh,
Lei M. Zhang,
Bernd Bohnet,
Luis Rosias,
Stephanie Chan,
Biao Zhang,
Ankesh Anand,
Zaheer Abbas,
Azade Nova,
John D. Co-Reyes,
Eric Chu,
Feryal Behbahani,
Aleksandra Faust,
Hugo Larochelle
Abstract:
Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated examples. To mitigate this limitation, we explore two new settings: Reinforced and Unsupervised ICL. Reinforced ICL uses model-generated chain-of-thought rationales in place of human examples. Unsupervised ICL removes rationales from the prompt altogether, and prompts the model only with domain-specific questions. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. Finally, we demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases, can learn high-dimensional functions with numerical inputs, and performs comparably to fine-tuning. Our analysis also reveals the limitations of next-token prediction loss as an indicator of downstream ICL performance.
Submitted 22 May, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Online Learning under Haphazard Input Conditions: A Comprehensive Review and Analysis
Authors:
Rohit Agarwal,
Arijit Das,
Alexander Horsch,
Krishna Agarwal,
Dilip K. Prasad
Abstract:
The domain of online learning has experienced multifaceted expansion owing to its prevalence in real-life applications. Nonetheless, this progression operates under the assumption that the input feature space of the streaming data remains constant. In this survey paper, we address the topic of online learning in the context of haphazard inputs, explicitly foregoing such an assumption. We discuss, classify, evaluate, and compare the methodologies that are adept at modeling haphazard inputs, additionally providing the corresponding code implementations and their carbon footprint. Moreover, we classify the datasets related to the field of haphazard inputs and introduce evaluation metrics specifically designed for datasets exhibiting imbalance. The code of each methodology can be found at https://github.com/Rohit102497/HaphazardInputsReview
Submitted 7 April, 2024;
originally announced April 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Authors:
Jesse Farebrother,
Jordi Orbay,
Quan Vuong,
Adrien Ali Taïga,
Yevgen Chebotar,
Ted Xiao,
Alex Irpan,
Sergey Levine,
Pablo Samuel Castro,
Aleksandra Faust,
Aviral Kumar,
Rishabh Agarwal
Abstract:
Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improve performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.
Submitted 6 March, 2024;
originally announced March 2024.
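Illustrative sketch (not from the paper): the abstract above describes replacing mean squared error regression with a categorical cross-entropy loss for value targets, but does not say how the categorical target is built. The code below assumes a two-hot encoding over evenly spaced bins, which is one common construction. The function names, the bin layout, and the value range are all hypothetical.

```python
import math
from typing import List


def two_hot(target: float, v_min: float, v_max: float, num_bins: int) -> List[float]:
    """Spread a scalar target over its two nearest bin centers (assumed encoding)."""
    target = min(max(target, v_min), v_max)      # clip into the supported range
    width = (v_max - v_min) / (num_bins - 1)     # spacing between bin centers
    pos = (target - v_min) / width               # fractional bin index
    lo = int(math.floor(pos))
    hi = min(lo + 1, num_bins - 1)
    w_hi = pos - lo                              # weight on the upper bin
    probs = [0.0] * num_bins
    probs[lo] += 1.0 - w_hi
    probs[hi] += w_hi
    return probs


def _softmax(logits: List[float]) -> List[float]:
    m = max(logits)                              # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def cross_entropy(logits: List[float], target_probs: List[float]) -> float:
    """Categorical cross-entropy between predicted logits and a soft target."""
    probs = _softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs) if t > 0.0)


def expected_value(logits: List[float], v_min: float, v_max: float) -> float:
    """Recover a scalar value as the mean of the predicted distribution."""
    n = len(logits)
    centers = [v_min + i * (v_max - v_min) / (n - 1) for i in range(n)]
    return sum(c * p for c, p in zip(centers, _softmax(logits)))
```

In this sketch the network would output one logit per bin, `cross_entropy` would replace the squared-error loss against the bootstrapped target, and `expected_value` would be used wherever a scalar value estimate is needed.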
-
Large Scale Generative AI Text Applied to Sports and Music
Authors:
Aaron Baughman,
Stephen Hammer,
Rahul Agarwal,
Gozde Akay,
Eduardo Morales,
Tony Johnson,
Leonid Karlinsky,
Rogerio Feris
Abstract:
We address the problem of scaling up the production of media content, including commentary and personalized news stories, for large-scale sports and music events worldwide. Our approach relies on generative AI models to transform a large volume of multimodal data (e.g., videos, articles, real-time scoring feeds, statistics, and fact sheets) into coherent and fluent text. Based on this approach, we introduce, for the first time, an AI commentary system, which was deployed to produce automated narrations for highlight packages at the 2023 US Open, Wimbledon, and Masters tournaments. In the same vein, our solution was extended to create personalized content for ESPN Fantasy Football and stories about music artists for the Grammy awards. These applications, built using a common software architecture, achieved a 15x speed improvement with an average Rouge-L of 82.00 and perplexity of 6.6. Our work was successfully deployed at the aforementioned events, supporting 90 million fans around the world with 8 billion page views, continuously pushing the bounds on what is possible at the intersection of sports, entertainment, and AI.
Submitted 27 February, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Simple realization of a fragile topological lattice with quasi flat-bands in a microcavity array
Authors:
Yuhui Wang,
Shupeng Xu,
Liang Feng,
Ritesh Agarwal
Abstract:
Topological flat bands (TFBs) are increasingly recognized as an important paradigm to study topological effects in the context of strong correlation physics. As a representative example, recently it has been theoretically proposed that the topological non-triviality offers a unique contribution to flat-band superconductivity, which can potentially lead to a higher critical temperature of superconductivity phase transition. Nevertheless, the topological effects within flat bands in bosonic systems, specifically in the context of Bose-Einstein condensation (BEC), are less explored. It has been shown theoretically that non-trivial topological and geometric properties will also have a significant influence in bosonic condensates as well. However, potential experimental realizations have not been extensively studied yet. In this work, we introduce a simple photonic lattice from coupled Kagome and triangular lattices designed based on topological quantum chemistry theory, which supports topologically nontrivial quasi-flat bands. Besides band representation analysis, the non-triviality of these quasi-flat bands is also confirmed by Wilson loop spectra which exhibit winding features. We further discuss the corresponding experimental realization in a microcavity array for future study supporting the potential extension to condensed exciton-polaritons. Notably, we showed that the inevitable in-plane longitudinal-transverse polarization splitting in optical microcavities will not hinder the construction of topological quasi-flat bands. This work acts as an initial step to experimentally explore the physical consequence of non-trivial topology and quantum geometry in quasi-flat bands in bosonic systems, offering potential channels for its direct observation.
Submitted 14 February, 2024;
originally announced February 2024.
-
Transformers Can Achieve Length Generalization But Not Robustly
Authors:
Yongchao Zhou,
Uri Alon,
Xinyun Chen,
Xuezhi Wang,
Rishabh Agarwal,
Denny Zhou
Abstract:
Length generalization, defined as the ability to extrapolate from shorter training sequences to longer test ones, is a significant challenge for language models. This issue persists even with large-scale Transformers handling relatively straightforward tasks. In this paper, we test the Transformer's ability of length generalization using the task of addition of two integers. We show that the success of length generalization is intricately linked to the data format and the type of position encoding. Using the right combination of data format and position encodings, we show for the first time that standard Transformers can extrapolate to a sequence length that is 2.5x the input length. Nevertheless, unlike in-distribution generalization, length generalization remains fragile, significantly influenced by factors like random weight initialization and training data order, leading to large variances across different random seeds.
Submitted 14 February, 2024;
originally announced February 2024.
-
V-STaR: Training Verifiers for Self-Taught Reasoners
Authors:
Arian Hosseini,
Xingdi Yuan,
Nikolay Malkin,
Aaron Courville,
Alessandro Sordoni,
Rishabh Agarwal
Abstract:
Common self-improvement approaches for large language models (LLMs), such as STaR (Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving ability. However, these approaches discard the large amounts of incorrect solutions generated during this process, potentially neglecting valuable information in such solutions. To address this shortcoming, we propose V-STaR that utilizes both the correct and incorrect solutions generated during the self-improvement process to train a verifier using DPO that judges correctness of model-generated solutions. This verifier is used at inference time to select one solution among many candidate solutions. Running V-STaR for multiple iterations results in progressively better reasoners and verifiers, delivering a 4% to 17% test accuracy improvement over existing self-improvement and verification approaches on common code generation and math reasoning benchmarks with LLaMA2 models.
Submitted 9 February, 2024;
originally announced February 2024.
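Illustrative sketch (not from the paper): the abstract above describes two steps, using both correct and incorrect solutions as training data for a DPO verifier, and using that verifier at inference time to pick one solution among many candidates. The code below shows only the data plumbing around those steps. Pairing every correct solution with every incorrect one is an assumption, and `is_correct` and `verifier_score` are hypothetical callables standing in for the correctness check and the trained verifier.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


def build_preference_pairs(
    question: str,
    solutions: Sequence[str],
    is_correct: Callable[[str, str], bool],
) -> List[Tuple[str, str, str]]:
    """Return (question, preferred, rejected) triples from mixed-quality solutions."""
    good = [s for s in solutions if is_correct(question, s)]
    bad = [s for s in solutions if not is_correct(question, s)]
    # Assumed pairing scheme: every correct solution against every incorrect one.
    return [(question, g, b) for g, b in product(good, bad)]


def select_best(
    question: str,
    candidates: Sequence[str],
    verifier_score: Callable[[str, str], float],
) -> str:
    """Pick the candidate the verifier scores highest (best-of-n selection)."""
    if not candidates:
        raise ValueError("need at least one candidate solution")
    return max(candidates, key=lambda c: verifier_score(question, c))
```

For example, `select_best("2+2", ["5", "4"], score)` returns whichever string the hypothetical `score` function rates higher; ties go to the earliest candidate because `max` keeps the first maximum.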
-
Opto-twistronic Hall effect in a three-dimensional spiral lattice
Authors:
Zhurun Ji,
Yuzhou Zhao,
Yicong Chen,
Ziyan Zhu,
Yuhui Wang,
Wenjing Liu,
Gaurav Modi,
Eugene J. Mele,
Song Jin,
Ritesh Agarwal
Abstract:
Studies of moire systems have elucidated the exquisite effect of quantum geometry on the electronic bands and their properties, leading to the discovery of new correlated phases. However, most experimental studies have been confined to a few layers in the 2D limit. The extension of twistronics to its 3D limit, where the twist is extended into the third dimension between adjacent layers, remains underexplored due to the challenges in precisely stacking layers. Here, we focus on 3D twistronics on a platform of self-assembled spiral superlattice of multilayered WS2. Our findings reveal an opto-twistronic Hall effect in the spiral superlattice. This mesoscopic response is an experimental manifestation of the noncommutative geometry that arises when translational symmetry is replaced by a non-symmorphic screw operation. We also discover signatures of altered laws of optical excitation, manifested as an unconventional photon momentum-lattice interaction owing to moire of moire modulations in the 3D twistronic system. Crucially, our findings mark the initial identification of higher-order quantum geometrical tensors in light-matter interactions. This breakthrough opens new avenues for designing quantum materials-based optical lattices with large nonlinearities, paving the way for the development of advanced quantum nanophotonic devices.
Submitted 18 December, 2023;
originally announced December 2023.
-
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Authors:
Avi Singh,
John D. Co-Reyes,
Rishabh Agarwal,
Ankesh Anand,
Piyush Patil,
Xavier Garcia,
Peter J. Liu,
James Harrison,
Jaehoon Lee,
Kelvin Xu,
Aaron Parisi,
Abhishek Kumar,
Alex Alemi,
Alex Rizkowsky,
Azade Nova,
Ben Adlam,
Bernd Bohnet,
Gamaleldin Elsayed,
Hanie Sedghi,
Igor Mordatch,
Isabelle Simpson,
Izzeddin Gur,
Jasper Snoek,
Jeffrey Pennington,
Jiri Hron
, et al. (16 additional authors not shown)
Abstract:
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.
Submitted 17 April, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
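Illustrative sketch (not from the paper): the three steps listed in the abstract above (generate and filter, fine-tune, repeat) can be written as a short loop. The callables `generate`, `is_correct`, and `fine_tune` are hypothetical placeholders for the model sampler, the binary feedback check, and the training step. The abstract does not say whether each round fine-tunes the base model or the latest one; this sketch passes the latest model and leaves that choice to `fine_tune`.

```python
from typing import Any, Callable, List, Sequence, Tuple


def rest_em(
    model: Any,
    problems: Sequence[Any],
    generate: Callable[[Any, Any], str],
    is_correct: Callable[[Any, str], bool],
    fine_tune: Callable[[Any, List[Tuple[Any, str]]], Any],
    iterations: int = 3,
    samples_per_problem: int = 4,
) -> Any:
    """Generate/filter/fine-tune loop with assumed, caller-supplied components."""
    for _ in range(iterations):
        kept: List[Tuple[Any, str]] = []
        for problem in problems:
            for _ in range(samples_per_problem):
                # (1) Sample from the current model and filter with binary feedback.
                solution = generate(model, problem)
                if is_correct(problem, solution):
                    kept.append((problem, solution))
        # (2) Fine-tune on the filtered samples; (3) the outer loop repeats.
        model = fine_tune(model, kept)
    return model


if __name__ == "__main__":
    # Toy stand-ins: the "model" is just a counter of fine-tuning rounds.
    toy = rest_em(
        model=0,
        problems=[(1, 2), (3, 4)],
        generate=lambda m, p: str(p[0] + p[1]),
        is_correct=lambda p, s: s == str(p[0] + p[1]),
        fine_tune=lambda m, kept: m + 1,
    )
    print(toy)
```

Keeping the three components as parameters makes the loop easy to exercise with toy stand-ins before plugging in a real sampler and trainer.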
-
Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy
Authors:
Max Schwarzer,
Jesse Farebrother,
Joshua Greaves,
Ekin Dogus Cubuk,
Rishabh Agarwal,
Aaron Courville,
Marc G. Bellemare,
Sergei Kalinin,
Igor Mordatch,
Pablo Samuel Castro,
Kevin M. Roccapriore
Abstract:
We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition probabilities. These learned transition dynamics are then leveraged to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.
Submitted 21 November, 2023;
originally announced November 2023.
-
Existence and multiplicity for fractional Dirichlet problem with $γ(ξ)$-Laplacian equation and Nehari manifold
Authors:
J. Vanterler da C. Sousa,
D. S. Oliveira,
Ravi P. Agarwal
Abstract:
This paper is divided into two parts. In the first part, we prove coercivity results and minimization of the Euler energy functional. In the second part, we focus on the existence and multiplicity of a positive solution of a fractional Dirichlet problem involving the $γ(ξ)$-Laplacian equation with non-negative weight functions in $\mathcal{H}^{α,β;χ}_{γ(ξ)}(Λ,\mathbb{R})$ using some variational techniques and the Nehari manifold.
Submitted 3 October, 2023;
originally announced November 2023.
-
EELBERT: Tiny Models through Dynamic Embeddings
Authors:
Gabrielle Cohn,
Rishika Agarwal,
Deepanshu Gupta,
Siddharth Patwardhan
Abstract:
We introduce EELBERT, an approach for compression of transformer-based models (e.g., BERT), with minimal impact on the accuracy of downstream tasks. This is achieved by replacing the input embedding layer of the model with dynamic, i.e. on-the-fly, embedding computations. Since the input embedding layer accounts for a significant fraction of the model size, especially for the smaller BERT variants, replacing this layer with an embedding computation function helps us reduce the model size significantly. Empirical evaluation on the GLUE benchmark shows that our BERT variants (EELBERT) suffer minimal regression compared to the traditional BERT models. Through this approach, we are able to develop our smallest model UNO-EELBERT, which achieves a GLUE score within 4% of fully trained BERT-tiny, while being 15x smaller (1.2 MB) in size.
Submitted 30 October, 2023;
originally announced October 2023.
-
Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research
Authors:
Cole Gulino,
Justin Fu,
Wenjie Luo,
George Tucker,
Eli Bronstein,
Yiren Lu,
Jean Harb,
Xinlei Pan,
Yan Wang,
Xiangyu Chen,
John D. Co-Reyes,
Rishabh Agarwal,
Rebecca Roelofs,
Yao Lu,
Nico Montali,
Paul Mougin,
Zoey Yang,
Brandyn White,
Aleksandra Faust,
Rowan McAllister,
Dragomir Anguelov,
Benjamin Sapp
Abstract:
Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of nuanced and complex multi-agent interactive behaviors. To address these challenges, we introduce Waymax, a new data-driven simulator for autonomous driving in multi-agent scenes, designed for large-scale simulation and testing. Waymax uses publicly-released, real-world driving data (e.g., the Waymo Open Motion Dataset) to initialize or play back a diverse set of multi-agent simulated scenarios. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training, making it suitable for modern large-scale, distributed machine learning workflows. To support online training and evaluation, Waymax includes several learned and hard-coded behavior models that allow for realistic interaction within simulation. To supplement Waymax, we benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions, where we highlight the effectiveness of routes as guidance for planning agents and the ability of RL to overfit against simulated agents.
Submitted 12 October, 2023;
originally announced October 2023.
-
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
Authors:
Yongchao Zhou,
Kaifeng Lyu,
Ankit Singh Rawat,
Aditya Krishna Menon,
Afshin Rostamizadeh,
Sanjiv Kumar,
Jean-François Kagy,
Rishabh Agarwal
Abstract:
Speculative decoding (SD) accelerates large language model inference by employing a faster draft model for generating multiple tokens, which are then verified in parallel by the larger target model, resulting in the text generated according to the target model distribution. However, identifying a compact draft model that is well-aligned with the target model is challenging. To tackle this issue, we propose DistillSpec that uses knowledge distillation to better align the draft model with the target model, before applying SD. DistillSpec makes two key design choices, which we demonstrate via systematic study to be crucial to improving the draft and target alignment: utilizing on-policy data generation from the draft model, and tailoring the divergence function to the task and decoding strategy. Notably, DistillSpec yields impressive 10 - 45% speedups over standard SD on a range of standard benchmarks, using both greedy and non-greedy sampling. Furthermore, we combine DistillSpec with lossy SD to achieve fine-grained control over the latency vs. task performance trade-off. Finally, in practical scenarios with models of varying sizes, first using distillation to boost the performance of the target model and then applying DistillSpec to train a well-aligned draft model can reduce decoding latency by 6-10x with minimal performance drop, compared to standard decoding without distillation.
Submitted 30 March, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
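The draft-then-verify loop described in the DistillSpec abstract can be illustrated with a toy sketch. This is not DistillSpec itself (there is no distillation here) and not the authors' code: it shows only the greedy, lossless form of standard speculative decoding. The names `speculative_step`, `speculative_decode`, `toy_target` and `toy_draft` are hypothetical, and the "models" are plain deterministic functions standing in for real networks.

```python
from typing import Callable, List

# A "model" here is just a greedy next-token function (an assumption for illustration).
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken, k: int) -> List[int]:
    """Run one draft-then-verify step and return the tokens to append."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed position. A real system does
    #    this in one parallel forward pass; this sketch loops for clarity.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every proposal matched, so the target contributes one extra token.
    accepted.append(target(ctx))
    return accepted


def speculative_decode(prompt: List[int], draft: NextToken, target: NextToken,
                       k: int, max_new: int) -> List[int]:
    """Generate max_new tokens with repeated speculative steps."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prompt) + max_new]


def greedy_decode(prompt: List[int], target: NextToken, max_new: int) -> List[int]:
    """Plain greedy decoding with the target model, for comparison."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def toy_target(ctx: List[int]) -> int:
    """Stand-in for the large target model."""
    return (sum(ctx) + len(ctx)) % 7


def toy_draft(ctx: List[int]) -> int:
    """Stand-in for the draft model: agrees with the target most of the time."""
    return toy_target(ctx) if len(ctx) % 3 else 0
```

In this greedy variant the output is identical to decoding with the target model alone; any speedup comes from how many draft tokens are accepted per step, which is why a better-aligned draft model matters.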
-
Uncertainty principles associated with the short time quaternion coupled fractional Fourier transform
Authors:
Bivek Gupta,
Amit K. Verma,
Ravi P. Agarwal
Abstract:
In this paper, we extend the coupled fractional Fourier transform of complex-valued functions to that of quaternion-valued functions on $\mathbb{R}^4$ and call it the quaternion coupled fractional Fourier transform (QCFrFT). We obtain the sharp Hausdorff-Young inequality for the QCFrFT and the associated Rényi uncertainty principle. We also define the short time quaternion coupled fractional Fourier transform (STQCFrFT) and explore its important properties, followed by Lieb's and entropy uncertainty principles.
Submitted 3 July, 2023;
originally announced September 2023.
-
Modelling Irregularly Sampled Time Series Without Imputation
Authors:
Rohit Agarwal,
Aman Sinha,
Dilip K. Prasad,
Marianne Clausel,
Alexander Horsch,
Mathieu Constant,
Xavier Coubez
Abstract:
Modelling irregularly-sampled time series (ISTS) is challenging because of missing values. Most existing methods focus on handling ISTS by converting irregularly sampled data into regularly sampled data via imputation. These models assume an underlying missing mechanism leading to unwanted bias and sub-optimal performance. We present SLAN (Switch LSTM Aggregate Network), which utilizes a pack of LSTMs to model ISTS without imputation, eliminating the assumption of any underlying process. It dynamically adapts its architecture on the fly based on the measured sensors. SLAN exploits the irregularity information to capture each sensor's local summary explicitly and maintains a global summary state throughout the observational period. We demonstrate the efficacy of SLAN on publicly available datasets, namely, MIMIC-III, Physionet 2012 and Physionet 2019. The code is available at https://github.com/Rohit102497/SLAN.
Submitted 15 September, 2023;
originally announced September 2023.
-
Linking Symptom Inventories using Semantic Textual Similarity
Authors:
Eamonn Kennedy,
Shashank Vadlamani,
Hannah M Lindsey,
Kelly S Peterson,
Kristen Dams OConnor,
Kenton Murray,
Ronak Agarwal,
Houshang H Amiri,
Raeda K Andersen,
Talin Babikian,
David A Baron,
Erin D Bigler,
Karen Caeyenberghs,
Lisa Delano-Wood,
Seth G Disner,
Ekaterina Dobryakova,
Blessen C Eapen,
Rachel M Edelstein,
Carrie Esopenko,
Helen M Genova,
Elbert Geuze,
Naomi J Goodrich-Hunsaker,
Jordan Grafman,
Asta K Haberg,
Cooper B Hodges
, et al. (57 additional authors not shown)
Abstract:
An extensive library of symptom inventories has been developed over time to measure clinical symptoms, but this variety has led to several long standing issues. Most notably, results drawn from different settings and studies are not comparable, which limits reproducibility. Here, we present an artificial intelligence (AI) approach using semantic textual similarity (STS) to link symptoms and scores across previously incongruous symptom inventories. We tested the ability of four pre-trained STS models to screen thousands of symptom description pairs for related content - a challenging task typically requiring expert panels. Models were tasked to predict symptom severity across four different inventories for 6,607 participants drawn from 16 international data sources. The STS approach achieved 74.8% accuracy across five tasks, outperforming other models tested. This work suggests that incorporating contextual, semantic information can assist expert decision-making processes, yielding gains for both general and disease-specific clinical assessment.
Submitted 8 September, 2023;
originally announced September 2023.
-
A Controllable Co-Creative Agent for Game System Design
Authors:
Rohan Agarwal,
Zhiyu Lin,
Mark Riedl
Abstract:
Many advancements have been made in procedural content generation for games, and, combined with mixed-initiative co-creativity, they have the potential for great benefits to human designers. However, co-creative systems for game generation are typically limited to specific genres, rules, or games, limiting the creativity of the designer. We seek to model games abstractly enough to apply to any genre, focusing on designing game systems and mechanics, and create a controllable, co-creative agent that can collaborate on these designs. We present a model of games using state-machine-like components and resource flows, a set of controllable metrics, a design evaluator simulating playthroughs with these metrics, and an evolutionary design balancer and generator. We find this system to be both able to express a wide range of games and able to be human-controllable for future co-creative applications.
Submitted 4 August, 2023;
originally announced August 2023.
-
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
Authors:
Rishabh Agarwal,
Nino Vieillard,
Yongchao Zhou,
Piotr Stanczyk,
Sabela Ramos,
Matthieu Geist,
Olivier Bachem
Abstract:
Knowledge distillation (KD) is widely used for compressing a teacher model to reduce its inference cost and memory footprint, by training a smaller student model. However, current KD methods for auto-regressive sequence models suffer from distribution mismatch between output sequences seen during training and those generated by the student during inference. To address this issue, we introduce Generalized Knowledge Distillation (GKD). Instead of solely relying on a fixed set of output sequences, GKD trains the student on its self-generated output sequences by leveraging feedback from the teacher on such sequences. Unlike supervised KD approaches, GKD also offers the flexibility to employ alternative loss functions between the student and teacher, which can be useful when the student lacks the expressivity to mimic the teacher's distribution. Furthermore, GKD facilitates the seamless integration of distillation with RL fine-tuning (RLHF). We demonstrate the efficacy of GKD for distilling auto-regressive language models on summarization, translation, and arithmetic reasoning tasks, and task-agnostic distillation for instruction-tuning.
Submitted 16 January, 2024; v1 submitted 23 June, 2023;
originally announced June 2023.
-
Bootstrapped Representations in Reinforcement Learning
Authors:
Charline Le Lan,
Stephen Tu,
Mark Rowland,
Anna Harutyunyan,
Rishabh Agarwal,
Marc G. Bellemare,
Will Dabney
Abstract:
In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated into the learning process and help shape the learnt state representation. Bootstrapping methods are today's method of choice to make these additional predictions. Yet, it is unclear which features these algorithms capture and how they relate to those from other auxiliary-task-based approaches. In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988). Surprisingly, we find that this representation differs from the features learned by Monte Carlo and residual gradient algorithms for most transition structures of the environment in the policy evaluation setting. We describe the efficacy of these representations for policy evaluation, and use our theoretical analysis to design new auxiliary learning rules. We complement our theoretical results with an empirical comparison of these learning rules for different cumulant functions on classic domains such as the four-room domain (Sutton et al, 1999) and Mountain Car (Moore, 1990).
Submitted 16 June, 2023;
originally announced June 2023.
-
Taxonomy of hybridly polarized Stokes vortex beams
Authors:
Gauri Arora,
Ankit Butola,
Ruchi Rajput,
Rohit Agarwal,
Krishna Agarwal,
Alexander Horsch,
Dilip K Prasad,
Paramasivam Senthilkumaran
Abstract:
Structured beams carrying topological defects, namely phase and Stokes singularities, have gained extensive interest in numerous areas of optics. The non-separable spin and orbital angular momentum states of hybridly polarized Stokes singular beams provide additional freedom for manipulating optical fields. However, the characterization of hybridly polarized Stokes vortex beams remains challenging owing to the degeneracy associated with the complex polarization structures of these beams. In addition, experimental noise factors such as relative phase, amplitude, and polarization difference together with beam fluctuations add to the perplexity in the identification process. Here, we present a generalized diffraction-based Stokes polarimetry approach assisted with deep learning for efficient identification of Stokes singular beams. A total of 15 classes of beams are considered based on the type of Stokes singularity and their associated mode indices. The resultant total and polarization component intensities of Stokes singular beams after diffraction through a triangular aperture are exploited by the deep neural network to recognize these beams. Our approach presents a classification accuracy of 98.67% for 15 types of Stokes singular beams that comprise several degenerate cases. The present study illustrates the potential of diffraction of the Stokes singular beam with polarization transformation, modeling of experimental noise factors, and a deep learning framework for characterizing hybridly polarized beams.
Submitted 9 June, 2023;
originally announced June 2023.
-
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Authors:
Max Schwarzer,
Johan Obando-Ceron,
Aaron Courville,
Marc Bellemare,
Rishabh Agarwal,
Pablo Samuel Castro
Abstract:
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.
Submitted 13 November, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
-
Karma: Resource Allocation for Dynamic Demands
Authors:
Midhul Vuppalapati,
Giannis Fikioris,
Rachit Agarwal,
Asaf Cidon,
Anurag Khandelwal,
Eva Tardos
Abstract:
We consider the problem of fair resource allocation in a system where user demands are dynamic, that is, where user demands vary over time. Our key observation is that the classical max-min fairness algorithm for resource allocation provides many desirable properties (e.g., Pareto efficiency, strategy-proofness, and fairness), but only under the strong assumption of user demands being static over time. For the realistic case of dynamic user demands, the max-min fairness algorithm loses one or more of these properties.
We present Karma, a new resource allocation mechanism for dynamic user demands. The key technical contribution in Karma is a credit-based resource allocation algorithm: in each quantum, users donate their unused resources and are assigned credits when other users borrow these resources; Karma carefully orchestrates the exchange of credits across users (based on their instantaneous demands, donated resources and borrowed resources), and performs prioritized resource allocation based on users' credits. We theoretically establish Karma guarantees related to Pareto efficiency, strategy-proofness, and fairness for dynamic user demands. Empirical evaluations over production workloads show that these properties translate well into practice: Karma is able to reduce disparity in performance across users to a bare minimum while maintaining Pareto-optimal system-wide performance.
Submitted 7 July, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
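For readers unfamiliar with the classical max-min fairness baseline that the Karma abstract refers to, the following is a minimal sketch of the standard progressive-filling procedure for a single resource with static demands. It is not Karma's credit-based mechanism, and the function name `max_min_fair` and its dictionary interface are assumptions made for this illustration.

```python
from typing import Dict


def max_min_fair(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Classical max-min fair allocation of one resource (progressive filling)."""
    alloc = {user: 0.0 for user in demands}
    remaining = float(capacity)
    # Users whose demand is not yet fully met.
    active = {user for user, demand in demands.items() if demand > 0}

    while active and remaining > 1e-12:
        # Offer every unsatisfied user an equal share of what is left.
        share = remaining / len(active)
        satisfied = {u for u in active if demands[u] - alloc[u] <= share}

        if not satisfied:
            # Nobody can be fully served: split the remainder evenly and stop.
            for u in active:
                alloc[u] += share
            remaining = 0.0
            break

        # Fully serve the small demands, then redistribute the leftover.
        for u in satisfied:
            remaining -= demands[u] - alloc[u]
            alloc[u] = demands[u]
        active -= satisfied

    return alloc
```

With a capacity of 10 and demands of 2, 4 and 8, the two smaller demands are met in full and the largest user receives the remaining 4 units. The sketch recomputes nothing across time steps, which is the static-demand assumption the abstract says breaks down when demands vary.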
-
Creativity as Variations on a Theme: Formalizations, Evidence, and Engineered Applications
Authors:
Rohan Agarwal
Abstract:
There are many philosophies and theories on what creativity is and how it works, but one popular idea is that of variations on a theme and intersection of concepts. This literature review explores philosophical proposals of how creativity emerges from variations on a theme, and how formalizations of these proposals in human subject studies and computational methods result in creativity. Specifically, the philosophical idea of intangible clouds of concepts is analyzed with empirical studies of concept representation and mental model formation, and mathematical formalizations of such ideas. Empirical findings on emergent neural activity from neural network combinations are also examined for evidence of novel, emergent ideas from the collision of existing ones. Finally, work on human-AI co-creativity is used as a lens for concept collision and the effectiveness of this model of creativity. This paper also proposes directions for further research in studying creativity as variations on a theme.
Submitted 7 May, 2023;
originally announced May 2023.
-
Echoes of Biases: How Stigmatizing Language Affects AI Performance
Authors:
Yizhi Liu,
Weiguang Wang,
Guodong Gordon Gao,
Ritu Agarwal
Abstract:
Electronic health records (EHRs) serve as an essential data source for the envisioned artificial intelligence (AI)-driven transformation in healthcare. However, clinician biases reflected in EHR notes can lead to AI models inheriting and amplifying these biases, perpetuating health disparities. This study investigates the impact of stigmatizing language (SL) in EHR notes on mortality prediction using a Transformer-based deep learning model and explainable AI (XAI) techniques. Our findings demonstrate that SL written by clinicians adversely affects AI performance, particularly so for black patients, highlighting SL as a source of racial disparity in AI model development. To explore an operationally efficient way to mitigate SL's impact, we investigate patterns in the generation of SL through a clinicians' collaborative network, identifying central clinicians as having a stronger impact on racial disparity in the AI model. We find that removing SL written by central clinicians is a more efficient bias reduction strategy than eliminating all SL in the entire corpus of data. This study provides actionable insights for responsible AI development and contributes to understanding clinician behavior and EHR note writing in healthcare.
Submitted 12 June, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
Beyond Prompts: Exploring the Design Space of Mixed-Initiative Co-Creativity Systems
Authors:
Zhiyu Lin,
Upol Ehsan,
Rohan Agarwal,
Samihan Dani,
Vidushi Vashishth,
Mark Riedl
Abstract:
Generative Artificial Intelligence systems have been developed for image, code, story, and game generation with the goal of facilitating human creativity. Recent work on neural generative systems has emphasized one particular means of interacting with AI systems: the user provides a specification, usually in the form of prompts, and the AI system generates the content. However, there are other configurations of human and AI coordination, such as co-creativity (CC) in which both human and AI systems can contribute to content creation, and mixed-initiative (MI) in which both human and AI systems can initiate content changes. In this paper, we define a hypothetical human-AI configuration design space consisting of different means for humans and AI systems to communicate creative intent to each other. We conduct a human participant study with 185 participants to understand how users want to interact with differently configured MI-CC systems. We find that MI-CC systems with more extensive coverage of the design space are rated higher or on par on a variety of creative and goal-completion metrics, demonstrating that wider coverage of the design space can improve user experience and achievement when using the system. Preference varies greatly between expertise groups, suggesting the development of adaptive, personalized MI-CC systems. Participants identified new design space dimensions including scrutability -- the ability to poke and prod at models -- and explainability.
Submitted 3 May, 2023;
originally announced May 2023.
-
Discrete Rubio de Francia extrapolation theorem via factorization of weights and iterated algorithms
Authors:
S. H. Saker,
A. I. Saied,
R. P. Agarwal
Abstract:
In this paper, we prove a discrete Rubio de Francia extrapolation theorem via factorization of discrete Muckenhoupt weights and discrete iterated Rubio de Francia algorithm and its duality.
Submitted 27 April, 2023;
originally announced April 2023.
-
Optically induced symmetry breaking due to nonequilibrium steady state formation in charge density wave material 1T-TiSe2
Authors:
Harshvardhan Jog,
Luminita Harnagea,
Dibyata Rout,
Takashi Taniguchi,
Kenji Watanabe,
Eugene J. Mele,
Ritesh Agarwal
Abstract:
The strongly correlated charge density wave (CDW) phase of 1T-TiSe$_2$ is being extensively researched to verify the claims of a unique chiral order due to the presence of three equivalent Fermi wavevectors involved in the CDW formation. Characterization of the symmetries is therefore critical to understand the origin of their intriguing properties but can be complicated by the coupling of the electronic and lattice degrees of freedom. Here we use continuous wave laser excitation to probe the symmetries of TiSe$_2$ using the circular photogalvanic effect with very high sensitivity. We observe that the ground state of the CDW phase is achiral. However, laser excitation above a threshold intensity transforms TiSe$_2$ into a chiral phase in a nonequilibrium steady state, which changes the electronic correlations in the stacking direction of the layered material. The inherent sensitivity of the photogalvanic technique provides clear evidence of the different optically driven phases of 1T-TiSe$_2$, as well as emphasizes the interplay of electronic and lattice degrees of freedom in this system under optical excitation. Our work demonstrates that optically induced phase change can occur at extremely low optical intensities in strongly correlated materials, providing a pathway for future studies to engineer new phases using light.
Submitted 19 November, 2023; v1 submitted 25 April, 2023;
originally announced April 2023.
-
Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks
Authors:
Jesse Farebrother,
Joshua Greaves,
Rishabh Agarwal,
Charline Le Lan,
Ross Goroshin,
Pablo Samuel Castro,
Marc G. Bellemare
Abstract:
Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent's network are simultaneously increased. For this purpose, we derive a new family of auxiliary tasks based on the successor measure. These tasks are easy to implement and have appealing theoretical properties. Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)'s proto-value functions to deep reinforcement learning -- accordingly, we call the resulting object proto-value networks. Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment's reward function.
Submitted 25 April, 2023;
originally announced April 2023.
-
Hankel determinant for a general subclass of m-fold symmetric bi-univalent functions defined by Ruscheweyh operator
Authors:
Pishtiwan Othman Sabir,
Ravi P. Agarwal,
Shabaz Jalil MohammedFaeq,
Pshtiwan Othman Mohammed,
Nejmeddine Chorfi,
Thabet Abdeljawad
Abstract:
Making use of the Hankel determinant and the Ruscheweyh derivative, in this work, we consider a general subclass of m-fold symmetric normalized bi-univalent functions defined in the open unit disk. Moreover, we investigate the bounds for the second Hankel determinant of this class and some consequences of the results are presented. In addition, to demonstrate the accuracy on some functions and conditions, most general programs are written in Python V.3.8.8 (2021).
Submitted 30 August, 2023; v1 submitted 23 April, 2023;
originally announced April 2023.
-
Catch Me If You Can: Identifying Fraudulent Physician Reviews with Large Language Models Using Generative Pre-Trained Transformers
Authors:
Aishwarya Deep Shukla,
Laksh Agarwal,
Jie Mein Goh,
Guodong Gao,
Ritu Agarwal
Abstract:
The proliferation of fake reviews of doctors has potentially detrimental consequences for patient well-being and has prompted concern among consumer protection groups and regulatory bodies. Yet despite significant advancements in the fields of machine learning and natural language processing, there remains limited comprehension of the characteristics differentiating fraudulent from authentic reviews. This study utilizes a novel pre-labeled dataset of 38048 physician reviews to establish the effectiveness of large language models in classifying reviews. Specifically, we compare the performance of traditional ML models, such as logistic regression and support vector machines, to generative pre-trained transformer models. Furthermore, we use GPT-4, the newest model in the GPT family, to uncover the key dimensions along which fake and genuine physician reviews differ. Our findings reveal significantly superior performance of GPT-3 over traditional ML models in this context. Additionally, our analysis suggests that GPT-3 requires a smaller training sample than traditional models, suggesting its appropriateness for tasks with scarce training data. Moreover, the superiority of GPT-3 performance increases in the cold-start context, i.e., when there are no prior reviews of a doctor. Finally, we employ GPT-4 to reveal the crucial dimensions that distinguish fake physician reviews. In sharp contrast to previous findings in the literature that were obtained using simulated data, our findings from a real-world dataset show that fake reviews are generally more clinically detailed, more reserved in sentiment, and have better structure and grammar than authentic ones.
Submitted 19 April, 2023;
originally announced April 2023.
-
Understanding Rug Pulls: An In-Depth Behavioral Analysis of Fraudulent NFT Creators
Authors:
Trishie Sharma,
Rachit Agarwal,
Sandeep Kumar Shukla
Abstract:
The explosive growth of non-fungible tokens (NFTs) on Web3 has created a new frontier for digital art and collectibles, but also an emerging space for fraudulent activities. This study provides an in-depth analysis of NFT rug pulls, which are fraudulent schemes aimed at stealing investors' funds. Using data from 758 rug pulls across 10 NFT marketplaces, we examine the structural and behavioral properties of these schemes, identify the characteristics and motivations of rug-pullers, and classify NFT projects into groups based on creators' association with their accounts. Our findings reveal that repeated rug pulls account for a significant proportion of the rise in NFT-related cryptocurrency crimes, with one NFT collection attempting 37 rug pulls within three months. Additionally, we identify the largest group of creators influencing the majority of rug pulls, and demonstrate the connection between rug-pullers of different NFT projects through the use of the same wallets to store and move money. Our study contributes to the understanding of NFT market risks and provides insights for designing preventative strategies to mitigate future losses.
Submitted 15 April, 2023;
originally announced April 2023.
-
CIRCLE: Capture In Rich Contextual Environments
Authors:
Joao Pedro Araujo,
Jiaman Li,
Karthik Vetrivel,
Rishi Agarwal,
Deepak Gopinath,
Jiajun Wu,
Alexander Clegg,
C. Karen Liu
Abstract:
Synthesizing 3D human motion in a contextual, ecological environment is important for simulating realistic activities people perform in the real world. However, conventional optics-based motion capture systems are not suited for simultaneously capturing human movements and complex scenes. The lack of rich contextual 3D human motion datasets presents a roadblock to creating high-quality generative human motion models. We propose a novel motion acquisition system in which the actor perceives and operates in a highly contextual virtual world while being motion captured in the real world. Our system enables rapid collection of high-quality human motion in highly diverse scenes, without the concern of occlusion or the need for physical scene construction in the real world. We present CIRCLE, a dataset containing 10 hours of full-body reaching motion from 5 subjects across nine scenes, paired with ego-centric information of the environment represented in various forms, such as RGBD videos. We use this dataset to train a model that generates human motion conditioned on scene information. Leveraging our dataset, the model learns to use ego-centric scene information to achieve nontrivial reaching tasks in the context of complex 3D scenes. To download the data please visit https://stanford-tml.github.io/circle_dataset/.
Submitted 31 March, 2023;
originally announced March 2023.
-
Absence of topological protection of the interface states in $\mathbb{Z}_2$ photonic crystals
Authors:
Shupeng Xu,
Yuhui Wang,
Ritesh Agarwal
Abstract:
Inspired by electronic systems, topological photonics aims to engineer new optical devices with robust properties. In many cases, the ideas from topological phases protected by internal symmetries in fermionic systems are extended to those protected by crystalline symmetries. One such popular photonic crystal model was proposed by Wu and Hu in 2015 for realizing a bosonic $\mathbb{Z}_2$ topological crystalline insulator with robust topological edge states, which led to intense theoretical and experimental studies. However, a rigorous relationship between the bulk topology and edge properties for this model, which is central to evaluating its advantage over traditional photonic designs, has never been established. In this work we revisit the expanded and shrunken honeycomb lattice structures proposed by Wu and Hu by using topological quantum chemistry tools and show that they are topologically trivial in the sense that symmetric, localized Wannier functions can be constructed. We show that the $\mathbb{Z}$ and $\mathbb{Z}_2$ type classifications of the Wu-Hu model are equivalent to the $C_2T$ protected Euler class and the second Stiefel-Whitney class respectively, with the latter characterizing the full valence bands of the Wu-Hu model, indicating only a higher order topological insulator (HOTI) phase. We show that the Wu-Hu interface states can be gapped by a uniform topology preserving $C_6$ and $T$ symmetric perturbation, which demonstrates the trivial nature of the interface. Our results reveal that topology is not a necessary condition for the reported helical edge states in many photonics systems and open new possibilities for interface engineering that may not be constrained to require topological designs.
Submitted 22 March, 2023;
originally announced March 2023.
-
Machine learning based biomedical image processing for echocardiographic images
Authors:
Ayesha Heena,
Nagashettappa Biradar,
Najmuddin M. Maroof,
Surbhi Bhatia,
Rashmi Agarwal,
Kanta Prasad
Abstract:
The popularity of artificial intelligence and machine learning has prompted researchers to use them in recent research. The proposed method uses the K-Nearest Neighbor (KNN) algorithm for segmentation of medical images, extracting image features for analysis by classifying the data based on neural networks. Classification of images is very important in medical imaging; KNN is one suitable algorithm which is simple, both conceptually and computationally, and provides very good accuracy. The KNN algorithm is a user-friendly approach with a wide range of applications in machine learning, and is widely used for image processing applications including classification, segmentation and regression. The proposed system uses gray level co-occurrence matrix features. The trained neural network has been tested successfully on a group of echocardiographic images, and errors were compared using a regression plot. The results of the algorithm are tested using various quantitative as well as qualitative metrics and shown to exhibit better performance on both than current state-of-the-art methods in the related area. To compare the performance of the trained neural network, a regression analysis was performed, which showed a good correlation.
Submitted 16 March, 2023;
originally announced March 2023.
-
Towards a Muon Collider
Authors:
Carlotta Accettura,
Dean Adams,
Rohit Agarwal,
Claudia Ahdida,
Chiara Aimè,
Nicola Amapane,
David Amorim,
Paolo Andreetto,
Fabio Anulli,
Robert Appleby,
Artur Apresyan,
Aram Apyan,
Sergey Arsenyev,
Pouya Asadi,
Mohammed Attia Mahmoud,
Aleksandr Azatov,
John Back,
Lorenzo Balconi,
Laura Bandiera,
Roger Barlow,
Nazar Bartosik,
Emanuela Barzi,
Fabian Batsch,
Matteo Bauce,
J. Scott Berg
, et al. (272 additional authors not shown)
Abstract:
A muon collider would enable the big jump ahead in energy reach that is needed for a fruitful exploration of fundamental interactions. The challenges of producing muon collisions at high luminosity and 10 TeV centre of mass energy are being investigated by the recently-formed International Muon Collider Collaboration. This Review summarises the status of and recent advances in muon collider design, physics and detector studies. The aim is to provide a global perspective of the field and to outline directions for future work.
Submitted 27 November, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Aux-Drop: Handling Haphazard Inputs in Online Learning Using Auxiliary Dropouts
Authors:
Rohit Agarwal,
Deepak Gupta,
Alexander Horsch,
Dilip K. Prasad
Abstract:
Many real-world applications based on online learning produce streaming data that is haphazard in nature, i.e., contains missing features, features becoming obsolete in time, the appearance of new features at later points in time and a lack of clarity on the total number of input features. These challenges make it hard to build a learnable system for such applications, and almost no work exists in deep learning that addresses this issue. In this paper, we present Aux-Drop, an auxiliary dropout regularization strategy for online learning that handles the haphazard input features in an effective manner. Aux-Drop adapts the conventional dropout regularization scheme for the haphazard input feature space, ensuring that the final output is minimally impacted by the chaotic appearance of such features. It helps prevent co-adaptation, especially of the auxiliary and base features, and reduces the strong dependence of the output on any of the auxiliary inputs of the model. This helps in better learning for scenarios where certain features disappear in time or when new features are to be modelled. The efficacy of Aux-Drop has been demonstrated through extensive numerical experiments on SOTA benchmarking datasets that include Italy Power Demand, HIGGS, SUSY and multiple UCI datasets. The code is available at https://github.com/Rohit102497/Aux-Drop.
Submitted 31 May, 2023; v1 submitted 9 March, 2023;
originally announced March 2023.
-
MABNet: Master Assistant Buddy Network with Hybrid Learning for Image Retrieval
Authors:
Rohit Agarwal,
Gyanendra Das,
Saksham Aggarwal,
Alexander Horsch,
Dilip K. Prasad
Abstract:
Image retrieval has garnered growing interest in recent times. The current approaches are either supervised or self-supervised. These methods do not exploit the benefits of hybrid learning using both supervision and self-supervision. We present a novel Master Assistant Buddy Network (MABNet) for image retrieval which incorporates both learning mechanisms. MABNet consists of master and assistant blocks, both learning independently through supervision and collectively via self-supervision. The master guides the assistant by providing its knowledge base as a reference for self-supervision and the assistant reports its knowledge back to the master by weight transfer. We perform extensive experiments on public datasets with and without post-processing.
Submitted 6 March, 2023;
originally announced March 2023.
-
The Dormant Neuron Phenomenon in Deep Reinforcement Learning
Authors:
Ghada Sokar,
Rishabh Agarwal,
Pablo Samuel Castro,
Utku Evci
Abstract:
In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity. We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective method (ReDo) that Recycles Dormant neurons throughout training. Our experiments demonstrate that ReDo maintains the expressive power of networks by reducing the number of dormant neurons and results in improved performance.
Submitted 13 June, 2023; v1 submitted 24 February, 2023;
originally announced February 2023.
-
Efficient 3D Object Reconstruction using Visual Transformers
Authors:
Rohan Agarwal,
Wei Zhou,
Xiaofeng Wu,
Yuhan Li
Abstract:
Reconstructing a 3D object from a 2D image is a well-researched vision problem, with many kinds of deep learning techniques having been tried. Most commonly, 3D convolutional approaches are used, though previous work has shown state-of-the-art methods using 2D convolutions that are also significantly more efficient to train. With the recent rise of transformers for vision tasks, often outperforming convolutional methods, along with some earlier attempts to use transformers for 3D object reconstruction, we set out to use visual transformers in place of convolutions in existing efficient, high-performing techniques for 3D object reconstruction in order to achieve superior results on the task. Using a transformer-based encoder and decoder to predict 3D structure from 2D images, we achieve accuracy similar or superior to the baseline approach. This study serves as evidence for the potential of visual transformers in the task of 3D object reconstruction.
Submitted 16 February, 2023;
originally announced February 2023.
-
Revisiting Bellman Errors for Offline Model Selection
Authors:
Joshua P. Zitovsky,
Daniel de Marchi,
Rishabh Agarwal,
Michael R. Kosorok
Abstract:
Offline model selection (OMS), that is, choosing the best policy from a set of many policies given only logged data, is crucial for applying offline RL in real-world settings. One idea that has been extensively explored is to select policies based on the mean squared Bellman error (MSBE) of the associated Q-functions. However, previous work has struggled to obtain adequate OMS performance with Bellman errors, leading many researchers to abandon the idea. To this end, we elucidate why previous work has seen pessimistic results with Bellman errors and identify conditions under which OMS algorithms based on Bellman errors will perform well. Moreover, we develop a new estimator of the MSBE that is more accurate than prior methods. Our estimator obtains impressive OMS performance on diverse discrete control tasks, including Atari games.
Submitted 6 June, 2023; v1 submitted 31 January, 2023;
originally announced February 2023.
-
A New Shrinking projection Algorithm for an infinite family of Bregman weak relatively nonexpansive mappings in a Banach Space
Authors:
Bijan Orouji,
Ebrahim Soori,
Donal O'Regan,
Ravi P. Agarwal
Abstract:
In this paper, using a new shrinking projection method and generalized resolvents of maximal monotone operators and generalized projections, we consider the strong convergence for finding a common point of the fixed points of a Bregman quasi-nonexpansive mapping, and common fixed points of an infinite family of Bregman weak relatively nonexpansive mappings, and common zero points of a finite family of maximal monotone mappings, and common solutions of an equilibrium problem in a reflexive Banach space.
△ Less
Submitted 20 April, 2023; v1 submitted 13 December, 2022;
originally announced December 2022.
-
A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces
Authors:
Charline Le Lan,
Joshua Greaves,
Jesse Farebrother,
Mark Rowland,
Fabian Pedregosa,
Rishabh Agarwal,
Marc G. Bellemare
Abstract:
Many machine learning problems encode their data as a matrix with a possibly very large number of rows and columns. In several applications like neuroscience, image compression or deep reinforcement learning, the principal subspace of such a matrix provides a useful, low-dimensional representation of individual data. Here, we are interested in determining the $d$-dimensional principal subspace of a given matrix from sample entries, i.e. from small random submatrices. Although a number of sample-based methods exist for this problem (e.g. Oja's rule \citep{oja1982simplified}), these assume access to full columns of the matrix or particular matrix structure such as symmetry and cannot be combined as-is with neural networks \citep{baldi1989neural}. In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace is represented by a neural network, and hence can be scaled to datasets with an effectively infinite number of rows and columns. Our method consists in defining a loss function whose minimizer is the desired principal subspace, and constructing a gradient estimate of this loss whose bias can be controlled. We complement our theoretical analysis with a series of experiments on synthetic matrices, the MNIST dataset \citep{lecun2010mnist} and the reinforcement learning domain PuddleWorld \citep{sutton1995generalization} demonstrating the usefulness of our approach.
Submitted 7 December, 2022;
originally announced December 2022.
-
Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes
Authors:
Aviral Kumar,
Rishabh Agarwal,
Xinyang Geng,
George Tucker,
Sergey Levine
Abstract:
The potential of offline reinforcement learning (RL) is that high-capacity models trained on large, heterogeneous datasets can lead to agents that generalize broadly, analogously to similar advances in vision and NLP. However, recent works argue that offline RL methods encounter unique challenges to scaling up model capacity. Drawing on the learnings from these works, we re-examine previous design choices and find that with appropriate choices (ResNets, cross-entropy based distributional backups, and feature normalization), offline Q-learning algorithms exhibit strong performance that scales with model capacity. Using multi-task Atari as a testbed for scaling and generalization, we train a single policy on 40 games with near-human performance using up to 80 million parameter networks, finding that model performance scales favorably with capacity. In contrast to prior work, we extrapolate beyond dataset performance even when trained entirely on a large (400M transitions) but highly suboptimal dataset (51% human-level performance). Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal. Finally, we show that offline Q-learning with a diverse dataset is sufficient to learn powerful representations that facilitate rapid transfer to novel games and fast online learning on new variations of a training game, improving over existing state-of-the-art representation learning approaches.
Submitted 17 April, 2023; v1 submitted 28 November, 2022;
originally announced November 2022.