-
Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning
Authors:
Jacob Mitchell Springer,
Vaishnavh Nagarajan,
Aditi Raghunathan
Abstract:
Sharpness-Aware Minimization (SAM) has emerged as a promising alternative optimizer to stochastic gradient descent (SGD). The originally-proposed motivation behind SAM was to bias neural networks towards flatter minima that are believed to generalize better. However, recent studies have shown conflicting evidence on the relationship between flatness and generalization, suggesting that flatness does not fully explain SAM's success. Sidestepping this debate, we identify an orthogonal effect of SAM that is beneficial out-of-distribution: we argue that SAM implicitly balances the quality of diverse features. SAM achieves this effect by adaptively suppressing well-learned features, which gives the remaining features an opportunity to be learned. We show that this mechanism is beneficial in datasets that contain redundant or spurious features where SGD falls for the simplicity bias and would not otherwise learn all available features. Our insights are supported by experiments on real data: we demonstrate that SAM improves the quality of features in datasets containing redundant or spurious features, including CelebA, Waterbirds, CIFAR-MNIST, and DomainBed.
Submitted 30 May, 2024;
originally announced May 2024.
-
Lower Bound on the Greedy Approximation Ratio for Adaptive Submodular Cover
Authors:
Blake Harris,
Viswanath Nagarajan
Abstract:
We show that the greedy algorithm for adaptive-submodular cover has approximation ratio at least 1.3*(1+ln Q). Moreover, the instance demonstrating this gap has Q=1. So, it invalidates a prior result in the paper ``Adaptive Submodularity: A New Approach to Active Learning and Stochastic Optimization'' by Golovin-Krause, that claimed a (1+ln Q)^2 approximation ratio for the same algorithm.
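To see why an instance with Q=1 contradicts the claimed bound, one can substitute Q=1 into both expressions. This is only a worked check using the two quantities stated in the abstract, not a reproduction of the paper's construction:

```latex
\[
(1+\ln Q)^2 \,\Big|_{Q=1} = (1+0)^2 = 1,
\qquad
1.3\,(1+\ln Q) \,\Big|_{Q=1} = 1.3\,(1+0) = 1.3 .
\]
```

At Q=1 the claimed ratio would force greedy to be optimal, whereas the lower bound says greedy can be a factor of at least 1.3 worse.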
Submitted 23 May, 2024;
originally announced May 2024.
-
The pitfalls of next-token prediction
Authors:
Gregor Bachmann,
Vaishnavh Nagarajan
Abstract:
Can a mere next-token predictor faithfully model human intelligence? We crystallize this emerging concern and correct popular misconceptions surrounding it, and advocate a simple multi-token objective.
As a starting point, we argue that the two often-conflated phases of next-token prediction -- autoregressive inference and teacher-forced training -- must be treated distinctly. The popular criticism that errors can compound during autoregressive inference, crucially assumes that teacher-forcing has learned an accurate next-token predictor. This assumption sidesteps a more deep-rooted problem we expose: in certain classes of tasks, teacher-forcing can simply fail to learn an accurate next-token predictor in the first place. We describe a general mechanism of how teacher-forcing can fail, and design a minimal planning task where both the Transformer and the Mamba architecture empirically fail in that manner -- remarkably, despite the task being straightforward to learn.
Finally, we provide preliminary evidence that this failure can be resolved using a simple modification that predicts multiple tokens in advance. We hope this finding can ground future debates and inspire explorations beyond the next-token prediction paradigm. We make our code available under https://github.com/gregorbachmann/Next-Token-Failures
Submitted 5 July, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Semi-Bandit Learning for Monotone Stochastic Optimization
Authors:
Arpit Agarwal,
Rohan Ghuge,
Viswanath Nagarajan
Abstract:
Stochastic optimization is a widely used approach for optimization under uncertainty, where uncertain input parameters are modeled by random variables. Exact or approximation algorithms have been obtained for several fundamental problems in this area. However, a significant limitation of this approach is that it requires full knowledge of the underlying probability distributions. Can we still get good (approximation) algorithms if these distributions are unknown, and the algorithm needs to learn them through repeated interactions? In this paper, we resolve this question for a large class of "monotone" stochastic problems, by providing a generic online learning algorithm with $\sqrt{T \log T}$ regret relative to the best approximation algorithm (under known distributions). Importantly, our online algorithm works in a semi-bandit setting, where in each period, the algorithm only observes samples from the r.v.s that were actually probed. Our framework applies to several fundamental problems in stochastic optimization such as prophet inequality, Pandora's box, stochastic knapsack, stochastic matchings and stochastic submodular optimization.
Submitted 24 December, 2023;
originally announced December 2023.
-
Optimal Decision Tree with Noisy Outcomes
Authors:
Su Jia,
Fatemeh Navidi,
Viswanath Nagarajan,
R. Ravi
Abstract:
In pool-based active learning, the learner is given an unlabeled data set and aims to efficiently learn the unknown hypothesis by querying the labels of the data points. This can be formulated as the classical Optimal Decision Tree (ODT) problem: Given a set of tests, a set of hypotheses, and an outcome for each pair of test and hypothesis, our objective is to find a low-cost testing procedure (i.e., decision tree) that identifies the true hypothesis. This optimization problem has been extensively studied under the assumption that each test generates a deterministic outcome. However, in numerous applications, for example, clinical trials, the outcomes may be uncertain, which renders the ideas from the deterministic setting invalid. In this work, we study a fundamental variant of the ODT problem in which some test outcomes are noisy, even in the more general case where the noise is persistent, i.e., repeating a test gives the same noisy output. Our approximation algorithms provide guarantees that are nearly best possible and hold for the general case of a large number of noisy outcomes per test or per hypothesis where the performance degrades continuously with this number. We numerically evaluated our algorithms for identifying toxic chemicals and learning linear classifiers, and observed that our algorithms have costs very close to the information-theoretic minimum.
Submitted 23 December, 2023;
originally announced December 2023.
-
Low Resistance Ohmic Contact to P-type Monolayer WSe2
Authors:
Jingxu Xie,
Zuocheng Zhang,
Haodong Zhang,
Vikram Nagarajan,
Wenyu Zhao,
Haleem Kim,
Collin Sanborn,
Ruishi Qi,
Sudi Chen,
Salman Kahn,
Kenji Watanabe,
Takashi Taniguchi,
Alex Zettl,
Michael Crommie,
James Analytis,
Feng Wang
Abstract:
Advanced microelectronics in the future may require semiconducting channel materials beyond silicon. Two-dimensional (2D) semiconductors, characterized by their atomic thinness, hold immense promise for high-performance electronic devices at the nanometer scale with lower heat dissipation. One challenge for achieving high-performance 2D semiconductor field effect transistors (FET), especially for p-type materials, is the high electrical contact resistance present at the metal-semiconductor interface. In conventional bulk semiconductors, low resistance ohmic contact is realized through heavy substitutional doping with acceptor or donor impurities at the contact region. The strategy of substitutional doping, however, does not work for p-type 2D semiconductors such as monolayer tungsten diselenide (WSe$_2$). In this study, we developed highly efficient charge-transfer doping with WSe$_2$/$α$-RuCl$_3$ heterostructures to achieve low-resistance ohmic contact for p-type WSe$_2$ transistors. We show that hole doping as high as 3$\times$10$^{13}$ cm$^{-2}$ can be achieved in the WSe$_2$/$α$-RuCl$_3$ heterostructure due to its type-III band alignment. This results in an ohmic contact with resistance lower than 4 k$Ω$ $μ$m at the p-type monolayer WSe$_2$/metal junction at room temperature. Using this low-resistance contact, we demonstrate high-performance p-type WSe$_2$ transistors with a saturation current of 35 $μ$A$\cdot$$μ$m$^{-1}$ and an I$_{ON}$/I$_{OFF}$ ratio exceeding 10$^9$. This could enable future microelectronic devices based on 2D semiconductors and contribute to the extension of Moore's law.
Submitted 8 December, 2023;
originally announced December 2023.
-
Informative Path Planning with Limited Adaptivity
Authors:
Rayen Tan,
Rohan Ghuge,
Viswanath Nagarajan
Abstract:
We consider the informative path planning ($\mathtt{IPP}$) problem in which a robot interacts with an uncertain environment and gathers information by visiting locations. The goal is to minimize its expected travel cost to cover a given submodular function. Adaptive solutions, where the robot incorporates all available information to select the next location to visit, achieve the best objective. However, such a solution is resource-intensive as it entails recomputing after every visited location. A more practical approach is to design solutions with a small number of adaptive "rounds", where the robot recomputes only once at the start of each round. In this paper, we design an algorithm for $\mathtt{IPP}$ parameterized by the number $k$ of adaptive rounds, and prove a smooth trade-off between $k$ and the solution quality (relative to fully adaptive solutions). We validate our theoretical results by experiments on a real road network, where we observe that a few rounds of adaptivity suffice to obtain solutions of cost almost as good as fully-adaptive ones.
Submitted 21 November, 2023;
originally announced November 2023.
-
Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing
Authors:
Durga Mandarapu,
Vani Nagarajan,
Artem Pelenitsyn,
Milind Kulkarni
Abstract:
High-performance implementations of $k$-Nearest Neighbor Search ($k$NN) in low dimensions use tree-based data structures. Tree algorithms are hard to parallelize on GPUs due to their irregularity. However, newer Nvidia GPUs offer hardware support for tree operations through ray-tracing cores. Recent works have proposed using RT cores to implement $k$NN search, but they all have a hardware-imposed constraint on the distance metric used in the search -- the Euclidean distance. We propose and implement two reductions to support $k$NN for a broad range of distances other than the Euclidean distance: Arkade Filter-Refine and Arkade Monotone Transformation, each of which allows non-Euclidean distance-based nearest neighbor queries to be performed in terms of the Euclidean distance. With our reductions, we observe that $k$NN search time speedups range between $1.6$x-$200$x and $1.3$x-$33.1$x over various state-of-the-art GPU shader core and RT core baselines, respectively. In evaluation, we provide several insights on RT architectures' ability to efficiently build and traverse the tree by analyzing the $k$NN search time trends.
Submitted 21 April, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
What do larger image classifiers memorise?
Authors:
Michal Lukasik,
Vaishnavh Nagarajan,
Ankit Singh Rawat,
Aditya Krishna Menon,
Sanjiv Kumar
Abstract:
The success of modern neural networks has prompted study of the connection between memorisation and generalisation: overparameterised models generalise well, despite being able to perfectly fit (memorise) completely random labels. To carefully study this issue, Feldman proposed a metric to quantify the degree of memorisation of individual training examples, and empirically computed the corresponding memorisation profile of a ResNet on image classification benchmarks. While an exciting first glimpse into what real-world models memorise, this leaves open a fundamental question: do larger neural models memorise more? We present a comprehensive empirical analysis of this question on image classification benchmarks. We find that training examples exhibit an unexpectedly diverse set of memorisation trajectories across model sizes: most samples experience decreased memorisation under larger models, while the rest exhibit cap-shaped or increasing memorisation. We show that various proxies for the Feldman memorisation score fail to capture these fundamental trends. Lastly, we find that knowledge distillation, an effective and popular model compression technique, tends to inhibit memorisation, while also improving generalisation. Specifically, memorisation is mostly inhibited on examples with increasing memorisation trajectories, thus pointing at how distillation improves generalisation.
Submitted 8 October, 2023;
originally announced October 2023.
-
The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning
Authors:
Tian Jin,
Nolan Clement,
Xin Dong,
Vaishnavh Nagarajan,
Michael Carbin,
Jonathan Ragan-Kelley,
Gintare Karolina Dziugaite
Abstract:
How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techniques -- weight pruning and simply training a smaller or larger model, which we refer to as dense scaling -- and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference. By curating a suite of tasks that help disentangle these two capabilities, we find a striking difference in how these two abilities evolve due to scaling. Reducing the model size by more than 30\% (via either scaling approach) significantly decreases the ability to recall facts seen in pre-training. Yet, a 60--70\% reduction largely preserves the various ways the model can process in-context information, ranging from retrieving answers from a long context to learning parameterized functions from in-context exemplars. The fact that both dense scaling and weight pruning exhibit this behavior suggests that scaling model size has an inherently disparate effect on fact recall and in-context learning.
Submitted 6 October, 2023;
originally announced October 2023.
-
Think before you speak: Training Language Models With Pause Tokens
Authors:
Sachin Goyal,
Ziwei Ji,
Ankit Singh Rawat,
Aditya Krishna Menon,
Sanjiv Kumar,
Vaishnavh Nagarajan
Abstract:
Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token. What if instead we were to let the model manipulate say, $K+10$ hidden vectors, before it outputs the $(K+1)^{th}$ token? We operationalize this idea by performing training and inference on language models with a (learnable) $\textit{pause}$ token, a sequence of which is appended to the input prefix. We then delay extracting the model's outputs until the last pause token is seen, thereby allowing the model to process extra computation before committing to an answer. We empirically evaluate $\textit{pause-training}$ on decoder-only models of 1B and 130M parameters with causal pretraining on C4, and on downstream tasks covering reasoning, question-answering, general understanding and fact recall. Our main finding is that inference-time delays show gains when the model is both pre-trained and finetuned with delays. For the 1B model, we witness gains on 8 of 9 tasks, most prominently, a gain of $18\%$ EM score on the QA task of SQuAD, $8\%$ on CommonSenseQA and $1\%$ accuracy on the reasoning task of GSM8k. Our work raises a range of conceptual and practical future research questions on making delayed next-token prediction a widely applicable new paradigm.
Submitted 20 April, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Strain Dependent Spin Hall Magnetoresistance in the Multiferroic Antiferromagnet BiFeO$_3$
Authors:
D. Sando,
S. Chen,
O. Paull,
B. Xu,
J. J. L. van Rijn,
C. Xu,
S. Xu,
F. Appert,
J. Juraszek,
L. Bellaiche,
V. Nagarajan,
T. Banerjee
Abstract:
The spin Hall magnetoresistance (SMR) of epitaxial BiFeO$_3$ thin films is investigated. SMR consistent with ferromagnetic interfacial states for BiFeO$_3$ films fabricated on (001) SrTiO$_3$ (R' BFO) and LaAlO$_3$ (T' BFO) substrates is found, albeit with different temperature dependencies. For T' BFO, the SMR is enhanced at room temperature, and decays with reduced temperatures. By contrast, R' BFO shows a monotonic decrease in SMR response with increasing temperature, mirroring the trend of a weak ferromagnet. Density functional theory shows that this difference originates from the coupling of the applied magnetic field to oxygen octahedral rotation (R') and spin (T') degrees of freedom.
Submitted 23 August, 2023;
originally announced August 2023.
-
RT-kNNS Unbound: Using RT Cores to Accelerate Unrestricted Neighbor Search
Authors:
Vani Nagarajan,
Durga Mandarapu,
Milind Kulkarni
Abstract:
The problem of identifying the k-Nearest Neighbors (kNNS) of a point has proven to be very useful both as a standalone application and as a subroutine in larger applications. Given its far-reaching applicability in areas such as machine learning and point clouds, extensive research has gone into leveraging GPU acceleration to solve this problem. Recent work has shown that using Ray Tracing cores in recent GPUs to accelerate kNNS is much more efficient compared to traditional acceleration using shader cores. However, the existing translation of kNNS to a ray tracing problem imposes a constraint on the search space for neighbors. Due to this, we can only use RT cores to accelerate fixed-radius kNNS, which requires the user to set a search radius a priori and hence can miss neighbors. In this work, we propose TrueKNN, the first unbounded RT-accelerated neighbor search. TrueKNN adopts an iterative approach where we incrementally grow the search space until all points have found their k neighbors. We show that our approach is orders of magnitude faster than existing approaches and can even be used to accelerate fixed-radius neighbor searches.
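The iterative idea described above (grow the search radius until every point has found its k neighbors) can be sketched on the CPU as follows. This is a minimal illustration, not the TrueKNN implementation: the brute-force `fixed_radius_neighbors` is a hypothetical stand-in for an RT-core fixed-radius query, and the starting radius and doubling growth factor are assumptions not stated in the abstract.

```python
import math

def fixed_radius_neighbors(points, q_idx, radius):
    """Brute-force stand-in for a fixed-radius query (hypothetical helper).

    Returns (distance, index) pairs within `radius` of points[q_idx],
    sorted by distance, excluding the query point itself.
    """
    q = points[q_idx]
    hits = []
    for j, p in enumerate(points):
        if j != q_idx:
            d = math.dist(q, p)
            if d <= radius:
                hits.append((d, j))
    return sorted(hits)

def growing_radius_knn(points, k, start_radius=0.5, growth=2.0):
    """Unbounded kNN by repeating fixed-radius searches with a growing radius.

    start_radius and growth are illustrative assumptions.
    """
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < number of points")
    result = {}
    pending = set(range(len(points)))
    radius = start_radius
    while pending:
        for i in list(pending):
            hits = fixed_radius_neighbors(points, i, radius)
            # Once k points lie inside the radius, the k closest of them
            # are the true k nearest neighbors, so this point is done.
            if len(hits) >= k:
                result[i] = [j for _, j in hits[:k]]
                pending.discard(i)
        radius *= growth  # only unfinished points are searched again
    return result

points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
knn = growing_radius_knn(points, k=2)
```

Points in dense regions finish in early rounds, so later and larger searches only involve the remaining outliers such as `(10.0, 0.0)` here.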
Submitted 26 May, 2023;
originally announced May 2023.
-
Svarah: Evaluating English ASR Systems on Indian Accents
Authors:
Tahir Javed,
Sakshi Joshi,
Vignesh Nagarajan,
Sai Sundaresan,
Janki Nawale,
Abhigyan Raman,
Kaushal Bhogale,
Pratyush Kumar,
Mitesh M. Khapra
Abstract:
India is the second largest English-speaking country in the world with a speaker base of roughly 130 million. Thus, it is imperative that automatic speech recognition (ASR) systems for English should be evaluated on Indian accents. Unfortunately, Indian speakers find a very poor representation in existing English ASR benchmarks such as LibriSpeech, Switchboard, Speech Accent Archive, etc. In this work, we address this gap by creating Svarah, a benchmark that contains 9.6 hours of transcribed English audio from 117 speakers across 65 geographic locations throughout India, resulting in a diverse range of accents. Svarah comprises both read speech and spontaneous conversational data, covering various domains, such as history, culture, tourism, etc., ensuring a diverse vocabulary. We evaluate 6 open source ASR models and 2 commercial ASR systems on Svarah and show that there is clear scope for improvement on Indian accents. Svarah as well as all our code will be publicly available.
Submitted 25 May, 2023;
originally announced May 2023.
-
RT-DBSCAN: Accelerating DBSCAN using Ray Tracing Hardware
Authors:
Vani Nagarajan,
Milind Kulkarni
Abstract:
General Purpose computing on Graphical Processing Units (GPGPU) has resulted in unprecedented levels of speedup over its CPU counterparts, allowing programmers to harness the computational power of GPU shader cores to accelerate other computing applications. But this style of acceleration is best suited for regular computations (e.g., linear algebra). Recent GPUs feature new Ray Tracing (RT) cores that instead speed up the irregular process of ray tracing using Bounding Volume Hierarchies. While these cores seem limited in functionality, they can be used to accelerate n-body problems by leveraging RT cores to accelerate the required distance computations. In this work, we propose RT-DBSCAN, the first RT-accelerated DBSCAN implementation. We use RT cores to accelerate Density-Based Spatial Clustering of Applications with Noise (DBSCAN) by translating fixed-radius nearest neighbor queries to ray tracing queries. We show that leveraging the RT hardware results in speedups of between 1.3x and 4x over current state-of-the-art, GPU-based DBSCAN implementations.
Submitted 16 March, 2023;
originally announced March 2023.
-
ResMem: Learn what you can and memorize the rest
Authors:
Zitong Yang,
Michal Lukasik,
Vaishnavh Nagarajan,
Zonglin Li,
Ankit Singh Rawat,
Manzil Zaheer,
Aditya Krishna Menon,
Sanjiv Kumar
Abstract:
The impressive generalization performance of modern neural networks is attributed in part to their ability to implicitly memorize complex training patterns. Inspired by this, we explore a novel mechanism to improve model generalization via explicit memorization. Specifically, we propose the residual-memorization (ResMem) algorithm, a new method that augments an existing prediction model (e.g. a neural network) by fitting the model's residuals with a $k$-nearest neighbor based regressor. The final prediction is then the sum of the original model and the fitted residual regressor. By construction, ResMem can explicitly memorize the training labels. Empirically, we show that ResMem consistently improves the test set generalization of the original prediction model across various standard vision and natural language processing benchmarks. Theoretically, we formulate a stylized linear regression problem and rigorously show that ResMem results in a more favorable test risk over the base predictor.
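The residual-fitting step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the base predictor is a 1-D least-squares line (standing in for a neural network), the nearest-neighbor regressor is an unweighted mean over the k closest training inputs, and all function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares line y = a*x + b; stands in for the base model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def knn_residual(xs, residuals, q, k):
    """Unweighted mean residual of the k training inputs nearest to q."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - q))[:k]
    return sum(residuals[i] for i in nearest) / k

def resmem_predict(xs, ys, q, k=1):
    """Final prediction = base model output + fitted residual."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    return (a * q + b) + knn_residual(xs, residuals, q, k)

# A line cannot fit sin(x), but the residual regressor memorizes the gap.
xs = [i / 4 for i in range(-12, 13)]
ys = [math.sin(x) for x in xs]
train_preds = [resmem_predict(xs, ys, x, k=1) for x in xs]
```

With `k=1` and distinct training inputs, each training point is its own nearest neighbor, so the sketch reproduces the training labels exactly, which is the "explicitly memorize the training labels" property mentioned in the abstract.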
Submitted 20 October, 2023; v1 submitted 3 February, 2023;
originally announced February 2023.
-
On student-teacher deviations in distillation: does it pay to disobey?
Authors:
Vaishnavh Nagarajan,
Aditya Krishna Menon,
Srinadh Bhojanapalli,
Hossein Mobahi,
Sanjiv Kumar
Abstract:
Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network. Yet, it has been shown in recent work that, despite being trained to fit the teacher's probabilities, the student may not only significantly deviate from the teacher probabilities, but may also outdo the teacher in performance. Our work aims to reconcile this seemingly paradoxical observation. Specifically, we characterize the precise nature of the student-teacher deviations, and argue how they can co-occur with better generalization. First, through experiments on image and language data, we identify that these probability deviations correspond to the student systematically exaggerating the confidence levels of the teacher. Next, we theoretically and empirically establish another form of exaggeration in some simple settings: KD exaggerates the implicit bias of gradient descent in converging faster along the top eigendirections of the data. Finally, we tie these two observations together: we demonstrate that the exaggerated bias of KD can simultaneously result in both (a) the exaggeration of confidence and (b) the improved generalization of the student, thus offering a resolution to the apparent paradox. Our analysis brings existing theory and practice closer by considering the role of gradient descent in KD and by demonstrating the exaggerated bias effect in both theoretical and empirical settings.
Submitted 18 March, 2024; v1 submitted 30 January, 2023;
originally announced January 2023.
-
IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation metrics for Indian Languages
Authors:
Ananya B. Sai,
Vignesh Nagarajan,
Tanay Dixit,
Raj Dabre,
Anoop Kunchukuttan,
Pratyush Kumar,
Mitesh M. Khapra
Abstract:
The rapid growth of machine translation (MT) systems has necessitated comprehensive studies to meta-evaluate evaluation metrics being used, which enables a better selection of metrics that best reflect MT quality. Unfortunately, most of the research focuses on high-resource languages, mainly English, the observations for which may not always apply to other languages. Indian languages, having over a billion speakers, are linguistically different from English, and to date, there has not been a systematic study of evaluating MT systems from English into Indian languages. In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics. Our results show that pre-trained metrics, such as COMET, have the highest correlations with annotator scores. Additionally, we find that the metrics do not adequately capture fluency-based errors in Indian languages, and there is a need to develop metrics focused on Indian languages. We hope that our dataset and analysis will help promote further research in this area.
Submitted 3 July, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Electronic transport mechanisms in a thin crystal of the Kitaev candidate $α$-RuCl$_3$ probed through guarded high impedance measurements
Authors:
Patrick Barfield,
Vinh Tran,
Vikram Nagarajan,
Maya Martinez,
Amirari Diego,
Derek Bergner,
Alessandra Lanzara,
James G. Analytis,
Claudia Ojeda-Aristizabal
Abstract:
$α$-RuCl$_3$ is considered to be the top candidate material for the experimental realization of the celebrated Kitaev model. It is however known that additional interactions beyond the Kitaev model trigger in $α$-RuCl$_3$, a long-range zigzag antiferromagnetic ground state. In this work, we investigate a nanoflake of $α$-RuCl$_3$ through guarded high impedance measurements aimed at reaching through electronic transport, the regime where the system turns into a zigzag antiferromagnet. We investigated a variety of temperatures (1.45 K to 175 K) and out-of-plane magnetic fields ranging up to 11 T. We found a clear signature of a structural phase transition at $\approx 160$ K as reported for thin crystals of $α$-RuCl$_3$, as well as a thermally activated behavior at temperatures above $\approx 30$ K with a characteristic activation energy significantly smaller than the energy gap that we observe for $α$-RuCl$_3$ bulk crystals through our Angle Resolved Photoemission Spectroscopy (ARPES) experiments. Additionally we found that below $\approx 30$ K, transport is ruled by Efros-Shklovskii (ES) VRH. These observations point to the presence of Coulomb impurities in our thin crystals. Most importantly, our data shows that below the magnetic ordering transition known for bulk $α$-RuCl$_3$ ($\approx 7$ K), there is a clear deviation from VRH or thermal activation transport mechanisms. Our work demonstrates the possibility of reaching through specialized high impedance measurements, the thrilling ground states predicted for $α$-RuCl$_3$ at low temperatures in the frame of the Kitaev model, and informs about the transport mechanisms in this material in a wide temperature range as well as on important characteristic quantities such as the localization length of the impurities in a thin $α$-RuCl$_3$ crystal.
Submitted 13 January, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
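For orientation, the two transport regimes named in the abstract above are conventionally written in the textbook forms below. This is a sketch of the standard expressions, not the authors' fitting procedure; the symbols $R_0$, $E_a$ and $T_{ES}$ are notation assumed here and do not appear in the abstract.

```latex
% Thermally activated transport (reported above roughly 30 K):
R(T) = R_0 \exp\!\left(\frac{E_a}{k_B T}\right)

% Efros-Shklovskii variable-range hopping (reported below roughly 30 K):
R(T) = R_0 \exp\!\left[\left(\frac{T_{ES}}{T}\right)^{1/2}\right]
```

In the standard Efros-Shklovskii picture the characteristic temperature $T_{ES}$ scales inversely with the localization length, which is presumably why a fit to the low-temperature regime can give access to that length.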
-
Brachistochrone of off-centered cylinders
Authors:
Krishnaraj Sambath,
Vidhya Nagarajan
Abstract:
We consider the problem of finding paths of shortest transit time between two points (popularly known as Brachistochrone) for cylinders with off-centered center of mass, rolling down without slip, subject solely to the force of gravity. This problem is set up using principles of classical rigid body dynamics and the desired path function is solved for numerically using the method of discrete calculus of variations. We discover a distinct array of brachistochrone trajectories for off-centered cylinders, demonstrate a critical dependence of such paths on the initial location and orientation of cylinders' centers of mass and bring new insights into the family of brachistochrone problems and solutions.
Submitted 26 December, 2023; v1 submitted 1 December, 2022;
originally announced December 2022.
-
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem
Authors:
Arpit Agarwal,
Rohan Ghuge,
Viswanath Nagarajan
Abstract:
We study the $K$-armed dueling bandit problem, a variation of the traditional multi-armed bandit problem in which feedback is obtained in the form of pairwise comparisons. Previous learning algorithms have focused on the $\textit{fully adaptive}$ setting, where the algorithm can make updates after every comparison. The "batched" dueling bandit problem is motivated by large-scale applications like web search ranking and recommendation systems, where performing sequential updates may be infeasible. In this work, we ask: $\textit{is there a solution using only a few adaptive rounds that matches the asymptotic regret bounds of the best sequential algorithms for $K$-armed dueling bandits?}$ We answer this in the affirmative $\textit{under the Condorcet condition}$, a standard setting of the $K$-armed dueling bandit problem. We obtain asymptotic regret of $O(K^2\log^2(K)) + O(K\log(T))$ in $O(\log(T))$ rounds, where $T$ is the time horizon. Our regret bounds nearly match the best regret bounds known in the fully sequential setting under the Condorcet condition. Finally, in computational experiments over a variety of real-world datasets, we observe that our algorithm using $O(\log(T))$ rounds achieves almost the same performance as fully sequential algorithms (that use $T$ rounds).
Submitted 24 September, 2022;
originally announced September 2022.
-
Ferroelectric Solitons Crafted in Epitaxial Bismuth Ferrite Superlattices
Authors:
V. Govinden,
P. R. Tong,
X. Guo,
Q. Zhang,
S. Mantri,
S. Prokhorenko,
Y. Nahas,
Y. Wu,
L. Bellaiche,
H. Tian,
Z. Hong,
D. Sando,
V. Nagarajan
Abstract:
In ferroelectrics, complex interactions among various degrees of freedom enable the condensation of topologically protected polarization textures. Known as ferroelectric solitons, these particle-like structures represent a new class of materials with promise for beyond CMOS technologies due to their ultrafine size and sensitivity to external stimuli. Such polarization textures have scarcely been reported in multiferroics. Here, we report a range of soliton topologies in bismuth ferrite strontium titanate superlattices. High-resolution piezoresponse force microscopy and Cs-corrected high-angle annular dark-field scanning transmission electron microscopy reveal a zoo of topologies, and polarization displacement mapping of planar specimens reveals center-convergent and divergent topological defects as small as 3 nm. Phase field simulations verify that some of these topologies can be classed as bimerons, with a topological charge of plus and minus one, and first-principles-based effective Hamiltonian computations show that the co-existence of such structures can lead to non-integer topological charges, a first observation in a BiFeO3-based system. Our results open new opportunities in multiferroic topotronics.
Submitted 19 September, 2022;
originally announced September 2022.
-
Minimum Cost Adaptive Submodular Cover
Authors:
Hessa Al-Thani,
Yubing Cui,
Viswanath Nagarajan
Abstract:
Adaptive submodularity is a fundamental concept in stochastic optimization, with numerous applications such as sensor placement, hypothesis identification and viral marketing. We consider the problem of minimum cost cover of adaptive-submodular functions, and provide a $4(1+\ln Q)$-approximation algorithm, where $Q$ is the goal value. In fact, we consider a significantly more general objective of minimizing the $p^{th}$ moment of the coverage cost, and show that our algorithm simultaneously achieves a $(p+1)^{p+1}\cdot (\ln Q+1)^p$ approximation guarantee for all $p\ge 1$. All our approximation ratios are best possible up to constant factors (assuming $P\ne NP$). Moreover, our results also extend to the setting where one wants to cover {\em multiple} adaptive-submodular functions. Finally, we evaluate the empirical performance of our algorithm on instances of hypothesis identification.
Submitted 21 May, 2024; v1 submitted 17 August, 2022;
originally announced August 2022.
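As a quick consistency check on the two guarantees quoted in the abstract above, setting $p=1$ in the general moment bound recovers the stated expected-cost ratio. This is only arithmetic on the formulas as given, not part of the paper's analysis.

```latex
\left.(p+1)^{p+1}\,(\ln Q+1)^{p}\right|_{p=1}
  \;=\; 2^{2}\,(\ln Q+1)
  \;=\; 4\,(1+\ln Q)
```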
-
Nonvolatile Electric Field Control of Thermal Magnons in the Absence of an Applied Magnetic Field
Authors:
Eric Parsonnet,
Lucas Caretta,
Vikram Nagarajan,
Hongrui Zhang,
Hossein Taghinejad,
Piush Behera,
Xiaoxi Huang,
Pravin Kavle,
Abel Fernandez,
Dmitri Nikonov,
Hai Li,
Ian Young,
James Analytis,
Ramamoorthy Ramesh
Abstract:
Spin transport through magnetic insulators has been demonstrated in a variety of materials and is an emerging pathway for next-generation spin-based computing. To modulate spin transport in these systems, one typically applies a sufficiently strong magnetic field to allow for deterministic control of magnetic order. Here, we make use of the well-known multiferroic magnetoelectric, BiFeO3, to demonstrate non-volatile, hysteretic, electric-field control of thermally excited magnon current in the absence of an applied magnetic field. These findings are an important step toward magnon-based devices, where electric-field-only control is highly desirable.
Submitted 23 August, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Batched Dueling Bandits
Authors:
Arpit Agarwal,
Rohan Ghuge,
Viswanath Nagarajan
Abstract:
The $K$-armed dueling bandit problem, where the feedback is in the form of noisy pairwise comparisons, has been widely studied. Previous works have only focused on the sequential setting where the policy adapts after every comparison. However, in many applications such as search ranking and recommendation systems, it is preferable to perform comparisons in a limited number of parallel batches. We study the batched $K$-armed dueling bandit problem under two standard settings: (i) existence of a Condorcet winner, and (ii) strong stochastic transitivity and stochastic triangle inequality. For both settings, we obtain algorithms with a smooth trade-off between the number of batches and regret. Our regret bounds match the best known sequential regret bounds (up to poly-logarithmic factors), using only a logarithmic number of batches. We complement our regret analysis with a nearly-matching lower bound. Finally, we also validate our theoretical results via experiments on synthetic and real data.
Submitted 21 February, 2022;
originally announced February 2022.
-
On Some Variants of Euclidean K-Supplier
Authors:
Euiwoong Lee,
Viswanath Nagarajan,
Lily Wang
Abstract:
The $k$-Supplier problem is an important location problem that has been actively studied in both general and Euclidean metrics. Many of its variants have also been studied, primarily on general metrics. We study two variants of $k$-Supplier, namely Priority $k$-Supplier and $k$-Supplier with Outliers, in Euclidean metrics. We obtain $(1+\sqrt{3})$-approximation algorithms for both variants, which are the first improvements over the previously-known factor-$3$ approximation (that is known to be best-possible for general metrics). We also study the Matroid Supplier problem on Euclidean metrics, and show that it cannot be approximated to a factor better than $3$ (assuming $P\ne NP$); so the Euclidean metric offers no improvement in this case.
Submitted 2 December, 2021;
originally announced December 2021.
-
Scalable Machine Learning Architecture for Neonatal Seizure Detection on Ultra-Edge Devices
Authors:
Vishal Nagarajan,
Ashwini Muralidharan,
Deekshitha Sriraman,
Pravin Kumar S
Abstract:
Neonatal seizures are a commonly encountered neurological condition. They are the first clinical signs of a serious neurological disorder. Thus, rapid recognition and treatment are necessary to prevent serious fatalities. The use of electroencephalography (EEG) in the field of neurology allows precise diagnosis of several medical conditions. However, interpreting EEG signals needs the attention of highly specialized staff since the infant brain is developmentally immature during the neonatal period. Detecting seizures on time could potentially prevent the negative effects on the neurocognitive development of the infants. In recent years, neonatal seizure detection using machine learning algorithms has been gaining traction. Since there is a need for the classification of bio-signals to be computationally inexpensive in the case of seizure detection, this research presents a machine learning (ML) based architecture that achieves predictive performance comparable to previous models but with minimum level configuration. The proposed classifier was trained and tested on a public dataset of NICU seizures recorded at the Helsinki University Hospital. Our architecture achieved a best sensitivity of 87%, which is 6% more than that of the standard ML model chosen in this study. The model size of the ML classifier is optimized to just 4.84 KB with minimum prediction time of 182.61 milliseconds, thus enabling it to be deployed on wearable ultra-edge devices for quick and accurate response and obviating the need for cloud-based and other such exhaustive computational methods.
Submitted 29 November, 2021;
originally announced November 2021.
-
End-to-End Optimized Arrhythmia Detection Pipeline using Machine Learning for Ultra-Edge Devices
Authors:
Sideshwar J B,
Sachin Krishan T,
Vishal Nagarajan,
Shanthakumar S,
Vineeth Vijayaraghavan
Abstract:
Atrial fibrillation (AF) is the most prevalent cardiac arrhythmia worldwide, with 2% of the population affected. It is associated with an increased risk of strokes, heart failure and other heart-related complications. Monitoring at-risk individuals and detecting asymptomatic AF could result in considerable public health benefits, as individuals with asymptomatic AF could take preventive measures with lifestyle changes. With the increasing affordability of wearables, personalized health care is becoming more accessible. These personalized healthcare solutions require accurate classification of bio-signals while being computationally inexpensive. By making inferences on-device, we avoid issues inherent to cloud-based systems such as latency and network connection dependency. We propose an efficient pipeline for real-time Atrial Fibrillation Detection with high accuracy that can be deployed in ultra-edge devices. The feature engineering employed in this research catered to optimizing the resource-efficient classifier used in the proposed pipeline, which was able to outperform the best performing standard ML model by $10^5\times$ in terms of memory footprint with a mere trade-off of 2% classification accuracy. We also obtain higher accuracy of approximately 6% while consuming 403$\times$ less memory and being 5.2$\times$ faster compared to the previous state-of-the-art (SoA) embedded implementation.
Submitted 23 November, 2021;
originally announced November 2021.
-
Non-Adaptive Stochastic Score Classification and Explainable Halfspace Evaluation
Authors:
Rohan Ghuge,
Anupam Gupta,
Viswanath Nagarajan
Abstract:
Sequential testing problems involve a complex system with several components, each of which is "working" with some independent probability. The outcome of each component can be determined by performing a test, which incurs some cost. The overall system status is given by a function $f$ of the outcomes of its components. The goal is to evaluate this function $f$ by performing tests at the minimum expected cost. While there has been extensive prior work on this topic, provable approximation bounds are mainly limited to simple functions like "k-out-of-n" and halfspaces. We consider significantly more general "score classification" functions, and provide the first constant factor approximation algorithm (improving over a previous logarithmic approximation ratio). Moreover, our policy is non-adaptive: it just involves performing tests in an a priori fixed order. We also consider the related halfspace evaluation problem, where we want to evaluate some function on $d$ halfspaces (e.g., intersection of halfspaces). We show that our approach provides an $O(d^2\log d)$-approximation algorithm for this problem. Our algorithms also extend to the setting of "batched" tests, where multiple tests can be performed simultaneously while incurring an extra setup cost. Finally, we perform computational experiments that demonstrate the practical performance of our algorithm for score classification. We observe that, for most instances, the cost of our algorithm is within $50\%$ of an information-theoretic lower bound on the optimal value.
Submitted 19 August, 2023; v1 submitted 10 November, 2021;
originally announced November 2021.
-
Explaining generalization in deep learning: progress and fundamental limits
Authors:
Vaishnavh Nagarajan
Abstract:
This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error?
In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive {\em data-dependent} {\em uniform-convergence-based} generalization bounds with improved dependencies on the parameter count.
Uniform convergence has in fact been the most widely used tool in deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, {\em any} uniform convergence bound will provide only a vacuous generalization bound.
With this realization in mind, in the last part of the thesis, we will change course and introduce an {\em empirical} technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergence-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision.
We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.
Submitted 17 October, 2021;
originally announced October 2021.
-
Efficient Algorithms for Stochastic Ridepooling Assignment with Mixed Fleets
Authors:
Qi Luo,
Viswanath Nagarajan,
Alexander Sundt,
Yafeng Yin,
John Vincent,
Mehrdad Shahabi
Abstract:
Ride-pooling, which accommodates multiple passenger requests in a single trip, has the potential to significantly increase fleet utilization in shared mobility platforms. The ride-pooling assignment problem finds optimal co-riders to maximize the total utility or profit on a shareability graph, a hypergraph representing the matching compatibility between available vehicles and pending requests. With mixed fleets due to the introduction of automated or premium vehicles, fleet sizing and relocation decisions should be made before the requests are revealed. Due to the immense size of the underlying shareability graph and demand uncertainty, it is impractical to use exact methods to calculate the optimal trip assignments. Two approximation algorithms for mid-capacity and high-capacity vehicles are proposed in this paper; the respective approximation ratios are $\frac1{p^2}$ and $\frac{e-1}{(2e+o(1)) p \ln p}$, where $p$ is the maximum vehicle capacity plus one. The performance of these algorithms is validated using a mixed autonomy on-demand mobility simulator. These efficient algorithms serve as a stepping stone for a variety of multimodal and multiclass on-demand mobility applications.
Submitted 14 April, 2022; v1 submitted 19 August, 2021;
originally announced August 2021.
-
The Power of Adaptivity for Stochastic Submodular Cover
Authors:
Rohan Ghuge,
Anupam Gupta,
Viswanath Nagarajan
Abstract:
In the stochastic submodular cover problem, the goal is to select a subset of stochastic items of minimum expected cost to cover a submodular function. Solutions in this setting correspond to sequential decision processes that select items one by one "adaptively" (depending on prior observations). While such adaptive solutions achieve the best objective, the inherently sequential nature makes them undesirable in many applications. We ask: how well can solutions with only a few adaptive rounds approximate fully-adaptive solutions? We give nearly tight answers for both independent and correlated settings, proving smooth tradeoffs between the number of adaptive rounds and the solution quality, relative to fully adaptive solutions. Experiments on synthetic and real datasets show qualitative improvements in the solutions as we allow more rounds of adaptivity; in practice, solutions with a few rounds of adaptivity are nearly as good as fully adaptive solutions.
Submitted 30 June, 2021;
originally announced June 2021.
-
Assessing Generalization of SGD via Disagreement
Authors:
Yiding Jiang,
Vaishnavh Nagarajan,
Christina Baek,
J. Zico Kolter
Abstract:
We empirically show that the test error of deep networks can be estimated by simply training the same architecture on the same training set but with a different run of Stochastic Gradient Descent (SGD), and measuring the disagreement rate between the two networks on unlabeled test data. This builds on -- and is a stronger version of -- the observation in Nakkiran & Bansal '20, which requires the second run to be on an altogether fresh training set. We further theoretically show that this peculiar phenomenon arises from the \emph{well-calibrated} nature of \emph{ensembles} of SGD-trained models. This finding not only provides a simple empirical measure to directly predict the test error using unlabeled test data, but also establishes a new conceptual connection between generalization and calibration.
Submitted 15 May, 2022; v1 submitted 25 June, 2021;
originally announced June 2021.
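The disagreement measure described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the two SGD runs have already been trained and have each produced hard label predictions on the same unlabeled test inputs, and the function name `disagreement_rate` and the toy prediction lists are hypothetical.

```python
from typing import Sequence


def disagreement_rate(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Fraction of inputs on which two models predict different labels.

    preds_a and preds_b are the hard predictions of two independently
    trained runs on the SAME unlabeled test inputs, in the same order.
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if len(preds_a) == 0:
        raise ValueError("need at least one prediction")
    differing = sum(a != b for a, b in zip(preds_a, preds_b))
    return differing / len(preds_a)


# Toy predictions from two hypothetical runs on eight unlabeled inputs.
run_1 = [0, 1, 2, 1, 0, 2, 1, 0]
run_2 = [0, 1, 1, 1, 0, 2, 0, 0]

# The two runs differ on 2 of the 8 inputs.
estimate = disagreement_rate(run_1, run_2)
print(f"disagreement rate: {estimate:.2f}")
```

No test labels are used anywhere in the computation; per the abstract, the claim is that this quantity tracks the test error when the ensemble of SGD-trained models is well calibrated.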
-
Extending Classic Paxos for High-performance Read-Modify-Write Registers
Authors:
Vasilis Gavrielatos,
Antonios Katsarakis,
Vijay Nagarajan
Abstract:
In this work we provide a detailed specification of how we extended and implemented Classic Paxos (CP) to execute Read-Modify-Writes (RMWs). In addition, we specify how we implemented All-aboard Paxos over CP and how we use carstamps to add ABD reads and writes, which accelerate the common case where RMWs are not needed. Our specification targets a Key-Value-Store that is deployed within the datacenter, is replicated across 3 to 7 machines and supports reads, writes and RMWs.
Submitted 26 March, 2021;
originally announced March 2021.
-
Magnon-spinon dichotomy in the Kitaev hyperhoneycomb $β$-Li$_2$IrO$_3$
Authors:
Alejandro Ruiz,
Nicholas P. Breznay,
Mengqun Li,
Ioannis Rousochatzakis,
Anthony Allen,
Isaac Zinda,
Vikram Nagarajan,
Gilbert Lopez,
Mary H. Upton,
Jungho Kim,
Ayman H. Said,
Xian-Rong Huang,
Thomas Gog,
Diego Casa,
Robert J. Birgeneau,
Jake D. Koralek,
James G. Analytis,
Natalia B. Perkins,
Alex Frano
Abstract:
The family of edge-sharing tri-coordinated iridates and ruthenates has emerged in recent years as a major platform for Kitaev spin liquid physics, where spins fractionalize into emergent magnetic fluxes and Majorana fermions with Dirac-like dispersions. While such exotic states are usually pre-empted by long-range magnetic order at low temperatures, signatures of Majorana fermions with long coherent times have been predicted to manifest at intermediate and higher energy scales, similar to the observation of spinons in quasi-1D spin chains. Here we present a Resonant Inelastic X-ray Scattering study of the magnetic excitations of the hyperhoneycomb iridate $β$-Li$_2$IrO$_3$ under a magnetic field with a record-high-resolution spectrometer. At low temperatures, dispersing spin waves can be resolved around the predicted intertwined incommensurate spiral and field-induced zigzag orders, whose excitation energy reaches a maximum of 16 meV. A 2 T magnetic field softens the dispersion around ${\bf Q}=0$. The behavior of the spin waves under magnetic field is consistent with our semiclassical calculations for the ground state and the dynamical spin structure factor, which further predicts that the ensued intertwined uniform states remain robust up to very high fields (100 T). Most saliently, the low-energy magnon-like mode is superimposed by a broad continuum of excitations, centered around 35 meV and extending up to 100 meV. This high-energy continuum survives up to at least 300 K -- well above the ordering temperature of 38 K -- and gives evidence for pairs of long-lived Majorana fermions of the proximate Kitaev spin liquid.
Submitted 4 February, 2021; v1 submitted 4 February, 2021;
originally announced February 2021.
-
Super-R BiFeO$_3$: Epitaxial stabilization of a low-symmetry phase with giant electromechanical response
Authors:
Oliver Paull,
Changsong Xu,
Xuan Cheng,
Yangyang Zhang,
Bin Xu,
Kyle Kelley,
Liam Collins,
Alex de Marco,
Rama K. Vasudevan,
Laurent Bellaiche,
Valanoor Nagarajan,
Daniel Sando
Abstract:
Piezoelectrics interconvert mechanical energy and electric charge and are widely used in actuators and sensors. The best performing materials are ferroelectrics at a morphotropic phase boundary (MPB), where several phases can intimately coexist. Switching between these phases by electric field produces a large electromechanical response. In the ferroelectric BiFeO$_3$, strain can be used to create an MPB-like phase mixture and thus to generate large electric field dependent strains. However, this enhanced response occurs at localized, randomly positioned regions of the film, which potentially complicates nanodevice design. Here, we use epitaxial strain and orientation engineering in tandem - anisotropic epitaxy - to craft a hitherto unavailable low-symmetry phase of BiFeO$_3$ which acts as a structural bridge between the rhombohedral-like and tetragonal-like polymorphs. Interferometric displacement sensor measurements and first-principle calculations reveal that under external electric bias, this phase undergoes a transition to the tetragonal-like polymorph, generating a piezoelectric response enhanced by over 200%, and associated giant field-induced reversible strain. These results offer a new route to engineer giant electromechanical properties in thin films, with broader perspectives for other functional oxide systems.
Submitted 31 January, 2021;
originally announced February 2021.
-
Evidence for freezing of charge degrees of freedom across a critical point in CeCoIn$_5$
Authors:
Nikola Maksimovic,
Tessa Cookmeyer,
Jan Rusz,
Vikram Nagarajan,
Amanda Gong,
Fanghui Wan,
Stefano Faubel,
Ian M. Hayes,
Sooyoung Jang,
Yochai Werman,
Peter M. Oppeneer,
Ehud Altman,
James G. Analytis
Abstract:
The presence of a quantum critical point separating two distinct zero-temperature phases is thought to underlie the `strange' metal state of many high-temperature superconductors. The nature of this quantum critical point, as well as a description of the resulting strange metal, are central open problems in condensed matter physics. In large part, the controversy stems from the lack of a clear broken symmetry to characterize the critical phase transition, and this challenge is no clearer than in the example of the unconventional superconductor CeCoIn$_5$. Through Hall effect and Fermi surface measurements of CeCoIn$_5$, in comparison to ab initio calculations, we find evidence for a critical point that connects two Fermi surfaces with different volumes without apparent symmetry-breaking, indicating the presence of a transition that involves an abrupt localization of one sector of the charge degrees of freedom. We present a model for the anomalous electrical Hall resistivity of this material based on the conductivity of valence charge fluctuations.
Submitted 25 November, 2020;
originally announced November 2020.
-
A Learning Theoretic Perspective on Local Explainability
Authors:
Jeffrey Li,
Vaishnavh Nagarajan,
Gregory Plumb,
Ameet Talwalkar
Abstract:
In this paper, we explore connections between interpretable machine learning and learning theory through the lens of local approximation explanations. First, we tackle the traditional problem of performance generalization and bound the test-time accuracy of a model using a notion of how locally explainable it is. Second, we explore the novel problem of explanation generalization which is an important concern for a growing class of finite sample-based local approximation explanations. Finally, we validate our theoretical results empirically and show that they reflect what can be seen in practice.
Submitted 2 November, 2020;
originally announced November 2020.
-
Understanding the Failure Modes of Out-of-Distribution Generalization
Authors:
Vaishnavh Nagarajan,
Anders Andreassen,
Behnam Neyshabur
Abstract:
Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only during training time, resulting in poor accuracy during test-time. In this work, we identify the fundamental factors that give rise to this behavior, by explaining why models fail this way {\em even} in easy-to-learn tasks where one would expect these models to succeed. In particular, through a theoretical study of gradient-descent-trained linear classifiers on some easy-to-learn tasks, we uncover two complementary failure modes. These modes arise from how spurious correlations induce two kinds of skews in the data: one geometric in nature, and another, statistical in nature. Finally, we construct natural modifications of image classification datasets to understand when these failure modes can arise in practice. We also design experiments to isolate the two failure modes when training modern neural networks on these datasets.
Submitted 29 April, 2021; v1 submitted 29 October, 2020;
originally announced October 2020.
-
Detection of odor quality and ripening stage of Mangifera indica L. by graphdiyne nanosheet -- a DFT outlook
Authors:
V. Nagarajan,
R. Chandiramouli
Abstract:
Using first-principles calculations, the geometrical stability and electronic properties of a graphdiyne nanosheet (Gdn-NS) are investigated. The structural stability of Gdn-NS is established from the phonon band structure and cohesive energy. The main objective of the present study is to check the odor quality of Mangifera indica L. (mango) fruits at various ripening stages using Gdn-NS material. In addition, the adsorption of various volatiles, namely ethyl butanoate, myrcene, (E,Z,Z)-1,3,4,8-undecatetraene and $γ$-octalactone aromas, on Gdn-NS is explored through significant parameters including Bader charge transfer, energy gap, average energy gap changes and adsorption energy. The sensitivity of Gdn-NS to volatiles emitted at various ripening stages of mango was explored using the density of states spectrum. The outcomes of the proposed work help to assess the ripening stage and odor quality of Mangifera indica L. with Gdn-NS material using density functional theory.
Submitted 29 September, 2020;
originally announced September 2020.
-
Online Generalized Network Design Under (Dis)Economies of Scale
Authors:
Viswanath Nagarajan,
Lily Wang
Abstract:
We consider a general online network design problem where a sequence of N requests arrive over time, each of which needs to use some subset of the available resources E. The cost incurred by any resource e is some function $f_e$ of the total load $L_e$ on that resource. The objective is to minimize the total cost $\sum_{e\in E} f_e(L_e)$. We focus on cost functions that exhibit (dis)economies of scale, that are of the form $f_e(x) = σ_e + ξ_e\cdot x^{α_e}$ if $x>0$ (and zero if $x=0$), where the exponent $α_e\ge 1$. Optimization problems under these functions have received significant recent attention due to applications in energy-efficient computing. Our main result is a deterministic online algorithm with tight competitive ratio $Θ\left(\max_{e\in E} \left(\frac{σ_e}{ξ_e}\right)^{1/α_e}\right)$ when $α_e$ is constant for all $e\in E$. This framework is applicable to a variety of network design problems in undirected and directed graphs, including multicommodity routing, Steiner tree/forest connectivity and set-connectivity. In fact, our online competitive ratio even matches the previous-best (offline) approximation ratio for generalized network design.
Submitted 15 July, 2020;
originally announced July 2020.
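The cost model in the abstract above can be made concrete with a short sketch. It only evaluates the stated objective $\sum_{e\in E} f_e(L_e)$ and the quantity inside the $Θ(\cdot)$ bound; it is not the paper's online algorithm. The names `Resource`, `resource_cost`, `total_cost` and `ratio_parameter` are hypothetical, and the sketch assumes (this is not stated in the abstract) that every request adds one unit of load to each resource it uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Parameters of one resource e (hypothetical container)."""
    sigma: float  # fixed cost sigma_e, paid once the resource carries any load
    xi: float     # scaling coefficient xi_e
    alpha: float  # exponent alpha_e >= 1 (diseconomy of scale)


def resource_cost(r: Resource, load: float) -> float:
    """f_e(x) = sigma_e + xi_e * x**alpha_e if x > 0, and 0 if x == 0."""
    if load <= 0:
        return 0.0
    return r.sigma + r.xi * load ** r.alpha


def total_cost(resources: dict, requests: list) -> float:
    """Sum of f_e(L_e) over all resources.

    Assumption: each request is a set of resource names and contributes
    one unit of load to every resource in that set.
    """
    loads = {name: 0.0 for name in resources}
    for request in requests:
        for name in request:
            loads[name] += 1.0
    return sum(resource_cost(resources[name], loads[name]) for name in resources)


def ratio_parameter(resources: dict) -> float:
    """max_e (sigma_e / xi_e)**(1 / alpha_e), the expression inside the Theta bound."""
    return max((r.sigma / r.xi) ** (1.0 / r.alpha) for r in resources.values())


if __name__ == "__main__":
    resources = {
        "a": Resource(sigma=2.0, xi=1.0, alpha=2.0),
        "b": Resource(sigma=1.0, xi=3.0, alpha=1.0),
    }
    requests = [{"a"}, {"a", "b"}]
    print(total_cost(resources, requests))
    print(ratio_parameter(resources))
```

Keeping the zero-load case separate matters here: the fixed cost $σ_e$ is only incurred once a resource is actually used, which is what distinguishes this cost model from a pure power function.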
-
Magnetoresistance scaling, disorder, `hot spots' and the origin of $T$-linear resistivity in BaFe$_2$(As$_{1-x}$P$_x$)$_2$
Authors:
Nikola Maksimovic,
Ian M. Hayes,
Vikram Nagarajan,
Alexei E. Koshelev,
John Singleton,
Yeonbae Lee,
Thomas Schenkel,
James G. Analytis
Abstract:
The scaling of $H$-linear magnetoresistance in field and temperature was measured in under-doped (x = 0.19) and optimally-doped (x = 0.31) BaFe$_2$(As$_{1-x}$P$_x$)$_2$. We analyze the data based on an orbital model in the presence of strongly anisotropic quasiparticle spectra and scattering time due to antiferromagnetism. The magnetoresistance is dominated by the properties of small regions of the Fermi surface called `hot spots' where antiferromagnetic excitations induce a large quasiparticle scattering rate. Approximate temperature-magnetic field scaling relations are derived and shown to be consistent with the experimental data. We argue that these results link the origin of linear-in-temperature resistivity to hot spots arising from an antiferromagnetic critical point, and magnetoresistance measurements provide a route to quantify this link.
Submitted 9 July, 2020;
originally announced July 2020.
-
Provably Safe PAC-MDP Exploration Using Analogies
Authors:
Melrose Roderick,
Vaishnavh Nagarajan,
J. Zico Kolter
Abstract:
A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure). Although a growing line of work in reinforcement learning has investigated this area of "safe exploration," most existing techniques either 1) do not guarantee safety during the actual exploration process; and/or 2) limit the problem to a priori known and/or deterministic transition dynamics with strong smoothness assumptions. Addressing this gap, we propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics. Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense. Additionally, ASE guides exploration towards the most task-relevant states, which empirically results in significant improvements in sample efficiency compared to existing methods.
Submitted 22 March, 2021; v1 submitted 7 July, 2020;
originally announced July 2020.
-
Stochastic Makespan Minimization in Structured Set Systems
Authors:
Anupam Gupta,
Amit Kumar,
Viswanath Nagarajan,
Xiangkun Shen
Abstract:
We study stochastic combinatorial optimization problems where the objective is to minimize the expected maximum load (a.k.a. the makespan). In this framework, we have a set of $n$ tasks and $m$ resources, where each task $j$ uses some subset of the resources. Tasks have random sizes $X_j$, and our goal is to non-adaptively select $t$ tasks to minimize the expected maximum load over all resources, where the load on any resource $i$ is the total size of all selected tasks that use $i$. For example, when resources are points and tasks are intervals in a line, we obtain an $O(\log\log m)$-approximation algorithm. Our technique is also applicable to other problems with some geometric structure in the relation between tasks and resources; e.g., packing paths, rectangles, and "fat" objects. Our approach uses a strong LP relaxation using the cumulant generating functions of the random variables. We also show that this LP has an $Ω(\log^* m)$ integrality gap, even for the problem of selecting intervals on a line; here $\log^* m$ is the iterated logarithm function.
Submitted 24 June, 2021; v1 submitted 25 February, 2020;
originally announced February 2020.
-
Competition between magnetic order and charge localization in Na$_2$IrO$_3$ thin crystal devices
Authors:
Josue Rodriguez,
Gilbert Lopez,
Samantha Crouch,
Nicholas P. Breznay,
Robert Kealhofer,
Vikram Nagarajan,
Drew Latzke,
Francisco Ramirez,
Naomy Marrufo,
Peter Santiago,
Jared Lara,
Amirari Diego,
Everardo Molina,
David Rosser,
Hadi Tavassol,
Alessandra Lanzara,
James G. Analytis,
Claudia Ojeda-Aristizabal
Abstract:
Spin-orbit-assisted Mott insulators such as sodium iridate (Na$_2$IrO$_3$) have been an important subject of study in recent years. In these materials, the interplay of electronic correlations, spin-orbit coupling, crystal field effects and a honeycomb arrangement of ions gives rise to exciting ground states, predicted in the framework of the Kitaev model. The insulating character of Na$_2$IrO$_3$ has hampered its integration into electronic devices, which is desirable for applications such as the manipulation of quasiparticles of interest for topological quantum computing. Here we show, through electronic transport measurements supported by Angle Resolved Photoemission Spectroscopy (ARPES) experiments, that electronic transport in Na$_2$IrO$_3$ is governed by variable range hopping and is strongly dependent on the magnetic ordering transition known for bulk Na$_2$IrO$_3$, as well as on external electric fields. Electronic transport measurements allow us to deduce a value for the localization length and the density of states in our Na$_2$IrO$_3$ thin crystal devices, offering an alternative approach to study insulating layered materials.
Submitted 11 February, 2020;
originally announced February 2020.
-
Hermes: a Fast, Fault-Tolerant and Linearizable Replication Protocol
Authors:
A. Katsarakis,
V. Gavrielatos,
M. Katebzadeh,
A. Joshi,
A. Dragojevic,
B. Grot,
V. Nagarajan
Abstract:
Today's datacenter applications are underpinned by datastores that are responsible for providing availability, consistency, and performance. For high availability in the presence of failures, these datastores replicate data across several nodes. This is accomplished with the help of a reliable replication protocol that is responsible for maintaining the replicas strongly-consistent even when faults occur. Strong consistency is preferred to weaker consistency models that cannot guarantee an intuitive behavior for the clients. Furthermore, to accommodate high demand at real-time latencies, datastores must deliver high throughput and low latency.
This work introduces Hermes, a broadcast-based reliable replication protocol for in-memory datastores that provides both high throughput and low latency by enabling local reads and fully-concurrent fast writes at all replicas. Hermes couples logical timestamps with cache-coherence-inspired invalidations to guarantee linearizability, avoid write serialization at a centralized ordering point, resolve write conflicts locally at each replica (hence ensuring that writes never abort) and provide fault-tolerance via replayable writes. Our implementation of Hermes over an RDMA-enabled reliable datastore with five replicas shows that Hermes consistently achieves higher throughput than state-of-the-art RDMA-based reliable protocols (ZAB and CRAQ) across all write ratios while also significantly reducing tail latency. At 5% writes, the tail latency of Hermes is 3.6X lower than that of CRAQ and ZAB.
Submitted 27 January, 2020;
originally announced January 2020.
-
Silicene nanosheet to discriminate the quality of pear fruit based on volatiles adsorption --- a DFT application
Authors:
R. Keerthi Bhavadharani,
V. Nagarajan,
R. Chandiramouli
Abstract:
We report the interaction between the silicene nanosheet (Si-NS) and volatile organic compounds (VOCs) released from the pear fruit (Pyrus communis) in ripened and over-ripened stages using the density functional theory (DFT) technique. The geometric stability of Si-NS is studied from the phonon band structure. Further, the electronic property of Si-NS is studied from the energy band gap structure, and the energy gap is found to be 0.46 eV, which indicates semiconducting behavior. The outcomes indicate that the adsorption of volatiles released from the pear fruit on the silicene nanosheet follows the order hexyl acetate $\rightarrow$ butyl acetate $\rightarrow$ butyl butyrate in the ripened stage, whereas in the over-ripened stage the adsorption sequence is acetic acid $\rightarrow$ ethyl acetate $\rightarrow$ 1-butanol. The adsorption property of pear fruit volatiles on the silicene nanosheet is documented with the adsorption energy, average energy gap changes, and Bader charge transfer. Moreover, the adsorption of VOCs on the silicene nanosheet is also explored using the energy band structure, the electron density along with the adsorption sites, and the density of states (DOS) spectrum. The findings reveal that the silicene nanosheet can be used to discriminate the quality of pear fruit.
Submitted 2 October, 2019;
originally announced October 2019.
-
Hidden spin-orbital order in the Kitaev hyperhoneycomb $β$-Li$_2$IrO$_3$
Authors:
Alejandro Ruiz,
Vikram Nagarajan,
Mayia Vranas,
Gilbert Lopez,
Gregory T. McCandless,
Itamar Kimchi,
Julia Y. Chan,
Nicholas P. Breznay,
Alex Frano,
Benjamin A. Frandsen,
James G. Analytis
Abstract:
We report the existence of a phase transition at high temperature in the 3D Kitaev candidate material, $β$-Li$_2$IrO$_3$. We show that the transition is bulk, intrinsic and orders a tiny magnetic moment with a spatially anisotropic saturation moment. We show that even though this transition is global, it does not freeze the local Ir moments, which order at much lower temperatures into an incommensurate state. Rather, the ordered moment has an orbital origin that is coupled to spin correlations, likely of a Kitaev origin. The separate ordering of spin-correlated orbital moments and of local Ir moments reveals a novel way in which magnetic frustration in Kitaev systems can lead to coexisting magnetic states.
Submitted 13 September, 2019;
originally announced September 2019.
-
Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience
Authors:
Vaishnavh Nagarajan,
J. Zico Kolter
Abstract:
The ability of overparameterized deep networks to generalize well has been linked to the fact that stochastic gradient descent (SGD) finds solutions that lie in flat, wide minima in the training loss -- minima where the output of the network is resilient to small random noise added to its parameters. So far this observation has been used to provide generalization guarantees only for neural networks whose parameters are either \textit{stochastic} or \textit{compressed}. In this work, we present a general PAC-Bayesian framework that leverages this observation to provide a bound on the original network learned -- a network that is deterministic and uncompressed. What enables us to do this is a key novelty in our approach: our framework allows us to show that if on training data, the interactions between the weight matrices satisfy certain conditions that imply a wide training loss minimum, these conditions themselves {\em generalize} to the interactions between the matrices on test data, thereby implying a wide test loss minimum. We then apply our general framework in a setup where we assume that the pre-activation values of the network are not too small (although we assume this only on the training data). In this setup, we provide a generalization guarantee for the original (deterministic, uncompressed) network, that does not scale with the product of the spectral norms of the weight matrices -- a guarantee that would not have been possible with prior approaches.
Submitted 30 May, 2019;
originally announced May 2019.
-
Stochastic Load Balancing on Unrelated Machines
Authors:
Anupam Gupta,
Amit Kumar,
Viswanath Nagarajan,
Xiangkun Shen
Abstract:
We consider the problem of makespan minimization on unrelated machines when job sizes are stochastic. The goal is to find a fixed assignment of jobs to machines, to minimize the expected value of the maximum load over all the machines. For the identical machines special case when the size of a job is the same across all machines, a constant-factor approximation algorithm has long been known. Our main result is the first constant-factor approximation algorithm for the general case of unrelated machines. This is achieved by (i) formulating a lower bound using an exponential-size linear program that is efficiently computable, and (ii) rounding this linear program while satisfying only a specific subset of the constraints that still suffice to bound the expected makespan. We also consider two generalizations. The first is the budgeted makespan minimization problem, where the goal is to minimize the expected makespan subject to scheduling a target number (or reward) of jobs. We extend our main result to obtain a constant-factor approximation algorithm for this problem. The second problem involves $q$-norm objectives, where we want to minimize the expected $q$-norm of the machine loads. Here we give an $O(q/\log q)$-approximation algorithm, which is a constant-factor approximation for any fixed $q$.
Submitted 15 April, 2019;
originally announced April 2019.
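The objective in the abstract above, the expected maximum load of a fixed assignment, can be illustrated with a small Monte Carlo estimate. This is only a sketch of how one might evaluate a given assignment; it is not the LP-based approximation algorithm the abstract describes. The function name `estimate_expected_makespan` and the representation of job sizes as per-machine sampling callables are assumptions made for this example.

```python
import random


def estimate_expected_makespan(assignment, samplers, num_machines,
                               trials=10_000, seed=0):
    """Monte Carlo estimate of E[max_i load_i] for a fixed assignment.

    assignment[j]   -- index of the machine that job j is assigned to
    samplers[j][i]  -- callable taking a random.Random and returning one
                       sample of job j's size on machine i (hypothetical
                       interface; sizes may differ per machine, as on
                       unrelated machines)
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        loads = [0.0] * num_machines
        for job, machine in enumerate(assignment):
            loads[machine] += samplers[job][machine](rng)
        total += max(loads)  # makespan of this sampled realization
    return total / trials


if __name__ == "__main__":
    # Two machines, three jobs; each job is exponentially distributed with a
    # machine-dependent mean (illustrative numbers only).
    def exponential(mean):
        return lambda rng: rng.expovariate(1.0 / mean)

    samplers = [
        [exponential(1.0), exponential(2.0)],
        [exponential(3.0), exponential(1.0)],
        [exponential(2.0), exponential(2.0)],
    ]
    print(estimate_expected_makespan([0, 1, 0], samplers, num_machines=2))
```

The expectation is taken of the maximum, not the maximum of the expected loads, which is why the estimate averages `max(loads)` over sampled realizations rather than comparing mean loads per machine.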