Search | arXiv e-print repository

arXiv:2405.20439 [pdf, other]

Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning

Authors: Jacob Mitchell Springer, Vaishnavh Nagarajan, Aditi Raghunathan

Abstract: Sharpness-Aware Minimization (SAM) has emerged as a promising alternative optimizer to stochastic gradient descent (SGD). The originally-proposed motivation behind SAM was to bias neural networks towards flatter minima that are believed to generalize better. However, recent studies have shown conflicting evidence on the relationship between flatness and generalization, suggesting that flatness does not fully explain SAM's success. Sidestepping this debate, we identify an orthogonal effect of SAM that is beneficial out-of-distribution: we argue that SAM implicitly balances the quality of diverse features. SAM achieves this effect by adaptively suppressing well-learned features, which gives the remaining features an opportunity to be learned. We show that this mechanism is beneficial in datasets that contain redundant or spurious features where SGD falls for the simplicity bias and would not otherwise learn all available features. Our insights are supported by experiments on real data: we demonstrate that SAM improves the quality of features in datasets containing redundant or spurious features, including CelebA, Waterbirds, CIFAR-MNIST, and DomainBed.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 25 pages, 10 figures, 2 tables

arXiv:2405.14995 [pdf, other]

Lower Bound on the Greedy Approximation Ratio for Adaptive Submodular Cover

Authors: Blake Harris, Viswanath Nagarajan

Abstract: We show that the greedy algorithm for adaptive-submodular cover has approximation ratio at least 1.3*(1+ln Q). Moreover, the instance demonstrating this gap has Q=1. So, it invalidates a prior result in the paper ``Adaptive Submodularity: A New Approach to Active Learning and Stochastic Optimization'' by Golovin-Krause, that claimed a (1+ln Q)^2 approximation ratio for the same algorithm.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 7 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:2208.08351

arXiv:2403.06963 [pdf, other]

The pitfalls of next-token prediction

Authors: Gregor Bachmann, Vaishnavh Nagarajan

Abstract: Can a mere next-token predictor faithfully model human intelligence? We crystallize this emerging concern and correct popular misconceptions surrounding it, and advocate a simple multi-token objective. As a starting point, we argue that the two often-conflated phases of next-token prediction -- autoregressive inference and teacher-forced training -- must be treated distinctly. The popular criticism that errors can compound during autoregressive inference, crucially assumes that teacher-forcing has learned an accurate next-token predictor. This assumption sidesteps a more deep-rooted problem we expose: in certain classes of tasks, teacher-forcing can simply fail to learn an accurate next-token predictor in the first place. We describe a general mechanism of how teacher-forcing can fail, and design a minimal planning task where both the Transformer and the Mamba architecture empirically fail in that manner -- remarkably, despite the task being straightforward to learn. Finally, we provide preliminary evidence that this failure can be resolved using a simple modification that predicts multiple tokens in advance. We hope this finding can ground future debates and inspire explorations beyond the next-token prediction paradigm. We make our code available under https://github.com/gregorbachmann/Next-Token-Failures

Submitted 5 July, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: ICML 2024

arXiv:2312.15427 [pdf, other]

Semi-Bandit Learning for Monotone Stochastic Optimization

Authors: Arpit Agarwal, Rohan Ghuge, Viswanath Nagarajan

Abstract: Stochastic optimization is a widely used approach for optimization under uncertainty, where uncertain input parameters are modeled by random variables. Exact or approximation algorithms have been obtained for several fundamental problems in this area. However, a significant limitation of this approach is that it requires full knowledge of the underlying probability distributions. Can we still get good (approximation) algorithms if these distributions are unknown, and the algorithm needs to learn them through repeated interactions? In this paper, we resolve this question for a large class of "monotone" stochastic problems, by providing a generic online learning algorithm with $\sqrt{T \log T}$ regret relative to the best approximation algorithm (under known distributions). Importantly, our online algorithm works in a semi-bandit setting, where in each period, the algorithm only observes samples from the r.v.s that were actually probed. Our framework applies to several fundamental problems in stochastic optimization such as prophet inequality, Pandora's box, stochastic knapsack, stochastic matchings and stochastic submodular optimization.

Submitted 24 December, 2023; originally announced December 2023.

arXiv:2312.15357 [pdf, other]

Optimal Decision Tree with Noisy Outcomes

Authors: Su Jia, Fatemeh Navidi, Viswanath Nagarajan, R. Ravi

Abstract: In pool-based active learning, the learner is given an unlabeled data set and aims to efficiently learn the unknown hypothesis by querying the labels of the data points. This can be formulated as the classical Optimal Decision Tree (ODT) problem: Given a set of tests, a set of hypotheses, and an outcome for each pair of test and hypothesis, our objective is to find a low-cost testing procedure (i.e., decision tree) that identifies the true hypothesis. This optimization problem has been extensively studied under the assumption that each test generates a deterministic outcome. However, in numerous applications, for example, clinical trials, the outcomes may be uncertain, which renders the ideas from the deterministic setting invalid. In this work, we study a fundamental variant of the ODT problem in which some test outcomes are noisy, even in the more general case where the noise is persistent, i.e., repeating a test gives the same noisy output. Our approximation algorithms provide guarantees that are nearly best possible and hold for the general case of a large number of noisy outcomes per test or per hypothesis where the performance degrades continuously with this number. We numerically evaluated our algorithms for identifying toxic chemicals and learning linear classifiers, and observed that our algorithms have costs very close to the information-theoretic minimum.

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.04849 [pdf]

Low Resistance Ohmic Contact to P-type Monolayer WSe2

Authors: Jingxu Xie, Zuocheng Zhang, Haodong Zhang, Vikram Nagarajan, Wenyu Zhao, Haleem Kim, Collin Sanborn, Ruishi Qi, Sudi Chen, Salman Kahn, Kenji Watanabe, Takashi Taniguchi, Alex Zettl, Michael Crommie, James Analytis, Feng Wang

Abstract: Advanced microelectronics in the future may require semiconducting channel materials beyond silicon. Two-dimensional (2D) semiconductors, characterized by their atomically thin thickness, hold immense promise for high-performance electronic devices at the nanometer scale with lower heat dissipation. One challenge for achieving high-performance 2D semiconductor field effect transistors (FET), especially for p-type materials, is the high electrical contact resistance present at the metal-semiconductor interface. In conventional bulk semiconductors, low resistance ohmic contact is realized through heavy substitutional doping with acceptor or donor impurities at the contact region. The strategy of substitutional doping, however, does not work for p-type 2D semiconductors such as monolayer tungsten diselenide (WSe$_2$). In this study, we developed highly efficient charge-transfer doping with WSe$_2$/$\alpha$-RuCl$_3$ heterostructures to achieve low-resistance ohmic contact for p-type WSe$_2$ transistors. We show that a hole doping as high as 3$\times$10$^{13}$ cm$^{-2}$ can be achieved in the WSe$_2$/$\alpha$-RuCl$_3$ heterostructure due to its type-III band alignment. It results in an Ohmic contact with resistance lower than 4 k$\Omega$ $\mu$m at the p-type monolayer WSe$_2$/metal junction at room temperature. Using this low-resistance contact, we demonstrate high-performance p-type WSe$_2$ transistors with a saturation current of 35 $\mu$A$\cdot$$\mu$m$^{-1}$ and an I$_{ON}$/I$_{OFF}$ ratio exceeding 10$^9$. It could enable future microelectronic devices based on 2D semiconductors and contribute to the extension of Moore's law.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2311.12698 [pdf, other]

Informative Path Planning with Limited Adaptivity

Authors: Rayen Tan, Rohan Ghuge, Viswanath Nagarajan

Abstract: We consider the informative path planning ($\mathtt{IPP}$) problem in which a robot interacts with an uncertain environment and gathers information by visiting locations. The goal is to minimize its expected travel cost to cover a given submodular function. Adaptive solutions, where the robot incorporates all available information to select the next location to visit, achieve the best objective. However, such a solution is resource-intensive as it entails recomputing after every visited location. A more practical approach is to design solutions with a small number of adaptive "rounds", where the robot recomputes only once at the start of each round. In this paper, we design an algorithm for $\mathtt{IPP}$ parameterized by the number $k$ of adaptive rounds, and prove a smooth trade-off between $k$ and the solution quality (relative to fully adaptive solutions). We validate our theoretical results by experiments on a real road network, where we observe that a few rounds of adaptivity suffice to obtain solutions of cost almost as good as fully-adaptive ones.

Submitted 21 November, 2023; originally announced November 2023.

Comments: 35 pages, 9 figures

arXiv:2311.09168 [pdf, other]

doi 10.1145/3650200.3656601

Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing

Authors: Durga Mandarapu, Vani Nagarajan, Artem Pelenitsyn, Milind Kulkarni

Abstract: High-performance implementations of $k$-Nearest Neighbor Search ($k$NN) in low dimensions use tree-based data structures. Tree algorithms are hard to parallelize on GPUs due to their irregularity. However, newer Nvidia GPUs offer hardware support for tree operations through ray-tracing cores. Recent works have proposed using RT cores to implement $k$NN search, but they all have a hardware-imposed constraint on the distance metric used in the search -- the Euclidean distance. We propose and implement two reductions to support $k$NN for a broad range of distances other than the Euclidean distance: Arkade Filter-Refine and Arkade Monotone Transformation, each of which allows non-Euclidean distance-based nearest neighbor queries to be performed in terms of the Euclidean distance. With our reductions, we observe that $k$NN search time speedups range between $1.6$x-$200$x and $1.3$x-$33.1$x over various state-of-the-art GPU shader core and RT core baselines, respectively. In evaluation, we provide several insights on RT architectures' ability to efficiently build and traverse the tree by analyzing the $k$NN search time trends.

Submitted 21 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2310.05337 [pdf, other]

What do larger image classifiers memorise?

Authors: Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

Abstract: The success of modern neural networks has prompted study of the connection between memorisation and generalisation: overparameterised models generalise well, despite being able to perfectly fit (memorise) completely random labels. To carefully study this issue, Feldman proposed a metric to quantify the degree of memorisation of individual training examples, and empirically computed the corresponding memorisation profile of a ResNet on image classification benchmarks. While an exciting first glimpse into what real-world models memorise, this leaves open a fundamental question: do larger neural models memorise more? We present a comprehensive empirical analysis of this question on image classification benchmarks. We find that training examples exhibit an unexpectedly diverse set of memorisation trajectories across model sizes: most samples experience decreased memorisation under larger models, while the rest exhibit cap-shaped or increasing memorisation. We show that various proxies for the Feldman memorisation score fail to capture these fundamental trends. Lastly, we find that knowledge distillation, an effective and popular model compression technique, tends to inhibit memorisation, while also improving generalisation. Specifically, memorisation is mostly inhibited on examples with increasing memorisation trajectories, thus pointing at how distillation improves generalisation.

Submitted 8 October, 2023; originally announced October 2023.

MSC Class: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

arXiv:2310.04680 [pdf, other]

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

Authors: Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite

Abstract: How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techniques -- weight pruning and simply training a smaller or larger model, which we refer to as dense scaling -- and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference. By curating a suite of tasks that help disentangle these two capabilities, we find a striking difference in how these two abilities evolve due to scaling. Reducing the model size by more than 30\% (via either scaling approach) significantly decreases the ability to recall facts seen in pre-training. Yet, a 60--70\% reduction largely preserves the various ways the model can process in-context information, ranging from retrieving answers from a long context to learning parameterized functions from in-context exemplars. The fact that both dense scaling and weight pruning exhibit this behavior suggests that scaling model size has an inherently disparate effect on fact recall and in-context learning.

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2310.02226 [pdf, other]

Think before you speak: Training Language Models With Pause Tokens

Authors: Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan

Abstract: Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token. What if instead we were to let the model manipulate, say, $K+10$ hidden vectors, before it outputs the $(K+1)^{th}$ token? We operationalize this idea by performing training and inference on language models with a (learnable) $\textit{pause}$ token, a sequence of which is appended to the input prefix. We then delay extracting the model's outputs until the last pause token is seen, thereby allowing the model to process extra computation before committing to an answer. We empirically evaluate $\textit{pause-training}$ on decoder-only models of 1B and 130M parameters with causal pretraining on C4, and on downstream tasks covering reasoning, question-answering, general understanding and fact recall. Our main finding is that inference-time delays show gains when the model is both pre-trained and finetuned with delays. For the 1B model, we witness gains on 8 of 9 tasks, most prominently, a gain of $18\%$ EM score on the QA task of SQuAD, $8\%$ on CommonSenseQA and $1\%$ accuracy on the reasoning task of GSM8k. Our work raises a range of conceptual and practical future research questions on making delayed next-token prediction a widely applicable new paradigm.

Submitted 20 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: Published at ICLR 2024

arXiv:2308.12500 [pdf]

Strain Dependent Spin Hall Magnetoresistance in the Multiferroic Antiferromagnet BiFeO$_3$

Authors: D. Sando, S. Chen, O. Paull, B. Xu, J. J. L. van Rijn, C. Xu, S. Xu, F. Appert, J. Juraszek, L. Bellaiche, V. Nagarajan, T. Banerjee

Abstract: The spin Hall magnetoresistance (SMR) of epitaxial BiFeO$_3$ thin films is investigated. SMR consistent with ferromagnetic interfacial states for BiFeO$_3$ films fabricated on (001) SrTiO$_3$ (R' BFO) and LaAlO$_3$ (T' BFO) substrates is found, albeit with different temperature dependencies. For T' BFO, the SMR is enhanced at room temperature, and decays with reduced temperatures. By contrast, R' BFO shows a monotonic decrease in SMR response with increasing temperature, mirroring the trend of a weak ferromagnet. Density functional theory shows that this difference originates from the coupling of the applied magnetic field to oxygen octahedral rotation (R') and spin (T') degrees of freedom.

Submitted 23 August, 2023; originally announced August 2023.

Comments: 14 pages incl. 3 figures

arXiv:2305.18356 [pdf, other]

RT-kNNS Unbound: Using RT Cores to Accelerate Unrestricted Neighbor Search

Authors: Vani Nagarajan, Durga Mandarapu, Milind Kulkarni

Abstract: The problem of identifying the k-Nearest Neighbors (kNNS) of a point has proven to be very useful both as a standalone application and as a subroutine in larger applications. Given its far-reaching applicability in areas such as machine learning and point clouds, extensive research has gone into leveraging GPU acceleration to solve this problem. Recent work has shown that using Ray Tracing cores in recent GPUs to accelerate kNNS is much more efficient compared to traditional acceleration using shader cores. However, the existing translation of kNNS to a ray tracing problem imposes a constraint on the search space for neighbors. Due to this, we can only use RT cores to accelerate fixed-radius kNNS, which requires the user to set a search radius a priori and hence can miss neighbors. In this work, we propose TrueKNN, the first unbounded RT-accelerated neighbor search. TrueKNN adopts an iterative approach where we incrementally grow the search space until all points have found their k neighbors. We show that our approach is orders of magnitude faster than existing approaches and can even be used to accelerate fixed-radius neighbor searches.

Submitted 26 May, 2023; originally announced May 2023.

Comments: This paper has been accepted at the International Conference on Supercomputing 2023 (ICS'23)
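
Illustrative note: the iterative idea in the abstract above (grow the search radius until every point has its k neighbors) can be sketched on the CPU as follows. This is a minimal sketch and not the TrueKNN implementation: the brute-force `fixed_radius_neighbors` is a hypothetical stand-in for the RT-core query, and the starting radius and the doubling schedule are assumptions chosen only for illustration.

```python
import math


def fixed_radius_neighbors(points, i, radius):
    """Brute-force stand-in for a fixed-radius query (hypothetical helper)."""
    px, py = points[i]
    found = []
    for j, (qx, qy) in enumerate(points):
        if j != i:
            d = math.hypot(px - qx, py - qy)
            if d <= radius:
                found.append((d, j))
    return found


def growing_radius_knn(points, k, start_radius=0.5, growth=2.0):
    """Return {point index: indices of its k nearest neighbors, nearest first}."""
    if k > len(points) - 1:
        raise ValueError("k must be smaller than the number of points")
    result = {}
    pending = set(range(len(points)))
    radius = start_radius  # assumed starting radius
    while pending:
        for i in list(pending):
            found = fixed_radius_neighbors(points, i, radius)
            # If at least k points lie within the radius, the k nearest of
            # them are the true k nearest neighbors, so this point is done.
            if len(found) >= k:
                result[i] = [j for _, j in sorted(found)[:k]]
                pending.discard(i)
        radius *= growth  # assumed growth schedule: multiply the radius each round
    return result


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (10.0, 10.0)]
neighbors = growing_radius_knn(points, k=2)
```

Only points that are still short of k neighbors are re-queried in later rounds, so points in dense regions finish with a small radius while outliers such as `(10.0, 10.0)` keep growing theirs.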

arXiv:2305.15760 [pdf, other]

Svarah: Evaluating English ASR Systems on Indian Accents

Authors: Tahir Javed, Sakshi Joshi, Vignesh Nagarajan, Sai Sundaresan, Janki Nawale, Abhigyan Raman, Kaushal Bhogale, Pratyush Kumar, Mitesh M. Khapra

Abstract: India is the second largest English-speaking country in the world with a speaker base of roughly 130 million. Thus, it is imperative that automatic speech recognition (ASR) systems for English should be evaluated on Indian accents. Unfortunately, Indian speakers find a very poor representation in existing English ASR benchmarks such as LibriSpeech, Switchboard, Speech Accent Archive, etc. In this work, we address this gap by creating Svarah, a benchmark that contains 9.6 hours of transcribed English audio from 117 speakers across 65 geographic locations throughout India, resulting in a diverse range of accents. Svarah comprises both read speech and spontaneous conversational data, covering various domains, such as history, culture, tourism, etc., ensuring a diverse vocabulary. We evaluate 6 open source ASR models and 2 commercial ASR systems on Svarah and show that there is clear scope for improvement on Indian accents. Svarah as well as all our code will be publicly available.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2303.09655 [pdf, other]

RT-DBSCAN: Accelerating DBSCAN using Ray Tracing Hardware

Authors: Vani Nagarajan, Milind Kulkarni

Abstract: General Purpose computing on Graphical Processing Units (GPGPU) has resulted in unprecedented levels of speedup over its CPU counterparts, allowing programmers to harness the computational power of GPU shader cores to accelerate other computing applications. But this style of acceleration is best suited for regular computations (e.g., linear algebra). Recent GPUs feature new Ray Tracing (RT) cores that instead speed up the irregular process of ray tracing using Bounding Volume Hierarchies. While these cores seem limited in functionality, they can be used to accelerate n-body problems by leveraging RT cores to accelerate the required distance computations. In this work, we propose RT-DBSCAN, the first RT-accelerated DBSCAN implementation. We use RT cores to accelerate Density-Based Spatial Clustering of Applications with Noise (DBSCAN) by translating fixed-radius nearest neighbor queries to ray tracing queries. We show that leveraging the RT hardware results in speedups between 1.3x and 4x over current state-of-the-art, GPU-based DBSCAN implementations.

Submitted 16 March, 2023; originally announced March 2023.

arXiv:2302.01576 [pdf, other]

ResMem: Learn what you can and memorize the rest

Authors: Zitong Yang, Michal Lukasik, Vaishnavh Nagarajan, Zonglin Li, Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Sanjiv Kumar

Abstract: The impressive generalization performance of modern neural networks is attributed in part to their ability to implicitly memorize complex training patterns. Inspired by this, we explore a novel mechanism to improve model generalization via explicit memorization. Specifically, we propose the residual-memorization (ResMem) algorithm, a new method that augments an existing prediction model (e.g. a neural network) by fitting the model's residuals with a $k$-nearest neighbor based regressor. The final prediction is then the sum of the original model and the fitted residual regressor. By construction, ResMem can explicitly memorize the training labels. Empirically, we show that ResMem consistently improves the test set generalization of the original prediction model across various standard vision and natural language processing benchmarks. Theoretically, we formulate a stylized linear regression problem and rigorously show that ResMem results in a more favorable test risk over the base predictor.

Submitted 20 October, 2023; v1 submitted 3 February, 2023; originally announced February 2023.
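
Illustrative note: the recipe in the abstract above (fit a base model, fit its residuals with a $k$-nearest neighbor regressor, sum the two predictions) can be sketched in a few lines. This is a toy sketch under stated assumptions, not the authors' code: the base model is a hypothetical 1-D least-squares line rather than a neural network, the neighbor average is unweighted, and all function names are invented for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Hypothetical base model: ordinary least-squares line in one dimension."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_knn_residuals(xs, residuals, k):
    """k-NN regressor on the residuals (unweighted mean, an assumption)."""
    def predict(x):
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
        return sum(residuals[i] for i in nearest) / k
    return predict


def fit_resmem(xs, ys, k=1):
    base = fit_line(xs, ys)
    residuals = [y - base(x) for x, y in zip(xs, ys)]
    memory = fit_knn_residuals(xs, residuals, k)
    # Final prediction: base model output plus the fitted residual regressor.
    return lambda x: base(x) + memory(x)


random.seed(0)
xs = [random.uniform(-3.0, 3.0) for _ in range(200)]
ys = [math.sin(x) for x in xs]  # a target the line cannot fit on its own
model = fit_resmem(xs, ys, k=1)
max_train_error = max(abs(model(x) - y) for x, y in zip(xs, ys))
```

With `k=1`, each training point's nearest neighbor is itself, so the residual term restores the training label up to floating-point rounding. This matches the abstract's remark that the method can memorize the training labels by construction.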

arXiv:2301.12923 [pdf, other]

On student-teacher deviations in distillation: does it pay to disobey?

Authors: Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar

Abstract: Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network. Yet, it has been shown in recent work that, despite being trained to fit the teacher's probabilities, the student may not only significantly deviate from the teacher probabilities, but may also outdo the teacher in performance. Our work aims to reconcile this seemingly paradoxical observation. Specifically, we characterize the precise nature of the student-teacher deviations, and argue how they can co-occur with better generalization. First, through experiments on image and language data, we identify that these probability deviations correspond to the student systematically exaggerating the confidence levels of the teacher. Next, we theoretically and empirically establish another form of exaggeration in some simple settings: KD exaggerates the implicit bias of gradient descent in converging faster along the top eigendirections of the data. Finally, we tie these two observations together: we demonstrate that the exaggerated bias of KD can simultaneously result in both (a) the exaggeration of confidence and (b) the improved generalization of the student, thus offering a resolution to the apparent paradox. Our analysis brings existing theory and practice closer by considering the role of gradient descent in KD and by demonstrating the exaggerated bias effect in both theoretical and empirical settings.

Submitted 18 March, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

arXiv:2212.10180 [pdf, other]

IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation metrics for Indian Languages

Authors: Ananya B. Sai, Vignesh Nagarajan, Tanay Dixit, Raj Dabre, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

Abstract: The rapid growth of machine translation (MT) systems has necessitated comprehensive studies to meta-evaluate evaluation metrics being used, which enables a better selection of metrics that best reflect MT quality. Unfortunately, most of the research focuses on high-resource languages, mainly English, the observations for which may not always apply to other languages. Indian languages, having over a billion speakers, are linguistically different from English, and to date, there has not been a systematic study of evaluating MT systems from English into Indian languages. In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics. Our results show that pre-trained metrics, such as COMET, have the highest correlations with annotator scores. Additionally, we find that the metrics do not adequately capture fluency-based errors in Indian languages, and there is a need to develop metrics focused on Indian languages. We hope that our dataset and analysis will help promote further research in this area.

Submitted 3 July, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: ACL 2023 long paper

arXiv:2212.07578 [pdf, other]

doi 10.1063/5.0146141

Electronic transport mechanisms in a thin crystal of the Kitaev candidate $αあるふぁ$-RuCl$_3$ probed through guarded high impedance measurements

Authors: Patrick Barfield, Vinh Tran, Vikram Nagarajan, Maya Martinez, Amirari Diego, Derek Bergner, Alessandra Lanzara, James G. Analytis, Claudia Ojeda-Aristizabal

Abstract: $αあるふぁ$-RuCl$_3$ is considered to be the top candidate material for the experimental realization of the celebrated Kitaev model. It is, however, known that additional interactions beyond the Kitaev model trigger a long-range zigzag antiferromagnetic ground state in $αあるふぁ$-RuCl$_3$. In this work, we investigate a nanoflake of $αあるふぁ$-RuCl$_3$ through guarded high impedance measurements aimed at reaching, through electronic transport, the regime where the system turns into a zigzag antiferromagnet. We investigated a variety of temperatures (1.45 K to 175 K) and out-of-plane magnetic fields ranging up to 11 T. We found a clear signature of a structural phase transition at $\approx 160$ K as reported for thin crystals of $αあるふぁ$-RuCl$_3$, as well as a thermally activated behavior at temperatures above $\approx 30$ K with a characteristic activation energy significantly smaller than the energy gap that we observe for $αあるふぁ$-RuCl$_3$ bulk crystals through our Angle Resolved Photoemission Spectroscopy (ARPES) experiments. Additionally, we found that below $\approx 30$ K, transport is ruled by Efros-Shklovskii (ES) variable-range hopping (VRH). These observations point to the presence of Coulomb impurities in our thin crystals. Most importantly, our data show that below the magnetic ordering transition known for bulk $αあるふぁ$-RuCl$_3$ ($\approx 7$ K), there is a clear deviation from VRH or thermal activation transport mechanisms. Our work demonstrates the possibility of reaching, through specialized high impedance measurements, the thrilling ground states predicted for $αあるふぁ$-RuCl$_3$ at low temperatures in the frame of the Kitaev model, and informs about the transport mechanisms in this material in a wide temperature range as well as on important characteristic quantities such as the localization length of the impurities in a thin $αあるふぁ$-RuCl$_3$ crystal.

Submitted 13 January, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

Comments: 8 pages, 6 figures, Supplementary Materials

arXiv:2212.01243 [pdf, other]

doi 10.1088/1361-6404/aca570

Brachistochrone of off-centered cylinders

Authors: Krishnaraj Sambath, Vidhya Nagarajan

Abstract: We consider the problem of finding paths of shortest transit time between two points (popularly known as Brachistochrone) for cylinders with off-centered center of mass, rolling down without slip, subject solely to the force of gravity. This problem is set up using principles of classical rigid body dynamics and the desired path function is solved for numerically using the method of discrete calculus of variations. We discover a distinct array of brachistochrone trajectories for off-centered cylinders, demonstrate a critical dependence of such paths on the initial location and orientation of cylinders' centers of mass and bring new insights into the family of brachistochrone problems and solutions.

Submitted 26 December, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: Preprint version (in arXiv) identical in content, only differs in format, to accepted version (in EJP)

Journal ref: European Journal of Physics (2022)

arXiv:2209.12108 [pdf, other]

An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem

Authors: Arpit Agarwal, Rohan Ghuge, Viswanath Nagarajan

Abstract: We study the $K$-armed dueling bandit problem, a variation of the traditional multi-armed bandit problem in which feedback is obtained in the form of pairwise comparisons. Previous learning algorithms have focused on the $\textit{fully adaptive}$ setting, where the algorithm can make updates after every comparison. The "batched" dueling bandit problem is motivated by large-scale applications like web search ranking and recommendation systems, where performing sequential updates may be infeasible. In this work, we ask: $\textit{is there a solution using only a few adaptive rounds that matches the asymptotic regret bounds of the best sequential algorithms for $K$-armed dueling bandits?}$ We answer this in the affirmative $\textit{under the Condorcet condition}$, a standard setting of the $K$-armed dueling bandit problem. We obtain asymptotic regret of $O(K^2\log^2(K)) + O(K\log(T))$ in $O(\log(T))$ rounds, where $T$ is the time horizon. Our regret bounds nearly match the best regret bounds known in the fully sequential setting under the Condorcet condition. Finally, in computational experiments over a variety of real-world datasets, we observe that our algorithm using $O(\log(T))$ rounds achieves almost the same performance as fully sequential algorithms (that use $T$ rounds).

Submitted 24 September, 2022; originally announced September 2022.

arXiv:2209.08979 [pdf]

doi 10.1038/s41467-023-39841-3

Ferroelectric Solitons Crafted in Epitaxial Bismuth Ferrite Superlattices

Authors: V. Govinden, P. R. Tong, X. Guo, Q. Zhang, S. Mantri, S. Prokhorenko, Y. Nahas, Y. Wu, L. Bellaiche, H. Tian, Z. Hong, D. Sando, V. Nagarajan

Abstract: In ferroelectrics, complex interactions among various degrees of freedom enable the condensation of topologically protected polarization textures. Known as ferroelectric solitons, these particle-like structures represent a new class of materials with promise for beyond CMOS technologies due to their ultrafine size and sensitivity to external stimuli. Such polarization textures have scarcely been reported in multiferroics. Here, we report a range of soliton topologies in bismuth ferrite strontium titanate superlattices. High-resolution piezoresponse force microscopy and Cs-corrected high-angle annular dark-field scanning transmission electron microscopy reveal a zoo of topologies, and polarization displacement mapping of planar specimens reveals center-convergent and divergent topological defects as small as 3 nm. Phase field simulations verify that some of these topologies can be classed as bimerons, with a topological charge of plus and minus one, and first-principles-based effective Hamiltonian computations show that the co-existence of such structures can lead to non-integer topological charges, a first observation in a BiFeO3-based system. Our results open new opportunities in multiferroic topotronics.

Submitted 19 September, 2022; originally announced September 2022.

arXiv:2208.08351 [pdf, ps, other]

Minimum Cost Adaptive Submodular Cover

Authors: Hessa Al-Thani, Yubing Cui, Viswanath Nagarajan

Abstract: Adaptive submodularity is a fundamental concept in stochastic optimization, with numerous applications such as sensor placement, hypothesis identification and viral marketing. We consider the problem of minimum cost cover of adaptive-submodular functions, and provide a $4(1+\ln Q)$-approximation algorithm, where $Q$ is the goal value. In fact, we consider a significantly more general objective of minimizing the $p^{th}$ moment of the coverage cost, and show that our algorithm simultaneously achieves a $(p+1)^{p+1}\cdot (\ln Q+1)^p$ approximation guarantee for all $p\ge 1$. All our approximation ratios are best possible up to constant factors (assuming $P\ne NP$). Moreover, our results also extend to the setting where one wants to cover {\em multiple} adaptive-submodular functions. Finally, we evaluate the empirical performance of our algorithm on instances of hypothesis identification.

Submitted 21 May, 2024; v1 submitted 17 August, 2022; originally announced August 2022.

Comments: 24 pages, 3 figures

arXiv:2203.16519 [pdf]

doi 10.1103/PhysRevLett.129.087601

Nonvolatile Electric Field Control of Thermal Magnons in the Absence of an Applied Magnetic Field

Authors: Eric Parsonnet, Lucas Caretta, Vikram Nagarajan, Hongrui Zhang, Hossein Taghinejad, Piush Behera, Xiaoxi Huang, Pravin Kavle, Abel Fernandez, Dmitri Nikonov, Hai Li, Ian Young, James Analytis, Ramamoorthy Ramesh

Abstract: Spin transport through magnetic insulators has been demonstrated in a variety of materials and is an emerging pathway for next-generation spin-based computing. To modulate spin transport in these systems, one typically applies a sufficiently strong magnetic field to allow for deterministic control of magnetic order. Here, we make use of the well-known multiferroic magnetoelectric, BiFeO3, to demonstrate non-volatile, hysteretic, electric-field control of thermally excited magnon current in the absence of an applied magnetic field. These findings are an important step toward magnon-based devices, where electric-field-only control is highly desirable.

Submitted 23 August, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

Comments: 34 pages, 4 figures, 9 supplemental figures

Journal ref: Phys. Rev. Lett. 129, 087601 (2022)

arXiv:2202.10660 [pdf, other]

Batched Dueling Bandits

Authors: Arpit Agarwal, Rohan Ghuge, Viswanath Nagarajan

Abstract: The $K$-armed dueling bandit problem, where the feedback is in the form of noisy pairwise comparisons, has been widely studied. Previous works have only focused on the sequential setting where the policy adapts after every comparison. However, in many applications such as search ranking and recommendation systems, it is preferable to perform comparisons in a limited number of parallel batches. We study the batched $K$-armed dueling bandit problem under two standard settings: (i) existence of a Condorcet winner, and (ii) strong stochastic transitivity and stochastic triangle inequality. For both settings, we obtain algorithms with a smooth trade-off between the number of batches and regret. Our regret bounds match the best known sequential regret bounds (up to poly-logarithmic factors), using only a logarithmic number of batches. We complement our regret analysis with a nearly-matching lower bound. Finally, we also validate our theoretical results via experiments on synthetic and real data.

Submitted 21 February, 2022; originally announced February 2022.

arXiv:2112.01700 [pdf, other]

On Some Variants of Euclidean K-Supplier

Authors: Euiwoong Lee, Viswanath Nagarajan, Lily Wang

Abstract: The $k$-Supplier problem is an important location problem that has been actively studied in both general and Euclidean metrics. Many of its variants have also been studied, primarily on general metrics. We study two variants of $k$-Supplier, namely Priority $k$-Supplier and $k$-Supplier with Outliers, in Euclidean metrics. We obtain $(1+\sqrt{3})$-approximation algorithms for both variants, which are the first improvements over the previously-known factor-$3$ approximation (that is known to be best-possible for general metrics). We also study the Matroid Supplier problem on Euclidean metrics, and show that it cannot be approximated to a factor better than $3$ (assuming $P\ne NP$); so the Euclidean metric offers no improvement in this case.

Submitted 2 December, 2021; originally announced December 2021.

arXiv:2111.15569 [pdf, other]

Scalable Machine Learning Architecture for Neonatal Seizure Detection on Ultra-Edge Devices

Authors: Vishal Nagarajan, Ashwini Muralidharan, Deekshitha Sriraman, Pravin Kumar S

Abstract: Neonatal seizures are a commonly encountered neurological condition. They are the first clinical signs of a serious neurological disorder. Thus, rapid recognition and treatment are necessary to prevent serious fatalities. The use of electroencephalography (EEG) in the field of neurology allows precise diagnosis of several medical conditions. However, interpreting EEG signals needs the attention of highly specialized staff since the infant brain is developmentally immature during the neonatal period. Detecting seizures on time could potentially prevent the negative effects on the neurocognitive development of the infants. In recent years, neonatal seizure detection using machine learning algorithms has been gaining traction. Since there is a need for the classification of bio-signals to be computationally inexpensive in the case of seizure detection, this research presents a machine learning (ML) based architecture that operates with predictive performance comparable to previous models but with minimum level configuration. The proposed classifier was trained and tested on a public dataset of NICU seizures recorded at the Helsinki University Hospital. Our architecture achieved a best sensitivity of 87%, which is 6% more than that of the standard ML model chosen in this study. The model size of the ML classifier is optimized to just 4.84 KB with minimum prediction time of 182.61 milliseconds, thus enabling it to be deployed on wearable ultra-edge devices for quick and accurate response and obviating the need for cloud-based and other such exhaustive computational methods.

Submitted 29 November, 2021; originally announced November 2021.

Comments: 6 pages, 5 figures, Accepted at 2nd International Conference on Artificial Intelligence and Signal Processing (AISP) 2022

arXiv:2111.11789 [pdf, other]

End-to-End Optimized Arrhythmia Detection Pipeline using Machine Learning for Ultra-Edge Devices

Authors: Sideshwar J B, Sachin Krishan T, Vishal Nagarajan, Shanthakumar S, Vineeth Vijayaraghavan

Abstract: Atrial fibrillation (AF) is the most prevalent cardiac arrhythmia worldwide, with 2% of the population affected. It is associated with an increased risk of strokes, heart failure and other heart-related complications. Monitoring at-risk individuals and detecting asymptomatic AF could result in considerable public health benefits, as individuals with asymptomatic AF could take preventive measures with lifestyle changes. With the increasing affordability of wearables, personalized health care is becoming more accessible. These personalized healthcare solutions require accurate classification of bio-signals while being computationally inexpensive. By making inferences on-device, we avoid issues inherent to cloud-based systems such as latency and network connection dependency. We propose an efficient pipeline for real-time Atrial Fibrillation Detection with high accuracy that can be deployed in ultra-edge devices. The feature engineering employed in this research catered to optimizing the resource-efficient classifier used in the proposed pipeline, which was able to outperform the best performing standard ML model by $10^5\times$ in terms of memory footprint with a mere trade-off of 2% classification accuracy. We also obtain higher accuracy of approximately 6% while consuming 403$\times$ less memory and being 5.2$\times$ faster compared to the previous state-of-the-art (SoA) embedded implementation.

Submitted 23 November, 2021; originally announced November 2021.

Comments: 8 pages, 9 figures, Accepted at 20th IEEE International Conference on Machine Learning and Applications (ICMLA) 2021

arXiv:2111.05687 [pdf, other]

Non-Adaptive Stochastic Score Classification and Explainable Halfspace Evaluation

Authors: Rohan Ghuge, Anupam Gupta, Viswanath Nagarajan

Abstract: Sequential testing problems involve a complex system with several components, each of which is "working" with some independent probability. The outcome of each component can be determined by performing a test, which incurs some cost. The overall system status is given by a function $f$ of the outcomes of its components. The goal is to evaluate this function $f$ by performing tests at the minimum expected cost. While there has been extensive prior work on this topic, provable approximation bounds are mainly limited to simple functions like "k-out-of-n" and halfspaces. We consider significantly more general "score classification" functions, and provide the first constant factor approximation algorithm (improving over a previous logarithmic approximation ratio). Moreover, our policy is non-adaptive: it just involves performing tests in an a priori fixed order. We also consider the related halfspace evaluation problem, where we want to evaluate some function on $d$ halfspaces (e.g., intersection of halfspaces). We show that our approach provides an $O(d^2\log d)$-approximation algorithm for this problem. Our algorithms also extend to the setting of "batched" tests, where multiple tests can be performed simultaneously while incurring an extra setup cost. Finally, we perform computational experiments that demonstrate the practical performance of our algorithm for score classification. We observe that, for most instances, the cost of our algorithm is within $50\%$ of an information-theoretic lower bound on the optimal value.

Submitted 19 August, 2023; v1 submitted 10 November, 2021; originally announced November 2021.

Comments: Full version of IPCO 2022 paper

arXiv:2110.08922 [pdf, other]

Explaining generalization in deep learning: progress and fundamental limits

Authors: Vaishnavh Nagarajan

Abstract: This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error? In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive {\em data-dependent} {\em uniform-convergence-based} generalization bounds with improved dependencies on the parameter count. Uniform convergence has in fact been the most widely used tool in deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, {\em any} uniform convergence bound will provide only a vacuous generalization bound. With this realization in mind, in the last part of the thesis, we will change course and introduce an {\em empirical} technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergence-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision. We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.

Submitted 17 October, 2021; originally announced October 2021.

Comments: arXiv admin note: text overlap with arXiv:1902.04742

arXiv:2108.08651 [pdf, other]

Efficient Algorithms for Stochastic Ridepooling Assignment with Mixed Fleets

Authors: Qi Luo, Viswanath Nagarajan, Alexander Sundt, Yafeng Yin, John Vincent, Mehrdad Shahabi

Abstract: Ride-pooling, which accommodates multiple passenger requests in a single trip, has the potential to significantly increase fleet utilization in shared mobility platforms. The ride-pooling assignment problem finds optimal co-riders to maximize the total utility or profit on a shareability graph, a hypergraph representing the matching compatibility between available vehicles and pending requests. With mixed fleets due to the introduction of automated or premium vehicles, fleet sizing and relocation decisions should be made before the requests are revealed. Due to the immense size of the underlying shareability graph and demand uncertainty, it is impractical to use exact methods to calculate the optimal trip assignments. Two approximation algorithms for mid-capacity and high-capacity vehicles are proposed in this paper; the respective approximation ratios are $\frac1{p^2}$ and $\frac{e-1}{(2e+o(1)) p \ln p}$, where $p$ is the maximum vehicle capacity plus one. The performance of these algorithms is validated using a mixed autonomy on-demand mobility simulator. These efficient algorithms serve as a stepping stone for a variety of multimodal and multiclass on-demand mobility applications.

Submitted 14 April, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

arXiv:2106.16115 [pdf, other]

The Power of Adaptivity for Stochastic Submodular Cover

Authors: Rohan Ghuge, Anupam Gupta, Viswanath Nagarajan

Abstract: In the stochastic submodular cover problem, the goal is to select a subset of stochastic items of minimum expected cost to cover a submodular function. Solutions in this setting correspond to sequential decision processes that select items one by one "adaptively" (depending on prior observations). While such adaptive solutions achieve the best objective, the inherently sequential nature makes them undesirable in many applications. We ask: how well can solutions with only a few adaptive rounds approximate fully-adaptive solutions? We give nearly tight answers for both independent and correlated settings, proving smooth tradeoffs between the number of adaptive rounds and the solution quality, relative to fully adaptive solutions. Experiments on synthetic and real datasets show qualitative improvements in the solutions as we allow more rounds of adaptivity; in practice, solutions with a few rounds of adaptivity are nearly as good as fully adaptive solutions.

Submitted 30 June, 2021; originally announced June 2021.

Comments: In proceedings of ICML 2021

arXiv:2106.13799 [pdf, other]

Assessing Generalization of SGD via Disagreement

Authors: Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter

Abstract: We empirically show that the test error of deep networks can be estimated by simply training the same architecture on the same training set but with a different run of Stochastic Gradient Descent (SGD), and measuring the disagreement rate between the two networks on unlabeled test data. This builds on -- and is a stronger version of -- the observation in Nakkiran & Bansal '20, which requires the second run to be on an altogether fresh training set. We further theoretically show that this peculiar phenomenon arises from the \emph{well-calibrated} nature of \emph{ensembles} of SGD-trained models. This finding not only provides a simple empirical measure to directly predict the test error using unlabeled test data, but also establishes a new conceptual connection between generalization and calibration.

Submitted 15 May, 2022; v1 submitted 25 June, 2021; originally announced June 2021.

arXiv:2103.14701 [pdf, ps, other]

Extending Classic Paxos for High-performance Read-Modify-Write Registers

Authors: Vasilis Gavrielatos, Antonios Katsarakis, Vijay Nagarajan

Abstract: In this work we provide a detailed specification of how we extended and implemented Classic Paxos (CP) to execute Read-Modify-Writes. In addition, we also specify how we implemented All-aboard Paxos over CP and how we use carstamps, to also add ABD reads and writes, to accelerate the common case, where RMWs are not needed. Our specification targets a Key-Value-Store that is deployed within the datacenter, is replicated across 3 to 7 machines and supports reads, writes and RMWs.

Submitted 26 March, 2021; originally announced March 2021.

arXiv:2102.02714 [pdf, other]

doi 10.1103/PhysRevB.103.184404

Magnon-spinon dichotomy in the Kitaev hyperhoneycomb $βべーた$-Li$_2$IrO$_3$

Authors: Alejandro Ruiz, Nicholas P. Breznay, Mengqun Li, Ioannis Rousochatzakis, Anthony Allen, Isaac Zinda, Vikram Nagarajan, Gilbert Lopez, Mary H. Upton, Jungho Kim, Ayman H. Said, Xian-Rong Huang, Thomas Gog, Diego Casa, Robert J. Birgeneau, Jake D. Koralek, James G. Analytis, Natalia B. Perkins, Alex Frano

Abstract: The family of edge-sharing tri-coordinated iridates and ruthenates has emerged in recent years as a major platform for Kitaev spin liquid physics, where spins fractionalize into emergent magnetic fluxes and Majorana fermions with Dirac-like dispersions. While such exotic states are usually pre-empted by long-range magnetic order at low temperatures, signatures of Majorana fermions with long coherent times have been predicted to manifest at intermediate and higher energy scales, similar to the observation of spinons in quasi-1D spin chains. Here we present a Resonant Inelastic X-ray Scattering study of the magnetic excitations of the hyperhoneycomb iridate $\beta$-Li$_2$IrO$_3$ under a magnetic field with a record-high-resolution spectrometer. At low-temperatures, dispersing spin waves can be resolved around the predicted intertwined incommensurate spiral and field-induced zigzag orders, whose excitation energy reaches a maximum of 16meV. A 2T magnetic field softens the dispersion around ${\bf Q}=0$. The behavior of the spin waves under magnetic field is consistent with our semiclassical calculations for the ground state and the dynamical spin structure factor, which further predicts that the ensued intertwined uniform states remain robust up to very high fields (100 T). Most saliently, the low-energy magnon-like mode is superimposed by a broad continuum of excitations, centered around 35meV and extending up to 100meV. This high-energy continuum survives up to at least 300K -- well above the ordering temperature of 38K -- and gives evidence for pairs of long-lived Majorana fermions of the proximate Kitaev spin liquid.

Submitted 4 February, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

Comments: 8 pages, 4 figures

Journal ref: Phys. Rev. B 103, 184404 (2021)

arXiv:2102.00558 [pdf]

doi 10.1038/s41563-021-01098-w

Super-R BiFeO$_3$: Epitaxial stabilization of a low-symmetry phase with giant electromechanical response

Authors: Oliver Paull, Changsong Xu, Xuan Cheng, Yangyang Zhang, Bin Xu, Kyle Kelley, Liam Collins, Alex de Marco, Rama K. Vasudevan, Laurent Bellaiche, Valanoor Nagarajan, Daniel Sando

Abstract: Piezoelectrics interconvert mechanical energy and electric charge and are widely used in actuators and sensors. The best performing materials are ferroelectrics at a morphotropic phase boundary (MPB), where several phases can intimately coexist. Switching between these phases by electric field produces a large electromechanical response. In the ferroelectric BiFeO$_3$, strain can be used to create an MPB-like phase mixture and thus to generate large electric field dependent strains. However, this enhanced response occurs at localized, randomly positioned regions of the film, which potentially complicates nanodevice design. Here, we use epitaxial strain and orientation engineering in tandem - anisotropic epitaxy - to craft a hitherto unavailable low-symmetry phase of BiFeO$_3$ which acts as a structural bridge between the rhombohedral-like and tetragonal-like polymorphs. Interferometric displacement sensor measurements and first-principle calculations reveal that under external electric bias, this phase undergoes a transition to the tetragonal-like polymorph, generating a piezoelectric response enhanced by over 200%, and associated giant field-induced reversible strain. These results offer a new route to engineer giant electromechanical properties in thin films, with broader perspectives for other functional oxide systems.

Submitted 31 January, 2021; originally announced February 2021.

Comments: 20 pages, 4 figures

arXiv:2011.12951 [pdf, other]

Evidence for freezing of charge degrees of freedom across a critical point in CeCoIn$_5$

Authors: Nikola Maksimovic, Tessa Cookmeyer, Jan Rusz, Vikram Nagarajan, Amanda Gong, Fanghui Wan, Stefano Faubel, Ian M. Hayes, Sooyoung Jang, Yochai Werman, Peter M. Oppeneer, Ehud Altman, James G. Analytis

Abstract: The presence of a quantum critical point separating two distinct zero-temperature phases is thought to underlie the `strange' metal state of many high-temperature superconductors. The nature of this quantum critical point, as well as a description of the resulting strange metal, are central open problems in condensed matter physics. In large part, the controversy stems from the lack of a clear broken symmetry to characterize the critical phase transition, and this challenge is no clearer than in the example of the unconventional superconductor CeCoIn$_5$. Through Hall effect and Fermi surface measurements of CeCoIn$_5$, in comparison to ab initio calculations, we find evidence for a critical point that connects two Fermi surfaces with different volumes without apparent symmetry-breaking, indicating the presence of a transition that involves an abrupt localization of one sector of the charge degrees of freedom. We present a model for the anomalous electrical Hall resistivity of this material based on the conductivity of valence charge fluctuations.

Submitted 25 November, 2020; originally announced November 2020.

Comments: 17 pages, 4 figures, Supplement included

arXiv:2011.01205 [pdf, other]

A Learning Theoretic Perspective on Local Explainability

Authors: Jeffrey Li, Vaishnavh Nagarajan, Gregory Plumb, Ameet Talwalkar

Abstract: In this paper, we explore connections between interpretable machine learning and learning theory through the lens of local approximation explanations. First, we tackle the traditional problem of performance generalization and bound the test-time accuracy of a model using a notion of how locally explainable it is. Second, we explore the novel problem of explanation generalization which is an important concern for a growing class of finite sample-based local approximation explanations. Finally, we validate our theoretical results empirically and show that they reflect what can be seen in practice.

Submitted 2 November, 2020; originally announced November 2020.

arXiv:2010.15775 [pdf, other]

Understanding the Failure Modes of Out-of-Distribution Generalization

Authors: Vaishnavh Nagarajan, Anders Andreassen, Behnam Neyshabur

Abstract: Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only during training time, resulting in poor accuracy during test-time. In this work, we identify the fundamental factors that give rise to this behavior, by explaining why models fail this way {\em even} in easy-to-learn tasks where one would expect these models to succeed. In particular, through a theoretical study of gradient-descent-trained linear classifiers on some easy-to-learn tasks, we uncover two complementary failure modes. These modes arise from how spurious correlations induce two kinds of skews in the data: one geometric in nature, and another, statistical in nature. Finally, we construct natural modifications of image classification datasets to understand when these failure modes can arise in practice. We also design experiments to isolate the two failure modes when training modern neural networks on these datasets.

Submitted 29 April, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

arXiv:2009.14092 [pdf, other]

doi 10.5488/CMP.23.33301

Detection of odor quality and ripening stage of Mangifera indica L. by graphdiyne nanosheet -- a DFT outlook

Authors: V. Nagarajan, R. Chandiramouli

Abstract: Using first-principles calculation, geometrical stability together with electronic properties of graphdiyne nanosheet (Gdn-NS) is investigated. The structural stability of Gdn-NS is established with the support of phonon band structure and cohesive energy. The main objective of the present study is to check the odor quality of Mangifera indica L. (mangoes) fruits during the various ripening stage with the influence of Gdn-NS material. In addition, the adsorption of various volatiles, namely ethyl butanoate, myrcene, (E,Z,Z)-1,3,4,8-undecatetraene and $\gamma$-octalactone aromas on Gdn-NS is explored with the significant parameters including Bader charge transfer, energy gap, average energy gap changes and adsorption energy. The sensitivity of volatiles emitting from various ripening stages of mango on Gdn-NS were explored with the influence of density of states spectrum. The outcomes of the proposed work help us to check the ripening stage and odor quality of Mangifera indica L. by Gdn-NS material using density functional theory.

Submitted 29 September, 2020; originally announced September 2020.

Comments: 13 pages, 12 figures, 2 tables

Journal ref: Condens. Matter Phys., 2020, Vol. 23, No 3, 33301

arXiv:2007.07721 [pdf, ps, other]

Online Generalized Network Design Under (Dis)Economies of Scale

Authors: Viswanath Nagarajan, Lily Wang

Abstract: We consider a general online network design problem where a sequence of N requests arrive over time, each of which needs to use some subset of the available resources E. The cost incurred by any resource e is some function $f_e$ of the total load $L_e$ on that resource. The objective is to minimize the total cost $\sum_{e\in E} f_e(L_e)$. We focus on cost functions that exhibit (dis)economies of scale, that are of the form $f_e(x) = \sigma_e + \xi_e\cdot x^{\alpha_e}$ if $x>0$ (and zero if $x=0$), where the exponent $\alpha_e\ge 1$. Optimization problems under these functions have received significant recent attention due to applications in energy-efficient computing. Our main result is a deterministic online algorithm with tight competitive ratio $\Theta\left(\max_{e\in E} \left(\frac{\sigma_e}{\xi_e}\right)^{1/\alpha_e}\right)$ when $\alpha_e$ is constant for all $e\in E$. This framework is applicable to a variety of network design problems in undirected and directed graphs, including multicommodity routing, Steiner tree/forest connectivity and set-connectivity. In fact, our online competitive ratio even matches the previous-best (offline) approximation ratio for generalized network design.

Submitted 15 July, 2020; originally announced July 2020.

Comments: 20 pages

arXiv:2007.04970 [pdf, other]

doi 10.1103/PhysRevX.10.041062

Magnetoresistance scaling, disorder, `hot spots' and the origin of $T$-linear resistivity in BaFe$_2$(As$_{1-x}$P$_x$)$_2$

Authors: Nikola Maksimovic, Ian M. Hayes, Vikram Nagarajan, Alexei E. Koshelev, John Singleton, Yeonbae Lee, Thomas Schenkel, James G. Analytis

Abstract: The scaling of $H$-linear magnetoresistance in field and temperature was measured in under-doped (x = 0.19) and optimally-doped (x=0.31)~BaFe$_2$(As$_{1-x}$P$_x$)$_2$. We analyze the data based on an orbital model in the presence of strongly anisotropic quasiparticle spectra and scattering time due to antiferromagnetism. The magnetoresistance is dominated by the properties of small regions of the Fermi surface called `hot spots' where antiferromagnetic excitations induce a large quasiparticle scattering rate. Approximate temperature-magnetic field scaling relations are derived and shown to be consistent with the experimental data. We argue that these results link the origin of linear-in-temperature resistivity to hot spots arising from an antiferromagnetic critical point, and magnetoresistance measurements provide a route to quantify this link.

Submitted 9 July, 2020; originally announced July 2020.

Comments: 9 pages, 4 figures, Supplemental Material available on request

Journal ref: Phys. Rev. X 10, 041062 (2020)

arXiv:2007.03574 [pdf, other]

Provably Safe PAC-MDP Exploration Using Analogies

Authors: Melrose Roderick, Vaishnavh Nagarajan, J. Zico Kolter

Abstract: A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure). Although a growing line of work in reinforcement learning has investigated this area of "safe exploration," most existing techniques either 1) do not guarantee safety during the actual exploration process; and/or 2) limit the problem to a priori known and/or deterministic transition dynamics with strong smoothness assumptions. Addressing this gap, we propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics. Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense. Additionally, ASE also guides exploration towards the most task-relevant states, which empirically results in significant improvements in terms of sample efficiency, when compared to existing methods.

Submitted 22 March, 2021; v1 submitted 7 July, 2020; originally announced July 2020.

Comments: 10 pages, 3 figures, In proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

arXiv:2002.11153 [pdf, other]

Stochastic Makespan Minimization in Structured Set Systems

Authors: Anupam Gupta, Amit Kumar, Viswanath Nagarajan, Xiangkun Shen

Abstract: We study stochastic combinatorial optimization problems where the objective is to minimize the expected maximum load (a.k.a.\ the makespan). In this framework, we have a set of $n$ tasks and $m$ resources, where each task $j$ uses some subset of the resources. Tasks have random sizes $X_j$, and our goal is to non-adaptively select $t$ tasks to minimize the expected maximum load over all resources, where the load on any resource $i$ is the total size of all selected tasks that use $i$. For example, when resources are points and tasks are intervals in a line, we obtain an $O(\log\log m)$-approximation algorithm. Our technique is also applicable to other problems with some geometric structure in the relation between tasks and resources; e.g., packing paths, rectangles, and "fat" objects. Our approach uses a strong LP relaxation using the cumulant generating functions of the random variables. We also show that this LP has an $\Omega(\log^* m)$ integrality gap, even for the problem of selecting intervals on a line; here $\log^* m$ is the iterated logarithm function.

Submitted 24 June, 2021; v1 submitted 25 February, 2020; originally announced February 2020.

Comments: 30 pages, 2 figures

Journal ref: IPCO 2020

arXiv:2002.04785 [pdf, ps, other]

doi 10.1103/PhysRevB.101.235415

Competition between magnetic order and charge localization in Na$_2$IrO$_3$ thin crystal devices

Authors: Josue Rodriguez, Gilbert Lopez, Samantha Crouch, Nicholas P. Breznay, Robert Kealhofer, Vikram Nagarajan, Drew Latzke, Francisco Ramirez, Naomy Marrufo, Peter Santiago, Jared Lara, Amirari Diego, Everardo Molina, David Rosser, Hadi Tavassol, Alessandra Lanzara, James G. Analytis, Claudia Ojeda-Aristizabal

Abstract: Spin orbit assisted Mott insulators such as sodium iridate (Na$_2$IrO$_3$) have been an important subject of study in the recent years. In these materials, the interplay of electronic correlations, spin-orbit coupling, crystal field effects and a honeycomb arrangement of ions bring exciting ground states, predicted in the frame of the Kitaev model. The insulating character of Na$_2$IrO$_3$ has hampered its integration to an electronic device, desirable for applications, such as the manipulation of quasiparticles interesting for topological quantum computing. Here we show through electronic transport measurements supported by Angle Resolved Photoemission Spectroscopy (ARPES) experiments, that electronic transport in Na$_2$IrO$_3$ is ruled by variable range hopping and it is strongly dependent on the magnetic ordering transition known for bulk Na$_2$IrO$_3$, as well as on external electric fields. Electronic transport measurements allow us to deduce a value for the localization length and the density of states in our Na$_2$IrO$_3$ thin crystals devices, offering an alternative approach to study insulating layered materials.

Submitted 11 February, 2020; originally announced February 2020.

Journal ref: Phys. Rev. B 101, 235415 (2020)

arXiv:2001.09804 [pdf, other]

doi 10.1145/3373376.3378496

Hermes: a Fast, Fault-Tolerant and Linearizable Replication Protocol

Authors: A. Katsarakis, V. Gavrielatos, M. Katebzadeh, A. Joshi, A. Dragojevic, B. Grot, V. Nagarajan

Abstract: Today's datacenter applications are underpinned by datastores that are responsible for providing availability, consistency, and performance. For high availability in the presence of failures, these datastores replicate data across several nodes. This is accomplished with the help of a reliable replication protocol that is responsible for maintaining the replicas strongly-consistent even when faults occur. Strong consistency is preferred to weaker consistency models that cannot guarantee an intuitive behavior for the clients. Furthermore, to accommodate high demand at real-time latencies, datastores must deliver high throughput and low latency. This work introduces Hermes, a broadcast-based reliable replication protocol for in-memory datastores that provides both high throughput and low latency by enabling local reads and fully-concurrent fast writes at all replicas. Hermes couples logical timestamps with cache-coherence-inspired invalidations to guarantee linearizability, avoid write serialization at a centralized ordering point, resolve write conflicts locally at each replica (hence ensuring that writes never abort) and provide fault-tolerance via replayable writes. Our implementation of Hermes over an RDMA-enabled reliable datastore with five replicas shows that Hermes consistently achieves higher throughput than state-of-the-art RDMA-based reliable protocols (ZAB and CRAQ) across all write ratios while also significantly reducing tail latency. At 5% writes, the tail latency of Hermes is 3.6X lower than that of CRAQ and ZAB.

Submitted 27 January, 2020; originally announced January 2020.

Comments: Accepted in ASPLOS 2020

ACM Class: C.4.6; C.4.5

arXiv:1910.00886 [pdf, other]

doi 10.5488/CMP.22.33001

Silicene nanosheet to discriminate the quality of pear fruit based on volatiles adsorption --- a DFT application

Authors: R. Keerthi Bhavadharani, V. Nagarajan, R. Chandiramouli

Abstract: We report the interaction between the silicene nanosheet (Si-NS) and volatile organic compounds (VOCs) released from the pear fruit (Pyrus communis) in ripened and over-ripened stages using density functional theory (DFT) technique. The geometric stability of Si-NS is studied from the phonon band structure. Further, the electronic property of Si-NS is studied from the energy band gap structure, and the energy gap is found to be 0.46 eV, which exhibits semiconductor property. The outcomes infer that the adsorption of volatiles released from the pear fruit on silicene nanosheet is in the following order hexyl acetate $\rightarrow$ butyl acetate $\rightarrow$ butyl butyrate in the ripened stage whereas in the over-ripened stage the adsorption sequence is noticed to be acetic acid $\rightarrow$ ethyl acetate $\rightarrow$ 1-butanol. The adsorption property of pear fruit volatiles on silicene nanosheet is documented with the adsorption energy, average energy gap changes, and Bader charge transfer. Moreover, the adsorption of VOCs on silicene nanosheet is also explored using the energy band structure, electron density along with the adsorption sites and density of states (DOS) spectrum. Besides, the findings reveal that the silicene nanosheet can be used to discriminate the quality of pear fruit.

Submitted 2 October, 2019; originally announced October 2019.

Comments: 13 pages, 20 figures, 1 table

Journal ref: Condens. Matter Phys., 2019, vol. 22, No. 3, 33001

arXiv:1909.06355 [pdf, other]

doi 10.1103/PhysRevB.101.075112

Hidden spin-orbital order in the Kitaev hyperhoneycomb $\beta$-Li$_2$IrO$_3$

Authors: Alejandro Ruiz, Vikram Nagarajan, Mayia Vranas, Gilbert Lopez, Gregory T. McCandless, Itamar Kimchi, Julia Y. Chan, Nicholas P. Breznay, Alex Frano, Benjamin A. Frandsen, James G. Analytis

Abstract: We report the existence of a phase transition at high temperature in the 3D Kitaev candidate material, $\beta$-Li$_2$IrO$_3$. We show that the transition is bulk, intrinsic and orders a tiny magnetic moment with a spatially anisotropic saturation moment. We show that even though this transition is global, it does not freeze the local Ir moments, which order at much lower temperatures into an incommensurate state. Rather, the ordered moment has an orbital origin that is coupled to spin correlations, likely of a Kitaev origin. The separate ordering of spin-correlated orbital moments and of local Ir moments reveals a novel way in which magnetic frustration in Kitaev systems can lead to coexisting magnetic states.

Submitted 13 September, 2019; originally announced September 2019.

Comments: 7 pages, 5 Figures

Journal ref: Phys. Rev. B 101, 075112 (2020)

arXiv:1905.13344 [pdf, other]

Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience

Authors: Vaishnavh Nagarajan, J. Zico Kolter

Abstract: The ability of overparameterized deep networks to generalize well has been linked to the fact that stochastic gradient descent (SGD) finds solutions that lie in flat, wide minima in the training loss -- minima where the output of the network is resilient to small random noise added to its parameters. So far this observation has been used to provide generalization guarantees only for neural networks whose parameters are either \textit{stochastic} or \textit{compressed}. In this work, we present a general PAC-Bayesian framework that leverages this observation to provide a bound on the original network learned -- a network that is deterministic and uncompressed. What enables us to do this is a key novelty in our approach: our framework allows us to show that if on training data, the interactions between the weight matrices satisfy certain conditions that imply a wide training loss minimum, these conditions themselves {\em generalize} to the interactions between the matrices on test data, thereby implying a wide test loss minimum. We then apply our general framework in a setup where we assume that the pre-activation values of the network are not too small (although we assume this only on the training data). In this setup, we provide a generalization guarantee for the original (deterministic, uncompressed) network, that does not scale with product of the spectral norms of the weight matrices -- a guarantee that would not have been possible with prior approaches.

Submitted 30 May, 2019; originally announced May 2019.

Comments: Published as a conference paper at ICLR 2019

arXiv:1904.07271 [pdf, ps, other]

Stochastic Load Balancing on Unrelated Machines

Authors: Anupam Gupta, Amit Kumar, Viswanath Nagarajan, Xiangkun Shen

Abstract: We consider the problem of makespan minimization on unrelated machines when job sizes are stochastic. The goal is to find a fixed assignment of jobs to machines, to minimize the expected value of the maximum load over all the machines. For the identical machines special case when the size of a job is the same across all machines, a constant-factor approximation algorithm has long been known. Our main result is the first constant-factor approximation algorithm for the general case of unrelated machines. This is achieved by (i) formulating a lower bound using an exponential-size linear program that is efficiently computable, and (ii) rounding this linear program while satisfying only a specific subset of the constraints that still suffice to bound the expected makespan. We also consider two generalizations. The first is the budgeted makespan minimization problem, where the goal is to minimize the expected makespan subject to scheduling a target number (or reward) of jobs. We extend our main result to obtain a constant-factor approximation algorithm for this problem. The second problem involves $q$-norm objectives, where we want to minimize the expected q-norm of the machine loads. Here we give an $O(q/\log q)$-approximation algorithm, which is a constant-factor approximation for any fixed $q$.

Submitted 15 April, 2019; originally announced April 2019.

Showing 1–50 of 108 results for author: Nagarajan, V