Search | arXiv e-print repository

arXiv:2407.20016 [pdf, other]

Stability and topological nature of charged Gauss-Bonnet AdS black holes in five dimensions

Authors: Imtak Jeon, Bum-Hoon Lee, Wonwoo Lee, Madhu Mishra

Abstract: We examine the thermodynamic characteristics and phase structures of a black hole, where the black hole horizon could be a hypersurface with positive, zero, or negative constant curvature, within the framework of Einstein-Maxwell theory, incorporating a negative cosmological constant and a Gauss-Bonnet correction. Our research follows the topological approach to black hole thermodynamics where we treat anti-de Sitter (AdS) black holes as topological defects in thermodynamic space. We study the nature of the black hole's critical points and local stability by computing the winding numbers/topological charge associated with the zero point of the vector field, derived from the temperature of extremal points and generalized off-shell Gibbs free energy, respectively. Black holes are classified into different topological classes based on their topological number. In this study, we found that, unlike the charged AdS black hole, the charged GB AdS black hole exhibits a critical point. Our findings reveal the occurrence of a liquid/gas-like first-order phase transition between small-large black hole phases of the spherical charged GB AdS black hole. We conclude that the charged GB AdS and charged AdS black holes belong to different topological classes in the grand canonical ensemble. Furthermore, connecting with the previous studies, we conclude that the charged AdS and charged GB AdS black holes in the canonical ensemble, and the charged GB AdS black hole in the grand canonical ensemble, belong to the same topological class.

Submitted 29 July, 2024; originally announced July 2024.

Comments: 29 pages, 21 figures, 3 Tables

arXiv:2407.13739 [pdf, other]

Scaling Granite Code Models to 128K Context

Authors: Matt Stallone, Vaibhav Saxena, Leonid Karlinsky, Bridget McGinn, Tim Bula, Mayank Mishra, Adriana Meza Soria, Gaoyuan Zhang, Aditya Prasad, Yikang Shen, Saptha Surendran, Shanmukha Guttula, Hima Patel, Parameswaran Selvam, Xuan-Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda

Abstract: This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling context length of Granite 3B/8B code models from 2K/4K to 128K consists of a light-weight continual pretraining by gradually increasing its RoPE base frequency with repository-level file packing and length-upsampled long-context data. We also release instruction-tuned models with long-context support, which are derived by further finetuning the long-context base models on a mix of permissively licensed short and long-context instruction-response pairs. Compared to the original short-context Granite code models, our long-context models achieve significant improvements on long-context tasks without any noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval). We release all our long-context Granite code models under an Apache 2.0 license for both research and commercial use.

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.09105 [pdf, other]

Enhancing Training Efficiency Using Packing with Flash Attention

Authors: Achintya Kundu, Rhui Dih Lee, Laura Wynter, Raghu Kiran Ganti, Mayank Mishra

Abstract: Padding is often used in tuning LLMs by adding special tokens to shorter training examples to match the length of the longest sequence in each batch. While this ensures uniformity for batch processing, it introduces inefficiencies by including irrelevant padding tokens in the computation and wastes GPU resources. The Hugging Face SFT trainer has always offered the option to use packing to combine multiple training examples, allowing for maximal utilization of GPU resources. However, until now, it did not offer proper masking of each packed training example. This capability has now been added to Hugging Face Transformers 4.43. We analyse this new feature and show the benefits across different variations of packing.

Submitted 29 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.06893 [pdf]

Measuring Sustainability Intention of ESG Fund Disclosure using Few-Shot Learning

Authors: Mayank Singh, Nazia Nafis, Abhijeet Kumar, Mridul Mishra

Abstract: The global sustainable fund universe encompasses open-end funds and exchange-traded funds (ETFs) that, by prospectus or other regulatory filings, claim to focus on Environment, Social and Governance (ESG). The challenge is that these claims can only be confirmed by examining the textual disclosures to check for the presence of intentionality and ESG focus in the investment strategy. Currently, there is no regulation to enforce sustainability in the ESG products space. This paper proposes a unique method and system to classify and score the fund prospectuses in the sustainable universe regarding specificity and transparency of language. We aim to employ few-shot learners to identify specific, ambiguous, and generic sustainable investment-related language. Additionally, we construct a ratio metric to determine a language score and rating to rank products and quantify sustainability claims for the US sustainable universe. As a by-product, we publish a manually annotated, quality training dataset on Hugging Face (ESG-Prospectus-Clarity-Category under cc-by-nc-sa-4.0) of more than 1K ESG textual statements. The performance of the few-shot finetuning approach is compared with zero-shot models, e.g., Llama-13B, GPT 3.5 Turbo, etc. We found that prompting large language models is not accurate for domain-specific tasks due to misalignment issues. The few-shot finetuning techniques outperform zero-shot models by large margins of more than ~30% absolute in precision, recall and F1 metrics on completely unseen ESG language (test set). Overall, the paper attempts to establish a systematic and scalable approach to measure and rate sustainability intention quantitatively for sustainable funds using texts in prospectuses. Regulatory bodies, investors, and advisors may utilize the findings of this research to reduce cognitive load in investigating or screening ESG funds that accurately reflect the ESG intention.

Submitted 9 July, 2024; originally announced July 2024.

Comments: This paper was presented at 'AI applications in ESG Conference' at IIM Bangalore, India (Nov, 2023)

arXiv:2407.05467 [pdf, other]

The infrastructure powering IBM's Gen AI model development

Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba , et al. (121 additional authors not shown)

Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering efficient and high-performing AI training requires an end-to-end solution that combines hardware, software and holistic telemetry to cater for multiple types of AI workloads. In this report, we describe IBM's hybrid cloud infrastructure that powers our generative AI model development. This infrastructure includes (1) Vela: an AI-optimized supercomputing capability directly integrated into the IBM Cloud, delivering scalable, dynamic, multi-tenant and geographically distributed infrastructure for large-scale model training and other AI workflow steps and (2) Blue Vela: a large-scale, purpose-built, on-premises hosting environment that is optimized to support our largest and most ambitious AI model training tasks. Vela provides IBM with the dual benefit of high performance for internal use along with the flexibility to adapt to an evolving commercial landscape. Blue Vela provides us with the benefits of rapid development of our largest and most ambitious models, as well as future-proofing against the evolving model landscape in the industry. Taken together, they provide IBM with the ability to rapidly innovate in the development of both AI models and commercial offerings.

Submitted 7 July, 2024; originally announced July 2024.

Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla

arXiv:2406.09318 [pdf, ps, other]

Characterising Interventions in Causal Games

Authors: Manuj Mishra, James Fox, Michael Wooldridge

Abstract: Causal games are probabilistic graphical models that enable causal queries to be answered in multi-agent settings. They extend causal Bayesian networks by specifying decision and utility variables to represent the agents' degrees of freedom and objectives. In multi-agent settings, whether each agent decides on their policy before or after knowing the causal intervention is important as this affects whether they can respond to the intervention by adapting their policy. Consequently, previous work in causal games imposed chronological constraints on permissible interventions. We relax this by outlining a sound and complete set of primitive causal interventions so the effect of any arbitrarily complex interventional query can be studied in multi-agent settings. We also demonstrate applications to the design of safe AI systems by considering causal mechanism design and commitment.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted to the 40th Conference on Uncertainty in Artificial Intelligence (UAI-2024)

arXiv:2406.08301 [pdf, other]

Jet modification via $\pi^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, J. Alexander, M. Alfred, K. Aoki, N. Apadula, L. Aphecetche, J. Asai, H. Asano, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, G. Baksay, L. Baksay, A. Baldisseri , et al. (510 additional authors not shown)

Abstract: High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $\Delta_{AA}$, as a function of the trigger-hadron azimuthal separation, $\Delta\phi$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 534 authors from 83 institutions, 12 pages, 7 figures. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2406.03128 [pdf, ps, other]

The Weyl Transform of a smooth measure on a real-analytic submanifold

Authors: Mansi Mishra, M. K. Vemuri

Abstract: If $\mu$ is a smooth measure supported on a real-analytic submanifold of $\mathbb{R}^{2n}$ which is not contained in any affine hyperplane, then the Weyl transform of $\mu$ is a compact operator.

Submitted 5 June, 2024; originally announced June 2024.

MSC Class: 22D10; 22E30; 43A05; 43A80; 53D55

arXiv:2405.12981 [pdf, other]

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Authors: William Brandon, Mayank Mishra, Aniruddha Nrusimha, Rameswar Panda, Jonathan Ragan Kelly

Abstract: Key-value (KV) caching plays an essential role in accelerating decoding for transformer-based autoregressive large language models (LLMs). However, the amount of memory required to store the KV cache can become prohibitive at long sequence lengths and large batch sizes. Since the invention of the transformer, two of the most effective interventions discovered for reducing the size of the KV cache have been Multi-Query Attention (MQA) and its generalization, Grouped-Query Attention (GQA). MQA and GQA both modify the design of the attention block so that multiple query heads can share a single key/value head, reducing the number of distinct key/value heads by a large factor while only minimally degrading accuracy. In this paper, we show that it is possible to take Multi-Query Attention a step further by also sharing key and value heads between adjacent layers, yielding a new attention design we call Cross-Layer Attention (CLA). With CLA, we find that it is possible to reduce the size of the KV cache by another 2x while maintaining nearly the same accuracy as unmodified MQA. In experiments training 1B- and 3B-parameter models from scratch, we demonstrate that CLA provides a Pareto improvement over the memory/accuracy tradeoffs which are possible with traditional MQA, enabling inference with longer sequence lengths and larger batch sizes than would otherwise be possible.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.04324 [pdf, other]

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Authors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal , et al. (21 additional authors not shown)

Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only code models for code generative tasks, trained with code written in 116 programming languages. The Granite Code model family consists of models ranging in size from 3 to 34 billion parameters, suitable for applications ranging from complex application modernization tasks to on-device memory-constrained use cases. Evaluation on a comprehensive set of tasks demonstrates that Granite Code models consistently reach state-of-the-art performance among available open-source code LLMs. The Granite Code model family was optimized for enterprise software development workflows and performs well across a range of coding tasks (e.g. code generation, fixing and explanation), making it a versatile all-around code model. We release all our Granite Code models under an Apache 2.0 license for both research and commercial use.

Submitted 7 May, 2024; originally announced May 2024.

Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

arXiv:2404.06423 [pdf, other]

Deep Reinforcement Learning-Based Approach for a Single Vehicle Persistent Surveillance Problem with Fuel Constraints

Authors: Manav Mishra, Hritik Bana, Saswata Sarkar, Sujeevraja Sanjeevi, PB Sujit, Kaarthik Sundar

Abstract: This article presents a deep reinforcement learning-based approach to tackle a persistent surveillance mission requiring a single unmanned aerial vehicle initially stationed at a depot with fuel or time-of-flight constraints to repeatedly visit a set of targets with equal priority. Owing to the vehicle's fuel or time-of-flight constraints, the vehicle must be regularly refueled, or its battery must be recharged at the depot. The objective of the problem is to determine an optimal sequence of visits to the targets that minimizes the maximum time elapsed between successive visits to any target while ensuring that the vehicle never runs out of fuel or charge. We present a deep reinforcement learning algorithm to solve this problem and present the results of numerical experiments that corroborate the effectiveness of this approach in comparison with common-sense greedy heuristics.

Submitted 2 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: 6 pages

Report number: LA-UR-24-23186

arXiv:2404.05567 [pdf, other]

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

Authors: Bowen Pan, Yikang Shen, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Rameswar Panda

Abstract: Mixture-of-Experts (MoE) language models can reduce computational costs by 2-4$\times$ compared to dense models without sacrificing performance, making them more efficient in computation-bounded scenarios. However, MoE models generally require 2-4$\times$ more parameters to achieve comparable performance to a dense model, which incurs larger GPU memory requirements and makes MoE models less efficient in I/O-bounded scenarios like autoregressive generation. In this work, we propose a hybrid dense training and sparse inference framework for MoE models (DS-MoE) which achieves strong computation and parameter efficiency by employing dense computation across all experts during training and sparse computation during inference. Our experiments on training LLMs demonstrate that our DS-MoE models are more parameter-efficient than standard sparse MoEs and are on par with dense models in terms of total parameter size and performance while being computationally cheaper (activating 30-40% of the model's parameters). Performance tests using vLLM show that our DS-MoE-6B model runs up to $1.86\times$ faster than similar dense models like Mistral-7B, and between $1.50\times$ and $1.71\times$ faster than comparable MoEs, such as DeepSeekMoE-16B and Qwen1.5-MoE-A2.7B.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.03605 [pdf, other]

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

Authors: Aniruddha Nrusimha, Mayank Mishra, Naigang Wang, Dan Alistarh, Rameswar Panda, Yoon Kim

Abstract: We consider the problem of accurate quantization for language models, where both the weights and activations are uniformly quantized to 4 bits per parameter, the lowest bitwidth format natively supported by GPU hardware. In this context, the key challenge is activation quantization: it is known that language models contain outlier channels whose values on average are orders of magnitude higher than other channels, which prevents accurate low-bitwidth quantization with known techniques. We systematically study this phenomenon and find that these outlier channels emerge early in training, and that they occur more frequently in layers with residual streams. We then propose a simple strategy which regularizes a layer's inputs via quantization-aware training (QAT) and its outputs via activation kurtosis regularization. We show that regularizing both the inputs and outputs is crucial for preventing a model from "migrating" the difficulty in input quantization to the weights, which makes post-training quantization (PTQ) of weights more difficult. When combined with weight PTQ, we show that our approach can obtain a W4A4 model that performs competitively with the standard-precision W16A16 baseline.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.03177 [pdf]

Direct visualization of local magnetic domain dynamics in a 2D Van der Walls material/ferromagnet interface

Authors: Joseph Vimal Vas, Rohit Medwal, Sourabh Manna, Mayank Mishra, Aaron Muller, John Rex Mohan, Yasuhiro Fukuma, Martial Duchamp, Rajdeep Singh Rawat

Abstract: Exploring new strategies for controlling magnetic domain propagation is key to realizing ultrafast, high-density domain wall-based memory and logic devices for next-generation computing. These strategies include strain modulation in multiferroic devices, geometric confinement and area-selective pinning of domain walls. 2D Van der Waals materials introduce localized modifications to the interfacial magnetic order, enabling control over the propagation of magnetic domains. Here, using Lorentz-Transmission Electron Microscopy (L-TEM) along with the Modified Transport of Intensity equations (MTIE), we demonstrate controlled domain expansion with in-situ magnetic field in a ferromagnet (Permalloy, NiFe) interfacing with a 2D Van der Waals material, Graphene (Gr). The Gr/NiFe interface exhibits a distinctive domain expansion rate with magnetic field selectively near the interface, which is further analyzed using micromagnetic simulations. Our findings are crucial for comprehending direct visualization of interface-controlled magnetic domain expansion, offering insights for developing future domain wall-based technology.

Submitted 3 April, 2024; originally announced April 2024.

Comments: Merged Manuscript and supplementary file. Submitted to Communications Physics (under review)

arXiv:2404.02900 [pdf, other]

DeiT-LT Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets

Authors: Harsh Rangwani, Pradipto Mondal, Mayank Mishra, Ashish Ramayee Asokan, R. Venkatesh Babu

Abstract: Vision Transformer (ViT) has emerged as a prominent architecture for various computer vision tasks. In ViT, we divide the input image into patch tokens and process them through a stack of self-attention blocks. However, unlike Convolutional Neural Networks (CNN), ViT's simple architecture has no informative inductive bias (e.g., locality). Due to this, ViT requires a large amount of data for pre-training. Various data-efficient approaches (DeiT) have been proposed to train ViT on balanced datasets effectively. However, limited literature discusses the use of ViT for datasets with long-tailed imbalances. In this work, we introduce DeiT-LT to tackle the problem of training ViTs from scratch on long-tailed datasets. In DeiT-LT, we introduce an efficient and effective way of distillation from CNN via distillation DIST token by using out-of-distribution images and re-weighting the distillation loss to enhance focus on tail classes. This leads to the learning of local CNN-like features in early ViT blocks, improving generalization for tail classes. Further, to mitigate overfitting, we propose distilling from a flat CNN teacher, which leads to learning low-rank generalizable features for DIST tokens across all ViT blocks. With the proposed DeiT-LT scheme, the distillation DIST token becomes an expert on the tail classes, and the classifier CLS token becomes an expert on the head classes. The experts help to effectively learn features corresponding to both the majority and minority classes using a distinct set of tokens within the same ViT architecture. We show the effectiveness of DeiT-LT for training ViT from scratch on datasets ranging from small-scale CIFAR-10 LT to large-scale iNaturalist-2018.

Submitted 3 April, 2024; originally announced April 2024.

Comments: CVPR 2024. Project Page: https://rangwani-harsh.github.io/DeiT-LT

arXiv:2404.00399 [pdf, other]

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Authors: Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, Felix Friedrich, Prateek Yadav, Tanmay Laud, Vu Minh Chien, Terry Yue Zhuo, Diganta Misra, Ben Bogin, Xuan-Son Vu, Marzena Karpinska, Arnav Varma Dantuluri, Wojciech Kusa, Tommaso Furlanello, Rio Yokota, Niklas Muennighoff, Suhas Pai, Tosin Adewumi, Veronika Laippala, Xiaozhe Yao, Adalberto Junior, Alpay Ariyak , et al. (20 additional authors not shown)

Abstract: Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community development. However, such existing models face challenges: limited multilingual capabilities, continual pretraining causing catastrophic forgetting, whereas pretraining from scratch is computationally expensive, and compliance with AI safety and development laws. This paper presents Aurora-M, a 15B parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code. Continually pretrained from StarCoderPlus on 435 billion additional tokens, Aurora-M surpasses 2 trillion tokens in total training token count. It is the first open-source multilingual model fine-tuned on human-reviewed safety instructions, thus aligning its development not only with conventional red-teaming considerations, but also with the specific concerns articulated in the Biden-Harris Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Aurora-M is rigorously evaluated across various tasks and languages, demonstrating robustness against catastrophic forgetting and outperforming alternatives in multilingual settings, particularly in safety evaluations. To promote responsible open-source LLM development, Aurora-M and its variants are released at https://huggingface.co/collections/aurora-m/aurora-m-models-65fdfdff62471e09812f5407.

Submitted 23 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: Preprint

arXiv:2404.00188 [pdf, other]

doi 10.1109/ICAIC60265.2024.10433803

DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries

Authors: Manit Mishra, Abderrahman Braham, Charles Marsom, Bryan Chung, Gavin Griffin, Dakshesh Sidnerlikar, Chatanya Sarin, Arjun Rajaram

Abstract: Conventional processes for analyzing datasets and extracting meaningful information are often time-consuming and laborious. Previous work has identified manual, repetitive coding and data collection as major obstacles that hinder data scientists from undertaking more nuanced labor and high-level projects. To combat this, we evaluated OpenAI's GPT-3.5 as a "Language Data Scientist" (LDS) that can extrapolate key findings, including correlations and basic information, from a given dataset. The model was tested on a diverse set of benchmark datasets to evaluate its performance across multiple standards, including data science code-generation based tasks involving libraries such as NumPy, Pandas, Scikit-Learn, and TensorFlow, and was broadly successful in correctly answering a given data science query related to the benchmark dataset. The LDS used various novel prompt engineering techniques to effectively answer a given question, including Chain-of-Thought reinforcement and SayCan prompt engineering. Our findings demonstrate great potential for leveraging Large Language Models for low-level, zero-shot data analysis.

Submitted 29 March, 2024; originally announced April 2024.

Comments: 5 pages, Submitted to International Conference on AI in Cybersecurity

arXiv:2403.15305 [pdf, ps, other]

X-ray emission spectrum for axion-photon conversion in magnetospheres of strongly magnetized neutron stars

Authors: Shubham Yadav, M. Mishra, Tapomoy Guha Sarkar

Abstract: Detecting axionic dark matter (DM) could be possible in an X-ray spectrum from strongly magnetized neutron stars (NSs). We examine the possibility of axion-photon conversion in the magnetospheres of strongly magnetized NSs. In the current work, we investigate how the modified Tolman Oppenheimer Volkoff (TOV) system of equations (in the presence of a magnetic field) affects the energy spectrum of axions and axion-converted-photon flux. We have considered the distance-dependent magnetic field in the modified TOV system of equations. We employ three different equations of state (EoSs), namely APR, FPS, and SLY, to solve these equations. We obtain the axion emission rate by including the Cooper-pair-breaking formation process and Bremsstrahlung process in the core of NSs using the NSCool code. We primarily focus on the Magnificent Seven (M7) star RXJ 1856.5-3754. We further investigate the impact of the magnetic field on the actual observables, such as axion energy spectrum and axion-converted-photon flux at an axion mass in the meV range by assuming mass $M_{NS} \sim 1.4M_{\odot}$. We compare our calculated axion-converted-photon flux with all available archival data sets from PN+MOS+Chandra. We also study the variation of the energy spectrum at a fixed energy with varying central magnetic fields. Our predicted axion-converted-photon flux values as a function of axion energy closely follow the experimental archival data, which allows us to put bounds on the axion mass for the three EoSs.

Submitted 20 June, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted in European Physical Journal C (15 pages, 17 figures)

arXiv:2403.08936 [pdf, other]

Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning

Authors: Peihong Yu, Manav Mishra, Alec Koppel, Carl Busart, Priya Narayan, Dinesh Manocha, Amrit Bedi, Pratap Tokekar

Abstract: Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of efficient exploration due to the exponential increase in the size of the joint state-action space. While demonstration-guided learning has proven beneficial in single-agent settings, its direct applicability to MARL is hindered by the practical difficulty of obtaining joint expert demonstrations. In this work, we introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team. These demonstrations solely pertain to single-agent behaviors and how each agent can achieve personal goals without encompassing any cooperative elements, thus naively imitating them will not achieve cooperation due to potential conflicts. To this end, we propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate, namely personalized expert-guided MARL (PegMARL). This algorithm utilizes two discriminators: the first provides incentives based on the alignment of policy behavior with demonstrations, and the second regulates incentives based on whether the behavior leads to the desired objective. We evaluate PegMARL using personalized demonstrations in both discrete and continuous environments. The results demonstrate that PegMARL learns near-optimal policies even when provided with suboptimal demonstrations, and outperforms state-of-the-art MARL algorithms in solving coordinated tasks. We also showcase PegMARL's capability to leverage joint demonstrations in the StarCraft scenario and converge effectively even with demonstrations from non-co-trained policies.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2402.19173 [pdf, other]

StarCoder 2 and The Stack v2: The Next Generation

Authors: Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, et al. (41 additional authors not shown)

Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2-15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder-33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.13044 [pdf, ps, other]

Conversion of Emitted Axionic Dark Matter to Photons for Non-Rotating Magnetized Neutron Stars

Authors: Shubham Yadav, M. Mishra, Tapomoy Guha Sarkar

Abstract: We attempt to find the impact of a modified Tolman Oppenheimer Volkoff (TOV) system of equations on the luminosities of direct photons, neutrinos and axions for a particular axion mass in the presence of a magnetic field. We employ two different equations of state (EoSs), namely APR and FPS, to generate the profiles of mass and pressure for spherically symmetric and non-rotating neutron stars (NSs). We then compute the axion and neutrino emission rates by employing the Cooper-pair-breaking and formation process (PBF) in the core using the NSCool code. We also examine the possibility of axion to photon conversion in the magnetosphere of NSs. Furthermore, we investigate the impact of the magnetic field on the actual observables, such as the energy spectrum of axions and axion-converted photon flux for three different NSs. Our comparative study indicates that the axion energy spectrum and axion-converted photon flux change significantly due to an intense magnetic field.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 12 pages, 14 figures. arXiv admin note: text overlap with arXiv:2212.11652

arXiv:2402.02479 [pdf, other]

BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback

Authors: Gaurav Pandey, Yatin Nandwani, Tahira Naseem, Mayank Mishra, Guangxuan Xu, Dinesh Raghu, Sachindra Joshi, Asim Munawar, Ramón Fernandez Astudillo

Abstract: Distribution matching methods for language model alignment such as Generation with Distributional Control (GDC) and Distributional Policy Gradient (DPG) have not received the same level of attention in reinforcement learning from human feedback (RLHF) as contrastive methods such as Sequence Likelihood Calibration (SLiC), Direct Preference Optimization (DPO) and its variants. We identify high variance of the gradient estimate as the primary reason for the lack of success of these methods and propose a self-normalized baseline to reduce the variance. We further generalize the target distribution in DPG, GDC and DPO by using Bayes' rule to define the reward-conditioned posterior. The resulting approach, referred to as BRAIn (Bayesian Reward-conditioned Amortized Inference), acts as a bridge between distribution matching methods and DPO and significantly outperforms prior art in summarization and Anthropic HH tasks.

Submitted 10 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

Comments: Accepted at ICML 2024 (main conference)

arXiv:2401.04951 [pdf, ps, other]

Conjugacy classes of automorphisms of the unit ball in a complex Hilbert space

Authors: Rachna Aggarwal, Krishnendu Gongopadhyay, Mukund Madhav Mishra

Abstract: In this article, we consider the ball model of an infinite dimensional complex hyperbolic space, i.e. the open unit ball of a complex Hilbert space centered at the origin equipped with the Carathéodory metric. We consider the group of holomorphic automorphisms of the ball and classify the conjugacy classes of automorphisms. We also compute the centralizers for elements in the group of automorphisms.

Submitted 10 January, 2024; originally announced January 2024.

MSC Class: 51M10; 51F25

arXiv:2312.07078 [pdf, ps, other]

A generalization of a result of Minakshisundaram and Pleijel

Authors: Mansi Mishra, Ankita Sharma, M. K. Vemuri

Abstract: Minakshisundaram and Pleijel gave an asymptotic formula for the sum of squares of the pointwise values of the eigenfunctions of the Laplace-Beltrami operator on a compact Riemannian manifold, with eigenvalues less than a fixed number. Here, a generalization is given, where the pointwise values are replaced by the Fourier coefficients of a smooth measure supported on a compact submanifold.

Submitted 16 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: 13 pages

MSC Class: 58J50; 58J35

arXiv:2309.12076 [pdf, other]

Super-resolution and super-sensitivity of quantum LiDAR with multi-photonic state and binary outcome photon counting measurement

Authors: Priyanka Sharma, Manoj K. Mishra, Devendra Kumar Mishra

Abstract: Here we investigate the enhancement in phase sensitivity and resolution in Mach-Zehnder interferometer (MZI) based quantum LiDAR. We use a multi-photonic state (MPS), a superposition of four coherent states [1], as the input state, and binary outcome parity photon counting measurement and binary outcome zero-nonzero photon counting measurement as the measurement schemes. We thoroughly investigate the results in lossless as well as in lossy cases. We found enhancement in resolution and phase sensitivity in comparison to the coherent state and even coherent superposition state (ECSS) based quantum LiDAR. Our analysis shows that MPS may be an alternative nonclassical resource in the field of quantum imaging and quantum sensing technologies, such as quantum LiDAR.

Submitted 3 April, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: We welcome comments

arXiv:2309.02376 [pdf]

Performance analysis of InAlN/GaN HEMT and optimization for high frequency applications

Authors: Jagori Raychaudhuri, Jayjit Mukherjee, Amit Malik, Sudhir Kumar, D. S. Rawal, Meena Mishra, Santanu Ghosh

Abstract: An InAlN/GaN HEMT device was studied using extensive temperature dependent DC IV measurements and CV measurements. Barrier traps in the InAlN layer were characterized using transient analysis. Forward gate current was modelled using analytical equations. RF performance of the device was also studied and device parameters were extracted following small signal equivalent circuit model. Extensive simulations in Silvaco TCAD were also carried out by varying stem height, gate length and incorporating back barrier to optimize the suitability of this device in Ku-band by reducing the detrimental Short Channel Effects (SCEs). In this paper a novel structure i.e., a short length T gate with recess, on thin GaN buffer to achieve high cut-off frequency (f$_T$) and high maximum oscillating frequency (f$_{max}$) apt for Ku-band applications is also proposed.

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2309.02365 [pdf]

doi 10.1088/1402-4896/ad32ff

Investigation of RF performance of Ku-band GaN HEMT device and an in-depth analysis of short channel effects

Authors: Jagori Raychaudhuri, Jayjit Mukherjee, Sudhir Kumar, D. S. Rawal, Meena Mishra, Santanu Ghosh

Abstract: In this paper, we have characterized an AlGaN/GaN High Electron Mobility Transistor (HEMT) with a short gate length (L$_{g}$ $\approx$ 0.15 $\mu$m). We have studied the effect of short gate length on the small signal parameters, linearity parameters and gm-gd ratio in GaN HEMT devices. To understand how scaling results in the variation of the above-mentioned parameters, a comparative study with higher gate length devices on a similar heterostructure is also presented here. We have scaled down the gate length but the barrier thickness (t$_{bar}$) remained the same, which affects the aspect ratio (L$_{g}$/t$_{bar}$) of the device, and its inseparable consequences are the prominent short channel effects (SCEs) barring the optimum output performance of the device. These interesting phenomena were studied in detail and explored over a temperature range of -40$^\circ$C to 80$^\circ$C. To the best of our knowledge this paper explores temperature dependence of SCEs of GaN HEMT for the first time. With an approach to reduce the impact of SCEs, a simulation study in Silvaco TCAD was carried out and it is observed that a recessed gate structure on a conventional heterostructure successfully reduces SCEs and improves RF performance of the device. This work gives an overall view of gate length scaling on conventional AlGaN/GaN HEMTs.

Submitted 9 April, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

Journal ref: Physica Scripta, Vol. 99, No. 4, 2024

arXiv:2308.04843 [pdf, ps, other]

doi 10.1016/j.jmaa.2024.128532

Existence and Uniqueness of Solution to Unsteady Darcy-Brinkman Problem with Korteweg Stress for Modelling Miscible Porous Media Flow

Authors: Sahil Kundu, Surya Narayan Maharana, Manoranjan Mishra

Abstract: The work investigates a model that combines a convection-diffusion-reaction equation for solute concentration with an unsteady Darcy-Brinkman equation for the flow field, including the Korteweg stress. Additionally, the flow field experiences an external body force term while the permeability fluctuates with solute concentration. Such models are used to describe flows in porous media such as fractured karst reservoirs, mineral wool, industrial foam, coastal mud, etc. The system of equations has Neumann boundary conditions for the solute concentration and no-flow conditions for the velocity field, and the well-posedness of the model is discussed for a wide range of initial data. The proof techniques remain applicable in establishing the well-posedness of non-reactive and homogeneous porous media flows under the specified simplifications.

Submitted 24 May, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

MSC Class: 76D03 (Primary) 76S05; 35D30; 35Q35 (Secondary)

arXiv:2307.12139 [pdf]

Dense plasma irradiated platinum with improved spin Hall effect

Authors: Sachin Kumar, Sourabh Manna, John Rex Mohan, Utkarsh Shashank, Jospeh Vimal, Mayank Mishra, Surbhi Gupta, Hironori Asada, Yasuhiro Fukuma, Rajdeep Singh Rawat, Rohit Medwal

Abstract: The impurity incorporation in host high-spin orbit coupling materials like platinum has shown improved charge-to-spin conversion by modifying the up-spin and down-spin electron trajectories by bending or skewing them in opposite directions. This enables efficient generation, manipulation, and transport of spin currents. In this study, we irradiate the platinum with non-focus dense plasma to incorporate the oxygen ion species. We systematically analyze the spin Hall angle of the oxygen plasma irradiated Pt films using spin torque ferromagnetic resonance. Our results demonstrate a 2.4 times enhancement in the spin Hall effect after plasma treatment of Pt as compared to pristine Pt. This improvement is attributed to the introduction of disorder and defects in the Pt lattice, which enhances the spin-orbit coupling and leads to more efficient charge-to-spin conversion without breaking the spin-orbit torque symmetries. Our findings offer a new method of dense plasma-based modification of material for the development of advanced spintronic devices based on Pt and other heavy metals.

Submitted 22 July, 2023; originally announced July 2023.

arXiv:2305.11790 [pdf, other]

Prompting with Pseudo-Code Instructions

Authors: Mayank Mishra, Prince Kumar, Riyaz Bhat, Rudra Murthy V, Danish Contractor, Srikanth Tamilselvam

Abstract: Prompting with natural language instructions has recently emerged as a popular method of harnessing the capabilities of large language models. Given the inherent ambiguity present in natural language, it is intuitive to consider the possible advantages of prompting with less ambiguous prompt styles, such as the use of pseudo-code. In this paper we explore if prompting via pseudo-code instructions helps improve the performance of pre-trained language models. We manually create a dataset of pseudo-code prompts for 132 different tasks spanning classification, QA and generative language tasks, sourced from the Super-NaturalInstructions dataset. Using these prompts along with their counterparts in natural language, we study their performance on two LLM families - BLOOM and CodeGen. Our experiments show that using pseudo-code instructions leads to better results, with an average increase (absolute) of 7-16 points in F1 scores for classification tasks and an improvement (relative) of 12-38% in aggregate ROUGE-L scores across all tasks. We include detailed ablation studies which indicate that code comments, docstrings, and the structural clues encoded in pseudo-code all contribute towards the improvement in performance. To the best of our knowledge our work is the first to demonstrate how pseudo-code prompts can be helpful in improving the performance of pre-trained LMs.

Submitted 19 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: Published in EMNLP 2023 main track

arXiv:2305.06161 [pdf, other]

StarCoder: may the source be with you!

Authors: Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, et al. (42 additional authors not shown)

Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.

Submitted 13 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

arXiv:2302.07440 [pdf]

Road Redesign Technique Achieving Enhanced Road Safety by Inpainting with a Diffusion Model

Authors: Sumit Mishra, Medhavi Mishra, Taeyoung Kim, Dongsoo Har

Abstract: Road infrastructure can affect the occurrence of road accidents. Therefore, identifying roadway features with high accident probability is crucial. Here, we introduce image inpainting that can assist authorities in achieving safe roadway design with minimal intervention in the current roadway structure. Image inpainting is based on inpainting safe roadway elements in a roadway image, replacing accident-prone (AP) features by using a diffusion model. After object-level segmentation, the AP features identified by the properties of accident hotspots are masked by a human operator and safe roadway elements are inpainted. With only an average time of 2 min for image inpainting, the likelihood of an image being classified as an accident hotspot drops by an average of 11.85%. In addition, safe urban spaces can be designed considering human factors of commuters such as gaze saliency. Considering this, we introduce saliency enhancement that suggests chrominance alteration for a safe road view.

Submitted 14 February, 2023; originally announced February 2023.

Comments: 9 Pages, 6 figures, 4 tables

arXiv:2301.03988 [pdf, other]

SantaCoder: don't reach for the stars!

Authors: Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, et al. (16 additional authors not shown)

Abstract: The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigating better preprocessing methods for the training data. We train 1.1B parameter models on the Java, JavaScript, and Python subsets of The Stack and evaluate them on the MultiPL-E text-to-code benchmark. We find that more aggressive filtering of near-duplicates can further boost performance and, surprisingly, that selecting files from repositories with 5+ GitHub stars deteriorates performance significantly. Our best model outperforms previous open-source multilingual code generation models (InCoder-6.7B and CodeGen-Multi-2.7B) in both left-to-right generation and infilling on the Java, JavaScript, and Python portions of MultiPL-E, despite being a substantially smaller model. All models are released under an OpenRAIL license at https://hf.co/bigcode.

Submitted 24 February, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

arXiv:2212.13827 [pdf, other]

Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data

Authors: Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, R. Venkatesh Babu

Abstract: Real-world datasets exhibit imbalances of varying types and degrees. Several techniques based on re-weighting and margin adjustment of loss are often used to enhance the performance of neural networks, particularly on minority classes. In this work, we analyze the class-imbalanced learning problem by examining the loss landscape of neural networks trained with re-weighting and margin-based techniques. Specifically, we examine the spectral density of Hessian of class-wise loss, through which we observe that the network weights converge to a saddle point in the loss landscapes of minority classes. Following this observation, we also find that optimization methods designed to escape from saddle points can be effectively used to improve generalization on minority classes. We further theoretically and empirically demonstrate that Sharpness-Aware Minimization (SAM), a recent technique that encourages convergence to a flat minima, can be effectively used to escape saddle points for minority classes. Using SAM results in a 6.2\% increase in accuracy on the minority classes over the state-of-the-art Vector Scaling Loss, leading to an overall average increase of 4\% across imbalanced datasets. The code is available at: https://github.com/val-iisc/Saddle-LongTail.

Submitted 28 December, 2022; originally announced December 2022.

Comments: NeurIPS 2022. Code: https://github.com/val-iisc/Saddle-LongTail

arXiv:2212.11652 [pdf, ps, other]

Thermal Evolution and Axion Emission Properties of Strongly Magnetized Neutron Stars

Authors: Shubham Yadav, M. Mishra, Tapomoy Guha Sarkar, Captain R. Singh

Abstract: Emission properties of compact astrophysical objects such as neutron stars (NSs) are associated with crucial astronomical observables. In the current work, we obtain the mass and pressure profiles of non-rotating NSs using the modified Tolman Oppenheimer Volkoff (TOV) system of equations in the presence of an intense magnetic field. We obtain the profiles by using a specific distance-dependent magnetic field in the modified TOV equations. We employ three different equations of state (EoS) to solve the TOV equations by assuming the core of NSs comprises hadronic matter. Employing the above profiles, we determine the cooling rates of spherically symmetric NSs as a function of time with and without including the magnetic field using the NSCool code. We have also determined the cooling rates as a function of radius for three different NSs. Furthermore, we determine the luminosity of neutrinos, axions, and photons emitted from the NSs in the presence and absence of a magnetic field for an axion mass of $16$ meV and three different EoS. Our comparative study indicates that the cooling rate and luminosities of neutrinos, axions, and photons change significantly due to the impact of the strong magnetic field. We also find that due to the magnetic field, the axion mass bound increases slightly compared to the case without a magnetic field.

Submitted 27 February, 2024; v1 submitted 22 December, 2022; originally announced December 2022.

Comments: Accepted in European Physical Journal C. 19 pages, 34 figures

arXiv:2212.09624 [pdf]

Holder Recommendations using Graph Representation Learning & Link Prediction

Authors: Rachna Saxena, Abhijeet Kumar, Mridul Mishra

Abstract: Lead recommendations for financial products such as funds or ETFs are potentially challenging in the investment space due to changing market scenarios and the difficulty of capturing financial holders' mindset and philosophy. Current methods surface leads based on certain product categorization and attributes like returns, fees, category etc. to suggest similar products to investors, which may not capture the holder's investment behavior holistically. Other reported works do subjective analysis of institutional holders' ideology. This paper proposes a comprehensive data-driven framework for developing a lead recommendation system in the holder's space for financial products like funds by using transactional history, asset flows and product-specific attributes. The system infers the holder's interest implicitly by considering all investment transactions made and collects possible meta information to detect the holder's investment profile/persona, such as investment anticipation and investment behavior. This paper focuses on the holder recommendation component of the framework, which employs a bipartite graph representation of financial holders and funds using a variety of attributes and further employs a GraphSage model for learning representations, followed by a link prediction model for ranking recommendations for a future period. The performance of the proposed approach is compared with a baseline model, i.e., a content-based filtering approach, on the metric hits at Top-k (50, 100, 200) recommendations. We found that the proposed graph ML solution outperforms the baseline by an absolute 42%, 22% and 14% with a look-ahead bias and by an absolute 18%, 19% and 18% on completely unseen holders in terms of hit rate for top-k recommendations: 50, 100 and 200 respectively.

Submitted 10 November, 2022; originally announced December 2022.

Comments: 6 pages, 6 figures, 2 tables Presented at a workshop in ACM AI in Finance conference

arXiv:2211.12729 [pdf, ps, other]

The Weyl Transform of a measure

Authors: Mansi Mishra, M. K. Vemuri

Abstract: (1) Suppose $\mu$ is a smooth measure on a hypersurface of positive Gaussian curvature in $\R^{2n}$. If $n\ge 2$, then $W(\mu)$, the Weyl transform of $\mu$, is a compact operator, and if $p>n\ge 6$ then $W(\mu)$ belongs to the $p$-Schatten class. (2) There exist Schatten class operators with linearly dependent quantum translates.

Submitted 23 November, 2022; originally announced November 2022.

MSC Class: 22D10; 22E30; 43A05; 43A80; 47B10

arXiv:2211.11395 [pdf, ps, other]

Prasad's Conjecture about dualizing involutions

Authors: Prashant Arote, Manish Mishra

Abstract: Let $G$ be a connected reductive group defined over a finite field $\mathbb{F}_q$ with corresponding Frobenius $F$. Let $\iota_G$ denote the duality involution defined by D. Prasad under the hypothesis $2\mathrm{H}^1(F,Z(G))=0$, where $Z(G)$ denotes the center of $G$. We show that for each irreducible character $\rho$ of $G^F$, the involution $\iota_G$ takes $\rho$ to its dual $\rho^{\vee}$ if and only if for a suitable Jordan decomposition of characters, an associated unipotent character $u_\rho$ has Frobenius eigenvalues $\pm 1$. As a corollary, we obtain that if $G$ has no exceptional factors and satisfies $2\mathrm{H}^1(F,Z(G))=0$, then the duality involution $\iota_G$ takes $\rho$ to its dual $\rho^{\vee}$ for each irreducible character $\rho$ of $G^F$. Our results resolve a finite group counterpart of a conjecture of D.~Prasad.

Submitted 16 November, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: Title changed. Final version. To appear in IMRN

arXiv:2211.06628 [pdf, ps, other]

doi 10.1007/JHEP02(2023)145

Higher derivative invariants in four dimensional N=3 Poincare supergravity

Authors: Subramanya Hegde, Madhu Mishra, Debangshu Mukherjee, Bindusar Sahoo

Abstract: In this paper, we use the superconformal approach to derive the higher derivative action for N = 3 Poincare supergravity in four space-time dimensions. We first study the coupling of N = 3 vector multiplets to conformal supergravity. Thereafter we combine it with the pure N = 3 conformal supergravity action and use a minimum of three vector multiplets as compensators to arrive at Poincare supergravity with higher derivative corrections. We give a general prescription on how to eliminate the auxiliary fields in an iterative manner and obtain the supergravity action order by order in derivatives. We also show that the truncation of the action at fourth order in derivatives is a consistent truncation.

Submitted 27 January, 2023; v1 submitted 12 November, 2022; originally announced November 2022.

Comments: 31 pages, minor changes

arXiv:2211.05100 [pdf, other]

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Authors: BigScience Workshop, :, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major , et al. (369 additional authors not shown)

Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2210.16759 [pdf, ps, other]

doi 10.1007/s43037-023-00255-4

Fixed points and normal automorphisms of the unit ball of bounded operators on $\mathbb{C}^n$

Authors: Rachna Aggarwal, Krishnendu Gongopadhyay, Mukund Madhav Mishra

Abstract: We examine the group of isometries of the open unit ball of a complex Banach space of certain bounded linear operators equipped with the Carathéodory metric. Therein we obtain a characterization of the normal isometries in terms of their special type of fixed points.

Submitted 7 March, 2023; v1 submitted 30 October, 2022; originally announced October 2022.

MSC Class: Primary 32M15; Secondary 47B02; 47B15; 47B91

arXiv:2210.07295 [pdf, other]

Joint Reasoning on Hybrid-knowledge sources for Task-Oriented Dialog

Authors: Mayank Mishra, Danish Contractor, Dinesh Raghu

Abstract: Traditional systems designed for task-oriented dialog utilize knowledge present only in structured knowledge sources to generate responses. However, relevant information required to generate responses may also reside in unstructured sources, such as documents. Recent state-of-the-art models such as HyKnow and SeKnow, aimed at overcoming these challenges, make limiting assumptions about the knowledge sources. For instance, these systems assume that certain types of information, such as a phone number, are always present in a structured knowledge base (KB), while information about aspects such as entrance ticket prices would always be available in documents. In this paper, we create a modified version of the MultiWOZ-based dataset prepared by SeKnow to demonstrate how current methods have significant degradation in performance when strict assumptions about the source of information are removed. Then, in line with recent work exploiting pre-trained language models, we fine-tune a BART-based model using prompts for the tasks of querying knowledge sources as well as for response generation, without making assumptions about the information present in each knowledge source. Through a series of experiments, we demonstrate that our model is robust to perturbations to knowledge modality (source of information), and that it can fuse information from structured as well as unstructured knowledge to generate responses.

Submitted 7 February, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

arXiv:2209.00574 [pdf, ps, other]

Harish-Chandra Induction and Jordan Decomposition of Characters

Authors: Prashant Arote, Manish Mishra

Abstract: We show that for any finite connected reductive group, a Jordan decomposition can always be chosen such that it commutes with Harish-Chandra induction. En route, we show that the endomorphism algebra of the Harish-Chandra induction of a cuspidal representation of a Levi subgroup is isomorphic to a unipotent counterpart. These results generalize the well known results for groups with connected center.

Submitted 20 December, 2022; v1 submitted 1 September, 2022; originally announced September 2022.

Comments: Proof of Lemma 5.1 in previous version is corrected

MSC Class: 20C33

arXiv:2206.08213 [pdf, other]

A Closer Look at Smoothness in Domain Adversarial Training

Authors: Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, Arihant Jain, R. Venkatesh Babu

Abstract: Domain adversarial training has been ubiquitous for achieving invariant representations and is used widely for various domain adaptation tasks. In recent times, methods converging to smooth optima have shown improved generalization for supervised learning tasks like classification. In this work, we analyze the effect of smoothness-enhancing formulations on domain adversarial training, the objective of which is a combination of task loss (e.g., classification, regression, etc.) and adversarial terms. We find that converging to a smooth minimum with respect to (w.r.t.) task loss stabilizes the adversarial training, leading to better performance on the target domain. In contrast to task loss, our analysis shows that converging to a smooth minimum w.r.t. adversarial loss leads to sub-optimal generalization on the target domain. Based on the analysis, we introduce the Smooth Domain Adversarial Training (SDAT) procedure, which effectively enhances the performance of existing domain adversarial methods for both classification and object detection tasks. Our analysis also provides insight into the extensive usage of SGD over Adam in the community for domain adversarial training.

Submitted 16 June, 2022; originally announced June 2022.

Comments: ICML 2022. Code: https://github.com/val-iisc/SDAT

arXiv:2206.00485 [pdf, other]

Co-creation and ownership for AI radio

Authors: Skylar Gordon, Robert Mahari, Manaswi Mishra, Ziv Epstein

Abstract: Recent breakthroughs in AI-generated music open the door for new forms for co-creation and co-creativity. We present Artificial$.\!$fm, a proof-of-concept casual creator that blends AI-music generation, subjective ratings, and personalized recommendation for the creation and curation of AI-generated music. Listeners can rate emergent songs to steer the evolution of future music. They can also personalize their preferences to better navigate the possibility space. As a "slow creator" with many human stakeholders, Artificial$.\!$fm is an example of how casual creators can leverage human curation at scale to collectively navigate a possibility space. It also provides a case study to reflect on how ownership should be considered in these contexts. We report on the design and development of Artificial$.\!$fm, and provide a legal analysis on the ownership of artifacts generated on the platform.

Submitted 1 June, 2022; originally announced June 2022.

arXiv:2205.12890 [pdf, other]

doi 10.1103/PhysRevLett.129.010501

Demonstration of Entanglement-Enhanced Covert Sensing

Authors: Shuhong Hao, Haowei Shi, Christos N. Gagatsos, Mayank Mishra, Boulat Bash, Ivan Djordjevic, Saikat Guha, Quntao Zhuang, Zheshen Zhang

Abstract: The laws of quantum physics endow superior performance and security for information processing: quantum sensing harnesses nonclassical resources to enable measurement precision unmatched by classical sensing, whereas quantum cryptography aims to unconditionally protect the secrecy of the processed information. Here, we present the theory and experiment for entanglement-enhanced covert sensing, a paradigm that simultaneously offers high measurement precision and data integrity by concealing the probe signal in an ambient noise background so that the execution of the protocol is undetectable with a high probability. We show that entanglement offers a performance boost in estimating the imparted phase by a probed object, as compared to a classical protocol at the same covertness level. The implemented entanglement-enhanced covert sensing protocol operates close to the fundamental quantum limit by virtue of its near-optimum entanglement source and quantum receiver. Our work is expected to create ample opportunities for quantum information processing at unprecedented security and performance levels.

Submitted 27 June, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: 19 pages, 12 figures

Journal ref: Phys. Rev. Lett 129, 010501 (2022)

arXiv:2204.04575 [pdf, other]

The COHERENT Experimental Program

Authors: D. Akimov, S. Alawabdeh, P. An, A. Arteaga, C. Awe, P. S. Barbeau, C. Barry, B. Becker, V. Belov, I. Bernardi, M. A. Blackston, L. Blokland, C. Bock, B. Bodur, A. Bolozdynya, R. Bouabid, A. Bracho, J. Browning, B. Cabrera-Palmer, N. Chen, D. Chernyak, E. Conley, J. Daughhetee, J. Daughtry, E. Day, et al. (106 additional authors not shown)

Abstract: The COHERENT experiment located in Neutrino Alley at the Spallation Neutron Source (SNS), Oak Ridge National Laboratory (ORNL), has made the world's first two measurements of coherent elastic neutrino-nucleus scattering (CEvNS), on CsI and argon, using neutrinos produced at the SNS. The COHERENT collaboration continues to pursue CEvNS measurements on various targets as well as additional studies of inelastic neutrino-nucleus interactions, searches for accelerator-produced dark matter (DM) and physics beyond the Standard Model, using the uniquely high-quality and high-intensity neutrino source available at the SNS. This white paper describes primarily COHERENT's ongoing and near-future program at the SNS First Target Station (FTS). Opportunities enabled by the SNS Second Target Station (STS) for the study of neutrino physics and development of novel detector technologies are elaborated in a separate white paper.

Submitted 9 April, 2022; originally announced April 2022.

Comments: 38 papers, 24 figures; Snowmass contribution

arXiv:2203.05279 [pdf, other]

doi 10.1007/JHEP06(2022)087

Thermodynamics of BPS and Near-BPS AdS6 Black Holes

Authors: Madhu Mishra, Amitabh Virmani

Abstract: We develop the thermodynamics of BPS and near-BPS AdS6 black holes. We study the phase diagram of BPS black holes in the grand canonical ensemble. We highlight two distinct deformations orthogonal to the BPS surface: (i) increasing the temperature while keeping the charges fixed, (ii) changing the charges while maintaining extremality such that the BPS constraint is no longer satisfied. For both these deformations, we show that the considerations of the BPS entropy function can be extended to describe the near-BPS regime. The excess entropy together with changes in all potentials are perfectly accounted for via the extremization principle.

Submitted 24 March, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

Comments: 35 pages, 5 figures; v2: minor changes, reference added

Journal ref: https://link.springer.com/article/10.1007/JHEP06(2022)087

arXiv:2202.03734 [pdf, other]

Cascaded Debiasing: Studying the Cumulative Effect of Multiple Fairness-Enhancing Interventions

Authors: Bhavya Ghai, Mihir Mishra, Klaus Mueller

Abstract: Understanding the cumulative effect of multiple fairness-enhancing interventions at different stages of the machine learning (ML) pipeline is a critical and underexplored facet of the fairness literature. Such knowledge can be valuable to data scientists/ML practitioners in designing fair ML pipelines. This paper takes the first step in exploring this area by undertaking an extensive empirical study comprising 60 combinations of interventions, 9 fairness metrics, 2 utility metrics (Accuracy and F1 Score) across 4 benchmark datasets. We quantitatively analyze the experimental data to measure the impact of multiple interventions on fairness, utility and population groups. We found that applying multiple interventions results in better fairness and lower utility than individual interventions on aggregate. However, adding more interventions does not always result in better fairness or worse utility. The likelihood of achieving high performance (F1 Score) along with high fairness increases with a larger number of interventions. On the downside, we found that fairness-enhancing interventions can negatively impact different population groups, especially the privileged group. This study highlights the need for new fairness metrics that account for the impact on different population groups apart from just the disparity between groups. Lastly, we offer a list of combinations of interventions that perform best for different fairness and utility metrics to aid the design of fair ML pipelines.

Submitted 22 August, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: Accepted to ACM CIKM Conference 2022

arXiv:2202.00654 [pdf, other]

doi 10.1103/PhysRevC.106.054902

Using the non-hydrodynamic mode to study the onset of hydrodynamic behavior in ultraperipheral symmetric nuclear collisions

Authors: Nikhil Hatwar, Madhukar Mishra

Abstract: With the attempts of extending the hydrodynamic framework of heavy-ion collision to proton-proton and other small and low energy systems, we are confronted with the question of how small the system can get and still be safely modelled as a fluid. One of the transport coefficients required in the $2^{nd}$ order relativistic viscous hydrodynamics is the shear relaxation time, inclusion of which solves the causality violation problem in the Navier-Stokes equation. In phenomenological studies this coefficient has been taken as a constant and much attention has gone into finding and fixing the shear viscosity to entropy density ratio, $\eta/s$. This transport coefficient also happens to control the non-hydrodynamic mode of the out-of-equilibrium hydrodynamics theory. It has been predicted that for decreasing system size, observables become sensitive to variation in shear relaxation time as a result of increasing dominance of non-hydrodynamic mode, which could potentially indicate breakdown of hydrodynamics. In this study, we try to test this prediction in the peripheral Pb-Pb collisions at $2.76$ TeV and Au-Au collisions at $200$ GeV, with IPGlasma initial condition and $(2+1)-$Dimensional viscous hydrodynamics. We find that elliptic flow does show adequate sensitivity to variation in relaxation time for decreasing system size. The multiplicity rapidity density limit for applicability of hydrodynamics is found to be around $dN/dy\approx10$, with the possibility of refinement in this value given a way to improve the centrality resolution in experimental data for referencing in peripheral collisions.

Submitted 1 December, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

Showing 1–50 of 227 results for author: Mishra, M