Search | arXiv e-print repository

Showing 1–17 of 17 results for author: Kwatra, N

  1. arXiv:2407.09141  [pdf, other]

    cs.LG

    Accuracy is Not All You Need

    Authors: Abhinav Dutta, Sanjeev Krishnan, Nipun Kwatra, Ramachandran Ramjee

    Abstract: When Large Language Models (LLMs) are compressed using techniques such as quantization, the predominant way to demonstrate the validity of such techniques is by measuring the model's accuracy on various benchmarks. If the accuracies of the baseline model and the compressed model are close, it is assumed that there was negligible degradation in quality. However, even when the accuracy of baseline and…

    Submitted 12 July, 2024; originally announced July 2024.

  2. arXiv:2407.07000  [pdf, other]

    cs.LG cs.AI cs.CL cs.DC

    Metron: Holistic Performance Evaluation Framework for LLM Inference Systems

    Authors: Amey Agrawal, Anmol Agarwal, Nitin Kedia, Jayashree Mohan, Souvik Kundu, Nipun Kwatra, Ramachandran Ramjee, Alexey Tumanov

    Abstract: Serving large language models (LLMs) in production can incur substantial costs, which has prompted recent advances in inference system optimizations. Today, these systems are evaluated against conventional latency and throughput metrics (e.g., TTFT, TBT, Normalised Latency and TPOT). However, these metrics fail to fully capture the nuances of LLM inference, leading to an incomplete assessment of use…

    Submitted 9 July, 2024; originally announced July 2024.

  3. arXiv:2405.05465  [pdf, other]

    cs.LG cs.AI cs.CL

    Vidur: A Large-Scale Simulation Framework For LLM Inference

    Authors: Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav Gulavani, Ramachandran Ramjee, Alexey Tumanov

    Abstract: Optimizing the deployment of Large language models (LLMs) is expensive today since it requires experimentally running an application workload against an LLM implementation while exploring large configuration space formed by system knobs such as parallelization strategies, batching techniques, and scheduling policies. To address this challenge, we present Vidur - a large-scale, high-fidelity, easil…

    Submitted 21 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  4. arXiv:2403.02310  [pdf, other]

    cs.LG cs.DC

    Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

    Authors: Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee

    Abstract: Each LLM serving request goes through two phases. The first is prefill which processes the entire input prompt and produces the first output token and the second is decode which generates the rest of output tokens, one-at-a-time. Prefill iterations have high latency but saturate GPU compute due to parallel processing of the input prompt. In contrast, decode iterations have low latency but also low…

    Submitted 17 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  5. arXiv:2308.16369  [pdf, other]

    cs.LG cs.DC

    SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills

    Authors: Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee

    Abstract: Large Language Model (LLM) inference consists of two distinct phases - prefill phase which processes the input prompt and decode phase which generates output tokens autoregressively. While the prefill phase effectively saturates GPU compute at small batch sizes, the decode phase results in low compute utilization as it generates one token at a time per request. The varying prefill and decode times…

    Submitted 30 August, 2023; originally announced August 2023.

  6. arXiv:2304.14916  [pdf, other]

    eess.SP cs.AI cs.HC cs.LG

    "Can't Take the Pressure?": Examining the Challenges of Blood Pressure Estimation via Pulse Wave Analysis

    Authors: Suril Mehta, Nipun Kwatra, Mohit Jain, Daniel McDuff

    Abstract: The use of observed wearable sensor data (e.g., photoplethysmograms [PPG]) to infer health measures (e.g., glucose level or blood pressure) is a very active area of research. Such technology can have a significant impact on health screening, chronic disease management and remote monitoring. A common approach is to collect sensor data and corresponding labels from a clinical grade device (e.g., blo…

    Submitted 23 April, 2023; originally announced April 2023.

  7. arXiv:2208.05552  [pdf, other]

    cs.HC cs.CV

    Towards Automating Retinoscopy for Refractive Error Diagnosis

    Authors: Aditya Aggarwal, Siddhartha Gairola, Uddeshya Upadhyay, Akshay P Vasishta, Diwakar Rao, Aditya Goyal, Kaushik Murali, Nipun Kwatra, Mohit Jain

    Abstract: Refractive error is the most common eye disorder and is the key cause behind correctable visual impairment, responsible for nearly 80% of the visual impairment in the US. Refractive error can be diagnosed using multiple methods, including subjective refraction, retinoscopy, and autorefractors. Although subjective refraction is the gold standard, it requires cooperation from the patient and hence i…

    Submitted 10 August, 2022; originally announced August 2022.

    Comments: This paper is accepted for publication in IMWUT 2022

  8. arXiv:2207.06888  [pdf, other]

    cs.LG cs.AI

    Distance Learner: Incorporating Manifold Prior to Model Training

    Authors: Aditya Chetan, Nipun Kwatra

    Abstract: The manifold hypothesis (real world data concentrates near low-dimensional manifolds) is suggested as the principle behind the effectiveness of machine learning algorithms in very high dimensional problems that are common in domains such as vision and speech. Multiple methods have been proposed to explicitly incorporate the manifold hypothesis as a prior in modern Deep Neural Networks (DNNs), with…

    Submitted 14 July, 2022; originally announced July 2022.

  9. arXiv:2205.03702  [pdf, other]

    eess.IV cs.CV cs.CY

    Keratoconus Classifier for Smartphone-based Corneal Topographer

    Authors: Siddhartha Gairola, Pallavi Joshi, Anand Balasubramaniam, Kaushik Murali, Nipun Kwatra, Mohit Jain

    Abstract: Keratoconus is a severe eye disease that leads to deformation of the cornea. It impacts people aged 10-25 years and is the leading cause of blindness in that demography. Corneal topography is the gold standard for keratoconus diagnosis. It is a non-invasive process performed using expensive and bulky medical devices called corneal topographers. This makes it inaccessible to large populations, espe…

    Submitted 7 May, 2022; originally announced May 2022.

    Comments: 4 pages

  10. arXiv:2202.07848  [pdf, other]

    cs.DC cs.AI

    Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads

    Authors: Dharma Shukla, Muthian Sivathanu, Srinidhi Viswanatha, Bhargav Gulavani, Rimma Nehme, Amey Agrawal, Chen Chen, Nipun Kwatra, Ramachandran Ramjee, Pankaj Sharma, Atul Katiyar, Vipul Modi, Vaibhav Sharma, Abhishek Singh, Shreshth Singhal, Kaustubh Welankar, Lu Xun, Ravi Anupindi, Karthik Elangovan, Hasibur Rahman, Zhou Lin, Rahul Seetharaman, Cheng Xu, Eddie Ailijiang, Suresh Krishnappa, et al. (1 additional author not shown)

    Abstract: Lowering costs by driving high utilization across deep learning workloads is a crucial lever for cloud providers. We present Singularity, Microsoft's globally distributed scheduling service for highly-efficient and reliable execution of deep learning training and inference workloads. At the heart of Singularity is a novel, workload-aware scheduler that can transparently preempt and elastically sca…

    Submitted 21 February, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

    Comments: Revision: Fixed some typos

  11. arXiv:2111.04007  [pdf, other]

    cs.DC

    Varuna: Scalable, Low-cost Training of Massive Deep Learning Models

    Authors: Sanjith Athlur, Nitika Saran, Muthian Sivathanu, Ramachandran Ramjee, Nipun Kwatra

    Abstract: Systems for training massive deep learning models (billions of parameters) today assume and require specialized "hyper-clusters": hundreds or thousands of GPUs wired with specialized high-bandwidth interconnects such as NV-Link and Infiniband. Besides being expensive, such dependence on hyper-clusters and custom high-speed inter-connects limits the size of such clusters, creating (a) scalability l…

    Submitted 15 November, 2021; v1 submitted 7 November, 2021; originally announced November 2021.

    Comments: 14 pages, 10 figures

  12. arXiv:2111.01354  [pdf, other]

    cs.HC

    SmartKC: Smartphone-based Corneal Topographer for Keratoconus Detection

    Authors: Siddhartha Gairola, Murtuza Bohra, Nadeem Shaheer, Navya Jayaprakash, Pallavi Joshi, Anand Balasubramaniam, Kaushik Murali, Nipun Kwatra, Mohit Jain

    Abstract: Keratoconus is a severe eye disease affecting the cornea (the clear, dome-shaped outer surface of the eye), causing it to become thin and develop a conical bulge. The diagnosis of keratoconus requires sophisticated ophthalmic devices which are non-portable and very expensive. This makes early detection of keratoconus inaccessible to large populations in low- and middle-income countries, making it…

    Submitted 21 January, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

    Comments: Change Log: + Fixed sim-K computation (updated Section 5.5.3); re-ran our pipeline with the updated sim-K values (updated Figure 7); + Conducted the comparative evaluation with doctors again (total 4 doctors), and got improved results (updated Section 7.2 and Table 2); [Note: This is an updated version of the paper that was accepted for publication in IMWUT 2021.]

  13. arXiv:2105.14526  [pdf, other]

    cs.LG

    LRTuner: A Learning Rate Tuner for Deep Neural Networks

    Authors: Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu

    Abstract: One very important hyperparameter for training deep neural networks is the learning rate schedule of the optimizer. The choice of learning rate schedule determines the computational cost of getting close to a minima, how close you actually get to the minima, and most importantly the kind of local minima (wide/narrow) attained. The kind of minima attained has a significant impact on the generalizat…

    Submitted 30 May, 2021; originally announced May 2021.

    Comments: 17 pages

  14. arXiv:2011.00196  [pdf, other]

    cs.SD cs.LG eess.AS

    RespireNet: A Deep Neural Network for Accurately Detecting Abnormal Lung Sounds in Limited Data Setting

    Authors: Siddhartha Gairola, Francis Tom, Nipun Kwatra, Mohit Jain

    Abstract: Auscultation of respiratory sounds is the primary tool for screening and diagnosing lung diseases. Automated analysis, coupled with digital stethoscopes, can play a crucial role in enabling tele-screening of fatal lung diseases. Deep neural networks (DNNs) have shown a lot of promise for such problems, and are an obvious choice. However, DNNs are extremely data hungry, and the largest respiratory…

    Submitted 7 May, 2021; v1 submitted 31 October, 2020; originally announced November 2020.

    Comments: Code visible at https://github.com/microsoft/RespireNet

  15. arXiv:2010.12622  [pdf, other]

    cs.LG cs.CV stat.ML

    S2cGAN: Semi-Supervised Training of Conditional GANs with Fewer Labels

    Authors: Arunava Chakraborty, Rahul Ragesh, Mahir Shah, Nipun Kwatra

    Abstract: Generative adversarial networks (GANs) have been remarkably successful in learning complex high dimensional real world distributions and generating realistic samples. However, they provide limited control over the generation process. Conditional GANs (cGANs) provide a mechanism to control the generation process by conditioning the output on a user defined input. Although training GANs requires only…

    Submitted 23 October, 2020; originally announced October 2020.

  16. arXiv:2003.03977  [pdf, other]

    cs.LG stat.ML

    Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule

    Authors: Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu

    Abstract: Several papers argue that wide minima generalize better than narrow minima. In this paper, through detailed experiments that not only corroborate the generalization properties of wide minima, we also provide empirical evidence for a new hypothesis that the density of wide minima is likely lower than the density of narrow minima. Further, motivated by this hypothesis, we design a novel explore-expl…

    Submitted 1 June, 2021; v1 submitted 9 March, 2020; originally announced March 2020.

    Comments: 34 pages

  17. arXiv:1908.03941  [pdf, other]

    math.NT

    Galois Cohomology for Lubin-Tate $(\varphi_q,\Gamma_{LT})$-modules over Coefficient rings

    Authors: Chandrakant Aribam, Neha Kwatra

    Abstract: The classification of the local Galois representations using $(\varphi,\Gamma)$-modules by Fontaine has been generalized by Kisin and Ren over the Lubin-Tate extensions of local fields using the theory of $(\varphi_q,\Gamma_{LT})$-modules. In this paper, we extend the work of (Fontaine) Herr by introducing a complex which allows us to compute cohomology over the Lubin-Tate extensions and compare it with the…

    Submitted 30 November, 2022; v1 submitted 11 August, 2019; originally announced August 2019.

    Comments: Accepted in Research in Number Theory