Search | arXiv e-print repository

Accuracy is Not All You Need

Authors: Abhinav Dutta, Sanjeev Krishnan, Nipun Kwatra, Ramachandran Ramjee

Abstract: When Large Language Models (LLMs) are compressed using techniques such as quantization, the predominant way to demonstrate the validity of such techniques is by measuring the model's accuracy on various benchmarks. If the accuracies of the baseline model and the compressed model are close, it is assumed that there was negligible degradation in quality. However, even when the accuracies of the baseline and compressed models are similar, we observe the phenomenon of flips, wherein answers change from correct to incorrect and vice versa in proportion. We conduct a detailed study of metrics across multiple compression techniques, models and datasets, demonstrating that the behavior of compressed models as visible to end-users is often significantly different from the baseline model, even when accuracy is similar. We further evaluate compressed models qualitatively and quantitatively using MT-Bench and show that compressed models are significantly worse than baseline models in this free-form generative task. Thus, we argue that compression techniques should also be evaluated using distance metrics. We propose two such metrics, KL-Divergence and flips, and show that they are well correlated.

Submitted 12 July, 2024; originally announced July 2024.
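
The notion of flips in the abstract above can be illustrated with a small sketch. This is not the authors' code: it assumes a flip is any example whose correctness differs between the baseline and the compressed model, and the function names `accuracy` and `flip_rate` and the toy answers are hypothetical.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the reference labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def flip_rate(base_preds, comp_preds, labels):
    """Fraction of examples whose correctness changes between the two models.

    Assumption: a flip is counted in either direction
    (correct -> incorrect or incorrect -> correct).
    """
    flips = 0
    for b, c, y in zip(base_preds, comp_preds, labels):
        if (b == y) != (c == y):
            flips += 1
    return flips / len(labels)


# Toy multiple-choice data (hypothetical).
labels = ["A", "B", "C", "D"]
baseline = ["A", "B", "C", "A"]      # last answer wrong
compressed = ["A", "B", "D", "D"]    # third answer wrong

print(accuracy(baseline, labels))               # same accuracy for both models
print(accuracy(compressed, labels))
print(flip_rate(baseline, compressed, labels))  # yet half the answers flipped
```

In this toy case both models score the same accuracy while half of the answers change correctness, which is the kind of divergence that an accuracy comparison alone would hide.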

arXiv:2407.07000

Metron: Holistic Performance Evaluation Framework for LLM Inference Systems

Authors: Amey Agrawal, Anmol Agarwal, Nitin Kedia, Jayashree Mohan, Souvik Kundu, Nipun Kwatra, Ramachandran Ramjee, Alexey Tumanov

Abstract: Serving large language models (LLMs) in production can incur substantial costs, which has prompted recent advances in inference system optimizations. Today, these systems are evaluated against conventional latency and throughput metrics (e.g., TTFT, TBT, Normalised Latency and TPOT). However, these metrics fail to fully capture the nuances of LLM inference, leading to an incomplete assessment of user-facing performance crucial for real-time applications such as chat and translation. In this paper, we first identify the pitfalls of current performance metrics in evaluating LLM inference systems. We then propose Metron, a comprehensive performance evaluation framework that includes fluidity-index -- a novel metric designed to reflect the intricacies of the LLM inference process and its impact on real-time user experience. Finally, we evaluate various existing open-source platforms and model-as-a-service offerings using Metron, discussing their strengths and weaknesses. Metron is available at https://github.com/project-metron/metron.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2405.05465

Vidur: A Large-Scale Simulation Framework For LLM Inference

Authors: Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav Gulavani, Ramachandran Ramjee, Alexey Tumanov

Abstract: Optimizing the deployment of large language models (LLMs) is expensive today since it requires experimentally running an application workload against an LLM implementation while exploring a large configuration space formed by system knobs such as parallelization strategies, batching techniques, and scheduling policies. To address this challenge, we present Vidur - a large-scale, high-fidelity, easily-extensible simulation framework for LLM inference performance. Vidur models the performance of LLM operators using a combination of experimental profiling and predictive modeling, and evaluates the end-to-end inference performance for different workloads by estimating several metrics of interest such as latency and throughput. We validate the fidelity of Vidur on several LLMs and show that it estimates inference latency with less than 9% error across the range. Further, we present Vidur-Search, a configuration search tool that helps optimize LLM deployment. Vidur-Search uses Vidur to automatically identify the most cost-effective deployment configuration that meets application performance constraints. For example, Vidur-Search finds the best deployment configuration for LLaMA2-70B in one hour on a CPU machine, in contrast to a deployment-based exploration which would require 42K GPU hours - costing ~218K dollars. Source code for Vidur is available at https://github.com/microsoft/vidur.

Submitted 21 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

arXiv:2403.02310

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

Authors: Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee

Abstract: Each LLM serving request goes through two phases. The first is prefill, which processes the entire input prompt and produces the first output token, and the second is decode, which generates the rest of the output tokens, one at a time. Prefill iterations have high latency but saturate GPU compute due to parallel processing of the input prompt. In contrast, decode iterations have low latency but also low compute utilization because a decode iteration processes only a single token per request. This makes batching highly effective for decodes and consequently for overall throughput. However, batching multiple requests leads to an interleaving of prefill and decode iterations, which makes it challenging to achieve both high throughput and low latency. We introduce an efficient LLM inference scheduler, Sarathi-Serve, to address this throughput-latency tradeoff. Sarathi-Serve introduces chunked-prefills, which splits a prefill request into near-equal-sized chunks, and creates stall-free schedules that add new requests to a batch without pausing ongoing decodes. Stall-free scheduling unlocks the opportunity to improve throughput with large batch sizes while minimizing the effect of batching on latency. Furthermore, uniform batches in Sarathi-Serve ameliorate the imbalance between iterations, resulting in minimal pipeline bubbles. Our techniques yield significant improvements in inference performance across models and hardware under tail latency constraints. For Mistral-7B on single A100 GPUs, we achieve 2.6x higher serving capacity and up to 3.7x higher serving capacity for the Yi-34B model on two A100 GPUs as compared to vLLM. When used with pipeline parallelism on Falcon-180B, Sarathi-Serve provides up to 5.6x gain in the end-to-end serving capacity. The source code for Sarathi-Serve is available at https://github.com/microsoft/sarathi-serve.

Submitted 17 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2308.16369

SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills

Authors: Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee

Abstract: Large Language Model (LLM) inference consists of two distinct phases - a prefill phase, which processes the input prompt, and a decode phase, which generates output tokens autoregressively. While the prefill phase effectively saturates GPU compute at small batch sizes, the decode phase results in low compute utilization as it generates one token at a time per request. The varying prefill and decode times also lead to imbalance across micro-batches when using pipeline parallelism, resulting in further inefficiency due to bubbles. We present SARATHI to address these challenges. SARATHI employs chunked-prefills, which splits a prefill request into equal-sized chunks, and decode-maximal batching, which constructs a batch using a single prefill chunk and populates the remaining slots with decodes. During inference, the prefill chunk saturates GPU compute, while the decode requests 'piggyback' and cost up to an order of magnitude less compared to a decode-only batch. Chunked-prefills allows constructing multiple decode-maximal batches from a single prefill request, maximizing coverage of decodes that can piggyback. Furthermore, the uniform compute design of these batches ameliorates the imbalance between micro-batches, significantly reducing pipeline bubbles. Our techniques yield significant improvements in inference performance across models and hardware. For the LLaMA-13B model on A6000 GPU, SARATHI improves decode throughput by up to 10x, and accelerates end-to-end throughput by up to 1.33x. For LLaMA-33B on A100 GPU, we achieve 1.25x higher end-to-end throughput and up to 4.25x higher decode throughput. When used with pipeline parallelism on GPT-3, SARATHI reduces bubbles by 6.29x, resulting in an end-to-end throughput improvement of 1.91x.

Submitted 30 August, 2023; originally announced August 2023.
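
The batching idea in the SARATHI abstract above (one prefill chunk per batch, remaining slots filled with decodes) can be sketched roughly as follows. This is an illustrative toy, not the SARATHI implementation: the names `chunk_prefill` and `decode_maximal_batches`, the dictionary layout, and the simplification that each pending decode occupies one slot in exactly one batch are all assumptions.

```python
def chunk_prefill(prompt_len, chunk_size):
    """Split a prompt of `prompt_len` tokens into chunk lengths.

    All chunks have `chunk_size` tokens except possibly the last,
    which holds the remainder (a simplification of "equal-sized").
    """
    chunks = []
    remaining = prompt_len
    while remaining > 0:
        take = min(chunk_size, remaining)
        chunks.append(take)
        remaining -= take
    return chunks


def decode_maximal_batches(prompt_len, chunk_size, pending_decodes, batch_slots):
    """Build one batch per prefill chunk and fill the other slots with decodes.

    Assumption: one slot is reserved for the prefill chunk, and each
    pending decode request is placed in at most one batch.
    """
    queue = list(pending_decodes)
    decode_slots = batch_slots - 1
    batches = []
    for chunk in chunk_prefill(prompt_len, chunk_size):
        batches.append({"prefill_tokens": chunk, "decodes": queue[:decode_slots]})
        queue = queue[decode_slots:]
    return batches


# Hypothetical workload: a 1000-token prompt and six waiting decode requests.
batches = decode_maximal_batches(
    prompt_len=1000,
    chunk_size=256,
    pending_decodes=["r1", "r2", "r3", "r4", "r5", "r6"],
    batch_slots=4,
)
for b in batches:
    print(b)
```

Splitting the prompt into several chunks yields several batches, so more decode requests get a chance to share a batch with compute-heavy prefill work than if the whole prompt were processed in one iteration.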

arXiv:2304.14916

"Can't Take the Pressure?": Examining the Challenges of Blood Pressure Estimation via Pulse Wave Analysis

Authors: Suril Mehta, Nipun Kwatra, Mohit Jain, Daniel McDuff

Abstract: The use of observed wearable sensor data (e.g., photoplethysmograms [PPG]) to infer health measures (e.g., glucose level or blood pressure) is a very active area of research. Such technology can have a significant impact on health screening, chronic disease management and remote monitoring. A common approach is to collect sensor data and corresponding labels from a clinical grade device (e.g., blood pressure cuff), and train deep learning models to map one to the other. Although well intentioned, this approach often ignores a principled analysis of whether the input sensor data has enough information to predict the desired metric. We analyze the task of predicting blood pressure from PPG pulse wave analysis. Our review of the prior work reveals that many papers fall prey to data leakage, and to unrealistic constraints on the task and the preprocessing steps. We propose a set of tools to help determine if the input signal in question (e.g., PPG) is indeed a good predictor of the desired label (e.g., blood pressure). Using our proposed tools, we have found that blood pressure prediction using PPG has a high multi-valued mapping factor of 33.2% and low mutual information of 9.8%. In comparison, heart rate prediction using PPG, a well-established task, has a very low multi-valued mapping factor of 0.75% and high mutual information of 87.7%. We argue that these results provide a more realistic representation of the current progress towards the goal of wearable blood pressure measurement via PPG pulse wave analysis.

Submitted 23 April, 2023; originally announced April 2023.

arXiv:2208.05552

Towards Automating Retinoscopy for Refractive Error Diagnosis

Authors: Aditya Aggarwal, Siddhartha Gairola, Uddeshya Upadhyay, Akshay P Vasishta, Diwakar Rao, Aditya Goyal, Kaushik Murali, Nipun Kwatra, Mohit Jain

Abstract: Refractive error is the most common eye disorder and is the key cause behind correctable visual impairment, responsible for nearly 80% of the visual impairment in the US. Refractive error can be diagnosed using multiple methods, including subjective refraction, retinoscopy, and autorefractors. Although subjective refraction is the gold standard, it requires cooperation from the patient and hence is not suitable for infants, young children, and developmentally delayed adults. Retinoscopy is an objective refraction method that does not require any input from the patient. However, retinoscopy requires a lens kit and a trained examiner, which limits its use for mass screening. In this work, we automate retinoscopy by attaching a smartphone to a retinoscope and recording retinoscopic videos with the patient wearing a custom pair of paper frames. We develop a video processing pipeline that takes retinoscopic videos as input and estimates the net refractive error based on our proposed extension of the retinoscopy mathematical model. Our system alleviates the need for a lens kit and can be performed by an untrained examiner. In a clinical trial with 185 eyes, we achieved a sensitivity of 91.0% and specificity of 74.0% on refractive error diagnosis. Moreover, the mean absolute error of our approach was $0.75 \pm 0.67$ D on net refractive error estimation compared to subjective refraction measurements. Our results indicate that our approach has the potential to be used as a retinoscopy-based refractive error screening tool in real-world medical settings.

Submitted 10 August, 2022; originally announced August 2022.

Comments: This paper is accepted for publication in IMWUT 2022

arXiv:2207.06888

Distance Learner: Incorporating Manifold Prior to Model Training

Authors: Aditya Chetan, Nipun Kwatra

Abstract: The manifold hypothesis (real world data concentrates near low-dimensional manifolds) is suggested as the principle behind the effectiveness of machine learning algorithms in very high dimensional problems that are common in domains such as vision and speech. Multiple methods have been proposed to explicitly incorporate the manifold hypothesis as a prior in modern Deep Neural Networks (DNNs), with varying success. In this paper, we propose a new method, Distance Learner, to incorporate this prior for DNN-based classifiers. Distance Learner is trained to predict the distance of a point from the underlying manifold of each class, rather than the class label. For classification, Distance Learner then chooses the class corresponding to the closest predicted class manifold. Distance Learner can also identify points as being out of distribution (belonging to neither class), if the distance to the closest manifold is higher than a threshold. We evaluate our method on multiple synthetic datasets and show that Distance Learner learns much more meaningful classification boundaries compared to a standard classifier. We also evaluate our method on the task of adversarial robustness, and find that it not only outperforms the standard classifier by a large margin, but also performs on par with classifiers trained via state-of-the-art adversarial training.

Submitted 14 July, 2022; originally announced July 2022.
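
The decision rule described in the Distance Learner abstract above (pick the class with the closest predicted manifold, or flag the point as out of distribution when even the closest manifold is too far) might look like the following sketch. It covers only the final decision step, not the trained network; the function name `classify`, the `threshold` parameter, and the example distances are hypothetical.

```python
def classify(distances, threshold):
    """Return the index of the closest class manifold, or None if out of distribution.

    `distances` holds one predicted distance per class (assumed to come
    from some trained distance-predicting model, not shown here).
    """
    closest = min(range(len(distances)), key=lambda k: distances[k])
    if distances[closest] > threshold:
        return None  # farther than the threshold from every class manifold
    return closest


# Hypothetical predicted distances for a three-class problem.
print(classify([0.8, 0.1, 0.5], threshold=0.3))  # closest manifold is class 1
print(classify([0.8, 0.9, 0.7], threshold=0.3))  # all too far: out of distribution
```

Returning `None` for the out-of-distribution case keeps it distinct from any valid class index; a real system could equally use a dedicated label.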

arXiv:2205.03702

Keratoconus Classifier for Smartphone-based Corneal Topographer

Authors: Siddhartha Gairola, Pallavi Joshi, Anand Balasubramaniam, Kaushik Murali, Nipun Kwatra, Mohit Jain

Abstract: Keratoconus is a severe eye disease that leads to deformation of the cornea. It impacts people aged 10-25 years and is the leading cause of blindness in that demographic. Corneal topography is the gold standard for keratoconus diagnosis. It is a non-invasive process performed using expensive and bulky medical devices called corneal topographers. This makes it inaccessible to large populations, especially in the Global South. Low-cost smartphone-based corneal topographers, such as SmartKC, have been proposed to make keratoconus diagnosis accessible. Similar to medical-grade topographers, SmartKC outputs curvature heatmaps and quantitative metrics that need to be evaluated by doctors for keratoconus diagnosis. An automatic scheme for evaluation of these heatmaps and quantitative values can play a crucial role in screening keratoconus in areas where doctors are not available. In this work, we propose a dual-head convolutional neural network (CNN) for classifying keratoconus on the heatmaps generated by SmartKC. Since SmartKC is a new device and only had a small dataset (114 samples), we developed a 2-stage transfer learning strategy -- using historical data collected from a medical-grade topographer and a subset of SmartKC data -- to satisfactorily train our network. This, combined with our domain-specific data augmentations, achieved a sensitivity of 91.3% and a specificity of 94.2%.

Submitted 7 May, 2022; originally announced May 2022.

Comments: 4 pages

arXiv:2202.07848

Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads

Authors: Dharma Shukla, Muthian Sivathanu, Srinidhi Viswanatha, Bhargav Gulavani, Rimma Nehme, Amey Agrawal, Chen Chen, Nipun Kwatra, Ramachandran Ramjee, Pankaj Sharma, Atul Katiyar, Vipul Modi, Vaibhav Sharma, Abhishek Singh, Shreshth Singhal, Kaustubh Welankar, Lu Xun, Ravi Anupindi, Karthik Elangovan, Hasibur Rahman, Zhou Lin, Rahul Seetharaman, Cheng Xu, Eddie Ailijiang, Suresh Krishnappa, et al. (1 additional author not shown)

Abstract: Lowering costs by driving high utilization across deep learning workloads is a crucial lever for cloud providers. We present Singularity, Microsoft's globally distributed scheduling service for highly-efficient and reliable execution of deep learning training and inference workloads. At the heart of Singularity is a novel, workload-aware scheduler that can transparently preempt and elastically scale deep learning workloads to drive high utilization without impacting their correctness or performance, across a global fleet of AI accelerators (e.g., GPUs, FPGAs). All jobs in Singularity are preemptable, migratable, and dynamically resizable (elastic) by default: a live job can be dynamically and transparently (a) preempted and migrated to a different set of nodes, cluster, data center or a region and resumed exactly from the point where the execution was preempted, and (b) resized (i.e., elastically scaled-up/down) on a varying set of accelerators of a given type. Our mechanisms are transparent in that they do not require the user to make any changes to their code or require using any custom libraries that may limit flexibility. Additionally, our approach significantly improves the reliability of deep learning workloads. We show that the resulting efficiency and reliability gains with Singularity are achieved with negligible impact on the steady-state performance. Finally, our design approach is agnostic of DNN architectures and handles a variety of parallelism strategies (e.g., data/pipeline/model parallelism).

Submitted 21 February, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

Comments: Revision: Fixed some typos

arXiv:2111.04007

Varuna: Scalable, Low-cost Training of Massive Deep Learning Models

Authors: Sanjith Athlur, Nitika Saran, Muthian Sivathanu, Ramachandran Ramjee, Nipun Kwatra

Abstract: Systems for training massive deep learning models (billions of parameters) today assume and require specialized "hyper-clusters": hundreds or thousands of GPUs wired with specialized high-bandwidth interconnects such as NV-Link and Infiniband. Besides being expensive, such dependence on hyper-clusters and custom high-speed interconnects limits the size of such clusters, creating (a) scalability limits on job parallelism; (b) resource fragmentation across hyper-clusters. In this paper, we present Varuna, a new system that enables training massive deep learning models on commodity networking. Varuna makes thrifty use of networking resources and automatically configures the user's training job to efficiently use any given set of resources. Therefore, Varuna is able to leverage "low-priority" VMs that are about 5x cheaper than dedicated GPUs, thus significantly reducing the cost of training massive models. We demonstrate the efficacy of Varuna by training massive models, including a 200 billion parameter model, on 5x cheaper "spot VMs", while maintaining high training throughput. Varuna improves end-to-end training time by up to 18x compared to other model-parallel approaches and up to 26% compared to other pipeline parallel approaches. The code for Varuna is available at https://github.com/microsoft/varuna.

Submitted 15 November, 2021; v1 submitted 7 November, 2021; originally announced November 2021.

Comments: 14 pages, 10 figures

arXiv:2111.01354

SmartKC: Smartphone-based Corneal Topographer for Keratoconus Detection

Authors: Siddhartha Gairola, Murtuza Bohra, Nadeem Shaheer, Navya Jayaprakash, Pallavi Joshi, Anand Balasubramaniam, Kaushik Murali, Nipun Kwatra, Mohit Jain

Abstract: Keratoconus is a severe eye disease affecting the cornea (the clear, dome-shaped outer surface of the eye), causing it to become thin and develop a conical bulge. The diagnosis of keratoconus requires sophisticated ophthalmic devices which are non-portable and very expensive. This makes early detection of keratoconus inaccessible to large populations in low- and middle-income countries, making it a leading cause for partial/complete blindness among such populations. We propose SmartKC, a low-cost, smartphone-based keratoconus diagnosis system comprising a 3D-printed placido's disc attachment, an LED light strip, and an intelligent smartphone app to capture the reflection of the placido rings on the cornea. An image processing pipeline analyzes the corneal image and uses the smartphone's camera parameters, the placido rings' 3D location, the pixel location of the reflected placido rings and the setup's working distance to construct the corneal surface, via the Arc-Step method and Zernike polynomials based surface fitting. In a clinical study with 101 distinct eyes, we found that SmartKC achieves a sensitivity of 94.1% and a specificity of 100.0%. Moreover, the quantitative curvature estimates (sim-K) strongly correlate with a gold-standard medical device (Pearson correlation coefficient = 0.78). Our results indicate that SmartKC has the potential to be used as a keratoconus screening tool under real-world medical settings.

Submitted 21 January, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: Change Log: + Fixed sim-K computation (updated Section 5.5.3); re-ran our pipeline with the updated sim-K values (updated Figure 7); + Conducted the comparative evaluation with doctors again (total 4 doctors), and got improved results (updated Section 7.2 and Table 2); [Note: This is an updated version of the paper that was accepted for publication in IMWUT 2021.]

arXiv:2105.14526

LRTuner: A Learning Rate Tuner for Deep Neural Networks

Authors: Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu

Abstract: One very important hyperparameter for training deep neural networks is the learning rate schedule of the optimizer. The choice of learning rate schedule determines the computational cost of getting close to a minimum, how close you actually get to the minimum, and most importantly the kind of local minimum (wide/narrow) attained. The kind of minimum attained has a significant impact on the generalization accuracy of the network. Current systems employ hand tuned learning rate schedules, which are painstakingly tuned for each network and dataset. Given that the state space of schedules is huge, finding a satisfactory learning rate schedule can be very time consuming. In this paper, we present LRTuner, a method for tuning the learning rate as training proceeds. Our method works with any optimizer, and we demonstrate results on SGD with Momentum, and Adam optimizers. We extensively evaluate LRTuner on multiple datasets, models, and across optimizers. We compare favorably against standard learning rate schedules for the given dataset and models, including ImageNet on Resnet-50, Cifar-10 on Resnet-18, and SQuAD fine-tuning on BERT. For example on ImageNet with Resnet-50, LRTuner shows up to 0.2% absolute gains in test accuracy compared to the hand-tuned baseline schedule. Moreover, LRTuner can achieve the same accuracy as the baseline schedule in 29% fewer optimization steps.

Submitted 30 May, 2021; originally announced May 2021.

Comments: 17 pages

arXiv:2011.00196

RespireNet: A Deep Neural Network for Accurately Detecting Abnormal Lung Sounds in Limited Data Setting

Authors: Siddhartha Gairola, Francis Tom, Nipun Kwatra, Mohit Jain

Abstract: Auscultation of respiratory sounds is the primary tool for screening and diagnosing lung diseases. Automated analysis, coupled with digital stethoscopes, can play a crucial role in enabling tele-screening of fatal lung diseases. Deep neural networks (DNNs) have shown a lot of promise for such problems, and are an obvious choice. However, DNNs are extremely data hungry, and the largest respiratory dataset ICBHI has only 6898 breathing cycles, which is still small for training a satisfactory DNN model. In this work, RespireNet, we propose a simple CNN-based model, along with a suite of novel techniques -- device specific fine-tuning, concatenation-based augmentation, blank region clipping, and smart padding -- enabling us to efficiently use the small-sized dataset. We perform extensive evaluation on the ICBHI dataset, and improve upon the state-of-the-art results for 4-class classification by 2.2%.

Submitted 7 May, 2021; v1 submitted 31 October, 2020; originally announced November 2020.

Comments: Code visible at https://github.com/microsoft/RespireNet

arXiv:2010.12622

S2cGAN: Semi-Supervised Training of Conditional GANs with Fewer Labels

Authors: Arunava Chakraborty, Rahul Ragesh, Mahir Shah, Nipun Kwatra

Abstract: Generative adversarial networks (GANs) have been remarkably successful in learning complex high dimensional real-world distributions and generating realistic samples. However, they provide limited control over the generation process. Conditional GANs (cGANs) provide a mechanism to control the generation process by conditioning the output on a user defined input. Although training GANs requires only unsupervised data, training cGANs requires labelled data which can be very expensive to obtain. We propose a framework for semi-supervised training of cGANs which utilizes sparse labels to learn the conditional mapping, and at the same time leverages a large amount of unsupervised data to learn the unconditional distribution. We demonstrate the effectiveness of our method on multiple datasets and different conditional tasks.

Submitted 23 October, 2020; originally announced October 2020.

arXiv:2003.03977

Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule

Authors: Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu

Abstract: Several papers argue that wide minima generalize better than narrow minima. In this paper, through detailed experiments, we not only corroborate the generalization properties of wide minima but also provide empirical evidence for a new hypothesis that the density of wide minima is likely lower than the density of narrow minima. Further, motivated by this hypothesis, we design a novel explore-exploit learning rate schedule. On a variety of image and natural language datasets, compared to their original hand-tuned learning rate baselines, we show that our explore-exploit schedule can result in either up to 0.84% higher absolute accuracy using the original training budget or up to 57% reduced training time while achieving the original reported accuracy. For example, we achieve state-of-the-art (SOTA) accuracy for IWSLT'14 (DE-EN) dataset by just modifying the learning rate schedule of a high performing model.

Submitted 1 June, 2021; v1 submitted 9 March, 2020; originally announced March 2020.

Comments: 34 pages

arXiv:1908.03941

Galois Cohomology for Lubin-Tate $(\varphi_q,\Gamma_{LT})$-modules over Coefficient rings

Authors: Chandrakant Aribam, Neha Kwatra

Abstract: The classification of the local Galois representations using $(\varphi,\Gamma)$-modules by Fontaine has been generalized by Kisin and Ren over the Lubin-Tate extensions of local fields using the theory of $(\varphi_q,\Gamma_{LT})$-modules. In this paper, we extend the work of (Fontaine) Herr by introducing a complex which allows us to compute cohomology over the Lubin-Tate extensions and compare it with the Galois cohomology groups. We further extend that complex to include certain non-abelian extensions. We then deduce some relations of this cohomology with those arising from $(\psi_q,\Gamma_{LT})$-modules. We also compute the Iwasawa cohomology over the Lubin-Tate extensions in terms of $\psi_q$-operator acting on the étale $(\varphi_q,\Gamma_{LT})$-module attached to the local Galois representation. Moreover, we generalize the notion of $(\varphi_q,\Gamma_{LT})$-modules over the coefficient ring $R$ and show that the equivalence given by Kisin and Ren extends to the Galois representations over $R$. This equivalence allows us to generalize our results to the case of coefficient rings.

Submitted 30 November, 2022; v1 submitted 11 August, 2019; originally announced August 2019.

Comments: Accepted in Research in Number Theory

Showing 1–17 of 17 results for author: Kwatra, N