Search | arXiv e-print repository

Showing 1–19 of 19 results for author: Chaudhari, H

  1. arXiv:2405.20485  [pdf, other]

    cs.CR cs.CL cs.LG

    Phantom: General Trigger Attacks on Retrieval Augmented Language Generation

    Authors: Harsh Chaudhari, Giorgio Severi, John Abascal, Matthew Jagielski, Christopher A. Choquette-Choo, Milad Nasr, Cristina Nita-Rotaru, Alina Oprea

    Abstract: Retrieval Augmented Generation (RAG) expands the capabilities of modern large language models (LLMs) in chatbot applications, enabling developers to adapt and personalize the LLM output without expensive training or fine-tuning. RAG systems use an external knowledge database to retrieve the most relevant documents for a given query, providing this context to the LLM generator. While RAG achieves i…

    Submitted 30 May, 2024; originally announced May 2024.

  2. arXiv:2404.02181  [pdf, other]

    cs.LG cs.AI

    Leveraging Machine Learning for Early Autism Detection via INDT-ASD Indian Database

    Authors: Trapti Shrivastava, Harshal Chaudhari, Vrijendra Singh

    Abstract: Machine learning (ML) has advanced quickly, particularly throughout the area of health care. The diagnosis of neurodevelopment problems using ML is a very important area of healthcare. Autism spectrum disorder (ASD) is one of the developmental disorders that is growing the fastest globally. The clinical screening tests used to identify autistic symptoms are expensive and time-consuming. But now th…

    Submitted 2 April, 2024; originally announced April 2024.

  3. L3Cube-MahaSocialNER: A Social Media based Marathi NER Dataset and BERT models

    Authors: Harsh Chaudhari, Anuja Patil, Dhanashree Lavekar, Pranav Khairnar, Raviraj Joshi

    Abstract: This work introduces the L3Cube-MahaSocialNER dataset, the first and largest social media dataset specifically designed for Named Entity Recognition (NER) in the Marathi language. The dataset comprises 18,000 manually labeled sentences covering eight entity classes, addressing challenges posed by social media data, including non-standard language and informal idioms. Deep learning models, includin…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: Accepted at Forum for Information Retrieval Evaluation (FIRE 2023)

  4. On Significance of Subword tokenization for Low Resource and Efficient Named Entity Recognition: A case study in Marathi

    Authors: Harsh Chaudhari, Anuja Patil, Dhanashree Lavekar, Pranav Khairnar, Raviraj Joshi, Sachin Pande

    Abstract: Named Entity Recognition (NER) systems play a vital role in NLP applications such as machine translation, summarization, and question-answering. These systems identify named entities, which encompass real-world concepts like locations, persons, and organizations. Despite extensive research on NER systems for the English language, they have not received adequate attention in the context of low reso…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: Accepted at ICDAM 2023

  5. arXiv:2311.10005  [pdf, other]

    cs.DB

    Towards Flexibility and Robustness of LSM Trees

    Authors: Andy Huynh, Harshal A. Chaudhari, Evimaria Terzi, Manos Athanassoulis

    Abstract: Log-Structured Merge trees (LSM trees) are increasingly used as part of the storage engine behind several data systems, and are frequently deployed in the cloud. As the number of applications relying on LSM-based storage backends increases, the problem of performance tuning of LSM trees receives increasing attention. We consider both nominal tunings - where workload and execution environment are a…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 25 pages, 19 figures, VLDB-J. arXiv admin note: substantial text overlap with arXiv:2110.13801

  6. arXiv:2310.03838  [pdf, other]

    cs.LG

    Chameleon: Increasing Label-Only Membership Leakage with Adaptive Poisoning

    Authors: Harsh Chaudhari, Giorgio Severi, Alina Oprea, Jonathan Ullman

    Abstract: The integration of machine learning (ML) in numerous critical applications introduces a range of privacy concerns for individuals who provide their datasets for model training. One such privacy risk is Membership Inference (MI), in which an attacker seeks to determine whether a particular data sample was included in the training dataset of a model. Current state-of-the-art MI attacks capitalize on…

    Submitted 16 January, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: To appear at International Conference on Learning Representations (ICLR) 2024

  7. arXiv:2303.00989  [pdf, other]

    physics.ao-ph

    Role of modified cloud microphysics parameterization in coupled climate model for studying ISM rainfall: small-scale cloud model and climate model work better together

    Authors: Moumita Bhowmik, Anupam Hazra, Ankur Srivastava, Dipjyoti Mudiar, Hemantkumar S. Chaudhari, Suryachandra A. Rao, Lian-Ping Wang

    Abstract: An unresolved problem of present generation coupled climate models is the realistic distribution of rainfall over Indian monsoon region, which is also related to the persistent dry bias over Indian land mass. Therefore, quantitative prediction of the intensity of rainfall events has remained a challenge for the state-of-the-art global coupled models. Guided by the observation, it is hypothesized t…

    Submitted 2 March, 2023; originally announced March 2023.

  8. arXiv:2208.12348  [pdf, other]

    cs.LG cs.CR

    SNAP: Efficient Extraction of Private Properties with Poisoning

    Authors: Harsh Chaudhari, John Abascal, Alina Oprea, Matthew Jagielski, Florian Tramèr, Jonathan Ullman

    Abstract: Property inference attacks allow an adversary to extract global properties of the training dataset from a machine learning model. Such attacks have privacy implications for data owners sharing their datasets to train machine learning models. Several existing approaches for property inference attacks against deep neural networks have been proposed, but they all rely on the attacker training a large…

    Submitted 21 June, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

    Comments: 28 pages, 16 figures

  9. arXiv:2205.09986  [pdf, other]

    cs.CR cs.LG

    SafeNet: The Unreasonable Effectiveness of Ensembles in Private Collaborative Learning

    Authors: Harsh Chaudhari, Matthew Jagielski, Alina Oprea

    Abstract: Secure multiparty computation (MPC) has been proposed to allow multiple mutually distrustful data owners to jointly train machine learning (ML) models on their combined data. However, by design, MPC protocols faithfully compute the training functionality, which the adversarial ML community has shown to leak private information and can be tampered with in poisoning attacks. In this work, we argue t…

    Submitted 8 September, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

  10. arXiv:2110.13801  [pdf, other]

    cs.DB

    Endure: A Robust Tuning Paradigm for LSM Trees Under Workload Uncertainty

    Authors: Andy Huynh, Harshal A. Chaudhari, Evimaria Terzi, Manos Athanassoulis

    Abstract: Log-Structured Merge trees (LSM trees) are increasingly used as the storage engines behind several data systems, frequently deployed in the cloud. Similar to other database architectures, LSM trees take into account information about the expected workload (e.g., reads vs. writes, point vs. range queries) to optimize their performance via tuning. Operating in shared infrastructure like the cloud, h…

    Submitted 2 November, 2021; v1 submitted 26 October, 2021; originally announced October 2021.

    Comments: 21 pages, 30 figures

  11. arXiv:2110.03956  [pdf]

    physics.ao-ph

    Seasonal Predictability of Lightning over the Global Hotspot Regions

    Authors: Chandrima Mallick, Anupam Hazra, Subodh K. Saha, Hemantkumar S. Chaudhari, Samir Pokhrel, Mahen Konwar, Ushnanshu Dutta, Greeshma M. Mohan, K. Gayatri Vani

    Abstract: Skillful seasonal prediction of lightning is crucial over several global hotspot regions, as it causes severe damages to infrastructures and losses of human life. While major emphasis has been given for predicting rainfall, prediction of lightning in one season advance remained uncommon, owing to the nature of the problem, which is short-lived local phenomenon. Here we show that on the seasonal ti…

    Submitted 8 October, 2021; originally announced October 2021.

  12. Unraveling the Global Teleconnections of Indian Summer Monsoon Clouds: Expedition from CMIP5 to CMIP6

    Authors: Ushnanshu Dutta, Anupam Hazra, Hemantkumar S. Chaudhari, Subodh Kumar Saha, Samir Pokhrel, Utkarsh Verma

    Abstract: We have analyzed the teleconnection of total cloud fraction (TCF) with global sea surface temperature (SST) in multi-model ensembles (MME) of the fifth and sixth Coupled Model Intercomparison Projects (CMIP5 and CMIP6). CMIP6-MME has a more robust and realistic teleconnection (TCF and global SST) pattern over the extra-tropics (R ~0.43) and North Atlantic (R ~0.39) region, which in turn resulted i…

    Submitted 20 September, 2021; v1 submitted 15 September, 2021; originally announced September 2021.

    Comments: 12 pages, 4 main figures, 2 supplementary figures

  13. arXiv:2101.04521  [pdf]

    physics.ao-ph physics.geo-ph physics.soc-ph

    Examining the variability of cloud hydrometeors and its importance on the Indian summer monsoon rainfall predictability

    Authors: Ushnanshu Dutta, Anupam Hazra, Subodh Kumar Saha, Hemantkumar S. Chaudhari, Samir Pokhrel, Mahen Konwar

    Abstract: Skilful prediction of the seasonal Indian summer monsoon (ISM) rainfall (ISMR) at least one season in advance has great socio-economic value. It represents a lifeline for about a sixth of the world's population. The ISMR prediction remained a challenging problem with the sub-critical skills of the dynamical models attributable to limited understanding of the interaction among clouds, convection, a…

    Submitted 12 January, 2021; originally announced January 2021.

    Comments: 36 Pages, 14 figures

  14. arXiv:2009.02423  [pdf, other]

    cs.AI cs.IR

    A General Framework for Fairness in Multistakeholder Recommendations

    Authors: Harshal A. Chaudhari, Sangdi Lin, Ondrej Linda

    Abstract: Contemporary recommender systems act as intermediaries on multi-sided platforms serving high utility recommendations from sellers to buyers. Such systems attempt to balance the objectives of multiple stakeholders including sellers, buyers, and the platform itself. The difficulty in providing recommendations that maximize the utility for a buyer, while simultaneously representing all the sellers on…

    Submitted 4 September, 2020; originally announced September 2020.

    Comments: 7 pages, 3 figures

    ACM Class: I.2.1

  15. arXiv:2006.10904  [pdf, other]

    cs.AI cs.CY

    Learn to Earn: Enabling Coordination within a Ride Hailing Fleet

    Authors: Harshal A. Chaudhari, John W. Byers, Evimaria Terzi

    Abstract: The problem of optimizing social welfare objectives on multi sided ride hailing platforms such as Uber, Lyft, etc., is challenging, due to misalignment of objectives between drivers, passengers, and the platform itself. An ideal solution aims to minimize the response time for each hyper local passenger ride request, while simultaneously maintaining high demand satisfaction and supply utilization a…

    Submitted 16 July, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: 16 pages, 9 figures

    MSC Class: 68T05 ACM Class: I.2; K.4; J.6

  16. arXiv:1912.02631  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Trident: Efficient 4PC Framework for Privacy Preserving Machine Learning

    Authors: Harsh Chaudhari, Rahul Rachuri, Ajith Suresh

    Abstract: Machine learning has started to be deployed in fields such as healthcare and finance, which propelled the need for and growth of privacy-preserving machine learning (PPML). We propose an actively secure four-party protocol (4PC), and a framework for PPML, showcasing its applications on four of the most widely-known machine learning algorithms -- Linear Regression, Logistic Regression, Neural Netwo…

    Submitted 8 June, 2021; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: This work appeared at the 26th Annual Network and Distributed System Security Symposium (NDSS) 2020. Update: An improved version of this framework is available at arXiv:2106.02850

  17. ASTRA: High Throughput 3PC over Rings with Application to Secure Prediction

    Authors: Harsh Chaudhari, Ashish Choudhury, Arpita Patra, Ajith Suresh

    Abstract: The concrete efficiency of secure computation has been the focus of many recent works. In this work, we present concretely-efficient protocols for secure $3$-party computation (3PC) over a ring of integers modulo $2^{\ell}$ tolerating one corruption, both with semi-honest and malicious security. Owing to the fact that computation over ring emulates computation over the real-world system architectu…

    Submitted 5 December, 2019; originally announced December 2019.

    Comments: This article is the full and extended version of an article appeared in ACM CCSW 2019

  18. Unraveling the Mystery of Indian Summer Monsoon Prediction: Improved Estimate of Predictability Limit

    Authors: Subodh Kumar Saha, Anupam Hazra, Samir Pokhrel, Hemantkumar S. Chaudhari, K. Sujith, Archana Rai, Hasibur Rahaman, B. N. Goswami

    Abstract: Large socio-economic impact of the Indian Summer Monsoon (ISM) extremes motivated numerous attempts at its long range prediction over the past century. However, a rather estimated low potential predictability limit (PPL) of seasonal prediction of the ISM, contributed significantly by 'internal' interannual variability was considered insurmountable. Here we show that the 'internal' variability cont…

    Submitted 4 September, 2018; originally announced September 2018.

  19. Markov Chain Monitoring

    Authors: Harshal A. Chaudhari, Michael Mathioudakis, Evimaria Terzi

    Abstract: In networking applications, one often wishes to obtain estimates about the number of objects at different parts of the network (e.g., the number of cars at an intersection of a road network or the number of packets expected to reach a node in a computer network) by monitoring the traffic in a small number of network nodes or edges. We formalize this task by defining the 'Markov Chain Monitoring' p…

    Submitted 23 January, 2018; originally announced January 2018.

    Comments: 13 pages, 10 figures, 1 table