Search | arXiv e-print repository

Showing 1–20 of 20 results for author: Ermis, B

  1. arXiv:2406.18682  [pdf, other]

    cs.CL cs.AI cs.LG

    The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm

    Authors: Aakanksha, Arash Ahmadian, Beyza Ermis, Seraphina Goldfarb-Tarrant, Julia Kreutzer, Marzieh Fadaee, Sara Hooker

    Abstract: A key concern with the concept of "alignment" is the implicit question of "alignment to what?". AI systems are increasingly used across the world, yet safety alignment is often focused on homogeneous monolingual settings. Additionally, preference training and safety measures often overfit to harms common in Western-centric datasets. Here, we explore the viability of different alignment approaches…

    Submitted 8 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2403.03893  [pdf, other]

    cs.CL cs.AI

    From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models

    Authors: Luiza Pozzobon, Patrick Lewis, Sara Hooker, Beyza Ermis

    Abstract: To date, toxicity mitigation in language models has almost entirely been focused on single-language settings. As language models embrace multilingual capabilities, it's crucial our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient anno…

    Submitted 30 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  3. arXiv:2402.17400  [pdf, other]

    cs.CL

    Investigating Continual Pretraining in Large Language Models: Insights and Implications

    Authors: Çağatay Yıldız, Nishaanth Kanna Ravichandran, Prishruit Punia, Matthias Bethge, Beyza Ermis

    Abstract: This paper studies the evolving domain of Continual Learning (CL) in large language models (LLMs), with a focus on developing strategies for efficient and sustainable training. Our primary emphasis is on continual domain-adaptive pretraining, a process designed to equip LLMs with the ability to integrate new information from various domains while retaining previously learned knowledge and enhancin…

    Submitted 27 February, 2024; originally announced February 2024.

  4. arXiv:2311.17295  [pdf, other]

    cs.CL cs.AI

    Elo Uncovered: Robustness and Best Practices in Language Model Evaluation

    Authors: Meriem Boubdir, Edward Kim, Beyza Ermis, Sara Hooker, Marzieh Fadaee

    Abstract: In Natural Language Processing (NLP), the Elo rating system, originally designed for ranking players in dynamic games such as chess, is increasingly being used to evaluate Large Language Models (LLMs) through "A vs B" paired comparisons. However, while popular, the system's suitability for assessing entities with constant skill levels, such as LLMs, remains relatively unexplored. We study two fund…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 22 pages, 7 figures, 2 tables. Revised version of the paper accepted at GEM Workshop, EMNLP 2023

  5. arXiv:2310.14424  [pdf, other]

    cs.CL cs.AI

    Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation

    Authors: Meriem Boubdir, Edward Kim, Beyza Ermis, Marzieh Fadaee, Sara Hooker

    Abstract: Human evaluation is increasingly critical for assessing large language models, capturing linguistic nuances, and reflecting user preferences more accurately than traditional automated metrics. However, the resource-intensive nature of this type of annotation process poses significant challenges. The key question driving our work: "is it feasible to minimize human-in-the-loop feedback by prioritizi…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: 37 pages, 8 figures

  6. arXiv:2310.07589  [pdf, other]

    cs.AI

    Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models

    Authors: Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker

    Abstract: Considerable effort has been dedicated to mitigating toxicity, but existing methods often require drastic modifications to model parameters or the use of computationally intensive auxiliary models. Furthermore, previous approaches have often neglected the crucial factor of language's evolving nature over time. In this work, we present a comprehensive perspective on toxicity mitigation that takes i…

    Submitted 11 October, 2023; originally announced October 2023.

  7. arXiv:2309.05444  [pdf, other]

    cs.CL cs.LG

    Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

    Authors: Ted Zadouri, Ahmet Üstün, Arash Ahmadian, Beyza Ermiş, Acyr Locatelli, Sara Hooker

    Abstract: The Mixture of Experts (MoE) is a widely known neural architecture where an ensemble of specialized sub-models optimizes overall performance with a constant computational cost. However, conventional MoEs pose challenges at scale due to the need to store all experts in memory. In this paper, we push MoE to the limit. We propose extremely parameter-efficient MoE by uniquely combining MoE architectur…

    Submitted 11 September, 2023; originally announced September 2023.

  8. arXiv:2304.12397  [pdf, other]

    cs.CL cs.AI

    On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research

    Authors: Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker

    Abstract: Perception of toxicity evolves over time and often differs between geographies and cultural backgrounds. Similarly, black-box commercially available APIs for detecting toxicity, such as the Perspective API, are not static, but frequently retrained to address any unattended weaknesses and biases. We evaluate the implications of these changes on the reproducibility of findings that compare the relat…

    Submitted 24 April, 2023; originally announced April 2023.

  9. arXiv:2207.06940  [pdf, other]

    cs.LG stat.ML

    PASHA: Efficient HPO and NAS with Progressive Resource Allocation

    Authors: Ondrej Bohdal, Lukas Balles, Martin Wistuba, Beyza Ermis, Cédric Archambeau, Giovanni Zappella

    Abstract: Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO or NAS rapidly becomes prohibitively expensive for practitioners, even when efficient multi-fidelity methods are employed. We propose an approach t…

    Submitted 8 March, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: Accepted at ICLR 2023

  10. arXiv:2206.14085  [pdf, other]

    cs.LG cs.CV

    Continual Learning with Transformers for Image Classification

    Authors: Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cedric Archambeau

    Abstract: In many real-world scenarios, data to train machine learning models become available over time. However, neural network models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is known as catastrophic forgetting and it is often difficult to prevent due to practical constraints, such as the amount of data that can be stored or the limit…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: Appeared in CVPR CLVision workshop. arXiv admin note: substantial text overlap with arXiv:2203.04640

  11. arXiv:2203.04640  [pdf, other]

    cs.CL cs.AI stat.ML

    Memory Efficient Continual Learning with Transformers

    Authors: Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cedric Archambeau

    Abstract: In many real-world scenarios, data to train machine learning models becomes available over time. Unfortunately, these models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is known as catastrophic forgetting and it is difficult to prevent due to practical constraints. For instance, the amount of data that can be stored or the computa…

    Submitted 13 January, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: This paper was published at NeurIPS 2022

  12. arXiv:2004.13106  [pdf, other]

    cs.LG stat.ML

    Learning to Rank in the Position Based Model with Bandit Feedback

    Authors: Beyza Ermis, Patrick Ernst, Yannik Stein, Giovanni Zappella

    Abstract: Personalization is a crucial aspect of many online experiences. In particular, content ranking is often a key component in delivering sophisticated personalization results. Commonly, supervised learning-to-rank methods are applied, which suffer from bias introduced during data collection by production systems in charge of producing the ranking. To compensate for this problem, we leverage contextua…

    Submitted 27 April, 2020; originally announced April 2020.

  13. arXiv:1807.02089  [pdf, other]

    stat.ML cs.LG

    Linear Bandits with Stochastic Delayed Feedback

    Authors: Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner

    Abstract: Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation. One of the main challenges faced by practitioners hoping to apply existing algorithms is that usually the feedback is randomly delayed and delays are only partially observable. For example, while a purchase…

    Submitted 2 March, 2020; v1 submitted 5 July, 2018; originally announced July 2018.

  14. arXiv:1712.02629  [pdf, ps, other]

    stat.ML cs.LG

    Differentially Private Variational Dropout

    Authors: Beyza Ermis, Ali Taylan Cemgil

    Abstract: Deep neural networks with their large number of parameters are highly flexible learning systems. The high flexibility in such networks brings with some serious problems such as overfitting, and regularization is used to address this problem. A currently popular and effective regularization technique for controlling the overfitting is dropout. Often, large data collections required for neural netwo…

    Submitted 16 December, 2017; v1 submitted 30 November, 2017; originally announced December 2017.

    Comments: arXiv admin note: substantial text overlap with arXiv:1712.01665

  15. arXiv:1712.01665  [pdf, ps, other]

    stat.ML cs.LG

    Differentially Private Dropout

    Authors: Beyza Ermis, Ali Taylan Cemgil

    Abstract: Large data collections required for the training of neural networks often contain sensitive information such as the medical histories of patients, and the privacy of the training data must be preserved. In this paper, we introduce a dropout technique that provides an elegant Bayesian interpretation to dropout, and show that the intrinsic noise added, with the primary goal of regularization, can be…

    Submitted 30 November, 2017; originally announced December 2017.

    Comments: arXiv admin note: text overlap with arXiv:1611.00340 by other authors

  16. arXiv:1507.05016  [pdf, ps, other]

    stat.ML

    Incremental Variational Inference for Latent Dirichlet Allocation

    Authors: Cedric Archambeau, Beyza Ermis

    Abstract: We introduce incremental variational inference and apply it to latent Dirichlet allocation (LDA). Incremental variational inference is inspired by incremental EM and provides an alternative to stochastic variational inference. Incremental LDA can process massive document collections, does not require to set a learning rate, converges faster to a local optimum of the variational bound and enjoys th…

    Submitted 22 July, 2015; v1 submitted 17 July, 2015; originally announced July 2015.

  17. arXiv:1409.8276  [pdf, other]

    cs.LG math.NA stat.ML

    A Bayesian Tensor Factorization Model via Variational Inference for Link Prediction

    Authors: Beyza Ermis, A. Taylan Cemgil

    Abstract: Probabilistic approaches for tensor factorization aim to extract meaningful structure from incomplete data by postulating low rank constraints. Recently, variational Bayesian (VB) inference techniques have successfully been applied to large scale models. This paper presents full Bayesian inference via VB on both single and coupled tensor factorization models. Our method can be run even for very la…

    Submitted 29 September, 2014; originally announced September 2014.

    Comments: arXiv admin note: substantial text overlap with arXiv:1409.8083

  18. arXiv:1409.8083  [pdf, other]

    stat.CO math.NA

    Variational Inference For Probabilistic Latent Tensor Factorization with KL Divergence

    Authors: Beyza Ermis, Y. Kenan Yılmaz, A. Taylan Cemgil, Evrim Acar

    Abstract: Probabilistic Latent Tensor Factorization (PLTF) is a recently proposed probabilistic framework for modelling multi-way data. Not only the common tensor factorization models but also any arbitrary tensor factorization structure can be realized by the PLTF framework. This paper presents full Bayesian inference via variational Bayes that facilitates more powerful modelling and allows more sophistica…

    Submitted 29 September, 2014; originally announced September 2014.

  19. arXiv:1208.6231  [pdf, other]

    cs.LG

    Link Prediction via Generalized Coupled Tensor Factorisation

    Authors: Beyza Ermiş, Evrim Acar, A. Taylan Cemgil

    Abstract: This study deals with the missing link prediction problem: the problem of predicting the existence of missing connections between entities of interest. We address link prediction using coupled analysis of relational datasets represented as heterogeneous data, i.e., datasets in the form of matrices and higher-order tensors. We propose to use an approach based on probabilistic interpretation of tens…

    Submitted 30 August, 2012; originally announced August 2012.

  20. Distributed Detection in Sensor Networks with Limited Range Sensors

    Authors: Erhan B. Ermis, Venkatesh Saligrama

    Abstract: We consider a multi-object detection problem over a sensor network (SNET) with limited range sensors. This problem complements the widely considered decentralized detection problem where all sensors observe the same object. While the necessity for global collaboration is clear in the decentralized detection problem, the benefits of collaboration with limited range sensors is unclear and has not…

    Submitted 18 March, 2008; v1 submitted 26 January, 2007; originally announced January 2007.

    Comments: Submitted to IEEE Transactions on Signal Processing