Search | arXiv e-print repository

Showing 1–16 of 16 results for author: D'Souza, D

  1. arXiv:2408.14960  [pdf, other]

    cs.CL cs.AI

    Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress

    Authors: Ayomide Odumakinde, Daniel D'souza, Pat Verga, Beyza Ermis, Sara Hooker

    Abstract: The use of synthetic data has played a critical role in recent state-of-art breakthroughs. However, overly relying on a single oracle teacher model to generate data has been shown to lead to model collapse and invite propagation of biases. These limitations are particularly evident in multilingual settings, where the absence of a universally effective teacher model that excels across all languages…

    Submitted 27 August, 2024; originally announced August 2024.

  2. arXiv:2402.07827  [pdf, other]

    cs.CL

    Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

    Authors: Ahmet Üstün, Viraat Aryabumi, Zheng-Xin Yong, Wei-Yin Ko, Daniel D'souza, Gbemileke Onilude, Neel Bhandari, Shivalika Singh, Hui-Lee Ooi, Amr Kayid, Freddie Vargus, Phil Blunsom, Shayne Longpre, Niklas Muennighoff, Marzieh Fadaee, Julia Kreutzer, Sara Hooker

    Abstract: Recent breakthroughs in large language models (LLMs) have centered around a handful of data-rich languages. What does it take to broaden access to breakthroughs beyond first-class citizen languages? Our work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages of which over 50% are considered as lower-resourced. Aya outperforms mT0 and BLOOM…

    Submitted 12 February, 2024; originally announced February 2024.

  3. arXiv:2306.02427  [pdf, other]

    cs.LO cs.FL cs.SC eess.SY

    Towards Efficient Controller Synthesis Techniques for Logical LTL Games

    Authors: Stanly Samuel, Deepak D'Souza, Raghavan Komondoor

    Abstract: Two-player games are a fruitful way to represent and reason about several important synthesis tasks. These tasks include controller synthesis (where one asks for a controller for a given plant such that the controlled plant satisfies a given temporal specification), program repair (setting values of variables to avoid exceptions), and synchronization synthesis (adding lock/unlock statements in mul…

    Submitted 21 August, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

  4. arXiv:2303.00586  [pdf, other]

    stat.ML cs.AI cs.CV cs.CY cs.LG

    FAIR-Ensemble: When Fairness Naturally Emerges From Deep Ensembling

    Authors: Wei-Yin Ko, Daniel D'souza, Karina Nguyen, Randall Balestriero, Sara Hooker

    Abstract: Ensembling multiple Deep Neural Networks (DNNs) is a simple and effective way to improve top-line metrics and to outperform a larger single model. In this work, we go beyond top-line metrics and instead explore the impact of ensembling on subgroup performances. Surprisingly, we observe that even with a simple homogeneous ensemble -- all the individual DNNs share the same training set, architecture…

    Submitted 20 December, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  5. arXiv:2212.08170  [pdf, other]

    cs.AI cs.LG cs.LO cs.SC

    BNSynth: Bounded Boolean Functional Synthesis

    Authors: Ravi Raja, Stanly Samuel, Chiranjib Bhattacharyya, Deepak D'Souza, Aditya Kanade

    Abstract: The automated synthesis of correct-by-construction Boolean functions from logical specifications is known as the Boolean Functional Synthesis (BFS) problem. BFS has many application areas that range from software engineering to circuit design. In this paper, we introduce a tool BNSynth, that is the first to solve the BFS problem under a given bound on the solution space. Bounding the solution spac…

    Submitted 15 December, 2022; originally announced December 2022.

    ACM Class: I.2.2; I.2.6; B.6.0

  6. arXiv:2107.13098  [pdf, other]

    cs.CV cs.LG

    A Tale Of Two Long Tails

    Authors: Daniel D'souza, Zach Nussbaum, Chirag Agarwal, Sara Hooker

    Abstract: As machine learning models are increasingly employed to assist human decision-makers, it becomes critical to communicate the uncertainty associated with these model predictions. However, the majority of work on uncertainty has focused on traditional probabilistic or ranking approaches - where the model assigns low probabilities or scores to uncertain examples. While this captures what examples are…

    Submitted 27 July, 2021; originally announced July 2021.

    Comments: Preliminary results accepted to Workshop on Uncertainty and Robustness in Deep Learning (UDL), ICML, 2021

  7. GenSys: A Scalable Fixed-point Engine for Maximal Controller Synthesis over Infinite State Spaces

    Authors: Stanly Samuel, Deepak D'Souza, Raghavan Komondoor

    Abstract: The synthesis of maximally-permissive controllers in infinite-state systems has many practical applications. Such controllers directly correspond to maximal winning strategies in logically specified infinite-state two-player games. In this paper, we introduce a tool called GenSys which is a fixed-point engine for computing maximal winning strategies for players in infinite-state safety games. A ke…

    Submitted 16 August, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

  8. arXiv:2103.11811  [pdf]

    cs.CL cs.AI

    MasakhaNER: Named Entity Recognition for African Languages

    Authors: David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, et al. (36 additional authors not shown)

    Abstract: We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders. We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER. We…

    Submitted 5 July, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: Accepted to TACL 2021, pre-MIT Press publication version

  9. arXiv:2010.02642  [pdf, other]

    cs.PL

    Static Race Detection for RTOS Applications

    Authors: Rishi Tulsyan, Rekha Pai, Deepak D'Souza

    Abstract: We present a static analysis technique for detecting data races in Real-Time Operating System (RTOS) applications. These applications are often employed in safety-critical tasks and the presence of races may lead to erroneous behaviour with serious consequences. Analyzing these applications is challenging due to the variety of non-standard synchronization mechanisms they use. We propose a techniqu…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: 18 pages. Accepted in FSTTCS 2020. This version contains detailed semantics.

  10. arXiv:2009.02775  [pdf, other]

    cs.PL

    A Thread-Local Semantics and Efficient Static Analyses for Race Free Programs

    Authors: Suvam Mukherjee, Oded Padon, Sharon Shoham, Deepak D'Souza, Noam Rinetzky

    Abstract: Data race free (DRF) programs constitute an important class of concurrent programs. In this paper we provide a framework for designing and proving the correctness of data flow analyses that target this class of programs. These analyses are in the same spirit as the "sync-CFG" analysis proposed in earlier literature. To achieve this, we first propose a novel concrete semantics for DRF programs, cal…

    Submitted 6 September, 2020; originally announced September 2020.

  11. arXiv:2008.11600  [pdf, other]

    cs.CV cs.LG

    Estimating Example Difficulty Using Variance of Gradients

    Authors: Chirag Agarwal, Daniel D'souza, Sara Hooker

    Abstract: In machine learning, a question of great interest is understanding what examples are challenging for a model to classify. Identifying atypical examples ensures the safe deployment of models, isolates samples that require further human inspection and provides interpretability into model behavior. In this work, we propose Variance of Gradients (VoG) as a valuable and efficient metric to rank data by…

    Submitted 21 June, 2022; v1 submitted 26 August, 2020; originally announced August 2020.

    Comments: Accepted to CVPR 2022

  12. arXiv:2001.10328  [pdf, other]

    cs.PL

    Verification of a Generative Separation Kernel

    Authors: Inzemamul Haque, Deepak D'Souza, Habeeb P, Arnab Kundu, Ganesh Babu

    Abstract: We present a formal verification of the functional correctness of the Muen Separation Kernel. Muen is representative of the class of modern separation kernels that leverage hardware virtualization support, and are generative in nature in that they generate a specialized kernel for each system configuration. These features pose substantial challenges to existing verification techniques. We propose…

    Submitted 14 May, 2020; v1 submitted 25 January, 2020; originally announced January 2020.

  13. arXiv:1809.04430  [pdf, other]

    cs.CV cs.LG cs.NE physics.med-ph stat.ML

    Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy

    Authors: Stanislav Nikolov, Sam Blackwell, Alexei Zverovitch, Ruheena Mendes, Michelle Livne, Jeffrey De Fauw, Yojan Patel, Clemens Meyer, Harry Askham, Bernardino Romera-Paredes, Christopher Kelly, Alan Karthikesalingam, Carlton Chu, Dawn Carnell, Cheng Boon, Derek D'Souza, Syed Ali Moinuddin, Bethany Garie, Yasmin McQuinlan, Sarah Ireland, Kiarna Hampton, Krystle Fuller, Hugh Montgomery, Geraint Rees, Mustafa Suleyman, et al. (4 additional authors not shown)

    Abstract: Over half a million individuals are diagnosed with head and neck cancer each year worldwide. Radiotherapy is an important curative treatment for this disease, but it requires manual time consuming delineation of radio-sensitive organs at risk (OARs). This planning process can delay treatment, while also introducing inter-operator variability with resulting downstream radiation dose differences. Wh…

    Submitted 13 January, 2021; v1 submitted 12 September, 2018; originally announced September 2018.

  14. arXiv:1712.09418  [pdf, ps, other]

    cs.LO cs.LG cs.PL

    Horn-ICE Learning for Synthesizing Invariants and Contracts

    Authors: Deepak D'Souza, P. Ezudheen, Pranav Garg, P. Madhusudan, Daniel Neider

    Abstract: We design learning algorithms for synthesizing invariants using Horn implication counterexamples (Horn-ICE), extending the ICE-learning model. In particular, we describe a decision-tree learning algorithm that learns from Horn-ICE samples, works in polynomial time, and uses statistical heuristics to learn small trees that satisfy the samples. Since most verification proofs can be modeled using Hor…

    Submitted 26 December, 2017; originally announced December 2017.

    MSC Class: 68Q60; 68Q32

  15. arXiv:1008.2458  [pdf, other]

    cs.SE

    A Case Study in Matching Service Descriptions to Implementations in an Existing System

    Authors: Hari S. Gupta, Deepak D'Souza, Raghavan Komondoor, Girish M. Rama

    Abstract: A number of companies are trying to migrate large monolithic software systems to Service Oriented Architectures. A common approach to do this is to first identify and describe desired services (i.e., create a model), and then to locate portions of code within the existing system that implement the described services. In this paper we describe a detailed case study we undertook to match a model to…

    Submitted 19 December, 2010; v1 submitted 14 August, 2010; originally announced August 2010.

    Comments: 20 pages, 19 pdf figures

    ACM Class: D.2.7; D.2.13; K.6.3

  16. arXiv:cs/0601096  [pdf, ps, other]

    cs.LO

    On timed automata with input-determined guards

    Authors: Deepak D'Souza, Nicolas Tabareau

    Abstract: We consider a general notion of timed automata with input-determined guards and show that they admit a robust logical framework along the lines of [D 'Souza03], in terms of a monadic second order logic characterisation and an expressively complete timed temporal logic. We then generalize these automata using the notion of recursive operators introduced by Henzinger, Raskin, and Schobbens, and sh…

    Submitted 23 January, 2006; originally announced January 2006.