Search | arXiv e-print repository

Showing 1–11 of 11 results for author: Swietojanski, P

Searching in archive cs.
  1. arXiv:2304.08862  [pdf, other]

    cs.CL eess.AS

    Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition

    Authors: Maurits Bleeker, Pawel Swietojanski, Stefan Braun, Xiaodan Zhuang

    Abstract: This paper presents an extension to train end-to-end Context-Aware Transformer Transducer (CATT) models by using a simple, yet efficient method of mining hard negative phrases from the latent space of the context encoder. During training, given a reference query, we mine a number of similar phrases using approximate nearest neighbour search. These sampled phrases are then used as negative exampl…

    Submitted 16 August, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: Accepted to Interspeech 2023. 5 pages, 2 figures, 2 tables

  2. arXiv:2211.01438  [pdf, other]

    eess.AS cs.CL cs.SD

    Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

    Authors: Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang

    Abstract: This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, where the same attention mask is applied at every frame, with chunked masking, where the attention mask for each frame is determined by chunk boundaries,…

    Submitted 18 April, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: To appear in ICASSP 2023

    Journal ref: International Conference on Acoustics, Speech, and Signal Processing, 2023

  3. arXiv:2210.12214  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation

    Authors: Thien Nguyen, Nathalie Tran, Liuhui Deng, Thiago Fraga da Silva, Matthew Radzihovsky, Roger Hsiao, Henry Mason, Stefan Braun, Erik McDermott, Dogan Can, Pawel Swietojanski, Lyan Verwimp, Sibel Oyman, Tresi Arvizo, Honza Silovsky, Arnab Ghoshal, Mathieu Martel, Bharat Ram Ambati, Mohamed Ali

    Abstract: Code-switching describes the practice of using more than one language in the same sentence. In this study, we investigate how to optimize a neural transducer based bilingual automatic speech recognition (ASR) model for code-switching speech. Focusing on the scenario where the ASR model is trained without supervised code-switching data, we found that semi-supervised training and synthetic code-swit…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: 5 pages, 1 figure, submitted to ICASSP 2023, *: equal contributions

  4. arXiv:2011.13205  [pdf, other]

    cs.CL cs.LG

    SLURP: A Spoken Language Understanding Resource Package

    Authors: Emanuele Bastianelli, Andrea Vanzo, Pawel Swietojanski, Verena Rieser

    Abstract: Spoken Language Understanding infers semantic meaning directly from audio data, and thus promises to reduce error propagation and misunderstandings in end-user applications. However, publicly available SLU resources are limited. In this paper, we release SLURP, a new SLU package containing the following: (1) A new challenging dataset in English spanning 18 domains, which is substantially bigger an…

    Submitted 26 November, 2020; originally announced November 2020.

    Comments: Published at the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP-2020)

  5. arXiv:2008.06580  [pdf, other]

    eess.AS cs.CL cs.SD

    Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview

    Authors: Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals, Pawel Swietojanski

    Abstract: We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation. The overview characterizes adaptation algorithms as based on embeddings, model parameter adaptation, or data au…

    Submitted 28 February, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

    Comments: Total of 31 pages, 27 figures. Associated repository: https://github.com/pswietojanski/ojsp_adaptation_review_2020

    Journal ref: IEEE Open Journal of Signal Processing, vol. 2, pp. 33-66, 2021

  6. arXiv:2005.01322  [pdf, other]

    cs.HC

    Building Proactive Voice Assistants: When and How (not) to Interact

    Authors: O. Miksik, I. Munasinghe, J. Asensio-Cubero, S. Reddy Bethi, S-T. Huang, S. Zylfo, X. Liu, T. Nica, A. Mitrocsak, S. Mezza, R. Beard, R. Shi, R. Ng, P. Mediano, Z. Fountas, S-H. Lee, J. Medvesek, H. Zhuang, Y. Rogers, P. Swietojanski

    Abstract: Voice assistants have recently achieved remarkable commercial success. However, the current generation of these devices is typically capable of only reactive interactions. In other words, interactions have to be initiated by the user, which somewhat limits their usability and user experience. We propose that the next generation of such devices should be able to proactively provide the right infor…

    Submitted 4 May, 2020; originally announced May 2020.

    Comments: 17 pages, technical report

  7. arXiv:2001.09239  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Multi-task self-supervised learning for Robust Speech Recognition

    Authors: Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio

    Abstract: Despite the growing interest in unsupervised learning, extracting meaningful knowledge from unlabelled audio remains an open challenge. To take a step in this direction, we recently proposed a problem-agnostic speech encoder (PASE), that combines a convolutional encoder followed by multiple neural networks, called workers, tasked to solve self-supervised problems (i.e., ones that do not require ma…

    Submitted 17 April, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

    Comments: In Proc. of ICASSP 2020

  8. arXiv:1904.00202  [pdf, other]

    cs.SD eess.AS

    Static Visual Spatial Priors for DoA Estimation

    Authors: Pawel Swietojanski, Ondrej Miksik

    Abstract: As we interact with the world, for example when we communicate with our colleagues in a large open space or meeting room, we continuously analyse the surrounding environment and, in particular, localise and recognise acoustic events. While we largely take such abilities for granted, they represent a challenging problem for current robots or smart voice assistants as they can be easily fooled by hi…

    Submitted 30 March, 2019; originally announced April 2019.

    Comments: 6 pages, 6 figures, 3 tables

  9. arXiv:1903.05566  [pdf, ps, other]

    cs.CL cs.LG

    Benchmarking Natural Language Understanding Services for building Conversational Agents

    Authors: Xingkun Liu, Arash Eshghi, Pawel Swietojanski, Verena Rieser

    Abstract: We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer. In this paper, we present the first wide coverage evaluation and comparison of some of the most popular NLU services, on a l…

    Submitted 26 March, 2019; v1 submitted 13 March, 2019; originally announced March 2019.

    Comments: Accepted by IWSDS2019

  10. Differentiable Pooling for Unsupervised Acoustic Model Adaptation

    Authors: Pawel Swietojanski, Steve Renals

    Abstract: We present a deep neural network (DNN) acoustic model that includes parametrised and differentiable pooling operators. Unsupervised acoustic model adaptation is cast as the problem of updating the decision boundaries implemented by each pooling operator. In particular, we experiment with two types of pooling parametrisations: learned $L_p$-norm pooling and weighted Gaussian pooling, in which the w…

    Submitted 13 July, 2016; v1 submitted 31 March, 2016; originally announced March 2016.

    Comments: 11 pages, 7 Tables, 7 Figures in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, num. 11, 2016

  11. Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation

    Authors: Pawel Swietojanski, Jinyu Li, Steve Renals

    Abstract: This work presents a broad study on the adaptation of neural network acoustic models by means of learning hidden unit contributions (LHUC) -- a method that linearly re-combines hidden units in a speaker- or environment-dependent manner using small amounts of unsupervised adaptation data. We also extend LHUC to a speaker adaptive training (SAT) framework that leads to a more adaptable DNN acoustic…

    Submitted 13 July, 2016; v1 submitted 12 January, 2016; originally announced January 2016.

    Comments: 14 pages, 9 Tables, 11 Figures in IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 24, Num. 8, 2016
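Results 10 and 11 both adapt a trained acoustic model by learning a small number of extra per-unit parameters from adaptation data. The following is a minimal pure-Python sketch of the two per-unit operations, not the authors' code. It assumes the LHUC amplitude is parameterised as 2·sigmoid(r_j), which keeps each scale in (0, 2), and that L_p-norm pooling takes the normalised form (mean over k of |x_k|^p)^(1/p). The function names lhuc_scale and lp_pool are hypothetical.

```python
import math
from typing import List, Sequence


def lhuc_scale(hidden: Sequence[float], r: Sequence[float]) -> List[float]:
    """LHUC: multiply each hidden unit output h_j by an amplitude 2*sigmoid(r_j).

    The amplitude lies in (0, 2); r_j = 0 gives exactly 1, i.e. the
    unadapted speaker-independent output for that unit.
    """
    if len(hidden) != len(r):
        raise ValueError("need one LHUC parameter per hidden unit")
    return [h * 2.0 / (1.0 + math.exp(-rj)) for h, rj in zip(hidden, r)]


def lp_pool(group: Sequence[float], p: float) -> float:
    """Normalised L_p-norm pooling over one group of K hidden units (assumed form).

    p = 1 gives the mean absolute value; as p grows the result
    approaches max_k |x_k|.
    """
    if p < 1.0:
        raise ValueError("p must be >= 1 for a norm")
    k = len(group)
    return (sum(abs(x) ** p for x in group) / k) ** (1.0 / p)


if __name__ == "__main__":
    h = [0.5, -1.0, 2.0]
    print(lhuc_scale(h, [0.0, 0.0, 0.0]))  # identity: all amplitudes equal 1
    print(lhuc_scale(h, [-2.0, 0.5, 3.0]))  # speaker-dependent rescaling
    print(lp_pool([3.0, 4.0], 2.0))         # root mean square of the group
```

With all r_j = 0 the network starts from the speaker-independent model; during adaptation only the r vector (or, for pooling, the per-unit p) would be updated by backpropagation while the remaining weights stay fixed.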
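Result 2 contrasts fixed attention masking, where every frame gets the same window, with chunked masking, where each frame's window is set by chunk boundaries. A minimal sketch, assuming a mask is a boolean matrix in which mask[i][j] says whether frame i may attend to frame j; the window parameters left, right, chunk and left_chunks are hypothetical and not taken from the paper.

```python
from typing import List

Mask = List[List[bool]]


def fixed_mask(n: int, left: int, right: int) -> Mask:
    """Fixed masking: frame i attends to frames i-left .. i+right, same window for every frame."""
    return [[i - left <= j <= i + right for j in range(n)] for i in range(n)]


def chunked_mask(n: int, chunk: int, left_chunks: int = -1) -> Mask:
    """Chunked masking: frame i attends to every frame up to the end of its own chunk.

    left_chunks limits how many previous chunks are visible;
    a negative value means unlimited history.
    """
    mask: Mask = []
    for i in range(n):
        c = i // chunk                      # index of the chunk containing frame i
        end = min(n, (c + 1) * chunk)       # right edge is the chunk boundary
        start = 0 if left_chunks < 0 else max(0, (c - left_chunks) * chunk)
        mask.append([start <= j < end for j in range(n)])
    return mask


if __name__ == "__main__":
    for row in chunked_mask(4, 2):
        print("".join("x" if allowed else "." for allowed in row))
```

In the chunked mask the first frame of a chunk can look ahead to the chunk boundary while the last frame sees no future frames, so look-ahead varies with position; the fixed mask gives every frame the same window.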