Search | arXiv e-print repository

Showing 1–4 of 4 results for author: Van Segbroeck, M

Searching in archive eess.
  1. arXiv:2305.13794  [pdf, other]

    cs.CL eess.AS

    Personalized Predictive ASR for Latency Reduction in Voice Assistants

    Authors: Andreas Schwarz, Di He, Maarten Van Segbroeck, Mohammed Hethnawi, Ariya Rastrow

    Abstract: Streaming Automatic Speech Recognition (ASR) in voice assistants can utilize prefetching to partially hide the latency of response generation. Prefetching involves passing a preliminary ASR hypothesis to downstream systems in order to prefetch and cache a response. If the final ASR hypothesis after endpoint detection matches the preliminary one, the cached response can be delivered to the user, th…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted for Interspeech 2023

  2. arXiv:2007.13802  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition

    Authors: Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas

    Abstract: In this work, we propose a novel and efficient minimum word error rate (MWER) training method for RNN-Transducer (RNN-T). Unlike previous work on this topic, which performs on-the-fly limited-size beam-search decoding and generates alignment scores for expected edit-distance computation, in our proposed method, we re-calculate and sum scores of all the possible alignments for each hypothesis in N-…

    Submitted 27 July, 2020; originally announced July 2020.

    Comments: Accepted to Interspeech 2020

  3. arXiv:2007.00131  [pdf, other]

    eess.AS cs.CL cs.SD

    Multi-view Frequency LSTM: An Efficient Frontend for Automatic Speech Recognition

    Authors: Maarten Van Segbroeck, Harish Mallidih, Brian King, I-Fan Chen, Gurpreet Chadha, Roland Maas

    Abstract: Acoustic models in real-time speech recognition systems typically stack multiple unidirectional LSTM layers to process the acoustic frames over time. Performance improvements over vanilla LSTM architectures have been reported by prepending a stack of frequency-LSTM (FLSTM) layers to the time LSTM. These FLSTM layers can learn a more robust input feature to the time LSTM layers by modeling time-fre…

    Submitted 30 June, 2020; originally announced July 2020.

  4. arXiv:1909.13447  [pdf]

    eess.AS cs.CL cs.SD

    DiPCo -- Dinner Party Corpus

    Authors: Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas

    Abstract: We present a speech data corpus that simulates a "dinner party" scenario taking place in an everyday home environment. The corpus was created by recording multiple groups of four Amazon employee volunteers having a natural conversation in English around a dining table. The participants were recorded by a single-channel close-talk microphone and by five far-field 7-microphone array devices position…

    Submitted 30 September, 2019; originally announced September 2019.