Search | arXiv e-print repository

Showing 1–9 of 9 results for author: Hoffmeister, B

Searching in archive eess.
  1. arXiv:2401.02417  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition

    Authors: David M. Chan, Shalini Ghosh, Hitesh Tulsiani, Ariya Rastrow, Björn Hoffmeister

    Abstract: While word error rates of automatic speech recognition (ASR) systems have consistently fallen, natural language understanding (NLU) applications built on top of ASR systems still attribute significant numbers of failures to low-quality speech recognition results. Existing assistant systems collect large numbers of these unsuccessful interactions, but these systems usually fail to learn from these…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: To appear in ICASSP 2024

  2. arXiv:2301.02736  [pdf, other]

    eess.AS cs.LG cs.SD

    Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition

    Authors: David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister

    Abstract: Despite improvements to the generalization performance of automated speech recognition (ASR) models, specializing ASR models for downstream tasks remains a challenging task, primarily due to reduced data availability (necessitating increased data collection), and rapidly shifting data distributions (requiring more frequent model fine-tuning). In this work, we investigate the potential of leveragin…

    Submitted 6 January, 2023; originally announced January 2023.

  3. arXiv:2110.09890  [pdf, other]

    eess.AS cs.LG cs.SD

    Multi-Modal Pre-Training for Automated Speech Recognition

    Authors: David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister

    Abstract: Traditionally, research in automated speech recognition has focused on local-first encoding of audio representations to predict the spoken phonemes in an utterance. Unfortunately, approaches relying on such hyper-local information tend to be vulnerable to both local-level corruption (such as audio-frame drops, or loud noises) and global-level noise (such as environmental noise, or background noise…

    Submitted 15 September, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: Presented at ICASSP 2022

  4. arXiv:1909.13447  [pdf]

    eess.AS cs.CL cs.SD

    DiPCo -- Dinner Party Corpus

    Authors: Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas

    Abstract: We present a speech data corpus that simulates a "dinner party" scenario taking place in an everyday home environment. The corpus was created by recording multiple groups of four Amazon employee volunteers having a natural conversation in English around a dining table. The participants were recorded by a single-channel close-talk microphone and by five far-field 7-microphone array devices position…

    Submitted 30 September, 2019; originally announced September 2019.

  5. Multi-Geometry Spatial Acoustic Modeling for Distant Speech Recognition

    Authors: Kenichi Kumatani, Minhua Wu, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister

    Abstract: The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array geometry mismatch between design and test conditions. Moreover, such speech enhancement techniques do not always yield ASR accuracy improvement due to the differenc…

    Submitted 28 April, 2019; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: ICASSP 2019, 5 pages. arXiv admin note: substantial text overlap with arXiv:1903.05299

    Report number: https://doi.org/10.1109/ICASSP.2019.8682294

    Journal ref: Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019, pages 6635-6639

  6. Frequency Domain Multi-channel Acoustic Modeling for Distant Speech Recognition

    Authors: Minhua Wu, Kenichi Kumatani, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister

    Abstract: Conventional far-field automatic speech recognition (ASR) systems typically employ microphone array techniques for speech enhancement in order to improve robustness against noise or reverberation. However, such speech enhancement techniques do not always yield ASR accuracy improvement because the optimization criterion for speech enhancement is not directly relevant to the ASR objective. In this w…

    Submitted 28 April, 2019; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: ICASSP 2019, 5 pages

    Report number: https://doi.org/10.1109/ICASSP.2019.8682977

    Journal ref: Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019, pages 6640-6644

  7. arXiv:1902.02383  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    End-to-end Anchored Speech Recognition

    Authors: Yiming Wang, Xing Fan, I-Fan Chen, Yuzong Liu, Tongfei Chen, Björn Hoffmeister

    Abstract: Voice-controlled household devices, like Amazon Echo or Google Home, face the problem of performing speech recognition of device-directed speech in the presence of interfering background speech, i.e., background noise and interfering speech from another person or media device in proximity need to be ignored. We propose two end-to-end models to tackle this problem with information extracted from t…

    Submitted 6 February, 2019; originally announced February 2019.

    Comments: Accepted by ICASSP 2019

  8. arXiv:1901.02348  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Improving noise robustness of automatic speech recognition via parallel data and teacher-student learning

    Authors: Ladislav Mošner, Minhua Wu, Anirudh Raju, Sree Hari Krishnan Parthasarathi, Kenichi Kumatani, Shiva Sundaram, Roland Maas, Björn Hoffmeister

    Abstract: For real-world speech recognition applications, noise robustness is still a challenge. In this work, we adopt the teacher-student (T/S) learning technique using a parallel clean and noisy corpus for improving automatic speech recognition (ASR) performance under multimedia noise. On top of that, we apply a logits selection method which only preserves the k highest values to prevent wrong emphasis o…

    Submitted 15 March, 2019; v1 submitted 5 January, 2019; originally announced January 2019.

    Comments: To Appear in ICASSP 2019

  9. arXiv:1808.02504  [pdf, other]

    cs.CL eess.AS

    Device-directed Utterance Detection

    Authors: Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Björn Hoffmeister

    Abstract: In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants. Applications include rejection of false wake-ups or unintended interactions as well as enabling wake-word free follow-up queries. Consider the example interaction: "Computer, play music", "Computer, reduce the volume". In this interaction,…

    Submitted 7 August, 2018; originally announced August 2018.

    Comments: Interspeech 2018 (accepted)