Search | arXiv e-print repository

Showing 1–7 of 7 results for author: Heitkaemper, J

Searching in archive eess.
  1. arXiv:2106.02472  [pdf, other]

    cs.SD eess.AS

    A Database for Research on Detection and Enhancement of Speech Transmitted over HF links

    Authors: Jens Heitkaemper, Joerg Schmalenstroeer, Joerg Ullmann, Valentin Ion, Reinhold Haeb-Umbach

    Abstract: In this paper we present an open database for the development of detection and enhancement algorithms of speech transmitted over HF radio channels. It consists of audio samples recorded by various receivers at different locations across Europe, all monitoring the same single-sideband modulated transmission from a base station in Paderborn, Germany. Transmitted and received speech signals are preci…

    Submitted 21 July, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: Accepted to ITG 2021

  2. arXiv:2103.01599  [pdf, other]

    cs.SD eess.AS

    Open Range Pitch Tracking for Carrier Frequency Difference Estimation from HF Transmitted Speech

    Authors: Joerg Schmalenstroeer, Jens Heitkaemper, Joerg Ullmann, Reinhold Haeb-Umbach

    Abstract: In this paper we investigate the task of detecting carrier frequency differences from demodulated single sideband signals by examining the pitch contours of the received baseband speech signal in the short-time spectral domain. From the detected pitch frequency trajectory and its harmonics a carrier frequency difference, which is caused by demodulating the radio signal with the wrong carrier frequ…

    Submitted 3 March, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: Submitted to EUSIPCO 2021

  3. arXiv:2005.09913  [pdf, other]

    eess.AS cs.SD

    Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic Environments

    Authors: Jens Heitkaemper, Joerg Schmalenstroeer, Reinhold Haeb-Umbach

    Abstract: Speech activity detection (SAD), which often rests on the fact that the noise is "more" stationary than speech, is particularly challenging in non-stationary environments, because the time variance of the acoustic scene makes it difficult to discriminate speech from noise. We propose two approaches to SAD, where one is based on statistical signal processing, while the other utilizes neural network…

    Submitted 28 July, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: Accepted to Interspeech 2020

  4. arXiv:2005.04132  [pdf, other]

    eess.AS cs.SD

    Asteroid: the PyTorch-based audio source separation toolkit for researchers

    Authors: Manuel Pariente, Samuele Cornell, Joris Cosentino, Sunit Sivasankaran, Efthymios Tzinis, Jens Heitkaemper, Michel Olvera, Fabian-Robert Stöter, Mathieu Hu, Juan M. Martín-Doñas, David Ditter, Ariel Frank, Antoine Deleforge, Emmanuel Vincent

    Abstract: This paper describes Asteroid, the PyTorch-based audio source separation toolkit for researchers. Inspired by the most successful neural source separation systems, it provides all neural building blocks required to build such a system. To improve reproducibility, Kaldi-style recipes on common audio source separation datasets are also provided. This paper describes the software architecture of Aste…

    Submitted 8 May, 2020; originally announced May 2020.

    Comments: Submitted to Interspeech 2020

  5. arXiv:1911.08895  [pdf, other]

    cs.SD cs.CL eess.AS

    Demystifying TasNet: A Dissecting Approach

    Authors: Jens Heitkaemper, Darius Jakobeit, Christoph Boeddeker, Lukas Drude, Reinhold Haeb-Umbach

    Abstract: In recent years time domain speech separation has excelled over frequency domain separation in single channel scenarios and noise-free environments. In this paper we dissect the gains of the time-domain audio separation network (TasNet) approach by gradually replacing components of an utterance-level permutation invariant training (u-PIT) based separation system in the frequency domain until the T…

    Submitted 5 February, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: Accepted to ICASSP 2020

  6. arXiv:1910.13934  [pdf, other]

    cs.SD cs.CL eess.AS

    SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition

    Authors: Lukas Drude, Jens Heitkaemper, Christoph Boeddeker, Reinhold Haeb-Umbach

    Abstract: We present a multi-channel database of overlapping speech for training, evaluation, and detailed analysis of source separation and extraction algorithms: SMS-WSJ -- Spatialized Multi-Speaker Wall Street Journal. It consists of artificially mixed speech taken from the WSJ database, but unlike earlier databases we consider all WSJ0+1 utterances and take care of strictly separating the speaker sets p…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: Submitted to ICASSP 2020

  7. arXiv:1905.12230  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR

    Authors: Naoyuki Kanda, Christoph Boeddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach

    Abstract: In this paper, we present Hitachi and Paderborn University's joint effort for automatic speech recognition (ASR) in a dinner party scenario. The main challenges of ASR systems for dinner party recordings obtained by multiple microphone arrays are (1) heavy speech overlaps, (2) severe noise and reverberation, (3) very natural conversational content, and possibly (4) insufficient training data. As a…

    Submitted 26 June, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: Accepted to INTERSPEECH 2019