Computational bioacoustics with deep learning: a review and roadmap

D Stowell - PeerJ, 2022 - peerj.com
Animal vocalisations and natural soundscapes are fascinating objects of study, and contain
valuable evidence about animal behaviours, populations and ecosystems. They are studied …

Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation

Y Luo, N Mesgarani - IEEE/ACM transactions on audio, speech …, 2019 - ieeexplore.ieee.org
Single-channel, speaker-independent speech separation methods have recently seen great
progress. However, the accuracy, latency, and computational cost of such methods remain …

Wave-U-Net: A multi-scale neural network for end-to-end audio source separation

D Stoller, S Ewert, S Dixon - arXiv preprint arXiv:1806.03185, 2018 - arxiv.org
Models for audio source separation usually operate on the magnitude spectrum, which
ignores phase information and makes separation performance dependant on hyper …

Spleeter: a fast and efficient music source separation tool with pre-trained models

R Hennequin, A Khlif, F Voituret… - Journal of Open Source …, 2020 - joss.theoj.org
We present and release a new tool for music source separation with pre-trained models
called Spleeter. Spleeter was designed with ease of use, separation performance, and …

An overview of lead and accompaniment separation in music

Z Rafii, A Liutkus, FR Stöter, SI Mimilakis… - … on Audio, Speech …, 2018 - ieeexplore.ieee.org
Popular music is often composed of an accompaniment and a lead component, the latter
typically consisting of vocals. Filtering such mixtures to extract one or both components has …

Hybrid spectrogram and waveform source separation

A Défossez - arXiv preprint arXiv:2111.03600, 2021 - arxiv.org
Source separation models either work on the spectrogram or waveform domain. In this work,
we show how to perform end-to-end hybrid source separation, letting the model decide …

PHASEN: A phase-and-harmonics-aware speech enhancement network

D Yin, C Luo, Z Xiong, W Zeng - Proceedings of the AAAI Conference on …, 2020 - aaai.org
Time-frequency (TF) domain masking is a mainstream approach for single-channel speech
enhancement. Recently, focuses have been put to phase prediction in addition to amplitude …

Phase-aware speech enhancement with deep complex U-Net

HS Choi, JH Kim, J Huh, A Kim, JW Ha… - … Conference on Learning …, 2018 - openreview.net
Most deep learning-based models for speech enhancement have mainly focused on
estimating the magnitude of spectrogram while reusing the phase from noisy speech for …

Music source separation in the waveform domain

A Défossez, N Usunier, L Bottou, F Bach - arXiv preprint arXiv:1911.13254, 2019 - arxiv.org
Source separation for music is the task of isolating contributions, or stems, from different
instruments recorded individually and arranged together to form a song. Such components …

Music source separation with band-split RNN

Y Luo, J Yu - IEEE/ACM Transactions on Audio, Speech, and …, 2023 - ieeexplore.ieee.org
The performance of music source separation (MSS) models has been greatly improved in
recent years thanks to the development of novel neural network architectures and training …