Computational bioacoustics with deep learning: a review and roadmap
D Stowell - PeerJ, 2022 - peerj.com
Animal vocalisations and natural soundscapes are fascinating objects of study, and contain
valuable evidence about animal behaviours, populations and ecosystems. They are studied …
Conv-tasnet: Surpassing ideal time–frequency magnitude masking for speech separation
Y Luo, N Mesgarani - IEEE/ACM transactions on audio, speech …, 2019 - ieeexplore.ieee.org
Single-channel, speaker-independent speech separation methods have recently seen great
progress. However, the accuracy, latency, and computational cost of such methods remain …
Wave-u-net: A multi-scale neural network for end-to-end audio source separation
Models for audio source separation usually operate on the magnitude spectrum, which
ignores phase information and makes separation performance dependant on hyper …
Spleeter: a fast and efficient music source separation tool with pre-trained models
R Hennequin, A Khlif, F Voituret… - Journal of Open Source …, 2020 - joss.theoj.org
We present and release a new tool for music source separation with pre-trained models
called Spleeter. Spleeter was designed with ease of use, separation performance, and …
An overview of lead and accompaniment separation in music
Popular music is often composed of an accompaniment and a lead component, the latter
typically consisting of vocals. Filtering such mixtures to extract one or both components has …
Hybrid spectrogram and waveform source separation
A Défossez - arXiv preprint arXiv:2111.03600, 2021 - arxiv.org
Source separation models either work on the spectrogram or waveform domain. In this work,
we show how to perform end-to-end hybrid source separation, letting the model decide …
Phasen: A phase-and-harmonics-aware speech enhancement network
Time-frequency (TF) domain masking is a mainstream approach for single-channel speech
enhancement. Recently, focuses have been put to phase prediction in addition to amplitude …
Phase-aware speech enhancement with deep complex u-net
Most deep learning-based models for speech enhancement have mainly focused on
estimating the magnitude of spectrogram while reusing the phase from noisy speech for …
Music source separation in the waveform domain
Source separation for music is the task of isolating contributions, or stems, from different
instruments recorded individually and arranged together to form a song. Such components …
Music source separation with band-split RNN
The performance of music source separation (MSS) models has been greatly improved in
recent years thanks to the development of novel neural network architectures and training …