Search | arXiv e-print repository

doi 10.5334/tismir.171

The Sound Demixing Challenge 2023 – Music Demixing Track

Authors: Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, et al. (2 additional authors not shown)

Abstract: This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding. We describe the methods that achieved the highest scores in the competition. Moreover, we present a direct comparison with the previous edition of the challenge (the Music Demixing Challenge 2021): the best-performing system achieved an improvement of over 1.6 dB in signal-to-distortion ratio over the winner of the previous competition, when evaluated on MDXDB21. Besides relying on the signal-to-distortion ratio as an objective metric, we also performed a listening test with renowned producers and musicians to study the perceptual quality of the systems, and we report the results here. Finally, we provide our insights into the organization of the competition and our prospects for future editions.
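The comparison above is expressed in signal-to-distortion ratio (SDR). The sketch below computes SDR for one source, assuming the simple energy-ratio form 10·log10(‖s‖² / ‖s − ŝ‖²) on time-domain samples rather than the projection-based BSS Eval definition. The function name `sdr_db` and the `eps` guard value are our own illustrative choices, not taken from the paper.

```python
import math
from typing import Sequence


def sdr_db(reference: Sequence[float], estimate: Sequence[float], eps: float = 1e-7) -> float:
    """Energy-ratio SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    `eps` (an assumed value) avoids division by zero / log(0) for a
    silent reference or a perfect estimate.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
```

Because SDR is logarithmic, a gain of over 1.6 dB at fixed reference energy means the error energy shrank by a factor of more than 10^0.16 ≈ 1.45.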

Submitted 19 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: Published in Transactions of the International Society for Music Information Retrieval (https://transactions.ismir.net/articles/10.5334/tismir.171)

Journal ref: Transactions of the International Society for Music Information Retrieval, 7(1), pp.63-84, 2024

arXiv:2211.04346 [pdf, other]

Cross-Attention is all you need: Real-Time Streaming Transformers for Personalised Speech Enhancement

Authors: Shucong Zhang, Malcolm Chadwick, Alberto Gil C. P. Ramos, Sourav Bhattacharya

Abstract: Personalised speech enhancement (PSE), which extracts only the speech of a target user and removes everything else from a recorded audio clip, can potentially improve users' experiences of audio AI modules deployed in the wild. To support a large variety of downstream audio tasks, such as real-time ASR and audio-call enhancement, a PSE solution should operate in a streaming mode, i.e., input audio cleaning should happen in real-time with a small latency and real-time factor. Personalisation is typically achieved by extracting a target speaker's voice profile from an enrolment audio, in the form of a static embedding vector, and then using it to condition the output of a PSE model. However, a fixed target speaker embedding may not be optimal under all conditions. In this work, we present a streaming Transformer-based PSE model and propose a novel cross-attention approach that gives adaptive target speaker representations. We present extensive experiments and show that our proposed cross-attention approach outperforms competitive baselines consistently, even when our model is only approximately half the size.

Submitted 8 November, 2022; originally announced November 2022.

arXiv:2112.08535 [pdf, other]

Fractional cyber-neural systems -- a brief survey

Authors: Emily Reed, Sarthak Chatterjee, Guilherme Ramos, Paul Bogdan, Sérgio Pequito

Abstract: Neurotechnology has made great strides in the last 20 years. However, we still have a long way to go to commercialize many of these technologies, as we lack a unified framework to study cyber-neural systems (CNS) that bring the hardware, software, and the neural system together. Dynamical systems play a key role in developing these technologies, as they capture different aspects of the brain and provide insight into its function. Converging evidence suggests that fractional-order dynamical systems are advantageous in modeling neural systems because of their compact representation and accuracy in capturing the long-range memory exhibited in neural behavior. In this brief survey, we provide an overview of fractional CNS, that is, fractional-order systems in the context of CNS. In particular, we introduce basic definitions required for the analysis and synthesis of fractional CNS, encompassing system identification, state estimation, and closed-loop control. Additionally, we provide an illustration of some applications in the context of CNS and draw some possible future research directions. Advancements in these three areas will be critical in developing the next generation of CNS, which will ultimately improve people's quality of life.

Submitted 15 December, 2021; originally announced December 2021.

Comments: 67 pages, 13 figures

arXiv:2111.13449 [pdf, ps, other]

Minimum jointly structural input and output selection for strongly connected networks

Authors: Guilherme Ramos, A. Pedro Aguiar, Sérgio Pequito

Abstract: In this paper, given a linear time-invariant strongly connected network, we study the problem of determining the minimum number of state variables that need to be simultaneously actuated and measured to ensure structural controllability and observability, respectively. This problem is fundamental in the design of multi-agent systems, where there are economic constraints in the decision of which agents to equip with a more costly on-board system that will allow the agent to have both actuation and sensing capabilities. Despite the combinatorial nature of this problem, we present a solution that couples the design of both structural controllability and structural observability counterparts to address it with polynomial-time complexity.
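A classical building block for problems of this kind is the maximum-matching characterization of structural controllability with dedicated inputs. For a strongly connected network, actuating every unmatched row of a maximum matching (or any single node when the matching is perfect) yields structural controllability, and dually, measuring every unmatched column yields structural observability. The sketch below, with hypothetical function names, uses one arbitrary maximum matching and takes the union of the two sets. This gives only a feasible upper bound on the number of nodes that must be both actuated and measured; it is not the paper's coupled polynomial-time method, which chooses the designs jointly to minimize that number.

```python
from typing import List, Set, Tuple


def maximum_matching(n: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Kuhn's augmenting-path matching on the bipartite state graph.

    Edge (u, v) means x_u directly influences x_v, i.e. A[v][u] != 0.
    match_row[v] = u when row v is matched to column u; match_col is the inverse.
    """
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    match_row = [-1] * n
    match_col = [-1] * n

    def augment(u: int, seen: Set[int]) -> bool:
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # Either v is free, or its current partner can be re-routed elsewhere.
            if match_row[v] == -1 or augment(match_row[v], seen):
                match_row[v] = u
                match_col[u] = v
                return True
        return False

    for u in range(n):
        augment(u, set())
    return match_row, match_col


def feasible_joint_set(n: int, edges: List[Tuple[int, int]]) -> Set[int]:
    """Nodes to equip with both a dedicated input and a dedicated output (upper bound only)."""
    match_row, match_col = maximum_matching(n, edges)
    need_input = {v for v in range(n) if match_row[v] == -1}   # unmatched rows
    need_output = {u for u in range(n) if match_col[u] == -1}  # unmatched columns
    chosen = need_input | need_output
    # A strongly connected network still needs at least one input and one output.
    return chosen or {0}
```

For a directed cycle the matching is perfect, so a single node suffices; for a hub with three spokes and return edges, two spokes must be both actuated and measured.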

Submitted 26 November, 2021; originally announced November 2021.

arXiv:2109.02415 [pdf, other]

Evaluation of Convolutional Neural Networks for COVID-19 Classification on Chest X-Rays

Authors: Felipe André Zeiser, Cristiano André da Costa, Gabriel de Oliveira Ramos, Henrique Bohn, Ismael Santos, Rodrigo da Rosa Righi

Abstract: Early identification of patients with COVID-19 is essential to enable adequate treatment and to reduce the burden on the health system. The gold standard for COVID-19 detection is RT-PCR testing. However, due to the high demand for tests, results can take days or even weeks in some regions of Brazil. Thus, an alternative for detecting COVID-19 is the analysis of Digital Chest X-rays (XR). Changes due to COVID-19 can be detected in XR, even in asymptomatic patients. In this context, models based on deep learning have great potential to be used as support systems for diagnosis or as screening tools. In this paper, we evaluate convolutional neural networks for identifying pneumonia due to COVID-19 in XR. The proposed methodology consists of a preprocessing step of the XR, data augmentation, and classification by the convolutional architectures DenseNet121, InceptionResNetV2, InceptionV3, MobileNetV2, ResNet50, and VGG16, pre-trained on the ImageNet dataset. Using the proposed methodology, the VGG16 architecture achieved the best performance in the classification of XR, with an accuracy of 85.11%, sensitivity of 85.25%, specificity of 85.16%, F1-score of 85.03%, and an AUC of 0.9758.
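For reference, the threshold-based metrics reported above can be computed from confusion-matrix counts as in the sketch below. It assumes a binary positive/negative setting; the abstract does not say whether the reported figures are binary or averaged over classes. The counts in the usage note are hypothetical. AUC is omitted because it needs the classifier's continuous scores, not just counts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # positives predicted positive
    fp: int  # negatives predicted positive
    tn: int  # negatives predicted negative
    fn: int  # positives predicted negative


def metrics(c: Confusion) -> Dict[str, float]:
    total = c.tp + c.fp + c.tn + c.fn
    sensitivity = c.tp / (c.tp + c.fn)  # recall on positive cases
    specificity = c.tn / (c.tn + c.fp)  # recall on negative cases
    precision = c.tp / (c.tp + c.fp)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }
```

With hypothetical counts tp=8, fp=1, tn=9, fn=2, this gives accuracy 0.85, sensitivity 0.8, specificity 0.9, and F1 = 16/19 ≈ 0.842.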

Submitted 6 September, 2021; originally announced September 2021.

arXiv:2107.13493 [pdf, other]

Minimum Structural Sensor Placement for Switched Linear Time-Invariant Systems and Unknown Inputs

Authors: Emily A. Reed, Guilherme Ramos, Paul Bogdan, Sérgio Pequito

Abstract: In this paper, we study the structural state and input observability of continuous-time switched linear time-invariant systems with unknown inputs. First, we provide necessary and sufficient conditions for their structural state and input observability that can be efficiently verified in $O((m(n+p))^2)$, where $n$ is the number of state variables, $p$ is the number of unknown inputs, and $m$ is the number of modes. Moreover, we address the minimum sensor placement problem for these systems by adopting a feed-forward analysis and by providing an algorithm with a computational complexity of $O((m(n+p)+\alpha)^{2.373})$, where $\alpha$ is the number of target strongly connected components of the system's digraph representation. Lastly, we explore different assumptions on both the system and unknown inputs (latent space) dynamics that add more structure to the problem and thereby enable algorithms with lower computational complexity, which are suitable for implementation in large-scale systems.

Submitted 28 July, 2021; originally announced July 2021.

arXiv:2107.00431 [pdf, other]

A Discrete-time Reputation-based Resilient Consensus Algorithm for Synchronous or Asynchronous Communications

Authors: Guilherme Ramos, Daniel Silvestre, Carlos Silvestre

Abstract: We tackle the problem of a set of agents achieving resilient consensus in the presence of attacked agents. We present a discrete-time reputation-based consensus algorithm for synchronous and asynchronous networks by developing a local strategy where, at each time step, each agent assigns a reputation (between zero and one) to each neighbor. The reputation is then used to weigh the neighbors' values in the update of its state. Under mild assumptions, we show that: (i) the proposed method converges exponentially to the consensus of the regular agents; (ii) if a regular agent identifies a neighbor as an attacked node, then it is indeed an attacked node; (iii) if the consensus value of the regular nodes differs from the value of every attacked node, then the reputation that a regular agent assigns to its attacked neighbors goes to zero. Further, we extend our method to achieve resilience in scenarios with noisy nodes, dynamic networks, and stochastic node selection. Finally, we illustrate our algorithm with several examples, and we delineate some attacking scenarios that can be dealt with by the current proposal but not by state-of-the-art approaches.
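The update described above (a reputation in [0, 1] per neighbor, used as a weight in the state update) can be sketched as a reputation-weighted average. The reputation rule here is a hypothetical stand-in, not the paper's: a neighbor's reputation decreases linearly with its distance from the local median and is zero for the farthest value. The sketch is synchronous only and carries none of the paper's guarantees.

```python
from statistics import median
from typing import Dict, List


def reputations(values: List[float]) -> List[float]:
    """Hypothetical rule: reputation in [0, 1], decreasing with distance from the local median."""
    m = median(values)
    dev = [abs(v - m) for v in values]
    worst = max(dev)
    if worst == 0.0:
        return [1.0] * len(values)
    return [1.0 - d / worst for d in dev]


def consensus_step(x: Dict[int, float], nbrs: Dict[int, List[int]],
                   regular: List[int]) -> Dict[int, float]:
    """One synchronous update: each regular agent takes a reputation-weighted average."""
    new = dict(x)  # attacked agents are not updated; they keep broadcasting their values
    for i in regular:
        group = [i] + nbrs[i]
        vals = [x[j] for j in group]
        r = reputations(vals)
        if sum(r) == 0.0:  # all deviations tied: fall back to equal weights
            r = [1.0] * len(vals)
        new[i] = sum(rj * vj for rj, vj in zip(r, vals)) / sum(r)
    return new


# Three regular agents and one attacked agent (3) that keeps broadcasting 100.
x = {0: 0.0, 1: 5.0, 2: 10.0, 3: 100.0}
nbrs = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
for _ in range(20):
    x = consensus_step(x, nbrs, regular=[0, 1, 2])
```

In this toy run the attacked agent receives zero weight, and the regular agents agree on a value inside the range of their initial states.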

Submitted 1 July, 2021; originally announced July 2021.

arXiv:2105.10229 [pdf, other]

A scalable distributed dynamical systems approach to compute the strongly connected components and diameter of networks

Authors: Emily A. Reed, Guilherme Ramos, Paul Bogdan, Sérgio Pequito

Abstract: Finding the strongly connected components (SCCs) and the diameter of a directed network plays a key role in a variety of discrete optimization problems and, subsequently, in machine learning and control theory problems. On the one hand, SCCs are used in solving the 2-satisfiability problem, which has applications in clustering, scheduling, and visualization. On the other hand, the diameter has applications in network learning and discovery problems, enabling efficient internet routing and searches, as well as identifying faults in the power grid. In this paper, we leverage consensus-based principles to find the SCCs in a scalable and distributed fashion with a computational complexity of $\mathcal{O}\left(Dd_{\text{in-degree}}^{\max}\right)$, where $D$ is the (finite) diameter of the network and $d_{\text{in-degree}}^{\max}$ is the maximum in-degree of the network. Additionally, we prove that our algorithm terminates in $D+1$ iterations, which allows us to retrieve the diameter of the network. We illustrate the performance of our algorithm on several random networks, including Erdős-Rényi, Barabási-Albert, and Watts-Strogatz networks.
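The abstract does not give the update rule. The sketch below is a hypothetical, centralized simulation of a synchronous distributed scheme with the same termination behaviour: each node repeatedly merges the "can reach me" sets of its in-neighbours. After round k, a node's set contains exactly the nodes within distance k of it, so the sets stop changing after D rounds (D being the largest finite shortest-path distance), and one extra round detects this, giving D+1 rounds in total. Two nodes share an SCC exactly when each appears in the other's set. Unlike the paper's consensus-based algorithm, this version stores whole sets, so it does not match the stated complexity.

```python
from typing import Dict, List, Set, Tuple


def scc_and_diameter(n: int, edges: List[Tuple[int, int]]) -> Tuple[List[Set[int]], int, int]:
    """Return (SCCs, finite diameter D, number of synchronous rounds = D + 1)."""
    in_nbrs: Dict[int, List[int]] = {i: [] for i in range(n)}
    for u, v in edges:
        in_nbrs[v].append(u)
    # anc[i] after k rounds: nodes j with a directed path j -> i of length <= k.
    anc = [{i} for i in range(n)]
    rounds = 0
    while True:
        rounds += 1
        new = [anc[i].union(*(anc[j] for j in in_nbrs[i])) for i in range(n)]
        if new == anc:  # no node learned anything new: stop
            break
        anc = new
    diameter = rounds - 1
    # i and j are in the same SCC iff each can reach the other.
    comps: List[Set[int]] = []
    seen: Set[int] = set()
    for i in range(n):
        if i in seen:
            continue
        comp = {j for j in anc[i] if i in anc[j]}
        comps.append(comp)
        seen |= comp
    return comps, diameter, rounds
```

For the cycle 0→1→2→0 with an extra edge 2→3, the SCCs are {0, 1, 2} and {3}, the finite diameter is 3 (the path 0→1→2→3), and the loop runs 4 rounds.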

Submitted 24 June, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

Comments: Authors Reed and Ramos contributed equally

arXiv:2012.03675 [pdf, other]

Binary Segmentation of Seismic Facies Using Encoder-Decoder Neural Networks

Authors: Gefersom Lima, Gabriel Ramos, Sandro Rigo, Felipe Zeiser, Ariane da Silveira

Abstract: The interpretation of seismic data is vital for characterizing the shape of sediments in areas of geological study. In seismic interpretation, deep learning is useful for reducing both the dependence on handcrafted facies segmentation geometry and the time required to study geological areas. This work presents a Deep Neural Network for Facies Segmentation (DNFS) to obtain state-of-the-art results for seismic facies segmentation. DNFS is trained using a combination of cross-entropy and Jaccard loss functions. Our results show that DNFS obtains highly detailed predictions for seismic facies segmentation using fewer parameters than StNet and U-Net.
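A combined cross-entropy and Jaccard objective for binary segmentation can be sketched as binary cross-entropy plus a soft (differentiable) Jaccard loss, 1 − Σpt / Σ(p + t − pt), over per-pixel probabilities p and labels t. The equal weighting `w=1.0`, the clipping `eps`, the `smooth` term, and the function names are assumptions; the abstract does not specify how DNFS combines the two terms.

```python
import math
from typing import Sequence


def bce(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def soft_jaccard_loss(probs: Sequence[float], targets: Sequence[int], smooth: float = 1e-6) -> float:
    """1 - soft IoU, using products and sums of probabilities instead of hard masks."""
    inter = sum(p * t for p, t in zip(probs, targets))
    union = sum(p + t - p * t for p, t in zip(probs, targets))
    return 1.0 - (inter + smooth) / (union + smooth)


def combined_loss(probs: Sequence[float], targets: Sequence[int], w: float = 1.0) -> float:
    """Hypothetical combination: BCE + w * soft Jaccard."""
    return bce(probs, targets) + w * soft_jaccard_loss(probs, targets)
```

The cross-entropy term gives dense per-pixel gradients, while the Jaccard term directly targets region overlap, which helps when one facies class occupies few pixels.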

Submitted 14 November, 2020; originally announced December 2020.

arXiv:2008.11223 [pdf, other]

Structural Systems Theory: an overview of the last 15 years

Authors: Guilherme Ramos, A. Pedro Aguiar, Sergio Pequito

Abstract: In this paper, we provide an overview of the research conducted in the context of structural systems since the latest survey by Dion et al. in 2003. We systematically consider all the papers that cite this survey, as well as the seminal work in this field that took place on and after the publication of that survey, provided they are published in peer-reviewed venues and in English. Structural systems theory deals with parametric systems where parameters might be unknown and, therefore, addresses the study of system properties that depend only on the system's structure (or topology), described by the inter-dependencies between state variables. Remarkably, structural systems properties hold generically (i.e., almost always) under the assumption that parameters are independent. Therefore, structural systems theory constitutes an approach to assess necessary conditions that systems should satisfy. In recent years, structural systems theory has been applied to design systems that attain such properties, as well as to ensure resilience, security, and privacy properties. Furthermore, structural systems theory enables the formulation of such topics as combinatorial optimization problems, which allow us to understand their computational complexity and find algorithms that can be efficiently deployed in the context of large-scale systems.
In particular, we present an overview of how structural systems theory has been used in the context of linear time-invariant systems, as well as other dynamical models, with a brief description of the different problem statements and solution approaches. Next, we describe recent variants of structural systems theory, as well as different applications of the classical and new approaches. Finally, we provide an overview of recent and future directions in this field.

Submitted 25 August, 2020; originally announced August 2020.

arXiv:2008.04574 [pdf, other]

Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems

Authors: Ravichander Vipperla, Sangjun Park, Kihyun Choo, Samin Ishtiaq, Kyoungbo Min, Sourav Bhattacharya, Abhinav Mehrotra, Alberto Gil C. P. Ramos, Nicholas D. Lane

Abstract: LPCNet is an efficient vocoder that combines linear prediction and deep neural network modules to keep the computational complexity low. In this work, we present two techniques to further reduce its complexity, aiming for a low-cost LPCNet vocoder-based neural Text-to-Speech (TTS) system. These techniques are: 1) Sample-bunching, which allows LPCNet to generate more than one audio sample per inference; and 2) Bit-bunching, which reduces the computations in the final layer of LPCNet. With the proposed bunching techniques, LPCNet, in conjunction with a Deep Convolutional TTS (DCTTS) acoustic model, shows a 2.19x improvement over the baseline run-time when running on a mobile device, with a decrease of less than 0.1 in TTS mean opinion score (MOS).

Submitted 11 August, 2020; originally announced August 2020.

Comments: Interspeech 2020

arXiv:2007.02928 [pdf, ps, other]

Multiperiod Stochastic Peak Shaving Using Storage

Authors: Benjamin Flamm, Guillermo Ramos, Annika Eichler, John Lygeros

Abstract: We present an online stochastic model predictive control framework for demand charge management for a grid-connected consumer with attached electrical energy storage. The consumer we consider must satisfy an inflexible but stochastic electricity demand and also receives a stochastic electricity inflow. The formulated optimization problem is a stochastic cost minimization in which given weather forecast scenarios are converted into forecast demand and inflow. We introduce a novel weighting scheme to account for cases where the optimization horizon spans multiple demand charge periods. The optimization scheme is tested in a setting with building demand and photovoltaic array inflow data from a real office building. The simulation study allows us to compare various design and modeling alternatives, ultimately proposing a policy based on causal affine decision rules.

Submitted 6 July, 2020; originally announced July 2020.

Comments: 8 pages, 7 figures

arXiv:1904.02644 [pdf, other]

doi 10.1103/PhysRevX.9.041050

Characterising optical fibre transmission matrices using metasurface reflector stacks for lensless imaging without distal access

Authors: George S. D. Gordon, Milana Gataric, Alberto Gil C. P. Ramos, Ralf Mouthaan, Calum Williams, Jonghee Yoon, Timothy D. Wilkinson, Sarah E. Bohndiek

Abstract: The ability to form images through hair-thin optical fibres promises to open up new applications from biomedical imaging to industrial inspection. Unfortunately, deployment has been limited because small changes in mechanical deformation (e.g., bending) and temperature can completely scramble optical information, distorting images. Since such changes are dynamic, correcting them requires measurement of the fibre transmission matrix (TM) in situ immediately before imaging. TM calibration typically requires access to both the proximal and distal facets of the fibre simultaneously, which is not feasible during most realistic usage scenarios without compromising the thin form factor with bulky distal optics. Here, we introduce a new approach to determine the TM of multi-mode fibre (MMF) or multi-core fibre (MCF) in a reflection-mode configuration without access to the distal facet. A thin stack of structured metasurface reflectors is used at the distal facet to introduce wavelength-dependent, spatially heterogeneous reflectance profiles. We derive a first-order fibre model that compensates these wavelength-dependent changes in the TM and show that, consequently, the reflected data at 3 wavelengths can be used to unambiguously reconstruct the full TM by an iterative optimisation algorithm. We then present a method for sample illumination and imaging following TM reconstruction. Unlike previous approaches, our method does not require the TM to be unitary, making it applicable to physically realistic fibre systems.
We demonstrate TM reconstruction and imaging first using simulated non-unitary fibres and noisy reflection matrices, then using much larger experimentally-measured TMs of a densely-packed MCF, and finally on an experimentally-measured multi-wavelength set of TMs recorded from an MMF. Our findings pave the way for online transmission matrix calibration in situ in hair-thin optical fibres.

Submitted 5 April, 2019; v1 submitted 4 April, 2019; originally announced April 2019.

Comments: Main text: 38 pages, 9 Figures, Appendices: 26 pages, 6 Figures. Corrected author affiliation

Journal ref: Phys. Rev. X 9, 041050 (2019)

arXiv:1804.10636 [pdf, other]

doi 10.1109/TMI.2018.2875875

Reconstruction of optical vector-fields with applications in endoscopic imaging

Authors: Milana Gataric, George S. D. Gordon, Francesco Renna, Alberto Gil C. P. Ramos, Maria P. Alcolea, Sarah E. Bohndiek

Abstract: We introduce a framework for the reconstruction of the amplitude, phase and polarisation of an optical vector-field using calibration measurements acquired by an imaging device with an unknown linear transformation. By incorporating effective regularisation terms, this new approach is able to recover an optical vector-field with respect to an arbitrary representation system, which may be different from the one used in calibration. In particular, it enables the recovery of an optical vector-field with respect to a Fourier basis, which is shown to yield indicative features of increased scattering associated with tissue abnormalities. We demonstrate the effectiveness of our approach using synthetic holographic images as well as biological tissue samples in an experimental setting where measurements of an optical vector-field are acquired by a fibre endoscope, and observe that indeed the recovered Fourier coefficients are useful in distinguishing healthy tissues from lesions in early stages of oesophageal cancer.

Submitted 18 July, 2018; v1 submitted 27 April, 2018; originally announced April 2018.

Showing 1–14 of 14 results for author: Ramos, G