Search | arXiv e-print repository

Showing 1–42 of 42 results for author: Ding, S

Searching in archive eess.
  1. arXiv:2402.17645  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation

    Authors: Shuangrui Ding, Zihan Liu, Xiaoyi Dong, Pan Zhang, Rui Qian, Conghui He, Dahua Lin, Jiaqi Wang

    Abstract: We present SongComposer, an innovative LLM designed for song composition. It can understand and generate melodies and lyrics in symbolic song representations by leveraging the capabilities of LLMs. Existing music-related LLMs treat music as quantized audio signals, but such implicit encoding leads to inefficient encoding and poor flexibility. In contrast, we resort to symbolic song represen…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: project page: https://pjlab-songcomposer.github.io/ code: https://github.com/pjlab-songcomposer/songcomposer

  2. arXiv:2401.12238  [pdf, other]

    eess.AS cs.LG cs.SD

    Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms

    Authors: Iran R. Roman, Christopher Ick, Sivan Ding, Adrian S. Roman, Brian McFee, Juan P. Bello

    Abstract: Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatially localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, 1 table, to be presented at ICASSP 2024 in Seoul, South Korea

  3. arXiv:2312.08553  [pdf, other]

    eess.AS cs.SD

    USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models

    Authors: Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal

    Abstract: End-to-end automatic speech recognition (ASR) models have seen revolutionary quality gains with the recent development of large-scale universal speech models (USM). However, deploying these massive USMs is extremely expensive due to the enormous memory usage and computational cost. Therefore, model compression is an important research topic to fit USM-based ASR under budget in real-world scenarios…

    Submitted 16 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024. Preprint

  4. arXiv:2311.07703  [pdf, other]

    cs.CL cs.SD eess.AS

    Measuring Entrainment in Spontaneous Code-switched Speech

    Authors: Debasmita Bhattacharya, Siying Ding, Alayna Nguyen, Julia Hirschberg

    Abstract: It is well-known that speakers who entrain to one another have more successful conversations than those who do not. Previous research has shown that interlocutors entrain on linguistic features in both written and spoken monolingual domains. More recent work on code-switched communication has also shown preliminary evidence of entrainment on certain aspects of code-switching (CSW). However, such s…

    Submitted 26 March, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: Edits: camera-ready manuscript for NAACL 2024

  5. Privacy-Preserved Aggregate Thermal Dynamic Model of Buildings

    Authors: Zeyin Hou, Shuai Lu, Yijun Xu, Haifeng Qiu, Wei Gu, Zhaoyang Dong, Shixing Ding

    Abstract: The thermal inertia of buildings brings considerable flexibility to the heating and cooling load, which is known to be a promising demand response resource. The aggregate model that can describe the thermal dynamics of the building cluster is an important interference for energy systems to exploit its intrinsic thermal inertia. However, the private information of users, such as the indoor temperat…

    Submitted 12 October, 2023; originally announced October 2023.

  6. arXiv:2309.09953  [pdf, other]

    eess.SY math.AP

    PINN-based viscosity solution of HJB equation

    Authors: Tianyu Liu, Steven Ding, Jiarui Zhang, Liutao Zhou

    Abstract: This paper proposes a novel PINN-based viscosity solution for HJB equations. Although there is existing work using PINNs to solve HJB equations, none of it gives the solution in the viscosity sense. This paper shows that, by using a convex neural network, one can guarantee the viscosity solution, and thus the neural network can easily converge to the true solution of the HJB equation regardless of the starting point.

    Submitted 18 September, 2023; originally announced September 2023.

  7. arXiv:2309.02732  [pdf, other]

    eess.SY

    A study on fault diagnosis in nonlinear dynamic systems with uncertainties

    Authors: Steven X. Ding, Linlin Li

    Abstract: In this draft, fault diagnosis in nonlinear dynamic systems is addressed. The objective of this work is to establish a framework, in which not only model-based but also data-driven and machine learning based fault diagnosis strategies can be uniformly handled. Instead of the well-established input-output and the associated state space models, stable image and kernel representations are adopted in…

    Submitted 26 October, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

  8. arXiv:2306.04286  [pdf, other]

    cs.SD cs.AI eess.AS

    A Mask Free Neural Network for Monaural Speech Enhancement

    Authors: Liang Liu, Haixin Guan, Jinlong Ma, Wei Dai, Guangyong Wang, Shaowei Ding

    Abstract: In speech enhancement, the lack of clear structural characteristics in the target speech phase requires the use of conservative and cumbersome network frameworks. It seems difficult to achieve competitive performance using direct methods and simple network architectures. However, we propose the MFNet, a direct and simple network that can not only map speech but also map reverse noise. This network…

    Submitted 7 June, 2023; originally announced June 2023.

  9. arXiv:2306.02020  [pdf, ps, other]

    eess.SY

    Replay Attack Detection Based on Parity Space Method for Cyber-Physical Systems

    Authors: Dong Zhao, Yang Shi, Steven X. Ding, Yueyang Li, Fangzhou Fu

    Abstract: In this paper, the replay attack detection problem is studied from a new perspective based on the parity space method. The proposed detection methods have the ability to distinguish system fault and replay attack, handle both input and output data replay, maintain certain control performance, and can be implemented conveniently and efficiently. First, the replay attack effect on the residual is derived…

    Submitted 3 June, 2023; originally announced June 2023.

  10. arXiv:2305.16619  [pdf, other]

    eess.AS

    2-bit Conformer quantization for automatic speech recognition

    Authors: Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He

    Abstract: Large speech models are rapidly gaining traction in the research community. As a result, model compression has become an important topic, so that these models can fit in memory and be served with reduced cost. Practical approaches for compressing automatic speech recognition (ASR) models use int8 or int4 weight quantization. In this study, we propose to develop 2-bit ASR models. We explore the impact o…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: submitted to Interspeech

  11. arXiv:2305.15536  [pdf, other]

    eess.AS cs.LG

    RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models

    Authors: David Qiu, David Rim, Shaojin Ding, Oleg Rybakov, Yanzhang He

    Abstract: With the rapid increase in the size of neural networks, model compression has become an important area of research. Quantization is an effective technique for decreasing the model size, memory access, and compute load of large models. Despite recent advances in quantization aware training (QAT) techniques, most papers present evaluations that are focused on computer vision tasks, which have differen…

    Submitted 24 May, 2023; originally announced May 2023.

  12. arXiv:2305.03997  [pdf, other]

    eess.IV cs.CV

    Dual Degradation Representation for Joint Deraining and Low-Light Enhancement in the Dark

    Authors: Xin Lin, Jingtong Yue, Sixian Ding, Chao Ren, Lu Qi, Ming-Hsuan Yang

    Abstract: Rain in the dark poses a significant challenge to deploying real-world applications such as autonomous driving, surveillance systems, and night photography. Existing low-light enhancement or deraining methods struggle to brighten low-light conditions and remove rain simultaneously. Additionally, cascade approaches like "deraining followed by low-light enhancement" or the reverse often result in…

    Submitted 17 June, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

  13. arXiv:2303.08343  [pdf, ps, other]

    eess.AS cs.AI cs.LG cs.SD

    Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models

    Authors: Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw

    Abstract: Continued improvements in machine learning techniques offer exciting new opportunities through the use of larger models and larger training datasets. However, there is a growing need to offer these new capabilities on-board low-powered devices such as smartphones, wearables and other embedded environments where only low memory is available. Towards this, we consider methods to reduce the model siz…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted to IEEE ICASSP 2023

  14. arXiv:2211.02718  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing

    Authors: Siwen Ding, You Zhang, Zhiyao Duan

    Abstract: Voice anti-spoofing systems are crucial auxiliaries for automatic speaker verification (ASV) systems. A major challenge is caused by unseen attacks empowered by advanced speech synthesis technologies. Our previous research on one-class learning has improved the generalization ability to unseen attacks by compacting the bona fide speech in the embedding space. However, such compactness lacks consid…

    Submitted 4 November, 2022; originally announced November 2022.

  15. arXiv:2208.10933  [pdf]

    cond-mat.mtrl-sci eess.SY

    Large-Scale Integrated Flexible Tactile Sensor Array for Sensitive Smart Robotic Touch

    Authors: Zhenxuan Zhao, Jianshi Tang, Jian Yuan, Yijun Li, Yuan Dai, Jian Yao, Qingtian Zhang, Sanchuan Ding, Tingyu Li, Ruirui Zhang, Yu Zheng, Zhengyou Zhang, Song Qiu, Qingwen Li, Bin Gao, Ning Deng, He Qian, Fei Xing, Zheng You, Huaqiang Wu

    Abstract: In the long pursuit of smart robotics, it has been envisioned to empower robots with human-like senses, especially vision and touch. While tremendous progress has been made in image sensors and computer vision over the past decades, tactile sensing abilities are lagging behind due to the lack of large-scale flexible tactile sensor arrays with high sensitivity, high spatial resolution, and fast re…

    Submitted 3 November, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: Correction in Methods: The weight ratio of TPU:DMF was set to be 1:5

    Journal ref: ACS Nano 2022, 16, 16784

  16. arXiv:2208.06411  [pdf, ps, other]

    cs.CV eess.IV

    SFF-DA: Sptialtemporal Feature Fusion for Detecting Anxiety Nonintrusively

    Authors: Haimiao Mo, Yuchen Li, Shanlin Yang, Wei Zhang, Shuai Ding

    Abstract: Early detection of anxiety is crucial for reducing the suffering of individuals with mental disorders and improving treatment outcomes. Utilizing an mHealth platform for anxiety screening can be particularly practical in improving screening efficiency and reducing costs. However, the effectiveness of existing methods has been hindered by differences in mobile devices used to capture subjects' phys…

    Submitted 8 March, 2023; v1 submitted 11 August, 2022; originally announced August 2022.

  17. arXiv:2208.01291  [pdf, other]

    eess.SY

    Control theoretically explainable application of autoencoder methods to fault detection in nonlinear dynamic systems

    Authors: Linlin Li, Steven X. Ding, Ketian Liang, Zhiwen Chen, Ting Xue

    Abstract: This paper is dedicated to control theoretically explainable application of autoencoders to optimal fault detection in nonlinear dynamic systems. Autoencoder-based learning is a standard machine learning method and widely applied for fault (anomaly) detection and classification. In the context of representation learning, the so-called latent (hidden) variable plays an important role towards an opt…

    Submitted 15 May, 2023; v1 submitted 2 August, 2022; originally announced August 2022.

  18. arXiv:2204.08466  [pdf, other]

    eess.IV cs.AI cs.CV physics.med-ph

    Robust PCA Unrolling Network for Super-resolution Vessel Extraction in X-ray Coronary Angiography

    Authors: Binjie Qin, Haohao Mao, Yiming Liu, Jun Zhao, Yisong Lv, Yueqi Zhu, Song Ding, Xu Chen

    Abstract: Although robust PCA has been increasingly adopted to extract vessels from X-ray coronary angiography (XCA) images, challenging problems such as inefficient vessel-sparsity modelling, noisy and dynamic background artefacts, and high computational cost still remain unsolved. Therefore, we propose a novel robust PCA unrolling network with sparse feature selection for super-resolution XCA vessel imagi…

    Submitted 23 April, 2022; v1 submitted 16 April, 2022; originally announced April 2022.

  19. arXiv:2204.06164  [pdf, other]

    eess.AS cs.LG cs.SD

    A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes

    Authors: Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman

    Abstract: In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios. Moreover, the model can significantly reduce model size and power consumption without loss of quality. Namely, with the dynamic cascaded encoder model, we explore three techniques to maximally boost the performance of each model size: 1) Use separa…

    Submitted 24 June, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Accepted by INTERSPEECH 2022

  20. arXiv:2204.03793  [pdf, other]

    eess.AS cs.LG cs.SD

    Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition

    Authors: Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw

    Abstract: Personalization of on-device speech recognition (ASR) has seen explosive growth in recent years, largely due to the increasing popularity of personal assistant features on mobile devices and smart home speakers. In this work, we present Personal VAD 2.0, a personalized voice activity detector that detects the voice activity of a target speaker, as part of a streaming on-device ASR system. Although…

    Submitted 24 June, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted by INTERSPEECH 2022

  21. arXiv:2203.15952  [pdf, other]

    eess.AS cs.LG

    4-bit Conformer with Native Quantization Aware Training for Speech Recognition

    Authors: Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov

    Abstract: Reducing the latency and model size has always been a significant research problem for live Automatic Speech Recognition (ASR) application scenarios. Along this direction, model quantization has become an increasingly popular approach to compress neural networks and reduce computation cost. Most of the existing practical ASR systems apply post-training 8-bit quantization. To achieve a higher compr…

    Submitted 2 March, 2023; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Published at INTERSPEECH 2022

  22. arXiv:2202.08108  [pdf, other]

    eess.SY

    An alternative paradigm of fault diagnosis in dynamic systems: orthogonal projection-based methods

    Authors: Steven X. Ding, Linlin Li, Tianyu Liu

    Abstract: In this paper, we propose a new paradigm of fault diagnosis in dynamic systems as an alternative to the well-established observer-based framework. The basic idea behind this work is to (i) formulate fault detection and isolation as projection of measurement signals onto (system) subspaces in Hilbert space, and (ii) solve the resulting problems by means of projection methods with orthogonal project…

    Submitted 7 May, 2023; v1 submitted 16 February, 2022; originally announced February 2022.

  23. arXiv:2111.08185  [pdf, other]

    eess.SY cs.LG

    Graph neural network-based fault diagnosis: a review

    Authors: Zhiwen Chen, Jiamin Xu, Cesare Alippi, Steven X. Ding, Yuri Shardt, Tao Peng, Chunhua Yang

    Abstract: Graph neural network (GNN)-based fault diagnosis (FD) has received increasing attention in recent years, due to the fact that data coming from several application domains can be advantageously represented as graphs. Indeed, this particular representation form has led to superior performance compared to traditional FD approaches. In this review, an easy introduction to GNN, potential applications t…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: 17 pages, 18 figures, 10 tables

  24. RF-Net: a Unified Meta-learning Framework for RF-enabled One-shot Human Activity Recognition

    Authors: Shuya Ding, Zhe Chen, Tianyue Zheng, Jun Luo

    Abstract: Radio-Frequency (RF) based device-free Human Activity Recognition (HAR) rises as a promising solution for many applications. However, device-free (or contactless) sensing is often more sensitive to environment changes than device-based (or wearable) sensing. Also, RF datasets strictly require on-line labeling during collection, starkly different from image and text data collections where human int…

    Submitted 28 October, 2021; originally announced November 2021.

    Comments: 14 pages

    Journal ref: SenSys '20: Proceedings of the 18th Conference on Embedded Networked Sensor Systems, November 2020

  25. Enhancing RF Sensing with Deep Learning: A Layered Approach

    Authors: Tianyue Zheng, Zhe Chen, Shuya Ding, Jun Luo

    Abstract: In recent years, radio frequency (RF) sensing has gained increasing popularity due to its pervasiveness, low cost, non-intrusiveness, and privacy preservation. However, realizing the promises of RF sensing is highly nontrivial, given typical challenges such as multipath and interference. One potential solution leverages deep learning to build direct mappings from the RF domain to target domains, h…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: 7 pages

    Journal ref: IEEE Communications Magazine (Volume: 59, Issue: 2, February 2021)

  26. arXiv:2110.04482  [pdf, other]

    eess.AS cs.CL cs.SD

    Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis

    Authors: Mu Yang, Shaojin Ding, Tianlong Chen, Tong Wang, Zhangyang Wang

    Abstract: This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system, where each language was seen as an individual task and was learned sequentially and continually. It does not require pooled data from all languages altogether, and thus alleviates the storage and computation burden. One of the challenges of lifelong learning methods is "catastrophic forgetting": in…

    Submitted 18 May, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP 2022. Camera-ready

  27. arXiv:2108.04212  [pdf, other]

    cs.CV cs.LG eess.IV

    AutoVideo: An Automated Video Action Recognition System

    Authors: Daochen Zha, Zaid Pervaiz Bhat, Yi-Wei Chen, Yicheng Wang, Sirui Ding, Jiaben Chen, Kwei-Herng Lai, Mohammad Qazim Bhat, Anmoll Kumar Jain, Alfredo Costilla Reyes, Na Zou, Xia Hu

    Abstract: Action recognition is an important task for video understanding with broad applications. However, developing an effective action recognition solution often requires extensive engineering efforts in building and testing different combinations of the modules and their hyperparameters. In this demo, we present AutoVideo, a Python system for automated video action recognition. AutoVideo is featured fo…

    Submitted 16 July, 2022; v1 submitted 9 August, 2021; originally announced August 2021.

    Comments: Accepted by IJCAI https://github.com/datamllab/autovideo

  28. arXiv:2107.13353  [pdf, other]

    cs.LG cs.AI eess.SP

    Fast Wireless Sensor Anomaly Detection based on Data Stream in Edge Computing Enabled Smart Greenhouse

    Authors: Yihong Yang, Sheng Ding, Yuwen Liu, Shunmei Meng, Xiaoxiao Chi, Rui Ma, Chao Yan

    Abstract: Edge computing enabled smart greenhouse is a representative application of Internet of Things technology, which can monitor the environmental information in real time and employ the information to contribute to intelligent decision-making. In the process, anomaly detection for wireless sensor data plays an important role. However, traditional anomaly detection algorithms originally designed for an…

    Submitted 28 July, 2021; originally announced July 2021.

    Comments: 12 pages, 8 figures

  29. arXiv:2106.00610  [pdf, other]

    eess.SP cs.SD eess.AS

    Deep Learning for Depression Recognition with Audiovisual Cues: A Review

    Authors: Lang He, Mingyue Niu, Prayag Tiwari, Pekka Marttinen, Rui Su, Jiewei Jiang, Chenguang Guo, Hongyu Wang, Songtao Ding, Zhongmin Wang, Wei Dang, Xiaoying Pan

    Abstract: With the acceleration of the pace of work and life, people have to face more and more pressure, which increases the possibility of suffering from depression. However, many patients may fail to get a timely diagnosis due to the serious imbalance in the doctor-patient ratio in the world. Promisingly, physiological and psychological studies have indicated some differences in speech and facial express…

    Submitted 27 May, 2021; originally announced June 2021.

  30. arXiv:2103.00210  [pdf, other]

    eess.SY

    Application of the unified control and detection framework to detecting stealthy integrity cyber-attacks on feedback control systems

    Authors: Steven X. Ding, Linlin Li, Dong Zhao, Chris Louen, Tianyu Liu

    Abstract: This draft addresses issues of detecting stealthy integrity cyber-attacks on automatic control systems in the unified control and detection framework. A general form of integrity cyber-attacks that cannot be detected using the well-established observer-based technique is first introduced as kernel attacks. The well-known replay, zero dynamics and covert attacks are special forms of the kernel atta…

    Submitted 4 June, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

  31. arXiv:2012.15427  [pdf, other]

    quant-ph cs.LG eess.SY

    Curriculum-based Deep Reinforcement Learning for Quantum Control

    Authors: Hailan Ma, Daoyi Dong, Steven X. Ding, Chunlin Chen

    Abstract: Deep reinforcement learning has been recognized as an efficient technique to design optimal strategies for different complex systems without prior knowledge of the control landscape. To achieve a fast and precise control for quantum systems, we propose a novel deep reinforcement learning approach by constructing a curriculum consisting of a set of intermediate tasks defined by a fidelity threshold…

    Submitted 2 January, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

  32. arXiv:2011.08482  [pdf]

    cs.RO cs.LG eess.SP

    Collaborative Three-Tier Architecture Non-contact Respiratory Rate Monitoring using Target Tracking and False Peaks Eliminating Algorithms

    Authors: Haimiao Mo, Shuai Ding, Shanlin Yang, Athanasios V. Vasilakos, Xi Zheng

    Abstract: Monitoring the respiratory rate is crucial for helping us identify respiratory disorders. Devices for conventional respiratory monitoring are inconvenient and scarcely available. Recent research has demonstrated the ability of non-contact technologies, such as photoplethysmography and infrared thermography, to gather respiratory signals from the face and monitor breathing. However, the current non…

    Submitted 26 July, 2022; v1 submitted 17 November, 2020; originally announced November 2020.

  33. arXiv:2009.03575  [pdf, ps, other]

    cs.NI eess.SY

    NC-MOPSO: Network centrality guided multi-objective particle swarm optimization for transport optimization on networks

    Authors: Jiexin Wu, Cunlai Pu, Shuxin Ding, Guo Cao, Panos M. Pardalos

    Abstract: Transport processes are universal in real-world complex networks, such as communication and transportation networks. With the increase of traffic in these complex networks, problems like traffic congestion and transport delay are becoming more and more serious, which calls for a systematic optimization of these networks. In this paper, we formulate a multi-objective optimization problem (MOP) to…

    Submitted 27 July, 2021; v1 submitted 8 September, 2020; originally announced September 2020.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  34. arXiv:2008.06006  [pdf, other]

    eess.AS cs.LG stat.ML

    Textual Echo Cancellation

    Authors: Shaojin Ding, Ye Jia, Ke Hu, Quan Wang

    Abstract: In this paper, we propose Textual Echo Cancellation (TEC) - a framework for cancelling the text-to-speech (TTS) playback echo from overlapping speech recordings. Such a system can largely improve speech recognition performance and user experience for intelligent devices such as smart speakers, as the user can talk to the device while the device is still playing the TTS signal responding to the pre…

    Submitted 16 September, 2021; v1 submitted 13 August, 2020; originally announced August 2020.

  35. arXiv:2005.03215  [pdf, other]

    eess.AS cs.LG

    AutoSpeech: Neural Architecture Search for Speaker Recognition

    Authors: Shaojin Ding, Tianlong Chen, Xinyu Gong, Weiwei Zha, Zhangyang Wang

    Abstract: Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet. However, these backbones were originally proposed for image classification, and therefore may not be naturally fit for speaker recognition. Due to the prohibitive complexity of manually exploring the design space, we propose the first neural architecture…

    Submitted 31 August, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

  36. arXiv:2003.09609  [pdf, ps, other]

    quant-ph eess.SY

    Fault-tolerant Coherent H-infinity Control for Linear Quantum Systems

    Authors: Yanan Liu, Daoyi Dong, Ian R. Petersen, Qing Gao, Steven X. Ding, Shota Yokoyama, Hidehiro Yonezawa

    Abstract: Robustness and reliability are two key requirements for developing practical quantum control systems. The purpose of this paper is to design a coherent feedback controller for a class of linear quantum systems suffering from Markovian jumping faults so that the closed-loop quantum system has both fault tolerance and H-infinity disturbance attenuation performance. This paper first extends the physi…

    Submitted 21 March, 2020; originally announced March 2020.

    Comments: 12 pages, 3 figures

  37. arXiv:2002.01607  [pdf, other]

    cs.CV eess.IV

    Anomaly Detection by One Class Latent Regularized Networks

    Authors: Chengwei Chen, Pan Chen, Haichuan Song, Yiqing Tao, Yuan Xie, Shouhong Ding, Lizhuang Ma

    Abstract: Anomaly detection is a fundamental problem in the computer vision area with many real-world applications. Given a wide range of images belonging to the normal class, emerging from some distribution, the objective of this task is to construct the model to detect out-of-distribution images belonging to abnormal instances. Semi-supervised Generative Adversarial Networks (GAN)-based methods have been gain…

    Submitted 14 July, 2020; v1 submitted 4 February, 2020; originally announced February 2020.

  38. arXiv:1912.10753  [pdf, other]

    math.OC eess.SY

    Heterogeneous Hegselmann-Krause Dynamics with Environment and Communication Noise

    Authors: Ge Chen, Wei Su, Songyuan Ding, Yiguang Hong

    Abstract: The Hegselmann-Krause (HK) model is a well-known opinion dynamics model, attracting a significant amount of interest from a number of fields. However, the heterogeneous HK model is difficult to analyze - even the most basic property of convergence is still open to prove. For the first time, this paper takes into consideration heterogeneous HK models with environment or communication noise. Under environm…

    Submitted 23 December, 2019; originally announced December 2019.

  39. arXiv:1909.08723  [pdf, other]

    cs.CL cs.SD eess.AS

    Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

    Authors: Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language…

    Submitted 14 October, 2019; v1 submitted 18 September, 2019; originally announced September 2019.

    Comments: Accepted to ASRU 2019

  40. arXiv:1908.04284  [pdf, other]

    eess.AS cs.LG stat.ML

    Personal VAD: Speaker-Conditioned Voice Activity Detection

    Authors: Shaojin Ding, Quan Wang, Shuo-yiin Chang, Li Wan, Ignacio Lopez Moreno

    Abstract: In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level. This system is useful for gating the inputs to a streaming on-device speech recognition system, such that it only triggers for the target user, which helps reduce the computational cost and battery consumption, especially in scenarios where a keyword detector is unpreferable. We…

    Submitted 8 April, 2020; v1 submitted 12 August, 2019; originally announced August 2019.

    Comments: Speaker Odyssey 2020

  41. arXiv:1908.03735  [pdf]

    eess.IV cs.CV cs.LG

    Automatic acute ischemic stroke lesion segmentation using semi-supervised learning

    Authors: Bin Zhao, Shuxue Ding, Hong Wu, Guohua Liu, Chen Cao, Song Jin, Zhiyang Liu

    Abstract: Ischemic stroke is a common disease in the elderly population, which can cause long-term disability and even death. However, the time window for treatment of ischemic stroke in its acute stage is very short. To quickly localize and quantitatively evaluate the acute ischemic stroke (AIS) lesions, many deep-learning-based lesion segmentation methods have been proposed in the literature, where a deep conv…

    Submitted 20 September, 2020; v1 submitted 10 August, 2019; originally announced August 2019.

  42. arXiv:1803.10299  [pdf, other]

    cs.CL cs.SD eess.AS

    Multi-Modal Data Augmentation for End-to-End ASR

    Authors: Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner, Shinji Watanabe

    Abstract: We present a new end-to-end architecture for automatic speech recognition (ASR) that can be trained using symbolic input in addition to the traditional acoustic input. This architecture utilizes two separate encoders: one for acoustic input and another for symbolic input, both sharing the attention and decoder parameters. We call this architecture a multi-modal data augmentation network (MM…

    Submitted 18 June, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

    Comments: 5 Pages, 1 Figure, accepted at INTERSPEECH 2018