Search | arXiv e-print repository

Showing 1–50 of 67 results for author: Wu, B

Searching in archive eess.
  1. arXiv:2407.11277  [pdf, other]

    cs.CL eess.AS

    Target conversation extraction: Source separation using turn-taking dynamics

    Authors: Tuochao Chen, Qirui Wang, Bohan Wu, Malek Itani, Sefik Emre Eskimez, Takuya Yoshioka, Shyamnath Gollakota

    Abstract: Extracting the speech of participants in a conversation amidst interfering speakers and noise presents a challenging problem. In this paper, we introduce the novel task of target conversation extraction, where the goal is to extract the audio of a target conversation based on the speaker embedding of one of its participants. To accomplish this, we propose leveraging temporal patterns inherent in h…

    Submitted 29 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by Interspeech 2024

  2. arXiv:2406.18373  [pdf, other]

    cs.CL cs.SD eess.AS

    Dynamic Data Pruning for Automatic Speech Recognition

    Authors: Qiao Xiao, Pingchuan Ma, Adriana Fernandez-Lopez, Boqian Wu, Lu Yin, Stavros Petridis, Mykola Pechenizkiy, Maja Pantic, Decebal Constantin Mocanu, Shiwei Liu

    Abstract: The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data. However, this trend has made model training prohibitively costly and imposed computational demands. While data pruning has been proposed to mitigate this issue by identifying a small subset of relevant data, its application in ASR has been barely explored, and existing works…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  3. arXiv:2406.13357  [pdf, other]

    cs.CL cs.SD eess.AS

    Transferable speech-to-text large language model alignment module

    Authors: Boyong Wu, Chao Yan, Haoran Pu

    Abstract: By leveraging the power of Large Language Models(LLMs) and speech foundation models, state of the art speech-text bimodal works can achieve challenging tasks like spoken translation(ST) and question answering(SQA) altogether with much simpler architectures. In this paper, we utilize the capability of Whisper encoder and pre-trained Yi-6B. Empirical results reveal that modal alignment can be achiev…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024; 5 pages, 2 figures

  4. arXiv:2405.20617  [pdf, other]

    eess.SP

    Large-scale Outdoor Cell-free mMIMO Channel Measurement in an Urban Scenario at 3.5 GHz

    Authors: Yuning Zhang, Thomas Choi, Zihang Cheng, Issei Kanno, Masaaki Ito, Jorge Gomez-Ponce, Hussein Hammoud, Bowei Wu, Ashwani Pradhan, Kelvin Arana, Pramod Krishna, Tianyi Yang, Tyler Chen, Ishita Vasishtha, Haoyu Xie, Linyu Sun, Andreas F. Molisch

    Abstract: The design of cell-free massive MIMO (CF-mMIMO) systems requires accurate, measurement-based channel models. This paper provides the first results from the by far most extensive outdoor measurement campaign for CF-mMIMO channels in an urban environment. We measured impulse responses between over 20,000 potential access point (AP) locations and 80 user equipments (UEs) at 3.5 GHz with 350 MHz bandw…

    Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Submitted to: VTC 2024-Fall

  5. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  6. arXiv:2311.05929  [pdf, other]

    cs.CV eess.IV

    Efficient Segmentation with Texture in Ore Images Based on Box-supervised Approach

    Authors: Guodong Sun, Delong Huang, Yuting Peng, Le Cheng, Bo Wu, Yang Zhang

    Abstract: Image segmentation methods have been utilized to determine the particle size distribution of crushed ores. Due to the complex working environment, high-powered computing equipment is difficult to deploy. At the same time, the ore distribution is stacked, and it is difficult to identify the complete features. To address this issue, an effective box-supervised technique with texture features is prov…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: 14 pages, 8 figures

  7. arXiv:2306.02894  [pdf, ps, other]

    eess.IV

    Recyclable Semi-supervised Method Based on Multi-model Ensemble for Video Scene Parsing

    Authors: Biao Wu, Shaoli Liu, Diankai Zhang, Chengjian Zheng, Si Gao, Xiaofeng Zhang, Ning Wang

    Abstract: Pixel-level Scene Understanding is one of the fundamental problems in computer vision, which aims at recognizing object classes, masks and semantics of each pixel in the given image. Since the real-world is actually video-based rather than a static state, learning to perform video semantic segmentation is more reasonable and practical for realistic applications. In this paper, we adopt Mask2Former…

    Submitted 5 June, 2023; originally announced June 2023.

  8. arXiv:2305.01183  [pdf, other]

    cs.CV eess.IV

    Faster OreFSDet : A Lightweight and Effective Few-shot Object Detector for Ore Images

    Authors: Yang Zhang, Le Cheng, Yuting Peng, Chengming Xu, Yanwei Fu, Bo Wu, Guodong Sun

    Abstract: For the ore particle size detection, obtaining a sizable amount of high-quality ore labeled data is time-consuming and expensive. General object detection methods often suffer from severe over-fitting with scarce labeled data. Despite their ability to eliminate over-fitting, existing few-shot object detectors encounter drawbacks such as slow detection speed and high memory requirements, making the…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: 18 pages, 11 figures

  9. S2S-WTV: Seismic Data Noise Attenuation Using Weighted Total Variation Regularized Self-Supervised Learning

    Authors: Zitai Xu, Yisi Luo, Bangyu Wu, Deyu Meng

    Abstract: Seismic data often undergoes severe noise due to environmental factors, which seriously affects subsequent applications. Traditional hand-crafted denoisers such as filters and regularizations utilize interpretable domain knowledge to design generalizable denoising techniques, while their representation capacities may be inferior to deep learning denoisers, which can learn complex and representativ…

    Submitted 27 December, 2022; originally announced December 2022.

    Journal ref: TGRS 2023

  10. arXiv:2211.14522  [pdf, other]

    cs.CV eess.IV

    Visual Fault Detection of Multi-scale Key Components in Freight Trains

    Authors: Yang Zhang, Yang Zhou, Huilin Pan, Bo Wu, Guodong Sun

    Abstract: Fault detection for key components in the braking system of freight trains is critical for ensuring railway transportation safety. Despite the frequently employed methods based on deep learning, these fault detectors are highly reliant on hardware resources and are complex to implement. In addition, no train fault detectors consider the drop in accuracy induced by scale variation of fault parts. T…

    Submitted 26 November, 2022; originally announced November 2022.

    Comments: 9 pages, 4 figures

  11. arXiv:2211.05256  [pdf, other]

    eess.IV cs.CV

    Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Cheng-Ming Chiang, Hsien-Kai Kuo, Yu-Syuan Xu, Man-Yu Lee, Allen Lu, Chia-Ming Cheng, Chih-Cheng Chen, Jia-Ying Yong, Hong-Han Shuai, Wen-Huang Cheng, Zhuang Jia, Tianyu Xu, Yijian Zhang, Long Bao, Heng Sun, Diankai Zhang, Si Gao, Shaoli Liu, Biao Wu, Xiaofeng Zhang, Chengjian Zheng, Kaidi Lu, Ning Wang, et al. (29 additional authors not shown)

    Abstract: Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this prob…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2105.08826, arXiv:2105.07809, arXiv:2211.04470, arXiv:2211.03885

  12. A Lightweight NMS-free Framework for Real-time Visual Fault Detection System of Freight Trains

    Authors: Guodong Sun, Yang Zhou, Huilin Pan, Bo Wu, Ye Hu, Yang Zhang

    Abstract: Real-time vision-based system of fault detection (RVBS-FD) for freight trains is an essential part of ensuring railway transportation safety. Most existing vision-based methods still have high computational costs based on convolutional neural networks. The computational cost is mainly reflected in the backbone, neck, and post-processing, i.e., non-maximum suppression (NMS). In this paper, we propo…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: 11 pages, 5 figures, accepted by IEEE Transactions on Instrumentation and Measurement

  13. A simple suboptimal moving horizon estimation scheme with guaranteed robust stability

    Authors: Julian D. Schiller, Boyang Wu, Matthias A. Müller

    Abstract: We propose a suboptimal moving horizon estimation (MHE) scheme for a general class of nonlinear systems. To this end, we consider an MHE formulation that optimizes over the trajectory of a robustly stable observer. Assuming that the observer admits a Lyapunov function, we show that this function is an M-step Lyapunov function for suboptimal MHE. The presented sufficient conditions can be easily ve…

    Submitted 15 July, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Journal ref: IEEE Control Systems Letters, vol. 7, pp. 19-24, 2023

  14. arXiv:2110.12224  [pdf, other]

    cs.IT eess.SP

    Generalized Polarization Transform: A Novel Coded Transmission Paradigm

    Authors: Bolin Wu, Jincheng Dai, Kai Niu, Zhongwei Si, Ping Zhang, Sen Wang, Yifei Yuan, Chih-Lin I

    Abstract: For the upcoming 6G wireless networks, a new wave of applications and services will demand ultra-high data rates and reliability. To this end, future wireless systems are expected to pave the way for entirely new fundamental air interface technologies to attain a breakthrough in spectrum efficiency (SE). This article discusses a new paradigm, named generalized polarization transform (GPT), to achi…

    Submitted 27 April, 2022; v1 submitted 23 October, 2021; originally announced October 2021.

  15. arXiv:2109.06715  [pdf, other]

    cs.LG cs.AI cs.NI eess.SP

    IGNNITION: Bridging the Gap Between Graph Neural Networks and Networking Systems

    Authors: David Pujol-Perich, José Suárez-Varela, Miquel Ferriol, Shihan Xiao, Bo Wu, Albert Cabellos-Aparicio, Pere Barlet-Ros

    Abstract: Recent years have seen the vast potential of Graph Neural Networks (GNN) in many fields where data is structured as graphs (e.g., chemistry, recommender systems). In particular, GNNs are becoming increasingly popular in the field of networking, as graphs are intrinsically present at many levels (e.g., topology, routing). The main novelty of GNNs is their ability to generalize to other networks uns…

    Submitted 2 February, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

    Journal ref: IEEE Network, vol. 35, no. 6, pp. 171-177, 2021

  16. Optimal Variable Speed Limit Control Strategy on Freeway Segments under Fog Conditions

    Authors: Ben Zhai, Yanli Wang, Wenxuan Wang, Bing Wu

    Abstract: Fog is a critical external factor that threatens traffic safety on freeways. Variable speed limit (VSL) control can effectively harmonize vehicle speed and improve safety. However, most existing weather-related VSL controllers are limited to adapt to the dynamic traffic environment. This study developed optimal VSL control strategy under fog conditions with fully consideration of factors that affe…

    Submitted 29 July, 2021; originally announced July 2021.

  17. arXiv:2107.14137  [pdf]

    eess.SP

    Radio Frequency Interference Management with Free-Space Optical Communication and Photonic Signal Processing

    Authors: Yang Qi, Ben Wu

    Abstract: We design and experimentally demonstrate a radio frequency interference management system with free-space optical communication and photonic signal processing. The system provides real-time interference cancellation in 6 GHz wide bandwidth.

    Submitted 25 July, 2021; originally announced July 2021.

    Comments: Frontier in Optics 2021

  18. arXiv:2107.14134  [pdf]

    eess.SP

    Photonic Interference Cancellation with Hybrid Free Space Optical Communication and MIMO Receiver

    Authors: Taichu Shi, Yang Qi, Ben Wu

    Abstract: We proposed and demonstrated a hybrid blind source separation system which can switch between multiple-input and multi-output mode and free space optical communication mode depends on different situation to get best condition for separation.

    Submitted 25 July, 2021; originally announced July 2021.

    Comments: Frontier in Optics 2021

  19. Sub-Nyquist Sampling with Optical Pulses for Photonic Blind Source Separation

    Authors: Taichu Shi, Yang Qi, Weipeng Zhang, Paul Prucnal, Ben Wu

    Abstract: We proposed and demonstrated an optical pulse sampling method for photonic blind source separation. It can separate large bandwidth of mixed signals by small sampling frequency, which can reduce the workload of digital signal processing.

    Submitted 25 July, 2021; originally announced July 2021.

    Comments: Frontier in Optics

  20. arXiv:2107.10415  [pdf]

    eess.SP

    Wideband photonic interference cancellation based on free space optical communication

    Authors: Yang Qi, Ben Wu

    Abstract: We propose and experimentally demonstrate an interference management system that removes wideband wireless interference by using photonic signal processing and free space optical communication. The receiver separates radio frequency interferences by upconverting the mixed signals to optical frequencies and processing the signals with the photonic circuits. Signals with GHz bandwidth are processed…

    Submitted 13 November, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

  21. arXiv:2107.10357  [pdf]

    eess.SP physics.optics

    Wideband photonic blind source separation with optical pulse sampling

    Authors: Taichu Shi, Yang Qi, Weipeng Zhang, Paul R. Prucnal, Jie Li, Ben Wu

    Abstract: We propose and experimentally demonstrate an optical pulse sampling method for photonic blind source separation. The photonic system processes and separates wideband signals based on the statistical information of the mixed signals and thus the sampling frequency can be orders of magnitude lower than the bandwidth of the signals. The ultra-fast optical pulse functions as a tweezer that collects sa…

    Submitted 21 July, 2021; originally announced July 2021.

  22. arXiv:2103.16849  [pdf, other]

    eess.AS cs.SD

    TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation

    Authors: Helin Wang, Bo Wu, Lianwu Chen, Meng Yu, Jianwei Yu, Yong Xu, Shi-Xiong Zhang, Chao Weng, Dan Su, Dong Yu

    Abstract: In this paper, we exploit the effective way to leverage contextual information to improve the speech dereverberation performance in real-world reverberant environments. We propose a temporal-contextual attention approach on the deep neural network (DNN) for environment-aware speech dereverberation, which can adaptively attend to the contextual information. More specifically, a FullBand based Tempo…

    Submitted 26 August, 2021; v1 submitted 31 March, 2021; originally announced March 2021.

    Comments: Submitted to Interspeech 2021

  23. arXiv:2012.10580  [pdf, other]

    cs.CV cs.LG eess.IV

    Identifying Invariant Texture Violation for Robust Deepfake Detection

    Authors: Xinwei Sun, Botong Wu, Wei Chen

    Abstract: Existing deepfake detection methods have reported promising in-distribution results, by accessing published large-scale dataset. However, due to the non-smooth synthesis method, the fake samples in this dataset may expose obvious artifacts (e.g., stark visual contrast, non-smooth boundary), which were heavily relied on by most of the frame-level detection methods above. As these artifacts do not c…

    Submitted 18 December, 2020; originally announced December 2020.

  24. arXiv:2011.12985  [pdf, other]

    cs.SD cs.LG eess.AS

    FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge

    Authors: Bichen Wu, Qing He, Peizhao Zhang, Thilo Koehler, Kurt Keutzer, Peter Vajda

    Abstract: Nowadays more and more applications can benefit from edge-based text-to-speech (TTS). However, most existing TTS models are too computationally expensive and are not flexible enough to be deployed on the diverse variety of edge devices with their equally diverse computational capacities. To address this, we propose FBWave, a family of efficient and scalable neural vocoders that can achieve optimal…

    Submitted 25 November, 2020; originally announced November 2020.

  25. arXiv:2011.09162  [pdf, other]

    eess.AS cs.SD

    WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation

    Authors: Zhaoheng Ni, Yong Xu, Meng Yu, Bo Wu, Shixiong Zhang, Dong Yu, Michael I Mandel

    Abstract: This paper aims at eliminating the interfering speakers' speech, additive noise, and reverberation from the noisy multi-talker speech mixture that benefits automatic speech recognition (ASR) backend. While the recently proposed Weighted Power minimization Distortionless response (WPD) beamformer can perform separation and dereverberation simultaneously, the noise cancellation component still has t…

    Submitted 18 November, 2020; originally announced November 2020.

    Comments: accepted by SLT 2021

  26. arXiv:2011.07755  [pdf, other]

    eess.AS

    Audio-visual Multi-channel Integration and Recognition of Overlapped Speech

    Authors: Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu

    Abstract: Automatic speech recognition (ASR) technologies have been significantly advanced in the past few decades. However, recognition of overlapped speech remains a highly challenging task to date. To this end, multi-channel microphone array data are widely used in current ASR systems. Motivated by the invariance of visual modality to acoustic signal corruption and the additional cues they provide to sep…

    Submitted 30 August, 2021; v1 submitted 16 November, 2020; originally announced November 2020.

    Comments: TASLP 2021

  27. arXiv:2010.11607  [pdf, other]

    cs.CR cs.LG cs.SD eess.AS

    Backdoor Attack against Speaker Verification

    Authors: Tongqing Zhai, Yiming Li, Ziqi Zhang, Baoyuan Wu, Yong Jiang, Shu-Tao Xia

    Abstract: Speaker verification has been widely and successfully adopted in many mission-critical areas for user identification. The training of speaker verification requires a large amount of data, therefore users usually need to adopt third-party data (e.g., data from the Internet or third-party data company). This raises the question of whether adopting untrusted third-party data can pose a security thr…

    Submitted 2 February, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: Accepted by the ICASSP 2021. The first two authors contributed equally to this work

  28. arXiv:2009.00155  [pdf, other]

    cs.CV cs.LG eess.IV

    A Review of Single-Source Deep Unsupervised Visual Domain Adaptation

    Authors: Sicheng Zhao, Xiangyu Yue, Shanghang Zhang, Bo Li, Han Zhao, Bichen Wu, Ravi Krishna, Joseph E. Gonzalez, Alberto L. Sangiovanni-Vincentelli, Sanjit A. Seshia, Kurt Keutzer

    Abstract: Large-scale labeled training datasets have enabled deep neural networks to excel across a wide range of benchmark vision tasks. However, in many applications, it is prohibitively expensive and time-consuming to obtain large quantities of labeled data. To cope with limited labeled training data, many have attempted to directly apply models trained on a large-scale labeled source domain to another s…

    Submitted 18 September, 2020; v1 submitted 31 August, 2020; originally announced September 2020.

  29. arXiv:2008.04768  [pdf, other]

    eess.SY

    Constrained Active Classification Using Partially Observable Markov Decision Processes

    Authors: Bo Wu, Niklas Lauffer, Mohamadreza Ahmadi, Suda Bharadwaj, Zhe Xu, Ufuk Topcu

    Abstract: In this work, we study the problem of actively classifying the attributes of dynamical systems characterized as a finite set of Markov decision process (MDP) models. We are interested in finding strategies that actively interact with the dynamical system and observe its reactions so that the attribute of interest is classified efficiently with high confidence. We present a decision-theoretic frame…

    Submitted 4 January, 2023; v1 submitted 10 August, 2020; originally announced August 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1810.00097

  30. arXiv:2008.00164  [pdf, other]

    eess.SY

    Byzantine-Resilient Distributed Hypothesis Testing With Time-Varying Network Topology

    Authors: Bo Wu, Steven Carr, Suda Bharadwaj, Zhe Xu, Ufuk Topcu

    Abstract: We study the problem of distributed hypothesis testing over a network of mobile agents with limited communication and sensing ranges to infer the true hypothesis collaboratively. In particular, we consider a scenario where there is an unknown subset of compromised agents that may deliberately share altered information to undermine the team objective. We propose two distributed algorithms where eac…

    Submitted 17 July, 2021; v1 submitted 31 July, 2020; originally announced August 2020.

  31. arXiv:2007.15874  [pdf]

    eess.IV cs.CV

    Residual-CycleGAN based Camera Adaptation for Robust Diabetic Retinopathy Screening

    Authors: Dalu Yang, Yehui Yang, Tiantian Huang, Binghong Wu, Lei Wang, Yanwu Xu

    Abstract: There are extensive researches focusing on automated diabetic retinopathy (DR) detection from fundus images. However, the accuracy drop is observed when applying these models in real-world DR screening, where the fundus camera brands are different from the ones used to capture the training images. How can we train a classification model on labeled fundus images acquired from only one camera b…

    Submitted 31 July, 2020; originally announced July 2020.

  32. arXiv:2007.15114  [pdf, other]

    eess.SY math.OC q-bio.PE

    Control Strategies for COVID-19 Epidemic with Vaccination, Shield Immunity and Quarantine: A Metric Temporal Logic Approach

    Authors: Zhe Xu, Bo Wu, Ufuk Topcu

    Abstract: Ever since the outbreak of the COVID-19 epidemic, various public health control strategies have been proposed and tested against the coronavirus SARS-CoV-2. We study three specific COVID-19 epidemic control models: the susceptible, exposed, infectious, recovered (SEIR) model with vaccination control; the SEIR model with shield immunity control; and the susceptible, un-quarantined infected, quarant…

    Submitted 12 August, 2020; v1 submitted 29 July, 2020; originally announced July 2020.

  33. arXiv:2007.01566  [pdf, other]

    eess.AS

    Distortionless Multi-Channel Target Speech Enhancement for Overlapped Speech Recognition

    Authors: Bo Wu, Meng Yu, Lianwu Chen, Yong Xu, Chao Weng, Dan Su, Dong Yu

    Abstract: Speech enhancement techniques based on deep learning have brought significant improvement on speech quality and intelligibility. Nevertheless, a large gain in speech quality measured by objective metrics, such as perceptual evaluation of speech quality (PESQ), does not necessarily lead to improved speech recognition performance due to speech distortion in the enhancement stage. In this paper, a mu…

    Submitted 3 July, 2020; originally announced July 2020.

  34. arXiv:2006.08357  [pdf, other]

    cs.CV eess.IV

    CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs

    Authors: Zhen Dong, Dequan Wang, Qijing Huang, Yizhao Gao, Yaohui Cai, Tian Li, Bichen Wu, Kurt Keutzer, John Wawrzynek

    Abstract: Deploying deep learning models on embedded systems has been challenging due to limited computing resources. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, such as object detection, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the spatial variance of objects, and…

    Submitted 25 January, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: Github repo: https://github.com/DequanWang/CoDeNet arXiv:2002.08357 is the preliminary version of this paper

    Journal ref: FPGA 2021

  35. arXiv:2006.03677  [pdf, other]

    cs.CV cs.LG eess.IV

    Visual Transformers: Token-based Image Representation and Processing for Computer Vision

    Authors: Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph Gonzalez, Kurt Keutzer, Peter Vajda

    Abstract: Computer vision has achieved remarkable success by (a) representing images as uniformly-arranged pixel arrays and (b) convolving highly-localized features. However, convolutions treat all image pixels equally regardless of importance; explicitly model all concepts across all images, regardless of content; and struggle to relate spatially-distant concepts. In this work, we challenge this paradigm b…

    Submitted 19 November, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

  36. arXiv:2005.10386  [pdf, other]

    eess.AS cs.SD

    End-to-End Multi-Look Keyword Spotting

    Authors: Meng Yu, Xuan Ji, Bo Wu, Dan Su, Dong Yu

    Abstract: The performance of keyword spotting (KWS), measured in false alarms and false rejects, degrades significantly under the far field and noisy conditions. In this paper, we propose a multi-look neural network modeling for speech enhancement which simultaneously steers to listen to multiple sampled look directions. The multi-look enhancement is then jointly trained with KWS to form an end-to-end KWS m…

    Submitted 20 May, 2020; originally announced May 2020.

    Comments: Submitted to Interspeech2020

  37. arXiv:2005.08571  [pdf, other]

    eess.AS cs.CL cs.SD

    Audio-visual Multi-channel Recognition of Overlapped Speech

    Authors: Jianwei Yu, Bo Wu, Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu, Xunying Liu, Helen Meng

    Abstract: Automatic speech recognition (ASR) of overlapped speech remains a highly challenging task to date. To this end, multi-channel microphone array data are widely used in state-of-the-art ASR systems. Motivated by the invariance of visual modality to acoustic signal corruption, this paper presents an audio-visual multi-channel overlapped speech recognition system featuring tightly integrated separatio…

    Submitted 18 November, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: submitted to Interspeech 2020

  38. arXiv:2002.08357  [pdf, other]

    eess.IV cs.CV

    Algorithm-hardware Co-design for Deformable Convolution

    Authors: Qijing Huang, Dequan Wang, Yizhao Gao, Yaohui Cai, Zhen Dong, Bichen Wu, Kurt Keutzer, John Wawrzynek

    Abstract: FPGAs provide a flexible and efficient platform to accelerate rapidly-changing algorithms for computer vision. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, including object detection and instance segmentation, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the s…

    Submitted 18 February, 2020; originally announced February 2020.

    Journal ref: NeurIPS EMC2 2019

  39. arXiv:2001.11539  [pdf, other]

    cs.CV cs.LG eess.IV

    Adversarial Code Learning for Image Generation

    Authors: Jiangbo Yuan, Bing Wu, Wanying Ding, Qing Ping, Zhendong Yu

    Abstract: We introduce the "adversarial code learning" (ACL) module that improves overall image generation performance to several types of deep models. Instead of performing a posterior distribution modeling in the pixel spaces of generators, ACLs aim to jointly learn a latent code with another image encoder/inference net, with a prior noise as its input. We conduct the learning in an adversarial learning p…

    Submitted 30 January, 2020; originally announced January 2020.

  40. Active Task-Inference-Guided Deep Inverse Reinforcement Learning

    Authors: Farzan Memarian, Zhe Xu, Bo Wu, Min Wen, Ufuk Topcu

    Abstract: We consider the problem of reward learning for temporally extended tasks. For reward learning, inverse reinforcement learning (IRL) is a widely used paradigm. Given a Markov decision process (MDP) and a set of demonstrations for a task, IRL learns a reward function that assigns a real-valued reward to each state of the MDP. However, for temporally extended tasks, the underlying reward function may…

    Submitted 10 September, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

    Comments: To be published in IEEE Conference on Decision and Control (CDC) 2020

  41. arXiv:2001.05685  [pdf, other]

    cs.SD eess.AS

    SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis

    Authors: Bohan Zhai, Tianren Gao, Flora Xue, Daniel Rothchild, Bichen Wu, Joseph E. Gonzalez, Kurt Keutzer

    Abstract: Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize since each generated sample is conditioned on previous samples. WaveGl…

    Submitted 16 January, 2020; originally announced January 2020.

  42. arXiv:2001.01656  [pdf, other]

    eess.AS cs.SD

    Audio-visual Recognition of Overlapped speech for the LRS2 dataset

    Authors: Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu

    Abstract: Automatic recognition of overlapped speech remains a highly challenging task to date. Motivated by the bimodal nature of human speech perception, this paper investigates the use of audio-visual technologies for overlapped speech recognition. Three issues associated with the construction of audio-visual speech recognition (AVSR) systems are addressed. First, the basic architecture designs i.e. end-…

    Submitted 6 January, 2020; originally announced January 2020.

    Comments: 5 pages, 5 figures, submitted to ICASSP 2019

  43. arXiv:2001.00835  [pdf, other]

    eess.SY

    Policy Synthesis for Switched Linear Systems with Markov Decision Process Switching

    Authors: Bo Wu, Murat Cubuktepe, Franck Djeumou, Zhe Xu, Ufuk Topcu

    Abstract: We study the synthesis of mode switching protocols for a class of discrete-time switched linear systems in which the mode jumps are governed by Markov decision processes (MDPs). We call such systems MDP-JLS for brevity. Each state of the MDP corresponds to a mode in the switched system. The probabilistic state transitions in the MDP represent the mode transitions. We focus on finding a policy that…

    Submitted 3 January, 2020; originally announced January 2020.

    Comments: arXiv admin note: text overlap with arXiv:1904.11456

  44. Erase-hidden and Drivability-improved Magnetic Non-Volatile Flip-Flops with NAND-SPIN Devices

    Authors: Ziyi Wang, Zhaohao Wang, Yansong Xu, Bi Wu, Weisheng Zhao

    Abstract: Non-volatile flip-flops (NVFFs) using power gating techniques promise to overcome the soaring leakage power consumption issue with the scaling of CMOS technology. Magnetic tunnel junction (MTJ) is a good candidate for constructing the NVFF thanks to its low power, high speed, good CMOS compatibility, etc. In this paper, we propose a novel magnetic NVFF based on an emerging memory device called NAN…

    Submitted 17 June, 2020; v1 submitted 15 December, 2019; originally announced December 2019.

    Comments: This article has been accepted in a future issue of IEEE Transactions on Nanotechnology: Regular Papers

  45. arXiv:1910.13825  [pdf, ps, other]

    eess.AS

    Overlapped speech recognition from a jointly learned multi-channel neural speech extraction and representation

    Authors: Bo Wu, Meng Yu, Lianwu Chen, Chao Weng, Dan Su, Dong Yu

    Abstract: We propose an end-to-end joint optimization framework of a multi-channel neural speech extraction and deep acoustic model without mel-filterbank (FBANK) extraction for overlapped speech recognition. First, based on a multi-channel convolutional TasNet with STFT kernel, we unify the multi-channel target speech enhancement front-end network and a convolutional, long short-term memory and fully conne…

    Submitted 30 October, 2019; originally announced October 2019.

  46. arXiv:1909.09939  [pdf, other]

    eess.SY cs.RO

    Controller Synthesis for Multi-Agent Systems With Intermittent Communication: A Metric Temporal Logic Approach

    Authors: Zhe Xu, Federico M. Zegers, Bo Wu, Warren Dixon, Ufuk Topcu

    Abstract: This paper develops a controller synthesis approach for a multi-agent system (MAS) with intermittent communication. We adopt a leader-follower scheme, where a mobile leader with absolute position sensors switches among a set of followers without absolute position sensors to provide each follower with intermittent state information. We model the MAS as a switched system. The followers are to asympto…

    Submitted 22 September, 2019; originally announced September 2019.

  47. Improving Reverberant Speech Training Using Diffuse Acoustic Simulation

    Authors: Zhenyu Tang, Lianwu Chen, Bo Wu, Dong Yu, Dinesh Manocha

    Abstract: We present an efficient and realistic geometric acoustic simulation approach for generating and augmenting training data in speech-related machine learning tasks. Our physically-based acoustic simulation method is capable of modeling occlusion, specular and diffuse reflections of sound in complicated acoustic environments, whereas the classical image method can only model specular reflections in s…

    Submitted 2 April, 2020; v1 submitted 9 July, 2019; originally announced July 2019.

    Comments: Accepted to ICASSP 2020, impulse response generation code at https://github.com/RoyJames/pygsound

    Journal ref: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6969-6973)

  48. arXiv:1906.00884  [pdf, other]

    cs.CV eess.IV

    Fashion Editing with Adversarial Parsing Learning

    Authors: Haoye Dong, Xiaodan Liang, Yixuan Zhang, Xujie Zhang, Zhenyu Xie, Bowen Wu, Ziqi Zhang, Xiaohui Shen, Jian Yin

    Abstract: Interactive fashion image manipulation, which enables users to edit images with sketches and color strokes, is an interesting research problem with great application value. Existing works often treat it as a general inpainting task and do not fully leverage the semantic structural information in fashion images. Moreover, they directly utilize conventional convolution and normalization layers to re…

    Submitted 28 September, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: 22 pages, 18 figures

  49. arXiv:1905.08095  [pdf, other]

    eess.SY

    Control Theory Meets POMDPs: A Hybrid Systems Approach

    Authors: Mohamadreza Ahmadi, Nils Jansen, Bo Wu, Ufuk Topcu

    Abstract: Partially observable Markov decision processes (POMDPs) provide a modeling framework for a variety of sequential decision making under uncertainty scenarios in artificial intelligence (AI). Since the states are not directly observable in a POMDP, decision making has to be performed based on the output of a Bayesian filter (continuous beliefs). Hence, POMDPs are often computationally intractable to…

    Submitted 17 May, 2019; originally announced May 2019.

    Comments: arXiv admin note: text overlap with arXiv:1810.00093

  50. arXiv:1904.08031  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Hard Sample Mining for the Improved Retraining of Automatic Speech Recognition

    Authors: Jiabin Xue, Jiqing Han, Tieran Zheng, Jiaxing Guo, Boyong Wu

    Abstract: It is an effective way that improves the performance of the existing Automatic Speech Recognition (ASR) systems by retraining with more and more new training data in the target domain. Recently, Deep Neural Network (DNN) has become a successful model in the ASR field. In the training process of the DNN based methods, a back propagation of error between the transcription and the corresponding annot…

    Submitted 16 April, 2019; originally announced April 2019.

    Comments: Submitted to Interspeech 2019