(Translated by https://www.hiragana.jp/)
Search | arXiv e-print repository
Skip to main content

Showing 1–8 of 8 results for author: Zhang, C Y

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.04840  [pdf, other

    cs.SD eess.AS

    TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking

    Authors: Junzuo Zhou, Jiangyan Yi, Tao Wang, Jianhua Tao, Ye Bai, Chu Yuan Zhang, Yong Ren, Zhengqi Wen

    Abstract: Various threats posed by the progress in text-to-speech (TTS) have prompted the need to reliably trace synthesized speech. However, contemporary approaches to this task involve adding watermarks to the audio separately after generation, a process that hurts both speech quality and watermark imperceptibility. In addition, these approaches are limited in robustness and flexibility. To address these… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: acceped by interspeech 2024

  2. arXiv:2309.06780  [pdf, other

    cs.SD eess.AS

    Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms

    Authors: Chu Yuan Zhang, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Xinrui Yan

    Abstract: Recent strides in neural speech synthesis technologies, while enjoying widespread applications, have nonetheless introduced a series of challenges, spurring interest in the defence against the threat of misuse and abuse. Notably, source attribution of synthesized speech has value in forensics and intellectual property protection, but prior work in this area has certain limitations in scope. To add… ▽ More

    Submitted 15 June, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted by CCL 2024

  3. arXiv:2308.14970  [pdf, other

    cs.SD eess.AS

    Audio Deepfake Detection: A Survey

    Authors: Jiangyan Yi, Chenglong Wang, Jianhua Tao, Xiaohui Zhang, Chu Yuan Zhang, Yan Zhao

    Abstract: Audio deepfake detection is an emerging active topic. A growing number of literatures have aimed to study deepfake detection algorithms and achieved effective performance, the problem of which is far from being solved. Although there are some review literatures, there has been no comprehensive survey that provides researchers with a systematic overview of these developments with a unified evaluati… ▽ More

    Submitted 28 August, 2023; originally announced August 2023.

  4. arXiv:2305.13774  [pdf, other

    cs.SD eess.AS

    ADD 2023: the Second Audio Deepfake Detection Challenge

    Authors: Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li

    Abstract: Audio deepfake detection is an emerging topic in the artificial intelligence community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to spur researchers around the world to build new innovative technologies that can further accelerate and foster research on detecting and analyzing deepfake speech utterances. Different from previous challenges (e.g. ADD 2022), ADD 2023 focuses on s… ▽ More

    Submitted 23 May, 2023; originally announced May 2023.

  5. arXiv:2303.12704  [pdf, other

    cs.CV eess.IV

    AptSim2Real: Approximately-Paired Sim-to-Real Image Translation

    Authors: Charles Y Zhang, Ashish Shrivastava

    Abstract: Advancements in graphics technology has increased the use of simulated data for training machine learning models. However, the simulated data often differs from real-world data, creating a distribution gap that can decrease the efficacy of models trained on simulation data in real-world applications. To mitigate this gap, sim-to-real domain transfer modifies simulated images to better match real-w… ▽ More

    Submitted 23 March, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  6. arXiv:2302.03839  [pdf, other

    eess.IV cs.CV cs.LG

    Futuristic Variations and Analysis in Fundus Images Corresponding to Biological Traits

    Authors: Muhammad Hassan, Hao Zhang, Ahmed Fateh Ameen, Home Wu Zeng, Shuye Ma, Wen Liang, Dingqi Shang, Jiaming Ding, Ziheng Zhan, Tsz Kwan Lam, Ming Xu, Qiming Huang, Dongmei Wu, Can Yang Zhang, Zhou You, Awiwu Ain, Pei Wu Qin

    Abstract: Fundus image captures rear of an eye, and which has been studied for the diseases identification, classification, segmentation, generation, and biological traits association using handcrafted, conventional, and deep learning methods. In biological traits estimation, most of the studies have been carried out for the age prediction and gender classification with convincing results. However, the curr… ▽ More

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: 10 pages, 4 figures, 3 tables

  7. arXiv:2212.10191  [pdf, other

    cs.SD cs.CL cs.LG eess.AS

    Emotion Selectable End-to-End Text-based Speech Editing

    Authors: Tao Wang, Jiangyan Yi, Ruibo Fu, Jianhua Tao, Zhengqi Wen, Chu Yuan Zhang

    Abstract: Text-based speech editing allows users to edit speech by intuitively cutting, copying, and pasting text to speed up the process of editing speech. In the previous work, CampNet (context-aware mask prediction network) is proposed to realize text-based speech editing, significantly improving the quality of edited speech. This paper aims at a new task: adding emotional effect to the editing speech du… ▽ More

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: Under review, 12 pages, 11 figures, demo page is available at https://hairuo55.github.io/Emo-CampNet/

  8. arXiv:2211.06073  [pdf, other

    cs.SD cs.CL eess.AS

    SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection

    Authors: Jiangyan Yi, Chenglong Wang, Jianhua Tao, Chu Yuan Zhang, Cunhang Fan, Zhengkun Tian, Haoxin Ma, Ruibo Fu

    Abstract: Many datasets have been designed to further the development of fake audio detection. However, fake utterances in previous datasets are mostly generated by altering timbre, prosody, linguistic content or channel noise of original audio. These datasets leave out a scenario, in which the acoustic scene of an original audio is manipulated with a forged one. It will pose a major threat to our society i… ▽ More

    Submitted 4 April, 2024; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: Accepted by Pattern Recognition, 1 April 2024