Search | arXiv e-print repository

Showing 1–48 of 48 results for author: Zhai, W

Searching in archive cs. Search in all archives.
  1. arXiv:2407.21510  [pdf, other]

    cs.CV

    PEAR: Phrase-Based Hand-Object Interaction Anticipation

    Authors: Zichen Zhang, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang

    Abstract: First-person hand-object interaction anticipation aims to predict the interaction process over a forthcoming period based on current scenes and prompts. This capability is crucial for embodied intelligence and human-robot collaboration. The complete interaction process involves both pre-contact interaction intention (i.e., hand motion trends and interaction hotspots) and post-contact interaction m…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 22 pages, 10 figures, 4 tables

  2. arXiv:2407.16131  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Crystals with Transformers on Graphs, for Prediction of Unconventional Crystal Material Properties and the Benchmark

    Authors: Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, Sheng Gong, Bolong Huang, Hua Zhang

    Abstract: The ionic bonding across the lattice and ordered microscopic structures endow crystals with unique symmetry and determine their macroscopic properties. Unconventional crystals, in particular, exhibit non-traditional lattice structures or possess exotic physical properties, making them intriguing subjects for investigation. Therefore, to accurately predict the physical and chemical properties of cr…

    Submitted 22 July, 2024; originally announced July 2024.

  3. arXiv:2405.13659  [pdf, other]

    cs.CV

    EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views

    Authors: Yuhang Yang, Wei Zhai, Chengfeng Wang, Chengjun Yu, Yang Cao, Zheng-Jun Zha

    Abstract: Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-centric perception, facilitating applications like AR/VR and embodied AI. For the egocentric HOI, in addition to perceiving semantics e.g., "what" interaction is occurring, capturing "where" the interaction specifically manifests in 3D space is also crucial, which links the perception and operation. Existi…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 23 pages, 10 figures

  4. arXiv:2405.11794  [pdf, other]

    cs.CV

    ViViD: Video Virtual Try-on using Diffusion Models

    Authors: Zixun Fang, Wei Zhai, Aimin Su, Hongliang Song, Kai Zhu, Mao Wang, Yu Chen, Zhiheng Liu, Yang Cao, Zheng-Jun Zha

    Abstract: Video virtual try-on aims to transfer a clothing item onto the video of a target person. Directly applying the technique of image-based try-on to the video domain in a frame-wise manner will cause temporal-inconsistent outcomes while previous video-based try-on solutions can only generate low visual quality and blurring results. In this work, we present ViViD, a novel framework employing powerful…

    Submitted 28 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  5. arXiv:2405.07030  [pdf]

    cs.LG

    Lasso Ridge based XGBoost and Deep_LSTM Help Tennis Players Perform better

    Authors: Wankang Zhai, Yuhan Wang

    Abstract: Understanding the dynamics of momentum and game fluctuation in tennis matches is crucial for predicting match outcomes and enhancing player performance. In this study, we present a comprehensive analysis of these factors using a dataset from the 2023 Wimbledon final. Initially, we develop a sliding-window-based scoring model to assess player performance, accounting for the influence of serving…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 22 pages, 11 figures

    MSC Class: I.2

  6. arXiv:2405.06992  [pdf]

    cs.LG stat.AP

    ResSurv: Cancer Survival Analysis Prediction Model Based on Residual Networks

    Authors: Wankang Zhai

    Abstract: Survival prediction is an important branch of cancer prognosis analysis. The model that predicts survival risk through TCGA genomics data can discover genes related to cancer and provide diagnosis and treatment recommendations based on patient characteristics. We found that deep learning models based on Cox proportional hazards often suffer from overfitting when dealing with high-throughput data.…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 7 pages, 7 figures

    MSC Class: I.2

  7. arXiv:2405.05552  [pdf, other]

    cs.CV

    Bidirectional Progressive Transformer for Interaction Intention Anticipation

    Authors: Zichen Zhang, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang

    Abstract: Interaction intention anticipation aims to jointly predict future hand trajectories and interaction hotspots. Existing research often treated trajectory forecasting and interaction hotspots prediction as separate tasks or solely considered the impact of trajectories on interaction hotspots, which led to the accumulation of prediction errors over time. However, a deeper inherent connection exists b…

    Submitted 9 May, 2024; originally announced May 2024.

  8. arXiv:2404.17835  [pdf, other]

    cs.CL

    VANER: Leveraging Large Language Model for Versatile and Adaptive Biomedical Named Entity Recognition

    Authors: Junyi Bian, Weiqi Zhai, Xiaodi Huang, Jiaxuan Zheng, Shanfeng Zhu

    Abstract: Prevalent solution for BioNER involves using representation learning techniques coupled with sequence labeling. However, such methods are inherently task-specific, demonstrate poor generalizability, and often require dedicated model for each dataset. To leverage the versatile capabilities of recently remarkable large language models (LLMs), several endeavors have explored generative approaches to…

    Submitted 27 April, 2024; originally announced April 2024.

  9. arXiv:2404.12659  [pdf, ps, other]

    cs.CL

    SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis

    Authors: Hongzhi Qi, Hanfei Liu, Jianqiang Li, Qing Zhao, Wei Zhai, Dan Luo, Tian Yu He, Shuo Liu, Bing Xiang Yang, Guanghui Fu

    Abstract: In the social media, users frequently express personal emotions, a subset of which may indicate potential suicidal tendencies. The implicit and varied forms of expression in internet language complicate accurate and rapid identification of suicidal intent on social media, thus creating challenges for timely intervention efforts. The development of deep learning models for suicide risk detection is…

    Submitted 19 April, 2024; originally announced April 2024.

  10. arXiv:2404.12083  [pdf, other]

    cs.CV

    MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking

    Authors: Zhong Wang, Zengyu Wan, Han Han, Bohao Liao, Yuliang Wu, Wei Zhai, Yang Cao, Zheng-jun Zha

    Abstract: Event-based eye tracking has shown great promise with the high temporal resolution and low redundancy provided by the event camera. However, the diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization. To achieve a stable event-based eye-tracking system, this paper proposes a bidirectional long-…

    Submitted 30 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024 Workshop (AIS: Vision, Graphics and AI for Streaming), top solution of challenge Event-based Eye Tracking, see https://www.kaggle.com/competitions/event-based-eye-tracking-ais2024

  11. arXiv:2404.11770  [pdf, other]

    cs.CV cs.AI

    Event-Based Eye Tracking. AIS 2024 Challenge Survey

    Authors: Zuowen Wang, Chang Gao, Zongwei Wu, Marcos V. Conde, Radu Timofte, Shih-Chii Liu, Qinyu Chen, Zheng-jun Zha, Wei Zhai, Han Han, Bohao Liao, Yuliang Wu, Zengyu Wan, Zhong Wang, Yang Cao, Ganchao Tan, Jinze Chen, Yan Ru Pei, Sasskia Brüers, Sébastien Crouzet, Douglas McLelland, Oliver Coenen, Baoheng Zhang, Yizhao Gao, Jingyuan Li, et al. (14 additional authors not shown)

    Abstract: This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The task of the challenge focuses on processing eye movement recorded with event cameras and predicting the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras to achieve good task accuracy and efficiency trade-off. During the challenge period, 38 participants registered for the Kaggl…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Qinyu Chen is the corresponding author

  12. arXiv:2404.11449  [pdf, other]

    cs.CL cs.LG

    AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts

    Authors: Meng Jiang, Yi Jing Yu, Qing Zhao, Jianqiang Li, Changwei Song, Hongzhi Qi, Wei Zhai, Dan Luo, Xiaoqin Wang, Guanghui Fu, Bing Xiang Yang

    Abstract: Cognitive Behavioral Therapy (CBT) is an effective technique for addressing the irrational thoughts stemming from mental illnesses, but it necessitates precise identification of cognitive pathways to be successfully implemented in patient care. In current society, individuals frequently express negative emotions on social media on specific topics, often exhibiting cognitive distortions, including…

    Submitted 17 April, 2024; originally announced April 2024.

  13. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  14. arXiv:2403.12381  [pdf, other]

    cs.CE

    Explainable AutoML (xAutoML) with adaptive modeling for yield enhancement in semiconductor smart manufacturing

    Authors: Weihong Zhai, Xiupeng Shi, Yiik Diew Wong, Qing Han, Lisheng Chen

    Abstract: Enhancing yield is recognized as a paramount driver to reducing production costs in semiconductor smart manufacturing. However, optimizing and ensuring high yield rates is a highly complex and technical challenge, especially while maintaining reliable yield diagnosis and prognosis, and this shall require understanding all the confounding factors in a complex condition. This study proposes a domain…

    Submitted 18 March, 2024; originally announced March 2024.

  15. arXiv:2403.09392  [pdf, other]

    eess.IV cs.CV

    Event-based Asynchronous HDR Imaging by Temporal Incident Light Modulation

    Authors: Yuliang Wu, Ganchao Tan, Jinze Chen, Wei Zhai, Yang Cao, Zheng-Jun Zha

    Abstract: Dynamic Range (DR) is a pivotal characteristic of imaging systems. Current frame-based cameras struggle to achieve high dynamic range imaging due to the conflict between globally uniform exposure and spatially variant scene illumination. In this paper, we propose AsynHDR, a Pixel-Asynchronous HDR imaging system, based on key insights into the challenges in HDR imaging and the unique event-generati…

    Submitted 14 March, 2024; originally announced March 2024.

  16. arXiv:2403.09194  [pdf, other]

    cs.CV

    Intention-driven Ego-to-Exo Video Generation

    Authors: Hongchen Luo, Kai Zhu, Wei Zhai, Yang Cao

    Abstract: Ego-to-exo video generation refers to generating the corresponding exocentric video according to the egocentric video, providing valuable applications in AR/VR and embodied AI. Benefiting from advancements in diffusion model techniques, notable progress has been achieved in video generation. However, existing methods build upon the spatiotemporal consistency assumptions between adjacent frames, wh…

    Submitted 17 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  17. arXiv:2402.09151  [pdf, other]

    cs.CL cs.LG

    Chinese MentalBERT: Domain-Adaptive Pre-training on Social Media for Chinese Mental Health Text Analysis

    Authors: Wei Zhai, Hongzhi Qi, Qing Zhao, Jianqiang Li, Ziqi Wang, Han Wang, Bing Xiang Yang, Guanghui Fu

    Abstract: In the current environment, psychological issues are prevalent and widespread, with social media serving as a key outlet for individuals to share their feelings. This results in the generation of vast quantities of data daily, where negative emotions have the potential to precipitate crisis situations. There is a recognized need for models capable of efficient analysis. While pre-trained language…

    Submitted 12 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  18. arXiv:2312.08963  [pdf, other]

    cs.CV

    LEMON: Learning 3D Human-Object Interaction Relation from 2D Images

    Authors: Yuhang Yang, Wei Zhai, Hongchen Luo, Yang Cao, Zheng-Jun Zha

    Abstract: Learning 3D human-object interaction relation is pivotal to embodied AI and interaction modeling. Most existing methods approach the goal by learning to predict isolated interaction elements, e.g., human contact, object affordance, and human-object spatial relation, primarily from the perspective of either the human or the object. Which underexploit certain correlations between the interaction cou…

    Submitted 30 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024

  19. arXiv:2312.01732  [pdf, other]

    cs.CV

    Likelihood-Aware Semantic Alignment for Full-Spectrum Out-of-Distribution Detection

    Authors: Fan Lu, Kai Zhu, Kecheng Zheng, Wei Zhai, Yang Cao

    Abstract: Full-spectrum out-of-distribution (F-OOD) detection aims to accurately recognize in-distribution (ID) samples while encountering semantic and covariate shifts simultaneously. However, existing out-of-distribution (OOD) detectors tend to overfit the covariance information and ignore intrinsic semantic correlation, inadequate for adapting to complex domain transformations. To address this issue, we…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 16 pages, 7 figures

  20. arXiv:2310.04910  [pdf, other]

    cs.CL cs.AI

    Faithful Knowledge Graph Explanations for Commonsense Reasoning

    Authors: Weihe Zhai, Arkaitz Zubiaga

    Abstract: The fusion of language models (LMs) and knowledge graphs (KGs) is widely used in commonsense question answering, but generating faithful explanations remains challenging. Current methods often overlook path decoding faithfulness, leading to divergence between graph encoder outputs and model predictions. We identify confounding effects and LM-KG misalignment as key factors causing spurious explanat…

    Submitted 22 June, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

  21. arXiv:2309.12943  [pdf, other]

    cs.CV

    Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation

    Authors: Wei Zhai, Pingyu Wu, Kai Zhu, Yang Cao, Feng Wu, Zheng-Jun Zha

    Abstract: Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels. Recently, a new paradigm has emerged by generating a foreground prediction map (FPM) to achieve pixel-level localization. While existing FPM-based methods use cross-entropy to evaluate the foreground prediction map and to guide the learning of the generator, this paper presents tw…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: Accepted by IJCV. arXiv admin note: text overlap with arXiv:2112.00580

  22. arXiv:2309.03564  [pdf, other]

    cs.CL cs.LG

    Supervised Learning and Large Language Model Benchmarks on Mental Health Datasets: Cognitive Distortions and Suicidal Risks in Chinese Social Media

    Authors: Hongzhi Qi, Qing Zhao, Jianqiang Li, Changwei Song, Wei Zhai, Dan Luo, Shuo Liu, Yi Jing Yu, Fan Wang, Huijing Zou, Bing Xiang Yang, Guanghui Fu

    Abstract: On social media, users often express their personal feelings, which may exhibit cognitive distortions or even suicidal tendencies on certain specific topics. Early recognition of these signs is critical for effective psychological intervention. In this paper, we introduce two novel datasets from Chinese social media: SOS-HL-1K for suicidal risk classification and SocialCD-3K for cognitive distorti…

    Submitted 9 June, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: 10 pages

  23. arXiv:2308.15192  [pdf, other]

    cs.AI cs.CL

    Enhancing Psychological Counseling with Large Language Model: A Multifaceted Decision-Support System for Non-Professionals

    Authors: Guanghui Fu, Qing Zhao, Jianqiang Li, Dan Luo, Changwei Song, Wei Zhai, Shuo Liu, Fan Wang, Yan Wang, Lijuan Cheng, Juan Zhang, Bing Xiang Yang

    Abstract: In the contemporary landscape of social media, an alarming number of users express negative emotions, some of which manifest as strong suicidal intentions. This situation underscores a profound need for trained psychological counselors who can enact effective mental interventions. However, the development of these professionals is often an imperative but time-consuming task. Consequently, the mobi…

    Submitted 29 August, 2023; originally announced August 2023.

  24. arXiv:2308.11981  [pdf, other]

    cs.DC

    Federated Semi-Supervised and Semi-Asynchronous Learning for Anomaly Detection in IoT Networks

    Authors: Wenbin Zhai, Feng Wang, Liang Liu, Youwei Ding, Wanying Lu

    Abstract: Existing FL-based approaches are based on the unrealistic assumption that the data on the client-side is fully annotated with ground truths. Furthermore, it is a great challenge how to improve the training efficiency while ensuring the detection accuracy in the highly heterogeneous and resource-constrained IoT networks. Meanwhile, the communication cost between clients and the server is also a pro…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: 18 pages, 5 figures

  25. arXiv:2308.11977  [pdf, other]

    cs.DC

    ESTA: An Efficient Spatial-Temporal Range Aggregation Query Processing Algorithm for UAV Networks

    Authors: Wenbin Zhai, Xin Li, Liang Liu, Youwei Ding, Wanying Lu

    Abstract: Unmanned Aerial Vehicle (UAV) networks have been widely used in both military and civilian scenarios. When users are interested in the statistical information of the historical sensory data in a certain region during a certain time period, they will send an aggregation query request with a spatial-temporal constraint to target UAVs which store the qualified data. Then, the target UAVs will return…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: 14 pages, 14 figures

  26. arXiv:2306.15977  [pdf, other]

    cs.CV cs.AI

    A Dimensional Structure based Knowledge Distillation Method for Cross-Modal Learning

    Authors: Lingyu Si, Hongwei Dong, Wenwen Qiang, Junzhi Yu, Wenlong Zhai, Changwen Zheng, Fanjiang Xu, Fuchun Sun

    Abstract: Due to limitations in data quality, some essential visual tasks are difficult to perform independently. Introducing previously unavailable information to transfer informative dark knowledge has been a common way to solve such hard tasks. However, research on why transferred knowledge works has not been extensively explored. To address this issue, in this paper, we discover the correlation between…

    Submitted 28 June, 2023; originally announced June 2023.

  27. arXiv:2306.15736  [pdf, other]

    cs.CL

    DMNER: Biomedical Entity Recognition by Detection and Matching

    Authors: Junyi Bian, Rongze Jiang, Weiqi Zhai, Tianyang Huang, Hong Zhou, Shanfeng Zhu

    Abstract: Biomedical named entity recognition (BNER) serves as the foundation for numerous biomedical text mining tasks. Unlike general NER, BNER require a comprehensive grasp of the domain, and incorporating external knowledge beyond training data poses a significant challenge. In this study, we propose a novel BNER framework called DMNER. By leveraging existing entity representation models SAPBERT, we tac…

    Submitted 5 July, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: 9 pages content, 2 pages appendix

  28. arXiv:2303.10449  [pdf, other]

    cs.CV

    Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection

    Authors: Fan Lu, Kai Zhu, Wei Zhai, Kecheng Zheng, Yang Cao

    Abstract: Semantically coherent out-of-distribution (SCOOD) detection aims to discern outliers from the intended data distribution with access to unlabeled extra set. The coexistence of in-distribution and out-of-distribution samples will exacerbate the model overfitting when no distinction is made. To address this problem, we propose a novel uncertainty-aware optimal transport scheme. Our scheme consists o…

    Submitted 21 March, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR2023

  29. arXiv:2303.10438  [pdf, other]

    cs.CV

    Spatial-Aware Token for Weakly Supervised Object Localization

    Authors: Pingyu Wu, Wei Zhai, Yang Cao, Jiebo Luo, Zheng-Jun Zha

    Abstract: Weakly supervised object localization (WSOL) is a challenging task aiming to localize objects with only image-level supervision. Recent works apply visual transformer to WSOL and achieve significant success by exploiting the long-range feature dependency in self-attention mechanism. However, existing transformer-based methods synthesize the classification feature maps as the localization map, whic…

    Submitted 9 August, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

    Comments: Accepted by ICCV 2023. Code: https://github.com/wpy1999/SAT

  30. arXiv:2303.10437  [pdf, other]

    cs.CV

    Grounding 3D Object Affordance from 2D Interactions in Images

    Authors: Yuhang Yang, Wei Zhai, Hongchen Luo, Yang Cao, Jiebo Luo, Zheng-Jun Zha

    Abstract: Grounding 3D object affordance seeks to locate objects' "action possibilities" regions in the 3D space, which serves as a link between perception and operation for embodied agents. Existing studies primarily focus on connecting visual affordances with geometry structures, e.g. relying on annotations to declare interactive regions of interest on the object and establishing a mapping between the r…

    Submitted 9 August, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

    Comments: ICCV2023, camera-ready version

  31. arXiv:2208.13196  [pdf, other]

    cs.CV

    Grounded Affordance from Exocentric View

    Authors: Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao

    Abstract: Affordance grounding aims to locate objects' "action possibilities" regions, which is an essential step toward embodied intelligence. Due to the diversity of interactive affordance, the uniqueness of different individuals leads to diverse interactions, which makes it difficult to establish an explicit link between object parts and affordance labels. Human has the ability that transforms the variou…

    Submitted 25 May, 2023; v1 submitted 28 August, 2022; originally announced August 2022.

    Comments: arXiv admin note: text overlap with arXiv:2203.09905

  32. arXiv:2203.11849  [pdf, other]

    cs.CL cs.CR cs.LG

    A Girl Has A Name, And It's ... Adversarial Authorship Attribution for Deobfuscation

    Authors: Wanyue Zhai, Jonathan Rusert, Zubair Shafiq, Padmini Srinivasan

    Abstract: Recent advances in natural language processing have enabled powerful privacy-invasive authorship attribution. To counter authorship attribution, researchers have proposed a variety of rule-based and learning-based text obfuscation approaches. However, existing authorship obfuscation approaches do not consider the adversarial threat model. Specifically, they are not evaluated against adversarially…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: 9 pages, 7 figures, 3 tables, ACL 2022

  33. arXiv:2203.09905  [pdf, other]

    cs.CV

    Learning Affordance Grounding from Exocentric Images

    Authors: Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao

    Abstract: Affordance grounding, a task to ground (i.e., localize) action possibility region in objects, which faces the challenge of establishing an explicit link with object parts due to the diversity of interactive affordance. Human has the ability that transform the various exocentric interactions to invariant egocentric affordance so as to counter the impact of interactive diversity. To empower an agent…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: CVPR2022

  34. arXiv:2203.09845  [pdf, other]

    cs.CV

    Location-Free Camouflage Generation Network

    Authors: Yangyang Li, Wei Zhai, Yang Cao, Zheng-jun Zha

    Abstract: Camouflage is a common visual phenomenon, which refers to hiding the foreground objects into the background images, making them briefly invisible to the human eye. Previous work has typically been implemented by an iterative optimization process. However, these methods struggle in 1) efficiently generating camouflage images using foreground and background with arbitrary structure; 2) camouflaging…

    Submitted 18 March, 2022; originally announced March 2022.

  35. arXiv:2203.06359  [pdf, other]

    cs.CV

    Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning

    Authors: Kai Zhu, Wei Zhai, Yang Cao, Jiebo Luo, Zheng-Jun Zha

    Abstract: Non-exemplar class-incremental learning is to recognize both the old and new classes when old class samples cannot be saved. It is a challenging task since representation optimization and feature retention can only be achieved under supervision from new classes. To address this problem, we propose a novel self-sustaining representation expansion scheme. Our scheme consists of a structure reorganiz…

    Submitted 16 March, 2022; v1 submitted 12 March, 2022; originally announced March 2022.

    Comments: Camera_Ready Version for CVPR 2022

  36. arXiv:2202.12076  [pdf, other]

    cs.CV cs.AI

    Phrase-Based Affordance Detection via Cyclic Bilateral Interaction

    Authors: Liangsheng Lu, Wei Zhai, Hongchen Luo, Yu Kang, Yang Cao

    Abstract: Affordance detection, which refers to perceiving objects with potential action possibilities in images, is a challenging task since the possible affordance depends on the person's purpose in real-world application scenarios. The existing works mainly extract the inherent human-object dependencies from image/video to accommodate affordance properties that change dynamically. In this paper, we explo…

    Submitted 24 February, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

  37. arXiv:2112.00580  [pdf, other]

    cs.CV

    Background Activation Suppression for Weakly Supervised Object Localization

    Authors: Pingyu Wu, Wei Zhai, Yang Cao

    Abstract: Weakly supervised object localization (WSOL) aims to localize objects using only image-level labels. Recently a new paradigm has emerged by generating a foreground prediction map (FPM) to achieve localization task. Existing FPM-based methods use cross-entropy (CE) to evaluate the foreground prediction map and to guide the learning of generator. We argue for using activation value to achieve more e…

    Submitted 2 April, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: Accepted by CVPR 2022. Code: https://github.com/wpy1999/BAS

  38. arXiv:2110.05700  [pdf, other]

    cs.CV

    On Exploring and Improving Robustness of Scene Text Detection Models

    Authors: Shilian Wu, Wei Zhai, Yongrui Li, Kewei Wang, Zengfu Wang

    Abstract: It is crucial to understand the robustness of text detection models with regard to extensive corruptions, since scene text detection techniques have many practical applications. For systematically exploring this problem, we propose two datasets from which to evaluate scene text detection models: ICDAR2015-C (IC15-C) and CTW1500-C (CTW-C). Our study extends the investigation of the performance and…

    Submitted 11 October, 2021; originally announced October 2021.

  39. arXiv:2108.05675  [pdf, other]

    cs.CV

    Learning Visual Affordance Grounding from Demonstration Videos

    Authors: Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao

    Abstract: Visual affordance grounding aims to segment all possible interaction regions between people and objects from an image/video, which is beneficial for many applications, such as robot grasping and action recognition. However, existing methods mainly rely on the appearance feature of the objects to segment each region of the image, which face the following two problems: (i) there are multiple possibl…

    Submitted 12 August, 2021; originally announced August 2021.

  40. arXiv:2108.03658  [pdf, other]

    cs.CV

    One-Shot Object Affordance Detection in the Wild

    Authors: Wei Zhai, Hongchen Luo, Jing Zhang, Yang Cao, Dacheng Tao

    Abstract: Affordance detection refers to identifying the potential action possibilities of objects in an image, which is a crucial ability for robot perception and manipulation. To empower robots with this ability in unseen scenarios, we first study the challenging one-shot affordance detection problem in this paper, i.e., given a support image that depicts the action purpose, all objects in a scene with th…

    Submitted 8 August, 2021; originally announced August 2021.

  41. arXiv:2107.08918  [pdf, other]

    cs.CV

    Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning

    Authors: Kai Zhu, Yang Cao, Wei Zhai, Jie Cheng, Zheng-Jun Zha

    Abstract: Few-shot class-incremental learning is to recognize the new classes given few samples and not forget the old classes. It is a challenging task since representation optimization and prototype reorganization can only be achieved under little supervision. To address this problem, we propose a novel incremental prototype learning scheme. Our scheme consists of a random episode selection strategy that…

    Submitted 19 July, 2021; originally announced July 2021.

    Comments: Accepted by CVPR 2021

  42. arXiv:2106.14747  [pdf, other]

    cs.CV

    One-Shot Affordance Detection

    Authors: Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao

    Abstract: Affordance detection refers to identifying the potential action possibilities of objects in an image, which is an important ability for robot perception and manipulation. To empower robots with this ability in unseen scenarios, we consider the challenging one-shot affordance detection problem in this paper, i.e., given a support image that depicts the action purpose, all objects in a scene with th…

    Submitted 28 June, 2021; originally announced June 2021.

  43. arXiv:2103.00829  [pdf, ps, other]

    cs.IT eess.SP

    6G Downlink Transmission via Rate Splitting Space Division Multiple Access Based on Grouped Code Index Modulation

    Authors: Wenchao Zhai, Yishan Wu, Jun Zhao, Huimei Han

    Abstract: A novel rate splitting space division multiple access (SDMA) scheme based on grouped code index modulation (GrCIM) is proposed for the sixth generation (6G) downlink transmission. The proposed RSMA-GrCIM scheme transmits information to multiple user equipments (UEs) through the space division multiple access (SDMA) technique, and exploits code index modulation for rate splitting. Since the CIM sch…

    Submitted 1 March, 2021; originally announced March 2021.

  44. arXiv:2012.13539  [pdf, ps, other]

    cs.IT eess.SP

    A GCICA Grant-Free Random Access Scheme for M2M Communications in Crowded Massive MIMO Systems

    Authors: Huimei Han, Lushun Fang, Weidang Lu, Wenchao Zhai, Ying Li, Jun Zhao

    Abstract: A grant-free random access scheme with a high success rate is proposed to support massive access for machine-to-machine communications in massive multiple-input multiple-output systems. This scheme allows active user equipments (UEs) to transmit their modulated uplink messages along with super pilots consisting of multiple sub-pilots to a base station (BS). Then, the BS performs channel state informati…

    Submitted 25 December, 2020; originally announced December 2020.

  45. arXiv:2012.13537  [pdf, ps, other]

    eess.SP cs.NI

    An LSTM-Aided Hybrid Random Access Scheme for 6G Machine Type Communication Networks

    Authors: Wenchao Zhai, Huimei Han, Lei Liu, Jun Zhao

    Abstract: In this paper, an LSTM-aided hybrid random access scheme (LSTMH-RA) is proposed to support diverse quality of service (QoS) requirements in 6G machine-type communication (MTC) networks, where massive MTC (mMTC) devices and ultra-reliable low latency communications (URLLC) devices coexist. In the proposed LSTMH-RA scheme, mMTC devices access the network via a timing advance (TA)-aided four-step pro…

    Submitted 29 July, 2022; v1 submitted 25 December, 2020; originally announced December 2020.

  46. arXiv:2004.05538  [pdf, other]

    cs.CV

    Self-Supervised Tuning for Few-Shot Segmentation

    Authors: Kai Zhu, Wei Zhai, Zheng-Jun Zha, Yang Cao

    Abstract: Few-shot segmentation aims at assigning a category label to each image pixel with few annotated samples. It is a challenging task since the dense prediction can only be achieved under the guidance of latent features defined by sparse annotations. Existing meta-learning methods tend to fail to generate category-specific discriminative descriptors when the visual features extracted from support…

    Submitted 13 December, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

    Comments: Accepted to IJCAI 2020

  47. arXiv:1905.06656  [pdf, other]

    cs.CV cs.AI

    One-Shot Texture Retrieval with Global Context Metric

    Authors: Kai Zhu, Wei Zhai, Zheng-Jun Zha, Yang Cao

    Abstract: In this paper, we tackle one-shot texture retrieval: given an example of a new reference texture, detect and segment all the pixels of the same texture category within an arbitrary image. To address this problem, we present an OS-TR network that encodes both the reference and query images to achieve texture segmentation for the reference category. Unlike the existing texture encoding methods…

    Submitted 11 April, 2020; v1 submitted 16 May, 2019; originally announced May 2019.

    Comments: ijcai2019-lastest

  48. arXiv:1902.07137  [pdf, ps, other]

    cs.LG stat.ML

    Recovery of a mixture of Gaussians by sum-of-norms clustering

    Authors: Tao Jiang, Stephen Vavasis, Chen Wen Zhai

    Abstract: Sum-of-norms clustering is a method for assigning $n$ points in $\mathbb{R}^d$ to $K$ clusters, $1\le K\le n$, using convex optimization. Recently, Panahi et al.\ proved that sum-of-norms clustering is guaranteed to recover a mixture of Gaussians under the restriction that the number of samples is not too large. The purpose of this note is to lift this restriction, i.e., show that sum-of-norms clu…

    Submitted 19 February, 2019; originally announced February 2019.