Search | arXiv e-print repository

Showing 1–50 of 531 results for author: Hu, W

Searching in archive cs.
  1. arXiv:2406.00403  [pdf, other]

    cs.LG cs.AI

    Dual-perspective Cross Contrastive Learning in Graph Transformers

    Authors: Zelin Yao, Chuang Liu, Xueqi Ma, Mukun Chen, Jia Wu, Xiantao Cai, Bo Du, Wenbin Hu

    Abstract: Graph contrastive learning (GCL) is a popular method for learning graph representations by maximizing the consistency of features across augmented views. Traditional GCL methods utilize single-perspective (i.e., data- or model-perspective) augmentation to generate positive samples, restraining the diversity of positive samples. In addition, these positive samples may be unreliable due to uncontrollabl…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures, submitted to IEEE TKDE

  2. arXiv:2405.20279  [pdf, other]

    cs.CV cs.AI eess.IV

    CV-VAE: A Compatible Video VAE for Latent Generative Video Models

    Authors: Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan

    Abstract: Spatio-temporal compression of videos, utilizing networks such as Variational Autoencoders (VAE), plays a crucial role in OpenAI's SORA and numerous other video generative models. For instance, many LLM-like video models learn the distribution of discrete tokens derived from 3D VAEs within the VQVAE framework, while most diffusion-based video models capture the distribution of continuous latent ex…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Project Page: https://ailab-cvc.github.io/cvvae/index.html

  3. arXiv:2405.19958  [pdf, other]

    cs.CL cs.AI

    Multi-Aspect Controllable Text Generation with Disentangled Counterfactual Augmentation

    Authors: Yi Liu, Xiangyu Liu, Xiangrong Zhu, Wei Hu

    Abstract: Multi-aspect controllable text generation aims to control the generated texts in attributes from multiple aspects (e.g., "positive" from sentiment and "sport" from topic). For ease of obtaining training samples, existing works neglect attribute correlations formed by the intertwining of different attributes. Particularly, the stereotype formed by imbalanced attribute correlations significantly aff…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted in the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

  4. arXiv:2405.19782  [pdf, other]

    cs.SE cs.CL

    Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion

    Authors: Wei Cheng, Yuhan Wu, Wei Hu

    Abstract: Recent years have witnessed the deployment of code language models (LMs) in various code intelligence tasks such as code completion. Yet, it is challenging for pre-trained LMs to generate correct completions in private repositories. Previous studies retrieve cross-file context based on import relations or text similarity, which is insufficiently relevant to completion targets. In this paper, we pr…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted in the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

  5. arXiv:2405.19315  [pdf, other]

    cs.CV cs.CL cs.LG

    Matryoshka Query Transformer for Large Vision-Language Models

    Authors: Wenbo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang

    Abstract: Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model. Despite their strong performance, LVLMs face challenges in adapting to varying computational constraints. This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resource…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Preprint. Our code and model are publicly available at https://github.com/gordonhu608/MQT-LLaVA

  6. arXiv:2405.18435  [pdf, other]

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…

    Submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  7. arXiv:2405.17875  [pdf, other]

    math.OC cs.LG

    BO4IO: A Bayesian optimization approach to inverse optimization with uncertainty quantification

    Authors: Yen-An Lu, Wei-Shou Hu, Joel A. Paulson, Qi Zhang

    Abstract: This work addresses data-driven inverse optimization (IO), where the goal is to estimate unknown parameters in an optimization model from observed decisions that can be assumed to be optimal or near-optimal solutions to the optimization problem. The IO problem is commonly formulated as a large-scale bilevel program that is notoriously difficult to solve. Deviating from traditional exact solution m…

    Submitted 28 May, 2024; originally announced May 2024.

  8. arXiv:2405.17811  [pdf, other]

    cs.GR cs.CV

    Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh

    Authors: Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang, Qi Zhang, Wenbo Hu, Chaopeng Zhang, Yao Yao, Ying Shan, Long Quan

    Abstract: Neural 3D representations such as Neural Radiance Fields (NeRF) excel at producing photo-realistic rendering results but lack the flexibility for manipulation and editing, which is crucial for content creation. Previous works have attempted to address this issue by deforming a NeRF in canonical space or manipulating the radiance field based on an explicit mesh. However, manipulating NeRF is not hi…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page here: https://gaoxiangjun.github.io/mani_gs/

  9. arXiv:2405.16122  [pdf, other]

    cs.AI cs.CL cs.LG stat.ML

    Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars

    Authors: Zhaoxuan Wu, Xiaoqiang Lin, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng, Patrick Jaillet, Bryan Kian Hsiang Low

    Abstract: Large language models (LLMs) have shown impressive capabilities in real-world applications. The capability of in-context learning (ICL) allows us to adapt an LLM to downstream tasks by including input-label exemplars in the prompt without model fine-tuning. However, the quality of these exemplars in the prompt greatly impacts performance, highlighting the need for an effective automated exemplar s…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 23 pages, 1 figure, 23 tables

  10. arXiv:2405.14545  [pdf, other]

    q-bio.BM cs.LG

    A Cross-Field Fusion Strategy for Drug-Target Interaction Prediction

    Authors: Hongzhi Zhang, Xiuwen Gong, Shirui Pan, Jia Wu, Bo Du, Wenbin Hu

    Abstract: Drug-target interaction (DTI) prediction is a critical component of the drug discovery process. In the drug development engineering field, predicting novel drug-target interactions is extremely crucial. However, although existing methods have achieved high accuracy levels in predicting known drugs and drug targets, they fail to utilize global protein information during DTI prediction. This leads to…

    Submitted 23 May, 2024; originally announced May 2024.

  11. arXiv:2405.14536  [pdf, other]

    q-bio.MN cs.AI cs.LG

    Regressor-free Molecule Generation to Support Drug Response Prediction

    Authors: Kun Li, Xiuwen Gong, Shirui Pan, Jia Wu, Bo Du, Wenbin Hu

    Abstract: Drug response prediction (DRP) is a crucial phase in drug discovery, and the most important metric for its evaluation is the IC50 score. DRP results are heavily dependent on the quality of the generated molecules. Existing molecule generation methods typically employ classifier-based guidance, enabling sampling within the IC50 classification range. However, these methods fail to ensure the samplin…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 22 pages, 7 figures, 9 tables

  12. arXiv:2405.13568  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    CPE-Identifier: Automated CPE identification and CVE summaries annotation with Deep Learning and NLP

    Authors: Wanyu Hu, Vrizlynn L. L. Thing

    Abstract: With the drastic increase in the number of new vulnerabilities in the National Vulnerability Database (NVD) every year, the workload for NVD analysts to associate the Common Platform Enumeration (CPE) with the Common Vulnerabilities and Exposures (CVE) summaries becomes increasingly laborious and slow. The delay causes organisations, which depend on NVD for vulnerability management and security me…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: International Conference on Information Systems Security and Privacy 2024

  13. arXiv:2405.11456  [pdf, other]

    cs.CR

    Biometrics-Based Authenticated Key Exchange with Multi-Factor Fuzzy Extractor

    Authors: Hong Yen Tran, Jiankun Hu, Wen Hu

    Abstract: Existing fuzzy extractors and similar methods provide an effective way for extracting a secret key from a user's biometric data, but are susceptible to impersonation attack: once a valid biometric sample is captured, the scheme is no longer secure. We propose a novel multi-factor fuzzy extractor that integrates both a user's secret (e.g., a password) and a user's biometrics in the generation and r…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 17 pages

  14. arXiv:2405.11001  [pdf, other]

    astro-ph.IM astro-ph.EP cs.RO

    Using physics-based simulation towards eliminating empiricism in extraterrestrial terramechanics applications

    Authors: Wei Hu, Pei Li, Arno Rogg, Alexander Schepelmann, Colin Creager, Samuel Chandler, Ken Kamrin, Dan Negrut

    Abstract: Recently, there has been a surge of international interest in extraterrestrial exploration targeting the Moon, Mars, the moons of Mars, and various asteroids. This contribution discusses how current state-of-the-art Earth-based testing for designing rovers and landers for these missions currently leads to overly optimistic conclusions about the behavior of these devices upon deployment on the targ…

    Submitted 16 May, 2024; originally announced May 2024.

  15. arXiv:2405.10642  [pdf, other]

    cs.LG

    Hi-GMAE: Hierarchical Graph Masked Autoencoders

    Authors: Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Dapeng Tao, Jia Wu, Wenbin Hu, Shirui Pan, Bo Du

    Abstract: Graph Masked Autoencoders (GMAEs) have emerged as a notable self-supervised learning approach for graph-structured data. Existing GMAE models primarily focus on reconstructing node-level information, categorizing them as single-scale GMAEs. This methodology, while effective in certain contexts, tends to overlook the complex hierarchical structures inherent in many real-world graphs. For instance,…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures, 3 tables

  16. arXiv:2405.10288  [pdf, other]

    cs.CL cs.AI

    Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction

    Authors: Jianhao Chen, Haoyuan Ouyang, Junyang Ren, Wentao Ding, Wei Hu, Yuzhong Qu

    Abstract: Fact extraction is pivotal for constructing knowledge graphs. Recently, the increasing demand for temporal facts in downstream tasks has led to the emergence of the task of temporal fact extraction. In this paper, we specifically address the extraction of temporal facts from natural language text. Previous studies fail to handle the challenge of establishing time-to-fact correspondences in comple…

    Submitted 2 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL2024 main conference

  17. arXiv:2405.06424  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation

    Authors: JoonHo Lee, Jae Oh Woo, Juree Seok, Parisa Hassanzadeh, Wooseok Jang, JuYoun Son, Sima Didari, Baruch Gutow, Heng Hao, Hankyu Moon, Wenjun Hu, Yeong-Dae Kwon, Taehee Lee, Seungjai Min

    Abstract: Assessing response quality to instructions in language models is vital but challenging due to the complexity of human language across different contexts. This complexity often results in ambiguous or inconsistent interpretations, making accurate assessment difficult. To address this issue, we propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for t…

    Submitted 19 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  18. arXiv:2405.05708  [pdf]

    cs.IT

    Characteristic-Mode Based Conformal Design of Ultra-Wideband Antenna Array

    Authors: Zhan Chen, Wei Hu, Yuchen Gao, Qi Luo, Xiangbo Wang, Steven Gao

    Abstract: An innovative design method of conformal array antennas is presented by utilizing characteristic mode analysis (CMA) in this work. A single-layer continuous perfect electric conductor under bending conditions is analyzed by CMA to evaluate the variations in operating performance. By using this method, the design process of a conformal array is simplified. The results indicate that the operating p…

    Submitted 9 May, 2024; originally announced May 2024.

  19. Rip-NeRF: Anti-aliasing Radiance Fields with Ripmap-Encoded Platonic Solids

    Authors: Junchen Liu, Wenbo Hu, Zhuo Yang, Jianteng Chen, Guoliang Wang, Xiaoxue Chen, Yantong Cai, Huan-ang Gao, Hao Zhao

    Abstract: Despite significant advancements in Neural Radiance Fields (NeRFs), the renderings may still suffer from aliasing and blurring artifacts, since it remains a fundamental challenge to effectively and efficiently characterize anisotropic areas induced by the cone-casting procedure. This paper introduces a Ripmap-Encoded Platonic Solid representation to precisely and efficiently featurize 3D anisotrop…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH 2024, Project page: https://junchenliu77.github.io/Rip-NeRF , Code: https://github.com/JunchenLiu77/Rip-NeRF

  20. arXiv:2404.15806  [pdf, other]

    cs.LG

    Where to Mask: Structure-Guided Masking for Graph Masked Autoencoders

    Authors: Chuang Liu, Yuyao Wang, Yibing Zhan, Xueqi Ma, Dapeng Tao, Jia Wu, Wenbin Hu

    Abstract: Graph masked autoencoders (GMAE) have emerged as a significant advancement in self-supervised pre-training for graph-structured data. Previous GMAE models primarily utilize a straightforward random masking strategy for nodes or edges during training. However, this strategy fails to consider the varying significance of different nodes within the graph structure. In this paper, we investigate the po…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 9 pages, 3 Figures. Accepted by IJCAI 2024

  21. arXiv:2404.15729  [pdf, other]

    cs.LG

    Gradformer: Graph Transformer with Exponential Decay

    Authors: Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Shirui Pan, Wenbin Hu

    Abstract: Graph Transformers (GTs) have demonstrated their advantages across a wide range of tasks. However, the self-attention mechanism in GTs overlooks the graph's inductive biases, particularly biases related to structure, which are crucial for the graph tasks. Although some methods utilize positional encoding and attention bias to model inductive biases, their effectiveness is still suboptimal analytic…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 9 pages, 7 figures. Accepted by IJCAI 2024

  22. arXiv:2404.13874  [pdf, other]

    cs.CL cs.CV

    VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models

    Authors: Haoyi Qiu, Wenbo Hu, Zi-Yi Dou, Nanyun Peng

    Abstract: Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs, undermining their reliability. A comprehensive quantitative evaluation is necessary to identify and understand the extent of hallucinations in these models. However, existing benchmarks are often limited in scope, focusing mainly on object hallucina…

    Submitted 5 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: ACL 2024 Findings

  23. arXiv:2404.13518  [pdf, other]

    cs.CR cs.AI

    Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion

    Authors: Hongyu Zhu, Sichu Liang, Wentao Hu, Fangqi Li, Ju Jia, Shilin Wang

    Abstract: With the rise of Machine Learning as a Service (MLaaS) platforms, safeguarding the intellectual property of deep learning models is becoming paramount. Among various protective measures, trigger set watermarking has emerged as a flexible and effective strategy for preventing unauthorized model distribution. However, this paper identifies an inherent flaw in the current paradigm of trigger set water…

    Submitted 20 April, 2024; originally announced April 2024.

  24. arXiv:2404.12210  [pdf, other]

    cs.CV

    An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training

    Authors: Jin Gao, Shubo Lin, Shaoru Wang, Yutong Kou, Zeming Li, Liang Li, Congxuan Zhang, Xiaoqin Zhang, Yizheng Wang, Weiming Hu

    Abstract: Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features. In this paper, we question if the extremely simple lightweight ViTs' fine-tuning performance can also benefit from this pre-training paradigm, which is considerably less studied yet in contrast to the well-esta…

    Submitted 25 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: A submission to IJCV

  25. arXiv:2404.10413  [pdf, other]

    cs.DB cs.LG cs.PF

    VDTuner: Automated Performance Tuning for Vector Data Management Systems

    Authors: Tiannuo Yang, Wen Hu, Wangqi Peng, Yusen Li, Jianguo Li, Gang Wang, Xiaoguang Liu

    Abstract: Vector data management systems (VDMSs) have become an indispensable cornerstone in large-scale information retrieval and machine learning systems like large language models. To enhance the efficiency and flexibility of similarity search, VDMS exposes many tunable index parameters and system parameters for users to specify. However, due to the inherent characteristics of VDMS, automatic performance…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted by ICDE 2024

  26. LoS Sensing-based Channel Estimation in UAV-Assisted OFDM Systems

    Authors: Chaojin Qing, Zhiying Liu, Wenquan Hu, Yinjie Zhang, Xi Cai, Pengfei Du

    Abstract: In unmanned aerial vehicle (UAV)-assisted orthogonal frequency division multiplexing (OFDM) systems, the potential advantage of the line-of-sight (LoS) path, characterized by its high probability of existence, has not been fully harnessed, thereby impeding the improvement of channel estimation (CE) accuracy. Inspired by the ideas of integrated sensing and communication (ISAC), this letter develops…

    Submitted 22 February, 2024; originally announced April 2024.

  27. arXiv:2404.01340  [pdf, other]

    cs.LG cs.AI

    From Similarity to Superiority: Channel Clustering for Time Series Forecasting

    Authors: Jialin Chen, Jan Eric Lenssen, Aosong Feng, Weihua Hu, Matthias Fey, Leandros Tassiulas, Jure Leskovec, Rex Ying

    Abstract: Time series forecasting has attracted significant attention in recent decades. Previous studies have demonstrated that the Channel-Independent (CI) strategy improves forecasting performance by treating different channels individually, while it leads to poor generalization on unseen instances and ignores potentially necessary interactions between channels. Conversely, the Channel-Dependent (CD) str…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 20 pages, 6 figures

  28. arXiv:2404.00776  [pdf, other]

    cs.LG cs.DB stat.ML

    PyTorch Frame: A Modular Framework for Multi-Modal Tabular Learning

    Authors: Weihua Hu, Yiwen Yuan, Zecheng Zhang, Akihiro Nitta, Kaidi Cao, Vid Kocijan, Jure Leskovec, Matthias Fey

    Abstract: We present PyTorch Frame, a PyTorch-based framework for deep learning over multi-modal tabular data. PyTorch Frame makes tabular deep learning easy by providing a PyTorch-based data structure to handle complex tabular data, introducing a model abstraction to enable modular implementation of tabular models, and allowing external foundation models to be incorporated to handle complex columns (e.g.,…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: https://github.com/pyg-team/pytorch-frame

  29. arXiv:2403.18142  [pdf, other]

    cs.LG

    HERTA: A High-Efficiency and Rigorous Training Algorithm for Unfolded Graph Neural Networks

    Authors: Yongyi Yang, Jiaming Yang, Wei Hu, Michał Dereziński

    Abstract: As a variant of Graph Neural Networks (GNNs), Unfolded GNNs offer enhanced interpretability and flexibility over traditional designs. Nevertheless, they still suffer from scalability challenges when it comes to the training cost. Although many methods have been proposed to address the scalability issues, they mostly focus on per-iteration efficiency, without worst-case convergence guarantees. More…

    Submitted 26 March, 2024; originally announced March 2024.

  30. arXiv:2403.16224  [pdf, other]

    cs.CV

    Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields

    Authors: Haoyuan Wang, Wenbo Hu, Lei Zhu, Rynson W. H. Lau

    Abstract: Inverse rendering aims at recovering both geometry and materials of objects. It provides a more compatible reconstruction for conventional rendering engines, compared with the neural radiance fields (NeRFs). On the other hand, existing NeRF-based inverse rendering methods cannot handle glossy objects with local light interactions well, as they typically oversimplify the illumination as a 2D enviro…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: CVPR 2024 paper. Project webpage https://whyy.site/paper/nep

  31. arXiv:2403.15530  [pdf, other]

    cs.CV

    Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

    Authors: Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, Hengshuang Zhao

    Abstract: 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance. However, it relies heavily on the quality of the initial point cloud, resulting in blurring and needle-like artifacts in areas with insufficient initializing points. This is mainly attributed to the point cloud growth condition in 3DGS that only considers the avera…

    Submitted 22 March, 2024; originally announced March 2024.

  32. arXiv:2403.15075  [pdf, other]

    cs.IR cs.AI

    Bilateral Unsymmetrical Graph Contrastive Learning for Recommendation

    Authors: Jiaheng Yu, Jing Li, Yue He, Kai Zhu, Shuyi Zhang, Wen Hu

    Abstract: Recent methods utilize graph contrastive learning within graph-structured user-item interaction data for collaborative filtering and have demonstrated their efficacy in recommendation tasks. However, they ignore that the difference in relation density of nodes between the user and item sides causes the adaptability of graphs on bilateral nodes to be different after multi-hop graph interaction calcula…

    Submitted 22 March, 2024; originally announced March 2024.

  33. arXiv:2403.14950  [pdf, other]

    cs.CL cs.LG

    KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable Adaptation

    Authors: Xindi Luo, Zequn Sun, Jing Zhao, Zhe Zhao, Wei Hu

    Abstract: Parameter-efficient finetuning (PEFT) is a key technique for adapting large language models (LLMs) to downstream tasks. In this paper, we study leveraging knowledge graph embeddings to improve the effectiveness of PEFT. We propose a knowledgeable adaptation method called KnowLA. It inserts an adaptation layer into an LLM to integrate the embeddings of entities appearing in the input text. The adap…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted in the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)

  34. arXiv:2403.14939  [pdf, other]

    cs.CV

    STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians

    Authors: Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao

    Abstract: Recent progress in pre-trained diffusion models and 3D generation has spurred interest in 4D content creation. However, achieving high-fidelity 4D generation with spatial-temporal consistency remains a challenge. In this work, we propose STAG4D, a novel framework that combines pre-trained diffusion models with dynamic 3D Gaussian splatting for high-fidelity 4D generation. Drawing inspiration from…

    Submitted 22 March, 2024; originally announced March 2024.

  35. arXiv:2403.11451  [pdf, other]

    cs.CV

    CasSR: Activating Image Power for Real-World Image Super-Resolution

    Authors: Haolan Chen, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, Wei Hu

    Abstract: The objective of image super-resolution is to generate clean and high-resolution images from degraded versions. Recent advancements in diffusion modeling have led to the emergence of various image super-resolution techniques that leverage pretrained text-to-image (T2I) models. Nevertheless, due to the prevalent severe degradation in low-resolution images and the inherent characteristics of diffusi…

    Submitted 17 March, 2024; originally announced March 2024.

  36. arXiv:2403.11056  [pdf, other]

    cs.CV

    Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration

    Authors: Zhihao Liang, Qi Zhang, Wenbo Hu, Ying Feng, Lei Zhu, Kui Jia

    Abstract: 3D Gaussian Splatting (3DGS) has recently gained popularity by combining the advantages of both primitive-based and volumetric 3D representations, resulting in improved quality and efficiency for 3D scene rendering. However, 3DGS is not alias-free, and its rendering at varying resolutions could produce severe blurring or jaggies. This is because 3DGS treats each pixel as an isolated, single po…

    Submitted 3 April, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: 29 pages

  37. arXiv:2403.10094  [pdf, other]

    cs.CV eess.IV

    RangeLDM: Fast Realistic LiDAR Point Cloud Generation

    Authors: Qianjiang Hu, Zhimin Zhang, Wei Hu

    Abstract: Autonomous driving demands high-quality LiDAR data, yet the cost of physical LiDAR sensors presents a significant scaling-up challenge. While recent efforts have explored deep generative models to address this issue, they often consume substantial computational resources with slow generation speeds while suffering from a lack of realism. To address these limitations, we introduce RangeLDM, a novel…

    Submitted 15 March, 2024; originally announced March 2024.

  38. arXiv:2403.10050  [pdf, other]

    cs.CV

    Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing

    Authors: Tian-Xing Xu, Wenbo Hu, Yu-Kun Lai, Ying Shan, Song-Hai Zhang

    Abstract: 3D Gaussian splatting, emerging as a groundbreaking approach, has drawn increasing attention for its capabilities of high-fidelity reconstruction and real-time rendering. However, it couples the appearance and geometry of the scene within the Gaussian attributes, which hinders the flexibility of editing operations, such as texture swapping. To address this issue, we propose a novel approach, namel…

    Submitted 15 March, 2024; originally announced March 2024.

  39. arXiv:2403.07264  [pdf, other]

    stat.ML cs.LG

    Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization

    Authors: Yutong Wang, Rishi Sonthalia, Wei Hu

    Abstract: We study the generalization capability of nearly-interpolating linear regressors: $\boldsymbol{\beta}$'s whose training error $\tau$ is positive but small, i.e., below the noise floor. Under a random matrix theoretic assumption on the data distribution and an eigendecay assumption on the data covariance matrix $\boldsymbol{\Sigma}$, we demonstrate that any near-interpolator exhibits rapid norm growth: for $\tau$ fix…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: AISTATS 2024

  40. arXiv:2403.06600  [pdf, other]

    cs.CV

    BEV2PR: BEV-Enhanced Visual Place Recognition with Structural Cues

    Authors: Fudong Ge, Yiwei Zhang, Shuhan Shen, Yue Wang, Weiming Hu, Jin Gao

    Abstract: In this paper, we propose a new image-based visual place recognition (VPR) framework by exploiting the structural cues in bird's-eye view (BEV) from a single monocular camera. The motivation arises from two key observations about VPR: 1) For the methods based on both camera and LiDAR sensors, the integration of LiDAR in robotic systems has led to increased expenses, while the alignment of data bet…

    Submitted 11 March, 2024; originally announced March 2024.

  41. arXiv:2403.05854  [pdf, other]

    cs.CV

    LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

    Authors: Qihao Zhao, Yalun Dai, Hao Li, Wei Hu, Fan Zhang, Jun Liu

    Abstract: Long-tail recognition is challenging because it requires the model to learn good representations from tail categories and address imbalances across all categories. In this paper, we propose a novel generative and fine-tuning framework, LTGC, to handle long-tail recognition via leveraging generated content. Firstly, inspired by the rich implicit knowledge in large-scale models (e.g., large language…

    Submitted 26 May, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: CVPR 2024, Oral

  42. arXiv:2403.04993  [pdf, other]

    cs.CV

    PromptIQA: Boosting the Performance and Generalization for No-Reference Image Quality Assessment via Prompts

    Authors: Zewen Chen, Haina Qin, Juan Wang, Chunfeng Yuan, Bing Li, Weiming Hu, Liang Wang

    Abstract: Due to the diversity of assessment requirements in various application scenarios for the IQA task, existing IQA methods struggle to directly adapt to these varied requirements after training. Thus, when facing new requirements, a typical approach is fine-tuning these models on datasets specifically created for those requirements. However, it is time-consuming to establish IQA datasets. In this wor…

    Submitted 7 March, 2024; originally announced March 2024.

  43. arXiv:2403.02993  [pdf, other]

    cs.AI

    Localized Zeroth-Order Prompt Optimization

    Authors: Wenyang Hu, Yao Shu, Zongmin Yu, Zhaoxuan Wu, Xiangqiang Lin, Zhongxiang Dai, See-Kiong Ng, Bryan Kian Hsiang Low

    Abstract: The efficacy of large language models (LLMs) in understanding and generating natural language has aroused wide interest in developing prompt-based methods to harness the power of black-box LLMs. Existing methodologies usually prioritize global optimization to find the global optimum, which, however, performs poorly in certain tasks. This motivates us to re-think the necessity of fin…

    Submitted 5 March, 2024; originally announced March 2024.

  44. arXiv:2403.00249  [pdf, other]

    cs.CV

    Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training

    Authors: Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

    Abstract: In vision-language pre-training (VLP), masked image modeling (MIM) has recently been introduced for fine-grained cross-modal alignment. However, in most existing methods, the reconstruction targets for MIM lack high-level semantics, and text is not sufficiently involved in masked modeling. These two drawbacks limit the effect of MIM in facilitating cross-modal semantic alignment. In this work, we…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  45. Mobile Health Text Misinformation Identification Using Mobile Data Mining

    Authors: Wen-Chen Hu, Sanjaikanth E Vadakkethil Somanathan Pillai, Abdelrahman Ahmed ElSaid

    Abstract: More than six million people had died of COVID-19 by April 2022. The heavy casualties have put people on great and urgent alert, and people try to find all kinds of information to keep themselves from being infected by the coronavirus. This research tries to find out whether the mobile health text information sent to people's devices is correct, as smartphones become the major information source for peo…

    Submitted 5 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  46. arXiv:2402.17464  [pdf, other]

    cs.CV

    Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing

    Authors: Bi'an Du, Xiang Gao, Wei Hu, Renjie Liao

    Abstract: Generative 3D part assembly involves understanding part relationships and predicting their 6-DoF poses for assembling a realistic 3D shape. Prior work often focuses on the geometry of individual parts, neglecting part-whole hierarchies of objects. Leveraging two key observations: 1) super-part poses provide strong hints about part poses, and 2) predicting super-part poses is easier due to fewer supe…

    Submitted 26 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  47. arXiv:2402.17010  [pdf, other]

    cs.CL cs.AI

    Can Large Language Models Recall Reference Location Like Humans?

    Authors: Ye Wang, Xinrun Xu, Rui Xie, Wenxin Hu, Wei Ye

    Abstract: When completing knowledge-intensive tasks, humans sometimes need not just an answer but also a corresponding reference passage for auxiliary reading. Previous methods required obtaining pre-segmented article chunks through additional retrieval models. This paper explores leveraging the parameterized knowledge stored during the pre-training phase of large language models (LLMs) to independently rec…

    Submitted 26 February, 2024; originally announced February 2024.

  48. arXiv:2402.16872  [pdf, other]

    cs.IR

    NFT1000: A Visual Text Dataset For Non-Fungible Token Retrieval

    Authors: Shuxun Wang, Yunfei Lei, Ziqi Zhang, Wei Liu, Haowei Liu, Li Yang, Wenjuan Li, Bing Li, Weiming Hu

    Abstract: With the rise of 'Metaverse' and 'Web3.0', NFT (Non-Fungible Token) has emerged as a kind of pivotal digital asset, garnering significant attention. By the end of November 2023, more than 1.4 billion NFT tokens had been minted across various blockchain platforms. To effectively locate a satisfactory NFT token, conducting searches within the extensive array of NFT data is essential. The challeng…

    Submitted 28 January, 2024; originally announced February 2024.

    Comments: 6 pages, 7 figures

  49. arXiv:2402.16769  [pdf, other]

    cs.CV

    Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval

    Authors: Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

    Abstract: In video-text retrieval, most existing methods adopt the dual-encoder architecture for fast retrieval, which employs two individual encoders to extract global latent representations for videos and texts. However, they face challenges in capturing fine-grained semantic concepts. In this work, we propose the UNIFY framework, which learns lexicon representations to capture fine-grained semantics and…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to LREC-COLING 2024

  50. MatchNAS: Optimizing Edge AI in Sparse-Label Data Contexts via Automating Deep Neural Network Porting for Mobile Deployment

    Authors: Hongtao Huang, Xiaojun Chang, Wen Hu, Lina Yao

    Abstract: Recent years have seen the explosion of edge intelligence with powerful Deep Neural Networks (DNNs). One popular scheme is training DNNs on powerful cloud servers and subsequently porting them to mobile devices after making them lightweight. Conventional approaches manually specialize DNNs for various edge platforms and retrain them with real-world data. However, as the number of platforms increases, t…

    Submitted 20 February, 2024; originally announced February 2024.