Search | arXiv e-print repository

Showing 1–50 of 189 results for author: Cai, S

Searching in archive cs. Search in all archives.
  1. Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem

    Authors: Qiwen Zhu, Yanjie Wang, Shilv Cai, Liqun Chen, Jiahuan Zhou, Luxin Yan, Sheng Zhong, Xu Zou

    Abstract: Training Single-Image Super-Resolution (SISR) models using pixel-based regression losses can achieve high distortion metrics scores (e.g., PSNR and SSIM), but often results in blurry images due to insufficient recovery of high-frequency details. Conversely, using GAN or perceptual losses can produce sharp images with high perceptual metric scores (e.g., LPIPS), but may introduce artifacts and inco…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2409.02489  [pdf, other]

    cs.SD cs.AI eess.AS

    NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention

    Authors: Dashanka De Silva, Siqi Cai, Saurav Pahuja, Tanja Schultz, Haizhou Li

    Abstract: In the study of auditory attention, it has been revealed that there exists a robust correlation between attended speech and elicited neural responses, measurable through electroencephalography (EEG). Therefore, it is possible to use the attention information available within EEG signals to guide the extraction of the target speaker in a cocktail party computationally. In this paper, we present a n…

    Submitted 4 September, 2024; originally announced September 2024.

  3. arXiv:2408.12304  [pdf, other]

    cs.AI

    OPTDTALS: Approximate Logic Synthesis via Optimal Decision Trees Approach

    Authors: Hao Hu, Shaowei Cai

    Abstract: The growing interest in Explainable Artificial Intelligence (XAI) motivates promising studies of computing optimal Interpretable Machine Learning models, especially decision trees. Such models generally provide optimality in compact size or empirical accuracy. Recent works focus on improving efficiency due to the natural scalability issue. The application of such models to practical problems is qu…

    Submitted 22 August, 2024; originally announced August 2024.

  4. arXiv:2408.05914  [pdf, other]

    cs.CV

    Deep Multimodal Collaborative Learning for Polyp Re-Identification

    Authors: Suncheng Xiang, Jincheng Li, Zhengjie Zhang, Shilun Cai, Jiale Guan, Dahong Qian

    Abstract: Colonoscopic Polyp Re-Identification aims to match the same polyp from a large gallery with images from different views taken using different cameras and plays an important role in the prevention and treatment of colorectal cancer in computer-aided diagnosis. However, traditional methods for object ReID directly adopting CNN models trained on the ImageNet dataset usually produce unsatisfactory ret…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Work in progress. arXiv admin note: text overlap with arXiv:2307.10625

  5. arXiv:2408.03013  [pdf, other]

    cs.DB cs.AI cs.LG

    NeurDB: On the Design and Implementation of an AI-powered Autonomous Database

    Authors: Zhanhao Zhao, Shaofeng Cai, Haotian Gao, Hexiang Pan, Siqi Xiang, Naili Xing, Gang Chen, Beng Chin Ooi, Yanyan Shen, Yuncheng Wu, Meihui Zhang

    Abstract: Databases are increasingly embracing AI to provide autonomous system optimization and intelligent in-database analytics, aiming to relieve end-user burdens across various industry sectors. Nonetheless, most existing approaches fail to account for the dynamic nature of databases, which renders them ineffective for real-world applications characterized by evolving data and workloads. This paper intr…

    Submitted 6 August, 2024; originally announced August 2024.

  6. arXiv:2408.00513  [pdf, other]

    cs.LG

    VecAug: Unveiling Camouflaged Frauds with Cohort Augmentation for Enhanced Detection

    Authors: Fei Xiao, Shaofeng Cai, Gang Chen, H. V. Jagadish, Beng Chin Ooi, Meihui Zhang

    Abstract: Fraud detection presents a challenging task characterized by ever-evolving fraud patterns and scarce labeled data. Existing methods predominantly rely on graph-based or sequence-based approaches. While graph-based approaches connect users through shared entities to capture structural information, they remain vulnerable to fraudsters who can disrupt or manipulate these connections. In contrast, seq…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted by KDD 2024

  7. ParLS-PBO: A Parallel Local Search Solver for Pseudo Boolean Optimization

    Authors: Zhihan Chen, Peng Lin, Hao Hu, Shaowei Cai

    Abstract: As a broadly applied technique in numerous optimization problems, recently, local search has been employed to solve Pseudo-Boolean Optimization (PBO) problem. A representative local search solver for PBO is LSPBO. In this paper, firstly, we improve LSPBO by a dynamic scoring mechanism, which dynamically strikes a balance between score on hard constraints and score on the objective function. More…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 17 pages, 2 figures, to be published in The 30th International Conference on Principles and Practice of Constraint Programming

  8. arXiv:2407.06109  [pdf, other]

    cs.CV

    PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models

    Authors: Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye, Shuhang Gu

    Abstract: Controllable generation is considered a potentially vital approach to address the challenge of annotating 3D data, and the precision of such controllable generation becomes particularly imperative in the context of data production for autonomous driving. Existing methods focus on the integration of diverse generative information into controlling inputs, utilizing frameworks such as GLIGEN or Contr…

    Submitted 16 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  9. arXiv:2407.05285  [pdf, other]

    cs.LG cs.AI cs.CR

    Gradient Diffusion: A Perturbation-Resilient Gradient Leakage Attack

    Authors: Xuan Liu, Siqi Cai, Qihua Zhou, Song Guo, Ruibin Li, Kaiwei Lin

    Abstract: Recent years have witnessed the vulnerability of Federated Learning (FL) against gradient leakage attacks, where the private training data can be recovered from the exchanged gradients, making gradient protection a critical issue for the FL training process. Existing solutions often resort to perturbation-based mechanisms, such as differential privacy, where each participating client injects a spe…

    Submitted 7 July, 2024; originally announced July 2024.

  10. arXiv:2407.00114  [pdf, other]

    cs.LG cs.AI cs.CL

    OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

    Authors: Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang

    Abstract: We present OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-world instruction-following agents in open-world Minecraft. Compared to prior works that either emit textual goals to separate controllers or produce the control command directly, OmniJARVIS seeks a different path to ensure both strong reasoning and efficient decision-making capabilities via unified tokenization of multimod…

    Submitted 27 June, 2024; originally announced July 2024.

  11. arXiv:2406.19154  [pdf]

    cs.LG physics.ao-ph

    Advancing operational PM2.5 forecasting with dual deep neural networks (D-DNet)

    Authors: Shengjuan Cai, Fangxin Fang, Vincent-Henri Peuch, Mihai Alexe, Ionel Michael Navon, Yanghua Wang

    Abstract: PM2.5 forecasting is crucial for public health, air quality management, and policy development. Traditional physics-based models are computationally demanding and slow to adapt to real-time conditions. Deep learning models show potential in efficiency but still suffer from accuracy loss over time due to error accumulation. To address these challenges, we propose a dual deep neural network (D-DNet)…

    Submitted 27 June, 2024; originally announced June 2024.

  12. arXiv:2406.15782  [pdf, other]

    cs.SC cs.LO

    A Local Search Algorithm for MaxSMT(LIA)

    Authors: Xiang He, Bohan Li, Mengyu Zhao, Shaowei Cai

    Abstract: MaxSAT modulo theories (MaxSMT) is an important generalization of Satisfiability modulo theories (SMT) with various applications. In this paper, we focus on MaxSMT with the background theory of Linear Integer Arithmetic, denoted as MaxSMT(LIA). We design the first local search algorithm for MaxSMT(LIA) called PairLS, based on the following novel ideas. A novel operator called pairwise operator is…

    Submitted 22 June, 2024; originally announced June 2024.

  13. arXiv:2406.11503  [pdf, other]

    cs.CV cs.CL

    GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation

    Authors: Shihao Cai, Keqin Bao, Hangyu Guo, Jizhi Zhang, Jun Song, Bo Zheng

    Abstract: Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in effectively using image information. High-quality data is crucial for enhancing the geometric capabilities of multi-modal models, yet existing open-source da…

    Submitted 17 June, 2024; originally announced June 2024.

  14. arXiv:2406.08152  [pdf, other]

    cs.CV

    CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer

    Authors: Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Qiao Liang, Min-Jian Zhao, Jieping Ye

    Abstract: The field of 3D object detection from point clouds is rapidly advancing in computer vision, aiming to accurately and efficiently detect and localize objects in three-dimensional space. Current 3D detectors commonly fall short in terms of flexibility and scalability, with ample room for advancements in performance. In this paper, our objective is to address these limitations by introducing two fram…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 19 pages, 8 figures

  15. arXiv:2406.04523  [pdf, other]

    cs.CL cs.LG

    Proofread: Fixes All Errors with One Tap

    Authors: Renjie Liu, Yanxiang Zhang, Yun Zhu, Haicheng Sun, Yuanbo Zhang, Michael Xuelin Huang, Shanqing Cai, Lei Meng, Shumin Zhai

    Abstract: The impressive capabilities in Large Language Models (LLMs) provide a powerful approach to reimagine users' typing experience. This paper demonstrates Proofread, a novel Gboard feature powered by a server-side LLM in Gboard, enabling seamless sentence-level and paragraph-level corrections with a single tap. We describe the complete system in this paper, from data generation, metrics design to mode…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures, 2 tables

  16. arXiv:2405.17414  [pdf, other]

    cs.CV cs.GR

    Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

    Authors: Zhengfei Kuang, Shengqu Cai, Hao He, Yinghao Xu, Hongsheng Li, Leonidas Guibas, Gordon Wetzstein

    Abstract: Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward and recent approaches that condition video generation models on camera trajectories make strides towards it. Yet, it remains challenging to generate a video of the same scene…

    Submitted 27 May, 2024; originally announced May 2024.

  17. arXiv:2405.09883  [pdf, other]

    cs.CV

    RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception

    Authors: Xiaosu Zhu, Hualian Sheng, Sijia Cai, Bing Deng, Shaopeng Yang, Qiao Liang, Ken Chen, Lianli Gao, Jingkuan Song, Jieping Ye

    Abstract: We introduce RoScenes, the largest multi-view roadside perception dataset, which aims to shed light on the development of vision-centric Bird's Eye View (BEV) approaches for more challenging traffic scenes. The highlights of RoScenes include significantly large perception area, full scene coverage and crowded traffic. More specifically, our dataset achieves surprising 21.13M 3D annotations within…

    Submitted 4 July, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: ECCV 2024. Extended version. 33 pages, 21 figures, 13 tables. https://github.com/xiaosu-zhu/RoScenes

  18. arXiv:2405.09111  [pdf, other]

    cs.RO cs.AI

    CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving

    Authors: Dechen Gao, Shuangyu Cai, Hanchu Zhou, Hang Wang, Iman Soltani, Junshan Zhang

    Abstract: To safely navigate intricate real-world scenarios, autonomous vehicles must be able to adapt to diverse road conditions and anticipate future events. World model (WM) based reinforcement learning (RL) has emerged as a promising approach by learning and predicting the complex dynamics of various environments. Nevertheless, to the best of our knowledge, there does not exist an accessible platform fo…

    Submitted 25 July, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Dechen Gao, Shuangyu Cai, Hanchu Zhou, Hang Wang contributed equally

  19. arXiv:2405.03946  [pdf]

    cs.SI

    Association between centrality and flourishing trait: analyzing student co-occurrence networks drawn from dining activities

    Authors: Yi Cao, Shimin Cai, Xiaorong Shen, Tao Zhou

    Abstract: Comprehending the association between social capabilities and individual psychological traits is paramount for educational administrators. Presently, many studies heavily depend on online questionnaires and self-reported data, while analysis of the connection between offline social networks and mental health status remains scarce. By leveraging a public dataset encompassing on-campus dining activi…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 14 pages, 2 figures, 1 Table

  20. arXiv:2405.03924  [pdf, other]

    cs.DB cs.AI cs.LG

    NeurDB: An AI-powered Autonomous Data System

    Authors: Beng Chin Ooi, Shaofeng Cai, Gang Chen, Yanyan Shen, Kian-Lee Tan, Yuncheng Wu, Xiaokui Xiao, Naili Xing, Cong Yue, Lingze Zeng, Meihui Zhang, Zhanhao Zhao

    Abstract: In the wake of rapid advancements in artificial intelligence (AI), we stand on the brink of a transformative leap in data systems. The imminent fusion of AI and DB (AIxDB) promises a new generation of data systems, which will relieve the burden on end-users across all industry sectors by featuring AI-enhanced functionalities, such as personalized and automated in-database AI-powered analytics, sel…

    Submitted 4 July, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  21. arXiv:2405.00568  [pdf, other]

    cs.DB cs.AI

    Powering In-Database Dynamic Model Slicing for Structured Data Analytics

    Authors: Lingze Zeng, Naili Xing, Shaofeng Cai, Gang Chen, Beng Chin Ooi, Jian Pei, Yuncheng Wu

    Abstract: Relational database management systems (RDBMS) are widely used for the storage and retrieval of structured data. To derive insights beyond statistical aggregation, we typically have to extract specific subdatasets from the database using conventional database operations, and then apply deep neural networks (DNN) training and inference on these respective subdatasets in a separate machine learning…

    Submitted 1 May, 2024; originally announced May 2024.

  22. arXiv:2405.00482  [pdf, other]

    cs.CR cs.LG

    PackVFL: Efficient HE Packing for Vertical Federated Learning

    Authors: Liu Yang, Shuowei Cai, Di Chai, Junxue Zhang, Han Tian, Yilun Jin, Kun Guo, Kai Chen, Qiang Yang

    Abstract: As an essential tool of secure distributed machine learning, vertical federated learning (VFL) based on homomorphic encryption (HE) suffers from severe efficiency problems due to data inflation and time-consuming operations. To this core, we propose PackVFL, an efficient VFL framework based on packed HE (PackedHE), to accelerate the existing HE-based VFL algorithms. PackVFL packs multiple cleartex…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 12 pages excluding references

  23. arXiv:2404.16387  [pdf, other]

    cs.LO

    Revisiting Restarts of CDCL: Should the Search Information be Preserved?

    Authors: Xindi Zhang, Zhihan Chen, Shaowei Cai

    Abstract: SAT solvers are indispensable in formal verification for hardware and software with many important applications. CDCL is the most widely used framework for modern SAT solvers, and restart is an essential technique of CDCL. When restarting, CDCL solvers cancel the current variable assignment while maintaining the branching order, variable phases, and learnt clauses. This type of restart is referred…

    Submitted 27 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  24. arXiv:2404.09654  [pdf, other]

    cs.CV cs.MM

    Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection

    Authors: Jiaqi Zhu, Shaofeng Cai, Fang Deng, Beng Chin Ooi, Junran Wu

    Abstract: Large vision-language models (LVLMs) are markedly proficient in deriving visual representations guided by natural language. Recent explorations have utilized LVLMs to tackle zero-shot visual anomaly detection (VAD) challenges by pairing images with textual descriptions indicative of normal and abnormal conditions, referred to as anomaly prompts. However, existing approaches depend on static anomal…

    Submitted 10 September, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted by MM'24 (Oral)

  25. arXiv:2404.08412  [pdf, other]

    physics.flu-dyn cs.AI

    PiRD: Physics-informed Residual Diffusion for Flow Field Reconstruction

    Authors: Siming Shan, Pengkai Wang, Song Chen, Jiaxu Liu, Chao Xu, Shengze Cai

    Abstract: The use of machine learning in fluid dynamics is becoming more common to expedite the computation when solving forward and inverse problems of partial differential equations. Yet, a notable challenge with existing convolutional neural network (CNN)-based methods for data fidelity enhancement is their reliance on specific low-fidelity data patterns and distributions during the training phase. In ad…

    Submitted 9 May, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: 22 pages

  26. arXiv:2403.19501  [pdf, other]

    cs.CV

    RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method

    Authors: Ming Yan, Yan Zhang, Shuqiang Cai, Shuqi Fan, Xincheng Lin, Yudi Dai, Siqi Shen, Chenglu Wen, Lan Xu, Yuexin Ma, Cheng Wang

    Abstract: Comprehensive capturing of human motions requires both accurate captures of complex poses and precise localization of the human within scenes. Most of the HPE datasets and methods primarily rely on RGB, LiDAR, or IMU data. However, solely using these modalities or a combination of them may not be adequate for HPE, particularly for complex and fast movements. For holistic human motion understanding…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: CVPR2024, Project website: http://www.lidarhumanmotion.net/reli11d/

  27. arXiv:2403.14346  [pdf, other]

    cs.CV

    Towards Efficient Information Fusion: Concentric Dual Fusion Attention Based Multiple Instance Learning for Whole Slide Images

    Authors: Yujian Liu, Ruoxuan Wu, Xinjie Shen, Zihuang Lu, Lingyu Liang, Haiyu Zhou, Shipu Xu, Shaoai Cai, Shidang Xu

    Abstract: In the realm of digital pathology, multi-magnification Multiple Instance Learning (multi-mag MIL) has proven effective in leveraging the hierarchical structure of Whole Slide Images (WSIs) to reduce information loss and redundant data. However, current methods fall short in bridging the domain gap between pretrained models and medical imaging, and often fail to account for spatial relationships ac…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 14 pages, 7 figures

  28. arXiv:2403.14135  [pdf, other]

    eess.IV cs.CV

    Powerful Lossy Compression for Noisy Images

    Authors: Shilv Cai, Xiaoguo Liang, Shuning Cao, Luxin Yan, Sheng Zhong, Liqun Chen, Xu Zou

    Abstract: Image compression and denoising represent fundamental challenges in image processing with many real-world applications. To address practical demands, current solutions can be categorized into two main strategies: 1) sequential method; and 2) joint method. However, sequential methods have the disadvantage of error accumulation as there is information loss between multiple individual models. Recentl…

    Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by ICME 2024

  29. arXiv:2403.10318  [pdf, other]

    cs.LG

    Anytime Neural Architecture Search on Tabular Data

    Authors: Naili Xing, Shaofeng Cai, Zhaojing Luo, Beng Chin Ooi, Jian Pei

    Abstract: The increasing demand for tabular data analysis calls for transitioning from manual architecture design to Neural Architecture Search (NAS). This transition demands an efficient and responsive anytime NAS approach that is capable of returning current optimal architectures within any given time budget while progressively enhancing architecture quality with increased budget allocation. However, the…

    Submitted 6 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  30. arXiv:2403.06568  [pdf, other]

    cs.AI

    Better Understandings and Configurations in MaxSAT Local Search Solvers via Anytime Performance Analysis

    Authors: Furong Ye, Chuan Luo, Shaowei Cai

    Abstract: Though numerous solvers have been proposed for the MaxSAT problem, and the benchmark environment such as MaxSAT Evaluations provides a platform for the comparison of the state-of-the-art solvers, existing assessments were usually evaluated based on the quality, e.g., fitness, of the best-found solutions obtained within a given running time budget. However, concerning solely the final obtained solu…

    Submitted 11 March, 2024; originally announced March 2024.

  31. arXiv:2403.05182  [pdf, other]

    cs.HC cs.GR

    ViboPneumo: A Vibratory-Pneumatic Finger-Worn Haptic Device for Altering Perceived Texture Roughness in Mixed Reality

    Authors: Shaoyu Cai, Zhenlin Chen, Haichen Gao, Ya Huang, Qi Zhang, Xinge Yu, Kening Zhu

    Abstract: Extensive research has been done in haptic feedback for texture simulation in virtual reality (VR). However, it is challenging to modify the perceived tactile texture of existing physical objects which usually serve as anchors for virtual objects in mixed reality (MR). In this paper, we present ViboPneumo, a finger-worn haptic device that uses vibratory-pneumatic feedback to modulate (i.e., increa…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 13 pages, 12 figures

  32. arXiv:2403.01414  [pdf, other]

    cs.CV

    Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes

    Authors: Yujie Lu, Long Wan, Nayu Ding, Yulong Wang, Shuhan Shen, Shen Cai, Lin Gao

    Abstract: Neural implicit representation of geometric shapes has witnessed considerable advancements in recent years. However, common distance field based implicit representations, specifically signed distance field (SDF) for watertight shapes or unsigned distance field (UDF) for arbitrary shapes, routinely suffer from degradation of reconstruction accuracy when converting to explicit surface points and mes…

    Submitted 1 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: accepted by CVPR 2024

  33. arXiv:2402.18008  [pdf, other]

    cs.CV

    Fast and Interpretable 2D Homography Decomposition: Similarity-Kernel-Similarity and Affine-Core-Affine Transformations

    Authors: Shen Cai, Zhanhao Wu, Lingxi Guo, Jiachun Wang, Siyu Zhang, Junchi Yan, Shuhan Shen

    Abstract: In this paper, we present two fast and interpretable decomposition methods for 2D homography, which are named Similarity-Kernel-Similarity (SKS) and Affine-Core-Affine (ACA) transformations respectively. Under the minimal $4$-point configuration, the first and the last similarity transformations in SKS are computed by two anchor points on target and source planes, respectively. Then, the other two…

    Submitted 27 February, 2024; originally announced February 2024.

  34. arXiv:2402.10705  [pdf, other]

    cs.AI

    AutoSAT: Automatically Optimize SAT Solvers via Large Language Models

    Authors: Yiwen Sun, Xianyin Zhang, Shiyu Huang, Shaowei Cai, BingZhen Zhang, Ke Wei

    Abstract: Heuristics are crucial in SAT solvers, but no heuristic rules are suitable for all SAT problems. Therefore, it is helpful to refine specific heuristics for specific problems. In this context, we present AutoSAT, a novel framework for automatically optimizing heuristics in SAT solvers. AutoSAT is based on Large Language Models (LLMs) which is able to autonomously generate codes, conduct evaluation,…

    Submitted 31 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  35. arXiv:2402.06158  [pdf, other]

    cs.DS cs.AI cs.IR

    Assortment Planning with Sponsored Products

    Authors: Shaojie Tang, Shuzhang Cai, Jing Yuan, Kai Han

    Abstract: In the rapidly evolving landscape of retail, assortment planning plays a crucial role in determining the success of a business. With the rise of sponsored products and their increasing prominence in online marketplaces, retailers face new challenges in effectively managing their product assortment in the presence of sponsored products. Remarkably, previous research in assortment planning largely o…

    Submitted 8 February, 2024; originally announced February 2024.

  36. arXiv:2401.11740  [pdf, other]

    cs.CV cs.LG

    Multi-level Cross-modal Alignment for Image Clustering

    Authors: Liping Qiu, Qin Zhang, Xiaojun Chen, Shaotian Cai

    Abstract: Recently, the cross-modal pretraining model has been employed to produce meaningful pseudo-labels to supervise the training of an image clustering model. However, numerous erroneous alignments in a cross-modal pre-training model could produce poor-quality pseudo-labels and degrade clustering performance. To solve the aforementioned issue, we propose a novel \textbf{Multi-level Cross-modal Alignmen…

    Submitted 22 January, 2024; originally announced January 2024.

  37. Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation

    Authors: Susan Lin, Jeremy Warner, J. D. Zamfirescu-Pereira, Matthew G. Lee, Sauhard Jain, Michael Xuelin Huang, Piyawat Lertvittayakumjorn, Shanqing Cai, Shumin Zhai, Björn Hartmann, Can Liu

    Abstract: Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that supports gist-level manipulation of dictated text with two main sets of functions: gist extraction and macro revision. Gist extraction generates key…

    Submitted 7 March, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: To appear at ACM CHI 2024

  38. METER: A Dynamic Concept Adaptation Framework for Online Anomaly Detection

    Authors: Jiaqi Zhu, Shaofeng Cai, Fang Deng, Beng Chin Ooi, Wenqiao Zhang

    Abstract: Real-time analytics and decision-making require online anomaly detection (OAD) to handle drifts in data streams efficiently and effectively. Unfortunately, existing approaches are often constrained by their limited detection capacity and slow adaptation to evolving data streams, inhibiting their efficacy and efficiency in handling concept drift, which is a major challenge in evolving data streams.…

    Submitted 28 December, 2023; originally announced December 2023.

  39. arXiv:2312.14574  [pdf, other]

    cs.CV cs.LG

    MMGPL: Multimodal Medical Data Analysis with Graph Prompt Learning

    Authors: Liang Peng, Songyue Cai, Zongqian Wu, Huifang Shang, Xiaofeng Zhu, Xiaoxiao Li

    Abstract: Prompt learning has demonstrated impressive efficacy in the fine-tuning of multimodal large models to a wide range of downstream tasks. Nonetheless, applying existing prompt learning methods for the diagnosis of neurological disorder still suffers from two issues: (i) existing methods typically treat all patches equally, despite the fact that only a small number of patches in neuroimaging are rele…

    Submitted 27 June, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

  40. arXiv:2312.14327  [pdf, other]

    cs.CL

    Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion

    Authors: Katrin Tomanek, Shanqing Cai, Subhashini Venugopalan

    Abstract: Abbreviation expansion is a strategy used to speed up communication by limiting the amount of typing and using a language model to suggest expansions. Here we look at personalizing a Large Language Model's (LLM) suggestions based on prior conversations to enhance the relevance of predictions, particularly when the user data is small (~1000 samples). Specifically, we compare fine-tuning, prompt-tun…

    Submitted 21 December, 2023; originally announced December 2023.

  41. arXiv:2312.13530  [pdf, other]

    cs.CR cs.AI cs.LG

    HW-V2W-Map: Hardware Vulnerability to Weakness Mapping Framework for Root Cause Analysis with GPT-assisted Mitigation Suggestion

    Authors: Yu-Zheng Lin, Muntasir Mamun, Muhtasim Alam Chowdhury, Shuyu Cai, Mingyu Zhu, Banafsheh Saber Latibari, Kevin Immanuel Gubbi, Najmeh Nazari Bavarsad, Arjun Caputo, Avesta Sasan, Houman Homayoun, Setareh Rafatirad, Pratik Satam, Soheil Salehi

    Abstract: The escalating complexity of modern computing frameworks has resulted in a surge in the cybersecurity vulnerabilities reported to the National Vulnerability Database (NVD) by practitioners. Despite the fact that the stature of NVD is one of the most significant databases for the latest insights into vulnerabilities, extracting meaningful trends from such a large amount of unstructured data is stil…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 22 pages, 10 pages appendix, 10 figures, Submitted to ACM TODAES

  42. arXiv:2312.01532  [pdf, other]

    cs.HC cs.CL

    Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments

    Authors: Shanqing Cai, Subhashini Venugopalan, Katie Seaver, Xiang Xiao, Katrin Tomanek, Sri Jalasutram, Meredith Ringel Morris, Shaun Kane, Ajit Narayanan, Robert L. MacDonald, Emily Kornman, Daniel Vance, Blair Casey, Steve M. Gleason, Philip Q. Nelson, Michael P. Brenner

    Abstract: Finding ways to accelerate text input for individuals with profound motor impairments has been a long-standing area of research. Closing the speed gap for augmentative and alternative communication (AAC) devices such as eye-tracking keyboards is important for improving the quality of life for such individuals. Recent advances in neural networks of natural language pose new opportunities for re-thi…

    Submitted 3 December, 2023; originally announced December 2023.

  43. arXiv:2312.01409  [pdf, other]

    cs.CV cs.AI cs.GR

    Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

    Authors: Shengqu Cai, Duygu Ceylan, Matheus Gadelha, Chun-Hao Paul Huang, Tuanfeng Yang Wang, Gordon Wetzstein

    Abstract: Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious manual process, which can be automated by emerging text-to-video diffusion models. Despite great promise, video diffusion models are difficult to control, hinderin…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: Project page: https://primecai.github.io/generative_rendering/

  44. arXiv:2311.14249  [pdf, other]

    cs.SC

    Efficient Local Search for Nonlinear Real Arithmetic

    Authors: Zhonghan Wang, Bohua Zhan, Bohan Li, Shaowei Cai

    Abstract: Local search has recently been applied to SMT problems over various arithmetic theories. Among these, nonlinear real arithmetic poses special challenges due to its uncountable solution space and potential need to solve higher-degree polynomials. As a consequence, existing work on local search only considered fragments of the theory. In this work, we analyze the difficulties and propose ways to add…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: Full version of VMCAI'2024 publication

  45. arXiv:2311.05997  [pdf, other]

    cs.AI

    JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

    Authors: Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang

    Abstract: Achieving human-like planning and control with multimodal observations in an open world is a key milestone for more functional generalist agents. Existing approaches can handle certain long-horizon tasks in an open world. However, they still struggle when the number of open-world tasks could potentially be infinite and lack the capability to progressively enhance task completion as game time progr…

    Submitted 30 November, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: update project page

  46. arXiv:2310.08240  [pdf, ps, other]

    cs.CL

    Who Said That? Benchmarking Social Media AI Detection

    Authors: Wanyun Cui, Linqiu Zhang, Qianle Wang, Shuyang Cai

    Abstract: AI-generated text has proliferated across various online platforms, offering both transformative prospects and posing significant risks related to misinformation and manipulation. Addressing these challenges, this paper introduces SAID (Social media AI Detection), a novel benchmark developed to assess AI-text detection models' capabilities in real social media platforms. It incorporates real AI-ge…

    Submitted 12 October, 2023; originally announced October 2023.

  47. arXiv:2310.08235  [pdf, other]

    cs.AI cs.LG

    GROOT: Learning to Follow Instructions by Watching Gameplay Videos

    Authors: Shaofei Cai, Bowei Zhang, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang

    Abstract: We study the problem of building a controller that can follow open-ended instructions in open-world environments. We propose to follow reference videos as instructions, which offer expressive goal specifications while eliminating the need for expensive text-gameplay annotations. A new learning framework is derived to allow learning such instruction-following controllers from gameplay videos while…

    Submitted 28 November, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

  48. arXiv:2310.00029   

    cs.AI cs.GT cs.LG cs.RO

    Adversarial Driving Behavior Generation Incorporating Human Risk Cognition for Autonomous Vehicle Evaluation

    Authors: Zhen Liu, Hang Gao, Hao Ma, Shuo Cai, Yunfeng Hu, Ting Qu, Hong Chen, Xun Gong

    Abstract: Autonomous vehicle (AV) evaluation has been the subject of increased interest in recent years both in industry and in academia. This paper focuses on the development of a novel framework for generating adversarial driving behavior of background vehicle interfering against the AV to expose effective and rational risky events. Specifically, the adversarial behavior is learned by a reinforcement lear…

    Submitted 14 October, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: We found an expression error in Section III.A. A corrected version will be provided.

  49. arXiv:2309.15940  [pdf, other]

    cs.RO cs.CV

    Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs

    Authors: Haonan Chang, Kowndinya Boyalakuntla, Shiyang Lu, Siwei Cai, Eric Jing, Shreesh Keskar, Shijie Geng, Adeeb Abbas, Lifeng Zhou, Kostas Bekris, Abdeslam Boularias

    Abstract: We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as "pick up a cup on a kitchen table" or "navigate to a…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: The code and dataset used for evaluation can be found at https://github.com/changhaonan/OVSG. This paper has been accepted by CoRL 2023

  50. arXiv:2308.14774  [pdf, other]

    eess.AS cs.SD eess.SP q-bio.QM

    EEG-Derived Voice Signature for Attended Speaker Detection

    Authors: Hongxu Zhu, Siqi Cai, Yidi Jiang, Qiquan Zhang, Haizhou Li

    Abstract: Objective: Conventional EEG-based auditory attention detection (AAD) is achieved by comparing the time-varying speech stimuli and the elicited EEG signals. However, in order to obtain reliable correlation values, these methods necessitate a long decision window, resulting in a long detection latency. Humans have a remarkable ability to recognize and follow a known speaker, regardless of t…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: 8 pages, 2 figures