Search | arXiv e-print repository

Showing 1–50 of 157 results for author: Cao, R

  1. arXiv:2408.04867  [pdf, other]

    cs.LG

    An Evaluation of Standard Statistical Models and LLMs on Time Series Forecasting

    Authors: Rui Cao, Qiao Wang

    Abstract: This research examines the use of Large Language Models (LLMs) in predicting time series, with a specific focus on the LLMTIME model. Despite the established effectiveness of LLMs in tasks such as text generation, language translation, and sentiment analysis, this study highlights the key challenges that large language models encounter in the context of time series prediction. We assess the perfor…

    Submitted 9 August, 2024; originally announced August 2024.

  2. arXiv:2407.20469  [pdf]

    physics.optics eess.IV

    Efficient, gigapixel-scale, aberration-free whole slide scanner using angular ptychographic imaging with closed-form solution

    Authors: Shi Zhao, Haowen Zhou, Siyu Lin, Ruizhi Cao, Changhuei Yang

    Abstract: Whole slide imaging provides a wide field-of-view (FOV) across cross-sections of biopsy or surgery samples, significantly facilitating pathological analysis and clinical diagnosis. Such high-quality images that enable detailed visualization of cellular and tissue structures are essential for effective patient care and treatment planning. To obtain such high-quality images for pathology application…

    Submitted 29 July, 2024; originally announced July 2024.

  3. arXiv:2407.10956  [pdf, other]

    cs.AI cs.CL

    Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

    Authors: Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Yuchen Mao, Wenjing Hu, Tianbao Xie, Hongshen Xu, Danyang Zhang, Sida Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu

    Abstract: Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivit…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 34 pages, 14 figures, 10 tables

  4. arXiv:2407.08333  [pdf, other]

    cs.CV

    SR-Mamba: Effective Surgical Phase Recognition with State Space Model

    Authors: Rui Cao, Jiangliu Wang, Yun-Hui Liu

    Abstract: Surgical phase recognition is crucial for enhancing the efficiency and safety of computer-assisted interventions. One of the fundamental challenges involves modeling the long-distance temporal relationships present in surgical videos. Inspired by the recent success of Mamba, a state space model with linear scalability in sequence length, this paper presents SR-Mamba, a novel attention-free model s…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Technical Report

  5. arXiv:2407.00932  [pdf, other]

    cond-mat.quant-gas quant-ph

    Orbital phases of $p$-band ultracold fermions in the frustrated triangular lattice

    Authors: Jiaqi Wu, Hui Tan, Rui Cao, Jianmin Yuan, Yongqiang Li

    Abstract: Orbital degrees of freedom play an important role for understanding the emergence of unconventional quantum phases. Ultracold atomic gases in optical lattices provide a wonderful platform to simulate orbital physics. In this work, we consider spinless fermionic atoms loaded into $p$-orbital bands of a two-dimensional frustrated triangular lattice. The system can be described by an extended Fermi-H…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures

  6. arXiv:2407.00383  [pdf, other]

    cs.LG cs.AI

    FANFOLD: Graph Normalizing Flows-driven Asymmetric Network for Unsupervised Graph-Level Anomaly Detection

    Authors: Rui Cao, Shijie Xue, Jindong Li, Qi Wang, Yi Chang

    Abstract: Unsupervised graph-level anomaly detection (UGAD) has attracted increasing interest due to its widespread application. In recent studies, knowledge distillation-based methods have been widely used in unsupervised anomaly detection to improve model efficiency and generalization. However, the inherent symmetry between the source (teacher) and target (student) networks typically results in consistent…

    Submitted 29 June, 2024; originally announced July 2024.

  7. arXiv:2406.05546  [pdf, other]

    cs.DC cs.AI

    Training Through Failure: Effects of Data Consistency in Parallel Machine Learning Training

    Authors: Ray Cao, Sherry Luo, Steve Gan, Sujeeth Jinesh

    Abstract: In this study, we explore the impact of relaxing data consistency in parallel machine learning training during a failure using various parameter server configurations. Our failure recovery strategies include traditional checkpointing, chain replication (which ensures a backup server takes over in case of failure), and a novel stateless parameter server approach. In the stateless approach, workers…

    Submitted 8 June, 2024; originally announced June 2024.

  8. arXiv:2406.04536  [pdf, other]

    cond-mat.quant-gas cond-mat.str-el physics.atom-ph

    Emergence of topological states in relaxation dynamics of interacting bosons

    Authors: Wang Huang, Xuchen Yang, Rui Cao, Yinghai Wu, Jianmin Yuan, Yongqiang Li

    Abstract: Topological concepts have been employed to understand the ground states of many strongly correlated systems, but it is still quite unclear if and how topology manifests itself in the relaxation dynamics. Here we uncover emergent topological phenomena in the time evolution of far-from-equilibrium one-dimensional interacting bosons. Beginning with simple product states, the system evolves into long-…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 7 pages, 4 figures, with supplementary information

  9. arXiv:2406.01422  [pdf, other]

    cs.SE cs.CL

    How to Understand Whole Software Repository?

    Authors: Yingwei Ma, Qingping Yang, Rongyu Cao, Binhua Li, Fei Huang, Yongbin Li

    Abstract: Recently, Large Language Model (LLM) based agents have advanced the significant development of Automatic Software Engineering (ASE). Although verified effectiveness, the designs of the existing methods mainly focus on the local information of codes, e.g., issues, classes, and functions, leading to limitations in capturing the global context and interdependencies within the software system. From th…

    Submitted 3 June, 2024; originally announced June 2024.

  10. arXiv:2405.03126  [pdf]

    eess.IV eess.SP

    Infrared Polarization Imaging-based Non-destructive Thermography Inspection

    Authors: Xianyu Wu, Bin Zhou, Peng Lin, Rongjin Cao, Feng Huang

    Abstract: Infrared pulse thermography non-destructive testing (NDT) method is developed based on the difference in the infrared radiation intensity emitted by defective and non-defective areas of an object. However, when the radiation intensity of the defective target is similar to that of the non-defective area of the object, the detection results are poor. To address this issue, this study investigated th…

    Submitted 5 May, 2024; originally announced May 2024.

  11. arXiv:2405.02712  [pdf, other]

    cs.CL

    CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions

    Authors: Hanchong Zhang, Ruisheng Cao, Hongshen Xu, Lu Chen, Kai Yu

    Abstract: Recently, Large Language Models (LLMs) have been demonstrated to possess impressive capabilities in a variety of domains and tasks. We investigate the issue of prompt design in the multi-turn text-to-SQL task and attempt to enhance the LLMs' reasoning capacity when generating SQL queries. In the conversational context, the current SQL query can be modified from the preceding SQL query with only a…

    Submitted 4 May, 2024; originally announced May 2024.

  12. arXiv:2405.02660  [pdf, other]

    cs.IT eess.SP

    AFDM Channel Estimation in Multi-Scale Multi-Lag Channels

    Authors: Rongyou Cao, Yuheng Zhong, Jiangbin Lyu, Deqing Wang, Liqun Fu

    Abstract: Affine Frequency Division Multiplexing (AFDM) is a brand new chirp-based multi-carrier (MC) waveform for high mobility communications, with promising advantages over Orthogonal Frequency Division Multiplexing (OFDM) and other MC waveforms. Existing AFDM research focuses on wireless communication at high carrier frequency (CF), which typically considers only Doppler frequency shift (DFS) as a resul…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures. Investigate AFDM under underwater multi-scale multi-lag channels. Derive the new input-output formula with the impact of Doppler time scaling. Propose two new channel estimation methods to tackle different levels of Doppler factors. Perform diversity analysis based on CFR overlap probability (COP) and mutual incoherent property (MIP)

  13. arXiv:2405.01958  [pdf, other]

    stat.CO math.ST

    Improved distance correlation estimation

    Authors: Blanca E. Monroy-Castillo, M. A. Jácome, Ricardo Cao

    Abstract: Distance correlation is a novel class of multivariate dependence measure, taking positive values between 0 and 1, and applicable to random vectors of arbitrary dimensions, not necessarily equal. It offers several advantages over the well-known Pearson correlation coefficient, the most important is that distance correlation equals zero if and only if the random vectors are independent. There are…

    Submitted 3 May, 2024; originally announced May 2024.

  14. arXiv:2404.08979  [pdf, other]

    cs.CV cs.LG

    BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection

    Authors: Jian Zhang, Ruiteng Zhang, Xinyue Yan, Xiting Zhuang, Ruicheng Cao

    Abstract: Degraded underwater images decrease the accuracy of underwater object detection. However, existing methods for underwater image enhancement mainly focus on improving the indicators in visual aspects, which may not benefit the tasks of underwater image detection, and may lead to serious degradation in performance. To alleviate this problem, we proposed a bidirectional-guided method for underwater o…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures, 4 tables

    MSC Class: 68T07; 68T45 ACM Class: I.4.3; I.4.8; I.4.9; I.4.10; I.2.10

  15. arXiv:2404.07972  [pdf, other]

    cs.AI cs.CL

    OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

    Authors: Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu

    Abstract: Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity. However, existing benchmarks either lack an interactive environment or are limited to environments specific to certain applications or domains, failing to reflect the diverse and complex nature…

    Submitted 30 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: 51 pages, 21 figures

  16. arXiv:2404.05937  [pdf]

    physics.med-ph

    De-aberration for transcranial photoacoustic computed tomography through an adult human skull

    Authors: Yousuf Aborahama, Karteekeya Sastry, Manxiu Cui, Yang Zhang, Yilin Luo, Rui Cao, Lihong V. Wang

    Abstract: Noninvasive transcranial photoacoustic computed tomography (PACT) of the human brain, despite its clinical potential, remains impeded by the acoustic distortion induced by the human skull. The distortion, which is attributed to the markedly different material properties of the skull relative to soft tissue, results in heavily aberrated PACT images -- a problem that has remained unsolved in the pas…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 11 pages, 3 figures

  17. arXiv:2404.01298  [pdf, other]

    cs.CV eess.IV

    Noise2Image: Noise-Enabled Static Scene Recovery for Event Cameras

    Authors: Ruiming Cao, Dekel Galor, Amit Kohli, Jacob L Yates, Laura Waller

    Abstract: Event cameras capture changes of intensity over time as a stream of 'events' and generally cannot measure intensity itself; hence, they are only used for imaging dynamic scenes. However, fluctuations due to random photon arrival inevitably trigger noise events, even for static scenes. While previous efforts have been focused on filtering out these undesirable noise events to improve signal quality…

    Submitted 1 April, 2024; originally announced April 2024.

  18. DBPF: A Framework for Efficient and Robust Dynamic Bin-Picking

    Authors: Yichuan Li, Junkai Zhao, Yixiao Li, Zheng Wu, Rui Cao, Masayoshi Tomizuka, Yunhui Liu

    Abstract: Efficiency and reliability are critical in robotic bin-picking as they directly impact the productivity of automated industrial processes. However, traditional approaches, demanding static objects and fixed collisions, lead to deployment limitations, operational inefficiencies, and process unreliability. This paper introduces a Dynamic Bin-Picking Framework (DBPF) that challenges traditional stati…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures. This paper has been accepted by IEEE RA-L on 2024-03-24. See the supplementary video at youtube: https://youtu.be/n5af2VsKhkg

  19. arXiv:2403.02948  [pdf, other]

    physics.ins-det hep-ex

    Front-end electronics development of large-area SiPM arrays for high-precision single-photon time measurement

    Authors: Wei Zhi, Ruike Cao, Jiannan Tang, Mingxin Wang, Yongqi Tan, Weihao Wu, Donglian Xu

    Abstract: TRopIcal DEep-sea Neutrino Telescope (TRIDENT) plans to incorporate silicon photomultipliers (SiPMs) with superior time resolution in addition to photomultiplier tubes (PMTs) into its detection units, namely hybrid Digital Optical Modules (hDOMs), to improve its angular resolution. However, the time resolution significantly degrades for large-area SiPMs due to the large detector capacitance, posin…

    Submitted 7 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Revised version. 12 pages, 10 figures

  20. arXiv:2403.00129  [pdf, ps, other]

    cs.DS

    Average-Case Local Computation Algorithms

    Authors: Amartya Shankha Biswas, Ruidi Cao, Edward Pyne, Ronitt Rubinfeld

    Abstract: We initiate the study of Local Computation Algorithms on average case inputs. In the Local Computation Algorithm (LCA) model, we are given probe access to a huge graph, and asked to answer membership queries about some combinatorial structure on the graph, answering each query with sublinear work. For instance, an LCA for the $k$-spanner problem gives access to a sparse subgraph $H\subseteq G$ t…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: 27 pages

  21. arXiv:2402.18262  [pdf, other]

    cs.CL cs.CV

    Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding

    Authors: Hongshen Xu, Lu Chen, Zihan Zhao, Da Ma, Ruisheng Cao, Zichen Zhu, Kai Yu

    Abstract: The growing prevalence of visually rich documents, such as webpages and scanned/digital-born documents (images, PDFs, etc.), has led to increased interest in automatic document understanding and information extraction across academia and industry. Although various document modalities, including image, text, layout, and structure, facilitate human information retrieval, the interconnected nature of…

    Submitted 28 February, 2024; originally announced February 2024.

  22. arXiv:2402.18258  [pdf, other]

    cs.CL

    A BiRGAT Model for Multi-intent Spoken Language Understanding with Hierarchical Semantic Frames

    Authors: Hongshen Xu, Ruisheng Cao, Su Zhu, Sheng Jiang, Hanchong Zhang, Lu Chen, Kai Yu

    Abstract: Previous work on spoken language understanding (SLU) mainly focuses on single-intent settings, where each input utterance merely contains one user intent. This configuration significantly limits the surface form of user utterances and the capacity of output semantics. In this work, we first propose a Multi-Intent dataset which is collected from a realistic in-Vehicle dialogue System, called MIVS.…

    Submitted 28 February, 2024; originally announced February 2024.

  23. arXiv:2402.11845  [pdf, other]

    cs.CL cs.CV

    Modularized Networks for Few-shot Hateful Meme Detection

    Authors: Rui Cao, Roy Ka-Wei Lee, Jing Jiang

    Abstract: In this paper, we address the challenge of detecting hateful memes in the low-resource setting where only a few labeled examples are available. Our approach leverages the compositionality of Low-rank adaptation (LoRA), a widely used parameter-efficient tuning technique. We commence by fine-tuning large language models (LLMs) with LoRA on selected tasks pertinent to hateful meme detection, thereby…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: camera-ready for WWW, 2024, Web4Good

  24. arXiv:2402.05589  [pdf, other]

    cs.CV

    RESMatch: Referring Expression Segmentation in a Semi-Supervised Manner

    Authors: Ying Zang, Chenglong Fu, Runlong Cao, Didi Zhu, Min Zhang, Wenjun Hu, Lanyun Zhu, Tianrun Chen

    Abstract: Referring expression segmentation (RES), a task that involves localizing specific instance-level objects based on free-form linguistic descriptions, has emerged as a crucial frontier in human-AI interaction. It demands an intricate understanding of both visual and textual contexts and often requires extensive training data. This paper introduces RESMatch, the first semi-supervised learning (SSL) a…

    Submitted 11 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  25. arXiv:2402.02541  [pdf, other]

    cs.CL cs.AI cs.CV

    Knowledge Generation for Zero-shot Knowledge-based VQA

    Authors: Rui Cao, Jing Jiang

    Abstract: Previous solutions to knowledge-based visual question answering (K-VQA) retrieve knowledge from external knowledge bases and use supervised learning to train the K-VQA model. Recently pre-trained LLMs have been used as both a knowledge source and a zero-shot QA model for K-VQA and demonstrated promising results. However, these recent methods do not explicitly show the knowledge needed to answer th…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: accepted as Findings in EACL 2023

  26. Bagging cross-validated bandwidths with application to Big Data

    Authors: Daniel Barreiro-Ures, Ricardo Cao, Mario Francisco Fernández, Jeffrey D. Hart

    Abstract: Hall and Robinson (2009) proposed and analyzed the use of bagged cross-validation to choose the bandwidth of a kernel density estimator. They established that bagging greatly reduces the noise inherent in ordinary cross-validation, and hence leads to a more efficient bandwidth selector. The asymptotic theory of Hall and Robinson (2009) assumes that $N$, the number of bagged subsamples, is…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 37 pages, 9 figures

    MSC Class: 62G07 (Primary); 62G20 (Secondary)

    Journal ref: Bagging cross-validated bandwidths with application to Big Data. Biometrika (2021), 108(4), 981-988

  27. Cure models to estimate time until hospitalization due to COVID-19

    Authors: Maria Pedrosa-Laza, Ana López-Cheda, Ricardo Cao

    Abstract: A short introduction to survival analysis and censored data is included in this paper. A thorough literature review in the field of cure models has been done. An overview on the most important and recent approaches on parametric, semiparametric and nonparametric mixture cure models is also included. The main nonparametric and semiparametric approaches were applied to a real time dataset of COVID-1…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 14 pages, 8 figures

    Journal ref: Appl Intell, 2022, 52, 794-807

  28. Nonparametric incidence estimation and bootstrap bandwidth selection in mixture cure models

    Authors: Ana López-Cheda, Ricardo Cao, M. Amalia Jácome, Ingrid Van Keilegom

    Abstract: A completely nonparametric method for the estimation of mixture cure models is proposed. A nonparametric estimator of the incidence is extensively studied and a nonparametric estimator of the latency is presented. These estimators, which are based on the Beran estimator of the conditional survival function, are proved to be the local maximum likelihood estimators. An i.i.d. representation is obtai…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 22 pages; 8 figures

    Journal ref: Computational Statistics and Data Analysis, 2017, 105, 144-165

  29. Nonparametric covariate hypothesis tests for the cure rate in mixture cure models

    Authors: Ana López-Cheda, M. Amalia Jácome, Ingrid Van Keilegom, Ricardo Cao

    Abstract: In lifetime data, like cancer studies, there may be long term survivors, which lead to heavy censoring at the end of the follow-up period. Since a standard survival model is not appropriate to handle these data, a cure model is needed. In the literature, covariate hypothesis tests for cure models are limited to parametric and semiparametric methods. We fill this important gap by proposing a nonparam…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 17 pages, 4 figures

    Journal ref: Statistics in Medicine, 2020, 39, 2291-2307

  30. Nonparametric latency estimation for mixture cure models

    Authors: Ana López-Cheda, M. Amalia Jácome, Ricardo Cao

    Abstract: A nonparametric latency estimator for mixture cure models is studied in this paper. An i.i.d. representation is obtained, the asymptotic mean squared error of the latency estimator is found, and its asymptotic normality is proven. A bootstrap bandwidth selection method is introduced and its efficiency is evaluated in a simulation study. The proposed methods are applied to a dataset of colorectal c…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 24 pages, 3 figures

    Journal ref: TEST, 2017, 26(2), 353-376

  31. arXiv:2401.16727  [pdf, other]

    cs.CL

    Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models

    Authors: Ming Shan Hee, Shivam Sharma, Rui Cao, Palash Nandi, Tanmoy Chakraborty, Roy Ka-Wei Lee

    Abstract: In the evolving landscape of online communication, moderating hate speech (HS) presents an intricate challenge, compounded by the multimodal nature of digital content. This comprehensive survey delves into the recent strides in HS moderation, spotlighting the burgeoning role of large language models (LLMs) and large multimodal models (LMMs). Our exploration begins with a thorough analysis of curre…

    Submitted 1 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Preprint; Under-Review

  32. Estimating lengths-of-stay of hospitalised COVID-19 patients using a non-parametric model: a case study in Galicia (Spain)

    Authors: Ana López-Cheda, M. Amalia Jácome, Ricardo Cao, Pablo M. De Salazar

    Abstract: Estimating the lengths-of-stay (LoS) of hospitalised COVID-19 patients is key for predicting the hospital beds' demand and planning mitigation strategies, as overwhelming the healthcare systems has critical consequences for disease mortality. However, accurately mapping the time-to-event of hospital outcomes, such as the LoS in the intensive care unit (ICU), requires understanding patient trajecto…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: 14 pages, 4 figures

    Journal ref: Epidemiology and Infection; 149:e102, 2021

  33. arXiv:2401.04866  [pdf]

    math.OC

    Airline recovery problem under disruptions: A review

    Authors: Shuai Wu, Enze Liu, Rui Cao, Qiang Bai

    Abstract: In practice, both passenger and cargo flights are vulnerable to unexpected factors, such as adverse weather, airport flow control, crew absence, unexpected aircraft maintenance, and pandemic, which can cause disruptions in flight schedules. Thus, managers need to reallocate relevant resources to ensure that the airport can return to normal operations on the basis of minimum cost, which is the airl…

    Submitted 16 January, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  34. arXiv:2312.13683  [pdf, other]

    eess.SP cs.IT

    Joint Channel Estimation and Cooperative Localization for Near-Field Ultra-Massive MIMO

    Authors: Ruoxiao Cao, Hengtao He, Xianghao Yu, Shenghui Song, Kaibin Huang, Jun Zhang, Yi Gong, Khaled B. Letaief

    Abstract: The next-generation (6G) wireless networks are expected to provide not only seamless and high data-rate communications, but also ubiquitous sensing services. By providing vast spatial degrees of freedom (DoFs), ultra-massive multiple-input multiple-output (UM-MIMO) technology is a key enabler for both sensing and communications in 6G. However, the adoption of UM-MIMO leads to a shift from the far…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Submit to JSAC

  35. arXiv:2312.11201  [pdf, other]

    eess.AS cs.SD eess.SP

    A Refining Underlying Information Framework for Monaural Speech Enhancement

    Authors: Rui Cao, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang

    Abstract: Supervised speech enhancement has gained significantly from recent advancements in neural networks, especially due to their ability to non-linearly fit the diverse representations of target speech, such as waveform or spectrum. However, these direct-fitting solutions continue to face challenges with degraded speech and residual noise in hearing evaluations. By bridging the speech enhancement and t…

    Submitted 24 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: 5 pages

  36. arXiv:2312.06094  [pdf, other]

    cs.CL cs.CV cs.MM

    MATK: The Meme Analytical Tool Kit

    Authors: Ming Shan Hee, Aditi Kumaresan, Nguyen Khoi Hoang, Nirmalendu Prakash, Rui Cao, Roy Ka-Wei Lee

    Abstract: The rise of social media platforms has brought about a new digital culture called memes. Memes, which combine visuals and text, can strongly influence public opinions on social and cultural issues. As a result, people have become interested in categorizing memes, leading to the development of various datasets and multimodal models that show promising results in this field. However, there is curren…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Accepted at ACM Multimedia'23 Open-Source Software Competition Track

    ACM Class: I.1.4

  37. arXiv:2311.18446  [pdf, other]

    stat.CO stat.AP stat.ME

    Length-of-stay times in hospital for COVID-19 patients using the smoothed Beran's estimator with bootstrap bandwidth selection

    Authors: Rebeca Peláez, Ricardo Cao, Juan Vilar

    Abstract: The survival function of length-of-stay in hospital ward and ICU for COVID-19 patients is studied in this paper. Flexible statistical methods are used to estimate this survival function given relevant covariates such as age, sex, obesity and chronic obstructive pulmonary disease (COPD). A doubly-smoothed Beran's estimator has been considered to this aim. The bootstrap method has been used to produ…

    Submitted 30 November, 2023; originally announced November 2023.

  38. arXiv:2311.14288  [pdf, other]

    cs.SI

    Fair Influence Maximization in Social Networks: A Community-Based Evolutionary Algorithm

    Authors: Kaicong Ma, Xinxiang Xu, Haipeng Yang, Renzhi Cao, Lei Zhang

    Abstract: Influence Maximization (IM) has been extensively studied in network science, which attempts to find a subset of users to maximize the influence spread. A new variant of IM, Fair Influence Maximization (FIM), which primarily enhances the fair propagation of information, attracts increasing attention in academic. However, existing algorithms for FIM suffer from a trade-off between fairness and runni…

    Submitted 24 November, 2023; originally announced November 2023.

  39. arXiv:2310.18662  [pdf, other]

    cs.CL

    ASTormer: An AST Structure-aware Transformer Decoder for Text-to-SQL

    Authors: Ruisheng Cao, Hanchong Zhang, Hongshen Xu, Jieyu Li, Da Ma, Lu Chen, Kai Yu

    Abstract: Text-to-SQL aims to generate an executable SQL program given the user utterance and the corresponding database schema. To ensure the well-formedness of output SQLs, one prominent approach adopts a grammar-based recurrent decoder to produce the equivalent SQL abstract syntax tree (AST). However, previous methods mainly utilize an RNN-series decoder, which 1) is time-consuming and inefficient and 2)…

    Submitted 28 October, 2023; originally announced October 2023.

  40. arXiv:2310.17342  [pdf, other]

    cs.CL

    ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought

    Authors: Hanchong Zhang, Ruisheng Cao, Lu Chen, Hongshen Xu, Kai Yu

    Abstract: Recently Large Language Models (LLMs) have been proven to have strong abilities in various domains and tasks. We study the problem of prompt designing in the text-to-SQL task and attempt to improve the LLMs' reasoning ability when generating SQL queries. Besides the trivial few-shot in-context learning setting, we design our chain-of-thought (CoT) prompt with a similar method to schema linking. We…

    Submitted 26 October, 2023; originally announced October 2023.

  41. arXiv:2309.13105  [pdf, other]

    hep-ph astro-ph.CO

    Nonabelian Kinetic Mixing in a Confining Phase

    Authors: Gonzalo Alonso-Álvarez, Ruike Cao, James M. Cline, Karishma Moorthy, Tianzhuo Xiao

    Abstract: Dark matter from a hidden sector with SU($N$) gauge symmetry can have a nonabelian kinetic mixing portal with the standard model. The dark photon becomes massive in the confining phase without the need for spontaneous symmetry breaking. Depending on the particle content of the dark sector, there can be two or more composite vectors that get kinetic mixing through a heavy mediator particle $X$. Thi…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: 14 pages, 12 figures, comments welcome

  42. arXiv:2309.02391  [pdf, other]

    cs.CR cs.SE

    Empirical Review of Smart Contract and DeFi Security: Vulnerability Detection and Automated Repair

    Authors: Peng Qian, Rui Cao, Zhenguang Liu, Wenqing Li, Ming Li, Lun Zhang, Yufeng Xu, Jianhai Chen, Qinming He

    Abstract: Decentralized Finance (DeFi) is emerging as a peer-to-peer financial ecosystem, enabling participants to trade products on a permissionless blockchain. Built on blockchain and smart contracts, the DeFi ecosystem has experienced explosive growth in recent years. Unfortunately, smart contracts hold a massive amount of value, making them an attractive target for attacks. So far, attacks against smart…

    Submitted 6 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: This paper is submitted to the journal of Expert Systems with Applications (ESWA) for review

  43. arXiv:2309.00755  [pdf]

    physics.optics eess.IV

    High-resolution, large field-of-view label-free imaging via aberration-corrected, closed-form complex field reconstruction

    Authors: Ruizhi Cao, Cheng Shen, Changhuei Yang

    Abstract: Computational imaging methods empower modern microscopy with the ability to produce high-resolution, large field-of-view, aberration-free images. One of the dominant computational label-free imaging methods, Fourier ptychographic microscopy (FPM), effectively increases the spatial-bandwidth product of conventional microscopy by using multiple tilted illuminations to achieve high-throughput imagi…

    Submitted 1 September, 2023; originally announced September 2023.

    Comments: 13 pages, 5 figures

  44. Effects of the $\alpha$-cluster structure and the intrinsic momentum component of nuclei on the longitudinal asymmetry in relativistic heavy-ion collisions

    Authors: Ru-Xin Cao, Song Zhang, Yu-Gang Ma

    Abstract: The longitudinal asymmetry in relativistic heavy ion collisions arises from the fluctuation in the number of nucleons involved. This asymmetry causes a rapidity shift in the center of mass of the participating zone. Both the rapidity shift and the longitudinal asymmetry have been found to be significant at the top CERN Large Hadron Collider (LHC) energy for collisions of identical nuclei, and the…

    Submitted 4 January, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: 13 pages, 7 figures

    Journal ref: Physical Review C 108, 064906 (2023)

  45. arXiv:2308.14127  [pdf, other]

    math.NA math.AP

    Information geometric regularization of the barotropic Euler equation

    Authors: Ruijia Cao, Florian Schäfer

    Abstract: A key numerical difficulty in compressible fluid dynamics is the formation of shock waves. Shock waves feature jump discontinuities in the velocity and density of the fluid and thus preclude the existence of classical solutions to the compressible Euler equations. Weak entropy solutions are commonly defined by viscous regularization, but even small amounts of viscosity can substantially change the…

    Submitted 18 March, 2024; v1 submitted 27 August, 2023; originally announced August 2023.

    MSC Class: 35L65; 76L05; 65M25; 76J20; 58B20

  46. arXiv:2308.08088  [pdf, other]

    cs.CV cs.IR cs.MM

    Pro-Cap: Leveraging a Frozen Vision-Language Model for Hateful Meme Detection

    Authors: Rui Cao, Ming Shan Hee, Adriel Kuek, Wen-Haw Chong, Roy Ka-Wei Lee, Jing Jiang

    Abstract: Hateful meme detection is a challenging multimodal task that requires comprehension of both vision and language, as well as cross-modal interactions. Recent studies have tried to fine-tune pre-trained vision-language models (PVLMs) for this task. However, with increasing model sizes, it becomes important to leverage powerful PVLMs more efficiently, rather than simply fine-tuning them. Recently, re…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: Camera-ready for ACM MM 2023

  47. arXiv:2306.14471  [pdf]

    physics.med-ph eess.IV physics.ins-det physics.optics

    Single-shot 3D photoacoustic computed tomography with a densely packed array for transcranial functional imaging

    Authors: Rui Cao, Yilin Luo, Jinhua Xu, Xiaofei Luo, Ku Geng, Yousuf Aborahama, Manxiu Cui, Samuel Davis, Shuai Na, Xin Tong, Cindy Liu, Karteek Sastry, Konstantin Maslov, Peng Hu, Yide Zhang, Li Lin, Yang Zhang, Lihong V. Wang

    Abstract: Photoacoustic computed tomography (PACT) is emerging as a new technique for functional brain imaging, primarily due to its capabilities in label-free hemodynamic imaging. Despite its potential, the transcranial application of PACT has encountered hurdles, such as acoustic attenuations and distortions by the skull and limited light penetration through the skull. To overcome these challenges, we hav…

    Submitted 26 June, 2023; originally announced June 2023.

  48. arXiv:2306.11477  [pdf, other]

    cs.CL

    CATS: A Pragmatic Chinese Answer-to-Sequence Dataset with Large Scale and High Quality

    Authors: Liang Li, Ruiying Geng, Chengyang Fang, Bing Li, Can Ma, Rongyu Cao, Binhua Li, Fei Huang, Yongbin Li

    Abstract: Popular data-to-text datasets have three problems. First, the large-scale datasets either contain noise or lack real application scenarios. Second, the datasets close to real applications are relatively small in size. Last, current datasets are biased toward English, leaving other languages underexplored. To alleviate these limitations, in this paper, we present CATS,…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: ACL 2023

  49. arXiv:2306.02625  [pdf, other]

    cs.SD eess.AS

    Rethinking the visual cues in audio-visual speaker extraction

    Authors: Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang, Shiliang Zhang

    Abstract: The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel video recording to leverage two visual cues, namely speaker identity and synchronization, to enhance performance compared to audio-only algorithms. However, the visual front-end in AVSE is often derived from a pre-trained model or end-to-end trained, making it unclear which visual cue contributes more to the speaker extraction p…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted in Interspeech 2023

  50. arXiv:2305.17369  [pdf, other]

    cs.CV cs.MM

    Modularized Zero-shot VQA with Pre-trained Models

    Authors: Rui Cao, Jing Jiang

    Abstract: Large-scale pre-trained models (PTMs) show great zero-shot capabilities. In this paper, we study how to leverage them for zero-shot visual question answering (VQA). Our approach is motivated by a few observations. First, VQA questions often require multiple steps of reasoning, which is still a capability that most PTMs lack. Second, different steps in VQA reasoning chains require different skills…

    Submitted 24 January, 2024; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: accepted as Findings in ACL 2023; Code available: https://github.com/abril4416/Mod-Zero-VQA