Search | arXiv e-print repository
Showing 1–50 of 653 results for author: Zhou, T

Searching in archive cs.
  1. arXiv:2408.04170  [pdf]

    cs.CV

    M2EF-NNs: Multimodal Multi-instance Evidence Fusion Neural Networks for Cancer Survival Prediction

    Authors: Hui Luo, Jiashuang Huang, Hengrong Ju, Tianyi Zhou, Weiping Ding

    Abstract: Accurate cancer survival prediction is crucial for assisting clinical doctors in formulating treatment plans. Multimodal data, including histopathological images and genomic data, offer complementary and comprehensive information that can greatly enhance the accuracy of this task. However, the current methods, despite yielding promising results, suffer from two notable limitations: they do not eff…

    Submitted 7 August, 2024; originally announced August 2024.

  2. FDiff-Fusion: Denoising diffusion fusion network based on fuzzy learning for 3D medical image segmentation

    Authors: Weiping Ding, Sheng Geng, Haipeng Wang, Jiashuang Huang, Tianyi Zhou

    Abstract: In recent years, the denoising diffusion model has achieved remarkable success in image segmentation modeling. With its powerful nonlinear modeling capabilities and superior generalization performance, denoising diffusion models have gradually been applied to medical image segmentation tasks, bringing new perspectives and methods to this field. However, existing methods overlook the uncertainty of…

    Submitted 21 July, 2024; originally announced August 2024.

    Comments: This paper has been accepted by Information Fusion. Permission from Elsevier must be obtained for all other uses, in any current or future media. The final version is available at [doi:10.1016/J.INFFUS.2024.102540]

    Journal ref: Information Fusion, 2024: 102540

  3. FMDNN: A Fuzzy-guided Multi-granular Deep Neural Network for Histopathological Image Classification

    Authors: Weiping Ding, Tianyi Zhou, Jiashuang Huang, Shu Jiang, Tao Hou, Chin-Teng Lin

    Abstract: Histopathological image classification constitutes a pivotal task in computer-aided diagnostics. The precise identification and categorization of histopathological images are of paramount significance for early disease detection and treatment. In the diagnostic process of pathologists, a multi-tiered approach is typically employed to assess abnormalities in cell regions at different magnifications…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by IEEE Transactions on Fuzzy Systems for publication. Permission from IEEE must be obtained for all other uses, in any current or future media. The final version is available at [doi: 10.1109/TFUZZ.2024.3410929]

    Journal ref: IEEE Transactions on Fuzzy Systems (Early Access) 2024

  4. arXiv:2407.14746  [pdf, other]

    cs.CV eess.IV

    Difflare: Removing Image Lens Flare with Latent Diffusion Model

    Authors: Tianwen Zhou, Qihao Duan, Zitong Yu

    Abstract: The recovery of high-quality images from images corrupted by lens flare presents a significant challenge in low-level vision. Contemporary deep learning methods frequently entail training a lens flare removing model from scratch. However, these methods, despite their noticeable success, fail to utilize the generative prior learned by pre-trained models, resulting in unsatisfactory performance in l…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: Accepted by BMVC 2024

  5. arXiv:2407.14504  [pdf, other]

    cs.LG

    Nonlinear Schrödinger Network

    Authors: Yiming Zhou, Callen MacPhee, Tingyi Zhou, Bahram Jalali

    Abstract: Deep neural networks (DNNs) have achieved exceptional performance across various fields by learning complex nonlinear mappings from large-scale datasets. However, they encounter challenges such as high computational costs and limited interpretability. To address these issues, hybrid approaches that integrate physics with AI are gaining interest. This paper introduces a novel physics-based AI model…

    Submitted 24 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  6. arXiv:2407.08133  [pdf, other]

    cs.CV cs.AI

    Nonverbal Interaction Detection

    Authors: Jianan Wei, Tianfei Zhou, Yi Yang, Wenguan Wang

    Abstract: This work addresses a new challenge of understanding human nonverbal interaction in social contexts. Nonverbal signals pervade virtually every communicative act. Our gestures, facial expressions, postures, gaze, even physical appearance all convey messages, without anything being said. Despite their critical role in social life, nonverbal signals receive very limited attention as compared to the l…

    Submitted 14 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Project page: https://github.com/weijianan1/NVI

  7. arXiv:2407.05633  [pdf, other]

    cs.LG cs.CR

    AdaPI: Facilitating DNN Model Adaptivity for Efficient Private Inference in Edge Computing

    Authors: Tong Zhou, Jiahui Zhao, Yukui Luo, Xi Xie, Wujie Wen, Caiwen Ding, Xiaolin Xu

    Abstract: Private inference (PI) has emerged as a promising solution to execute computations on encrypted data, safeguarding user privacy and model parameters in edge computing. However, existing PI methods are predominantly developed considering constant resource constraints, overlooking the varied and dynamic resource constraints in diverse edge devices, like energy budgets. Consequently, model providers…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: ICCAD 2024 accepted publication

  8. arXiv:2407.05267  [pdf, other]

    cs.CV

    DTR: A Unified Deep Tensor Representation Framework for Multimedia Data Recovery

    Authors: Ting-Wei Zhou, Xi-Le Zhao, Jian-Li Wang, Yi-Si Luo, Min Wang, Xiao-Xuan Bai, Hong Yan

    Abstract: Recently, the transform-based tensor representation has attracted increasing attention in multimedia data (e.g., images and videos) recovery problems, which consists of two indispensable components, i.e., transform and characterization. Previously, the development of transform-based tensor representation mainly focuses on the transform aspect. Although several attempts consider using shallow matri…

    Submitted 7 July, 2024; originally announced July 2024.

  9. arXiv:2407.03089  [pdf, other]

    eess.SP cs.LG q-bio.NC

    Spatio-Temporal Adaptive Diffusion Models for EEG Super-Resolution in Epilepsy Diagnosis

    Authors: Tong Zhou, Shuqiang Wang

    Abstract: Electroencephalogram (EEG) technology, particularly high-density EEG (HD EEG) devices, is widely used in fields such as neuroscience. HD EEG devices improve the spatial resolution of EEG by placing more electrodes on the scalp, meeting the requirements of clinical diagnostic applications such as epilepsy focus localization. However, this technique faces challenges such as high acquisition costs an…

    Submitted 6 August, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  10. arXiv:2407.02764  [pdf, other]

    cs.OS

    Data-driven Software-based Power Estimation for Embedded Devices

    Authors: Haoyu Wang, Xinyi Li, Ti Zhou, Man Lin

    Abstract: Energy measurement of computer devices, which are widely used in the Internet of Things (IoT), is an important yet challenging task. Most of these IoT devices lack ready-to-use hardware or software for power measurement. A cost-effective solution is to use low-end consumer-grade power meters. However, these low-end power meters cannot provide accurate instantaneous power measurements. In this pape…

    Submitted 2 July, 2024; originally announced July 2024.

  11. arXiv:2407.02408  [pdf, other]

    cs.CL cs.LG

    CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models

    Authors: Song Wang, Peng Wang, Tong Zhou, Yushun Dong, Zhen Tan, Jundong Li

    Abstract: As Large Language Models (LLMs) are increasingly deployed to handle various natural language processing (NLP) tasks, concerns regarding the potential negative societal impacts of LLM-generated content have also arisen. To evaluate the biases exhibited by LLMs, researchers have recently proposed a variety of datasets. However, existing bias evaluation efforts often focus on only a particular type o…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 37 pages, 32 figures

  12. arXiv:2407.00995  [pdf, other]

    cs.CY eess.SY physics.app-ph

    Data on the Move: Traffic-Oriented Data Trading Platform Powered by AI Agent with Common Sense

    Authors: Yi Yu, Shengyue Yao, Tianchen Zhou, Yexuan Fu, Jingru Yu, Ding Wang, Xuhong Wang, Cen Chen, Yilun Lin

    Abstract: In the digital era, data has become a pivotal asset, advancing technologies such as autonomous driving. Despite this, data trading faces challenges like the absence of robust pricing methods and the lack of trustworthy trading mechanisms. To address these challenges, we introduce a traffic-oriented data trading platform named Data on The Move (DTM), integrating traffic simulation, data trading, an…

    Submitted 1 July, 2024; originally announced July 2024.

  13. arXiv:2407.00629  [pdf, ps, other]

    cs.MA

    Identification of LFT Structured Descriptor Systems with Slow and Non-uniform Sampling

    Authors: Tong Zhou

    Abstract: Time domain identification is studied in this paper for parameters of a continuous-time multi-input multi-output descriptor system, with these parameters affecting system matrices through a linear fractional transformation. Sampling is permitted to be slow and non-uniform, and the Nyquist frequency restrictions need not be satisfied. This model can be used to describe the behaviors…

    Submitted 6 August, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: 14 pages

  14. arXiv:2407.00256  [pdf, other]

    cs.AI cs.CL cs.LG stat.ML

    One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

    Authors: Ruochen Wang, Sohyun An, Minhao Cheng, Tianyi Zhou, Sung Ju Hwang, Cho-Jui Hsieh

    Abstract: Large Language Models (LLMs) exhibit strong generalization capabilities to novel tasks when prompted with language instructions and in-context demos. Since this ability sensitively depends on the quality of prompts, various methods have been explored to automate the instruction design. While these methods demonstrated promising results, they also restricted the searched prompt to one instruction.…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: ICML 2024. code available at https://github.com/ruocwang/mixture-of-prompts

    MSC Class: 68T01

    Journal ref: Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 2024

  15. arXiv:2406.19364  [pdf, other]

    cs.CV

    SimTxtSeg: Weakly-Supervised Medical Image Segmentation with Simple Text Cues

    Authors: Yuxin Xie, Tao Zhou, Yi Zhou, Geng Chen

    Abstract: Weakly-supervised medical image segmentation is a challenging task that aims to reduce the annotation cost while keeping the segmentation performance. In this paper, we present a novel framework, SimTxtSeg, that leverages simple text cues to generate high-quality pseudo-labels and study the cross-modal fusion in training segmentation models, simultaneously. Our contribution consists of two key compon…

    Submitted 28 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: accepted by MICCAI 2024

  16. arXiv:2406.18966  [pdf, other]

    cs.CL

    UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

    Authors: Siyuan Wu, Yue Huang, Chujie Gao, Dongping Chen, Qihui Zhang, Yao Wan, Tianyi Zhou, Xiangliang Zhang, Jianfeng Gao, Chaowei Xiao, Lichao Sun

    Abstract: Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted various fields by enabling high-quality synthetic data generation and reducing dependence on expensive human-generated datasets. Despite this, challenges remain in the areas of generalization, controllability, diversity, and truthfulness within the existing generative frameworks. To address these challenges, this pap…

    Submitted 28 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  17. arXiv:2406.18313  [pdf, other]

    cs.SD cs.CL eess.AS

    Advancing Airport Tower Command Recognition: Integrating Squeeze-and-Excitation and Broadcasted Residual Learning

    Authors: Yuanxi Lin, Tonglin Zhou, Yang Xiao

    Abstract: Accurate recognition of aviation commands is vital for flight safety and efficiency, as pilots must follow air traffic control instructions precisely. This paper addresses challenges in speech command recognition, such as noisy environments and limited computational resources, by advancing keyword spotting technology. We create a dataset of standardized airport tower commands, including routine an…

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by IALP 2024

  18. arXiv:2406.17806  [pdf, other]

    cs.CL cs.AI cs.CR cs.CV cs.LG

    MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?

    Authors: Xirui Li, Hengguang Zhou, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Cho-Jui Hsieh

    Abstract: Humans are prone to cognitive distortions -- biased thinking patterns that lead to exaggerated responses to specific stimuli, albeit in very different contexts. This paper demonstrates that advanced Multimodal Large Language Models (MLLMs) exhibit similar tendencies. While these models are designed to respond to queries under safety mechanisms, they sometimes reject harmless queries in the presence of…

    Submitted 22 June, 2024; originally announced June 2024.

  19. arXiv:2406.17231  [pdf, other]

    cs.CL

    CogMG: Collaborative Augmentation Between Large Language Model and Knowledge Graph

    Authors: Tong Zhou, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: Large language models have become integral to question-answering applications despite their propensity for generating hallucinations and factually inaccurate content. Querying knowledge graphs to reduce hallucinations in LLM meets the challenge of incomplete knowledge coverage in knowledge graphs. On the other hand, updating knowledge graphs by information extraction and knowledge graph completion…

    Submitted 24 June, 2024; originally announced June 2024.

  20. arXiv:2406.16229  [pdf, other]

    cs.CL

    Multi-Objective Linguistic Control of Large Language Models

    Authors: Dang Nguyen, Jiuhai Chen, Tianyi Zhou

    Abstract: Large language models (LLMs), despite their breakthroughs on many challenging benchmark tasks, tend to generate verbose responses and lack the controllability of output complexity, which is usually preferred by human users in practice. In this paper, we study how to precisely control multiple linguistic complexities of LLM output by finetuning using off-the-shelf data. To this end, we propose mult…

    Submitted 23 June, 2024; originally announced June 2024.

  21. arXiv:2406.15938  [pdf, other]

    cs.CL cs.AI cs.LG

    RuleR: Improving LLM Controllability by Rule-based Data Recycling

    Authors: Ming Li, Han Chen, Chenguang Wang, Dang Nguyen, Dianqi Li, Tianyi Zhou

    Abstract: Large language models (LLMs) still lack delicate controllability over their responses, which is critical to enhancing their performance and the user experience. However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which requires additional costs. To bridge this gap, we propose Rule-based Data Recycling (RuleR),…

    Submitted 22 June, 2024; originally announced June 2024.

  22. arXiv:2406.14721  [pdf, other]

    cs.CL

    1+1>2: Can Large Language Models Serve as Cross-Lingual Knowledge Aggregators?

    Authors: Yue Huang, Chenrui Fan, Yuan Li, Siyuan Wu, Tianyi Zhou, Xiangliang Zhang, Lichao Sun

    Abstract: Large Language Models (LLMs) have garnered significant attention due to their remarkable ability to process information across various languages. Despite their capabilities, they exhibit inconsistencies in handling identical queries in different languages, presenting challenges for further advancement. This paper introduces a method to enhance the multilingual performance of LLMs by aggregating kn…

    Submitted 20 June, 2024; originally announced June 2024.

  23. arXiv:2406.12463  [pdf, other]

    cs.CV eess.IV

    LFMamba: Light Field Image Super-Resolution with State Space Model

    Authors: Wang xia, Yao Lu, Shunzhou Wang, Ziqi Wang, Peiqi Xia, Tianfei Zhou

    Abstract: Recent years have witnessed significant advancements in light field image super-resolution (LFSR) owing to the progress of modern neural networks. However, these methods often face challenges in capturing long-range dependencies (CNN-based) or encounter quadratic computational complexities (Transformer-based), which limit their performance. Recently, the State Space Model (SSM) with selective scan…

    Submitted 18 June, 2024; originally announced June 2024.

  24. arXiv:2406.10900  [pdf, other]

    cs.CV cs.CL

    AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

    Authors: Xiyang Wu, Tianrui Guan, Dianqi Li, Shuaiyi Huang, Xiaoyu Liu, Xijun Wang, Ruiqi Xian, Abhinav Shrivastava, Furong Huang, Jordan Lee Boyd-Graber, Tianyi Zhou, Dinesh Manocha

    Abstract: Large vision-language models (LVLMs) hallucinate: certain context cues in an image may trigger the language module's overconfident and incorrect reasoning on abnormal or hypothetical objects. Though a few benchmarks have been developed to investigate LVLM hallucinations, they mainly rely on hand-crafted corner cases whose fail patterns may hardly generalize, and finetuning on them could undermine…

    Submitted 16 June, 2024; originally announced June 2024.

  25. arXiv:2406.10819  [pdf, other]

    cs.CV cs.AI cs.CL

    GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

    Authors: Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun

    Abstract: Recently, Multimodal Large Language Models (MLLMs) have been used as agents to control keyboard and mouse inputs by directly perceiving the Graphical User Interface (GUI) and generating corresponding code. However, current agents primarily exhibit excellent understanding capabilities in static environments and are predominantly applied in relatively simple domains, such as Web or mobile interfaces…

    Submitted 16 June, 2024; originally announced June 2024.

  26. arXiv:2406.10323  [pdf, other]

    cs.CL

    GenQA: Generating Millions of Instructions from a Handful of Prompts

    Authors: Jiuhai Chen, Rifaa Qadri, Yuxin Wen, Neel Jain, John Kirchenbauer, Tianyi Zhou, Tom Goldstein

    Abstract: Most public instruction finetuning datasets are relatively small compared to the closed source datasets used to train industry models. To study questions about finetuning at scale, such as curricula and learning rate cooldown schedules, there is a need for industrial-scale datasets. However, this scale necessitates a data generation process that is almost entirely automated. In this work, we study…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 9.5 pages, 6 Figures, and 3 tables in the main body. Dataset available at https://huggingface.co/datasets/tomg-group-umd/GenQA

  27. arXiv:2406.07657  [pdf, other]

    cs.LG cs.CL

    OPTune: Efficient Online Preference Tuning

    Authors: Lichang Chen, Jiuhai Chen, Chenxi Liu, John Kirchenbauer, Davit Soselia, Chen Zhu, Tom Goldstein, Tianyi Zhou, Heng Huang

    Abstract: Reinforcement learning with human feedback (RLHF) is critical for aligning Large Language Models (LLMs) with human preference. Compared to the widely studied offline version of RLHF, e.g., direct preference optimization (DPO), recent works have shown that the online variants achieve even better alignment. However, online alignment requires on-the-fly generation of new training data, which is…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 16 pages, 7 figures

  28. arXiv:2406.06965  [pdf, other]

    cs.CV

    Evolving from Single-modal to Multi-modal Facial Deepfake Detection: A Survey

    Authors: Ping Liu, Qiqi Tao, Joey Tianyi Zhou

    Abstract: This survey addresses the critical challenge of deepfake detection amidst the rapid advancements in artificial intelligence. As AI-generated media, including video, audio and text, become more realistic, the risk of misuse to spread misinformation and commit identity fraud increases. Focused on face-centric deepfakes, this work traces the evolution from traditional single-modality methods to sophi…

    Submitted 14 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: P. Liu is with the Department of Computer Science and Engineering, University of Nevada, Reno, NV, 89512. Q. Tao and J. Zhou are with Centre for Frontier AI Research (CFAR), and Institute of High Performance Computing (IHPC), A*STAR, Singapore. J. Zhou is also with Centre for Advanced Technologies in Online Safety (CATOS), A*STAR, Singapore. J. Zhou is the corresponding author

  29. arXiv:2406.05677  [pdf, other]

    cs.CV

    Evolution-aware VAriance (EVA) Coreset Selection for Medical Image Classification

    Authors: Yuxin Hong, Xiao Zhang, Xin Zhang, Joey Tianyi Zhou

    Abstract: In the medical field, managing high-dimensional massive medical imaging data and performing reliable medical analysis from it is a critical challenge, especially in resource-limited environments such as remote medical facilities and mobile devices. This necessitates effective dataset compression techniques to reduce storage, transmission, and computational cost. However, existing coreset selection…

    Submitted 9 June, 2024; originally announced June 2024.

  30. arXiv:2406.03445  [pdf, other]

    cs.LG cs.CL

    Pre-trained Large Language Models Use Fourier Features to Compute Addition

    Authors: Tianyi Zhou, Deqing Fu, Vatsal Sharan, Robin Jia

    Abstract: Pre-trained large language models (LLMs) exhibit impressive mathematical reasoning capabilities, yet how they compute basic arithmetic, such as addition, remains unclear. This paper shows that pre-trained LLMs add numbers using Fourier features -- dimensions in the hidden state that represent numbers via a set of features sparse in the frequency domain. Within the model, MLP and attention layers u…

    Submitted 5 June, 2024; originally announced June 2024.

  31. arXiv:2406.02965  [pdf, other]

    cs.CV

    Understanding the Impact of Negative Prompts: When and How Do They Take Effect?

    Authors: Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Boqing Gong, Cho-Jui Hsieh

    Abstract: The concept of negative prompts, emerging from conditional generation models like Stable Diffusion, allows users to specify what to exclude from the generated images. Despite the widespread use of negative prompts, their intrinsic mechanisms remain largely unexplored. This paper presents the first comprehensive study to uncover how and when negative…

    Submitted 5 June, 2024; originally announced June 2024.

  32. arXiv:2406.01970  [pdf, other]

    cs.CV cs.AI

    The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise

    Authors: Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Boqing Gong, Cho-Jui Hsieh, Minhao Cheng

    Abstract: Diffusion models have achieved remarkable success in text-to-image generation tasks; however, the role of initial noise has been rarely explored. In this study, we identify specific regions within the initial noise image, termed trigger patches, that play a key role for object generation in the resulting images. Notably, these patches are "universal" and can be generalized across various positio…

    Submitted 4 June, 2024; originally announced June 2024.

  33. arXiv:2406.01946  [pdf, other]

    cs.CR cs.CL

    Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature

    Authors: Tong Zhou, Xuandong Zhao, Xiaolin Xu, Shaolei Ren

    Abstract: Text watermarks for large language models (LLMs) have been commonly used to identify the origins of machine-generated content, which is promising for assessing liability when combating deepfake or harmful content. While existing watermarking techniques typically prioritize robustness against removal attacks, unfortunately, they are vulnerable to spoofing attacks: malicious actors can subtly alter…

    Submitted 17 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  34. arXiv:2406.00956  [pdf, other]

    cs.CV cs.LG eess.IV

    Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation

    Authors: Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang

    Abstract: The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entai…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Project Link: https://sam-auxol.github.io/AuxOL/

  35. arXiv:2405.20910  [pdf, other]

    physics.app-ph cs.AI cs.CV physics.data-an

    Predicting ptychography probe positions using single-shot phase retrieval neural network

    Authors: Ming Du, Tao Zhou, Junjing Deng, Daniel J. Ching, Steven Henke, Mathew J. Cherukara

    Abstract: Ptychography is a powerful imaging technique that is used in a variety of fields, including materials science, biology, and nanotechnology. However, the accuracy of the reconstructed ptychography image is highly dependent on the accuracy of the recorded probe positions which often contain errors. These errors are typically corrected jointly with phase retrieval through numerical optimization appro…

    Submitted 31 May, 2024; originally announced May 2024.

    MSC Class: 94A08

    ACM Class: I.4.0

  36. arXiv:2405.16472  [pdf, other]

    cs.LG

    Multi-Level Additive Modeling for Structured Non-IID Federated Learning

    Authors: Shutong Chen, Tianyi Zhou, Guodong Long, Jie Ma, Jing Jiang, Chengqi Zhang

    Abstract: The primary challenge in Federated Learning (FL) is to model non-IID distributions across clients, whose fine-grained structure is important to improve knowledge sharing. For example, some knowledge is globally shared across all clients, some is only transferable within a subgroup of clients, and some is client-specific. To capture and exploit this structure, we train models organized in a multi-…

    Submitted 26 May, 2024; originally announced May 2024.

  37. arXiv:2405.15973  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement

    Authors: Xiyao Wang, Jiuhai Chen, Zhaoyang Wang, Yuhang Zhou, Yiyang Zhou, Huaxiu Yao, Tianyi Zhou, Tom Goldstein, Parminder Bhatia, Furong Huang, Cao Xiao

    Abstract: Large vision-language models (LVLMs) have achieved impressive results in various visual question-answering and reasoning tasks through vision instruction tuning on specific datasets. However, there is still significant room for improvement in the alignment between visual and language modalities. Previous methods to enhance this alignment typically require external models or data, heavily depending…

    Submitted 7 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 15 pages, 8 figures

  38. arXiv:2405.14767  [pdf, other]

    q-fin.ST cs.CL cs.LG q-fin.TR

    FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models

    Authors: Hongyang Yang, Boyu Zhang, Neng Wang, Cheng Guo, Xiaoli Zhang, Likun Lin, Junlin Wang, Tianyu Zhou, Mao Guan, Runjia Zhang, Christina Dan Wang

    Abstract: As financial institutions and professionals increasingly incorporate Large Language Models (LLMs) into their workflows, substantial barriers, including proprietary data and specialized knowledge, persist between the finance sector and the AI community. These challenges impede the AI community's ability to enhance financial tasks effectively. Acknowledging financial analysis's critical role, we aim…

    Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: FinRobot Whitepaper V1.0

  39. arXiv:2405.13326  [pdf, other]

    cs.CL

    Mosaic IT: Enhancing Instruction Tuning with Data Mosaics

    Authors: Ming Li, Pei Chen, Chenguang Wang, Hongyu Zhao, Yijun Liang, Yupeng Hou, Fuxiao Liu, Tianyi Zhou

    Abstract: Finetuning large language models with a variety of instruction-response pairs has enhanced their capability to understand and follow instructions. Current instruction tuning primarily relies on teacher models or human intervention to generate and refine the instructions and responses, which are costly, non-sustainable, and may lack diversity. In this paper, we introduce Mosaic Instruction Tuning (…

    Submitted 22 May, 2024; originally announced May 2024.

  40. arXiv:2405.11793  [pdf, other]

    cs.CV

    MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise

    Authors: Ruiqi Wu, Chenran Zhang, Jianle Zhang, Yi Zhou, Tao Zhou, Huazhu Fu

    Abstract: Current fundus image analysis models are predominantly built for specific tasks relying on individual datasets. The learning process is usually based on a data-driven paradigm without prior knowledge, resulting in poor transferability and generalizability. To address this issue, we propose MM-Retinal, a multi-modal dataset that encompasses high-quality image-text pairs collected from professional fu…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Early Accepted by The International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2024

  41. arXiv:2405.11311  [pdf]

    cs.LG cs.ET

    A Dual Power Grid Cascading Failure Model for the Vulnerability Analysis

    Authors: Tianxin Zhou, Xiang Li, Haibing Lu

    Abstract: Considering the attacks against the power grid, one of the most effective approaches could be the attack to the transmission lines that leads to large cascading failures. Hence, the problem of locating the most critical or vulnerable transmission lines for a Power Grid Cascading Failure (PGCF) has drawn much attention from the research society. There exist many deterministic solutions and stochas…

    Submitted 18 May, 2024; originally announced May 2024.

  42. arXiv:2405.09034  [pdf, ps, other]

    quant-ph cs.NI

    Entanglement Distribution Delay Optimization in Quantum Networks with Distillation

    Authors: Mahdi Chehimi, Kenneth Goodenough, Walid Saad, Don Towsley, Tony X. Zhou

    Abstract: Quantum networks (QNs) distribute entangled states to enable distributed quantum computing and sensing applications. However, in such QNs, quantum switches (QSs) have limited resources that are highly sensitive to noise and losses and must be carefully allocated to minimize entanglement distribution delay. In this paper, a QS resource allocation framework is proposed, which jointly optimizes the a…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 13 pages, 6 figures

  43. arXiv:2405.04975  [pdf, other]

    cs.SE

    Prototype2Code: End-to-end Front-end Code Generation from UI Design Prototypes

    Authors: Shuhong Xiao, Yunnong Chen, Jiazhi Li, Liuqing Chen, Lingyun Sun, Tingting Zhou

    Abstract: UI-to-code technology has streamlined the front-end development process, reducing repetitive tasks for engineers. Prior research mainly uses design prototypes as inputs, with the effectiveness of the generated code heavily dependent on these prototypes' quality, leading to compromised robustness. Moreover, these approaches also exhibit shortcomings in code quality, including issues such as disorgan…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 11 pages, 6 figures

  44. arXiv:2405.04371  [pdf, other]

    cs.SI cs.AI cs.CY

    Community Detection for Heterogeneous Multiple Social Networks

    Authors: Ziqing Zhu, Guan Yuan, Tao Zhou, Jiuxin Cao

    Abstract: The community plays a crucial role in understanding user behavior and network characteristics in social networks. Some users can use multiple social networks at once for a variety of objectives. These users are called overlapping users who bridge different social networks. Detecting communities across multiple social networks is vital for interaction mining, information diffusion, and behavior mig…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: This paper was accepted by IEEE Transactions on Computational Social Systems (TCSS)

  45. arXiv:2405.03974  [pdf, other]

    cs.CR cs.AI cs.LG

    TBNet: A Neural Architectural Defense Framework Facilitating DNN Model Protection in Trusted Execution Environments

    Authors: Ziyu Liu, Tong Zhou, Yukui Luo, Xiaolin Xu

    Abstract: Trusted Execution Environments (TEEs) have become a promising solution to secure DNN models on edge devices. However, the existing solutions either provide inadequate protection or introduce large performance overhead. Taking both security and performance into consideration, this paper presents TBNet, a TEE-based defense framework that protects DNN model from a neural architectural perspective. Sp…

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: DAC2024

  46. arXiv:2405.03946  [pdf]

    cs.SI

    Association between centrality and flourishing trait: analyzing student co-occurrence networks drawn from dining activities

    Authors: Yi Cao, Shimin Cai, Xiaorong Shen, Tao Zhou

    Abstract: Comprehending the association between social capabilities and individual psychological traits is paramount for educational administrators. Presently, many studies heavily depend on online questionnaires and self-reported data, while analysis of the connection between offline social networks and mental health status remains scarce. By leveraging a public dataset encompassing on-campus dining activi…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 14 pages, 2 figures, 1 Table

  47. arXiv:2405.03082  [pdf, other]

    cs.LG

    Finite-Time Convergence and Sample Complexity of Actor-Critic Multi-Objective Reinforcement Learning

    Authors: Tianchen Zhou, FNU Hairi, Haibo Yang, Jia Liu, Tian Tong, Fan Yang, Michinari Momma, Yan Gao

    Abstract: Reinforcement learning with multiple, potentially conflicting objectives is pervasive in real-world applications, while this problem remains theoretically under-explored. This paper tackles the multi-objective reinforcement learning (MORL) problem and introduces an innovative actor-critic algorithm named MOAC which finds a policy by iteratively making trade-offs among conflicting reward signals. N…

    Submitted 9 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted in ICML 2024

  48. arXiv:2405.00700  [pdf]

    cs.NE cond-mat.str-el

    Oxygen vacancies modulated VO2 for neurons and Spiking Neural Network construction

    Authors: Liang Li, Ting Zhou, Tong Liu, Zhiwei Liu, Yaping Li, Shuo Wu, Shanguang Zhao, Jinglin Zhu, Meiling Liu, Zhihan Lin, Bowen Sun, Jianjun Li, Fangwen Sun, Chongwen Zou

    Abstract: Artificial neuronal devices are the basic building blocks for neuromorphic computing systems, which have been motivated by realistic brain emulation. Aiming for these applications, various device concepts have been proposed to mimic the neuronal dynamics and functions. While till now, the artificial neuron devices with high efficiency, high stability and low power consumption are still far from pr…

    Submitted 16 April, 2024; originally announced May 2024.

    Comments: 18 pages, 4 figures

  49. arXiv:2404.19475  [pdf, other]

    cs.CV

    TwinDiffusion: Enhancing Coherence and Efficiency in Panoramic Image Generation with Diffusion Models

    Authors: Teng Zhou, Yongchuan Tang

    Abstract: Diffusion models have emerged as effective tools for generating diverse and high-quality content. However, their capability in high-resolution image generation, particularly for panoramic images, still faces challenges such as visible seams and incoherent transitions. In this paper, we propose TwinDiffusion, an optimized framework designed to address these challenges through two key innovations: t…

    Submitted 6 July, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  50. arXiv:2404.15163  [pdf, other]

    cs.CV eess.IV

    Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment

    Authors: Tianwei Zhou, Songbai Tan, Wei Zhou, Yu Luo, Yuan-Gen Wang, Guanghui Yue

    Abstract: With the increasing maturity of the text-to-image and image-to-image generative models, AI-generated images (AGIs) have shown great application potential in advertisement, entertainment, education, social media, etc. Although remarkable advancements have been achieved in generative models, very few efforts have been paid to design relevant quality assessment models. In this paper, we propose a nov…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: IEEE Transactions on Broadcasting (TBC)