Search | arXiv e-print repository

Showing 1–50 of 930 results for author: Tan, X

  1. arXiv:2409.06874  [pdf, other]

    astro-ph.SR astro-ph.EP

    The only inflated brown dwarf in an eclipsing white dwarf-brown dwarf binary: WD1032+011B

    Authors: Jenni R. French, Sarah L. Casewell, Rachael C. Amaro, Joshua D. Lothringer, L. C. Mayorga, Stuart P. Littlefair, Ben W. P. Lew, Yifan Zhou, Daniel Apai, Mark S. Marley, Vivien Parmentier, Xianyu Tan

    Abstract: Due to their short orbital periods and relatively high flux ratios, irradiated brown dwarfs in binaries with white dwarfs offer better opportunities to study irradiated atmospheres than hot Jupiters, which have lower planet-to-star flux ratios. WD1032+011 is an eclipsing, tidally locked white dwarf-brown dwarf binary with a 9950 K white dwarf orbited by a 69.7 M$_{Jup}$ brown dwarf in a 0.09 day o…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 20 pages, 23 figures. Accepted for Publication in MNRAS

  2. arXiv:2409.05933  [pdf, other]

    cs.LG

    Self-Supervised State Space Model for Real-Time Traffic Accident Prediction Using eKAN Networks

    Authors: Xin Tan, Meng Zhao

    Abstract: Accurate prediction of traffic accidents across different times and regions is vital for public safety. However, existing methods face two key challenges: 1) Generalization: Current models rely heavily on manually constructed multi-view structures, like POI distributions and road network densities, which are labor-intensive and difficult to scale across cities. 2) Real-Time Performance: While some…

    Submitted 9 September, 2024; originally announced September 2024.

  3. arXiv:2409.05377  [pdf, other]

    eess.AS cs.SD

    BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec

    Authors: Detai Xin, Xu Tan, Shinnosuke Takamichi, Hiroshi Saruwatari

    Abstract: We present BigCodec, a low-bitrate neural speech codec. While recent neural speech codecs have shown impressive progress, their performance significantly deteriorates at low bitrates (around 1 kbps). Although a low bitrate inherently restricts performance, other factors, such as model capacity, also hinder further improvements. To address this problem, we scale up the model size to 159M parameters…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 4 pages, 1 figure. Audio samples available at: https://aria-k-alethia.github.io/bigcodec-demo/

  4. arXiv:2409.04744  [pdf, other]

    cs.LG cs.AI

    LMGT: Optimizing Exploration-Exploitation Balance in Reinforcement Learning through Language Model Guided Trade-offs

    Authors: Yongxin Deng, Xihe Qiu, Xiaoyu Tan, Wei Chu, Yinghui Xu

    Abstract: The uncertainty inherent in the environmental transition model of Reinforcement Learning (RL) necessitates a careful balance between exploration and exploitation to optimize the use of computational resources for accurately estimating an agent's expected reward. Achieving balance in control systems is particularly challenging in scenarios with sparse rewards. However, given the extensive prior kno…

    Submitted 7 September, 2024; originally announced September 2024.

  5. arXiv:2409.04025  [pdf, other]

    cs.CV cs.AI

    BFA-YOLO: Balanced multiscale object detection network for multi-view building facade attachments detection

    Authors: Yangguang Chen, Tong Wang, Guanzhou Chen, Kun Zhu, Xiaoliang Tan, Jiaqi Wang, Hong Xie, Wenlin Zhou, Jingyi Zhao, Qing Wang, Xiaolong Luo, Xiaodong Zhang

    Abstract: Detection of building facade attachments such as doors, windows, balconies, air conditioner units, billboards, and glass curtain walls plays a pivotal role in numerous applications. Building facade attachments detection aids in building information modeling (BIM) construction and meeting Level of Detail 3 (LOD3) standards. Yet, it faces challenges like uneven object distribution, small object det…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: 22 pages

  6. arXiv:2409.03381  [pdf, other]

    cs.CL cs.AI

    CogniDual Framework: Self-Training Large Language Models within a Dual-System Theoretical Framework for Improving Cognitive Tasks

    Authors: Yongxin Deng, Xihe Qiu, Xiaoyu Tan, Chao Qu, Jing Pan, Yuan Cheng, Yinghui Xu, Wei Chu

    Abstract: Cognitive psychology investigates perception, attention, memory, language, problem-solving, decision-making, and reasoning. Kahneman's dual-system theory elucidates the human decision-making process, distinguishing between the rapid, intuitive System 1 and the deliberative, rational System 2. Recent advancements have positioned large language models (LLMs) as formidable tools nearing human-level p…

    Submitted 6 September, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

  7. arXiv:2409.01038  [pdf, other]

    cs.RO cs.AI cs.CV

    Robust Vehicle Localization and Tracking in Rain using Street Maps

    Authors: Yu Xiang Tan, Malika Meghjani

    Abstract: GPS-based vehicle localization and tracking suffers from unstable positional information commonly experienced in tunnel segments and in dense urban areas. Also, both Visual Odometry (VO) and Visual Inertial Odometry (VIO) are susceptible to adverse weather conditions that cause occlusions or blur on the visual input. In this paper, we propose a novel approach for vehicle localization that uses st…

    Submitted 2 September, 2024; originally announced September 2024.

    Journal ref: IEEE International Conference on Intelligent Transportation Systems, 2024

  8. arXiv:2408.17175  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

    Authors: Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo, Wei Xue

    Abstract: Recent advancements in audio generation have been significantly propelled by the capabilities of Large Language Models (LLMs). The existing research on audio LLM has primarily focused on enhancing the architecture and scale of audio language models, as well as leveraging larger datasets, and generally, acoustic codecs, such as EnCodec, are used for audio tokenization. However, these codecs were or…

    Submitted 30 August, 2024; originally announced August 2024.

  9. arXiv:2408.16617  [pdf, other]

    quant-ph

    Long-Range $ZZ$ Interaction via Resonator-Induced Phase in Superconducting Qubits

    Authors: Xiang Deng, Wen Zheng, Xudong Liao, Haoyu Zhou, Yangyang Ge, Jie Zhao, Dong Lan, Xinsheng Tan, Yu Zhang, Shaoxiong Li, Yang Yu

    Abstract: Superconducting quantum computing emerges as one of the leading candidates for achieving quantum advantage. However, a prevailing challenge is the coding overhead due to limited quantum connectivity, constrained by nearest-neighbor coupling among superconducting qubits. Here, we propose a novel multimode coupling scheme using three resonators driven by two microwaves, based on the resonator-induced ph…

    Submitted 11 September, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

    Comments: 7 pages, 4 figures

  10. arXiv:2408.16315  [pdf, other]

    cs.HC cs.LG eess.SP

    Passenger hazard perception based on EEG signals for highly automated driving vehicles

    Authors: Ashton Yu Xuan Tan, Yingkai Yang, Xiaofei Zhang, Bowen Li, Xiaorong Gao, Sifa Zheng, Jianqiang Wang, Xinyu Gu, Jun Li, Yang Zhao, Yuxin Zhang, Tania Stathaki

    Abstract: Enhancing the safety of autonomous vehicles is crucial, especially given recent accidents involving automated systems. As passengers in these vehicles, humans' sensory perception and decision-making can be integrated with autonomous systems to improve safety. This study explores neural mechanisms in passenger-vehicle interactions, leading to the development of a Passenger Cognitive Model (PCM) and…

    Submitted 29 August, 2024; originally announced August 2024.

  11. arXiv:2408.14340  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Foundation Models for Music: A Survey

    Authors: Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, Ruibin Yuan, et al. (17 additional authors not shown)

    Abstract: In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the signifi…

    Submitted 3 September, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  12. arXiv:2408.11982  [pdf, other]

    eess.IV cs.CV cs.MM

    AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

    Authors: Maksim Smirnov, Aleksandr Gushchin, Anastasia Antsiferova, Dmitry Vatolin, Radu Timofte, Ziheng Jia, Zicheng Zhang, Wei Sun, Jiaying Qian, Yuqin Cao, Yinan Sun, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Kanjar De, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Wenhui Meng, Xiaoheng Tan, Haiqiang Wang, Xiaozhong Xu, et al. (11 additional authors not shown)

    Abstract: Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dat…

    Submitted 28 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  13. arXiv:2408.10608  [pdf, other]

    cs.CL cs.AI

    Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory

    Authors: Yongxin Deng, Xihe Qiu, Xiaoyu Tan, Jing Pan, Chen Jue, Zhijun Fang, Yinghui Xu, Wei Chu, Yuan Qi

    Abstract: Large language models (LLMs) are trained on extensive text corpora, which inevitably include biased information. Although techniques such as Affective Alignment can mitigate some negative impacts of these biases, existing prompt-based attack methods can still extract these biases from the model's weights. Moreover, these biases frequently appear subtly when LLMs are prompted to perform identical t…

    Submitted 20 August, 2024; originally announced August 2024.

  14. arXiv:2408.09606  [pdf, other]

    astro-ph.EP astro-ph.SR

    Global weather map reveals persistent top-of-atmosphere features on the nearest brown dwarfs

    Authors: Xueqing Chen, Beth A. Biller, Johanna M. Vos, Ian J. M. Crossfield, Gregory N. Mace, Callie E. Hood, Xianyu Tan, Katelyn N. Allers, Emily C. Martin, Emma Bubb, Jonathan J. Fortney, Caroline V. Morley, Mark Hammond

    Abstract: Brown dwarfs and planetary-mass companions display rotationally modulated photometric variability, especially those near the L/T transition. This variability is commonly attributed to top-of-atmosphere (TOA) inhomogeneities, with proposed models including patchy thick and thin clouds, planetary-scale jets, or chemical disequilibrium. Surface mapping techniques are powerful tools to probe their atm…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 32 pages, 14 figures, accepted for publication in MNRAS

  15. arXiv:2408.09405  [pdf, ps, other]

    math.AP math.DG

    Boundary determination of the Riemannian metric from Cauchy data for the Stokes equations

    Authors: Xiaoming Tan

    Abstract: For a compact connected Riemannian manifold of dimension $n$ with smooth boundary, $n\geqslant 2$, we prove that the Cauchy data (or the Dirichlet-to-Neumann map) for the Stokes equations uniquely determines the partial derivatives of all orders of the metric on the boundary of the manifold.

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 13 pages

  16. arXiv:2408.06483  [pdf, other]

    cs.GT

    Clock Auctions Augmented with Unreliable Advice

    Authors: Vasilis Gkatzelis, Daniel Schoepflin, Xizhi Tan

    Abstract: We provide the first analysis of clock auctions through the learning-augmented framework. Deferred-acceptance clock auctions are a compelling class of mechanisms satisfying a unique list of highly practical properties, including obvious strategy-proofness, transparency, and unconditional winner privacy, making them particularly well-suited for real-world applications. However, early work that eval…

    Submitted 12 August, 2024; originally announced August 2024.

  17. arXiv:2408.05683  [pdf, other]

    cs.CV cs.MM

    Single Image Dehazing Using Scene Depth Ordering

    Authors: Pengyang Ling, Huaian Chen, Xiao Tan, Yimeng Shan, Yi Jin

    Abstract: Images captured in hazy weather generally suffer from quality degradation, and many dehazing methods have been developed to solve this problem. However, the single image dehazing problem is still challenging due to its ill-posed nature. In this paper, we propose a depth order guided single image dehazing method, which utilizes depth order in hazy images to guide the dehazing process to achieve a simil…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 14 pages, 15 figures

  18. arXiv:2408.04957   

    cs.CV cs.AI

    LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description

    Authors: Yizhang Jin, Jian Li, Jiangning Zhang, Jianlong Hu, Zhenye Gan, Xin Tan, Yong Liu, Yabiao Wang, Chengjie Wang, Lizhuang Ma

    Abstract: Visual Spatial Description (VSD) aims to generate texts that describe the spatial relationships between objects within images. Traditional visual spatial relationship classification (VSRC) methods typically output the spatial relationship between two objects in an image, often neglecting world knowledge and lacking general language capabilities. In this paper, we propose a Large Language-and-Visio…

    Submitted 28 August, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

    Comments: We have discovered a significant error in the paper that affects the main conclusions. To ensure the accuracy of our research, we have decided to withdraw this paper and will resubmit it after making the necessary corrections

  19. arXiv:2408.03584  [pdf, other]

    astro-ph.CO astro-ph.GA

    Constraints on Primordial Magnetic Fields from High Redshift Stellar Mass Density

    Authors: Qile Zhang, Shang Li, Xiu-Hui Tan, Jun-Qing Xia

    Abstract: Primordial magnetic fields (PMFs) play a pivotal role in influencing small-scale fluctuations within the primordial density field, thereby enhancing the matter power spectrum within the context of the $\Lambda$CDM model at small scales. These amplified fluctuations accelerate the early formation of galactic halos and stars, which can be observed through advanced high-redshift observational techniques. T…

    Submitted 7 August, 2024; originally announced August 2024.

  20. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  21. arXiv:2407.21581  [pdf, other]

    cs.CV

    InScope: A New Real-world 3D Infrastructure-side Collaborative Perception Dataset for Open Traffic Scenarios

    Authors: Xiaofei Zhang, Yining Li, Jinping Wang, Xiangyi Qin, Ying Shen, Zhengping Fan, Xiaojun Tan

    Abstract: Perception systems of autonomous vehicles are susceptible to occlusion, especially when examined from a vehicle-centric perspective. Such occlusion can lead to overlooked object detections, e.g., larger vehicles such as trucks or buses may create blind spots where cyclists or pedestrians could be obscured, accentuating the safety concerns associated with such perception system limitations. To miti…

    Submitted 31 July, 2024; originally announced July 2024.

  22. Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy

    Authors: Xiaoheng Tan, Jiabin Zhang, Yuhui Quan, Jing Li, Yajing Wu, Zilin Bian

    Abstract: Deep Video Quality Assessment (VQA) methods have shown impressive high-performance capabilities. Notably, no-reference (NR) VQA methods play a vital role in situations where obtaining reference videos is restricted or not feasible. Nevertheless, as more streaming videos are being created in ultra-high definition (e.g., 4K) to enrich viewers' experiences, the current deep VQA methods face unaccepta…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  23. arXiv:2407.17164  [pdf, other]

    cs.LG cs.AI

    Robust Deep Hawkes Process under Label Noise of Both Event and Occurrence

    Authors: Xiaoyu Tan, Bin Li, Xihe Qiu, Jingjing Huang, Yinghui Xu, Wei Chu

    Abstract: Integrating deep neural networks with the Hawkes process has significantly improved predictive capabilities in finance, health informatics, and information technology. Nevertheless, these models often face challenges in real-world settings, particularly due to substantial label noise. This issue is of significant concern in the medical field, where label noise can arise from delayed updates in ele…

    Submitted 29 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: ECAI2024

  24. arXiv:2407.16364  [pdf, other]

    cs.CV

    Harmonizing Visual Text Comprehension and Generation

    Authors: Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Shu Wei, Hao Liu, Xin Tan, Zhizhong Zhang, Can Huang, Yuan Xie

    Abstract: In this work, we present TextHarmony, a unified and versatile multimodal generative model proficient in comprehending and generating visual text. Simultaneously generating images and texts typically results in performance degradation due to the inherent inconsistency between vision and language modalities. To overcome this challenge, existing approaches resort to modality-specific data for supervi…

    Submitted 23 July, 2024; originally announced July 2024.

  25. arXiv:2407.15808  [pdf, other]

    quant-ph cond-mat.mtrl-sci physics.comp-ph

    Quantum Computing for Phonon Scattering Effects on Thermal Conductivity

    Authors: Xiangjun Tan

    Abstract: Recent investigations have demonstrated that multi-phonon scattering processes substantially influence the thermal conductivity of materials, posing significant computational challenges for classical simulations as the complexity of phonon modes escalates. This study examines the potential of quantum simulations to address these challenges, utilizing Noisy Intermediate Scale Quantum era (NISQ) qua…

    Submitted 28 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

  26. arXiv:2407.15334  [pdf, other]

    cs.CV

    Explore the LiDAR-Camera Dynamic Adjustment Fusion for 3D Object Detection

    Authors: Yiran Yang, Xu Gao, Tong Wang, Xin Hao, Yifeng Shi, Xiao Tan, Xiaoqing Ye, Jingdong Wang

    Abstract: Camera and LiDAR serve as informative sensors for accurate and robust autonomous driving systems. However, these sensors often exhibit heterogeneous natures, resulting in distributional modality gaps that present significant challenges for fusion. To address this, a robust fusion technique is crucial, particularly for enhancing 3D object detection. In this paper, we introduce a dynamic adjustment…

    Submitted 21 July, 2024; originally announced July 2024.

  27. arXiv:2407.14562  [pdf, other]

    cs.AI cs.CL

    Thought-Like-Pro: Enhancing Reasoning of Large Language Models through Self-Driven Prolog-based Chain-of-Thought

    Authors: Xiaoyu Tan, Yongxin Deng, Xihe Qiu, Weidi Xu, Chao Qu, Wei Chu, Yinghui Xu, Yuan Qi

    Abstract: Large language models (LLMs) have shown exceptional performance as general-purpose assistants, excelling across a variety of reasoning tasks. This achievement represents a significant step toward achieving artificial general intelligence (AGI). Despite these advancements, the effectiveness of LLMs often hinges on the specific prompting strategies employed, and there remains a lack of a robust fram…

    Submitted 10 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    ACM Class: I.2.7

  28. arXiv:2407.13388  [pdf, other]

    astro-ph.EP

    Evidence for Nightside Water Emission Found in Transit of Ultrahot Jupiter WASP-33b

    Authors: Yuanheng Yang, Guo Chen, Fei Yan, Xianyu Tan, Jianghui Ji

    Abstract: To date, the dayside thermal structure of ultrahot Jupiters (UHJs) is generally considered to be inverted, but their nightside thermal structure has been less explored. Here we explore the impact of nightside thermal emission on high-resolution infrared transmission spectroscopy, which should not be neglected, especially for UHJs. We present a general equation for the high-resolution transmission…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 17 pages, 9 figures, 1 table. Accepted for publication in ApJL

  29. arXiv:2407.12758  [pdf, other]

    cs.CV

    Mutual Information Guided Optimal Transport for Unsupervised Visible-Infrared Person Re-identification

    Authors: Zhizhong Zhang, Jiangming Wang, Xin Tan, Yanyun Qu, Junping Wang, Yong Xie, Yuan Xie

    Abstract: Unsupervised visible infrared person re-identification (USVI-ReID) is a challenging retrieval task that aims to retrieve cross-modality pedestrian images without using any label information. In this task, the large cross-modality variance makes it difficult to generate reliable cross-modality labels, and the lack of annotations also provides additional difficulties for learning modality-invariant…

    Submitted 17 July, 2024; originally announced July 2024.

  30. arXiv:2407.12532  [pdf, other]

    cs.CL cs.AI

    Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models

    Authors: Xihe Qiu, Haoyu Wang, Xiaoyu Tan, Chao Qu, Yujie Xiong, Yuan Cheng, Yinghui Xu, Wei Chu, Yuan Qi

    Abstract: Effective collaboration in multi-agent systems requires communicating goals and intentions between agents. Current agent frameworks often suffer from dependencies on single-agent execution and lack robust inter-module communication, frequently leading to suboptimal multi-agent reinforcement learning (MARL) policies and inadequate task coordination. To address these challenges, we present a framewo…

    Submitted 17 July, 2024; originally announced July 2024.

  31. arXiv:2407.12522  [pdf, other]

    cs.CL cs.AI

    Struct-X: Enhancing Large Language Models Reasoning with Structured Data

    Authors: Xiaoyu Tan, Haoyu Wang, Xihe Qiu, Yuan Cheng, Yinghui Xu, Wei Chu, Yuan Qi

    Abstract: Structured data, rich in logical and relational information, has the potential to enhance the reasoning abilities of large language models (LLMs). Still, its integration poses a challenge due to the risk of overwhelming LLMs with excessive tokens and irrelevant context information. To address this, we propose Struct-X, a novel framework that operates through five key phases: "read-model-fill-refl…

    Submitted 17 July, 2024; originally announced July 2024.

  32. arXiv:2407.10753  [pdf, other]

    cs.CV

    OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection

    Authors: Jinghua Hou, Tong Wang, Xiaoqing Ye, Zhe Liu, Shi Gong, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai

    Abstract: Accurate depth information is crucial for enhancing the performance of multi-view 3D object detection. Despite the success of some existing multi-view 3D detectors utilizing pixel-wise depth supervision, they overlook two significant phenomena: 1) the depth supervision obtained from LiDAR points is usually distributed on the surface of the object, which is not so friendly to existing DETR-based 3D…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  33. arXiv:2407.09793  [pdf, other]

    cs.SE

    Uncovering Weaknesses in Neural Code Generation

    Authors: Xiaoli Lian, Shuaisong Wang, Jieping Ma, Fang Liu, Xin Tan, Li Zhang, Lin Shi, Cuiyun Gao

    Abstract: Code generation, the task of producing source code from prompts, has seen significant advancements with the advent of pre-trained large language models (PLMs). Despite these achievements, there lacks a comprehensive taxonomy of weaknesses about the benchmark and the generated code, which risks the community's focus on known issues at the cost of under-explored areas. Our systematic study aims to…

    Submitted 17 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

  34. arXiv:2407.09194  [pdf, other]

    astro-ph.SR astro-ph.EP

    The JWST Weather Report from the Nearest Brown Dwarfs I: multi-period JWST NIRSpec + MIRI monitoring of the benchmark binary brown dwarf WISE 1049AB

    Authors: Beth A. Biller, Johanna M. Vos, Yifan Zhou, Allison M. McCarthy, Xianyu Tan, Ian J. M. Crossfield, Niall Whiteford, Genaro Suarez, Jacqueline Faherty, Elena Manjavacas, Xueqing Chen, Pengyu Liu, Ben J. Sutlieff, Mary Anne Limbach, Paul Molliere, Trent J. Dupuy, Natalia Oliveros-Gomez, Philip S. Muirhead, Thomas Henning, Gregory Mace, Nicolas Crouzet, Theodora Karalidi, Caroline V. Morley, Pascal Tremblin, Tiffany Kataria

    Abstract: We report results from 8 hours of JWST/MIRI LRS spectroscopic monitoring directly followed by 7 hours of JWST/NIRSpec prism spectroscopic monitoring of the benchmark binary brown dwarf WISE 1049AB, the closest, brightest brown dwarfs known. We find water, methane, and CO absorption features in both components, including the 3.3 $\mu$m methane absorption feature and a tentative detection of small gra…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 28 pages, 27 figures, accepted to MNRAS

  35. arXiv:2407.08975  [pdf, other]

    cs.AR cs.ET

    Hybrid Temporal Computing for Lower Power Hardware Accelerators

    Authors: Maliha Tasnim, Sachin Sachdeva, Yibo Liu, Sheldon X. -D. Tan

    Abstract: In this paper, we propose a new hybrid temporal computing (HTC) framework that leverages both pulse rate and temporal data encoding to design ultra-low energy hardware accelerators. Our approach is inspired by the recently proposed temporal computing, or race logic, which encodes data values as single delays, leading to significantly lower energy consumption due to minimized signal switching. Howe…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 7 pages, 8 figures and 3 tables

  36. arXiv:2407.07465  [pdf, other]

    cs.CV

    Exploring the Untouched Sweeps for Conflict-Aware 3D Segmentation Pretraining

    Authors: Tianfang Sun, Zhizhong Zhang, Xin Tan, Yanyun Qu, Yuan Xie

    Abstract: LiDAR-camera 3D representation pretraining has shown significant promise for 3D perception tasks and related applications. However, two issues widely exist in this framework: 1) Solely keyframes are used for training. For example, in nuScenes, a substantial quantity of unpaired LiDAR and camera frames remain unutilized, limiting the representation capabilities of the pretrained network. 2) The con…

    Submitted 17 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: preprint, version 2

  37. arXiv:2407.05679  [pdf, other]

    cs.CV cs.AI

    BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space

    Authors: Yumeng Zhang, Shi Gong, Kaixin Xiong, Xiaoqing Ye, Xiao Tan, Fan Wang, Jizhou Huang, Hua Wu, Haifeng Wang

    Abstract: World models are receiving increasing attention in autonomous driving for their ability to predict potential future scenarios. In this paper, we present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View (BEV) latent space for environment modeling. The world model consists of two parts: the multi-modal tokenizer and the latent BEV sequence…

    Submitted 18 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: 10 pages

  38. arXiv:2407.05305  [pdf, other]

    cs.AI

    MINDECHO: Role-Playing Language Agents for Key Opinion Leaders

    Authors: Rui Xu, Dakuan Lu, Xiaoyu Tan, Xintao Wang, Siyu Yuan, Jiangjie Chen, Wei Chu, Xu Yinghui

    Abstract: Large language models (LLMs) have demonstrated impressive performance in various applications, among which role-playing language agents (RPLAs) have engaged a broad user base. Now, there is a growing demand for RPLAs that represent Key Opinion Leaders (KOLs), i.e., Internet celebrities who shape the trends and opinions in their domains. However, research in this line remains underexplored. In this…

    Submitted 7 July, 2024; originally announced July 2024.

  39. arXiv:2407.05239  [pdf, other]

    cs.DS cs.NI

    Competitive Analysis of Online Path Selection: Impacts of Path Length, Topology, and System-Level Costs

    Authors: Ying Cao, Siyuan Yu, Xiaoqi Tan, Danny H. K. Tsang

    Abstract: Consider a communication network to which a sequence of self-interested users come and send requests for data transmission between nodes. This work studies the question of how to guide the path selection choices made by those online-arriving users and maximize the social welfare. Competitive analysis is the main technical tool. Specifically, the impacts of path length bounds and topology on the co…

    Submitted 6 July, 2024; originally announced July 2024.

  40. arXiv:2407.04946  [pdf, ps, other]

    math.DG math.AP

    Boundary determination of the Riemannian metric by the elastic Dirichlet-to-Neumann map

    Authors: Xiaoming Tan

    Abstract: For a compact connected Riemannian manifold with smooth boundary, by computing the full symbol of the elastic Dirichlet-to-Neumann map, we prove that the elastic Dirichlet-to-Neumann map can uniquely determine the partial derivatives of all orders of the Riemannian metric on the boundary of the manifold.

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 11 pages

  41. arXiv:2407.00486  [pdf, other]

    cs.CL

    Towards Massive Multilingual Holistic Bias

    Authors: Xiaoqing Ellen Tan, Prangthip Hansanti, Carleigh Wood, Bokai Yu, Christophe Ropers, Marta R. Costa-jussà

    Abstract: In the current landscape of automatic language generation, there is a need to understand, evaluate, and mitigate demographic biases as existing models are becoming increasingly multilingual. To address this, we present the initial eight languages from the MASSIVE MULTILINGUAL HOLISTICBIAS (MMHB) dataset and benchmark consisting of approximately 6 million sentences representing 13 demographic axes.…

    Submitted 29 June, 2024; originally announced July 2024.

    ACM Class: I.2.7

  42. arXiv:2407.00326  [pdf, other

    cs.DC cs.AI cs.NI

    Teola: Towards End-to-End Optimization of LLM-based Applications

    Authors: Xin Tan, Yimin Jiang, Yitao Yang, Hong Xu

    Abstract: Large language model (LLM)-based applications consist of both LLM and non-LLM components, each contributing to the end-to-end latency. Despite great efforts to optimize LLM inference, end-to-end workflow optimization has been overlooked. Existing frameworks employ coarse-grained orchestration with task modules, which confines optimizations to within each module and yields suboptimal scheduling dec…

    Submitted 29 June, 2024; originally announced July 2024.

  43. arXiv:2407.00136  [pdf, other

    hep-ex

    Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere , et al. (495 additional authors not shown)

    Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780 GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions…

    Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

  44. arXiv:2406.19969  [pdf, other

    q-bio.QM

    Enhancing Terrestrial Net Primary Productivity Estimation with EXP-CASA: A Novel Light Use Efficiency Model Approach

    Authors: Guanzhou Chen, Kaiqi Zhang, Xiaodong Zhang, Hong Xie, Haobo Yang, Xiaoliang Tan, Tong Wang, Yule Ma, Qing Wang, Jinzhou Cao, Weihong Cui

    Abstract: The Light Use Efficiency model, epitomized by the CASA model, is extensively applied in the quantitative estimation of vegetation Net Primary Productivity. However, the classic CASA model is marked by significant complexity: the estimation of environmental stress parameters, in particular, necessitates multi-source observation data, adding to the complexity and uncertainty of the model's operation…

    Submitted 28 June, 2024; originally announced June 2024.

  45. arXiv:2406.18449  [pdf, other

    cs.CL cs.AI

    Cascading Large Language Models for Salient Event Graph Generation

    Authors: Xingwei Tan, Yuxiang Zhou, Gabriele Pergola, Yulan He

    Abstract: Generating event graphs from long documents is challenging due to the inherent complexity of multiple tasks involved such as detecting events, identifying their relationships, and reconciling unstructured input with structured graphs. Recent studies typically consider all events with equal importance, failing to distinguish salient events crucial for understanding narratives. This paper presents C…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 9 + 12 pages

  46. arXiv:2406.18009  [pdf, other

    eess.AS cs.SD

    E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

    Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda

    Abstract: This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based on the…

    Submitted 12 September, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted to SLT 2024. Added evaluation data, see https://github.com/microsoft/e2tts-test-suite for more details

  47. arXiv:2406.14228  [pdf, other

    cs.AI

    EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms

    Authors: Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Dongsheng Li, Deqing Yang

    Abstract: The rise of powerful large language models (LLMs) has spurred a new trend in building LLM-based autonomous agents for solving complex tasks, especially multi-agent systems. Despite the remarkable progress, we notice that existing works are heavily dependent on human-designed frameworks, which greatly limits the functional scope and scalability of agent systems. How to automatically extend the spec…

    Submitted 11 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in process

  48. PIG: Prompt Images Guidance for Night-Time Scene Parsing

    Authors: Zhifeng Xie, Rui Qiu, Sen Wang, Xin Tan, Yuan Xie, Lizhuang Ma

    Abstract: Night-time scene parsing aims to extract pixel-level semantic information in night images, aiding downstream tasks in understanding scene object distribution. Due to limited labeled night image datasets, unsupervised domain adaptation (UDA) has become the predominant method for studying night scenes. UDA typically relies on paired day-night image pairs to guide adaptation, but this approach hamper…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: This paper is accepted by IEEE TIP. Code: https://github.com/qiurui4shu/PIG

  49. arXiv:2406.10056  [pdf, other

    cs.SD eess.AS

    UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner

    Authors: Dongchao Yang, Haohan Guo, Yuanyuan Wang, Rongjie Huang, Xiang Li, Xu Tan, Xixin Wu, Helen Meng

    Abstract: The Large Language models (LLMs) have demonstrated supreme capabilities in text understanding and generation, but cannot be directly applied to cross-modal tasks without fine-tuning. This paper proposes a cross-modal in-context learning approach, empowering the frozen LLMs to achieve multiple audio tasks in a few-shot style without any parameter update. Specifically, we propose a novel and LLMs-dr…

    Submitted 14 June, 2024; originally announced June 2024.

  50. arXiv:2406.09641  [pdf, other

    astro-ph.EP

    Phase-resolving the absorption signatures of water and carbon monoxide in the atmosphere of the ultra-hot Jupiter WASP-121b with GEMINI-S/IGRINS

    Authors: Joost P. Wardenier, Vivien Parmentier, Michael R. Line, Megan Weiner Mansfield, Xianyu Tan, Shang-Min Tsai, Jacob L. Bean, Jayne L. Birkby, Matteo Brogi, Jean-Michel Désert, Siddharth Gandhi, Elspeth K. H. Lee, Colette I. Levens, Lorenzo Pino, Peter C. B. Smith

    Abstract: Ultra-hot Jupiters are among the best targets for atmospheric characterization at high spectral resolution. Resolving their transmission spectra as a function of orbital phase offers a unique window into the 3D nature of these objects. In this work, we present three transits of the ultra-hot Jupiter WASP-121b observed with Gemini-S/IGRINS. For the first time, we measure the phase-dependent absorpt…

    Submitted 18 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 24 pages, 16 figures, accepted for publication in PASP