-
On the (In)Security of LLM App Stores
Authors:
Xinyi Hou,
Yanjie Zhao,
Haoyu Wang
Abstract:
LLM app stores have seen rapid growth, leading to the proliferation of numerous custom LLM apps. However, this expansion raises security concerns. In this study, we propose a three-layer concern framework to identify the potential security risks of LLM apps, i.e., LLM apps with abusive potential, LLM apps with malicious intent, and LLM apps with exploitable vulnerabilities. Over five months, we collected 786,036 LLM apps from six major app stores: GPT Store, FlowGPT, Poe, Coze, Cici, and Character.AI. Our research integrates static and dynamic analysis, the development of a large-scale toxic word dictionary (i.e., ToxicDict) comprising over 31,783 entries, and automated monitoring tools to identify and mitigate threats. We uncovered that 15,146 apps had misleading descriptions, 1,366 collected sensitive personal information against their privacy policies, and 15,996 generated harmful content such as hate speech, self-harm, extremism, etc. Additionally, we evaluated the potential for LLM apps to facilitate malicious activities, finding that 616 apps could be used for malware generation, phishing, etc. Our findings highlight the urgent need for robust regulatory frameworks and enhanced enforcement mechanisms.
Submitted 11 July, 2024;
originally announced July 2024.
-
RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL
Authors:
Zhenhe Wu,
Zhongqiu Li,
Jie Zhang,
Mengxiang Li,
Yu Zhao,
Ruiyu Fang,
Zhongjiang He,
Xuelong Li,
Zhoujun Li,
Shuangyong Song
Abstract:
Large language models (LLMs) with in-context learning have significantly improved performance on the text-to-SQL task. Previous works generally focus on using exclusive SQL generation prompts to improve the LLMs' reasoning ability. However, they mostly struggle to handle large databases with numerous tables and columns, and usually ignore the significance of pre-processing the database and extracting valuable information for more efficient prompt engineering. Based on the above analysis, we propose RB-SQL, a novel retrieval-based LLM framework for in-context prompt engineering, which consists of three modules that retrieve concise tables and columns as schema, and targeted examples for in-context learning. Experimental results demonstrate that our model achieves better performance than several competitive baselines on the public datasets BIRD and Spider.
Submitted 11 July, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15\sigma$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
Submitted 10 July, 2024;
originally announced July 2024.
-
Metasurface-based Snapshot Shortwave-Infrared Hyperspectral Image Reconstruction with Inter and Intra Prior Learning Network
Authors:
Linqiang Li,
Jinglei Hao,
Yongqiang Zhao,
Pan Liu,
Haofang Yan,
Ziqin Zhang,
Seong G. Kong
Abstract:
Shortwave-infrared (SWIR) spectral information, ranging from 1 μm to 2.5 μm, breaks the limitations of traditional color cameras in acquiring scene information and has been used in many fields. However, conventional SWIR hyperspectral imaging systems face challenges due to their bulky setups and low acquisition speed. In this work, we introduce a snapshot SWIR hyperspectral imaging system based on a metasurface filter and a corresponding filter selection method to achieve the lowest correlation coefficient among these filters. This system has the advantages of small size and snapshot imaging. We propose a novel inter and intra prior learning unfolding framework to achieve high-quality SWIR hyperspectral image reconstruction, which bridges the gap between prior learning and cross-stage information interaction. We also design an adaptive feature transfer mechanism to adaptively transfer the contextual correlation of multi-scale encoder features to prevent detailed information loss in the decoder. Experimental results demonstrate that our method can reconstruct HSI with high speed and superior performance over existing methods.
Submitted 10 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
ViTime: A Visual Intelligence-Based Foundation Model for Time Series Forecasting
Authors:
Luoxiao Yang,
Yun Wang,
Xinqi Fan,
Israel Cohen,
Yue Zhao,
Zijun Zhang
Abstract:
The success of large pretrained models in natural language processing (NLP) and computer vision (CV) has opened new avenues for constructing foundation models for time series forecasting (TSF). Traditional TSF foundation models rely heavily on numerical data fitting. In contrast, the human brain is inherently skilled at processing visual information, preferring to predict future trends by observing visualized sequences. From a biomimetic perspective, utilizing models to directly process numerical sequences might not be the most effective route to achieving Artificial General Intelligence (AGI). This paper proposes ViTime, a novel Visual Intelligence-based foundation model for TSF. ViTime overcomes the limitations of numerical time series data fitting by utilizing visual data processing paradigms and employs an innovative data synthesis method during training, called Real Time Series (RealTS). Experiments on a diverse set of previously unseen forecasting datasets demonstrate that ViTime achieves state-of-the-art zero-shot performance, even surpassing the best individually trained supervised models in some situations. These findings suggest that visual intelligence can significantly enhance time series analysis and forecasting, paving the way for more advanced and versatile models in the field. The code for our framework is accessible at https://github.com/IkeYang/ViTime.
Submitted 9 July, 2024;
originally announced July 2024.
-
TrackFormers: In Search of Transformer-Based Particle Tracking for the High-Luminosity LHC Era
Authors:
Sascha Caron,
Nadezhda Dobreva,
Antonio Ferrer Sánchez,
José D. Martín-Guerrero,
Uraz Odyurt,
Roberto Ruiz de Austri Bazan,
Zef Wolffs,
Yue Zhao
Abstract:
High-Energy Physics experiments are facing a multi-fold data increase with every new iteration. This is certainly the case for the upcoming High-Luminosity LHC upgrade. Such increased data processing requirements force revisions to almost every step of the data processing pipeline. One such step in need of an overhaul is the task of particle track reconstruction, a.k.a. tracking. A Machine Learning-assisted solution is expected to provide significant improvements, since the most time-consuming step in tracking is the assignment of hits to particles or track candidates. This is the topic of this paper.
We take inspiration from large language models. As such, we consider two approaches: the prediction of the next word in a sentence (next hit point in a track), as well as the one-shot prediction of all hits within an event. In an extensive design effort, we have experimented with three models based on the Transformer architecture and one model based on the U-Net architecture, performing track association predictions for collision event hit points. In our evaluation, we consider a spectrum of simple to complex representations of the problem, eliminating designs with lower metrics early on. We report extensive results, covering both prediction accuracy (score) and computational performance. We have made use of the REDVID simulation framework, as well as reductions applied to the TrackML data set, to compose five data sets from simple to complex, for our experiments. The results highlight distinct advantages among different designs in terms of prediction accuracy and computational performance, demonstrating the efficiency of our methodology. Most importantly, the results show the viability of a one-shot encoder-classifier based Transformer solution as a practical approach for the task of tracking.
Submitted 9 July, 2024;
originally announced July 2024.
-
Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge
Authors:
Sriram Yenamandra,
Arun Ramachandran,
Mukul Khanna,
Karmesh Yadav,
Jay Vakil,
Andrew Melnik,
Michael Büttner,
Leon Harz,
Lyon Brown,
Gora Chand Nandi,
Arjun PS,
Gaurav Kumar Yadav,
Rahul Kala,
Robert Haschke,
Yang Luo,
Jinxin Zhu,
Yansen Han,
Bingyi Lu,
Xuan Gu,
Qinyuan Liu,
Yaping Zhao,
Qiting Ye,
Chenxiao Dou,
Yansong Chua,
Volodymyr Kuzma
, et al. (20 additional authors not shown)
Abstract:
In order to develop robots that can effectively serve as versatile and capable home assistants, it is crucial for them to reliably perceive and interact with a wide variety of objects across diverse environments. To this end, we proposed Open Vocabulary Mobile Manipulation as a key benchmark task for robotics: finding any object in a novel environment and placing it on any receptacle surface within that environment. We organized a NeurIPS 2023 competition featuring both simulation and real-world components to evaluate solutions to this task. Our baselines on the most challenging version of this task, using real perception in simulation, achieved only a 0.8% success rate; by the end of the competition, the best participants achieved a 10.8% success rate, a 13x improvement. We observed that the most successful teams employed a variety of methods, yet two common threads emerged among the best solutions: enhancing error detection and recovery, and improving the integration of perception with decision-making processes. In this paper, we detail the results and methodologies used, both in simulation and real-world settings. We discuss the lessons learned and their implications for future research. Additionally, we compare performance in real and simulated environments, emphasizing the necessity for robust generalization to novel settings.
Submitted 9 July, 2024;
originally announced July 2024.
-
A Unified Approach to Multi-task Legged Navigation: Temporal Logic Meets Reinforcement Learning
Authors:
Jesse Jiang,
Samuel Coogan,
Ye Zhao
Abstract:
This study examines the problem of hopping robot navigation planning to achieve simultaneous goal-directed and environment exploration tasks. We consider a scenario in which the robot has mandatory goal-directed tasks defined using Linear Temporal Logic (LTL) specifications as well as optional exploration tasks represented using a reward function. Additionally, there exists uncertainty in the robot dynamics which results in motion perturbation. We first propose an abstraction of 3D hopping robot dynamics which enables high-level planning and a neural-network-based optimization for low-level control. We then introduce a Multi-task Product IMDP (MT-PIMDP) model of the system and tasks. We propose a unified control policy synthesis algorithm which enables both task-directed goal-reaching behaviors as well as task-agnostic exploration to learn perturbations and reward. We provide a formal proof of the trade-off induced by prioritizing either LTL or RL actions. We demonstrate our methods with simulation case studies in a 2D world navigation environment.
Submitted 9 July, 2024;
originally announced July 2024.
-
Efficient Stochastic Routing in Path-Centric Uncertain Road Networks -- Extended Version
Authors:
Chenjuan Guo,
Ronghui Xu,
Bin Yang,
Ye Yuan,
Tung Kieu,
Yan Zhao,
Christian S. Jensen
Abstract:
The availability of massive vehicle trajectory data enables the modeling of road-network constrained movement as travel-cost distributions rather than just single-valued costs, thereby capturing the inherent uncertainty of movement and enabling improved routing quality. Thus, stochastic routing has been studied extensively in the edge-centric model, where such costs are assigned to the edges in a graph representation of a road network. However, as this model still disregards important information in trajectories and fails to capture dependencies among cost distributions, a path-centric model, where costs are assigned to paths, has been proposed that captures dependencies better and provides an improved foundation for routing. Unfortunately, when applied in this model, existing routing algorithms are inefficient due to two shortcomings that we eliminate. First, when exploring candidate paths, existing algorithms only consider the costs of candidate paths from the source to intermediate vertices, while disregarding the costs of travel from the intermediate vertices to the destination, causing many non-competitive paths to be explored. We propose two heuristics for estimating the cost from an intermediate vertex to the destination, thus improving routing efficiency. Second, the edge-centric model relies on stochastic dominance-based pruning to improve efficiency. This pruning assumes that costs are independent and is therefore inapplicable in the path-centric model that takes dependencies into account. We introduce a notion of virtual path that effectively enables stochastic dominance-based pruning in the path-based model, thus further improving efficiency. Empirical studies using two real-world trajectory sets offer insight into the properties of the proposed solution, indicating that it enables efficient stochastic routing in the path-centric model.
Submitted 9 July, 2024;
originally announced July 2024.
-
Training-free CryoET Tomogram Segmentation
Authors:
Yizhou Zhao,
Hengwei Bian,
Michael Mu,
Mostofa R. Uddin,
Zhenyang Li,
Xiang Li,
Tianyang Wang,
Min Xu
Abstract:
Cryogenic Electron Tomography (CryoET) is a useful imaging technology in structural biology that is hindered by its need for manual annotations, especially in particle picking. Recent works have endeavored to remedy this issue with few-shot learning or contrastive learning techniques. However, supervised training is still inevitable for them. We instead choose to leverage the power of existing 2D foundation models and present a novel, training-free framework, CryoSAM. In addition to prompt-based single-particle instance segmentation, our approach can automatically search for similar features, facilitating full tomogram semantic segmentation with only one prompt. CryoSAM is composed of two major parts: 1) a prompt-based 3D segmentation system that uses prompts to complete single-particle instance segmentation recursively with Cross-Plane Self-Prompting, and 2) a Hierarchical Feature Matching mechanism that efficiently matches relevant features with extracted tomogram features. They collaborate to enable the segmentation of all particles of one category with just one particle-specific prompt. Our experiments show that CryoSAM outperforms existing works by a significant margin and requires even fewer annotations in particle picking. Further visualizations demonstrate its ability when dealing with full tomogram segmentation for various subcellular structures. Our code is available at: https://github.com/xulabs/aitom
Submitted 7 July, 2024;
originally announced July 2024.
-
LLM for Mobile: An Initial Roadmap
Authors:
Daihang Chen,
Yonghui Liu,
Mingyi Zhou,
Yanjie Zhao,
Haoyu Wang,
Shuai Wang,
Xiao Chen,
Tegawendé F. Bissyandé,
Jacques Klein,
Li Li
Abstract:
When mobile meets LLMs, mobile app users deserve to have more intelligent usage experiences. For this to happen, we argue that there is a strong need to apply LLMs to the mobile ecosystem. We therefore provide a research roadmap for guiding our fellow researchers to achieve that as a whole. In this roadmap, we sum up six directions that we believe are urgently required for research to enable native intelligence in mobile devices. In each direction, we further summarize the current research progress and the gaps that still need to be filled by our fellow researchers.
Submitted 9 July, 2024;
originally announced July 2024.
-
LuSNAR:A Lunar Segmentation, Navigation and Reconstruction Dataset based on Muti-sensor for Autonomous Exploration
Authors:
Jiayi Liu,
Qianyu Zhang,
Xue Wan,
Shengyang Zhang,
Yaolin Tian,
Haodong Han,
Yutao Zhao,
Baichuan Liu,
Zeyuan Zhao,
Xubo Luo
Abstract:
As lunar exploration missions grow more complex, lunar rovers need a higher level of autonomy. Environmental perception and navigation algorithms are the foundation for lunar rovers to achieve autonomous exploration. The development and verification of algorithms require highly reliable data support. Most of the existing lunar datasets are targeted at a single task, lacking diverse scenes and high-precision ground truth labels. To address this issue, we propose a multi-task, multi-scene, and multi-label lunar benchmark dataset, LuSNAR. This dataset can be used for comprehensive evaluation of autonomous perception and navigation systems, and includes high-resolution stereo image pairs, panoramic semantic labels, dense depth maps, LiDAR point clouds, and the position of the rover. In order to provide richer scene data, we built 9 lunar simulation scenes based on Unreal Engine. Each scene is divided according to topographic relief and the density of objects. To verify the usability of the dataset, we evaluated and analyzed algorithms for semantic segmentation, 3D reconstruction, and autonomous navigation. The experimental results show that the dataset proposed in this paper can be used for ground verification of tasks such as autonomous environment perception and navigation, and provides a lunar benchmark dataset for testing the accessibility of algorithm metrics. We make LuSNAR publicly available at: https://github.com/autumn999999/LuSNAR-dataset.
Submitted 8 July, 2024;
originally announced July 2024.
-
Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline
Authors:
Qi Jia,
Baoyu Fan,
Cong Xu,
Lu Liu,
Liang Jin,
Guoguang Du,
Zhenhua Guo,
Yaqian Zhao,
Xuanjing Huang,
Rengang Li
Abstract:
Existing video multi-modal sentiment analysis mainly focuses on the sentiment expression of people within the video, yet often neglects the induced sentiment of viewers while watching the videos. The induced sentiment of viewers is essential for inferring the public response to videos and has broad application in analyzing public societal sentiment, the effectiveness of advertising, and other areas. Micro videos and their related comments provide a rich application scenario for viewer-induced sentiment analysis. In light of this, we introduce a novel research task, Multi-modal Sentiment Analysis for Comment Response of Video Induced (MSA-CRVI), which aims to infer opinions and emotions from comments made in response to micro videos. Meanwhile, we manually annotate a dataset named Comment Sentiment toward to Micro Video (CSMV) to support this research. To our knowledge, it is the largest video multi-modal sentiment dataset in terms of scale and video duration, containing 107,267 comments and 8,210 micro videos with a video duration of 68.83 hours. Because inferring the sentiment induced by a comment requires leveraging the video content, we propose the Video Content-aware Comment Sentiment Analysis (VC-CSA) method as a baseline to address the challenges inherent in this new task. Extensive experiments demonstrate that our method shows significant improvements over other established baselines.
Submitted 15 May, 2024;
originally announced July 2024.
-
Neuromorphic Imaging with Super-Resolution
Authors:
Pei Zhang,
Shuo Zhu,
Chutian Wang,
Yaping Zhao,
Edmund Y. Lam
Abstract:
Neuromorphic imaging is a bio-inspired technique that imitates the human retina to sense variations in a dynamic scene. It responds to pixel-level brightness changes by asynchronous streaming events and boasts microsecond temporal precision over a high dynamic range, yielding blur-free recordings under extreme illumination. Nevertheless, such a modality falls short in spatial resolution and leads to a low level of visual richness and clarity. Pursuing hardware upgrades is expensive and might compromise performance because of heavier computational requirements. Another option is to harness offline, plug-and-play neuromorphic super-resolution solutions. However, existing ones, which demand substantial sample volumes for lengthy training on massive computing resources, are largely restricted by real data availability owing to currently imperfect high-resolution devices, as well as the randomness and variability of motion. To tackle these challenges, we introduce the first self-supervised neuromorphic super-resolution prototype. It adapts to each input source from any low-resolution camera to estimate an optimal, high-resolution counterpart at any scale, without the need for side knowledge or prior training. Evaluated on downstream event-driven tasks, such a simple yet effective method can obtain competitive results against the state of the art, significantly improving flexibility without sacrificing accuracy. It also delivers enhancements for inferior natural images and optical micrographs acquired under non-ideal imaging conditions, breaking through limitations that are challenging to overcome with traditional frame techniques. In the current landscape where the use of high-resolution cameras for event-based sensing remains an open debate, our solution serves as a cost-efficient and practical alternative, paving the way for more intelligent imaging systems.
Submitted 8 July, 2024;
originally announced July 2024.
-
StmtTree: An Easy-to-Use yet Versatile Fortran Transformation Toolkit
Authors:
Jingbo Lin,
Yi Yu,
Zhang Yang,
Yafan Zhao
Abstract:
The Fortran programming language continues to dominate the scientific computing community, with many production codes written in the outdated Fortran-77 dialect, yet with many non-standard extensions such as Cray pointers. This creates a significant maintenance burden within the community, with tremendous efforts devoted to modernization. However, despite the modern age of advanced compiler frameworks, processing and transforming old Fortran codes remains challenging. In this paper, we present StmtTree, a new Fortran code transformation toolkit to address this issue. StmtTree abstracts the Fortran grammar into a statement tree, offering both a low-level representation manipulation API and a high-level, easy-to-use query and manipulation mini-language. StmtTree simplifies the creation of customized Fortran transformation tools. Experiments show that StmtTree adapts well to legacy Fortran-77 codes, and complex tools such as removing unused statements can be developed with fewer than 100 lines of Python code.
Submitted 8 July, 2024;
originally announced July 2024.
-
LLMBox: A Comprehensive Library for Large Language Models
Authors:
Tianyi Tang,
Yiwen Hu,
Bingqian Li,
Wenyang Luo,
Zijing Qin,
Haoxiang Sun,
Jiapeng Wang,
Shiyi Xu,
Xiaoxue Cheng,
Geyang Guo,
Han Peng,
Bowen Zheng,
Yiru Tang,
Yingqian Min,
Yushuo Chen,
Jie Chen,
Yuanqian Zhao,
Luran Ding,
Yuhao Wang,
Zican Dong,
Chunxuan Xia,
Junyi Li,
Kun Zhou,
Wayne Xin Zhao,
Ji-Rong Wen
Abstract:
To facilitate research on large language models (LLMs), this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs. The library has three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets, and models, and (3) more practical considerations, especially regarding user-friendliness and efficiency. With our library, users can easily reproduce existing methods, train new models, and conduct comprehensive performance comparisons. To rigorously test LLMBox, we conduct extensive experiments across a diverse range of evaluation settings, and experimental results demonstrate the effectiveness and efficiency of our library in supporting various implementations related to LLMs. The detailed introduction and usage guidance can be found at https://github.com/RUCAIBox/LLMBox.
Submitted 7 July, 2024;
originally announced July 2024.
-
Feedback-Driven Automated Whole Bug Report Reproduction for Android Apps
Authors:
Dingbang Wang,
Yu Zhao,
Sidong Feng,
Zhaoxu Zhang,
William G. J. Halfond,
Chunyang Chen,
Xiaoxia Sun,
Jiangfan Shi,
Tingting Yu
Abstract:
In software development, bug report reproduction is a challenging task. This paper introduces ReBL, a novel feedback-driven approach that leverages GPT-4, a large-scale language model, to automatically reproduce Android bug reports. Unlike traditional methods, ReBL bypasses the use of Step to Reproduce (S2R) entities. Instead, it leverages the entire textual bug report and employs innovative prompts to enhance GPT's contextual reasoning. This approach is more flexible and context-aware than the traditional step-by-step entity matching approach, resulting in improved accuracy and effectiveness. In addition to handling crash reports, ReBL has the capability of handling non-crash bug reports. Our evaluation of 96 Android bug reports (73 crash and 23 non-crash) demonstrates that ReBL successfully reproduced 90.63% of these reports, averaging only 74.98 seconds per bug report. Additionally, ReBL outperformed three existing tools in both success rate and speed.
Submitted 6 July, 2024;
originally announced July 2024.
-
Automatically Analyzing Performance Issues in Android Apps: How Far Are We?
Authors:
Dianshu Liao,
Shidong Pan,
Siyuan Yang,
Yitong Wang,
Yanjie Zhao,
Zhenchang Xing,
Xiaoyu Sun
Abstract:
Performance plays a critical role in ensuring the smooth operation of any mobile application, directly influencing user engagement and retention. Android applications are no exception. However, unlike functionality issues, performance issues are more challenging to discover as their root causes are sophisticated and typically emerge under specific payloads. To tackle this problem, researchers have dedicated substantial efforts to proposing automatic approaches for understanding, detecting, and resolving performance issues. Despite these endeavors, it still remains unknown what the status quo of Android performance analysis is, and whether existing approaches can indeed accurately reflect real performance issues. To fill this research gap, we conducted a systematic literature review followed by an explanatory study to explore relevant studies and real-world challenges. Our findings reveal that current tools have limited capabilities, covering only 17.50% of the performance issues. Additionally, existing datasets encompass only 27.50% of the issues and are very limited in size. We also show real-world issue patterns, underscoring the huge gap between the identified techniques and practical concerns. Furthermore, possible solutions are provided to guide future research towards achieving effective performance issue detection and resolution.
Submitted 6 July, 2024;
originally announced July 2024.
-
Volume-optimal persistence homological scaffolds of hemodynamic networks covary with MEG theta-alpha aperiodic dynamics
Authors:
Nghi Nguyen,
Tao Hou,
Enrico Amico,
Jingyi Zheng,
Huajun Huang,
Alan D. Kaplan,
Giovanni Petri,
Joaquín Goñi,
Yize Zhao,
Duy Duong-Tran,
Li Shen
Abstract:
Higher-order properties of functional magnetic resonance imaging (fMRI) induced connectivity have been shown to unravel many exclusive topological and dynamical insights beyond pairwise interactions. Nonetheless, whether these fMRI-induced higher-order properties play a role in disentangling other neuroimaging modalities' insights remains largely unexplored and poorly understood. In this work, by analyzing fMRI data from the Human Connectome Project Young Adult dataset using persistent homology, we discovered that the volume-optimal persistence homological scaffolds of fMRI-based functional connectomes exhibited conservative topological reconfigurations from the resting state to attentional task-positive state. Specifically, while reflecting the extent to which each cortical region contributed to functional cycles following different cognitive demands, these reconfigurations were constrained such that the spatial distribution of cavities in the connectome is relatively conserved. Most importantly, such level of contributions covaried with powers of aperiodic activities mostly within the theta-alpha (4-12 Hz) band measured by magnetoencephalography (MEG). This comprehensive result suggests that fMRI-induced hemodynamics and MEG theta-alpha aperiodic activities are governed by the same functional constraints specific to each cortical morpho-structure. Methodologically, our work paves the way toward an innovative computing paradigm in multimodal neuroimaging topological learning.
Submitted 6 July, 2024;
originally announced July 2024.
-
Granular Ta-Te nanowire superconductivity violating the Pauli limit
Authors:
Lingxiao Zhao,
Yi Zhao,
Cuiying Pei,
Changhua Li,
Qi Wang,
Juefei Wu,
Weizheng Cao,
Lin Xiong,
Haiyin Zhu,
Tianping Ying,
Yanpeng Qi
Abstract:
Strategies to achieve higher upper-critical-field superconductors (μ0Hc2(0)) are of great interest for both fundamental science and practical applications. While reducing the thickness of two-dimensional (2D) materials to a few layers significantly enhances μ0Hc2(0), accompanied by potentially unconventional pairing mechanisms, further dimensional reduction to 1D compounds rarely exceeds the expected Pauli limit. Here, we report the discovery of a 1D granular Ta-Te nanowire that becomes superconducting under high pressure, with a maximum critical temperature (Tc) of 5.1 K. Remarkably, μ0Hc2(0) reaches 16 T, which is twice the Pauli limit, setting a record for μ0Hc2(0) among all reported 1D superconductors. Our work demonstrates that the Ta-Te nanowire is not only a potential candidate for applications in high magnetic fields, but also provides an ideal platform for further investigations of the mechanisms between nanowires and large μ0Hc2(0).
Submitted 5 July, 2024;
originally announced July 2024.
-
Converse Techniques for Identification via Channels
Authors:
Larissa Brüche,
Marcel A. Mross,
Yaning Zhao,
Wafa Labidi,
Christian Deppe,
Eduard A. Jorswieck
Abstract:
There is a growing interest in models that extend beyond Shannon's classical transmission scheme, renowned for its channel capacity formula $C$. One such promising direction is message identification via channels, introduced by Ahlswede and Dueck. Unlike in Shannon's classical model, where the receiver aims to determine which message was sent from a set of $M$ messages, message identification focuses solely on discerning whether a specific message $m$ was transmitted. The encoder can operate deterministically or through randomization, with substantial advantages observed particularly in the latter approach. While Shannon's model allows transmission of $M = 2^{nC}$ messages, Ahlswede and Dueck's model facilitates the identification of $M = 2^{2^{nC}}$ messages, exhibiting a double exponential growth in block length. In their seminal paper, Ahlswede and Dueck established the achievability and introduced a "soft" converse bound. Subsequent works have further refined this, culminating in a strong converse bound, applicable under specific conditions. Watanabe's contributions have notably enhanced the applicability of the converse bound. The aim of this survey is multifaceted: to grasp the formalism and proof techniques outlined in the aforementioned works, analyze Watanabe's converse, trace the evolution from earlier converses to Watanabe's, emphasizing key similarities and differences that underpin the enhancements. Furthermore, we explore the converse proof for message identification with feedback, also pioneered by Ahlswede and Dueck. By elucidating how their approaches were inspired by preceding proofs, we provide a comprehensive overview. This overview paper seeks to offer readers insights into diverse converse techniques for message identification, with a focal point on the seminal works of Hayashi, Watanabe, and, in the context of feedback, Ahlswede and Dueck.
Submitted 5 July, 2024;
originally announced July 2024.
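To make the gap between the two scalings quoted in the abstract above concrete, here is a small worked example. The value $nC = 10$ is an arbitrary choice for illustration and does not come from the paper.

```latex
% Illustrative only: nC = 10 is an assumed value, not taken from the survey.
\begin{align*}
M_{\text{transmission}}   &= 2^{nC}     = 2^{10}   = 1024, \\
M_{\text{identification}} &= 2^{2^{nC}} = 2^{1024} \approx 1.8 \times 10^{308}.
\end{align*}
```

Doubling the block length in this example raises the transmission count to $2^{20} = 1048576$ messages, while the identification count becomes $2^{2^{20}} = 2^{1048576}$, which is what "double exponential growth in block length" means in practice.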
-
Global analysis of fragmentation functions to charged hadrons with high-precision data from the LHC
Authors:
Jun Gao,
ChongYang Liu,
XiaoMin Shen,
Hongxi Xing,
Yuxiang Zhao
Abstract:
Fragmentation functions (FFs) are essential non-perturbative QCD inputs for predicting hadron production cross sections in high energy scatterings. In this study, we present a joint determination of FFs for light charged hadrons through a global analysis at next-to-leading order (NLO) in QCD. Our analysis incorporates a wide range of precision measurements from the LHC, as well as data from electron-positron collisions and semi-inclusive deep inelastic scatterings. By including measurements of jet fragmentation at the LHC in our global analysis, we are able to impose strong constraints on the gluon FFs. A careful selection of hadron kinematics is applied to ensure the validity of factorization and perturbative calculations of QCD. In addition, we introduce several methodological advances in fitting, resulting in a flexible parametrization form and the inclusion of theoretical uncertainties from perturbative calculations. Our best-fit predictions show very good agreement with the global data, with $χ^2/N_{pt}\sim 0.90$. We also generate a large number of Hessian error sets to estimate uncertainties and correlations of the extracted FFs. FFs to charged pions (kaons and protons) are well constrained for momentum fractions down to 0.01 (0.1). The total momenta of partons carried by light charged hadrons are determined precisely. Their values for $u$ and $d$ quarks and the gluon saturate at about 50% for a lower cut of the momentum fraction of 0.01. Pulls from individual datasets and the impact of various choices in the analysis are also studied in detail. Additionally, we present an update of the FMNLO program used for calculating hadron production cross sections. Our FFs, including the error sets (denoted as NPC23), are publicly available in the form of LHAPDF6 grids.
Submitted 30 June, 2024;
originally announced July 2024.
-
CLIP-DR: Textual Knowledge-Guided Diabetic Retinopathy Grading with Ranking-aware Prompting
Authors:
Qinkai Yu,
Jianyang Xie,
Anh Nguyen,
He Zhao,
Jiong Zhang,
Huazhu Fu,
Yitian Zhao,
Yalin Zheng,
Yanda Meng
Abstract:
Diabetic retinopathy (DR) is a complication of diabetes and usually takes decades to reach sight-threatening levels. Accurate and robust detection of DR severity is critical for the timely management and treatment of diabetes. However, most current DR grading methods suffer from insufficient robustness to data variability (e.g., colour fundus images), posing a significant difficulty for accurate and robust grading. In this work, we propose a novel DR grading framework CLIP-DR based on three observations: 1) Recent pre-trained visual language models, such as CLIP, showcase a notable capacity for generalisation across various downstream tasks, serving as effective baseline models. 2) The grading of image-text pairs for DR often adheres to a discernible natural sequence, yet most existing DR grading methods have primarily overlooked this aspect. 3) A long-tailed distribution among DR severity levels complicates the grading process. This work proposes a novel ranking-aware prompting strategy to help the CLIP model exploit the ordinal information. Specifically, we sequentially design learnable prompts between neighbouring text-image pairs in two different ranking directions. Additionally, we introduce a Similarity Matrix Smooth module into the structure of CLIP to balance the class distribution. Finally, we perform extensive comparisons with several state-of-the-art methods on the GDRBench benchmark, demonstrating our CLIP-DR's robustness and superior performance. The implementation code is available at https://github.com/Qinkaiyu/CLIP-DR.
Submitted 4 July, 2024;
originally announced July 2024.
-
Towards reproducible machine learning-based process monitoring and quality prediction research for additive manufacturing
Authors:
Jiarui Xie,
Mutahar Safdar,
Andrei Mircea,
Yan Lu,
Hyunwoong Ko,
Zhuo Yang,
Yaoyao Fiona Zhao
Abstract:
Machine learning (ML)-based monitoring systems have been extensively developed to enhance the print quality of additive manufacturing (AM). In-situ and in-process data acquired using sensors can be used to train ML models that detect process anomalies, predict part quality, and adjust process parameters. However, the reproducibility of the proposed AM monitoring systems has not been investigated, and no method exists to evaluate and improve reproducibility in the joint domain of AM and ML. Consequently, some crucial information for reproducing the research is usually missing from the publications; thus, systems reproduced based on the publications often cannot achieve the claimed performance. This paper establishes the definition of reproducibility in this domain, proposes a reproducibility investigation pipeline, and composes a reproducibility checklist. A study is reproducible if a performance comparable to the original research can be obtained when it is reproduced by a different team using a different experimental setup. The reproducibility investigation pipeline sequentially guides readers through all the necessary reproduction steps, during which the reproducibility checklist helps extract the reproducibility information from the publication. A case study that reproduced a vision-based warping detection system demonstrated the usage and validated the efficacy of the proposed pipeline and checklist. We observed that the reproducibility checklist can help authors verify that all the information critical to reproducibility is provided in their publications. The investigation pipeline can help identify the missing reproducibility information, which should be acquired from the original authors to achieve the claimed performance.
Submitted 4 July, 2024;
originally announced July 2024.
-
MobileExperts: A Dynamic Tool-Enabled Agent Team in Mobile Devices
Authors:
Jiayi Zhang,
Chuang Zhao,
Yihan Zhao,
Zhaoyang Yu,
Ming He,
Jianping Fan
Abstract:
The attainment of autonomous operations in mobile computing devices has consistently been a goal of human pursuit. With the development of Large Language Models (LLMs) and Visual Language Models (VLMs), this aspiration is progressively turning into reality. While contemporary research has explored automation of simple tasks on mobile devices via VLMs, there remains significant room for improvement in handling complex tasks and reducing high reasoning costs. In this paper, we introduce MobileExperts, which for the first time introduces tool formulation and multi-agent collaboration to address the aforementioned challenges. More specifically, MobileExperts dynamically assembles teams based on the alignment of agent portraits with the human requirements. Following this, each agent embarks on an independent exploration phase, formulating its tools to evolve into an expert. Lastly, we develop a dual-layer planning mechanism to establish coordinated collaboration among experts. To validate our effectiveness, we design a new benchmark of hierarchical intelligence levels, offering insights into an algorithm's capability to address tasks across a spectrum of complexity. Experimental results demonstrate that MobileExperts performs better on all intelligence levels and achieves an approximately 22% reduction in reasoning costs, thus verifying the superiority of our design.
Submitted 4 July, 2024;
originally announced July 2024.
-
CLASH: Complementary Learning with Neural Architecture Search for Gait Recognition
Authors:
Huanzhang Dou,
Pengyi Zhang,
Yuhan Zhao,
Lu Jin,
Xi Li
Abstract:
Gait recognition, which aims at identifying individuals by their walking patterns, has achieved great success based on silhouette. The binary silhouette sequence encodes the walking pattern within the sparse boundary representation. Therefore, most pixels in the silhouette are under-sensitive to the walking pattern since the sparse boundary lacks dense spatial-temporal information, which is suitable to be represented with dense texture. To enhance the sensitivity to the walking pattern while maintaining the robustness of recognition, we present a Complementary Learning with neural Architecture Search (CLASH) framework, consisting of walking pattern sensitive gait descriptor named dense spatial-temporal field (DSTF) and neural architecture search based complementary learning (NCL). Specifically, DSTF transforms the representation from the sparse binary boundary into the dense distance-based texture, which is sensitive to the walking pattern at the pixel level. Further, NCL presents a task-specific search space for complementary learning, which mutually complements the sensitivity of DSTF and the robustness of the silhouette to represent the walking pattern effectively. Extensive experiments demonstrate the effectiveness of the proposed methods under both in-the-lab and in-the-wild scenarios. On CASIA-B, we achieve rank-1 accuracy of 98.8%, 96.5%, and 89.3% under three conditions. On OU-MVLP, we achieve rank-1 accuracy of 91.9%. Under the latest in-the-wild datasets, we outperform the latest silhouette-based methods by 16.3% and 19.7% on Gait3D and GREW, respectively.
Submitted 4 July, 2024;
originally announced July 2024.
-
Three-dimensional Imaging of Pion using Lattice QCD: Generalized Parton Distributions
Authors:
Heng-Tong Ding,
Xiang Gao,
Swagato Mukherjee,
Peter Petreczky,
Qi Shi,
Sergey Syritsyn,
Yong Zhao
Abstract:
In this work, we report a lattice calculation of $x$-dependent valence pion generalized parton distributions (GPDs) at zero skewness with multiple values of the momentum transfer $-t$. The calculations are based on an $N_f=2+1$ gauge ensemble of highly improved staggered quarks with Wilson-Clover valence fermion. The lattice spacing is 0.04 fm, and the pion valence mass is tuned to be 300 MeV. We determine the Lorentz-invariant amplitudes of the quasi-GPD matrix elements for both symmetric and asymmetric momenta transfers with similar values and show the equivalence of both frames. Then, focusing on the asymmetric frame, we utilize a hybrid scheme to renormalize the quasi-GPD matrix elements obtained from the lattice calculations. After the Fourier transforms, the quasi-GPDs are then matched to the light-cone GPDs within the framework of large momentum effective theory with improved matching, including the next-to-next-to-leading order perturbative corrections, and leading renormalon and renormalization group resummations. We also present the 3-dimensional image of the pion in impact-parameter space through the Fourier transform of the momentum transfer $-t$.
Submitted 3 July, 2024;
originally announced July 2024.
-
An Update on the External Calibrator for Hydrogen Observatories (ECHO)
Authors:
Yifan Zhao,
Daniel C. Jacobs,
Titu Samson,
Mrudula Gopal Krishna,
Michael Horn,
Marc-Olivier R. Lalonde,
Raven Braithwaite,
Logan Skabelund
Abstract:
Precision measurements of the beam pattern response are needed to predict the response of a radio telescope. Mapping the beam of a low frequency radio array presents a unique challenge, and science cases such as the observation of the 21 cm line at high redshift have demanding requirements. Drone-based systems offer the unique potential for a measurement which is entirely under experimenter control, but progress has been paced by practical implementation challenges. Previously, a prototype drone system, called the External Calibrator for Hydrogen Observatories (ECHO), demonstrated good performance in making a complete hemispherical beam measurement. This paper reports updates to the system focusing on performance of a new drone platform, minimizing interference from the drone, and a new transmitter.
Submitted 3 July, 2024;
originally announced July 2024.
-
Constraints Matrices and Convergence Proof of TPMS2STEP
Authors:
Yaonaiming Zhao,
Qiang Zou
Abstract:
TPMS is consistently described in the functional representation (F-rep) format, while modern CAD/CAM/CAE tools are built upon the boundary representation (B-rep) format. To bridge this gap, TPMS must be translated to STEP, a process called TPMS2STEP. This paper provides the constraint matrices and a convergence proof of TPMS2STEP so that $C^2$ continuity and an error bound of $2ε$ on the deviation can be ensured during the translation.
Submitted 14 June, 2024;
originally announced July 2024.
-
Accelerated Proton Resonance Frequency-based Magnetic Resonance Thermometry by Optimized Deep Learning Method
Authors:
Sijie Xu,
Shenyan Zong,
Chang-Sheng Mei,
Guofeng Shen,
Yueran Zhao,
He Wang
Abstract:
Proton resonance frequency (PRF) based MR thermometry is essential for focused ultrasound (FUS) thermal ablation therapies. This work aims to enhance temporal resolution in dynamic MR temperature map reconstruction using an improved deep learning method. The training-optimized methods and five classical neural networks were applied to 2-fold and 4-fold under-sampled k-space data to reconstruct the temperature maps. The enhanced training modules included offline/online data augmentations, knowledge distillation, and the amplitude-phase decoupling loss function. The heating experiments were performed by a FUS transducer on phantom and ex vivo tissues, respectively. These data were manually under-sampled to imitate acceleration procedures and used to train the reconstruction model with our method. About a dozen additional testing datasets were separately obtained for evaluating the real-time performance and temperature accuracy. Acceleration factors of 1.9 and 3.7 were found for the 2-fold and 4-fold k-space under-sampling strategies, and the ResUNet-based deep learning reconstruction performed exceptionally well. In the 2-fold acceleration scenario, the RMSE of temperature map patches was 0.888 °C and 1.145 °C on the phantom and ex vivo testing datasets, respectively. The DICE value of temperature areas enclosed by the 43 °C isotherm was 0.809, and the Bland-Altman analysis showed a bias of -0.253 °C with a spread of ±2.16 °C. In the 4-fold under-sampling case, these evaluation values decreased by approximately 10%. This study demonstrates that deep learning-based reconstruction can significantly enhance the accuracy and efficiency of MR thermometry for clinical FUS thermal therapies.
Submitted 3 July, 2024;
originally announced July 2024.
-
Hilbert band complexes and their applications
Authors:
Zeying Zhang,
Y. X. Zhao,
Yugui Yao,
Shengyuan A. Yang
Abstract:
The study of band connectivity is a fundamental problem in condensed matter physics. Here, we develop a new method for analyzing band connectivity, which completely solves the outstanding questions of the reducibility and decomposition of band complexes. By translating the symmetry conditions into a set of band balance equations, we show that all possible band structure solutions can be described by a positive affine monoid structure, which has a unique minimal set of generators, called the Hilbert basis. We show that the Hilbert basis completely determines whether a band complex is reducible and how it can be decomposed. The band complexes corresponding to Hilbert basis vectors, termed Hilbert band complexes (HBCs), can be regarded as elementary building blocks of band structures. We develop algorithms to construct HBCs, analyze their graph features, and merge them into large complexes. We find some interesting examples, such as HBCs corresponding to complete bipartite graphs, and complexes which can grow without bound by successively merging an HBC.
Submitted 3 July, 2024;
originally announced July 2024.
-
SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic
Authors:
Liulu He,
Yufei Zhao,
Rui Gao,
Yuan Du,
Li Du
Abstract:
Fast convolution algorithms, including Winograd and FFT, can efficiently accelerate convolution operations in deep models. However, these algorithms depend on high-precision arithmetic to maintain inference accuracy, which conflicts with model quantization. To resolve this conflict and further improve the efficiency of quantized convolution, we propose SFC, a new algebra transform for fast convolution that extends the Discrete Fourier Transform (DFT) with symbolic computing, in which only additions are required to perform the transformation at specific transform points, avoiding the calculation of irrational numbers and reducing the requirement for precision. Additionally, we enhance convolution efficiency by introducing correction terms to convert invalid circular convolution outputs of the Fourier method into effective ones. The numerical error analysis is presented for the first time in this type of work and proves that our algorithms can provide a 3.68x multiplication reduction for 3x3 convolution, while the Winograd algorithm only achieves a 2.25x reduction with similarly low numerical errors. Experiments carried out on benchmarks and FPGA show that our new algorithms can further improve the computation efficiency of quantized models while maintaining accuracy, surpassing both the quantization-alone method and existing works on fast convolution quantization.
Submitted 3 July, 2024;
originally announced July 2024.
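The SFC abstract above refers to "invalid circular convolution outputs of the Fourier method". As a rough illustration of what that phrase means, the sketch below uses the textbook circular convolution theorem with a naive DFT. It is not the SFC transform, and it does not implement the paper's symbolic computing or correction terms. The helper names (`dft`, `circular_conv_via_dft`, `valid_linear_conv`) and the toy inputs are assumptions made for illustration only.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; inverse transform if requested."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def circular_conv_via_dft(signal, kernel):
    """Length-N circular convolution via pointwise products in the DFT domain."""
    n = len(signal)
    padded = list(kernel) + [0] * (n - len(kernel))  # zero-pad kernel to length N
    spectrum = [a * b for a, b in zip(dft(signal), dft(padded))]
    return [round(v.real, 9) for v in dft(spectrum, inverse=True)]

def valid_linear_conv(signal, kernel):
    """Direct linear convolution, keeping only outputs with full kernel overlap."""
    n, k = len(signal), len(kernel)
    return [sum(kernel[j] * signal[i - j] for j in range(k)) for i in range(k - 1, n)]

# Hypothetical toy inputs: a length-4 signal and a length-3 kernel.
signal = [1, 2, 3, 4]
kernel = [1, 1, 1]

circ = circular_conv_via_dft(signal, kernel)
valid = valid_linear_conv(signal, kernel)

# The first len(kernel) - 1 circular outputs wrap around the signal boundary,
# so they differ from linear convolution; the remaining outputs agree with it.
wrapped, usable = circ[:len(kernel) - 1], circ[len(kernel) - 1:]
```

With a length-N transform and a length-K kernel, only N - K + 1 of the N outputs match the linear convolution. The abstract states that SFC uses correction terms to make the wrapped outputs usable as well; how it does so is not shown here.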
-
Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the BESIII detector at the BEPCII storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.
Submitted 3 July, 2024;
originally announced July 2024.
-
Contrast then Memorize: Semantic Neighbor Retrieval-Enhanced Inductive Multimodal Knowledge Graph Completion
Authors:
Yu Zhao,
Ying Zhang,
Baohang Zhou,
Xinying Qian,
Kehui Song,
Xiangrui Cai
Abstract:
A large number of studies have emerged for Multimodal Knowledge Graph Completion (MKGC) to predict the missing links in MKGs. However, few studies have addressed inductive MKGC (IMKGC), which involves emerging entities unseen during training. Existing inductive approaches focus on learning textual entity representations, which neglect rich semantic information in the visual modality. Moreover, they focus on aggregating structural neighbors from existing KGs, which are usually limited for emerging entities. However, the semantic neighbors are decoupled from the topology linkage and usually imply the true target entity. In this paper, we propose the IMKGC task and a semantic neighbor retrieval-enhanced IMKGC framework CMR, where the contrast brings the helpful semantic neighbors close, and then the memorize supports semantic neighbor retrieval to enhance inference. Specifically, we first propose a unified cross-modal contrastive learning to simultaneously capture the textual-visual and textual-textual correlations of query-entity pairs in a unified representation space. The contrastive learning increases the similarity of positive query-entity pairs, therefore making the representations of helpful semantic neighbors close. Then, we explicitly memorize the knowledge representations to support the semantic neighbor retrieval. At test time, we retrieve the nearest semantic neighbors and interpolate them with the query-entity similarity distribution to augment the final prediction. Extensive experiments validate the effectiveness of CMR on three inductive MKGC datasets. Code is available at https://github.com/OreOZhao/CMR.
Submitted 3 July, 2024;
originally announced July 2024.
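The CMR abstract above describes retrieving the nearest semantic neighbors at test time and interpolating them with the query-entity similarity distribution. A minimal sketch of that general retrieve-then-interpolate idea follows. It is not the CMR implementation: the cosine similarity, the softmax weighting of neighbors, the mixing weight `lam`, the neighbor count `k`, and all vectors and scores are hypothetical choices made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn_distribution(query, memory, num_entities, k):
    """Entity distribution built from the k memorized vectors nearest to the query."""
    nearest = sorted(memory, key=lambda item: cosine(query, item[0]), reverse=True)[:k]
    weights = softmax([cosine(query, vec) for vec, _ in nearest])
    dist = [0.0] * num_entities
    for w, (_, entity) in zip(weights, nearest):
        dist[entity] += w  # neighbors pointing at the same entity pool their weight
    return dist

def interpolate(model_dist, knn_dist, lam):
    """Convex combination of the model distribution and the retrieved distribution."""
    return [(1 - lam) * p + lam * q for p, q in zip(model_dist, knn_dist)]

# Hypothetical memory of (representation, target entity id) pairs.
memory = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 2)]
query = [1.0, 0.05]

model_dist = softmax([0.2, 0.5, 0.1])  # hypothetical query-entity similarity scores
knn_dist = knn_distribution(query, memory, num_entities=3, k=2)
final = interpolate(model_dist, knn_dist, lam=0.5)
```

In this toy setup the model alone ranks entity 1 first, while both retrieved neighbors point at entity 0, so the interpolated distribution ranks entity 0 first. The actual retrieval and weighting scheme should be taken from the paper and its repository.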
-
52B to 1T: Lessons Learned via Tele-FLM Series
Authors:
Xiang Li,
Yiqun Yao,
Xin Jiang,
Xuezhi Fang,
Chao Wang,
Xinzhang Liu,
Zihan Wang,
Yu Zhao,
Xin Wang,
Yuyao Huang,
Shuangyong Song,
Yongxiang Li,
Zheng Zhang,
Bo Zhao,
Aixin Sun,
Yequan Wang,
Zhongjiang He,
Zhongyuan Wang,
Xuelong Li,
Tiejun Huang
Abstract:
Large Language Models (LLMs) represent a significant stride toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with capacities exceeding 50 billion parameters. This technical report builds on our prior work with Tele-FLM (also known as FLM-2), a publicly available 52-billion-parameter model. We delve into two primary areas: we first discuss our observation of Supervised Fine-tuning (SFT) on Tele-FLM-52B, which supports the "less is more" approach for SFT data construction; second, we demonstrate our experiments and analyses on the best practices for progressively growing a model from 52 billion to 102 billion, and subsequently to 1 trillion parameters. We will open-source a 1T model checkpoint, namely Tele-FLM-1T, to advance further training and research.
Submitted 2 July, 2024;
originally announced July 2024.
-
Multi-Scenario Combination Based on Multi-Agent Reinforcement Learning to Optimize the Advertising Recommendation System
Authors:
Yang Zhao,
Chang Zhou,
Jin Cao,
Yi Zhao,
Shaobo Liu,
Chiyu Cheng,
Xingchen Li
Abstract:
This paper explores multi-scenario optimization on large platforms using multi-agent reinforcement learning (MARL). We address this by treating scenarios like search, recommendation, and advertising as a cooperative, partially observable multi-agent decision problem. We introduce the Multi-Agent Recurrent Deterministic Policy Gradient (MARDPG) algorithm, which aligns different scenarios under a shared objective and allows for strategy communication to boost overall performance. Our results show marked improvements in metrics such as click-through rate (CTR), conversion rate, and total sales, confirming our method's efficacy in practical settings.
Submitted 2 July, 2024;
originally announced July 2024.
-
A Survey of Retrieval Algorithms in Ad and Content Recommendation Systems
Authors:
Yu Zhao
Abstract:
This survey examines the most effective retrieval algorithms utilized in ad recommendation and content recommendation systems. Ad targeting algorithms rely on detailed user profiles and behavioral data to deliver personalized advertisements, thereby driving revenue through targeted placements. Conversely, organic retrieval systems aim to improve user experience by recommending content that matches user preferences. This paper compares these two applications and explains the most effective methods employed in each.
Submitted 20 June, 2024;
originally announced July 2024.
-
Performance of small-diameter muon drift tube chambers with new fast readout ASIC at high background rates
Authors:
Sergey Abovyan,
Nayana Bangaru,
Francesco Fallavollita,
Oliver Kortner,
Sandra Kortner,
Hubert Kroha,
Elena Voevodina,
Robert Richter,
Yazhou Zhao
Abstract:
Experiments like ATLAS at the HL-LHC or detectors at future hadron colliders need muon detectors with excellent momentum resolution up to the TeV scale both at the trigger and offline reconstruction levels. This requires muon tracking chambers with high spatial resolution even at the highest background fluxes. Drift-tube chambers are the most cost-effective technology for large-area muon systems, providing the required high rate capability and three-dimensional spatial resolution. Thanks to advances in electronics, the new generation small-diameter Muon Drift Tube (sMDT) detectors with 15 mm tube diameter can be used in stand-alone mode up to the background rates expected at future hadron collider experiments, providing event times and second coordinates without additional trigger chambers. New developments in integrated front-end electronics include fast baseline restoration of the shaped signal and picosecond time-to-digital converters for second coordinate measurement with double-sided read-out. Self-triggered operation is now possible using modern high-performance FPGAs for real-time pattern recognition and track reconstruction. A new amplifier shaper discriminator chip in 65 nm TSMC CMOS technology with increased sensitivity and faster baseline recovery has been developed to cope with high background fluxes. Extensive test beam campaigns using sMDT chambers with new readout electronics have been performed at the CERN Gamma Irradiation Facility (GIF++). Results show that the shorter peaking time of the new chip enhances the spatial resolution of the drift tubes by up to 100 $μ$m at a background rate of 1 MHz, the maximum rate expected at the 100 TeV collider experiment.
Submitted 3 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Actuation system of the inertial sensor for high-precision space missions using torsion pendulum
Authors:
Fangchao Yang,
Yan Zhu,
Xiaofei Jin,
Yujie Zhao,
Shixun Pei,
Wei Hong
Abstract:
Precision space inertial sensors are imperative to Earth geodesy missions, gravitational wave observations and several fundamental physics experiments in space. In these missions, the residual acceleration noise of the test mass (TM) caused by the forces from inertial sensor components and the environment must be kept below a certain level. As a number of forces contributing to residual acceleration are related to the actuation system, it is essential to develop a precise actuation system that excludes any erroneous force and achieves an ultra-sensitive value for TM acceleration noise. However, it is difficult to test the actuation system on the ground. In this paper, a torsion pendulum is established to test the influence of the actuation system on TM torque noise, and a closed-loop control system combining the torsion pendulum and parts of the actuation modules is designed to assess the performance of the actuation control algorithm. The experimental results show that the parameters in an actuation system will introduce additional torque noise and the maximum noise can reach as much as 10^{-13} Nm/Hz^{1/2} at 1 mHz. The stable tracking error for the closed-loop system is about 10^{-7}, indicating that the combined system achieves good tracking performance and robustness for TM rotation control in different conditions of inertial sensors.
Submitted 10 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Evaluation of Text-to-Video Generation Models: A Dynamics Perspective
Authors:
Mingxiang Liao,
Hannan Lu,
Xinyu Zhang,
Fang Wan,
Tianyu Wang,
Yuzhong Zhao,
Wangmeng Zuo,
Qixiang Ye,
Jingdong Wang
Abstract:
Comprehensive and constructive evaluation protocols play an important role in the development of sophisticated text-to-video (T2V) generation models. Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet largely ignore the dynamics of video content. Dynamics are an essential dimension for measuring the visual vividness and the honesty of video content to text prompts. In this study, we propose an effective evaluation protocol, termed DEVIL, which centers on the dynamics dimension to evaluate T2V models. For this purpose, we establish a new benchmark comprising text prompts that fully reflect multiple dynamics grades, and define a set of dynamics scores corresponding to various temporal granularities to comprehensively evaluate the dynamics of each generated video. Based on the new benchmark and the dynamics scores, we assess T2V models with the design of three metrics: dynamics range, dynamics controllability, and dynamics-based quality. Experiments show that DEVIL achieves a Pearson correlation exceeding 90% with human ratings, demonstrating its potential to advance T2V generation models. Code is available at https://github.com/MingXiangL/DEVIL.
Submitted 1 July, 2024;
originally announced July 2024.
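The DEVIL abstract above reports agreement with human ratings as a Pearson correlation. For readers unfamiliar with the statistic, the sketch below shows how such a coefficient is computed between automatic scores and human ratings. The score and rating values are hypothetical placeholders, not data from the paper, and the function name `pearson` is an illustrative choice.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical values: one automatic dynamics score and one human rating per video.
metric_scores = [0.1, 0.4, 0.5, 0.8, 0.9]
human_ratings = [1, 2, 3, 4, 5]

r = pearson(metric_scores, human_ratings)
```

A coefficient near 1 means the automatic score rises almost linearly with the human rating. The abstract's "exceeding 90%" corresponds to r above 0.9 on the authors' own data.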
-
Multi-Functional Beamforming Design for Integrated Sensing, Communication, and Computation
Authors:
Yapeng Zhao,
Qingqing Wu,
Wen Chen,
Yong Zeng,
Ruiqi Liu,
Weidong Mei,
Fen Hou,
Shaodan Ma
Abstract:
Integrated sensing and communication (ISAC) systems may face a heavy computation burden since the sensory data needs to be further processed. This paper studies a novel system that integrates sensing, communication, and computation, aiming to provide services for different objectives efficiently. This system consists of a multi-antenna multi-functional base station (BS), an edge server, a target, and multiple single-antenna communication users. The BS needs to allocate the available resources to efficiently provide sensing, communication, and computation services. Due to the heavy service burden and limited power budget, the BS can partially offload the tasks to the nearby edge server instead of computing them locally. We consider the estimation of the target response matrix, a general problem in radar sensing, and utilize Cramér-Rao bound (CRB) as the corresponding performance metric. To tackle the non-convex optimization problem, we propose both semidefinite relaxation (SDR)-based alternating optimization and SDR-based successive convex approximation (SCA) algorithms to minimize the CRB of radar sensing while meeting the requirement of communication users and the need for task computing. Furthermore, we demonstrate that the optimal rank-one solutions of both the alternating and SCA algorithms can be directly obtained via the solver or further constructed even when dealing with multiple functionalities. Simulation results show that the proposed algorithms can provide higher target estimation performance than state-of-the-art benchmarks while satisfying the communication and computation constraints.
Submitted 1 July, 2024;
originally announced July 2024.
-
Applying Deep Learning Technique to Chiral Magnetic Wave Search
Authors:
Yuan-Sheng Zhao,
Xu-Guang Huang
Abstract:
The chiral magnetic wave (CMW) is a collective mode in quark-gluon plasma originating from the chiral magnetic effect (CME) and chiral separation effect. Its detection in heavy-ion collisions is challenging due to significant background contamination. In Ref. [1], we constructed a neural network which can accurately identify the CME-related signal from the final-state pion spectra. In this paper, we generalize this neural network to the case of CMW search. We show that, after updated training, the neural network can effectively recognize the CMW-related signal. Additionally, we assess the performance of the neural network compared to other known methods for CMW search.
Submitted 30 June, 2024;
originally announced July 2024.
-
Discovering one molecule out of a million: inverse design of molecular hole transporting semiconductors tailored for perovskite solar cells
Authors:
Jianchang Wu,
Luca Torresi,
ManMan Hu,
Patrick Reiser,
Jiyun Zhang,
Juan S. Rocha-Ortiz,
Luyao Wang,
Zhiqiang Xie,
Kaicheng Zhang,
Byung-wook Park,
Anastasia Barabash,
Yicheng Zhao,
Junsheng Luo,
Yunuo Wang,
Larry Lüer,
Lin-Long Deng,
Jens A. Hauch,
Sang Il Seok,
Pascal Friederich,
Christoph J. Brabec
Abstract:
The inverse design of tailored organic molecules for specific optoelectronic devices of high complexity holds an enormous potential, but has not yet been realized [1,2]. The complexity and literally infinite diversity of conjugated molecular structures present both an unprecedented opportunity for technological breakthroughs and an unseen optimization challenge. Current models rely on big data which do not exist for specialized research films. However, a hybrid computational and high throughput experimental screening workflow allowed us to train predictive models with as few as 149 molecules. We demonstrate a unique closed-loop workflow combining high throughput synthesis and Bayesian optimization that discovers new hole transporting materials with tailored properties for solar cell applications. A series of high-performance molecules were identified from minimal suggestions, achieving up to 26.23% (certified 25.88%) power conversion efficiency in perovskite solar cells. Our work paves the way for rapid, informed discovery in vast molecular libraries, revolutionizing material selection for complex devices. We believe that our approach can be generalized to other emerging fields and indeed accelerate the development of optoelectronic semiconductor devices in general.
Submitted 30 June, 2024;
originally announced July 2024.
-
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
Authors:
Jinsheng Huang,
Liang Chen,
Taian Guo,
Fu Zeng,
Yusheng Zhao,
Bohan Wu,
Ye Yuan,
Haozhe Zhao,
Zhihui Guo,
Yichi Zhang,
Jingyang Yuan,
Wei Ju,
Luchen Liu,
Tianyu Liu,
Baobao Chang,
Ming Zhang
Abstract:
Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial performance, undermining the credibility of these evaluations. To address this issue while maintaining the efficiency of MCQ evaluations, we propose MMEvalPro, a benchmark designed to avoid Type-I errors through a trilogy evaluation pipeline and more rigorous metrics. For each original question from existing benchmarks, human annotators augment it by creating one perception question and one knowledge anchor question through a meticulous annotation process. MMEvalPro comprises $2,138$ question triplets, totaling $6,414$ distinct questions. Two-thirds of these questions are manually labeled by human experts, while the rest are sourced from existing benchmarks (MMMU, ScienceQA, and MathVista). Compared with the existing benchmarks, our experiments with the latest LLMs and LMMs demonstrate that MMEvalPro is more challenging (the best LMM lags behind human performance by $31.73\%$, compared to an average gap of $8.03\%$ in previous benchmarks) and more trustworthy (the best LLM trails the best LMM by $23.09\%$, whereas the gap for previous benchmarks is just $14.64\%$). Our in-depth analysis explains the reason for the large performance gap and justifies the trustworthiness of evaluation, underscoring its significant potential for advancing future research.
Submitted 29 June, 2024;
originally announced July 2024.
-
Lattice QCD Calculation of $x$-dependent Meson Distribution Amplitudes at Physical Pion Mass with Threshold Logarithm Resummation
Authors:
Ian Cloet,
Xiang Gao,
Swagato Mukherjee,
Sergey Syritsyn,
Nikhil Karthik,
Peter Petreczky,
Rui Zhang,
Yong Zhao
Abstract:
We present a lattice QCD calculation of the $x$-dependent pion and kaon distribution amplitudes (DA) in the framework of large momentum effective theory. This calculation is performed on a fine lattice of $a=0.076$~fm at physical pion mass, with the pion boosted to $1.8$~GeV and kaon boosted to $2.3$~GeV. We renormalize the matrix elements in the hybrid scheme and match to $\overline{\rm MS}$ with a subtraction of the leading renormalon in the Wilson-line mass. The perturbative matching is improved by resumming the large logarithms related to the small quark and gluon momenta in the soft-gluon limit. After resummation, we demonstrate that we are able to calculate a range of $x\in[x_0,1-x_0]$ with $x_0=0.25$ for pion and $x_0=0.2$ for kaon with systematics under control. The kaon DA is shown to be slightly skewed, and narrower than the pion DA. Although the $x$-dependence cannot be directly calculated beyond these ranges, we estimate higher moments of the pion and kaon DAs by complementing our calculation with short-distance factorization.
Submitted 28 June, 2024;
originally announced July 2024.
-
Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
S. Ahmed,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
X. H. Bai,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (495 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.
Submitted 2 July, 2024; v1 submitted 28 June, 2024;
originally announced July 2024.
-
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning
Authors:
Christopher E. Mower,
Yuhui Wan,
Hongzhan Yu,
Antoine Grosnit,
Jonas Gonzalez-Billandon,
Matthieu Zimmer,
Jinlong Wang,
Xinyu Zhang,
Yao Zhao,
Anbang Zhai,
Puze Liu,
Daniel Palenicek,
Davide Tateo,
Cesar Cadena,
Marco Hutter,
Jan Peters,
Guangjian Tian,
Yuzheng Zhuang,
Kun Shao,
Xingyue Quan,
Jianye Hao,
Jun Wang,
Haitham Bou-Ammar
Abstract:
We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connected to a plethora of open-source and commercial LLMs, automatic extraction of a behavior from the LLM output and execution of ROS actions/services, support for three behavior modes (sequence, behavior tree, state machine), imitation learning for adding new robot actions to the library of possible actions, and LLM reflection via human and environment feedback. Extensive experiments validate the framework, showcasing robustness, scalability, and versatility in diverse scenarios, including long-horizon tasks, tabletop rearrangements, and remote supervisory control. To facilitate the adoption of our framework and support the reproduction of our results, we have made our code open-source. You can access it at: https://github.com/huawei-noah/HEBO/tree/master/ROSLLM.
Submitted 2 July, 2024; v1 submitted 28 June, 2024;
originally announced June 2024.
-
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding
Authors:
Yue Fan,
Lei Ding,
Ching-Chen Kuo,
Shan Jiang,
Yang Zhao,
Xinze Guan,
Jie Yang,
Yi Zhang,
Xin Eric Wang
Abstract:
Graphical User Interfaces (GUIs) are central to our interaction with digital devices. Recently, growing efforts have been made to build models for various GUI understanding tasks. However, these efforts largely overlook an important GUI-referring task: screen reading based on user-indicated points, which we name the Screen Point-and-Read (SPR) task. This task is predominantly handled by rigid accessible screen reading tools, in great need of new models driven by advancements in Multimodal Large Language Models (MLLMs). In this paper, we propose a Tree-of-Lens (ToL) agent, utilizing a novel ToL grounding mechanism, to address the SPR task. Based on the input point coordinate and the corresponding GUI screenshot, our ToL agent constructs a Hierarchical Layout Tree. Based on the tree, our ToL agent not only comprehends the content of the indicated area but also articulates the layout and spatial relationships between elements. Such layout information is crucial for accurately interpreting information on the screen, distinguishing our ToL agent from other screen reading tools. We also thoroughly evaluate the ToL agent against other baselines on a newly proposed SPR benchmark, which includes GUIs from mobile, web, and operating systems. Last but not least, we test the ToL agent on mobile GUI navigation tasks, demonstrating its utility in identifying incorrect actions along the path of agent execution trajectories. Code and data: screen-point-and-read.github.io
Submitted 27 June, 2024;
originally announced June 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
Submitted 27 June, 2024;
originally announced June 2024.
-
Topotaxial Mutual-Exchange Growth of Magnetic Zintl Eu$_3$In$_2$As$_4$ Nanowires with Axion Insulator Classification
Authors:
Man Suk Song,
Lothar Houben,
Yufei Zhao,
Hyeonhu Bae,
Nadav Rothem,
Ambikesh Gupta,
Binghai Yan,
Beena Kalisky,
Magdalena Zaluska-Kotur,
Perla Kacman,
Hadas Shtrikman,
Haim Beidenkopf
Abstract:
Nanomaterials exhibit unique electronic properties that promote advanced functionality and technologies. However, nanoscale growth presents major challenges for synthesis, limiting the diversity of structures and compositions. Here, we demonstrate solid-state topotactic exchange that converts Wurtzite InAs nanowires into Zintl phase Eu$_3$In$_2$As$_4$ nanowires. In situ evaporation of Eu and As over InAs nanowire cores in molecular beam epitaxy results in mutual exchange of Eu from the shell and In from the core. A continuous Eu$_3$In$_2$As$_4$ shell thereby grows that gradually consumes the InAs core and converts it into a single-phase Eu$_3$In$_2$As$_4$ nanowire. Topotaxy, which facilitates the mutual exchange, is supported by the substructure of the As matrix, which is similar across the Wurtzite InAs and Zintl Eu$_3$In$_2$As$_4$. We provide initial evidence of an antiferromagnetic transition at T$_N$ $\sim$ 6.5 K in the Zintl phase Eu$_3$In$_2$As$_4$ nanowires. Ab initio calculations confirm the antiferromagnetic state and classify Eu$_3$In$_2$As$_4$ as a $C_2 T$ axion insulator hosting both chiral hinge modes and unpinned Dirac surface states. The topotactic mutual-exchange growth of Zintl Eu$_3$In$_2$As$_4$ nanowires thus enables the exploration of intricate magneto-topological states of nanomaterials. Moreover, it may open the path for topotactic mutual-exchange synthesis of nanowires made of other exotic compounds.
Submitted 27 June, 2024;
originally announced June 2024.