Search | arXiv e-print repository

A 95 GeV Higgs boson and spontaneous CP-violation at the finite temperature

Authors: Jing Gao, Jinghong Ma, Lei Wang, Haotian Xu

Abstract: The ATLAS and CMS collaborations reported a diphoton excess in the invariant mass distribution around 95.4 GeV with a local significance of $3.1\sigma$. Moreover, there is another $2.3\sigma$ local excess in the $b\bar{b}$ final state at LEP in the same mass region. A plausible solution is that the Higgs sector is extended to include an additional Higgs boson with a mass of $95.4$ GeV. We study a complex singlet scalar extension of the two-Higgs-doublet model in which the 95.4 GeV Higgs is from the mixing of three CP-even Higgs fields. In addition, the extended Higgs potential can achieve spontaneous CP-violation at finite temperature and restore CP symmetry at the present temperature of the Universe. We find that the model can simultaneously explain the baryon asymmetry of the Universe and the diphoton and $b\bar{b}$ excesses around 95.4 GeV while satisfying various relevant constraints, including those from collider and electric dipole moment experiments.

Submitted 7 August, 2024; originally announced August 2024.

Comments: 28 pages, 8 figures, 1 table. arXiv admin note: text overlap with arXiv:2311.02828

arXiv:2408.03238 [pdf, other]

LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion

Authors: Jinyu Zhang, Yongchong Gu, Jianxiong Gao, Haitao Lin, Qiang Sun, Xinwei Sun, Xiangyang Xue, Yanwei Fu

Abstract: This paper addresses the challenge of perceiving complete object shapes through visual perception. While prior studies have demonstrated encouraging outcomes in segmenting the visible parts of objects within a scene, amodal segmentation, in particular, has the potential to allow robots to infer the occluded parts of objects. To this end, this paper introduces a new framework that explores amodal segmentation for robotic grasping in cluttered scenes, thus greatly enhancing robotic grasping abilities. Initially, we use a conventional segmentation algorithm to detect the visible segments of the target object, which provides shape priors for completing the full object mask. Particularly, to explore how to utilize semantic features from RGB images and geometric information from depth images, we propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net). LAC-Net utilizes the linear-fusion strategy to effectively fuse this cross-modal data, and then uses the prior visible mask as an attention map to guide the network to focus on target feature locations for further complete mask recovery. Using the amodal mask of the target object provides advantages in selecting more accurate and robust grasp points compared to relying solely on the visible segments. The results on different datasets show that our method achieves state-of-the-art performance. Furthermore, the robot experiments validate the feasibility and robustness of this method in the real world. Our code and demonstrations are available on the project page: https://jrryzh.github.io/LAC-Net.

Submitted 6 August, 2024; originally announced August 2024.

Comments: accepted by IROS2024

arXiv:2408.03079 [pdf, other]

Enhancing Complex Causality Extraction via Improved Subtask Interaction and Knowledge Fusion

Authors: Jinglong Gao, Chen Lu, Xiao Ding, Zhongyang Li, Ting Liu, Bing Qin

Abstract: Event Causality Extraction (ECE) aims at extracting causal event pairs from texts. Despite ChatGPT's recent success, fine-tuning small models remains the best approach for the ECE task. However, existing fine-tuning-based ECE methods cannot address all three key challenges in ECE simultaneously: 1) Complex Causality Extraction, where multiple causal-effect pairs occur within a single sentence; 2) Subtask Interaction, which involves modeling the mutual dependence between the two subtasks of ECE, i.e., extracting events and identifying the causal relationship between extracted events; and 3) Knowledge Fusion, which requires effectively fusing the knowledge in two modalities, i.e., the expressive pretrained language models and the structured knowledge graphs. In this paper, we propose a unified ECE framework (UniCE) to address all three issues in ECE simultaneously. Specifically, we design a subtask interaction mechanism to enable mutual interaction between the two ECE subtasks. Besides, we design a knowledge fusion mechanism to fuse knowledge in the two modalities. Furthermore, we employ separate decoders for each subtask to facilitate complex causality extraction. Experiments on three benchmark datasets demonstrate that our method achieves state-of-the-art performance and outperforms ChatGPT with a margin of at least 30% F1-score. More importantly, our model can also be used to effectively improve the ECE performance of ChatGPT via in-context learning.

Submitted 6 August, 2024; originally announced August 2024.

Comments: NLPCC 2024 Oral

arXiv:2408.03075 [pdf]

Characterizing the current systems in the Martian ionosphere

Authors: Jiawei Gao, Shibang Li, Anna Mittelholz, Zhaojin Rong, Moa Persson, Zhen Shi, Haoyu Lu, Chi Zhang, Xiaodong Wang, Chuanfei Dong, Lucy Klinger, Jun Cui, Yong Wei, Yongxin Pan

Abstract: When the solar wind interacts with the ionosphere of an unmagnetized planet, it induces currents that form an induced magnetosphere. These currents and their associated magnetic fields play a pivotal role in controlling the movement of charged particles, which is essential for understanding the escape of planetary ions. Unlike the well-documented magnetospheric current systems, the ionospheric current systems on unmagnetized planets remain less understood, which constrains the quantification of electrodynamic energy transfer from stars to these planets. Here, utilizing eight years of data from the Mars Atmosphere and Volatile EvolutioN (MAVEN) mission, we investigate the global distribution of ionospheric currents on Mars. We have identified two distinct current systems in the ionosphere: one aligns with the solar wind electric field yet exhibits hemispheric asymmetry perpendicular to the electric field direction; the other corresponds to the flow pattern of annually-averaged neutral winds. We propose that these two current systems are driven by the solar wind and atmospheric neutral winds, respectively. Our findings reveal that Martian ionospheric dynamics are influenced by the neutral winds from below and the solar wind from above, highlighting the complex and intriguing nature of current systems on unmagnetized planets.

Submitted 6 August, 2024; originally announced August 2024.

Comments: 20 pages, 6 figures

arXiv:2408.01970 [pdf, other]

SR-CIS: Self-Reflective Incremental System with Decoupled Memory and Reasoning

Authors: Biqing Qi, Junqi Gao, Xinquan Chen, Dong Li, Weinan Zhang, Bowen Zhou

Abstract: The ability of humans to rapidly learn new knowledge while retaining old memories poses a significant challenge for current deep learning models. To handle this challenge, we draw inspiration from human memory and learning mechanisms and propose the Self-Reflective Complementary Incremental System (SR-CIS). Comprising the deconstructed Complementary Inference Module (CIM) and Complementary Memory Module (CMM), SR-CIS features a small model for fast inference and a large model for slow deliberation in CIM, enabled by the Confidence-Aware Online Anomaly Detection (CA-OAD) mechanism for efficient collaboration. CMM consists of a task-specific Short-Term Memory (STM) region and a universal Long-Term Memory (LTM) region. By setting task-specific Low-Rank Adaptive (LoRA) and corresponding prototype weights and biases, it instantiates external storage for parameter and representation memory, thus deconstructing the memory module from the inference module. By storing textual descriptions of images during training and combining them with the Scenario Replay Module (SRM) post-training for memory combination, along with periodic short-to-long-term memory restructuring, SR-CIS achieves stable incremental memory with limited storage requirements. Balancing model plasticity and memory stability under constraints of limited storage and low data resources, SR-CIS surpasses existing competitive baselines on multiple standard and few-shot incremental learning benchmarks.

Submitted 4 August, 2024; originally announced August 2024.

arXiv:2408.01766 [pdf, other]

MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition

Authors: Ruoyu Wang, Wenqian Wang, Jianjun Gao, Dan Lin, Kim-Hui Yap, Bingbing Li

Abstract: Driver action recognition, aiming to accurately identify drivers' behaviours, is crucial for enhancing driver-vehicle interactions and ensuring driving safety. Unlike general action recognition, drivers' environments are often challenging, being gloomy and dark, and with the development of sensors, various cameras such as IR and depth cameras have emerged for analyzing drivers' behaviors. Therefore, in this paper, we propose a novel multimodal fusion transformer, named MultiFuser, which identifies cross-modal interrelations and interactions among multimodal car cabin videos and adaptively integrates different modalities for improved representations. Specifically, MultiFuser comprises layers of Bi-decomposed Modules to model spatiotemporal features, with a modality synthesizer for multimodal feature integration. Each Bi-decomposed Module includes a Modal Expertise ViT block for extracting modality-specific features and a Patch-wise Adaptive Fusion block for efficient cross-modal fusion. Extensive experiments are conducted on the Drive&Act dataset, and the results demonstrate the efficacy of our proposed approach.

Submitted 3 August, 2024; originally announced August 2024.

arXiv:2408.01343 [pdf, other]

StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation

Authors: Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li

Abstract: Multimodal semantic segmentation shows significant potential for enhancing segmentation accuracy in complex scenes. However, current methods often incorporate specialized feature fusion modules tailored to specific modalities, thereby restricting input flexibility and increasing the number of training parameters. To address these challenges, we propose StitchFusion, a straightforward yet effective modal fusion framework that integrates large-scale pre-trained models directly as encoders and feature fusers. This approach facilitates comprehensive multi-modal and multi-scale feature fusion, accommodating any visual modal inputs. Specifically, our framework achieves modal integration during encoding by sharing multi-modal visual information. To enhance information exchange across modalities, we introduce a multi-directional adapter module (MultiAdapter) to enable cross-modal information transfer during encoding. By leveraging MultiAdapter to propagate multi-scale information across pre-trained encoders during the encoding process, StitchFusion achieves multi-modal visual information integration during encoding. Extensive comparative experiments demonstrate that our model achieves state-of-the-art performance on four multi-modal segmentation datasets with minimal additional parameters. Furthermore, the experimental integration of MultiAdapter with existing Feature Fusion Modules (FFMs) highlights their complementary nature. Our code is available at StitchFusion_repo.

Submitted 2 August, 2024; originally announced August 2024.

arXiv:2408.01132 [pdf, other]

Spectral methods on a triangle and W-systems

Authors: Jing Gao, Arieh Iserles

Abstract: We present an overarching framework for stable spectral methods on a triangle, defined by a multivariate W-system and based on orthogonal polynomials on the triangle. Motivated by the Koornwinder orthogonal polynomials on the triangle, we introduce a Koornwinder W-system. Once discretised by this W-system, the resulting spatial differentiation matrix is skew symmetric, affording important advantages insofar as stability and conservation of structure are concerned. We analyse the construction of the differentiation matrix and matrix vector multiplication, demonstrating optimal computational cost. Numerical convergence is illustrated through experiments with different parameter choices. As a result, our method exhibits key characteristics of a practical spectral method, inclusive of rapid convergence, fast computation and the preservation of structure of the underlying partial differential equation.

Submitted 2 August, 2024; originally announced August 2024.

MSC Class: Primary 65M70; secondary 42C05

arXiv:2408.01091 [pdf, other]

Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions

Authors: Jin Gao, Lei Gan, Yuankai Li, Yixin Ye, Dequan Wang

Abstract: Large multimodal models (LMMs) excel in adhering to human instructions. However, self-contradictory instructions may arise due to the increasing trend of multimodal interaction and context length, which is challenging for language beginners and vulnerable populations. We introduce the Self-Contradictory Instructions benchmark to evaluate the capability of LMMs in recognizing conflicting commands. It comprises 20,000 conflicts, evenly distributed between language and vision paradigms. It is constructed by a novel automatic dataset creation framework, which expedites the process and enables us to encompass a wide range of instruction forms. Our comprehensive evaluation reveals current LMMs consistently struggle to identify multimodal instruction discordance due to a lack of self-awareness. Hence, we propose the Cognitive Awakening Prompting to inject cognition from external, largely enhancing dissonance detection. The dataset and code are here: https://selfcontradiction.github.io/.

Submitted 5 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

Comments: Accepted by the 18th European Conference on Computer Vision ECCV 2024

arXiv:2408.00355 [pdf, other]

DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training

Authors: Yu Xie, Qian Qiao, Jun Gao, Tianxiang Wu, Shaoyao Huang, Jiaqing Fan, Ziqiang Cao, Zili Wang, Yue Zhang, Jielei Zhang, Huyang Sun

Abstract: More and more end-to-end text spotting methods based on Transformer architecture have demonstrated superior performance. These methods utilize a bipartite graph matching algorithm to perform one-to-one optimal matching between predicted objects and actual objects. However, the instability of bipartite graph matching can lead to inconsistent optimization targets, thereby affecting the training performance of the model. Existing literature applies denoising training to solve the problem of bipartite graph matching instability in object detection tasks. Unfortunately, this denoising training method cannot be directly applied to text spotting tasks, as these tasks need to perform irregular shape detection tasks and more complex text recognition tasks than classification. To address this issue, we propose a novel denoising training method (DNTextSpotter) for arbitrary-shaped text spotting. Specifically, we decompose the queries of the denoising part into noised positional queries and noised content queries. We use the four Bezier control points of the Bezier center curve to generate the noised positional queries. For the noised content queries, considering that the output of the text in a fixed positional order is not conducive to aligning position with content, we employ a masked character sliding method to initialize noised content queries, thereby assisting in the alignment of text content and position. To improve the model's perception of the background, we further utilize an additional loss function for background character classification in the denoising training part. Although DNTextSpotter is conceptually simple, it outperforms the state-of-the-art methods on four benchmarks (Total-Text, SCUT-CTW1500, ICDAR15, and Inverse-Text), especially yielding an improvement of 11.3% against the best approach on the Inverse-Text dataset.

Submitted 1 August, 2024; originally announced August 2024.

Comments: Accepted by ACMMM2024

arXiv:2407.21308 [pdf, other]

Enhanced Self-Checkout System for Retail Based on Improved YOLOv10

Authors: Lianghao Tan, Shubing Liu, Jing Gao, Xiaoyi Liu, Linyue Chu, Huangqi Jiang

Abstract: With the rapid advancement of deep learning technologies, computer vision has shown immense potential in retail automation. This paper presents a novel self-checkout system for retail based on an improved YOLOv10 network, aimed at enhancing checkout efficiency and reducing labor costs. We propose targeted optimizations to the YOLOv10 model by incorporating the detection head structure from YOLOv8, which significantly improves product recognition accuracy. Additionally, we develop a post-processing algorithm tailored for self-checkout scenarios to further enhance the application of the system. Experimental results demonstrate that our system outperforms existing methods in both product recognition accuracy and checkout speed. This research not only provides a new technical solution for retail automation but also offers valuable insights into optimizing deep learning models for real-world applications.

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.20789 [pdf, other]

Hölder regularity of harmonic functions on metric measure spaces

Authors: Jin Gao, Meng Yang

Abstract: We introduce the Hölder regularity condition for harmonic functions on metric measure spaces and prove that under mild volume regular condition and upper heat kernel estimate, the Hölder regularity condition, the weak Bakry-Émery non-negative curvature condition, the heat kernel Hölder continuity with or without exponential terms and the heat kernel near-diagonal lower bound are equivalent. As applications, firstly, we prove the validity of the so-called generalized reverse Hölder inequality on the Sierpiński carpet cable system, which was left open by Devyver, Russ, Yang (Int. Math. Res. Not. IMRN (2023), no. 18, 15537-15583). Secondly, we prove that two-sided heat kernel estimates alone imply gradient estimate for the heat kernel on strongly recurrent fractal-like cable systems, which improves the main results of the aforementioned paper. Thirdly, we obtain Hölder (Lipschitz) estimate for heat kernel on general metric measure spaces, which extends the classical Li-Yau gradient estimate for heat kernel on Riemannian manifolds.

Submitted 30 July, 2024; originally announced July 2024.

Comments: 36 pages

MSC Class: 28A80; 35K08

arXiv:2407.20613 [pdf]

Escape of an Active Ring from an Attractive Surface: Behaving Like a Self-Propelled Brownian Particle

Authors: Bin Tang, Jin-cheng Gao, Kang Chen, Tian Hui Zhang, Wen-de Tian

Abstract: Escape of active agents from metastable states is of great current interest in statistical and biological physics. In this study, we find that a flexible active Brownian ring escapes from a flat attractive surface through two distinct mechanisms: Kramers-like thermal activation at large rotational diffusion coefficients, but with an effective temperature, and the first-passage process at small rotational diffusion coefficients. The escape time depends explicitly on the shape of the potential barrier at large activity and small rotational diffusion coefficient. Moreover, when the propulsion force is biased along its contour, escape becomes more difficult and is primarily caused by thermal noise. Our findings highlight that, despite its intricate configuration, the active ring can be taken as a self-propelled Brownian particle when studying its escape from a smooth surface.

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.20610 [pdf]

Constrained motion of self-propelling eccentric disks linked by a spring

Authors: Tian-liang Xu, Chao-ran Qin, Bin Tang, Jin-cheng Gao, Jiankang Zhou, Kang Chen, Tian Hui Zhang, Wen-de Tian

Abstract: It has been supposed that the interplay of elasticity and activity plays a key role in triggering the non-equilibrium behaviors in biological systems. However, an experimental model system for investigating the spatiotemporally dynamical phenomena is missing. Here, a model system of an active chain, where active eccentric disks are linked by a spring, is designed to study the interplay of activity, elasticity, and friction. An individual active chain exhibits longitudinal and transverse motion; however, it starts to self-rotate when one end is pinned, and self-beats when one end is clamped. Additionally, our eccentric-disk model can qualitatively reproduce such behaviors and explain the unusual self-rotation of the first disk around its geometric center. Further, the structure and dynamics of long chains were studied via simulations without steric interactions. It was found that a hairpin conformation emerges in free motion, while in the constrained motions, the rotational and beating frequencies scale with the flexure number (the ratio of self-propelling force to bending rigidity) with an exponent of ~4/3. Scaling analysis suggests that it results from the balance between activity and energy dissipation. Our findings show that topological constraints play a vital role in non-equilibrium synergy behavior.

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.20606 [pdf]

Evidence for Two-dimensional Weyl Fermions in Air-Stable Monolayer PtTe$_{1.75}$

Authors: Zhihao Cai, Haijun Cao, Haohao Sheng, Xuegao Hu, Zhenyu Sun, Qiaoxiao Zhao, Jisong Gao, Shin-ichiro Ideta, Kenya Shimada, Jiawei Huang, Peng Cheng, Lan Chen, Yugui Yao, Sheng Meng, Kehui Wu, Zhijun Wang, Baojie Feng

Abstract: The Weyl semimetals represent a distinct category of topological materials wherein the low-energy excitations appear as the long-sought Weyl fermions. Exotic transport and optical properties are expected because of the chiral anomaly and linear energy-momentum dispersion. While three-dimensional Weyl semimetals have been successfully realized, the quest for their two-dimensional (2D) counterparts is ongoing. Here, we report the realization of 2D Weyl fermions in monolayer PtTe$_{1.75}$, which has strong spin-orbit coupling and lacks inversion symmetry, by combined angle-resolved photoemission spectroscopy, scanning tunneling microscopy, second harmonic generation, X-ray photoelectron spectroscopy measurements, and first-principles calculations. The giant Rashba splitting and band inversion lead to the emergence of three pairs of critical Weyl cones. Moreover, monolayer PtTe$_{1.75}$ exhibits excellent chemical stability in ambient conditions, which is critical for future device applications. The discovery of 2D Weyl fermions in monolayer PtTe$_{1.75}$ opens up new possibilities for designing and fabricating novel spintronic devices.

Submitted 30 July, 2024; originally announced July 2024.

Comments: Nano Letters, In Press

arXiv:2407.20478 [pdf]

Hidden high-risky states identification from routine urban traffic

Authors: Shiyan Liu, Mingyang Bai, Shengmin Guo, Jianxi Gao, Huijun Sun, Ziyou Gao, Daqing Li

Abstract: One of the core risk management tasks is to identify hidden high-risky states that may lead to system breakdown, which can provide valuable early warning knowledge. However, due to the high dimensionality and nonlinear interaction embedded in large-scale complex systems like urban traffic, it remains challenging to identify hidden high-risky states from a huge system state space where over 99% of possible system states are not yet visited in empirical data. Based on a maximum entropy model, we infer the underlying interaction network from complicated dynamical processes of urban traffic, and construct the system energy landscape. In this way, we can locate hidden high-risky states that have never been observed in real data. These states can serve as risk signals with a high probability of entering hazardous minima in the energy landscape, which lead to huge recovery costs. Our finding might provide insights for complex system risk management.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.19679 [pdf]

Harnessing Large Vision and Language Models in Agriculture: A Review

Authors: Hongyan Zhu, Shuai Qin, Min Su, Chengzhi Lin, Anjie Li, Junfeng Gao

Abstract: Large models can play important roles in many domains. Agriculture is another key factor affecting the lives of people around the world. It provides food, fabric, and coal for humanity. However, facing many challenges such as pests and diseases, soil degradation, global warming, and food security, how to steadily increase the yield in the agricultural sector is a problem that humans still need to solve. Large models can help farmers improve production efficiency and harvest by detecting a series of agricultural production tasks such as pests and diseases, soil quality, and seed quality. They can also help farmers make wise decisions through a variety of information, such as images, text, etc. Herein, we delve into the potential applications of large models in agriculture, from large language model (LLM) and large vision model (LVM) to large vision-language models (LVLM). After gaining a deeper understanding of multimodal large language models (MLLM), it can be recognized that problems such as agricultural image processing, agricultural question answering systems, and agricultural machine automation can all be solved by large models. Large models have great potential in the field of agriculture. We outline the current applications of agricultural large models, and aim to emphasize the importance of large models in the domain of agriculture. In the end, we envisage a future in which farmers use MLLM to accomplish many tasks in agriculture, which can greatly improve agricultural production efficiency and yield.

Submitted 28 July, 2024; originally announced July 2024.

arXiv:2407.19389 [pdf, other]

FIARSE: Model-Heterogeneous Federated Learning via Importance-Aware Submodel Extraction

Authors: Feijie Wu, Xingchen Wang, Yaqing Wang, Tianci Liu, Lu Su, Jing Gao

Abstract: In federated learning (FL), accommodating clients' varied computational capacities poses a challenge, often limiting the participation of those with constrained resources in global model training. To address this issue, the concept of model heterogeneity through submodel extraction has emerged, offering a tailored solution that aligns the model's complexity with each client's computational capacity. In this work, we propose Federated Importance-Aware Submodel Extraction (FIARSE), a novel approach that dynamically adjusts submodels based on the importance of model parameters, thereby overcoming the limitations of previous static and dynamic submodel extraction methods. Compared to existing works, the proposed method offers a theoretical foundation for the submodel extraction and eliminates the need for additional information beyond the model parameters themselves to determine parameter importance, significantly reducing the overhead on clients. Extensive experiments are conducted on various datasets to showcase superior performance of the proposed FIARSE.

Submitted 28 July, 2024; originally announced July 2024.

arXiv:2407.19150 [pdf, other]

RoSE-Opt: Robust and Efficient Analog Circuit Parameter Optimization with Knowledge-infused Reinforcement Learning

Authors: Weidong Cao, Jian Gao, Tianrui Ma, Rui Ma, Mouhacine Benosman, Xuan Zhang

Abstract: This paper proposes a learning framework, RoSE-Opt, to achieve robust and efficient analog circuit parameter optimization. RoSE-Opt has two important features. First, it incorporates key domain knowledge of analog circuit design, such as circuit topology, couplings between circuit specifications, and variations of process, supply voltage, and temperature, into the learning loop. This strategy facilitates the training of an artificial agent capable of achieving design goals by identifying device parameters that are optimal and robust. Second, it exploits a two-level optimization method, that is, integrating Bayesian optimization (BO) with reinforcement learning (RL) to improve sample efficiency. In particular, BO is used for a coarse yet quick search of an initial starting point for optimization. This sets a solid foundation to efficiently train the RL agent with fewer samples. Experimental evaluations on benchmarking circuits show promising sample efficiency, extraordinary figure-of-merit in terms of design efficiency and design success rate, and Pareto optimality in circuit performance of our framework, compared to previous methods. Furthermore, this work thoroughly studies the performance of different RL optimization algorithms, such as Deep Deterministic Policy Gradients (DDPG) with an off-policy learning mechanism and Proximal Policy Optimization (PPO) with an on-policy learning mechanism. This investigation provides users with guidance on choosing the appropriate RL algorithms to optimize the device parameters of analog circuits. Finally, our study also demonstrates RoSE-Opt's promise in parasitic-aware device optimization for analog circuits. In summary, our work reports a knowledge-infused BO-RL design automation framework for reliable and efficient optimization of analog circuits' device parameters.

Submitted 26 July, 2024; originally announced July 2024.

Comments: 14 pages, 12 Figures. Accepted by IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

arXiv:2407.18590 [pdf, other]

From 2D to 3D: AISG-SLA Visual Localization Challenge

Authors: Jialin Gao, Bill Ong, Darld Lwi, Zhen Hao Ng, Xun Wei Yee, Mun-Thye Mak, Wee Siong Ng, See-Kiong Ng, Hui Ying Teo, Victor Khoo, Georg Bökman, Johan Edstedt, Kirill Brodt, Clémentin Boittiaux, Maxime Ferrera, Stepan Konev

Abstract: Research in 3D mapping is crucial for smart city applications, yet the cost of acquiring 3D data often hinders progress. Visual localization, particularly monocular camera position estimation, offers a solution by determining the camera's pose solely through visual cues. However, this task is challenging due to limited data from a single camera. To tackle these challenges, we organized the AISG-SLA Visual Localization Challenge (VLC) at IJCAI 2023 to explore how AI can accurately extract camera pose data from 2D images in 3D space. The challenge attracted over 300 participants worldwide, forming 50+ teams. Winning teams achieved high accuracy in pose estimation using images from a car-mounted camera with low frame rates. The VLC dataset is available for research purposes upon request via vlc-dataset@aisingapore.org.

Submitted 26 July, 2024; originally announced July 2024.

arXiv:2407.18525 [pdf, other]

Is larger always better? Evaluating and prompting large language models for non-generative medical tasks

Authors: Yinghao Zhu, Junyi Gao, Zixiang Wang, Weibin Liao, Xiaochen Zheng, Lifang Liang, Yasha Wang, Chengwei Pan, Ewen M. Harrison, Liantao Ma

Abstract: The use of Large Language Models (LLMs) in medicine is growing, but their ability to handle both structured Electronic Health Record (EHR) data and unstructured clinical notes is not well-studied. This study benchmarks various models, including GPT-based LLMs, BERT-based models, and traditional clinical predictive models, for non-generative medical tasks utilizing renowned datasets. We assessed 14 language models (9 GPT-based and 5 BERT-based) and 7 traditional predictive models using the MIMIC dataset (ICU patient records) and the TJH dataset (early COVID-19 EHR data), focusing on tasks such as mortality and readmission prediction, disease hierarchy reconstruction, and biomedical sentence matching, comparing both zero-shot and finetuned performance. Results indicated that LLMs exhibited robust zero-shot predictive capabilities on structured EHR data when using well-designed prompting strategies, frequently surpassing traditional models. However, for unstructured medical texts, LLMs did not outperform finetuned BERT models, which excelled in both supervised and unsupervised tasks. Consequently, while LLMs are effective for zero-shot learning on structured data, finetuned BERT models are more suitable for unstructured texts, underscoring the importance of selecting models based on specific task requirements and data characteristics to optimize the application of NLP technology in healthcare.

Submitted 26 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2402.01713

arXiv:2407.18116 [pdf]

Sedimenting microrollers navigate saturated porous media

Authors: Samuel R. Wilson-Whitford, David Kramer, Jinghui Gao, Maria Chiara Roffin, James F. Gilchrist

Abstract: Particle sedimentation through porous media is limited by the inability of passive material to overcome surface interactions and a tortuous network of pores. This limits transport, delivery, and effectiveness of chemicals used as reactants, nutrients, pesticides, or for waste remediation. This work develops magnetically responsive microrollers that navigate the complex interstitial network of porous matter. Rather than arresting on the upward facing surfaces of the pores, particles can roll and fall further, increasing transport by orders of magnitude. This work directly investigates Janus microrollers, activated by a rotating magnetic field, rolling and sedimenting through an index-matched porous medium. The mechanism of enhanced transport is determined, and the material flux is primarily a function of microroller concentration, rotation rate, and magnetic field strength. This mechanism is most efficient using a minimum number of rotations spaced out periodically in time to reduce the required energy input to greatly enhance transport. This general mechanism of transport enhancement can be broadly applied in numerous applications because the particles delivered within the porous matrix may be comprised of a wide variety of functional materials.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.16935 [pdf, other]

Federated Automatic Latent Variable Selection in Multi-output Gaussian Processes

Authors: Jingyi Gao, Seokhyun Chung

Abstract: This paper explores a federated learning approach that automatically selects the number of latent processes in multi-output Gaussian processes (MGPs). The MGP has seen great success as a transfer learning tool when data is generated from multiple sources/units/entities. A common approach in MGPs to transfer knowledge across units involves gathering all data from each unit to a central server and extracting common independent latent processes to express each unit as a linear combination of the shared latent patterns. However, this approach poses key challenges in (i) determining the adequate number of latent processes and (ii) relying on centralized learning which leads to potential privacy risks and significant computational burdens on the central server. To address these issues, we propose a hierarchical model that places spike-and-slab priors on the coefficients of each latent process. These priors help automatically select only needed latent processes by shrinking the coefficients of unnecessary ones to zero. To estimate the model while avoiding the drawbacks of centralized learning, we propose a variational inference-based approach that formulates model inference as an optimization problem compatible with federated settings. We then design a federated learning algorithm that allows units to jointly select and infer the common latent processes without sharing their data. We also discuss an efficient learning approach for a new unit within our proposed federated framework. Simulation and case studies on Li-ion battery degradation and air temperature data demonstrate the advantageous features of our proposed approach.

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.16584 [pdf]

The need to implement FAIR principles in biomolecular simulations

Authors: Rommie Amaro, Johan Åqvist, Ivet Bahar, Federica Battistini, Adam Bellaiche, Daniel Beltran, Philip C. Biggin, Massimiliano Bonomi, Gregory R. Bowman, Richard Bryce, Giovanni Bussi, Paolo Carloni, David Case, Andrea Cavalli, Chie-En A. Chang, Thomas E. Cheatham III, Margaret S. Cheung, Cris Chipot, Lillian T. Chong, Preeti Choudhary, Cecilia Clementi, Rosana Collepardo-Guevara, Peter Coveney, T. Daniel Crawford, Matteo Dal Peraro , et al. (96 additional authors not shown)

Abstract: This letter illustrates the opinion of the molecular dynamics (MD) community on the need to adopt a new FAIR paradigm for the use of molecular simulations. It highlights the necessity of a collaborative effort to create, establish, and sustain a database that allows findability, accessibility, interoperability, and reusability of molecular dynamics simulation data. Such a development would democratize the field and significantly improve the impact of MD simulations on life science research. This will transform our working paradigm, pushing the field to a new frontier. We invite you to support our initiative at the MDDB community (https://mddbr.eu/community/)

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.16326

On The Expressive Power of Knowledge Graph Embedding Methods

Authors: Jiexing Gao, Dmitry Rodin, Vasily Motolygin, Denis Zaytsev

Abstract: Knowledge Graph Embedding (KGE) is a popular approach that aims to represent the entities and relations of a knowledge graph in latent spaces. These representations are known as embeddings. To measure the plausibility of triplets, score functions are defined over embedding spaces. Despite the wide dissemination of KGE in various tasks, KGE methods have limitations in reasoning abilities. In this paper we propose a mathematical framework to compare the reasoning abilities of KGE methods. We show that STransE has a higher capability than TransComplEx, and then present a new STransCoRe method, which improves STransE by combining it with the TransCoRe insights, which can reduce the STransE space complexity.

Submitted 26 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

Comments: This paper may involve data that is not readily available to the public

MSC Class: 68T30; ACM Class: I.2.4

arXiv:2407.16061 [pdf, other]

A framework for simultaneous fit of QCD and BSM parameters with xFitter

Authors: XiaoMin Shen, Simone Amoroso, Jun Gao, Katerina Lipka, Oleksandr Zenaiev

Abstract: An extension of the xFitter open-source program for QCD analyses is presented, allowing for a polynomial parameterization of the dependence of physical observables on theoretical parameters. This extension enables simultaneous determination of parton distribution functions (PDFs) and new physics parameters within the framework of the Standard Model Effective Field Theory (SMEFT). The functionalities of the code are illustrated in a sensitivity study, where a simultaneous determination of the PDFs, top quark mass and the couplings of selected dimension-6 SMEFT operators is addressed using projections for measurements of top quark-antiquark pair production at the High-Luminosity LHC. The importance of considering all the correlations of the parton distributions, top quark mass and the EFT parameters in a simultaneous QCD+SMEFT analysis is demonstrated. This work serves as a new platform for simultaneous extraction of the PDFs and the SM/SMEFT parameters based on xFitter.

Submitted 22 July, 2024; originally announced July 2024.

Comments: 31 pages, 10 figures

arXiv:2407.14086 [pdf, other]

Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking

Authors: Yunfei Zhang, Chao Liang, Jin Gao, Zhipeng Zhang, Weiming Hu, Stephen Maybank, Xue Zhou, Liang Li

Abstract: Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks by incorporating the extraction of appearance features as auxiliary tasks through embedding the Re-Identification (ReID) task into the detector, achieving a balance between inference speed and tracking performance. However, solving the competition between the detector and the feature extractor has always been a challenge. Meanwhile, the issue of directly embedding the ReID task into MOT has remained unresolved. The lack of high discriminability in appearance features results in their limited utility. In this paper, a new learning approach using cross-correlation to capture temporal information of objects is proposed. The feature extraction network is no longer trained solely on appearance features from each frame but learns richer motion features by utilizing feature heatmaps from consecutive frames, which addresses the challenge of inter-class feature similarity. Furthermore, our learning approach is applied to a more lightweight feature extraction network, and we treat the feature matching scores as strong cues rather than auxiliary cues, with an appropriate weight calculation to reflect the compatibility between our obtained features and the MOT task. Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks, i.e., MOT17, MOT20, and DanceTrack datasets. Specifically, on the DanceTrack test set, we achieve 56.8 HOTA, 58.1 IDF1 and 92.5 MOTA, making it the best online tracker capable of achieving real-time performance. Comparative evaluations with other trackers prove that our tracker achieves the best balance between speed, robustness and accuracy. Code is available at https://github.com/yfzhang1214/TCBTrack.

Submitted 6 August, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

Comments: A submission to IJCV

arXiv:2407.13985 [pdf]

Cluster Sliding Ferroelectricity in Trilayer Quasi-Hexagonal C60

Authors: Xuefei Wang, Yanhan Ren, Shi Qiu, Fan Zhang, Xueao Li, Junfeng Gao, Weiwei Gao, Jijun Zhao

Abstract: Electric polarization typically originates from non-centrosymmetric charge distributions. Since chemical bonds between atoms of the same elements favor centrosymmetric crystal structures and symmetrically distributed electron charges, elemental ferroelectrics are extremely rare. In comparison to atoms, elemental clusters are less symmetric and typically have various preferred orientations in crystals. Consequently, the assembly of clusters with different orientations tends to break the inversion symmetry. Based on this concept, we show that sliding ferroelectricity naturally emerges in trilayer quasi-hexagonal phase (qHP) C60, a cluster-assembled carbon allotrope recently synthesized. Trilayer qHP C60's have several stable polar structures, which are distinguishable in second-harmonic generation (SHG) responses. Compared to previously found elemental ferroelectrics, trilayer qHP C60's have sizable band gaps and some of them have both switchable out-of-plane and in-plane polarizations. Remarkably, the out-of-plane and in-plane polarizations are decoupled, enabling an easy-to-implement construction of Van der Waals homostructures with ferroelectrically switchable chirality.

Submitted 18 July, 2024; originally announced July 2024.

Comments: 5 figures

arXiv:2407.13899 [pdf, other]

Main-sequence systems: orbital stability around single star hosts

Authors: Hareesh Gautham Bhaskar, Nathaniel W. H. Moore, Jiapeng Gao, Gongjie Li, Billy Quarles

Abstract: Stability is one of the most fundamental aspects of planetary systems. It plays an important role in our understanding of the formation channels of planetary systems, as well as their habitability. Many approaches have been adopted to determine the stability of these systems, including brute-force N-body simulations, semi-analytical calculations, and more recently machine learning methods. These approaches have allowed significant advances in our understanding of planetary system dynamics, and they provide tools to constrain unknown parameters of exoplanetary systems (assuming these systems are stable). In the following, we focus on planets around single star hosts, and we provide an overview of the studies of planetary system stability for compact multi-planet systems and hierarchical multi-planet systems.

Submitted 18 July, 2024; originally announced July 2024.

Comments: Preprint of a chapter for the 'Encyclopedia of Astrophysics' (Editor-in-Chief Ilya Mandel, Section Editor Dimitri Veras) to be published by Elsevier as a Reference Module. The number of references was capped

arXiv:2407.13227 [pdf, other]

Solving the Model Unavailable MARE using Q-Learning Algorithm

Authors: Fei Yan, Jie Gao, Tao Feng, Jianxing Liu

Abstract: In this paper, the discrete-time modified algebraic Riccati equation (MARE) is solved when the system model is completely unavailable. To achieve this, a new iterative method based on the standard discrete-time algebraic Riccati equation (DARE) and its input weighting matrix is first proposed to solve the MARE. For the single-input case, the iteration can be initialized by an arbitrary positive input weighting if and only if the MARE has a stabilizing solution; for the multi-input case, a pre-given input weighting matrix of sufficiently large magnitude is used to perform the iteration when the characteristic parameter belongs to a specified subset. Benefiting from the developed iteration structure, the Q-learning (QL) algorithm can be employed to solve the MARE using only the system input/output data, so the system model is not required. Finally, a numerical simulation example is given to verify the effectiveness of the theoretical results and the algorithm.

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.12204 [pdf, other]

Acoustic modulation of individual nanowire quantum dots integrated into a hybrid thin-film lithium niobate photonic platform

Authors: Thomas Descamps, Tanguy Schetelat, Jun Gao, Philip J. Poole, Dan Dalacu, Ali W. Elshaari, Val Zwiller

Abstract: Surface acoustic waves (SAWs) are a powerful tool for controlling a wide range of quantum systems, particularly quantum dots (QDs) via their oscillating strain fields. The resulting energy modulation of these single photon sources can be harnessed to achieve spectral overlap between two QDs otherwise emitting at different wavelengths. In this study, we integrate InAsP/InP nanowire quantum dots onto a thin-film lithium niobate platform, a strong piezoelectric material, and embed them within Si$_{3}$N$_{4}$-loaded waveguides. We demonstrate emission wavelength modulation of 0.70 nm at 13 dBm with a single focused interdigital transducer (FIDT) operating at 400 MHz, and achieve twice this modulation by using two FIDTs as an acoustic cavity. Additionally, we bring two QDs with an initial wavelength difference of 0.5 nm into resonance using SAWs. This scalable strain-tuning approach represents a significant step towards producing indistinguishable single photons from remote emitters heterogeneously integrated on a single photonic chip, and paves the way for large scale on-chip quantum information processing using photonic platforms.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.11781 [pdf, other]

SlingBAG: Sliding ball adaptive growth algorithm with differentiable radiation enables super-efficient iterative 3D photoacoustic image reconstruction

Authors: Shuang Li, Yibing Wang, Jian Gao, Chulhong Kim, Seongwook Choi, Yu Zhang, Qian Chen, Yao Yao, Changhui Li

Abstract: High-quality 3D photoacoustic imaging (PAI) reconstruction under sparse view or limited view has long been challenging. Traditional 3D iterative-based reconstruction methods suffer from both slow speed and high memory consumption. Recently, in computer graphics, differentiable rendering has made significant progress, particularly with the rise of 3D Gaussian Splatting. Inspired by these, we introduce differentiable radiation into PAI, developing a novel reconstruction algorithm: the Sliding Ball Adaptive Growth algorithm (SlingBAG) for 3D PAI, which achieves high-quality 3D PAI reconstruction under both extremely sparse view and limited view. We established the point cloud dataset in PAI, and used a unique differentiable rapid radiator based on the spherical decomposition strategy and a randomly initialized point cloud adaptively optimized according to sparse sensor data. Each point undergoes updates in 3D coordinates, initial pressure, and resolution (denoted by the radius of the ball). Points undergo adaptive growth during the iterative process, including point destroying, splitting and duplicating along the gradient of their positions, manifesting the sliding ball effect. Finally, our point cloud to voxel grid shader renders the final reconstruction results. Simulation and in vivo experiments demonstrate that the SNR of our SlingBAG reconstruction result can be more than 40 dB under extremely sparse view, while the SNR of the traditional back-projection algorithm's result is less than 20 dB. Moreover, the structural similarity of SlingBAG's result to the ground truth is significantly higher, with an SSIM value of 95.6%. Notably, our differentiable rapid radiator can conduct forward PA simulation in homogeneous, non-viscous media substantially faster than current methods that numerically simulate the wave propagation, such as k-Wave. The dataset and all code will be open source.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.11398 [pdf, other]

Animate3D: Animating Any 3D Model with Multi-view Video Diffusion

Authors: Yanqin Jiang, Chaohui Yu, Chenjie Cao, Fan Wang, Weiming Hu, Jin Gao

Abstract: Recent advances in 4D generation mainly focus on generating 4D content by distilling pre-trained text or single-view image-conditioned models. It is inconvenient for them to take advantage of various off-the-shelf 3D assets with multi-view attributes, and their results suffer from spatiotemporal inconsistency owing to the inherent ambiguity in the supervision signals. In this work, we present Animate3D, a novel framework for animating any static 3D model. The core idea is two-fold: 1) We propose a novel multi-view video diffusion model (MV-VDM) conditioned on multi-view renderings of the static 3D object, which is trained on our presented large-scale multi-view video dataset (MV-Video). 2) Based on MV-VDM, we introduce a framework combining reconstruction and 4D Score Distillation Sampling (4D-SDS) to leverage the multi-view video diffusion priors for animating 3D objects. Specifically, for MV-VDM, we design a new spatiotemporal attention module to enhance spatial and temporal consistency by integrating 3D and video diffusion models. Additionally, we leverage the static 3D model's multi-view renderings as conditions to preserve its identity. For animating 3D models, an effective two-stage pipeline is proposed: we first reconstruct motions directly from generated multi-view videos, followed by the introduced 4D-SDS to refine both appearance and motion. Qualitative and quantitative experiments demonstrate that Animate3D significantly outperforms previous approaches. Data, code, and models will be open-released.

Submitted 16 July, 2024; originally announced July 2024.

Comments: Project Page: https://animate3d.github.io/

arXiv:2407.10249 [pdf, other]

Low Sensitivity Hopsets

Authors: Vikrant Ashvinkumar, Aaron Bernstein, Chengyuan Deng, Jie Gao, Nicole Wein

Abstract: Given a weighted graph $G$, a $(βべーた,\varepsilon)$-hopset $H$ is an edge set such that for any $s,t \in V(G)$, where $s$ can reach $t$ in $G$, there is a path from $s$ to $t$ in $G \cup H$ which uses at most $βべーた$ hops whose length is in the range $[dist_G(s,t), (1+\varepsilon)dist_G(s,t)]$. We break away from the traditional question that asks for a hopset that achieves small $|H|$ and instead study its sensitivity, a new quality measure which, informally, is the maximum number of times a vertex (or edge) is bypassed by an edge in $H$. The highlights of our results are: (i) $(\widetilde{O}(\sqrt{n}),0)$-hopsets on undirected graphs with $O(\log n)$ sensitivity, complemented with a lower bound showing that $\widetilde{O}(\sqrt{n})$ is tight up to polylogarithmic factors for any construction with polylogarithmic sensitivity. (ii) $(n^{o(1)},\varepsilon)$-hopsets on undirected graphs with $n^{o(1)}$ sensitivity for any $\varepsilon > 0$ that is at least inverse polylogarithmic, complemented with a lower bound on the tradeoff between $βべーた, \varepsilon$, and the sensitivity. (iii) $\widetilde{O}(\sqrt{n})$-shortcut sets on directed graphs with $O(\log n)$ sensitivity, complemented with a lower bound showing that $βべーた= \widetilde{Ωおめが}(n^{1/3})$ for any construction with polylogarithmic sensitivity. We believe hopset sensitivity is a natural measure in and of itself, and could potentially find use in a diverse range of contexts.
More concretely, the notion of hopset sensitivity is also directly motivated by the Differentially Private All Sets Range Queries problem. Our result for $O(\log n)$ sensitivity $(\widetilde{O}(\sqrt{n}),0)$-hopsets on undirected graphs immediately improves the current best-known upper bound on utility from $\widetilde{O}(n^{1/3})$ to $\widetilde{O}(n^{1/4})$ in the pure-DP setting, which is tight up to polylogarithmic factors.

Submitted 14 July, 2024; originally announced July 2024.

Comments: Abstract shortened to meet arXiv requirements

arXiv:2407.10105 [pdf, other]

Hierarchical Multi-modal Transformer for Cross-modal Long Document Classification

Authors: Tengfei Liu, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin

Abstract: Long Document Classification (LDC) has gained significant attention recently. However, multi-modal data in long documents such as texts and images are not being effectively utilized. Prior studies in this area have attempted to integrate texts and images in document-related tasks, but they have only focused on short text sequences and images of pages. How to classify long documents with hierarchical structure texts and embedding images is a new problem and faces multi-modal representation difficulties. In this paper, we propose a novel approach called Hierarchical Multi-modal Transformer (HMT) for cross-modal long document classification. The HMT conducts multi-modal feature interaction and fusion between images and texts in a hierarchical manner. Our approach uses a multi-modal transformer and a dynamic multi-scale multi-modal transformer to model the complex relationships between image features, and the section and sentence features. Furthermore, we introduce a new interaction strategy called the dynamic mask transfer module to integrate these two transformers by propagating features between them. To validate our approach, we conduct cross-modal LDC experiments on two newly created and two publicly available multi-modal long document datasets, and the results show that the proposed HMT outperforms state-of-the-art single-modality and multi-modality methods.

Submitted 14 July, 2024; originally announced July 2024.

Comments: IEEE Transactions on Multimedia

arXiv:2407.10059 [pdf, other]

Towards ultimate fragmentation functions at future lepton colliders

Authors: Bin Zhou, Jun Gao

Abstract: In this work, we study the constraining power of future lepton colliders on fragmentation functions (FFs) to light charged hadrons from quarks and gluon in the framework of QCD collinear factorization. We perform analyses of FFs by including a wide range of pseudo-data from future lepton colliders, such as measurements on hadron multiplicities in the inclusive production of two jets and $W$ boson pairs, at various center of mass energies, and from hadronic decays of the Higgs boson, including both to heavy quarks and to gluons. The high luminosity and high energies of future lepton colliders allow for quark flavor separations and ensure a precise determination of FFs based solely on data from electron-positron collisions. We find that either CEPC, FCC-$ee$ or ILC can significantly reduce the uncertainties of FFs in a wide kinematic range, compared to the NPC23 set obtained from a global analysis of current world data. We also discuss the impact of higher-order QCD corrections, and the potential constraints from measurements of three-jet production. Furthermore, we describe an update of the FMNLO program allowing for calculating hadron production cross sections at next-to-next-to-leading order in QCD, which is used in this study.

Submitted 13 July, 2024; originally announced July 2024.

Comments: 30 pages, 9 figures, 2 tables. Code is available at https://fmnlo.sjtu.edu.cn/~fmnlo/

arXiv:2407.09698 [pdf, other]

RIO-CPD: A Riemannian Geometric Method for Correlation-aware Online Change Point Detection

Authors: Chengyuan Deng, Zhengzhang Chen, Xujiang Zhao, Haoyu Wang, Junxiang Wang, Haifeng Chen, Jie Gao

Abstract: The objective of change point detection is to identify abrupt changes at potentially multiple points within a data sequence. This task is particularly challenging in the online setting where various types of changes can occur, including shifts in both the marginal and joint distributions of the data. This paper tackles these challenges by sequentially tracking correlation matrices on the Riemannian geometry, where the geodesic distances accurately capture the development of correlations. We propose Rio-CPD, a non-parametric correlation-aware online change point detection framework that combines the Riemannian geometry of the manifold of symmetric positive definite matrices and the cumulative sum statistic (CUSUM) for detecting change points. Rio-CPD enhances CUSUM by computing the geodesic distance from present observations to the Fréchet mean of previous observations. With careful choice of metrics equipped to the Riemannian geometry, Rio-CPD is simple and computationally efficient. Experimental results on both synthetic and real-world datasets demonstrate that Rio-CPD outperforms existing methods in detection accuracy and efficiency.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09590 [pdf, other]

Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts

Authors: Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, Jianfeng Gao

Abstract: By increasing model parameters but activating them sparsely when performing a task, the use of Mixture-of-Experts (MoE) architecture significantly improves the performance of Large Language Models (LLMs) without increasing the inference cost. However, the memory consumption due to the growing number of experts presents a challenge to the deployment of these models in many real-world settings. Our empirical study reveals that some experts encode redundant knowledge during pre-training. We thus propose a method of grouping and pruning similar experts to improve the model's parameter efficiency. We validate the effectiveness of our method by pruning two state-of-the-art MoE models, Mixtral-8x7B and Mixtral-8x22B. Evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks. To facilitate future research, we will release our code and the pruned MoE models.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 13 pages, 6 figures

arXiv:2407.08959 [pdf, other]

Domain-Hierarchy Adaptation via Chain of Iterative Reasoning for Few-shot Hierarchical Text Classification

Authors: Ke Ji, Peng Wang, Wenjun Ke, Guozheng Li, Jiajun Liu, Jingsheng Gao, Ziyu Shang

Abstract: Recently, various pre-trained language models (PLMs) have been proposed to prove their impressive performances on a wide range of few-shot tasks. However, limited by the unstructured prior knowledge in PLMs, it is difficult to maintain consistent performance on complex structured scenarios, such as hierarchical text classification (HTC), especially when the downstream data is extremely scarce. The main challenge is how to transfer the unstructured semantic space in PLMs to the downstream domain hierarchy. Unlike previous work on HTC which directly performs multi-label classification or uses a graph neural network (GNN) to inject label hierarchy, in this work, we study the HTC problem under a few-shot setting to adapt knowledge in PLMs from an unstructured manner to the downstream hierarchy. Technically, we design a simple yet effective method named Hierarchical Iterative Conditional Random Field (HierICRF) to search the most domain-challenging directions and exquisitely crafts domain-hierarchy adaptation as a hierarchical iterative language modeling problem, and then it encourages the model to make hierarchical consistency self-correction during the inference, thereby achieving knowledge transfer with hierarchical consistency preservation.
We perform HierICRF on various architectures, and extensive experiments on two popular HTC datasets demonstrate that prompt with HierICRF significantly boosts the few-shot HTC performance with an average Micro-F1 by 28.80% to 1.50% and Macro-F1 by 36.29% to 1.5% over the previous state-of-the-art (SOTA) baselines under few-shot settings, while maintaining SOTA hierarchical consistency performance.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 9 pages, 2 figures, Accepted by IJCAI2024

arXiv:2407.08937 [pdf, other]

Self-Evolving GPT: A Lifelong Autonomous Experiential Learner

Authors: Jinglong Gao, Xiao Ding, Yiming Cui, Jianbai Zhao, Hepeng Wang, Ting Liu, Bing Qin

Abstract: To improve the performance of large language models (LLMs), researchers have explored providing LLMs with textual task-solving experience via prompts. However, they rely on manual efforts to acquire and apply such experience for each task, which is not feasible for the growing demand for LLMs and the variety of user questions. To address this issue, we design a lifelong autonomous experiential learning framework based on LLMs to explore whether LLMs can imitate human ability for learning and utilizing experience. It autonomously learns and accumulates experience through experience transfer and induction, categorizing the types of input questions to select which accumulated experience to employ for them. Experimental results on six widely used NLP datasets show that our framework performs reliably in each intermediate step and effectively improves the performance of GPT-3.5 and GPT-4. This validates the feasibility of using LLMs to mimic human experiential learning and application capabilities. Additionally, we provide a detailed analysis of the behavior of our framework at each step.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Accepted by ACL 2024 MAIN

arXiv:2407.08639 [pdf, other]

$βべーた$-DPO: Direct Preference Optimization with Dynamic $βべーた$

Authors: Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

Abstract: Direct Preference Optimization (DPO) has emerged as a compelling approach for training Large Language Models (LLMs) to adhere to human preferences. However, the performance of DPO is sensitive to the fine-tuning of its trade-off parameter $βべーた$, as well as to the quality of the preference data. We analyze the impact of $βべーた$ and data quality on DPO, uncovering that optimal $βべーた$ values vary with the informativeness of pairwise data. Addressing the limitations of static $βべーた$ values, we introduce a novel framework that dynamically calibrates $βべーた$ at the batch level, informed by data quality considerations. Additionally, our method incorporates $βべーた$-guided data filtering to safeguard against the influence of outliers. Through empirical evaluation, we demonstrate that our dynamic $βべーた$ adjustment technique significantly improves DPO's performance across a range of models and datasets, offering a more robust and adaptable training paradigm for aligning LLMs with human feedback. The code is available at https://github.com/junkangwu/beta-DPO.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.07880 [pdf, other]

Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

Authors: Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jiawei Chen, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

Abstract: This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. Utilizing Distributionally Robust Optimization (DRO), we enhance DPO's resilience to these types of noise. Our theoretical insights reveal that DPO inherently embeds DRO principles, conferring robustness to pointwise noise, with the regularization coefficient $βべーた$ playing a critical role in its noise resistance. Extending this framework, we introduce Distributionally Robustifying DPO (Dr. DPO), which integrates pairwise robustness by optimizing against worst-case pairwise scenarios. The novel hyperparameter $βべーた'$ in Dr. DPO allows for fine-tuned control over data pair reliability, providing a strategic balance between exploration and exploitation in noisy training environments. Empirical evaluations demonstrate that Dr. DPO substantially improves the quality of generated text and response accuracy in preference datasets, showcasing enhanced performance in both noisy and noise-free settings. The code is available at https://github.com/junkangwu/Dr_DPO.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.06453 [pdf, ps, other]

Dual minus partial order

Authors: Ju Gao, Hongxing Wang, Xiaoji Liu

Abstract: In this paper, we introduce the Dual-minus partial order, obtain some characterizations of the partial order, and prove that both the dual star partial order and the dual sharp partial order are Dual-minus-type partial orders. Based on the Dual-minus partial order, we introduce the Dual-minus sharp partial order and the Dual-minus star partial order, which are also Dual-minus-type partial orders. In addition, we discuss relationships among the Dual-minus sharp partial order, the D-sharp partial order and the G-sharp partial order (the Dual-minus star partial order, the D-star partial order and the P-star partial order).

Submitted 8 July, 2024; originally announced July 2024.

Comments: 23 pages

MSC Class: 15A09; 15A24; 62G30

arXiv:2407.06022 [pdf]

Investigation of microstructural evolution of irradiation-induced defects in tungsten: an experimental-numerical approach

Authors: Salahudeen Mohamed, Qian Yuan, Dimitri Litvinov, Jie Gao, Ermile Gaganidze, Dmitry Terentyev, Hans-Christian Schneider, Jarir Aktaa

Abstract: The hostile condition in a fusion tokamak reactor poses the main challenge in the development and design of in-vessel components such as divertor and breeding blanket due to fusion relevant irradiation conditions (14 MeV) and large thermal loads. The current work describes the employment of an integrated experimental-numerical approach to assess the microstructure evolution of dislocation loops and voids in tungsten proposed for fusion application. A cluster dynamics (CD) model is implemented and simulations are performed on the irradiated tungsten Disk shape Compact Tension (DCT) specimen used in the experimental test. TEM characterisation is performed on the DCT specimen irradiated at 400 °C and 600 °C with around 1 dpa, respectively. The dpa rate and cascade overlap rate from the experiments and SPECTRA-PKA code, respectively, are implemented in the CD model. Based on the comparison between experimental and computational results, the dose and temperature dependence of irradiation-induced defects (dislocation loops, voids, c15 clusters) are clearly observed. Trap mediated diffusion is studied and the impact of cascades with the pre-existing defects is analysed through full cascade overlap mode and the consequent influence on the defect concentration is evaluated. The exchange of self-interstitial atoms (SIAs) and the change in the size of loops through reaction between <111> and <100> loops are studied in detail by means of the transfer rate of the SIAs.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05771 [pdf, other]

Multi-times Monte Carlo Rendering for Inter-reflection Reconstruction

Authors: Tengjie Zhu, Zhuo Chen, Jingnan Gao, Yichao Yan, Xiaokang Yang

Abstract: Inverse rendering methods have achieved remarkable performance in reconstructing high-fidelity 3D objects with disentangled geometries, materials, and environmental light. However, they still face huge challenges in reflective surface reconstruction. Although recent methods model the light trace to learn specularity, neglecting indirect illumination makes it hard to handle inter-reflections among multiple smooth objects. In this work, we propose Ref-MC2 that introduces the multi-time Monte Carlo sampling which comprehensively computes the environmental illumination and meanwhile considers the reflective light from object surfaces. To address the computation challenge as the times of Monte Carlo sampling grow, we propose a specularity-adaptive sampling strategy, significantly reducing the computational complexity. Beyond computational resources, higher geometry accuracy is also required because geometric errors accumulate multiple times. Therefore, we further introduce a reflection-aware surface model to initialize the geometry and refine it during inverse rendering. We construct a challenging dataset containing scenes with multiple objects and inter-reflections. Experiments show that our method outperforms other inverse rendering methods on various object groups. We also show downstream applications, e.g., relighting and material editing, to illustrate the disentanglement ability of our method.

Submitted 7 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: 10 pages, 6 figures, NeurIPS 2024 Submitted

arXiv:2407.05705 [pdf, other]

Fast and Continual Knowledge Graph Embedding via Incremental LoRA

Authors: Jiajun Liu, Wenjun Ke, Peng Wang, Jiahao Wang, Jinhua Gao, Ziyu Shang, Guozheng Li, Zijie Xu, Ke Ji, Yining Li

Abstract: Continual Knowledge Graph Embedding (CKGE) aims to efficiently learn new knowledge and simultaneously preserve old knowledge. Dominant approaches primarily focus on alleviating catastrophic forgetting of old knowledge but neglect efficient learning for the emergence of new knowledge. However, in real-world scenarios, knowledge graphs (KGs) are continuously growing, which brings a significant challenge to fine-tuning KGE models efficiently. To address this issue, we propose a fast CKGE framework (\model), incorporating an incremental low-rank adapter (\mec) mechanism to efficiently acquire new knowledge while preserving old knowledge. Specifically, to mitigate catastrophic forgetting, \model\ isolates and allocates new knowledge to specific layers based on the fine-grained influence between old and new KGs. Subsequently, to accelerate fine-tuning, \model\ devises an efficient \mec\ mechanism, which embeds the specific layers into incremental low-rank adapters with fewer training parameters. Moreover, \mec\ introduces adaptive rank allocation, which makes the LoRA aware of the importance of entities and adjusts its rank scale adaptively. We conduct experiments on four public datasets and two new datasets with a larger initial scale. Experimental results demonstrate that \model\ can reduce training time by 34%-49% while still achieving competitive link prediction performance against state-of-the-art models on four public datasets (average MRR score of 21.0% vs. 21.1%). Meanwhile, on two newly constructed datasets, \model\ saves 51%-68% training time and improves link prediction performance by 1.5%.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Accepted by IJCAI2024

arXiv:2407.04422 [pdf, other]

Global analysis of fragmentation functions to charged hadrons with high-precision data from the LHC

Authors: Jun Gao, ChongYang Liu, XiaoMin Shen, Hongxi Xing, Yuxiang Zhao

Abstract: Fragmentation functions (FFs) are essential non-perturbative QCD inputs for predicting hadron production cross sections in high energy scatterings. In this study, we present a joint determination of FFs for light charged hadrons through a global analysis at next-to-leading order (NLO) in QCD. Our analysis incorporates a wide range of precision measurements from the LHC, as well as data from electron-positron collisions and semi-inclusive deep inelastic scatterings. By including measurements of jet fragmentation at the LHC in our global analysis, we are able to impose strong constraints on the gluon FFs. A careful selection of hadron kinematics is applied to ensure the validity of factorization and perturbative calculations of QCD. In addition, we introduce several methodological advances in fitting, resulting in a flexible parametrization form and the inclusion of theoretical uncertainties from perturbative calculations. Our best-fit predictions show very good agreement with the global data, with $χかい^2/N_{pt}\sim 0.90$. We also generate a large number of Hessian error sets to estimate uncertainties and correlations of the extracted FFs. FFs to charged pions (kaons and protons) are well constrained for momentum fractions down to 0.01 (0.1). Total momentum of partons carried by light charged hadrons are determined precisely. Their values for $u$, $d$ quarks and gluon saturate at about 50% for a lower cut of the momentum fraction of 0.01. Pulls from individual datasets and impact of various choices of the analysis are also studied in detail.
Additionally, we present an update of the FMNLO program used for calculating hadron production cross sections. Our FFs, including the error sets (denoted as NPC23), are publicly available in the form of LHAPDF6 grids.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 44 pages, 45 figures

arXiv:2407.03038 [pdf, other]

On the Client Preference of LLM Fine-tuning in Federated Learning

Authors: Feijie Wu, Xiaoze Liu, Haoyu Wang, Xingchen Wang, Jing Gao

Abstract: Reinforcement learning with human feedback (RLHF) fine-tunes a pretrained large language model (LLM) using preference datasets, enabling the LLM to generate outputs that align with human preferences. Given the sensitive nature of these preference datasets held by various clients, there is a need to implement RLHF within a federated learning (FL) framework, where clients are reluctant to share their data due to privacy concerns. To address this, we introduce a feasible framework in which clients collaboratively train a binary selector with their preference datasets using our proposed FedBis. With a well-trained selector, we can further enhance the LLM that generates human-preferred completions. Meanwhile, we propose a novel algorithm, FedBiscuit, that trains multiple selectors by organizing clients into balanced and disjoint clusters based on their preferences. Compared to FedBis, FedBiscuit demonstrates superior performance in simulating human preferences for pairwise completions. Our extensive experiments on federated human preference datasets -- marking the first benchmark to address heterogeneous data partitioning among clients -- demonstrate that FedBiscuit outperforms FedBis and even surpasses traditional centralized training.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Work in progress

arXiv:2407.01875 [pdf, ps, other]

Spatio-Temporal Graphical Counterfactuals: An Overview

Authors: Mingyu Kang, Duxin Chen, Ziyuan Pu, Jianxi Gao, Wenwu Yu

Abstract: Counterfactual thinking is a critical yet challenging topic for artificial intelligence to learn knowledge from data and ultimately improve its performance in new scenarios. Many approaches, including the Potential Outcome Model and the Structural Causal Model, have been proposed to realize it. However, their modeling, theoretical foundations, and application approaches usually differ. Moreover, there is a lack of graphical approaches for inferring spatio-temporal counterfactuals that consider spatial and temporal interactions between multiple units. Thus, in this work, we survey, compare, and discuss different counterfactual models, theories, and approaches, and further build a unified graphical causal framework to infer spatio-temporal counterfactuals.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01414 [pdf, other]

StyleShot: A Snapshot on Any Style

Authors: Junyao Gao, Yanchen Liu, Yanan Sun, Yinhao Tang, Yanhong Zeng, Kai Chen, Cairong Zhao

Abstract: In this paper, we show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning. We achieve this through constructing a style-aware encoder and a well-organized style dataset called StyleGallery. With a dedicated design for style learning, this style-aware encoder is trained to extract expressive style representation with a decoupling training strategy, and StyleGallery enables the generalization ability. We further employ a content-fusion encoder to enhance image-driven style transfer. We highlight that our approach, named StyleShot, is simple yet effective in mimicking various desired styles, i.e., 3D, flat, abstract or even fine-grained styles, without test-time tuning. Rigorous experiments validate that StyleShot achieves superior performance across a wide range of styles compared to existing state-of-the-art methods. The project page is available at: https://styleshot.github.io/.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Project page: https://styleshot.github.io/

Showing 1–50 of 2,154 results for author: Gao, J