Search | arXiv e-print repository

Search for the lepton-flavor violating decay $B^0_s\to\phi\mu^\pm\tau^\mp$

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1062 additional authors not shown)

Abstract: A search for the lepton-flavor violating decays $B^0_s\to\phi\mu^\pm\tau^\mp$ is presented, using a sample of proton-proton collisions at center-of-mass energies of 7, 8, and 13 TeV, collected with the LHCb detector and corresponding to a total integrated luminosity of $9\,\text{fb}^{-1}$. The $\tau$ leptons are selected using decays with three charged pions. No significant excess is observed, and an upper limit on the branching fraction is determined to be ${\cal B}( B^0_s\to\phi\mu^\pm\tau^\mp) < 1.0\times 10^{-5}$ at 90% confidence level.

Submitted 21 May, 2024; originally announced May 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-006.html (LHCb public pages)

Report number: LHCb-PAPER-2024-006, CERN-EP-2024-114

arXiv:2405.12688 [pdf, other]

Study of $b$-hadron decays to $\Lambda_c^+ h^- h^{\prime -}$ final states

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1072 additional authors not shown)

Abstract: Decays of $\Xi_b^-$ and $\Omega_b^-$ baryons to $\Lambda_c^+ h^- h^{\prime -}$ final states, with $h^- h^{\prime -}$ being $\pi^-\pi^-$, $K^-\pi^-$ and $K^-K^-$ meson pairs, are searched for using data collected with the LHCb detector. The data sample studied corresponds to an integrated luminosity of $8.7\,\mathrm{fb}^{-1}$ of $pp$ collisions collected at centre-of-mass energies $\sqrt{s} = 7$, $8$ and $13\,\mathrm{TeV}$. The products of the relative branching fractions and fragmentation fractions for each signal mode, relative to the $B^- \to \Lambda_c^+ \overline{p} \pi^-$ mode, are measured, with $\Xi_{b}^- \to\Lambda_{c}^+ K^- \pi^-$, $\Xi_{b}^- \to\Lambda_{c}^+ K^- K^-$ and $\Omega_{b}^- \to\Lambda_{c}^+ K^- K^-$ decays being observed at over $5\,\sigma$ significance. The $\Xi_{b}^- \to\Lambda_{c}^+ K^- \pi^-$ mode is also used to measure the $\Xi_{b}^-$ production asymmetry, which is found to be consistent with zero. In addition, the $B^- \to \Lambda_{c}^+ \overline{p} K^-$ decay is observed for the first time, and its branching fraction is measured relative to that of the $B^- \to \Lambda_{c}^+ \overline{p} \pi^-$ mode.

Submitted 22 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-013.html

Report number: CERN-EP-2024-116, LHCb-PAPER-2024-013

arXiv:2405.11324 [pdf, other]

Transverse polarization measurement of $\Lambda$ hyperons in $p$Ne collisions at $\sqrt{s_{NN}}$ = 68.4 GeV with the LHCb detector

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1065 additional authors not shown)

Abstract: A measurement of the transverse polarization of the $\Lambda$ and $\bar\Lambda$ hyperons in $p$Ne fixed-target collisions at $\sqrt{s_{NN}}$ = 68.4 GeV is presented using data collected by the LHCb detector. The polarization is studied using the decay $\Lambda\rightarrow p \pi^-$ together with its charge-conjugated process. The integrated values measured are $$ P_\Lambda = 0.029 \pm 0.019 \, (\rm{stat}) \pm 0.012 \, (\rm{syst}) \, , $$ $$ P_{\bar\Lambda} = 0.003 \pm 0.023 \, (\rm{stat}) \pm 0.014 \,(\rm{syst}) \,. $$ Furthermore, the results are shown as a function of the Feynman $x$ variable, transverse momentum, pseudorapidity and rapidity of the hyperons, and are compared with previous measurements.

Submitted 24 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/3120 (LHCb public pages)

Report number: CERN-EP-2024-121, LHCb-PAPER-2024-009

arXiv:2405.10037 [pdf, other]

Bilateral Event Mining and Complementary for Event Stream Super-Resolution

Authors: Zhilin Huang, Quanmin Liang, Yijie Yu, Chujun Qin, Xiawu Zheng, Kai Huang, Zikun Zhou, Wenming Yang

Abstract: Event Stream Super-Resolution (ESR) aims to address the challenge of insufficient spatial resolution in event streams, which holds great significance for the application of event cameras in complex scenarios. Previous works for ESR often process positive and negative events in a mixed paradigm. This paradigm limits their ability to effectively model the unique characteristics of each event and mutually refine each other by considering their correlations. In this paper, we propose a bilateral event mining and complementary network (BMCNet) to fully leverage the potential of each event and capture the shared information to complement each other simultaneously. Specifically, we resort to a two-stream network to accomplish comprehensive mining of each type of event individually. To facilitate the exchange of information between two streams, we propose a bilateral information exchange (BIE) module. This module is embedded layer-wise between the two streams, enabling the effective propagation of hierarchical global information while alleviating the impact of invalid information brought by inherent characteristics of events. The experimental results demonstrate that our approach outperforms the previous state-of-the-art methods in ESR, achieving performance improvements of over 11% on both real and synthetic datasets. Moreover, our method significantly enhances the performance of event-based downstream tasks such as object recognition and video reconstruction. Our code is available at https://github.com/Lqm26/BMCNet-ESR.

Submitted 16 May, 2024; originally announced May 2024.

Comments: Accepted to CVPR2024

arXiv:2405.09738 [pdf, other]

X-ray Cool Core Remnants Heated by Strong Radio AGN Feedback

Authors: Wenhao Liu, Ming Sun, G. Mark Voit, Dharam Vir Lal, Paul Nulsen, Massimo Gaspari, Craig Sarazin, Steven Ehlert, Xianzhong Zheng

Abstract: Strong AGN heating provides an alternative means for the disruption of cluster cool cores (CCs) to cluster mergers. In this work we present a systematic Chandra study of a sample of 108 nearby ($z<0.1$) galaxy clusters, to investigate the effect of AGN heating on CCs. About 40% of clusters with small offsets between the BCG and the X-ray centre ($\le50$ kpc) have small CCs. For comparison, 14 of 17 clusters with large offsets have small CCs, which suggests that mergers or sloshing can be efficient in reducing the CC size. Relaxed, small CC clusters generally have weak radio AGNs ($P_{1.4\rm GHz}<10^{23}$ W Hz$^{-1}$), and they show a lack of systems hosting a radio AGN with intermediate radio power ($2\times10^{23}<P_{1.4\rm GHz}<2\times10^{24}$ W Hz$^{-1}$). We found that the strongest circumnuclear ($<1$ kpc) X-ray emission only exists in clusters with strong radio AGN. The duty cycle of relaxed, small CC clusters is less than half of that for large CC clusters. This suggests that the radio activity of BCGs is affected by the properties of the surrounding gas beyond the central $\sim10$ kpc, and strong radio AGNs in small X-ray CCs fade more rapidly than those embedded in large X-ray CCs. A scenario is also presented for the transition of large CCs and coronae due to radio AGN feedback. We also present a detailed analysis of galaxy cluster 3C 129.1 as an example of a CC remnant possibly disrupted by radio AGN.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 17 pages, 10 figures, Accepted for publication in MNRAS

arXiv:2405.09308 [pdf, other]

TimeX++: Learning Time-Series Explanations with Information Bottleneck

Authors: Zichuan Liu, Tianchun Wang, Jimeng Shi, Xu Zheng, Zhuomin Chen, Lei Song, Wenqian Dong, Jayantha Obeysekera, Farhad Shirani, Dongsheng Luo

Abstract: Explaining deep learning models operating on time series data is crucial in various applications of interest which require interpretable and transparent insights from time series signals. In this work, we investigate this problem from an information theoretic perspective and show that most existing measures of explainability may suffer from trivial solutions and distributional shift issues. To address these issues, we introduce a simple yet practical objective function for time series explainable learning. The design of the objective function builds upon the principle of information bottleneck (IB), and modifies the IB objective function to avoid trivial solutions and distributional shift issues. We further present TimeX++, a novel explanation framework that leverages a parametric network to produce explanation-embedded instances that are both in-distributed and label-preserving. We evaluate TimeX++ on both synthetic and real-world datasets comparing its performance against leading baselines, and validate its practical efficacy through case studies in a real-world environmental application. Quantitative and qualitative evaluations show that TimeX++ outperforms baselines across all datasets, demonstrating a substantial improvement in explanation quality for time series data. The source code is available at https://github.com/zichuan-liu/TimeXplusplus.

Submitted 15 May, 2024; originally announced May 2024.

Comments: Accepted by International Conference on Machine Learning (ICML 2024)

arXiv:2405.08748 [pdf, other]

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Authors: Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, et al. (20 additional authors not shown)

Abstract: We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images. Finally, Hunyuan-DiT can perform multi-turn multimodal dialogue with users, generating and refining images according to the context. Through our holistic human evaluation protocol with more than 50 professional human evaluators, Hunyuan-DiT sets a new state-of-the-art in Chinese-to-image generation compared with other open-source models. Code and pretrained models are publicly available at github.com/Tencent/HunyuanDiT

Submitted 14 May, 2024; originally announced May 2024.

Comments: Project Page: https://dit.hunyuan.tencent.com/

arXiv:2405.06939 [pdf, other]

Tests for principal eigenvalues and eigenvectors

Authors: Jianqing Fan, Yingying Li, Ningning Xia, Xinghua Zheng

Abstract: We establish central limit theorems for principal eigenvalues and eigenvectors under a large factor model setting, and develop two-sample tests of both principal eigenvalues and principal eigenvectors. One important application is to detect structural breaks in large factor models. Compared with existing methods for detecting structural breaks, our tests provide unique insights into the source of structural breaks because they can distinguish between individual principal eigenvalues and/or eigenvectors. We demonstrate the application by comparing the principal eigenvalues and principal eigenvectors of S&P500 Index constituents' daily returns over different years.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.06556 [pdf, other]

Search for time-dependent $CP$ violation in $D^0 \rightarrow \pi^+ \pi^- \pi^0$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1062 additional authors not shown)

Abstract: A measurement of time-dependent $CP$ violation in $D^0 \rightarrow \pi^+ \pi^- \pi^0$ decays using a $pp$ collision data sample collected by the LHCb experiment in 2012 and from 2015 to 2018, corresponding to an integrated luminosity of 7.7$\,\mathrm{fb}^{-1}$, is presented. The initial flavour of each $D^0$ candidate is determined from the charge of the pion produced in the $D^*(2010)^+ \rightarrow D^0 \pi^+$ decay. The decay $D^0 \rightarrow K^- \pi^+ \pi^0$ is used as a control channel to validate the measurement procedure. The gradient of the time-dependent $CP$ asymmetry, $\Delta Y$, in $D^0 \rightarrow \pi^+ \pi^- \pi^0$ decays is measured to be \begin{equation*} \Delta Y = (-1.3 \pm 6.3 \pm 2.4) \times 10^{-4}, \end{equation*} where the first uncertainty is statistical and the second is systematic, which is compatible with $CP$ conservation.

Submitted 10 May, 2024; originally announced May 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://lhcbproject.web.cern.ch/Publications/p/LHCb-PAPER-2024-003.html (LHCb public pages)

Report number: LHCb-PAPER-2024-003, CERN-EP-2024-111

arXiv:2405.06125 [pdf]

Cooperative Route Guidance and Flow Control for Mixed Road Networks Comprising Expressway and Arterial Network

Authors: Yunran Di, Haotian Shi, Weihua Zhang, Heng Ding, Xiaoyan Zheng, Bin Ran

Abstract: Facing the congestion challenges of mixed road networks comprising expressways and arterial road networks, traditional control solutions fall short. To effectively alleviate traffic congestion in mixed road networks, it is crucial to clarify the interaction between expressways and arterial networks and achieve orderly coordination between them. This study employs the multi-class cell transmission model (CTM) combined with the macroscopic fundamental diagram (MFD) to model the traffic dynamics of expressway systems and arterial subregions, enabling vehicle path tracking across these two systems. Consequently, a comprehensive traffic transmission model suitable for mixed road networks has been integrated. Utilizing the SUMO software, a simulation platform for the mixed road network is established, and the average trip lengths within the model have been calibrated. Based on the proposed traffic model, this study constructs a route guidance model for mixed road networks and develops an integrated model predictive control (MPC) strategy that merges route guidance, perimeter control, and ramp metering to address the challenges of mixed road networks' traffic flow control. A case study of a scenario in which a bidirectional expressway connects two subregions is conducted, and the results validate the effectiveness of the proposed cooperative guidance and control (CGC) method in reducing overall congestion in mixed road networks.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05741 [pdf, ps, other]

Can large language models understand uncommon meanings of common words?

Authors: Jinyang Wu, Feihu Che, Xinxin Zheng, Shuai Zhang, Ruihan Jin, Shuai Nie, Pengpeng Shao, Jianhua Tao

Abstract: Large language models (LLMs) like ChatGPT have shown significant advancements across diverse natural language understanding (NLU) tasks, including intelligent dialogue and autonomous agents. Yet, lacking widely acknowledged testing mechanisms, the question of whether LLMs are stochastic parrots or genuinely comprehend the world remains unresolved, fostering numerous studies and sparking heated debates. Prevailing research mainly focuses on surface-level NLU, neglecting fine-grained explorations. However, such explorations are crucial for understanding their unique comprehension mechanisms, aligning with human cognition, and finally enhancing LLMs' general NLU capacities. To address this gap, our study delves into LLMs' nuanced semantic comprehension capabilities, particularly regarding common words with uncommon meanings. The idea stems from foundational principles of human communication within psychology, which underscore accurate shared understandings of word semantics. Specifically, this paper presents the innovative construction of a Lexical Semantic Comprehension (LeSC) dataset with novel evaluation metrics, the first benchmark encompassing both fine-grained and cross-lingual dimensions. Evaluating both open-source and closed-source models of varied scales and architectures, our extensive empirical experiments demonstrate the inferior performance of existing models in this basic lexical-meaning understanding task. Notably, even the state-of-the-art LLMs GPT-4 and GPT-3.5 lag behind 16-year-old humans by 3.9% and 22.3%, respectively. Additionally, multiple advanced prompting techniques and retrieval-augmented generation are also introduced to help alleviate this problem, yet limitations persist. By highlighting the above critical shortcomings, this research motivates further investigation and offers novel insights for developing more intelligent LLMs.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.04264 [pdf, other]

doi 10.3847/2041-8213/ad47e0

Morphological Evidence for the eROSITA Bubbles Being Giant and Distant Structures

Authors: Teng Liu, Andrea Merloni, Jeremy Sanders, Gabriele Ponti, Andrew Strong, Michael Yeung, Nicola Locatelli, Peter Predehl, Xueying Zheng, Manami Sasaki, Michael Freyberg, Konrad Dennerl, Werner Becker, Kirpal Nandra, Martin Mayer, Johannes Buchner

Abstract: There are two contradictory views of the eROSITA bubbles: either a 10 kpc-scale pair of giant bubbles blown by the Galactic center (GC), or a 100 pc-scale local structure coincidentally located in the direction of the GC. A key element of this controversy is the distance to the bubbles. Based on the 3D dust distribution in the Galactic plane, we found three isolated, distant (500-800 pc) clouds at intermediate Galactic latitudes. Their projected morphologies perfectly match the X-ray shadows on the defining features of the north eROSITA bubble, i.e., the North Polar Spur (NPS) and the Lotus Petal Cloud (LPC), indicating that both the NPS and LPC are distant with a distance lower limit of nearly 1 kpc. In the X-ray dark region between the NPS and LPC, we found a few polarized radio arcs and attributed them to the bubble's shock front. These arcs match up perfectly with the outer border of the NPS and LPC and provide a way to define the bubble's border. The border defined in this way can be well described by the line-of-sight tangent of a 3D skewed cup model rooted in the GC. We conclude that, instead of being two independent, distant features, NPS and LPC compose a single, giant bubble, which, therefore, is most plausibly a 10-kpc scale bubble rooted at the GC.

Submitted 7 May, 2024; originally announced May 2024.

Comments: Accepted for publication in ApJL. 11 pages, 5 figures

arXiv:2405.03327 [pdf, other]

Clustering of Disease Trajectories with Explainable Machine Learning: A Case Study on Postoperative Delirium Phenotypes

Authors: Xiaochen Zheng, Manuel Schürch, Xingyu Chen, Maria Angeliki Komninou, Reto Schüpbach, Ahmed Allam, Jan Bartussek, Michael Krauthammer

Abstract: The identification of phenotypes within complex diseases or syndromes is a fundamental component of precision medicine, which aims to adapt healthcare to individual patient characteristics. Postoperative delirium (POD) is a complex neuropsychiatric condition with significant heterogeneity in its clinical manifestations and underlying pathophysiology. We hypothesize that POD comprises several distinct phenotypes, which cannot be directly observed in clinical practice. Identifying these phenotypes could enhance our understanding of POD pathogenesis and facilitate the development of targeted prevention and treatment strategies. In this paper, we propose an approach that combines supervised machine learning for personalized POD risk prediction with unsupervised clustering techniques to uncover potential POD phenotypes. We first demonstrate our approach using synthetic data, where we simulate patient cohorts with predefined phenotypes based on distinct sets of informative features. We aim to mimic any clinical disease with our synthetic data generation method. By training a predictive model and applying SHAP, we show that clustering patients in the SHAP feature importance space successfully recovers the true underlying phenotypes, outperforming clustering in the raw feature space. We then present a case study using real-world data from a cohort of elderly surgical patients. The results showcase the utility of our approach in uncovering clinically relevant subtypes of complex disorders like POD, paving the way for more precise and personalized treatment strategies.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.03252 [pdf, other]

A Universal List Decoding Algorithm with Application to Decoding of Polar Codes

Authors: Xiangping Zheng, Xiao Ma

Abstract: This paper is concerned with a guessing codeword decoding (GCD) of linear block codes. Compared with the guessing noise decoding (GND), which is only efficient for high-rate codes, the GCD is efficient for not only high-rate codes but also low-rate codes. We prove that the GCD typically requires fewer queries than the GND. Compared with the ordered statistics decoding (OSD), the GCD does not require the online Gaussian elimination (GE). In addition to limiting the maximum number of searches, we suggest limiting the radius of searches in terms of soft weights or tolerated performance loss to further reduce the decoding complexity, resulting in the so-called truncated GCD. The performance gap between the truncated GCD and the optimal decoding can be upper bounded approximately by the saddlepoint approach or other numerical approaches. The derived upper bound captures the relationship between the performance and the decoding parameters, enabling us to balance the performance and the complexity by optimizing the decoding parameters of the truncated GCD. We also introduce a parallel implementation of the (truncated) GCD algorithm to reduce decoding latency without compromising performance. Another contribution of this paper is the application of the GCD to the polar codes. We propose a multiple-bit-wise decoding algorithm over a pruned tree for the polar codes, referred to as the successive-cancellation list (SCL) decoding algorithm by GCD. First, we present a strategy for pruning the conventional polar decoding tree based on the complexity analysis rather than the specific bit patterns. Then we apply the GCD algorithm in parallel aided by the early stopping criteria to the leaves of the pruned tree. Simulation results show that, without any performance loss as justified by analysis, the proposed decoding algorithm can significantly reduce the decoding latency of the polar codes.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 47 pages, 24 figures

arXiv:2405.02764 [pdf, other]

Assessing Adversarial Robustness of Large Language Models: An Empirical Study

Authors: Zeyu Yang, Zhao Meng, Xiaochen Zheng, Roger Wattenhofer

Abstract: Large Language Models (LLMs) have revolutionized natural language processing, but their robustness against adversarial attacks remains a critical concern. We present a novel white-box style attack approach that exposes vulnerabilities in leading open-source LLMs, including Llama, OPT, and T5. We assess the impact of model size, structure, and fine-tuning strategies on their resistance to adversarial perturbations. Our comprehensive evaluation across five diverse text classification tasks establishes a new benchmark for LLM robustness. The findings of this study have far-reaching implications for the reliable deployment of LLMs in real-world applications and contribute to the advancement of trustworthy AI systems.

Submitted 4 May, 2024; originally announced May 2024.

Comments: 16 pages, 9 figures, 10 tables

arXiv:2405.00587 [pdf, other]

GraCo: Granularity-Controllable Interactive Segmentation

Authors: Yian Zhao, Kehan Li, Zesen Cheng, Pengchong Qiao, Xiawu Zheng, Rongrong Ji, Chang Liu, Li Yuan, Jie Chen

Abstract: Interactive Segmentation (IS) segments specific objects or parts in the image according to user input. Current IS pipelines fall into two categories: single-granularity output and multi-granularity output. The latter aims to alleviate the spatial ambiguity present in the former. However, the multi-granularity output pipeline suffers from limited interaction flexibility and produces redundant results. In this work, we introduce Granularity-Controllable Interactive Segmentation (GraCo), a novel approach that allows precise control of prediction granularity by introducing additional parameters to input. This enhances the customization of the interactive system and eliminates redundancy while resolving ambiguity. Nevertheless, the exorbitant cost of annotating multi-granularity masks and the lack of available datasets with granularity annotations make it difficult for models to acquire the necessary guidance to control output granularity. To address this problem, we design an any-granularity mask generator that exploits the semantic property of the pre-trained IS model to automatically generate abundant mask-granularity pairs without requiring additional manual annotation. Based on these pairs, we propose a granularity-controllable learning strategy that efficiently imparts the granularity controllability to the IS model. Extensive experiments on intricate scenarios at object and part levels demonstrate that our GraCo has significant advantages over previous methods. This highlights the potential of GraCo to be a flexible annotation tool, capable of adapting to diverse segmentation scenarios. The project page: https://zhao-yian.github.io/GraCo.

Submitted 16 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: CVPR2024 Highlight, Project: https://zhao-yian.github.io/GraCo

arXiv:2405.00098 [pdf, other]

Amplitude analysis and branching fraction measurement of $B^{+}\to D^{*-}D^{+}_{s}πぱい^{+}$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1057 additional authors not shown)

Abstract: The decays of the $B^{+}$ meson to the final state $D^{*-}D^{+}_{s}πぱい^{+}$ are studied in proton-proton collision data collected with the LHCb detector at centre-of-mass energies of 7, 8, and 13 TeV, corresponding to a total integrated luminosity of 9 fb$^{-1}$. The ratio of branching fractions of the $B^{+}\to D^{*-}D^{+}_{s}πぱい^{+}$ and $B^{0}\to D^{*-}D^{+}_{s}$ decays is measured to be $0.173\pm 0.006\pm 0.010$, where the first uncertainty is statistical and the second is systematic. Using partially reconstructed $D^{*+}_{s}\to D^{+}_{s}γがんま$ and $D^{+}_{s}πぱい^{0}$ decays, the ratio of branching fractions between the $B^{+}\to D^{*-}D^{*+}_{s}πぱい^{+}$ and $B^{+}\to D^{*-}D^{+}_{s}πぱい^{+}$ decays is determined as $1.31\pm 0.07\pm 0.14$. An amplitude analysis of the $B^{+}\to D^{*-}D^{+}_{s}πぱい^{+}$ decay is performed for the first time, revealing dominant contributions from known excited charm resonances decaying to the $D^{*-}πぱい^{+}$ final state. No significant evidence of exotic contributions in the $D^{+}_{s}πぱい^{+}$ or $D^{*-}D^{+}_{s}$ channels is found. The fit fraction of the scalar state $T_{c\bar{s} 0}^{\ast}(2900)^{++}$ observed in the $B^{+}\to D^{-}D^{+}_{s}πぱい^{+}$ decay is determined to be less than 2.3% at a 90% confidence level.

Submitted 30 April, 2024; originally announced May 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-001.html (LHCb public pages)

Report number: LHCb-PAPER-2024-001, CERN-EP-2024-110

arXiv:2404.19510 [pdf, other]

First observation of $Λらむだ_{b}^{0} \rightarrow Σしぐま_c^{(*)++} D^{(*)-} K^{-}$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1067 additional authors not shown)

Abstract: The four decays, $Λらむだ_{b}^{0} \rightarrow Σしぐま_c^{(*)++} D^{(*)-} K^{-}$, are observed for the first time using proton-proton collision data collected with the LHCb detector at a centre-of-mass energy of $13\,\rm{TeV}$, corresponding to an integrated luminosity of $6\,\rm{fb}^{-1}$. By considering the $Λらむだ_b^0 \rightarrow Λらむだ_c^{+} \overline{D}^0 K^{-}$ decay as reference channel, the following branching fraction ratios are measured to be, $$\frac{\cal{B} (Λらむだ_{b}^{0} \rightarrow Σしぐま_{c}^{++} \rm{D}^{-} {K}^{-})}{\cal{B}(Λらむだ_{b}^{0} \rightarrow Λらむだ_c^{+} \rm \overline{D}^0 {K}^{-})} = {0.282}\pm{0.016}\pm{0.016}\pm{0.005}, \frac{\cal{B}(Λらむだ_{b}^{0} \rightarrow Σしぐま_{c}^{*++} \rm {D}^{-} {K}^{-})}{\cal{B}(Λらむだ_{b}^{0} \rightarrow Σしぐま_c^{++} \rm {D}^{-} {K}^{-})} = {0.460}\pm{0.052}\pm{0.028}, \frac{\cal{B}(Λらむだ_{b}^{0} \rightarrow Σしぐま_{c}^{++} \rm {D}^{*-} {K}^{-})}{\cal{B}(Λらむだ_{b}^{0} \rightarrow Σしぐま_c^{++} \rm {D}^{-} {K}^{-})} = {2.261}\pm{0.202}\pm{0.129}\pm{0.046}, \frac{\cal{B}(Λらむだ_{b}^{0} \rightarrow Σしぐま_{c}^{*++} \rm D^{*-} K^{-})}{\cal{B}(Λらむだ_{b}^{0} \rightarrow Σしぐま_c^{++} \rm D^{-} K^{-})} = {0.896}\pm{0.137}\pm{0.066}\pm{0.018},$$ where the first uncertainties are statistical, the second are systematic, and the third are due to uncertainties in the branching fractions of intermediate particle decays. These initial observations mark the beginning of pentaquark searches in these modes, with more data set to become available following the LHCb upgrade.

Submitted 11 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-044.html (LHCb public pages)

Report number: LHCb-PAPER-2023-044, CERN-EP-2024-098

arXiv:2404.19242 [pdf, other]

A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems

Authors: Xin Ma, Puchen Zhu, Xiao Li, Xiaoyin Zheng, Jianshu Zhou, Xuchen Wang, Kwok Wai Samuel Au

Abstract: Depth position highly affects lens distortion, especially in close-range photography, which limits the measurement accuracy of existing stereo vision systems. Moreover, traditional depth-dependent distortion models and their calibration methods have remained complicated. In this work, we propose a minimal set of parameters based depth-dependent distortion model (MDM), which considers the radial and decentering distortions of the lens to improve the accuracy of stereo vision systems and simplify their calibration process. In addition, we present an easy and flexible calibration method for the MDM of stereo vision systems with a commonly used planar pattern, which requires cameras to observe the planar pattern in different orientations. The proposed technique is easy to use and flexible compared with classical calibration techniques for depth-dependent distortion models in which the lens must be perpendicular to the planar pattern. The experimental validation of the MDM and its calibration method showed that the MDM improved the calibration accuracy by 56.55% and 74.15% compared with Li's distortion model and the traditional Brown's distortion model, respectively. Besides, an iteration-based reconstruction method is proposed to iteratively estimate the depth information in the MDM during three-dimensional reconstruction. The results showed that the accuracy of the iteration-based reconstruction method was improved by 9.08% compared with that of the non-iteration reconstruction method.

Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: This paper has been accepted for publication in IEEE Transactions on Instrumentation and Measurement

arXiv:2404.17164 [pdf, other]

DPGAN: A Dual-Path Generative Adversarial Network for Missing Data Imputation in Graphs

Authors: Xindi Zheng, Yuwei Wu, Yu Pan, Wanyu Lin, Lei Ma, Jianjun Zhao

Abstract: Missing data imputation poses a paramount challenge when dealing with graph data. Prior works typically are based on feature propagation or graph autoencoders to address this issue. However, these methods usually encounter the over-smoothing issue when dealing with missing data, as the graph neural network (GNN) modules are not explicitly designed for handling missing data. This paper proposes a novel framework, called Dual-Path Generative Adversarial Network (DPGAN), that can deal simultaneously with missing data and avoid over-smoothing problems. The crux of our work is that it admits both global and local representations of the input graph signal, which can capture the long-range dependencies. It is realized via our proposed generator, consisting of two key components, i.e., MLPUNet++ and GraphUNet++. Our generator is trained with a designated discriminator via an adversarial process. In particular, to avoid assessing the entire graph as done in the literature, our discriminator focuses on the local subgraph fidelity, thereby boosting the quality of the local imputation. The subgraph size is adjustable, allowing for control over the intensity of adversarial regularization. Comprehensive experiments across various benchmark datasets substantiate that DPGAN consistently rivals, if not outperforms, existing state-of-the-art imputation algorithms. The code is provided at https://github.com/momoxia/DPGAN.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 9 pages

arXiv:2404.16686 [pdf, other]

Calibrating non-parametric morphological indicators from JWST images for galaxies over $0.5<z<3$

Authors: Jian Ren, F. S. Liu, Nan Li, Qifan Cui, Pinsong Zhao, Yubin Li, Qi Song, Hassen M. Yesuf, Xian Zhong Zheng

Abstract: The measurements of morphological indicators of galaxies are often influenced by a series of observational effects. In this study, we utilize a sample of over 800 TNG50 simulated galaxies with log($M_*$/M$_\odot$)$>9$ at $0.5<z<3$ to investigate the differences in non-parametric morphological indicators ($C$, $S$, $Gini$, $M_{\rm 20}$, $A_{\rm O}$, and $D_{\rm O}$) derived from noise-free and high-resolution TNG50 images and mock images simulated to have the same observational conditions as JWST/NIRCam. We quantify the relationship between intrinsic and observed values of the morphological indicators and accordingly apply this calibration to over 4600 galaxies in the same stellar mass and redshift ranges observed in JWST CEERS and JADES surveys. We find a significant evolution of morphological indicators with rest-frame wavelength ($λらむだ_{\rm rf}$) at $λらむだ_{\rm rf}<1$\,$μみゅー$m, while essentially no obvious variations occur at $λらむだ_{\rm rf}>1$\,$μみゅー$m. The morphological indicators of star-forming galaxies (SFGs) and quiescent galaxies (QGs) are significantly different. The morphologies of QGs exhibit a higher sensitivity to rest-frame wavelength than SFGs. After analyzing the evolution of morphological indicators in the rest-frame V-band (0.5-0.7\,$μみゅー$m) and rest-frame J-band (1.1-1.4\,$μみゅー$m), we find that the morphologies of QGs evolve substantially with both redshift and stellar mass. For SFGs, the $C$, $Gini$ and $M_{\rm 20}$ show a rapid evolution with stellar mass at log($M_*$/M$_\odot$)$\geq10.5$, while the $A_{\rm O}$, $D_{\rm O}$ and $A$ evolve with both redshift and stellar mass. Our comparison shows that TNG50 simulations effectively reproduce the morphological indicators we measured from JWST observations when the impact of dust attenuation is considered.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 21 pages, 14 figures, 1 table. Accepted for publication in ApJ

arXiv:2404.16501 [pdf, other]

360SFUDA++: Towards Source-free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes

Authors: Xu Zheng, Pengyuan Zhou, Athanasios V. Vasilakos, Lin Wang

Abstract: In this paper, we address the challenging source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation, given only a pinhole image pre-trained model (i.e., source) and unlabeled panoramic images (i.e., target). Tackling this problem is non-trivial due to three critical challenges: 1) semantic mismatches from the distinct Field-of-View (FoV) between domains, 2) style discrepancies inherent in the UDA problem, and 3) inevitable distortion of the panoramic images. To tackle these problems, we propose 360SFUDA++ that effectively extracts knowledge from the source pinhole model with only unlabeled panoramic images and transfers the reliable knowledge to the target panoramic domain. Specifically, we first utilize Tangent Projection (TP) as it has less distortion and meanwhile split the equirectangular projection (ERP) into patches with fixed FoV projection (FFP) to mimic the pinhole images. Both projections are shown effective in extracting knowledge from the source model. However, as the distinct projections make it less possible to directly transfer knowledge between domains, we then propose Reliable Panoramic Prototype Adaptation Module (RP$^2$AM) to transfer knowledge at both prediction and prototype levels. RP$^2$AM selects the confident knowledge and integrates panoramic prototypes for reliable knowledge adaptation. Moreover, we introduce Cross-projection Dual Attention Module (CDAM), which better aligns the spatial and channel characteristics across projections at the feature level between domains. Both knowledge extraction and transfer processes are synchronously updated to reach the best performance. Extensive experiments on the synthetic and real-world benchmarks, including outdoor and indoor scenarios, demonstrate that our 360SFUDA++ achieves significantly better performance than prior SFUDA methods.

Submitted 25 April, 2024; originally announced April 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2403.12505

arXiv:2404.16235 [pdf, other]

Inclusive studies of two- and three-nucleon short-range correlations in $^3$H and $^3$He

Authors: S. Li, S. N. Santiesteban, J. Arrington, R. Cruz-Torres, L. Kurbany, D. Abrams, S. Alsalmi, D. Androic, K. Aniol, T. Averett, C. Ayerbe Gayoso, J. Bane, S. Barcus, J. Barrow, A. Beck, V. Bellini, H. Bhatt, D. Bhetuwal, D. Biswas, D. Bulumulla, A. Camsonne, J. Castellanos, J. Chen, J-P. Chen, D. Chrisman, et al. (91 additional authors not shown)

Abstract: Inclusive electron scattering at carefully chosen kinematics can isolate scattering from short-range correlations (SRCs), produced through hard, short-distance interactions of nucleons in the nucleus. Because the two-nucleon (2N) SRCs arise from the same N-N interaction in all nuclei, the cross section in the SRC-dominated regime is identical up to an overall scaling factor, and the A/2H cross section ratio is constant in this region. This scaling behavior has been used to identify SRC dominance and to map out the contribution of SRCs for a wide range of nuclei. We examine this scaling behavior at lower momentum transfers using new data on $^2$H, $^3$H, and $^3$He which show that the scaling region is larger than in heavy nuclei. Based on the improved scaling, especially for $^3$H/$^3$He, we examine the ratios at kinematics where three-nucleon SRCs may play an important role. The data for the largest initial nucleon momenta are consistent with isolation of scattering from 3N-SRCs, and suggest that the very-highest momentum nucleons in $^3$He have a nearly isospin-independent momentum configuration, or a small enhancement of the proton distribution.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.16033 [pdf, other]

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Authors: Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji

Abstract: With the advent of large language models (LLMs) enhanced by the chain-of-thought (CoT) methodology, visual reasoning problems are usually decomposed into manageable sub-tasks and tackled sequentially with various external tools. However, such a paradigm faces the challenge of the potential "determining hallucinations" in decision-making due to insufficient visual information and the limitation of low-level perception tools that fail to provide abstract summaries necessary for comprehensive reasoning. We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks. This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models (MLLMs) and their cognitive capability. To this end, we propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture. Cantor first acts as a decision generator and integrates visual inputs to analyze the image and problem, ensuring a closer alignment with the actual context. Furthermore, Cantor leverages the advanced cognitive functions of MLLMs to perform as multifaceted experts for deriving higher-level information, enhancing the CoT generation process. Our extensive experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance across two complex visual reasoning datasets, without necessitating fine-tuning or ground-truth rationales. Project Page: https://ggg0919.github.io/cantor/ .

Submitted 24 April, 2024; originally announced April 2024.

Comments: The project page is available at https://ggg0919.github.io/cantor/

arXiv:2404.15660 [pdf, other]

KS-LLM: Knowledge Selection of Large Language Models with Evidence Document for Question Answering

Authors: Xinxin Zheng, Feihu Che, Jinyang Wu, Shuai Zhang, Shuai Nie, Kang Liu, Jianhua Tao

Abstract: Large language models (LLMs) suffer from the hallucination problem and face significant challenges when applied to knowledge-intensive tasks. A promising approach is to leverage evidence documents as extra supporting knowledge, which can be obtained through retrieval or generation. However, existing methods directly leverage the entire contents of the evidence document, which may introduce noise information and impair the performance of large language models. To tackle this problem, we propose a novel Knowledge Selection of Large Language Models (KS-LLM) method, aiming to identify valuable information from evidence documents. The KS-LLM approach utilizes triples to effectively select knowledge snippets from evidence documents that are beneficial to answering questions. Specifically, we first generate triples based on the input question, then select the evidence sentences most similar to triples from the evidence document, and finally combine the evidence sentences and triples to assist large language models in generating answers. Experimental comparisons on several question answering datasets, such as TriviaQA, WebQ, and NQ, demonstrate that the proposed method surpasses the baselines and achieves the best results.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.14949 [pdf, other]

Multi-Modal Prompt Learning on Blind Image Quality Assessment

Authors: Wensheng Pan, Timin Gao, Yan Zhang, Runze Hu, Xiawu Zheng, Enwei Zhang, Yuting Gao, Yutao Liu, Yunhang Shen, Ke Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

Abstract: Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Currently, leveraging semantic information to enhance IQA is a crucial research direction. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness. However, the generalist nature of these pre-trained Vision-Language (VL) models often renders them suboptimal for IQA-specific tasks. Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings. Existing prompt-based VL models overly focus on incremental semantic information from text, neglecting the rich insights available from visual data analysis. This imbalance limits their performance improvements in IQA tasks. This paper introduces an innovative multi-modal prompt-based methodology for IQA. Our approach employs carefully crafted prompts that synergistically mine incremental semantic information from both visual and linguistic data. Specifically, in the visual branch, we introduce a multi-layer prompt structure to enhance the VL model's adaptability. In the text branch, we deploy a dual-prompt scheme that steers the model to recognize and differentiate between scene category and distortion type, thereby refining the model's capacity to assess image quality. Our experimental findings underscore the effectiveness of our method over existing Blind Image Quality Assessment (BIQA) approaches. Notably, it demonstrates competitive performance across various datasets. Our method achieves Spearman Rank Correlation Coefficient (SRCC) values of 0.961 (surpassing 0.946 in CSIQ) and 0.941 (exceeding 0.930 in KADID), illustrating its robustness and accuracy in diverse contexts.

Submitted 18 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14047 [pdf, other]

How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Authors: Wei Huang, Xudong Ma, Haotong Qin, Xingyu Zheng, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno

Abstract: Meta's LLaMA family has become one of the most powerful open-source Large Language Model (LLM) series. Notably, LLaMA3 models have recently been released and achieve impressive performance across various tasks with super-large scale pre-training on over 15T tokens of data. Given the wide application of low-bit quantization for LLMs in resource-limited scenarios, we explore LLaMA3's capabilities when quantized to low bit-width. This exploration holds the potential to unveil new insights and challenges for low-bit quantization of LLaMA3 and other forthcoming LLMs, especially in addressing the performance degradation problems suffered in LLM compression. Specifically, we evaluate the 10 existing post-training quantization and LoRA-finetuning methods of LLaMA3 on 1-8 bits and diverse datasets to comprehensively reveal LLaMA3's low-bit quantization performance. Our experiment results indicate that LLaMA3 still suffers non-negligible degradation in these scenarios, especially in ultra-low bit-width. This highlights the significant performance gap under low bit-width that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, pushing the LLMs to lower bit-width with higher accuracy for being practical. Our project is released on https://github.com/Macaronlin/LLaMA3-Quantization and quantized LLaMA3 models are released in https://huggingface.co/LLMQ.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13534 [pdf, other]

Motion-aware Latent Diffusion Models for Video Frame Interpolation

Authors: Zhilin Huang, Yijie Yu, Ling Yang, Chujun Qin, Bing Zheng, Xiawu Zheng, Zikun Zhou, Yaowei Wang, Wenming Yang

Abstract: With the advancement of AIGC, video frame interpolation (VFI) has become a crucial component in existing video generation frameworks, attracting widespread research interest. For the VFI task, the motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity. However, existing VFI methods always struggle to accurately predict the motion information between consecutive frames, and this imprecise estimation leads to blurred and visually incoherent interpolated frames. In this paper, we propose a novel diffusion framework, motion-aware latent diffusion models (MADiff), which is specifically designed for the VFI task. By incorporating motion priors between the conditional neighboring frames with the target interpolated frame predicted throughout the diffusion sampling procedure, MADiff progressively refines the intermediate outcomes, culminating in generating both visually smooth and realistic results. Extensive experiments conducted on benchmark datasets demonstrate that our method achieves state-of-the-art performance, significantly outperforming existing approaches, especially under challenging scenarios involving dynamic textures with complex motion.

Submitted 4 June, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

Comments: 17 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2303.09508 by other authors

arXiv:2404.13425 [pdf, other]

AdvLoRA: Adversarial Low-Rank Adaptation of Vision-Language Models

Authors: Yuheng Ji, Yue Liu, Zhicheng Zhang, Zhao Zhang, Yuting Zhao, Gang Zhou, Xingwei Zhang, Xinwang Liu, Xiaolong Zheng

Abstract: Vision-Language Models (VLMs) are a significant technique for Artificial General Intelligence (AGI). With the fast growth of AGI, the security problem has become one of the most important challenges for VLMs. In this paper, through extensive experiments, we demonstrate the vulnerability of the conventional adaptation methods for VLMs, which may bring significant security risks. In addition, as the size of the VLMs increases, performing conventional adversarial adaptation techniques on VLMs results in high computational costs. To solve these problems, we propose a parameter-efficient Adversarial adaptation method named AdvLoRA by Low-Rank Adaptation. First, we investigate and reveal the intrinsic low-rank property during the adversarial adaptation for VLMs. Different from LoRA, we improve the efficiency and robustness of adversarial adaptation by designing a novel reparameterizing method based on parameter clustering and parameter alignment. In addition, an adaptive parameter update strategy is proposed to further improve the robustness. By these settings, our proposed AdvLoRA alleviates the model security and high resource waste problems. Extensive experiments demonstrate the effectiveness and efficiency of the AdvLoRA.

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.11846 [pdf, other]

Directional intense terahertz radiation driven by abruptly autofocusing lasers in air

Authors: Xiao-Ran Zheng, Nan Li, Wei-Min Wang, Rui Zhang, Cun-Lin Zhang, Liang-Liang Zhang

Abstract: Two-color laser induced plasma filamentation in air could serve as tabletop sources of broadband terahertz (THz) pulses. The ubiquity of air on Earth facilitates its widespread use, particularly in wireless communication and remote sensing, exploiting the unique advantage of the air-based scheme that the THz sources can be delivered over standoff distances via the pump laser propagation in air. However, the THz emission pattern inevitably has a conical angular profile with a dip in the propagation axis; therefore, THz energy concentration, propagation directionality, and accuracy of signal demodulation are significantly impaired, greatly limiting its direct applications. Here, we successfully eliminate the unfavorable conical profile by experiments and meanwhile enhance THz directionality and intensity 17-fold by use of abruptly-autofocusing laser beams. Our theory and simulations show that these observations are attributed to efficient suppression of the dephasing effect appearing in previous investigations with ordinary Gaussian laser beams. This scheme is easily accessible since the abruptly-autofocusing beam can be achieved by imposing a spatial light modulator on the input Gaussian beam. This study solves a long-standing problem in the two-color laser scheme and clarifies the underlying physics of the conical angular profile formation.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 22 pages, 4 figures, 1 table

arXiv:2404.11093 [pdf, other]

Neural Network Approach for Non-Markovian Dissipative Dynamics of Many-Body Open Quantum Systems

Authors: Long Cao, Liwei Ge, Daochi Zhang, Xiang Li, Yao Wang, Rui-Xue Xu, YiJing Yan, Xiao Zheng

Abstract: Simulating the dynamics of open quantum systems coupled to non-Markovian environments remains an outstanding challenge due to exponentially scaling computational costs. We present an artificial intelligence strategy to overcome this obstacle by integrating the neural quantum states approach into the dissipaton-embedded quantum master equation in second quantization (DQME-SQ). Our approach utilizes restricted Boltzmann machines (RBMs) to compactly represent the reduced density tensor, explicitly encoding the combined effects of system-environment correlations and non-Markovian memory. Applied to model systems exhibiting prominent effects of system-environment correlation and non-Markovian memory, our approach achieves comparable accuracy to conventional hierarchical equations of motion, while requiring significantly fewer dynamical variables. The novel RBM-based DQME-SQ approach paves the way for investigating non-Markovian open quantum dynamics in previously intractable regimes, with implications spanning various frontiers of modern science.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 7 pages, 5 figures

arXiv:2404.11064 [pdf, other]

Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization

Authors: Yongdong Luo, Haojia Lin, Xiawu Zheng, Yigeng Jiang, Fei Chao, Jie Hu, Guannan Jiang, Songan Zhang, Rongrong Ji

Abstract: 3D Visual Grounding (3DVG) and 3D Dense Captioning (3DDC) are two crucial tasks in various 3D applications, which require both shared and complementary information in localization and visual-language relationships. Therefore, existing approaches adopt the two-stage "detect-then-describe/discriminate" pipeline, which relies heavily on the performance of the detector, resulting in suboptimal performance. Inspired by DETR, we propose a unified framework, 3DGCTR, to jointly solve these two distinct but closely related tasks in an end-to-end fashion. The key idea is to reconsider the prompt-based localization ability of the 3DVG model. In this way, the 3DVG model with a well-designed prompt as input can assist the 3DDC task by extracting localization information from the prompt. In terms of implementation, we integrate a Lightweight Caption Head into the existing 3DVG network with a Caption Text Prompt as a connection, effectively harnessing the existing 3DVG model's inherent localization capacity, thereby boosting 3DDC capability. This integration facilitates simultaneous multi-task training on both tasks, mutually enhancing their performance. Extensive experimental results demonstrate the effectiveness of this approach. Specifically, on the ScanRefer dataset, 3DGCTR surpasses the state-of-the-art 3DDC method by 4.3% in CIDEr@0.5IoU in MLE training and improves upon the SOTA 3DVG method by 3.16% in Acc@0.25IoU.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.10794 [pdf, other]

Constraining on the non-standard cosmological models combining the observations of high-redshift quasars and BAO

Authors: Ziqiang Liu, Tonghua Liu, Xinyi Zhong, Yifei Xu, Xiaogang Zheng

Abstract: In this work, we studied four types of cosmological models with different mechanisms driving the accelerated expansion of the universe, including Braneworld models, Chaplygin Gas models, Emergent Dark Energy models, and cosmological torsion models. Considering that the dynamics of these models at low redshifts are very similar and difficult to distinguish, we used the latest and largest UV and X-ray measurements of quasars (QSOs) covering the redshift range $0.009<z<7.5$. However, given the high intrinsic dispersion of this sample and the degeneracy between cosmological model parameters, we added 2D-BAO and 3D-BAO datasets to help constrain the parameters of these cosmological models. Our results suggest that the standard cold dark matter scenario may not be the best cosmological model preferred by the high-redshift observations. The Generalized Chaplygin Gas (GCG) and cosmological constant plus torsion (named Case II) models perform best by the Akaike Information Criterion (AIC), but $Λ$CDM is the best cosmological model preferred by the Bayesian Information Criterion (BIC).
Our work also supports that the Phenomenologically Emergent Dark Energy and cosmological torsion models may alleviate the Hubble tension: the value of the Hubble constant obtained from the QSO+BAO dataset combination lies between the Planck 2018 observations and local measurements from the SH0ES collaboration, while the other cosmological models all favor a Hubble constant closer to the recent Planck 2018 results, but these models are penalized by the information criteria.

Submitted 14 April, 2024; originally announced April 2024.

Comments: 15 pages, 10 figures, accepted for publication in EPJC. arXiv admin note: substantial text overlap with arXiv:2105.04992 by other authors

Journal ref: EPJC, 84, 444 (2024)

arXiv:2404.09498 [pdf, other]

FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba

Authors: Xinyu Xie, Yawen Cui, Chio-In Ieong, Tao Tan, Xiaozhi Zhang, Xubin Zheng, Zitong Yu

Abstract: Multi-modal image fusion aims to combine information from different modes to create a single image with comprehensive information and detailed textures. However, fusion models based on convolutional neural networks encounter limitations in capturing global image features due to their focus on local convolution operations. Transformer-based models, while excelling in global feature modeling, confront computational challenges stemming from their quadratic complexity. Recently, the Selective Structured State Space Model has exhibited significant potential for long-range dependency modeling with linear complexity, offering a promising avenue to address the aforementioned dilemma. In this paper, we propose FusionMamba, a novel dynamic feature enhancement method for multimodal image fusion with Mamba. Specifically, we devise an improved efficient Mamba model for image fusion, integrating an efficient visual state space model with dynamic convolution and channel attention. This refined model not only upholds the performance of Mamba and global modeling capability but also diminishes channel redundancy while enhancing local enhancement capability. Additionally, we devise a dynamic feature fusion module (DFFM) comprising two dynamic feature enhancement modules (DFEM) and a cross modality fusion mamba module (CMFM). The former serves for dynamic texture enhancement and dynamic difference perception, whereas the latter enhances correlation features between modes and suppresses redundant intermodal information.
FusionMamba has yielded state-of-the-art (SOTA) performance across various multimodal medical image fusion tasks (CT-MRI, PET-MRI, SPECT-MRI), an infrared and visible image fusion task (IR-VIS) and a multimodal biomedical image fusion dataset (GFP-PC), which demonstrates that our model has generalization ability. The code for FusionMamba is available at https://github.com/millieXie/FusionMamba.

Submitted 20 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09435 [pdf, other]

doi 10.1016/j.xcrp.2023.101725

Device-independent Verification of Quantum Coherence without Quantum Control

Authors: Yan-Han Yang, Xue Yang, Xing-Zhou Zheng, Ming-Xing Luo

Abstract: Quantum coherence plays a crucial role in manipulating and controlling quantum systems, leading to breakthroughs in various fields such as quantum information, quantum sensing, and the detection of gravitational waves. Most coherence witnesses rely on the assumption of being able to control quantum states. Here we report a device-independent coherence model by extending the standard Bell theory to multiple source scenarios. We propose a Greenberger-Horne-Zeilinger-type paradox to verify the particle and wave behaviors of a coherent carrier. We experimentally generate generalized two-photon entangled states that violate the present paradox, witnessing spatial quantum superposition through local measurements.

Submitted 14 April, 2024; originally announced April 2024.

Comments: 19 pages, 7 figures (Almost publish version)

Journal ref: Cell Reports Physical Science, Volume 4, Issue 12, 20 December 2023, 101725

arXiv:2404.09421 [pdf, ps, other]

Two methods addressing variable-exponent fractional initial and boundary value problems and Abel integral equation

Authors: Xiangcheng Zheng

Abstract: Variable-exponent fractional models attract increasing attention in various applications, while the rigorous analysis is far from well developed. This work provides general tools to address these models. Specifically, we first develop a convolution method to study the well-posedness, regularity, an inverse problem and numerical approximation for the subdiffusion of variable exponent. For models such as the variable-exponent two-sided space-fractional boundary value problem (including the variable-exponent fractional Laplacian equation as a special case) and the distributed variable-exponent model, for which the convolution method does not apply, we develop a perturbation method to prove their well-posedness. The relation between the convolution method and the perturbation method is discussed, and we further apply the latter to prove the well-posedness of the variable-exponent Abel integral equation and discuss the constraint on the data under different initial values of variable exponent.

Submitted 11 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.09204 [pdf, other]

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

Authors: Ya-Qi Yu, Minghui Liao, Jihao Wu, Yongxin Liao, Xiaoyu Zheng, Wei Zeng

Abstract: Multimodal Large Language Models (MLLMs) have shown impressive results on various multimodal tasks. However, most existing MLLMs are not well suited for document-oriented tasks, which require fine-grained image perception and information compression. In this paper, we present TextHawk, an MLLM that is specifically designed for document-oriented tasks, while preserving the general capabilities of MLLMs. TextHawk aims to explore efficient fine-grained perception by designing four dedicated components. Firstly, a ReSampling and ReArrangement (ReSA) module is proposed to reduce the redundancy in the document texts and lower the computational cost of the MLLM. We explore encoding the positions of each local feature by presenting Scalable Positional Embeddings (SPEs), which can preserve the scalability of various image sizes. A Query Proposal Network (QPN) is then adopted to initialize the queries dynamically among different sub-images. To further enhance the fine-grained visual perceptual ability of the MLLM, we design a Multi-Level Cross-Attention (MLCA) mechanism that captures the hierarchical structure and semantic relations of document images. Furthermore, we create a new instruction-tuning dataset for document-oriented tasks by enriching the multimodal document data with Gemini Pro. We conduct extensive experiments on both general and document-oriented MLLM benchmarks, and show that TextHawk outperforms the state-of-the-art methods, demonstrating its effectiveness and superiority in fine-grained document perception and general abilities.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.08939 [pdf, other]

NeurIT: Pushing the Limit of Neural Inertial Tracking for Indoor Robotic IoT

Authors: Xinzhe Zheng, Sijie Ji, Yipeng Pan, Kaiwen Zhang, Chenshu Wu

Abstract: Inertial tracking is vital for robotic IoT and has gained popularity thanks to the ubiquity of low-cost Inertial Measurement Units (IMUs) and deep learning-powered tracking algorithms. Existing works, however, have not fully utilized IMU measurements, particularly magnetometers, nor maximized the potential of deep learning to achieve the desired accuracy. To enhance the tracking accuracy for indoor robotic applications, we introduce NeurIT, a sequence-to-sequence framework that elevates tracking accuracy to a new level. NeurIT employs a Time-Frequency Block-recurrent Transformer (TF-BRT) at its core, combining the power of recurrent neural network (RNN) and Transformer to learn representative features in both time and frequency domains. To fully utilize IMU information, we strategically employ body-frame differentiation of the magnetometer, which considerably reduces the tracking error. NeurIT is implemented on a customized robotic platform and evaluated in various indoor environments. Experimental results demonstrate that NeurIT achieves a mere 1-meter tracking error over a 300-meter distance. Notably, it significantly outperforms state-of-the-art baselines by 48.21% on unseen data. NeurIT also performs comparably to the visual-inertial approach (Tango Phone) in vision-favored conditions and surpasses it in plain environments. We believe NeurIT takes an important step forward toward practical neural inertial tracking for ubiquitous and scalable tracking of robotic things.
NeurIT, including the source code and the dataset, is open-sourced here: https://github.com/NeurIT-Project/NeurIT.

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.07777 [pdf, ps, other]

doi 10.1140/epjc/s10052-024-12887-3

Improved analysis of double $J/ψ$ production in $Z$-boson decay

Authors: Guang-Yu Wang, Xing-Gang Wu, Xu-Chang Zheng, Jiang Yan, Jia-Wei Zhang

Abstract: In this paper, we present an improved calculation for the decay rate of the rare $Z$-boson decay into $J/ψ+ J/ψ$. This decay is dominated by the photon fragmentation mechanism, i.e., the transition $Z\to J/ψ+ γ^{*}$ followed by the fragmentation $γ^{*}\to J/ψ$. In our calculation, the amplitude of $γ^{*}\to J/ψ$ is extracted from the measured value of $Γ(J/ψ\to e^+ e^-)$, and the amplitude of $Z\to J/ψ+ γ^{*}$ is calculated through the light-cone approach. The higher-order QCD and relativistic corrections in the amplitude of $γ^{*}\to J/ψ$ and the large logarithms of $m_{_Z}^2/m_c^2$ that appear in the amplitude of $Z\to J/ψ+ γ^{*}$ are resummed in our calculation. Besides, the non-fragmentation amplitude is calculated based on the NRQCD factorization, and the next-to-leading order QCD and relativistic corrections are included. The obtained branching fraction for this $Z$ decay channel is $8.66 ^{+1.48} _{-0.69}\times 10^{-11}$.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 10 pages, 2 figures

Journal ref: Eur. Phys. J. C 84, 544 (2024)

arXiv:2404.07131 [pdf, other]

Search for prompt production of pentaquarks in charm hadron final states

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, B. Adeva, M. Adinolfi, P. Adlarson, H. Afsharnia, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, A. Alfonso Albero, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey , et al. (1090 additional authors not shown)

Abstract: A search for hidden-charm pentaquark states decaying to a range of $Σ_{c}\bar{D}$ and $Λ_{c}\bar{D}$ final states, as well as doubly-charmed pentaquark states to $Σ_{c}D$ and $Λ_{c}^{+}D$, is made using samples of proton-proton collision data corresponding to an integrated luminosity of $5.7\,\text{fb}^{-1}$ recorded by the LHCb detector at $\sqrt{s} = 13\,\text{TeV}$. Since no significant signals are found, upper limits are set on the pentaquark yields relative to that of the $Λ_{c}^{+}$ baryon in the $Λ_{c}^{+}\to pK^{-}π^{+}$ decay mode. The known pentaquark states are also investigated, and their signal yields are found to be consistent with zero in all cases.

Submitted 2 June, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-018.html (LHCb public pages)

Report number: LHCb-PAPER-2023-018, CERN-EP-2024-071

arXiv:2404.05662 [pdf, other]

Towards Accurate Binarization of Diffusion Model

Authors: Xingyu Zheng, Haotong Qin, Xudong Ma, Mingyuan Zhang, Haojie Hao, Jiakai Wang, Zixiang Zhao, Jinyang Guo, Xianglong Liu

Abstract: With the advancement of diffusion models (DMs) and the substantially increased computational requirements, quantization emerges as a practical solution to obtain compact and efficient low-bit DMs. However, the highly discrete representation leads to severe accuracy degradation, hindering the quantization of diffusion models to ultra-low bit-widths. This paper proposes a novel quantization-aware training approach for DMs, namely BinaryDM. The proposed method pushes DMs' weights toward accurate and efficient binarization, considering the representation and computation properties. From the representation perspective, we present a Learnable Multi-basis Binarizer (LMB) to recover the representations generated by the binarized DM. The LMB enhances detailed information through the flexible combination of dual binary bases while applying to parameter-sparse locations of DM architectures to achieve minor burdens. From the optimization perspective, a Low-rank Representation Mimicking (LRM) is applied to assist the optimization of binarized DMs. The LRM mimics the representations of full-precision DMs in low-rank space, alleviating the direction ambiguity of the optimization process caused by fine-grained alignment. Moreover, a quick progressive warm-up is applied to BinaryDM, avoiding convergence difficulties by layerwisely progressive quantization at the beginning of training. Comprehensive experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains compared to SOTA quantization methods of DMs under ultra-low bit-widths.
With 1.1-bit weight and 4-bit activation (W1.1A4), BinaryDM achieves as low as 7.11 FID and saves the performance from collapse (baseline FID 39.69). As the first binarization method for diffusion models, W1.1A4 BinaryDM achieves impressive 9.3 times OPs and 24.8 times model size savings, showcasing its substantial potential for edge deployment.

Submitted 28 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: The code is available at https://github.com/Xingyu-Zheng/BinaryDM

arXiv:2404.03375 [pdf, other]

Search for the $B_s^0 \rightarrow μ^+μ^-γ$ decay

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis , et al. (1068 additional authors not shown)

Abstract: A search for the fully reconstructed $B_s^0 \rightarrow μ^+μ^-γ$ decay is performed at the LHCb experiment using proton-proton collisions at $\sqrt{s}=13$\,TeV corresponding to an integrated luminosity of $5.4\,\mathrm{fb^{-1}}$. No significant signal is found and upper limits on the branching fraction in intervals of the dimuon mass are set \begin{align} {\cal B}(B_s^0 \rightarrow μ^+μ^-γ) < 4.2\times10^{-8},~&m(μμ)\in[2m_μ,~1.70]\,\mathrm{GeV/c^2},\nonumber\\ {\cal B}(B_s^0 \rightarrow μ^+μ^-γ) < 7.7\times10^{-8},~&m(μμ)\in[1.70,~2.88]\,\mathrm{GeV/c^2},\nonumber\\ {\cal B}(B_s^0 \rightarrow μ^+μ^-γ) < 4.2\times10^{-8},~&m(μμ)\in[3.92,~m_{B_s^0}]\,\mathrm{GeV/c^2},\nonumber \end{align} at 95\% confidence level. Additionally, upper limits are set on the branching fraction in the $[2m_μ,~1.70]\,\mathrm{GeV/c^2}$ dimuon mass region excluding the contribution from the intermediate $φ(1020)$ meson, and in the region combining all dimuon-mass intervals.

Submitted 4 April, 2024; originally announced April 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-045.html

Report number: LHCb-PAPER-2023-045, CERN-EP-2024-065

arXiv:2404.02981 [pdf]

Remote-contact catalysis for target-diameter semiconducting carbon nanotube array

Authors: Jiangtao Wang, Xudong Zheng, Gregory Pitner, Xiang Ji, Tianyi Zhang, Aijia Yao, Jiadi Zhu, Tomás Palacios, Lain-Jong Li, Han Wang, Jing Kong

Abstract: Electrostatic catalysis has been an exciting development in chemical synthesis (beyond enzyme catalysis) in recent years, boosting reaction rates and selectively producing certain reaction products. Most of the studies to date have focused on using an external electric field (EEF) to rearrange the charge distribution in small molecule reactions such as Diels-Alder addition, carbene reaction, etc. However, in order for these EEFs to be effective, a field on the order of 1 V/nm (10 MV/cm) is required, and the direction of the EEF has to be aligned with the reaction axis. Such a large and oriented EEF will be challenging for large-scale implementation, or materials growth with multiple reaction axes or steps. Here, we demonstrate that the energy band at the tip of an individual single-walled carbon nanotube (SWCNT) can be spontaneously shifted in a high-permittivity growth environment, with its other end in contact with a low-work function electrode (e.g., hafnium carbide or titanium carbide). By adjusting the Fermi level at a point where there is a substantial disparity in the density of states (DOS) between semiconducting (s-) and metallic (m-) SWCNTs, we achieve effective electrostatic catalysis for s-SWCNT growth assisted by a weak EEF perturbation (200 V/cm). This approach enables the production of high-purity (99.92%) s-SWCNT horizontal arrays with narrow diameter distribution (0.95±0.04 nm), targeting the requirement of advanced SWCNT-based electronics for future computing.
These findings highlight the potential of electrostatic catalysis in precise materials growth, especially for s-SWCNTs, and pave the way for the development of advanced SWCNT-based electronics.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 4 figures, 23 pages

arXiv:2404.01466 [pdf, other]

TS-CausalNN: Learning Temporal Causal Relations from Non-linear Non-stationary Time Series Data

Authors: Omar Faruque, Sahara Ali, Xue Zheng, Jianwu Wang

Abstract: The growing availability and importance of time series data across various domains, including environmental science, epidemiology, and economics, has led to an increasing need for time-series causal discovery methods that can identify the intricate relationships in the non-stationary, non-linear, and often noisy real world data. However, the majority of current time series causal discovery methods assume stationarity and linear relations in data, making them infeasible for the task. Further, the recent deep learning-based methods rely on the traditional causal structure learning approaches making them computationally expensive. In this paper, we propose a Time-Series Causal Neural Network (TS-CausalNN) - a deep learning technique to discover contemporaneous and lagged causal relations simultaneously. Our proposed architecture comprises (i) convolutional blocks comprising parallel custom causal layers, (ii) acyclicity constraint, and (iii) optimization techniques using the augmented Lagrangian approach. In addition to the simple parallel design, an advantage of the proposed model is that it naturally handles the non-stationarity and non-linearity of the data. Through experiments on multiple synthetic and real world datasets, we demonstrate the empirical proficiency of our proposed approach as compared to several state-of-the-art methods. The inferred graphs for the real world dataset are in good agreement with the domain understanding.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.20292 [pdf, ps, other]

Further remarks on absorbing Markov decision processes

Authors: Yi Zhang, Xinran Zheng

Abstract: In this note, based on the recent remarkable results of Dufour and Prieto-Rumeau, we deduce that for an absorbing MDP with a given initial state, under a standard compactness-continuity condition, the space of occupation measures has the same convergent sequences when it is endowed with the weak topology and with the weak-strong topology. We provide two examples demonstrating that the imposed condition cannot be replaced with its popular alternative, and that the above assertion does not hold for the space of marginals of occupation measures on the state space. Moreover, the examples also clarify some results in the previous literature.

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.19791 [pdf, other]

Spinon Kondo lattice in quantum spin liquids

Authors: Xia-Ming Zheng, Mehdi Kargarian

Abstract: Motivated by recent experimental observations of Kondo resonances in cobalt atoms on single layer 1T-TaSe$_{2}$, we theoretically investigate the effect of coupling a U(1) quantum spin liquid with a spinon Fermi surface to a lattice of Anderson impurities. Within the slave-rotor formalism, we find that above a critical coupling strength between the spin liquid and impurity lattice, the spinons hybridize to form heavy quasiparticles near the Fermi level, realizing a {\it spinon Kondo lattice phase} analogous to heavy fermion materials. Using the Bethe-Salpeter equation and accounting for emergent gauge fluctuations, we compute the spectral density and density of states, revealing the formation of spinon-chargon bound states in the spinon Kondo lattice phase. We characterize the thermodynamic and spectroscopic signatures of this phase, demonstrating specific heat and neutron scattering responses distinct from a pure quantum spin liquid. Our findings establish the spinon Kondo lattice as a framework to study the rich physics of spin liquids.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 12 pages, 7 figures

arXiv:2403.17234 [pdf, other]

Speeding Up Path Planning via Reinforcement Learning in MCTS for Automated Parking

Authors: Xinlong Zheng, Xiaozhou Zhang, Donghao Xu

Abstract: In this paper, we address a method that integrates reinforcement learning into the Monte Carlo tree search to boost online path planning under fully observable environments for automated parking tasks. Sampling-based planning methods under high-dimensional space can be computationally expensive and time-consuming. State evaluation methods are useful by leveraging the prior knowledge into the search steps, making the process faster in a real-time system. Given the fact that automated parking tasks are often executed under complex environments, a solid but lightweight heuristic guidance is challenging to compose in a traditional analytical way. To overcome this limitation, we propose a reinforcement learning pipeline with a Monte Carlo tree search under the path planning framework. By iteratively learning the value of a state and the best action among samples from its previous cycle's outcomes, we are able to model a value estimator and a policy generator for given states. By doing that, we build up a balancing mechanism between exploration and exploitation, speeding up the path planning process while maintaining its quality without using human expert driver data.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16519 [pdf, ps, other]

Two Algorithms for Computing Rational Univariate Representations of Zero-Dimensional Ideals with Parameters

Authors: Dingkang Wang, Jingjing Wei, Fanghui Xiao, Xiaopeng Zheng

Abstract: Two algorithms for computing the rational univariate representation of zero-dimensional ideals with parameters are presented in the paper. Different from the rational univariate representation of zero-dimensional ideals without parameters, the number of zeros of zero-dimensional ideals with parameters under various specializations is different, which leads to choosing and checking the separating element, the key to computing the rational univariate representation, is difficult. In order to pick out the separating element, by partitioning the parameter space we can ensure that under each branch the ideal has the same number of zeros. Subsequently with the help of the extended subresultant theorem for parametric cases, two ideas are given to conduct the further partition of parameter space for choosing and checking the separating element. Based on these, we give two algorithms for computing rational univariate representations of zero-dimensional ideals with parameters. Furthermore, the two algorithms have been implemented on the computer algebra system Singular. Experimental data show that the second algorithm has the better performance in contrast to the first one.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16398 [pdf, other]

Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data

Authors: Xinting Liao, Weiming Liu, Chaochao Chen, Pengyang Zhou, Fengyuan Yu, Huabin Zhu, Binhui Yao, Tao Wang, Xiaolin Zheng, Yanchao Tan

Abstract: Federated learning achieves effective performance in modeling decentralized data. In practice, client data are not well-labeled, which makes it potential for federated unsupervised learning (FUSL) with non-IID data. However, the performance of existing FUSL methods suffers from insufficient representations, i.e., (1) representation collapse entanglement among local and global models, and (2) inconsistent representation spaces among local models. The former indicates that representation collapse in local model will subsequently impact the global model and other local models. The latter means that clients model data representation with inconsistent parameters due to the deficiency of supervision signals. In this work, we propose FedU2 which enhances generating uniform and unified representation in FUSL with non-IID data. Specifically, FedU2 consists of flexible uniform regularizer (FUR) and efficient unified aggregator (EUA). FUR in each client avoids representation collapse via dispersing samples uniformly, and EUA in server promotes unified representation by constraining consistent client model updating. To extensively validate the performance of FedU2, we conduct both cross-device and cross-silo evaluation experiments on two benchmark datasets, i.e., CIFAR10 and CIFAR100.

Submitted 24 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2403.16370 [pdf, other]

GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation

Authors: Weiming Zhang, Yexin Liu, Xu Zheng, Lin Wang

Abstract: This paper tackles a novel yet challenging problem: how to transfer knowledge from the emerging Segment Anything Model (SAM) -- which reveals impressive zero-shot instance segmentation capacity -- to learn a compact panoramic semantic segmentation model, i.e., student, without requiring any labeled data. This poses considerable challenges due to SAM's inability to provide semantic labels and the large capacity gap between SAM and the student. To this end, we propose a novel framework, called GoodSAM, that introduces a teacher assistant (TA) to provide semantic information, integrated with SAM to generate ensemble logits to achieve knowledge transfer. Specifically, we propose a Distortion-Aware Rectification (DAR) module that first addresses the distortion problem of panoramic images by imposing prediction-level consistency and boundary enhancement. This subtly enhances TA's prediction capacity on panoramic images. DAR then incorporates a cross-task complementary fusion block to adaptively merge the predictions of SAM and TA to obtain more reliable ensemble logits. Moreover, we introduce a Multi-level Knowledge Adaptation (MKA) module to efficiently transfer the multi-level feature knowledge from TA and ensemble logits to learn a compact student model. Extensive experiments on two benchmarks show that our GoodSAM achieves a remarkable +3.75\% mIoU improvement over the state-of-the-art (SOTA) domain adaptation methods. Also, our most lightweight model achieves comparable performance to the SOTA methods with only 3.7M parameters.

Submitted 24 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

Showing 51–100 of 1,510 results for author: Zheng, X