Search | arXiv e-print repository

arXiv:2407.06776 [pdf, ps, other]

Finite time blow-up for the hypodissipative Navier Stokes equations with a uniform force in $L^1_t C_x^{1,\epsilon}\cap L^2$

Authors: Diego Córdoba, Luis Martínez-Zoroa, Fan Zheng

Abstract: In this work we establish the formation of singularities of classical solutions with finite energy of the forced fractional Navier Stokes equations where the dissipative term is given by $|\nabla|^\alpha$ for any $\alpha\in [0, \alpha_0)$ ($\alpha_0 = \frac{22-8\sqrt7}{9} > 0$). We construct solutions in $\mathbb{R}^3\times [0,T]$ with a finite $T>0$ and with an external forcing which is in $L^1_t([0, T]) C_x^{1,\epsilon}\cap L^2$, such that on the time interval $0 \le t < T$, the velocity $u$ is in the space $C^\infty\cap L^2$ and such that as the time $t$ approaches the blow-up moment $T$, the integral $\int_0^t |\nabla u| ds$ tends to infinity.

Submitted 9 July, 2024; originally announced July 2024.

Comments: 33 pages

arXiv:2406.19706 [pdf, other]

SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR

Authors: Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

Abstract: Mixture-of-experts (MoE) models have achieved excellent results in many tasks. However, conventional MoE models are often very large, making them challenging to deploy on resource-constrained edge devices. In this paper, we propose a novel speaker adaptive mixture of LoRA experts (SAML) approach, which uses low-rank adaptation (LoRA) modules as experts to reduce the number of trainable parameters in MoE. Specifically, SAML is applied to the quantised and personalised end-to-end automatic speech recognition models, which combines test-time speaker adaptation to improve the performance of heavily compressed models in speaker-specific scenarios. Experiments have been performed on the LibriSpeech and the TED-LIUM 3 corpora. Remarkably, with a 7x reduction in model size, 29.1% and 31.1% relative word error rate reductions were achieved on the quantised Whisper model and Conformer-based attention-based encoder-decoder ASR model respectively, compared to the original full precision models.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 5 pages, accepted by Interspeech 2024. arXiv admin note: substantial text overlap with arXiv:2309.09136

arXiv:2406.18361 [pdf, other]

Stable Diffusion Segmentation for Biomedical Images with Single-step Reverse Process

Authors: Tianyu Lin, Zhiguang Chen, Zhonghao Yan, Weijiang Yu, Fudan Zheng

Abstract: Diffusion models have demonstrated their effectiveness across various generative tasks. However, when applied to medical image segmentation, these models encounter several challenges, including significant resource and time requirements. They also necessitate a multi-step reverse process and multiple samples to produce reliable predictions. To address these challenges, we introduce the first latent diffusion segmentation model, named SDSeg, built upon stable diffusion (SD). SDSeg incorporates a straightforward latent estimation strategy to facilitate a single-step reverse process and utilizes latent fusion concatenation to remove the necessity for multiple samples. Extensive experiments indicate that SDSeg surpasses existing state-of-the-art methods on five benchmark datasets featuring diverse imaging modalities. Remarkably, SDSeg is capable of generating stable predictions with a solitary reverse step and sample, epitomizing the model's stability as implied by its name. The code is available at https://github.com/lin-tianyu/Stable-Diffusion-Seg

Submitted 9 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: Accepted at MICCAI 2024. Code and citation info see https://github.com/lin-tianyu/Stable-Diffusion-Seg

arXiv:2406.17005 [pdf, other]

PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo, et al. (12 additional authors not shown)

Abstract: The Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, the Complex Video Object Segmentation Track based on the MOSE dataset and the Motion Expression guided Video Segmentation track based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as the disappearance and reappearance of objects, inconspicuous small objects, heavy occlusions, and crowded environments in MOSE. Moreover, we provide a new motion expression guided video segmentation dataset MeViS to study the natural language-guided video understanding in complex environments. These new videos, sentences, and annotations enable us to foster the development of a more comprehensive and robust pixel-level understanding of video scenes in complex environments and realistic scenarios. The MOSE challenge had 140 registered teams in total, 65 teams participated in the validation phase and 12 teams made valid submissions in the final challenge phase. The MeViS challenge had 225 registered teams in total, 50 teams participated in the validation phase and 5 teams made valid submissions in the final challenge phase.

Submitted 24 June, 2024; originally announced June 2024.

Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

arXiv:2406.08698 [pdf, other]

Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: In this work we search for signals generated by ultra-heavy dark matter in the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma rays produced by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter, as they have low fluxes of astrophysical $\gamma$-ray background and a large amount of dark matter. By analyzing more than 700 days of observational data at LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 12 figures, accepted by PRL

arXiv:2406.07043 [pdf, other]

1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

Authors: Mingqi Gao, Jingnan Luo, Jinyu Yang, Jungong Han, Feng Zheng

Abstract: Motion Expression guided Video Segmentation (MeViS), as an emerging task, poses many new challenges to the field of referring video object segmentation (RVOS). In this technical report, we investigated and validated the effectiveness of static-dominant data and frame sampling on this challenging setting. Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeViS track of the PVUW Challenge. The code is available at: https://github.com/Tapall-AI/MeViS_Track_Solution_2024.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.03866 [pdf, other]

LLplace: The 3D Indoor Scene Layout Generation and Editing via Large Language Model

Authors: Yixuan Yang, Junru Lu, Zixiang Zhao, Zhen Luo, James J. Q. Yu, Victor Sanchez, Feng Zheng

Abstract: Designing 3D indoor layouts is a crucial task with significant applications in virtual reality, interior design, and automated space planning. Existing methods for 3D layout design either rely on diffusion models, which utilize spatial relationship priors, or heavily leverage the inferential capabilities of proprietary Large Language Models (LLMs), which require extensive prompt engineering and in-context exemplars via black-box trials. These methods often face limitations in generalization and dynamic scene editing. In this paper, we introduce LLplace, a novel 3D indoor scene layout designer based on lightweight fine-tuned open-source LLM Llama3. LLplace circumvents the need for spatial relationship priors and in-context exemplars, enabling efficient and credible room layout generation based solely on user inputs specifying the room type and desired objects. We curated a new dialogue dataset based on the 3D-Front dataset, expanding the original data volume and incorporating dialogue data for adding and removing objects. This dataset can enhance the LLM's spatial understanding. Furthermore, through dialogue, LLplace activates the LLM's capability to understand 3D layouts and perform dynamic scene editing, enabling the addition and removal of objects. Our approach demonstrates that LLplace can effectively generate and edit 3D indoor layouts interactively and outperform existing methods in delivering high-quality 3D design solutions. Code and dataset will be released.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.00330 [pdf, other]

Magnetic ground state of monolayer CeI$_{2}$: occupation matrix control and DFT+U calculations

Authors: Yue-Fei Hou, Shujing Li, Xinlong Yang, Wei Jiang, Qiuhao Wang, Fawei Zheng, Zhen-Guo Fu, Ping Zhang

Abstract: The magnetic ground state is crucial for the applications of two-dimensional magnets as it decides fundamental magnetic properties of the material, such as magnetic order, magnetic transition temperature, and low-energy excitation of the spin waves. However, the simulations for magnetism of local-electron systems are challenging due to the existence of metastable states. In this study, occupation matrix control (OMC) and density functional theory plus Hubbard $U$ calculations are applied to investigate the magnetic ground state of monolayer CeI$_{2}$. Following the predicted ferromagnetic (FM) order, the FM ground state and the FM metastable states are identified and found to have different values of the magnetic parameters. Based on the calculated magnetic parameters of the FM ground state, the Curie temperature is estimated to be $128$ K for monolayer CeI$_{2}$. When spin-orbit coupling (SOC) is considered, the FM ground state is further confirmed to contain both off-plane and in-plane components of magnetization. SOC is shown to be essential for reasonably describing not only magnetic anisotropy but also local electronic orbital state of monolayer CeI$_{2}$.

Submitted 1 June, 2024; originally announced June 2024.

Comments: 4 figures. Comments are welcome

arXiv:2405.18744 [pdf, other]

PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN

Authors: Fei Zheng, Chaochao Chen, Zhongxuan Han, Xiaolin Zheng

Abstract: The emergence of ChatGPT marks the arrival of the large language model (LLM) era. While LLMs demonstrate their power in a variety of fields, they also raise serious privacy concerns as the users' queries are sent to the model provider. On the other side, deploying the LLM on the user's device will also leak all the model data. Existing methods based on secure multiparty computation (MPC) managed to protect both the privacy of the model parameters and user queries. However, they require gigabytes of data transfer and several minutes to generate just one token, making them impractical for most real-world applications. To improve the efficiency of private LLM inference, we propose PermLLM, which accelerates the evaluation of non-linear functions using secure random permutation. Along with the optimized secret sharing protocols and homomorphic encryption, PermLLM achieves two-party private inference of the ChatGLM-6B model at the speed of around 3s/token, under a realistic network setting (10ms RTT and 1Gbps bandwidth), which is orders of magnitude faster than existing MPC solutions.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.17264 [pdf, other]

On the Noise Robustness of In-Context Learning for Text Generation

Authors: Hongfu Gao, Feipeng Zhang, Wenyu Jiang, Jun Shu, Feng Zheng, Hongxin Wei

Abstract: Large language models (LLMs) have shown impressive performance on downstream tasks by in-context learning (ICL), which heavily relies on the quality of demonstrations selected from a large set of annotated examples. Recent works claim that in-context learning is robust to noisy demonstrations in text classification. In this work, we show that, on text generation tasks, noisy annotations significantly hurt the performance of in-context learning. To circumvent the issue, we propose a simple and effective approach called Local Perplexity Ranking (LPR), which replaces the "noisy" candidates with their nearest neighbors that are more likely to be clean. Our method is motivated by analyzing the perplexity deviation caused by noisy labels and decomposing perplexity into inherent perplexity and matching perplexity. Our key idea behind LPR is thus to decouple the matching perplexity by performing the ranking among the neighbors in semantic space. Our approach can prevent the selected demonstrations from including mismatched input-label pairs while preserving the effectiveness of the original selection methods. Extensive experiments demonstrate the effectiveness of LPR, improving the EM score by up to 18.75 on common benchmarks with noisy annotations.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16428 [pdf]

Crystal facet orientation and temperature dependence of charge and spin Hall effects in noncollinear antiferromagnet: A first-principles investigation

Authors: Meng Zhu, Xinlu Li, Fanxing Zheng, Jianting Dong, Ye Zhou, Kun Wu, Jia Zhang

Abstract: Noncollinear antiferromagnets (nc-AFMs) have attracted increasing research attention in spintronics due to their unique spin structures and fascinating charge and spin transport properties. By using first-principles calculations, we comprehensively investigate the charge and spin Hall effects in representative noncollinear antiferromagnet Mn3Pt. Our study reveals that the Hall effects in nc-AFMs are critically dependent on the crystal facet orientation and temperature. For (001) orientated Mn3Pt, each charge and spin Hall conductivity element is comprised of both time reversal odd (T-odd) and even (T-even) contribution, associated with longitudinal conductivity, which leads to sizable and highly anisotropic Hall conductivity. The temperature dependence of charge and spin Hall conductivity has been elucidated by considering both phonon and spin disorder scattering. The scaling relations between Hall conductivity and longitudinal conductivity have also been investigated. The existence of prominent spin Hall effect in nc-AFMs may generate spin current with Sz spin polarization, which is advantageous for field free switching of perpendicular magnetization. Our work may provide unambiguous understanding on the charge and spin transport in noncollinear antiferromagnets and pave their way for applications in antiferromagnetic spintronics.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.11826 [pdf, other]

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.09110 [pdf, ps, other]

Bismut torsion parallel metrics with constant holomorphic sectional curvature

Authors: Shuwen Chen, Fangyang Zheng

Abstract: An old conjecture in non-Kähler geometry states that, if a compact Hermitian manifold has constant holomorphic sectional curvature, then the metric must be Kähler (when the constant is non-zero) or Chern flat (when the constant is zero). It is known to be true in complex dimension $2$ by the work of Balas and Gauduchon in 1985 (when the constant is negative or zero) and Apostolov, Davidov and Muskarov in 1996 (when the constant is positive). In dimension $3$ or higher, the conjecture is only known in some special cases, such as the locally conformally Kähler case (when the constant is negative or zero) by the work of Chen, Chen and Nie, or for complex nilmanifolds with nilpotent $J$ by the work of Li and the second named author. In this note, we confirm the above conjecture for all non-balanced Bismut torsion parallel (BTP) manifolds. Here the BTP condition means that the Bismut connection has parallel torsion. In particular, the conjecture is valid for all Vaisman manifolds.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 12 pages

MSC Class: 53C55

arXiv:2405.07691 [pdf, other]

Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma ray source, 1LHAASO J1219+2915. In this paper a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of the variability at a few months level in the TeV band, which is consistent with low frequency observations. Based on these observations, we report the detection of TeV $\gamma$-ray emissions from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during the active period have a significance level of 8.8\,$\sigma$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 11 pages, 5 figures

arXiv:2404.17205 [pdf, other]

Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer

Authors: Xinpeng Li, Teng Wang, Jian Zhao, Shuyi Mao, Jinbao Wang, Feng Zheng, Xiaojiang Peng, Xuelong Li

Abstract: Emotion recognition aims to discern the emotional state of subjects within an image, relying on subject-centric and contextual visual cues. Current approaches typically follow a two-stage pipeline: first localize subjects by off-the-shelf detectors, then perform emotion classification through the late fusion of subject and context features. However, the complicated paradigm suffers from disjoint training stages and limited interaction between fine-grained subject-context elements. To address the challenge, we present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT), for simultaneous subject localization and emotion classification. Rather than compartmentalizing training stages, we jointly leverage box and emotion signals as supervision to enrich subject-centric feature learning. Furthermore, we introduce DSCT to facilitate interactions between fine-grained subject-context cues in a decouple-then-fuse manner. The decoupled query token--subject queries and context queries--gradually intertwine across layers within DSCT, during which spatial and semantic relations are exploited and aggregated. We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC. Our approach surpasses two-stage alternatives with fewer parameter numbers, achieving a 3.39% accuracy improvement and a 6.46% average precision gain on CAER-S and EMOTIC datasets, respectively.

Submitted 28 April, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.04801 [pdf, ps, other]

doi 10.1007/s41605-024-00467-8

LHAASO-KM2A detector simulation using Geant4

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (254 additional authors not shown)

Abstract: KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with large altitude difference (30 m) and huge coverage (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A based on Geant4 is introduced. The process of G4KM2A is optimized mainly in memory consumption to avoid memory overflow. Some simplifications are used to significantly speed up the execution of G4KM2A. The running time is reduced by at least 30 times compared to full detector simulation. The particle distributions and the core/angle resolution comparison between simulation and experimental data of the full KM2A array are also presented, which show good agreement.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.03179 [pdf, other]

UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization

Authors: Tiantian Geng, Teng Wang, Yanfu Zhang, Jinming Duan, Weili Guan, Feng Zheng

Abstract: Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL). Existing methods over-specialize on each task, overlooking the fact that these instances often occur in the same video to form the complete video content. In this work, we present UniAV, a Unified Audio-Visual perception network, to achieve joint learning of TAL, SED and AVEL tasks for the first time. UniAV can leverage diverse data available in task-specific datasets, allowing the model to learn and share mutually beneficial knowledge across tasks and modalities. To tackle the challenges posed by substantial variations in datasets (size/domain/duration) and distinct task characteristics, we propose to uniformly encode visual and audio modalities of all videos to derive generic representations, while also designing task-specific experts to capture unique knowledge for each task. Besides, we develop a unified language-aware classifier by utilizing a pre-trained text encoder, enabling the model to flexibly detect various types of instances and previously unseen ones by simply changing prompts during inference. UniAV outperforms its single-task counterparts by a large margin with fewer parameters, achieving on-par or superior performances compared to state-of-the-art task-specific methods across ActivityNet 1.3, DESED and UnAV-100 benchmarks.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2403.20078 [pdf, other]

Negative Label Guided OOD Detection with Pretrained Vision-Language Models

Authors: Xue Jiang, Feng Liu, Zhen Fang, Hong Chen, Tongliang Liu, Feng Zheng, Bo Han

Abstract: Out-of-distribution (OOD) detection aims at identifying samples from unknown classes, playing a crucial role in trustworthy models against errors on unexpected inputs. Extensive research has been dedicated to exploring OOD detection in the vision modality. Vision-language models (VLMs) can leverage both textual and visual information for various multi-modal applications, whereas few OOD detection methods take into account information from the text modality. In this paper, we propose a novel post hoc OOD detection method, called NegLabel, which takes a vast number of negative labels from extensive corpus databases. We design a novel scheme for the OOD score collaborated with negative labels. Theoretical analysis helps to understand the mechanism of negative labels. Extensive experiments demonstrate that our method NegLabel achieves state-of-the-art performance on various OOD detection benchmarks and generalizes well on multiple VLM architectures. Furthermore, our method NegLabel exhibits remarkable robustness against diverse domain shifts. The codes are available at https://github.com/tmlr-group/NegLabel.

Submitted 29 March, 2024; originally announced March 2024.

Comments: ICLR 2024 Spotlight

arXiv:2403.16931 [pdf, other]

Embedded skyrmion bags in thin films of chiral magnets

Authors: Luyan Yang, Andrii S. Savchenko, Fengshan Zheng, Nikolai S. Kiselev, Filipp N. Rybakov, Xiaodong Han, Stefan Blügel, Rafal E. Dunin-Borkowski

Abstract: Magnetic skyrmions are topologically nontrivial spin configurations that possess particle-like properties. Earlier research was mainly focused on a specific type of skyrmion with topological charge Q = -1. However, theoretical analyses of two-dimensional chiral magnets have predicted the existence of skyrmion bags -- solitons with arbitrary positive or negative topological charge. Although such spin textures are metastable states, recent experimental observations have confirmed the stability of isolated skyrmion bags in a limited range of applied magnetic fields. Here, by utilizing Lorentz transmission electron microscopy, we show the extraordinary stability of skyrmion bags in thin plates of B20-type FeGe. In particular, we show that skyrmion bags embedded within a skyrmion lattice remain stable even in zero or inverted external magnetic fields. A robust protocol for nucleating such embedded skyrmion bags is provided. Our results agree perfectly with micromagnetic simulations and establish thin plates of cubic chiral magnets as a powerful platform for exploring a broad spectrum of topological magnetic solitons.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 17 pages, 15 figures

arXiv:2403.14233 [pdf, other]

SoftPatch: Unsupervised Anomaly Detection with Noisy Data

Authors: Xi Jiang, Ying Chen, Qiang Nie, Yong Liu, Jianlin Liu, Bin-Bin Gao, Jun Liu, Chengjie Wang, Feng Zheng

Abstract: Although mainstream unsupervised anomaly detection (AD) algorithms perform well in academic datasets, their performance is limited in practical application due to the ideal experimental setting of clean training data. Training with noisy data is an inevitable problem in real-world anomaly detection but is seldom discussed. This paper considers label-level noise in image sensory anomaly detection for the first time. To solve this problem, we proposed a memory-based unsupervised AD method, SoftPatch, which efficiently denoises the data at the patch level. Noise discriminators are utilized to generate outlier scores for patch-level noise elimination before coreset construction. The scores are then stored in the memory bank to soften the anomaly detection boundary. Compared with existing methods, SoftPatch maintains a strong modeling ability of normal data and alleviates the overconfidence problem in coreset. Comprehensive experiments in various noise scenes demonstrate that SoftPatch outperforms the state-of-the-art AD methods on the MVTecAD and BTAD benchmarks and is comparable to those methods under the setting without noise.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 36th Conference on Neural Information Processing Systems

Journal ref: Advances in Neural Information Processing Systems 35, ISBN: 9781713871088, (2022)

arXiv:2403.14213 [pdf, other]

Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference

Authors: Xi Jiang, Ying Chen, Qiang Nie, Jianlin Liu, Yong Liu, Chengjie Wang, Feng Zheng

Abstract: In the context of high usability in single-class anomaly detection models, recent academic research has become concerned about the more complex multi-class anomaly detection. Although several papers have designed unified models for this task, they often overlook the utility of class labels, a potent tool for mitigating inter-class interference. To address this issue, we introduce a Multi-class Implicit Neural representation Transformer for unified Anomaly Detection (MINT-AD), which leverages the fine-grained category information in the training stage. By learning the multi-class distributions, the model generates class-aware query embeddings for the transformer decoder, mitigating inter-class interference within the reconstruction model. Utilizing such an implicit neural representation network, MINT-AD can project category and position information into a feature embedding space, further supervised by classification and prior probability loss functions. Experimental results on multiple datasets demonstrate that MINT-AD outperforms existing unified training models.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.12658 [pdf, other]

Tuning-Free Image Customization with Image and Text Guidance

Authors: Pengzhi Li, Qiang Nie, Ying Chen, Xi Jiang, Kai Wu, Yuhuan Lin, Yong Liu, Jinlong Peng, Chengjie Wang, Feng Zheng

Abstract: Despite significant advancements in image customization with diffusion models, current methods still have several limitations: 1) unintended changes in non-target areas when regenerating the entire image; 2) guidance solely by a reference image or text descriptions; and 3) time-consuming fine-tuning, which limits their practical application. In response, we introduce a tuning-free framework for simultaneous text-image-guided image customization, enabling precise editing of specific image regions within seconds. Our approach preserves the semantic features of the reference image subject while allowing modification of detailed attributes based on text descriptions. To achieve this, we propose an innovative attention blending strategy that blends self-attention features in the UNet decoder during the denoising process. To our knowledge, this is the first tuning-free method that concurrently utilizes text and image guidance for image customization in specific regions. Our approach outperforms previous methods in both human and quantitative evaluations, providing an efficient solution for various practical applications, such as image synthesis, design, and creative photography.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 17 pages, 8 figures

Journal ref: ECCV 2024

arXiv:2403.10010 [pdf, other]

doi 10.1103/PhysRevLett.132.131002

Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A

Authors: The LHAASO Collaboration, Zhen Cao, F. Aharonian, Q. An, A. Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen , et al. (256 additional authors not shown)

Abstract: We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be -$2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is -$3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is almost heavier than helium in the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of -$0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in abundance of light components. Above the knee, the mean logarithmic mass exhibits a power law trend towards heavier components, which is reversal to the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components.

Submitted 26 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 8 pages, 3 figures

Journal ref: Physical Review Letters 132, 131002 (2024)

arXiv:2403.08615 [pdf, ps, other]

TopoTB: A software package for calculating the electronic structure and topological properties of the tight-binding model

Authors: Xinliang Huang, Fawei Zheng, Ning Hao

Abstract: We present TopoTB, a software package written in the Mathematica language, designed to compute electronic structures, topological properties, and phase diagrams based on tight-binding models. TopoTB is user-friendly, with an interactive user interface that enables the tuning of model parameters for fitting the target energy bands in a WYSIWYG way. In addition, TopoTB also includes functionalities for processing results from Density Functional Theory calculations. The outputs of TopoTB are rich and readable, and they can be displayed in various styles. These features make TopoTB a useful tool for the theoretical study of materials.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 22 pages, 10 figures

arXiv:2403.01949 [pdf]

Spatially Dependent in-Gap States Induced by Andreev Tunneling through a Single Electronic State

Authors: Ruixia Zhong, Zhongzheng Yang, Qi Wang, Fanbang Zheng, Wenhui Li, Juefei Wu, Chenhaoping Wen, Xi Chen, Yanpeng Qi, Shichao Yan

Abstract: By using low-temperature scanning tunneling microscopy and spectroscopy (STM/STS), we observe in-gap states induced by Andreev tunneling through a single impurity state in a low carrier density superconductor (NaAlSi). The energy-symmetric in-gap states appear when the impurity state is located within the superconducting gap. In-gap states can cross the Fermi level, and they show X-shaped spatial variation. We interpret the in-gap states as a consequence of the Andreev tunneling through the impurity state, which involves the formation or breakup of a Cooper pair. Due to the low carrier density in NaAlSi, the in-gap state is tunable by controlling the STM tip-sample distance. Under strong external magnetic fields, the impurity state shows Zeeman splitting when it is located near the Fermi level. Our findings not only demonstrate the Andreev tunneling involving single electronic state, but also provide new insights for understanding the spatially-dependent in-gap states in low carrier density superconductors.

Submitted 3 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: 12 pages, 4 figures

arXiv:2402.14316 [pdf, other]

Place Anything into Any Video

Authors: Ziling Liu, Jinyu Yang, Mingqi Gao, Feng Zheng

Abstract: Controllable video editing has demonstrated remarkable potential across diverse applications, particularly in scenarios where capturing or re-capturing real-world videos is either impractical or costly. This paper introduces a novel and efficient system named Place-Anything, which facilitates the insertion of any object into any video solely based on a picture or text description of the target object or element. The system comprises three modules: 3D generation, video reconstruction, and 3D target insertion. This integrated approach offers an efficient and effective solution for producing and editing high-quality videos by seamlessly inserting realistic objects. Through a user study, we demonstrate that our system can effortlessly place any object into any video using just a photograph of the object. Our demo video can be found at https://youtu.be/afXqgLLRnTE. Please also visit our project page https://place-anything.github.io to get access.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.13886 [pdf, other]

Multigap superconductivity in lithium intercalated bilayer Mo$_2$C

Authors: Can Hong, Danhong Wu, Xi-Bo Li, Feipeng Zheng

Abstract: Interlayer coupling can significantly influence the physical properties of layered transition metal compounds. The superconductivity in layered Mo$_2$C systems, belonging to the emergent family of MXene, has garnered considerable attention. However, the impact of interlayer coupling on superconductivity, and the anisotropic superconducting properties in these systems are not yet clear. By performing first-principles calculations of electron-phonon coupling and anisotropic superconducting properties, we show that the interlayer coupling in bilayer 1$T$-Mo$_2$C suppresses superconductivity, resulting in a significant drop in superconducting transition temperature ($T_{\mathrm{c}}$) from 4.2 $K$ in its monolayer form to nearly 0 $K$. By introducing lithium atoms into the interlayer space of the bilayer, the interlayer coupling can be effectively weakened, transforming the system into a two-gap superconductor with a $T_{\mathrm{c}}$ above 10 $K$. A 3\% tensile strain can further transform the system into a three-gap superconductor with a significantly enhanced $T_{\mathrm{c}}$ of approximately 24.7 $K$, which is very high in the Mo$_2$C related systems. The enhancement of the superconductivity induced by the strain is mainly due to the downshift of an energy band with a flat dispersion to the energy near the Fermi level. The in-plane vibrations of Mo atoms and the $d$-orbital electrons of Mo atoms are most important for the formation of the superconductivity. Our method can also be applied to multilayer Mo$_2$C systems. Given the successful synthesis of layered Mo$_2$C systems and the experimental realization of alkaline metal atom depositions, our work presents a practically feasible strategy for achieving high $T_{\mathrm{c}}$ and multigap superconductivity in layered Mo$_2$C.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 9 pages, 7 figures

Journal ref: Physical Review B 2024, 109: 064515

arXiv:2402.03783 [pdf, other]

Exploring Low-Resource Medical Image Classification with Weakly Supervised Prompt Learning

Authors: Fudan Zheng, Jindong Cao, Weijiang Yu, Zhiguang Chen, Nong Xiao, Yutong Lu

Abstract: Most advances in medical image recognition supporting clinical auxiliary diagnosis meet challenges due to the low-resource situation in the medical field, where annotations are highly expensive and professional. This low-resource problem can be alleviated by leveraging the transferable representations of large-scale pre-trained vision-language models via relevant medical text prompts. However, existing pre-trained vision-language models require domain experts to carefully design the medical prompts, which greatly increases the burden on clinicians. To address this problem, we propose a weakly supervised prompt learning method MedPrompt to automatically generate medical prompts, which includes an unsupervised pre-trained vision-language model and a weakly supervised prompt learning model. The unsupervised pre-trained vision-language model utilizes the natural correlation between medical images and corresponding medical texts for pre-training, without any manual annotations. The weakly supervised prompt learning model only utilizes the classes of images in the dataset to guide the learning of the specific class vector in the prompt, while the learning of other context vectors in the prompt requires no manual annotations for guidance. To the best of our knowledge, this is the first model to automatically generate medical prompts. With these prompts, the pre-trained vision-language model can be freed from the strong expert dependency of manual annotation and manual prompt design. Experimental results show that the model using our automatically generated prompts outperforms its full-shot learning hand-crafted prompts counterparts with only a minimal number of labeled samples for few-shot learning, and reaches superior or comparable accuracy on zero-shot image classification. The proposed prompt generator is lightweight and therefore can be embedded into any network architecture.

Submitted 6 February, 2024; originally announced February 2024.

Comments: Accepted by Pattern Recognition

arXiv:2402.03754 [pdf, other]

Intensive Vision-guided Network for Radiology Report Generation

Authors: Fudan Zheng, Mengfei Li, Ying Wang, Weijiang Yu, Ruixuan Wang, Zhiguang Chen, Nong Xiao, Yutong Lu

Abstract: Automatic radiology report generation is booming due to its huge application potential for the healthcare industry. However, existing computer vision and natural language processing approaches to tackle this problem are limited in two aspects. First, when extracting image features, most of them neglect multi-view reasoning in vision and model single-view structure of medical images, such as space-view or channel-view. However, clinicians rely on multi-view imaging information for comprehensive judgment in daily clinical diagnosis. Second, when generating reports, they overlook context reasoning with multi-modal information and focus on pure textual optimization utilizing retrieval-based methods. We aim to address these two issues by proposing a model that better simulates clinicians' perspectives and generates more accurate reports. Given the above limitation in feature extraction, we propose a Globally-intensive Attention (GIA) module in the medical image encoder to simulate and integrate multi-view vision perception. GIA aims to learn three types of vision perception: depth view, space view, and pixel view. On the other hand, to address the above problem in report generation, we explore how to involve multi-modal signals to generate precisely matched reports, i.e., how to integrate previously predicted words with region-aware visual content in next word prediction. Specifically, we design a Visual Knowledge-guided Decoder (VKGD), which can adaptively consider how much the model needs to rely on visual information and previously predicted text to assist next word prediction. Hence, our final Intensive Vision-guided Network (IVGN) framework includes a GIA-guided Visual Encoder and the VKGD. Experiments on two commonly-used datasets IU X-Ray and MIMIC-CXR demonstrate the superior ability of our method compared with other state-of-the-art approaches.

Submitted 6 February, 2024; originally announced February 2024.

Comments: Accepted by Physics in Medicine & Biology

arXiv:2401.17024 [pdf, other]

doi 10.1016/j.mtphys.2024.101374

Prediction of ambient pressure superconductivity in cubic ternary hydrides with MH$_6$ octahedra

Authors: Feng Zheng, Zhen Zhang, Shunqing Wu, Qiubao Lin, Renhai Wang, Yimei Fang, Cai-Zhuang Wang, Vladimir Antropov, Yang Sun, Kai-Ming Ho

Abstract: Exploring high-temperature superconducting (high-$T_c$) material at ambient pressure holds immense significance for physics, chemistry, and materials science. In this study, we perform a high-throughput screening of strong electron-phonon interactions in X$_2$MH$_6$ compounds (X = Li, Na, Mg, Al, K, Ca, Ga, Rb, Sr, and In; M are $3d$, $4d$, and $5d$ transition metals). These compounds have a cubic structure featuring an MH$_6$ octahedron motif. Our screening calculations suggest that 26 compounds exhibit dynamic stability and strong electron-phonon coupling. Among these 26 compounds, Mg$_2$RhH$_6$, Mg$_2$IrH$_6$, Al$_2$MnH$_6$, and Li$_2$CuH$_6$ show promising energetic stability and $T_c$ of more than 50 K at ambient pressure. This study underscores promising high-$T_c$ compounds at ambient pressure with distinctive MH$_6$ motifs.

Submitted 30 January, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Journal ref: Mater. Today Phys. 42, 101374 (2024)

arXiv:2401.13698 [pdf, other]

Finite-volume hyperbolic Coxeter 4-dimensional polytopes with 7 facets

Authors: Jiming Ma, Fangting Zheng

Abstract: In this paper, we obtain a complete classification of 315 finite-volume hyperbolic Coxeter 4-dimensional polytopes with 7 facets.

Submitted 20 January, 2024; originally announced January 2024.

arXiv:2401.04399 [pdf, other]

Monolayer Fe$_{3}$GaX$_{2}$ (X=I, Br, Sb): high-temperature two-dimensional magnets and a novel partially ordered spin state

Authors: Qiuhao Wang, Xinlong Yang, Fawei Zheng, Ping Zhang

Abstract: We systematically investigated the effects of charge doping and strain on monolayer Fe$_3$GaTe$_2$, and proposed three new novel two-dimensional magnetic materials: monolayer Fe$_3$GaX$_2$ (X=I, Br, Sb). We found that both strain and charge doping can tune the magnetic interactions, and the tuning by charge doping is more significant. Differential charge analysis revealed that the doped charges predominantly accumulate around Te atoms. Based on this insight, we introduced Fe$_{3}$GaI$_{2}$, Fe$_{3}$GaBr$_{2}$, and Fe$_{3}$GaSb$_{2}$ monolayers. The Fe$_{3}$GaI$_{2}$ and Fe$_{3}$GaBr$_{2}$ monolayers contain I and Br atoms rather than Te atoms, emulate electron-doped Fe$_{3}$GaTe$_{2}$ monolayer, resulting in notably high T$_c$ values of 867 K and 844 K, respectively. In contrast, the Fe$_3$GaSb$_2$ monolayer mimics hole-doped Fe$_{3}$GaTe$_{2}$ monolayer, presents a mix of FM and antiferromagnetic interactions, manifesting a distinctive partially ordered magnetic state. Our study demonstrates that substitution atoms based on the charge-doping effect offer a promising approach for predicting new magnetic materials. The proposed Fe$_{3}$GaI$_{2}$, Fe$_{3}$GaBr$_{2}$, and Fe$_{3}$GaSb$_{2}$ monolayers hold great potential for spintronics applications, and may stimulate the pursuit of new types of spin liquid.

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.03853 [pdf, ps, other]

Lattice thermal conductivity of ZrSe2 based on the anharmonic phonon approach and on-the-fly machine learning force fields

Authors: Yong Lu, Fawei Zheng

Abstract: The lattice thermal conductivity (LTC) of ZrSe$_2$, a typical layered transition metal disulfide, has been calculated using a hybrid approach that combines force field molecular dynamics (MD) simulation and Boltzmann transport equation (BTE). In this approach, the phonon quasiparticle picture of each normal mode can be obtained directly by the velocity autocorrelation function and its power spectrum projected in $q$-space. By employing the retarded one phonon Green's function method, the phonon quasiparticle frequency and lifetime of each independent normal mode are effectively determined. On-the-fly machine learning force fields combine the precision of quantum mechanics and the scale of classical MD to analyze sizable supercells with long-wavelength phonons. This yields accurate LTC by using sufficient $q$-samples in the Brillouin zone, which cannot be achieved at the \emph{ab initio} molecular dynamics scale. The convergent LTC tensor of bulk and monolayer ZrSe$_2$ decays faster with increasing temperature than the well-known $\frac{1}{T}$ scale, which is typically observed when considering only three-phonon scattering. The phonon lifetime and mean free path exhibit significant dependence on temperature. MD simulations encompass all orders of anharmonic effects, thereby enabling an accurate description of anharmonic interactions between phonons at finite temperatures. Moreover, this approach respects the contribution of each normal mode to the LTC based on the BTE, which facilitates the quantitative analysis of phonon anharmonic properties and the role of specific normal modes.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2401.02693 [pdf]

Interlayer magnetic interactions and ferroelectricity in $πぱい$/3-twisted CrX$_2$ (X = Se, Te) bilayers

Authors: Wenqi Yang, Xinlong Yang, Menglei Li, Lin Hu, Fawei Zheng

Abstract: Recently, two-dimensional (2D) bilayer magnetic systems have been widely studied. Their interlayer magnetic interactions play a vital role in the magnetic properties. In this paper, we theoretically studied the interlayer magnetic interactions, magnetic states and ferroelectricity of $πぱい$/3-twisted CrX$_2$ (X = Se, Te) bilayers ($πぱい$/3-CrX$_2$). Our study reveals that the lateral shift could switch the magnetic state of the $πぱい$/3-CrSe$_2$ between interlayer ferromagnetic and antiferromagnetic, while just tuning the strength of the interlayer antiferromagnetic interactions in $πぱい$/3-CrTe$_2$. Furthermore, the lateral shift can alter the off-plane electric polarization in both $πぱい$/3-CrSe$_2$ and $πぱい$/3-CrTe$_2$. These results show that stacking is an effective way to tune both the magnetic and ferroelectric properties of 1T-CrX$_2$ bilayers, making the 1T-CrX$_2$ bilayers hold promise for 2D spintronic devices.

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2401.02689 [pdf, other]

Scaling Laws Governing the Elastic Properties of 3D-Graphenes

Authors: Ming Li, Guo Lu, Haodong Yu, Menglei Li, Fawei Zheng

Abstract: In this study, we have comprehensively investigated the scaling law for elastic properties of three-dimensional honeycomb-like graphenes (3D-graphenes) using hybrid neural network potential based molecular dynamics simulations and theoretical analyses. The elastic constants as functions of honeycomb hole size, denoted by the graphene wall length $L$, were provided. All five independent elastic constants in the large $L$ limit are proportional to $L^{-1}$. The associated coefficients are combinations of two-dimensional graphene's elastic constants. High-order terms including $L^{-2}$ and $L^{-3}$ emerge for finite $L$ values. They have three origins, the distorted areas close to the joint lines of 3D-graphenes, the variation of solid angles between graphene plates, and the bending distortion of graphene plates. Significantly, the chirality becomes essential with the decreasing of $L$, because the joint line structures are different between the armchair and zigzag type 3D-graphenes. Our findings provide insights into the elastic properties of graphene-based superstructures and can be used for further studies on graphene-based materials.

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2401.02671 [pdf, ps, other]

Spin-Lattice Coupling Induced Rich Magnetic States in CrF$_3$ monolayer

Authors: Fawei Zheng, Yong Lu

Abstract: We systematically studied the spin-lattice couplings in the CrF$_3$ monolayer. Our study reveals that the spin exchange constants between the nearest neighbors are notably affected by these couplings. Specifically, the couplings arise predominantly from three distinct phonon modes, namely the covariant, rotation, and stretch of the Cr-F-Cr-F rhombus. By integrating out the phonon degrees of freedom, we derived an effective spin Hamiltonian featuring four-spin product terms, which yields a remarkably intricate magnetic phase diagram. Significantly, numerous plateau states characterized by fractional magnetizations, including 1/2, 1/3, 2/3, 1/4, 1/5, 5/8, 1/9, and 2/9, emerge in the vicinity of the phase transition boundary separating ferromagnetic and antiferromagnetic states. These findings show the profound influence of spin-lattice couplings on magnetic properties near the magnetic phase boundaries, and the predicted plateau states are expected to be observable in future experiments.

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2401.01010 [pdf, other]

Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt

Authors: Jiaqi Liu, Kai Wu, Qiang Nie, Ying Chen, Bin-Bin Gao, Yong Liu, Jinbao Wang, Chengjie Wang, Feng Zheng

Abstract: Unsupervised Anomaly Detection (UAD) with incremental training is crucial in industrial manufacturing, as unpredictable defects make obtaining sufficient labeled data infeasible. However, continual learning methods primarily rely on supervised annotations, while the application in UAD is limited due to the absence of supervision. Current UAD methods train separate models for different classes sequentially, leading to catastrophic forgetting and a heavy computational burden. To address this issue, we introduce a novel Unsupervised Continual Anomaly Detection framework called UCADきゃど, which equips the UAD with continual learning capability through contrastively-learned prompts. In the proposed UCADきゃど, we design a Continual Prompting Module (CPM) by utilizing a concise key-prompt-knowledge memory bank to guide task-invariant `anomaly' model predictions using task-specific `normal' knowledge. Moreover, Structure-based Contrastive Learning (SCL) is designed with the Segment Anything Model (SAM) to improve prompt learning and anomaly segmentation results. Specifically, by treating SAM's masks as structure, we draw features within the same mask closer and push others apart for general feature representations. We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation, demonstrating that our method is significantly better than anomaly detection methods, even with rehearsal training. The code will be available at https://github.com/shirowalker/UCAD.

Submitted 1 January, 2024; originally announced January 2024.

Comments: Accepted by AAAI 2024

arXiv:2312.17432 [pdf, other]

Video Understanding with Large Language Models: A Survey

Authors: Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Feng Zheng, Jianguo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu

Abstract: With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly. Given the remarkable capabilities of Large Language Models (LLMs) in language and multimodal tasks, this survey provides a detailed overview of the recent advancements in video understanding harnessing the power of LLMs (Vid-LLMs). The emergent capabilities of Vid-LLMs are surprisingly advanced, particularly their ability for open-ended spatial-temporal reasoning combined with commonsense knowledge, suggesting a promising path for future video understanding. We examine the unique characteristics and capabilities of Vid-LLMs, categorizing the approaches into four main types: LLM-based Video Agents, Vid-LLMs Pretraining, Vid-LLMs Instruction Tuning, and Hybrid Methods. Furthermore, this survey presents a comprehensive study of the tasks, datasets, and evaluation methodologies for Vid-LLMs. Additionally, it explores the expansive applications of Vid-LLMs across various domains, highlighting their remarkable scalability and versatility in real-world video understanding challenges. Finally, it summarizes the limitations of existing Vid-LLMs and outlines directions for future research. For more information, readers are recommended to visit the repository at https://github.com/yunlong10/Awesome-LLMs-for-Video-Understanding.

Submitted 3 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.16046 [pdf, other]

AdaNAS: Adaptively Post-processing with Self-supervised Neural Architecture Search for Ensemble Rainfall Forecasts

Authors: Yingpeng Wen, Weijiang Yu, Fudan Zheng, Dan Huang, Nong Xiao

Abstract: Previous post-processing studies on rainfall forecasts using numerical weather prediction (NWP) mainly focus on statistics-based aspects, while learning-based aspects are rarely investigated. Although some manually-designed models are proposed to raise accuracy, they are customized networks, which need to be repeatedly tried and verified, at a huge cost in time and labor. Therefore, a self-supervised neural architecture search (NAS) method without significant manual efforts called AdaNAS is proposed in this study to perform rainfall forecast post-processing and predict rainfall with high accuracy. In addition, we design a rainfall-aware search space to significantly improve forecasts for high-rainfall areas. Furthermore, we propose a rainfall-level regularization function to eliminate the effect of noise data during the training. Validation experiments have been performed under the cases of \emph{None}, \emph{Light}, \emph{Moderate}, \emph{Heavy} and \emph{Violent} on a large-scale precipitation benchmark named TIGGE. Finally, the average mean-absolute error (MAE) and average root-mean-square error (RMSE) of the proposed AdaNAS model are 0.98 and 2.04 mm/day, respectively. Additionally, the proposed AdaNAS model is compared with other neural architecture search methods and previous studies. Compared results reveal the satisfactory performance and superiority of the proposed AdaNAS model in terms of precipitation amount prediction and intensity classification. Concretely, the proposed AdaNAS model outperformed previous best-performing manual methods with MAE and RMSE improving by 80.5\% and 80.3\%, respectively.

Submitted 4 February, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.11872 [pdf, other]

Beyond Prototypes: Semantic Anchor Regularization for Better Representation Learning

Authors: Yanqi Ge, Qiang Nie, Ye Huang, Yong Liu, Chengjie Wang, Feng Zheng, Wen Li, Lixin Duan

Abstract: One of the ultimate goals of representation learning is to achieve compactness within a class and well-separability between classes. Many outstanding metric-based and prototype-based methods following the Expectation-Maximization paradigm, have been proposed for this objective. However, they inevitably introduce biases into the learning process, particularly with long-tail distributed training data. In this paper, we reveal that the class prototype is not necessarily to be derived from training features and propose a novel perspective to use pre-defined class anchors serving as feature centroid to unidirectionally guide feature learning. However, the pre-defined anchors may have a large semantic distance from the pixel features, which prevents them from being directly applied. To address this issue and generate feature centroid independent from feature learning, a simple yet effective Semantic Anchor Regularization (SAR) is proposed. SAR ensures the interclass separability of semantic anchors in the semantic space by employing a classifier-aware auxiliary cross-entropy loss during training via disentanglement learning. By pulling the learned features to these semantic anchors, several advantages can be attained: 1) the intra-class compactness and naturally inter-class separability, 2) induced bias or errors from feature learning can be avoided, and 3) robustness to the long-tailed problem. The proposed SAR can be used in a plug-and-play manner in the existing models. Extensive experiments demonstrate that the SAR performs better than previous sophisticated prototype-based methods. The implementation is available at https://github.com/geyanqi/SAR.

Submitted 4 February, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: AAAI 2024

arXiv:2312.09464 [pdf, other]

Enhanced Eye Diagram Estimation Method for Nonlinear Systems With Input Jitter

Authors: Hanqing Zhang, Feijun Zheng

Abstract: An enhanced multiple-edge response (MER) based eye diagram estimation method is proposed to evaluate the performance of nonlinear systems with input jitter. Compared with existing MER-based methods which only took into account the bit effect, the proposed method first determines both orders of bit effect and jitter effect. These decided orders can affirm the necessary MERs. Subsequently, the proposed method figures out the minimal number of sampling points so that the necessary MERs can be recovered quickly based on the Nyquist theory and can be used to create eye diagrams. Lastly, the eye diagrams and their parameters are compared with those generated by traditional transient simulation and an existing MER-based method which introduces input jitter through a convolution process. The result indicates that this enhanced method is more accurate than the existing MER-based method.

Submitted 10 November, 2023; originally announced December 2023.

Comments: The article was accepted but not published by EMC+SIPI 2023 because of failure to attend the conference for personal emergency reason. The information is attached at the end of article

arXiv:2311.18204 [pdf, other]

Carrier Transport in 2D Hybrid Organic-Inorganic Perovskites: the role of spacer molecules

Authors: Caihong Zheng, Fan Zheng

Abstract: Two-dimensional organic-inorganic hybrid perovskites (2D HOIPs) have been widely used for various optoelectronics owing to the excellent photoelectric properties. Recently, a great deal of studies have focused on engineering the organic spacer cation in 2D HOIPs to enhance the carrier transport and improve the performance of devices. However, the selection of organic spacer cations is mostly qualitative without a quantitative guidance. Meanwhile, the fundamental mechanism of the carrier transport across the organic spacer layer is still unclear. Here, by using the first-principle non-adiabatic molecular dynamics (NAMD) method, we have studied the transport process of excited carriers between 2D HOIPs separated by the spacer cation layer. Various types of spacer cations of 2D HOIPs are investigated, where the carrier transport processes are simulated in real-time at atomic levels. We find that the excited electrons and holes can transfer from single-inorganic-layer 2D HOIP to bi-inorganic-layer 2D HOIP on a sub-picosecond scale, and different types of spacer cations can influence the carrier transfer rate significantly. Meanwhile, Dion-Jacobson (DJ) phase 2D HOIP leads to a more conductive carrier transport compared to the Ruddlesden-Popper (RP) phase, which is related to the different electron-phonon coupling strengths of these two phases. Moreover, we have developed a new method to capture the electron-hole interaction in the frame of NAMD. This work provides a promising direction to design new materials towards high performance optoelectronics.

Submitted 29 November, 2023; originally announced November 2023.

Comments: 19 pages, 6 figures, 81 references

arXiv:2311.10364 [pdf]

Elimination of the confrontation between theory and experiment in flexoelectric Bi2GeO5

Authors: Yuying Cao, Xulong Zhang, Long Zhou, Hongfei Liu, Hua Gao, Fu Zheng, Zhi Ma

Abstract: In this paper, we have investigated the flexoelectric effect of Bi2GeO5(BGO), successfully predicted the maximum flexoelectric coefficient of BGO, and tried to explore the difference between experimental and simulated flexoelectric coefficients.

Submitted 28 November, 2023; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: 16 pages, 6 figures

arXiv:2311.09906 [pdf, ps, other]

Fino-Vezzoni conjecture on Lie algebras with abelian ideals of codimension two

Authors: Kexiang Cao, Fangyang Zheng

Abstract: In this paper, we confirm the Fino-Vezzoni Conjecture for unimodular Lie algebras which contain abelian ideals of codimension two, a natural generalization to the class of almost abelian Lie algebras. This provides new evidence towards the validity of the conjecture on a very special type of $3$-step solvmanifolds.

Submitted 18 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: 20 pages, updated some reference info and added a couple of appendices

MSC Class: 53C55

arXiv:2311.07585 [pdf, other]

Input Reconstruction Attack against Vertical Federated Large Language Models

Authors: Fei Zheng

Abstract: Recently, large language models (LLMs) have drawn extensive attention from academia and the public, due to the advent of the ChatGPT. While LLMs show their astonishing ability in text generation for various tasks, privacy concerns limit their usage in real-life businesses. More specifically, either the user's inputs (the user sends the query to the model-hosting server) or the model (the user downloads the complete model) itself will be revealed during the usage. Vertical federated learning (VFL) is a promising solution to this kind of problem. It protects both the user's input and the knowledge of the model by splitting the model into a bottom part and a top part, which is maintained by the user and the model provider, respectively. However, in this paper, we demonstrate that in LLMs, VFL fails to protect the user input since it is simple and cheap to reconstruct the input from the intermediate embeddings. Experiments show that even with a commercial GPU, the input sentence can be reconstructed in only one second. We also discuss several possible solutions to enhance the privacy of vertical federated LLMs.

Submitted 24 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.03887 [pdf, other]

Toward ground-truth optical coherence tomography via three-dimensional unsupervised deep learning processing and data

Authors: Renxiong Wu, Fei Zheng, Meixuan Li, Shaoyan Huang, Xin Ge, Linbo Liu, Yong Liu, Guangming Ni

Abstract: Optical coherence tomography (OCT) can perform non-invasive high-resolution three-dimensional (3D) imaging and has been widely used in biomedical fields, while it is inevitably affected by coherence speckle noise which degrades OCT imaging performance and restricts its applications. Here we present a novel speckle-free OCT imaging strategy, named toward-ground-truth OCT (tGT-OCT), that utilizes unsupervised 3D deep-learning processing and leverages OCT 3D imaging features to achieve speckle-free OCT imaging. Specifically, our proposed tGT-OCT utilizes an unsupervised 3D-convolution deep-learning network trained using random 3D volumetric data to distinguish and separate speckle from real structures in 3D imaging volumetric space; moreover, tGT-OCT effectively further reduces speckle noise and reveals structures that would otherwise be obscured by speckle noise while preserving spatial resolution. Results derived from different samples demonstrated the high-quality speckle-free 3D imaging performance of tGT-OCT and its advancement beyond the previous state-of-the-art.

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.01387 [pdf]

Millimeter-scale exfoliation of hBN with tunable flake thickness

Authors: Amy S. McKeown-Green, Helen J. Zeng, Ashley P. Saunders, Jiayi Li, Jenny Hu, Jiaojian Shi, Yuejun Shen, Feng Pan, Jennifer A. Dionne, Tony F. Heinz, Stephen Wu, Fan Zheng, Fang Liu

Abstract: As a two-dimensional (2D) dielectric material, hexagonal boron nitride (hBN) is in high demand for applications in photonics, nonlinear optics, and nanoelectronics. Unfortunately, the high-throughput preparation of macroscopic-scale, high-quality hBN flakes with controlled thickness is an ongoing challenge, limiting device fabrication and technological integration. Here, we present a metal thin-film exfoliation method to prepare hBN flakes with millimeter-scale dimension, near-unity yields, and tunable flake thickness distribution from 1-7 layers, a substantial improvement over scotch tape exfoliation. The single crystallinity and high quality of the exfoliated hBN are demonstrated with optical microscopy, atomic force microscopy, Raman spectroscopy, and second harmonic generation. We further explore a possible mechanism for the effectiveness and selectivity based on thin-film residual stress measurements, density functional theory calculations, and transmission electron microscopy imaging of the deposited metal films. We find that the magnitude of the residual tensile stress induced by thin film deposition plays a key role in determining exfoliated flake thickness in a manner which closely resembles 3D semiconductor spalling. Lastly, we demonstrate that our exfoliated, large-area hBN flakes can be readily incorporated as encapsulating layers for other 2D monolayers. Altogether, this method brings us one step closer to the high throughput, mass production of hBN-based 2D photonic, optoelectronic, and quantum devices.

Submitted 2 November, 2023; originally announced November 2023.

Comments: 21 pages, 5 figures, work completed at Stanford University

arXiv:2310.17082 [pdf, ps, other]

Does or did the supernova remnant Cassiopeia A operate as a PeVatron?

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: For decades, supernova remnants (SNRs) have been considered the prime sources of Galactic Cosmic rays (CRs). But whether SNRs can accelerate CR protons to PeV energies and thus dominate CR flux up to the knee is currently under intensive theoretical and phenomenological debate. The direct test of the ability of SNRs to operate as CR PeVatrons can be provided by ultrahigh-energy (UHE; $E_γがんま\geq 100$~TeV) $γがんま$-rays. In this context, the historical SNR Cassiopeia A (Cas A) is considered one of the most promising target for UHE observations. This paper presents the observation of Cas A and its vicinity by the LHAASO KM2A detector. The exceptional sensitivity of LHAASO KM2A in the UHE band, combined with the young age of Cas A, enabled us to derive stringent model-independent limits on the energy budget of UHE protons and nuclei accelerated by Cas A at any epoch after the explosion. The results challenge the prevailing paradigm that Cas A-type SNRs are major suppliers of PeV CRs in the Milky Way.

Submitted 25 October, 2023; originally announced October 2023.

Comments: 11 pages, 3 figures, Accepted by the APJL

arXiv:2310.14002 [pdf, ps, other]

A note on compact homogeneous manifolds with Bismut parallel torsion

Authors: Fabio Podestà, Fangyang Zheng

Abstract: In this article, we investigate the class of Hermitian manifolds whose Bismut connection has parallel torsion ({\rm BTP} for brevity). In particular, we focus on the case where the manifold is (locally) homogeneous with respect to a group of holomorphic isometries and we fully characterize the compact Chern flat {\rm BTP} manifolds. Moreover we show that certain compact flag manifolds are {\rm BTP} if and only if the metric is Kähler or induced by the Cartan-Killing form and we then characterize {\rm BTP} invariant metrics on compact semisimple Lie groups which are Hermitian w.r.t. a Samelson structure and are projectable along the Tits fibration. We state a conjecture concerning the question when the Bismut connection of a BTP compact Hermitian locally homogeneous manifold has parallel curvature, giving examples and providing evidence in some special cases.

Submitted 21 October, 2023; originally announced October 2023.

MSC Class: 53C55; 53C25

arXiv:2310.08845 [pdf, other]

doi 10.1126/sciadv.adj2778

Very high energy gamma-ray emission beyond 10 TeV from GRB 221009A

Authors: Zhen Cao, F. Aharonian, Q. An, A. Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: The highest energy gamma-rays from gamma-ray bursts (GRBs) have important implications for their radiation mechanism. Here we report for the first time the detection of gamma-rays up to 13 TeV from the brightest GRB 221009A by the Large High Altitude Air-shower Observatory (LHAASO). The LHAASO-KM2A detector registered more than 140 gamma-rays with energies above 3 TeV during 230$-$900s after the trigger. The intrinsic energy spectrum of gamma-rays can be described by a power-law after correcting for extragalactic background light (EBL) absorption. Such a hard spectrum challenges the synchrotron self-Compton (SSC) scenario of relativistic electrons for the afterglow emission above several TeV. Observations of gamma-rays up to 13 TeV from a source with a measured redshift of z=0.151 hints more transparency in intergalactic space than previously expected. Alternatively, one may invoke new physics such as Lorentz Invariance Violation (LIV) or an axion origin of very high energy (VHE) signals.

Submitted 22 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: 49 pages, 11 figures

Journal ref: Science Advances, 9, eadj2778 (2023) 15 November 2023

Showing 1–50 of 392 results for author: Zheng, F