
SignLLM: Sign Languages Production Large Language Models

Authors: Sen Fang, Lei Wang, Ce Zheng, Yapeng Tian, Chen Chen

Abstract: In this paper, we introduce the first comprehensive multilingual sign language dataset named Prompt2Sign, which is built from public data including American Sign Language (ASL) and seven other sign languages. Our dataset transforms a vast array of videos into a streamlined, model-friendly format, optimized for training with translation models like seq2seq and text2text. Building on this new dataset, we propose SignLLM, the first multilingual Sign Language Production (SLP) model, which includes two novel multilingual SLP modes that allow for the generation of sign language gestures from input text or prompts. Both modes can use a new loss and a module based on reinforcement learning, which accelerates the training by enhancing the model's capability to autonomously sample high-quality data. We present benchmark results of SignLLM, which demonstrate that our model achieves state-of-the-art performance on SLP tasks across eight sign languages.

Submitted 17 May, 2024; originally announced May 2024.

Comments: 33 pages, website at https://signllm.github.io/

arXiv:2405.10025

Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models

Authors: Yuchen Hu, Chen Chen, Chengwei Qin, Qiushi Zhu, Eng Siong Chng, Ruizhe Li

Abstract: Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which aims to predict the ground-truth transcription from the decoded N-best hypotheses. Thanks to the strong language generation ability of LLMs and rich information in the N-best list, GER shows great effectiveness in enhancing ASR results. However, it still suffers from two limitations: 1) LLMs are unaware of the source speech during GER, which may lead to results that are grammatically correct but violate the source speech content; 2) N-best hypotheses usually vary in only a few tokens, making it redundant to send all of them for GER, which could confuse the LLM about which tokens to focus on and thus lead to increased miscorrection. In this paper, we propose ClozeGER, a new paradigm for ASR generative error correction. First, we introduce a multimodal LLM (i.e., SpeechGPT) to receive source speech as extra input to improve the fidelity of correction output. Then, we reformat GER as a cloze test with logits calibration to remove the input information redundancy and simplify GER with clear instructions. Experiments show that ClozeGER achieves a new breakthrough over vanilla GER on 9 popular ASR datasets.

Submitted 16 May, 2024; originally announced May 2024.

Comments: 14 pages, Accepted by ACL 2024
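To illustrate what "reformatting GER as a cloze test" can look like, here is a minimal sketch that keeps the tokens on which all N-best hypotheses agree and turns disagreeing positions into numbered blanks with candidate options. It assumes the hypotheses are already token-aligned and of equal length (real N-best lists need an alignment step first), and it omits the paper's logits calibration and SpeechGPT input entirely. The function name and the `[BLANKk]` marker are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def build_cloze(nbest: List[List[str]]) -> Tuple[str, List[List[str]]]:
    """Turn equal-length, pre-aligned N-best token lists into a cloze template.

    Positions where every hypothesis agrees are kept verbatim; positions
    where they differ become numbered blanks whose candidate tokens are
    ordered by how many hypotheses proposed them.
    """
    if not nbest or any(len(h) != len(nbest[0]) for h in nbest):
        raise ValueError("this sketch assumes equal-length, aligned hypotheses")
    template: List[str] = []
    options: List[List[str]] = []
    for tokens in zip(*nbest):
        counts = Counter(tokens)
        if len(counts) == 1:
            template.append(tokens[0])  # consensus token: no need to ask the LLM
        else:
            options.append([tok for tok, _ in counts.most_common()])
            template.append(f"[BLANK{len(options)}]")
    return " ".join(template), options
```

For example, the hypotheses "i like the sea", "i like a sea" and "i liked the see" collapse to the template "i [BLANK1] [BLANK2] [BLANK3]" with options `["like", "liked"]`, `["the", "a"]` and `["sea", "see"]`, so the model only has to choose among the few tokens that actually differ.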

arXiv:2405.09839

Advances in Robust Federated Learning: Heterogeneity Considerations

Authors: Chuan Chen, Tianchi Liao, Xiaojun Deng, Zihou Wu, Sheng Huang, Zibin Zheng

Abstract: In the field of heterogeneous federated learning (FL), the key challenge is to efficiently and collaboratively train models across multiple clients with different data distributions, model structures, task objectives, computational capabilities, and communication resources. This diversity leads to significant heterogeneity, which increases the complexity of model training. In this paper, we first outline the basic concepts of heterogeneous federated learning and summarize the research challenges in federated learning in terms of five aspects: data, model, task, device, and communication. In addition, we explore how existing state-of-the-art approaches cope with the heterogeneity of federated learning, and categorize and review these approaches at three different levels: data-level, model-level, and architecture-level. Subsequently, the paper extensively discusses privacy-preserving strategies in heterogeneous federated learning environments. Finally, the paper discusses current open issues and directions for future research, aiming to promote the further development of heterogeneous federated learning.

Submitted 16 May, 2024; originally announced May 2024.
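As a point of reference for the heterogeneity discussion, the sketch below shows plain FedAvg-style aggregation, where each client's parameters are weighted by its local sample count. This is the standard baseline, not a method from the survey; it only addresses unequal data quantities and assumes every client shares the same model structure, which is exactly the assumption that model-level heterogeneity breaks. Parameter names such as `"w"` are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged: Weights = {}
    for name, first in client_weights[0].items():
        # Plain averaging only works if every client has identically shaped parameters.
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(first))
        ]
    return averaged
```

A client holding three times as much data pulls the global model three times as hard: averaging `[1.0, 2.0]` (1 sample) with `[3.0, 6.0]` (3 samples) gives `[2.5, 5.0]`.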

arXiv:2405.09359

Visual Attention Based Cognitive Human-Robot Collaboration for Pedicle Screw Placement in Robot-Assisted Orthopedic Surgery

Authors: Chen Chen, Qikai Zou, Yuhang Song, Shiji Song, Xiang Li

Abstract: Current orthopedic robotic systems largely focus on navigation, aiding surgeons in positioning a guiding tube but still requiring manual drilling and screw placement. The automation of this task not only demands high precision and safety due to the intricate physical interactions between the surgical tool and bone but also poses significant risks when executed without adequate human oversight. As it involves continuous physical interaction, the robot should collaborate with the surgeon, understand the human intent, and always include the surgeon in the loop. To achieve this, this paper proposes a new cognitive human-robot collaboration framework, including the intuitive AR-haptic human-robot interface, the visual-attention-based surgeon model, and the shared interaction control scheme for the robot. User studies on a robotic platform for orthopedic surgery are presented to illustrate the performance of the proposed method. The results demonstrate that the proposed human-robot collaboration framework outperforms full robot and full human control in terms of safety and ergonomics.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 7 pages, 8 figures, submitted to IROS 2024

arXiv:2405.09184

SRG/ART-XC all-sky X-ray survey: Catalog of sources detected during the first five surveys

Authors: S. Sazonov, R. Burenin, E. Filippova, R. Krivonos, V. Arefiev, K. Borisov, M. Buntov, C. -T. Chen, S. Ehlert, S. Garanin, M. Garin, S. Grigorovich, I. Lapshov, V. Levin, A. Lutovinov, I. Mereminskiy, S. Molkov, M. Pavlinsky, B. D. Ramsey, A. Semena, N. Semena, A. Shtykovsky, R. Sunyaev, A. Tkachenko, D. A. Swartz, et al. (5 additional authors not shown)

Abstract: We present an updated catalog of sources detected by the Mikhail Pavlinsky ART-XC telescope aboard the Spektrum-Roentgen-Gamma (SRG) observatory during its all-sky survey. It is based on the data of the first four and the partially completed fifth scans of the sky (ARTSS1-5). The catalog comprises 1545 sources detected in the 4-12 keV energy band. The achieved sensitivity ranges between $\sim 4\times 10^{-12}$ erg s$^{-1}$ cm$^{-2}$ near the ecliptic plane and $\sim 7\times 10^{-13}$ erg s$^{-1}$ cm$^{-2}$ near the ecliptic poles, which is a $\sim$30-50% improvement over the previous version of the catalog based on the first two all-sky scans (ARTSS12). There are $\sim 130$ objects, excluding the expected contribution of spurious detections, that were not known as X-ray sources before the SRG/ART-XC all-sky survey. We provide information, partly based on our ongoing follow-up optical spectroscopy program, on the identification and classification of the majority of the ARTSS1-5 sources (1463), of which 173 are tentative at the moment. The majority of the classified objects (964) are extragalactic, a small fraction (30) are located in the Local Group of galaxies, and 469 are Galactic. The dominant classes of objects in the catalog are active galactic nuclei (911) and cataclysmic variables (192).

Submitted 15 May, 2024; originally announced May 2024.

Comments: Accepted for publication in A&A, 67 pages, 11 figures, the catalog of sources is included

arXiv:2405.09066

Search for the leptonic decays $D^{*+}\to e^+νにゅー_e$ and $D^{*+}\to μみゅー^+νにゅー_μみゅー$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, V. Batozskaya, D. Becker, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, J. Bloms, A. Bortone, I. Boyko, et al. (559 additional authors not shown)

Abstract: We present the first search for the leptonic decays $D^{*+}\to e^+νにゅー_e$ and $D^{*+}\to μみゅー^+νにゅー_μみゅー$ by analyzing a data sample of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.178 and 4.226 GeV, corresponding to an integrated luminosity of 6.32~fb$^{-1}$. No significant signal is observed. The upper limits on the branching fractions for $D^{*+}\to e^+νにゅー_e$ and $D^{*+}\to μみゅー^+νにゅー_μみゅー$ are set to be $1.1 \times 10^{-5}$ and $4.3 \times 10^{-6}$ at 90\% confidence level, respectively.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 14 pages, 7 figures
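When no significant signal is observed, the result is quoted as an upper limit. The sketch below computes the textbook classical upper limit on a Poisson mean for a given number of observed events, assuming zero expected background and no systematic uncertainties. It is not the BESIII procedure, which fits the data and folds in backgrounds and systematics; converting a signal-yield limit into a branching-fraction limit would further require dividing by the number of produced $D^{*+}$ mesons times the detection efficiency, neither of which is given in the abstract.

```python
import math


def poisson_cdf(n: int, mu: float) -> float:
    """P(N <= n) for N ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k  # P(k) = P(k-1) * mu / k
        total += term
    return total


def poisson_upper_limit(n_obs: int, cl: float = 0.90) -> float:
    """Classical upper limit on a Poisson mean with zero expected background.

    Solves P(N <= n_obs | mu_up) = 1 - cl by bisection; the CDF decreases
    monotonically as mu grows, so the root is unique.
    """
    alpha = 1.0 - cl
    lo, hi = 0.0, 1.0
    while poisson_cdf(n_obs, hi) > alpha:  # widen until the root is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For zero observed events the 90% limit is $\ln 10 \approx 2.30$ signal events; for one observed event it is about 3.89.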

arXiv:2405.08953

$b \to c l \barνにゅー$ decays at LHCb

Authors: Chen Chen

Abstract: This report introduces two recent measurements of semileptonic $b$-hadron decays at the LHCb experiment: a test of Lepton Flavour Universality (LFU) using $\bar{B}^0 \to D^{(*)+} l^- \barνにゅー_l$ decays, where $l\in\{μみゅー,τたう\}$, and a study of the $D^{*+}$ longitudinal polarisation in the $\bar{B}^0 \to D^{*+} τたう^- \barνにゅー_τたう$ decay. Including the new LFU-ratio results, the world average of $R(D)$ and $R(D^*)$ still shows a tension of more than three standard deviations with the Standard Model (SM) prediction, while the measured $D^{*+}$ longitudinal polarisation is compatible with the SM value.

Submitted 14 May, 2024; originally announced May 2024.

Comments: Contribution to the 2024 QCD session of the 58th Rencontres de Moriond
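For readers outside flavour physics, the two quantities in this report are conventionally defined as follows, written here with the muonic normalisation matching the $l\in\{μみゅー,τたう\}$ notation above; $λらむだ_{D^*}$ denotes the $D^{*+}$ helicity and $Γがんま$ the decay rate. Other experiments may normalise to light leptons $\ell = e, μみゅー$ instead.

```latex
R(D^{(*)}) = \frac{\mathcal{B}(\bar{B}^0 \to D^{(*)+} \tau^- \bar{\nu}_\tau)}
                  {\mathcal{B}(\bar{B}^0 \to D^{(*)+} \mu^- \bar{\nu}_\mu)},
\qquad
F_L^{D^*} = \frac{\Gamma(\bar{B}^0 \to D^{*+} \tau^- \bar{\nu}_\tau;\ \lambda_{D^*} = 0)}
                 {\Gamma(\bar{B}^0 \to D^{*+} \tau^- \bar{\nu}_\tau)}
```

Lepton flavour universality predicts that $R(D^{(*)})$ differs from unity only through the mass difference between the $τたう$ and the $μみゅー$, so a significant deviation from the SM value would signal new physics.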

arXiv:2405.08838

PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset

Authors: Yang Hou, Haitao Fu, Chuankai Chen, Zida Li, Haoyu Zhang, Jianjun Zhao

Abstract: With the rapid advancement of generative AI, multimodal deepfakes, which manipulate both audio and visual modalities, have drawn increasing public concern. Currently, deepfake detection has emerged as a crucial strategy in countering these growing threats. However, as a key factor in training and validating deepfake detectors, most existing deepfake datasets primarily focus on the visual modality, and the few that are multimodal employ outdated techniques and have audio content limited to a single language, thereby failing to represent the cutting-edge advancements and globalization trends in current deepfake technologies. To address this gap, we propose a novel, multilingual, and multimodal deepfake dataset: PolyGlotFake. It includes content in seven languages, created using a variety of cutting-edge and popular Text-to-Speech, voice cloning, and lip-sync technologies. We conduct comprehensive experiments using state-of-the-art detection methods on the PolyGlotFake dataset. These experiments demonstrate the dataset's significant challenges and its practical value in advancing research into multimodal deepfake detection.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 13 pages, 4 figures

MSC Class: 68T45; ACM Class: I.4.9

arXiv:2405.08493

Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study

Authors: Qinfeng Zhu, Yuan Fang, Yuanzhi Cai, Cheng Chen, Lei Fan

Abstract: Deep learning methods, especially Convolutional Neural Networks (CNN) and Vision Transformer (ViT), are frequently employed to perform semantic segmentation of high-resolution remotely sensed images. However, CNNs are constrained by their restricted receptive fields, while ViTs face challenges due to their quadratic complexity. Recently, the Mamba model, featuring linear complexity and a global receptive field, has gained extensive attention for vision tasks. In such tasks, images need to be serialized to form sequences compatible with the Mamba model. Numerous research efforts have explored scanning strategies to serialize images, aiming to enhance the Mamba model's understanding of images. However, the effectiveness of these scanning strategies remains uncertain. In this research, we conduct a comprehensive experimental investigation on the impact of mainstream scanning directions and their combinations on semantic segmentation of remotely sensed images. Through extensive experiments on the LoveDA, ISPRS Potsdam, and ISPRS Vaihingen datasets, we demonstrate that no single scanning strategy outperforms others, regardless of their complexity or the number of scanning directions involved. A simple, single scanning direction is deemed sufficient for semantic segmentation of high-resolution remotely sensed images. Relevant directions for future research are also recommended.

Submitted 14 May, 2024; originally announced May 2024.
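Serializing an image for Mamba amounts to choosing the order in which a grid of patch tokens is flattened into a 1-D sequence. The sketch below generates a few common orders for an h x w patch grid; the strategy names are hypothetical labels, and the abstract does not list exactly which directions or combinations the study evaluated. Multi-direction schemes typically run several such sequences and merge the outputs.

```python
from typing import List


def scan_order(h: int, w: int, strategy: str) -> List[int]:
    """Order in which an h x w grid of patches is fed to the model.

    Patch (r, c) has row-major index r * w + c.
    """
    def idx(r: int, c: int) -> int:
        return r * w + c

    if strategy == "row":  # left-to-right, top-to-bottom
        return [idx(r, c) for r in range(h) for c in range(w)]
    if strategy == "row_reversed":  # same path, traversed backwards
        return scan_order(h, w, "row")[::-1]
    if strategy == "column":  # top-to-bottom, left-to-right
        return [idx(r, c) for c in range(w) for r in range(h)]
    if strategy == "snake":  # alternate direction on every row
        return [
            idx(r, c)
            for r in range(h)
            for c in (range(w) if r % 2 == 0 else range(w - 1, -1, -1))
        ]
    raise ValueError(f"unknown strategy: {strategy}")


def serialize(patches: List[str], h: int, w: int, strategy: str) -> List[str]:
    """Reorder a row-major list of patch tokens into a 1-D sequence."""
    return [patches[i] for i in scan_order(h, w, strategy)]
```

On a 2 x 3 grid, `"row"` yields `[0, 1, 2, 3, 4, 5]`, `"column"` yields `[0, 3, 1, 4, 2, 5]` and `"snake"` yields `[0, 1, 2, 5, 4, 3]`. The study's finding is that a single such direction is already sufficient for this segmentation task.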

arXiv:2405.08483

RDPN6D: Residual-based Dense Point-wise Network for 6DoF Object Pose Estimation Based on RGB-D Images

Authors: Zong-Wei Hong, Yen-Yang Hung, Chu-Song Chen

Abstract: In this work, we introduce a novel method for calculating the 6DoF pose of an object using a single RGB-D image. Unlike existing methods that either directly predict objects' poses or rely on sparse keypoints for pose recovery, our approach addresses this challenging task using dense correspondence, i.e., we regress the object coordinates for each visible pixel. Our method leverages existing object detection methods. We incorporate a re-projection mechanism to adjust the camera's intrinsic matrix to accommodate cropping in RGB-D images. Moreover, we transform the 3D object coordinates into a residual representation, which can effectively reduce the output space and yield superior performance. We conducted extensive experiments to validate the efficacy of our approach for 6D pose estimation. Our approach outperforms most previous methods, especially in occlusion scenarios, and demonstrates notable improvements over the state-of-the-art methods. Our code is available at https://github.com/AI-Application-and-Integration-Lab/RDPN6D.

Submitted 14 May, 2024; originally announced May 2024.

Comments: Accepted by CVPR Workshop DLGC, 2024
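The abstract mentions adjusting the camera's intrinsic matrix to accommodate cropping. Under a standard pinhole model without distortion, cropping at offset (x0, y0) shifts the principal point, and resizing the crop scales the focal lengths and principal point. The sketch below shows that standard geometry; whether RDPN6D uses exactly this form (for example, its half-pixel convention) is an assumption, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def crop_resize_intrinsics(K: Matrix, x0: float, y0: float,
                           crop_w: float, crop_h: float,
                           out_w: float, out_h: float) -> Matrix:
    """Intrinsics after cropping [x0, x0+crop_w) x [y0, y0+crop_h) and resizing
    the crop to out_w x out_h (pinhole model, no distortion)."""
    sx, sy = out_w / crop_w, out_h / crop_h
    return [
        [K[0][0] * sx, K[0][1] * sx, (K[0][2] - x0) * sx],  # fx, skew, cx
        [0.0, K[1][1] * sy, (K[1][2] - y0) * sy],           # fy, cy
        [0.0, 0.0, 1.0],
    ]


def project(K: Matrix, p: List[float]) -> List[float]:
    """Project a camera-frame 3-D point (x, y, z) to pixel coordinates (u, v)."""
    x, y, z = p
    return [(K[0][0] * x + K[0][1] * y) / z + K[0][2], K[1][1] * y / z + K[1][2]]
```

A point projected with the adjusted matrix lands at the same place as taking its original pixel, subtracting the crop offset and scaling: with `fx = fy = 500`, principal point (320, 240) and a 320 x 240 crop at (160, 120) upsampled to 640 x 480, the new matrix has `fx = fy = 1000` and principal point (320, 240).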

arXiv:2405.08107

Studying geometry of the ultraluminous X-ray pulsar Swift J0243.6+6124 using X-ray and optical polarimetry

Authors: Juri Poutanen, Sergey S. Tsygankov, Victor Doroshenko, Sofia V. Forsblom, Peter Jenke, Philip Kaaret, Andrei V. Berdyugin, Dmitry Blinov, Vadim Kravtsov, Ioannis Liodakis, Anastasia Tzouvanou, Alessandro Di Marco, Jeremy Heyl, Fabio La Monaca, Alexander A. Mushtukov, George G. Pavlov, Alexander Salganik, Alexandra Veledina, Martin C. Weisskopf, Silvia Zane, Vladislav Loktev, Valery F. Suleimanov, Colleen Wilson-Hodge, Svetlana V. Berdyugina, Masato Kagitani, et al. (86 additional authors not shown)

Abstract: The discovery of pulsations from a number of ultraluminous X-ray (ULX) sources proved that accretion onto neutron stars can produce luminosities exceeding the Eddington limit by a couple of orders of magnitude. The conditions necessary to achieve such high luminosities, as well as the exact geometry of the accretion flow in the neutron star vicinity, are, however, a matter of debate. The pulse phase-resolved polarization measurements that became possible with the launch of the Imaging X-ray Polarimetry Explorer (IXPE) can be used to determine the pulsar geometry and its orientation relative to the orbital plane. They provide an avenue to test different theoretical models of ULX pulsars. In this paper we present the results of three IXPE observations of the first Galactic ULX pulsar Swift J0243.6+6124 during its 2023 outburst. We find strong variations of the polarization characteristics with the pulsar phase. The average polarization degree increased from about 5% to 15% as the flux dropped by a factor of three in the course of the outburst. The polarization angle (PA) as a function of the pulsar phase shows two peaks in the first two observations, but changes to a characteristic sawtooth pattern in the remaining data set. This is not consistent with a simple rotating vector model. Assuming the existence of an additional constant polarized component, we were able to fit the three observations with a common rotating vector model and obtain constraints on the pulsar geometry.
In particular, we find a pulsar angular momentum inclination with respect to the line of sight of 15-40 deg, a magnetic obliquity of 60-80 deg, and a pulsar spin position angle of -50 deg, which differs from the constant component PA of about 10 deg. Combining these X-ray measurements with the optical PA, we find evidence for a 30 deg misalignment between the pulsar spin and the binary orbital axis.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 13 pages, 10 figures, submitted to A&A
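The fitting approach described above combines a rotating vector model (RVM) for the pulsar with an additional constant polarized component. The sketch below uses one common form of the RVM, $\tan(\mathrm{PA}-χかい_p) = -\sin θしーた \sin φふぁい / (\sin i \cos θしーた - \cos i \sin θしーた \cos φふぁい)$, with inclination $i$, magnetic obliquity $θしーた$, spin position angle $χかい_p$ and pulse phase $φふぁい$; sign conventions differ between papers, so treat the signs as an assumption. Two components are combined by adding their Stokes Q and U, which is standard polarimetry. The function names and any parameter values used with them are illustrative, not the paper's fit.

```python
import math
from typing import Tuple


def wrap_pa(pa_deg: float) -> float:
    """Map a polarization angle onto [-90, 90) degrees (PA is defined modulo 180)."""
    return (pa_deg + 90.0) % 180.0 - 90.0


def rvm_pa(phase_deg: float, incl_deg: float, obliq_deg: float,
           chi_p_deg: float, phase0_deg: float = 0.0) -> float:
    """Rotating-vector-model PA at a given pulse phase (one common sign convention)."""
    i = math.radians(incl_deg)
    theta = math.radians(obliq_deg)
    dphi = math.radians(phase_deg - phase0_deg)
    num = -math.sin(theta) * math.sin(dphi)
    den = math.sin(i) * math.cos(theta) - math.cos(i) * math.sin(theta) * math.cos(dphi)
    return wrap_pa(chi_p_deg + math.degrees(math.atan2(num, den)))


def add_constant_component(flux_var: float, pd_var: float, pa_var_deg: float,
                           flux_const: float, pd_const: float,
                           pa_const_deg: float) -> Tuple[float, float]:
    """Combine two polarized components by summing Stokes Q and U.

    Returns the polarization degree and angle (degrees) of the total emission.
    """
    q = u = 0.0
    for flux, pd, pa in ((flux_var, pd_var, pa_var_deg),
                         (flux_const, pd_const, pa_const_deg)):
        q += flux * pd * math.cos(2.0 * math.radians(pa))  # Q = I * PD * cos(2 PA)
        u += flux * pd * math.sin(2.0 * math.radians(pa))  # U = I * PD * sin(2 PA)
    total = flux_var + flux_const
    return math.hypot(q, u) / total, wrap_pa(0.5 * math.degrees(math.atan2(u, q)))
```

Because Q and U add linearly while PA does not, a constant component with a PA different from the pulsar's can reshape the observed phase dependence of the PA, which is why adding it allows a single RVM geometry to describe observations whose PA curves look qualitatively different. Two equally bright components polarized at right angles cancel completely.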

arXiv:2405.08020

ReActXGB: A Hybrid Binary Convolutional Neural Network Architecture for Improved Performance and Computational Efficiency

Authors: Po-Hsun Chu, Ching-Han Chen

Abstract: Binary convolutional neural networks (BCNNs) provide a potential solution to reduce the memory requirements and computational costs associated with deep neural networks (DNNs). However, achieving a trade-off between performance and computational resources remains a significant challenge. Furthermore, the fully connected layer of BCNNs has become a significant computational bottleneck. This is mainly due to the conventional practice of excluding the input layer and fully connected layer from binarization to prevent a substantial loss in accuracy. In this paper, we propose a hybrid model named ReActXGB, in which we replace the fully connected layer of ReActNet-A with XGBoost. This modification aims to narrow the performance gap between BCNNs and real-valued networks while maintaining lower computational costs. Experimental results on the FashionMNIST benchmark demonstrate that ReActXGB outperforms ReActNet-A by 1.47% in top-1 accuracy, along with a reduction of 7.14% in floating-point operations (FLOPs) and 1.02% in model size.

Submitted 11 May, 2024; originally announced May 2024.

Comments: Accepted to ICCE-TW 2024
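The memory and compute savings of BCNNs come from replacing real-valued multiply-accumulates with bit operations. The sketch below shows the generic XNOR-popcount trick: weights and activations are binarized to {-1, +1} with sign(x), packed into integers, and their dot product is recovered from the number of agreeing bits. This is the textbook operation, not ReActNet-A's exact layer (which adds learnable shifts before the sign), and it does not model the XGBoost head; function names are hypothetical.

```python
from typing import List


def pack_signs(values: List[float]) -> int:
    """Binarize with sign(x) and pack into an int: bit i = 1 means +1, 0 means -1."""
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:  # sign(0) is taken as +1, a common convention in binary networks
            bits |= 1 << i
    return bits


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two {-1, +1} vectors of length n stored as bit masks.

    XNOR marks positions where the signs agree (product +1); the remaining
    positions contribute -1, so dot = matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n
```

For `[0.5, -1.2, 3.0, -0.1]` and `[1.0, 2.0, -0.3, -4.0]` the signs agree in two of four positions, giving a binary dot product of 0, identical to summing the products of the signs. The information lost by this sign step is why the input and final layers are usually left real-valued, which is the bottleneck ReActXGB targets.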

arXiv:2405.07870

Mapping the Invisible: A Framework for Tracking COVID-19 Spread Among College Students with Google Location Data

Authors: Prajindra Sankar Krishnan, Chai Phing Chen, Gamal Alkawsi, Sieh Kiong Tiong, Luiz Fernando Capretz

Abstract: The COVID-19 pandemic and the implementation of social distancing policies have rapidly changed people's visiting patterns, as reflected in mobility data collected by location trackers on cell phones. However, transmission is governed by the frequency and duration of concurrent occupancy at specific locations rather than by the number of visitors. Therefore, understanding how people interact in different locations is crucial for targeting policies and for informing contact tracing and prevention strategies. This study proposes an efficient way to reduce the spread of the virus among on-campus university students by developing in-house software, the Google History Location Extractor and Indicator, based on real-world human mobility data. The platform enables policymakers and researchers to explore possible future developments in the epidemic's spread and to simulate human mobility and the epidemic state under different epidemic control policies. It offers functions for determining potential contacts, assessing individual infection risks, and evaluating the effectiveness of on-campus policies. The proposed multi-functional platform facilitates the screening process by more accurately targeting potential virus carriers and aids in making informed decisions on epidemic control policies, ultimately contributing to preventing and managing future outbreaks.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 8 pages

Journal ref: Latin American Workshop on Data Fusion (LAFUSION 2023), November/2023, pp 1-8, Rio de Janeiro, Brazil
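Since transmission depends on concurrent occupancy rather than visit counts, a basic "potential contact" check reduces to finding pairs of people whose stays at the same place overlap in time for long enough. The sketch below does exactly that; the `Visit` record, the minute-based timestamps and the 15-minute threshold are hypothetical choices, and parsing an actual Google location-history export is not modeled.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Visit:
    person: str
    place: str
    start: int  # minutes since an arbitrary reference time
    end: int


def potential_contacts(visits: List[Visit],
                       min_overlap: int = 15) -> List[Tuple[str, str, str, int]]:
    """Pairs of people whose stays at the same place overlap by >= min_overlap minutes."""
    by_place: Dict[str, List[Visit]] = {}
    for v in visits:
        by_place.setdefault(v.place, []).append(v)
    contacts = []
    for place, stays in by_place.items():
        for a, b in combinations(stays, 2):
            if a.person == b.person:
                continue
            # Length of the intersection of [a.start, a.end] and [b.start, b.end].
            overlap = min(a.end, b.end) - max(a.start, b.start)
            if overlap >= min_overlap:
                first, second = sorted((a.person, b.person))
                contacts.append((first, second, place, overlap))
    return contacts
```

Two students who are in the library from minute 0 to 60 and from minute 30 to 90 share 30 minutes and are flagged, while a third who arrives at minute 55 overlaps the first by only 5 minutes and is not. Grouping by place first avoids comparing every pair of visits on campus.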

arXiv:2405.07741

Search for the radiative transition $χかい_{c1}(3872)\toγがんまψぷさい_2(3823)$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, et al. (635 additional authors not shown)

Abstract: Using 9.0 $\rm fb^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies from 4.178 to 4.278 GeV with the BESIII detector at the BEPCII collider, we perform the first search for the radiative transition $χかい_{c1}(3872)\toγがんまψぷさい_2(3823)$. No $χかい_{c1}(3872)\toγがんまψぷさい_2(3823)$ signal is observed. The upper limit on the ratio of branching fractions $\mathcal{B}(χかい_{c1}(3872)\toγがんまψぷさい_2(3823), ψぷさい_2(3823)\toγがんまχかい_{c1})/\mathcal{B}(χかい_{c1}(3872)\toπぱい^+πぱい^- J/ψぷさい)$ is set as 0.075 at the 90\% confidence level. Our result contradicts theoretical predictions under the assumption that the $χかい_{c1}(3872)$ is the pure charmonium state $χかい_{c1}(2P)$.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 8 pages, 2 figures

arXiv:2405.07594

RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration

Authors: Congjia Chen, Xiaoyu Jia, Yanhong Zheng, Yufu Qu

Abstract: Point cloud registration is a fundamental task for estimating rigid transformations between point clouds. Previous studies have used geometric information for feature extraction, matching, and transformation estimation. Recently, owing to the advancement of RGB-D sensors, researchers have attempted to utilize visual information to improve registration performance. However, these studies focused on extracting distinctive features by deep feature fusion, which cannot effectively mitigate the weaknesses of each feature type and cannot sufficiently leverage the valid information. In this paper, we propose a new feature combination framework, which applies a looser but more effective fusion and achieves better performance. An explicit filter based on transformation consistency is designed for the combination framework, which can overcome each feature's weakness. In addition, an adaptive threshold determined by the error distribution is proposed to extract more valid information from the two types of features. Owing to this design, our framework estimates more accurate correspondences and is applicable to both hand-crafted and learning-based feature descriptors. Experiments on ScanNet show that our method achieves state-of-the-art performance, with a rotation accuracy of 99.1%.

Submitted 13 May, 2024; originally announced May 2024.
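The abstract describes a transformation-consistency filter with an adaptive threshold derived from the error distribution, without giving its exact form. The sketch below illustrates the general idea under an explicit assumption: each correspondence is scored by its residual under a candidate rigid transform, and the threshold is set robustly from the residuals as median + k * 1.4826 * MAD (median absolute deviation), so it adapts to the noise level of each scan pair. Function names and the choice k = 3 are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def apply_rigid(R: Sequence[Sequence[float]], t: Sequence[float], p: Point) -> Point:
    """Return R @ p + t for a 3x3 rotation R and translation t."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def consistent_matches(src: List[Point], dst: List[Point],
                       R: Sequence[Sequence[float]], t: Sequence[float],
                       k: float = 3.0) -> Tuple[List[int], float]:
    """Keep correspondences whose residual under (R, t) is below an adaptive threshold.

    The threshold is median + k * 1.4826 * MAD of all residuals; 1.4826 makes
    the MAD a consistent estimate of the standard deviation for Gaussian noise.
    """
    residuals = [math.dist(apply_rigid(R, t, p), q) for p, q in zip(src, dst)]
    med = median(residuals)
    mad = median([abs(r - med) for r in residuals])
    threshold = med + k * 1.4826 * mad
    keep = [i for i, r in enumerate(residuals) if r <= threshold]
    return keep, threshold
```

With the identity transform, four correspondences that are off by 1-2 cm and one that is off by about 7 units, the threshold comes out near 6.5 cm and only the gross outlier is rejected. A fixed threshold would instead have to be tuned by hand for each sensor and scene scale.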

arXiv:2405.07577

doi 10.3847/2041-8213/ad4a68

Discovery of a shock-compressed magnetic field in the north-western rim of the young supernova remnant RX J1713.7-3946 with X-ray polarimetry

Authors: Riccardo Ferrazzoli, Dmitry Prokhorov, Niccolò Bucciantini, Patrick Slane, Jacco Vink, Martina Cardillo, Yi-Jung Yang, Stefano Silvestri, Ping Zhou, Enrico Costa, Nicola Omodei, C. -Y. Ng, Paolo Soffitta, Martin C. Weisskopf, Luca Baldini, Alessandro Di Marco, Victor Doroshenko, Jeremy Heyl, Philip Kaaret, Dawoon E. Kim, Frédéric Marin, Tsunefumi Mizuno, Melissa Pesce-Rollins, Carmelo Sgrò, Douglas A. Swartz, et al. (77 additional authors not shown)

Abstract: Supernova remnants (SNRs) provide insights into cosmic-ray acceleration and magnetic field dynamics at shock fronts. Recent X-ray polarimetric measurements by the Imaging X-ray Polarimetry Explorer (IXPE) have revealed radial magnetic fields near particle acceleration sites in young SNRs, including Cassiopeia A, Tycho, and SN 1006. We present here the spatially-resolved IXPE X-ray polarimetric observation of the northwestern rim of SNR RX J1713.7-3946. For the first time, our analysis shows that the magnetic field in particle acceleration sites of this SNR is oriented tangentially with respect to the shock front. Because of the lack of precise Faraday-rotation measurements in the radio band, this was not possible before. The average measured polarization degree (PD) of the synchrotron emission is 12.5 ± 3.3%, lower than that measured by IXPE in SN 1006, comparable to that of Tycho, but notably higher than that of Cassiopeia A. On sub-parsec scales, localized patches within RX J1713.7-3946 display PD up to 41.5 ± 9.5%. These results are compatible with a shock-compressed magnetic field. However, to explain the observed PD, either the presence of a radial net magnetic field upstream of the shock or partial reisotropization of the turbulence downstream by radial magneto-hydrodynamical instabilities can be invoked. From a comparison of the PD and magnetic field distribution with γがんま-ray and $^{12}$CO data, our results provide new evidence in favor of a leptonic origin of the γがんま-ray emission.

Submitted 10 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: 18 pages, 6 figures, 2 tables, published in ApJ Letters

Journal ref: ApJL 967 L38 (2024)

arXiv:2405.07409

ZBanner: Fast Stateless Scanning Capable of Obtaining Responses over TCP

Authors: Chiyu Chen, Yuliang Lu, Guozheng Yang, Yi Xie, Shasha Guo

Abstract: Fast large-scale network scanning is an important way to understand internet service configurations and security in real time, and stateless scanning is a representative approach. Existing stateless scanners can perform single-packet scans for internet-wide network measurements but are limited to host discovery or port scanning. To obtain further information over TCP, slower stateful scanners must be used in conjunction; these spend more time and memory because they maintain connection state. By simplifying the TCP finite state machine, this paper proposes a novel stateless scanning model that can establish TCP connections and obtain further responses in a completely stateless manner. Based on this model, we implement ZBanner, an improved modular stateless scanner that uses user-defined probes to identify services and versions, fingerprint TLS servers, etc. We present the unique design of ZBanner and experimentally characterize its feasibility and performance. Experiments show that ZBanner outperforms current state-of-the-art solutions in terms of scan rate and memory usage. ZBanner is at least three times faster than current tools for generic ports and over 90 times faster for open ports, while keeping memory usage minimal and stable.

Submitted 12 May, 2024; originally announced May 2024.

Comments: The paper has been submitted and the code will be published later

arXiv:2405.07386 [pdf, other]

Search for lepton-flavor-violating $\tau^- \to \mu^-\mu^+\mu^-$ decays at Belle II

Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Aihara, N. Akopov, A. Aloisio, N. Althubiti, N. Anh Ky, D. M. Asner, H. Atmacan, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, J. Baudot, A. Baur, A. Beaubien, F. Becherer, J. Becker, et al. (407 additional authors not shown)

Abstract: We present the result of a search for the charged-lepton-flavor-violating decay $\tau^- \to \mu^-\mu^+\mu^-$ using a $424\,\mathrm{fb}^{-1}$ sample of data recorded by the Belle II experiment at the SuperKEKB $e^{-}e^{+}$ collider. The selection of $e^{-}e^{+}\to\tau^+\tau^-$ events is based on an inclusive reconstruction of the non-signal tau decay, and on a boosted decision tree to suppress background. We observe one signal candidate, which is compatible with the expectation from background processes. We set a $90\%$ confidence level upper limit of $1.9 \times 10^{-8}$ on the branching fraction of the $\tau^- \to \mu^-\mu^+\mu^-$ decay, which is the most stringent bound to date.
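To make the step from "one observed candidate" to an upper limit concrete, here is a toy classical (Neyman) Poisson upper limit on the expected number of signal events, given the observed count and an assumed known background mean. This is an illustration only: the abstract does not state the statistical procedure used, and converting an event limit into a branching-fraction limit also requires the signal efficiency and the number of produced $\tau$ pairs, which are not given here. The function names are hypothetical.

```python
import math


def poisson_cdf(n: int, mu: float) -> float:
    """Return P(N <= n) for N ~ Poisson(mu)."""
    term = math.exp(-mu)  # P(N = 0)
    total = term
    for k in range(1, n + 1):
        term *= mu / k  # P(N = k) obtained from P(N = k - 1)
        total += term
    return total


def classical_upper_limit(n_obs: int, background: float, cl: float = 0.90) -> float:
    """Classical upper limit on the signal mean s: solve P(N <= n_obs | s + b) = 1 - cl by bisection."""
    lo, hi = 0.0, 100.0  # bracket assumed wide enough for small observed counts
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + background) > 1.0 - cl:
            lo = mid  # this signal mean is still allowed; move the limit up
        else:
            hi = mid
    return hi
```

For one observed event and zero background the 90% limit is about 3.89 signal events. Because only the sum s + b enters, every extra unit of background lowers the classical limit by one unit, which is one motivation for modified procedures such as CLs.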

Submitted 12 May, 2024; originally announced May 2024.

Report number: Belle II Preprint 2024-012, KEK Preprint 2024-6

arXiv:2405.07142 [pdf, other]

Cross-Domain Continual Learning via CLAMP

Authors: Weiwei Weng, Mahardhika Pratama, Jie Zhang, Chen Chen, Edward Yapp Kien Yee, Ramasamy Savitha

Abstract: Artificial neural networks, celebrated for their human-like cognitive learning abilities, often encounter the well-known catastrophic forgetting (CF) problem, in which a network loses proficiency in previously acquired knowledge. Despite numerous efforts to mitigate CF, it remains a significant challenge, particularly in complex, changing environments. This challenge is even more pronounced in cross-domain adaptation under the continual learning (CL) setting, a more challenging and realistic scenario that is under-explored. To this end, this article proposes a cross-domain CL approach that makes it possible to deploy a single model in such environments without additional labelling costs. Our approach, namely continual learning approach for many processes (CLAMP), integrates a class-aware adversarial domain adaptation strategy to align a source domain and a target domain. An assessor-guided learning process is put forward to steer the learning of a base model: it assigns a set of weights to every sample, controlling the influence of each sample and the interaction of each loss function so as to balance the stability-plasticity dilemma and thus prevent CF. The first assessor addresses the negative transfer problem by rejecting irrelevant samples of the source domain, while the second assessor guards against noisy pseudo labels of the target domain. Both assessors are trained with a meta-learning approach using random transformation techniques and similar samples of the source domain.
Theoretical analysis and extensive numerical validations demonstrate that CLAMP significantly outperforms established baseline algorithms across all experiments by a margin of at least $10\%$.

Submitted 11 May, 2024; originally announced May 2024.

Comments: Under Review in Elsevier Journal

arXiv:2405.07090 [pdf, other]

MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling

Authors: Sidong Feng, Suyu Ma, Han Wang, David Kong, Chunyang Chen

Abstract: The importance of computational modeling of mobile user interfaces (UIs) is undeniable. However, such modeling requires a high-quality UI dataset. Existing datasets are often outdated, having been collected years ago, and are frequently noisy, with mismatches in their visual representation. This makes it challenging to model UI understanding in the wild. This paper introduces a novel approach to automatically mine UI data from Android apps, leveraging Large Language Models (LLMs) to mimic human-like exploration. To ensure dataset quality, we employ best practices in UI noise filtering and incorporate human annotation as a final validation step. Our results demonstrate the effectiveness of LLM-enhanced app exploration in mining more meaningful UIs, resulting in MUD, a large dataset of 18k human-annotated UIs from 3.3k apps. We highlight the usefulness of MUD in two common UI modeling tasks, element detection and UI retrieval, showcasing its potential to establish a foundation for future research into high-quality, modern UIs.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.06556 [pdf, other]

Search for time-dependent $CP$ violation in $D^0 \rightarrow \pi^+ \pi^- \pi^0$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1062 additional authors not shown)

Abstract: A measurement of time-dependent $CP$ violation in $D^0 \rightarrow \pi^+ \pi^- \pi^0$ decays is presented, using a $pp$ collision data sample collected by the LHCb experiment in 2012 and from 2015 to 2018, corresponding to an integrated luminosity of 7.7$\,\mathrm{fb}^{-1}$. The initial flavour of each $D^0$ candidate is determined from the charge of the pion produced in the $D^*(2010)^+ \rightarrow D^0 \pi^+$ decay. The decay $D^0 \rightarrow K^- \pi^+ \pi^0$ is used as a control channel to validate the measurement procedure. The gradient of the time-dependent $CP$ asymmetry, $\Delta Y$, in $D^0 \rightarrow \pi^+ \pi^- \pi^0$ decays is measured to be \begin{equation*} \Delta Y = (-1.3 \pm 6.3 \pm 2.4) \times 10^{-4}, \end{equation*} where the first uncertainty is statistical and the second is systematic. This result is compatible with $CP$ conservation.
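As a quick check of the closing statement, combining the two uncertainties in quadrature (assuming the statistical and systematic components are uncorrelated, which the abstract does not state) gives:

```latex
\[
\sigma_{\mathrm{tot}} = \sqrt{6.3^{2} + 2.4^{2}} \times 10^{-4} = \sqrt{45.45} \times 10^{-4} \approx 6.7 \times 10^{-4},
\qquad
\frac{|\Delta Y|}{\sigma_{\mathrm{tot}}} \approx \frac{1.3}{6.7} \approx 0.19
\]
```

so the central value lies about 0.2 standard deviations from $\Delta Y = 0$, consistent with $CP$ conservation.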

Submitted 10 May, 2024; originally announced May 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://lhcbproject.web.cern.ch/Publications/p/LHCb-PAPER-2024-003.html (LHCb public pages)

Report number: LHCb-PAPER-2024-003, CERN-EP-2024-111

arXiv:2405.06449 [pdf]

doi 10.1002/adma.202401118

Disorder-broadened phase boundary with enhanced amorphous superconductivity in pressurized In2Te5

Authors: Yi Zhao, Tianping Ying, Lingxiao Zhao, Juefei Wu, Cuiying Pei, Jing Chen, Jun Deng, Qinghua Zhang, Lin Gu, Qi Wang, Weizheng Cao, Changhua Li, Shihao Zhu, Mingxin Zhang, Na Yu, Lili Zhang, Yulin Chen, Chui-Zhen Chen, Tongxu Yu, Yanpeng Qi

Abstract: As an empirical tool in materials science and engineering, the iconic phase diagram owes its robustness and practicality to topological characteristics rooted in the celebrated Gibbs phase rule (F = C - P + 2). When a phase boundary is crossed, the structural transition occurs abruptly, bringing about an instantaneous change in physical properties and limited controllability on the boundaries (F = 1). Here, we expand the sharp phase boundary into an amorphous transition region (F = 2) by partially disrupting the long-range translational symmetry, leading to a sequential crystalline-amorphous-crystalline (CAC) transition in a pressurized In2Te5 single crystal. Through detailed in-situ synchrotron diffraction, we elucidate that the phase transition stems from the rotation of immobile [In2Te2]2+ blocks linked by hinge-like [Te3]2- trimers. Remarkably, the amorphous phase within this region shows a notable 25% increase in the superconducting transition temperature (Tc), while the carrier concentration remains relatively constant. Furthermore, we propose a theoretical framework revealing that the unconventional boost in amorphous superconductivity might be attributed to an intensified electron correlation, triggered by disorder-augmented multifractal behavior. These findings underscore the potential of disorder and prompt further exploration of unforeseen phenomena at phase boundaries.
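The two values of F quoted above follow directly from the phase rule. Treating In2Te5 as a single component (C = 1, an assumption the abstract does not spell out), a sharp boundary where two crystalline phases coexist and a single-phase region such as the amorphous one give:

```latex
\[
F = C - P + 2, \qquad C = 1 \;\Rightarrow\;
\begin{cases}
P = 2 \ (\text{two phases coexisting on a sharp boundary}): & F = 1 - 2 + 2 = 1,\\
P = 1 \ (\text{a single-phase region}): & F = 1 - 1 + 2 = 2.
\end{cases}
\]
```

On this reading, a sharp boundary is a line in the pressure-temperature plane (fixing the pressure fixes the transition temperature), whereas a region with F = 2 lets pressure and temperature vary independently.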

Submitted 10 May, 2024; originally announced May 2024.

Comments: 14 pages, 4 figures, Accepted for publication in Advanced Materials

arXiv:2405.06393 [pdf, other]

Measurement of the $e^{+}e^{-}\to p\bar{p}\pi^{0}$ cross section at $\sqrt{s}=2.1000-3.0800$ GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

Abstract: The process $e^{+}e^{-}\to p\bar{p}\pi^{0}$ is studied at 20 center-of-mass energies ranging from 2.1000 to 3.0800 GeV using 636.8 pb$^{-1}$ of data collected with the BESIII detector operating at the BEPCII collider. The Born cross sections for $e^{+}e^{-}\to p\bar{p}\pi^{0}$ are measured with high precision. Since the lowest center-of-mass energy, 2.1000 GeV, is less than 90 MeV above the $p\bar{p}\pi^0$ energy threshold, we can probe the threshold behavior of this reaction. However, no anomalous threshold enhancement is found in the cross sections for $e^{+}e^{-}\to p\bar{p}\pi^{0}$.
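The "less than 90 MeV" statement can be checked from the standard PDG masses, $m_p \approx 938.272$ MeV and $m_{\pi^0} \approx 134.977$ MeV:

```latex
\[
E_{\mathrm{thr}} = 2m_p + m_{\pi^0} \approx 2(938.272) + 134.977 = 2011.521\ \mathrm{MeV},
\qquad
2100.0 - 2011.5 \approx 88.5\ \mathrm{MeV} < 90\ \mathrm{MeV}.
\]
```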

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.06262 [pdf, other]

Creating cyclo-N$_5^{+}$ cation and assembling N$_5^{+}$N$_5^{-}$ salt via electronegativity co-matching in tailored ionic compounds

Authors: Bi Zhang, Yu Xin, Meiling Xu, Yiming Zhang, Yinwei Li, Yanchao Wang, Changfeng Chen

Abstract: The recent discovery of crystalline pentazolates marks a major advance in polynitrogen science and raises prospects of making the long-touted potent propellant N$_5^{+}$N$_5^{-}$ salt. However, despite the synthesis of the cyclo-N$_5^{-}$ anion in pentazolates, the counter cation cyclo-N$_5^{+}$ remains elusive due to the strong oxidizing power of the pentazole ion; moreover, pure N$_5^{+}$N$_5^{-}$ salt is known to be unstable. Here, we devise a new strategy for making the rare cyclo-N$_5^{+}$ cation and assembling the long-sought N$_5^{+}$N$_5^{-}$ salt in tailored ionic compounds, wherein the negative/positive host ions act as oxidizing/reducing agents to form cyclo-N$_5^{+}$/N$_5^{-}$ species. This strategy is implemented via an advanced computational crystal structure search, which identifies XN$_5$N$_5$F (X = Li, Na, K) compounds that stabilize at high pressures and, based on \textit{ab initio} molecular dynamics simulations, remain viable at ambient pressure-temperature conditions. This finding opens an avenue for creating and stabilizing N$_5^{+}$N$_5^{-}$ salt assemblies in ionic compounds, where cyclo-N$_5$ species are oxidized/reduced via co-matching with host ions of high/low electronegativity. The present results demonstrate novel polynitrogen chemistry and offer new insights and prospects for the design and synthesis of diverse chemical species that exhibit unusual charge states, bonding structures, and superior functionality.

Submitted 10 May, 2024; originally announced May 2024.

Comments: 6 pages, 5 figures

arXiv:2405.06038 [pdf, other]

From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks

Authors: Xue Geng, Zhe Wang, Chunyun Chen, Qing Xu, Kaixin Xu, Chao Jin, Manas Gupta, Xulei Yang, Zhenghua Chen, Mohamed M. Sabry Aly, Jie Lin, Min Wu, Xiaoli Li

Abstract: Deep neural networks (DNNs) have been widely used in many artificial intelligence (AI) tasks. However, deploying them brings significant challenges due to their large memory, energy, and computation costs. To address these challenges, researchers have developed various model compression techniques such as model quantization and model pruning. Recently, there has been a surge in research on compression methods that achieve model efficiency while retaining performance. Furthermore, a growing number of works focus on customizing DNN hardware accelerators to better leverage model compression techniques. In addition to efficiency, preserving security and privacy is critical for deploying DNNs. However, the vast and diverse body of related work can be overwhelming. This motivates us to conduct a comprehensive survey of recent research toward high-performance, cost-efficient, and safe deployment of DNNs. Our survey first covers mainstream model compression techniques such as model quantization, model pruning, knowledge distillation, and optimizations of non-linear operations. We then introduce recent advances in designing hardware accelerators that can adapt to efficient model compression approaches. Additionally, we discuss how homomorphic encryption can be integrated to secure DNN deployment. Finally, we discuss several issues, such as hardware evaluation, generalization, and integration of various compression approaches.
Overall, we aim to provide a big picture of efficient DNNs from the algorithm, hardware accelerator, and security perspectives.
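Model quantization, the first compression technique the survey lists, can be illustrated with its simplest variant: symmetric per-tensor int8 quantization, where each weight is stored as an integer q in [-127, 127] and recovered as scale × q. This dependency-free Python sketch is a generic textbook scheme chosen for illustration, not a method taken from the survey; the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w ≈ scale * q with q an integer in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # The largest-magnitude weight maps to +/-127; an all-zero tensor gets a dummy scale.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map stored integers back to approximate float weights."""
    return [scale * v for v in q]
```

Rounding to the nearest integer bounds each weight's reconstruction error by half the scale. Because the single largest-magnitude weight sets that scale, one outlier reduces the precision available to every other weight in the tensor.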

Submitted 9 May, 2024; originally announced May 2024.

Comments: This manuscript is the accepted version for TNNLS (IEEE Transactions on Neural Networks and Learning Systems)

arXiv:2405.05953 [pdf, other]

Frame Interpolation with Consecutive Brownian Bridge Diffusion

Authors: Zonglin Lyu, Ming Li, Jianbo Jiao, Chen Chen

Abstract: Recent work in Video Frame Interpolation (VFI) formulates VFI as a diffusion-based conditional image generation problem, synthesizing the intermediate frame given random noise and the neighboring frames. Due to the relatively high resolution of videos, Latent Diffusion Models (LDMs) are employed as the conditional generation model, where an autoencoder compresses images into latent representations for diffusion and then reconstructs images from these latent representations. Such a formulation poses a crucial challenge: VFI expects the output to deterministically equal the ground-truth intermediate frame, but LDMs randomly generate a diverse set of images when the model is run multiple times. The diverse generation arises because the cumulative variance (variance accumulated at each step of generation) of the generated latent representations in LDMs is large. This makes the sampling trajectory random, resulting in diverse rather than deterministic generations. To address this problem, we propose our unique solution: Frame Interpolation with Consecutive Brownian Bridge Diffusion. Specifically, we propose consecutive Brownian Bridge diffusion that takes a deterministic initial value as input, resulting in a much smaller cumulative variance of generated latent representations. Our experiments suggest that our method improves as the autoencoder improves and achieves state-of-the-art performance in VFI, leaving strong potential for further enhancement.
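The variance argument can be seen in the marginal of a standard Brownian bridge pinned at both ends: its mean interpolates linearly between the endpoints, and its variance t(T - t)/T vanishes at t = 0 and t = T, unlike a diffusion that starts from pure noise. The Python sketch below samples that standard marginal for scalar values; it is not the paper's noise schedule or model, and `bridge_variance` and `sample_bridge` are hypothetical names.

```python
import math
import random


def bridge_variance(t: float, T: float) -> float:
    """Variance of a standard Brownian bridge pinned at times 0 and T."""
    return t * (T - t) / T


def sample_bridge(x0: float, xT: float, t: float, T: float, rng: random.Random) -> float:
    """Draw x_t from the bridge marginal: linear interpolation of the endpoints plus Gaussian noise."""
    mean = (1.0 - t / T) * x0 + (t / T) * xT
    return mean + math.sqrt(bridge_variance(t, T)) * rng.gauss(0.0, 1.0)
```

Because both endpoints are fixed, sampling starts from a deterministic value and the spread of intermediate states is largest at the midpoint (T/4), which is the intuition behind the smaller cumulative variance described above.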

Submitted 11 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: corrected typo

arXiv:2405.05745 [pdf, other]

Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation

Authors: Chen Chen, Kai Qiao, Jie Yang, Jian Chen, Bin Yan

Abstract: Element segmentation is a key step in nondestructive testing of Printed Circuit Boards (PCB) based on Computed Tomography (CT) technology. In recent years, rapidly developing self-supervised pretraining techniques can obtain general image features without labeled samples and then use a small number of labeled samples to solve downstream tasks, which gives them good potential for PCB element segmentation. At present, Masked Image Modeling (MIM) pretraining models have been initially applied to PCB CT image element segmentation. However, because PCB elements such as vias, wires, and pads are small and regular in size, the global visual field is redundant for reconstructing a single element, which may degrade the performance of the model. To address this issue, we propose an efficient pretraining model based on multi-scale local visual field feature reconstruction for PCB CT image element segmentation (EMLR-seg). In this model, the teacher-guided MIM pretraining model is introduced into PCB CT image element segmentation for the first time, and a multi-scale local visual field extraction (MVE) module is proposed to reduce redundancy by focusing on local visual fields. At the same time, a simple decoder of four Transformer blocks is used. Experiments show that EMLR-seg achieves 88.6% mIoU on the PCB CT image dataset we propose, exceeding the baseline model by 1.2%, while training time is reduced by 29.6 hours (17.4%) under the same experimental conditions, which reflects the advantage of EMLR-seg in both performance and efficiency.
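The reported efficiency figures imply approximate absolute training times. Assuming the 17.4% reduction is measured relative to the baseline and that the 1.2% gain is in percentage points of mIoU (neither is stated explicitly):

```latex
\[
t_{\mathrm{baseline}} \approx \frac{29.6\ \mathrm{h}}{0.174} \approx 170.1\ \mathrm{h},
\qquad
t_{\mathrm{EMLR\text{-}seg}} \approx 170.1 - 29.6 = 140.5\ \mathrm{h},
\qquad
\mathrm{mIoU}_{\mathrm{baseline}} \approx 88.6\% - 1.2\% = 87.4\%.
\]
```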

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05496 [pdf, other]

Boosting Large Language Models with Continual Learning for Aspect-based Sentiment Analysis

Authors: Xuanwen Ding, Jie Zhou, Liang Dou, Qin Chen, Yuanbin Wu, Chengcai Chen, Liang He

Abstract: Aspect-based sentiment analysis (ABSA) is an important subtask of sentiment analysis, which aims to extract the aspects and predict their sentiments. Most existing studies focus on improving the performance of the target domain by fine-tuning domain-specific models (trained on source domains) on the target domain dataset. Few works propose continual learning tasks for ABSA, which aim to learn the target domain's ability while maintaining the abilities learned on historical domains. In this paper, we propose a Large Language Model-based Continual Learning (\texttt{LLM-CL}) model for ABSA. First, we design a domain knowledge decoupling module to learn a domain-invariant adapter and separate domain-variant adapters dependently with an orthogonal constraint. Then, we introduce a domain knowledge warmup strategy to align the representation between domain-invariant and domain-variant knowledge. In the test phase, we index the corresponding domain-variant knowledge via domain positioning, so that each sample's domain ID is not required. Extensive experiments over 19 datasets indicate that our \texttt{LLM-CL} model obtains new state-of-the-art performance.
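The abstract does not give the form of the orthogonal constraint. A common choice for keeping two adapters in complementary subspaces is to penalize the squared Frobenius norm of the product of their weight matrices, ||AᵀB||²_F, which is zero exactly when every column of the domain-invariant adapter A is orthogonal to every column of a domain-variant adapter B. The dependency-free Python sketch below assumes this formulation; `gram` and `orthogonality_penalty` are hypothetical names, and the paper may use a different constraint.

```python
from typing import List

Matrix = List[List[float]]  # stored as a list of rows


def gram(A: Matrix, B: Matrix) -> Matrix:
    """Return A^T B for A of shape d x ra and B of shape d x rb."""
    d = len(A)
    assert d == len(B), "adapters must share the hidden dimension d"
    ra, rb = len(A[0]), len(B[0])
    return [[sum(A[k][i] * B[k][j] for k in range(d)) for j in range(rb)]
            for i in range(ra)]


def orthogonality_penalty(A: Matrix, B: Matrix) -> float:
    """Squared Frobenius norm ||A^T B||_F^2; zero iff all columns of A are orthogonal to all columns of B."""
    return sum(v * v for row in gram(A, B) for v in row)
```

In training, a term like this would be added to the task loss with a weighting coefficient, so that gradient descent discourages the two adapters from encoding overlapping directions.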

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.04719 [pdf, other]

First detection of CF$^{+}$ in the Large Magellanic Cloud

Authors: Yan Gong, Karl M. Menten, Arshia M. Jacob, Christian Henkel, C. -H. Rosie Chen

Abstract: CF$^{+}$ has been established as a valuable diagnostic tool for investigating photo-dissociation regions (PDRs) and fluorine abundances in the Milky Way. However, its role in extragalactic environments remains largely uncharted. Our objective is to explore the significance of CF$^{+}$ in the Large Magellanic Cloud (LMC) and assess its utility as a probe for examining C$^{+}$ and fluorine abundances in external galaxies. We performed pointed CF$^{+}$ observations toward an active star-forming region, N113 in the LMC, using the Atacama Pathfinder EXperiment 12~m sub-millimeter telescope. We report the first discovery of CF$^{+}$ in the LMC through the successful detection of the CF$^{+}$ (2$\to$1) and (3$\to$2) lines. The excitation models indicate that CF$^{+}$ emission originates from dense PDRs characterized by an H$_{2}$ number density of $(0.5-7.9)\times 10^{4}$~cm$^{-3}$ in N113. Our observations provide the first constraint on the fluorine abundance in molecular clouds in the LMC, yielding a value of $\lesssim 1.7\times 10^{-9}$. This value is about an order of magnitude lower than those previously measured toward red giants in the LMC, indicating fluorine deficiency in the molecular gas. The estimated column density ratio between C$^{+}$ and CF$^{+}$ appears to be lower than the equilibrium ratio anticipated from the fluorine abundance in red giants. Both phenomena can be explained by a deficiency of CF$^{+}$ caused by the freeze-out of its primary chemical precursor, HF, onto dust grains.
The deficiency of CF$^{+}$ within molecular clouds suggests that the measurements presented in this work serve only as conservative estimates, establishing lower bounds for both the fluorine abundance and C$^{+}$ column densities in external galaxies.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 9 pages, 6 figures, 1 table, accepted for publication in A&A

arXiv:2405.04589 [pdf, other]

A Novel Wide-Area Multiobject Detection System with High-Probability Region Searching

Authors: Xianlei Long, Hui Zhao, Chao Chen, Fuqiang Gu, Qingyi Gu

Abstract: In recent years, wide-area visual surveillance systems have been widely applied in various industrial and transportation scenarios. These systems, however, face significant challenges when implementing multi-object detection due to conflicts arising from the need for high-resolution imaging, efficient object searching, and accurate localization. To address these challenges, this paper presents a hybrid system that incorporates a wide-angle camera, a high-speed search camera, and a galvano-mirror. In this system, the wide-angle camera offers panoramic images as prior information, which helps the search camera capture detailed images of the targeted objects. This integrated approach enhances the overall efficiency and effectiveness of wide-area visual detection systems. Specifically, in this study, we introduce a wide-angle camera-based method to generate a panoramic probability map (PPM) for estimating high-probability regions of target object presence. Then, we propose a probability searching module that uses the PPM-generated prior information to dynamically adjust the sampling range and refine target coordinates based on uncertainty variance computed by the object detector. Finally, the integration of PPM and the probability searching module yields an efficient hybrid vision system capable of achieving 120 fps multi-object search and detection. Extensive experiments are conducted to verify the system's effectiveness and robustness.

Submitted 7 May, 2024; originally announced May 2024.

Comments: Accepted by ICRA 2024

Journal ref: 2024 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2405.04041 [pdf, other]

Feature Map Convergence Evaluation for Functional Module

Authors: Ludan Zhang, Chaoyi Chen, Lei He, Keqiang Li

Abstract: Autonomous driving perception models are typically composed of multiple functional modules that interact through complex relationships to accomplish environment understanding. However, perception models are predominantly optimized as a black box through end-to-end training, lacking independent evaluation of functional modules, which poses difficulties for interpretability and optimization. To address this issue, we propose an evaluation method based on feature map analysis that gauges the convergence of a model, thereby assessing the training maturity of functional modules. We construct a quantitative metric named the Feature Map Convergence Score (FMCS) and develop the Feature Map Convergence Evaluation Network (FMCE-Net) to measure and predict, respectively, the degree of convergence of models. FMCE-Net achieves remarkable predictive accuracy for FMCS across multiple image classification experiments, validating the efficacy and robustness of the introduced approach. To the best of our knowledge, this is the first independent evaluation method for functional modules, offering a new paradigm for assessing the training of perception models.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.03357 [pdf, ps, other]

doi 10.1145/3659463.3660021

A Game Theoretic Analysis of Validator Strategies in Ethereum 2.0

Authors: Chien-Chih Chen, Wojciech Golab

Abstract: Ethereum 2.0 is the second-largest cryptocurrency by market capitalization and a widely used smart contract platform. Therefore, examining the reliability of Ethereum 2.0's incentive mechanism is crucial, particularly its effectiveness in encouraging validators to adhere to Ethereum 2.0's protocol. This paper studies the incentive mechanism of Ethereum 2.0 and evaluates its robustness by analyzing the interaction between block proposers and attesters in a single slot. To this end, we use Bayesian games to model the strategies of block proposers and attesters and calculate their expected utilities. Our results demonstrate that the Ethereum 2.0 incentive mechanism is incentive-compatible and promotes cooperation among validators. We prove that a Bayesian Nash equilibrium and an ex ante dominant strategy exist between the block proposer and attesters in a single slot. Our research provides a solid foundation for further analysis of Ethereum 2.0's incentive mechanism and offers insights for individuals considering participating as validators in Ethereum 2.0.
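In a Bayesian game, a player who is uncertain about the other player's type evaluates each action by its expected utility under a prior over types and then picks the best response. The generic Python sketch below shows that computation only; the types, actions, and payoff values are hypothetical placeholders chosen for illustration, not Ethereum's actual rewards or the paper's model.

```python
from typing import Dict, Hashable, Sequence, Tuple


def expected_utility(prior: Dict[Hashable, float],
                     payoff: Dict[Tuple[Hashable, Hashable], float],
                     action: Hashable) -> float:
    """Expected payoff of `action` when the other player's type is drawn from `prior`."""
    return sum(p * payoff[(action, t)] for t, p in prior.items())


def best_response(prior: Dict[Hashable, float],
                  payoff: Dict[Tuple[Hashable, Hashable], float],
                  actions: Sequence[Hashable]) -> Hashable:
    """Action with the highest expected utility (ties go to the earliest action listed)."""
    return max(actions, key=lambda a: expected_utility(prior, payoff, a))


# Hypothetical toy numbers for illustration only (not Ethereum reward values).
prior = {"honest_proposer": 0.9, "faulty_proposer": 0.1}
payoff = {
    ("follow_protocol", "honest_proposer"): 1.0,
    ("follow_protocol", "faulty_proposer"): 1.0,
    ("deviate", "honest_proposer"): -1.0,
    ("deviate", "faulty_proposer"): 0.5,
}
```

With these toy payoffs, following the protocol pays more than deviating against every proposer type, so it is the best response under any prior, which is the kind of property an ex ante dominant strategy captures.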

Submitted 6 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: This work has been accepted for publication in BSCI 2024

MSC Class: 91A27

ACM Class: C.2.4; G.3

arXiv:2405.02821 [pdf, other]

Sim2Real Transfer for Audio-Visual Navigation with Frequency-Adaptive Acoustic Field Prediction

Authors: Changan Chen, Jordi Ramos, Anshul Tomar, Kristen Grauman

Abstract: Sim2real transfer has received increasing attention lately due to the success of learning robotic tasks in simulation end-to-end. While there has been a lot of progress in transferring vision-based navigation policies, the existing sim2real strategy for audio-visual navigation performs data augmentation empirically without measuring the acoustic gap. Sound differs from light in that it spans a much wider range of frequencies and thus requires a different solution for sim2real. We propose the first treatment of sim2real for audio-visual navigation by disentangling it into acoustic field prediction (AFP) and waypoint navigation. We first validate our design choice in the SoundSpaces simulator and show improvement on the Continuous AudioGoal navigation benchmark. We then collect real-world data to measure the spectral difference between simulation and the real world by training AFP models that only take a specific frequency subband as input. We further propose a frequency-adaptive strategy that intelligently selects the best frequency band for prediction based on both the measured spectral difference and the energy distribution of the received audio, which improves performance on the real data. Lastly, we build a real robot platform and show that the transferred policy can successfully navigate to sounding objects. This work demonstrates the potential of building intelligent agents that can see, hear, and act entirely from simulation, and of transferring them to the real world.
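The abstract says the frequency-adaptive strategy weighs the measured spectral difference against the energy distribution of the received audio but gives no formula. One simple way to combine the two, assumed here purely for illustration, is to score each subband by its share of the received energy minus a weighted sim2real gap and pick the highest score. `select_band` and its `weight` parameter are hypothetical.

```python
from typing import Dict


def select_band(band_energy: Dict[str, float],
                band_gap: Dict[str, float],
                weight: float = 1.0) -> str:
    """Pick the subband with the best trade-off between received energy and sim2real gap.

    band_energy: band -> energy of the received audio in that band
    band_gap:    band -> measured sim-vs-real prediction error for that band
    weight:      hypothetical coefficient trading the gap off against energy
    """
    total = sum(band_energy.values())

    def score(band: str) -> float:
        share = band_energy[band] / total if total > 0 else 0.0
        return share - weight * band_gap[band]

    return max(band_energy, key=score)
```

A band with plenty of energy can still lose if the simulator models it poorly, and a well-simulated band can lose if the received audio carries little energy there; `weight` sets where that balance lies.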

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.02503 [pdf, other]

Axiomatic Causal Interventions for Reverse Engineering Relevance Computation in Neural Retrieval Models

Authors: Catherine Chen, Jack Merullo, Carsten Eickhoff

Abstract: Neural models have demonstrated remarkable performance across diverse ranking tasks. However, the processes and internal mechanisms by which they determine relevance are still largely unknown. Existing approaches for analyzing neural ranker behavior with respect to IR properties rely either on assessing overall model behavior or on probing methods that may offer an incomplete understanding of causal mechanisms. To provide a more granular understanding of internal model decision-making processes, we propose the use of causal interventions to reverse engineer neural rankers, and demonstrate how mechanistic interpretability methods can be used to isolate components satisfying term-frequency axioms within a ranking model. We identify a group of attention heads that detect duplicate tokens in earlier layers of the model, then communicate with downstream heads to compute overall document relevance. More generally, we propose that this style of mechanistic analysis opens up avenues for reverse engineering the processes neural retrieval models use to compute relevance. This work aims to initiate granular interpretability efforts that will not only benefit retrieval model development and training, but ultimately ensure safer deployment of these models.

Submitted 3 May, 2024; originally announced May 2024.

Comments: 10 pages, 10 figures, accepted at SIGIR 2024 as a perspective paper

arXiv:2405.01679 [pdf, other]

Enhanced primordial gravitational waves from a stiff post-inflationary era due to an oscillating inflaton

Authors: Chao Chen, Konstantinos Dimopoulos, Cem Eröncel, Anish Ghoshal

Abstract: We investigate two classes of inflationary models that lead to a stiff period after inflation, which boosts the signal of primordial gravitational waves (GWs). In both families of models studied, we consider an oscillating scalar condensate which, when far from the minimum, is overdamped by a warped kinetic term, à la $αあるふぁ$-attractors. This leads to successful inflation. The oscillating condensate is in danger of becoming fragmented by resonant effects when non-linearities take over. Consequently, for low orders of the envisaged scalar potential, the stiff phase cannot be prolonged enough to enhance primordial GWs at frequencies observable in the near future. However, this is not the case for a higher-order scalar potential. Indeed, we show that this case results in a boosted GW spectrum that overlaps with future observations without generating so much GW radiation as to destabilise Big Bang Nucleosynthesis. For example, taking $αあるふぁ={\cal O}(1)$, we find that the GW signal can be safely enhanced up to $Ωおめが_{\rm GW}(f)\sim 10^{-11}$ at frequency $f\sim 10^2\,$Hz, which will be observable by the Einstein Telescope (ET). Our mechanism yields a characteristic GW spectrum which, if observed, can lead to the determination of the inflation energy scale, the reheating temperature, and the shape (steepness) of the scalar potential around the minimum.

Submitted 20 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: 21 pages, 6 figures; references added

arXiv:2405.01494 [pdf, other]

Navigating Heterogeneity and Privacy in One-Shot Federated Learning with Diffusion Models

Authors: Matias Mendieta, Guangyu Sun, Chen Chen

Abstract: Federated learning (FL) enables multiple clients to train models collectively while preserving data privacy. However, FL faces challenges in terms of communication cost and data heterogeneity. One-shot federated learning has emerged as a solution by reducing communication rounds, improving efficiency, and providing better security against eavesdropping attacks. Nevertheless, data heterogeneity remains a significant challenge, impacting performance. This work explores the effectiveness of diffusion models in one-shot FL, demonstrating their applicability in addressing data heterogeneity and improving FL performance. Additionally, we investigate the utility of our diffusion model approach, FedDiff, compared to other one-shot FL methods under differential privacy (DP). Furthermore, to improve generated sample quality under DP settings, we propose a pragmatic Fourier Magnitude Filtering (FMF) method, enhancing the effectiveness of generated data for global model training.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.01461 [pdf, other]

SATO: Stable Text-to-Motion Framework

Authors: Wenshuo Chen, Hongru Xiao, Erhang Zhang, Lijie Hu, Lei Wang, Mengyuan Liu, Chen Chen

Abstract: Are text-to-motion models robust? Recent advancements in text-to-motion models primarily stem from more accurate predictions of specific actions. However, the text modality typically relies solely on pre-trained Contrastive Language-Image Pretraining (CLIP) models. Our research has uncovered a significant issue with text-to-motion models: their predictions are often inconsistent, resulting in vastly different or even incorrect poses when presented with semantically similar or identical text inputs. In this paper, we analyze the underlying causes of this instability, establishing a clear link between the unpredictability of model outputs and the erratic attention patterns of the text encoder module. Consequently, we introduce a formal framework aimed at addressing this issue, which we term the Stable Text-to-Motion Framework (SATO). SATO consists of three modules, dedicated respectively to stable attention, stable prediction, and balancing the trade-off between accuracy and robustness. We present a methodology for constructing an SATO that satisfies the stability of attention and prediction. To verify the stability of the model, we introduce a new textual synonym perturbation dataset based on HumanML3D and KIT-ML. Results show that SATO is significantly more stable against synonyms and other slight perturbations while maintaining high accuracy.

Submitted 3 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.01429 [pdf, other]

Co-rank $1$ Arithmetic Siegel--Weil IV: Analytic local-to-global

Authors: Ryan C. Chen

Abstract: This is the fourth in a sequence of four papers, where we prove the arithmetic Siegel--Weil formula in co-rank $1$ for Kudla--Rapoport special cycles on exotic smooth integral models of unitary Shimura varieties of arbitrarily large even arithmetic dimension. Our arithmetic Siegel--Weil formula implies that degrees of Kudla--Rapoport arithmetic special $1$-cycles are encoded in the first derivatives of unitary Eisenstein series Fourier coefficients. In this paper, we pin down precise normalizations for some $U(m,m)$ Siegel Eisenstein series, give local Siegel--Weil special value formulas with explicit constants, and record a geometric Siegel--Weil result for degrees of complex $0$-cycles. Using this, we complete the proof of our arithmetic Siegel--Weil results by patching together the local main theorems from our companion papers.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 69 pages

arXiv:2405.01428 [pdf, other]

Co-rank $1$ Arithmetic Siegel--Weil III: Geometric local-to-global

Authors: Ryan C. Chen

Abstract: This is the third in a sequence of four papers, where we prove the arithmetic Siegel--Weil formula in co-rank $1$ for Kudla--Rapoport special cycles on exotic smooth integral models of unitary Shimura varieties of arbitrarily large even arithmetic dimension. Our arithmetic Siegel--Weil formula implies that degrees of Kudla--Rapoport arithmetic special $1$-cycles are encoded in the first derivatives of unitary Eisenstein series Fourier coefficients. In this paper, we finish the reduction process from global arithmetic intersection numbers for special cycles to the local geometric quantities in our companion papers. Building on our previous companion papers, we also propose a construction for arithmetic special cycle classes associated to possibly singular matrices of arbitrary co-rank.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 67 pages

arXiv:2405.01427 [pdf, other]

Co-rank $1$ Arithmetic Siegel--Weil II: Local Archimedean

Authors: Ryan C. Chen

Abstract: This is the second in a sequence of four papers, where we prove the arithmetic Siegel--Weil formula in co-rank $1$ for Kudla--Rapoport special cycles on exotic smooth integral models of unitary Shimura varieties of arbitrarily large even arithmetic dimension. Our arithmetic Siegel--Weil formula implies that degrees of Kudla--Rapoport arithmetic special $1$-cycles are encoded in the first derivatives of unitary Eisenstein series Fourier coefficients. In this paper, we formulate and prove the key Archimedean local theorem. In the case of purely Archimedean intersection numbers, we also prove an Archimedean local arithmetic Siegel--Weil formula, relating Green currents of arbitrary degree and off-central derivatives of local Whittaker functions. The crucial input is a new limiting method, which is structurally parallel to our strategy at non-Archimedean places.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 29 pages

arXiv:2405.01426 [pdf, other]

Co-rank $1$ Arithmetic Siegel--Weil I: Local non-Archimedean

Authors: Ryan C. Chen

Abstract: This is the first in a sequence of four papers, where we prove the arithmetic Siegel--Weil formula in co-rank $1$ for Kudla--Rapoport special cycles on exotic smooth integral models of unitary Shimura varieties of arbitrarily large even arithmetic dimension. Our arithmetic Siegel--Weil formula implies that degrees of Kudla--Rapoport arithmetic special $1$-cycles are encoded in the first derivatives of unitary Eisenstein series Fourier coefficients. The crucial input is a new local limiting method at all places. In this paper, we formulate and prove the key local theorems at all non-Archimedean places. On the analytic side, the limit relates local Whittaker functions on different groups. On the geometric side at nonsplit non-Archimedean places, the limit relates degrees of $0$-cycles on Rapoport--Zink spaces and local contributions to heights of $1$-cycles in mixed characteristic.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 111 pages

arXiv:2405.01345 [pdf, other]

The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights

Authors: Wenhao Zhu, Shujian Huang, Fei Yuan, Cheng Chen, Jiajun Chen, Alexandra Birch

Abstract: Bridging the significant gap between large language models' English and non-English performance presents a great challenge. While some previous studies attempt to mitigate this gap with translated training data, the recently proposed question alignment approach leverages the model's English expertise to improve multilingual performance with minimal use of expensive, error-prone translation. In this paper, we explore how broadly this method can be applied by examining its effects on reasoning with executable code and reasoning with common sense. We also explore how to apply this approach efficiently to extremely large language models using proxy-tuning. Experimental results on the multilingual reasoning benchmarks mGSM, mSVAMP and xCSQA demonstrate that the question alignment approach can boost multilingual performance across diverse reasoning scenarios, model families, and sizes. For instance, when applied to the LLaMA2 models, our method brings an average accuracy improvement of 12.2% on mGSM, even with the 70B model. To understand the mechanism of its success, we analyze the representation space, chain-of-thought, and translation data scales, which reveals how question translation training strengthens language alignment within LLMs and shapes their working patterns.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.00942 [pdf, other]

LLaVA Finds Free Lunch: Teaching Human Behavior Improves Content Understanding Abilities Of LLMs

Authors: Somesh Singh, Harini S I, Yaman K Singla, Veeky Baths, Rajiv Ratn Shah, Changyou Chen, Balaji Krishnamurthy

Abstract: Communication is defined as "Who says what to whom with what effect." A message from a communicator generates downstream receiver effects, also known as behavior. Receiver behavior, being a downstream effect of the message, carries rich signals about it. Even though it carries signals about the message, behavior data is often ignored when training large language models. We show that training LLMs on receiver behavior can actually help improve their content-understanding abilities. Specifically, we show that training LLMs to predict the receiver behavior of likes and comments improves the LLM's performance on a wide variety of downstream content understanding tasks. We show this performance increase across 40 video and image understanding tasks on 23 benchmark datasets in both 0-shot and fine-tuning settings, outperforming many supervised baselines. Moreover, since receiver behavior, such as likes and comments, is collected by default on the internet and does not need any human annotations to be useful, the performance improvement we get after training on this data is essentially a free lunch. We release the cleaned receiver-behavior data (comments and likes) for 750k images and videos collected from multiple platforms, along with our instruction-tuning data.

Submitted 16 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2405.00098 [pdf, other]

Amplitude analysis and branching fraction measurement of $B^{+}\to D^{*-}D^{+}_{s}πぱい^{+}$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis , et al. (1057 additional authors not shown)

Abstract: The decays of the $B^{+}$ meson to the final state $D^{*-}D^{+}_{s}πぱい^{+}$ are studied in proton-proton collision data collected with the LHCb detector at centre-of-mass energies of 7, 8, and 13 TeV, corresponding to a total integrated luminosity of 9 fb$^{-1}$. The ratio of branching fractions of the $B^{+}\to D^{*-}D^{+}_{s}πぱい^{+}$ and $B^{0}\to D^{*-}D^{+}_{s}$ decays is measured to be $0.173\pm 0.006\pm 0.010$, where the first uncertainty is statistical and the second is systematic. Using partially reconstructed $D^{*+}_{s}\to D^{+}_{s}γがんま$ and $D^{+}_{s}πぱい^{0}$ decays, the ratio of branching fractions between the $B^{+}\to D^{*-}D^{*+}_{s}πぱい^{+}$ and $B^{+}\to D^{*-}D^{+}_{s}πぱい^{+}$ decays is determined as $1.31\pm 0.07\pm 0.14$. An amplitude analysis of the $B^{+}\to D^{*-}D^{+}_{s}πぱい^{+}$ decay is performed for the first time, revealing dominant contributions from known excited charm resonances decaying to the $D^{*-}πぱい^{+}$ final state. No significant evidence of exotic contributions in the $D^{+}_{s}πぱい^{+}$ or $D^{*-}D^{+}_{s}$ channels is found. The fit fraction of the scalar state $T_{c\bar{s} 0}^{\ast}(2900)^{++}$ observed in the $B^{+}\to D^{-}D^{+}_{s}πぱい^{+}$ decay is determined to be less than 2.3% at a 90% confidence level.

Submitted 30 April, 2024; originally announced May 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-001.html (LHCb public pages)

Report number: LHCb-PAPER-2024-001, CERN-EP-2024-110

arXiv:2404.19510 [pdf, other]

First observation of $Λらむだ_{b}^{0} \rightarrow Σしぐま_c^{(*)++} D^{(*)-} K^{-}$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis , et al. (1067 additional authors not shown)

Abstract: The four decays, $Λらむだ_{b}^{0} \rightarrow Σしぐま_c^{(*)++} D^{(*)-} K^{-}$, are observed for the first time using proton-proton collision data collected with the LHCb detector at a centre-of-mass energy of $13\,\rm{TeV}$, corresponding to an integrated luminosity of $6\,\rm{fb}^{-1}$. Using the $Λらむだ_b^0 \rightarrow Λらむだ_c^{+} \overline{D}^0 K^{-}$ decay as the reference channel, the following branching fraction ratios are measured: $$\frac{\cal{B} (Λらむだ_{b}^{0} \rightarrow Σしぐま_{c}^{++} \rm{D}^{-} {K}^{-})}{\cal{B}(Λらむだ_{b}^{0} \rightarrow Λらむだ_c^{+} \rm \overline{D}^0 {K}^{-})} = {0.282}\pm{0.016}\pm{0.016}\pm{0.005}, \frac{\cal{B}(Λらむだ_{b}^{0} \rightarrow Σしぐま_{c}^{*++} \rm {D}^{-} {K}^{-})}{\cal{B}(Λらむだ_{b}^{0} \rightarrow Σしぐま_c^{++} \rm {D}^{-} {K}^{-})} = {0.460}\pm{0.052}\pm{0.028}, \frac{\cal{B}(Λらむだ_{b}^{0} \rightarrow Σしぐま_{c}^{++} \rm {D}^{*-} {K}^{-})}{\cal{B}(Λらむだ_{b}^{0} \rightarrow Σしぐま_c^{++} \rm {D}^{-} {K}^{-})} = {2.261}\pm{0.202}\pm{0.129}\pm{0.046}, \frac{\cal{B}(Λらむだ_{b}^{0} \rightarrow Σしぐま_{c}^{*++} \rm D^{*-} K^{-})}{\cal{B}(Λらむだ_{b}^{0} \rightarrow Σしぐま_c^{++} \rm D^{-} K^{-})} = {0.896}\pm{0.137}\pm{0.066}\pm{0.018},$$ where the first uncertainties are statistical, the second are systematic, and the third are due to uncertainties in the branching fractions of intermediate particle decays. These initial observations mark the beginning of pentaquark searches in these modes, with more data set to become available following the LHCb upgrade.

Submitted 11 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-044.html (LHCb public pages)

Report number: LHCb-PAPER-2023-044, CERN-EP-2024-098

arXiv:2404.19307 [pdf, other]

Enhancing GUI Exploration Coverage of Android Apps with Deep Link-Integrated Monkey

Authors: Han Hu, Han Wang, Ruiqi Dong, Xiao Chen, Chunyang Chen

Abstract: Mobile apps are ubiquitous in our daily lives, supporting tasks such as reading and chatting. Despite the availability of many GUI testing tools, app testers still struggle with low code coverage because the tools frequently get stuck in loops or overlook activities with concealed entries. As a result, a significant amount of testing time is spent on redundant and repetitive exploration of a few GUI pages. To address this, we utilize Android's deep links, which help trigger Android intents that lead users to specific pages, and introduce a deep link-enhanced exploration method. This approach, integrated into the testing tool Monkey, gives rise to Delm (Deep Link-enhanced Monkey). Delm oversees the dynamic exploration process, guiding the tool out of meaningless testing loops to unexplored GUI pages. We provide a rigorous activity context mock-up approach for triggering existing Android intents to discover more activities with hidden entrances. We conduct experiments to evaluate Delm's effectiveness on activity context mock-up, activity coverage, method coverage, and crash detection. The findings reveal that Delm can mock up more complex activity contexts and significantly outperforms state-of-the-art baselines with 27.2% activity coverage, 21.13% method coverage, and 23.81% crash detection.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.18612 [pdf]

Enhancing Prosthetic Safety and Environmental Adaptability: A Visual-Inertial Prosthesis Motion Estimation Approach on Uneven Terrains

Authors: Chuheng Chen, Xinxing Chen, Shucong Yin, Yuxuan Wang, Binxin Huang, Yuquan Leng, Chenglong Fu

Abstract: Environment awareness is crucial for enhancing the walking safety and stability of amputees wearing powered prostheses when crossing uneven terrain such as stairs and obstacles. However, existing environmental perception systems for prostheses only provide terrain types and corresponding parameters, which cannot prevent potential collisions when crossing uneven terrain and may lead to falls and other severe consequences. In this paper, a visual-inertial motion estimation approach is proposed that allows a prosthesis to perceive its own movement and the changing spatial relationship between the prosthesis and uneven terrain while traversing it. To achieve this, we estimate the knee motion by using a depth camera to perceive the environment and aligning feature points extracted from stairs and obstacles. Subsequently, an error-state Kalman filter is incorporated to fuse the inertial data into the visual estimates, reducing the feature extraction error and yielding a more robust estimate. The motions of the prosthetic joint and toe are derived using the prosthesis model parameters. Experiments conducted on our collected dataset and stair-walking trials with a powered prosthesis show that the proposed method can accurately track the motion of the human leg and prosthesis, with an average root-mean-square error of the toe trajectory of less than 5 cm. The proposed method is expected to enable environment-adaptive control of prostheses, thereby enhancing amputees' safety and mobility on uneven terrain.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18343 [pdf, other]

G-Refine: A General Quality Refiner for Text-to-Image Generation

Authors: Chunyi Li, Haoning Wu, Hongkun Hao, Zicheng Zhang, Tengchaun Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai

Abstract: With the evolution of Text-to-Image (T2I) models, the quality defects of AI-Generated Images (AIGIs) pose a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality results. To mitigate this limitation, we introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compromising the integrity of high-quality ones. The model is composed of three interconnected modules: a perception quality indicator, an alignment quality indicator, and a general quality enhancement module. Based on the mechanisms of the Human Visual System (HVS) and syntax trees, the first two indicators identify perception and alignment deficiencies, respectively, and the last module applies targeted quality enhancement accordingly. Extensive experiments show that, compared with alternative optimization methods, AIGIs refined by G-Refine perform better on 10+ quality metrics across 4 databases. This improvement contributes significantly to the practical application of contemporary T2I models, paving the way for their broader adoption. The code will be released at https://github.com/Q-Future/Q-Refine.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.18203 [pdf, other]

LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM

Authors: Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Wei Sun, Chaofeng Chen, Xiongkuo Min, Xiaohong Liu, Weisi Lin, Guangtao Zhai

Abstract: Although large multi-modality models (LMMs) have seen extensive exploration and application in various quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored. Given LMMs' exceptional performance and robustness in low-level vision and quality assessment tasks, this study aims to investigate the feasibility of imparting PCQA knowledge to LMMs through text supervision. To achieve this, we transform quality labels into textual descriptions during the fine-tuning phase, enabling LMMs to derive quality rating logits from 2D projections of point clouds. To compensate for the loss of perception in the 3D domain, structural features are extracted as well. These quality logits and structural features are then combined and regressed into quality scores. Our experimental results affirm the effectiveness of our approach, showcasing a novel integration of LMMs into PCQA that enhances model understanding and assessment accuracy. We hope our contributions can inspire subsequent investigations into the fusion of LMMs with PCQA, fostering advancements in 3D visual quality analysis and beyond.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.18045 [pdf, other]

doi 10.1021/acsanm.4c00914

Blood Works for Graphene Production

Authors: Xiaofan Cai, Ming Li, Chao Chen, Renjun Du, Zijing Guo, Ping Wang, Guodong Ma, Xinglong Wu, Zhiyuan Wang, Yaqing Han, Fuzhuo Lian, Jingkuan Xiao, Siqi Jiang, Lei Wang, Alexander S. Mayorov, Libo Gao, Kostya S. Novoselov, Geliang Yu

Abstract: Blood, a ubiquitous and fundamental carbohydrate material composed of plasma, red blood cells, white blood cells, and platelets, has long played an important role in biology, life science, history, and religious study, while graphene has garnered significant attention due to its exceptional properties and extensive range of potential applications. Achieving environmentally friendly, cost-effective growth using hybrid precursors and obtaining high-quality graphene through a straightforward CVD process have traditionally been considered mutually exclusive. This study demonstrates that high-quality graphene domains with controlled thickness can be produced through a one-step growth process at atmospheric pressure using blood as a precursor. Raman spectroscopy confirms the uniformity of the blood-grown graphene films, and the observation of the half-integer quantum Hall effect in the measured devices highlights their outstanding electronic properties. This unprecedented approach opens new possibilities for applications of blood and offers an unconventional route to graphene growth.

Submitted 27 April, 2024; originally announced April 2024.

Showing 101–150 of 6,848 results for author: Chen, C