Search | arXiv e-print repository

Learning-Based Joint Antenna Selection and Precoding Design for Cell-Free MIMO Networks

Authors: Liangzhi Wang, Chen Chen, Carlo Fischione, Jie Zhang

Abstract: This paper considers a downlink cell-free multiple-input multiple-output (MIMO) network in which multiple multi-antenna base stations (BSs) serve multiple users via coherent joint transmission. In order to reduce the energy consumption by radio frequency components, each BS selects a subset of antennas for downlink data transmission after estimating the channel state information (CSI). We aim to maximize the sum spectral efficiency by jointly optimizing the antenna selection and precoding design. To alleviate the fronthaul overhead and enable real-time network operation, we propose a distributed scalable machine learning algorithm. In particular, at each BS, we deploy a convolutional neural network (CNN) for antenna selection and a graph neural network (GNN) for precoding design. Different from conventional centralized solutions that require a large amount of CSI and signaling exchange among the BSs, the proposed distributed machine learning algorithm takes only locally estimated CSI as input. With well-trained learning models, it is shown that the proposed algorithm significantly outperforms the distributed baseline schemes and achieves a sum spectral efficiency comparable to its centralized counterpart.

Submitted 12 April, 2024; originally announced April 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2404.08364 [pdf, other]

FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework

Authors: Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, Xiaoliang Cong

Abstract: Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of graph vertex degrees introduces workload imbalance issues, making DGRW difficult to parallelize. In this paper, we propose FlowWalker, a GPU-based dynamic graph random walk framework. FlowWalker implements an efficient parallel sampling method to fully exploit the GPU parallelism and reduce space complexity. Moreover, it employs a sampler-centric paradigm alongside a dynamic scheduling strategy to handle the huge amounts of walking queries. FlowWalker stands as a memory-efficient framework that requires no auxiliary data structures in GPU global memory. We examine the performance of FlowWalker extensively on ten datasets, and experimental results show that FlowWalker achieves up to 752.2x, 72.1x, and 16.4x speedup compared with existing CPU, GPU, and FPGA random walk frameworks, respectively. A case study shows that FlowWalker reduces random walk time from 35% to 3% in a pipeline of ByteDance friend recommendation GNN training.

Submitted 26 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.08133 [pdf, other]

Search for rare $b \to d\ell^+\ell^-$ transitions at Belle

Authors: Belle and Belle II Collaborations, I. Adachi, L. Aggarwal, H. Aihara, N. Akopov, A. Aloisio, S. Al Said, D. M. Asner, H. Atmacan, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, J. Baudot, A. Beaubien, F. Becherer, J. Becker, et al. (371 additional authors not shown)

Abstract: We present the results of a search for the $b \to d\ell^+\ell^-$ flavor-changing neutral-current rare decays $B^{+, 0} \to (\eta, \omega, \pi^{+,0}, \rho^{+, 0}) e^+e^-$ and $B^{+, 0} \to (\eta, \omega, \pi^{0}, \rho^{+}) \mu^+\mu^-$ using a $711$ fb$^{-1}$ data sample that contains $772 \times 10^{6}$ $B\overline{B}$ events. The data were collected at the $\Upsilon(4S)$ resonance with the Belle detector at the KEKB asymmetric-energy $e^+e^-$ collider. We find no evidence for signal and set upper limits on branching fractions at the $90\%$ confidence level in the range $(3.8 - 47) \times 10^{-8}$ depending on the decay channel. The obtained limits are the world's best results. This is the first search for the channels $B^{+, 0} \to (\omega, \rho^{+,0}) e^+e^-$ and $B^{+, 0} \to (\omega, \rho^{+})\mu^+\mu^-$.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 7 pages, 12 figures

Report number: Belle II Preprint 2024-005, KEK Preprint 2023-52

arXiv:2404.08065 [pdf, other]

Ephemeral Myographic Motion: Repurposing the Myo Armband to Control Disposable Pneumatic Sculptures

Authors: Celia Chen, Alex Leitch

Abstract: This paper details the development of an interactive sculpture built from deprecated hardware technology and intentionally decomposable, transient materials. We detail a case study of "Strain" - an emotive prototype that reclaims two orphaned digital artifacts to power a kinetic sculpture made of common disposable objects. We use the Myo, an abandoned myoelectric armband, in concert with the Programmable Air, a soft-robotics prototyping project, to manipulate a pneumatic bladder array constructed from condoms, bamboo skewers, and a small library of 3D printed PLA plastic connectors designed to work with these generic parts. The resulting sculpture achieves surprisingly organic actuation. The goal of this project is to produce several reusable components: software to resuscitate the Myo Armband, homeostasis software for the Programmable Air or equivalent pneumatic projects, and a library of easily-printed parts that will work with generic bamboo disposables for sculptural prototyping. This project works to develop usable, repeatable engineering by applying it to a slightly whimsical object that promotes a strong emotional response in its audience. Through this, we transform the disposable into the sustainable. In this paper, we reflect on project-based insights into rescuing and revitalizing abandoned consumer electronics for future works.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures, accepted to CHI2024 workshop "Sustainable Unmaking: Designing for Biodegradation, Decay, and Disassembly"

arXiv:2404.08045 [pdf, other]

JWST Discovery of $40+$ Microlensed Stars in a Magnified Galaxy, the "Dragon" behind Abell 370

Authors: Yoshinobu Fudamoto, Fengwu Sun, Jose M. Diego, Liang Dai, Masamune Oguri, Adi Zitrin, Erik Zackrisson, Mathilde Jauzac, David J. Lagattuta, Eiichi Egami, Edoardo Iani, Rogier A. Windhorst, Katsuya T. Abe, Franz Erik Bauer, Fuyan Bian, Rachana Bhatawdekar, Thomas J. Broadhurst, Zheng Cai, Chian-Chou Chen, Wenlei Chen, Seth H. Cohen, Christopher J. Conselice, Daniel Espada, Nicholas Foo, Brenda L. Frye, et al. (21 additional authors not shown)

Abstract: Strong gravitational magnification by massive galaxy clusters enables us to detect faint background sources, resolve their detailed internal structures, and in the most extreme cases identify and study individual stars in distant galaxies. Highly magnified individual stars allow for a wide range of applications, including studies of stellar populations in distant galaxies and constraining small-scale dark matter structures. However, these applications have been hampered by the small number of events observed, as typically one or a few stars are identified from each distant galaxy. Here, we report the discovery of 46 significant microlensed stars in a single strongly-lensed high-redshift galaxy behind the Abell 370 cluster at a redshift of 0.725, when the Universe was half of its current age (dubbed the "Dragon arc"), based on two observations separated by one year with the James Webb Space Telescope (JWST). These events are mostly found near the expected lensing critical curves, suggesting that these are magnified individual stars that appear as transients from intracluster stellar microlenses. Through multi-wavelength photometry and colors, we constrain stellar types and find that many of them are consistent with red giants/supergiants magnified by factors of thousands. This finding reveals an unprecedentedly high occurrence of microlensing events in the Dragon arc, and proves that JWST's time-domain observations open up the possibility of conducting statistical studies of high-redshift stars and subgalactic scale perturbations in the lensing dark matter field.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 15 pages, 4 figures, 1 table; submitted to Nature Astronomy

arXiv:2404.07987 [pdf, other]

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Authors: Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen

Abstract: To enhance the controllability of text-to-image diffusion models, existing efforts like ControlNet incorporated image-based conditional controls. In this paper, we reveal that existing methods still face significant challenges in generating images that align with the image conditional controls. To this end, we propose ControlNet++, a novel approach that improves controllable generation by explicitly optimizing pixel-level cycle consistency between generated images and conditional controls. Specifically, for an input conditional control, we use a pre-trained discriminative reward model to extract the corresponding condition of the generated images, and then optimize the consistency loss between the input conditional control and extracted condition. A straightforward implementation would be generating images from random noise and then calculating the consistency loss, but such an approach requires storing gradients for multiple sampling timesteps, leading to considerable time and memory costs. To address this, we introduce an efficient reward strategy that deliberately disturbs the input images by adding noise, and then uses the single-step denoised images for reward fine-tuning. This avoids the extensive costs associated with image sampling, allowing for more efficient reward fine-tuning. Extensive experiments show that ControlNet++ significantly improves controllability under various conditional controls. For example, it achieves improvements over ControlNet by 7.9% mIoU, 13.4% SSIM, and 7.6% RMSE, respectively, for segmentation mask, line-art edge, and depth conditions.

Submitted 11 April, 2024; originally announced April 2024.

Comments: Project Page: https://liming-ai.github.io/ControlNet_Plus_Plus

arXiv:2404.07973 [pdf, other]

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Authors: Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang

Abstract: While Ferret seamlessly integrates regional understanding into the Large Language Model (LLM) to facilitate its referring and grounding capability, it has certain limitations: it is constrained by the pre-trained fixed visual encoder and fails to perform well on broader tasks. In this work, we unveil Ferret-v2, a significant upgrade to Ferret, with three key designs. (1) Any resolution grounding and referring: A flexible approach that effortlessly handles higher image resolution, improving the model's ability to process and understand images in greater detail. (2) Multi-granularity visual encoding: By integrating the additional DINOv2 encoder, the model learns better and more diverse underlying contexts for global and fine-grained visual information. (3) A three-stage training paradigm: Besides image-caption alignment, an additional stage is proposed for high-resolution dense alignment before the final instruction tuning. Experiments show that Ferret-v2 provides substantial improvements over Ferret and other state-of-the-art methods, thanks to its high-resolution scaling and fine-grained visual processing.

Submitted 11 April, 2024; originally announced April 2024.

Comments: Preprint. 14 pages, 4 figures

arXiv:2404.07894 [pdf, ps, other]

Typical blocks of the category $\mathcal O$ and Whittaker modules for Takiff superalgebras

Authors: Chih-Whi Chen, Yongjie Wang

Abstract: We study the simplicity of Kac induced modules over the $\ell$-th Takiff superalgebras $\widetilde{\mathfrak g}_\ell:= \widetilde{\mathfrak g}\otimes \mathbb C[\theta]/(\theta^{\ell+1})$, for $\ell>0$, associated with the Lie superalgebras $\widetilde{\mathfrak g}$ of type I. We formulate a general notion of typical weights and typical Jordan blocks of the category $\mathcal O$ for $\widetilde{\mathfrak g}_\ell$ associated with Lie superalgebras $\mathfrak{gl}(m|n)$, $\mathfrak{osp}(2|2n)$ and $\mathfrak{pe}(n)$. For Lie superalgebras $\mathfrak{gl}(m|n)$ and $\mathfrak{osp}(2|2n)$, we establish an equivalence from an arbitrary typical Jordan block of the category $\mathcal O$ for $\widetilde{\mathfrak g}_\ell$ to a Jordan block of the category $\mathcal O$ for the even subalgebra of $\widetilde{\mathfrak g}_\ell$. This provides a solution to the problem of determining the composition multiplicities of the Verma modules over $\widetilde{\mathfrak g}_\ell$ with typical highest weights. We also investigate non-singular Whittaker modules over these Takiff superalgebras. In particular, we obtain a classification of non-singular simple Whittaker modules and a criterion for simplicity of non-singular standard Whittaker modules.

Submitted 22 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: 34 pages

MSC Class: 17B10; 17B55

arXiv:2404.07839 [pdf, other]

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, et al. (37 additional authors not shown)

Abstract: We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned variant. Both models achieve comparable performance to Gemma-2B despite being trained on fewer tokens.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07509

Multiparameter cascaded quantum interferometer

Authors: Baihong Li, Zhuo-zhuo Wang, Qi-qi Li, Changhua Chen, Boxin Yuan, Yiwei Zhai, Rui-Bo Jin, Xiaofei Zhang

Abstract: We theoretically propose a multiparameter cascaded quantum interferometer in which a two-input and two-output setup is obtained by concatenating 50:50 beam splitters with n independent and adjustable time delays. A general method for deriving the coincidence probability of such an interferometer is given based on the linear transformation of the matrix of beam splitters. As examples, we analyze the interference characteristics of one-, two- and three-parameter cascaded quantum interferometers with different frequency correlations and input states. Some typical interferograms of such interferometers are provided to reveal richer and more complicated two-photon interference phenomena. In principle, arbitrary two-input and two-output experimental setups can be designed with the proposal. This work offers a toolbox for designing versatile quantum interferometers and provides a convenient method for deriving the coincidence probabilities involved. Potential applications can be found in the complete spectral characterization of two-photon states, multiparameter estimation, and quantum metrology.

Submitted 8 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: We have found a serious error in this version, which may mislead readers

arXiv:2404.07436 [pdf, other]

Measurement of $e^{+}e^{-}\to \omega\eta^{\prime}$ cross sections at $\sqrt{s}=$ 2.000 to 3.080 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (599 additional authors not shown)

Abstract: The Born cross sections for the process $e^{+}e^{-}\to \omega\eta^{\prime}$ are measured at 22 center-of-mass energies from 2.000 to 3.080 GeV using data collected with the BESIII detector at the BEPCII collider. A resonant structure is observed with a statistical significance of 9.6$\sigma$. A Breit-Wigner fit determines its mass to be $M_R=(2153\pm30\pm31)~{\rm{MeV}}/c^{2}$ and its width to be $\Gamma_{R}=(167\pm77\pm7)~\rm{MeV}$, where the first uncertainties are statistical and the second are systematic.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.07131 [pdf, other]

Search for prompt production of pentaquarks in charm hadron final states

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, B. Adeva, M. Adinolfi, P. Adlarson, H. Afsharnia, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, A. Alfonso Albero, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, et al. (1090 additional authors not shown)

Abstract: A search for hidden-charm pentaquark states decaying to a range of $\Sigma_{c}\bar{D}$ and $\Lambda_{c}\bar{D}$ final states, as well as doubly-charmed pentaquark states to $\Sigma_{c}D$ and $\Lambda_{c}^{+}D$, is made using samples of proton-proton collision data corresponding to an integrated luminosity of $5.7\,\mathrm{fb}^{-1}$ recorded by the LHCb detector at $\sqrt{s} = 13\,\mathrm{TeV}$. Since no significant signals are found, upper limits are set on the pentaquark yields relative to that of the $\Lambda_{c}^{+}$ baryon in the $\Lambda_{c}^{+}\to pK^{-}\pi^{+}$ decay mode. The known pentaquark states are also investigated, and their signal yields are found to be consistent with zero in all cases.

Submitted 10 April, 2024; originally announced April 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-018.html (LHCb public pages)

Report number: LHCb-PAPER-2023-018, CERN-EP-2024-071

arXiv:2404.07055 [pdf, other]

Observational features of reflection asymmetric black holes

Authors: Che-Yu Chen, Hung-Yi Pu

Abstract: The Kerr spacetime is symmetric with respect to a well-defined equatorial plane. When testing the equatorial reflection symmetry of an isolated black hole, one is at the same time testing the Kerr hypothesis in General Relativity. In this work, we investigate the possible observational features when a Keplerian disk is surrounding a rotating black hole without reflection symmetry. When such symmetry is broken, generically, the photon trajectories around the black hole and the Keplerian orbits on the accretion disk are distorted vertically away from the equatorial plane by an amount that depends on their distance to the black hole. In the reflection asymmetric spacetime we are considering, these two kinds of orbits are distorted in opposite directions. Interestingly, while the size and shape of black hole shadows closely resemble those of Kerr black holes, distinct observational characteristics can emerge in the disk image and emission line profiles. When observing the disk edge-on, a pronounced concave shape may appear along its innermost edge on the incoming side. Furthermore, distinctive horn-like features might be observed on the spectral line profile at the blue-shifted side. These special features can serve as compelling indicators of the reflection asymmetry present in rotating black holes.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 14 pages, 7 figures

Report number: RIKEN-iTHEMS-Report-24

arXiv:2404.06845 [pdf, other]

Probing the shape of the brown dwarf desert around main-sequence A-F-G-type stars using post-common-envelope WD$-$BD binaries

Authors: Zhangliang Chen, Yizhi Chen, Chen Chen, Hongwei Ge, Bo Ma

Abstract: Brown dwarfs (BDs) possessing masses within the range $40-60 M_{\rm Jup}$ are rare around solar-type main-sequence (MS) stars, which gives rise to the brown dwarf desert (BDD). One caveat associated with previous studies of BDD is the relatively limited sample size of MS$-$BD binaries with accurately determined BD masses. We aim to produce a large sample of brown dwarf companions with precisely determined mass around main-sequence A-F-G type stars using observations of post common-envelope white dwarf (WD)$-$BD binaries. We employ the rapid binary evolution code COMPAS to deduce the properties of MS$-$BD binary progenitors from post common-envelope WD$-$BD binaries. This method supplements the directly observed MS$-$BD binary sample, enriching the data available for analyzing BDD around main-sequence A-F-G type stars. Our study opens a new window for studying the shape of BDD around A-F-G type main-sequence stars in the short period regime. We find tentative evidence that the 'driest' part of BDD around A-F-G type stars may extend into an orbital period of several hundred days, albeit with a small sample size. More post common-envelope WD$-$BD binaries detected in the future will advance our understanding of the BDD around A-F-G type stars.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures. Accepted for publication in A&A

arXiv:2404.06749 [pdf, other]

CGNSDE: Conditional Gaussian Neural Stochastic Differential Equation for Modeling Complex Systems and Data Assimilation

Authors: Chuanqi Chen, Nan Chen, Jin-Long Wu

Abstract: A new knowledge-based and machine learning hybrid modeling approach, called conditional Gaussian neural stochastic differential equation (CGNSDE), is developed to facilitate modeling complex dynamical systems and implementing analytic formulae of the associated data assimilation (DA). In contrast to the standard neural network predictive models, the CGNSDE is designed to effectively tackle both forward prediction tasks and inverse state estimation problems. The CGNSDE starts by exploiting a systematic causal inference via information theory to build a simple knowledge-based nonlinear model that nevertheless captures as much explainable physics as possible. Then, neural networks are supplemented to the knowledge-based model in a specific way, which not only characterizes the remaining features that are challenging to model with simple forms but also advances the use of analytic formulae to efficiently compute the nonlinear DA solution. These analytic formulae are used as an additional computationally affordable loss to train the neural networks that directly improve the DA accuracy. This DA loss function encourages the CGNSDE to capture the interactions between state variables and thus advances its modeling skills. With the DA loss, the CGNSDE is more capable of estimating extreme events and quantifying the associated uncertainty. Furthermore, crucial physical properties in many complex systems, such as the translation-invariant local dependence of state variables, can significantly simplify the neural network structures and make it easier to apply the CGNSDE to high-dimensional systems. Numerical experiments based on chaotic systems with intermittency and strong non-Gaussian features indicate that the CGNSDE outperforms knowledge-based regression models, and the DA loss further enhances the modeling skills of the CGNSDE.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06718 [pdf, other]

Measurement of the Born cross section for $e^{+}e^{-}\to \eta h_c$ at center-of-mass energies between 4.1 and 4.6 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

Abstract: We measure the Born cross section for the reaction $e^{+}e^{-} \rightarrow \eta h_c$ from $\sqrt{s} = 4.129$ to $4.600$ GeV using data sets collected by the BESIII detector running at the BEPCII collider. A resonant structure in the cross section line shape near 4.200 GeV is observed with a statistical significance of 7$\sigma$. The parameters of this resonance are measured to be \MeasMass\ and \MeasWidth, where the first uncertainties are statistical and the second systematic.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06715 [pdf, other]

Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data

Authors: Aakash Kumar, Chen Chen, Ajmal Mian, Neils Lobo, Mubarak Shah

Abstract: 3D detection is a critical task that enables machines to identify and locate objects in three-dimensional space. It has a broad range of applications in several fields, including autonomous driving, robotics and augmented reality. Monocular 3D detection is attractive as it requires only a single camera; however, it lacks the accuracy and robustness required for real-world applications. High-resolution LiDAR, on the other hand, can be expensive and lead to interference problems in heavy traffic given its active transmissions. We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection. Our method requires only a small number of 3D points, which can be obtained from a low-cost, low-resolution sensor. Specifically, we use only 512 points, which is just 1% of a full LiDAR frame in the KITTI dataset. Our method reconstructs a complete 3D point cloud from this limited 3D information combined with a single image. The reconstructed 3D point cloud and corresponding image can be used by any multi-modal off-the-shelf detector for 3D object detection. By using the proposed network architecture with an off-the-shelf multi-modal 3D detector, the accuracy of 3D detection improves by 20% compared to the state-of-the-art monocular detection methods and 6% to 9% compared to the baseline multi-modal methods on KITTI and JackRabbot datasets.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06663 [pdf, other]

Multi-modal Document Presentation Attack Detection With Forensics Trace Disentanglement

Authors: Changsheng Chen, Yongyi Deng, Liangwei Lin, Zitong Yu, Zhimao Lai

Abstract: Document Presentation Attack Detection (DPAD) is an important measure in protecting the authenticity of a document image. However, recent DPAD methods demand additional resources, such as manual effort in collecting additional data or knowing the parameters of acquisition devices. This work proposes a DPAD method based on multi-modal disentangled traces (MMDT) without the above drawbacks. We first disentangle the recaptured traces by a self-supervised disentanglement and synthesis network to enhance the generalization capacity in document images with different contents and layouts. Then, unlike the existing DPAD approaches that rely only on data in the RGB domain, we propose to explicitly employ the disentangled recaptured traces as new modalities in the transformer backbone through adaptive multi-modal adapters to fuse RGB/trace features efficiently. Visualization of the disentangled traces confirms the effectiveness of the proposed method in different document contents. Extensive experiments on three benchmark datasets demonstrate the superiority of our MMDT method on representing forensic traces of recapturing distortion.

Submitted 9 April, 2024; originally announced April 2024.

Comments: Accepted to ICME 2024

arXiv:2404.05973 [pdf, ps, other]

Search for the Rare Decays $D_s^+\to h^+(h^{0})e^+e^-$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (618 additional authors not shown)

Abstract: Using 7.33~fb$^{-1}$ of $e^{+}e^{-}$ collision data collected by the BESIII detector at center-of-mass energies in the range of $\sqrt{s}=4.128 - 4.226$~GeV, we search for the rare decays $D_{s}^+\to h^+(h^{0})e^{+}e^{-}$, where $h$ represents a kaon or pion. By requiring the $e^{+}e^{-}$ invariant mass to be consistent with a $\phi(1020)$, $0.98<M(e^{+}e^{-})<1.04$~GeV/$c^2$, the decay $D_s^+\to\pi^+\phi,\phi\to e^{+}e^{-}$ is observed with a statistical significance of 7.8$\sigma$, and evidence for the decay $D_s^+\to\rho^+\phi,\phi\to e^{+}e^{-}$ is found for the first time with a statistical significance of 4.4$\sigma$. The decay branching fractions are measured to be $\mathcal{B}(D_s^+\to\pi^+\phi, \phi\to e^{+}e^{-} )=(1.17^{+0.23}_{-0.21}\pm0.03)\times 10^{-5}$, and $\mathcal{B}(D_s^+\to\rho^+\phi, \phi\to e^{+}e^{-} )=(2.44^{+0.67}_{-0.62}\pm 0.16)\times 10^{-5}$, where the first uncertainties are statistical and the second systematic. No significant signal for the three four-body decays of $D_{s}^{+}\to \pi^{+}\pi^{0}e^{+}e^{-},\ D_{s}^{+}\to K^{+}\pi^{0}e^{+}e^{-}$, and $D_{s}^{+}\to K_{S}^{0}\pi^{+}e^{+}e^{-}$ is observed. For $D_{s}^{+}\to \pi^{+}\pi^{0}e^{+}e^{-}$, the $\phi$ mass region is vetoed to minimize the long-distance effects. The 90$\%$ confidence level upper limits set on the branching fractions of these decays are in the range of $(7.0-8.1)\times 10^{-5}$.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 10 pages, 2 figures, 1 table

arXiv:2404.05880 [pdf, other]

Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge

Authors: Weikai Lu, Ziqian Zeng, Jianwei Wang, Zhengdong Lu, Zelin Chen, Huiping Zhuang, Cen Chen

Abstract: Jailbreaking attacks can enable Large Language Models (LLMs) to bypass the safeguard and generate harmful content. Existing jailbreaking defense methods have failed to address the fundamental issue that harmful knowledge resides within the model, leading to potential jailbreak risks for LLMs. In this paper, we propose a novel defense method called Eraser, which mainly includes three goals: unlearning harmful knowledge, retaining general knowledge, and maintaining safety alignment. The intuition is that if an LLM forgets the specific knowledge required to answer a harmful question, it will no longer have the ability to answer harmful questions. The training of Eraser does not actually require the model's own harmful knowledge, and it can benefit from unlearning general answers related to harmful queries, which means it does not need assistance from the red team. The experimental results show that Eraser can significantly reduce the jailbreaking success rate for various attacks without compromising the general capabilities of the model.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.05680 [pdf, other]

SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation

Authors: Heyuan Li, Ce Chen, Tianhao Shi, Yuda Qiu, Sizhe An, Guanying Chen, Xiaoguang Han

Abstract: While recent advances in 3D-aware Generative Adversarial Networks (GANs) have aided the development of near-frontal view human face synthesis, the challenge of comprehensively synthesizing a full 3D head viewable from all angles still persists. Although PanoHead proves the possibilities of using a large-scale dataset with images of both frontal and back views for full-head synthesis, it often causes artifacts for back views. Based on our in-depth analysis, we found the reasons are mainly twofold. First, from a network architecture perspective, we found each plane in the utilized tri-plane/tri-grid representation space tends to confuse the features from both sides, causing "mirroring" artifacts (e.g., the glasses appear in the back). Second, from a data supervision perspective, we found that existing discriminator training in 3D GANs mainly focuses on the quality of the rendered image itself, and does not care much about its plausibility with the perspective from which it was rendered. This makes it possible to generate a "face" in non-frontal views, because doing so easily fools the discriminator. In response, we propose SphereHead, a novel tri-plane representation in the spherical coordinate system that fits the human head's geometric characteristics and efficiently mitigates many of the generated artifacts. We further introduce a view-image consistency loss for the discriminator to emphasize the correspondence of the camera parameters and the images. The combination of these efforts results in visually superior outcomes with significantly fewer artifacts. Our code and dataset are publicly available at https://lhyfst.github.io/spherehead.

Submitted 8 April, 2024; originally announced April 2024.

Comments: project page: https://lhyfst.github.io/spherehead

arXiv:2404.05596 [pdf, other]

A Comparative Study of the Ground State Transitions of CO and [C I] as Molecular Gas Tracers at High Redshift

Authors: Marta Frias Castillo, Matus Rybak, Jacqueline A. Hodge, Paul Van der Werk, Ian Smail, Joshua Butterworth, Jasper Jansen, Theodoros Topkaras, Chian-Chou Chen, Scott C. Chapman, Axel Weiss, Hiddo Algera, Jack E. Birkin, Elisabete da Cunha, Jianhang Chen, Helmut Dannerbauer, E. F. Jiménez-Andrade, Soh Ikarashi, Cheng-Lin Liao, Eric J. Murphy, A. M. Swinbank, Fabian Walter, Gabriela Calistro Rivera, R. J. Ivison, Claudia del P. Lagos

Abstract: The CO(1--0) and [\ion{C}{1}](1--0) emission lines are well-established tracers of cold molecular gas mass in local galaxies. At high redshift, where the interstellar medium (ISM) is likely to be denser, there have been limited direct comparisons of both ground state transitions. Here we present a study of CO(1--0) and [\ion{C}{1}](1--0) emission in a sample of 20 unlensed dusty, star-forming galaxies at $z=2-5$. The CO(1--0)/[\ion{C}{1}](1--0) ratio is constant up to at least $z=5$, supporting the use of [\ion{C}{1}](1--0) as a gas mass tracer. PDR modelling of the available data indicates a median H$_2$ density of $\log(n~[\mathrm{cm}^{-3}])=4.7\pm0.2$, and UV radiation field $\log(G_{\mathrm{UV}}~[G_0])=3.2\pm0.2$. We use the CO(1--0), [\ion{C}{1}](1--0) and 3mm dust continuum measurements to cross-calibrate the respective gas mass conversion factors, finding no dependence of these factors on either redshift or infrared luminosity. Assuming a variable CO conversion factor then implies [\ion{C}{1}] and dust conversion factors that differ from canonically assumed values but are consistent with the solar/super-solar metallicities expected for our sources. Radiative transfer modelling shows that the warmer CMB at high redshift can significantly affect the [\ion{C}{1}] as well as CO emission, which can change the derived molecular gas masses by up to 70\% for the coldest kinetic gas temperatures expected. Nevertheless, we show that the magnitude of the effect on the ratio of the tracers is within the known scatter of the $L'_\mathrm{CO}-L'_\mathrm{[CI]}$ relation. Further determining the absolute decrease of individual line intensities will require well-sampled spectral line energy distributions (SLEDs) to model the gas excitation conditions in more detail.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.05231 [pdf, other]

PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection

Authors: Xiaofan Li, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yuan Xie, Lizhuang Ma

Abstract: The vision-language model has brought great improvement to few-shot industrial anomaly detection, which usually requires designing hundreds of prompts through prompt engineering. For automated scenarios, we first use conventional prompt learning with the many-class paradigm as the baseline to automatically learn prompts but find that it does not work well in one-class anomaly detection. To address the above problem, this paper proposes a one-class prompt learning method for few-shot anomaly detection, termed PromptAD. First, we propose semantic concatenation, which can transpose normal prompts into anomaly prompts by concatenating normal prompts with anomaly suffixes, thus constructing a large number of negative samples used to guide prompt learning in the one-class setting. Furthermore, to mitigate the training challenge caused by the absence of anomaly images, we introduce the concept of explicit anomaly margin, which is used to explicitly control the margin between normal prompt features and anomaly prompt features through a hyper-parameter. For image-level/pixel-level anomaly detection, PromptAD achieves first place in 11/12 few-shot settings on MVTec and VisA.

Submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR2024

arXiv:2404.05206 [pdf, other]

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos

Authors: Changan Chen, Kumar Ashutosh, Rohit Girdhar, David Harwath, Kristen Grauman

Abstract: We propose a novel self-supervised embedding to learn how actions sound from narrated in-the-wild egocentric videos. Whereas existing methods rely on curated data with known audio-visual correspondence, our multimodal contrastive-consensus coding (MC3) embedding reinforces the associations between audio, language, and vision when all modality pairs agree, while diminishing those associations when any one pair does not. We show our approach can successfully discover how the long tail of human actions sound from egocentric video, outperforming an array of recent multimodal embedding techniques on two datasets (Ego4D and EPIC-Sounds) and multiple cross-modal tasks.

Submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted at CVPR 2024. Project page: https://vision.cs.utexas.edu/projects/soundingactions

arXiv:2404.04917 [pdf, ps, other]

Search for $\eta_c(2S)\to 2(\pi^+\pi^-)$ and improved measurement of $\chi_{cJ}\to 2(\pi^+\pi^-)$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

Abstract: We search for the hadronic decay $\eta_c(2S)\to 2(\pi^+\pi^-)$ in the $\psi(3686)\to\gamma\eta_c(2S)$ radiative decay using $(27.12\pm 0.14)\times 10^8$ $\psi(3686)$ events collected by the BESIII detector at the BEPCII collider. No significant signal is found, and the upper limit of $\mathcal{B}[\psi(3686)\to\gamma\eta_c(2S)]\mathcal{B}[\eta_c(2S)\to 2(\pi^+\pi^-)]$ is determined to be $0.78\times 10^{-6}$ at the 90\% confidence level. Using $\psi(3686)\to\gamma\chi_{cJ}$ transitions, we also measure the branching fractions of $\mathcal{B}[\chi_{cJ(J=0,1,2)}\to 2(\pi^+\pi^-)]$, which are $\mathcal{B}[\chi_{c0}\to 2(\pi^+\pi^-)]=(2.127\pm 0.002~(\mathrm{stat.})\pm 0.101~(\mathrm{syst.}))$\%, $\mathcal{B}[\chi_{c1}\to 2(\pi^+\pi^-)]=(0.685\pm 0.001~(\mathrm{stat.})\pm 0.031~(\mathrm{syst.}))$\%, and $\mathcal{B}[\chi_{c2}\to 2(\pi^+\pi^-)]=(1.153\pm 0.001~(\mathrm{stat.})\pm 0.063~(\mathrm{syst.}))$\%.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04886 [pdf, other]

PagPassGPT: Pattern Guided Password Guessing via Generative Pretrained Transformer

Authors: Xingyu Su, Xiaojie Zhu, Yang Li, Yong Li, Chi Chen, Paulo Esteves-Veríssimo

Abstract: Amidst the surge in deep learning-based password guessing models, challenges of generating high-quality passwords and reducing duplicate passwords persist. To address these challenges, we present PagPassGPT, a password guessing model constructed on Generative Pretrained Transformer (GPT). It can perform pattern guided guessing by incorporating pattern structure information as background knowledge, resulting in a significant increase in the hit rate. Furthermore, we propose D&C-GEN to reduce the repeat rate of generated passwords, which adopts the concept of a divide-and-conquer approach. The primary task of guessing passwords is recursively divided into non-overlapping subtasks. Each subtask inherits the knowledge from the parent task and predicts succeeding tokens. In comparison to the state-of-the-art model, our proposed scheme exhibits the capability to correctly guess 12% more passwords while producing 25% fewer duplicates.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04681 [pdf, other]

Computation and Critical Transitions of Rate-Distortion-Perception Functions With Wasserstein Barycenter

Authors: Chunhui Chen, Xueyan Niu, Wenhao Ye, Hao Wu, Bo Bai

Abstract: The information rate-distortion-perception (RDP) function characterizes the three-way trade-off between description rate, average distortion, and perceptual quality measured by discrepancy between probability distributions. We study several variants of the RDP functions through the lens of optimal transport. By transforming the information RDP function into a Wasserstein Barycenter problem, we identify the critical transitions when one of the constraints becomes inactive and demonstrate several critical transition properties of the RDP variants. Further, the non-strict convexity introduced by the perceptual constraint can be regularized by an entropy regularization term. We prove that the entropy regularized model converges to the original problem and propose an alternating iteration method based on the Sinkhorn algorithm to numerically solve the regularized optimization problem. Experimental results demonstrate the effectiveness and accuracy of the proposed algorithms. As a practical application of our theory, we incorporate our numerical method into a reverse data hiding problem, where a secret message is imperceptibly embedded into the image with guarantees on the perceptual fidelity.

Submitted 9 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Comments: arXiv admin note: text overlap with arXiv:2304.14611. This paper was presented in part at the 2023 IEEE International Symposium on Information Theory

arXiv:2404.04661 [pdf, other]

Transform then Explore: a Simple and Effective Technique for Exploratory Combinatorial Optimization with Reinforcement Learning

Authors: Tianle Pu, Changjun Fan, Mutian Shen, Yizhou Lu, Li Zeng, Zohar Nussinov, Chao Chen, Zhong Liu

Abstract: Many complex problems encountered in both production and daily life can be conceptualized as combinatorial optimization problems (COPs) over graphs. In recent years, reinforcement learning (RL) based models have emerged as a promising direction, treating COP solving as a heuristic learning problem. However, current finite-horizon-MDP based RL models have inherent limitations. They are not allowed to explore adequately to improve solutions at test time, which may be necessary given the complexity of NP-hard optimization tasks. Some recent attempts address this issue by focusing on reward design and state feature engineering, which are tedious and ad hoc. In this work, we instead propose a much simpler but more effective technique, named gauge transformation (GT). The technique originates from physics, but is very effective in enabling RL agents to explore and continuously improve the solutions at test time. Moreover, GT is very simple: it can be implemented with fewer than 10 lines of Python code and can be applied to a vast majority of RL models. Experimentally, we show that traditional RL models with the GT technique achieve state-of-the-art performance on the MaxCut problem. Furthermore, since GT is independent of any RL model, it can be seamlessly integrated into various RL frameworks, paving the way for more effective exploration by these models in solving general COPs.

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.04640 [pdf, other]

Search for di-photon decays of an axion-like particle in radiative decays of J/psi

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, et al. (604 additional authors not shown)

Abstract: We search for the di-photon decay of a light pseudoscalar axion-like particle, $a$, in radiative decays of the $J/\psi$, using 10 billion $J/\psi$ events collected with the BESIII detector. We find no evidence of a narrow resonance and set upper limits at the $95\%$ confidence level on the product branching fraction $\mathcal{B}(J/\psi\to \gamma a) \times \mathcal{B}(a \to \gamma\gamma)$ and the axion-like particle photon coupling constant $g_{a \gamma\gamma}$ in the ranges of $(3.6-49.8) \times 10^{-8}$ and $(2.2 -103.8)\times 10^{-4}$ GeV$^{-1}$, respectively, for $0.18 \le m_a \le 2.85~$ GeV/$c^2$. These are the most stringent limits to date in this mass region.

Submitted 6 April, 2024; originally announced April 2024.

Comments: 9 pages, 5 figures, Submitted to Phys. Rev. D (Letter)

Report number: BESIII Analysis Memo - 671

arXiv:2404.04268 [pdf]

The Use of Generative Search Engines for Knowledge Work and Complex Tasks

Authors: Siddharth Suri, Scott Counts, Leijie Wang, Chacha Chen, Mengting Wan, Tara Safavi, Jennifer Neville, Chirag Shah, Ryen W. White, Reid Andersen, Georg Buscher, Sathish Manivannan, Nagu Rangan, Longqi Yang

Abstract: Until recently, search engines were the predominant method for people to access online information. The recent emergence of large language models (LLMs) has given machines new capabilities such as the ability to generate new digital artifacts like text, images, code etc., resulting in a new tool, a generative search engine, which combines the capabilities of LLMs with a traditional search engine. Through the empirical analysis of Bing Copilot (Bing Chat), one of the first publicly available generative search engines, we analyze the types and complexity of tasks that people use Bing Copilot for compared to Bing Search. Findings indicate that people use the generative search engine for more knowledge work tasks that are higher in cognitive complexity than were commonly done with a traditional search engine.

Submitted 19 March, 2024; originally announced April 2024.

Comments: 32 pages, 3 figures, 4 tables

ACM Class: J.4

arXiv:2404.04231 [pdf, other]

Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation

Authors: Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang, Chun-Pei Chen, Yu-Lun Liu, Min-Hung Chen, Hou-Ning Hu, Yung-Yu Chuang, Yen-Yu Lin

Abstract: This paper addresses text-supervised semantic segmentation, aiming to learn a model capable of segmenting arbitrary visual concepts within images by using only image-text pairs without dense annotations. Existing methods have demonstrated that contrastive learning on image-text pairs effectively aligns visual segments with the meanings of texts. We notice that there is a discrepancy between text alignment and semantic segmentation: A text often consists of multiple semantic concepts, whereas semantic segmentation strives to create semantically homogeneous segments. To address this issue, we propose a novel framework, Image-Text Co-Decomposition (CoDe), where the paired image and text are jointly decomposed into a set of image regions and a set of word segments, respectively, and contrastive learning is developed to enforce region-word alignment. To work with a vision-language model, we present a prompt learning mechanism that derives an extra representation to highlight an image segment or a word segment of interest, with which more effective features can be extracted from that segment. Comprehensive experimental results demonstrate that our method performs favorably against existing text-supervised semantic segmentation methods on six benchmark datasets.

Submitted 5 April, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2404.04102 [pdf, other]

ROPO: Robust Preference Optimization for Large Language Models

Authors: Xize Liang, Chao Chen, Shuang Qiu, Jie Wang, Yue Wu, Zhihang Fu, Zhihao Shi, Feng Wu, Jieping Ye

Abstract: Preference alignment is pivotal for empowering large language models (LLMs) to generate helpful and harmless responses. However, the performance of preference alignment is highly sensitive to the prevalent noise in the preference data. Recent efforts for this problem either marginally alleviate the impact of noise without the ability to actually reduce its presence, or rely on costly teacher LLMs prone to reward misgeneralization. To address these challenges, we propose the RObust Preference Optimization (ROPO) framework, an iterative alignment approach that integrates noise-tolerance and filtering of noisy samples without the aid of external models. Specifically, ROPO iteratively solves a constrained optimization problem, where we dynamically assign a quality-aware weight for each sample and constrain the sum of the weights to the number of samples we intend to retain. For noise-tolerant training and effective noise identification, we derive a robust loss by suppressing the gradients of samples with high uncertainty. We demonstrate both empirically and theoretically that the derived loss is critical for distinguishing noisy samples from clean ones. Furthermore, inspired by our derived loss, we propose a robustness-guided rejection sampling technique to compensate for the potential important information in discarded queries. Experiments on three widely-used datasets with Mistral-7B and Llama-2-7B demonstrate that ROPO significantly outperforms existing preference alignment methods, with its superiority growing as the noise rate increases.

Submitted 28 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.03375 [pdf, other]

Search for the $B_s^0 \rightarrow \mu^+\mu^-\gamma$ decay

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis , et al. (1068 additional authors not shown)

Abstract: A search for the fully reconstructed $B_s^0 \rightarrow \mu^+\mu^-\gamma$ decay is performed at the LHCb experiment using proton-proton collisions at $\sqrt{s}=13$\,TeV corresponding to an integrated luminosity of $5.4\,\mathrm{fb^{-1}}$. No significant signal is found and upper limits on the branching fraction in intervals of the dimuon mass are set \begin{align} {\cal B}(B_s^0 \rightarrow \mu^+\mu^-\gamma) < 4.2\times10^{-8},~&m(\mu\mu)\in[2m_\mu,~1.70]\,\mathrm{GeV/c^2},\nonumber\\ {\cal B}(B_s^0 \rightarrow \mu^+\mu^-\gamma) < 7.7\times10^{-8},~&m(\mu\mu)\in[1.70,~2.88]\,\mathrm{GeV/c^2},\nonumber\\ {\cal B}(B_s^0 \rightarrow \mu^+\mu^-\gamma) < 4.2\times10^{-8},~&m(\mu\mu)\in[3.92,~m_{B_s^0}]\,\mathrm{GeV/c^2},\nonumber \end{align} at 95\% confidence level. Additionally, upper limits are set on the branching fraction in the $[2m_\mu,~1.70]\,\mathrm{GeV/c^2}$ dimuon mass region excluding the contribution from the intermediate $\phi(1020)$ meson, and in the region combining all dimuon-mass intervals.

Submitted 4 April, 2024; originally announced April 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-045.html

Report number: LHCb-PAPER-2023-045, CERN-EP-2024-065

arXiv:2404.03217 [pdf, other]

Evidence of the $h_c\to K_S^0 K^+\pi^-+c.c.$ decay

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Based on $(2.712\pm0.014)\times10^9$ $\psi(3686)$ events collected by the BESIII collaboration, evidence of the hadronic decay $h_c\to K_S^0K^+\pi^-+c.c.$ is found with a significance of $4.3\sigma$ in the $\psi(3686)\to\pi^0 h_c$ process. The branching fraction of $h_c\to K_S^0 K^+\pi^- +c.c.$ is measured to be $(7.3\pm0.8\pm1.8)\times10^{-4}$, where the first and second uncertainties are statistical and systematic, respectively. Combining with the exclusive decay width of $\eta_c\to K\bar{K}\pi$, our result indicates inconsistencies with both pQCD and NRQCD predictions.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.03180 [pdf, other]

Goldfish: An Efficient Federated Unlearning Framework

Authors: Houzhe Wang, Xiaojie Zhu, Chi Chen, Paulo Esteves-Veríssimo

Abstract: With recent legislation on the right to be forgotten, machine unlearning has emerged as a crucial research area. It facilitates the removal of a user's data from federated trained machine learning models without the necessity for retraining from scratch. However, current machine unlearning algorithms are confronted with challenges of efficiency and validity. To address the above issues, we propose a new framework, named Goldfish. It comprises four modules: basic model, loss function, optimization, and extension. To address the challenge of low validity in existing machine unlearning algorithms, we propose a novel loss function. It takes into account the loss arising from the discrepancy between predictions and actual labels in the remaining dataset. Simultaneously, it takes into consideration the bias of predicted results on the removed dataset. Moreover, it accounts for the confidence level of predicted results. Additionally, to enhance efficiency, we adopt a knowledge distillation technique in the basic model and introduce an optimization module that encompasses the early termination mechanism guided by empirical risk and the data partition mechanism. Furthermore, to bolster the robustness of the aggregated model, we propose an extension module that incorporates a mechanism using adaptive distillation temperature to address the heterogeneity of user local data and a mechanism using adaptive weight to handle the variety in the quality of uploaded models. Finally, we conduct comprehensive experiments to illustrate the effectiveness of the proposed approach.

Submitted 23 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

arXiv:2404.02990 [pdf, other]

ASAP: Interpretable Analysis and Summarization of AI-generated Image Patterns at Scale

Authors: Jinbin Huang, Chen Chen, Aditi Mishra, Bum Chul Kwon, Zhicheng Liu, Chris Bryan

Abstract: Generative image models have emerged as a promising technology to produce realistic images. Despite potential benefits, concerns grow about their misuse, particularly in generating deceptive images that could raise significant ethical, legal, and societal issues. Consequently, there is growing demand to empower users to effectively discern and comprehend patterns of AI-generated images. To this end, we developed ASAP, an interactive visualization system that automatically extracts distinct patterns of AI-generated images and allows users to interactively explore them via various views. To uncover fake patterns, ASAP introduces a novel image encoder, adapted from CLIP, which transforms images into compact "distilled" representations, enriched with information for differentiating authentic and fake images. These representations generate gradients that propagate back to the attention maps of CLIP's transformer block. This process quantifies the relative importance of each pixel to image authenticity or fakeness, exposing key deceptive patterns. ASAP enables at-scale interactive analysis of these patterns through multiple, coordinated visualizations. This includes a representation overview with innovative cell glyphs to aid in the exploration and qualitative evaluation of fake patterns across a vast array of images, as well as a pattern view that displays authenticity-indicating patterns in images and quantifies their impact. ASAP supports the analysis of cutting-edge generative models with the latest architectures, including GAN-based models like proGAN and diffusion models like the latent diffusion model. We demonstrate ASAP's usefulness through two usage scenarios using multiple fake image detection benchmark datasets, revealing its ability to identify and understand hidden patterns in AI-generated images, especially in detecting fake human faces produced by diffusion-based techniques.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 9 pages, 6 figures

arXiv:2404.02706 [pdf, other]

doi 10.1145/3613904.3642939

Unblind Text Inputs: Predicting Hint-text of Text Input in Mobile Apps via LLM

Authors: Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Yuekai Huang, Jun Hu, Qing Wang

Abstract: Mobile apps have become indispensable for accessing and participating in various environments, especially for low-vision users. Users with visual impairments can use screen readers to read the content of each screen and understand the content that needs to be operated. Screen readers need to read the hint-text attribute in the text input component to remind visually impaired users what to fill in. Unfortunately, based on our analysis of 4,501 Android apps with text inputs, over 76% of them are missing hint-text. These issues are mostly caused by developers' lack of awareness when considering visually impaired individuals. To overcome these challenges, we developed an LLM-based hint-text generation model called HintDroid, which analyzes the GUI information of input components and uses in-context learning to generate the hint-text. To ensure the quality of hint-text generation, we designed a feedback-based inspection mechanism to further adjust the hint-text. The automated experiments demonstrate high BLEU scores, and a user study further confirms its usefulness. HintDroid can not only help visually impaired individuals, but also help ordinary people understand the requirements of input components. HintDroid demo video: https://youtu.be/FWgfcctRbfI.

Submitted 3 April, 2024; originally announced April 2024.

Comments: Accepted by the 2024 CHI Conference on Human Factors in Computing Systems

arXiv:2404.02430 [pdf]

Observation of Kosterlitz-Thouless Metal-to-Insulator Transition in Quantum Anomalous Hall Insulators

Authors: Ruoxi Zhang, Yi-Fan Zhao, Ling-Jie Zhou, Deyi Zhuo, Zi-Jie Yan, Chao-Xing Liu, Moses H. W. Chan, Chui-Zhen Chen, Cui-Zu Chang

Abstract: Interlayer exchange coupling (IEC) between two magnetic layers sandwiched by a nonmagnetic spacer layer plays a critical role in shaping the magnetic properties of such heterostructures. The quantum anomalous Hall (QAH) effect has been realized in a structure composed of two magnetically doped topological insulator (TI) layers separated by an undoped TI layer. The quantized Hall conductance observed in this sandwich heterostructure originates from the combined contribution of the top and bottom surface states. In this work, we employ molecular beam epitaxy to synthesize a series of magnetic TI sandwiches with varying thicknesses of the middle undoped TI layer. The well-quantized QAH effect is observed in all these samples and its critical behavior is modulated by the IEC between the top and bottom magnetic TI layers. Near the plateau phase transition (PPT), we find that thinner QAH samples exhibit a two-dimensional critical metal behavior with nearly temperature-independent longitudinal resistance, whereas thicker QAH samples behave as a three-dimensional insulator with reduced longitudinal resistance at higher temperatures. The IEC-induced critical-metal-to-insulator transition in the QAH PPT regime can be understood through a two-channel Chalker-Coddington network model by tuning inter-channel tunneling. The agreement between experiment and theory strongly supports the QAH PPT within the Kosterlitz-Thouless framework, where the critical metal and disordered insulator phases exist in bound and unbound states of vortex-antivortex pairs, respectively.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 19 pages, 4 figures, comments are very much welcome

arXiv:2404.02033 [pdf, other]

Search for $C$-even states decaying to $D_{s}^{\pm}D_{s}^{*\mp}$ with masses between $4.08$ and $4.32$ $\rm GeV/{\it c}^{2}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Six $C$-even states, denoted as $X$, with quantum numbers $J^{PC}=0^{-+}$, $1^{\pm+}$, or $2^{\pm+}$, are searched for via the $e^+e^-\to\gamma D_{s}^{\pm}D_{s}^{*\mp}$ process using $(1667.39\pm8.84)~\mathrm{pb}^{-1}$ of $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII storage ring at center-of-mass energy of $\sqrt{s}=(4681.92\pm0.30)~\mathrm{MeV}$. No statistically significant signal is observed in the mass range from $4.08$ to $4.32~\mathrm{GeV}/c^{2}$. The upper limits of $\sigma[e^+e^-\to\gamma X]\cdot \mathcal{B}[X \to D_{s}^{\pm}D_{s}^{*\mp}]$ at a $90\%$ confidence level are determined.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01855 [pdf, other]

Where to Move Next: Zero-shot Generalization of LLMs for Next POI Recommendation

Authors: Shanshan Feng, Haoming Lyu, Caishun Chen, Yew-Soon Ong

Abstract: Next Point-of-interest (POI) recommendation provides valuable suggestions for users to explore their surrounding environment. Existing studies rely on building recommendation models from large-scale users' check-in data, which is task-specific and needs extensive computational resources. Recently, the pretrained large language models (LLMs) have achieved significant advancements in various NLP tasks and have also been investigated for recommendation scenarios. However, the generalization abilities of LLMs remain unexplored for next POI recommendation, where users' geographical movement patterns should be extracted. Although there are studies that leverage LLMs for next-item recommendations, they fail to consider the geographical influence and sequential transitions. Hence, they cannot effectively solve the next POI recommendation task. To this end, we design novel prompting strategies and conduct empirical studies to assess the capability of LLMs, e.g., ChatGPT, for predicting a user's next check-in. Specifically, we consider several essential factors in human movement behaviors, including user geographical preference, spatial distance, and sequential transitions, and formulate the recommendation task as a ranking problem. Through extensive experiments on two widely used real-world datasets, we derive several key findings. Empirical evaluations demonstrate that LLMs have promising zero-shot recommendation abilities and can provide accurate and reasonable predictions. We also reveal that LLMs cannot accurately comprehend geographical context information and are sensitive to the order of presentation of candidate POIs, which shows the limitations of LLMs and necessitates further research on robust human mobility reasoning mechanisms.

Submitted 22 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01705 [pdf]

Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model

Authors: Qinfeng Zhu, Yuanzhi Cai, Yuan Fang, Yihan Yang, Cheng Chen, Lei Fan, Anh Nguyen

Abstract: High-resolution remotely sensed images pose a challenge for commonly used semantic segmentation methods such as Convolutional Neural Network (CNN) and Vision Transformer (ViT). CNN-based methods struggle with handling such high-resolution images due to their limited receptive field, while ViT faces challenges in handling long sequences. Inspired by Mamba, which adopts a State Space Model (SSM) to efficiently capture global semantic information, we propose a semantic segmentation framework for high-resolution remotely sensed images, named Samba. Samba utilizes an encoder-decoder architecture, with Samba blocks serving as the encoder for efficient multi-level semantic information extraction, and UperNet functioning as the decoder. We evaluate Samba on the LoveDA, ISPRS Vaihingen, and ISPRS Potsdam datasets, comparing its performance against top-performing CNN and ViT methods. The results reveal that Samba achieved unparalleled performance on commonly used remote sensing datasets for semantic segmentation. Our proposed Samba demonstrates for the first time the effectiveness of SSM in semantic segmentation of remotely sensed images, setting a new benchmark in performance for Mamba-based techniques in this specific application. The source code and baseline implementations are available at https://github.com/zhuqinfeng1999/Samba.

Submitted 11 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01217 [pdf, other]

Incorporating Domain Differential Equations into Graph Convolutional Networks to Lower Generalization Discrepancy

Authors: Yue Sun, Chao Chen, Yuesheng Xu, Sihong Xie, Rick S. Blum, Parv Venkitasubramaniam

Abstract: Ensuring both accuracy and robustness in time series prediction is critical to many applications, ranging from urban planning to pandemic management. With sufficient training data where all spatiotemporal patterns are well-represented, existing deep-learning models can make reasonably accurate predictions. However, existing methods fail when the training data are drawn from different circumstances (e.g., traffic patterns on regular days) compared to test data (e.g., traffic patterns after a natural disaster). Such challenges are usually classified under domain generalization. In this work, we show that one way to address this challenge in the context of spatiotemporal prediction is by incorporating domain differential equations into Graph Convolutional Networks (GCNs). We theoretically derive conditions where GCNs incorporating such domain differential equations are robust to mismatched training and testing data compared to baseline domain agnostic models. To support our theory, we propose two domain-differential-equation-informed networks called Reaction-Diffusion Graph Convolutional Network (RDGCN), which incorporates differential equations for traffic speed evolution, and Susceptible-Infectious-Recovered Graph Convolutional Network (SIRGCN), which incorporates a disease propagation model. Both RDGCN and SIRGCN are based on reliable and interpretable domain differential equations that allow the models to generalize to unseen patterns. We experimentally show that RDGCN and SIRGCN are more robust with mismatched testing data than the state-of-the-art deep learning methods.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.01081 [pdf, other]

PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation

Authors: Yunze Liu, Changxi Chen, Chenjing Ding, Li Yi

Abstract: Humanoid Reaction Synthesis is pivotal for creating highly interactive and empathetic robots that can seamlessly integrate into human environments, enhancing the way we live, work, and communicate. However, it is difficult to learn the diverse interaction patterns of multiple humans and generate physically plausible reactions. Kinematics-based approaches suffer from floating feet, sliding, penetration, and other problems that defy physical plausibility. Existing physics-based methods often rely on kinematics-based methods to generate reference states and struggle with the kinematic noise this introduces during action execution. Constrained by their reliance on diffusion models, these methods are unable to achieve real-time inference. In this work, we propose a Forward Dynamics Guided 4D Imitation method to generate physically plausible human-like reactions. The learned policy is capable of generating physically plausible and human-like reactions in real time, significantly improving the speed (33x) and quality of reactions compared with the existing method. Our experiments on the InterHuman and Chi3D datasets, along with ablation studies, demonstrate the effectiveness of our approach.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00951 [pdf, other]

Adapting CSI-Guided Imaging Across Diverse Environments: An Experimental Study Leveraging Continuous Learning

Authors: Cheng Chen, Shoki Ohta, Takayuki Nishio, Mohamed Wahib

Abstract: This study explores the feasibility of adapting CSI-guided imaging across varied environments. Focusing on continuous model learning through continuous updates, we investigate CSI-Imager's adaptability in dynamically changing settings, specifically transitioning from an office to an industrial environment. Unlike traditional approaches that may require retraining for new environments, our experimental study aims to validate the potential of CSI-guided imaging to maintain accurate imaging performance through Continuous Learning (CL). By conducting experiments across different scenarios and settings, this work contributes to understanding the limitations and capabilities of existing CSI-guided imaging systems in adapting to new environmental contexts.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00922 [pdf, other]

Towards Memorization-Free Diffusion Models

Authors: Chen Chen, Daochang Liu, Chang Xu

Abstract: Pretrained diffusion models and their outputs are widely accessible due to their exceptional capacity for synthesizing high-quality images and their open-source nature. The users, however, may face litigation risks owing to the models' tendency to memorize and regurgitate training data during inference. To address this, we introduce Anti-Memorization Guidance (AMG), a novel framework employing three targeted guidance strategies for the main causes of memorization: image and caption duplication, and highly specific user prompts. Consequently, AMG ensures memorization-free outputs while maintaining high image quality and text alignment, leveraging the synergy of its guidance methods, each indispensable in its own right. AMG also features an innovative automatic detection system for potential memorization at each step of the inference process, allowing selective application of guidance strategies and minimally interfering with the original sampling process to preserve output utility. We applied AMG to pretrained Denoising Diffusion Probabilistic Models (DDPM) and Stable Diffusion across various generation tasks. The results demonstrate that AMG is the first approach to successfully eradicate all instances of memorization with no or marginal impacts on image quality and text alignment, as evidenced by FID and CLIP scores.

Submitted 1 April, 2024; originally announced April 2024.

Comments: CVPR2024

arXiv:2404.00865 [pdf, other]

Scaling Crystal Structure Relaxation with a Universal Trustworthy Deep Generative Model

Authors: Ziduo Yang, Yiming Zhao, Xiaoqing Liu, Xiuying Zhang, Yifan Li, Qiujie Lyu, Calvin Yu-Chian Chen, Lei Shen

Abstract: The evolution of AI and high-throughput technologies has driven a rapid increase in the number of new materials, challenging our computational ability to comprehensively analyze their properties. Relaxed crystal structures often serve as the foundational basis for further property calculations. However, determining equilibrium structures traditionally involves computationally expensive iterative calculations. Here, we develop DeepRelax, an efficient deep generative model designed for rapid structural relaxation without any iterative process. DeepRelax learns the equilibrium structural distribution, enabling it to predict relaxed structures directly from their unrelaxed counterparts. The ability to perform structural relaxation in just a few hundred milliseconds per structure, combined with the scalability of parallel processing, makes DeepRelax particularly useful for large-scale virtual screening. To demonstrate the universality of DeepRelax, we benchmark it against three different databases with various types of materials: X-Mn-O oxides, Materials Project, and the Computational 2D Materials Database. In these tests, DeepRelax exhibits both high accuracy and efficiency in structural relaxation, as further validated by DFT calculations. Finally, we integrate DeepRelax with an implementation of uncertainty quantification, enhancing its reliability and trustworthiness in material discovery. This work provides an efficient and trustworthy method to significantly accelerate large-scale computations, offering substantial advancements in the field of computational materials science.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2404.00801 [pdf, other]

$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding

Authors: Ye Liu, Jixuan He, Wanhua Li, Junsik Kim, Donglai Wei, Hanspeter Pfister, Chang Wen Chen

Abstract: Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries. Most existing VTG models are built upon frame-wise final-layer CLIP features, aided by additional temporal backbones (e.g., SlowFast) with sophisticated temporal reasoning mechanisms. In this work, we claim that CLIP itself already shows great potential for fine-grained spatial-temporal modeling, as each layer offers distinct yet useful information under different granularity levels. Motivated by this, we propose Reversed Recurrent Tuning ($R^2$-Tuning), a parameter- and memory-efficient transfer learning framework for video temporal grounding. Our method learns a lightweight $R^2$ Block containing only 1.5% of the total parameters to perform progressive spatial-temporal modeling. Starting from the last layer of CLIP, $R^2$ Block recurrently aggregates spatial features from earlier layers, then refines temporal correlation conditioning on the given query, resulting in a coarse-to-fine scheme. $R^2$-Tuning achieves state-of-the-art performance across three VTG tasks (i.e., moment retrieval, highlight detection, and video summarization) on six public benchmarks (i.e., QVHighlights, Charades-STA, Ego4D-NLQ, TACoS, YouTube Highlights, and TVSum) even without the additional backbone, demonstrating the significance and effectiveness of the proposed scheme. Our code is available at https://github.com/yeliudev/R2-Tuning.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2404.00243 [pdf, other]

DSFNet: Learning Disentangled Scenario Factorization for Multi-Scenario Route Ranking

Authors: Jiahao Yu, Yihai Duan, Longfei Xu, Chao Chen, Shuliang Liu, Li Chen, Kaikui Liu, Fan Yang, Ning Guo

Abstract: Multi-scenario route ranking (MSRR) is crucial in many industrial mapping systems. However, the industrial community mainly adopts interactive interfaces to encourage users to select pre-defined scenarios, which may hinder the downstream ranking performance. In addition, in the academic community, existing multi-scenario ranking works come only from other fields, and no works focus specifically on route data because no MSRR dataset is publicly available. Moreover, all the existing multi-scenario works still fail to address the three specific challenges of MSRR simultaneously, i.e., explosion of scenario number, high entanglement, and high-capacity demand. Unlike prior work, to address MSRR, our key idea is to factorize the complicated scenario in route ranking into several disentangled factor scenario patterns. Accordingly, we propose a novel method, Disentangled Scenario Factorization Network (DSFNet), which flexibly composes scenario-dependent parameters based on a high-capacity multi-factor-scenario-branch structure. Then, a novel regularization is proposed to induce the disentanglement of factor scenarios. Furthermore, two extra novel techniques, i.e., scenario-aware batch normalization and scenario-aware feature filtering, are developed to improve the network awareness of scenario representation. Additionally, to facilitate MSRR research in the academic community, we propose MSDR, the first large-scale publicly available annotated industrial Multi-Scenario Driving Route dataset. Comprehensive experimental results demonstrate the superiority of our DSFNet, which has been successfully deployed in AMap to serve the major online traffic.

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2404.00095

GDA: Generalized Diffusion for Robust Test-time Adaptation

Authors: Yun-Yun Tsai, Fu-Chen Chen, Albert Y. C. Chen, Junfeng Yang, Che-Chun Su, Min Sun, Cheng-Hao Kuo

Abstract: Machine learning models struggle with generalization when encountering out-of-distribution (OOD) samples with unexpected distribution shifts. For vision tasks, recent studies have shown that test-time adaptation employing diffusion models can achieve state-of-the-art accuracy improvements on OOD samples by generating new samples that align with the model's domain without the need to modify the model's weights. Unfortunately, those studies have primarily focused on pixel-level corruptions, thereby lacking the generalization to adapt to a broader range of OOD types. We introduce Generalized Diffusion Adaptation (GDA), a novel diffusion-based test-time adaptation method robust against diverse OOD types. Specifically, GDA iteratively guides the diffusion by applying a marginal entropy loss derived from the model, in conjunction with style and content preservation losses during the reverse sampling process. In other words, GDA considers the model's output behavior with the semantic information of the samples as a whole, which can reduce ambiguity in downstream tasks during the generation process. Evaluation across various popular model architectures and OOD benchmarks shows that GDA consistently outperforms prior work on diffusion-driven adaptation. Notably, it achieves the highest classification accuracy improvements, ranging from 4.4\% to 5.02\% on ImageNet-C and 2.5\% to 7.4\% on Rendition, Sketch, and Stylized benchmarks. This performance highlights GDA's generalization to a broader range of OOD benchmarks.

Submitted 2 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

arXiv:2404.00088

Discovery of optically emitting circumgalactic nebulae around the majority of UV-luminous quasars at intermediate redshift

Authors: Sean D. Johnson, Zhuoqi Will Liu, Jennifer I. Li, Joop Schaye, Jenny E. Greene, Sebastiano Cantalupo, Gwen C. Rudie, Zhijie Qu, Hsiao-Wen Chen, Marc Rafelski, Sowgat Muzahid, Mandy C. Chen, Thierry Contini, Wolfram Kollatschny, Nishant Mishra, Michael Rauch, Patrick Petitjean, Fakhri S. Zahedy

Abstract: We report the discovery of large ionized, [O II] emitting circumgalactic nebulae around the majority of thirty UV luminous quasars at $z=0.4-1.4$ observed with deep, wide-field integral field spectroscopy (IFS) with the Multi-Unit Spectroscopy Explorer (MUSE) by the Cosmic Ultraviolet Baryon Survey (CUBS) and MUSE Quasar Blind Emitters Survey (MUSEQuBES). Among the 30 quasars, seven (23%) exhibit [O II] emitting nebulae with major axis sizes greater than 100 kpc, twenty greater than 50 kpc (67%), and 27 (90%) greater than 20 kpc. Such large, optically emitting nebulae indicate that cool, dense, and metal-enriched circumgalactic gas is common in the halos of luminous quasars at intermediate redshift. Several of the largest nebulae exhibit morphologies that suggest interaction-related origins. We detect no correlation between the sizes and cosmological dimming corrected surface brightnesses of the nebulae and quasar redshift, luminosity, black hole mass, or radio-loudness, but find a tentative correlation between the nebulae and rest-frame [O II] equivalent width in the quasar spectra. This potential trend suggests a relationship between ISM content and gas reservoirs on CGM scales. The [O II]-emitting nebulae around the $z\approx1$ quasars are smaller and less common than Ly$αあるふぁ$ nebulae around $z\approx3$ quasars. These smaller sizes can be explained if the outer regions of the Ly$αあるふぁ$ halos arise from scattering in more neutral gas, by evolution in the cool CGM content of quasar host halos, by lower-than-expected metallicities on $\gtrsim50$ kpc scales around $z\approx1$ quasars, or by changes in quasar episodic lifetimes between $z=3$ and $1$.

Submitted 3 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

Comments: 18 pages, 5 figures, 2 tables. Accepted for publication in the Astrophysical Journal

Showing 151–200 of 6,805 results for author: Chen, C