-
Learning-Based Joint Antenna Selection and Precoding Design for Cell-Free MIMO Networks
Authors:
Liangzhi Wang,
Chen Chen,
Carlo Fischione,
Jie Zhang
Abstract:
This paper considers a downlink cell-free multiple-input multiple-output (MIMO) network in which multiple multi-antenna base stations (BSs) serve multiple users via coherent joint transmission. To reduce the energy consumption of radio-frequency components, each BS selects a subset of antennas for downlink data transmission after estimating the channel state information (CSI). We aim to maximize the sum spectral efficiency by jointly optimizing the antenna selection and precoding design. To alleviate the fronthaul overhead and enable real-time network operation, we propose a distributed, scalable machine learning algorithm. In particular, at each BS, we deploy a convolutional neural network (CNN) for antenna selection and a graph neural network (GNN) for precoding design. Unlike conventional centralized solutions, which require a large amount of CSI and signaling exchange among the BSs, the proposed distributed machine learning algorithm takes only locally estimated CSI as input. With well-trained learning models, the proposed algorithm is shown to significantly outperform the distributed baseline schemes and to achieve a sum spectral efficiency comparable to that of its centralized counterpart.
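To make the optimization target concrete, here is a minimal sketch that evaluates the sum spectral efficiency for a given antenna-selection mask and precoder. It assumes a narrowband model y_k = h_k^T sum_j w_j s_j + n_k with unit-power symbols and noise power `noise_power`; these modeling choices and the function name `sum_spectral_efficiency` are assumptions for illustration. In the paper the mask would come from the per-BS CNN and the precoder from the GNN, neither of which is reproduced here.

```python
import numpy as np


def sum_spectral_efficiency(H, W, mask, noise_power=1.0):
    """Sum SE in bit/s/Hz for a multi-user downlink (hypothetical helper).

    H:    (K, M) complex channel; row k is user k's channel to all M antennas
          (antennas of all BSs stacked). Assumed model: y_k = h_k^T sum_j w_j s_j + n_k.
    W:    (M, K) complex precoder; column k carries user k's symbol.
    mask: (M,) boolean antenna-selection vector; unselected antennas are
          switched off by zeroing their rows of W.
    """
    Ws = W * mask[:, None]                 # deactivate unselected antennas
    G = H @ Ws                             # G[k, j] = h_k^T w_j (effective gains)
    power = np.abs(G) ** 2
    signal = np.diag(power)                # desired-signal power per user
    interference = power.sum(axis=1) - signal
    sinr = signal / (interference + noise_power)
    return float(np.sum(np.log2(1.0 + sinr)))
```

A selection policy, learned or heuristic, can then be compared by calling this function with different `mask` vectors for the same `H` and `W`.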
Submitted 12 April, 2024;
originally announced April 2024.
-
FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework
Authors:
Junyi Mei,
Shixuan Sun,
Chao Li,
Cheng Xu,
Cheng Chen,
Yibo Liu,
Jing Wang,
Cheng Zhao,
Xiaofeng Hou,
Minyi Guo,
Bingsheng He,
Xiaoliang Cong
Abstract:
Dynamic graph random walk (DGRW) has emerged as a practical tool for capturing structural relations within a graph. Executing DGRW efficiently on GPUs presents several challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of vertex degrees introduces workload imbalance, making DGRW difficult to parallelize. In this paper, we propose FlowWalker, a GPU-based dynamic graph random walk framework. FlowWalker implements an efficient parallel sampling method to fully exploit GPU parallelism and reduce space complexity. Moreover, it employs a sampler-centric paradigm alongside a dynamic scheduling strategy to handle large volumes of walk queries. FlowWalker is a memory-efficient framework that requires no auxiliary data structures in GPU global memory. We evaluate FlowWalker extensively on ten datasets, and experimental results show that FlowWalker achieves up to 752.2x, 72.1x, and 16.4x speedup compared with existing CPU, GPU, and FPGA random walk frameworks, respectively. A case study shows that FlowWalker reduces the share of random walk time from 35% to 3% in a ByteDance friend-recommendation GNN training pipeline.
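One well-known way to draw a weighted neighbor without any pre-processing buffer (no alias table or prefix-sum array) is weighted reservoir sampling with a reservoir of size one. Whether it matches FlowWalker's exact sampler is an assumption here; the sketch below is a sequential, CPU-only illustration, and the helper names `reservoir_sample_neighbor` and `random_walk` are hypothetical.

```python
import random


def reservoir_sample_neighbor(neighbors, weights, rng=random):
    """Pick one neighbor with probability proportional to its weight in a
    single streaming pass, using O(1) extra memory (reservoir size 1)."""
    chosen = None
    total = 0.0
    for v, w in zip(neighbors, weights):
        if w <= 0:
            continue  # zero-weight edges can never be selected
        total += w
        # Replace the current choice with probability w / total; after the
        # pass, each item has been kept with probability w_i / sum(w).
        if rng.random() < w / total:
            chosen = v
    return chosen


def random_walk(adj, start, length, rng=random):
    """adj maps a vertex to a list of (neighbor, weight) pairs."""
    walk = [start]
    for _ in range(length):
        edges = adj.get(walk[-1], [])
        if not edges:
            break  # dead end: stop the walk early
        nbrs, ws = zip(*edges)
        nxt = reservoir_sample_neighbor(nbrs, ws, rng)
        if nxt is None:
            break  # all outgoing weights were zero
        walk.append(nxt)
    return walk
```

Because each draw only streams over the current vertex's edge list, edges can be inserted or deleted between walks without rebuilding any sampling structure, which is the property that matters for dynamic graphs.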
Submitted 26 April, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
Search for rare $b \to d\ell^+\ell^-$ transitions at Belle
Authors:
Belle,
Belle II Collaborations,
I. Adachi,
L. Aggarwal,
H. Aihara,
N. Akopov,
A. Aloisio,
S. Al Said,
D. M. Asner,
H. Atmacan,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
S. Bansal,
M. Barrett,
J. Baudot,
A. Beaubien,
F. Becherer,
J. Becker
, et al. (371 additional authors not shown)
Abstract:
We present the results of a search for the $b \to d\ell^+\ell^-$ flavor-changing neutral-current rare decays $B^{+, 0} \to (η, ω, π^{+,0}, ρ^{+, 0}) e^+e^-$ and $B^{+, 0} \to (η, ω, π^{0}, ρ^{+}) μ^+μ^-$ using a $711$ fb$^{-1}$ data sample that contains $772 \times 10^{6}$ $B\overline{B}$ events. The data were collected at the $Υ(4S)$ resonance with the Belle detector at the KEKB asymmetric-energy $e^+e^-$ collider. We find no evidence for signal and set upper limits on branching fractions at the $90\%$ confidence level in the range $(3.8 - 47) \times 10^{-8}$ depending on the decay channel. The obtained limits are the world's best results. This is the first search for the channels $B^{+, 0} \to (ω, ρ^{+,0}) e^+e^-$ and $B^{+, 0} \to (ω, ρ^{+})μ^+μ^-$.
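As a rough illustration of how a null result becomes an upper limit, the sketch below computes the classical 90% confidence-level upper limit on a Poisson mean with zero expected background and converts it to a branching fraction via a simplified relation B_up = mu_up / (efficiency * n_B). This is a textbook simplification, not the Belle procedure (which also handles background, systematic uncertainties, and channel-specific factors); all function names and the conversion formula are assumptions.

```python
import math


def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson variable with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def poisson_upper_limit(n_obs, cl=0.90, tol=1e-10):
    """Classical upper limit with zero background: the mean mu_up for which
    observing n_obs or fewer events has probability 1 - cl (bisection)."""
    lo, hi = 0.0, 10.0 + 10.0 * n_obs
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > 1.0 - cl:
            lo = mid  # mean still compatible with the observation
        else:
            hi = mid
    return 0.5 * (lo + hi)


def branching_fraction_upper_limit(n_obs, efficiency, n_b, cl=0.90):
    """Hypothetical simplified conversion to a branching-fraction limit."""
    return poisson_upper_limit(n_obs, cl) / (efficiency * n_b)
```

For zero observed events this reproduces the familiar limit of ln(10), about 2.3 events; the branching-fraction limit then scales inversely with the signal efficiency and the number of B mesons.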
Submitted 11 April, 2024;
originally announced April 2024.
-
Ephemeral Myographic Motion: Repurposing the Myo Armband to Control Disposable Pneumatic Sculptures
Authors:
Celia Chen,
Alex Leitch
Abstract:
This paper details the development of an interactive sculpture built from deprecated hardware and intentionally decomposable, transient materials. We present a case study of "Strain", an emotive prototype that reclaims two orphaned digital artifacts to power a kinetic sculpture made of common disposable objects. We use the Myo, an abandoned myoelectric armband, in concert with the Programmable Air, a soft-robotics prototyping project, to manipulate a pneumatic bladder array constructed from condoms, bamboo skewers, and a small library of 3D-printed PLA connectors designed to work with these generic parts. The resulting sculpture achieves surprisingly organic actuation. The goal of this project is to produce several reusable components: software to resuscitate the Myo Armband, homeostasis software for the Programmable Air or equivalent pneumatic projects, and a library of easily printed parts that work with generic bamboo disposables for sculptural prototyping. This project develops usable, repeatable engineering by applying it to a slightly whimsical object that evokes a strong emotional response in its audience. Through this, we transform the disposable into the sustainable. In this paper, we reflect on project-based insights into rescuing and revitalizing abandoned consumer electronics for future work.
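The "homeostasis software" mentioned above regulates bladder pressure. A common way to do this is bang-bang control with a deadband (hysteresis), sketched here under stated assumptions: the muscle-activation level is taken as a normalized value in [0, 1] mapped linearly to a target pressure, units are arbitrary, and the leak/inflow rates are made up. None of these names (`emg_to_target`, `hysteresis_step`, `simulate`) come from the project or from the Myo or Programmable Air APIs; no hardware calls are made.

```python
def emg_to_target(level, p_min, p_max):
    """Map a normalized muscle-activation level in [0, 1] to a target pressure."""
    level = min(max(level, 0.0), 1.0)  # clamp noisy readings
    return p_min + level * (p_max - p_min)


def hysteresis_step(pressure, target, pump_on, band=0.5):
    """Bang-bang control with a deadband around the target.
    Returns the new pump state (True = inflate)."""
    if pressure < target - band:
        return True
    if pressure > target + band:
        return False
    return pump_on  # inside the deadband: keep the previous state


def simulate(target, steps=50, inflow=1.0, leak=0.3, band=0.5):
    """Toy plant: the pump adds `inflow` per step, otherwise the bladder leaks."""
    pressure, pump_on = 0.0, False
    history = []
    for _ in range(steps):
        pump_on = hysteresis_step(pressure, target, pump_on, band)
        pressure = max(pressure + (inflow if pump_on else -leak), 0.0)
        history.append(pressure)
    return history
```

The deadband keeps the pump from chattering on and off around the set point, which matters for small hobby pumps and valves.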
Submitted 11 April, 2024;
originally announced April 2024.
-
JWST Discovery of $40+$ Microlensed Stars in a Magnified Galaxy, the "Dragon" behind Abell 370
Authors:
Yoshinobu Fudamoto,
Fengwu Sun,
Jose M. Diego,
Liang Dai,
Masamune Oguri,
Adi Zitrin,
Erik Zackrisson,
Mathilde Jauzac,
David J. Lagattuta,
Eiichi Egami,
Edoardo Iani,
Rogier A. Windhorst,
Katsuya T. Abe,
Franz Erik Bauer,
Fuyan Bian,
Rachana Bhatawdekar,
Thomas J. Broadhurst,
Zheng Cai,
Chian-Chou Chen,
Wenlei Chen,
Seth H. Cohen,
Christopher J. Conselice,
Daniel Espada,
Nicholas Foo,
Brenda L. Frye
, et al. (21 additional authors not shown)
Abstract:
Strong gravitational magnification by massive galaxy clusters enables us to detect faint background sources, resolve their detailed internal structures, and, in the most extreme cases, identify and study individual stars in distant galaxies. Highly magnified individual stars allow for a wide range of applications, including studies of stellar populations in distant galaxies and constraints on small-scale dark matter structures. However, these applications have been hampered by the small number of events observed, as typically only one or a few stars are identified in each distant galaxy. Here, we report the discovery of 46 significant microlensed stars in a single strongly lensed high-redshift galaxy (dubbed the ``Dragon arc'') behind the Abell 370 cluster, at a redshift of 0.725, when the Universe was half its current age, based on two observations separated by one year with the James Webb Space Telescope ({\it JWST}). These events are mostly found near the expected lensing critical curves, suggesting that they are magnified individual stars that appear as transients due to microlensing by intracluster stars. Using multi-wavelength photometry and colors, we constrain their stellar types and find that many of them are consistent with red giants/supergiants magnified by factors of thousands. This finding reveals an unprecedentedly high occurrence of microlensing events in the Dragon arc and demonstrates that {\it JWST}'s time-domain observations open up the possibility of conducting statistical studies of high-redshift stars and subgalactic-scale perturbations in the lensing dark matter field.
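To put "magnified by factors of thousands" in observational terms, recall that a magnification μ brightens a source by Δm magnitudes. The value μ = 1000 below is only an illustrative round number, not a per-star measurement from the paper.

```latex
\Delta m = 2.5\,\log_{10}\mu ,\qquad
\mu = 10^{3} \;\Longrightarrow\; \Delta m = 2.5 \times 3 = 7.5~\text{mag}.
```

So a star that would otherwise sit about 7.5 magnitudes below the detection limit can briefly become detectable, which is why such events show up as transients between epochs.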
Submitted 11 April, 2024;
originally announced April 2024.
-
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Authors:
Ming Li,
Taojiannan Yang,
Huafeng Kuang,
Jie Wu,
Zhaoning Wang,
Xuefeng Xiao,
Chen Chen
Abstract:
To enhance the controllability of text-to-image diffusion models, existing efforts such as ControlNet have incorporated image-based conditional controls. In this paper, we reveal that existing methods still face significant challenges in generating images that align with the image conditional controls. To this end, we propose ControlNet++, a novel approach that improves controllable generation by explicitly optimizing pixel-level cycle consistency between generated images and conditional controls. Specifically, for an input conditional control, we use a pre-trained discriminative reward model to extract the corresponding condition from the generated images, and then optimize the consistency loss between the input conditional control and the extracted condition. A straightforward implementation would generate images from random noise and then calculate the consistency loss, but such an approach requires storing gradients for multiple sampling timesteps, leading to considerable time and memory costs. To address this, we introduce an efficient reward strategy that deliberately disturbs the input images by adding noise and then uses the single-step denoised images for reward fine-tuning. This avoids the extensive costs associated with image sampling, allowing for more efficient reward fine-tuning. Extensive experiments show that ControlNet++ significantly improves controllability under various conditional controls. For example, it achieves improvements over ControlNet of 7.9% mIoU, 13.4% SSIM, and 7.6% RMSE for segmentation mask, line-art edge, and depth conditions, respectively.
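The "add noise, then denoise in a single step" idea can be sketched with the standard DDPM forward process and its one-step clean-image estimate. This is a minimal NumPy illustration under assumptions: `eps_model` and `reward_model` are hypothetical stand-ins for the diffusion network and the discriminative reward model, the timestep is represented only by `alpha_bar_t`, and a mean-squared consistency loss is used for simplicity (the actual loss depends on the condition type).

```python
import numpy as np


def add_noise(x0, eps, alpha_bar_t):
    """DDPM forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def one_step_x0(x_t, eps_pred, alpha_bar_t):
    """Single-step estimate of the clean image from the predicted noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)


def consistency_loss(x0, cond, eps_model, reward_model, alpha_bar_t, rng):
    """Disturb a training image, denoise it in one step, and compare the
    condition extracted from the result with the input condition."""
    eps = rng.standard_normal(x0.shape)
    x_t = add_noise(x0, eps, alpha_bar_t)
    x0_hat = one_step_x0(x_t, eps_model(x_t, cond), alpha_bar_t)
    extracted = reward_model(x0_hat)
    return float(np.mean((extracted - cond) ** 2))  # assumed MSE consistency loss
```

Because only one denoising step sits between the noisy input and the reward, gradients need to flow through a single network call rather than a whole sampling chain.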
Submitted 11 April, 2024;
originally announced April 2024.
-
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Authors:
Haotian Zhang,
Haoxuan You,
Philipp Dufter,
Bowen Zhang,
Chen Chen,
Hong-You Chen,
Tsu-Jui Fu,
William Yang Wang,
Shih-Fu Chang,
Zhe Gan,
Yinfei Yang
Abstract:
While Ferret seamlessly integrates regional understanding into the Large Language Model (LLM) to facilitate its referring and grounding capability, it has certain limitations: it is constrained by its fixed pre-trained visual encoder and fails to perform well on broader tasks. In this work, we unveil Ferret-v2, a significant upgrade to Ferret, with three key designs. (1) Any resolution grounding and referring: a flexible approach that effortlessly handles higher image resolutions, improving the model's ability to process and understand images in greater detail. (2) Multi-granularity visual encoding: by integrating the additional DINOv2 encoder, the model learns richer and more diverse underlying contexts for global and fine-grained visual information. (3) A three-stage training paradigm: besides image-caption alignment, an additional stage is proposed for high-resolution dense alignment before the final instruction tuning. Experiments show that Ferret-v2 provides substantial improvements over Ferret and other state-of-the-art methods, thanks to its high-resolution scaling and fine-grained visual processing.
Submitted 11 April, 2024;
originally announced April 2024.
-
Typical blocks of the category $\mathcal O$ and Whittaker modules for Takiff superalgebras
Authors:
Chih-Whi Chen,
Yongjie Wang
Abstract:
We study the simplicity of Kac induced modules over the $\ell$-th Takiff superalgebras $\widetilde{\mathfrak g}_\ell:= \widetilde{\mathfrak g}\otimes \mathbb C[θ]/(θ^{\ell+1})$, for $\ell>0$, associated with the Lie superalgebras $\widetilde{\mathfrak g}$ of type I. We formulate a general notion of typical weights and typical Jordan blocks of the category $\mathcal O$ for $\widetilde{\mathfrak g}_\ell$ associated with Lie superalgebras $\mathfrak{gl}(m|n)$, $\mathfrak{osp}(2|2n)$ and $\mathfrak{pe}(n)$. For Lie superalgebras $\mathfrak{gl}(m|n)$ and $\mathfrak{osp}(2|2n)$, we establish an equivalence from an arbitrary typical Jordan block of the category $\mathcal O$ for $\widetilde{\mathfrak g}_\ell$ to a Jordan block of the category $\mathcal O$ for the even subalgebra of $\widetilde{\mathfrak g}_\ell$. This provides a solution to the problem of determining the composition multiplicities of the Verma modules over $\widetilde{\mathfrak g}_\ell$ with typical highest weights. We also investigate non-singular Whittaker modules over these Takiff superalgebras. In particular, we obtain a classification of non-singular simple Whittaker modules and a criterion for simplicity of non-singular standard Whittaker modules.
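For reference, and assuming (as is standard for Takiff, or truncated current, superalgebras) that θ is an even variable, the bracket on $\widetilde{\mathfrak g}_\ell$ is obtained from that of $\widetilde{\mathfrak g}$ by multiplying the powers of θ and truncating above degree ℓ. In particular, the parity of $x\otimes\theta^{i}$ equals that of $x$, which identifies the even subalgebra mentioned in the abstract.

```latex
[\,x\otimes\theta^{i},\; y\otimes\theta^{j}\,] =
\begin{cases}
[x,y]\otimes\theta^{i+j}, & i+j\le \ell,\\
0, & i+j>\ell,
\end{cases}
\qquad x,y\in\widetilde{\mathfrak g},\quad 0\le i,j\le \ell,
\\[4pt]
(\widetilde{\mathfrak g}_\ell)_{\bar 0} = \widetilde{\mathfrak g}_{\bar 0}\otimes \mathbb C[\theta]/(\theta^{\ell+1}).
```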
Submitted 22 April, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Authors:
Aleksandar Botev,
Soham De,
Samuel L Smith,
Anushan Fernando,
George-Cristian Muraru,
Ruba Haroun,
Leonard Berrada,
Razvan Pascanu,
Pier Giuseppe Sessa,
Robert Dadashi,
Léonard Hussenot,
Johan Ferret,
Sertan Girgin,
Olivier Bachem,
Alek Andreev,
Kathleen Kenealy,
Thomas Mesnard,
Cassidy Hardin,
Surya Bhupatiraju,
Shreya Pathak,
Laurent Sifre,
Morgane Rivière,
Mihir Sanjay Kale,
Juliette Love,
Pouya Tafti
, et al. (37 additional authors not shown)
Abstract:
We introduce RecurrentGemma, an open language model that uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters and an instruction-tuned variant. Both models achieve performance comparable to Gemma-2B despite being trained on fewer tokens.
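Why a linear recurrence gives a fixed-size state can be seen from the simplest diagonal form h_t = a * h_{t-1} + b * x_t. The sketch below is deliberately simplified: Griffin's actual recurrent block adds input-dependent gating and is interleaved with local attention, none of which is modeled here, and the function names are hypothetical.

```python
import numpy as np


def linear_recurrence(x, a, b):
    """Diagonal linear recurrence h_t = a * h_{t-1} + b * x_t.

    x: (T, D) input sequence; a, b: (D,) per-channel coefficients
    (|a| < 1 keeps the recurrence stable). Returns all T states for
    inspection, although generation only ever needs the latest one.
    """
    h = np.zeros(x.shape[1])
    states = np.empty_like(x, dtype=float)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        states[t] = h
    return states


def step(h, x_t, a, b):
    """One decoding step: memory is a single (D,) vector, whatever the context length."""
    return a * h + b * x_t
```

Unlike a transformer's key-value cache, which grows with every token, the only thing carried between decoding steps here is `h`.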
Submitted 11 April, 2024;
originally announced April 2024.
-
Multiparameter cascaded quantum interferometer
Authors:
Baihong Li,
Zhuo-zhuo Wang,
Qi-qi Li,
Changhua Chen,
Boxin Yuan,
Yiwei Zhai,
Rui-Bo Jin,
Xiaofei Zhang
Abstract:
We theoretically propose a multiparameter cascaded quantum interferometer in which a two-input, two-output setup is obtained by concatenating 50:50 beam splitters with $n$ independent and adjustable time delays. A general method for deriving the coincidence probability of such an interferometer is given, based on the linear transformation of the matrix of beam splitters. As examples, we analyze the interference characteristics of one-, two-, and three-parameter cascaded quantum interferometers with different frequency correlations and input states. Typical interferograms of such interferometers are provided, revealing richer and more complicated two-photon interference phenomena. In principle, arbitrary two-input, two-output experimental setups can be designed with this proposal. This work offers a toolbox for designing versatile quantum interferometers and provides a convenient method for deriving the coincidence probabilities involved. Potential applications include the complete spectral characterization of two-photon states, multiparameter estimation, and quantum metrology.
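The matrix approach can be sketched numerically. Under stated assumptions (a real 50:50 beam-splitter matrix, each stage being a delay on the upper arm followed by a beam splitter, frequencies measured as detunings from a center frequency whose phase is ignored, and an uncorrelated Gaussian two-photon spectrum), the overall transfer matrix U(ω) is a product of stage matrices, and the coincidence probability follows from the two ways the photons can reach different outputs. The stage ordering and all function names are hypothetical, not the paper's exact construction.

```python
import numpy as np

# Assumed real 50:50 beam-splitter matrix.
B = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)


def delay(omega, tau):
    """Delay tau on the upper arm: phase exp(i*omega*tau) on port 0."""
    return np.diag([np.exp(1j * omega * tau), 1.0])


def transfer_matrix(omega, taus):
    """Cascade: for each tau, apply the delay, then a beam splitter."""
    U = np.eye(2, dtype=complex)
    for tau in taus:
        U = B @ delay(omega, tau) @ U
    return U


def coincidence_probability(taus, sigma=1.0, span=6.0, n=241):
    """Photon 1 enters port 0 and photon 2 enters port 1, with a separable
    Gaussian joint spectral amplitude f(w1, w2) = phi(w1) * phi(w2)."""
    x = np.linspace(-span * sigma, span * sigma, n)
    dx = x[1] - x[0]
    phi = np.exp(-x**2 / (4.0 * sigma**2))
    phi /= np.sqrt(np.sum(phi**2) * dx)  # normalize |phi|^2 to unit area
    f = np.outer(phi, phi)
    Us = np.array([transfer_matrix(w, taus) for w in x])
    U00, U01, U10, U11 = Us[:, 0, 0], Us[:, 0, 1], Us[:, 1, 0], Us[:, 1, 1]
    # Amplitude for one photon in each output: direct path plus exchanged path.
    g = f * np.outer(U00, U11) + f.T * np.outer(U01, U10)
    return float(np.sum(np.abs(g) ** 2) * dx * dx)
```

With a single stage this reproduces the Hong-Ou-Mandel dip, 0.5 * (1 - exp(-sigma^2 * tau^2)) for this spectrum, and passing two or three delays, e.g. `coincidence_probability([0.5, 2.0])`, gives the multiparameter cascaded case.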
Submitted 8 May, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
Measurement of $e^{+}e^{-}\to ωη^{\prime}$ cross sections at $\sqrt{s}=$ 2.000 to 3.080 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (599 additional authors not shown)
Abstract:
The Born cross sections for the process $e^{+}e^{-}\to ωη^{\prime}$ are measured at 22 center-of-mass energies from 2.000 to 3.080 GeV using data collected with the BESIII detector at the BEPCII collider. A resonant structure is observed with a statistical significance of 9.6$σ$. A Breit-Wigner fit determines its mass to be $M_R=(2153\pm30\pm31)~{\rm{MeV}}/c^{2}$ and its width to be $Γ_{R}=(167\pm77\pm7)~\rm{MeV}$, where the first uncertainties are statistical and the second are systematic.
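As an illustration of the line shape behind such a fit, here is the squared modulus of a relativistic Breit-Wigner with constant width, evaluated with the central values quoted above (M = 2153 MeV, Γ = 167 MeV). This is only the resonance shape; the fit function actually used by BESIII (phase-space factors, any continuum or interference terms) is not reproduced, and `bw_intensity` is a hypothetical name.

```python
import numpy as np


def bw_intensity(sqrt_s, mass, width):
    """|relativistic Breit-Wigner|^2 with constant width:
    1 / ((s - M^2)^2 + M^2 * Gamma^2), with energies in MeV."""
    s = np.asarray(sqrt_s, dtype=float) ** 2
    return 1.0 / ((s - mass**2) ** 2 + (mass * width) ** 2)


# Scan the measured energy range with the quoted central values.
M_R, GAMMA_R = 2153.0, 167.0
energies = np.arange(2000, 3081)  # sqrt(s) from 2.000 to 3.080 GeV, in MeV
shape = bw_intensity(energies, M_R, GAMMA_R)
```

The shape peaks at √s = M and falls to half its maximum where s = M^2 ± MΓ, i.e. roughly Γ/2 on either side of the peak.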
Submitted 10 April, 2024;
originally announced April 2024.
-
Search for prompt production of pentaquarks in charm hadron final states
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
H. Afsharnia,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1090 additional authors not shown)
Abstract:
A search for hidden-charm pentaquark states decaying to a range of $Σ_{c}\bar{D}$ and $Λ_{c}\bar{D}$ final states, as well as for doubly charmed pentaquark states decaying to $Σ_{c}D$ and $Λ_{c}^{+}D$, is made using samples of proton-proton collision data corresponding to an integrated luminosity of $5.7~\mathrm{fb}^{-1}$ recorded by the LHCb detector at $\sqrt{s} = 13~\mathrm{TeV}$. Since no significant signals are found, upper limits are set on the pentaquark yields relative to that of the $Λ_{c}^{+}$ baryon in the $Λ_{c}^{+}\to pK^{-}π^{+}$ decay mode. The known pentaquark states are also investigated, and their signal yields are found to be consistent with zero in all cases.
Submitted 10 April, 2024;
originally announced April 2024.
-
Observational features of reflection asymmetric black holes
Authors:
Che-Yu Chen,
Hung-Yi Pu
Abstract:
The Kerr spacetime is symmetric with respect to a well-defined equatorial plane. Testing the equatorial reflection symmetry of an isolated black hole is therefore at the same time a test of the Kerr hypothesis in General Relativity. In this work, we investigate the possible observational features when a Keplerian disk surrounds a rotating black hole without reflection symmetry. When this symmetry is broken, the photon trajectories around the black hole and the Keplerian orbits in the accretion disk are generically distorted vertically away from the equatorial plane by an amount that depends on their distance to the black hole. In the reflection-asymmetric spacetime we consider, these two kinds of orbits are distorted in opposite directions. Interestingly, while the size and shape of the black hole shadows closely resemble those of Kerr black holes, distinct observational characteristics can emerge in the disk image and emission line profiles. When the disk is observed edge-on, a pronounced concave shape may appear along its innermost edge on the incoming side. Furthermore, distinctive horn-like features might be observed in the spectral line profile on the blue-shifted side. These features can serve as compelling indicators of reflection asymmetry in rotating black holes.
Submitted 10 April, 2024;
originally announced April 2024.
-
Probing the shape of the brown dwarf desert around main-sequence A-F-G-type stars using post-common-envelope WD$-$BD binaries
Authors:
Zhangliang Chen,
Yizhi Chen,
Chen Chen,
Hongwei Ge,
Bo Ma
Abstract:
Brown dwarfs (BDs) with masses in the range $40-60 M_{\rm Jup}$ are rare around solar-type main-sequence (MS) stars, a scarcity known as the brown dwarf desert (BDD). One caveat of previous studies of the BDD is the relatively limited sample of MS$-$BD binaries with accurately determined BD masses. We aim to produce a large sample of brown dwarf companions with precisely determined masses around main-sequence A-F-G-type stars using observations of post-common-envelope white dwarf (WD)$-$BD binaries. We employ the rapid binary evolution code COMPAS to deduce the properties of the MS$-$BD binary progenitors of post-common-envelope WD$-$BD binaries. This method supplements the directly observed MS$-$BD binary sample, enriching the data available for analyzing the BDD around main-sequence A-F-G-type stars. Our study opens a new window for studying the shape of the BDD around A-F-G-type main-sequence stars in the short-period regime. We find tentative evidence that the `driest' part of the BDD around A-F-G-type stars may extend to orbital periods of several hundred days, albeit with a small sample size. More post-common-envelope WD$-$BD binaries detected in the future will advance our understanding of the BDD around A-F-G-type stars.
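To translate orbital periods such as "several hundred days" into orbital separations (or back), Kepler's third law, P = 2π sqrt(a^3 / (G (M1 + M2))), is enough. The helper names and the rounded physical constants below are illustrative choices, and the example masses in any call are up to the user.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
M_JUP = 1.898e27   # Jupiter mass, kg
AU = 1.496e11      # astronomical unit, m
DAY = 86400.0      # seconds per day


def orbital_period_days(a_au, m_star_sun, m_bd_jup):
    """Orbital period (days) from separation (AU) and component masses."""
    m_tot = m_star_sun * M_SUN + m_bd_jup * M_JUP
    a = a_au * AU
    return 2.0 * math.pi * math.sqrt(a**3 / (G * m_tot)) / DAY


def separation_au(period_days, m_star_sun, m_bd_jup):
    """Inverse relation: separation (AU) from period (days) and masses."""
    m_tot = m_star_sun * M_SUN + m_bd_jup * M_JUP
    p = period_days * DAY
    return (G * m_tot * p**2 / (4.0 * math.pi**2)) ** (1.0 / 3.0) / AU
```

For example, `separation_au(300, 1.2, 50)` gives the separation of a hypothetical 50 M_Jup companion on a 300-day orbit around a 1.2 M_sun star.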
Submitted 10 April, 2024;
originally announced April 2024.
-
CGNSDE: Conditional Gaussian Neural Stochastic Differential Equation for Modeling Complex Systems and Data Assimilation
Authors:
Chuanqi Chen,
Nan Chen,
Jin-Long Wu
Abstract:
A new hybrid modeling approach combining knowledge-based modeling and machine learning, called the conditional Gaussian neural stochastic differential equation (CGNSDE), is developed to facilitate modeling complex dynamical systems and implementing analytic formulae for the associated data assimilation (DA). In contrast to standard neural network predictive models, the CGNSDE is designed to effectively tackle both forward prediction tasks and inverse state estimation problems. The CGNSDE starts by exploiting systematic causal inference via information theory to build a simple knowledge-based nonlinear model that nevertheless captures as much explainable physics as possible. Then, neural networks are added to the knowledge-based model in a specific way, which not only characterizes the remaining features that are challenging to model with simple forms but also facilitates the use of analytic formulae to efficiently compute the nonlinear DA solution. These analytic formulae are used as an additional, computationally affordable loss to train the neural networks, which directly improves the DA accuracy. This DA loss function encourages the CGNSDE to capture the interactions between state variables and thus improves its modeling skill. With the DA loss, the CGNSDE is more capable of estimating extreme events and quantifying the associated uncertainty. Furthermore, crucial physical properties of many complex systems, such as the translation-invariant local dependence of state variables, can significantly simplify the neural network structures and allow the CGNSDE to be applied to high-dimensional systems. Numerical experiments based on chaotic systems with intermittency and strong non-Gaussian features indicate that the CGNSDE outperforms knowledge-based regression models, and that the DA loss further enhances its modeling skill.
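The "analytic formulae" for DA come from the conditional Gaussian structure. When the unobserved variable enters the dynamics linearly, with coefficients that may depend nonlinearly on the observed variable, the posterior given the observed path is Gaussian, and its mean and variance obey closed-form equations. Below is a scalar, Euler-discretized sketch of those filter equations. The model form and names are assumptions for illustration; in a CGNSDE some coefficient functions would be neural networks, whereas here they are plain Python callables.

```python
import numpy as np


def cg_filter(u1, dt, A0, A1, a0, a1, b2, sigma1, mu0=0.0, R0=1.0):
    """Scalar conditional Gaussian filter (Euler discretization).

    Assumed model, with coefficients that may depend nonlinearly on u1:
        du1 = [A0(u1) + A1(u1) u2] dt + sigma1 dW1   (observed)
        du2 = [a0(u1) + a1(u1) u2] dt + b2 dW2       (hidden)
    Returns the posterior mean mu_n and variance R_n of u2 given u1[0..n].
    """
    mu, R = mu0, R0
    mus, Rs = [mu], [R]
    for n in range(len(u1) - 1):
        x = u1[n]
        du1 = u1[n + 1] - u1[n]
        gain = R * A1(x) / sigma1**2
        innovation = du1 - (A0(x) + A1(x) * mu) * dt
        mu = mu + (a0(x) + a1(x) * mu) * dt + gain * innovation
        # Riccati equation for the posterior variance (uses the old R).
        R = R + (2.0 * a1(x) * R + b2**2 - (R * A1(x)) ** 2 / sigma1**2) * dt
        mus.append(mu)
        Rs.append(R)
    return np.array(mus), np.array(Rs)
```

Because the posterior is computed in closed form, a loss built on it, such as the error between `mus` and a reference hidden trajectory, stays cheap enough to add to training, which is the role the abstract assigns to the DA loss.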
Submitted 10 April, 2024;
originally announced April 2024.
-
Measurement of the Born cross section for $e^{+}e^{-}\to ηh_c $ at center-of-mass energies between 4.1 and 4.6\,GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
We measure the Born cross section for the reaction $e^{+}e^{-} \rightarrow ηh_c$ from $\sqrt{s} = 4.129$ to $4.600$~GeV using data sets collected by the BESIII detector running at the BEPCII collider. A resonant structure in the cross section line shape near 4.200~GeV is observed with a statistical significance of 7$σ$. The parameters of this resonance are measured to be \MeasMass\ and \MeasWidth, where the first uncertainties are statistical and the second systematic.
Submitted 10 April, 2024;
originally announced April 2024.
-
Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data
Authors:
Aakash Kumar,
Chen Chen,
Ajmal Mian,
Neils Lobo,
Mubarak Shah
Abstract:
3D detection is a critical task that enables machines to identify and locate objects in three-dimensional space. It has a broad range of applications in several fields, including autonomous driving, robotics, and augmented reality. Monocular 3D detection is attractive because it requires only a single camera; however, it lacks the accuracy and robustness required for real-world applications. High-resolution LiDAR, on the other hand, can be expensive and can cause interference problems in heavy traffic because of its active transmissions. We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection. Our method requires only a small number of 3D points, which can be obtained from a low-cost, low-resolution sensor. Specifically, we use only 512 points, which is just 1% of a full LiDAR frame in the KITTI dataset. Our method reconstructs a complete 3D point cloud from this limited 3D information combined with a single image. The reconstructed 3D point cloud and the corresponding image can be used by any off-the-shelf multi-modal detector for 3D object detection. Using the proposed network architecture with an off-the-shelf multi-modal 3D detector improves 3D detection accuracy by 20% compared to state-of-the-art monocular detection methods and by 6% to 9% compared to baseline multi-modal methods on the KITTI and JackRabbot datasets.
Submitted 9 April, 2024;
originally announced April 2024.
-
Multi-modal Document Presentation Attack Detection With Forensics Trace Disentanglement
Authors:
Changsheng Chen,
Yongyi Deng,
Liangwei Lin,
Zitong Yu,
Zhimao Lai
Abstract:
Document Presentation Attack Detection (DPAD) is an important measure for protecting the authenticity of document images. However, recent DPAD methods demand additional resources, such as manual effort to collect additional data or knowledge of the acquisition devices' parameters. This work proposes a DPAD method based on multi-modal disentangled traces (MMDT) that avoids these drawbacks. We first disentangle the recaptured traces with a self-supervised disentanglement and synthesis network to improve generalization across document images with different contents and layouts. Then, unlike existing DPAD approaches that rely only on data in the RGB domain, we propose to explicitly employ the disentangled recaptured traces as new modalities in the transformer backbone through adaptive multi-modal adapters that fuse RGB/trace features efficiently. Visualization of the disentangled traces confirms the effectiveness of the proposed method across different document contents. Extensive experiments on three benchmark datasets demonstrate the superiority of our MMDT method in representing forensic traces of recapturing distortion.
Submitted 9 April, 2024;
originally announced April 2024.
-
Search for the Rare Decays $D_s^+\to h^+(h^{0})e^+e^-$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (618 additional authors not shown)
Abstract:
Using 7.33~fb$^{-1}$ of $e^{+}e^{-}$ collision data collected by the BESIII detector at center-of-mass energies in the range of $\sqrt{s}=4.128 - 4.226$~GeV, we search for the rare decays $D_{s}^+\to h^+(h^{0})e^{+}e^{-}$, where $h$ represents a kaon or pion. By requiring the $e^{+}e^{-}$ invariant mass to be consistent with a $φ(1020)$, $0.98<M(e^{+}e^{-})<1.04$ ~GeV/$c^2$, the decay $D_s^+\toπ^+φ,φ\to e^{+}e^{-}$ is observed with a statistical significance of 7.8$σ$, and evidence for the decay $D_s^+\toρ^+φ,φ\to e^{+}e^{-}$ is found for the first time with a statistical significance of 4.4$σ$. The decay branching fractions are measured to be $\mathcal{B}(D_s^+\toπ^+φ, φ\to e^{+}e^{-} )=(1.17^{+0.23}_{-0.21}\pm0.03)\times 10^{-5}$, and $\mathcal{B}(D_s^+\toρ^+φ, φ\to e^{+}e^{-} )=(2.44^{+0.67}_{-0.62}\pm 0.16)\times 10^{-5}$, where the first uncertainties are statistical and the second systematic. No significant signal for the three four-body decays of $D_{s}^{+}\to π^{+}π^{0}e^{+}e^{-},\ D_{s}^{+}\to K^{+}π^{0}e^{+}e^{-}$, and $D_{s}^{+}\to K_{S}^{0}π^{+}e^{+}e^{-}$ is observed. For $D_{s}^{+}\to π^{+}π^{0}e^{+}e^{-}$, the $φ$ mass region is vetoed to minimize the long-distance effects. The 90$\%$ confidence level upper limits set on the branching fractions of these decays are in the range of $(7.0-8.1)\times 10^{-5}$.
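The $φ(1020)$ requirement is a cut on the $e^{+}e^{-}$ invariant mass. As a hedged illustration (not BESIII analysis code), the standard two-body invariant mass $M^2=(E_1+E_2)^2-|\vec p_1+\vec p_2|^2$ can be computed from the lepton four-momenta and compared with the quoted window. The function names and the toy momenta below are hypothetical, and the electron mass is neglected in the example.

```python
import math
from typing import Tuple

FourMomentum = Tuple[float, float, float, float]  # (E, px, py, pz) in GeV, natural units (c = 1)


def invariant_mass(p1: FourMomentum, p2: FourMomentum) -> float:
    """Invariant mass of a two-particle system: M^2 = (E1 + E2)^2 - |p1 + p2|^2."""
    e = p1[0] + p2[0]
    px, py, pz = (p1[k] + p2[k] for k in (1, 2, 3))
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # clamp tiny negative values caused by rounding


def in_phi_window(mass: float, lo: float = 0.98, hi: float = 1.04) -> bool:
    """Selection window quoted in the abstract: 0.98 < M(e+e-) < 1.04 GeV/c^2."""
    return lo < mass < hi


# Toy example: back-to-back leptons with the electron mass neglected (E = |p|).
e_plus = (0.51, 0.0, 0.0, 0.51)
e_minus = (0.51, 0.0, 0.0, -0.51)
m = invariant_mass(e_plus, e_minus)
print(round(m, 3), in_phi_window(m))  # 1.02 True
```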
Submitted 8 April, 2024;
originally announced April 2024.
-
Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge
Authors:
Weikai Lu,
Ziqian Zeng,
Jianwei Wang,
Zhengdong Lu,
Zelin Chen,
Huiping Zhuang,
Cen Chen
Abstract:
Jailbreaking attacks can enable Large Language Models (LLMs) to bypass their safeguards and generate harmful content. Existing jailbreaking defense methods have failed to address the fundamental issue that harmful knowledge resides within the model, which leaves LLMs exposed to potential jailbreak risks. In this paper, we propose a novel defense method called Eraser, which has three main goals: unlearning harmful knowledge, retaining general knowledge, and maintaining safety alignment. The intuition is that if an LLM forgets the specific knowledge required to answer a harmful question, it will no longer be able to answer harmful questions. Training Eraser does not actually require the model's own harmful knowledge; it can benefit from unlearning general answers related to harmful queries, which means it does not need assistance from a red team. The experimental results show that Eraser can significantly reduce the jailbreaking success rate for various attacks without compromising the general capabilities of the model.
Submitted 8 April, 2024;
originally announced April 2024.
-
SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation
Authors:
Heyuan Li,
Ce Chen,
Tianhao Shi,
Yuda Qiu,
Sizhe An,
Guanying Chen,
Xiaoguang Han
Abstract:
While recent advances in 3D-aware Generative Adversarial Networks (GANs) have aided the development of near-frontal view human face synthesis, the challenge of comprehensively synthesizing a full 3D head viewable from all angles still persists. Although PanoHead demonstrates the possibility of using a large-scale dataset with images of both frontal and back views for full-head synthesis, it often causes artifacts in back views. Based on our in-depth analysis, we found that the reasons are mainly twofold. First, from the network architecture perspective, we found that each plane in the utilized tri-plane/tri-grid representation space tends to confuse the features from both sides, causing "mirroring" artifacts (e.g., glasses appearing on the back of the head). Second, from the data supervision perspective, we found that existing discriminator training in 3D GANs mainly focuses on the quality of the rendered image itself and pays little attention to whether the image is plausible for the viewpoint from which it was rendered. This makes it possible to generate a "face" in non-frontal views, because doing so easily fools the discriminator. In response, we propose SphereHead, a novel tri-plane representation in the spherical coordinate system that fits the human head's geometric characteristics and efficiently mitigates many of the generated artifacts. We further introduce a view-image consistency loss for the discriminator to emphasize the correspondence between the camera parameters and the images. The combination of these efforts results in visually superior outcomes with significantly fewer artifacts. Our code and dataset are publicly available at https://lhyfst.github.io/spherehead.
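The "mirroring" problem can be seen with plain coordinate geometry. An axis-aligned Cartesian feature plane drops one coordinate, so a point on the front of the head and its mirror on the back land in the same plane cell; spherical angles keep them apart. The sketch below assumes the face looks along $+z$ and uses $z$ as the polar axis; SphereHead's actual axis convention and plane layout are not given in the abstract, and the helper names are hypothetical.

```python
import math
from typing import Tuple


def to_spherical(x: float, y: float, z: float) -> Tuple[float, float, float]:
    """Return (r, theta, phi): radius, polar angle from +z in [0, pi], azimuth in (-pi, pi]."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0 else 0.0
    phi = math.atan2(y, x)
    return r, theta, phi


def xy_plane_coords(x: float, y: float, z: float) -> Tuple[float, float]:
    """Projection used by an axis-aligned (Cartesian) xy feature plane: z is discarded."""
    return x, y


# Assumed convention: the face looks along +z, so these points are front and back of the head.
front = (0.1, 0.2, 0.5)
back = (0.1, 0.2, -0.5)
print(xy_plane_coords(*front) == xy_plane_coords(*back))      # True: same Cartesian feature cell
print(to_spherical(*front)[1:] == to_spherical(*back)[1:])    # False: different spherical angles
```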
Submitted 8 April, 2024;
originally announced April 2024.
-
A Comparative Study of the Ground State Transitions of CO and [C I] as Molecular Gas Tracers at High Redshift
Authors:
Marta Frias Castillo,
Matus Rybak,
Jacqueline A. Hodge,
Paul Van der Werk,
Ian Smail,
Joshua Butterworth,
Jasper Jansen,
Theodoros Topkaras,
Chian-Chou Chen,
Scott C. Chapman,
Axel Weiss,
Hiddo Algera,
Jack E. Birkin,
Elisabete da Cunha,
Jianhang Chen,
Helmut Dannerbauer,
E. F. Jiménez-Andrade,
Soh Ikarashi,
Cheng-Lin Liao,
Eric J. Murphy,
A. M. Swinbank,
Fabian Walter,
Gabriela Calistro Rivera,
R. J. Ivison,
Claudia del P. Lagos
Abstract:
The CO(1–0) and [C I](1–0) emission lines are well-established tracers of cold molecular gas mass in local galaxies. At high redshift, where the interstellar medium (ISM) is likely to be denser, there have been limited direct comparisons of both ground state transitions. Here we present a study of CO(1–0) and [C I](1–0) emission in a sample of 20 unlensed dusty, star-forming galaxies at $z=2-5$. The CO(1–0)/[C I](1–0) ratio is constant up to at least $z=5$, supporting the use of [C I](1–0) as a gas mass tracer. PDR modelling of the available data indicates a median H$_2$ density of $\log(n\,[\mathrm{cm^{-3}}])=4.7\pm0.2$ and a UV radiation field of $\log(G_{\mathrm{UV}}\,[G_0])=3.2\pm0.2$. We use the CO(1–0), [C I](1–0) and 3 mm dust continuum measurements to cross-calibrate the respective gas mass conversion factors, finding no dependence of these factors on either redshift or infrared luminosity. Assuming a variable CO conversion factor then implies [C I] and dust conversion factors that differ from canonically assumed values but are consistent with the solar/super-solar metallicities expected for our sources. Radiative transfer modelling shows that the warmer CMB at high redshift can significantly affect the [C I] as well as the CO emission, which can change the derived molecular gas masses by up to 70% for the coldest kinetic gas temperatures expected. Nevertheless, we show that the magnitude of the effect on the ratio of the tracers is within the known scatter of the $L'_\mathrm{CO}-L'_\mathrm{[CI]}$ relation. Further determining the absolute decrease of individual line intensities will require well-sampled spectral line energy distributions (SLEDs) to model the gas excitation conditions in more detail.
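For readers reproducing line ratios: line luminosities of this kind are commonly computed with the relation $L' = 3.25\times10^{7}\,S\Delta V\,\nu_{\rm obs}^{-2}\,D_L^{2}\,(1+z)^{-3}$ (Solomon & Vanden Bout 2005), with $S\Delta V$ in Jy km s$^{-1}$, $\nu_{\rm obs}$ in GHz, $D_L$ in Mpc, and $L'$ in K km s$^{-1}$ pc$^2$. The sketch assumes the paper follows this standard convention (the abstract does not say); $D_L$ must come from a cosmology calculator and is passed in as a plain number, and the example fluxes are hypothetical. Because both lines share $z$ and $D_L$, the ratio reduces to the flux ratio times $(\nu_{\rm [CI]}/\nu_{\rm CO})^2$.

```python
CO10_REST_GHZ = 115.271  # CO(1-0) rest frequency
CI10_REST_GHZ = 492.161  # [C I](1-0) rest frequency


def line_luminosity_prime(flux_kms: float, nu_obs_ghz: float, d_l_mpc: float, z: float) -> float:
    """L'_line in K km/s pc^2 from a velocity-integrated flux S*dV in Jy km/s."""
    return 3.25e7 * flux_kms * nu_obs_ghz ** -2 * d_l_mpc ** 2 * (1 + z) ** -3


def co_to_ci_ratio(s_co: float, s_ci: float, z: float, d_l_mpc: float) -> float:
    """L'_CO(1-0) / L'_[CI](1-0) for two lines observed in the same source."""
    l_co = line_luminosity_prime(s_co, CO10_REST_GHZ / (1 + z), d_l_mpc, z)
    l_ci = line_luminosity_prime(s_ci, CI10_REST_GHZ / (1 + z), d_l_mpc, z)
    return l_co / l_ci


# Hypothetical inputs; a gas mass would then follow as M_gas = alpha * L' with a calibrated alpha.
ratio = co_to_ci_ratio(s_co=0.1, s_ci=0.5, z=3.0, d_l_mpc=25_000.0)
print(round(ratio, 2))
```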
Submitted 8 April, 2024;
originally announced April 2024.
-
PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection
Authors:
Xiaofan Li,
Zhizhong Zhang,
Xin Tan,
Chengwei Chen,
Yanyun Qu,
Yuan Xie,
Lizhuang Ma
Abstract:
The vision-language model has brought great improvement to few-shot industrial anomaly detection, which usually requires designing hundreds of prompts through prompt engineering. For automated scenarios, we first use conventional prompt learning with the many-class paradigm as the baseline to automatically learn prompts, but found that it does not work well in one-class anomaly detection. To address this problem, this paper proposes a one-class prompt learning method for few-shot anomaly detection, termed PromptAD. First, we propose semantic concatenation, which transposes normal prompts into anomaly prompts by concatenating normal prompts with anomaly suffixes, thus constructing a large number of negative samples to guide prompt learning in the one-class setting. Furthermore, to mitigate the training challenge caused by the absence of anomaly images, we introduce the concept of explicit anomaly margin, which explicitly controls the margin between normal prompt features and anomaly prompt features through a hyper-parameter. For image-level/pixel-level anomaly detection, PromptAD achieves first place in 11/12 few-shot settings on MVTec and VisA.
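The two ideas can be sketched in a few lines, under loud assumptions. In PromptAD the prompts are learned embeddings, not text; here plain strings stand in for them so the combinatorics of semantic concatenation are visible, and the margin is written as a simple hinge on hypothetical similarity scores, which is only one plausible way to enforce an explicit margin. Both helper names are hypothetical.

```python
from itertools import product
from typing import List


def semantic_concatenation(normal_prompts: List[str], anomaly_suffixes: List[str]) -> List[str]:
    """Turn every normal prompt into anomaly prompts by appending each anomaly suffix."""
    return [f"{p} {s}" for p, s in product(normal_prompts, anomaly_suffixes)]


def explicit_margin_loss(sim_normal: float, sim_anomaly: float, margin: float) -> float:
    """Hinge penalty unless a normal image is at least `margin` closer to normal than to anomaly prompts."""
    return max(0.0, margin - (sim_normal - sim_anomaly))


normal = ["a photo of a bottle", "a close-up photo of the bottle"]
suffixes = ["with a crack", "with a scratch", "with a missing part"]
anomaly_prompts = semantic_concatenation(normal, suffixes)
print(len(anomaly_prompts))  # 6
print(anomaly_prompts[0])    # a photo of a bottle with a crack
```

Because every normal prompt is paired with every suffix, the number of negative prompts grows multiplicatively, which matches the abstract's point about cheaply constructing many negatives.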
Submitted 8 April, 2024;
originally announced April 2024.
-
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Authors:
Changan Chen,
Kumar Ashutosh,
Rohit Girdhar,
David Harwath,
Kristen Grauman
Abstract:
We propose a novel self-supervised embedding to learn how actions sound from narrated in-the-wild egocentric videos. Whereas existing methods rely on curated data with known audio-visual correspondence, our multimodal contrastive-consensus coding (MC3) embedding reinforces the associations between audio, language, and vision when all modality pairs agree, while diminishing those associations when any one pair does not. We show our approach can successfully discover how the long tail of human actions sound from egocentric video, outperforming an array of recent multimodal embedding techniques on two datasets (Ego4D and EPIC-Sounds) and multiple cross-modal tasks.
Submitted 8 April, 2024;
originally announced April 2024.
-
Search for $η_c(2S)\to 2(π^+π^-)$ and improved measurement of $χ_{cJ}\to 2(π^+π^-)$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
We search for the hadronic decay $η_c(2S)\to 2(π^+π^-)$ in the $ψ(3686)\toγη_c(2S)$ radiative decay using $(27.12\pm 0.14)\times 10^8$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider. No significant signal is found, and the upper limit on $\mathcal{B}[ψ(3686)\toγη_c(2S)]\mathcal{B}[η_c(2S)\to 2(π^+π^-)]$ is determined to be $0.78\times 10^{-6}$ at the 90\% confidence level. Using $ψ(3686)\toγχ_{cJ}$ transitions, we also measure the branching fractions $\mathcal{B}[χ_{cJ(J=0,1,2)}\to 2(π^+π^-)]$, which are $\mathcal{B}[χ_{c0}\to 2(π^+π^-)]=(2.127\pm 0.002~(\mathrm{stat.})\pm 0.101~(\mathrm{syst.}))$\%, $\mathcal{B}[χ_{c1}\to 2(π^+π^-)]=(0.685\pm 0.001~(\mathrm{stat.})\pm 0.031~(\mathrm{syst.}))$\%, and $\mathcal{B}[χ_{c2}\to 2(π^+π^-)]=(1.153\pm 0.001~(\mathrm{stat.})\pm 0.063~(\mathrm{syst.}))$\%.
Submitted 7 April, 2024;
originally announced April 2024.
-
PagPassGPT: Pattern Guided Password Guessing via Generative Pretrained Transformer
Authors:
Xingyu Su,
Xiaojie Zhu,
Yang Li,
Yong Li,
Chi Chen,
Paulo Esteves-Veríssimo
Abstract:
Amid the surge in deep learning-based password guessing models, the challenges of generating high-quality passwords and reducing duplicate passwords persist. To address these challenges, we present PagPassGPT, a password guessing model built on the Generative Pretrained Transformer (GPT). It can perform pattern-guided guessing by incorporating pattern structure information as background knowledge, resulting in a significant increase in the hit rate. Furthermore, we propose D&C-GEN, which is based on a divide-and-conquer approach, to reduce the repeat rate of generated passwords. The primary task of guessing passwords is recursively divided into non-overlapping subtasks. Each subtask inherits the knowledge from the parent task and predicts succeeding tokens. Compared with the state-of-the-art model, our proposed scheme correctly guesses 12% more passwords while producing 25% fewer duplicates.
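"Pattern structure information" in password modeling usually means a run-length summary of character classes. As an assumption borrowed from PCFG-style password models (PagPassGPT's exact encoding is not given in the abstract), the sketch below labels letters `L`, digits `N` and everything else `S`, then run-length encodes them; the helper names are hypothetical.

```python
from itertools import groupby


def char_class(c: str) -> str:
    """L = letter, N = digit, S = anything else (special character)."""
    if c.isalpha():
        return "L"
    if c.isdigit():
        return "N"
    return "S"


def password_pattern(password: str) -> str:
    """Run-length encode consecutive character classes, e.g. 'Password123!' -> 'L8N3S1'."""
    return "".join(f"{cls}{len(list(run))}" for cls, run in groupby(password, key=char_class))


print(password_pattern("Password123!"))  # L8N3S1
```

Conditioning generation on such a pattern restricts the model to guesses of a given shape, which is the intuition behind pattern-guided guessing.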
Submitted 7 April, 2024;
originally announced April 2024.
-
Computation and Critical Transitions of Rate-Distortion-Perception Functions With Wasserstein Barycenter
Authors:
Chunhui Chen,
Xueyan Niu,
Wenhao Ye,
Hao Wu,
Bo Bai
Abstract:
The information rate-distortion-perception (RDP) function characterizes the three-way trade-off between description rate, average distortion, and perceptual quality measured by the discrepancy between probability distributions. We study several variants of the RDP functions through the lens of optimal transport. By transforming the information RDP function into a Wasserstein Barycenter problem, we identify the critical transitions at which one of the constraints becomes inactive and demonstrate several critical transition properties of the RDP variants. Further, the non-strict convexity introduced by the perceptual constraint can be regularized by an entropy regularization term. We prove that the entropy-regularized model converges to the original problem and propose an alternating iteration method based on the Sinkhorn algorithm to numerically solve the regularized optimization problem. Experimental results demonstrate the effectiveness and accuracy of the proposed algorithms. As a practical application of our theory, we incorporate our numerical method into a reverse data hiding problem, where a secret message is imperceptibly embedded into an image with guaranteed perceptual fidelity.
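Entropy regularization is what makes Sinkhorn-type iterations applicable. For reference, the textbook Sinkhorn iteration for entropic optimal transport between two discrete distributions is sketched below. It is not the authors' alternating method for the RDP/barycenter problem, only the building block the abstract names; the regularization strength `eps`, the iteration count, and the toy histograms are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(a: List[float], b: List[float], cost: Matrix, eps: float = 0.5, iters: int = 1000) -> Matrix:
    """Entropy-regularized optimal transport plan between histograms a and b."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns so the plan matches each marginal.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
P = sinkhorn(a, b, cost)
print([round(sum(row), 4) for row in P])        # row sums, approximately a
print([round(sum(col), 4) for col in zip(*P)])  # column sums, approximately b
```

Smaller `eps` approximates unregularized transport more closely but slows convergence, which is the usual trade-off when choosing the regularization weight.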
Submitted 9 April, 2024; v1 submitted 6 April, 2024;
originally announced April 2024.
-
Transform then Explore: a Simple and Effective Technique for Exploratory Combinatorial Optimization with Reinforcement Learning
Authors:
Tianle Pu,
Changjun Fan,
Mutian Shen,
Yizhou Lu,
Li Zeng,
Zohar Nussinov,
Chao Chen,
Zhong Liu
Abstract:
Many complex problems encountered in both production and daily life can be conceptualized as combinatorial optimization problems (COPs) over graphs. In recent years, reinforcement learning (RL) based models have emerged as a promising direction, which treat solving COPs as a heuristic learning problem. However, current finite-horizon-MDP based RL models have inherent limitations. They are not allowed to explore adequately to improve solutions at test time, which may be necessary given the complexity of NP-hard optimization tasks. Some recent attempts address this issue by focusing on reward design and state feature engineering, which are tedious and ad hoc. In this work, we instead propose a much simpler but more effective technique, named gauge transformation (GT). The technique originates from physics, but is very effective in enabling RL agents to keep exploring and continuously improve the solutions at test time. Moreover, GT is very simple: it can be implemented in fewer than 10 lines of Python code and can be applied to the vast majority of RL models. Experimentally, we show that traditional RL models with the GT technique achieve state-of-the-art performance on the MaxCut problem. Furthermore, since GT is independent of any particular RL model, it can be seamlessly integrated into various RL frameworks, paving the way for these models to explore more effectively when solving general COPs.
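The gauge transformation referred to here is a standard symmetry of Ising-type models. Write a MaxCut solution as spins $s_i\in\{\pm1\}$ and pick signs $\tau_i\in\{\pm1\}$; setting $W'_{ij}=\tau_i\tau_j W_{ij}$ and $s'_i=\tau_i s_i$ leaves the energy $\sum_{i<j}W_{ij}s_is_j$ unchanged, so cut values in the two instances differ only by a constant. Choosing $\tau=s$ maps the current solution to the all-ones state. The sketch only demonstrates this invariance; how the paper inserts GT into the RL loop is not described in the abstract, and the reading that it maps the current solution to a canonical start state is an assumption.

```python
from typing import List

Matrix = List[List[int]]
Spins = List[int]  # entries are +1 or -1


def ising_energy(W: Matrix, s: Spins) -> int:
    n = len(s)
    return sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))


def cut_value(W: Matrix, s: Spins) -> float:
    """Total weight of edges whose endpoints lie on opposite sides of the cut."""
    n = len(s)
    return sum(W[i][j] * (1 - s[i] * s[j]) / 2 for i in range(n) for j in range(i + 1, n))


def gauge_transform(W: Matrix, s: Spins, tau: Spins):
    """W'_ij = tau_i * tau_j * W_ij and s'_i = tau_i * s_i."""
    n = len(tau)
    W2 = [[tau[i] * tau[j] * W[i][j] for j in range(n)] for i in range(n)]
    s2 = [tau[i] * s[i] for i in range(n)]
    return W2, s2


W = [[0, 1, 2, 0],
     [1, 0, 1, 3],
     [2, 1, 0, 1],
     [0, 3, 1, 0]]
s = [1, -1, 1, -1]
W2, s2 = gauge_transform(W, s, tau=s)  # tau = s maps the current solution to all ones
print(s2, ising_energy(W, s) == ising_energy(W2, s2))  # [1, 1, 1, 1] True
```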
Submitted 6 April, 2024;
originally announced April 2024.
-
Search for di-photon decays of an axion-like particle in radiative decays of J/psi
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (604 additional authors not shown)
Abstract:
We search for the di-photon decay of a light pseudoscalar axion-like particle, $a$, in radiative decays of the $J/ψ$, using 10 billion $J/ψ$ events collected with the BESIII detector. We find no evidence of a narrow resonance and set upper limits at the $95\%$ confidence level on the product branching fraction $\mathcal{B}(J/ψ\to γa) \times \mathcal{B}(a \to γγ)$ and the axion-like particle photon coupling constant $g_{a γγ}$ in the ranges of $(3.6-49.8) \times 10^{-8}$ and $(2.2 -103.8)\times 10^{-4}$ GeV$^{-1}$, respectively, for $0.18 \le m_a \le 2.85~$ GeV/$c^2$. These are the most stringent limits to date in this mass region.
Submitted 6 April, 2024;
originally announced April 2024.
-
The Use of Generative Search Engines for Knowledge Work and Complex Tasks
Authors:
Siddharth Suri,
Scott Counts,
Leijie Wang,
Chacha Chen,
Mengting Wan,
Tara Safavi,
Jennifer Neville,
Chirag Shah,
Ryen W. White,
Reid Andersen,
Georg Buscher,
Sathish Manivannan,
Nagu Rangan,
Longqi Yang
Abstract:
Until recently, search engines were the predominant method for people to access online information. The recent emergence of large language models (LLMs) has given machines new capabilities, such as the ability to generate new digital artifacts like text, images, and code, resulting in a new tool, the generative search engine, which combines the capabilities of LLMs with a traditional search engine. Through an empirical analysis of Bing Copilot (Bing Chat), one of the first publicly available generative search engines, we compare the types and complexity of tasks that people use Bing Copilot for with those they use Bing Search for. Findings indicate that people use the generative search engine for more knowledge work tasks that are higher in cognitive complexity than those commonly done with a traditional search engine.
Submitted 19 March, 2024;
originally announced April 2024.
-
Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
Authors:
Ji-Jia Wu,
Andy Chia-Hao Chang,
Chieh-Yu Chuang,
Chun-Pei Chen,
Yu-Lun Liu,
Min-Hung Chen,
Hou-Ning Hu,
Yung-Yu Chuang,
Yen-Yu Lin
Abstract:
This paper addresses text-supervised semantic segmentation, aiming to learn a model capable of segmenting arbitrary visual concepts within images by using only image-text pairs without dense annotations. Existing methods have demonstrated that contrastive learning on image-text pairs effectively aligns visual segments with the meanings of texts. We notice that there is a discrepancy between text alignment and semantic segmentation: A text often consists of multiple semantic concepts, whereas semantic segmentation strives to create semantically homogeneous segments. To address this issue, we propose a novel framework, Image-Text Co-Decomposition (CoDe), where the paired image and text are jointly decomposed into a set of image regions and a set of word segments, respectively, and contrastive learning is developed to enforce region-word alignment. To work with a vision-language model, we present a prompt learning mechanism that derives an extra representation to highlight an image segment or a word segment of interest, with which more effective features can be extracted from that segment. Comprehensive experimental results demonstrate that our method performs favorably against existing text-supervised semantic segmentation methods on six benchmark datasets.
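Region-word alignment of this kind is typically trained with a symmetric contrastive (InfoNCE) objective: each image region should score highest against its paired word segment and vice versa. The sketch below is a generic CLIP-style loss on toy embeddings, assuming that region and word-segment embeddings are already available; CoDe's actual formulation, how segments are produced, and the temperature value are not specified in the abstract, so all names and defaults here are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def _normalize(v: Vec) -> Vec:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _row_cross_entropy(logits: List[Vec]) -> float:
    """Mean cross-entropy where row i's correct column is i."""
    total = 0.0
    for i, row in enumerate(logits):
        mx = max(row)
        lse = mx + math.log(sum(math.exp(x - mx) for x in row))  # numerically stable log-sum-exp
        total += lse - row[i]
    return total / len(logits)


def region_word_infonce(regions: List[Vec], words: List[Vec], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: regions[i] and words[i] are a positive pair; all other pairs are negatives."""
    R = [_normalize(r) for r in regions]
    W = [_normalize(w) for w in words]
    logits = [[sum(a * b for a, b in zip(r, w)) / temperature for w in W] for r in R]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_row_cross_entropy(logits) + _row_cross_entropy(transposed))


regions = [[1.0, 0.0], [0.0, 1.0]]
aligned = region_word_infonce(regions, [[1.0, 0.0], [0.0, 1.0]])
swapped = region_word_infonce(regions, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True: matched pairs give a lower loss
```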
Submitted 5 April, 2024;
originally announced April 2024.
-
ROPO: Robust Preference Optimization for Large Language Models
Authors:
Xize Liang,
Chao Chen,
Shuang Qiu,
Jie Wang,
Yue Wu,
Zhihang Fu,
Zhihao Shi,
Feng Wu,
Jieping Ye
Abstract:
Preference alignment is pivotal for empowering large language models (LLMs) to generate helpful and harmless responses. However, the performance of preference alignment is highly sensitive to the prevalent noise in the preference data. Recent efforts to address this problem either marginally alleviate the impact of noise without actually reducing its presence, or rely on costly teacher LLMs prone to reward misgeneralization. To address these challenges, we propose the RObust Preference Optimization (ROPO) framework, an iterative alignment approach that integrates noise-tolerance and filtering of noisy samples without the aid of external models. Specifically, ROPO iteratively solves a constrained optimization problem, where we dynamically assign a quality-aware weight to each sample and constrain the sum of the weights to the number of samples we intend to retain. For noise-tolerant training and effective noise identification, we derive a robust loss by suppressing the gradients of samples with high uncertainty. We demonstrate both empirically and theoretically that the derived loss is critical for distinguishing noisy samples from clean ones. Furthermore, inspired by our derived loss, we propose a robustness-guided rejection sampling technique to compensate for potentially important information in discarded queries. Experiments on three widely-used datasets with Mistral-7B and Llama-2-7B demonstrate that ROPO significantly outperforms existing preference alignment methods, with its superiority growing as the noise rate increases.
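In its most basic form, a sum constraint on sample weights acts as a retention budget. If each weight lies in $[0,1]$ and we minimize $\sum_i w_i\ell_i$ subject to $\sum_i w_i = k$, the optimum puts weight 1 on the $k$ samples with the smallest loss and 0 elsewhere. ROPO's actual weights are quality-aware and derived from its robust loss, so the hypothetical helper `retain_weights` below is a simplification that only shows how the budget constraint behaves; the per-sample losses are made up.

```python
from typing import List


def retain_weights(losses: List[float], k: int) -> List[float]:
    """Solve min sum(w_i * loss_i) s.t. sum(w_i) = k, 0 <= w_i <= 1; keeping the k smallest is optimal."""
    if not 0 <= k <= len(losses):
        raise ValueError("k must be between 0 and the number of samples")
    keep = sorted(range(len(losses)), key=lambda i: losses[i])[:k]
    weights = [0.0] * len(losses)
    for i in keep:
        weights[i] = 1.0
    return weights


# Hypothetical per-sample losses; a high loss is treated as a sign of a noisy preference label.
losses = [0.9, 0.1, 0.5, 2.0]
print(retain_weights(losses, k=2))  # [0.0, 1.0, 1.0, 0.0]
```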
Submitted 28 May, 2024; v1 submitted 5 April, 2024;
originally announced April 2024.
-
Search for the $B_s^0 \rightarrow μ^+μ^-γ$ decay
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1068 additional authors not shown)
Abstract:
A search for the fully reconstructed $B_s^0 \rightarrow μ^+μ^-γ$ decay is performed at the LHCb experiment using proton-proton collisions at $\sqrt{s}=13$\,TeV corresponding to an integrated luminosity of $5.4\,\mathrm{fb^{-1}}$. No significant signal is found, and upper limits on the branching fraction in intervals of the dimuon mass are set
\begin{align}
{\cal B}(B_s^0 \rightarrow μ^+μ^-γ) < 4.2\times10^{-8},~&m(μμ)\in[2m_μ,~1.70]\,\mathrm{GeV/c^2}, \nonumber\\
{\cal B}(B_s^0 \rightarrow μ^+μ^-γ) < 7.7\times10^{-8},~&m(μμ)\in[1.70,~2.88]\,\mathrm{GeV/c^2}, \nonumber\\
{\cal B}(B_s^0 \rightarrow μ^+μ^-γ) < 4.2\times10^{-8},~&m(μμ)\in[3.92,~m_{B_s^0}]\,\mathrm{GeV/c^2}, \nonumber
\end{align}
at the 95\% confidence level. Additionally, upper limits are set on the branching fraction in the $[2m_μ,~1.70]\,\mathrm{GeV/c^2}$ dimuon mass region excluding the contribution from the intermediate $φ(1020)$ meson, and in the region combining all dimuon-mass intervals.
Submitted 4 April, 2024;
originally announced April 2024.
-
Evidence of the $h_c\to K_S^0 K^+π^-+c.c.$ decay
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Based on $(2.712\pm0.014)\times10^9$ $ψ(3686)$ events collected by the BESIII collaboration, evidence for the hadronic decay $h_c\to K_S^0K^+π^-+c.c.$ is found with a significance of $4.3σ$ in the $ψ(3686)\toπ^0 h_c$ process. The branching fraction of $h_c\to K_S^0 K^+π^- +c.c.$ is measured to be $(7.3\pm0.8\pm1.8)\times10^{-4}$, where the first and second uncertainties are statistical and systematic, respectively. Combined with the exclusive decay width of $η_c\to K\bar{K}π$, our result indicates inconsistencies with both pQCD and NRQCD predictions.
Submitted 4 April, 2024;
originally announced April 2024.
-
Goldfish: An Efficient Federated Unlearning Framework
Authors:
Houzhe Wang,
Xiaojie Zhu,
Chi Chen,
Paulo Esteves-Veríssimo
Abstract:
With recent legislation on the right to be forgotten, machine unlearning has emerged as a crucial research area. It facilitates the removal of a user's data from machine learning models trained via federated learning without the need to retrain from scratch. However, current machine unlearning algorithms face challenges of efficiency and validity. To address these issues, we propose a new framework, named Goldfish, which comprises four modules: basic model, loss function, optimization, and extension. To address the low validity of existing machine unlearning algorithms, we propose a novel loss function that accounts for three factors: the discrepancy between predictions and actual labels on the remaining dataset, the bias of predicted results on the removed dataset, and the confidence level of predicted results. To enhance efficiency, we adopt a knowledge distillation technique in the basic model and introduce an optimization module that encompasses an early termination mechanism guided by empirical risk and a data partition mechanism. Furthermore, to bolster the robustness of the aggregated model, we propose an extension module that incorporates an adaptive distillation temperature to address the heterogeneity of users' local data and an adaptive weighting mechanism to handle variations in the quality of uploaded models. Finally, we conduct comprehensive experiments to illustrate the effectiveness of the proposed approach.
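Because the abstract relies on knowledge distillation with a distillation temperature, a minimal sketch of the widely used temperature-scaled distillation loss may help: the KL divergence between the teacher's and student's softened output distributions, scaled by T². This is the generic technique, not Goldfish's exact loss; the temperature here is a fixed hyperparameter, whereas Goldfish adapts it per user, and the function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-softened softmax; higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 so that
    gradient magnitudes stay comparable across temperatures."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl


# A student that matches the teacher incurs zero loss; a mismatched one does not.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```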
Submitted 23 April, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
ASAP: Interpretable Analysis and Summarization of AI-generated Image Patterns at Scale
Authors:
Jinbin Huang,
Chen Chen,
Aditi Mishra,
Bum Chul Kwon,
Zhicheng Liu,
Chris Bryan
Abstract:
Generative image models have emerged as a promising technology to produce realistic images. Despite their potential benefits, concerns are growing about their misuse, particularly in generating deceptive images that could raise significant ethical, legal, and societal issues. Consequently, there is growing demand to empower users to effectively discern and comprehend patterns of AI-generated images. To this end, we developed ASAP, an interactive visualization system that automatically extracts distinct patterns of AI-generated images and allows users to interactively explore them via various views. To uncover fake patterns, ASAP introduces a novel image encoder, adapted from CLIP, which transforms images into compact "distilled" representations enriched with information for differentiating authentic and fake images. These representations generate gradients that propagate back to the attention maps of CLIP's transformer block. This process quantifies the relative importance of each pixel to image authenticity or fakeness, exposing key deceptive patterns. ASAP enables at-scale interactive analysis of these patterns through multiple coordinated visualizations. These include a representation overview with innovative cell glyphs to aid in the exploration and qualitative evaluation of fake patterns across a vast array of images, as well as a pattern view that displays authenticity-indicating patterns in images and quantifies their impact. ASAP supports the analysis of cutting-edge generative models with the latest architectures, including GAN-based models like ProGAN and diffusion models like the latent diffusion model. We demonstrate ASAP's usefulness through two usage scenarios using multiple fake image detection benchmark datasets, revealing its ability to identify and understand hidden patterns in AI-generated images, especially in detecting fake human faces produced by diffusion-based techniques.
Submitted 3 April, 2024;
originally announced April 2024.
-
Unblind Text Inputs: Predicting Hint-text of Text Input in Mobile Apps via LLM
Authors:
Zhe Liu,
Chunyang Chen,
Junjie Wang,
Mengzhuo Chen,
Boyu Wu,
Yuekai Huang,
Jun Hu,
Qing Wang
Abstract:
Mobile apps have become indispensable for accessing and participating in various environments, especially for low-vision users. Users with visual impairments can use screen readers to read the content of each screen and understand what needs to be operated. Screen readers read the hint-text attribute of a text input component to tell visually impaired users what to fill in. Unfortunately, based on our analysis of 4,501 Android apps with text inputs, over 76% of them are missing hint-text. These issues are mostly caused by developers' lack of awareness of visually impaired users. To overcome these challenges, we developed an LLM-based hint-text generation model called HintDroid, which analyzes the GUI information of input components and uses in-context learning to generate the hint-text. To ensure the quality of the generated hint-text, we further designed a feedback-based inspection mechanism to adjust it. Automated experiments demonstrate high BLEU scores, and a user study further confirms HintDroid's usefulness. HintDroid can help not only visually impaired individuals but also ordinary users understand the requirements of input components. HintDroid demo video: https://youtu.be/FWgfcctRbfI.
Submitted 3 April, 2024;
originally announced April 2024.
-
Observation of Kosterlitz-Thouless Metal-to-Insulator Transition in Quantum Anomalous Hall Insulators
Authors:
Ruoxi Zhang,
Yi-Fan Zhao,
Ling-Jie Zhou,
Deyi Zhuo,
Zi-Jie Yan,
Chao-Xing Liu,
Moses H. W. Chan,
Chui-Zhen Chen,
Cui-Zu Chang
Abstract:
Interlayer exchange coupling (IEC) between two magnetic layers separated by a nonmagnetic spacer layer plays a critical role in shaping the magnetic properties of such heterostructures. The quantum anomalous Hall (QAH) effect has been realized in a structure composed of two magnetically doped topological insulator (TI) layers separated by an undoped TI layer. The quantized Hall conductance observed in this sandwich heterostructure originates from the combined contribution of the top and bottom surface states. In this work, we employ molecular beam epitaxy to synthesize a series of magnetic TI sandwiches with varying thicknesses of the middle undoped TI layer. The well-quantized QAH effect is observed in all these samples, and its critical behavior is modulated by the IEC between the top and bottom magnetic TI layers. Near the plateau phase transition (PPT), we find that thinner QAH samples exhibit two-dimensional critical metal behavior with nearly temperature-independent longitudinal resistance, whereas thicker QAH samples behave as a three-dimensional insulator with reduced longitudinal resistance at higher temperatures. The IEC-induced critical-metal-to-insulator transition in the QAH PPT regime can be understood through a two-channel Chalker-Coddington network model by tuning inter-channel tunneling. The agreement between experiment and theory strongly supports the QAH PPT within the Kosterlitz-Thouless framework, where the critical metal and disordered insulator phases exist in bound and unbound states of vortex-antivortex pairs, respectively.
Submitted 2 April, 2024;
originally announced April 2024.
-
Search for $C$-even states decaying to $D_{s}^{\pm}D_{s}^{*\mp}$ with masses between $4.08$ and $4.32$ $\rm GeV/{\it c}^{2}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Six $C$-even states, denoted as $X$, with quantum numbers $J^{PC}=0^{-+}$, $1^{\pm+}$, or $2^{\pm+}$, are searched for via the $e^+e^-\toγD_{s}^{\pm}D_{s}^{*\mp}$ process using $(1667.39\pm8.84)~\mathrm{pb}^{-1}$ of $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII storage ring at a center-of-mass energy of $\sqrt{s}=(4681.92\pm0.30)~\mathrm{MeV}$. No statistically significant signal is observed in the mass range from $4.08$ to $4.32~\mathrm{GeV}/c^{2}$. Upper limits on $σ[e^+e^-\toγX]\cdot \mathcal{B}[X \to D_{s}^{\pm}D_{s}^{*\mp}]$ are determined at the $90\%$ confidence level.
Submitted 2 April, 2024;
originally announced April 2024.
-
Where to Move Next: Zero-shot Generalization of LLMs for Next POI Recommendation
Authors:
Shanshan Feng,
Haoming Lyu,
Caishun Chen,
Yew-Soon Ong
Abstract:
Next point-of-interest (POI) recommendation provides valuable suggestions for users to explore their surrounding environment. Existing studies build recommendation models from large-scale user check-in data; such models are task-specific and require extensive computational resources. Recently, pretrained large language models (LLMs) have achieved significant advances in various NLP tasks and have also been investigated for recommendation scenarios. However, the generalization ability of LLMs for next POI recommendation, where users' geographical movement patterns must be extracted, remains unexplored. Although some studies leverage LLMs for next-item recommendation, they fail to consider geographical influence and sequential transitions, and hence cannot effectively solve the next POI recommendation task. To this end, we design novel prompting strategies and conduct empirical studies to assess the capability of LLMs, e.g., ChatGPT, to predict a user's next check-in. Specifically, we consider several essential factors in human movement behavior, including user geographical preference, spatial distance, and sequential transitions, and formulate the recommendation task as a ranking problem. Through extensive experiments on two widely used real-world datasets, we derive several key findings. Empirical evaluations demonstrate that LLMs have promising zero-shot recommendation abilities and can provide accurate and reasonable predictions. We also reveal that LLMs cannot accurately comprehend geographical context information and are sensitive to the order in which candidate POIs are presented, which shows the limitations of LLMs and necessitates further research on robust human mobility reasoning mechanisms.
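To make "formulate the recommendation task as a ranking problem" concrete, here is a hypothetical prompt builder that encodes two of the listed factors: sequential transitions (the ordered check-in history) and spatial distance (great-circle distance from the current location to each candidate). The template wording and function names are assumptions for illustration, not the authors' prompts, and user geographical preference is left out for brevity. Since the abstract reports that LLMs are sensitive to candidate order, the candidates are listed exactly in the order given, so callers can shuffle them deliberately.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def build_ranking_prompt(history: List[str],
                         current: Tuple[float, float],
                         candidates: List[Tuple[str, Tuple[float, float]]]) -> str:
    """Hypothetical zero-shot ranking prompt: ordered check-in history plus
    candidate POIs annotated with their distance from the current location."""
    lines = ["A user visited these places in order: " + " -> ".join(history) + "."]
    lines.append("Candidate next places (distance from the user's current location):")
    for idx, (name, loc) in enumerate(candidates, start=1):
        lines.append(f"{idx}. {name} ({haversine_km(current, loc):.1f} km)")
    lines.append("Rank all candidates from most to least likely next visit. "
                 "Answer with the candidate numbers only.")
    return "\n".join(lines)


prompt = build_ranking_prompt(
    ["Cafe", "Gym"], (40.0, -74.0),
    [("Park", (40.01, -74.0)), ("Museum", (40.05, -74.02))],
)
print(prompt)
```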
Submitted 22 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model
Authors:
Qinfeng Zhu,
Yuanzhi Cai,
Yuan Fang,
Yihan Yang,
Cheng Chen,
Lei Fan,
Anh Nguyen
Abstract:
High-resolution remotely sensed images pose a challenge for commonly used semantic segmentation methods such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). CNN-based methods struggle to handle such high-resolution images due to their limited receptive field, while ViTs face challenges in handling long sequences. Inspired by Mamba, which adopts a State Space Model (SSM) to efficiently capture global semantic information, we propose a semantic segmentation framework for high-resolution remotely sensed images, named Samba. Samba uses an encoder-decoder architecture, with Samba blocks serving as the encoder for efficient multi-level semantic information extraction and UperNet functioning as the decoder. We evaluate Samba on the LoveDA, ISPRS Vaihingen, and ISPRS Potsdam datasets, comparing its performance against top-performing CNN and ViT methods. The results reveal that Samba achieves unparalleled performance on these commonly used remote sensing datasets for semantic segmentation. Our proposed Samba demonstrates for the first time the effectiveness of SSMs in semantic segmentation of remotely sensed images, setting a new performance benchmark for Mamba-based techniques in this application. The source code and baseline implementations are available at https://github.com/zhuqinfeng1999/Samba.
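The efficiency argument rests on the state space model processing a long sequence (e.g., image patches flattened into a sequence) with a linear-time recurrence rather than quadratic attention. The sketch below shows that core idea for a time-invariant, diagonal SSM discretized with a zero-order hold. Mamba additionally makes the step size and projections input-dependent ("selective") and uses a parallel scan, so this is a simplified illustration, not Samba's block.

```python
import math
from typing import List, Tuple


def discretize_zoh(a: float, b: float, delta: float) -> Tuple[float, float]:
    """Zero-order-hold discretization of the scalar ODE h'(t) = a*h(t) + b*x(t).
    Requires a != 0."""
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_scan(x: List[float], a: List[float], b: List[float],
             c: List[float], delta: float) -> List[float]:
    """Run a diagonal linear SSM over a 1-D input sequence:
    h_t = A_bar * h_{t-1} + B_bar * x_t,  y_t = C . h_t  (O(length) time)."""
    params = [discretize_zoh(ai, bi, delta) for ai, bi in zip(a, b)]
    h = [0.0] * len(a)
    ys = []
    for x_t in x:
        h = [a_bar * h_i + b_bar * x_t for (a_bar, b_bar), h_i in zip(params, h)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


# A stable one-state SSM driven by a constant input settles to its steady state -b/a * x = 1.
ys = ssm_scan([1.0] * 200, a=[-1.0], b=[1.0], c=[1.0], delta=0.1)
print(round(ys[-1], 6))
```

Each output depends on the entire prefix of the sequence through the hidden state, which is how an SSM captures global context without a limited receptive field.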
Submitted 11 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Incorporating Domain Differential Equations into Graph Convolutional Networks to Lower Generalization Discrepancy
Authors:
Yue Sun,
Chao Chen,
Yuesheng Xu,
Sihong Xie,
Rick S. Blum,
Parv Venkitasubramaniam
Abstract:
Ensuring both accuracy and robustness in time series prediction is critical to many applications, ranging from urban planning to pandemic management. With sufficient training data in which all spatiotemporal patterns are well represented, existing deep-learning models can make reasonably accurate predictions. However, existing methods fail when the training data are drawn from different circumstances (e.g., traffic patterns on regular days) than the test data (e.g., traffic patterns after a natural disaster). Such challenges are usually classified under domain generalization. In this work, we show that one way to address this challenge in the context of spatiotemporal prediction is to incorporate domain differential equations into Graph Convolutional Networks (GCNs). We theoretically derive conditions under which GCNs incorporating such domain differential equations are more robust to mismatched training and testing data than baseline domain-agnostic models. To support our theory, we propose two domain-differential-equation-informed networks: the Reaction-Diffusion Graph Convolutional Network (RDGCN), which incorporates differential equations for traffic speed evolution, and the Susceptible-Infectious-Recovered Graph Convolutional Network (SIRGCN), which incorporates a disease propagation model. Both RDGCN and SIRGCN are based on reliable and interpretable domain differential equations that allow the models to generalize to unseen patterns. We experimentally show that RDGCN and SIRGCN are more robust to mismatched testing data than state-of-the-art deep learning methods.
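For readers unfamiliar with the two kinds of domain equations named above, the sketch below shows one explicit Euler step of a generic reaction-diffusion process on a graph and of a generic networked SIR model. These are textbook forms chosen as assumptions; the exact equations, coupling, and learned parameters inside RDGCN and SIRGCN are not given in the abstract, and all names here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Graph = Dict[int, List[int]]  # adjacency list of an undirected graph


def reaction_diffusion_step(x: List[float], graph: Graph, rate: float,
                            reaction: Callable[[float], float],
                            dt: float) -> List[float]:
    """Euler step of dx_i/dt = rate * sum_{j in N(i)} (x_j - x_i) + reaction(x_i),
    i.e. diffusion via the graph Laplacian plus a local reaction term."""
    return [x[i] + dt * (rate * sum(x[j] - x[i] for j in graph[i]) + reaction(x[i]))
            for i in range(len(x))]


def sir_step(s: List[float], i: List[float], r: List[float], graph: Graph,
             beta: float, gamma: float,
             dt: float) -> Tuple[List[float], List[float], List[float]]:
    """Euler step of a networked SIR model on population fractions: node n is
    exposed to infection from its own and its neighbours' infectious fractions."""
    new_s, new_i, new_r = [], [], []
    for n in range(len(s)):
        pressure = i[n] + sum(i[m] for m in graph[n])
        infections = dt * beta * s[n] * pressure
        recoveries = dt * gamma * i[n]
        new_s.append(s[n] - infections)
        new_i.append(i[n] + infections - recoveries)
        new_r.append(r[n] + recoveries)
    return new_s, new_i, new_r


path = {0: [1], 1: [0, 2], 2: [1]}
print(reaction_diffusion_step([1.0, 0.0, 0.0], path, rate=1.0,
                              reaction=lambda v: 0.0, dt=0.1))
print(sir_step([0.9, 1.0], [0.1, 0.0], [0.0, 0.0], {0: [1], 1: [0]},
               beta=0.5, gamma=0.1, dt=0.1))
```

Both updates respect the structure of their equations: pure diffusion conserves the total quantity on an undirected graph, and the SIR step keeps S + I + R fixed at every node. Constraints like these are what a domain-informed model can rely on when test conditions differ from training.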
Submitted 1 April, 2024;
originally announced April 2024.
-
PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation
Authors:
Yunze Liu,
Changxi Chen,
Chenjing Ding,
Li Yi
Abstract:
Humanoid reaction synthesis is pivotal for creating highly interactive and empathetic robots that can seamlessly integrate into human environments, enhancing the way we live, work, and communicate. However, it is difficult to learn the diverse interaction patterns of multiple humans and generate physically plausible reactions. Kinematics-based approaches suffer from floating feet, sliding, penetration, and other physically implausible artifacts. Existing physics-based methods often rely on kinematics-based methods to generate reference states, which makes them vulnerable to kinematic noise during action execution. Moreover, because they depend on diffusion models, these methods cannot achieve real-time inference. In this work, we propose a Forward Dynamics Guided 4D Imitation method to generate physically plausible, human-like reactions. The learned policy generates such reactions in real time, improving speed by 33× and improving reaction quality compared with the existing method. Our experiments on the InterHuman and Chi3D datasets, along with ablation studies, demonstrate the effectiveness of our approach.
Submitted 1 April, 2024;
originally announced April 2024.
-
Adapting CSI-Guided Imaging Across Diverse Environments: An Experimental Study Leveraging Continuous Learning
Authors:
Cheng Chen,
Shoki Ohta,
Takayuki Nishio,
Mohamed Wahib
Abstract:
This study explores the feasibility of adapting CSI-guided imaging across varied environments. Focusing on continuous learning through ongoing model updates, we investigate CSI-Imager's adaptability in dynamically changing settings, specifically the transition from an office to an industrial environment. Unlike traditional approaches that may require retraining for new environments, our experimental study aims to validate the potential of CSI-guided imaging to maintain accurate imaging performance through Continuous Learning (CL). Through experiments across different scenarios and settings, this work contributes to understanding the limitations and capabilities of existing CSI-guided imaging systems in adapting to new environmental contexts.
Submitted 1 April, 2024;
originally announced April 2024.
-
Towards Memorization-Free Diffusion Models
Authors:
Chen Chen,
Daochang Liu,
Chang Xu
Abstract:
Pretrained diffusion models and their outputs are widely accessible due to their exceptional capacity for synthesizing high-quality images and their open-source nature. Users, however, may face litigation risks owing to the models' tendency to memorize and regurgitate training data during inference. To address this, we introduce Anti-Memorization Guidance (AMG), a novel framework employing three targeted guidance strategies for the main causes of memorization: image and caption duplication, and highly specific user prompts. Consequently, AMG ensures memorization-free outputs while maintaining high image quality and text alignment, leveraging the synergy of its guidance methods, each indispensable in its own right. AMG also features an innovative automatic detection system that checks for potential memorization at each step of the inference process and allows guidance strategies to be applied selectively, minimally interfering with the original sampling process to preserve output utility. We applied AMG to pretrained Denoising Diffusion Probabilistic Models (DDPM) and Stable Diffusion across various generation tasks. The results demonstrate that AMG is the first approach to successfully eradicate all instances of memorization with no or marginal impact on image quality and text alignment, as evidenced by FID and CLIP scores.
Submitted 1 April, 2024;
originally announced April 2024.
-
Scaling Crystal Structure Relaxation with a Universal Trustworthy Deep Generative Model
Authors:
Ziduo Yang,
Yiming Zhao,
Xiaoqing Liu,
Xiuying Zhang,
Yifan Li,
Qiujie Lyu,
Calvin Yu-Chian Chen,
Lei Shen
Abstract:
The evolution of AI and high-throughput technologies has driven a rapid increase in the number of new materials, challenging our computational ability to comprehensively analyze their properties. Relaxed crystal structures often serve as the foundation for further property calculations. However, determining equilibrium structures traditionally involves computationally expensive iterative calculations. Here, we develop DeepRelax, an efficient deep generative model designed for rapid structural relaxation without any iterative process. DeepRelax learns the equilibrium structural distribution, enabling it to predict relaxed structures directly from their unrelaxed counterparts. Its ability to perform structural relaxation in just a few hundred milliseconds per structure, combined with the scalability of parallel processing, makes DeepRelax particularly useful for large-scale virtual screening. To demonstrate the universality of DeepRelax, we benchmark it on three databases covering various types of materials: X-Mn-O oxides, the Materials Project, and the Computational 2D Materials Database. In these tests, DeepRelax exhibits both high accuracy and efficiency in structural relaxation, as further validated by DFT calculations. Finally, we integrate DeepRelax with uncertainty quantification, enhancing its reliability and trustworthiness in materials discovery. This work provides an efficient and trustworthy method to significantly accelerate large-scale computations, offering substantial advancements in computational materials science.
Submitted 31 March, 2024;
originally announced April 2024.
-
$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Authors:
Ye Liu,
Jixuan He,
Wanhua Li,
Junsik Kim,
Donglai Wei,
Hanspeter Pfister,
Chang Wen Chen
Abstract:
Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries. Most existing VTG models are built upon frame-wise final-layer CLIP features, aided by additional temporal backbones (e.g., SlowFast) with sophisticated temporal reasoning mechanisms. In this work, we claim that CLIP itself already shows great potential for fine-grained spatial-temporal modeling, as each layer offers distinct yet useful information at a different granularity level. Motivated by this, we propose Reversed Recurrent Tuning ($R^2$-Tuning), a parameter- and memory-efficient transfer learning framework for video temporal grounding. Our method learns a lightweight $R^2$ Block containing only 1.5% of the total parameters to perform progressive spatial-temporal modeling. Starting from the last layer of CLIP, the $R^2$ Block recurrently aggregates spatial features from earlier layers, then refines temporal correlation conditioned on the given query, resulting in a coarse-to-fine scheme. $R^2$-Tuning achieves state-of-the-art performance across three VTG tasks (i.e., moment retrieval, highlight detection, and video summarization) on six public benchmarks (i.e., QVHighlights, Charades-STA, Ego4D-NLQ, TACoS, YouTube Highlights, and TVSum) even without the additional backbone, demonstrating the significance and effectiveness of the proposed scheme. Our code is available at https://github.com/yeliudev/R2-Tuning.
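The "reversed recurrent" order can be pictured as a loop that starts from CLIP's last-layer feature and folds in each earlier layer's feature in turn. In the minimal sketch below, a fixed scalar gate stands in for the learned fusion; the gate, the function name, and the vector shapes are hypothetical simplifications, not the $R^2$ Block itself, which also performs query-conditioned temporal refinement.

```python
from typing import List

Vector = List[float]


def reversed_recurrent_aggregate(layer_features: List[Vector], gate: float = 0.5) -> Vector:
    """Fuse per-layer features starting from the LAST layer and moving toward
    earlier layers. Each step blends the running state with the next-earlier
    layer's feature via a fixed scalar gate (a learned module would replace this)."""
    state = list(layer_features[-1])  # initialize from the last (most semantic) layer
    for feat in reversed(layer_features[:-1]):
        state = [gate * s + (1.0 - gate) * f for s, f in zip(state, feat)]
    return state


# Three "layers" of 1-D features; the last layer's signal is progressively
# blended with earlier-layer information.
print(reversed_recurrent_aggregate([[0.0], [0.0], [4.0]]))
```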
Submitted 31 March, 2024;
originally announced April 2024.
-
DSFNet: Learning Disentangled Scenario Factorization for Multi-Scenario Route Ranking
Authors:
Jiahao Yu,
Yihai Duan,
Longfei Xu,
Chao Chen,
Shuliang Liu,
Li Chen,
Kaikui Liu,
Fan Yang,
Ning Guo
Abstract:
Multi-scenario route ranking (MSRR) is crucial in many industrial mapping systems. However, the industrial community mainly adopts interactive interfaces to encourage users to select predefined scenarios, which may hinder downstream ranking performance. In addition, in the academic community, multi-scenario ranking works come only from other fields, and no work focuses specifically on route data due to the lack of a publicly available MSRR dataset. Moreover, existing multi-scenario works fail to address three specific challenges of MSRR simultaneously, i.e., the explosion of the number of scenarios, high entanglement, and high capacity demand. Unlike prior work, our key idea is to factorize the complicated scenarios in route ranking into several disentangled factor scenario patterns. Accordingly, we propose a novel method, the Disentangled Scenario Factorization Network (DSFNet), which flexibly composes scenario-dependent parameters based on a high-capacity multi-factor-scenario-branch structure. A novel regularization is then proposed to induce the disentanglement of factor scenarios. Furthermore, two additional techniques, scenario-aware batch normalization and scenario-aware feature filtering, are developed to improve the network's awareness of scenario representation. Additionally, to facilitate MSRR research in the academic community, we propose MSDR, the first large-scale publicly available annotated industrial Multi-Scenario Driving Route dataset. Comprehensive experimental results demonstrate the superiority of DSFNet, which has been successfully deployed in AMap to serve major online traffic.
Submitted 30 March, 2024;
originally announced April 2024.
-
GDA: Generalized Diffusion for Robust Test-time Adaptation
Authors:
Yun-Yun Tsai,
Fu-Chen Chen,
Albert Y. C. Chen,
Junfeng Yang,
Che-Chun Su,
Min Sun,
Cheng-Hao Kuo
Abstract:
Machine learning models struggle to generalize when encountering out-of-distribution (OOD) samples with unexpected distribution shifts. For vision tasks, recent studies have shown that test-time adaptation employing diffusion models can achieve state-of-the-art accuracy improvements on OOD samples by generating new samples that align with the model's domain, without modifying the model's weights. Unfortunately, those studies have primarily focused on pixel-level corruptions and therefore do not generalize to a broader range of OOD types. We introduce Generalized Diffusion Adaptation (GDA), a novel diffusion-based test-time adaptation method robust against diverse OOD types. Specifically, GDA iteratively guides the diffusion by applying a marginal entropy loss derived from the model, in conjunction with style and content preservation losses, during the reverse sampling process. In other words, GDA considers the model's output behavior together with the semantic information of the samples as a whole, which can reduce ambiguity in downstream tasks during the generation process. Evaluation across various popular model architectures and OOD benchmarks shows that GDA consistently outperforms prior work on diffusion-driven adaptation. Notably, it achieves the highest classification accuracy improvements, ranging from 4.4\% to 5.02\% on ImageNet-C and from 2.5\% to 7.4\% on the Rendition, Sketch, and Stylized benchmarks. This performance highlights GDA's ability to generalize to a broader range of OOD benchmarks.
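A marginal entropy loss is commonly defined as the entropy of the model's prediction averaged over several views of the same input. It is low only when the views agree on one confident class. The sketch below implements that common definition. How GDA builds the views during reverse sampling and how it weights this term against the style and content preservation losses is not specified in the abstract, so treat this as an illustration of the loss alone.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_entropy(view_logits: List[List[float]]) -> float:
    """H(mean_k softmax(logits_k)): entropy of the class distribution averaged
    over K views (e.g. augmentations or generated samples) of one input."""
    probs = [softmax(logits) for logits in view_logits]
    n_classes = len(probs[0])
    marginal = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return -sum(p * math.log(p) for p in marginal if p > 0)


print(marginal_entropy([[10.0, 0.0], [10.0, 0.0]]))  # views agree, confident: near 0
print(marginal_entropy([[10.0, 0.0], [0.0, 10.0]]))  # views disagree: near log(2)
```

Minimizing this quantity pushes the generated samples toward inputs on which the classifier is both confident and consistent.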
Submitted 2 April, 2024; v1 submitted 29 March, 2024;
originally announced April 2024.
-
Discovery of optically emitting circumgalactic nebulae around the majority of UV-luminous quasars at intermediate redshift
Authors:
Sean D. Johnson,
Zhuoqi Will Liu,
Jennifer I. Li,
Joop Schaye,
Jenny E. Greene,
Sebastiano Cantalupo,
Gwen C. Rudie,
Zhijie Qu,
Hsiao-Wen Chen,
Marc Rafelski,
Sowgat Muzahid,
Mandy C. Chen,
Thierry Contini,
Wolfram Kollatschny,
Nishant Mishra,
Michael Rauch,
Patrick Petitjean,
Fakhri S. Zahedy
Abstract:
We report the discovery of large, ionized, [O II]-emitting circumgalactic nebulae around the majority of thirty UV-luminous quasars at $z=0.4-1.4$ observed with deep, wide-field integral field spectroscopy (IFS) with the Multi-Unit Spectroscopic Explorer (MUSE) by the Cosmic Ultraviolet Baryon Survey (CUBS) and the MUSE Quasar Blind Emitters Survey (MUSEQuBES). Among the 30 quasars, seven (23%) exhibit [O II]-emitting nebulae with major-axis sizes greater than 100 kpc, twenty (67%) greater than 50 kpc, and 27 (90%) greater than 20 kpc. Such large, optically emitting nebulae indicate that cool, dense, and metal-enriched circumgalactic gas is common in the halos of luminous quasars at intermediate redshift. Several of the largest nebulae exhibit morphologies that suggest interaction-related origins. We detect no correlation between either the sizes or the cosmological-dimming-corrected surface brightnesses of the nebulae and quasar redshift, luminosity, black hole mass, or radio-loudness, but we find a tentative correlation between the nebula properties and the rest-frame [O II] equivalent width in the quasar spectra. This potential trend suggests a relationship between interstellar medium (ISM) content and gas reservoirs on circumgalactic medium (CGM) scales. The [O II]-emitting nebulae around the $z\approx1$ quasars are smaller and less common than Ly$\alpha$ nebulae around $z\approx3$ quasars. These smaller sizes can be explained if the outer regions of the Ly$\alpha$ halos arise from scattering in more neutral gas, or by evolution in the cool CGM content of quasar host halos, by lower-than-expected metallicities on $\gtrsim50$ kpc scales around $z\approx1$ quasars, or by changes in quasar episodic lifetimes between $z=3$ and $z=1$.
Submitted 3 April, 2024; v1 submitted 29 March, 2024;
originally announced April 2024.
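Two pieces of arithmetic behind this abstract can be checked directly: the quoted percentages are fractions of the 30-quasar sample, and "cosmological dimming" refers to the Tolman $(1+z)^4$ fall-off of bolometric surface brightness with redshift. The Python sketch below assumes that the correction simply multiplies the observed surface brightness by $(1+z)^4$ to refer it to $z=0$; the paper's exact convention (units, reference redshift, bandpass terms) is not stated in the abstract, and the function names are hypothetical.

```python
def tolman_factor(z: float) -> float:
    """Factor (1+z)^4 by which bolometric surface brightness is dimmed at redshift z."""
    return (1.0 + z) ** 4


def dimming_corrected_sb(observed_sb: float, z: float) -> float:
    """Refer an observed surface brightness to z = 0 (assumed convention).

    observed_sb may be in any surface-brightness units, for example
    erg s^-1 cm^-2 arcsec^-2; the result is returned in the same units.
    """
    return observed_sb * tolman_factor(z)


def percent_of_sample(count: int, total: int = 30) -> int:
    """Percentage of the sample, rounded to the nearest integer as in the abstract."""
    return round(100 * count / total)


if __name__ == "__main__":
    # Fractions quoted for nebula major-axis sizes > 100, > 50 and > 20 kpc.
    for threshold_kpc, n in [(100, 7), (50, 20), (20, 27)]:
        print(f"> {threshold_kpc} kpc: {n}/30 = {percent_of_sample(n)}%")
    # Dimming at the edges of the sample and the z=3 versus z=1 comparison.
    print(f"(1+z)^4 at z=0.4: {tolman_factor(0.4):.2f}")
    print(f"(1+z)^4 at z=1.4: {tolman_factor(1.4):.2f}")
    print(f"z=3 relative to z=1: {tolman_factor(3.0) / tolman_factor(1.0):.0f}x")
```

Across the sample's redshift range the dimming factor runs from about 3.8 at $z=0.4$ to about 33 at $z=1.4$, and surface brightness at $z\approx3$ is dimmed 16 times more than at $z\approx1$, so uncorrected surface brightnesses at different redshifts are not directly comparable.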