Search | arXiv e-print repository

arXiv:2407.07640 [pdf, other]

Single Crystal Diffuse Neutron Scattering Study of the Dipole-Octupole Quantum Spin Ice Candidate Ce$_2$Zr$_2$O$_7$: No Apparent Octupolar Correlations Above $T = 0.05$ K

Authors: E. M. Smith, R. Schäfer, J. Dudemaine, B. Placke, B. Yuan, Z. Morgan, F. Ye, R. Moessner, O. Benton, A. D. Bianchi, B. D. Gaulin

Abstract: The insulating magnetic pyrochlore Ce$_2$Zr$_2$O$_7$ has attracted much attention as a quantum spin ice candidate with dipole-octupole character, which permits not only spin ice phases based on magnetic dipole moments but also even more exotic octupole-based spin ice phases. This work reports low-temperature neutron diffraction measurements on single crystal Ce$_2$Zr$_2$O$_7$ with $Q$-coverage both at low $Q$ where the magnetic form factor for dipoles is near maximal and at high $Q$ covering the region where the magnetic form factor for Ce$^{3+}$ octupoles is near maximal. This study was motivated by recent powder neutron diffraction studies of other Ce-based dipole-octupole pyrochlores, Ce$_2$Sn$_2$O$_7$ and Ce$_2$Hf$_2$O$_7$, which each showed temperature-dependent diffuse diffraction at high $Q$ that was interpreted as arising from octupolar correlations. Our measurements use an optimized single crystal diffuse scattering instrument that allows us to screen against strong single crystal Bragg scattering in Ce$_2$Zr$_2$O$_7$. The temperature-difference neutron diffraction reveals a low-$Q$ peak consistent with dipolar spin ice correlations. For larger $Q$, the temperature-difference neutron diffraction shows an alternation between positive and negative net intensity. These features are qualitatively consistent with the corresponding numerical-linked-cluster (NLC) calculations using pseudospin interaction parameters reported for Ce$_2$Zr$_2$O$_7$, Ce$_2$Sn$_2$O$_7$, or Ce$_2$Hf$_2$O$_7$. Importantly, neither the measured data nor any of the NLC calculations show increased scattering at high $Q$ resulting from octupolar correlations. We conclude that at the lowest attainable temperatures for our measurement ($T = 0.05$ K), octupolar correlations are not present in Ce$_2$Zr$_2$O$_7$ on the level of our observation threshold of $\sim$ 0.1$\%$ of the low-$Q$ dipole scattering.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.03680 [pdf, other]

The condition for constructing a finite element from a superspline

Authors: Jun Hu, Ting Lin, Qingyu Wu, Beihui Yuan

Abstract: This paper addresses the sufficient and necessary conditions for constructing $C^r$ conforming finite element spaces from superspline spaces on general simplicial triangulations. We introduce the concept of extendability for the pre-element spaces, which encompasses both the superspline space and the finite element space. By examining the extendability condition for both types of spaces, we provide an answer to the conditions regarding the construction. A corollary of our results is that constructing $C^r$ conforming elements in $d$ dimensions should in general require an extra $C^{2^{s}r}$ continuity on $s$-codimensional simplices, and the polynomial degree is at least $(2^d r + 1)$.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 22 pages, 4 figures

MSC Class: 65N30; 65D07

arXiv:2407.02688 [pdf, other]

Funny-Valen-Tine: Planning Solution Distribution Enhances Machine Abstract Reasoning Ability

Authors: Ruizhuo Song, Beiming Yuan

Abstract: Visual abstract reasoning problems hold immense importance in the field of image processing. Both Bongard-Logo and Raven's Progressive Matrices (RPM) belong to this domain, with Bongard-Logo categorized as image clustering reasoning and RPM involving image progression pattern reasoning. This paper introduces Valen, a novel baseline model under probabilistic highlighting models. Valen exhibits remarkable performance in solving both RPM and Bongard-Logo problems, offering a versatile solution. Our investigation delves into the underlying mechanisms of probability-highlighting solvers, realizing they approximate solutions to reasoning problem instances as distributions delineated by primary and auxiliary samples. We propose that the learning objective is not the distribution of correct solutions but one defined by both primary and auxiliary samples. To bridge discrepancies, we introduced the Tine method, an adversarial learning-based approach to assist Valen in estimating a solution distribution closer to the correct one, albeit with issues like unstable training. Reflecting on Tine, we propose modeling the sample distribution of reasoning problems as a mixture of Gaussian distributions, leading to the Funny method. This effectively enables Valen to capture the true form of the correct solution distribution. Furthermore, we designed the SBR method to model the distribution of progressive patterns representation similarly. Overall, the Funny, Tine, and SBR methods significantly improve Valen's performance, providing new ideas and methods for studying visual abstract reasoning problems.

Submitted 7 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: 14 pages, 20 figures, 3 tables

arXiv:2406.04333 [pdf, other]

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

Authors: Yang Sui, Yanyu Li, Anil Kag, Yerlan Idelbayev, Junli Cao, Ju Hu, Dhritiman Sagar, Bo Yuan, Sergey Tulyakov, Jian Ren

Abstract: Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this work, we develop a novel weight quantization method that quantizes the UNet from Stable Diffusion v1.5 to 1.99 bits, achieving a model with 7.9X smaller size while exhibiting even better generation quality than the original one. Our approach includes several novel techniques, such as assigning optimal bits to each layer, initializing the quantized model for better performance, and improving the training strategy to dramatically reduce quantization error. Furthermore, we extensively evaluate our quantized model across various benchmark datasets and through human evaluation to demonstrate its superior generation quality.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Project Page: https://snap-research.github.io/BitsFusion

arXiv:2406.03102 [pdf, other]

DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays

Authors: Bo Xia, Yilun Kong, Yongzhe Chang, Bo Yuan, Zhiheng Li, Xueqian Wang, Bin Liang

Abstract: Classic reinforcement learning (RL) frequently confronts challenges in tasks involving delays, which cause a mismatch between received observations and subsequent actions, thereby deviating from the Markov assumption. Existing methods usually tackle this issue with end-to-end solutions using state augmentation. However, these black-box approaches often involve incomprehensible processes and redundant information in the information states, causing instability and potentially undermining the overall performance. To alleviate the delay challenges in RL, we propose $\textbf{DEER (Delay-resilient Encoder-Enhanced RL)}$, a framework designed to effectively enhance the interpretability and address the random delay issues. DEER employs a pretrained encoder to map delayed states, along with their variable-length past action sequences resulting from different delays, into hidden states, which is trained on delay-free environment datasets. In a variety of delayed scenarios, the trained encoder can seamlessly integrate with standard RL algorithms without requiring additional modifications and enhance the delay-solving capability by simply adapting the input dimension of the original algorithms. We evaluate DEER through extensive experiments on Gym and Mujoco environments. The results confirm that DEER is superior to state-of-the-art RL algorithms in both constant and random delay settings.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.18639 [pdf, other]

Improving Speech Decoding from ECoG with Self-Supervised Pretraining

Authors: Brian A. Yuan, Joseph G. Makin

Abstract: Recent work on intracranial brain-machine interfaces has demonstrated that spoken speech can be decoded with high accuracy, essentially by treating the problem as an instance of supervised learning and training deep neural networks to map from neural activity to text. However, such networks pay for their expressiveness with very large numbers of labeled data, a requirement that is particularly burdensome for invasive neural recordings acquired from human patients. On the other hand, these patients typically produce speech outside of the experimental blocks used for training decoders. Making use of such data, and data from other patients, to improve decoding would ease the burden of data collection -- especially onerous for dys- and anarthric patients. Here we demonstrate that this is possible, by reengineering wav2vec -- a simple, self-supervised, fully convolutional model that learns latent representations of audio using a noise-contrastive loss -- for electrocorticographic (ECoG) data. We train this model on unlabelled ECoG recordings, and subsequently use it to transform ECoG from labeled speech sessions into wav2vec's representation space, before finally training a supervised encoder-decoder to map these representations to text. We experiment with various numbers of labeled blocks; for almost all choices, the new representations yield superior decoding performance to the original ECoG data, and in no cases do they yield worse. Performance can also be improved in some cases by pretraining wav2vec on another patient's data. In the best cases, wav2vec's representations decrease word error rates over the original data by upwards of 50%.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.16640 [pdf, other]

A Survey of Multimodal Large Language Model from A Data-centric Perspective

Authors: Tianyi Bai, Hao Liang, Binwang Wan, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, Conghui He, Binhang Yuan, Wentao Zhang

Abstract: Human beings perceive the world through diverse senses such as sight, smell, hearing, and touch. Similarly, multimodal large language models (MLLMs) enhance the capabilities of traditional large language models by integrating and processing data from multiple modalities including text, vision, audio, video, and 3D environments. Data plays a pivotal role in the development and refinement of these models. In this survey, we comprehensively review the literature on MLLMs from a data-centric perspective. Specifically, we explore methods for preparing multimodal data during the pretraining and adaptation phases of MLLMs. Additionally, we analyze the evaluation methods for datasets and review benchmarks for evaluating MLLMs. Our survey also outlines potential future research directions. This work aims to provide researchers with a detailed understanding of the data-driven aspects of MLLMs, fostering further exploration and innovation in this field.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.11703 [pdf, other]

QComp: A QSAR-Based Data Completion Framework for Drug Discovery

Authors: Bingjia Yang, Yunsie Chung, Archer Y. Yang, Bo Yuan, Xiang Yu

Abstract: In drug discovery, in vitro and in vivo experiments reveal biochemical activities related to the efficacy and toxicity of compounds. The experimental data accumulate into massive, ever-evolving, and sparse datasets. Quantitative Structure-Activity Relationship (QSAR) models, which predict biochemical activities using only the structural information of compounds, face challenges in integrating the evolving experimental data as studies progress. We develop QSAR-Complete (QComp), a data completion framework to address this issue. Based on pre-existing QSAR models, QComp utilizes the correlation inherent in experimental data to enhance prediction accuracy across various tasks. Moreover, QComp emerges as a promising tool for guiding the optimal sequence of experiments by quantifying the reduction in statistical uncertainty for specific endpoints, thereby aiding in rational decision-making throughout the drug discovery process.

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.09418 [pdf, other]

Highly Tunable Ru-dimer Molecular Orbital State in 6H-perovskite Ba$_3$MRu$_2$O$_9$

Authors: Bo Yuan, Beom Hyun Kim, Qiang Chen, Daniel Dobrowolski, Monika Azmanska, G. M. Luke, Shiyu Fan, Valentina Bisogni, Jonathan Pelliciari, J. P. Clancy

Abstract: Molecular orbital (MO) systems with clusters of heavy transition metal (TM) ions are one of the most important classes of model materials for studying the interplay between local physics and effects of itinerancy. Despite a large number of candidates identified in the family of 4d TM materials, an understanding of their physics from competing \textit{microscopic} energy scales is still missing. We bridge this gap by reporting the first resonant inelastic X-ray scattering (RIXS) measurement on a well-known series of Ru dimer systems with a 6H-perovskite structure, Ba$_3$MRu$_2$O$_9$ (M$^{3+}$=In$^{3+}$, Y$^{3+}$, La$^{3+}$). Our RIXS measurements reveal an extremely fragile MO state in these Ru dimer compounds, evidenced by an abrupt change in the RIXS spectrum accompanying a tiny change in the local structure tuned by the M-site ion. By modelling the RIXS spectra, we attribute the enhanced electronic instability in Ba$_3$MRu$_2$O$_9$ to the combined effect of a large hopping and a small spin-orbit coupling in the Ru dimers. The unique combination of energy scales uncovered in the present study makes Ru MO systems ideal model systems for studying quantum phase transitions with molecular orbitals.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 7 pages, 3 figures, Supplemental Materials available upon request

arXiv:2405.09207 [pdf, other]

An Exact Theory of Causal Emergence for Linear Stochastic Iteration Systems

Authors: Kaiwei Liu, Bing Yuan, Jiang Zhang

Abstract: After coarse-graining a complex system, the dynamics of its macro-state may exhibit more pronounced causal effects than those of its micro-state. This phenomenon, known as causal emergence, is quantified by the indicator of effective information. However, two challenges confront this theory: the absence of well-developed frameworks in continuous stochastic dynamical systems and the reliance on coarse-graining methodologies. In this study, we introduce an exact theoretic framework for causal emergence within linear stochastic iteration systems featuring continuous state spaces and Gaussian noise. Building upon this foundation, we derive an analytical expression for effective information across general dynamics and identify optimal linear coarse-graining strategies that maximize the degree of causal emergence when the dimension averaged uncertainty eliminated by coarse-graining has an upper bound. Our investigation reveals that the maximal causal emergence and the optimal coarse-graining methods are primarily determined by the principal eigenvalues and eigenvectors of the dynamic system's parameter matrix, with the latter not being unique. To validate our propositions, we apply our analytical models to three simplified physical systems, comparing the outcomes with numerical simulations, and consistently achieve congruent results.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.06007 [pdf]

Anomalous properties of spark plasma sintered boron nitride solids

Authors: Abhijit Biswas, Peter Serles, Gustavo A. Alvarez, Jesse Schimpf, Michel Hache, Jonathan Kong, Pedro Guerra Demingos, Bo Yuan, Tymofii S. Pieshkov, Chenxi Li, Anand B. Puthirath, Bin Gao, Tia Gray, Xiang Zhang, Jishnu Murukeshan, Robert Vajtai, Pengcheng Dai, Chandra Veer Singh, Jane Howe, Yu Zou, Lane W. Martin, James Patrick Clancy, Zhiting Tian, Tobin Filleter, Pulickel M. Ajayan

Abstract: Hexagonal boron nitride (h-BN) is brittle; however, its atomic-scale structural engineering can lead to unprecedented physical properties. Here we report the bulk synthesis of high-density crystalline h-BN solids by using high-temperature spark plasma sintering (SPS) of micron-size h-BN powders. In addition to the high mechanical strength and ductile response of such materials, we have obtained anomalous values of dielectric constant beyond theoretical limits, high thermal conductivity, and exceptional neutron radiation shielding capability. Through exhaustive characterizations we reveal that SPS induces non-basal plane crystallinity, twisting of layers, and facilitates inter-grain fusion with a high degree of in-plane alignment across macroscale dimensions, resulting in near-theoretical density and anomalous properties. Our findings highlight the importance of material design, via new approaches such as twisting and interconnections between atomically thin layers, to create novel ceramics with properties that could go beyond their intrinsic theoretical predictions.

Submitted 10 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Authors revised version, 46 pages, 4 figures

arXiv:2404.07737 [pdf, ps, other]

Global regularity of 2D Rayleigh-Bénard equations with logarithmic supercritical dissipation

Authors: Baoquan Yuan, Xinyuan Xu, Changhao Li

Abstract: In this paper, we study the global regularity problem for the 2D Rayleigh-Bénard equations with logarithmic supercritical dissipation. By exploiting a combined quantity of the system, the technique of Littlewood-Paley decomposition and Besov spaces, and some commutator estimates, we establish the global regularity of a strong solution to these equations in the Sobolev space $H^{s}(\mathbb{R}^{2})$ for $s \ge 2$.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 18 pages

MSC Class: 35Q35; 76D03; 35B65

arXiv:2404.07509

Multiparameter cascaded quantum interferometer

Authors: Baihong Li, Zhuo-zhuo Wang, Qi-qi Li, Changhua Chen, Boxin Yuan, Yiwei Zhai, Rui-Bo Jin, Xiaofei Zhang

Abstract: We theoretically propose a multiparameter cascaded quantum interferometer in which a two-input and two-output setup is obtained by concatenating 50:50 beam splitters with n independent and adjustable time delays. A general method for deriving the coincidence probability of such an interferometer is given based on the linear transformation of the matrix of beam splitters. As examples, we analyze the interference characteristics of one-, two- and three-parameter cascaded quantum interferometers with different frequency correlations and input states. Some typical interferograms of such interferometers are provided to reveal richer and more complicated two-photon interference phenomena. In principle, arbitrary two-input and two-output experimental setups can be designed with the proposal. This work offers a toolbox for designing versatile quantum interferometers and provides a convenient method for deriving the coincidence probabilities involved. Potential applications can be found in the complete spectral characterization of two-photon states, multiparameter estimation, and quantum metrology.

Submitted 8 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: We have found a serious error in this version, which may mislead readers

arXiv:2404.04608 [pdf, other]

doi 10.1109/TGRS.2024.3392778

Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation

Authors: Danpei Zhao, Bo Yuan, Ziqiang Chen, Tian Li, Zhuoran Liu, Wentao Li, Yue Gao

Abstract: Current remote-sensing interpretation models often focus on a single task such as detection, segmentation, or captioning. However, such task-specific models cannot achieve a comprehensive multi-level interpretation of images. The field also lacks datasets that support multi-task joint interpretation. In this paper, we propose Panoptic Perception, a novel task and a new fine-grained dataset (FineGrip) to achieve a more thorough and universal interpretation of remote sensing images (RSIs). The new task 1) integrates pixel-level, instance-level, and image-level information for universal image perception, 2) captures image information from coarse to fine granularity, achieving deeper scene understanding and description, and 3) enables various independent tasks to complement and enhance each other through multi-task learning. By emphasizing multi-task interactions and the consistency of perception results, this task enables the simultaneous processing of fine-grained foreground instance segmentation, background semantic segmentation, and global fine-grained image captioning. Concretely, the FineGrip dataset includes 2,649 remote sensing images, 12,054 fine-grained instance segmentation masks belonging to 20 foreground things categories, 7,599 background semantic masks for 5 stuff classes and 13,245 captioning sentences. Furthermore, we propose a joint optimization-based panoptic perception model. Experimental results on FineGrip demonstrate the feasibility of the panoptic perception task and the beneficial effect of multi-task joint optimization on individual tasks. The dataset will be publicly available.

Submitted 25 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2024

arXiv:2404.04390 [pdf, other]

Field-dependent Magnons in a Honeycomb Antiferromagnet CoTiO$_3$

Authors: Bo Yuan, Ezekiel Horsley, M. B. Stone, Nicholas P. Butch, Guangyong Xu, Guo-Jiun Shu, J. P. Clancy, Young-June Kim

Abstract: We report field-dependent high-resolution inelastic neutron scattering (INS) measurements on the honeycomb lattice magnet, CoTiO$_3$, to study the evolution of its magnon excitations across a spin reorientation transition driven by an in-plane magnetic field. By carrying out elastic neutron scattering in a magnetic field, we show that the sample transitions from a collinear antiferromagnetic state with multiple magnetic domains at a low field to a mono-domain state with a canted magnetic structure at a high field. Concurrent with this transition, we observed significant changes in both the energy and the width of the zone center magnon peak. The observed width change is argued to be consistent with an unusual zero-field state with extended domain walls. On the other hand, the magnon spectra near the $\mathbf{K}$ point of the Brillouin zone boundary are found to be largely insensitive to the changes in the ordered moment directions and the domain configuration. We argue that this observation is difficult to explain within the framework of the bond-dependent model proposed in a recent INS study [Elliot \textit{et\,al}, Nat. Commun., \textbf{12}, 3936 (2021)]. Our study therefore calls for alternative explanations for the observed $\mathbf{K}$-point gap in CoTiO$_3$.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 15 pages, 9 figures, Supplemental Materials available upon request

arXiv:2404.00242 [pdf, other]

DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference

Authors: Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin

Abstract: Given the increasing demand for tree-structured interactions with LLMs, we introduce DeFT (Decoding with Flash Tree-Attention), an IO-aware tree attention algorithm tailored for tree-structured inference. Unlike traditional sequence-based decoding, tree-structured decoding better accommodates modern task requirements, including self-consistency, few-shot prompting, multi-step reasoning, and multi-model/head coordination. However, existing sequence-based inference systems are ill-suited for tree-structured decoding, resulting in redundancy in computation, memory footprints, and memory access, thereby undermining inference efficiency. To address this challenge, DeFT maintains memory-efficient attention calculation with low memory footprints through two key stages: (1) QKV Preparation: We propose a KV-Guided Grouping Strategy with Tree Split to intelligently group QKV, optimizing GPU resource utilization while minimizing memory reads/writes for KV cache between GPU global memory and on-chip shared memory; (2) Attention Calculation: We compute partial attention of each QKV group in a fused kernel and employ a Tree-topology-aware Global Reduction strategy to obtain final attention. By reducing 73-99% KV cache IO and nearly 100% IO for partial results during attention calculation (e.g., Softmax), DeFT achieves up to 2.52/3.82x speedup in the end-to-end/attention latency across three practical tree-based workloads: namely, few-shot prompting, multi-step reasoning, and speculative decoding, over state-of-the-art attention algorithms.

Submitted 29 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: Update DeFT-v2. DeFT-v1 was accepted by ICLR'24 AGI Workshop ( https://openreview.net/forum?id=HqfLHoX8bR ). Code will be released soon

arXiv:2403.07952 [pdf, other]

AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production

Authors: Jiuniu Wang, Zehua Du, Yuyuan Zhao, Bo Yuan, Kexiang Wang, Jian Liang, Yaxi Zhao, Yihen Lu, Gengliang Li, Junlong Gao, Xin Tu, Zhenyu Guo

Abstract: The Agent and AIGC (Artificial Intelligence Generated Content) technologies have recently made significant progress. We propose AesopAgent, an Agent-driven Evolutionary System on Story-to-Video Production. AesopAgent is a practical application of agent technology for multimodal content generation. The system integrates multiple generative capabilities within a unified framework, so that individual users can leverage these modules easily. This innovative system would convert user story proposals into scripts, images, and audio, and then integrate these multimodal contents into videos. Additionally, the animating units (e.g., Gen-2 and Sora) could make the videos more infectious. The AesopAgent system could orchestrate task workflow for video generation, ensuring that the generated video is both rich in content and coherent. This system mainly contains two layers, i.e., the Horizontal Layer and the Utility Layer. In the Horizontal Layer, we introduce a novel RAG-based evolutionary system that optimizes the whole video generation workflow and the steps within the workflow. It continuously evolves and iteratively optimizes workflow by accumulating expert experience and professional knowledge, including optimizing the LLM prompts and utilities usage. The Utility Layer provides multiple utilities, leading to consistent image generation that is visually coherent in terms of composition, characters, and style. Meanwhile, it provides audio and special effects, integrating them into expressive and logically arranged videos. Overall, our AesopAgent achieves state-of-the-art performance compared with many previous works in visual storytelling. Our AesopAgent is designed for convenient service for individual users, which is available on the following page: https://aesopai.github.io/.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 22 pages, 13 figures

arXiv:2403.06504 [pdf, other]

Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU

Authors: Changyue Liao, Mo Sun, Zihan Yang, Kaiqi Chen, Binhang Yuan, Fei Wu, Zeke Wang

Abstract: Recent advances in large language models have brought immense value to the world, with their superior capabilities stemming from the massive number of parameters they utilize. However, even the GPUs with the highest memory capacities, currently peaking at 80GB, are far from sufficient to accommodate these vast parameters and their associated optimizer states when conducting stochastic gradient descent-based optimization. One approach to hosting such huge models is to aggregate device memory from many GPUs. However, this approach introduces prohibitive costs for most academic researchers, who always have a limited budget for many high-end GPU servers. In this paper, we focus on huge model fine-tuning on a single, even low-end, GPU in a commodity server, which is accessible to most AI researchers. In such a scenario, the state-of-the-art work ZeRO-Infinity suffers from two severe issues when running in a commodity server: 1) low GPU utilization due to inefficient swapping, and 2) limited trainable model size due to CPU memory capacity. The underlying reason is that ZeRO-Infinity is optimized for running on high-end GPU servers. To this end, we present Fuyou, a low-cost training framework that enables efficient 100B huge model fine-tuning on a low-end server with a low-end GPU and limited CPU memory capacity. The key idea is to add the SSD-CPU communication as an optimization dimension and thus carefully co-optimize computation and data swapping from a systematic approach to maximize GPU utilization. The experimental results show that 1) Fuyou is able to fine-tune 175B GPT-3 on a consumer GPU RTX 4090 with high GPU utilization, while ZeRO-Infinity fails to fine-tune; and 2) when training a small GPT-3 13B model, Fuyou achieves 156 TFLOPS on an RTX 4090 GPU while ZeRO-Infinity only achieves 45 TFLOPS.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.03452 [pdf, other]

D4C Glove-train: Solving the RPM and Bongard-logo Problem by Circumscribing and Building Distribution for Concepts

Authors: Ruizhuo Song, Beiming Yuan

Abstract: This paper achieves noteworthy progress in the realm of abstract reasoning, particularly in addressing Raven's Progressive Matrices (RPM) and Bongard-Logo challenges. Initially, we introduce Lico-Net, a novel baseline model that resolves RPM problems with remarkable accuracy. Leveraging this foundation, we advance with the D3C approach, which advocates representing the underlying concepts in abstract reasoning problems through distributions. This perspective enhances the performance of both Lico-Net and a baseline model excelling in Bongard-Logo tasks. To bolster the computational efficiency of D3C, we present the D3C-cos variant, offering a streamlined solution. Furthermore, we propose the D2C method, redefining concept boundaries within these domains and bridging the divide between high-level abstractions and their lower-dimensional counterparts. Finally, we extend our methodology to D4C, employing adversarial techniques to refine concept boundaries further and demonstrate substantial improvements in both RPM and Bongard-Logo challenges. Overall, our contributions present a fresh outlook and practical advancements in the field of abstract reasoning.

Submitted 1 July, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: 15 pages, 15 figures, 6 tables

arXiv:2403.03190 [pdf, other]

Triple-CFN: Restructuring Concept and Feature Spaces for Enhancing Abstract Reasoning Process

Authors: Ruizhuo Song, Beiming Yuan

Abstract: Visual abstract reasoning poses challenges to AI algorithms, requiring cognitive abilities beyond perception. For methodology, this study emphasizes the need to separately extract concepts and features from visual abstract reasoning problems, employing the responses of features to concepts as elements in the reasoning process. It also advocates for clear concept and feature spaces to tackle visual abstract reasoning tasks effectively. For technology, we introduce the Cross-Feature Network (CFN), a framework that separately extracts concepts and features from reasoning problems, utilizing their responses as reasoning representations. The CFN integrates a dual Expectation-Maximization process to actively seek an ideal concept space for problem-solving, yielding notable results despite limitations in generalization tasks. To overcome these limitations, we propose the Triple-CFN, maximizing feature extraction and demonstrating effectiveness in Bongard-Logo and Raven's Progressive Matrices (RPM) problems. Additionally, we present Meta Triple-CFN, which constructs a promising concept space for RPM, ensuring high reasoning accuracy and concept interpretability. Furthermore, we design the Re-space layer, defining a clear feature space for (Meta) Triple-CFN, with its unique warm-start process aiding generalization. Overall, this work advances machine intelligence through innovative network designs for abstract reasoning.

Submitted 21 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: 13 pages, 16 figures, 7 tables

arXiv:2403.03173 [pdf, other]

Solving the Clustering Reasoning Problems by Modeling a Deep-Learning-Based Probabilistic Model

Authors: Ruizhuo Song, Beiming Yuan

Abstract: Visual abstract reasoning problems pose significant challenges to the perception and cognition abilities of artificial intelligence algorithms, demanding deeper pattern recognition and inductive reasoning beyond mere identification of explicit image features. Research advancements in this field often provide insights and technical support for other similar domains. In this study, we introduce PMoC, a deep-learning-based probabilistic model, achieving high reasoning accuracy in the Bongard-Logo, which stands as one of the most challenging clustering reasoning tasks. PMoC is a novel approach for constructing probabilistic models based on deep learning, which is distinctly different from previous techniques. PMoC revitalizes the probabilistic approach, which has been relatively weak in visual abstract reasoning. As a bonus, we also designed Pose-Transformer for complex visual abstract reasoning tasks. Inspired by capsule networks, it focuses on positional relationships in image data, boosting accuracy when combined with PMoC. Our Pose-Transformer effectively addresses reasoning difficulties associated with changes in the position of entities, outperforming previous models on the RAVEN and PGM datasets, which represent two significant progressive pattern reasoning problems. Finally, considering the deployment difficulties of Pose-Transformer, we introduced Straw-Pose-Transformer, a lightweight version. This study contributes to enhancing the capabilities of artificial intelligence in abstract reasoning, cognitive pattern, and probabilistic modeling of complex systems.

Submitted 13 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: 14 pages, 17 figures, 4 tables

arXiv:2403.00193 [pdf]

Structural Resilience and Connectivity of the IPv6 Internet: An AS-level Topology Examination

Authors: Bin Yuan, Tianbo Song

Abstract: The study utilizes a comprehensive dataset informed by IPv6 routing information to provide statistics, degree distribution, joint degree distribution, and clustering analysis of the IPv6 Internet's structure and resilience. The dataset includes 17,232 unique ASes and 10,000 unique IPv6 prefixes. Analysis reveals an interconnected network with an average path length of approximately 3 hops, suggesting a robust and efficient network with potential redundancy and resilience, despite some isolated components. The paper outlines the degree distribution, indicating many peripheral nodes in a sparse network, and a clustering analysis showing a tendency for ASes to form clusters, which is indicative of redundancy and robustness against failures. The connectivity analysis, including path redundancy and reachability, supports the network's resilience. The findings are crucial for network design and strategic planning, particularly as IPv6 adoption increases. The paper emphasizes the importance of continuous monitoring and improvement of network connectivity in the evolving Internet landscape, highlighting the IPv6 Internet's resilience and structured connectivity.

Submitted 29 February, 2024; originally announced March 2024.

arXiv:2403.00190 [pdf]

Identification of important nodes in the information propagation network based on the artificial intelligence method

Authors: Bin Yuan, Tianbo Song, Jerry Yao

Abstract: This study presents an integrated approach for identifying key nodes in information propagation networks using advanced artificial intelligence methods. We introduce a novel technique that combines the Decision-making Trial and Evaluation Laboratory (DEMATEL) method with the Global Structure Model (GSM), creating a synergistic model that effectively captures both local and global influences within a network. This method is applied across various complex networks, such as social, transportation, and communication systems, utilizing the Global Network Influence Dataset (GNID). Our analysis highlights the structural dynamics and resilience of these networks, revealing insights into node connectivity and community formation. The findings demonstrate the effectiveness of our AI-based approach in offering a comprehensive understanding of network behavior, contributing significantly to strategic network analysis and optimization.

Submitted 29 February, 2024; originally announced March 2024.

arXiv:2402.15054 [pdf, other]

Dynamical Reversibility and A New Theory of Causal Emergence

Authors: Jiang Zhang, Ruyi Tao, Keng Hou Leong, Mingzhe Yang, Bing Yuan

Abstract: The theory of causal emergence based on effective information (EI) suggests that complex systems may exhibit a phenomenon called causal emergence (CE), where the macro-dynamics demonstrate a stronger causal effect than the micro-dynamics. However, a challenge in this theory is the dependence on the method used to coarse-grain the system. In this paper, we introduce a fresh concept of approximate dynamical reversibility and establish a novel framework for CE based on this and singular value decomposition. Our research not only reveals a strong correlation between the proximity of a Markov chain being dynamically reversible and EI in assessing causal effects across various scenarios but also demonstrates that EI is constrained by the logarithm of the approximate dynamical reversibility. Leveraging this concept, we present an innovative approach to quantifying CE that is agnostic to specific coarse-graining techniques and effectively captures the inherent characteristics of Markov dynamics. Through empirical evaluations on examples of Boolean networks, cellular automata, and complex networks, we validate our refined CE definition.

Submitted 9 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 40 pages, 9 figures
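
For readers unfamiliar with the effective information (EI) quantity that the causal-emergence abstract above builds on, here is a small NumPy sketch. It assumes the commonly used definition of EI for a Markov transition matrix under a uniform intervention on the current state: the average KL divergence, in bits, between each row and the mean row. It does not implement the paper's reversibility-based measure, and the function name `effective_information` is hypothetical.

```python
import numpy as np

def effective_information(P):
    """EI (in bits) of a row-stochastic transition matrix P.

    Assumed definition: mean over rows i of KL(P[i] || mean row),
    i.e. a uniform intervention distribution over current states.
    """
    P = np.asarray(P, dtype=float)
    mean_row = P.mean(axis=0)              # effect distribution under uniform intervention
    with np.errstate(divide="ignore", invalid="ignore"):
        # Entries with P == 0 contribute nothing (0 * log 0 is taken as 0).
        terms = np.where(P > 0, P * np.log2(P / mean_row), 0.0)
    return terms.sum(axis=1).mean()
```

Under this definition a deterministic permutation of N states attains the maximum of log2(N) bits, while a chain whose rows are all identical has an EI of zero.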

arXiv:2402.14477 [pdf, other]

Pressure tunable magnetic skyrmion phase in Co8Zn8Mn4 single crystals

Authors: Zhun Li, Xinrun Mi, Xinming Wang, Jian Lyu, Na Su, Aifeng Wang, Yisheng Chai, Bao Yuan, Wanju Luo, Hui Cheng, Jianxiang Gao, Hongliang Wang, Lijie Hao, Mingquan He, Junying Shen, Young Sun, Xin Tong

Abstract: In a magnetic skyrmion phase, magnetic moments form vortex-like topological textures which are of both fundamental and industrial interests. In $\beta$-Mn-type Co-Zn-Mn alloys, chiral magnetic skyrmions emerge above room temperature, providing a unique system for studying the skyrmion physics and exploring spintronics applications. However, the magnetic skyrmion phase is typically confined in a narrow and limited temperature ($T$) and magnetic field ($H$) range. Here, we demonstrate that hydrostatic pressure can expand the skyrmion phase in the $T-H$ phase diagram of single-crystalline Co$_8$Zn$_8$Mn$_4$. At ambient pressure, signatures of skyrmions are seen within $T\sim302-308$ K and $H\sim50-100$ Oe. Applying a moderate pressure of 6 kbar extends this range to $T\sim300-310$ K and $H\sim50-150$ Oe. However, further escalation of pressure to 10 kbar results in a slight contraction of the skyrmion phase. These findings underscore the sensitivity of the skyrmion phase in Co$_8$Zn$_8$Mn$_4$ to external pressures, and hint at the potential of strain engineering, particularly in $\beta$-Mn-type Co-Zn-Mn thin films, as a promising avenue to customize the skyrmion phase.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 7 pages, 4 figures

arXiv:2402.02739 [pdf, other]

DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models

Authors: Yang Sui, Huy Phan, Jinqi Xiao, Tianfang Zhang, Zijie Tang, Cong Shi, Yan Wang, Yingying Chen, Bo Yuan

Abstract: In the exciting generative AI era, the diffusion model has emerged as a very powerful and widely adopted content generation and editing tool for various data modalities, making the study of their potential security risks very necessary and critical. Very recently, some pioneering works have shown the vulnerability of the diffusion model against backdoor attacks, calling for in-depth analysis and investigation of the security challenges of this popular and fundamental AI technique. In this paper, for the first time, we systematically explore the detectability of the poisoned noise input for the backdoored diffusion models, an important performance metric yet little explored in the existing works. Starting from the perspective of a defender, we first analyze the properties of the trigger pattern in the existing diffusion backdoor attacks, discovering the important role of distribution discrepancy in Trojan detection. Based on this finding, we propose a low-cost trigger detection mechanism that can effectively identify the poisoned input noise. We then take a further step to study the same problem from the attack side, proposing a backdoor attack strategy that can learn the unnoticeable trigger to evade our proposed detection scheme. Empirical evaluations across various diffusion models and datasets demonstrate the effectiveness of the proposed trigger detection and detection-evading attack strategy. For trigger detection, our distribution discrepancy-based solution can achieve a 100\% detection rate for the Trojan triggers used in the existing works. For evading trigger detection, our proposed stealthy trigger design approach performs end-to-end learning to make the distribution of poisoned noise input approach that of benign noise, enabling nearly 100\% detection pass rate with very high attack and benign performance for the backdoored diffusion models.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.01763 [pdf, other]

When Large Language Models Meet Vector Databases: A Survey

Authors: Zhi Jing, Yongye Su, Yikun Han, Bo Yuan, Haiyun Xu, Chunjiang Liu, Kehai Chen, Min Zhang

Abstract: This survey explores the synergistic potential of Large Language Models (LLMs) and Vector Databases (VecDBs), a burgeoning but rapidly evolving research area. With the proliferation of LLMs comes a host of challenges, including hallucinations, outdated knowledge, prohibitive commercial application costs, and memory issues. VecDBs emerge as a compelling solution to these issues by offering an efficient means to store, retrieve, and manage the high-dimensional vector representations intrinsic to LLM operations. Through this nuanced review, we delineate the foundational principles of LLMs and VecDBs and critically analyze their integration's impact on enhancing LLM functionalities. This discourse extends into a discussion on the speculative future developments in this domain, aiming to catalyze further research into optimizing the confluence of LLMs and VecDBs for advanced data handling and knowledge extraction capabilities.

Submitted 5 February, 2024; v1 submitted 30 January, 2024; originally announced February 2024.

arXiv:2401.12994 [pdf, other]

Automated Scoring of Clinical Patient Notes using Advanced NLP and Pseudo Labeling

Authors: Jingyu Xu, Yifeng Jiang, Bin Yuan, Shulin Li, Tianbo Song

Abstract: Clinical patient notes are critical for documenting patient interactions, diagnoses, and treatment plans in medical practice. Ensuring accurate evaluation of these notes is essential for medical education and certification. However, manual evaluation is complex and time-consuming, often resulting in variability and resource-intensive assessments. To tackle these challenges, this research introduces an approach leveraging state-of-the-art Natural Language Processing (NLP) techniques, specifically Masked Language Modeling (MLM) pretraining, and pseudo labeling. Our methodology enhances efficiency and effectiveness, significantly reducing training time without compromising performance. Experimental results showcase improved model performance, indicating a potential transformation in clinical note assessment.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.11240 [pdf, other]

CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference

Authors: Suyi Li, Hanfeng Lu, Tianyuan Wu, Minchen Yu, Qizhen Weng, Xusheng Chen, Yizhou Shan, Binhang Yuan, Wei Wang

Abstract: Pre-trained large language models (LLMs) often need specialization for domain-specific tasks. Low-Rank Adaptation (LoRA) is a popular approach that adapts a base model to multiple tasks by adding lightweight trainable adapters. In this paper, we present CaraServe, a system that efficiently serves many LoRA adapters derived from a common base model. CaraServe maintains the base model on GPUs and dynamically loads activated LoRA adapters from main memory. As GPU loading results in a cold-start that substantially delays token generation, CaraServe employs a CPU-assisted approach. It starts the activated adapters early on CPUs for prefilling while they are being loaded onto GPUs; after loading completes, it switches to the GPUs for generative LoRA inference. CaraServe develops a highly optimized synchronization mechanism to efficiently coordinate LoRA computation on the CPU and GPU. Moreover, CaraServe employs a rank-aware scheduling algorithm to optimally schedule heterogeneous LoRA requests for maximum service-level objective (SLO) attainment. We have implemented CaraServe and evaluated it against state-of-the-art LoRA serving systems. Our results demonstrate that CaraServe can speed up the average request serving latency by up to 1.4$\times$ and achieve an SLO attainment of up to 99%.

Submitted 20 January, 2024; originally announced January 2024.
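
The CaraServe abstract above relies on the fact that a LoRA adapter is a low-rank update applied on top of a shared base weight. A minimal NumPy sketch of that standard formulation follows, assuming the usual `alpha / r` scaling; it is not CaraServe's implementation, and the names `lora_forward` and `merged_weight` are hypothetical.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Base projection plus low-rank adapter: W x + (alpha / r) * B (A x).

    W: (d_out, d_in) shared base weight
    A: (r, d_in), B: (d_out, r) adapter factors with rank r
    """
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))

def merged_weight(W, A, B, alpha):
    """Equivalent dense weight if the adapter were folded into the base."""
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)
```

Keeping `A` and `B` separate from `W` is what allows one base model to serve many adapters: only the small factors differ per request, and the adapter cost grows with the rank `r`.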

arXiv:2401.10341 [pdf, other]

ELRT: Efficient Low-Rank Training for Compact Convolutional Neural Networks

Authors: Yang Sui, Miao Yin, Yu Gong, Jinqi Xiao, Huy Phan, Bo Yuan

Abstract: Low-rank compression, a popular model compression technique that produces compact convolutional neural networks (CNNs) with low rankness, has been well-studied in the literature. On the other hand, low-rank training, as an alternative way to train low-rank CNNs from scratch, has so far been little exploited. Unlike low-rank compression, low-rank training does not need pre-trained full-rank models, and the entire training phase is always performed on the low-rank structure, bringing attractive benefits for practical applications. However, the existing low-rank training solutions still face several challenges, such as a considerable accuracy drop and/or still needing to update full-size models during the training. In this paper, we perform a systematic investigation on low-rank CNN training. By identifying the proper low-rank format and performance-improving strategy, we propose ELRT, an efficient low-rank training solution for high-accuracy, high-compactness, low-rank CNN models. Our extensive evaluation results for training various CNNs on different datasets demonstrate the effectiveness of ELRT.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.09699 [pdf, other]

Curriculum Recommendations Using Transformer Base Model with InfoNCE Loss And Language Switching Method

Authors: Xiaonan Xu, Bin Yuan, Yongyao Mo, Tianbo Song, Shulin Li

Abstract: The Curriculum Recommendations paradigm is dedicated to fostering learning equality within the ever-evolving realms of educational technology and curriculum development. In acknowledging the inherent obstacles posed by existing methodologies, such as content conflicts and disruptions from language translation, this paradigm aims to confront and overcome these challenges. Notably, it addresses content conflicts and disruptions introduced by language translation, hindrances that can impede the creation of an all-encompassing and personalized learning experience. The paradigm's objective is to cultivate an educational environment that not only embraces diversity but also customizes learning experiences to suit the distinct needs of each learner. To overcome these challenges, our approach builds upon notable contributions in curriculum development and personalized learning, introducing three key innovations. These include the integration of a Transformer base model to enhance computational efficiency, the implementation of InfoNCE Loss for accurate content-topic matching, and the adoption of a language switching strategy to alleviate translation-related ambiguities. Together, these innovations aim to collectively tackle inherent challenges and contribute to forging a more equitable and effective learning journey for a diverse range of learners. Competitive cross-validation scores underscore the efficacy of sentence-transformers/LaBSE, achieving 0.66314, showcasing our methodology's effectiveness in diverse linguistic nuances for content alignment prediction. Index Terms: Curriculum Recommendation, Transformer model with InfoNCE Loss, Language Switching.

Submitted 17 January, 2024; originally announced January 2024.

Comments: 4 pages, 2 figures, ICAICA2023

MSC Class: 68T50
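
The curriculum-recommendation abstract above names the InfoNCE loss for content-topic matching. As a hedged illustration, the NumPy sketch below computes the standard in-batch InfoNCE loss, where row i of `anchors` is assumed to match row i of `positives` and the other rows act as negatives. The temperature value and the function name `info_nce` are assumptions, not details taken from the paper.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.05):
    """In-batch InfoNCE loss over cosine similarities.

    anchors, positives: (batch, dim) embeddings; matching pairs share a row index.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                   # (batch, batch) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilise the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # cross-entropy on the diagonal
```

The loss is small when each anchor is most similar to its own positive and grows when a mismatched row scores higher, which is the behaviour a content-topic matcher needs.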

arXiv:2401.05439 [pdf]

Physics-informed Deep Learning to Solve Three-dimensional Terzaghi Consolidation Equation: Forward and Inverse Problems

Authors: Biao Yuan, Ana Heitor, He Wang, Xiaohui Chen

Abstract: The emergence of neural networks constrained by physical governing equations has sparked a new trend in deep learning research, which is known as Physics-Informed Neural Networks (PINNs). However, solving high-dimensional problems with PINNs is still a substantial challenge; the space complexity brings difficulty to solving large multidirectional problems. In this paper, a novel PINN framework to quickly predict several three-dimensional Terzaghi consolidation cases under different conditions is proposed. Meanwhile, the loss functions for different cases are introduced, and their differences in three-dimensional consolidation problems are highlighted. The tuning strategies for the PINNs framework for three-dimensional consolidation problems are introduced. Then, the performance of PINNs is tested and compared with traditional numerical methods adopted in forward problems, and the coefficients of consolidation and the impact of noisy data in inverse problems are identified. Finally, the results are summarized and presented from three-dimensional simulations of PINNs, which show an accuracy rate of over 99% compared with ground truth for both forward and inverse problems. These results are desirable with good accuracy and can be used for soil settlement prediction, which demonstrates that the proposed PINNs framework can learn the three-dimensional consolidation PDE well. Keywords: Three-dimensional Terzaghi consolidation; Physics-informed neural networks (PINNs); Forward problems; Inverse problems; soil settlement

Submitted 8 January, 2024; originally announced January 2024.

Comments: 30 pages, 11 figures, 6 tables, 23 equations

arXiv:2401.00747 [pdf, other]

Polynomial-time Approximation Scheme for Equilibriums of Games

Authors: Hongbo Sun, Chongkun Xia, Junbo Tan, Bo Yuan, Xueqian Wang, Bin Liang

Abstract: Whether a PTAS (polynomial-time approximation scheme) exists for equilibriums of games has been an open question, which relates to questions in three fields: the practicality of methods in algorithmic game theory, the equation PPAD=FP about the two complexity classes in computational complexity theory, and non-stationarity and curse of multiagency in MARL (multi-agent reinforcement learning). This paper introduces our discovery of the sufficient and necessary conditions for iterations based on dynamic programming and line search to approximate perfect equilibriums of dynamic games, out of which we construct a method proved to be an FPTAS (fully PTAS) for non-singular perfect equilibriums of dynamic games, where for almost any given dynamic game, all its perfect equilibriums are non-singular, indicating that FP$\subseteq$PPAD$\subseteq$Almost-FP. Our discovery consists of cone interior dynamic programming and primal-dual unbiased regret minimization, which fit into existing theories by degeneration in a structure-preserving manner. The former enables a dynamic programming operator to iteratively converge to a perfect equilibrium based on a concept called policy cone. The latter enables an interior-point line search to approximate a Nash equilibrium based on two concepts called primal-dual bias and unbiased central variety, solving a subproblem of the former. Validity of our discovery is cross-corroborated by a combination of theorem proofs, graphs of the three main concepts, and experimental results.

Submitted 3 June, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

Comments: 23 pages, 7 figures, code and animation are available at https://github.com/shb20tsinghua/PTAS_Game/tree/main

MSC Class: 90C39; 90C51; 91A15

arXiv:2312.16815 [pdf, other]

Emergence and Causality in Complex Systems: A Survey on Causal Emergence and Related Quantitative Studies

Authors: Bing Yuan, Zhang Jiang, Aobo Lyu, Jiayun Wu, Zhipeng Wang, Mingzhe Yang, Kaiwei Liu, Muyun Mou, Peng Cui

Abstract: Emergence and causality are two fundamental concepts for understanding complex systems. They are interconnected. On one hand, emergence refers to the phenomenon where macroscopic properties cannot be solely attributed to the cause of individual properties. On the other hand, causality can exhibit emergence, meaning that new causal laws may arise as we increase the level of abstraction. Causal emergence theory aims to bridge these two concepts and even employs measures of causality to quantify emergence. This paper provides a comprehensive review of recent advancements in quantitative theories and applications of causal emergence. Two key problems are addressed: quantifying causal emergence and identifying it in data. Addressing the latter requires the use of machine learning techniques, thus establishing a connection between causal emergence and artificial intelligence. We highlighted that the architectures used for identifying causal emergence are shared by causal representation learning, causal model abstraction, and world model-based reinforcement learning. Consequently, progress in any of these areas can benefit the others. Potential applications and future perspectives are also discussed in the final section of the review.

Submitted 25 February, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

Comments: 57 pages, 17 figures, 1 table

MSC Class: 68P30; ACM Class: K.3.2

arXiv:2312.16679 [pdf]

Square Moiré Superlattices in Twisted Two-Dimensional Halide Perovskites

Authors: Shuchen Zhang, Linrui Jin, Yuan Lu, Linghai Zhang, Jiaqi Yang, Qiuchen Zhao, Dewei Sun, Joshua J. P. Thompson, Biao Yuan, Ke Ma, Akriti, Jee Yung Park, Yoon Ho Lee, Zitang Wei, Blake P. Finkenauer, Daria D. Blach, Sarath Kumar, Hailin Peng, Arun Mannodi-Kanakkithodi, Yi Yu, Ermin Malic, Gang Lu, Letian Dou, Libai Huang

Abstract: Moiré superlattices have emerged as a new platform for studying strongly correlated quantum phenomena, but these systems have been largely limited to van der Waals layer two-dimensional (2D) materials. Here we introduce moiré superlattices leveraging ultra-thin, ligand-free halide perovskites, facilitated by ionic interactions. Square moiré superlattices with varying periodic lengths are clearly visualized through high-resolution transmission electron microscopy. Twist-angle-dependent transient photoluminescence microscopy and electrical characterizations indicate the emergence of localized bright excitons and trapped charge carriers near a twist angle of ~10°. The localized excitons are accompanied by enhanced exciton emission, attributed to an increased oscillator strength by a theoretically forecasted flat band. This work illustrates the potential of extended ionic interaction in realizing moiré physics at room temperature, broadening the horizon for future investigations.

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.10515 [pdf, other]

doi 10.1109/TGRS.2023.3343453

PETDet: Proposal Enhancement for Two-Stage Fine-Grained Object Detection

Authors: Wentao Li, Danpei Zhao, Bo Yuan, Yue Gao, Zhenwei Shi

Abstract: Fine-grained object detection (FGOD) extends object detection with the capability of fine-grained recognition. In recent two-stage FGOD methods, the region proposal serves as a crucial link between detection and fine-grained recognition. However, current methods overlook that some proposal-related procedures inherited from general detection are not equally suitable for FGOD, limiting the multi-task learning from generation, representation, to utilization. In this paper, we present PETDet (Proposal Enhancement for Two-stage fine-grained object detection) to better handle the sub-tasks in two-stage FGOD methods. Firstly, an anchor-free Quality Oriented Proposal Network (QOPN) is proposed with dynamic label assignment and attention-based decomposition to generate high-quality oriented proposals. Additionally, we present a Bilinear Channel Fusion Network (BCFN) to extract independent and discriminative features of the proposals. Furthermore, we design a novel Adaptive Recognition Loss (ARL) which offers guidance for the R-CNN head to focus on high-quality proposals. Extensive experiments validate the effectiveness of PETDet. Quantitative analysis reveals that PETDet with ResNet50 reaches state-of-the-art performance on various FGOD datasets, including FAIR1M-v1.0 (42.96 AP), FAIR1M-v2.0 (48.81 AP), MAR20 (85.91 AP) and ShipRSImageNet (74.90 AP). The proposed method also achieves superior compatibility between accuracy and inference speed. Our code and models will be released at https://github.com/canoe-Z/PETDet.

Submitted 16 December, 2023; originally announced December 2023.

Comments: IEEE TGRS 2023

arXiv:2312.10343 [pdf, other]

In-Sensor Radio Frequency Computing for Energy-Efficient Intelligent Radar

Authors: Yang Sui, Minning Zhu, Lingyi Huang, Chung-Tse Michael Wu, Bo Yuan

Abstract: Radio Frequency Neural Networks (RFNNs) have demonstrated advantages in realizing intelligent applications across various domains. However, as the model size of deep neural networks rapidly increases, implementing large-scale RFNN in practice requires an extensive number of RF interferometers and consumes a substantial amount of energy. To address this challenge, we propose to utilize low-rank decomposition to transform a large-scale RFNN into a compact RFNN while almost preserving its accuracy. Specifically, we develop a Tensor-Train RFNN (TT-RFNN) where each layer comprises a sequence of low-rank third-order tensors, leading to a notable reduction in parameter count, thereby optimizing RF interferometer utilization in comparison to the original large-scale RFNN. Additionally, considering the inherent physical errors when mapping TT-RFNN to RF device parameters in real-world deployment, from a general perspective, we construct the Robust TT-RFNN (RTT-RFNN) by incorporating a robustness solver on TT-RFNN to enhance its robustness. To adapt the RTT-RFNN to varying requirements of reshaping operations, we further provide a reconfigurable reshaping solution employing RF switch matrices. Empirical evaluations conducted on MNIST and CIFAR-10 datasets show the effectiveness of our proposed method.

Submitted 16 December, 2023; originally announced December 2023.

arXiv:2312.00843 [pdf, other]

Exploring the Robustness of Decentralized Training for Large Language Models

Authors: Lin Lu, Chenxi Dai, Wangcheng Tao, Binhang Yuan, Yanan Sun, Pan Zhou

Abstract: Decentralized training of large language models has emerged as an effective way to democratize this technology. However, the potential threats associated with this approach have not been carefully discussed, which would hinder the development of decentralized training infrastructures. This paper aims to initiate discussion towards this end by exploring the robustness of decentralized training from three main perspectives. First, we demonstrate the vulnerabilities inherent in decentralized training frameworks in terms of hardware, data, and models. Second, we highlight the fundamental difference between decentralized foundation model training and vanilla federated learning, where the security techniques employed in federated learning cannot be applied directly. Third, we discuss the essential components required for a robust and efficient decentralized training framework and present a case study by modeling a concrete threat model. Our objective in this vision paper is to emphasize the importance of addressing security concerns in the context of decentralized training for large language models.

Submitted 30 November, 2023; originally announced December 2023.

Comments: 6 pages, 3 figures

arXiv:2311.18103 [pdf, other]

Corner-to-Center Long-range Context Model for Efficient Learned Image Compression

Authors: Yang Sui, Ding Ding, Xiang Pan, Xiaozhong Xu, Shan Liu, Bo Yuan, Zhenzhong Chen

Abstract: In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations. To reduce the decoding time resulting from the serial autoregressive context model, the parallel context model has been proposed as an alternative that necessitates only two passes during the decoding phase, thus facilitating efficient image compression in real-world scenarios. However, performance degradation occurs due to its incomplete causal context. To tackle this issue, we conduct an in-depth analysis of the performance degradation observed in existing parallel context models, focusing on two aspects: the Quantity and Quality of information utilized for context prediction and decoding. Based on such analysis, we propose the Corner-to-Center transformer-based Context Model (C$^3$M) designed to enhance context and latent predictions and improve rate-distortion performance. Specifically, we leverage the logarithmic-based prediction order to predict more context features from corner to center progressively. In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder to capture the long-range semantic information by assigning the different window shapes in different channels. Extensive experimental evaluations show that the proposed method is effective and outperforms the state-of-the-art parallel methods. Finally, according to the subjective analysis, we suggest that improving the detailed representation in transformer-based image compression is a promising direction to be explored.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.12873 [pdf, ps, other]

Global Strong Solutions to the incompressible Magnetohydrodynamic Equations with Density-Dependent Viscosity and Vacuum in 3D Exterior Domains

Authors: Bing Yuan, Rong Zhang, Peng Zhou

Abstract: The nonhomogeneous incompressible Magnetohydrodynamic Equations with density-dependent viscosity are studied in three-dimensional (3D) exterior domains with slip boundary conditions. The key is the constraint of an additional initial value condition $B_0\in L^p (1\leqslant p<12/7)$, which increases the decay-in-time rates of the solutions; thus we obtain the global existence of strong solutions provided the gradient of the initial velocity and initial magnetic field is suitably small. In particular, the initial density is allowed to contain vacuum states and large oscillations. Moreover, the large-time behavior of the solution is also shown.

Submitted 19 November, 2023; originally announced November 2023.

Comments: arXiv admin note: text overlap with arXiv:2205.05925, arXiv:1709.05608, arXiv:1506.03884, arXiv:2112.08111 by other authors

arXiv:2311.11557 [pdf, other]

Replay-enhanced Continual Reinforcement Learning

Authors: Tiantian Zhang, Kevin Zehua Shen, Zichuan Lin, Bo Yuan, Xueqian Wang, Xiu Li, Deheng Ye

Abstract: Replaying past experiences has proven to be a highly effective approach for averting catastrophic forgetting in supervised continual learning. However, some crucial factors are still largely ignored, making it vulnerable to serious failure, when used as a solution to forgetting in continual reinforcement learning, even in the context of perfect memory where all data of previous tasks are accessible in the current task. On the one hand, since most reinforcement learning algorithms are not invariant to the reward scale, the previously well-learned tasks (with high rewards) may appear to be more salient to the current learning process than the current task (with small initial rewards). This causes the agent to concentrate on those salient tasks at the expense of generality on the current task. On the other hand, offline learning on replayed tasks while learning a new task may induce a distributional shift between the dataset and the learned policy on old tasks, resulting in forgetting. In this paper, we introduce RECALL, a replay-enhanced method that greatly improves the plasticity of existing replay-based methods on new tasks while effectively avoiding the recurrence of catastrophic forgetting in continual reinforcement learning. RECALL leverages adaptive normalization on approximate targets and policy distillation on old tasks to enhance generality and stability, respectively. Extensive experiments on the Continual World benchmark show that RECALL performs significantly better than purely perfect memory replay, and achieves comparable or better overall performance against state-of-the-art continual learning methods.

Submitted 20 November, 2023; originally announced November 2023.

Comments: Accepted by Transactions on Machine Learning Research 2023

arXiv:2311.11514 [pdf, other]

HexGen: Generative Inference of Large Language Model over Heterogeneous Environment

Authors: Youhe Jiang, Ran Yan, Xiaozhe Yao, Yang Zhou, Beidi Chen, Binhang Yuan

Abstract: Serving generative inference of the large language model is a crucial component of contemporary AI applications. This paper focuses on deploying such services in a heterogeneous and cross-datacenter setting to mitigate the substantial inference costs typically associated with a single centralized datacenter. Towards this end, we propose HexGen, a flexible distributed inference engine that uniquely supports the asymmetric partition of generative inference computations over both tensor model parallelism and pipeline parallelism and allows for effective deployment across diverse GPUs interconnected by a fully heterogeneous network. We further propose a sophisticated scheduling algorithm grounded in constrained optimization that can adaptively assign asymmetric inference computation across the GPUs to fulfill inference requests while maintaining acceptable latency levels. We conduct an extensive evaluation to verify the efficiency of HexGen by serving the state-of-the-art Llama-2 (70B) model. The results suggest that HexGen can choose to achieve up to 2.3 times lower latency deadlines or tolerate up to 4 times more request rates compared with the homogeneous baseline given the same budget.

Submitted 27 May, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

Comments: Accepted by ICML 2024

arXiv:2311.08164 [pdf, other]

Full characterization of biphotons with a generalized quantum interferometer

Authors: Baihong Li, Changhua Chen, Boxin Yuan, Xiaofei Zhang, Ruifang Dong, Shougang Zhang, Rui-Bo Jin

Abstract: Entangled photons (biphotons) in the time-frequency degree of freedom play a crucial role in both foundational physics and advanced quantum technologies. Fully characterizing them poses a key scientific challenge. Here, we propose a theoretical approach to achieving the complete tomography of biphotons by introducing a frequency shift in one arm of the combination interferometer. Our method, a generalized combination interferometer, enables the reconstruction of the full complex joint spectral amplitude associated with both frequency sum and difference in a single interferometer. In contrast, the generalized Hong-Ou-Mandel and N00N state interferometers only allow for the partial tomography of biphotons, either in frequency difference or frequency sum. This provides an alternative method for full characterization of an arbitrary two-photon state with exchange symmetry and holds potential for applications in high-dimensional quantum information processing.

Submitted 20 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: 14 pages, 3 figures

arXiv:2310.17157 [pdf, other]

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

Authors: Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, Beidi Chen

Abstract: Large language models (LLMs) with hundreds of billions of parameters have sparked a new wave of exciting AI applications. However, they are computationally expensive at inference time. Sparsity is a natural approach to reduce this cost, but existing methods either require costly retraining, have to forgo LLM's in-context learning ability, or do not yield wall-clock time speedup on modern hardware. We hypothesize that contextual sparsity, which are small, input-dependent sets of attention heads and MLP parameters that yield approximately the same output as the dense model for a given input, can address these issues. We show that contextual sparsity exists, that it can be accurately predicted, and that we can exploit it to speed up LLM inference in wall-clock time without compromising LLM's quality or in-context learning ability. Based on these insights, we propose DejaVu, a system that uses a low-cost algorithm to predict contextual sparsity on the fly given inputs to each layer, along with an asynchronous and hardware-aware implementation that speeds up LLM inference. We validate that DejaVu can reduce the inference latency of OPT-175B by over 2X compared to the state-of-the-art FasterTransformer, and over 6X compared to the widely used Hugging Face implementation, without compromising model quality. The code is available at https://github.com/FMInference/DejaVu.

Submitted 26 October, 2023; originally announced October 2023.

Journal ref: Proceedings of the 40th International Conference on Machine Learning, 2023, 919

arXiv:2310.14277 [pdf, other]

A Survey on Continual Semantic Segmentation: Theory, Challenge, Method and Application

Authors: Bo Yuan, Danpei Zhao

Abstract: Continual learning, also known as incremental learning or life-long learning, stands at the forefront of deep learning and AI systems. It breaks through the obstacle of one-way training on closed sets and enables continuous adaptive learning on open-set conditions. In the recent decade, continual learning has been explored and applied in multiple fields, especially in computer vision, covering classification, detection and segmentation tasks. Continual semantic segmentation (CSS) is a challenging, intricate and burgeoning task because of its dense-prediction nature. In this paper, we present a review of CSS, committing to building a comprehensive survey on problem formulations, primary challenges, universal datasets, neoteric theories and multifarious applications. Concretely, we begin by elucidating the problem definitions and primary challenges. Based on an in-depth investigation of relevant approaches, we sort out and categorize current CSS models into two main branches including data-replay and data-free sets. In each branch, the corresponding approaches are similarity-based clustered and thoroughly analyzed, following qualitative comparison and quantitative reproductions on relevant datasets. Besides, we also introduce four CSS specialities with diverse application scenarios and development tendencies. Furthermore, we develop a benchmark for CSS encompassing representative references, evaluation results and reproductions, which is available at https://github.com/YBIO/SurveyCSS. We hope this survey can serve as a reference-worthy and stimulating contribution to the advancement of the life-long learning field, while also providing valuable perspectives for related fields.

Submitted 22 October, 2023; originally announced October 2023.

Comments: 20 pages, 12 figures. Undergoing Review

arXiv:2310.08161 [pdf, other]

Phase offset method of ptychographic contrast reversal correction

Authors: Christoph Hofer, Chuang Gao, Tamazouzt Chennit, Biao Yuan, Timothy J. Pennycook

Abstract: The contrast transfer function of direct ptychography methods such as the single side band (SSB) method is single signed, yet these methods still sometimes exhibit contrast reversals, most often where the projected potentials are strong. In thicker samples central focusing often provides the best ptychographic contrast as this leads to defocus variations within the sample canceling out. However, focusing away from the entrance surface is often undesirable as this degrades the annular dark field (ADF) signal. Here we discuss how phase wrap asymptotes in the frequency response of SSB ptychography give rise to contrast reversals, without the need for dynamical scattering, and how these can be counteracted by manipulating the phases such that the asymptotes are either shifted to higher frequencies or damped via amplitude modulation. This is what enables post collection defocus correction of contrast reversals. However, the phase offset method of counteracting contrast reversals we introduce here is generally found to be superior to post collection application of defocus, with greater reliability and generally stronger contrast. Importantly, the phase offset method also works for thin and thick samples where central focusing does not. Finally, the independence of the method from focus is useful for optical sectioning involving ptychography, improving interpretability by better disentangling the effects of strong potentials and focus.

Submitted 21 December, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.04696 [pdf, other]

Serving Deep Learning Model in Relational Databases

Authors: Alexandre Eichenberger, Qi Lin, Saif Masood, Hong Min, Alexander Sim, Jie Wang, Yida Wang, Kesheng Wu, Binhang Yuan, Lixi Zhou, Jia Zou

Abstract: Serving deep learning (DL) models on relational data has become a critical requirement across diverse commercial and scientific domains, sparking growing interest recently. In this visionary paper, we embark on a comprehensive exploration of representative architectures to address the requirement. We highlight three pivotal paradigms: The state-of-the-art DL-Centric architecture offloads DL computations to dedicated DL frameworks. The potential UDF-Centric architecture encapsulates one or more tensor computations into User Defined Functions (UDFs) within the database system. The potential Relation-Centric architecture aims to represent a large-scale tensor computation through relational operators. While each of these architectures demonstrates promise in specific use scenarios, we identify urgent requirements for seamless integration of these architectures and the middle ground between these architectures. We delve into the gaps that impede the integration and explore innovative strategies to close them. We present a pathway to establish a novel database system for enabling a broad class of data-intensive DL inference applications.

Submitted 9 October, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

Comments: Authors are ordered alphabetically; Jia Zou is the corresponding author

arXiv:2310.00205 [pdf, other]

An Empirical Study on the Use of Static Analysis Tools in Open Source Embedded Software

Authors: Mingjie Shen, Akul Pillai, Brian A. Yuan, James C. Davis, Aravind Machiry

Abstract: This paper performs the first study to understand the prevalence, challenges, and effectiveness of using Static Application Security Testing (SAST) tools on Open-Source Embedded Software (EMBOSS) repositories. We collect a corpus of 258 of the most popular EMBOSS projects, representing 13 distinct categories such as real-time operating systems, network stacks, and applications. To understand the current use of SAST tools on EMBOSS, we measured this corpus and surveyed developers. To understand the challenges and effectiveness of using SAST tools on EMBOSS projects, we applied these tools to the projects in our corpus. We report that almost none of these projects (just 3%) use SAST tools beyond those baked into the compiler, and developers give rationales such as ineffectiveness and false positives. In applying SAST tools ourselves, we show that minimal engineering effort and project expertise are needed to apply many tools to a given EMBOSS project. GitHub's CodeQL was the most effective SAST tool -- using its built-in security checks we found a total of 540 defects (with a false positive rate of 23%) across the 258 projects, with 399 (74%) likely security vulnerabilities, including in projects maintained by Microsoft, Amazon, and the Apache Foundation. EMBOSS engineers have confirmed 273 (51%) of these defects, mainly by accepting our pull requests. Two CVEs were issued. In summary, we urge EMBOSS engineers to adopt the current generation of SAST tools, which offer low false positive rates and are effective at finding security-relevant defects.

Submitted 29 September, 2023; originally announced October 2023.

arXiv:2309.15413 [pdf, other]

doi 10.1109/TPAMI.2023.3273574

Inherit with Distillation and Evolve with Contrast: Exploring Class Incremental Semantic Segmentation Without Exemplar Memory

Authors: Danpei Zhao, Bo Yuan, Zhenwei Shi

Abstract: As a front-burner problem in incremental learning, class incremental semantic segmentation (CISS) is plagued by catastrophic forgetting and semantic drift. Although recent methods have utilized knowledge distillation to transfer knowledge from the old model, they are still unable to avoid pixel confusion, which results in severe misclassification after incremental steps due to the lack of annotations for past and future classes. Meanwhile data-replay-based approaches suffer from storage burdens and privacy concerns. In this paper, we propose to address CISS without exemplar memory and resolve catastrophic forgetting as well as semantic drift synchronously. We present Inherit with Distillation and Evolve with Contrast (IDEC), which consists of a Dense Knowledge Distillation on all Aspects (DADA) manner and an Asymmetric Region-wise Contrastive Learning (ARCL) module. Driven by the devised dynamic class-specific pseudo-labelling strategy, DADA distils intermediate-layer features and output-logits collaboratively with more emphasis on semantic-invariant knowledge inheritance. ARCL implements region-wise contrastive learning in the latent space to resolve semantic drift among known classes, current classes, and unknown classes. We demonstrate the effectiveness of our method on multiple CISS tasks by state-of-the-art performance, including Pascal VOC 2012, ADE20K and ISPRS datasets. Our method also shows superior anti-forgetting ability, particularly in multi-step CISS tasks.

Submitted 27 September, 2023; originally announced September 2023.

Journal ref: IEEE TPAMI 2023

arXiv:2308.11166 [pdf, other]

Hierarchical Point-based Active Learning for Semi-supervised Point Cloud Semantic Segmentation

Authors: Zongyi Xu, Bo Yuan, Shanshan Zhao, Qianni Zhang, Xinbo Gao

Abstract: Impressive performance on point cloud semantic segmentation has been achieved by fully-supervised methods with large amounts of labelled data. As it is labour-intensive to acquire large-scale point cloud data with point-wise labels, many attempts have been made to explore learning 3D point cloud segmentation with limited annotations. Active learning is one of the effective strategies to achieve this purpose but is still under-explored. The most recent methods of this kind measure the uncertainty of each pre-divided region for manual labelling but they suffer from redundant information and require additional efforts for region division. This paper aims at addressing this issue by developing a hierarchical point-based active learning strategy. Specifically, we measure the uncertainty for each point by a hierarchical minimum margin uncertainty module which considers the contextual information at multiple levels. Then, a feature-distance suppression strategy is designed to select important and representative points for manual labelling. Besides, to better exploit the unlabelled data, we build a semi-supervised segmentation framework based on our active strategy. Extensive experiments on the S3DIS and ScanNetV2 datasets demonstrate that the proposed framework achieves 96.5% and 100% performance of fully-supervised baseline with only 0.07% and 0.1% training data, respectively, outperforming the state-of-the-art weakly-supervised and active learning methods. The code will be available at https://github.com/SmiletoE/HPAL.

Submitted 21 August, 2023; originally announced August 2023.

Comments: International Conference on Computer Vision (ICCV) 2023

Showing 1–50 of 280 results for author: Yuan, B