Search | arXiv e-print repository

doi 10.1007/s41605-024-00467-8

LHAASO-KM2A detector simulation using Geant4

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (254 additional authors not shown)

Abstract: KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with large altitude difference (30 m) and huge coverage (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A based on Geant4 is introduced. The process of G4KM2A is optimized mainly in memory consumption to avoid memory overflow. Some simplifications are used to significantly speed up the execution of G4KM2A. The running time is reduced by at least 30 times compared to full detector simulation. The particle distributions and the core/angle resolution comparison between simulation and experimental data of the full KM2A array are also presented, which show good agreement.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.03893 [pdf, other]

KGExplainer: Towards Exploring Connected Subgraph Explanations for Knowledge Graph Completion

Authors: Tengfei Ma, Xiang Song, Wen Tao, Mufei Li, Jiani Zhang, Xiaoqin Pan, Jianxin Lin, Bosheng Song, Xiangxiang Zeng

Abstract: Knowledge graph completion (KGC) aims to alleviate the inherent incompleteness of knowledge graphs (KGs), which is a critical task for various applications, such as recommendations on the web. Although knowledge graph embedding (KGE) models have demonstrated superior predictive performance on KGC tasks, these models infer missing links in a black-box manner that lacks transparency and accountability, preventing researchers from developing accountable models. Existing KGE-based explanation methods focus on exploring key paths or isolated edges as explanations, which is information-less to reason target prediction. Additionally, the missing ground truth leads to these explanation methods being ineffective in quantitatively evaluating explored explanations. To overcome these limitations, we propose KGExplainer, a model-agnostic method that identifies connected subgraph explanations and distills an evaluator to assess them quantitatively. KGExplainer employs a perturbation-based greedy search algorithm to find key connected subgraphs as explanations within the local structure of target predictions. To evaluate the quality of the explored explanations, KGExplainer distills an evaluator from the target KGE model. By forwarding the explanations to the evaluator, our method can examine the fidelity of them. Extensive experiments on benchmark datasets demonstrate that KGExplainer yields promising improvement and achieves an optimal ratio of 83.3% in human evaluation.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 13 pages, 7 figures, 11 tables. Under Review

arXiv:2404.01929 [pdf]

Towards Enhanced Analysis of Lung Cancer Lesions in EBUS-TBNA -- A Semi-Supervised Video Object Detection Method

Authors: Jyun-An Lin, Yun-Chien Cheng, Ching-Kai Lin

Abstract: This study aims to establish a computer-aided diagnostic system for lung lesions using endobronchial ultrasound (EBUS) to assist physicians in identifying lesion areas. During EBUS-transbronchial needle aspiration (EBUS-TBNA) procedures, physicians rely on grayscale ultrasound images to determine the location of lesions. However, these images often contain significant noise and can be influenced by surrounding tissues or blood vessels, making identification challenging. Previous research has lacked the application of object detection models to EBUS-TBNA, and there has been no well-defined solution for the lack of annotated data in the EBUS-TBNA dataset. In related studies on ultrasound images, although models have been successful in capturing target regions for their respective tasks, their training and predictions have been based on two-dimensional images, limiting their ability to leverage temporal features for improved predictions. This study introduces a three-dimensional video-based object detection model. It first generates a set of improved queries using a diffusion model, then captures temporal correlations through an attention mechanism. A filtering mechanism selects relevant information from previous frames to pass to the current frame. Subsequently, a teacher-student model training approach is employed to further optimize the model using unlabeled data. By incorporating various data augmentation and feature alignment, the model gains robustness against interference. Test results demonstrate that this model, which captures spatiotemporal information and employs semi-supervised learning methods, achieves an Average Precision (AP) of 48.7 on the test dataset, outperforming other models. It also achieves an Average Recall (AR) of 79.2, significantly leading over existing models.

Submitted 20 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01687 [pdf, other]

Search for a sub-eV sterile neutrino using Daya Bay's full dataset

Authors: F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, Y. C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, X. Y. Ding, Y. Y. Ding, et al. (176 additional authors not shown)

Abstract: This Letter presents results of a search for the mixing of a sub-eV sterile neutrino with three active neutrinos based on the full data sample of the Daya Bay Reactor Neutrino Experiment, collected during 3158 days of detector operation, which contains $5.55 \times 10^{6}$ reactor $\bar{\nu}_e$ candidates identified as inverse beta-decay interactions followed by neutron-capture on gadolinium. The analysis benefits from a doubling of the statistics of our previous result and from improvements of several important systematic uncertainties. No significant oscillation due to mixing of a sub-eV sterile neutrino with active neutrinos was found. Exclusion limits are set by both Feldman-Cousins and CLs methods. Light sterile neutrino mixing with $\sin^2 2θしーた_{14} \gtrsim 0.01$ can be excluded at 95% confidence level in the region of $0.01$ eV$^2 \lesssim |Δでるたm^{2}_{41}| \lesssim 0.1$ eV$^2$. This result represents the world-leading constraints in the region of $2 \times 10^{-4}$ eV$^2 \lesssim |Δでるたm^{2}_{41}| \lesssim 0.2$ eV$^2$.

Submitted 15 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures, 1 table

arXiv:2404.01426 [pdf, other]

Laboratory demonstration of a Photonic Lantern Nuller in monochromatic and broadband light

Authors: Yinzi Xin, Daniel Echeverri, Nemanja Jovanovic, Dimitri Mawet, Sergio Leon-Saval, Rodrigo Amezcua-Correa, Stephanos Yerolatsitis, Michael P. Fitzgerald, Pradip Gatkine, Yoo Jung Kim, Jonathan Lin, Barnaby Norris, Garreth Ruane, Steph Sallum

Abstract: Photonic lantern nulling (PLN) is a method for enabling the detection and characterization of close-in exoplanets by exploiting the symmetries of the ports of a mode-selective photonic lantern (MSPL) to cancel out starlight. A six-port MSPL provides four ports where on-axis starlight is suppressed, while off-axis planet light is coupled with efficiencies that vary as a function of the planet's spatial position. We characterize the properties of a six-port MSPL in the laboratory and perform the first testbed demonstration of the PLN in monochromatic light (1569 nm) and in broadband light (1450 nm to 1625 nm), each using two orthogonal polarizations. We compare the measured spatial throughput maps with those predicted by simulations using the lantern's modes. We find that the morphologies of the measured throughput maps are reproduced by the simulations, though the real lantern is lossy and has lower throughputs overall. The measured ratios of on-axis stellar leakage to peak off-axis throughput are around 10^(-2), likely limited by testbed wavefront errors. These null-depths are already sufficient for observing young gas giants at the diffraction limit using ground-based observatories. Future work includes using wavefront control to further improve the nulls, as well as testing and validating the PLN on-sky.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 30 pages, 12 figures

arXiv:2403.19479 [pdf, other]

Parallel and real-time post-processing for quantum random number generators

Authors: Xiaomin Guo, Fading Lin, Jiehong Lin, Zhijie Song, Yue Luo, Qiqi Wang, Yanqiang Guo

Abstract: Quantum systems are particularly suited for generating true randomness due to their inherent unpredictability, which can be justified on physical principles. However, practical implementations of Quantum RNGs (QRNGs) are always subject to noise, or uncontrollable influences, diminishing the quality of raw randomness produced. This necessitates post-processing to convert raw output into genuine randomness. In current QRNG implementations, the critical issue of seed updating is often overlooked, risking security vulnerabilities due to increased security parameters when seeds are reused in post-processing, and frequent seed updates fail to yield net randomness, while reusing seeds relies on the assumption that the original sequence inputs are independent. In this work, we have provided a specific scheme for seed updates that balances practicality and security, exploring the parallel and real-time implementation of multiple seed real-time updating Toeplitz hash extractors in an FPGA to achieve parallel QRNGs, focusing on efficient hardware computation resource use. Through logic optimization, we achieved a greater number of parallel channels and a post-processing matrix size three times larger than previous works on the same FPGA platform, utilizing fewer logic resources. This resulted in a higher rate of random number generation and enhanced security. Furthermore, with the use of higher-performance ADCs, we attained a random number production rate exceeding 20 Gbps. High-speed random number transfer and seed updating were achieved using the PCIe high-speed interface. This marks a significant step toward chip-based parallel QRNGs, enhancing the practicality of CV QRNGs in trusted, device-independent, and semi-device-independent scenarios.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19076 [pdf, other]

doi 10.1109/MCAS.2023.3302182

Tiny Machine Learning: Progress and Futures

Authors: Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Song Han

Abstract: Tiny Machine Learning (TinyML) is a new frontier of machine learning. By squeezing deep learning models into billions of IoT devices and microcontrollers (MCUs), we expand the scope of AI applications and enable ubiquitous intelligence. However, TinyML is challenging due to hardware constraints: the tiny memory resource makes it difficult to hold deep learning models designed for cloud and mobile platforms. There is also limited compiler and inference engine support for bare-metal devices. Therefore, we need to co-design the algorithm and system stack to enable TinyML. In this review, we will first discuss the definition, challenges, and applications of TinyML. We then survey the recent progress in TinyML and deep learning on MCUs. Next, we will introduce MCUNet, showing how we can achieve ImageNet-scale AI applications on IoT devices with system-algorithm co-design. We will further extend the solution from inference to training and introduce tiny on-device training techniques. Finally, we present future directions in this area. Today's large model might be tomorrow's tiny model. The scope of TinyML should evolve and adapt over time.

Submitted 29 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2206.15472

Journal ref: IEEE Circuits and Systems Magazine, 23(3), pp. 8-34, October 2023

arXiv:2403.18092 [pdf, other]

OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation

Authors: Jisoo Jeong, Hong Cai, Risheek Garrepalli, Jamie Menjay Lin, Munawar Hayat, Fatih Porikli

Abstract: The scarcity of ground-truth labels poses one major challenge in developing optical flow estimation models that are both generalizable and robust. While current methods rely on data augmentation, they have yet to fully exploit the rich information available in labeled video sequences. We propose OCAI, a method that supports robust frame interpolation by generating intermediate video frames alongside optical flows in between. Utilizing a forward warping approach, OCAI employs occlusion awareness to resolve ambiguities in pixel values and fills in missing values by leveraging the forward-backward consistency of optical flows. Additionally, we introduce a teacher-student style semi-supervised learning method on top of the interpolated frames. Using a pair of unlabeled frames and the teacher model's predicted optical flow, we generate interpolated frames and flows to train a student model. The teacher's weights are maintained using Exponential Moving Averaging of the student. Our evaluations demonstrate perceptually superior interpolation quality and enhanced optical flow accuracy on established benchmarks such as Sintel and KITTI.

Submitted 26 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2403.16531 [pdf, other]

Giant Hall effect in two-dimensional CoSi$_2$ granular arrays

Authors: Elica Anne Heredia, Shao-Pin Chiu, Ba-Anh-Vu Nguyen, Ruey-Tay Wang, Chih-Yuan Wu, Sheng-Shiuan Yeh, Juhn-Jong Lin

Abstract: Granular metals offer tailorable electronic properties and play crucial roles in device and sensor applications. We have fabricated a series of nonmagnetic granular CoSi$_2$ thin films and studied the Hall effect and transport properties. We observed a two orders of magnitude enhancement in the Hall coefficient in films that fall slightly above the metal-insulator transition. This giant Hall effect (GHE) is ascribed to the local quantum-interference effect induced reduction of the charge carriers. Transmission electron microscopy images and transport properties indicate that our films form two-dimensional granular arrays. The GHE may provide useful and sensitive applications.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 5 pages, 4 figures

arXiv:2403.16378 [pdf, other]

Play to Your Strengths: Collaborative Intelligence of Conventional Recommender Models and Large Language Models

Authors: Yunjia Xi, Weiwen Liu, Jianghao Lin, Chuhan Wu, Bo Chen, Ruiming Tang, Weinan Zhang, Yong Yu

Abstract: The rise of large language models (LLMs) has opened new opportunities in Recommender Systems (RSs) by enhancing user behavior modeling and content understanding. However, current approaches that integrate LLMs into RSs solely utilize either LLM or conventional recommender model (CRM) to generate final recommendations, without considering which data segments LLM or CRM excel in. To fill in this gap, we conduct experiments on MovieLens-1M and Amazon-Books datasets, and compare the performance of a representative CRM (DCNv2) and an LLM (LLaMA2-7B) on various groups of data samples. Our findings reveal that LLMs excel in data segments where CRMs exhibit lower confidence and precision, while samples where CRM excels are relatively challenging for LLM, requiring substantial training data and a long training time for comparable performance. This suggests potential synergies in the combination between LLM and CRM. Motivated by these insights, we propose Collaborative Recommendation with conventional Recommender and Large Language Model (dubbed CoReLLa). In this framework, we first jointly train LLM and CRM and address the issue of decision boundary shifts through alignment loss. Then, the resource-efficient CRM, with a shorter inference time, handles simple and moderate samples, while LLM processes the small subset of challenging samples for CRM. Our experimental results demonstrate that CoReLLa outperforms state-of-the-art CRM and LLM methods significantly, underscoring its effectiveness in recommendation tasks.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.15872 [pdf, other]

RAAMove: A Corpus for Analyzing Moves in Research Article Abstracts

Authors: Hongzheng Li, Ruojin Wang, Ge Shi, Xing Lv, Lei Lei, Chong Feng, Fang Liu, Jinkun Lin, Yangguang Mei, Lingnan Xu

Abstract: Move structures have been studied in English for Specific Purposes (ESP) and English for Academic Purposes (EAP) for decades. However, there are few move annotation corpora for Research Article (RA) abstracts. In this paper, we introduce RAAMove, a comprehensive multi-domain corpus dedicated to the annotation of move structures in RA abstracts. The primary objective of RAAMove is to facilitate move analysis and automatic move identification. This paper provides a thorough discussion of the corpus construction process, including the scheme, data collection, annotation guidelines, and annotation procedures. The corpus is constructed through two stages: initially, expert annotators manually annotate high-quality data; subsequently, based on the human-annotated data, a BERT-based model is employed for automatic annotation with the help of experts' modification. The result is a large-scale and high-quality corpus comprising 33,988 annotated instances. We also conduct preliminary move identification experiments using the BERT-based model to verify the effectiveness of the proposed corpus and model. The annotated corpus is available for academic research purposes and can serve as essential resources for move analysis, English language teaching and writing, as well as move/discourse-related tasks in Natural Language Processing (NLP).

Submitted 23 March, 2024; originally announced March 2024.

Comments: Accepted by LREC-COLING 2024

arXiv:2403.14668 [pdf, other]

Predicting Learning Performance with Large Language Models: A Study in Adult Literacy

Authors: Liang Zhang, Jionghao Lin, Conrad Borchers, John Sabatini, John Hollander, Meng Cao, Xiangen Hu

Abstract: Intelligent Tutoring Systems (ITSs) have significantly enhanced adult literacy training, a key factor for societal participation, employment opportunities, and lifelong learning. Our study investigates the application of advanced AI models, including Large Language Models (LLMs) like GPT-4, for predicting learning performance in adult literacy programs in ITSs. This research is motivated by the potential of LLMs to predict learning performance based on their inherent reasoning and computational capabilities. By using reading comprehension datasets from the ITS, AutoTutor, we evaluate the predictive capabilities of GPT-4 versus traditional machine learning methods in predicting learning performance through five-fold cross-validation techniques. Our findings show that GPT-4 presents competitive predictive abilities with traditional machine learning methods such as Bayesian Knowledge Tracing, Performance Factor Analysis, Sparse Factor Analysis Lite (SPARFA-Lite), tensor factorization and eXtreme Gradient Boosting (XGBoost). While XGBoost (trained on local machine) outperforms GPT-4 in predictive accuracy, GPT-4-selected XGBoost and its subsequent tuning on the GPT-4 platform demonstrates superior performance compared to local machine execution. Moreover, our investigation into hyper-parameter tuning by GPT-4 versus grid-search suggests comparable performance, albeit with less stability in the automated approach, using XGBoost as the case study. Our study contributes to the field by highlighting the potential of integrating LLMs with traditional machine learning models to enhance predictive accuracy and personalize adult literacy education, setting a foundation for future research in applying LLMs within ITSs.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 26th International Conference on Human-Computer Interaction

arXiv:2403.14455 [pdf, other]

Non-Markovian skin effect

Authors: Po-Chen Kuo, Shen-Liang Yang, Neill Lambert, Jhen-Dong Lin, Yi-Te Huang, Franco Nori, Yueh-Nan Chen

Abstract: The Liouvillian skin effect and the non-Hermitian skin effect have both been used to explain the localization of eigenmodes near system boundaries, though the former is arguably more accurate in some regimes due to its incorporation of quantum jumps. However, these frameworks predominantly focus on weak Markovian interactions, neglecting the potentially crucial role of memory effects. To address this, we investigate, utilizing the powerful hierarchical equations of motion method, how a non-Markovian environment can modify the Liouvillian skin effect. We demonstrate that a non-Markovian environment can induce not only a "thick skin effect", where the skin mode broadens and shifts into the bulk, but also skin-mode coherence, leading to the coherence-delocalization and oscillatory relaxation with a characteristic linear scaling with system size. Remarkably, both the skin-mode and steady-state coherence exhibit resistance to decoherence from additional environmental noise. These findings highlight the profound impact of system-bath correlations on relaxation and localization, revealing unique phenomena beyond conventional Markovian approximations.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 15 pages, 9 figures

arXiv:2403.13650 [pdf, ps, other]

doi 10.1016/j.jalgebra.2024.05.022

Certain functional identities on division rings of characteristic two

Authors: Münevver Pınar Eroğlu, Tsiu-Kwen Lee, Jheng-Huei Lin

Abstract: Let $D$ be a noncommutative division ring. In a recent paper, Lee and Lin proved that if $\text{char}\, D\ne 2$, the only solution of additive maps $f, g$ on $D$ satisfying the identity $f(x) = x^n g(x^{-1})$ on $D\setminus \{0\}$ with $n\ne 2$ a positive integer is the trivial case, that is, $f=0$ and $g=0$. Applying Hua's identity and the theory of functional and generalized polynomial identities, we give a complete solution of the same identity for any nonnegative integer $n$ if $\text{char}\, D=2$.

Submitted 20 March, 2024; originally announced March 2024.

MSC Class: 16R60 (Primary) 16R50; 16K40 (Secondary)

Journal ref: J. Algebra 657 (2024) 363-378

arXiv:2403.13244 [pdf]

Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model

Authors: Peng Zhou, Jianmin Wang, Chunyan Li, Zixu Wang, Yiping Liu, Chubo Liu, Siqi Sun, Jianxin Lin, Leyi Wei, Xibao Cai, Houtim Lai, Wei Liu, Longyue Wang, Xiangxiang Zeng, Kenli Li

Abstract: While various models and computational tools have been proposed for structure and property analysis of molecules, generating molecules that conform to all desired structures and properties remains a challenge. Here, we introduce a multi-constraint molecular generation large language model, TSMMG, which, akin to a student, incorporates knowledge from various small models and tools, namely, the 'teachers'. To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these 'teachers', enabling it to generate novel molecules that conform to the descriptions through various text prompts. We experimentally show that TSMMG remarkably performs in generating molecules meeting complex, natural language-described property requirements across two-, three-, and four-constraint tasks, with an average molecular validity of over 99% and success ratio of 82.58%, 68.03%, and 67.48%, respectively. The model also exhibits adaptability through zero-shot testing, creating molecules that satisfy combinations of properties that have not been encountered. It can comprehend text inputs with various language styles, extending beyond the confines of outlined prompts, as confirmed through empirical validation. Additionally, the knowledge distillation feature of TSMMG contributes to the continuous enhancement of small models, while the innovative approach to dataset construction effectively addresses the issues of data scarcity and quality, which positions TSMMG as a promising tool in the domains of drug discovery and materials science.

Submitted 5 August, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 37 pages, 10 figures

arXiv:2403.12804 [pdf, other]

The Bayes Principle and Segal Axioms for $P(φふぁい)_2$, with application to Periodic Covers

Authors: Jiasheng Lin

Abstract: We construct a $P(φふぁい)_2$ Gibbs state on infinite volume periodic surfaces (namely, with discrete "time translations") by analogy with 1-dimensional spin chains and establish the mass gap for our Gibbs state; there are no phase transitions. We also derive asymptotic properties of the $P(φふぁい)_2$ partition function on certain towers of cyclic covers of large degrees that converge to the periodic surface in some appropriate sense. This gives the first construction of an interacting Quantum Field Theory on surfaces of infinite genus with a mass gap. The main ingredient in our approach is to reconcile the so-called $P(φふぁい)_2$ model from classical constructive quantum field theory (CQFT) with the Riemannian version of the axioms proposed by G. Segal in the 90's. We show the $P(φふぁい)_2$ model satisfies these axioms, appropriately adjusted. One key ingredient in our proof is to use what we call "the Bayes principle" of conditional probabilities in the infinite dimensional setting. We also give a precise statement and full proof of the locality of the $P(φふぁい)_2$ interaction.

Submitted 3 August, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 52 pages; v3: shortened presentation, revised introduction; v2 has more appendix

arXiv:2403.12800 [pdf, other]

Learning Neural Volumetric Pose Features for Camera Localization

Authors: Jingyu Lin, Jiaqi Gu, Bojian Wu, Lubin Fan, Renjie Chen, Ligang Liu, Jieping Ye

Abstract: We introduce a novel neural volumetric pose feature, termed PoseMap, designed to enhance camera localization by encapsulating the information between images and the associated camera poses. Our framework leverages an Absolute Pose Regression (APR) architecture, together with an augmented NeRF module. This integration not only facilitates the generation of novel views to enrich the training dataset but also enables the learning of effective pose features. Additionally, we extend our architecture for self-supervised online alignment, allowing our method to be used and fine-tuned for unlabelled images within a unified framework. Experiments demonstrate that our method achieves 14.28% and 20.51% performance gain on average in indoor and outdoor benchmark scenes, outperforming existing APR methods with state-of-the-art accuracy.

Submitted 11 July, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: Accepted at ECCV 2024. Project page: https://gujiaqivadin.github.io/posemap/

arXiv:2403.10134 [pdf, other]

doi 10.1140/epjc/s10052-024-12987-0

Measurement of groomed event shape observables in deep-inelastic electron-proton scattering at HERA

Authors: The H1 collaboration, V. Andreev, M. Arratia, A. Baghdasaryan, A. Baty, K. Begzsuren, A. Bolz, V. Boudry, G. Brandt, D. Britzger, A. Buniatyan, L. Bystritskaya, A. J. Campbell, K. B. Cantun Avila, K. Cerny, V. Chekelian, Z. Chen, J. G. Contreras, J. Cvach, J. B. Dainton, K. Daum, A. Deshpande, C. Diaconu, A. Drees, G. Eckerlin , et al. (123 additional authors not shown)

Abstract: The H1 Collaboration at HERA reports the first measurement of groomed event shape observables in deep inelastic electron-proton scattering (DIS) at $\sqrt{s}=319$ GeV, using data recorded between the years 2003 and 2007 with an integrated luminosity of $351$ pb$^{-1}$. Event shapes provide incisive probes of perturbative and non-perturbative QCD. Grooming techniques have been used for jet measurements in hadronic collisions; this paper presents the first application of grooming to DIS data. The analysis is carried out in the Breit frame, utilizing the novel Centauro jet clustering algorithm that is designed for DIS event topologies. Events are required to have squared momentum-transfer $Q^2 > 150$ GeV$^2$ and inelasticity $ 0.2 < y < 0.7$. We report measurements of the production cross section of groomed event 1-jettiness and groomed invariant mass for several choices of grooming parameter. Monte Carlo model calculations and analytic calculations based on Soft Collinear Effective Theory are compared to the measurements.

Submitted 1 August, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 32 pages, 17 tables, 7 figures, version as accepted by EPJ C

Report number: DESY-24-036

Journal ref: EPJC 84 (2024), 718

arXiv:2403.10109 [pdf, other]

Measurement of the 1-jettiness event shape observable in deep-inelastic electron-proton scattering at HERA

Authors: The H1 collaboration, V. Andreev, M. Arratia, A. Baghdasaryan, A. Baty, K. Begzsuren, A. Bolz, V. Boudry, G. Brandt, D. Britzger, A. Buniatyan, L. Bystritskaya, A. J. Campbell, K. B. Cantun Avila, K. Cerny, V. Chekelian, Z. Chen, J. G. Contreras, J. Cvach, J. B. Dainton, K. Daum, A. Deshpande, C. Diaconu, A. Drees, G. Eckerlin , et al. (124 additional authors not shown)

Abstract: The H1 Collaboration reports the first measurement of the 1-jettiness event shape observable $τ_1^b$ in neutral-current deep-inelastic electron-proton scattering (DIS). The observable $τ_1^b$ is equivalent to a thrust observable defined in the Breit frame. The data sample was collected at the HERA $ep$ collider in the years 2003-2007 with center-of-mass energy of $\sqrt{s}=319\,\text{GeV}$, corresponding to an integrated luminosity of $351.1\,\text{pb}^{-1}$. Triple differential cross sections are provided as a function of $τ_1^b$, event virtuality $Q^2$, and inelasticity $y$, in the kinematic region $Q^2>150\,\text{GeV}^{2}$. Single differential cross sections are provided as a function of $τ_1^b$ in a limited kinematic range. Double differential cross sections are measured, in contrast, integrated over $τ_1^b$ and represent the inclusive neutral-current DIS cross section measured as a function of $Q^2$ and $y$. The data are compared to a variety of predictions and include classical and modern Monte Carlo event generators, predictions in fixed-order perturbative QCD where calculations up to $\mathcal{O}(α_s^3)$ are available for $τ_1^b$ or inclusive DIS, and resummed predictions at next-to-leading logarithmic accuracy matched to fixed order predictions at $\mathcal{O}(α_s^2)$. These comparisons reveal sensitivity of the 1-jettiness observable to QCD parton shower and resummation effects, as well as the modeling of hadronization and fragmentation.
Within their range of validity, the fixed-order predictions provide a good description of the data. Monte Carlo event generators are predictive over the full measured range and hence their underlying models and parameters can be constrained by comparing to the presented data.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 45 pages, 38 tables, 13 figures

Report number: DESY-24-035

arXiv:2403.10010 [pdf, other]

doi 10.1103/PhysRevLett.132.131002

Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A

Authors: The LHAASO Collaboration, Zhen Cao, F. Aharonian, Q. An, A. Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen , et al. (256 additional authors not shown)

Abstract: We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be -$2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is -$3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is almost heavier than helium in the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of -$0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in abundance of light components. Above the knee, the mean logarithmic mass exhibits a power law trend towards heavier components, which is reversal to the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components.

Submitted 26 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 8 pages, 3 figures

Journal ref: Physical Review Letters 132, 131002 (2024)

arXiv:2403.09464 [pdf, other]

doi 10.1051/0004-6361/202348460

New constraints on Triton's atmosphere from the 6 October 2022 stellar occultation

Authors: Ye Yuan, Chen Zhang, Fan Li, Jian Chen, Yanning Fu, Chunhai Bai, Xing Gao, Yong Wang, Tuhong Zhong, Yixing Gao, Liang Wang, Donghua Chen, Yixing Zhang, Yang Zhang, Wenpeng Xie, Shupi Zhang, Ding Liu, Jun Cao, Xiangdong Yin, Xiaojun Mo, Jing Liu, Xinru Han, Tong Liu, Yuqiang Chen, Zhendong Gao , et al. (25 additional authors not shown)

Abstract: The atmosphere of Triton was probed directly by observing a ground-based stellar occultation on 6 October 2022. This rare event yielded 23 positive light curves collected from 13 separate observation stations contributing to our campaign. The significance of this event lies in its potential to directly validate the modest pressure fluctuation on Triton, a phenomenon not definitively verified by previous observations, including only five stellar occultations, and the Voyager 2 radio occultation in 1989. Using an approach consistent with a comparable study, we precisely determined a surface pressure of $14.07_{-0.13}^{+0.21}~\mathrm{μbar}$ in 2022. This new pressure rules out any significant monotonic variation in pressure between 2017 and 2022 through direct observations, as it is in alignment with the 2017 value. Additionally, both the pressures in 2017 and 2022 align with the 1989 value. This provides further support for the conclusion drawn from the previous volatile transport model simulation, which is consistent with the observed alignment between the pressures in 1989 and 2017; that is to say, the pressure fluctuation is modest. Moreover, this conclusion suggests the existence of a northern polar cap extended down to at least $45^\circ$N$-60^\circ$N and the presence of nitrogen between $30^\circ$S and $0^\circ$.

Submitted 24 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: Astronomy & Astrophysics, in press. 9 pages, 2 figures, 3 tables

Journal ref: A&A 684, L13 (2024)

arXiv:2403.08982 [pdf, other]

doi 10.1140/epjc/s10052-024-13003-1

Observation and differential cross section measurement of neutral current DIS events with an empty hemisphere in the Breit frame

Authors: The H1 collaboration, V. Andreev, M. Arratia, A. Baghdasaryan, A. Baty, K. Begzsuren, A. Bolz, V. Boudry, G. Brandt, D. Britzger, A. Buniatyan, L. Bystritskaya, A. J. Campbell, K. B. Cantun Avila, K. Cerny, V. Chekelian, Z. Chen, J. G. Contreras, J. Cvach, J. B. Dainton, K. Daum, A. Deshpande, C. Diaconu, A. Drees, G. Eckerlin , et al. (124 additional authors not shown)

Abstract: The Breit frame provides a natural frame to analyze lepton-proton scattering events. In this reference frame, the parton model hard interactions between a quark and an exchanged boson defines the coordinate system such that the struck quark is back-scattered along the virtual photon momentum direction. In Quantum Chromodynamics (QCD), higher order perturbative or non-perturbative effects can change this picture drastically. As Bjorken-$x$ decreases below one half, a rather peculiar event signature is predicted with increasing probability, where no radiation is present in one of the two Breit-frame hemispheres and all emissions are to be found in the other hemisphere. At higher orders in $α_s$ or in the presence of soft QCD effects, predictions of the rate of these events are far from trivial, and that motivates measurements with real data. We report on the first observation of the empty current hemisphere events in electron-proton collisions at the HERA collider using data recorded with the H1 detector at a center-of-mass energy of 319 GeV. The fraction of inclusive neutral-current DIS events with an empty hemisphere is found to be $0.0112 \pm 3.9\,\%_\text{stat} \pm 4.5\,\%_\text{syst} \pm 1.6\,\%_\text{mod}$ in the selected kinematic region of $150< Q^2<1500$ GeV$^2$ and inelasticity $0.14< y<0.7$. The data sample corresponds to an integrated luminosity of 351.1 pb$^{-1}$, sufficient to enable differential cross section measurements of these events.
The results show an enhanced discriminating power at lower Bjorken-$x$ among different Monte Carlo event generator predictions.

Submitted 1 August, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: 13 pages, 5 figures, 2 Tables. This version as accepted for publication

Report number: DESY-24-034

Journal ref: EPJC 84 (2024), 720

arXiv:2403.08347 [pdf, other]

1/f frequency fluctuations due to kinetic inductance in CoSi$_2$ microwave cavities

Authors: Weijun Zeng, Ilari Lilja, Ekaterina Mukhanova, Elica Heredia, Chun-Wei Wu, Juhn-Jong Lin, Sheng-Shiuan Yeh, Pertti Hakonen

Abstract: Cobalt disilicide provides a promising nearly-epitaxial superconducting material on silicon, which is compatible with high-density integrated circuit technology. We have characterized CoSi$_{2}$ superconducting microwave cavities around 5.5 GHz for resonance frequency fluctuations at temperatures 10 - 200 mK. We found relatively weak fluctuations $(δf/f)^2$ following the spectral density $A/f^γ$, with $A \simeq 6 \times 10^{-16}$ and $γ$ slightly below 1 at an average number of photons of $10^4$; the noise decreased with measurement power as $1/P^{1/2}$. We identify the noise as arising from kinetic inductance fluctuations and discuss possible origins of such fluctuations.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 6 pages, 3 figures

arXiv:2403.08266 [pdf, other]

Sketch2Manga: Shaded Manga Screening from Sketch with Diffusion Models

Authors: Jian Lin, Xueting Liu, Chengze Li, Minshan Xie, Tien-Tsin Wong

Abstract: While manga is a popular entertainment form, creating manga is tedious, especially adding screentones to the created sketch, namely manga screening. Unfortunately, there is no existing method tailored for automatic manga screening, probably due to the difficulty of generating high-quality shaded high-frequency screentones. The classic manga screening approaches generally require user input to provide screentone exemplars or a reference manga image. Recent deep learning models enable automatic generation by learning from a large-scale dataset. However, the state-of-the-art models still fail to generate high-quality shaded screentones due to the lack of a tailored model and high-quality manga training data. In this paper, we propose a novel sketch-to-manga framework that first generates a color illustration from the sketch and then generates a screentoned manga based on the intensity guidance. Our method significantly outperforms existing methods in generating high-quality manga with shaded high-frequency screentones.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 7 pages, 6 figures

ACM Class: I.4.6; I.3.3; I.3.8

arXiv:2403.06764 [pdf, other]

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Authors: Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, Baobao Chang

Abstract: In this study, we identify the inefficient attention phenomena in Large Vision-Language Models (LVLMs), notably within prominent models like LLaVA-1.5, QwenVL-Chat and Video-LLaVA. We find that the attention computation over visual tokens is extremely inefficient in the deep layers of popular LVLMs, suggesting a need for a sparser approach compared to textual data handling. To this end, we introduce FastV, a versatile plug-and-play method designed to optimize computational efficiency by learning adaptive attention patterns in early layers and pruning visual tokens in subsequent ones. Our evaluations demonstrate FastV's ability to dramatically reduce computational costs (e.g., a 45% reduction in FLOPs for LLaVA-1.5-13B) without sacrificing performance in a wide range of image and video understanding tasks. The computational efficiency and performance trade-off of FastV are highly customizable and pareto-efficient. It can compress the FLOPs of a 13B-parameter model to achieve a lower budget than that of a 7B-parameter model, while still maintaining superior performance. We believe FastV has practical values for deployment of LVLMs in edge devices and commercial models. Code is released at https://github.com/pkunlp-icler/FastV.

Submitted 25 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: 21 pages, 8 figures, code is released at https://github.com/pkunlp-icler/FastV

arXiv:2403.05808 [pdf, other]

Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution

Authors: Junxiong Lin, Yan Wang, Zeng Tao, Boyang Wang, Qing Zhao, Haorang Wang, Xuan Tong, Xinji Mai, Yuxuan Lin, Wei Song, Jiawen Yu, Shaoqi Yan, Wenqiang Zhang

Abstract: Pre-trained diffusion models utilized for image generation encapsulate a substantial reservoir of a priori knowledge pertaining to intricate textures. Harnessing the potential of leveraging this a priori knowledge in the context of image super-resolution presents a compelling avenue. Nonetheless, prevailing diffusion-based methodologies presently overlook the constraints imposed by degradation information on the diffusion process. Furthermore, these methods fail to consider the spatial variability inherent in the estimated blur kernel, stemming from factors such as motion jitter and out-of-focus elements in open-environment scenarios. This oversight results in a notable deviation of the image super-resolution effect from fundamental realities. To address these concerns, we introduce a framework known as Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution (SSR). Within the SSR framework, we propose a Spatially Variant Kernel Refinement (SVKR) module. SVKR estimates a Depth-Informed Kernel, which takes the depth information into account and is spatially variant. Additionally, SVKR enhances the accuracy of depth information acquired from LR images, allowing for mutual enhancement between the depth map and blur kernel estimates. Finally, we introduce the Adaptive Multi-Modal Fusion (AMF) module to align the information from three modalities: low-resolution images, depth maps, and blur kernels.
This alignment can constrain the diffusion model to generate more authentic SR results.

Submitted 9 July, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.04628 [pdf, other]

On the extinction of multiple shocks in scalar viscous conservation laws

Authors: Jeanne Lin, Dmitry E. Pelinovsky, Bjorn de Rijk

Abstract: We are interested in the dynamics of interfaces, or zeros, of shock waves in general scalar viscous conservation laws with a locally Lipschitz continuous flux function, such as the modular Burgers' equation. We prove that all interfaces coalesce within finite time, leaving behind either a single interface or no interface at all. Our proof relies on mass and energy estimates, regularization of the flux function, and an application of the Sturm theorems on the number of zeros of solutions of parabolic problems. Our analysis yields an explicit upper bound on the time of extinction in terms of the initial condition and the flux function. Moreover, in the case of a smooth flux function, we characterize the generic bifurcations arising at a coalescence event with and without the presence of odd symmetry. We identify associated scaling laws describing the local interface dynamics near collision. Finally, we present an extension of these results to the case of anti-shock waves converging to asymptotic limits of opposite signs. Our analysis is corroborated by numerical simulations in the modular Burgers' equation and its regularizations.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 43 pages; 13 figures

arXiv:2403.04514 [pdf, other]

A finite element contour integral method for computing the resonances of metallic grating structures with subwavelength holes

Authors: Yingxia Xi, Junshan Lin, Jiguang Sun

Abstract: We consider the numerical computation of resonances for metallic grating structures with dispersive media and small slit holes. The underlying eigenvalue problem is nonlinear and the mathematical model is multiscale due to the existence of several length scales in problem geometry and material contrast. We discretize the partial differential equation model over the truncated domain using the finite element method and develop a multi-step contour integral eigensolver to compute the resonances. The eigensolver first locates eigenvalues using a spectral indicator and then computes eigenvalues by a subspace projection scheme. The proposed numerical method is robust and scalable, and does not require initial guess as the iteration methods. Numerical examples are presented to demonstrate its effectiveness.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 21 pages and 27 figures

MSC Class: 35B34; 68U01; 68W25

arXiv:2403.04294 [pdf, other]

A$^{3}$lign-DFER: Pioneering Comprehensive Dynamic Affective Alignment for Dynamic Facial Expression Recognition with CLIP

Authors: Zeng Tao, Yan Wang, Junxiong Lin, Haoran Wang, Xinji Mai, Jiawen Yu, Xuan Tong, Ziheng Zhou, Shaoqi Yan, Qing Zhao, Liyuan Han, Wenqiang Zhang

Abstract: The performance of CLIP in dynamic facial expression recognition (DFER) task doesn't yield exceptional results as observed in other CLIP-based classification tasks. While CLIP's primary objective is to achieve alignment between images and text in the feature space, DFER poses challenges due to the abstract nature of text and the dynamic nature of video, making label representation limited and perfect alignment difficult. To address this issue, we have designed A$^{3}$lign-DFER, which introduces a new DFER labeling paradigm to comprehensively achieve alignment, thus enhancing CLIP's suitability for the DFER task. Specifically, our A$^{3}$lign-DFER method is designed with multiple modules that work together to obtain the most suitable expanded-dimensional embeddings for classification and to achieve alignment in three key aspects: affective, dynamic, and bidirectional. We replace the input label text with a learnable Multi-Dimensional Alignment Token (MAT), enabling alignment of text to facial expression video samples in both affective and dynamic dimensions. After CLIP feature extraction, we introduce the Joint Dynamic Alignment Synchronizer (JAS), further facilitating synchronization and alignment in the temporal dimension. Additionally, we implement a Bidirectional Alignment Training Paradigm (BAP) to ensure gradual and steady training of parameters for both modalities. Our insightful and concise A$^{3}$lign-DFER method achieves state-of-the-art results on multiple DFER datasets, including DFEW, FERV39k, and MAFW.
Extensive ablation experiments and visualization studies demonstrate the effectiveness of A$^{3}$lign-DFER. The code will be available in the future.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03900 [pdf, other]

Mamba4Rec: Towards Efficient Sequential Recommendation with Selective State Space Models

Authors: Chengkai Liu, Jianghao Lin, Jianling Wang, Hanzhou Liu, James Caverlee

Abstract: Sequential recommendation aims to estimate the dynamic user preferences and sequential dependencies among historical user behaviors. Although Transformer-based models have proven to be effective for sequential recommendation, they suffer from the inference inefficiency problem stemming from the quadratic computational complexity of attention operators, especially for long behavior sequences. Inspired by the recent success of state space models (SSMs), we propose Mamba4Rec, which is the first work to explore the potential of selective SSMs for efficient sequential recommendation. Built upon the basic Mamba block which is a selective SSM with an efficient hardware-aware parallel algorithm, we design a series of sequential modeling techniques to further promote model performance while maintaining inference efficiency. Through experiments on public datasets, we demonstrate how Mamba4Rec effectively tackles the effectiveness-efficiency dilemma, outperforming both RNN- and attention-based baselines in terms of both effectiveness and efficiency. The code is available at https://github.com/chengkai-liu/Mamba4Rec.

Submitted 29 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.03536 [pdf, other]

Towards Efficient and Effective Unlearning of Large Language Models for Recommendation

Authors: Hangyu Wang, Jianghao Lin, Bo Chen, Yang Yang, Ruiming Tang, Weinan Zhang, Yong Yu

Abstract: The significant advancements in large language models (LLMs) give rise to a promising research direction, i.e., leveraging LLMs as recommenders (LLMRec). The efficacy of LLMRec arises from the open-world knowledge and reasoning capabilities inherent in LLMs. LLMRec acquires the recommendation capabilities through instruction tuning based on user interaction data. However, in order to protect user privacy and optimize utility, it is also crucial for LLMRec to intentionally forget specific user data, which is generally referred to as recommendation unlearning. In the era of LLMs, recommendation unlearning poses new challenges for LLMRec in terms of inefficiency and ineffectiveness. Existing unlearning methods require updating billions of parameters in LLMRec, which is costly and time-consuming. Besides, they always impact the model utility during the unlearning process. To this end, we propose E2URec, the first Efficient and Effective Unlearning method for LLMRec. Our proposed E2URec enhances the unlearning efficiency by updating only a few additional LoRA parameters, and improves the unlearning effectiveness by employing a teacher-student framework, where we maintain multiple teacher networks to guide the unlearning process. Extensive experiments show that E2URec outperforms state-of-the-art baselines on two real-world datasets. Specifically, E2URec can efficiently forget specific data without affecting recommendation performance.
The source code is at https://github.com/justarter/E2URec.

Submitted 30 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: Accepted by Frontiers of Computer Science

arXiv:2403.02871 [pdf, other]

Quantum Mixed-State Self-Attention Network

Authors: Fu Chen, Qinglin Zhao, Li Feng, Chuangtao Chen, Yangbin Lin, Jianhong Lin

Abstract: The rapid advancement of quantum computing has increasingly highlighted its potential in the realm of machine learning, particularly in the context of natural language processing (NLP) tasks. Quantum machine learning (QML) leverages the unique capabilities of quantum computing to offer novel perspectives and methodologies for complex data processing and pattern recognition challenges. This paper introduces a novel Quantum Mixed-State Attention Network (QMSAN), which integrates the principles of quantum computing with classical machine learning algorithms, especially self-attention networks, to enhance the efficiency and effectiveness in handling NLP tasks. The QMSAN model employs a quantum attention mechanism based on mixed states, enabling efficient direct estimation of similarity between queries and keys within the quantum domain, leading to more effective attention weight acquisition. Additionally, we propose an innovative quantum positional encoding scheme, implemented through fixed quantum gates within the quantum circuit, to enhance the model's accuracy. Experimental validation on various datasets demonstrates that the QMSAN model outperforms existing quantum and classical models in text classification, achieving significant performance improvements. The QMSAN model not only significantly reduces the number of parameters but also exceeds classical self-attention networks in performance, showcasing its strong capability in data representation and information extraction.
Furthermore, our study investigates the model's robustness in different quantum noise environments, showing that QMSAN possesses commendable robustness to low noise.

Submitted 8 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.01991 [pdf, other]

Skater: A Novel Bi-modal Bi-copter Robot for Adaptive Locomotion in Air and Diverse Terrain

Authors: Junxiao Lin, Ruibin Zhang, Neng Pan, Chao Xu, Fei Gao

Abstract: In this letter, we present a novel bi-modal bi-copter robot called Skater, which is adaptable to air and various ground surfaces. Skater consists of a bi-copter moving along its longitudinal direction with two passive wheels on both sides. Using a longitudinally arranged bi-copter as the unified actuation system for both aerial and ground modes, this robot not only keeps a concise and lightweight mechanism but also possesses exceptional terrain traversing capability and strong steering capacity. Moreover, leveraging the vectored thrust characteristic of bi-copters, the Skater can actively generate the centripetal force needed for steering, enabling it to achieve stable movement even on slippery surfaces. Furthermore, we model the comprehensive dynamics of the Skater, analyze its differential flatness, and introduce a controller using nonlinear model predictive control for trajectory tracking. The outstanding performance of the system is verified by extensive real-world experiments and benchmark comparisons.

Submitted 26 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted by IEEE Robotics and Automation Letters (RA-L)

arXiv:2403.01474 [pdf, ps, other]

Companion matrix, Vandermonde matrix, Jordan form, Interpolating Polynomials, and Linear Transformations

Authors: Chi-Kwong Li, Jephian C. -H. Lin

Abstract: Let $\mathbb{F}$ be a field, and let $C$ be the $n\times n$ companion matrix of the monic polynomial $f(x)\in \mathbb{F}[x]$ such that $f(x) = \det(xI-C) = (x - \lambda_1)^{n_1} \cdots (x - \lambda_m)^{n_m}$ for $m$ distinct elements $\lambda_1, \dots, \lambda_m \in \mathbb{F}$. It is shown that there is a generalized Vandermonde matrix $V$ associated with $f(x)$ such that $VCV^{-1}$ is in Jordan form, and the columns of $V^{-1}$ are connected to the Hermite interpolating polynomials, whose higher derivatives will have specific values at $\lambda_1, \dots, \lambda_m$. If $m = n$ and $n_1 = \cdots = n_m = 1$, then the results reduce to the fact that the (classical) Vandermonde matrix $V$ of $\lambda_1, \dots, \lambda_n$ is such that $VCV^{-1}$ is a diagonal matrix and that the columns of $V^{-1}$ correspond to the Lagrange interpolating polynomials. This shows that the results for real polynomials and matrices also hold for polynomials and matrices over an arbitrary field $\mathbb{F}$. Moreover, interpretations and insights of the results are given in terms of linear transformation between $\mathbb{F}^n$ and the linear space of polynomials in $\mathbb{F}[x]$ with degree less than $n$.

Submitted 3 March, 2024; originally announced March 2024.

MSC Class: 15A20; 15A21; 33C45

arXiv:2403.01083 [pdf, other]

Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images

Authors: Shufan Pei, Junhong Lin, Wenxi Liu, Tiesong Zhao, Chia-Wen Lin

Abstract: In addition to low light, night images suffer degradation from light effects (e.g., glare and floodlight). However, existing nighttime visibility enhancement methods generally focus on low-light regions, neglecting or even amplifying the light effects. To address this issue, we propose an Adaptive Multi-scale Fusion network (AMFusion) with infrared and visible images, which designs fusion rules according to different illumination regions. First, we separately fuse spatial and semantic features from infrared and visible images, where the former are used for the adjustment of light distribution and the latter are used for the improvement of detection accuracy. Thereby, we obtain an image free of low light and light effects, which improves the performance of nighttime object detection. Second, we utilize detection features extracted by a pre-trained backbone to guide the fusion of semantic features. To this end, we design a Detection-guided Semantic Fusion Module (DSFM) to bridge the domain gap between detection and semantic features. Third, we propose a new illumination loss to constrain the fused image to have normal light intensity. Experimental results demonstrate the superiority of AMFusion with better visual quality and detection accuracy. The source code will be released after the peer review process.

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2403.00322 [pdf, other]

Model-Based Planning and Control for Terrestrial-Aerial Bimodal Vehicles with Passive Wheels

Authors: Ruibin Zhang, Junxiao Lin, Yuze Wu, Yuman Gao, Chi Wang, Chao Xu, Yanjun Cao, Fei Gao

Abstract: Terrestrial and aerial bimodal vehicles have gained widespread attention due to their cross-domain maneuverability. Nevertheless, their bimodal dynamics significantly increase the complexity of motion planning and control, thus hindering robust and efficient autonomous navigation in unknown environments. To resolve this issue, we develop a model-based planning and control framework for terrestrial-aerial bimodal vehicles. This work begins by deriving a unified dynamic model and the corresponding differential flatness. Leveraging differential flatness, an optimization-based trajectory planner is proposed, which takes into account both solution quality and computational efficiency. Moreover, we design a tracking controller using nonlinear model predictive control based on the proposed unified dynamic model to achieve accurate trajectory tracking and smooth mode transition. We validate our framework through extensive benchmark comparisons and experiments, demonstrating its effectiveness in terms of planning quality and control performance.

Submitted 1 March, 2024; originally announced March 2024.

Comments: Accepted at IROS 2023

arXiv:2402.19167 [pdf, other]

Teaching Large Language Models an Unseen Language on the Fly

Authors: Chen Zhang, Xiao Liu, Jiuheng Lin, Yansong Feng

Abstract: Existing large language models struggle to support numerous low-resource languages, particularly the extremely low-resource ones, for which there is minimal training data available for effective parameter updating. We thus investigate whether LLMs can learn a new language on the fly solely through prompting. To study this question, we collect a research suite for Zhuang, a language supported by no LLMs currently. We introduce DiPMT++, a framework for adapting LLMs to unseen languages by in-context learning. Using a dictionary and 5K parallel sentences only, DiPMT++ significantly enhances the performance of GPT-4 from 0 to 16 BLEU for Chinese-to-Zhuang translation and achieves 32 BLEU for Zhuang-to-Chinese translation. We also validate the effectiveness of our framework on Kalamang, another unseen language. Furthermore, we demonstrate the practical utility of DiPMT++ in aiding humans in translating completely unseen languages, which could contribute to the preservation of linguistic diversity.

Submitted 13 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: ACL 2024 https://github.com/luciusssss/ZhuangBench

arXiv:2402.18949 [pdf, other]

Improving Group Connectivity for Generalization of Federated Deep Learning

Authors: Zexi Li, Jie Lin, Zhiqi Li, Didi Zhu, Chao Wu

Abstract: Federated learning (FL) involves multiple heterogeneous clients collaboratively training a global model via iterative local updates and model fusion. The generalization of FL's global model has a large gap compared with centralized training, which is its bottleneck for broader applications. In this paper, we study and improve FL's generalization through a fundamental ``connectivity'' perspective, which means how the local models are connected in the parameter region and fused into a generalized global model. The term ``connectivity'' is derived from linear mode connectivity (LMC), studying the interpolated loss landscape of two different solutions (e.g., modes) of neural networks. Bridging the gap between LMC and FL, in this paper, we leverage fixed anchor models to empirically and theoretically study the transitivity property of connectivity from two models (LMC) to a group of models (model fusion in FL). Based on the findings, we propose FedGuCci and FedGuCci+, improving group connectivity for better generalization. It is shown that our methods can boost the generalization of FL under client heterogeneity across various tasks (4 CV datasets and 6 NLP datasets), models (both convolutional and transformer-based), and training paradigms (both from-scratch and pretrain-finetune).

Submitted 29 February, 2024; originally announced February 2024.

Comments: Preprint

arXiv:2402.18307 [pdf, other]

Feature Denoising For Low-Light Instance Segmentation Using Weighted Non-Local Blocks

Authors: Joanne Lin, Nantheera Anantrasirichai, David Bull

Abstract: Instance segmentation for low-light imagery remains largely unexplored due to the challenges imposed by such conditions, for example shot noise due to low photon count, color distortions and reduced contrast. In this paper, we propose an end-to-end solution to address this challenging task. Based on Mask R-CNN, our proposed method implements weighted non-local (NL) blocks in the feature extractor. This integration enables an inherent denoising process at the feature level. As a result, our method eliminates the need for aligned ground truth images during training, thus supporting training on real-world low-light datasets. We introduce additional learnable weights at each layer in order to enhance the network's adaptability to real-world noise characteristics, which affect different feature scales in different ways. Experimental results show that the proposed method outperforms the pretrained Mask R-CNN with an Average Precision (AP) improvement of +10.0, with the introduction of weighted NL Blocks further enhancing AP by +1.0.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.18302 [pdf, other]

EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving

Authors: Jiacheng Lin, Jiajun Chen, Kunyu Peng, Xuan He, Zhiyong Li, Rainer Stiefelhagen, Kailun Yang

Abstract: This paper introduces the task of Auditory Referring Multi-Object Tracking (AR-MOT), which dynamically tracks specific objects in a video sequence based on audio expressions and appears as a challenging problem in autonomous driving. Due to the lack of semantic modeling capacity in audio and video, existing works have mainly focused on text-based multi-object tracking, which often comes at the cost of tracking quality, interaction efficiency, and even the safety of assistance systems, limiting the application of such methods in autonomous driving. In this paper, we delve into the problem of AR-MOT from the perspective of audio-video fusion and audio-video tracking. We put forward EchoTrack, an end-to-end AR-MOT framework with dual-stream vision transformers. The dual streams are intertwined with our Bidirectional Frequency-domain Cross-attention Fusion Module (Bi-FCFM), which bidirectionally fuses audio and video features from both frequency- and spatiotemporal domains. Moreover, we propose the Audio-visual Contrastive Tracking Learning (ACTL) regime to extract homogeneous semantic features between expressions and visual objects by learning homogeneous features between different audio and video objects effectively. Aside from the architectural design, we establish the first set of large-scale AR-MOT benchmarks, including Echo-KITTI, Echo-KITTI+, and Echo-BDD. Extensive experiments on the established benchmarks demonstrate the effectiveness of the proposed EchoTrack and its components.
The source code and datasets are available at https://github.com/lab206/EchoTrack.

Submitted 5 August, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS). The source code and datasets are available at https://github.com/lab206/EchoTrack

arXiv:2402.17785 [pdf, other]

ByteComposer: a Human-like Melody Composition Method based on Language Model Agent

Authors: Xia Liang, Xingjian Du, Jiaju Lin, Pei Zou, Yuan Wan, Bilei Zhu

Abstract: Large Language Models (LLMs) have shown encouraging progress in multimodal understanding and generation tasks. However, how to design a human-aligned and interpretable melody composition system is still under-explored. To solve this problem, we propose ByteComposer, an agent framework emulating a human's creative pipeline in four separate steps: "Conception Analysis - Draft Composition - Self-Evaluation and Modification - Aesthetic Selection". This framework seamlessly blends the interactive and knowledge-understanding features of LLMs with existing symbolic music generation models, thereby achieving a melody composition agent comparable to human creators. We conduct extensive experiments on GPT4 and several open-source large language models, which substantiate our framework's effectiveness. Furthermore, professional music composers were engaged in multi-dimensional evaluations; the final results demonstrated that, across various facets of music composition, the ByteComposer agent attains the level of a novice melody composer.

Submitted 6 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.17427 [pdf, other]

VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction

Authors: Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen Xu, Youliang Yan, Wenming Yang

Abstract: Existing NeRF-based methods for large scene reconstruction often have limitations in visual quality and rendering speed. While the recent 3D Gaussian Splatting works well on small-scale and object-centric scenes, scaling it up to large scenes poses challenges due to limited video memory, long optimization time, and noticeable appearance variations. To address these challenges, we present VastGaussian, the first method for high-quality reconstruction and real-time rendering on large scenes based on 3D Gaussian Splatting. We propose a progressive partitioning strategy to divide a large scene into multiple cells, where the training cameras and point cloud are properly distributed with an airspace-aware visibility criterion. These cells are merged into a complete scene after parallel optimization. We also introduce decoupled appearance modeling into the optimization process to reduce appearance variations in the rendered images. Our approach outperforms existing NeRF-based methods and achieves state-of-the-art results on multiple large scene datasets, enabling fast optimization and high-fidelity real-time rendering.

Submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted to CVPR 2024. Project website: https://vastgaussian.github.io

arXiv:2402.16131 [pdf, other]

A VAE-based Framework for Learning Multi-Level Neural Granger-Causal Connectivity

Authors: Jiahe Lin, Huitian Lei, George Michailidis

Abstract: Granger causality has been widely used in various application domains to capture lead-lag relationships amongst the components of complex dynamical systems, and the focus in extant literature has been on a single dynamical system. In certain applications in macroeconomics and neuroscience, one has access to data from a collection of related such systems, wherein the modeling task of interest is to extract the shared common structure that is embedded across them, as well as to identify the idiosyncrasies within individual ones. This paper introduces a Variational Autoencoder (VAE) based framework that jointly learns Granger-causal relationships amongst components in a collection of related-yet-heterogeneous dynamical systems, and handles the aforementioned task in a principled way. The performance of the proposed framework is evaluated on several synthetic data settings and benchmarked against existing approaches designed for individual system learning. The method is further illustrated on a real dataset involving time series data from a neurophysiological experiment and produces interpretable results.

Submitted 25 February, 2024; originally announced February 2024.

Comments: Accepted by Transactions on Machine Learning Research

arXiv:2402.15943 [pdf]

Rethinking Software Engineering in the Foundation Model Era: A Curated Catalogue of Challenges in the Development of Trustworthy FMware

Authors: Ahmed E. Hassan, Dayi Lin, Gopi Krishnan Rajbahadur, Keheliya Gallaba, Filipe R. Cogo, Boyuan Chen, Haoxiang Zhang, Kishanthan Thangarajah, Gustavo Ansaldi Oliva, Jiahuei Lin, Wali Mohammad Abdullah, Zhen Ming Jiang

Abstract: Foundation models (FMs), such as Large Language Models (LLMs), have revolutionized software development by enabling new use cases and business models. We refer to software built using FMs as FMware. The unique properties of FMware (e.g., prompts, agents, and the need for orchestration), coupled with the intrinsic limitations of FMs (e.g., hallucination), lead to a completely new set of software engineering challenges. Based on our industrial experience, we identified 10 key SE4FMware challenges that have caused enterprise FMware development to be unproductive, costly, and risky. In this paper, we discuss these challenges in detail and state the path for innovation that we envision. Next, we present FMArts, which is our long-term effort towards creating a cradle-to-grave platform for the engineering of trustworthy FMware. Finally, we (i) show how the unique properties of FMArts enabled us to design and develop a complex FMware for a large customer in a timely manner and (ii) discuss the lessons that we learned in doing so. We hope that the disclosure of the aforementioned challenges and our associated efforts to tackle them will not only raise awareness but also promote deeper and further discussions, knowledge sharing, and innovative solutions across the software engineering discipline.

Submitted 3 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.14594 [pdf, other]

Improving Assessment of Tutoring Practices using Retrieval-Augmented Generation

Authors: Zifei FeiFei Han, Jionghao Lin, Ashish Gurung, Danielle R. Thomas, Eason Chen, Conrad Borchers, Shivang Gupta, Kenneth R. Koedinger

Abstract: One-on-one tutoring is an effective instructional method for enhancing learning, yet its efficacy hinges on tutor competencies. Novice math tutors often prioritize content-specific guidance, neglecting aspects such as social-emotional learning. Social-emotional learning promotes equity and inclusion and nurturing relationships with students, which is crucial for holistic student development. Assessing the competencies of tutors accurately and efficiently can drive the development of tailored tutor training programs. However, evaluating novice tutor ability during real-time tutoring remains challenging as it typically requires experts-in-the-loop. To address this challenge, this preliminary study aims to harness Generative Pre-trained Transformers (GPT), such as GPT-3.5 and GPT-4 models, to automatically assess tutors' ability to use social-emotional tutoring strategies. Moreover, this study also reports on the financial dimensions and considerations of employing these models in real-time and at scale for automated assessment. The current study examined four prompting strategies: two basic Zero-shot prompt strategies, Tree of Thought prompt, and Retrieval-Augmented Generator (RAG) based prompt. The results indicate that the RAG prompt demonstrated more accurate performance (assessed by the level of hallucination and correctness in the generated assessment texts) and lower financial costs than the other strategies evaluated.
These findings inform the development of personalized tutor training interventions to enhance the educational effectiveness of tutored learning.

Submitted 4 February, 2024; originally announced February 2024.

Comments: 11 page Workshop paper, AAAI2024 Workshop on AI for Education - Bridging Innovation and Responsibility, Large Language Model, Personalized Tutor Training, Automatic Assessment

arXiv:2402.14307 [pdf, other]

An FPGA-Based Accelerator Enabling Efficient Support for CNNs with Arbitrary Kernel Sizes

Authors: Miaoxin Wang, Xiao Wu, Jun Lin, Zhongfeng Wang

Abstract: Convolutional neural networks (CNNs) with large kernels, drawing inspiration from the key operations of vision transformers (ViTs), have demonstrated impressive performance in various vision-based applications. To address the issue of computational efficiency degradation in existing designs for supporting large-kernel convolutions, an FPGA-based inference accelerator is proposed for the efficient deployment of CNNs with arbitrary kernel sizes. Firstly, a Z-flow method is presented to optimize the computing data flow by maximizing data reuse opportunity. Besides, the proposed design, incorporating the kernel-segmentation (Kseg) scheme, enables extended support for large-kernel convolutions, significantly reducing the storage requirements for overlapped data. Moreover, based on the analysis of typical block structures in emerging CNNs, vertical-fused (VF) and horizontal-fused (HF) methods are developed to optimize CNN deployments from both computation and transmission perspectives. The proposed hardware accelerator, evaluated on Intel Arria 10 FPGA, achieves up to 3.91 times better DSP efficiency than prior art on the same network. Particularly, it demonstrates efficient support for large-kernel CNNs, achieving throughputs of 169.68 GOPS and 244.55 GOPS for RepLKNet-31 and PyConvResNet-50, respectively, both of which are implemented on hardware for the first time.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 5 pages, 4 figures. This work has been accepted by 2024 IEEE International Symposium on Circuits and Systems (ISCAS 2024) as a regular paper

arXiv:2402.13794 [pdf, ps, other]

Revisiting Convergence of AdaGrad with Relaxed Assumptions

Authors: Yusu Hong, Junhong Lin

Abstract: In this study, we revisit the convergence of AdaGrad with momentum (covering AdaGrad as a special case) on non-convex smooth optimization problems. We consider a general noise model where the noise magnitude is controlled by the function value gap together with the gradient magnitude. This model encompasses a broad range of noises including bounded noise, sub-Gaussian noise, affine variance noise and the expected smoothness, and it has been shown to be more realistic in many practical applications. Our analysis yields a probabilistic convergence rate which, under the general noise, can reach $\tilde{\mathcal{O}}(1/\sqrt{T})$. This rate does not rely on prior knowledge of problem parameters and can accelerate to $\tilde{\mathcal{O}}(1/T)$, where $T$ denotes the total number of iterations, when the noise parameters related to the function value gap and noise level are sufficiently small. The convergence rate thus matches the lower rate for stochastic first-order methods over non-convex smooth landscape up to logarithm terms [Arjevani et al., 2023]. We further derive a convergence bound for AdaGrad with momentum, considering the generalized smoothness where the local smoothness is controlled by a first-order function of the gradient norm.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.13638 [pdf, ps, other]

doi 10.1103/PhysRevD.110.026002

Effective four-dimensional loop quantum black hole with a cosmological constant

Authors: Jianhui Lin, Xiangdong Zhang

Abstract: In this paper, we utilize the effective corrections of the $\bar{\mu}$-scheme in loop quantum black holes to obtain a 4-dimensional spherically symmetric metric with a cosmological constant. By imposing the areal gauge on the components of Ashtekar variables in the classical theory and applying the holonomy corrections, we derive the equations of motion, which can be solved to obtain the expression for the effective metric in the Painlevé-Gullstrand coordinates. Compared to the classical dS (AdS) spacetime, the LQG correction sets an upper bound on the cosmological constant as $\Lambda<\frac{3}{\gamma^2\Delta}$. The thermodynamic properties of black holes have also been calculated. Interestingly, we find that for a small black hole, the temperature of the LQG black hole decreases as the mass decreases, which is quite different from the classical scenario. Moreover, our result shows that a logarithmic term appears as the leading-order correction to the Bekenstein-Hawking entropy. Furthermore, the LQG corrections also introduce an extra phase transition in the black hole's heat capacity at smaller radius.

Submitted 29 May, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: 10 pages, 4 figures; V2, 14 pages, 9 figures

Journal ref: Phys. Rev. D 110, 026002 (2024)

arXiv:2402.13339 [pdf, ps, other]

Criticality in the Luria-Delbrück model with an arbitrary mutation rate

Authors: Deng Pan, Jie Lin, Ariel Amir

Abstract: The Luria-Delbrück model is a classic model of population dynamics with random mutations, that has been used historically to prove that random mutations drive evolution. In typical scenarios, the relevant mutation rate is exceedingly small, and mutants are counted only at the final time point. Here, inspired by recent experiments on DNA repair, we study a mathematical model that is formally equivalent to the Luria-Delbrück setup, with the repair rate $p$ playing the role of mutation rate, albeit taking on large values, of order unity per cell division. We find that although at large times the fraction of repaired cells approaches one, the variance of the number of repaired cells undergoes a phase transition: when $p>1/2$ the variance decreases with time, but, intriguingly, for $p<1/2$ even though the fraction of repaired cells approaches 1, the variance in number of repaired cells increases with time. Analyzing DNA-repair experiments, we find that in order to explain the data the model should also take into account the probability of a successful repair process once it is initiated. Taken together, our work shows how the study of variability can lead to surprising phase-transitions as well as provide biological insights into the process of DNA-repair.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 5 pages, 4 figures

arXiv:2402.12813 [pdf, other]

Scaling Laws Behind Code Understanding Model

Authors: Jiayi Lin, Hande Dong, Yutao Xie, Lei Zhang

Abstract: The scaling law is becoming a fundamental law in many machine learning areas. That is, test error falls off with the power law when increasing training data, model size, and computing resource. However, whether this law is suitable for the task of code understanding is not well studied, and most current language models for code understanding are about 100M parameters, which are relatively "small" compared to large language models. In this paper, we conduct extensive experiments to investigate the scaling law for the code understanding task by varying training data, model size, and computing resource. We validate that the test error of code understanding models falls off with the power law when using larger models, indicating that the scaling law is suitable for the code understanding task. Besides, we apply different scales of models to two downstream code understanding tasks, and find that the performance increases with larger scale of models. Finally, we train a large-scale code understanding model named CoLSBERT with 1.5B parameters on a large dataset using more computing resource, which outperforms previous work by a large margin. We will release our code and the CoLSBERT model when our paper is published.

Submitted 20 February, 2024; originally announced February 2024.

Showing 201–250 of 2,461 results for author: Lin, J