Search | arXiv e-print repository

Affine $\imath$quantum groups and Steinberg varieties of type C

Abstract: We provide a geometric realization of the quasi-split affine $\imath$quantum group of type AIII$_{2n-1}^{(τたう)}$ in terms of equivariant K-groups of non-connected Steinberg varieties of type C. This uses a new Drinfeld type presentation of this affine $\imath$quantum group which admits very nontrivial Serre relations. We then construct à la Springer a family of finite-dimensional standard modules and irreducible modules of this $\imath$quantum group, and provide a composition multiplicity formula of the standard modules.

Submitted 9 July, 2024; originally announced July 2024.

Comments: 47 pages. Comments are welcome

arXiv:2407.00978 [pdf, other]

Hybrid RAG-empowered Multi-modal LLM for Secure Healthcare Data Management: A Diffusion-based Contract Theory Approach

Authors: Cheng Su, Jinbo Wen, Jiawen Kang, Yonghua Wang, Hudan Pan, M. Shamim Hossain

Abstract: Secure data management and effective data sharing have become paramount in the rapidly evolving healthcare landscape. The advancement of generative artificial intelligence has positioned Multi-modal Large Language Models (MLLMs) as crucial tools for managing healthcare data. MLLMs can support multi-modal inputs and generate diverse types of content by leveraging large-scale training on vast amounts of multi-modal data. However, critical challenges persist in developing medical MLLMs, including healthcare data security and freshness issues, affecting the output quality of MLLMs. In this paper, we propose a hybrid Retrieval-Augmented Generation (RAG)-empowered medical MLLMs framework for healthcare data management. This framework leverages a hierarchical cross-chain architecture to facilitate secure data training. Moreover, it enhances the output quality of MLLMs through hybrid RAG, which employs multi-modal metrics to filter various unimodal RAG results and incorporates these retrieval results as additional inputs to MLLMs. Additionally, we employ age of information to indirectly evaluate the data freshness impact of MLLMs and utilize contract theory to incentivize healthcare data holders to share fresh data, mitigating information asymmetry in data sharing. Finally, we utilize a generative diffusion model-based reinforcement learning algorithm to identify the optimal contract for efficient data sharing. Numerical results demonstrate the effectiveness of the proposed schemes, which achieve secure and efficient healthcare data management.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 12 pages, 6 figures

arXiv:2406.11208 [pdf]

Privacy-preserving Pseudonym Schemes for Personalized 3D Avatars in Mobile Social Metaverses

Authors: Cheng Su, Xiaofeng Luo, Zhenmou Liu, Jiawen Kang, Min Hao, Zehui Xiong, Zhaohui Yang, Chongwen Huang

Abstract: The emergence of mobile social metaverses, a novel paradigm bridging physical and virtual realms, has led to the widespread adoption of avatars as digital representations for Social Metaverse Users (SMUs) within virtual spaces. Equipped with immersive devices, SMUs leverage Edge Servers (ESs) to deploy their avatars and engage with other SMUs in virtual spaces. To enhance immersion, SMUs tend to opt for 3D avatars for social interactions. However, existing 3D avatars are typically generated through scanning the real faces of SMUs, which can raise concerns regarding information privacy and security, such as profile identity leakage. To tackle this, we introduce a new framework for personalized 3D avatar construction, leveraging a two-layer network model that provides SMUs with the option to customize their personal avatars for privacy preservation. Specifically, our approach introduces avatar pseudonyms to jointly safeguard the profile and digital identity privacy of the generated avatars. Then, we design a novel metric named Privacy of Personalized Avatars (PoPA) to evaluate the effectiveness of the avatar pseudonyms. To optimize pseudonym resources, we model the pseudonym distribution process as a Stackelberg game and employ Deep Reinforcement Learning (DRL) to learn equilibrium strategies under incomplete information. Simulation results validate the efficacy and feasibility of our proposed schemes for mobile social metaverses.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 6 pages, 4 figures

arXiv:2406.01591 [pdf, other]

DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation

Authors: Chun-Hung Wu, Shih-Hong Chen, Chih-Yao Hu, Hsin-Yu Wu, Kai-Hsin Chen, Yu-You Chen, Chih-Hai Su, Chih-Kuo Lee, Yu-Lun Liu

Abstract: This paper presents Deformable Neural Vessel Representations (DeNVeR), an unsupervised approach for vessel segmentation in X-ray videos without annotated ground truth. DeNVeR uses optical flow and layer separation, enhancing segmentation accuracy and adaptability through test-time training. A key component of our research is the introduction of the XACV dataset, the first X-ray angiography coronary video dataset with high-quality, manually labeled segmentation ground truth. Our evaluation demonstrates that DeNVeR outperforms current state-of-the-art methods in vessel segmentation. This paper marks an advance in medical imaging, providing a robust, data-efficient tool for disease diagnosis and treatment planning and setting a new standard for future research in video vessel segmentation. See our project page for video results at https://kirito878.github.io/DeNVeR/.

Submitted 3 June, 2024; originally announced June 2024.

Comments: Project page: https://kirito878.github.io/DeNVeR/

arXiv:2405.13923 [pdf, other]

Why Not Transform Chat Large Language Models to Non-English?

Authors: Xiang Geng, Ming Zhu, Jiahuan Li, Zhejian Lai, Wei Zou, Shuaijie She, Jiaxin Guo, Xiaofeng Zhao, Yinglu Li, Yuang Li, Chang Su, Yanqing Zhao, Xinglin Lyu, Min Zhang, Jiajun Chen, Hao Yang, Shujian Huang

Abstract: The scarcity of non-English data limits the development of non-English large language models (LLMs). Transforming English-centric LLMs to non-English has been identified as an effective and resource-efficient method. Previous works start from base LLMs and perform knowledge distillation (KD) with data generated by stronger LLMs, e.g. GPT-4. Compared to base LLMs, chat LLMs are further optimized for advanced abilities, e.g. multi-turn conversation and human preference alignment, and thus more powerful in both helpfulness and safety. However, transforming a chat LLM involves two critical issues: (1) How can we effectively transfer advanced abilities without their supervised data? (2) How can we prevent the original knowledge from catastrophic forgetting during transformation? We target these issues by introducing a simple framework called TransLLM. For the first issue, TransLLM divides the transfer problem into some common sub-tasks with the translation chain-of-thought, which uses the translation as the bridge between English and non-English step-by-step. We further enhance the performance of sub-tasks with publicly available data. For the second issue, we propose a method comprising two synergistic components: low-rank adaptation for training to maintain the original LLM parameters, and recovery KD, which utilizes data generated by the chat LLM itself to recover the original knowledge from the frozen parameters. In the experiments, we transform the LLaMA-2-chat-7B to the Thai language. Our method, using only single-turn data, outperforms strong baselines and ChatGPT on multi-turn benchmark MT-bench. Furthermore, our method, without safety data, rejects more harmful queries of safety benchmark AdvBench than both ChatGPT and GPT-4.

Submitted 31 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.11280 [pdf, other]

Joint Analysis of Single-Cell Data across Cohorts with Missing Modalities

Authors: Marianne Arriola, Weishen Pan, Manqi Zhou, Qiannan Zhang, Chang Su, Fei Wang

Abstract: Joint analysis of multi-omic single-cell data across cohorts has significantly enhanced the comprehensive analysis of cellular processes. However, most of the existing approaches for this purpose require access to samples with complete modality availability, which is impractical in many real-world scenarios. In this paper, we propose (Single-Cell Cross-Cohort Cross-Category) integration, a novel framework that learns unified cell representations under domain shift without requiring full-modality reference samples. Our generative approach learns rich cross-modal and cross-domain relationships that enable imputation of these missing modalities. Through experiments on real-world multi-omic datasets, we demonstrate that the framework offers a robust solution to single-cell tasks such as cell type clustering, cell type classification, and feature imputation.

Submitted 18 May, 2024; originally announced May 2024.

Comments: 10 pages, 7 figures, 5 tables

arXiv:2405.06138 [pdf, other]

Seasonality Patterns in 311-Reported Foodborne Illness Cases and Machine Learning-Identified Indications of Foodborne Illnesses from Yelp Reviews, New York City, 2022-2023

Authors: Eden Shaveet, Crystal Su, Daniel Hsu, Luis Gravano

Abstract: Restaurants are critical venues at which to investigate foodborne illness outbreaks due to shared sourcing, preparation, and distribution of foods. Formal channels to report illness after food consumption, such as 311, New York City's non-emergency municipal service platform, are underutilized. Given this, online social media platforms serve as abundant sources of user-generated content that provide critical insights into the needs of individuals and populations. We extracted restaurant reviews and metadata from Yelp to identify potential outbreaks of foodborne illness in connection with consuming food from restaurants. Because the prevalence of foodborne illnesses may increase in warmer months as higher temperatures breed more favorable conditions for bacterial growth, we aimed to identify seasonal patterns in foodborne illness reports from 311 and identify seasonal patterns of foodborne illness from Yelp reviews for New York City restaurants using a Hierarchical Sigmoid Attention Network (HSAN). We found no evidence of significant bivariate associations between any variables of interest. Given the inherent limitations of relying solely on user-generated data for public health insights, it is imperative to complement these sources with other data streams and insights from subject matter experts. Future investigations should involve conducting these analyses at more granular spatial and temporal scales to explore the presence of such differences or associations.

Submitted 9 May, 2024; originally announced May 2024.

Comments: Paper counterpart to flash talk presented at 8th Annual Conference of the UConn Center for mHealth and Social Media, Advancing Public Health and Science with Artificial Intelligence

arXiv:2405.03135 [pdf, other]

CURLING - I. The Influence of Point-like Image Approximation on the Outcomes of Cluster Strong Lens Modeling

Authors: Yushan Xie, Huanyuan Shan, Nan Li, Ran Li, Eric Jullo, Chen Su, Xiaoyue Cao, Jean-Paul Kneib, Ana Acebron, Mengfan He, Ji Yao, Chunxiang Wang, Jiadong Li, Yin Li

Abstract: Cluster-scale strong lensing is a powerful tool for exploring the properties of dark matter and constraining cosmological models. However, due to the complex parameter space, pixelized strong lens modeling in galaxy clusters is computationally expensive, leading to the point-source approximation of strongly lensed extended images, potentially introducing systematic biases. Herein, as the first paper of the ClUsteR strong Lens modelIng for the Next-Generation observations (CURLING) program, we use lensing ray-tracing simulations to quantify the biases and uncertainties arising from the point-like image approximation for JWST-like observations. Our results indicate that the approximation works well for reconstructing the total cluster mass distribution, but can bias the magnification measurements near critical curves and the constraints on the cosmological parameters, the total matter density of the Universe $Ωおめが_{\rm m}$, and dark energy equation of state parameter $w$. To mitigate the biases, we propose incorporating the extended surface brightness distribution of lensed sources into the modeling. This approach reduces the bias in magnification from 46.2 per cent to 0.09 per cent for $μみゅー\sim 1000$. Furthermore, the median values of cosmological parameters align more closely with the fiducial model. In addition to the improved accuracy, we also demonstrate that the constraining power can be substantially enhanced. In conclusion, it is necessary to model cluster-scale strong lenses with pixelized multiple images, especially for estimating the intrinsic luminosity of highly magnified sources and accurate cosmography in the era of high-precision observations.

Submitted 5 May, 2024; originally announced May 2024.

Comments: 12 pages, 8 figures

arXiv:2405.01561 [pdf]

doi 10.5281/zenodo.10899798

Rapid Mobile App Development for Generative AI Agents on MIT App Inventor

Authors: Jaida Gao, Calab Su, Etai Miller, Kevin Lu, Yu Meng

Abstract: The evolution of Artificial Intelligence (AI) stands as a pivotal force shaping our society, finding applications across diverse domains such as education, sustainability, and safety. Leveraging AI within mobile applications makes it easily accessible to the public, catalyzing its transformative potential. In this paper, we present a methodology for the rapid development of AI agent applications using the development platform provided by MIT App Inventor. To demonstrate its efficacy, we share the development journey of three distinct mobile applications: SynchroNet for fostering sustainable communities; ProductiviTeams for addressing procrastination; and iHELP for enhancing community safety. All three applications seamlessly integrate a spectrum of generative AI features, leveraging OpenAI APIs. Furthermore, we offer insights gleaned from overcoming challenges in integrating diverse tools and AI functionalities, aiming to inspire young developers to join our efforts in building practical AI agent applications.

Submitted 31 March, 2024; originally announced May 2024.

Journal ref: Journal of advances in information science and technology 2(3) 1-8, March 2024

arXiv:2405.00088 [pdf, other]

Mass from Nothing

Authors: Paul Romatschke, Chun-Wei Su, Ryan Weller

Abstract: We study the Abelian Higgs model with multiple scalar fields, but without mass terms. Solving the model non-perturbatively order-by-order in the number of scalar fields, we find that radiative corrections generate masses for the scalar and gauge boson, without spontaneous symmetry breaking. The mass scales are set by the $Λらむだ$-parameter of the electroweak running coupling, thereby naturally avoiding the hierarchy problem. No part of our calculation employs a weak-coupling expansion, and we find that the perturbative vacuum is metastable, and hence must decay to the stable non-perturbative vacuum of the theory, which we identify. Although the field content of our Lagrangian is standard, our results predict the existence of two heavy scalar resonances in addition to the Higgs. We believe that these predicted resonances will ultimately allow experimentalists to discriminate between our method and standard solutions of the Higgs model.

Submitted 30 April, 2024; originally announced May 2024.

Comments: 25 pages, no figures; comments, criticisms and citation requests welcome!

arXiv:2404.11982 [pdf, other]

doi 10.1145/3626772.3657747

SIGformer: Sign-aware Graph Transformer for Recommendation

Authors: Sirui Chen, Jiawei Chen, Sheng Zhou, Bohao Wang, Shen Han, Chanfei Su, Yuqing Yuan, Can Wang

Abstract: In recommender systems, most graph-based methods focus on positive user feedback, while overlooking the valuable negative feedback. Integrating both positive and negative feedback to form a signed graph can lead to a more comprehensive understanding of user preferences. However, the existing efforts to incorporate both types of feedback are sparse and face two main limitations: 1) They process positive and negative feedback separately, which fails to holistically leverage the collaborative information within the signed graph; 2) They rely on MLPs or GNNs for information extraction from negative feedback, which may not be effective. To overcome these limitations, we introduce SIGformer, a new method that applies the transformer architecture to sign-aware graph-based recommendation. SIGformer incorporates two innovative positional encodings that capture the spectral properties and path patterns of the signed graph, enabling the full exploitation of the entire graph. Our extensive experiments across five real-world datasets demonstrate the superiority of SIGformer over state-of-the-art methods. The code is available at https://github.com/StupidThree/SIGformer.

Submitted 6 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: Accepted by SIGIR 2024

arXiv:2404.03414 [pdf, other]

Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought

Authors: Jooyoung Lee, Fan Yang, Thanh Tran, Qian Hu, Emre Barut, Kai-Wei Chang, Chengwei Su

Abstract: We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., <1B) language model (LM) for guiding a black-box large (i.e., >10B) LM in reasoning tasks. Specifically, the lightweight LM first generates a rationale for each input instance. The frozen large LM is then prompted to predict a task output based on the rationale generated by the lightweight LM. Our approach is resource-efficient in the sense that it only requires training the lightweight LM. We optimize the model through 1) knowledge distillation and 2) reinforcement learning from rationale-oriented and task-oriented reward signals. We assess our method with multi-hop extractive question answering (QA) benchmarks, HotpotQA and 2WikiMultiHopQA. Experimental results show that our approach outperforms all baselines regarding answer prediction accuracy. We also find that reinforcement learning helps the model to produce higher-quality rationales with improved QA performance.

Submitted 4 April, 2024; originally announced April 2024.

Comments: This paper is accepted to LREC-COLING 2024

arXiv:2404.00095 [pdf, other]

GDA: Generalized Diffusion for Robust Test-time Adaptation

Authors: Yun-Yun Tsai, Fu-Chen Chen, Albert Y. C. Chen, Junfeng Yang, Che-Chun Su, Min Sun, Cheng-Hao Kuo

Abstract: Machine learning models struggle with generalization when encountering out-of-distribution (OOD) samples with unexpected distribution shifts. For vision tasks, recent studies have shown that test-time adaptation employing diffusion models can achieve state-of-the-art accuracy improvements on OOD samples by generating new samples that align with the model's domain without the need to modify the model's weights. Unfortunately, those studies have primarily focused on pixel-level corruptions, thereby lacking the generalization to adapt to a broader range of OOD types. We introduce Generalized Diffusion Adaptation (GDA), a novel diffusion-based test-time adaptation method robust against diverse OOD types. Specifically, GDA iteratively guides the diffusion by applying a marginal entropy loss derived from the model, in conjunction with style and content preservation losses during the reverse sampling process. In other words, GDA considers the model's output behavior with the semantic information of the samples as a whole, which can reduce ambiguity in downstream tasks during the generation process. Evaluation across various popular model architectures and OOD benchmarks shows that GDA consistently outperforms prior work on diffusion-driven adaptation. Notably, it achieves the highest classification accuracy improvements, ranging from 4.4\% to 5.02\% on ImageNet-C and 2.5\% to 7.4\% on Rendition, Sketch, and Stylized benchmarks. This performance highlights GDA's generalization to a broader range of OOD benchmarks.

Submitted 2 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

arXiv:2403.14118 [pdf, other]

From Handcrafted Features to LLMs: A Brief Survey for Machine Translation Quality Estimation

Authors: Haofei Zhao, Yilun Liu, Shimin Tao, Weibin Meng, Yimeng Chen, Xiang Geng, Chang Su, Min Zhang, Hao Yang

Abstract: Machine Translation Quality Estimation (MTQE) is the task of estimating the quality of machine-translated text in real time without the need for reference translations, which is of great importance for the development of MT. After two decades of evolution, QE has yielded a wealth of results. This article provides a comprehensive overview of QE datasets, annotation methods, shared tasks, methodologies, challenges, and future research directions. It begins with an introduction to the background and significance of QE, followed by an explanation of the concepts and evaluation metrics for word-level QE, sentence-level QE, document-level QE, and explainable QE. The paper categorizes the methods developed throughout the history of QE into those based on handcrafted features, deep learning, and Large Language Models (LLMs), with a further division of deep learning-based methods into classic deep learning and those incorporating pre-trained language models (LMs). Additionally, the article details the advantages and limitations of each method and offers a straightforward comparison of different approaches. Finally, the paper discusses the current challenges in QE research and provides an outlook on future research directions.

Submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted by IJCNN 2024

arXiv:2403.12370 [pdf, other]

XPose: eXplainable Human Pose Estimation

Authors: Luyu Qiu, Jianing Li, Lei Wen, Chi Su, Fei Hao, Chen Jason Zhang, Lei Chen

Abstract: Current approaches in pose estimation primarily concentrate on enhancing model architectures, often overlooking the importance of comprehensively understanding the rationale behind model decisions. In this paper, we propose XPose, a novel framework that incorporates Explainable AI (XAI) principles into pose estimation. This integration aims to elucidate the individual contribution of each keypoint to the final prediction, thereby elevating the model's transparency and interpretability. Conventional XAI techniques have predominantly addressed single-target tasks like classification. Additionally, the application of the Shapley value, a common measure in XAI, to pose estimation has been hindered by prohibitive computational demands. To address these challenges, this work introduces an innovative concept called Group Shapley Value (GSV). This approach strategically organizes keypoints into clusters based on their interdependencies. Within these clusters, GSV meticulously calculates the Shapley value for keypoints, while for inter-cluster keypoints, it opts for a more holistic group-level valuation. This dual-level computation framework meticulously assesses keypoint contributions to the final outcome, optimizing computational efficiency. Building on the insights into keypoint interactions, we devise a novel data augmentation technique known as Group-based Keypoint Removal (GKR). This method ingeniously removes individual keypoints during training phases, deliberately preserving those with strong mutual connections, thereby refining the model's predictive prowess for non-visible keypoints. The empirical validation of GKR across a spectrum of standard approaches attests to its efficacy. GKR's success demonstrates how using Explainable AI (XAI) can directly enhance pose estimation models.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.08599 [pdf, other]

The role of susceptible individuals in spreading dynamics

Authors: Chang Su, Fang Zhou, Linyuan Lü

Abstract: Exploring the internal mechanism of information spreading is critical for understanding and controlling the process. Traditional spreading models often assume individuals play the same role in the spreading process. In reality, however, individuals' diverse characteristics contribute differently to the spreading performance, leading to a heterogeneous infection rate across the system. To investigate network spreading dynamics under heterogeneous infection rates, we integrate two individual-level features -- influence (i.e., the ability to influence neighbors) and susceptibility (i.e., the extent to be influenced by neighbors) -- into the independent cascade model. Our findings reveal significant differences in spreading performance under heterogeneous and constant infection rates, with traditional structural centrality metrics proving more effective in the latter scenario. Additionally, we apply the constant and heterogeneous infection rates to a state-of-the-art maximization algorithm, the well-known TIM algorithm, and find that the seeds selected under heterogeneous infection rates are more dispersed than those selected under constant rates. Lastly, we find that both individuals' influence and susceptibility are vital to the spreading performance. Strikingly, susceptible individuals are particularly important to spreading when information is disseminated by social celebrities. By integrating influence and susceptibility into the spreading model, we gain a more profound understanding of the underlying mechanisms driving information spreading.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.03542 [pdf, other]

DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training

Authors: Zhongkai Hao, Chang Su, Songming Liu, Julius Berner, Chengyang Ying, Hang Su, Anima Anandkumar, Jian Song, Jun Zhu

Abstract: Pre-training has been investigated to improve the efficiency and performance of training neural operators in data-scarce settings. However, it is largely in its infancy due to the inherent complexity and diversity, such as long trajectories, multiple scales and varying dimensions of partial differential equations (PDEs) data. In this paper, we present a new auto-regressive denoising pre-training strategy, which allows for more stable and efficient pre-training on PDE data and generalizes to various downstream tasks. Moreover, by designing a flexible and scalable model architecture based on Fourier attention, we can easily scale up the model for large-scale pre-training. We train our PDE foundation model with up to 0.5B parameters on 10+ PDE datasets with more than 100k trajectories. Extensive experiments show that we achieve SOTA on these benchmarks and validate the strong generalizability of our model to significantly enhance performance on diverse downstream PDE tasks like 3D data. Code is available at \url{https://github.com/thu-ml/DPOT}.

Submitted 6 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.18893 [pdf]

Direct Visualization of Disorder Driven Electronic Liquid Crystal Phases in Dirac Nodal Line Semimetal GdSbTe

Authors: Balaji Venkatesan, Syu-You Guan, Jen-Te Chang, Shiang-Bin Chiu, Po-Yuan Yang, Chih-Chuan Su, Tay-Rong Chang, Kalaivanan Raju, Raman Sankar, Somboon Fongchaiya, Ming-Wen Chu, Chia-Seng Chang, Guoqing Chang, Hsin Lin, Adrian Del Maestro, Ying-Jer Kao, Tien-Ming Chuang

Abstract: Electronic liquid crystal (ELC) phases are spontaneous symmetry breaking states believed to arise from strong electron correlation in quantum materials such as cuprates and iron pnictides. Here, we report a direct observation of ELC phases in a Dirac nodal line (DNL) semimetal GdSb$_x$Te$_{2-x}$. Electronic nanostructures consisting of incommensurate smectic charge modulation and intense local nematic order are visualized by using spectroscopic imaging - scanning tunneling microscopy. As topological materials with symmetry protected Dirac or Weyl fermions are mostly weakly correlated, the discovery of such ELC phases is anomalous and raises questions on the origin of their emergence. Specifically, we demonstrate how chemical substitution generates these symmetry breaking phases before the system undergoes a charge density wave - orthorhombic structural transition. We further show how dopants can induce nematicity via quasiparticle scattering interference. Our results highlight the importance of impurities in realizing ELC phases and present a new material platform for exploring the interplay among quenched disorder, topology and electron correlation.

Submitted 7 May, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.12215 [pdf]

Forming Long-range Order of Semiconducting Polymers through Liquid-phase Directional Molecular Assemblies

Authors: Minh Nhat Pham, Chun-Jen Su, Yu-Ching Huang, Kun-Ta Lin, Ting-Yu Huang, Yu-Ying Lai, Chen-An Wang, Yong-Kang Liaw, Ting-Han Lin, U-Ser Jeng, Jrjeng Ruan, Chan Luo, Ye Huang, Guillermo C. Bazan, Ben B. Y. Hsu

Abstract: Intermolecular interactions are crucial in determining the morphology of solution-processed semiconducting polymer thin films. However, these random interactions often lead to disordered or short-range ordered structures. Achieving long-range order in these films has been a challenge due to limited control over microscopic interactions in current techniques. Here, we present a molecular-level methodology that leverages spatial matching of intermolecular dynamics among solutes, solvents, and substrates to induce directional molecular assembly in weakly bonded polymers. Within the optimized dynamic scale of 2.5 Å between polymer side chains and self-assembled monolayers (SAMs) on nanogrooved substrates, our approach transforms random aggregates into unidirectional fibers with a remarkable increase in the anisotropic stacking ratio from 1 to 11. The Flory-Huggins-based molecular stacking model accurately predicts the transitioning order on various SAMs, validated by morphologic and spectroscopic observations. The enhanced structural ordering spans over 3 orders of magnitude in length, rising from the smallest 7.3 nm random crystallites to >14 μみゅーm unidirectional fibers on sub-millimeter areas. Overall, this study provides insights into the control of complex intermolecular interactions and offers enhanced molecular-level controllability in solution-based processes.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 40 pages, 6 figures, 2 tables

arXiv:2402.11266 [pdf, ps, other]

Filtered Lie-Trotter splitting for the "good" Boussinesq equation: low regularity error estimates

Authors: Lun Ji, Hang Li, Alexander Ostermann, Chunmei Su

Abstract: We investigate a filtered Lie-Trotter splitting scheme for the "good" Boussinesq equation and derive an error estimate for initial data with very low regularity. Through the use of discrete Bourgain spaces, our analysis extends to initial data in $H^{s}$ for $0<s\leq 2$, overcoming the constraint of $s>1/2$ imposed by the bilinear estimate in smooth Sobolev spaces. We establish convergence rates of order $τたう^{s/2}$ in $L^2$ for such levels of regularity. Our analytical findings are supported by numerical experiments.

Submitted 17 February, 2024; originally announced February 2024.

Comments: 3 figures

MSC Class: 35Q35; 65M12; 65M15; 65M70

arXiv:2402.05026 [pdf, other]

Characterization and Optimization of a Cryogenic Pure CsI Detector with Remarkable Light Yield and Unprecedented Energy Resolution for CLOVERS Experiment

Authors: Chenguang Su, Qian Lin, Linqquan Kong, Shi Chen, Kimiya Moharrami, Yangheng Zheng, Jin Li

Abstract: In this study, we conducted a comprehensive characterization and optimization of a cryogenic pure CsI (pCsI) detector. We utilized a 2 cm cubic crystal coupled with a HAMAMATSU R11065 photomultiplier tube (PMT), achieving a remarkable light yield of 35.2 PE/keV$_{ee}$ and an unprecedented energy resolution of 6.9% at 60 keV. Additionally, we measured the scintillation decay time of pCsI, which proved to be significantly faster than that of CsI(Na) at room temperature. Furthermore, we investigated the impact of temperature, surface treatment, and crystal shape on the light yield. Notably, the light yield peaked at approximately 20 K and remained stable within the range of 70-100 K. We observed that the light yield of polished crystals was approximately 1.5 times greater than that of ground crystals, while the crystal shape exhibited minimal influence on the light yield. These results are crucial for the design of the 10 kg pCsI detector for the future CLOVERS (Coherent eLastic neutrinO(V)-nucleus scattERing at China Spallation Neutron Source (CSNS)) experiment.

Submitted 22 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.04500 [pdf, ps, other]

A Pieri type formula for motivic Chern classes of Schubert cells in Grassmannians

Authors: Neil J. Y. Fan, Peter L. Guo, Changjian Su, Rui Xiong

Abstract: We prove a Pieri formula for motivic Chern classes of Schubert cells in the equivariant K-theory of Grassmannians, which is described in terms of ribbon operators on partitions. Our approach is to transform the Schubert calculus over Grassmannians to the calculation in a certain affine Hecke algebra. As a consequence, we derive a Pieri formula for Segre motivic classes of Schubert cells in Grassmannians. We apply the Pieri formulas to establish a relation between motivic Chern classes and Segre motivic classes, extending a well-known relation between the classes of structure sheaves and ideal sheaves. As another application, we find a symmetric power series representative for the class of the dualizing sheaf of a Schubert variety.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.03641 [pdf, ps, other]

Stable BDF time discretization of BGN-based parametric finite element methods for geometric flows

Authors: Wei Jiang, Chunmei Su, Ganghui Zhang

Abstract: We propose a novel class of temporal high-order parametric finite element methods for solving a wide range of geometric flows of curves and surfaces. By incorporating the backward differentiation formulae (BDF) for time discretization into the BGN formulation, originally proposed by Barrett, Garcke, and Nürnberg (J. Comput. Phys., 222 (2007), pp.~441--467), we successfully develop high-order BGN/BDF$k$ schemes. The proposed BGN/BDF$k$ schemes not only retain almost all the advantages of the classical first-order BGN scheme such as computational efficiency and good mesh quality, but also exhibit the desired $k$th-order temporal accuracy in terms of shape metrics, ranging from second-order to fourth-order accuracy. Furthermore, we validate the performance of our proposed BGN/BDF$k$ schemes through extensive numerical examples, demonstrating their high-order temporal accuracy for various types of geometric flows while maintaining good mesh quality throughout the evolution.

Submitted 5 February, 2024; originally announced February 2024.

MSC Class: 74H15; 74S05; 74M15; 65M60

arXiv:2402.00531 [pdf, other]

Preconditioning for Physics-Informed Neural Networks

Authors: Songming Liu, Chang Su, Jiachen Yao, Zhongkai Hao, Hang Su, Youjia Wu, Jun Zhu

Abstract: Physics-informed neural networks (PINNs) have shown promise in solving various partial differential equations (PDEs). However, training pathologies have negatively affected the convergence and prediction accuracy of PINNs, which further limits their practical applications. In this paper, we propose to use condition number as a metric to diagnose and mitigate the pathologies in PINNs. Inspired by classical numerical analysis, where the condition number measures sensitivity and stability, we highlight its pivotal role in the training dynamics of PINNs. We prove theorems to reveal how condition number is related to both the error control and convergence of PINNs. Subsequently, we present an algorithm that leverages preconditioning to improve the condition number. Evaluations of 18 PDE problems showcase the superior performance of our method. Significantly, in 7 of these problems, our method reduces errors by an order of magnitude. These empirical findings verify the critical role of the condition number in PINNs' training.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.06516 [pdf, ps, other]

Hook formula for Coxeter groups via the twisted group ring

Authors: Leonardo C. Mihalcea, Hiroshi Naruse, Changjian Su

Abstract: We use Kostant and Kumar's twisted group ring and its dual to formulate and prove a generalization of Nakada's colored hook formula for any Coxeter groups. For dominant minuscule elements of the Weyl group of a Kac--Moody algebra, this provides another short proof of Nakada's colored hook formula.

Submitted 12 January, 2024; originally announced January 2024.

Comments: 12 pages

MSC Class: 20F55; 05E16; 16S35

arXiv:2401.05689 [pdf, other]

doi 10.1109/ICASSP49357.2023.10096194

UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

Authors: Jiaxin Guo, Minghan Wang, Xiaosong Qiao, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhengzhe Yu, Yinglu Li, Chang Su, Min Zhang, Shimin Tao, Hao Yang

Abstract: Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER). Previous works usually adopt end-to-end models and have a strong dependency on Pseudo Paired Data and Original Paired Data. But when only pre-training on Pseudo Paired Data, previous models have a negative effect on correction. When fine-tuning on Original Paired Data, the source side data must be transcribed by a well-trained ASR model, which takes a lot of time and is not universal. In this paper, we propose UCorrect, an unsupervised Detector-Generator-Selector framework for ASR Error Correction. UCorrect has no dependency on the training data mentioned before. The whole procedure is first to detect whether the character is erroneous, then to generate some candidate characters and finally to select the most confident one to replace the error character. Experiments on the public AISHELL-1 dataset and WenetSpeech dataset show the effectiveness of UCorrect for ASR error correction: 1) it achieves significant WER reduction, reaching 6.83% even without fine-tuning and 14.29% after fine-tuning; 2) it outperforms the popular NAR correction models by a large margin with a competitive low latency; and 3) it is a universal method, as it reduces all WERs of the ASR model with different decoding strategies and reduces all WERs of ASR models trained on different scale datasets.

Submitted 11 January, 2024; originally announced January 2024.

Comments: Accepted in ICASSP 2023

arXiv:2312.17200 [pdf, ps, other]

Chevalley formulae for the motivic Chern classes of Schubert cells and for the stable envelopes

Authors: Leonardo C. Mihalcea, Hiroshi Naruse, Changjian Su

Abstract: We prove a Chevalley formula to multiply the motivic Chern classes of Schubert cells in a generalized flag manifold $G/P$ by the class of any line bundle $\mathcal{L}_λらむだ$. Our formula is given in terms of the $λらむだ$-chains of Lenart and Postnikov. Its proof relies on a change of basis formula in the affine Hecke algebra due to Ram, and on the Hecke algebra action on torus-equivariant K-theory of the complete flag manifold $G/B$ via left Demazure--Lusztig operators. We revisit some wall-crossing formulae for the stable envelopes in $T^*(G/B)$. We use our Chevalley formula, and the equivalence between motivic Chern classes of Schubert cells and K-theoretic stable envelopes in $T^*(G/B)$, to give formulae for the change of polarization, and for the change of slope for stable envelopes. We prove several additional applications, including Serre, star, and Dynkin, dualities of the Chevalley coefficients, new formulae for the Whittaker functions, and for the Hall--Littlewood polynomials. We also discuss positivity properties of Chevalley coefficients, and properties of the coefficients arising from multiplication by minuscule weights.

Submitted 6 March, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: 43 pages; v3: removed positivity conjectures

MSC Class: 14M15; 14C17 (Primary); 14N15; 17B10; 33D80 (Secondary)

arXiv:2312.05551 [pdf, other]

Multi-dimensional Fair Federated Learning

Authors: Cong Su, Guoxian Yu, Jun Wang, Hui Li, Qingzhong Li, Han Yu

Abstract: Federated learning (FL) has emerged as a promising collaborative and secure paradigm for training a model from decentralized data without compromising privacy. Group fairness and client fairness are two dimensions of fairness that are important for FL. Standard FL can result in disproportionate disadvantages for certain clients, and it still faces the challenge of treating different groups equitably in a population. The problem of privately training fair FL models without compromising the generalization capability of disadvantaged clients remains open. In this paper, we propose a method, called mFairFL, to address this problem and achieve group fairness and client fairness simultaneously. mFairFL leverages differential multipliers to construct an optimization objective for empirical risk minimization with fairness constraints. Before aggregating locally trained models, it first detects conflicts among their gradients, and then iteratively curates the direction and magnitude of gradients to mitigate these conflicts. Theoretical analysis proves mFairFL facilitates the fairness in model development. The experimental evaluations based on three benchmark datasets show significant advantages of mFairFL compared to seven state-of-the-art baselines.

Submitted 9 December, 2023; originally announced December 2023.

Comments: Accepted by the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI2024)

arXiv:2311.15369 [pdf, other]

TD-Net: A Tri-domain network for sparse-view CT reconstruction

Authors: Xinyuan Wang, Changqing Su, Bo Xiong

Abstract: Sparse-view CT reconstruction, aimed at reducing X-ray radiation risks, frequently suffers from image quality degradation, manifested as noise and artifacts. Existing post-processing and dual-domain techniques, although effective in radiation reduction, often lead to over-smoothed results, compromising diagnostic clarity. Addressing this, we introduce TD-Net, a pioneering tri-domain approach that unifies sinogram, image, and frequency domain optimizations. By incorporating Frequency Supervision Module (FSM), TD-Net adeptly preserves intricate details, overcoming the prevalent over-smoothing issue. Extensive evaluations demonstrate TD-Net's superior performance in reconstructing high-quality CT images from sparse views, efficiently balancing radiation safety and image fidelity. The enhanced capabilities of TD-Net in varied noise scenarios highlight its potential as a breakthrough in medical imaging.

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.13621 [pdf, other]

Knowledge From the Dark Side: Entropy-Reweighted Knowledge Distillation for Balanced Knowledge Transfer

Authors: Chi-Ping Su, Ching-Hsun Tseng, Shin-Jye Lee

Abstract: Knowledge Distillation (KD) transfers knowledge from a larger "teacher" model to a compact "student" model, guiding the student with the "dark knowledge": the implicit insights present in the teacher's soft predictions. Although existing KDs have shown the potential of transferring knowledge, the gap between the two parties still exists. With a series of investigations, we argue the gap is the result of the student's overconfidence in prediction, signaling an imbalanced focus on pronounced features while overlooking the subtle yet crucial dark knowledge. To overcome this, we introduce the Entropy-Reweighted Knowledge Distillation (ER-KD), a novel approach that leverages the entropy in the teacher's predictions to reweight the KD loss on a sample-wise basis. ER-KD precisely refocuses the student on challenging instances rich in the teacher's nuanced insights while reducing the emphasis on simpler cases, enabling a more balanced knowledge transfer. Consequently, ER-KD not only demonstrates compatibility with various state-of-the-art KD methods but also further enhances their performance at negligible cost. This approach offers a streamlined and effective strategy to refine the knowledge transfer process in KD, setting a new paradigm in the meticulous handling of dark knowledge. Our code is available at https://github.com/cpsu00/ER-KD.

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.13246 [pdf, other]

CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning

Authors: Yilun Liu, Shimin Tao, Xiaofeng Zhao, Ming Zhu, Wenbing Ma, Junhao Zhu, Chang Su, Yutai Hou, Miao Zhang, Min Zhang, Hongxia Ma, Li Zhang, Hao Yang, Yanfei Jiang

Abstract: Instruction tuning is crucial for enabling Language Learning Models (LLMs) in responding to human instructions. The quality of instruction pairs used for tuning greatly affects the performance of LLMs. However, the manual creation of high-quality instruction datasets is costly, leading to the adoption of automatic generation of instruction pairs by LLMs as a popular alternative. To ensure the high quality of LLM-generated instruction datasets, several approaches have been proposed. Nevertheless, existing methods either compromise dataset integrity by filtering a large proportion of samples, or are unsuitable for industrial applications. In this paper, instead of discarding low-quality samples, we propose CoachLM, a novel approach to enhance the quality of instruction datasets through automatic revisions on samples in the dataset. CoachLM is trained from the samples revised by human experts and significantly increases the proportion of high-quality samples in the dataset from 17.7% to 78.9%. The effectiveness of CoachLM is further assessed on various real-world instruction test sets. The results show that CoachLM improves the instruction-following capabilities of the instruction-tuned LLM by an average of 29.9%, which even surpasses larger LLMs with nearly twice the number of parameters. Furthermore, CoachLM is successfully deployed in a data management system for LLMs at Huawei, resulting in an efficiency improvement of up to 20% in the cleaning of 40k real-world instruction pairs. We release various assets of CoachLM, including the training data, code and test set (https://github.com/lunyiliu/CoachLM).

Submitted 20 March, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

Comments: Accepted by ICDE 2024

arXiv:2311.10451 [pdf]

doi 10.1063/5.0189100

Design and performance of an ultrahigh vacuum spectroscopic-imaging scanning tunneling microscope with a hybrid vibration isolation system

Authors: Pei-Fang Chung, Balaji Venkatesan, Chih-Chuan Su, Jen-Te Chang, Hsu-Kai Cheng, Che-An Liu, Henry Yu, Chia-Seng Chang, Syu-You Guan, Tien-Ming Chuang

Abstract: A spectroscopic imaging-scanning tunneling microscope (SI-STM) allows the atomic scale visualization of surface electronic and magnetic structure of novel quantum materials with high energy resolution. To achieve the optimal performance, low vibration facility is required. Here, we describe the design and the performance of an ultrahigh vacuum STM system supported by a hybrid vibration isolation system that consists of a pneumatic passive and a piezoelectric active vibration isolation stages. The STM system is equipped with a 1K pot cryogenic insert and a 9 Tesla superconducting magnet, capable of continuous SI-STM measurements for 7 days. A field ion microscopy system is installed for in situ STM tip treatment. We present the detailed vibrational noise analysis of the hybrid vibration isolation system and demonstrate the performance of our STM system by taking high resolution spectroscopic maps and topographic images on several quantum materials. Our results establish a new strategy to achieve an effective vibration isolation system for high-resolution STM and other scanning probe microscopy to investigate the nanoscale quantum phenomena.

Submitted 27 November, 2023; v1 submitted 17 November, 2023; originally announced November 2023.

arXiv:2311.01653 [pdf]

INeAT: Iterative Neural Adaptive Tomography

Authors: Bo Xiong, Changqing Su, Zihan Lin, You Zhou, Zhaofei Yu

Abstract: Computed Tomography (CT) with its remarkable capability for three-dimensional imaging from multiple projections, enjoys a broad range of applications in clinical diagnosis, scientific observation, and industrial detection. Neural Adaptive Tomography (NeAT) is a recently proposed 3D rendering method based on neural radiance field for CT, and it demonstrates superior performance compared to traditional methods. However, it still faces challenges when dealing with the substantial perturbations and pose shifts encountered in CT scanning processes. Here, we propose a neural rendering method for CT reconstruction, named Iterative Neural Adaptive Tomography (INeAT), which incorporates iterative posture optimization to effectively counteract the influence of posture perturbations in data, particularly in cases involving significant posture variations. Through the implementation of a posture feedback optimization strategy, INeAT iteratively refines the posture corresponding to the input images based on the reconstructed 3D volume. We demonstrate that INeAT achieves artifact-suppressed and resolution-enhanced reconstruction in scenarios with significant pose disturbances. Furthermore, we show that our INeAT maintains comparable reconstruction performance to stable-state acquisitions even using data from unstable-state acquisitions, which significantly reduces the time required for CT scanning and relaxes the stringent requirements on imaging hardware systems, underscoring its immense potential for applications in short-time and low-cost CT technology.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2310.10676 [pdf, other]

Application-layer Characterization and Traffic Analysis for Encrypted QUIC Transport Protocol

Authors: Qianqian Zhang, Chi-Jiun Su

Abstract: Quick UDP Internet Connection (QUIC) is an emerging end-to-end encrypted, transport-layer protocol, which has been increasingly adopted by popular web services to improve communication security and quality of experience (QoE) towards end-users. However, this tendency makes the traffic analysis more challenging, given the limited information in the QUIC packet header and full encryption on the payload. To address this challenge, a novel rule-based approach is proposed to estimate the application-level traffic attributes without decrypting QUIC packets. Based on the size, timing, and direction information, our proposed algorithm analyzes the associated network traffic to infer the identity of each HTTP request and response pair, as well as the multiplexing feature in each QUIC connection. The inferred HTTP attributes can be used to evaluate the QoE of application-layer services and identify the service categories for traffic classification in the encrypted QUIC connections.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.09467 [pdf]

PC-bzip2: a phase-space continuity enhanced lossless compression algorithm for light field microscopy data

Authors: Changqing Su, Zihan Lin, You Zhou, Shuai Wang, Yuhan Gao, Chenggang Yan, Bo Xiong

Abstract: Light-field fluorescence microscopy (LFM) is a powerful, elegant, and compact method for long-term high-speed imaging of complex biological systems, such as neuron activities and rapid movements of organelles. LFM experiments typically generate terabytes of image data and require a huge amount of storage space. Some lossy compression algorithms have been proposed recently with good compression performance. However, since the specimen usually only tolerates low power density illumination for long-term imaging with low phototoxicity, the image signal-to-noise ratio (SNR) is relatively low, which will cause the loss of some efficient position or intensity information when such lossy compression algorithms are used. Here, we propose a phase-space continuity enhanced bzip2 (PC-bzip2) lossless compression method for LFM data as a high-efficiency, open-source tool, which combines GPU-based fast entropy judgement and multi-core-CPU-based high-speed lossless compression. Our proposed method achieves almost 10% compression ratio improvement while keeping the capability of high-speed compression, compared with original bzip2. We evaluated our method on fluorescence beads data and fluorescence staining cells data with different SNRs. Moreover, by introducing the temporal continuity, our method shows a superior compression ratio on time series data of zebrafish blood vessels.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2309.14329 [pdf, other]

Innovative Digital Storytelling with AIGC: Exploration and Discussion of Recent Advances

Authors: Rongzhang Gu, Hui Li, Changyue Su, Wayne Wu

Abstract: Digital storytelling, as an art form, has struggled with cost-quality balance. The emergence of AI-generated Content (AIGC) is considered a potential solution for efficient digital storytelling production. However, the specific form, effects, and impacts of this fusion remain unclear, leaving the boundaries of AIGC combined with storytelling undefined. This work explores the current integration state of AIGC and digital storytelling, investigates the artistic value of their fusion in a sample project, and addresses common issues through interviews. Through our study, we conclude that AIGC, while proficient in image creation, voiceover production, and music composition, at present falls short of replacing humans due to the irreplaceable elements of human creativity and aesthetic sensibilities, especially in complex character animations, facial expressions, and sound effects. The research objective is to increase public awareness of the current state, limitations, and challenges arising from combining AIGC and digital storytelling.

Submitted 28 September, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: Project page: https://lsgm-demo.github.io/Leveraging-recent-advances-of-foundation-models-for-story-telling/

arXiv:2309.12875 [pdf, ps, other]

A second-order in time, BGN-based parametric finite element method for geometric flows of curves

Authors: Wei Jiang, Chunmei Su, Ganghui Zhang

Abstract: Over the last two decades, the field of geometric curve evolutions has attracted significant attention from scientific computing. One of the most popular numerical methods for solving geometric flows is the so-called BGN scheme, which was proposed by Barrett, Garcke, and Nürnberg (J. Comput. Phys., 222 (2007), pp. 441–467), due to its favorable properties (e.g., its computational efficiency and the good mesh property). However, the BGN scheme is limited to first-order accuracy in time, and how to develop a higher-order numerical scheme is challenging. In this paper, we propose a fully discrete, temporal second-order parametric finite element method, which integrates with two different mesh regularization techniques, for solving geometric flows of curves. The scheme is constructed based on the BGN formulation and a semi-implicit Crank-Nicolson leap-frog time stepping discretization as well as a linear finite element approximation in space. More importantly, we point out that the shape metrics, such as manifold distance and Hausdorff distance, instead of function norms, should be employed to measure numerical errors. Extensive numerical experiments demonstrate that the proposed BGN-based scheme is second-order accurate in time in terms of shape metrics. Moreover, by employing the classical BGN scheme as mesh regularization techniques, our proposed second-order schemes exhibit good properties with respect to the mesh distribution. In addition, an unconditional interlaced energy stability property is obtained for one of the mesh regularization techniques.

Submitted 20 June, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: 37 pages, 8 figures

MSC Class: 65M60; 65M12; 53C44; 35K55

arXiv:2309.09552 [pdf, other]

A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting

Authors: Yuang Li, Min Zhang, Chang Su, Yinglu Li, Xiaosong Qiao, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Shimin Tao, Hao Yang

Abstract: The recognition of rare named entities, such as personal names and terminologies, is challenging for automatic speech recognition (ASR) systems, especially when they are not frequently observed in the training data. In this paper, we introduce keyword spotting enhanced Whisper (KWS-Whisper), a novel ASR system that leverages the Whisper model and performs open-vocabulary keyword spotting (OV-KWS) on the hidden states of the Whisper encoder to recognize user-defined named entities. These entities serve as prompts for the Whisper decoder. To optimize the model, we propose a multitask training approach that learns OV-KWS and contextual-ASR tasks. We evaluate our approach on Chinese Aishell hot word subsets and two internal code-switching test sets and show that it significantly improves the entity recall compared to the original Whisper model. Moreover, we demonstrate that the OV-KWS can be a plug-and-play module to enhance the ASR error correction methods and frozen Whisper models.

Submitted 6 June, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: 5 pages, 2 figures, Accepted to InterSpeech 2024

arXiv:2309.02326 [pdf, other]

Revisiting File Context for Source Code Summarization

Authors: Aakash Bansal, Chia-Yi Su, Collin McMillan

Abstract: Source code summarization is the task of writing natural language descriptions of source code. A typical use case is generating short summaries of subroutines for use in API documentation. The heart of almost all current research into code summarization is the encoder-decoder neural architecture, and the encoder input is almost always a single subroutine or other short code snippet. The problem with this setup is that the information needed to describe the code is often not present in the code itself -- that information often resides in other nearby code. In this paper, we revisit the idea of "file context" for code summarization. File context is the idea of encoding select information from other subroutines in the same file. We propose a novel modification of the Transformer architecture that is purpose-built to encode file context and demonstrate its improvement over several baselines. We find that file context helps on a subset of challenging examples where traditional approaches struggle.

Submitted 5 September, 2023; originally announced September 2023.

Comments: 27 pages + references, Under peer review

arXiv:2308.15471 [pdf]

Three-dimensional imaging of buried heterointerfaces

Authors: Colum M. O'Leary, Haozhi Sha, Jianhua Zhang, Cong Su, Salman Kahn, Huaidong Jiang, Alex Zettl, Jim Ciston, Jianwei Miao

Abstract: We report three-dimensional (3D) structure determination of a twisted hexagonal boron nitride (h-BN) heterointerface from a single-view data set using multislice ptychography. We identify the buried heterointerface between two twisted h-BN flakes with a lateral resolution of 0.57 Å and a depth resolution of 2.5 nm. The latter is a significant improvement (~2.7 times) over the aperture-limited depth resolution of incoherent imaging modes such as annular-dark-field scanning transmission electron microscopy. This is attributed to the diffraction signal extending beyond the aperture edge with the depth resolution set by the curvature of the Ewald sphere. Future advances to this approach could improve the depth resolution to the sub-nanometer level and enable the identification of individual dopants, defects and color centers in twisted heterointerfaces and other materials.

Submitted 18 May, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.14731 [pdf, other]

Distilled GPT for Source Code Summarization

Authors: Chia-Yi Su, Collin McMillan

Abstract: A code summary is a brief natural language description of source code. Summaries are usually only a single sentence long, and yet form the backbone of developer documentation. A short description such as "changes all visible polygons to the color blue" can give a programmer a high-level idea of what code does without the effort of reading the code itself. Recently, products based on Large Language Models such as ChatGPT have demonstrated a strong ability to write these descriptions automatically. However, to use these tools, programmers must send their code to untrusted third parties for processing (e.g., via an API call). This loss of custody is not acceptable to many organizations. In this paper, we present an alternative: we train an open source model using sample output generated by GPT-3.5 in a process related to knowledge distillation. Our model is small enough (350M parameters) to be run on a single 16 GB GPU, yet we show in our evaluation that it is large enough to mimic GPT-3.5 on this task.

Submitted 5 February, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: 19 pages + 6 figures. Accepted to Automated Software Engineering Journal

arXiv:2308.13920 [pdf, other]

Modeling Programmer Attention as Scanpath Prediction

Authors: Aakash Bansal, Chia-Yi Su, Zachary Karas, Yifan Zhang, Yu Huang, Toby Jia-Jun Li, Collin McMillan

Abstract: This paper launches a new effort at modeling programmer attention by predicting eye movement scanpaths. Programmer attention refers to what information people intake when performing programming tasks. Models of programmer attention refer to machine prediction of what information is important to people. Models of programmer attention are important because they help researchers build better interfaces, assistive technologies, and more human-like AI. For many years, researchers in SE have built these models based on features such as mouse clicks, key logging, and IDE interactions. Yet the holy grail in this area is scanpath prediction -- the prediction of the sequence of eye fixations a person would take over a visual stimulus. A person's eye movements are considered the most concrete evidence that a person is taking in a piece of information. Scanpath prediction is a notoriously difficult problem, but we believe that the emergence of lower-cost, higher-accuracy eye tracking equipment and better large language models of source code brings a solution within grasp. We present an eye tracking experiment with 27 programmers and a prototype scanpath predictor to present preliminary results and obtain early community feedback.

Submitted 26 August, 2023; originally announced August 2023.

Comments: Accepted at ASE2023 NIER Track. 4 pages + 1 page for references, 4 figures, 1 table

arXiv:2308.07429 [pdf, other]

Semantic Similarity Loss for Neural Source Code Summarization

Authors: Chia-Yi Su, Collin McMillan

Abstract: This paper presents a procedure for and evaluation of using a semantic similarity metric as a loss function for neural source code summarization. Code summarization is the task of writing natural language descriptions of source code. Neural code summarization refers to automated techniques for generating these descriptions using neural networks. Almost all current approaches involve neural networks as either standalone models or as part of pretrained large language models, e.g., GPT, Codex, LLaMA. Yet almost all also use a categorical cross-entropy (CCE) loss function for network optimization. Two problems with CCE are that 1) it computes loss over each word prediction one-at-a-time, rather than evaluating a whole sentence, and 2) it requires a perfect prediction, leaving no room for partial credit for synonyms. In this paper, we extend our previous work on semantic similarity metrics to show a procedure for using semantic similarity as a loss function to alleviate these problems, and we evaluate this procedure in several settings in both metrics-driven and human studies. In essence, we propose to use a semantic similarity metric to calculate loss over the whole output sentence prediction per training batch, rather than just loss for each word. We also propose to combine our loss with CCE for each word, which streamlines the training process compared to baselines. We evaluate our approach over several baselines and report improvement in the vast majority of conditions.

Submitted 11 June, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: 17 pages + 3 figures + 2 references. Accepted at Journal of Software Evolution and Process in June 2024

arXiv:2307.12651 [pdf, other]

Nanoscale domain engineering in SrRuO$_3$ thin films

Authors: Céline Lichtensteiger, Chia-Ping Su, Iaroslav Gaponenko, Marios Hadjimichael, Ludovica Tovaglieri, Patrycja Paruch, Alexandre Gloter, Jean-Marc Triscone

Abstract: We investigate nanoscale domain engineering via epitaxial coupling in a set of SrRuO$_3$/PbTiO$_3$/SrRuO$_3$ heterostructures epitaxially grown on (110)$_o$-oriented DyScO$_3$ substrates. The SrRuO$_3$ layer thickness is kept at 55 unit cells, whereas the PbTiO$_3$ layer is grown to thicknesses of 23, 45 and 90 unit cells. Through a combination of atomic force microscopy, x-ray diffraction and high resolution scanning transmission electron microscopy studies, we find that above a certain critical thickness of the ferroelectric layer, the large structural distortions associated with the ferroelastic domains propagate through the top SrRuO$_3$ layer, locally modifying the orientation of the orthorhombic SrRuO$_3$ and creating a modulated structure that extends beyond the ferroelectric layer boundaries.

Submitted 24 July, 2023; originally announced July 2023.

Comments: 19 pages, 6 figures, supplementary materials. arXiv admin note: text overlap with arXiv:2304.06948

arXiv:2307.10243 [pdf, other]

Vision-Based Reactive Planning and Control of Quadruped Robots in Unstructured Dynamic Environments

Authors: Tangyu Qian, Zhangli Zhou, Shaocheng Wang, Zhijun Li, Chun-Yi Su, Zhen Kan

Abstract: Quadruped robots have received increasing attention over the past few years. However, existing works primarily focus on static environments or assume the robot has full observations of the environment. This limits their practical applications since real-world environments are often dynamic and partially observable. To tackle these issues, vision-based reactive planning and control (V-RPC) is developed in this work. The V-RPC comprises two modules: offline pre-planning and online reactive planning. The pre-planning phase generates a reference trajectory over continuous workspace via sampling-based methods using prior environmental knowledge, given an LTL specification. The online reactive module dynamically adjusts the reference trajectory and control based on the robot's real-time visual perception to adapt to environmental changes.

Submitted 16 July, 2023; originally announced July 2023.

arXiv:2307.08737 [pdf, other]

doi 10.1103/PhysRevX.14.011051

Demonstrating a long-coherence dual-rail erasure qubit using tunable transmons

Authors: Harry Levine, Arbel Haim, Jimmy S. C. Hung, Nasser Alidoust, Mahmoud Kalaee, Laura DeLorenzo, E. Alex Wollack, Patricio Arrangoiz-Arriola, Amirhossein Khalajhedayati, Rohan Sanil, Hesam Moradinejad, Yotam Vaknin, Aleksander Kubica, David Hover, Shahriar Aghaeimeibodi, Joshua Ari Alcid, Christopher Baek, James Barnett, Kaustubh Bawdekar, Przemyslaw Bienias, Hugh Carson, Cliff Chen, Li Chen, Harut Chinkezian, Eric M. Chisholm, et al. (88 additional authors not shown)

Abstract: Quantum error correction with erasure qubits promises significant advantages over standard error correction due to favorable thresholds for erasure errors. To realize this advantage in practice requires a qubit for which nearly all errors are such erasure errors, and the ability to check for erasure errors without dephasing the qubit. We demonstrate that a "dual-rail qubit" consisting of a pair of resonantly coupled transmons can form a highly coherent erasure qubit, where transmon $T_1$ errors are converted into erasure errors and residual dephasing is strongly suppressed, leading to millisecond-scale coherence within the qubit subspace. We show that single-qubit gates are limited primarily by erasure errors, with erasure probability $p_\text{erasure} = 2.19(2)\times 10^{-3}$ per gate while the residual errors are $\sim 40$ times lower. We further demonstrate mid-circuit detection of erasure errors while introducing $< 0.1\%$ dephasing error per check. Finally, we show that the suppression of transmon noise allows this dual-rail qubit to preserve high coherence over a broad tunable operating range, offering an improved capacity to avoid frequency collisions. This work establishes transmon-based dual-rail qubits as an attractive building block for hardware-efficient quantum error correction.

Submitted 20 March, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: 9+13 pages, 16 figures

Journal ref: Physical Review X 14, 011051 (2024)

arXiv:2307.08674 [pdf, other]

TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT

Authors: Liangyu Zha, Junlin Zhou, Liyao Li, Rui Wang, Qingyi Huang, Saisai Yang, Jing Yuan, Changbao Su, Xiang Li, Aofeng Su, Tao Zhang, Chen Zhou, Kaizhe Shou, Miao Wang, Wufang Zhu, Guoshan Lu, Chao Ye, Yali Ye, Wentao Ye, Yiming Zhang, Xinglong Deng, Jie Xu, Haobo Wang, Gang Chen, Junbo Zhao

Abstract: Tables are prevalent in real-world databases, requiring significant time and effort for humans to analyze and manipulate. The advancements in large language models (LLMs) have made it possible to interact with tables using natural language input, bringing this capability closer to reality. In this paper, we present TableGPT, a unified fine-tuned framework that enables LLMs to understand and operate on tables using external functional commands. It introduces the capability to seamlessly interact with tables, enabling a wide range of functionalities such as question answering, data manipulation (e.g., insert, delete, query, and modify operations), data visualization, analysis report generation, and automated prediction. TableGPT aims to provide convenience and accessibility to users by empowering them to effortlessly leverage tabular data. At the core of TableGPT lies the novel concept of global tabular representations, which empowers LLMs to gain a comprehensive understanding of the entire table beyond meta-information. By jointly training LLMs on both table and text modalities, TableGPT achieves a deep understanding of tabular data and the ability to perform complex operations on tables through chain-of-command instructions. Importantly, TableGPT offers the advantage of being a self-contained system rather than relying on external API interfaces. Moreover, it supports efficient data process flow, query rejection (when appropriate) and private deployment, enabling faster domain data fine-tuning and ensuring data privacy, which enhances the framework's adaptability to specific use cases.

Submitted 7 August, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: Technical Report

arXiv:2306.08827 [pdf, other]

PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs

Authors: Zhongkai Hao, Jiachen Yao, Chang Su, Hang Su, Ziao Wang, Fanzhi Lu, Zeyu Xia, Yichi Zhang, Songming Liu, Lu Lu, Jun Zhu

Abstract: While significant progress has been made on Physics-Informed Neural Networks (PINNs), a comprehensive comparison of these methods across a wide range of Partial Differential Equations (PDEs) is still lacking. This study introduces PINNacle, a benchmarking tool designed to fill this gap. PINNacle provides a diverse dataset, comprising over 20 distinct PDEs from various domains, including heat conduction, fluid dynamics, biology, and electromagnetics. These PDEs encapsulate key challenges inherent to real-world problems, such as complex geometry, multi-scale phenomena, nonlinearity, and high dimensionality. PINNacle also offers a user-friendly toolbox, incorporating about 10 state-of-the-art PINN methods for systematic evaluation and comparison. We have conducted extensive experiments with these methods, offering insights into their strengths and weaknesses. In addition to providing a standardized means of assessing performance, PINNacle also offers an in-depth analysis to guide future research, particularly in areas such as domain decomposition methods and loss reweighting for handling multi-scale problems and complex geometry. To the best of our knowledge, it is the largest benchmark with a diverse and comprehensive evaluation that will undoubtedly foster further research in PINNs.

Submitted 5 October, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

arXiv:2306.07560 [pdf, other]

doi 10.1109/TVCG.2023.3286392

Creating Emordle: Animating Word Cloud for Emotion Expression

Authors: Liwenhan Xie, Xinhuan Shu, Jeon Cheol Su, Yun Wang, Siming Chen, Huamin Qu

Abstract: We propose emordle, a conceptual design that animates wordles (compact word clouds) to deliver their emotional context to the audiences. To inform the design, we first reviewed online examples of animated texts and animated wordles, and summarized strategies for injecting emotion into the animations. We introduced a composite approach that extends an existing animation scheme for one word to multiple words in a wordle with two global factors: the randomness of text animation (entropy) and the animation speed (speed). To create an emordle, general users can choose one predefined animated scheme that matches the intended emotion class and fine-tune the emotion intensity with the two parameters. We designed proof-of-concept emordle examples for four basic emotion classes, namely happiness, sadness, anger, and fear. We conducted two controlled crowdsourcing studies to evaluate our approach. The first study confirmed that people generally agreed on the conveyed emotions from well-crafted animations, and the second one demonstrated that our identified factors helped fine-tune the delivered emotion extent. We also invited general users to create emordles on their own based on our proposed framework. Through this user study, we confirmed the effectiveness of the approach. We concluded with implications for future research opportunities of supporting emotion expression in visualizations.

Submitted 14 June, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

Comments: Accepted in IEEE Transactions on Visualization and Computer Graphics

arXiv:2306.02816 [pdf, other]

MultiAdam: Parameter-wise Scale-invariant Optimizer for Multiscale Training of Physics-informed Neural Networks

Authors: Jiachen Yao, Chang Su, Zhongkai Hao, Songming Liu, Hang Su, Jun Zhu

Abstract: Physics-informed Neural Networks (PINNs) have recently achieved remarkable progress in solving Partial Differential Equations (PDEs) in various fields by minimizing a weighted sum of PDE loss and boundary loss. However, there are several critical challenges in the training of PINNs, including the lack of theoretical frameworks and the imbalance between PDE loss and boundary loss. In this paper, we present an analysis of second-order non-homogeneous PDEs, which are classified into three categories and applicable to various common problems. We also characterize the connections between the training loss and actual error, guaranteeing convergence under mild conditions. The theoretical analysis inspires us to further propose MultiAdam, a scale-invariant optimizer that leverages gradient momentum to balance the loss terms parameter-wise. Extensive experimental results on multiple problems from different physical domains demonstrate that our MultiAdam solver can improve the predictive accuracy by 1-2 orders of magnitude compared with strong baselines.

Submitted 5 June, 2023; originally announced June 2023.

Showing 1–50 of 245 results for author: Su, C