Search | arXiv e-print repository

Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning

Authors: Thong Nguyen, Yi Bin, Xiaobao Wu, Xinshuai Dong, Zhiyuan Hu, Khoi Le, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

Abstract: Data quality stands at the forefront of deciding the effectiveness of video-language representation learning. However, video-text pairs in previous data typically do not align perfectly with each other, which might lead to video-language representations that do not accurately reflect cross-modal semantics. Moreover, previous data also possess an uneven distribution of concepts, thereby hampering the downstream performance across unpopular subjects. To address these problems, we propose a contrastive objective with a subtractive angular margin to regularize cross-modal representations in their effort to reach perfect similarity. Furthermore, to adapt to the non-uniform concept distribution, we propose a multi-layer perceptron (MLP)-parameterized weighting function that maps loss values to sample weights which enable dynamic adjustment of the model's focus throughout training. With the training guided by a small amount of unbiased meta-data and augmented by video-text data generated by a large vision-language model, we improve video-language representations and achieve superior performance on commonly used video question answering and text-video retrieval datasets.

Submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024
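
Illustrative sketch (not taken from the paper): the abstract above names a contrastive objective with a subtractive angular margin but gives no formula. One plausible reading is an InfoNCE-style loss in which the angle between each matched video-text pair is reduced by a margin before the cosine is taken. The function name, the clamp at zero, the video-to-text direction, and the default `margin` and `temperature` values are all assumptions made for this example.

```python
import numpy as np

def angular_margin_contrastive_loss(video, text, margin=0.1, temperature=0.07):
    """Video-to-text InfoNCE loss with a margin subtracted from positive-pair angles.

    video, text: arrays of shape (batch, dim); row i of each forms a matched pair.
    This is an assumed formulation, not the authors' implementation.
    """
    # L2-normalize so that dot products are cosines of the angles between embeddings.
    v = video / np.linalg.norm(video, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    cos = np.clip(v @ t.T, -1.0, 1.0)
    theta = np.arccos(cos)

    # Subtract the margin from the matched-pair angles only (the diagonal),
    # clamping at zero so the adjusted angle never becomes negative.
    idx = np.arange(len(v))
    pos_theta = np.maximum(theta[idx, idx] - margin, 0.0)

    logits = cos / temperature
    logits[idx, idx] = np.cos(pos_theta) / temperature

    # Numerically stable log-softmax over each row, then average the
    # negative log-probability assigned to the matched text.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[idx, idx].mean())
```

Under this reading the margin raises the logit of an imperfectly aligned positive pair, so the loss and its gradient shrink before the pair reaches perfect similarity; the MLP-based sample weighting mentioned in the abstract is not modeled here.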

arXiv:2407.02721 [pdf, ps, other]

Model and Feature Diversity for Bayesian Neural Networks in Mutual Learning

Authors: Cuong Pham, Cuong C. Nguyen, Trung Le, Dinh Phung, Gustavo Carneiro, Thanh-Toan Do

Abstract: Bayesian Neural Networks (BNNs) offer probability distributions for model parameters, enabling uncertainty quantification in predictions. However, they often underperform compared to deterministic neural networks. Utilizing mutual learning can effectively enhance the performance of peer BNNs. In this paper, we propose a novel approach to improve BNN performance through deep mutual learning. The proposed approaches aim to increase diversity in both network parameter distributions and feature distributions, promoting peer networks to acquire distinct features that capture different characteristics of the input, which enhances the effectiveness of mutual learning. Experimental results demonstrate significant improvements in the classification accuracy, negative log-likelihood, and expected calibration error when compared to traditional mutual learning for BNNs.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted to NeurIPS 2023

arXiv:2407.02662 [pdf, other]

Supporters and Skeptics: LLM-based Analysis of Engagement with Mental Health (Mis)Information Content on Video-sharing Platforms

Authors: Viet Cuong Nguyen, Mini Jain, Abhijat Chauhan, Heather Jaime Soled, Santiago Alvarez Lesmes, Zihang Li, Michael L. Birnbaum, Sunny X. Tang, Srijan Kumar, Munmun De Choudhury

Abstract: Over one in five adults in the US lives with a mental illness. In the face of a shortage of mental health professionals and offline resources, online short-form video content has grown to serve as a crucial conduit for disseminating mental health help and resources. However, the ease of content creation and access also contributes to the spread of misinformation, posing risks to accurate diagnosis and treatment. Detecting and understanding engagement with such content is crucial to mitigating its harmful effects on public health. We perform the first quantitative study of the phenomenon using YouTube Shorts and Bitchute as the sites of study. We contribute MentalMisinfo, a novel labeled mental health misinformation (MHMisinfo) dataset of 739 videos (639 from YouTube and 100 from Bitchute) and 135,372 comments in total, using an expert-driven annotation schema. We first find that few-shot in-context learning with large language models (LLMs) is effective in detecting MHMisinfo videos. Next, we discover distinct and potentially alarming linguistic patterns in how audiences engage with MHMisinfo videos through commentary on both video-sharing platforms. Across the two platforms, comments could exacerbate prevailing stigma, with some groups showing heightened susceptibility to and alignment with MHMisinfo. We discuss technical and public health-driven adaptive solutions to tackling the "epidemic" of mental health misinformation online.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 12 pages, in submission to ICWSM

arXiv:2407.02264 [pdf, other]

SOAF: Scene Occlusion-aware Neural Acoustic Field

Authors: Huiyu Gao, Jiahao Ma, David Ahmedt-Aristizabal, Chuong Nguyen, Miaomiao Liu

Abstract: This paper tackles the problem of novel view audio-visual synthesis along an arbitrary trajectory in an indoor scene, given the audio-video recordings from other known trajectories of the scene. Existing methods often overlook the effect of room geometry, particularly the effect of wall occlusion on sound propagation, making them less accurate in multi-room environments. In this work, we propose a new approach called Scene Occlusion-aware Acoustic Field (SOAF) for accurate sound generation. Our approach derives a prior for the sound energy field using distance-aware parametric sound-propagation modelling and then transforms it based on scene transmittance learned from the input video. We extract features from the local acoustic field centred around the receiver using a Fibonacci Sphere to generate binaural audio for novel views with a direction-aware attention mechanism. Extensive experiments on the real dataset RWAVS and the synthetic dataset SoundSpaces demonstrate that our method outperforms previous state-of-the-art techniques in audio generation. Project page: https://github.com/huiyu-gao/SOAF/.

Submitted 2 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.
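
Illustrative sketch (not taken from the paper): the SOAF abstract mentions sampling the local acoustic field around the receiver with a Fibonacci Sphere. The standard Fibonacci (golden-angle) lattice places points at equal steps in height and rotates each by the golden angle, which gives roughly uniform directions. The function name and the use of unit vectors are assumptions for this example; how SOAF scales or uses the points is not stated in the abstract.

```python
import math

def fibonacci_sphere(n):
    """Return n approximately evenly spaced unit vectors (x, y, z) on a sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # about 2.39996 rad
    points = []
    for i in range(n):
        # Heights are spaced uniformly in (-1, 1), offset by half a step
        # so that no point sits exactly on a pole.
        z = 1.0 - (2.0 * i + 1.0) / n
        radius = math.sqrt(max(0.0, 1.0 - z * z))  # radius of the circle at height z
        phi = i * golden_angle                     # rotate by the golden angle each step
        points.append((radius * math.cos(phi), radius * math.sin(phi), z))
    return points
```

For instance, `fibonacci_sphere(64)` yields 64 query directions that could be scaled by a chosen radius and offset by the receiver position.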

arXiv:2407.00535 [pdf, other]

AI-powered multimodal modeling of personalized hemodynamics in aortic stenosis

Authors: Caglar Ozturk, Daniel H. Pak, Luca Rosalia, Debkalpa Goswami, Mary E. Robakowski, Raymond McKay, Christopher T. Nguyen, James S. Duncan, Ellen T. Roche

Abstract: Aortic stenosis (AS) is the most common valvular heart disease in developed countries. High-fidelity preclinical models can improve AS management by enabling therapeutic innovation, early diagnosis, and tailored treatment planning. However, their use is currently limited by complex workflows necessitating lengthy expert-driven manual operations. Here, we propose an AI-powered computational framework for accelerated and democratized patient-specific modeling of AS hemodynamics from computed tomography. First, we demonstrate that our automated meshing algorithms can generate task-ready geometries for both computational and benchtop simulations with higher accuracy and 100 times faster than existing approaches. Then, we show that our approach can be integrated with fluid-structure interaction and soft robotics models to accurately recapitulate a broad spectrum of clinical hemodynamic measurements of diverse AS patients. The efficiency and reliability of these algorithms make them an ideal complementary tool for personalized high-fidelity modeling of AS biomechanics, hemodynamics, and treatment planning.

Submitted 29 June, 2024; originally announced July 2024.

Comments: CO and DHP contributed equally to this work. JSD and ETR are corresponding authors

arXiv:2406.18788 [pdf, other]

Wafer-Scale Fabrication of InGaP-on-Insulator for Nonlinear and Quantum Photonic Applications

Authors: Lillian Thiel, Joshua E. Castro, Trevor J. Steiner, Catherine L. Nguyen, Audrey Pechilis, Liao Duan, Nicholas Lewis, Garrett D. Cole, John E. Bowers, Galan Moody

Abstract: The development of manufacturable and scalable integrated nonlinear photonic materials is driving key technologies in diverse areas such as high-speed communications, signal processing, sensing, and quantum information. Here, we demonstrate a novel nonlinear platform -- InGaP-on-insulator -- optimized for visible-to-telecommunication wavelength $\chi^{(2)}$ nonlinear optical processes. In this work, we detail our 100-mm wafer-scale InGaP-on-insulator fabrication process realized via wafer bonding, optical lithography, and dry-etching techniques. The resulting wafers yield 1000s of components in each fabrication cycle, with initial designs that include chip-to-fiber couplers, 12.5-cm-long nested spiral waveguides, and arrays of microring resonators with free-spectral ranges spanning 400-900 GHz. We demonstrate intrinsic resonator quality factors as high as 324,000 (440,000) for single-resonance (split-resonance) modes near 1550 nm corresponding to 1.56 dB cm$^{-1}$ (1.22 dB cm$^{-1}$) propagation loss. We analyze the loss versus waveguide width and resonator radius to establish the operating regime for optimal 775-to-1550 nm phase matching. By combining the high $\chi^{(2)}$ and $\chi^{(3)}$ optical nonlinearity of InGaP with wafer-scale fabrication and low propagation loss, these results open promising possibilities for entangled-photon, multi-photon, and squeezed light generation.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.10583 [pdf, other]

Demonstration of neutron identification in neutrino interactions in the MicroBooNE liquid argon time projection chamber

Authors: MicroBooNE collaboration, P. Abratenko, O. Alterkait, D. Andrade Aldana, L. Arellano, J. Asaadi, A. Ashkenazi, S. Balasubramanian, B. Baller, A. Barnard, G. Barr, D. Barrow, J. Barrow, V. Basque, J. Bateman, O. Benevides Rodrigues, S. Berkman, A. Bhanderi, A. Bhat, M. Bhattacharya, M. Bishai, A. Blake, B. Bogart, T. Bolton, J. Y. Book, et al. (165 additional authors not shown)

Abstract: A significant challenge in measurements of neutrino oscillations is reconstructing the incoming neutrino energies. While modern fully-active tracking calorimeters such as liquid argon time projection chambers in principle allow the measurement of all final state particles above some detection threshold, undetected neutrons remain a considerable source of missing energy with little to no data constraining their production rates and kinematics. We present the first demonstration of tagging neutrino-induced neutrons in liquid argon time projection chambers using secondary protons emitted from neutron-argon interactions in the MicroBooNE detector. We describe the method developed to identify neutrino-induced neutrons and demonstrate its performance using neutrons produced in muon-neutrino charged current interactions. The method is validated using a small subset of MicroBooNE's total dataset. The selection yields a sample with $60\%$ of selected tracks corresponding to neutron-induced secondary protons.

Submitted 15 June, 2024; originally announced June 2024.

Report number: FERMILAB-PUB-24-0301

arXiv:2406.07514 [pdf, other]

Scintillation Light in SBND: Simulation, Reconstruction, and Expected Performance of the Photon Detection System

Authors: SBND Collaboration, P. Abratenko, R. Acciarri, C. Adams, L. Aliaga-Soplin, O. Alterkait, R. Alvarez-Garrote, C. Andreopoulos, A. Antonakis, L. Arellano, J. Asaadi, W. Badgett, S. Balasubramanian, V. Basque, A. Beever, B. Behera, E. Belchior, M. Betancourt, A. Bhat, M. Bishai, A. Blake, B. Bogart, J. Bogenschuetz, D. Brailsford, A. Brandt, et al. (158 additional authors not shown)

Abstract: SBND is the near detector of the Short-Baseline Neutrino program at Fermilab. Its location near the Booster Neutrino Beam source and relatively large mass will allow the study of neutrino interactions on argon with unprecedented statistics. This paper describes the expected performance of the SBND photon detection system, using a simulated sample of beam neutrinos and cosmogenic particles. Its design is a dual readout concept combining a system of 120 photomultiplier tubes, used for triggering, with a system of 192 X-ARAPUCA devices, located behind the anode wire planes. Furthermore, covering the cathode plane with highly-reflective panels coated with a wavelength-shifting compound recovers part of the light emitted towards the cathode, where no optical detectors exist. We show how this new design provides a high light yield and a more uniform detection efficiency, an excellent timing resolution and an independent 3D-position reconstruction using only the scintillation light. Finally, the whole reconstruction chain is applied to recover the temporal structure of the beam spill, which is resolved with a resolution on the order of nanoseconds.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 21 pages, 17 figures

Report number: FERMILAB-PUB-24-0303-PPD

arXiv:2406.05615 [pdf, other]

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

Authors: Thong Nguyen, Yi Bin, Junbin Xiao, Leigang Qu, Yicong Li, Jay Zhangjie Wu, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

Abstract: Humans use multiple senses to comprehend the environment. Vision and language are two of the most vital senses since they allow us to easily communicate our thoughts and perceive the world around us. There has been a lot of interest in creating video-language understanding systems with human-like senses since a video-language pair can mimic both our linguistic medium and visual environment with temporal dynamics. In this survey, we review the key tasks of these systems and highlight the associated challenges. Based on the challenges, we summarize their methods from model architecture, model training, and data perspectives. We also compare the performance of the methods and discuss promising directions for future research.

Submitted 1 July, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

Comments: Accepted at ACL 2024 (Findings)

arXiv:2406.03820 [pdf, other]

A Survey on Intelligent Internet of Things: Applications, Security, Privacy, and Future Directions

Authors: Ons Aouedi, Thai-Hoc Vu, Alessio Sacco, Dinh C. Nguyen, Kandaraj Piamrat, Guido Marchetto, Quoc-Viet Pham

Abstract: The rapid advances in the Internet of Things (IoT) have promoted a revolution in communication technology and offered various customer services. Artificial intelligence (AI) techniques have been exploited to facilitate IoT operations and maximize their potential in modern application scenarios. In particular, the convergence of IoT and AI has led to a new networking paradigm called Intelligent IoT (IIoT), which has the potential to significantly transform businesses and industrial domains. This paper presents a comprehensive survey of IIoT by investigating its significant applications in mobile networks, as well as its associated security and privacy issues. Specifically, we explore and discuss the roles of IIoT in a wide range of key application domains, from smart healthcare and smart cities to smart transportation and smart industries. Through such extensive discussions, we investigate important security issues in IIoT networks, where network attacks, confidentiality, integrity, and intrusion are analyzed, along with a discussion of potential countermeasures. Privacy issues in IIoT networks are also surveyed and discussed, including data, location, and model privacy leakage. Finally, we outline several key challenges and highlight potential research directions in this important area.

Submitted 21 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: This work has been accepted by IEEE Communications Surveys & Tutorials

arXiv:2405.19723 [pdf, other]

Encoding and Controlling Global Semantics for Long-form Video Question Answering

Authors: Thong Thanh Nguyen, Zhiyuan Hu, Xiaobao Wu, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu

Abstract: Seeking answers effectively for long videos is essential to build video question answering (videoQA) systems. Previous methods adaptively select frames and regions from long videos to save computations. However, this fails to reason over the whole sequence of video, leading to sub-optimal performance. To address this problem, we introduce a state space layer (SSL) into a multi-modal Transformer to efficiently integrate global semantics of the video, which mitigates the video information loss caused by frame and region selection modules. Our SSL includes a gating unit to enable controllability over the flow of global semantics into visual representations. To further enhance the controllability, we introduce a cross-modal compositional congruence (C^3) objective to encourage global semantics to align with the question. To rigorously evaluate long-form videoQA capacity, we construct two new benchmarks Ego-QA and MAD-QA featuring videos of considerably long length, i.e. 17.5 minutes and 1.9 hours, respectively. Extensive experiments demonstrate the superiority of our framework on these new as well as existing datasets.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Work in progress

arXiv:2405.14442 [pdf, other]

Fully parallel implementation of digital memcomputing on FPGA

Authors: Dyk Chung Nguyen, Yuriy V. Pershin

Abstract: We present a fully parallel digital memcomputing solver implemented on a field-programmable gate array (FPGA) board. For this purpose, we have designed an FPGA code that solves the ordinary differential equations associated with digital memcomputing in parallel. A feature of the code is the use of only integer-type variables and integer constants to enhance optimization. Consequently, each integration step in our solver is executed in 96 ns. This method was utilized for difficult instances of the Boolean satisfiability (SAT) problem close to a phase transition, involving up to about 150 variables. Our results demonstrate that the parallel implementation reduces the scaling exponent by about 1 compared to a sequential C++ code on a standard computer. Additionally, compared to C++ code, we observed a time-to-solution advantage of about three orders of magnitude. Given the limitations of FPGA resources, the current implementation of digital memcomputing will be especially useful for solving compact but challenging problems.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.12201 [pdf, ps, other]

Quantum-symmetric equivalence is a graded Morita invariant

Authors: Hongdi Huang, Van C. Nguyen, Padmini Veerapen, Kent B. Vashaw, Xingting Wang

Abstract: We show that if two $m$-homogeneous algebras have Morita equivalent graded module categories, then they are quantum-symmetrically equivalent, that is, there is a monoidal equivalence between the categories of comodules for their associated universal quantum groups (in the sense of Manin) which sends one algebra to the other. As a consequence, any Zhang twist of an $m$-homogeneous algebra is a 2-cocycle twist by some 2-cocycle from its Manin's universal quantum group.

Submitted 20 May, 2024; originally announced May 2024.

MSC Class: 16T05; 16W50; 17B37

arXiv:2405.08542 [pdf, other]

Industrial Metaverse: Enabling Technologies, Open Problems, and Future Trends

Authors: Shiying Zhang, Jun Li, Long Shi, Ming Ding, Dinh C. Nguyen, Wen Chen, Zhu Han

Abstract: As an emerging technology that enables seamless integration between the physical and virtual worlds, the Metaverse has great potential to be deployed in the industrial production field with the development of extended reality (XR) and next-generation communication networks. This deployment, called the Industrial Metaverse, is used for product design, production operations, industrial quality inspection, and product testing. However, there is a lack of in-depth understanding of the enabling technologies associated with the Industrial Metaverse. This encompasses both the precise industrial scenarios targeted by each technology and the potential migration of technologies developed in other domains to the industrial sector. Driven by this issue, in this article, we conduct a comprehensive survey of the state-of-the-art literature on the Industrial Metaverse. Specifically, we first analyze the advantages of the Metaverse for industrial production. Then, we review a collection of key enabling technologies of the Industrial Metaverse, including blockchain (BC), digital twin (DT), 6G, XR, and artificial intelligence (AI), and analyze how these technologies can support different aspects of industrial production. Subsequently, we present numerous formidable challenges encountered within the Industrial Metaverse, including confidentiality and security concerns, resource limitations, and interoperability constraints. Furthermore, we investigate the extant solutions devised to address them. Finally, we briefly outline several open issues and future research directions of the Industrial Metaverse.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 26 pages, 8 figures

arXiv:2405.02994 [pdf, other]

Extended State Observer for Mismatch Disturbances Using Taylor Approximation of the Integral

Authors: Cuong Duc Nguyen

Abstract: The development of disturbance estimators using extended state observers (ESOs) typically assumes that the system is observable. This paper introduces an improved method for systems that are initially unobservable, leveraging Taylor expansion to approximate the integral of disturbance dynamics. A new extended system is formulated based on this approximation, enabling the design of an observer that achieves exponential stability of the error dynamics. The proposed method's efficacy is demonstrated through a practical example, highlighting its potential for robust disturbance estimation in dynamic systems.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.00306 [pdf, other]

Environment-adaptive machine learning potentials

Authors: Ngoc Cuong Nguyen, Dionysios Sema

Abstract: The development of interatomic potentials that can effectively capture a wide range of atomic environments is a complex challenge for several reasons. Materials can exist in numerous structural forms (e.g., crystalline, amorphous, defects, interfaces) and phases (solid, liquid, gas, plasma). Each form may require different treatment in potential modeling to reflect the real physical behavior correctly. Atoms interact through various forces such as electrostatic, van der Waals, ionic bonding, covalent bonding, and metallic bonding, which manifest differently depending on the chemical elements and their electronic structures. Furthermore, the effective interaction among atoms can change with external conditions like temperature, pressure, and chemical environment. Consequently, creating an interatomic potential that performs well across diverse conditions is difficult since optimizing the potential for one set of conditions can lead to a trade-off in the accuracy of predicted properties associated with other conditions. In this paper, we present a method to construct accurate, efficient and transferable interatomic potentials by adapting to the local atomic environment of each atom within a system. The collection of atomic environments of interest is partitioned into several clusters of atomic environments. Each cluster represents a distinctive local environment and is used to define a corresponding local potential. We introduce a many-body many-potential expansion to smoothly blend these local potentials to ensure global continuity of the potential energy surface. This is achieved by computing the probability functions that determine the likelihood of an atom belonging to each cluster. We apply the environment-adaptive machine learning potentials to predict observable properties for the element Ta and the compound InP, and compare them with density functional theory calculations.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 17 pages, 7 figures, and 10 tables

arXiv:2404.18960 [pdf]

Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods

Authors: Steven Shave, Richard Kasprowicz, Abdullah M. Athar, Denise Vlachou, Neil O. Carragher, Cuong Q. Nguyen

Abstract: The Connectivity Map (CMap) is a large publicly available database of cellular transcriptomic responses to chemical and genetic perturbations built using a standardized acquisition protocol known as the L1000 technique. Databases such as CMap provide an exciting opportunity to enrich drug discovery efforts, providing a 'known' phenotypic landscape to explore and enabling the development of state-of-the-art techniques for enhanced information extraction and better-informed decisions. Whilst multiple methods for measuring phenotypic similarity and interrogating profiles have been developed, the field is severely lacking standardized benchmarks using appropriate data splitting for training and unbiased evaluation of machine learning methods. To address this, we have developed 'Leak Proof CMap' and exemplified its application to a set of common transcriptomic and generic phenotypic similarity methods along with an exemplar triplet loss-based method. Benchmarking in three critical performance areas (compactness, distinctness, and uniqueness) is conducted using carefully crafted data splits ensuring no similar cell lines or treatments with shared or closely matching responses or mechanisms of action are present in training, validation, or test sets. This enables testing of models with unseen samples akin to exploring treatments with novel modes of action in novel patient-derived cell lines. With a carefully crafted benchmark and data splitting regime in place, the tooling now exists to create performant phenotypic similarity methods for use in personalized medicine (novel cell lines) and to better augment high-throughput phenotypic screening technologies with the L1000 transcriptomic technology.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.14908 [pdf, other]

Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation

Authors: Hoang Chuong Nguyen, Tianyu Wang, Jose M. Alvarez, Miaomiao Liu

Abstract: This paper focuses on self-supervised monocular depth estimation in dynamic scenes trained on monocular videos. Existing methods jointly estimate pixel-wise depth and motion, relying mainly on an image reconstruction loss. Dynamic regions remain a critical challenge for these methods due to the inherent ambiguity in depth and motion estimation, resulting in inaccurate depth estimation. This paper proposes a self-supervised training framework exploiting pseudo depth labels for dynamic regions from training data. The key contribution of our framework is to decouple depth estimation for static and dynamic regions of images in the training data. We start with an unsupervised depth estimation approach, which provides reliable depth estimates for static regions and motion cues for dynamic regions and allows us to extract moving object information at the instance level. In the next stage, we use an object network to estimate the depth of those moving objects assuming rigid motions. Then, we propose a new scale alignment module to address the scale ambiguity between estimated depths for static and dynamic regions. We can then use the depth labels generated to train an end-to-end depth estimation network and improve its performance. Extensive experiments on the Cityscapes and KITTI datasets show that our self-training strategy consistently outperforms existing self/unsupervised depth estimation methods.

Submitted 23 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR2024

arXiv:2404.14044 [pdf, other]

HashPoint: Accelerated Point Searching and Sampling for Neural Rendering

Authors: Jiahao Ma, Miaomiao Liu, David Ahmedt-Aristizaba, Chuong Nguyen

Abstract: In this paper, we address the problem of efficient point searching and sampling for volume neural rendering. Within this realm, two typical approaches are employed: rasterization and ray tracing. The rasterization-based methods enable real-time rendering at the cost of increased memory and lower fidelity. In contrast, the ray-tracing-based methods yield superior quality but demand longer rendering time. We solve this problem by our HashPoint method combining these two strategies, leveraging rasterization for efficient point searching and sampling, and ray marching for rendering. Our method optimizes point searching by rasterizing points within the camera's view, organizing them in a hash table, and facilitating rapid searches. Notably, we accelerate the rendering process by adaptive sampling on the primary surface encountered by the ray. Our approach yields substantial speed-up for a range of state-of-the-art ray-tracing-based methods, maintaining equivalent or superior accuracy across synthetic and real test datasets. The code will be available at https://jiahao-ma.github.io/hashpoint/.

Submitted 11 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: CVPR2024 Highlight

Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

arXiv:2404.11792 [pdf, other]

Enhancing Q&A with Domain-Specific Fine-Tuning and Iterative Reasoning: A Comparative Study

Authors: Zooey Nguyen, Anthony Annunziata, Vinh Luong, Sang Dinh, Quynh Le, Anh Hai Ha, Chanh Le, Hong An Phan, Shruti Raghavan, Christopher Nguyen

Abstract: This paper investigates the impact of domain-specific model fine-tuning and of reasoning mechanisms on the performance of question-answering (Q&A) systems powered by large language models (LLMs) and Retrieval-Augmented Generation (RAG). Using the FinanceBench SEC financial filings dataset, we observe that, for RAG, combining a fine-tuned embedding model with a fine-tuned LLM achieves better accuracy than generic models, with relatively greater gains attributable to fine-tuned embedding models. Additionally, employing reasoning iterations on top of RAG delivers an even bigger jump in performance, enabling the Q&A systems to get closer to human-expert quality. We discuss the implications of such findings, propose a structured technical design space capturing major technical components of Q&A AI, and provide recommendations for making high-impact technical choices for such components. We plan to follow up on this work with actionable guides for AI teams and further investigations into the impact of domain-specific augmentation in RAG and into agentic AI capabilities such as advanced planning and reasoning.

Submitted 19 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: Fixed typo of OODA's score on harder-question set in Table 2

arXiv:2404.07626 [pdf, other]

Homography Guided Temporal Fusion for Road Line and Marking Segmentation

Authors: Shan Wang, Chuong Nguyen, Jiawei Liu, Kaihao Zhang, Wenhan Luo, Yanhao Zhang, Sundaram Muthu, Fahira Afzal Maken, Hongdong Li

Abstract: Reliable segmentation of road lines and markings is critical to autonomous driving. Our work is motivated by the observations that road lines and markings are (1) frequently occluded in the presence of moving vehicles, shadow, and glare and (2) highly structured with low intra-class shape variance and overall high appearance consistency. To solve these issues, we propose a Homography Guided Fusion (HomoFusion) module to exploit temporally-adjacent video frames for complementary cues facilitating the correct classification of the partially occluded road lines or markings. To reduce computational complexity, a novel surface normal estimator is proposed to establish spatial correspondences between the sampled frames, allowing the HomoFusion module to perform a pixel-to-pixel attention mechanism in updating the representation of the occluded road lines or markings. Experiments on ApolloScape, a large-scale lane mark segmentation dataset, and ApolloScape Night with artificial simulated night-time road conditions, demonstrate that our method outperforms other existing SOTA lane mark segmentation models with less than 9% of their parameters and computational complexity. We show that exploiting available camera intrinsic data and ground plane assumption for cross-frame correspondence can lead to a light-weight network with significantly improved performances in speed and accuracy. We also prove the versatility of our HomoFusion approach by applying it to the problem of water puddle segmentation and achieving SOTA performance.

Submitted 11 April, 2024; originally announced April 2024.

Comments: Accepted by ICCV 2023

arXiv:2403.19443 [pdf, other]

Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model

Authors: Qi Gou, Cam-Tu Nguyen

Abstract: Large Language Models (LLMs) have become increasingly popular due to their ability to process and generate natural language. However, as they are trained on massive datasets of text, LLMs can inherit harmful biases and produce outputs that are not aligned with human values. This paper studies two main approaches to LLM alignment: Reinforcement Learning with Human Feedback (RLHF) and contrastive learning-based methods like Direct Preference Optimization (DPO). By analyzing the stability and robustness of RLHF and DPO, we propose MPO (Mixed Preference Optimization), a novel method that mitigates the weaknesses of both approaches. Specifically, we propose a two-stage training procedure: first train DPO on an easy dataset, and then perform RLHF on a difficult set with DPO model being the reference model. Here, the easy and difficult sets are constructed by a well-trained reward model that splits response pairs into those with large gaps of reward (easy), and those with small gaps (difficult). The first stage allows us to obtain a relatively optimal policy (LLM) model quickly, whereas the second stage refines LLM with online RLHF, thus mitigating the distribution shift issue associated with DPO. Experiments are conducted on two public alignment datasets, namely HH-RLHF and TLDR, demonstrating the effectiveness of MPO, both in terms of GPT4 and human evaluation.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.17486 [pdf, other]

KDMCSE: Knowledge Distillation Multimodal Sentence Embeddings with Adaptive Angular margin Contrastive Learning

Authors: Cong-Duy Nguyen, Thong Nguyen, Xiaobao Wu, Anh Tuan Luu

Abstract: Previous work on multimodal sentence embedding has proposed multimodal contrastive learning and achieved promising results. However, by taking the rest of the batch as negative samples without reviewing when forming contrastive pairs, those studies encountered many suspicious and noisy negative examples, significantly affecting the methods' overall performance. In this work, we propose KDMCSE (Knowledge Distillation Multimodal contrastive learning of Sentence Embeddings), a novel approach that enhances the discrimination and generalizability of multimodal representation and inherits the knowledge from the teacher model to learn the difference between positive and negative instances and via that, can detect noisy and wrong negative samples effectively before they are calculated in the contrastive objective. Furthermore, to overcome the limitation of modeling the variation within negative pairs, we introduce a new contrastive objective, AdapACSE (Adaptive Angular Margin Supervised Contrastive Learning for Multimodal sentence embeddings), that enhances the discriminative representation by strengthening the margin within the angular space while capturing varying semantics within the negative. Experimental results on widely used Semantic Textual Similarity (STS) benchmarks demonstrate the effectiveness of our approach.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted to NAACL 2024

arXiv:2403.14343 [pdf]

Physics mechanisms of fines detachment and migration during CO2-water corefloods

Authors: C. Nguyen, G. Loi, T. Russell, Y. Yang, N. N. Zulkifli, M. I. Mahamad Amir, A. A. Abdul Manap, S. R. Mohd Shafian, A. Badalyan, P. Bedrikovetsky, A. Zeinijahromi

Abstract: One of the key risks for Carbon Capture and Storage (CCS) is injectivity decline. Evaporation of the connate brine in the near-wellbore region during CO2 injection may dry out the rock, mobilising clay particles whose migration reduces rock permeability and, consequently, well injectivity. Influx of the reservoir brine into the dried-up zone yields accumulation of precipitated salt and injectivity decline. This paper presents the results of eight coreflooding experiments investigating the effects of rock dry-out, fines migration, and salt precipitation during CO2 injection. Pressure drops across the cores, brine saturation, and produced clay fines concentration versus Pore Volume Injected (PVI) have been measured. All lab tests exhibit the following features: intensive fines production at the very beginning of the gas-water production period, followed by reduced-rate fines production during the overall evaporation period and continuous fines disappearance at the late stage; an abrupt increase in gas permeability in the middle of evaporation; and non-monotonic evaporation rate and pressure drop. To explain these phenomena, we distinguished three sequential regimes of fines detachment during two-phase displacement: (i) moving gas-water menisci; (ii) pendular rings of residual water; (iii) dry flux, and found that for the conditions of our corefloods, detachment is possible in regime (i) only.
Fines production during the overall evaporation period is explained by the simultaneous occurrence of the three regimes during unstable displacement of water by gas in micro-heterogeneous rock.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 33 pages, 52 figures

arXiv:2403.07763 [pdf, other]

Emerging Technologies for 6G Non-Terrestrial-Networks: From Academia to Industrial Applications

Authors: Cong T. Nguyen, Yuris Mulya Saputra, Nguyen Van Huynh, Tan N. Nguyen, Dinh Thai Hoang, Diep N Nguyen, Van-Quan Pham, Miroslav Voznak, Symeon Chatzinotas, Dinh-Hieu Tran

Abstract: Terrestrial networks form the fundamental infrastructure of modern communication systems, serving more than 4 billion users globally. However, terrestrial networks are facing a wide range of challenges, from coverage and reliability to interference and congestion. As the demands of the 6G era are expected to be much higher, it is crucial to address these challenges to ensure a robust and efficient communication infrastructure for the future. To address these problems, Non-terrestrial Network (NTN) has emerged to be a promising solution. NTNs are communication networks that leverage airborne (e.g., unmanned aerial vehicles) and spaceborne vehicles (e.g., satellites) to facilitate ultra-reliable communications and connectivity with high data rates and low latency over expansive regions. This article aims to provide a comprehensive survey on the utilization of network slicing, Artificial Intelligence/Machine Learning (AI/ML), and Open Radio Access Network (ORAN) to address diverse challenges of NTNs from the perspectives of both academia and industry. Particularly, we first provide an in-depth tutorial on NTN and the key enabling technologies including network slicing, AI/ML, and ORAN. Then, we provide a comprehensive survey on how network slicing and AI/ML have been leveraged to overcome the challenges that NTNs are facing. Moreover, we present how ORAN can be utilized for NTNs. Finally, we highlight important challenges, open issues, and future research directions of NTN in the 6G era.

Submitted 3 July, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: 35 pages

arXiv:2403.03004 [pdf, other]

Ultralight vector dark matter search using data from the KAGRA O3GK run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, I. Abouelfettouh, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, et al. (1778 additional authors not shown)

Abstract: Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 20 pages, 5 figures

Report number: LIGO-P2300250

arXiv:2403.02548 [pdf, ps, other]

The least primary factor of the multiplicative group

Authors: Greg Martin, Chau Nguyen

Abstract: Let $S(n)$ denote the least primary factor in the primary decomposition of the multiplicative group $M_n = (\Bbb Z/n\Bbb Z)^\times$. We give an asymptotic formula, with order of magnitude $x/(\log x)^{1/2}$, for the counting function of those integers $n$ for which $S(n) \ne 2$. We also give an asymptotic formula, for any prime power $q$, for the counting function of those integers $n$ for which $S(n) = q$. This group-theoretic problem can be reduced to problems of counting integers with restrictions on their prime factors, allowing it to be addressed by classical techniques of analytic number theory.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 16 pages

MSC Class: 11N25; 11N37; 11N45; 11N64; 20K01

arXiv:2402.18998 [pdf, other]

COFT-AD: COntrastive Fine-Tuning for Few-Shot Anomaly Detection

Authors: Jingyi Liao, Xun Xu, Manh Cuong Nguyen, Adam Goodge, Chuan Sheng Foo

Abstract: Existing approaches towards anomaly detection (AD) often rely on a substantial amount of anomaly-free data to train representation and density models. However, large anomaly-free datasets may not always be available before the inference stage; in which case an anomaly detection model must be trained with only a handful of normal samples, a.k.a. few-shot anomaly detection (FSAD). In this paper, we propose a novel methodology to address the challenge of FSAD which incorporates two important techniques. Firstly, we employ a model pre-trained on a large source dataset to initialize model weights. Secondly, to ameliorate the covariate shift between source and target domains, we adopt contrastive training to fine-tune on the few-shot target domain data. To learn suitable representations for the downstream AD task, we additionally incorporate cross-instance positive pairs to encourage a tight cluster of the normal samples, and negative pairs for better separation between normal and synthesized negative samples. We evaluate few-shot anomaly detection on 3 controlled AD tasks and 4 real-world AD tasks to demonstrate the effectiveness of the proposed method.

Submitted 29 February, 2024; originally announced February 2024.

Comments: IEEE Transactions on Image Processing

arXiv:2402.17269 [pdf, other]

Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition

Authors: Cam-Van Thi Nguyen, Cao-Bach Nguyen, Quang-Thuy Ha, Duc-Trong Le

Abstract: Emotion recognition in conversation (ERC) is a crucial task in natural language processing and affective computing. This paper proposes MultiDAG+CL, a novel approach for Multimodal Emotion Recognition in Conversation (ERC) that employs Directed Acyclic Graph (DAG) to integrate textual, acoustic, and visual features within a unified framework. The model is enhanced by Curriculum Learning (CL) to address challenges related to emotional shifts and data imbalance. Curriculum learning facilitates the learning process by gradually presenting training samples in a meaningful order, thereby improving the model's performance in handling emotional variations and data imbalance. Experimental results on the IEMOCAP and MELD datasets demonstrate that the MultiDAG+CL models outperform baseline models. We release the code for MultiDAG+CL and experiments: https://github.com/vanntc711/MultiDAG-CL

Submitted 8 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted by LREC-COLING 2024

arXiv:2402.15677 [pdf, other]

Consensus seeking in diffusive multidimensional networks with a repeated interaction pattern and time-delays

Authors: Hoang Huy Vu, Quyen Ngoc Nguyen, Chuong Van Nguyen, Tuynh Van Pham, Minh Hoang Trinh

Abstract: This paper studies a consensus problem in multidimensional networks having the same agent-to-agent interaction pattern under both intra- and cross-layer time delays. Several conditions for the agents to globally asymptotically achieve a consensus are derived, which involve the overall network's structure, the local interacting pattern, and the values of the time delays. The validity of these conditions is proved by direct eigenvalue evaluation and supported by numerical simulations.

Submitted 23 February, 2024; originally announced February 2024.

Comments: 6 pages, 7 figures, submitted to a journal

arXiv:2402.13549 [pdf, ps, other]

Q-learning-based Joint Design of Adaptive Modulation and Precoding for Physical Layer Security in Visible Light Communications

Authors: Duc M. T. Hoang, Thanh V. Pham, Anh T. Pham, Chuyen T Nguyen

Abstract: There has been an increasing interest in physical layer security (PLS), which, compared with conventional cryptography, offers a unique approach to guaranteeing information confidentiality against eavesdroppers. In this paper, we study a joint design of adaptive $M$-ary pulse amplitude modulation (PAM) and precoding, which aims to optimize wiretap visible-light channels' secrecy capacity and bit error rate (BER) performances. The proposed design is motivated by higher-order modulation, which results in better secrecy capacity at the expense of a higher BER. On the other hand, a proper precoding design, which can manipulate the received signal quality at the legitimate user and the eavesdropper, can also enhance secrecy performance and influence the BER. A reward function that considers the secrecy capacity and the BERs of the legitimate user's (Bob) and the eavesdropper's (Eve) channels is introduced and maximized. Due to the non-linearity and complexity of the reward function, it is challenging to solve the optical design using classical optimization techniques. Therefore, reinforcement learning-based designs using Q-learning and Deep Q-learning are proposed to maximize the reward function. Simulation results verify that compared with the baseline designs, the proposed joint designs achieve better reward values while maintaining the BER of Bob's channel (Eve's channel) well below (above) the pre-FEC (forward error correction) BER threshold.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.12503 [pdf, other]

PARCv2: Physics-aware Recurrent Convolutional Neural Networks for Spatiotemporal Dynamics Modeling

Authors: Phong C. H. Nguyen, Xinlun Cheng, Shahab Azarfar, Pradeep Seshadri, Yen T. Nguyen, Munho Kim, Sanghun Choi, H. S. Udaykumar, Stephen Baek

Abstract: Modeling unsteady, fast transient, and advection-dominated physics problems is a pressing challenge for physics-aware deep learning (PADL). The physics of complex systems is governed by large systems of partial differential equations (PDEs) and ancillary constitutive models with nonlinear structures, as well as evolving state fields exhibiting sharp gradients and rapidly deforming material interfaces. Here, we investigate an inductive bias approach that is versatile and generalizable to model generic nonlinear field evolution problems. Our study focuses on the recent physics-aware recurrent convolutions (PARC), which incorporates a differentiator-integrator architecture that inductively models the spatiotemporal dynamics of generic physical systems. We extend the capabilities of PARC to simulate unsteady, transient, and advection-dominant systems. The extended model, referred to as PARCv2, is equipped with differential operators to model advection-reaction-diffusion equations, as well as a hybrid integral solver for stable, long-time predictions. PARCv2 is tested on both standard benchmark problems in fluid dynamics, namely Burgers and Navier-Stokes equations, and then applied to more complex shock-induced reaction problems in energetic materials. We evaluate the behavior of PARCv2 in comparison to other physics-informed and learning bias models and demonstrate its potential to model unsteady and advection-dominant dynamics regimes.

Submitted 24 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.07577 [pdf, other]

Topic Modeling as Multi-Objective Contrastive Optimization

Authors: Thong Nguyen, Xiaobao Wu, Xinshuai Dong, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu

Abstract: Recent representation learning approaches enhance neural topic models by optimizing the weighted linear combination of the evidence lower bound (ELBO) of the log-likelihood and the contrastive learning objective that contrasts pairs of input documents. However, document-level contrastive learning might capture low-level mutual information, such as word ratio, which disturbs topic modeling. Moreover, there is a potential conflict between the ELBO loss that memorizes input details for better reconstruction quality, and the contrastive loss which attempts to learn topic representations that generalize among input documents. To address these issues, we first introduce a novel contrastive learning method oriented towards sets of topic vectors to capture useful semantics that are shared among a set of input documents. Secondly, we explicitly cast contrastive topic modeling as a gradient-based multi-objective optimization problem, with the goal of achieving a Pareto stationary solution that balances the trade-off between the ELBO and the contrastive objective. Extensive experiments demonstrate that our framework consistently produces higher-performing neural topic models in terms of topic coherence, topic diversity, and downstream performance.

Submitted 9 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: Accepted at ICLR 2024 (poster)

arXiv:2402.06682 [pdf, other]

Private Knowledge Sharing in Distributed Learning: A Survey

Authors: Yasas Supeksala, Dinh C. Nguyen, Ming Ding, Thilina Ranbaduge, Calson Chua, Jun Zhang, Jun Li, H. Vincent Poor

Abstract: The rise of Artificial Intelligence (AI) has revolutionized numerous industries and transformed the way society operates. Its widespread use has led to the distribution of AI and its underlying data across many intelligent systems. In this light, it is crucial to utilize information in learning processes that are either distributed or owned by different entities. As a result, modern data-driven services have been developed to integrate distributed knowledge entities into their outcomes. In line with this goal, the latest AI models are frequently trained in a decentralized manner. Distributed learning involves multiple entities working together to make collective predictions and decisions. However, this collaboration can also bring about security vulnerabilities and challenges. This paper provides an in-depth survey on private knowledge sharing in distributed learning, examining various knowledge components utilized in leading distributed learning architectures. Our analysis sheds light on the most critical vulnerabilities that may arise when using these components in a distributed setting. We further identify and examine defensive strategies for preserving the privacy of these knowledge components and preventing malicious parties from manipulating or accessing the knowledge information. Finally, we highlight several key limitations of knowledge sharing in distributed learning and explore potential avenues for future research.

Submitted 8 February, 2024; originally announced February 2024.

Comments: Manuscript submitted to ACM

arXiv:2402.03832 [pdf, other]

Rethinking Skill Extraction in the Job Market Domain using Large Language Models

Authors: Khanh Cao Nguyen, Mike Zhang, Syrielle Montariol, Antoine Bosselut

Abstract: Skill Extraction involves identifying skills and qualifications mentioned in documents such as job postings and resumes. The task is commonly tackled by training supervised models using a sequence labeling approach with BIO tags. However, the reliance on manually annotated data limits the generalizability of such approaches. Moreover, the common BIO setting limits the ability of the models to capture complex skill patterns and handle ambiguous mentions. In this paper, we explore the use of in-context learning to overcome these challenges, on a benchmark of 6 uniformized skill extraction datasets. Our approach leverages the few-shot learning capabilities of large language models (LLMs) to identify and extract skills from sentences. We show that LLMs, despite not being on par with traditional supervised models in terms of performance, can better handle syntactically complex skill mentions in skill extraction tasks.

Submitted 6 February, 2024; originally announced February 2024.

Comments: Published at NLP4HR 2024 (EACL Workshop)

arXiv:2402.02319 [pdf]

Smart Textile-Driven Soft Spine Exosuit for Lifting Tasks in Industrial Applications

Authors: Kefan Zhu, Bibhu Sharma, Phuoc Thien Phan, James Davies, Mai Thanh Thai, Trung Thien Hoang, Chi Cong Nguyen, Adrienne Ji, Emanuele Nicotra, Nigel H. Lovell, Thanh Nho Do

Abstract: Work-related musculoskeletal disorders (WMSDs) are often caused by repetitive lifting, making them a significant concern in occupational health. Although wearable assist devices have become the norm for mitigating the risk of back pain, most spinal assist devices still possess a partially rigid structure that impairs user comfort and flexibility. This paper addresses this issue by presenting a smart textile actuated spine assistance robotic exosuit (SARE), which can conform to the back seamlessly without impeding the user's movement and is incredibly lightweight. The SARE can assist the human erector spinae to complete any action with virtually infinite degrees of freedom. To detect the strain on the spine and to control the smart textile automatically, a soft knitting sensor that uses fluid pressure as the sensing element is employed. The new device is validated experimentally with human subjects, where it reduces peak electromyography (EMG) signals of the lumbar erector spinae by around 32 percent in loaded and around 22 percent in unloaded conditions. Moreover, the integrated EMG decreased by around 24.2 percent under the loaded condition and around 23.6 percent under the unloaded condition. In summary, the artificial muscle wearable device represents an anatomical solution to reduce the risk of muscle strain, metabolic energy cost and back pain associated with repetitive lifting tasks.

Submitted 3 February, 2024; originally announced February 2024.

Comments: 6 pages, 7 figures

arXiv:2402.02021 [pdf, other]

Transfer Learning in ECG Diagnosis: Is It Effective?

Authors: Cuong V. Nguyen, Cuong D. Do

Abstract: The adoption of deep learning in ECG diagnosis is often hindered by the scarcity of large, well-labeled datasets in real-world scenarios, leading to the use of transfer learning to leverage features learned from larger datasets. Yet the prevailing assumption that transfer learning consistently outperforms training from scratch has never been systematically validated. In this study, we conduct the first extensive empirical study on the effectiveness of transfer learning in multi-label ECG classification, by comparing fine-tuning performance with that of training from scratch, covering a variety of ECG datasets and deep neural networks. We confirm that fine-tuning is the preferable choice for small downstream datasets; however, when the dataset is sufficiently large, training from scratch can achieve comparable performance, albeit requiring a longer training time to catch up. Furthermore, we find that transfer learning exhibits better compatibility with convolutional neural networks than with recurrent neural networks, which are the two most prevalent architectures for time-series ECG applications. Our results underscore the importance of transfer learning in ECG diagnosis, yet depending on the amount of available data, researchers may opt not to use it, considering the non-negligible cost associated with pre-training.

Submitted 26 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.17897 [pdf, ps, other]

Employing Label Models on ChatGPT Answers Improves Legal Text Entailment Performance

Authors: Chau Nguyen, Le-Minh Nguyen

Abstract: The objective of legal text entailment is to ascertain whether the assertions in a legal query logically follow from the information provided in one or multiple legal articles. ChatGPT, a large language model, is robust in many natural language processing tasks, including legal text entailment: when we set the temperature to 0 (so that the ChatGPT answers are deterministic) and prompt the model, it achieves 70.64% accuracy on the COLIEE 2022 dataset, which outperforms the previous SOTA of 67.89%. On the other hand, if the temperature is larger than zero, ChatGPT answers are not deterministic, leading to inconsistent answers and fluctuating results. We propose to leverage label models (a fundamental component of weak supervision techniques) to integrate the provisional answers by ChatGPT into consolidated labels. In this way, we treat ChatGPT provisional answers as noisy predictions which can be consolidated by label models. The experimental results demonstrate that this approach can attain an accuracy of 76.15%, marking a significant improvement of 8.26 percentage points over the prior state-of-the-art benchmark. Additionally, we analyze the instances where ChatGPT produces incorrect answers and classify the errors, offering insights that could guide potential enhancements for future research endeavors.

Submitted 31 January, 2024; originally announced January 2024.

Comments: 15 pages

arXiv:2401.15625 [pdf, other]

Generative AI-enabled Blockchain Networks: Fundamentals, Applications, and Case Study

Authors: Cong T. Nguyen, Yinqiu Liu, Hongyang Du, Dinh Thai Hoang, Dusit Niyato, Diep N. Nguyen, Shiwen Mao

Abstract: Generative Artificial Intelligence (GAI) has recently emerged as a promising solution to address critical challenges of blockchain technology, including scalability, security, privacy, and interoperability. In this paper, we first introduce GAI techniques, outline their applications, and discuss existing solutions for integrating GAI into blockchains. Then, we discuss emerging solutions that demonstrate the effectiveness of GAI in addressing various challenges of blockchain, such as detecting unknown blockchain attacks and smart contract vulnerabilities, designing key secret sharing schemes, and enhancing privacy. Moreover, we present a case study to demonstrate that GAI, specifically the generative diffusion model, can be employed to optimize blockchain network performance metrics. Experimental results clearly show that, compared to a baseline traditional AI approach, the proposed generative diffusion model approach can converge faster, achieve higher rewards, and significantly improve the throughput and latency of the blockchain network. Additionally, we highlight future research directions for GAI in blockchain applications, including personalized GAI-enabled blockchains, GAI-blockchain synergy, and privacy and security considerations within blockchain ecosystems.

Submitted 28 January, 2024; originally announced January 2024.

arXiv:2401.14420 [pdf, other]

A Novel Blockchain Based Information Management Framework for Web 3.0

Authors: Md Arif Hassan, Cong T. Nguyen, Chi-Hieu Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Eryk Dutkiewicz

Abstract: Web 3.0 is the third generation of the World Wide Web (WWW), concentrating on the critical concepts of decentralization, availability, and increasing client usability. Although Web 3.0 is undoubtedly an essential component of the future Internet, it currently faces critical challenges, including decentralized data collection and management. To overcome these challenges, blockchain has emerged as one of the core technologies for the future development of Web 3.0. In this paper, we propose a novel blockchain-based information management framework, namely Smart Blockchain-based Web (SBW), to manage information in Web 3.0 effectively, enhance the security and privacy of users' data, bring additional profits, and incentivize users to contribute information to the websites. In particular, SBW utilizes blockchain technology and smart contracts to manage the decentralized data collection process for Web 3.0 effectively. Moreover, in this framework, we develop an effective consensus mechanism based on Proof-of-Stake to reward users' information contributions and conduct a game-theoretical analysis of users' behavior in the considered system. Additionally, we conduct simulations to assess the performance of SBW and investigate the impact of critical parameters on information contribution. The findings confirm our theoretical analysis and demonstrate that our proposed consensus mechanism can incentivize the nodes and users to contribute more information to our systems.

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.14113 [pdf, other]

On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling

Authors: Xiaobao Wu, Fengjun Pan, Thong Nguyen, Yichao Feng, Chaoqun Liu, Cong-Duy Nguyen, Anh Tuan Luu

Abstract: Hierarchical topic modeling aims to discover latent topics from a corpus and organize them into a hierarchy to understand documents with desirable semantic granularity. However, existing work tends to produce topic hierarchies of low affinity, rationality, and diversity, which hampers document understanding. To overcome these challenges, in this paper we propose the Transport Plan and Context-aware Hierarchical Topic Model (TraCo). Instead of the simple topic dependencies used in earlier work, we propose a transport plan dependency method. It constrains dependencies to ensure their sparsity and balance, and also regularizes topic hierarchy building with them. This improves the affinity and diversity of hierarchies. We further propose a context-aware disentangled decoder. In contrast to the entangled decoding used previously, it distributes different semantic granularity to topics at different levels by disentangled decoding. This facilitates the rationality of hierarchies. Experiments on benchmark datasets demonstrate that our method surpasses state-of-the-art baselines, effectively improving the affinity, rationality, and diversity of hierarchical topic modeling with better performance on downstream tasks.

Submitted 31 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: Accepted to AAAI2024 conference. Our code is available at https://github.com/bobxwu/TraCo

arXiv:2401.10901 [pdf, other]

Enabling Technologies for Web 3.0: A Comprehensive Survey

Authors: Md Arif Hassan, Mohammad Behdad Jamshidi, Bui Duc Manh, Nam H. Chu, Chi-Hieu Nguyen, Nguyen Quang Hieu, Cong T. Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Nguyen Van Huynh, Mohammad Abu Alsheikh, Eryk Dutkiewicz

Abstract: Web 3.0 represents the next stage of Internet evolution, aiming to empower users with increased autonomy, efficiency, quality, security, and privacy. This evolution can potentially democratize content access by utilizing the latest developments in enabling technologies. In this paper, we conduct an in-depth survey of enabling technologies in the context of Web 3.0, such as blockchain, semantic web, 3D interactive web, Metaverse, Virtual reality/Augmented reality, Internet of Things technology, and their roles in shaping Web 3.0. We commence by providing a comprehensive background of Web 3.0, including its concept, basic architecture, potential applications, and industry adoption. Subsequently, we examine recent breakthroughs in IoT, 5G, and blockchain technologies that are pivotal to Web 3.0 development. Following that, other enabling technologies, including AI, semantic web, and 3D interactive web, are discussed. Utilizing these technologies can effectively address the critical challenges in realizing Web 3.0, such as ensuring decentralized identity, platform interoperability, data transparency, reducing latency, and enhancing the system's scalability. Finally, we highlight significant challenges associated with Web 3.0 implementation, emphasizing potential solutions and providing insights into future research directions in this field.

Submitted 29 December, 2023; originally announced January 2024.

arXiv:2401.08723 [pdf, other]

HierSFL: Local Differential Privacy-aided Split Federated Learning in Mobile Edge Computing

Authors: Minh K. Quan, Dinh C. Nguyen, Van-Dinh Nguyen, Mayuri Wijayasundara, Sujeeva Setunge, Pubudu N. Pathirana

Abstract: Federated Learning is a promising approach for learning from user data while preserving data privacy. However, the high requirements of the model training process make it difficult for clients with limited memory or bandwidth to participate. To tackle this problem, Split Federated Learning is utilized, where clients upload their intermediate model training outcomes to a cloud server for collaborative server-client model training. This methodology facilitates resource-constrained clients' participation in model training but also increases the training time and communication overhead. To overcome these limitations, we propose a novel algorithm, called Hierarchical Split Federated Learning (HierSFL), that amalgamates models at the edge and cloud phases, presenting qualitative directives for determining the best aggregation timeframes to reduce computation and communication expenses. By implementing local differential privacy at the client and edge server levels, we enhance privacy during local model parameter updates. Our experiments using CIFAR-10 and MNIST datasets show that HierSFL outperforms standard FL approaches with better training accuracy, training time, and communication-computing trade-offs. HierSFL offers a promising solution to mobile edge computing's challenges, ultimately leading to faster content delivery and improved mobile service quality.

Submitted 16 January, 2024; originally announced January 2024.

Comments: 6 Pages, 5 figures, IEEE Virtual Conference on Communications 2023

arXiv:2401.03917 [pdf, other]

Toward a comprehensive simulation framework for hypergraphs: a Python-base approach

Authors: Quoc Chuong Nguyen, Trung Kien Le

Abstract: Hypergraphs, a generalization of graphs in which edges can contain more than two nodes, have become increasingly prominent in complex network analysis. Unlike graphs, hypergraphs have relatively few supporting platforms, and this dearth presents a barrier to more widespread adoption of hypergraph computational toolboxes that could enable further research in several areas. Here, we introduce HyperRD, a Python package for hypergraph computation, simulation, and interoperability with other powerful Python packages in graph and hypergraph research. We then introduce two models on hypergraphs, the general Schelling's model and the SIR model, and simulate them with HyperRD.

Submitted 8 January, 2024; originally announced January 2024.

Comments: 13 pages, 3 figures

arXiv:2401.03551 [pdf, other]

CAPTAIN at COLIEE 2023: Efficient Methods for Legal Information Retrieval and Entailment Tasks

Authors: Chau Nguyen, Phuong Nguyen, Thanh Tran, Dat Nguyen, An Trieu, Tin Pham, Anh Dang, Le-Minh Nguyen

Abstract: The Competition on Legal Information Extraction/Entailment (COLIEE) is held annually to encourage advancements in the automatic processing of legal texts. Processing legal documents is challenging due to the intricate structure and meaning of legal language. In this paper, we outline our strategies for tackling Task 2, Task 3, and Task 4 in the COLIEE 2023 competition. Our approach involved utilizing appropriate state-of-the-art deep learning methods, designing methods based on domain characteristics observation, and applying meticulous engineering practices and methodologies to the competition. As a result, our performance in these tasks has been outstanding, with first places in Task 2 and Task 3, and promising results in Task 4. Our source code is available at https://github.com/Nguyen2015/CAPTAIN-COLIEE2023/tree/coliee2023.

Submitted 7 January, 2024; originally announced January 2024.

arXiv:2401.02093 [pdf, ps, other]

A new approach to convergence analysis of iterative models with optimal error bounds

Authors: Minh-Phuong Tran, Thanh-Nhan Nguyen, Thai-Hung Nguyen, Tan-Phuc Nguyen, Tien-Khai Nguyen, Cong-Duy-Nguyen Nguyen, Trung-Hieu Huynh

Abstract: In this paper, we study a new approach related to the convergence analysis of Ishikawa-type iterative models to a common fixed point of two non-expansive mappings in Banach spaces. The main novelty of our contribution lies in the so-called optimal error bounds, which establish some necessary and sufficient conditions for convergence and yield both error estimates and bounds on the convergence rates for iterative schemes. Although a special interest here is devoted to the Ishikawa and modified Ishikawa iterative sequences, the theory of optimal error bounds proposed in this paper can also be favorably applied to various types of iterative models to approximate common fixed points of non-expansive mappings.

Submitted 4 January, 2024; originally announced January 2024.

Comments: 29 pages, 9 figures

arXiv:2401.00165 [pdf, other]

Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization

Authors: Shiqi Wang, Yeqin Zhang, Cam-Tu Nguyen

Abstract: In open-domain Question Answering (QA), dense retrieval is crucial for finding relevant passages for answer generation. Typically, contrastive learning is used to train a retrieval model that maps passages and queries to the same semantic space. The objective is to make similar ones closer and dissimilar ones further apart. However, training such a system is challenging due to the false negative issue, where relevant passages may be missed during data annotation. Hard negative sampling, which is commonly used to improve contrastive learning, can introduce more noise in training. This is because hard negatives are those closer to a given query, and thus more likely to be false negatives. To address this issue, we propose a novel contrastive confidence regularizer for Noise Contrastive Estimation (NCE) loss, a commonly used loss for dense retrieval. Our analysis shows that the regularizer helps dense retrieval models be more robust against false negatives with a theoretical guarantee. Additionally, we propose a model-agnostic method to filter out noisy negative passages in the dataset, improving any downstream dense retrieval models. Through experiments on three datasets, we demonstrate that our method achieves better retrieval performance in comparison to existing state-of-the-art dense retrieval systems.

Submitted 13 January, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

Comments: Accepted by AAAI24

arXiv:2312.17619 [pdf, other]

Discontinuous Galerkin Methods for Hypersonic Flows

Authors: Dominique S. Hoskin, R. Loek Van Heyningen, Ngoc Cuong Nguyen, Jordi Vila-Pérez, Wesley L. Harris, Jaime Peraire

Abstract: In recent years, high-order discontinuous Galerkin (DG) methods have emerged as an attractive approach for numerical simulations of compressible flows. This paper presents an overview of the recent development of DG methods for compressible flows with particular focus on hypersonic flows. First, we survey state-of-the-art DG methods for computational fluid dynamics. Next, we discuss both matrix-based and matrix-free iterative methods for the solution of discrete systems stemming from the spatial DG discretizations of the compressible Navier-Stokes equations. We then describe various shock capturing methods to deal with strong shock waves in hypersonic flows. We discuss adaptivity techniques to refine high-order meshes, and synthetic boundary conditions to simulate free-stream disturbances in hypersonic boundary layers. We present a few examples to demonstrate the ability of high-order DG methods to provide accurate solutions of hypersonic laminar flows. Furthermore, we present direct numerical simulations of hypersonic transitional flow past a flared cone at Reynolds number $10.8 \times 10^6$, and hypersonic transitional shock wave boundary layer interaction flow over a flat plate at Reynolds number $3.97 \times 10^6$. These simulations run entirely on hundreds of graphics processing units (GPUs) and demonstrate the ability of DG methods to directly resolve hypersonic transitional flows, even at high Reynolds numbers, without relying on transition or turbulence models. We end the paper by offering our perspectives on error estimation, turbulence modeling, and real gas effects in hypersonic flows.

Submitted 29 December, 2023; originally announced December 2023.

Comments: 34 pages, 25 figures, and 1 table

MSC Class: 76M10; 76F65; 76K05

arXiv:2312.16631 [pdf, other]

doi 10.1103/PhysRevD.109.092008

Measurement of Electron Neutrino and Antineutrino Cross Sections at Low Momentum Transfer

Authors: S. Henry, H. Su, S. Akhter, Z. Ahmad Dar, V. Ansari, M. V. Ascencio, M. Sajjad Athar, A. Bashyal, M. Betancourt, J. L. Bonilla, A. Bravar, G. Caceres, G. A. Díaz, J. Felix, L. Fields, R. Fine, P. K. Gaur, S. M. Gilligan, R. Gran, E. Granados, D. A. Harris, A. L. Hart, J. Kleykamp, A. Klustová, M. Kordosky, et al. (31 additional authors not shown)

Abstract: Accelerator-based neutrino oscillation experiments seek to measure the relative number of electron and muon neutrinos and antineutrinos at different $L/E$ values. However, high-statistics studies of neutrino interactions are almost exclusively performed using muon neutrinos and antineutrinos, since the dominant flavor of neutrinos produced by accelerator-based beams is of the muon type. This work reports new measurements of electron neutrino and antineutrino interactions in hydrocarbon, obtained by strongly suppressing backgrounds initiated by muon flavor neutrinos and antineutrinos. Double differential cross sections as a function of visible energy transfer, $E_\text{avail}$, and transverse momentum transfer, $p_T$, or three-momentum transfer, $q_3$, are presented.

Submitted 16 April, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

Comments: 25 pages, 32 figures and 7 tables, accepted for publication in Physical Review D. Revised to add content updated in review process

Report number: FERMILAB-PUB-23-0830-PPD

Journal ref: Physical Review D 109, 092008 (2024)

arXiv:2312.06950 [pdf, other]

READ-PVLA: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling

Authors: Thong Nguyen, Xiaobao Wu, Xinshuai Dong, Khoi Le, Zhiyuan Hu, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

Abstract: Fully fine-tuning pretrained large-scale transformer models has become a popular paradigm for video-language modeling tasks, such as temporal language grounding and video-language summarization. With a growing number of tasks and limited training data, such a full fine-tuning approach leads to costly model storage and unstable training. To overcome these shortcomings, we introduce lightweight adapters to the pre-trained model and only update them at fine-tuning time. However, existing adapters fail to capture intrinsic temporal relations among video frames or textual words. Moreover, they neglect the preservation of critical task-related information that flows from the raw video-language input into the adapter's low-dimensional space. To address these issues, we first propose a novel REcurrent ADapter (READ) that employs recurrent computation to enable temporal modeling capability. Second, we propose a Partial Video-Language Alignment (PVLA) objective that uses partial optimal transport to maintain task-related information flowing into our READ modules. We validate our READ-PVLA framework through extensive experiments where READ-PVLA significantly outperforms all existing fine-tuning strategies on multiple low-resource temporal language grounding and video-language summarization benchmarks.

Submitted 11 December, 2023; originally announced December 2023.

Comments: Accepted at AAAI 2024

Showing 1–50 of 618 results for author: Nguyen, C