-
A fitted space-time finite element method for an advection-diffusion problem with moving interfaces
Authors:
Quang Huy Nguyen,
Van Chien Le,
Phuong Cuc Hoang,
Thi Thanh Mai Ta
Abstract:
This paper presents a fitted space-time finite element method for solving a parabolic advection-diffusion problem with a nonstationary interface. The jumping diffusion coefficient gives rise to a discontinuity of the spatial gradient of the solution across the interface. We use the Banach-Nečas-Babuška theorem to show the well-posedness of the continuous variational problem. A fully discrete finite-element-based scheme is analyzed using the Galerkin method and unstructured fitted meshes. An optimal error estimate is established in a discrete energy norm under appropriate globally low but locally high regularity conditions. Numerical results corroborate our theoretical results.
Submitted 11 July, 2024;
originally announced July 2024.
-
Multicell-Fold: geometric learning in folding multicellular life
Authors:
Haiqian Yang,
Anh Q. Nguyen,
Dapeng Bi,
Markus J. Buehler,
Ming Guo
Abstract:
During developmental processes such as embryogenesis, how a group of cells folds into specific structures is a central question in biology that defines how living organisms form. Establishing tissue-level morphology critically relies on how every single cell decides to position itself relative to its neighboring cells. Despite its importance, it remains a major challenge to understand and predict the behavior of every cell within the living tissue over time during such intricate processes. To tackle this question, we propose a geometric deep learning model that can predict multicellular folding and embryogenesis, accurately capturing the highly convoluted spatial interactions among cells. We demonstrate that multicellular data can be represented with both granular and foam-like physical pictures through a unified graph data structure, considering both cellular interactions and cell junction networks. We successfully use our model to achieve two important tasks: interpretable 4-D morphological sequence alignment, and prediction of local cell rearrangements before they occur at single-cell resolution. Furthermore, using an activation map and ablation studies, we demonstrate that cell geometries and cell junction networks together regulate local cell rearrangement, which is critical for embryo morphogenesis. This approach provides a novel paradigm to study morphogenesis, highlighting a unified data structure and harnessing the power of geometric deep learning to accurately model the mechanisms and behaviors of cells during development. It offers a pathway toward creating a unified dynamic morphological atlas for a variety of developmental processes such as embryogenesis.
Submitted 9 July, 2024;
originally announced July 2024.
-
Well-posedness for local and nonlocal quasilinear evolution equations in fluids and geometry
Authors:
Ke Chen,
Ruilin Hu,
Quoc-Hung Nguyen
Abstract:
We establish a Schauder-type estimate for the general local and non-local linear parabolic system $$\partial_tu+\mathbf{L}_su=Λ^γf+g$$ in $(0,\infty)\times\mathbb{R}^d$, where $Λ=(-Δ)^{\frac{1}{2}}$, $0<γ\leq s$, and $\mathbf{L}_s$ is the pseudo-differential operator defined by \begin{equation}
\mathbf{L}_su(t,x)=(2π)^{-\frac{d}{2}}\int_{\mathbb{R}^d}\mathsf{A}(t,x,ξ)\hat u(t,ξ)e^{ix\cdotξ}dξ,\quad\quad \mathsf{A}(t,x,ξ)\sim |ξ|^s.
\end{equation}
To prove this, we develop a new freezing-coefficient method for the kernel: we freeze the coefficient at $x_0$, derive a representation formula for the solution, and finally take $x_0=x$ when estimating the solution.
By applying our Schauder-type estimate to suitably chosen differential operators $\mathcal{L}_s$, we obtain critical well-posedness results of various local and non-local nonlinear evolution equations in geometry and fluids, including hypoviscous Navier--Stokes equations, the surface quasi-geostrophic equation, mean curvature equations, Willmore flow, surface diffusion flow, Peskin equations, thin-film equations and Muskat equations.
Submitted 7 July, 2024;
originally announced July 2024.
-
Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model
Authors:
Duy M. H. Nguyen,
An T. Le,
Trung Q. Nguyen,
Nghiem T. Diep,
Tai Nguyen,
Duy Duong-Tran,
Jan Peters,
Li Shen,
Mathias Niepert,
Daniel Sonntag
Abstract:
Prompt learning methods are gaining increasing attention due to their ability to customize large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs, often struggling with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we consider a new framework based on a dual context of both domain-shared and class-specific contexts, where the latter is generated by Large Language Models (LLMs) such as GPTs. Such dual prompt methods enhance the model's feature representation by joining implicit and explicit factors encoded in LLM knowledge. Moreover, we formulate the Unbalanced Optimal Transport (UOT) theory to quantify the relationships between constructed prompts and visual tokens. Through partial matching, UOT can properly align discrete sets of visual tokens and prompt embeddings under different mass distributions, which is particularly valuable for handling irrelevant or noisy elements, ensuring that the preservation of mass does not restrict transport solutions. Furthermore, UOT's characteristics integrate seamlessly with image augmentation, expanding the training sample pool while maintaining a reasonable distance between perturbed images and prompt inputs. Extensive experiments across few-shot classification and adapter settings substantiate the superiority of our model over current state-of-the-art baselines.
Submitted 5 July, 2024;
originally announced July 2024.
-
Heterogeneous Hypergraph Embedding for Recommendation Systems
Authors:
Darnbi Sakong,
Viet Hung Vu,
Thanh Trung Huynh,
Phi Le Nguyen,
Hongzhi Yin,
Quoc Viet Hung Nguyen,
Thanh Tam Nguyen
Abstract:
Recent advancements in recommender systems have focused on integrating knowledge graphs (KGs) to leverage their auxiliary information. The core idea of KG-enhanced recommenders is to incorporate rich semantic information for more accurate recommendations. However, two main challenges persist: i) Neglecting complex higher-order interactions in the KG-based user-item network, potentially leading to sub-optimal recommendations, and ii) Dealing with the heterogeneous modalities of input sources, such as user-item bipartite graphs and KGs, which may introduce noise and inaccuracies. To address these issues, we present a novel Knowledge-enhanced Heterogeneous Hypergraph Recommender System (KHGRec). KHGRec captures group-wise characteristics of both the interaction network and the KG, modeling complex connections in the KG. Using a collaborative knowledge heterogeneous hypergraph (CKHG), it employs two hypergraph encoders to model group-wise interdependencies and ensure explainability. Additionally, it fuses signals from the input graphs with cross-view self-supervised learning and attention mechanisms. Extensive experiments on four real-world datasets show our model's superiority over various state-of-the-art baselines, with an average 5.18% relative improvement. Additional tests on noise resilience, missing data, and cold-start problems demonstrate the robustness of our KHGRec framework. Our model and evaluation datasets are publicly available at https://github.com/viethungvu1998/KHGRec.
Submitted 4 July, 2024;
originally announced July 2024.
-
Supporting Cross-language Cross-project Bug Localization Using Pre-trained Language Models
Authors:
Mahinthan Chandramohan,
Dai Quoc Nguyen,
Padmanabhan Krishnan,
Jovan Jancic
Abstract:
Automatically locating a bug within a large codebase remains a significant challenge for developers. Existing techniques often struggle with generalizability and deployment due to their reliance on application-specific data and large model sizes. This paper proposes a novel pre-trained language model (PLM) based technique for bug localization that transcends project and language boundaries. Our approach leverages contrastive learning to enhance the representation of bug reports and source code. It then utilizes a novel ranking approach that combines commit messages and code segments. Additionally, we introduce a knowledge distillation technique that reduces model size for practical deployment without compromising performance.
This paper presents several key benefits. By incorporating code segment and commit message analysis alongside traditional file-level examination, our technique achieves better bug localization accuracy. Furthermore, our model excels at generalizability: trained on code from various projects and languages, it can effectively identify bugs in unseen codebases. To address computational limitations, we propose a CPU-compatible solution. In essence, the proposed work presents a highly effective, generalizable, and efficient bug localization technique with potential for real-world deployment.
Submitted 2 July, 2024;
originally announced July 2024.
-
Synaptic effects on the intermittent synchronization of gamma rhythms
Authors:
Quynh-Anh Nguyen,
Leonid L Rubchinsky
Abstract:
Synchronization of neural activity in the gamma frequency band is associated with various cognitive phenomena. Abnormalities of gamma synchronization may underlie symptoms of several neurological and psychiatric disorders such as schizophrenia and autism spectrum disorder. Properties of neural oscillations in the gamma band depend critically on the synaptic properties of the underlying circuits. This study explores how synaptic properties in pyramidal-interneuronal circuits affect not only the average synchronization strength but also the fine temporal patterning of neural synchrony. If two signals show only moderate synchrony strength, it may be possible to consider these dynamics as alternating between synchronized and desynchronized states. We use a model of connected circuits that produces pyramidal-interneuronal gamma (PING) oscillations to explore the temporal patterning of synchronized and desynchronized intervals. Changes in synaptic strength may alter the temporal patterning of synchronized dynamics (even if the average synchrony strength is not changed). Larger values of local synaptic connections promote longer desynchronization durations, while larger values of long-range synaptic connections promote shorter desynchronization durations. Furthermore, we show that circuits with different temporal patterning of synchronization may have different sensitivity to synaptic input. Thus, the alterations of synaptic strength may mediate physiological properties of neural circuits not only through change in the average synchrony level of gamma oscillations, but also through change in how synchrony is patterned in time over very short time scales.
Submitted 30 June, 2024;
originally announced July 2024.
-
X-Ray Constraints on Dark Photon Tridents
Authors:
Tim Linden,
Thong T. Q. Nguyen,
Tim M. P. Tait
Abstract:
Dark photons that are sufficiently light and/or weakly-interacting represent a compelling vision of dark matter. Dark photon decay into three photons, which we call the dark photon trident, can be the dominant channel when the dark photon mass falls below the electron pair threshold and can produce a significant flux of x-rays. We use 16 years of data from INTEGRAL/SPI to constrain sub-MeV dark photon decay, producing new world's-best constraints on the kinetic mixing parameter for dark photon masses between 61 keV and 1022 keV, and comment on the potential for future x-ray observatories to discover the trident decay process.
Submitted 27 June, 2024;
originally announced June 2024.
-
A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems
Authors:
Hung Vinh Tran,
Tong Chen,
Quoc Viet Hung Nguyen,
Zi Huang,
Lizhen Cui,
Hongzhi Yin
Abstract:
Since the creation of the Web, recommender systems (RSs) have been an indispensable mechanism in information filtering. State-of-the-art RSs primarily depend on categorical features, which are encoded by embedding vectors, resulting in excessively large embedding tables. To prevent over-parameterized embedding tables from harming scalability, both academia and industry have seen increasing efforts in compressing RS embeddings. However, despite the prosperity of lightweight embedding-based RSs (LERSs), a wide diversity is seen in evaluation protocols, resulting in obstacles when relating LERS performance to real-world usability. Moreover, despite the common goal of lightweight embeddings, LERSs are evaluated with a single choice between the two main recommendation tasks -- collaborative filtering and content-based recommendation. This lack of discussion on cross-task transferability hinders the development of unified, more scalable solutions. Motivated by these issues, this study investigates various LERSs' performance, efficiency, and cross-task transferability via a thorough benchmarking process. Additionally, we propose an efficient embedding compression method using magnitude pruning, which is an easy-to-deploy yet highly competitive baseline that outperforms various complex LERSs. Our study reveals the distinct performance of LERSs across the two tasks, shedding light on their effectiveness and generalizability. To support edge-based recommendations, we tested all LERSs on a Raspberry Pi 4, where the efficiency bottleneck is exposed. Finally, we conclude this paper with critical summaries of LERS performance, model selection suggestions, and underexplored challenges around LERSs for future research. To encourage future research, we publish source code and artifacts at https://github.com/chenxing1999/recsys-benchmark.
Submitted 25 June, 2024;
originally announced June 2024.
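The abstract above names magnitude pruning as its embedding compression baseline but does not spell out the procedure. As a rough illustration only, the following is a minimal sketch of generic global, unstructured magnitude pruning on a small embedding table; the function name `magnitude_prune`, the `sparsity` parameter, and the toy table are assumptions made for this example and are not taken from the paper or its repository.

```python
def magnitude_prune(table, sparsity):
    """Return a copy of `table` with the smallest-magnitude entries set to 0.0.

    `table` is a list of rows (one embedding vector per row) and `sparsity`
    is the fraction of entries to zero out, between 0.0 and 1.0.
    Hypothetical helper for illustration; not the paper's implementation.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0.0 and 1.0")

    # Rank every entry of the table by absolute value (global, unstructured).
    entries = [
        (abs(value), row_idx, col_idx)
        for row_idx, row in enumerate(table)
        for col_idx, value in enumerate(row)
    ]
    entries.sort()

    # Zero out the k entries with the smallest magnitude.
    k = int(len(entries) * sparsity)
    pruned = [list(row) for row in table]
    for _, row_idx, col_idx in entries[:k]:
        pruned[row_idx][col_idx] = 0.0
    return pruned


# Toy embedding table: 2 items, 3 dimensions each (made-up values).
toy_table = [[0.9, -0.05, 0.3], [0.01, -0.7, 0.2]]
pruned_table = magnitude_prune(toy_table, sparsity=0.5)
```

In practice the zeroed table would be stored in a sparse format to realize the memory saving; this sketch only shows which entries are selected for removal.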
-
NFL Ghosts: A framework for evaluating defender positioning with conditional density estimation
Authors:
Ronald Yurko,
Quang Nguyen,
Konstantinos Pelechrinis
Abstract:
Player attribution in American football remains an open problem due to the complex nature of twenty-two players interacting on the field, but the granularity of player tracking data provides ample opportunity for novel approaches. In this work, we introduce the first public framework to evaluate spatial and trajectory tracking data of players relative to a baseline distribution of "ghost" defenders. We demonstrate our framework in the context of modeling the nearest defender positioning at the moment of catch. In particular, we provide estimates of how much better or worse their observed positioning and trajectory are compared to the expected play value of ghost defenders. Our framework leverages high-dimensional tracking data features through flexible random forests for conditional density estimation in two ways: (1) to model the distribution of receiver yards gained, enabling the estimation of within-play expected value, and (2) to model the 2D spatial distribution of baseline ghost defenders. We present novel metrics for measuring player and team performance based on tracking data, and discuss challenges that remain in extending our framework to other aspects of American football.
Submitted 24 June, 2024;
originally announced June 2024.
-
Demonstration of a Squeezed Light Source on Thin-Film Lithium Niobate with Modal Phase Matching
Authors:
Tummas Napoleon Arge,
Seongmin Jo,
Huy Quang Nguyen,
Francesco Lenzini,
Emma Lomonte,
Jens Arnbak Holbøll Nielsen,
Renato R. Domeneguetti,
Jonas Schou Neergaard-Nielsen,
Wolfram Pernice,
Tobias Gehring,
Ulrik Lund Andersen
Abstract:
Squeezed states are essential for continuous variable (CV) quantum information processing, with wide-ranging applications in computing, sensing and communications. Integrated photonic circuits provide a scalable, convenient platform for building large CV circuits. Thin-film Lithium Niobate (TFLN) is particularly promising due to its low propagation loss, efficient parametric down conversion, and fast electro-optical modulation.
In this work, we demonstrate a squeezed light source on an integrated TFLN platform, achieving a measured shot noise reduction of 0.46 dB using modal phase matching and grating couplers with an efficiency of up to -2.2 dB.
The achieved squeezing is comparable to what has been observed using more complex circuitry based on periodic poling.
The simpler design allows for compact, efficient and reproducible sources of squeezed light.
Submitted 24 June, 2024;
originally announced June 2024.
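The figures quoted in the abstract above (0.46 dB of shot noise reduction, grating coupler efficiency of up to -2.2 dB) can be read as linear ratios. The following small sketch does the conversion, assuming both values are power ratios expressed as 10·log10; the function and variable names are chosen here for illustration only.

```python
def db_to_power_ratio(db):
    """Convert a decibel value to a linear power ratio (10 * log10 convention)."""
    return 10.0 ** (db / 10.0)


# Grating coupler efficiency of -2.2 dB as a linear transmission fraction.
coupler_efficiency = db_to_power_ratio(-2.2)

# A 0.46 dB reduction below shot noise, as a noise power relative to shot noise.
squeezed_noise_ratio = db_to_power_ratio(-0.46)
```

Under that assumption, -2.2 dB corresponds to a transmission of roughly 60%, and 0.46 dB of squeezing corresponds to a measured noise power of roughly 90% of the shot noise level.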
-
Subdiffusive concentration for the chemical distance in Bernoulli percolation
Authors:
Van Hao Can,
Van Quyet Nguyen
Abstract:
Considering supercritical Bernoulli percolation on $\mathbb{Z}^d$, Garet and Marchand [GM09] proved a diffusive concentration for the graph distance. In this paper, we sharpen this result by establishing the subdiffusive concentration inequality, which revisits the sublinear bound of the variance proved by Dembin [Dem22] as a consequence. Our approach is inspired by similar work in First-passage percolation [BR08, DHS14], combined with new tools to address the challenge posed by the infinite weight of the model. These tools, including the notion of effective radius and its properties, enable a simple one-step renormalization process as a systematic means of managing the effects of resampling edges.
Submitted 20 June, 2024;
originally announced June 2024.
-
Infusing clinical knowledge into tokenisers for language models
Authors:
Abul Hasan,
Jinge Wu,
Quang Ngoc Nguyen,
Salomé Andres,
Imane Guellil,
Huayu Zhang,
Arlene Casey,
Beatrice Alex,
Bruce Guthrie,
Honghan Wu
Abstract:
This study introduces a novel knowledge-enhanced tokenisation mechanism, K-Tokeniser, for clinical text processing. Technically, at the initialisation stage, K-Tokeniser populates global representations of tokens based on semantic types of domain concepts (such as drugs or diseases) from either a domain ontology like the Unified Medical Language System or the training data of the task-related corpus. At the training or inference stage, sentence-level localised context is utilised for choosing the optimal global token representation to realise the semantic-based tokenisation. To avoid pretraining using the new tokeniser, an embedding initialisation approach is proposed to generate representations for new tokens. Using three transformer-based language models, a comprehensive set of experiments is conducted on four real-world datasets for evaluating K-Tokeniser in a wide range of clinical text analytics tasks including clinical concept and relation extraction, automated clinical coding, clinical phenotype identification, and clinical research article classification. Overall, our models demonstrate consistent improvements over their counterparts in all tasks. In particular, substantial improvements are observed in the automated clinical coding task, with a 13% increase in Micro $F_1$ score. Furthermore, K-Tokeniser also shows significant capacity in facilitating quicker convergence of language models. Specifically, using K-Tokeniser, the language models would only require 50% of the training data to achieve the best performance of the baseline tokeniser using all training data in the concept extraction task and less than 20% of the data for the automated coding task. It is worth mentioning that all these improvements require no pre-training process, making the approach generalisable.
Submitted 20 June, 2024;
originally announced June 2024.
-
No Analog Combiner TTD-based Hybrid Precoding for Multi-User Sub-THz Communications
Authors:
Dang Qua Nguyen,
Alexei Ashikhmin,
Hong Yang,
Taejoon Kim
Abstract:
We address the design and optimization of real-world-suitable hybrid precoders for multi-user wideband sub-terahertz (sub-THz) communications. We note that the conventional fully connected true-time delay (TTD)-based architecture is impractical because there is no room for the required large number of analog signal combiners in the circuit board. Additionally, analog signal combiners incur significant signal power loss. These limitations are often overlooked in sub-THz research. To overcome these issues, we study a non-overlapping subarray architecture that eliminates the need for analog combiners. We extend the conventional single-user assumption by formulating an optimization problem to maximize the minimum data rate for simultaneously served users. This complex optimization problem is divided into two sub-problems. The first sub-problem aims to ensure a fair subarray allocation for all users and is solved via a continuous domain relaxation technique. The second sub-problem deals with practical TTD device constraints on range and resolution to maximize the subarray gain and is resolved by shifting to the phase domain. Our simulation results highlight significant performance gain for our real-world-ready TTD-based hybrid precoders.
Submitted 17 June, 2024;
originally announced June 2024.
-
PTF-FSR: A Parameter Transmission-Free Federated Sequential Recommender System
Authors:
Wei Yuan,
Chaoqun Yang,
Liang Qu,
Quoc Viet Hung Nguyen,
Guanhua Ye,
Hongzhi Yin
Abstract:
Sequential recommender systems have made significant progress. Recently, due to increasing concerns about user data privacy, some researchers have implemented federated learning for sequential recommendation, a.k.a. Federated Sequential Recommender Systems (FedSeqRecs), in which a public sequential recommender model is shared and frequently transmitted between a central server and clients to achieve collaborative learning. Although these solutions mitigate user privacy risks to some extent, they present two significant limitations that affect their practical usability: (1) They require a globally shared sequential recommendation model. However, in real-world scenarios, the recommendation model constitutes critical intellectual property for platform and service providers. Therefore, service providers may be reluctant to disclose their meticulously developed models. (2) The communication costs are high as they correlate with the number of model parameters. This becomes particularly problematic as the current FedSeqRecs will be inapplicable when sequential recommendation marches into a large language model era.
To overcome the above challenges, this paper proposes a parameter transmission-free federated sequential recommendation framework (PTF-FSR), which ensures both model and data privacy protection to meet the privacy needs of service providers and system users alike. Furthermore, since PTF-FSR only transmits prediction results under privacy protection, which are independent of model sizes, this new federated learning architecture can accommodate more complex and larger sequential recommendation models. Extensive experiments conducted on three widely used recommendation datasets, employing various sequential recommendation models from both ID-based and ID-free paradigms, demonstrate the effectiveness and generalization capability of our proposed framework.
Submitted 8 June, 2024;
originally announced June 2024.
-
Exploring magnetic fields in merging galaxy: combining polarization and velocity gradient in the Centaurus Galaxy
Authors:
Quynh Lan Nguyen,
Yue Hu,
Alex Lazarian
Abstract:
In this study, we apply the Velocity Gradient Technique (VGT) to the merging Centaurus galaxy. We compare gradient maps derived from the PHANGS-ALMA survey using CO emission lines with magnetic field tracings from dust polarization data obtained via the HAWC+ instrument. Our analysis reveals a strong correspondence between the directions indicated by these two tracers across most of the galactic image. Specifically, we identify jet regions as areas of anti-alignment, consistent with previous reports that gradients tend to rotate 90 degrees in outflow regions. Statistically, we find that the alignment of magnetic fields, as revealed by polarization, is most accurate in regions with the highest signal-to-noise ratios. Our findings underscore the utility of velocity gradients as a valuable complementary tool for probing magnetic fields and dynamical processes in merging galaxies.
Submitted 6 June, 2024;
originally announced June 2024.
-
PhoWhisper: Automatic Speech Recognition for Vietnamese
Authors:
Thanh-Thien Le,
Linh The Nguyen,
Dat Quoc Nguyen
Abstract:
We introduce PhoWhisper in five versions for Vietnamese automatic speech recognition. PhoWhisper's robustness is achieved through fine-tuning the Whisper model on an 844-hour dataset that encompasses diverse Vietnamese accents. Our experimental study demonstrates state-of-the-art performances of PhoWhisper on benchmark Vietnamese ASR datasets. We have open-sourced PhoWhisper at: https://github.com/VinAIResearch/PhoWhisper
Submitted 27 March, 2024;
originally announced June 2024.
-
Fast-FedUL: A Training-Free Federated Unlearning with Provable Skew Resilience
Authors:
Thanh Trung Huynh,
Trong Bang Nguyen,
Phi Le Nguyen,
Thanh Tam Nguyen,
Matthias Weidlich,
Quoc Viet Hung Nguyen,
Karl Aberer
Abstract:
Federated learning (FL) has recently emerged as a compelling machine learning paradigm, prioritizing the protection of privacy for training data. The increasing demand to address issues such as "the right to be forgotten" and combat data poisoning attacks highlights the importance of techniques, known as unlearning, which facilitate the removal of specific training data from trained FL models. Despite numerous unlearning methods proposed for centralized learning, they often prove inapplicable to FL due to fundamental differences in the operation of the two learning paradigms. Consequently, unlearning in FL remains in its early stages, presenting several challenges. Many existing unlearning solutions in FL require a costly retraining process, which can be burdensome for clients. Moreover, these methods are primarily validated through experiments, lacking theoretical assurances. In this study, we introduce Fast-FedUL, a tailored unlearning method for FL, which eliminates the need for retraining entirely. Through meticulous analysis of the target client's influence on the global model in each round, we develop an algorithm to systematically remove the impact of the target client from the trained model. In addition to presenting empirical findings, we offer a theoretical analysis delineating the upper bound of our unlearned model and the exact retrained model (the one obtained through retraining using untargeted clients). Experimental results with backdoor attack scenarios indicate that Fast-FedUL effectively removes almost all traces of the target client, while retaining the knowledge of untargeted clients (obtaining a high accuracy of up to 98% on the main task). Significantly, Fast-FedUL attains the lowest time complexity, providing a speed that is 1000 times faster than retraining. Our source code is publicly available at https://github.com/thanhtrunghuynh93/fastFedUL.
Submitted 28 May, 2024;
originally announced May 2024.
-
UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models
Authors:
Quan Van Nguyen,
Huy Quang Pham,
Dan Quang Tran,
Thang Kien-Bao Nguyen,
Nhat-Hao Nguyen-Dang,
Bao-Thien Nguyen-Tat
Abstract:
Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field. Methods: In our participation in the ImageCLEFmedical2024 Caption evaluation campaign, we explored caption prediction tasks using advanced Transformer-based models. We developed methods incorporating Transformer encoder-decoder and Query Transformer architectures. These models were trained and evaluated to generate diagnostic captions from radiology images. Results: Experimental evaluations demonstrated the effectiveness of our models, with the VisionDiagnostor-BioBART model achieving the highest BERTScore of 0.6267. This performance contributed to our team, DarkCow, achieving third place on the leaderboard. Conclusion: Our diagnostic captioning models show great promise in aiding medical professionals by generating high-quality reports efficiently. This approach can facilitate better data processing and performance optimization in medical imaging departments, ultimately benefiting healthcare delivery.
Submitted 27 May, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation
Authors:
Ngoc-Du Tran,
Thi-Thao Tran,
Quang-Huy Nguyen,
Manh-Hung Vu,
Van-Truong Pham
Abstract:
The emergence of deep learning techniques has advanced the image segmentation task, especially for medical images. Many neural network models have been introduced in the last decade, bringing automated segmentation accuracy close to that of manual segmentation. However, cutting-edge models like Transformer-based architectures rely on large-scale annotated training data, and are generally designed with densely consecutive layers in the encoder, decoder, and skip connections, resulting in a large number of parameters. Additionally, for better performance, they are often pretrained on larger datasets, thus requiring large memory and increasing resource expenses. In this study, we propose a new lightweight but efficient model, namely LiteNeXt, based on convolutions and mixing modules with a simplified decoder, for medical image segmentation. The model is trained from scratch with a small number of parameters (0.71M) and Giga Floating Point Operations Per Second (0.42). To handle fuzzy boundaries as well as occlusion or clutter in objects, especially in medical image regions, we propose the Marginal Weight Loss, which can help effectively determine the marginal boundary between object and background. Furthermore, we propose the Self-embedding Representation Parallel technique, which can help augment the data in a self-learning manner. Experiments on public datasets including Data Science Bowls, GlaS, ISIC2018, PH2, and Sunnybrook data show promising results compared to other state-of-the-art CNN-based and Transformer-based architectures. Our code will be published at: https://github.com/tranngocduvnvp/LiteNeXt.
Submitted 3 April, 2024;
originally announced May 2024.
-
Amortized nonmyopic active search via deep imitation learning
Authors:
Quan Nguyen,
Anindya Sarkar,
Roman Garnett
Abstract:
Active search formalizes a specialized active learning setting where the goal is to collect members of a rare, valuable class. The state-of-the-art algorithm approximates the optimal Bayesian policy in a budget-aware manner, and has been shown to achieve impressive empirical performance in previous work. However, even this approximate policy has a superlinear computational complexity with respect to the size of the search problem, rendering its application impractical in large spaces or in real-time systems where decisions must be made quickly. We study the amortization of this policy by training a neural network to learn to search. To circumvent the difficulty of learning from scratch, we appeal to imitation learning techniques to mimic the behavior of the expert, expensive-to-compute policy. Our policy network, trained on synthetic data, learns a beneficial search strategy that yields nonmyopic decisions carefully balancing exploration and exploitation. Extensive experiments demonstrate our policy achieves competitive performance at real-world tasks that closely approximates the expert's at a fraction of the cost, while outperforming cheaper baselines.
Submitted 23 May, 2024;
originally announced May 2024.
-
Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition
Authors:
Xinyi Gao,
Tong Chen,
Wentao Zhang,
Junliang Yu,
Guanhua Ye,
Quoc Viet Hung Nguyen,
Hongzhi Yin
Abstract:
The increasing prevalence of large-scale graphs poses a significant challenge for graph neural network training, attributed to their substantial computational requirements. In response, graph condensation (GC) emerges as a promising data-centric solution aiming to substitute the large graph with a small yet informative condensed graph to facilitate data-efficient GNN training. However, existing GC methods suffer from intricate optimization processes, necessitating excessive computing resources. In this paper, we revisit existing GC optimization strategies and identify two pervasive issues: 1. various GC optimization strategies converge to class-level node feature matching between the original and condensed graphs, making the optimization target coarse-grained despite the complex computations; 2. to bridge the original and condensed graphs, existing GC methods rely on a Siamese graph network architecture that requires time-consuming bi-level optimization with iterative gradient computations. To overcome these issues, we propose a training-free GC framework termed Class-partitioned Graph Condensation (CGC), which refines the node feature matching from the class-to-class paradigm into a novel class-to-node paradigm. Remarkably, this refinement also simplifies the GC optimization as a class partition problem, which can be efficiently solved by any clustering methods. Moreover, CGC incorporates a pre-defined graph structure to enable a closed-form solution for condensed node features, eliminating the back-and-forth gradient descent in existing GC approaches without sacrificing accuracy. Extensive experiments demonstrate that CGC achieves state-of-the-art performance with a more efficient condensation process. For instance, compared with the seminal GC method (i.e., GCond), CGC condenses the largest Reddit graph within 10 seconds, achieving a 2,680X speedup and a 1.4% accuracy increase.
Submitted 22 May, 2024;
originally announced May 2024.
-
RecGPT: Generative Pre-training for Text-based Recommendation
Authors:
Hoang Ngo,
Dat Quoc Nguyen
Abstract:
We present the first domain-adapted and fully-trained large language model, RecGPT-7B, and its instruction-following variant, RecGPT-7B-Instruct, for text-based recommendation. Experimental results on rating prediction and sequential recommendation tasks show that our model, RecGPT-7B-Instruct, outperforms previous strong baselines. We are releasing our RecGPT models as well as their pre-training and fine-tuning datasets to facilitate future research and downstream applications in text-based recommendation. Public "huggingface" links to our RecGPT models and datasets are available at: https://github.com/VinAIResearch/RecGPT
Submitted 21 May, 2024;
originally announced May 2024.
-
First joint oscillation analysis of Super-Kamiokande atmospheric and T2K accelerator neutrino data
Authors:
Super-Kamiokande,
T2K collaborations,
S. Abe,
K. Abe,
N. Akhlaq,
R. Akutsu,
H. Alarakia-Charles,
A. Ali,
Y. I. Alj Hakim,
S. Alonso Monsalve,
S. Amanai,
C. Andreopoulos,
L. H. V. Anthony,
M. Antonova,
S. Aoki,
K. A. Apte,
T. Arai,
T. Arihara,
S. Arimoto,
Y. Asada,
R. Asaka,
Y. Ashida,
E. T. Atkin,
N. Babu
, et al. (524 additional authors not shown)
Abstract:
The Super-Kamiokande and T2K collaborations present a joint measurement of neutrino oscillation parameters from their atmospheric and beam neutrino data. It uses a common interaction model for events overlapping in neutrino energy and correlated detector systematic uncertainties between the two datasets, which are found to be compatible. Using 3244.4 days of atmospheric data and a beam exposure of $19.7(16.3) \times 10^{20}$ protons on target in (anti)neutrino mode, the analysis finds a 1.9$σ$ exclusion of CP-conservation (defined as $J_{CP}=0$) and a preference for the normal mass ordering.
Submitted 21 May, 2024;
originally announced May 2024.
-
Wolff potentials and nonlocal equations of Lane-Emden type
Authors:
Quoc-Hung Nguyen,
Jihoon Ok,
Kyeong Song
Abstract:
We consider nonlocal equations of the type \[ (-Δ_{p})^{s}u = μ\quad \text{in }Ω, \] where $Ω\subset \mathbb{R}^{n}$ is either a bounded domain or the whole $\mathbb{R}^{n}$, $μ$ is a Radon measure on $Ω$, $0<s<1$ and $1<p<n/s$. In particular, we extend the existence, regularity and Wolff potential estimates for SOLA (Solutions Obtained as Limits of Approximations), established by Kuusi, Mingione, and Sire (Comm. Math. Phys. 337:1317--1368, 2015), to the strongly singular case $1<p\le2-s/n$. Moreover, using Wolff potentials and Orlicz capacities, we present both a sufficient and a necessary condition for the existence of SOLA to nonlocal equations of the type \[ (-Δ_{p})^{s}u = P(u) + μ\quad \text{in }Ω, \] where $P(\cdot)$ is either a power function or an exponential function.
Submitted 19 May, 2024;
originally announced May 2024.
-
Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution
Authors:
Eslam Zaher,
Maciej Trzaskowski,
Quan Nguyen,
Fred Roosta
Abstract:
In this paper, we dive into the reliability concerns of Integrated Gradients (IG), a prevalent feature attribution method for black-box deep learning models. We particularly address two predominant challenges associated with IG: the generation of noisy feature visualizations for vision models and the vulnerability to adversarial attributional attacks. Our approach involves an adaptation of path-based feature attribution, aligning the path of attribution more closely to the intrinsic geometry of the data manifold. Our experiments utilise deep generative models applied to several real-world image datasets. They demonstrate that IG along the geodesics conforms to the curved geometry of the Riemannian data manifold, generating more perceptually intuitive explanations and, subsequently, substantially increasing robustness to targeted attributional attacks.
Submitted 16 May, 2024;
originally announced May 2024.
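Note on the entry above: for readers unfamiliar with Integrated Gradients, the following is a minimal sketch of the standard straight-line formulation only, not the geodesic variant the abstract proposes. The function names (`integrated_gradients`, `f`, `grad_f`), the toy model, and the zero baseline are all illustrative assumptions and do not come from the paper.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate straight-line Integrated Gradients with a midpoint Riemann sum.

    grad_fn  : callable returning the gradient of the model output w.r.t. its input
    x        : input to explain
    baseline : reference input (hypothetical choice; here all zeros)
    """
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the straight line from baseline to x.
    path = baseline + alphas[:, None] * (x - baseline)
    # Average the gradient along the path, then scale by the input difference.
    grads = np.stack([grad_fn(p) for p in path])
    return (x - baseline) * grads.mean(axis=0)

# Toy differentiable "model" (an assumption for illustration): f(x) = sum(x_i^2).
def f(x):
    return float(np.sum(x ** 2))

def grad_f(x):
    return 2.0 * x

x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(grad_f, x, baseline)

# Completeness: attributions sum to f(x) - f(baseline).
print(attr, attr.sum(), f(x) - f(baseline))
```

The straight path ignores any curvature of the data manifold; the abstract's proposal is to replace it with a geodesic, which this sketch does not attempt.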
-
Adaptation of Distinct Semantics for Uncertain Areas in Polyp Segmentation
Authors:
Quang Vinh Nguyen,
Van Thong Huynh,
Soo-Hyung Kim
Abstract:
Colonoscopy is a common and practical method for detecting and treating polyps. Segmenting polyps from colonoscopy images is useful for diagnosis and surgery progress. Nevertheless, achieving excellent segmentation performance is still difficult because of polyp characteristics like shape, color, condition, and obvious non-distinction from the surrounding context. This work presents a novel architecture, namely Adaptation of Distinct Semantics for Uncertain Areas in Polyp Segmentation (ADSNet), which modifies misclassified details and recovers weak features that may vanish and go undetected at the final stage. The architecture consists of a complementary trilateral decoder to produce an early global map. A continuous attention module modifies semantics of high-level features to analyze two separate semantics of the early global map. The proposed method is evaluated on polyp benchmarks for learning ability and generalization ability; experimental results demonstrate strong correction and recovery ability, leading to better segmentation performance than other state-of-the-art methods in the polyp image segmentation task. In particular, the proposed architecture can be applied flexibly with other CNN-based encoders, Transformer-based encoders, and decoder backbones.
Submitted 13 May, 2024;
originally announced May 2024.
-
Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design
Authors:
Quan Nguyen,
Adji Bousso Dieng
Abstract:
Experimental design techniques such as active search and Bayesian optimization are widely used in the natural sciences for data collection and discovery. However, existing techniques tend to favor exploitation over exploration of the search space, which causes them to get stuck in local optima. This "collapse" problem prevents experimental design algorithms from yielding diverse high-quality data. In this paper, we extend the Vendi scores -- a family of interpretable similarity-based diversity metrics -- to account for quality. We then leverage these quality-weighted Vendi scores to tackle experimental design problems across various applications, including drug discovery, materials discovery, and reinforcement learning. We found that quality-weighted Vendi scores allow us to construct policies for experimental design that flexibly balance quality and diversity, and ultimately assemble rich and diverse sets of high-performing data points. Our algorithms led to a 70%-170% increase in the number of effective discoveries compared to baselines.
Submitted 3 May, 2024;
originally announced May 2024.
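Note on the entry above: the abstract does not spell out how quality enters the score, so the sketch below rests on stated assumptions. It computes the order-1 Vendi score as the exponential of the Shannon entropy of the eigenvalues of K/n, where K is a positive semidefinite similarity matrix with unit diagonal, and it assumes one plausible weighting in which the mean quality multiplies the score. The function names and toy matrices are hypothetical.

```python
import numpy as np

def vendi_score(K):
    """Order-1 Vendi score: exp of the entropy of the eigenvalues of K / n."""
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop numerically zero eigenvalues; 0 * log(0) is treated as 0.
    eigvals = eigvals[eigvals > 1e-12]
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))

def quality_weighted_vendi(K, quality):
    """Assumed weighting: mean quality times the Vendi score."""
    return float(np.mean(quality)) * vendi_score(K)

# Four identical items (all similarities 1) versus four mutually dissimilar items.
identical = np.ones((4, 4))
distinct = np.eye(4)

print(vendi_score(identical))   # effective number of distinct items is about 1
print(vendi_score(distinct))    # effective number of distinct items is about 4
print(quality_weighted_vendi(distinct, np.full(4, 0.5)))
```

Under this assumed form, a set scores highly only when its items are both dissimilar from each other and individually of high quality, which is the balance the abstract describes.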
-
New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis
Authors:
Quy Hoang Nguyen,
Minh-Van Truong Nguyen,
Kiet Van Nguyen
Abstract:
The emergence of multimodal data on social media platforms presents new opportunities to better understand user sentiments toward a given aspect. However, existing multimodal datasets for Aspect-Category Sentiment Analysis (ACSA) often focus on textual annotations, neglecting fine-grained information in images. Consequently, these datasets fail to fully exploit the richness inherent in multimodal data. To address this, we introduce a new Vietnamese multimodal dataset, named ViMACSA, which consists of 4,876 text-image pairs with 14,618 fine-grained annotations for both text and image in the hotel domain. Additionally, we propose a Fine-Grained Cross-Modal Fusion Framework (FCMF) that effectively learns both intra- and inter-modality interactions and then fuses this information to produce a unified multimodal representation. Experimental results show that our framework outperforms SOTA models on the ViMACSA dataset, achieving the highest F1 score of 79.73%. We also explore characteristics and challenges in Vietnamese multimodal sentiment analysis, including misspellings, abbreviations, and the complexities of the Vietnamese language. This work contributes both a benchmark dataset and a new framework that leverages fine-grained multimodal information to improve multimodal aspect-category sentiment analysis. Our dataset is available for research purposes: https://github.com/hoangquy18/Multimodal-Aspect-Category-Sentiment-Analysis.
Submitted 1 May, 2024;
originally announced May 2024.
-
Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods
Authors:
Steven Shave,
Richard Kasprowicz,
Abdullah M. Athar,
Denise Vlachou,
Neil O. Carragher,
Cuong Q. Nguyen
Abstract:
The Connectivity Map (CMap) is a large publicly available database of cellular transcriptomic responses to chemical and genetic perturbations built using a standardized acquisition protocol known as the L1000 technique. Databases such as CMap provide an exciting opportunity to enrich drug discovery efforts, providing a 'known' phenotypic landscape to explore and enabling the development of state of the art techniques for enhanced information extraction and better informed decisions. Whilst multiple methods for measuring phenotypic similarity and interrogating profiles have been developed, the field is severely lacking standardized benchmarks using appropriate data splitting for training and unbiased evaluation of machine learning methods. To address this, we have developed 'Leak Proof CMap' and exemplified its application to a set of common transcriptomic and generic phenotypic similarity methods along with an exemplar triplet loss-based method. Benchmarking in three critical performance areas (compactness, distinctness, and uniqueness) is conducted using carefully crafted data splits ensuring no similar cell lines or treatments with shared or closely matching responses or mechanisms of action are present in training, validation, or test sets. This enables testing of models with unseen samples akin to exploring treatments with novel modes of action in novel patient derived cell lines. With a carefully crafted benchmark and data splitting regime in place, the tooling now exists to create performant phenotypic similarity methods for use in personalized medicine (novel cell lines) and to better augment high throughput phenotypic screening technologies with the L1000 transcriptomic technology.
Submitted 29 April, 2024;
originally announced April 2024.
-
ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images
Authors:
Huy Quang Pham,
Thang Kien-Bao Nguyen,
Quan Van Nguyen,
Dan Quang Tran,
Nghia Hieu Nguyen,
Kiet Van Nguyen,
Ngan Luu-Thuy Nguyen
Abstract:
Optical Character Recognition - Visual Question Answering (OCR-VQA) is the task of answering questions about text contained in images, and it has been developed significantly for the English language in recent years. However, there are limited studies of this task in low-resource languages such as Vietnamese. To this end, we introduce a novel dataset, ViOCRVQA (Vietnamese Optical Character Recognition - Visual Question Answering dataset), consisting of 28,000+ images and 120,000+ question-answer pairs. In this dataset, all the images contain text, and the questions concern information relevant to the text in the images. We deploy ideas from state-of-the-art methods proposed for English to conduct experiments on our dataset, revealing the challenges and difficulties inherent in a Vietnamese dataset. Furthermore, we introduce a novel approach, called VisionReader, which achieved 0.4116 in EM and 0.6990 in F1-score on the test set. From the results, we found that the OCR system plays a very important role in VQA models on the ViOCRVQA dataset. In addition, the objects in the image also play a role in improving model performance. We provide open access to our dataset at https://github.com/qhnhynmm/ViOCRVQA.git for further research on the OCR-VQA task in Vietnamese.
Submitted 28 April, 2024;
originally announced April 2024.
-
Manipulating Recommender Systems: A Survey of Poisoning Attacks and Countermeasures
Authors:
Thanh Toan Nguyen,
Quoc Viet Hung Nguyen,
Thanh Tam Nguyen,
Thanh Trung Huynh,
Thanh Thi Nguyen,
Matthias Weidlich,
Hongzhi Yin
Abstract:
Recommender systems have become an integral part of online services to help users locate specific information in a sea of data. However, existing studies show that some recommender systems are vulnerable to poisoning attacks, particularly those that involve learning schemes. A poisoning attack is where an adversary injects carefully crafted data into the process of training a model, with the goal of manipulating the system's final recommendations. Based on recent advancements in artificial intelligence, such attacks have gained importance recently. While numerous countermeasures to poisoning attacks have been developed, they have not yet been systematically linked to the properties of the attacks. Consequently, assessing the respective risks and potential success of mitigation strategies is difficult, if not impossible. This survey aims to fill this gap by primarily focusing on poisoning attacks and their countermeasures. This is in contrast to prior surveys that mainly focus on attacks and their detection methods. Through an exhaustive literature review, we provide a novel taxonomy for poisoning attacks, formalise its dimensions, and accordingly organise 30+ attacks described in the literature. Further, we review 40+ countermeasures to detect and/or prevent poisoning attacks, evaluating their effectiveness against specific types of attacks. This comprehensive survey should serve as a point of reference for protecting recommender systems against poisoning attacks. The article concludes with a discussion on open issues in the field and impactful directions for future research. A rich repository of resources associated with poisoning attacks is available at https://github.com/tamlhp/awesome-recsys-poisoning.
Submitted 23 April, 2024;
originally announced April 2024.
-
Efficient and Concise Explanations for Object Detection with Gaussian-Class Activation Mapping Explainer
Authors:
Quoc Khanh Nguyen,
Truong Thanh Hung Nguyen,
Vo Thanh Khang Nguyen,
Van Binh Truong,
Tuong Phan,
Hung Cao
Abstract:
To address the challenges of providing quick and plausible explanations in Explainable AI (XAI) for object detection models, we introduce the Gaussian Class Activation Mapping Explainer (G-CAME). Our method efficiently generates concise saliency maps by utilizing activation maps from selected layers and applying a Gaussian kernel to emphasize critical image regions for the predicted object. Compared with other Region-based approaches, G-CAME significantly reduces explanation time to 0.5 seconds without compromising the quality. Our evaluation of G-CAME, using Faster-RCNN and YOLOX on the MS-COCO 2017 dataset, demonstrates its ability to offer highly plausible and faithful explanations, especially in reducing the bias on tiny object detection.
Submitted 20 April, 2024;
originally announced April 2024.
-
Continuous Dynamic Bipedal Jumping via Adaptive-model Optimization
Authors:
Junheng Li,
Omar Kolt,
Quan Nguyen
Abstract:
Dynamic and continuous jumping remains an open yet challenging problem in bipedal robot control. The choice of dynamic models in trajectory optimization (TO) problems plays a huge role in trajectory accuracy and computation efficiency, which normally cannot be ensured simultaneously. In this letter, we propose a novel adaptive-model optimization approach, a unified framework of Adaptive-model TO and Adaptive-frequency Model Predictive Control (MPC), to effectively realize continuous and robust jumping on the HECTOR bipedal robot. The proposed Adaptive-model TO fuses adaptive-fidelity dynamics modeling of bipedal jumping motion, matching model fidelity to the needs of each jumping phase, to ensure trajectory accuracy and computation efficiency. In addition, conventional approaches have unsynchronized sampling frequencies in TO and real-time control, causing the framework to have mismatched modeling resolutions. We adapt the MPC sampling frequency based on the TO trajectory resolution in different phases for effective trajectory tracking. In hardware experiments, we have demonstrated robust and dynamic jumps covering a distance of up to 40 cm (57% of robot height). To verify the repeatability of this experiment, we run 53 jumping experiments and achieve a 90% success rate. In continuous jumps, we demonstrate continuous bipedal jumping with terrain height perturbations (up to 5 cm) and discontinuities (up to 20 cm gap).
Submitted 17 April, 2024;
originally announced April 2024.
-
Multi-target and multi-stage liver lesion segmentation and detection in multi-phase computed tomography scans
Authors:
Abdullah F. Al-Battal,
Soan T. M. Duong,
Van Ha Tang,
Quang Duc Tran,
Steven Q. H. Truong,
Chien Phan,
Truong Q. Nguyen,
Cheolhong An
Abstract:
Multi-phase computed tomography (CT) scans use contrast agents to highlight different anatomical structures within the body to improve the probability of identifying and detecting anatomical structures of interest and abnormalities such as liver lesions. Yet, detecting these lesions remains a challenging task as these lesions vary significantly in their size, shape, texture, and contrast with respect to surrounding tissue. Therefore, radiologists need extensive experience to identify and detect these lesions. Segmentation-based neural networks can assist radiologists with this task. Current state-of-the-art lesion segmentation networks use the encoder-decoder design paradigm based on the UNet architecture where the multi-phase CT scan volume is fed to the network as a multi-channel input. Although this approach utilizes information from all the phases and outperforms single-phase segmentation networks, we demonstrate that their performance is not optimal and can be further improved by incorporating the learning from models trained on each single-phase individually. Our approach comprises three stages. The first stage identifies the regions within the liver where there might be lesions at three different scales (4, 8, and 16 mm). The second stage includes the main segmentation model trained using all the phases as well as a segmentation model trained on each of the phases individually. The third stage uses the multi-phase CT volumes together with the predictions from each of the segmentation models to generate the final segmentation map. Overall, our approach improves relative liver lesion segmentation performance by 1.6% while reducing performance variability across subjects by 8% when compared to the current state-of-the-art models.
Submitted 17 April, 2024;
originally announced April 2024.
-
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
Authors:
Quan Van Nguyen,
Dan Quang Tran,
Huy Quang Pham,
Thang Kien-Bao Nguyen,
Nghia Hieu Nguyen,
Kiet Van Nguyen,
Ngan Luu-Thuy Nguyen
Abstract:
Visual Question Answering (VQA) is a complicated task that requires the capability of simultaneously processing natural language and images. Initially, research on this task focused on methods to help machines understand objects and scene contexts in images. However, some text appearing in the image that carries explicit information about the full content of the image is not mentioned. Along with the continuous development of the AI era, there have been many studies on the reading comprehension ability of VQA models in the world. In Vietnam, a developing country where conditions are still limited, this task is still open. Therefore, we introduce the first large-scale dataset in Vietnamese specializing in the ability to understand text appearing in images, which we call ViTextVQA (Vietnamese Text-based Visual Question Answering dataset). It contains over 16,000 images and over 50,000 questions with answers. Through meticulous experiments with various state-of-the-art models, we uncover the significance of the order in which tokens in OCR text are processed and selected to formulate answers. This finding helped us significantly improve the performance of the baseline models on the ViTextVQA dataset. Our dataset is available at https://github.com/minhquan6203/ViTextVQA-Dataset for research purposes.
Submitted 16 April, 2024;
originally announced April 2024.
-
A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures
Authors:
Thanh Tam Nguyen,
Thanh Trung Huynh,
Zhao Ren,
Thanh Toan Nguyen,
Phi Le Nguyen,
Hongzhi Yin,
Quoc Viet Hung Nguyen
Abstract:
As the adoption of explainable AI (XAI) continues to expand, the urgency to address its privacy implications intensifies. Despite a growing corpus of research in AI privacy and explainability, there is little attention on privacy-preserving model explanations. This article presents the first thorough survey about privacy attacks on model explanations and their countermeasures. Our contribution to this field comprises a thorough analysis of research papers with a connected taxonomy that facilitates the categorisation of privacy attacks and countermeasures based on the targeted explanations. This work also includes an initial investigation into the causes of privacy leaks. Finally, we discuss unresolved issues and prospective research directions uncovered in our analysis. This survey aims to be a valuable resource for the research community and offers clear insights for those new to this domain. To support ongoing research, we have established an online resource repository, which will be continuously updated with new and relevant findings. Interested readers are encouraged to access our repository at https://github.com/tamlhp/awesome-privex.
Submitted 26 June, 2024; v1 submitted 31 March, 2024;
originally announced April 2024.
-
Robust Federated Contrastive Recommender System against Model Poisoning Attack
Authors:
Wei Yuan,
Chaoqun Yang,
Liang Qu,
Guanhua Ye,
Quoc Viet Hung Nguyen,
Hongzhi Yin
Abstract:
Federated Recommender Systems (FedRecs) have garnered increasing attention recently, thanks to their privacy-preserving benefits. However, the decentralized and open characteristics of current FedRecs present two dilemmas. First, the performance of FedRecs is compromised due to highly sparse on-device data for each client. Second, the system's robustness is undermined by the vulnerability to model poisoning attacks launched by malicious users. In this paper, we introduce a novel contrastive learning framework designed to fully leverage the client's sparse data through embedding augmentation, referred to as CL4FedRec. Unlike previous contrastive learning approaches in FedRecs that necessitate clients to share their private parameters, our CL4FedRec aligns with the basic FedRec learning protocol, ensuring compatibility with most existing FedRec implementations. We then evaluate the robustness of FedRecs equipped with CL4FedRec by subjecting it to several state-of-the-art model poisoning attacks. Surprisingly, our observations reveal that contrastive learning tends to exacerbate the vulnerability of FedRecs to these attacks. This is attributed to the enhanced embedding uniformity, making the polluted target item embedding easily proximate to popular items. Based on this insight, we propose an enhanced and robust version of CL4FedRec (rCL4FedRec) by introducing a regularizer to maintain the distance among item embeddings with different popularity levels. Extensive experiments conducted on four commonly used recommendation datasets demonstrate that CL4FedRec significantly enhances both the model's performance and the robustness of FedRecs.
Submitted 29 March, 2024;
originally announced March 2024.
-
Improving Vietnamese-English Medical Machine Translation
Authors:
Nhu Vo,
Dat Quoc Nguyen,
Dung D. Le,
Massimo Piccardi,
Wray Buntine
Abstract:
Machine translation for Vietnamese-English in the medical domain is still an under-explored research area. In this paper, we introduce MedEV -- a high-quality Vietnamese-English parallel dataset constructed specifically for the medical domain, comprising approximately 360K sentence pairs. We conduct extensive experiments comparing Google Translate, ChatGPT (gpt-3.5-turbo), state-of-the-art Vietnamese-English neural machine translation models and pre-trained bilingual/multilingual sequence-to-sequence models on our new MedEV dataset. Experimental results show that the best performance is achieved by fine-tuning "vinai-translate" for each translation direction. We publicly release our dataset to promote further research.
Submitted 28 March, 2024;
originally announced March 2024.
-
Bayesian Learned Models Can Detect Adversarial Malware For Free
Authors:
Bao Gia Doan,
Dang Quang Nguyen,
Paul Montague,
Tamas Abraham,
Olivier De Vel,
Seyit Camtepe,
Salil S. Kanhere,
Ehsan Abbasnejad,
Damith C. Ranasinghe
Abstract:
The vulnerability of machine learning-based malware detectors to adversarial attacks has prompted the need for robust solutions. Adversarial training is an effective method but is computationally expensive to scale up to large datasets and comes at the cost of sacrificing model performance for robustness. We hypothesize that adversarial malware exploits the low-confidence regions of models and can be identified using epistemic uncertainty of ML approaches -- epistemic uncertainty in a machine learning-based malware detector is a result of a lack of similar training samples in regions of the problem space. In particular, a Bayesian formulation can capture the model parameters' distribution and quantify epistemic uncertainty without sacrificing model performance. To verify our hypothesis, we consider Bayesian learning approaches with a mutual information-based formulation to quantify uncertainty and detect adversarial malware in Android, Windows domains and PDF malware. We found that quantifying uncertainty through Bayesian learning methods can defend against adversarial malware. In particular, Bayesian models: (1) are generally capable of identifying adversarial malware in both feature and problem space, (2) can detect concept drift by measuring uncertainty, and (3) with a diversity-promoting approach (or better posterior approximations) lead to parameter instances from the posterior to significantly enhance a detector's ability.
Submitted 27 March, 2024;
originally announced March 2024.
-
Observation of polarization density waves in SrTiO3
Authors:
Gal Orenstein,
Viktor Krapivin,
Yijing Huang,
Zhuquan Zhan,
Gilberto de la Pena Munoz,
Ryan A. Duncan,
Quynh Nguyen,
Jade Stanton,
Samuel Teitelbaum,
Hasan Yavas,
Takahiro Sato,
Matthias C. Hoffmann,
Patrick Kramer,
Jiahao Zhang,
Andrea Cavalleri,
Riccardo Comin,
Mark P. M. Dean,
Ankit S. Disa,
Michael Forst,
Steven L. Johnson,
Matteo Mitrano,
Andrew M. Rappe,
David Reis,
Diling Zhu,
Keith A. Nelson
, et al. (1 additional author not shown)
Abstract:
The nature of the "failed" ferroelectric transition in SrTiO3 has been a long-standing puzzle in condensed matter physics. A compelling explanation is the competition between ferroelectricity and an instability with a mesoscopic modulation of the polarization. These polarization density waves, which should become especially strong near the quantum critical point, break local inversion symmetry and are difficult to probe with conventional x-ray scattering methods. Here we combine a femtosecond x-ray free electron laser (XFEL) with THz coherent control methods to probe inversion symmetry breaking at finite momenta and visualize the instability of the polarization on nanometer lengthscales in SrTiO3. We find polar-acoustic collective modes that are soft particularly at the tens of nanometer lengthscale. These precursor collective excitations provide evidence for the conjectured mesoscopic modulated phase in SrTiO3.
Submitted 25 March, 2024;
originally announced March 2024.
-
Multiple-Input Auto-Encoder Guided Feature Selection for IoT Intrusion Detection Systems
Authors:
Phai Vu Dinh,
Diep N. Nguyen,
Dinh Thai Hoang,
Quang Uy Nguyen,
Eryk Dutkiewicz,
Son Pham Bao
Abstract:
While intrusion detection systems (IDSs) benefit from the diversity and generalization of IoT data features, the data diversity (e.g., the heterogeneity and high dimensions of data) also makes it difficult to train effective machine learning models in IoT IDSs. This also leads to potentially redundant/noisy features that may decrease the accuracy of the detection engine in IDSs. This paper first introduces a novel neural network architecture called Multiple-Input Auto-Encoder (MIAE). MIAE consists of multiple sub-encoders that can process inputs from different sources with different characteristics. The MIAE model is trained in an unsupervised learning mode to transform the heterogeneous inputs into a lower-dimensional representation, which helps classifiers distinguish between normal behaviour and different types of attacks. To distil and retain more relevant features but remove less important/redundant ones during the training process, we further design and embed a feature selection layer right after the representation layer of MIAE resulting in a new model called MIAEFS. This layer learns the importance of features in the representation vector, facilitating the selection of informative features from the representation vector. The results on three IDS datasets, i.e., NSLKDD, UNSW-NB15, and IDS2017, show the superior performance of MIAE and MIAEFS compared to other methods, e.g., conventional classifiers, dimensionality reduction models, unsupervised representation learning methods with different input dimensions, and unsupervised feature selection models. Moreover, MIAE and MIAEFS combined with the Random Forest (RF) classifier achieve accuracy of 96.5% in detecting sophisticated attacks, e.g., Slowloris. The average running time for detecting an attack sample using RF with the representation of MIAE and MIAEFS is approximately 1.7E-6 seconds, whilst the model size is lower than 1 MB.
Submitted 21 March, 2024;
originally announced March 2024.
-
Twin Auto-Encoder Model for Learning Separable Representation in Cyberattack Detection
Authors:
Phai Vu Dinh,
Quang Uy Nguyen,
Thai Hoang Dinh,
Diep N. Nguyen,
Bao Son Pham,
Eryk Dutkiewicz
Abstract:
Representation Learning (RL) plays a pivotal role in the success of many problems including cyberattack detection. Most of the RL methods for cyberattack detection are based on the latent vector of Auto-Encoder (AE) models. An AE transforms raw data into a new latent representation that better exposes the underlying characteristics of the input data. Thus, it is very useful for identifying cyberattacks. However, due to the heterogeneity and sophistication of cyberattacks, the representation of AEs is often entangled/mixed, which makes downstream attack detection difficult. To tackle this problem, we propose a novel model called Twin Auto-Encoder (TAE). TAE deterministically transforms the latent representation into a more distinguishable representation, namely the separable representation, and then reconstructs the separable representation at the output. The output of TAE, called the reconstruction representation, is input to downstream models to detect cyberattacks. We extensively evaluate the effectiveness of TAE using a wide range of benchmarking datasets. Experiment results show the superior accuracy of TAE over state-of-the-art RL models and well-known machine learning algorithms. Moreover, TAE also outperforms state-of-the-art models on some sophisticated and challenging attacks. We then investigate various characteristics of TAE to further demonstrate its superiority.
Submitted 21 March, 2024;
originally announced March 2024.
-
Fractional Tackles: Leveraging Player Tracking Data for Within-Play Tackling Evaluation in American Football
Authors:
Quang Nguyen,
Ruitong Jiang,
Meg Ellingwood,
Ronald Yurko
Abstract:
Tackling is a fundamental defensive move in American football, with the main purpose of stopping the forward motion of the ball-carrier. However, current tackling metrics are manually recorded outcomes that are inherently flawed due to their discrete and subjective nature. Using player tracking data, we present a novel framework for assessing tackling contribution in a continuous and objective manner. Our approach first identifies when a defender is in a "contact window" of the ball-carrier during a play, before assigning value to each window and the players involved. This enables us to devise a new metric called fractional tackles, which credits defenders for halting the ball-carrier's forward motion toward the end zone. We demonstrate that fractional tackles overcome the shortcomings of traditional metrics such as tackles and assists, by providing greater variation and measurable information for players lacking recorded statistics like defensive linemen. We view our contribution as a significant step forward in measuring defensive performance in American football and a clear demonstration of the capabilities of player tracking data.
Submitted 21 March, 2024;
originally announced March 2024.
-
Safety-Aware Perception for Autonomous Collision Avoidance in Dynamic Environments
Authors:
Ryan M. Bena,
Chongbo Zhao,
Quan Nguyen
Abstract:
Autonomous collision avoidance requires accurate environmental perception; however, flight systems often possess limited sensing capabilities with field-of-view (FOV) restrictions. To navigate this challenge, we present a safety-aware approach for online determination of the optimal sensor-pointing direction $ψ_\text{d}$ which utilizes control barrier functions (CBFs). First, we generate a spatial density function $Φ$ which leverages CBF constraints to map the collision risk of all local coordinates. Then, we convolve $Φ$ with an attitude-dependent sensor FOV quality function to produce the objective function $Γ$ which quantifies the total observed risk for a given pointing direction. Finally, by finding the global optimizer for $Γ$, we identify the value of $ψ_\text{d}$ which maximizes the perception of risk within the FOV. We incorporate $ψ_\text{d}$ into a safety-critical flight architecture and conduct a numerical analysis using multiple simulated mission profiles. Our algorithm achieves a success rate of $88-96\%$, constituting a $16-29\%$ improvement compared to the best heuristic methods. We demonstrate the functionality of our approach via a flight demonstration using the Crazyflie 2.1 micro-quadrotor. Without a priori obstacle knowledge, the quadrotor follows a dynamic flight path while simultaneously calculating and tracking $ψ_\text{d}$ to perceive and avoid two static obstacles with an average computation time of 371 $μ$s.
Submitted 20 March, 2024;
originally announced March 2024.
-
ARtVista: Gateway To Empower Anyone Into Artist
Authors:
Trong-Vu Hoang,
Quang-Binh Nguyen,
Duy-Nam Ly,
Khanh-Duy Le,
Tam V. Nguyen,
Minh-Triet Tran,
Trung-Nghia Le
Abstract:
Drawing is an art that enables people to express their imagination and emotions. However, individuals usually face challenges in drawing, especially when translating conceptual ideas into visually coherent representations and bridging the gap between mental visualization and practical execution. In response, we propose ARtVista - a novel system integrating AR and generative AI technologies. ARtVista not only recommends reference images aligned with users' abstract ideas and generates sketches for users to draw but also goes beyond, crafting vibrant paintings in various painting styles. ARtVista also offers users an alternative approach to create striking paintings by simulating the paint-by-number concept on reference images, empowering users to create visually stunning artwork devoid of the necessity for advanced drawing skills. We perform a pilot study and reveal positive feedback on its usability, emphasizing its effectiveness in visualizing user ideas and aiding the painting process to achieve stunning pictures without requiring advanced drawing skills. The source code will be available at https://github.com/htrvu/ARtVista.
Submitted 13 March, 2024;
originally announced March 2024.
-
OGMP: Oracle Guided Multimodal Policies for Agile and Versatile Robot Control
Authors:
Lokesh Krishna,
Nikhil Sobanbabu,
Quan Nguyen
Abstract:
The efficacy of model-free learning for robot control relies on the tailored integration of task-specific priors and heuristics, hence calling for a unified approach. In this paper, we define a general class for priors called oracles and propose bounding the permissible state around the oracle's ansatz, resulting in task-agnostic oracle-guided policy optimization. Additionally, to enhance modularity, we introduce the notion of task-vital modes. A policy mastering a compact set of modes and intermediate transitions can then solve perpetual tasks. The proposed approach is validated in challenging biped control tasks: parkour and diving on a 16-DoF dynamic bipedal robot, Hector. OGMP results in a single policy per task, solving indefinite parkour over diverse tracks and omnidirectional diving from varied heights, exhibiting versatile agility. Finally, we introduce a novel latent mode space reachability analysis to study our policy's mode generalization by computing a feasible mode set function through which we certify a set of failure-free modes for our policy to perform at any given state.
Submitted 14 June, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Dynamical decoding of the competition between charge density waves in a kagome superconductor
Authors:
Honglie Ning,
Kyoung Hun Oh,
Yifan Su,
Alexander von Hoegen,
Zach Porter,
Andrea Capa Salinas,
Quynh L Nguyen,
Matthieu Chollet,
Takahiro Sato,
Vincent Esposito,
Matthias C Hoffmann,
Adam White,
Cynthia Melendrez,
Diling Zhu,
Stephen D Wilson,
Nuh Gedik
Abstract:
The kagome superconductor CsV$_3$Sb$_5$ hosts a variety of charge density wave (CDW) phases, which play a fundamental role in the formation of other exotic electronic instabilities. However, identifying the precise structure of these CDW phases and their intricate relationships remains the subject of intense debate, due to the lack of static probes that can distinguish the CDW phases with identical spatial periodicity. Here, we unveil the competition between two coexisting $2\times2\times2$ CDWs in CsV$_3$Sb$_5$ harnessing time-resolved X-ray diffraction. By analyzing the light-induced changes in the intensity of CDW superlattice peaks, we demonstrate the presence of both phases, each displaying a significantly different amount of melting upon excitation. The anomalous light-induced sharpening of peak width further shows that the phase that is more resistant to photo-excitation exhibits an increase in domain size at the expense of the other, thereby showcasing a hallmark of phase competition. Our results not only shed light on the interplay between the multiple CDW phases in CsV$_3$Sb$_5$, but also establish a non-equilibrium framework for comprehending complex phase relationships that are challenging to disentangle using static techniques.
Submitted 5 March, 2024;
originally announced March 2024.
-
Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models
Authors:
Sang T. Truong,
Duc Q. Nguyen,
Toan Nguyen,
Dong D. Le,
Nhi N. Truong,
Tho Quan,
Sanmi Koyejo
Abstract:
Recent advancements in large language models (LLMs) have underscored their importance in the evolution of artificial intelligence. However, despite extensive pretraining on multilingual datasets, available open-sourced LLMs exhibit limited effectiveness in processing Vietnamese. The challenge is exacerbated by the absence of systematic benchmark datasets and metrics tailored for Vietnamese LLM evaluation. To mitigate these issues, we have finetuned LLMs specifically for Vietnamese and developed a comprehensive evaluation framework encompassing 10 common tasks and 31 metrics. Our evaluation results reveal that the fine-tuned LLMs exhibit enhanced comprehension and generative capabilities in Vietnamese. Moreover, our analysis indicates that models with more parameters can introduce more biases and uncalibrated outputs and the key factor influencing LLM performance is the quality of the training or fine-tuning datasets. These insights underscore the significance of meticulous fine-tuning with high-quality datasets in enhancing LLM performance.
Submitted 26 May, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Near-optimal Per-Action Regret Bounds for Sleeping Bandits
Authors:
Quan Nguyen,
Nishant A. Mehta
Abstract:
We derive near-optimal per-action regret bounds for sleeping bandits, in which both the sets of available arms and their losses in every round are chosen by an adversary. In a setting with $K$ total arms and at most $A$ available arms in each round over $T$ rounds, the best known upper bound is $O(K\sqrt{TA\ln{K}})$, obtained indirectly via minimizing internal sleeping regrets. Compared to the minimax $Ω(\sqrt{TA})$ lower bound, this upper bound contains an extra multiplicative factor of $K\ln{K}$. We address this gap by directly minimizing the per-action regret using generalized versions of EXP3, EXP3-IX and FTRL with Tsallis entropy, thereby obtaining near-optimal bounds of order $O(\sqrt{TA\ln{K}})$ and $O(\sqrt{T\sqrt{AK}})$. We extend our results to the setting of bandits with advice from sleeping experts, generalizing EXP4 along the way. This leads to new proofs for a number of existing adaptive and tracking regret bounds for standard non-sleeping bandits. Extending our results to the bandit version of experts that report their confidences leads to new bounds for the confidence regret that depends primarily on the sum of experts' confidences. We prove a lower bound, showing that for any minimax optimal algorithms, there exists an action whose regret is sublinear in $T$ but linear in the number of its active rounds.
Submitted 29 May, 2024; v1 submitted 2 March, 2024;
originally announced March 2024.