Search | arXiv e-print repository

Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model

Authors: Duy M. H. Nguyen, An T. Le, Trung Q. Nguyen, Nghiem T. Diep, Tai Nguyen, Duy Duong-Tran, Jan Peters, Li Shen, Mathias Niepert, Daniel Sonntag

Abstract: Prompt learning methods are gaining increasing attention due to their ability to customize large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs, often struggling with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we c… ▽ More Prompt learning methods are gaining increasing attention due to their ability to customize large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs, often struggling with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we consider a new framework based on a dual context of both domain-shared and class-specific contexts, where the latter is generated by Large Language Models (LLMs) such as GPTs. Such dual prompt methods enhance the model's feature representation by joining implicit and explicit factors encoded in LLM knowledge. Moreover, we formulate the Unbalanced Optimal Transport (UOT) theory to quantify the relationships between constructed prompts and visual tokens. Through partial matching, UOT can properly align discrete sets of visual tokens and prompt embeddings under different mass distributions, which is particularly valuable for handling irrelevant or noisy elements, ensuring that the preservation of mass does not restrict transport solutions. Furthermore, UOT's characteristics integrate seamlessly with image augmentation, expanding the training sample pool while maintaining a reasonable distance between perturbed images and prompt inputs. Extensive experiments across few-shot classification and adapter settings substantiate the superiority of our model over current state-of-the-art baselines. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: Version 1

arXiv:2407.04279 [pdf, other]

BiosERC: Integrating Biography Speakers Supported by LLMs for ERC Tasks

Authors: Jieying Xue, Minh Phuong Nguyen, Blake Matheny, Le Minh Nguyen

Abstract: In the Emotion Recognition in Conversation task, recent investigations have utilized attention mechanisms exploring relationships among utterances from intra- and inter-speakers for modeling emotional interaction between them. However, attributes such as speaker personality traits remain unexplored and present challenges in terms of their applicability to other tasks or compatibility with diverse… ▽ More In the Emotion Recognition in Conversation task, recent investigations have utilized attention mechanisms exploring relationships among utterances from intra- and inter-speakers for modeling emotional interaction between them. However, attributes such as speaker personality traits remain unexplored and present challenges in terms of their applicability to other tasks or compatibility with diverse model architectures. Therefore, this work introduces a novel framework named BiosERC, which investigates speaker characteristics in a conversation. By employing Large Language Models (LLMs), we extract the "biographical information" of the speaker within a conversation as supplementary knowledge injected into the model to classify emotional labels for each utterance. Our proposed method achieved state-of-the-art (SOTA) results on three famous benchmark datasets: IEMOCAP, MELD, and EmoryNLP, demonstrating the effectiveness and generalization of our model and showcasing its potential for adaptation to various conversation analysis tasks. Our source code is available at https://github.com/yingjie7/BiosERC. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: Accepted in the 33rd International Conference on Artificial Neural Networks (ICANN 2024)

arXiv:2407.03376 [pdf, other]

The Complex Interplay Between Risk Tolerance and the Spread of Infectious Diseases

Authors: Maximilian Nguyen, Ari Freedman, Matthew Cheung, Chadi Saad-Roy, Baltazar Espinoza, Bryan Grenfell, Simon Levin

Abstract: Risk-driven behavior provides a feedback mechanism through which individuals both shape and are collectively affected by an epidemic. We introduce a general and flexible compartmental model to study the effect of heterogeneity in the population with regards to risk tolerance. The interplay between behavior and epidemiology leads to a rich set of possible epidemic dynamics. Depending on the behavio… ▽ More Risk-driven behavior provides a feedback mechanism through which individuals both shape and are collectively affected by an epidemic. We introduce a general and flexible compartmental model to study the effect of heterogeneity in the population with regards to risk tolerance. The interplay between behavior and epidemiology leads to a rich set of possible epidemic dynamics. Depending on the behavioral composition of the population, we find that increasing heterogeneity in risk tolerance can either increase or decrease the epidemic size. We find that multiple waves of infection can arise due to the interplay between transmission and behavior, even without the replenishment of susceptibles. We find that increasing protective mechanisms such as the effectiveness of interventions, the number of risk-averse people in the population, and the duration of intervention usage reduces the epidemic overshoot. When the protection is pushed past a critical threshold, the epidemic dynamics enter an underdamped regime where the epidemic size exactly equals the herd immunity threshold. Lastly, we can find regimes where epidemic size does not monotonically decrease with a population that becomes increasingly risk-averse. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 12 pages, 7 figures (main text)

arXiv:2407.01082 [pdf, other]

Min P Sampling: Balancing Creativity and Coherence at High Temperature

Authors: Minh Nguyen, Andrew Baker, Andreas Kirsch, Clement Neo

Abstract: Large Language Models (LLMs) generate longform text by successively sampling the next token based on the probability distribution of the token vocabulary at each decoding step. Current popular truncation sampling methods such as top-$p$ sampling, also known as nucleus sampling, often struggle to balance coherence and creativity in generating text, particularly when using higher temperatures. To ad… ▽ More Large Language Models (LLMs) generate longform text by successively sampling the next token based on the probability distribution of the token vocabulary at each decoding step. Current popular truncation sampling methods such as top-$p$ sampling, also known as nucleus sampling, often struggle to balance coherence and creativity in generating text, particularly when using higher temperatures. To address this issue, we propose min-$p$, a dynamic truncation sampling method, that establishes a minimum base percentage threshold for tokens, which the scales according to the probability of the top candidate token. Through experiments on several benchmarks, such as GPQA, GSM8K and AlpacaEval Creative Writing, we demonstrate that min-$p$ improves the coherence and quality of generated text even at high temperatures, while also facilitating more creative and diverse outputs compared to top-$p$ and other sampling methods. As of writing, min-$p$ has been adopted by multiple open-source LLM implementations, and have been independently assessed by members of the open-source LLM community, further validating its practical utility and potential. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 8 Pages

arXiv:2407.00521 [pdf, other]

A Medical Low-Back Pain Physical Rehabilitation Dataset for Human Body Movement Analysis

Authors: Sao Mai Nguyen, Maxime Devanne, Olivier Remy-Neris, Mathieu Lempereur, André Thepaut

Abstract: While automatic monitoring and coaching of exercises are showing encouraging results in non-medical applications, they still have limitations such as errors and limited use contexts. To allow the development and assessment of physical rehabilitation by an intelligent tutoring system, we identify in this article four challenges to address and propose a medical dataset of clinical patients carrying… ▽ More While automatic monitoring and coaching of exercises are showing encouraging results in non-medical applications, they still have limitations such as errors and limited use contexts. To allow the development and assessment of physical rehabilitation by an intelligent tutoring system, we identify in this article four challenges to address and propose a medical dataset of clinical patients carrying out low back-pain rehabilitation exercises. The dataset includes 3D Kinect skeleton positions and orientations, RGB videos, 2D skeleton data, and medical annotations to assess the correctness, and error classification and localisation of body part and timespan. Along this dataset, we perform a complete research path, from data collection to processing, and finally a small benchmark. We evaluated on the dataset two baseline movement recognition algorithms, pertaining to two different approaches: the probabilistic approach with a Gaussian Mixture Model (GMM), and the deep learning approach with a Long-Short Term Memory (LSTM). This dataset is valuable because it includes rehabilitation relevant motions in a clinical setting with patients in their rehabilitation program, using a cost-effective, portable, and convenient sensor, and because it shows the potential for improvement on these challenges. △ Less

Submitted 29 June, 2024; originally announced July 2024.

ACM Class: I.5.4; I.4.8

Journal ref: IJCNN 2024

arXiv:2406.20043 [pdf, ps, other]

Existence of Solutions to the Seiberg-Witten Vortex Equations with Exponential Decay on the Plane

Authors: William L. Blair, Minh Lam Nguyen

Abstract: Clifford Taubes showed that the moduli space of the variational equation of the Yang-Mills-Higgs functional on the plane is non-empty, and its elements correspond to "vortices". Inspired by this result, in this paper, we show that the moduli space of the Hitchin-type dimensional reduction of the Seiberg-Witten equations on the plane contains both exponentially decayed solutions and polynomial grow… ▽ More Clifford Taubes showed that the moduli space of the variational equation of the Yang-Mills-Higgs functional on the plane is non-empty, and its elements correspond to "vortices". Inspired by this result, in this paper, we show that the moduli space of the Hitchin-type dimensional reduction of the Seiberg-Witten equations on the plane contains both exponentially decayed solutions and polynomial growth solutions. Furthermore, we show that there is correspondence from the moduli space of exponentially decayed and polynomial growth solutions to the symmetric products of complex numbers. The correspondence restricted to the latter is a surjective map. △ Less

Submitted 28 June, 2024; originally announced June 2024.

Comments: 32 pages, comments are welcome!

MSC Class: 53Cxx; 57Rxx; 58Jxx; 57Kxx; 30G20; 30Cxx

arXiv:2406.13997 [pdf, other]

"Global is Good, Local is Bad?": Understanding Brand Bias in LLMs

Authors: Mahammed Kamruzzaman, Hieu Minh Nguyen, Gene Louis Kim

Abstract: Many recent studies have investigated social biases in LLMs but brand bias has received little attention. This research examines the biases exhibited by LLMs towards different brands, a significant concern given the widespread use of LLMs in affected use cases such as product recommendation and market analysis. Biased models may perpetuate societal inequalities, unfairly favoring established globa… ▽ More Many recent studies have investigated social biases in LLMs but brand bias has received little attention. This research examines the biases exhibited by LLMs towards different brands, a significant concern given the widespread use of LLMs in affected use cases such as product recommendation and market analysis. Biased models may perpetuate societal inequalities, unfairly favoring established global brands while marginalizing local ones. Using a curated dataset across four brand categories, we probe the behavior of LLMs in this space. We find a consistent pattern of bias in this space -- both in terms of disproportionately associating global brands with positive attributes and disproportionately recommending luxury gifts for individuals in high-income countries. We also find LLMs are subject to country-of-origin effects which may boost local brand preference in LLM outputs in specific contexts. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13781 [pdf, other]

A Primal-Dual Framework for Transformers and Neural Networks

Authors: Tan M. Nguyen, Tam Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher

Abstract: Self-attention is key to the remarkable success of transformers in sequence modeling tasks including many applications in natural language processing and computer vision. Like neural network layers, these attention mechanisms are often developed by heuristics and experience. To provide a principled framework for constructing attention layers in transformers, we show that the self-attention corresp… ▽ More Self-attention is key to the remarkable success of transformers in sequence modeling tasks including many applications in natural language processing and computer vision. Like neural network layers, these attention mechanisms are often developed by heuristics and experience. To provide a principled framework for constructing attention layers in transformers, we show that the self-attention corresponds to the support vector expansion derived from a support vector regression problem, whose primal formulation has the form of a neural network layer. Using our framework, we derive popular attention layers used in practice and propose two new attentions: 1) the Batch Normalized Attention (Attention-BN) derived from the batch normalization layer and 2) the Attention with Scaled Head (Attention-SH) derived from using less training data to fit the SVR model. We empirically demonstrate the advantages of the Attention-BN and Attention-SH in reducing head redundancy, increasing the model's accuracy, and improving the model's efficiency in a variety of practical applications including image and time-series classification. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: Accepted to ICLR 2023, 26 pages, 4 figures, 14 tables

arXiv:2406.13770 [pdf, other]

Elliptical Attention

Authors: Stefan K. Nielsen, Laziz U. Abdullaev, Rachel Teo, Tan M. Nguyen

Abstract: Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision. This dot-product self-attention computes attention weights among the input tokens using Euclidean distance, which makes the model prone to representation collapse and vulnerable to contaminated samples. In this paper, we propos… ▽ More Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision. This dot-product self-attention computes attention weights among the input tokens using Euclidean distance, which makes the model prone to representation collapse and vulnerable to contaminated samples. In this paper, we propose using a Mahalanobis distance metric for computing the attention weights to stretch the underlying feature space in directions of high contextual relevance. In particular, we define a hyper-ellipsoidal neighborhood around each query to increase the attention weights of the tokens lying in the contextually important directions. We term this novel class of attention Elliptical Attention. Our Elliptical Attention provides two benefits: 1) reducing representation collapse and 2) enhancing the model's robustness as the Elliptical Attention pays more attention to contextually relevant information rather than focusing on some small subset of informative features. We empirically demonstrate the advantages of Elliptical Attention over the baseline dot-product attention and state-of-the-art attention methods on various practical tasks, including object classification, image segmentation, and language modeling across different data modalities. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 38 pages, 7 figures, 12 tables

arXiv:2406.13762 [pdf, other]

Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis

Authors: Rachel S. Y. Teo, Tan M. Nguyen

Abstract: The remarkable success of transformers in sequence modeling tasks, spanning various applications in natural language processing and computer vision, is attributed to the critical role of self-attention. Similar to the development of most deep learning models, the construction of these attention mechanisms rely on heuristics and experience. In our work, we derive self-attention from kernel principa… ▽ More The remarkable success of transformers in sequence modeling tasks, spanning various applications in natural language processing and computer vision, is attributed to the critical role of self-attention. Similar to the development of most deep learning models, the construction of these attention mechanisms rely on heuristics and experience. In our work, we derive self-attention from kernel principal component analysis (kernel PCA) and show that self-attention projects its query vectors onto the principal component axes of its key matrix in a feature space. We then formulate the exact formula for the value matrix in self-attention, theoretically and empirically demonstrating that this value matrix captures the eigenvectors of the Gram matrix of the key vectors in self-attention. Leveraging our kernel PCA framework, we propose Attention with Robust Principal Components (RPC-Attention), a novel class of robust attention that is resilient to data contamination. We empirically demonstrate the advantages of RPC-Attention over softmax attention on the ImageNet-1K object classification, WikiText-103 language modeling, and ADE20K image segmentation task. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 33 pages, 5 figures, 12 tables

arXiv:2406.13725 [pdf, other]

Tree-Sliced Wasserstein Distance on a System of Lines

Authors: Viet-Hoang Tran, Trang Pham, Tho Tran, Tam Le, Tan M. Nguyen

Abstract: Sliced Wasserstein (SW) distance in Optimal Transport (OT) is widely used in various applications thanks to its statistical effectiveness and computational efficiency. On the other hand, Tree Wassenstein (TW) and Tree-sliced Wassenstein (TSW) are instances of OT for probability measures where its ground cost is a tree metric. TSW also has a low computational complexity, i.e. linear to the number o… ▽ More Sliced Wasserstein (SW) distance in Optimal Transport (OT) is widely used in various applications thanks to its statistical effectiveness and computational efficiency. On the other hand, Tree Wassenstein (TW) and Tree-sliced Wassenstein (TSW) are instances of OT for probability measures where its ground cost is a tree metric. TSW also has a low computational complexity, i.e. linear to the number of edges in the tree. Especially, TSW is identical to SW when the tree is a chain. While SW is prone to loss of topological information of input measures due to relying on one-dimensional projection, TSW is more flexible and has a higher degree of freedom by choosing a tree rather than a line to alleviate the curse of dimensionality in SW. However, for practical applications, popular tree metric sampling methods are heavily built upon given supports, which limits their capacity to adapt to new supports. In this paper, we propose the Tree-Sliced Wasserstein distance on a System of Lines (TSW-SL), which brings a connection between SW and TSW. Compared to SW and TSW, our TSW-SL benefits from the higher degree of freedom of TSW while being suitable to dynamic settings as SW. In TSW-SL, we use a variant of the Radon Transform to project measures onto a system of lines, resulting in measures on a space with a tree metric, then leverage TW to efficiently compute distances between them. We empirically verify the advantages of TSW-SL over the traditional SW by conducting a variety of experiments on gradient flows, image style transfer, and generative models. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 33 pages, 6 figures, 2 tables, 4 algorithms

arXiv:2406.12389 [pdf, other]

doi 10.18653/v1/2023.findings-emnlp.737

EMO-KNOW: A Large Scale Dataset on Emotion and Emotion-cause

Authors: Mia Huong Nguyen, Yasith Samaradivakara, Prasanth Sasikumar, Chitralekha Gupta, Suranga Nanayakkara

Abstract: Emotion-Cause analysis has attracted the attention of researchers in recent years. However, most existing datasets are limited in size and number of emotion categories. They often focus on extracting parts of the document that contain the emotion cause and fail to provide more abstractive, generalizable root cause. To bridge this gap, we introduce a large-scale dataset of emotion causes, derived f… ▽ More Emotion-Cause analysis has attracted the attention of researchers in recent years. However, most existing datasets are limited in size and number of emotion categories. They often focus on extracting parts of the document that contain the emotion cause and fail to provide more abstractive, generalizable root cause. To bridge this gap, we introduce a large-scale dataset of emotion causes, derived from 9.8 million cleaned tweets over 15 years. We describe our curation process, which includes a comprehensive pipeline for data gathering, cleaning, labeling, and validation, ensuring the dataset's reliability and richness. We extract emotion labels and provide abstractive summarization of the events causing emotions. The final dataset comprises over 700,000 tweets with corresponding emotion-cause pairs spanning 48 emotion classes, validated by human evaluators. The novelty of our dataset stems from its broad spectrum of emotion classes and the abstractive emotion cause that facilitates the development of an emotion-cause knowledge graph for nuanced reasoning. Our dataset will enable the design of emotion-aware systems that account for the diverse emotional responses of different people for the same event. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted to Findings of EMNLP 2023

Journal ref: Findings of EMNLP 2023

arXiv:2406.11927 [pdf, other]

REPOEXEC: Evaluate Code Generation with a Repository-Level Executable Benchmark

Authors: Nam Le Hai, Dung Manh Nguyen, Nghi D. Q. Bui

Abstract: The ability of CodeLLMs to generate executable and functionally correct code at the repository-level scale remains largely unexplored. We introduce RepoExec, a novel benchmark for evaluating code generation at the repository-level scale. RepoExec focuses on three main aspects: executability, functional correctness through automated test case generation with high coverage rate, and carefully crafte… ▽ More The ability of CodeLLMs to generate executable and functionally correct code at the repository-level scale remains largely unexplored. We introduce RepoExec, a novel benchmark for evaluating code generation at the repository-level scale. RepoExec focuses on three main aspects: executability, functional correctness through automated test case generation with high coverage rate, and carefully crafted cross-file contexts to accurately generate code. Our work explores a controlled scenario where developers specify necessary code dependencies, challenging the model to integrate these accurately. Experiments show that while pretrained LLMs outperform instruction-tuned models in correctness, the latter excel in utilizing provided dependencies and demonstrating debugging capabilities. We also introduce a new instruction-tuned dataset that focuses on code dependencies and demonstrate that CodeLLMs fine-tuned on our dataset have a better capability to leverage these dependencies effectively. RepoExec aims to provide a comprehensive evaluation of code functionality and alignment with developer intent, paving the way for more reliable and applicable CodeLLMs in real-world scenarios. The dataset and source code can be found at~\url{https://github.com/FSoft-AI4Code/RepoExec}. △ Less

Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11912 [pdf, other]

AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology

Authors: Minh Huynh Nguyen, Thang Phan Chau, Phong X. Nguyen, Nghi D. Q. Bui

Abstract: Software agents have emerged as promising tools for addressing complex software engineering tasks. However, existing works oversimplify software development workflows by following the waterfall model. Thus, we propose AgileCoder, a multi-agent system that integrates Agile Methodology (AM) into the framework. This system assigns specific AM roles such as Product Manager, Developer, and Tester to di… ▽ More Software agents have emerged as promising tools for addressing complex software engineering tasks. However, existing works oversimplify software development workflows by following the waterfall model. Thus, we propose AgileCoder, a multi-agent system that integrates Agile Methodology (AM) into the framework. This system assigns specific AM roles such as Product Manager, Developer, and Tester to different agents, who then collaboratively develop software based on user inputs. AgileCoder enhances development efficiency by organizing work into sprints, focusing on incrementally developing software through sprints. Additionally, we introduce Dynamic Code Graph Generator, a module that creates a Code Dependency Graph dynamically as updates are made to the codebase. This allows agents to better comprehend the codebase, leading to more precise code generation and modifications throughout the software development process. AgileCoder surpasses existing benchmarks, like ChatDev and MetaGPT, establishing a new standard and showcasing the capabilities of multi-agent systems in advanced software engineering environments. Our source code can be found at https://github.com/FSoft-AI4Code/AgileCoder. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10853 [pdf, other]

MV2Cyl: Reconstructing 3D Extrusion Cylinders from Multi-View Images

Authors: Eunji Hong, Minh Hieu Nguyen, Mikaela Angelina Uy, Minhyuk Sung

Abstract: We present MV2Cyl, a novel method for reconstructing 3D from 2D multi-view images, not merely as a field or raw geometry but as a sketch-extrude CADきゃど model. Extracting extrusion cylinders from raw 3D geometry has been extensively researched in computer vision, while the processing of 3D data through neural networks has remained a bottleneck. Since 3D scans are generally accompanied by multi-view im… ▽ More We present MV2Cyl, a novel method for reconstructing 3D from 2D multi-view images, not merely as a field or raw geometry but as a sketch-extrude CADきゃど model. Extracting extrusion cylinders from raw 3D geometry has been extensively researched in computer vision, while the processing of 3D data through neural networks has remained a bottleneck. Since 3D scans are generally accompanied by multi-view images, leveraging 2D convolutional neural networks allows these images to be exploited as a rich source for extracting extrusion cylinder information. However, we observe that extracting only the surface information of the extrudes and utilizing it results in suboptimal outcomes due to the challenges in the occlusion and surface segmentation. By synergizing with the extracted base curve information, we achieve the optimal reconstruction result with the best accuracy in 2D sketch and extrude parameter estimation. Our experiments, comparing our method with previous work that takes a raw 3D point cloud as input, demonstrate the effectiveness of our approach by taking advantage of multi-view images. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: 24 pages

arXiv:2406.09837 [pdf, other]

TabularFM: An Open Framework For Tabular Foundational Models

Authors: Quan M. Tran, Suong N. Hoang, Lam M. Nguyen, Dzung Phan, Hoang Thanh Lam

Abstract: Foundational models (FMs), pretrained on extensive datasets using self-supervised techniques, are capable of learning generalized patterns from large amounts of data. This reduces the need for extensive labeled datasets for each new task, saving both time and resources by leveraging the broad knowledge base established during pretraining. Most research on FMs has primarily focused on unstructured… ▽ More Foundational models (FMs), pretrained on extensive datasets using self-supervised techniques, are capable of learning generalized patterns from large amounts of data. This reduces the need for extensive labeled datasets for each new task, saving both time and resources by leveraging the broad knowledge base established during pretraining. Most research on FMs has primarily focused on unstructured data, such as text and images, or semi-structured data, like time-series. However, there has been limited attention to structured data, such as tabular data, which, despite its prevalence, remains under-studied due to a lack of clean datasets and insufficient research on the transferability of FMs for various tabular data tasks. In response to this gap, we introduce a framework called TabularFM, which incorporates state-of-the-art methods for developing FMs specifically for tabular data. This includes variations of neural architectures such as GANs, VAEs, and Transformers. We have curated a million of tabular datasets and released cleaned versions to facilitate the development of tabular FMs. We pretrained FMs on this curated data, benchmarked various learning methods on these datasets, and released the pretrained models along with leaderboards for future comparative studies. Our fully open-sourced system provides a comprehensive analysis of the transferability of tabular FMs. By releasing these datasets, pretrained models, and leaderboards, we aim to enhance the validity and usability of tabular FMs in the near future. △ Less

Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.08301 [pdf, other]

Jet modification via $πぱい^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, J. Alexander, M. Alfred, K. Aoki, N. Apadula, L. Aphecetche, J. Asai, H. Asano, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, G. Baksay, L. Baksay, A. Baldisseri , et al. (510 additional authors not shown)

Abstract: High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is obs… ▽ More High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δでるた_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δでるたφふぁい$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 534 authors from 83 institutions, 12 pages, 7 figures. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2406.06239 [pdf, other]

I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data

Authors: Hoang H. Le, Duy M. H. Nguyen, Omair Shahzad Bhatti, Laszlo Kopacsi, Thinh P. Ngo, Binh T. Nguyen, Michael Barz, Daniel Sonntag

Abstract: Comprehending how humans process visual information in dynamic settings is crucial for psychology and designing user-centered interactions. While mobile eye-tracking systems combining egocentric video and gaze signals can offer valuable insights, manual analysis of these recordings is time-intensive. In this work, we present a novel human-centered learning algorithm designed for automated object r… ▽ More Comprehending how humans process visual information in dynamic settings is crucial for psychology and designing user-centered interactions. While mobile eye-tracking systems combining egocentric video and gaze signals can offer valuable insights, manual analysis of these recordings is time-intensive. In this work, we present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings. Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations. Such mechanisms enable us to learn embedding functions capable of generalizing to new object angle views, facilitating rapid adaptation and efficient reasoning in dynamic contexts as users navigate their environment. Through experiments conducted on three distinct video sequences, our interactive-based method showcases significant performance improvements over fixed training/testing algorithms, even when trained on considerably smaller annotated samples collected through user feedback. Furthermore, we demonstrate exceptional efficiency in data annotation processes and surpass prior interactive methods that use complete object detectors, combine detectors with convolutional networks, or employ interactive video segmentation. △ Less

Submitted 7 July, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: Updated version

arXiv:2406.05592 [pdf, other]

Constrained Design of a Binary Instrument in a Partially Linear Model

Authors: Tim Morrison, Minh Nguyen, Michael Baiocchi, Art B. Owen

Abstract: We study the question of how best to assign an encouragement in a randomized encouragement study. In our setting, units arrive with covariates, receive a nudge toward treatment or control, acquire one of those statuses in a way that need not align with the nudge, and finally have a response observed. The nudge can be seen as a binary instrument that affects the response only via the treatment stat… ▽ More We study the question of how best to assign an encouragement in a randomized encouragement study. In our setting, units arrive with covariates, receive a nudge toward treatment or control, acquire one of those statuses in a way that need not align with the nudge, and finally have a response observed. The nudge can be seen as a binary instrument that affects the response only via the treatment status. Our goal is to assign the nudge as a function of covariates in a way that best estimates the local average treatment effect (LATE). We assume a partially linear model, wherein the baseline model is non-parametric and the treatment term is linear in the covariates. Under this model, we outline a two-stage procedure to consistently estimate the LATE. Though the variance of the LATE is intractable, we derive a finite sample approximation and thus a design criterion to minimize. This criterion is convex, allowing for constraints that might arise for budgetary or ethical reasons. We prove conditions under which our solution asymptotically recovers the lowest true variance among all possible nudge propensities. We apply our method to a semi-synthetic example involving triage in an emergency department and find significant gains relative to a regression discontinuity design. △ Less

Submitted 8 June, 2024; originally announced June 2024.

Comments: 31 pages, 6 figures

arXiv:2406.03413 [pdf, other]

UnWave-Net: Unrolled Wavelet Network for Compton Tomography Image Reconstruction

Authors: Ishak Ayad, Cécilia Tarpau, Javier Cebeiro, Maï K. Nguyen

Abstract: Computed tomography (CT) is a widely used medical imaging technique to scan internal structures of a body, typically involving collimation and mechanical rotation. Compton scatter tomography (CST) presents an interesting alternative to conventional CT by leveraging Compton physics instead of collimation to gather information from multiple directions. While CST introduces new imaging opportunities… ▽ More Computed tomography (CT) is a widely used medical imaging technique to scan internal structures of a body, typically involving collimation and mechanical rotation. Compton scatter tomography (CST) presents an interesting alternative to conventional CT by leveraging Compton physics instead of collimation to gather information from multiple directions. While CST introduces new imaging opportunities with several advantages such as high sensitivity, compactness, and entirely fixed systems, image reconstruction remains an open problem due to the mathematical challenges of CST modeling. In contrast, deep unrolling networks have demonstrated potential in CT image reconstruction, despite their computationally intensive nature. In this study, we investigate the efficiency of unrolling networks for CST image reconstruction. To address the important computational cost required for training, we propose UnWave-Net, a novel unrolled wavelet-based reconstruction network. This architecture includes a non-local regularization term based on wavelets, which captures long-range dependencies within images and emphasizes the multi-scale components of the wavelet transform. We evaluate our approach using a CST of circular geometry which stays completely static during data acquisition, where UnWave-Net facilitates image reconstruction in the absence of a specific reconstruction formula. Our method outperforms existing approaches and achieves state-of-the-art performance in terms of SSIM and PSNR, and offers an improved computational efficiency compared to traditional unrolling networks. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: This paper has been early accepted by MICCAI 2024

arXiv:2406.02533 [pdf, other]

SatSplatYOLO: 3D Gaussian Splatting-based Virtual Object Detection Ensembles for Satellite Feature Recognition

Authors: Van Minh Nguyen, Emma Sandidge, Trupti Mahendrakar, Ryan T. White

Abstract: On-orbit servicing (OOS), inspection of spacecraft, and active debris removal (ADR). Such missions require precise rendezvous and proximity operations in the vicinity of non-cooperative, possibly unknown, resident space objects. Safety concerns with manned missions and lag times with ground-based control necessitate complete autonomy. In this article, we present an approach for mapping geometries… ▽ More On-orbit servicing (OOS), inspection of spacecraft, and active debris removal (ADR). Such missions require precise rendezvous and proximity operations in the vicinity of non-cooperative, possibly unknown, resident space objects. Safety concerns with manned missions and lag times with ground-based control necessitate complete autonomy. In this article, we present an approach for mapping geometries and high-confidence detection of components of unknown, non-cooperative satellites on orbit. We implement accelerated 3D Gaussian splatting to learn a 3D representation of the satellite, render virtual views of the target, and ensemble the YOLOv5 object detector over the virtual views, resulting in reliable, accurate, and precise satellite component detections. The full pipeline capable of running on-board and stand to enable downstream machine intelligence tasks necessary for autonomous guidance, navigation, and control tasks. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01007 [pdf, other]

Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay

Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, J. Cheng, Y. -C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng , et al. (177 additional authors not shown)

Abstract: This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive… ▽ More This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overlineνにゅー_{e}$ rates and energy spectra variation among the near and far detectors gives $\mathrm{sin}^22θしーた_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δでるたm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δでるたm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θしーた_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\mathrm{sin}^22θしーた_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision regarding the Daya Bay full 3158-day capture-on-gadolinium result. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00162 [pdf, ps, other]

doi 10.1145/3663408.366582

Optical-computing-enabled Network: A New Dawn for Optical-layer Intelligence?

Authors: Dao Thanh Hai, Minh Nguyen, Isaac Woungang

Abstract: Inspired by the renaissance of optical computing recently, this poster presents a disruptive outlook on the possibility of seamless integration between optical communications and optical computing infrastructures, paving the way for achieving optical-layer intelligence and consequently boosting the capacity efficiency. This entails a paradigm shift in optical node architecture from the currently u… ▽ More Inspired by the renaissance of optical computing recently, this poster presents a disruptive outlook on the possibility of seamless integration between optical communications and optical computing infrastructures, paving the way for achieving optical-layer intelligence and consequently boosting the capacity efficiency. This entails a paradigm shift in optical node architecture from the currently used optical-bypass to a novel one, entitled, optical-computing-enabled mode, where in addition to the traditional add-drop and cross-connect functionalities, optical nodes are upgraded to account for optical-computing capabilities between the lightpath entities directly at the optical layer. A preliminary study focusing on the optical aggregation operation is examined and early simulation results indicate a promising spectral saving enabled by the optical-computing-enabled mode compared with the optical-bypass one. △ Less

Submitted 31 May, 2024; originally announced June 2024.

Comments: 2-page poster, to be appeared in Proceedings of the 8th Asia-Pacific Workshop on Networking (APNet 2024)

arXiv:2405.20448 [pdf, other]

Knockout: A simple way to handle missing inputs

Authors: Minh Nguyen, Batuhan K. Karaman, Heejong Kim, Alan Q. Wang, Fengbei Liu, Mert R. Sabuncu

Abstract: Deep learning models can extract predictive and actionable information from complex inputs. The richer the inputs, the better these models usually perform. However, models that leverage rich inputs (e.g., multi-modality) can be difficult to deploy widely, because some inputs may be missing at inference. Current popular solutions to this problem include marginalization, imputation, and training mul… ▽ More Deep learning models can extract predictive and actionable information from complex inputs. The richer the inputs, the better these models usually perform. However, models that leverage rich inputs (e.g., multi-modality) can be difficult to deploy widely, because some inputs may be missing at inference. Current popular solutions to this problem include marginalization, imputation, and training multiple models. Marginalization can obtain calibrated predictions but it is computationally costly and therefore only feasible for low dimensional inputs. Imputation may result in inaccurate predictions because it employs point estimates for missing variables and does not work well for high dimensional inputs (e.g., images). Training multiple models whereby each model takes different subsets of inputs can work well but requires knowing missing input patterns in advance. Furthermore, training and retaining multiple models can be costly. We propose an efficient way to learn both the conditional distribution using full inputs and the marginal distributions. Our method, Knockout, randomly replaces input features with appropriate placeholder values during training. We provide a theoretical justification of Knockout and show that it can be viewed as an implicit marginalization strategy. We evaluate Knockout in a wide range of simulations and real-world datasets and show that it can offer strong empirical performance. △ Less

Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.20024 [pdf, other]

Applications of Generative AI (GAI) for Mobile and Wireless Networking: A Survey

Authors: Thai-Hoc Vu, Senthil Kumar Jagatheesaperumal, Minh-Duong Nguyen, Nguyen Van Huynh, Sunghwan Kim, Quoc-Viet Pham

Abstract: The success of Artificial Intelligence (AI) in multiple disciplines and vertical domains in recent years has promoted the evolution of mobile networking and the future Internet toward an AI-integrated Internet-of-Things (IoT) era. Nevertheless, most AI techniques rely on data generated by physical devices (e.g., mobile devices and network nodes) or specific applications (e.g., fitness trackers and… ▽ More The success of Artificial Intelligence (AI) in multiple disciplines and vertical domains in recent years has promoted the evolution of mobile networking and the future Internet toward an AI-integrated Internet-of-Things (IoT) era. Nevertheless, most AI techniques rely on data generated by physical devices (e.g., mobile devices and network nodes) or specific applications (e.g., fitness trackers and mobile gaming). To bypass this circumvent, Generative AI (GAI), a.k.a. AI-generated content (AIGC), has emerged as a powerful AI paradigm; thanks to its ability to efficiently learn complex data distributions and generate synthetic data to represent the original data in various forms. This impressive feature is projected to transform the management of mobile networking and diversify the current services and applications provided. On this basis, this work presents a concise tutorial on the role of GAIs in mobile and wireless networking. In particular, this survey first provides the fundamentals of GAI and representative GAI models, serving as an essential preliminary to the understanding of the applications of GAI in mobile and wireless networking. Then, this work provides a comprehensive review of state-of-the-art studies and GAI applications in network management, wireless security, semantic communication, and lessons learned from the open literature. Finally, this work summarizes the current research on GAI for mobile and wireless networking by outlining important challenges that need to be resolved to facilitate the development and applicability of GAI in this edge-cutting area. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19612 [pdf, other]

Keyword-driven Retrieval-Augmented Large Language Models for Cold-start User Recommendations

Authors: Hai-Dang Kieu, Minh Duc Nguyen, Thanh-Son Nguyen, Dung D. Le

Abstract: Recent advancements in Large Language Models (LLMs) have shown significant potential in enhancing recommender systems. However, addressing the cold-start recommendation problem, where users lack historical data, remains a considerable challenge. In this paper, we introduce KALM4Rec (Keyword-driven Retrieval-Augmented Large Language Models for Cold-start User Recommendations), a novel framework spe… ▽ More Recent advancements in Large Language Models (LLMs) have shown significant potential in enhancing recommender systems. However, addressing the cold-start recommendation problem, where users lack historical data, remains a considerable challenge. In this paper, we introduce KALM4Rec (Keyword-driven Retrieval-Augmented Large Language Models for Cold-start User Recommendations), a novel framework specifically designed to tackle this problem by requiring only a few input keywords from users in a practical scenario of cold-start user restaurant recommendations. KALM4Rec operates in two main stages: candidates retrieval and LLM-based candidates re-ranking. In the first stage, keyword-driven retrieval models are used to identify potential candidates, addressing LLMs' limitations in processing extensive tokens and reducing the risk of generating misleading information. In the second stage, we employ LLMs with various prompting strategies, including zero-shot and few-shot techniques, to re-rank these candidates by integrating multiple examples directly into the LLM prompts. Our evaluation, using a Yelp restaurant dataset with user reviews from three English-speaking cities, shows that our proposed framework significantly improves recommendation quality. Specifically, the integration of in-context instructions with LLMs for re-ranking markedly enhances the performance of the cold-start user recommender system. △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: 10 pages, 10 figures, 4 tables

arXiv:2405.18605 [pdf, ps, other]

BioBERT-based Deep Learning and Merged ChemProt-DrugProt for Enhanced Biomedical Relation Extraction

Authors: Bridget T. McInnes, Jiawei Tang, Darshini Mahendran, Mai H. Nguyen

Abstract: This paper presents a methodology for enhancing relation extraction from biomedical texts, focusing specifically on chemical-gene interactions. Leveraging the BioBERT model and a multi-layer fully connected network architecture, our approach integrates the ChemProt and DrugProt datasets using a novel merging strategy. Through extensive experimentation, we demonstrate significant performance improv… ▽ More This paper presents a methodology for enhancing relation extraction from biomedical texts, focusing specifically on chemical-gene interactions. Leveraging the BioBERT model and a multi-layer fully connected network architecture, our approach integrates the ChemProt and DrugProt datasets using a novel merging strategy. Through extensive experimentation, we demonstrate significant performance improvements, particularly in CPR groups shared between the datasets. The findings underscore the importance of dataset merging in augmenting sample counts and improving model accuracy. Moreover, the study highlights the potential of automated information extraction in biomedical research and clinical practice. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17385 [pdf, other]

Thermalization and Criticality on an Analog-Digital Quantum Simulator

Authors: Trond I. Andersen, Nikita Astrakhantsev, Amir H. Karamlou, Julia Berndtsson, Johannes Motruk, Aaron Szasz, Jonathan A. Gross, Alexander Schuckert, Tom Westerhout, Yaxing Zhang, Ebrahim Forati, Dario Rossi, Bryce Kobrin, Agustin Di Paolo, Andrey R. Klots, Ilya Drozdov, Vladislav D. Kurilovich, Andre Petukhov, Lev B. Ioffe, Andreas Elben, Aniket Rath, Vittorio Vitale, Benoit Vermersch, Rajeev Acharya, Laleh Aghababaie Beni , et al. (202 additional authors not shown)

Abstract: Understanding how interacting particles approach thermal equilibrium is a major challenge of quantum simulators. Unlocking the full potential of such systems toward this goal requires flexible initial state preparation, precise time evolution, and extensive probes for final state characterization. We present a quantum simulator comprising 69 superconducting qubits which supports both universal qua… ▽ More Understanding how interacting particles approach thermal equilibrium is a major challenge of quantum simulators. Unlocking the full potential of such systems toward this goal requires flexible initial state preparation, precise time evolution, and extensive probes for final state characterization. We present a quantum simulator comprising 69 superconducting qubits which supports both universal quantum gates and high-fidelity analog evolution, with performance beyond the reach of classical simulation in cross-entropy benchmarking experiments. Emulating a two-dimensional (2D) XY quantum magnet, we leverage a wide range of measurement techniques to study quantum states after ramps from an antiferromagnetic initial state. We observe signatures of the classical Kosterlitz-Thouless phase transition, as well as strong deviations from Kibble-Zurek scaling predictions attributed to the interplay between quantum and classical coarsening of the correlated domains. This interpretation is corroborated by injecting variable energy density into the initial state, which enables studying the effects of the eigenstate thermalization hypothesis (ETH) in targeted parts of the eigenspectrum. Finally, we digitally prepare the system in pairwise-entangled dimer states and image the transport of energy and vorticity during thermalization. These results establish the efficacy of superconducting analog-digital quantum processors for preparing states across many-body spectra and unveiling their thermalization dynamics. △ Less

Submitted 8 July, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16692 [pdf, other]

Planning Robot Placement for Object Grasping

Authors: Manish Saini, Melvin Paul Jacob, Minh Nguyen, Nico Hochgeschwender

Abstract: When performing manipulation-based activities such as picking objects, a mobile robot needs to position its base at a location that supports successful execution. To address this problem, prominent approaches typically rely on costly grasp planners to provide grasp poses for a target object, which are then are then analysed to identify the best robot placements for achieving each grasp pose. In th… ▽ More When performing manipulation-based activities such as picking objects, a mobile robot needs to position its base at a location that supports successful execution. To address this problem, prominent approaches typically rely on costly grasp planners to provide grasp poses for a target object, which are then are then analysed to identify the best robot placements for achieving each grasp pose. In this paper, we propose instead to first find robot placements that would not result in collision with the environment and from where picking up the object is feasible, then evaluate them to find the best placement candidate. Our approach takes into account the robot's reachability, as well as RGB-D images and occupancy grid maps of the environment for identifying suitable robot poses. The proposed algorithm is embedded in a service robotic workflow, in which a person points to select the target object for grasping. We evaluate our approach with a series of grasping experiments, against an existing baseline implementation that sends the robot to a fixed navigation goal. The experimental results show how the approach allows the robot to grasp the target object from locations that are very challenging to the baseline implementation. △ Less

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16339 [pdf, other]

BOLD: Boolean Logic Deep Learning

Authors: Van Minh Nguyen, Cristian Ocampo, Aymen Askri, Louis Leconte, Ba-Hien Tran

Abstract: Deep learning is computationally intensive, with significant efforts focused on reducing arithmetic complexity, particularly regarding energy consumption dominated by data movement. While existing literature emphasizes inference, training is considerably more resource-intensive. This paper proposes a novel mathematical principle by introducing the notion of Boolean variation such that neurons made… ▽ More Deep learning is computationally intensive, with significant efforts focused on reducing arithmetic complexity, particularly regarding energy consumption dominated by data movement. While existing literature emphasizes inference, training is considerably more resource-intensive. This paper proposes a novel mathematical principle by introducing the notion of Boolean variation such that neurons made of Boolean weights and inputs can be trained -- for the first time -- efficiently in Boolean domain using Boolean logic instead of gradient descent and real arithmetic. We explore its convergence, conduct extensively experimental benchmarking, and provide consistent complexity evaluation by considering chip architecture, memory hierarchy, dataflow, and arithmetic precision. Our approach achieves baseline full-precision accuracy in ImageNet classification and surpasses state-of-the-art results in semantic segmentation, with notable performance in image super-resolution, and natural language understanding with transformer-based models. Moreover, it significantly reduces energy consumption during both training and inference. △ Less

Submitted 25 May, 2024; originally announced May 2024.

Comments: Under review

arXiv:2405.16148 [pdf, other]

Accelerating Transformers with Spectrum-Preserving Token Merging

Authors: Hoai-Chau Tran, Duy M. H. Nguyen, Duy M. Nguyen, Trung-Tin Nguyen, Ngan Le, Pengtao Xie, Daniel Sonntag, James Y. Zou, Binh T. Nguyen, Mathias Niepert

Abstract: Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Pr… ▽ More Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Prior works have proposed algorithms based on Bipartite Soft Matching (BSM), which divides tokens into distinct sets and merges the top k similar tokens. However, these methods have significant drawbacks, such as sensitivity to token-splitting strategies and damage to informative tokens in later layers. This paper presents a novel paradigm called PiToMe, which prioritizes the preservation of informative tokens using an additional metric termed the energy score. This score identifies large clusters of similar tokens as high-energy, indicating potential candidates for merging, while smaller (unique and isolated) clusters are considered as low-energy and preserved. Experimental findings demonstrate that PiToMe saved from 40-60\% FLOPs of the base models while exhibiting superior off-the-shelf performance on image classification (0.5\% average performance drop of ViT-MAE-H compared to 2.6\% as baselines), image-text retrieval (0.3\% average performance drop of CLIP on Flickr30k compared to 4.5\% as others), and analogously in visual questions answering with LLaVa-7B. Furthermore, PiToMe is theoretically shown to preserve intrinsic spectral properties of the original token space under mild conditions △ Less

Submitted 25 May, 2024; originally announced May 2024.

Comments: Version 1

arXiv:2405.13155 [pdf, other]

ReALLM: A general framework for LLM compression and fine-tuning

Authors: Louis Leconte, Lisa Bedin, Van Minh Nguyen, Eric Moulines

Abstract: We introduce ReALLM, a novel approach for compression and memory-efficient adaptation of pre-trained language models that encompasses most of the post-training quantization and fine-tuning methods for a budget of <4 bits. Pre-trained matrices are decomposed into a high-precision low-rank component and a vector-quantized latent representation (using an autoencoder). During the fine-tuning step, onl… ▽ More We introduce ReALLM, a novel approach for compression and memory-efficient adaptation of pre-trained language models that encompasses most of the post-training quantization and fine-tuning methods for a budget of <4 bits. Pre-trained matrices are decomposed into a high-precision low-rank component and a vector-quantized latent representation (using an autoencoder). During the fine-tuning step, only the low-rank components are updated. Our results show that pre-trained matrices exhibit different patterns. ReALLM adapts the shape of the encoder (small/large embedding, high/low bit VQ, etc.) to each matrix. ReALLM proposes to represent each matrix with a small embedding on $b$ bits and a neural decoder model $\mathcal{D}_φふぁい$ with its weights on $b_φふぁい$ bits. The decompression of a matrix requires only one embedding and a single forward pass with the decoder. Our weight-only quantization algorithm yields the best results on language generation tasks (C4 and WikiText-2) for a budget of $3$ bits without any training. With a budget of $2$ bits, ReALLM achieves state-of-the art performance after fine-tuning on a small calibration dataset. △ Less

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.12956 [pdf, ps, other]

A Fueter operator for 3/2-spinors

Authors: Ahmad Reza Haj Saeedi Sadegh, Minh Lam Nguyen

Abstract: We show the non-compactness of moduli space of solutions of the monopole equations for $3/2$-spinors on a closed 3-manifold is equivalent to the existence of `3/2-Fueter sections' that are solutions of an overdetermined non-linear elliptic differential equation. These are sections of a fiber bundle whose fiber is a special 4-dimensional submanifold of the hyperkähler manifold of center-framed char… ▽ More We show the non-compactness of moduli space of solutions of the monopole equations for $3/2$-spinors on a closed 3-manifold is equivalent to the existence of `3/2-Fueter sections' that are solutions of an overdetermined non-linear elliptic differential equation. These are sections of a fiber bundle whose fiber is a special 4-dimensional submanifold of the hyperkähler manifold of center-framed charged one $SU(2)$-instantons on $\mathbf{R}^4$. This fiber bundle does not inherit a hyperkähler structure. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: Comments are welcome

MSC Class: 53Cxx; 57Rxx; 58Jxx; 57Kx

arXiv:2405.12306 [pdf, ps, other]

GW with hybrid functionals for large molecular systems

Authors: Tucker Allen, Minh Nguyen, Daniel Neuhauser

Abstract: A low-cost approach for stochastically sampling static exchange during TDHF-type propagation is presented. This enables the use of an excellent hybrid DFT starting point for stochastic GW quasiparticle energy calculations. Generalized Kohn-Sham molecular orbitals and energies, rather than those of a local-DFT calculation, are used for building the Green's function and effective Coulomb interaction… ▽ More A low-cost approach for stochastically sampling static exchange during TDHF-type propagation is presented. This enables the use of an excellent hybrid DFT starting point for stochastic GW quasiparticle energy calculations. Generalized Kohn-Sham molecular orbitals and energies, rather than those of a local-DFT calculation, are used for building the Green's function and effective Coulomb interaction. The use of an optimally tuned hybrid diminishes the starting point dependency in one-shot stochastic GW, effectively avoiding the need for self-consistent GW iterations. △ Less

Submitted 20 May, 2024; originally announced May 2024.

Comments: 18 pages, 1 figure

arXiv:2405.11348 [pdf, other]

Chiral symmetry and Atiyah-Patodi-Singer index theorem for staggered fermions

Authors: Mendel Nguyen, Hersh Singh

Abstract: We consider the Atiyah-Patodi-Singer (APS) index theorem corresponding to the chiral symmetry of a continuum formulation of staggered fermions called Kähler-Dirac fermions, which have been recently investigated as an ingredient in lattice constructions of chiral gauge theories. We point out that there are two notions of chiral symmetry for Kähler-Dirac fermions, both having a mixed perturbative an… ▽ More We consider the Atiyah-Patodi-Singer (APS) index theorem corresponding to the chiral symmetry of a continuum formulation of staggered fermions called Kähler-Dirac fermions, which have been recently investigated as an ingredient in lattice constructions of chiral gauge theories. We point out that there are two notions of chiral symmetry for Kähler-Dirac fermions, both having a mixed perturbative anomaly with gravity leading to index theorems on closed manifolds. By formulating these theories on a manifold with boundary, we find the APS index theorems corresponding to each of these symmetries, necessary for a complete picture of anomaly inflow, using a recently discovered physics-motivated proof. We comment on a fundamental difference between the nature of these two symmetries by showing that a sensible local, symmetric boundary condition only exists for one of the two symmetries. This sheds light on how these symmetries behave under lattice discretization, and in particular on their use for recent symmetric-mass generation proposals. △ Less

Submitted 18 May, 2024; originally announced May 2024.

Comments: 12 pages, 1 figure

Report number: FERMILAB-PUB-24-0262-T

arXiv:2405.08311 [pdf, ps, other]

A Decoupling and Aggregating Framework for Joint Extraction of Entities and Relations

Authors: Yao Wang, Xin Liu, Weikun Kong, Hai-Tao Yu, Teeradaj Racharak, Kyoung-Sook Kim, Minh Le Nguyen

Abstract: Named Entity Recognition and Relation Extraction are two crucial and challenging subtasks in the field of Information Extraction. Despite the successes achieved by the traditional approaches, fundamental research questions remain open. First, most recent studies use parameter sharing for a single subtask or shared features for both two subtasks, ignoring their semantic differences. Second, informa… ▽ More Named Entity Recognition and Relation Extraction are two crucial and challenging subtasks in the field of Information Extraction. Despite the successes achieved by the traditional approaches, fundamental research questions remain open. First, most recent studies use parameter sharing for a single subtask or shared features for both two subtasks, ignoring their semantic differences. Second, information interaction mainly focuses on the two subtasks, leaving the fine-grained informtion interaction among the subtask-specific features of encoding subjects, relations, and objects unexplored. Motivated by the aforementioned limitations, we propose a novel model to jointly extract entities and relations. The main novelties are as follows: (1) We propose to decouple the feature encoding process into three parts, namely encoding subjects, encoding objects, and encoding relations. Thanks to this, we are able to use fine-grained subtask-specific features. (2) We propose novel inter-aggregation and intra-aggregation strategies to enhance the information interaction and construct individual fine-grained subtask-specific features, respectively. The experimental results demonstrate that our model outperforms several previous state-of-the-art models. Extensive additional experiments further confirm the effectiveness of our model. △ Less

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.07615 [pdf, other]

ViWikiFC: Fact-Checking for Vietnamese Wikipedia-Based Textual Knowledge Source

Authors: Hung Tuan Le, Long Truong To, Manh Trong Nguyen, Kiet Van Nguyen

Abstract: Fact-checking is essential due to the explosion of misinformation in the media ecosystem. Although false information exists in every language and country, most research to solve the problem mainly concentrated on huge communities like English and Chinese. Low-resource languages like Vietnamese are necessary to explore corpora and models for fact verification. To bridge this gap, we construct ViWik… ▽ More Fact-checking is essential due to the explosion of misinformation in the media ecosystem. Although false information exists in every language and country, most research to solve the problem mainly concentrated on huge communities like English and Chinese. Low-resource languages like Vietnamese are necessary to explore corpora and models for fact verification. To bridge this gap, we construct ViWikiFC, the first manual annotated open-domain corpus for Vietnamese Wikipedia Fact Checking more than 20K claims generated by converting evidence sentences extracted from Wikipedia articles. We analyze our corpus through many linguistic aspects, from the new dependency rate, the new n-gram rate, and the new word rate. We conducted various experiments for Vietnamese fact-checking, including evidence retrieval and verdict prediction. BM25 and InfoXLM (Large) achieved the best results in two tasks, with BM25 achieving an accuracy of 88.30% for SUPPORTS, 86.93% for REFUTES, and only 56.67% for the NEI label in the evidence retrieval task, InfoXLM (Large) achieved an F1 score of 86.51%. Furthermore, we also conducted a pipeline approach, which only achieved a strict accuracy of 67.00% when using InfoXLM (Large) and BM25. These results demonstrate that our dataset is challenging for the Vietnamese language model in fact-checking tasks. △ Less

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.06521 [pdf, ps, other]

Diameter of Commuting Graphs of Lie Algebras

Authors: Hieu V. Ha, Hoa D. Quang, Vu A. Le, Tuyen T. M Nguyen

Abstract: In this paper, we study the connectedness of the commuting graph of a general Lie algebra and provide a process to determine whether the commuting graph is connected or not, as well as to compute an upper bound for its diameter. In addition, we will examine the connectedness and diameter of the commuting graphs of some remarkable classes of Lie algebras, including: (1) a class of Lie algebras with… ▽ More In this paper, we study the connectedness of the commuting graph of a general Lie algebra and provide a process to determine whether the commuting graph is connected or not, as well as to compute an upper bound for its diameter. In addition, we will examine the connectedness and diameter of the commuting graphs of some remarkable classes of Lie algebras, including: (1) a class of Lie algebras with one- or two-dimensional derived algebras; and (2) a class of solvable Lie algebras over the real field of dimension up to $4$. △ Less

Submitted 10 May, 2024; originally announced May 2024.

Comments: 21 pages

MSC Class: 17B30; 17B60

arXiv:2405.05687 [pdf, other]

Massive interacting binaries as an enrichment source for multiple populations in star clusters

Authors: Michelle Nguyen, Alison Sills

Abstract: We present a suite of binary evolution models with massive primaries (10 $\leq$ M$_1$ $\leq$ 40 M$_\odot$) and periods and mass ratios chosen such that the systems undergo non-conservative mass transfer while the primaries have helium cores. We track the total mass and chemical composition of the ejecta from these systems. This material shows the abundance signatures of hot hydrogen burning which… ▽ More We present a suite of binary evolution models with massive primaries (10 $\leq$ M$_1$ $\leq$ 40 M$_\odot$) and periods and mass ratios chosen such that the systems undergo non-conservative mass transfer while the primaries have helium cores. We track the total mass and chemical composition of the ejecta from these systems. This material shows the abundance signatures of hot hydrogen burning which are needed to explain the abundance patterns seen in multiple populations in massive star clusters. We then calculate the total yield of a population of binary stars with masses, mass ratios, and periods consistent with their distribution in a field population. We show that the overall abundance of this material is enriched in helium, nitrogen, sodium, and aluminum, and depleted in carbon, oxygen, and magnesium, by amounts that are consistent with observations. We also show that such a population of binaries will return approximately 25% of its mass in this ejecta (compared to 4% if all the stars were single), over a characteristic timescale of about 12 Myr. We argue that massive binaries must be seriously considered as a contributor to the source of enriched material needed to explain the multiple populations in massive clusters, since essentially all massive stars are formed in binaries or higher order multiples, massive binaries are primarily formed in clusters, and massive binaries naturally produce material of the right composition. △ Less

Submitted 9 May, 2024; originally announced May 2024.

Comments: Accepted by The Astrophysical Journal

arXiv:2405.00543 [pdf, other]

New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis

Authors: Quy Hoang Nguyen, Minh-Van Truong Nguyen, Kiet Van Nguyen

Abstract: The emergence of multimodal data on social media platforms presents new opportunities to better understand user sentiments toward a given aspect. However, existing multimodal datasets for Aspect-Category Sentiment Analysis (ACSA) often focus on textual annotations, neglecting fine-grained information in images. Consequently, these datasets fail to fully exploit the richness inherent in multimodal.… ▽ More The emergence of multimodal data on social media platforms presents new opportunities to better understand user sentiments toward a given aspect. However, existing multimodal datasets for Aspect-Category Sentiment Analysis (ACSA) often focus on textual annotations, neglecting fine-grained information in images. Consequently, these datasets fail to fully exploit the richness inherent in multimodal. To address this, we introduce a new Vietnamese multimodal dataset, named ViMACSA, which consists of 4,876 text-image pairs with 14,618 fine-grained annotations for both text and image in the hotel domain. Additionally, we propose a Fine-Grained Cross-Modal Fusion Framework (FCMF) that effectively learns both intra- and inter-modality interactions and then fuses these information to produce a unified multimodal representation. Experimental results show that our framework outperforms SOTA models on the ViMACSA dataset, achieving the highest F1 score of 79.73%. We also explore characteristics and challenges in Vietnamese multimodal sentiment analysis, including misspellings, abbreviations, and the complexities of the Vietnamese language. This work contributes both a benchmark dataset and a new framework that leverages fine-grained multimodal information to improve multimodal aspect-category sentiment analysis. Our dataset is available for research purposes: https://github.com/hoangquy18/Multimodal-Aspect-Category-Sentiment-Analysis. △ Less

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.15617 [pdf, other]

DPO: Differential reinforcement learning with application to optimal configuration search

Authors: Chandrajit Bajaj, Minh Nguyen

Abstract: Reinforcement learning (RL) with continuous state and action spaces remains one of the most challenging problems within the field. Most current learning methods focus on integral identities such as value functions to derive an optimal strategy for the learning agent. In this paper, we instead study the dual form of the original RL formulation to propose the first differential RL framework that can… ▽ More Reinforcement learning (RL) with continuous state and action spaces remains one of the most challenging problems within the field. Most current learning methods focus on integral identities such as value functions to derive an optimal strategy for the learning agent. In this paper, we instead study the dual form of the original RL formulation to propose the first differential RL framework that can handle settings with limited training samples and short-length episodes. Our approach introduces Differential Policy Optimization (DPO), a pointwise and stage-wise iteration method that optimizes policies encoded by local-movement operators. We prove a pointwise convergence estimate for DPO and provide a regret bound comparable with current theoretical works. Such pointwise estimate ensures that the learned policy matches the optimal path uniformly across different steps. We then apply DPO to a class of practical RL problems which search for optimal configurations with Lagrangian rewards. DPO is easy to implement, scalable, and shows competitive results on benchmarking experiments against several popular RL methods. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Comments: 24 pages, 1 figure, 2 tables

arXiv:2404.12555 [pdf, other]

Sociotechnical Considerations for SLAM Anchors in Location-Based AR

Authors: Tiffany T. Nguyen, Cinthya Jauregui, Sarah H. Sallee, Mohan R. Chandrasekar, Liam A'Hearn, Dominic J. Woetzel, Pinak Paliwal, Madison Nguyen, Isabella `Amne Gomez, Xinqi Zhang, Lee M. Panich, Danielle M. Heitmuller, Amy Lueck, Kai Lukoff

Abstract: In this position paper, we explore the power of storytelling and its connection to place through the use of Augmented Reality (AR) technology, particularly within the context of Thámien Ohlone history on the Santa Clara University campus. To do this, we utilized SLAM and 8th Wall to create virtual, location-based experiences that geolocate tribal stories at present-day sites, showcase the living c… ▽ More In this position paper, we explore the power of storytelling and its connection to place through the use of Augmented Reality (AR) technology, particularly within the context of Thámien Ohlone history on the Santa Clara University campus. To do this, we utilized SLAM and 8th Wall to create virtual, location-based experiences that geolocate tribal stories at present-day sites, showcase the living culture of the Thámien Ohlone tribe, and advocate for physical markers that could exist to recognize their story. When doing so, we made sure to select locations that added to the story each stop tells to serve as our anchors. Our research then investigates both the social and technical considerations involved in selecting anchors for AR experiences, using the Thámien Ohlone AR Tour as a case study. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: Presented at CHI 2024 (arXiv:2404.05889)

Report number: ARSJ/2024/13

arXiv:2404.07343 [pdf, other]

Monitoring AGNs with H$βべーた$ Asymmetry. IV. First Reverberation Mapping Results of 14 AGNs

Authors: T. E. Zastrocky, Michael S. Brotherton, Pu Du, Jacob N. McLane, Kianna A. Olson, D. A. Dale, H. A. Kobulnicky, Jaya Maithil, My L. Nguyen, William T. Chick, David H. Kasper, Derek Hand, C. Adelman, Z. Carter, G. Murphree, M. Oeur, T. Roth, S. Schonsberg, M. J. Caradonna, J. Favro, A. J. Ferguson, I. M. Gonzalez, L. M. Hadding, H. D. Hagler, C. J. Rogers , et al. (19 additional authors not shown)

Abstract: We report first-time reverberation mapping results for 14 AGNs from the ongoing Monitoring AGNs with H$βべーた$ Asymmetry campaign (MAHA). These results utilize optical spectra obtained with the Long Slit Spectrograph on the Wyoming Infrared 2.3m Telescope between 2017 November-2023 May. MAHA combines long-duration monitoring with high cadence. We report results from multiple observing seasons for 9 of… ▽ More We report first-time reverberation mapping results for 14 AGNs from the ongoing Monitoring AGNs with H$βべーた$ Asymmetry campaign (MAHA). These results utilize optical spectra obtained with the Long Slit Spectrograph on the Wyoming Infrared 2.3m Telescope between 2017 November-2023 May. MAHA combines long-duration monitoring with high cadence. We report results from multiple observing seasons for 9 of the 14 objects. These results include H$βべーた$ time lags, supermassive black hole masses, and velocity-resolved time lags. The velocity-resolved lags allow us to investigate the kinematics of the broad-line region. △ Less

Submitted 10 April, 2024; originally announced April 2024.

Comments: 35 pages, 19 figures, accepted for publication in ApJ Supplement

arXiv:2404.07341 [pdf, other]

Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping

Authors: Kevin Zhang, Luka Chkhetiani, Francis McCann Ramirez, Yash Khare, Andrea Vanzo, Michael Liang, Sergio Ramirez Martin, Gabriel Oexle, Ruben Bousbib, Taufiquzzaman Peyash, Michael Nguyen, Dillon Pulliam, Domenic Donato

Abstract: This paper presents Conformer-1, an end-to-end Automatic Speech Recognition (ASR) model trained on an extensive dataset of 570k hours of speech audio data, 91% of which was acquired from publicly available sources. To achieve this, we perform Noisy Student Training after generating pseudo-labels for the unlabeled public data using a strong Conformer RNN-T baseline model. The addition of these pseu… ▽ More This paper presents Conformer-1, an end-to-end Automatic Speech Recognition (ASR) model trained on an extensive dataset of 570k hours of speech audio data, 91% of which was acquired from publicly available sources. To achieve this, we perform Noisy Student Training after generating pseudo-labels for the unlabeled public data using a strong Conformer RNN-T baseline model. The addition of these pseudo-labeled data results in remarkable improvements in relative Word Error Rate (WER) by 11.5% and 24.3% for our asynchronous and realtime models, respectively. Additionally, the model is more robust to background noise owing to the addition of these data. The results obtained in this study demonstrate that the incorporation of pseudo-labeled publicly available data is a highly effective strategy for improving ASR accuracy and noise robustness. △ Less

Submitted 12 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.04145 [pdf, other]

A global approach for the inverse scattering problem using a Carleman contraction map

Authors: Phuong M. Nguyen, Loc H. Nguyen, Huong T. T. Vu

Abstract: This paper addresses the inverse scattering problem in the domain Omega. The input data, measured outside Omega, involve the waves generated by the interaction of plane waves with various directions and unknown scatterers fully occluded inside Omega. The output of this problem is the spatially dielectric constant of these scatterers. Our approach to solving this problem consists of two primary sta… ▽ More This paper addresses the inverse scattering problem in the domain Omega. The input data, measured outside Omega, involve the waves generated by the interaction of plane waves with various directions and unknown scatterers fully occluded inside Omega. The output of this problem is the spatially dielectric constant of these scatterers. Our approach to solving this problem consists of two primary stages. Initially, we eliminate the unknown dielectric constant from the governing equation, resulting in a system of partial differential equations. Subsequently, we develop the Carleman contraction mapping method to effectively tackle this system. It is noteworthy to highlight this method's robustness. It does not request a precise initial guess of the true solution, and its computational cost is not expensive. Some numerical examples are presented. △ Less

Submitted 21 June, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.03002 [pdf, other]

DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations

Authors: DESI Collaboration, A. G. Adame, J. Aguilar, S. Ahlen, S. Alam, D. M. Alexander, M. Alvarez, O. Alves, A. Anand, U. Andrade, E. Armengaud, S. Avila, A. Aviles, H. Awan, B. Bahr-Kalus, S. Bailey, C. Baltay, A. Bault, J. Behera, S. BenZvi, A. Bera, F. Beutler, D. Bianchi, C. Blake, R. Blum , et al. (178 additional authors not shown)

Abstract: We present cosmological results from the measurement of baryon acoustic oscillations (BAO) in galaxy, quasar and Lyman-$αあるふぁ$ forest tracers from the first year of observations from the Dark Energy Spectroscopic Instrument (DESI), to be released in the DESI Data Release 1. DESI BAO provide robust measurements of the transverse comoving distance and Hubble rate, or their combination, relative to the s… ▽ More We present cosmological results from the measurement of baryon acoustic oscillations (BAO) in galaxy, quasar and Lyman-$αあるふぁ$ forest tracers from the first year of observations from the Dark Energy Spectroscopic Instrument (DESI), to be released in the DESI Data Release 1. DESI BAO provide robust measurements of the transverse comoving distance and Hubble rate, or their combination, relative to the sound horizon, in seven redshift bins from over 6 million extragalactic objects in the redshift range $0.1<z<4.2$. DESI BAO data alone are consistent with the standard flat $Λらむだ$CDM cosmological model with a matter density $Ωおめが_\mathrm{m}=0.295\pm 0.015$. Paired with a BBN prior and the robustly measured acoustic angular scale from the CMB, DESI requires $H_0=(68.52\pm0.62)$ km/s/Mpc. In conjunction with CMB anisotropies from Planck and CMB lensing data from Planck and ACT, we find $Ωおめが_\mathrm{m}=0.307\pm 0.005$ and $H_0=(67.97\pm0.38)$ km/s/Mpc. Extending the baseline model with a constant dark energy equation of state parameter $w$, DESI BAO alone require $w=-0.99^{+0.15}_{-0.13}$. In models with a time-varying dark energy equation of state parametrized by $w_0$ and $w_a$, combinations of DESI with CMB or with SN~Ia individually prefer $w_0>-1$ and $w_a<0$. This preference is 2.6$σしぐま$ for the DESI+CMB combination, and persists or grows when SN~Ia are added in, giving results discrepant with the $Λらむだ$CDM model at the $2.5σしぐま$, $3.5σしぐま$ or $3.9σしぐま$ levels for the addition of Pantheon+, Union3, or DES-SN5YR datasets respectively. For the flat $Λらむだ$CDM model with the sum of neutrino mass $\sum m_νにゅー$ free, combining the DESI and CMB data yields an upper limit $\sum m_νにゅー< 0.072$ $(0.113)$ eV at 95% confidence for a $\sum m_νにゅー>0$ $(\sum m_νにゅー>0.059)$ eV prior. These neutrino-mass constraints are substantially relaxed in models beyond $Λらむだ$CDM. [Abridged.] △ Less

Submitted 24 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: This DESI Collaboration Key Publication is part of the 2024 publication series using the first year of observations (see https://data.desi.lbl.gov/doc/papers). Typos corrected and a new figure and discussion added to Appendix A

arXiv:2404.03001 [pdf, other]

DESI 2024 IV: Baryon Acoustic Oscillations from the Lyman Alpha Forest

Authors: DESI Collaboration, A. G. Adame, J. Aguilar, S. Ahlen, S. Alam, D. M. Alexander, M. Alvarez, O. Alves, A. Anand, U. Andrade, E. Armengaud, S. Avila, A. Aviles, H. Awan, S. Bailey, C. Baltay, A. Bault, J. Bautista, J. Behera, S. BenZvi, F. Beutler, D. Bianchi, C. Blake, R. Blum, S. Brieden , et al. (174 additional authors not shown)

Abstract: We present the measurement of Baryon Acoustic Oscillations (BAO) from the Lyman-$αあるふぁ$ (Ly$αあるふぁ$) forest of high-redshift quasars with the first-year dataset of the Dark Energy Spectroscopic Instrument (DESI). Our analysis uses over $420\,000$ Ly$αあるふぁ$ forest spectra and their correlation with the spatial distribution of more than $700\,000$ quasars. An essential facet of this work is the development of a… ▽ More We present the measurement of Baryon Acoustic Oscillations (BAO) from the Lyman-$αあるふぁ$ (Ly$αあるふぁ$) forest of high-redshift quasars with the first-year dataset of the Dark Energy Spectroscopic Instrument (DESI). Our analysis uses over $420\,000$ Ly$αあるふぁ$ forest spectra and their correlation with the spatial distribution of more than $700\,000$ quasars. An essential facet of this work is the development of a new analysis methodology on a blinded dataset. We conducted rigorous tests using synthetic data to ensure the reliability of our methodology and findings before unblinding. Additionally, we conducted multiple data splits to assess the consistency of the results and scrutinized various analysis approaches to confirm their robustness. For a given value of the sound horizon ($r_d$), we measure the expansion at $z_{\rm eff}=2.33$ with 2\% precision, $H(z_{\rm eff}) = (239.2 \pm 4.8) (147.09~{\rm Mpc} /r_d)$ km/s/Mpc. Similarly, we present a 2.4\% measurement of the transverse comoving distance to the same redshift, $D_M(z_{\rm eff}) = (5.84 \pm 0.14) (r_d/147.09~{\rm Mpc})$ Gpc. Together with other DESI BAO measurements at lower redshifts, these results are used in a companion paper to constrain cosmological parameters. △ Less

Submitted 12 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: This DESI Collaboration Key Publication is part of the 2024 publication series using the first year of observations (see https://data.desi.lbl.gov/doc/papers)

arXiv:2404.03000 [pdf, other]

DESI 2024 III: Baryon Acoustic Oscillations from Galaxies and Quasars

Authors: DESI Collaboration, A. G. Adame, J. Aguilar, S. Ahlen, S. Alam, D. M. Alexander, M. Alvarez, O. Alves, A. Anand, U. Andrade, E. Armengaud, S. Avila, A. Aviles, H. Awan, S. Bailey, C. Baltay, A. Bault, J. Behera, S. BenZvi, F. Beutler, D. Bianchi, C. Blake, R. Blum, S. Brieden, A. Brodzeller , et al. (171 additional authors not shown)

Abstract: We present the DESI 2024 galaxy and quasar baryon acoustic oscillations (BAO) measurements using over 5.7 million unique galaxy and quasar redshifts in the range 0.1<z<2.1. Divided by tracer type, we utilize 300,017 galaxies from the magnitude-limited Bright Galaxy Survey with 0.1<z<0.4, 2,138,600 Luminous Red Galaxies with 0.4<z<1.1, 2,432,022 Emission Line Galaxies with 0.8<z<1.6, and 856,652 qu… ▽ More We present the DESI 2024 galaxy and quasar baryon acoustic oscillations (BAO) measurements using over 5.7 million unique galaxy and quasar redshifts in the range 0.1<z<2.1. Divided by tracer type, we utilize 300,017 galaxies from the magnitude-limited Bright Galaxy Survey with 0.1<z<0.4, 2,138,600 Luminous Red Galaxies with 0.4<z<1.1, 2,432,022 Emission Line Galaxies with 0.8<z<1.6, and 856,652 quasars with 0.8<z<2.1, over a ~7,500 square degree footprint. The analysis was blinded at the catalog-level to avoid confirmation bias. All fiducial choices of the BAO fitting and reconstruction methodology, as well as the size of the systematic errors, were determined on the basis of the tests with mock catalogs and the blinded data catalogs. We present several improvements to the BAO analysis pipeline, including enhancing the BAO fitting and reconstruction methods in a more physically-motivated direction, and also present results using combinations of tracers. We present a re-analysis of SDSS BOSS and eBOSS results applying the improved DESI methodology and find scatter consistent with the level of the quoted SDSS theoretical systematic uncertainties. With the total effective survey volume of ~ 18 Gpc$^3$, the combined precision of the BAO measurements across the six different redshift bins is ~0.52%, marking a 1.2-fold improvement over the previous state-of-the-art results using only first-year data. We detect the BAO in all of these six redshift bins. The highest significance of BAO detection is $9.1σしぐま$ at the effective redshift of 0.93, with a constraint of 0.86% placed on the BAO scale. We find our measurements are systematically larger than the prediction of Planck-2018 LCDM model at z<0.8. We translate the results into transverse comoving distance and radial Hubble distance measurements, which are used to constrain cosmological models in our companion paper [abridged]. △ Less

Submitted 3 April, 2024; originally announced April 2024.

Comments: This DESI Collaboration Key Publication is part of the 2024 publication series using the first year of observations (see https://data.desi.lbl.gov/doc/papers)

arXiv:2404.02949 [pdf, other]

The SaTML '24 CNN Interpretability Competition: New Innovations for Concept-Level Interpretability

Authors: Stephen Casper, Jieun Yun, Joonhyuk Baek, Yeseong Jung, Minhwan Kim, Kiwan Kwon, Saerom Park, Hayden Moore, David Shriver, Marissa Connor, Keltin Grimes, Angus Nicolson, Arush Tagade, Jessica Rumbelow, Hieu Minh Nguyen, Dylan Hadfield-Menell

Abstract: Interpretability techniques are valuable for helping humans understand and oversee AI systems. The SaTML 2024 CNN Interpretability Competition solicited novel methods for studying convolutional neural networks (CNNs) at the ImageNet scale. The objective of the competition was to help human crowd-workers identify trojans in CNNs. This report showcases the methods and results of four featured compet… ▽ More Interpretability techniques are valuable for helping humans understand and oversee AI systems. The SaTML 2024 CNN Interpretability Competition solicited novel methods for studying convolutional neural networks (CNNs) at the ImageNet scale. The objective of the competition was to help human crowd-workers identify trojans in CNNs. This report showcases the methods and results of four featured competition entries. It remains challenging to help humans reliably diagnose trojans via interpretability tools. However, the competition's entries have contributed new techniques and set a new record on the benchmark from Casper et al., 2023. △ Less

Submitted 3 April, 2024; originally announced April 2024.

Comments: Competition for SaTML 2024

arXiv:2404.02717 [pdf, other]

Automatic Prompt Selection for Large Language Models

Authors: Viet-Tung Do, Van-Khanh Hoang, Duy-Hung Nguyen, Shahab Sabahi, Jeff Yang, Hajime Hotta, Minh-Tien Nguyen, Hung Le

Abstract: Large Language Models (LLMs) can perform various natural language processing tasks with suitable instruction prompts. However, designing effective prompts manually is challenging and time-consuming. Existing methods for automatic prompt optimization either lack flexibility or efficiency. In this paper, we propose an effective approach to automatically select the optimal prompt for a given input fr… ▽ More Large Language Models (LLMs) can perform various natural language processing tasks with suitable instruction prompts. However, designing effective prompts manually is challenging and time-consuming. Existing methods for automatic prompt optimization either lack flexibility or efficiency. In this paper, we propose an effective approach to automatically select the optimal prompt for a given input from a finite set of synthetic candidate prompts. Our approach consists of three steps: (1) clustering the training data and generating candidate prompts for each cluster using an LLM-based prompt generator; (2) synthesizing a dataset of input-prompt-output tuples for training a prompt evaluator to rank the prompts based on their relevance to the input; (3) using the prompt evaluator to select the best prompt for a new input at test time. Our approach balances prompt generality-specificity and eliminates the need for resource-intensive training and inference. It demonstrates competitive performance on zero-shot question-answering datasets: GSM8K, MultiArith, and AQuA. △ Less

Submitted 3 April, 2024; originally announced April 2024.

Comments: preprint

Showing 1–50 of 810 results for author: Nguyen, M