-
Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model
Authors:
Duy M. H. Nguyen,
An T. Le,
Trung Q. Nguyen,
Nghiem T. Diep,
Tai Nguyen,
Duy Duong-Tran,
Jan Peters,
Li Shen,
Mathias Niepert,
Daniel Sonntag
Abstract:
Prompt learning methods are gaining increasing attention due to their ability to customize large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs, often struggling with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we consider a new framework based on a dual context of both domain-shared and class-specific contexts, where the latter is generated by Large Language Models (LLMs) such as GPTs. Such dual prompt methods enhance the model's feature representation by joining implicit and explicit factors encoded in LLM knowledge. Moreover, we formulate the Unbalanced Optimal Transport (UOT) theory to quantify the relationships between constructed prompts and visual tokens. Through partial matching, UOT can properly align discrete sets of visual tokens and prompt embeddings under different mass distributions, which is particularly valuable for handling irrelevant or noisy elements, ensuring that the preservation of mass does not restrict transport solutions. Furthermore, UOT's characteristics integrate seamlessly with image augmentation, expanding the training sample pool while maintaining a reasonable distance between perturbed images and prompt inputs. Extensive experiments across few-shot classification and adapter settings substantiate the superiority of our model over current state-of-the-art baselines.
Submitted 5 July, 2024;
originally announced July 2024.
-
EMO-KNOW: A Large Scale Dataset on Emotion and Emotion-cause
Authors:
Mia Huong Nguyen,
Yasith Samaradivakara,
Prasanth Sasikumar,
Chitralekha Gupta,
Suranga Nanayakkara
Abstract:
Emotion-Cause analysis has attracted the attention of researchers in recent years. However, most existing datasets are limited in size and number of emotion categories. They often focus on extracting parts of the document that contain the emotion cause and fail to provide more abstractive, generalizable root cause. To bridge this gap, we introduce a large-scale dataset of emotion causes, derived from 9.8 million cleaned tweets over 15 years. We describe our curation process, which includes a comprehensive pipeline for data gathering, cleaning, labeling, and validation, ensuring the dataset's reliability and richness. We extract emotion labels and provide abstractive summarization of the events causing emotions. The final dataset comprises over 700,000 tweets with corresponding emotion-cause pairs spanning 48 emotion classes, validated by human evaluators. The novelty of our dataset stems from its broad spectrum of emotion classes and the abstractive emotion cause that facilitates the development of an emotion-cause knowledge graph for nuanced reasoning. Our dataset will enable the design of emotion-aware systems that account for the diverse emotional responses of different people for the same event.
Submitted 18 June, 2024;
originally announced June 2024.
-
AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology
Authors:
Minh Huynh Nguyen,
Thang Phan Chau,
Phong X. Nguyen,
Nghi D. Q. Bui
Abstract:
Software agents have emerged as promising tools for addressing complex software engineering tasks. However, existing works oversimplify software development workflows by following the waterfall model. Thus, we propose AgileCoder, a multi-agent system that integrates Agile Methodology (AM) into the framework. This system assigns specific AM roles such as Product Manager, Developer, and Tester to different agents, who then collaboratively develop software based on user inputs. AgileCoder enhances development efficiency by organizing work into sprints, so that software is developed incrementally. Additionally, we introduce Dynamic Code Graph Generator, a module that creates a Code Dependency Graph dynamically as updates are made to the codebase. This allows agents to better comprehend the codebase, leading to more precise code generation and modifications throughout the software development process. AgileCoder surpasses existing baselines such as ChatDev and MetaGPT, establishing a new standard and showcasing the capabilities of multi-agent systems in advanced software engineering environments. Our source code can be found at https://github.com/FSoft-AI4Code/AgileCoder.
Submitted 16 June, 2024;
originally announced June 2024.
-
MV2Cyl: Reconstructing 3D Extrusion Cylinders from Multi-View Images
Authors:
Eunji Hong,
Minh Hieu Nguyen,
Mikaela Angelina Uy,
Minhyuk Sung
Abstract:
We present MV2Cyl, a novel method for reconstructing 3D from 2D multi-view images, not merely as a field or raw geometry but as a sketch-extrude CAD model. Extracting extrusion cylinders from raw 3D geometry has been extensively researched in computer vision, while the processing of 3D data through neural networks has remained a bottleneck. Since 3D scans are generally accompanied by multi-view images, leveraging 2D convolutional neural networks allows these images to be exploited as a rich source for extracting extrusion cylinder information. However, we observe that extracting only the surface information of the extrudes and utilizing it results in suboptimal outcomes due to the challenges in the occlusion and surface segmentation. By synergizing with the extracted base curve information, we achieve the optimal reconstruction result with the best accuracy in 2D sketch and extrude parameter estimation. Our experiments, comparing our method with previous work that takes a raw 3D point cloud as input, demonstrate the effectiveness of our approach by taking advantage of multi-view images.
Submitted 16 June, 2024;
originally announced June 2024.
-
I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data
Authors:
Hoang H. Le,
Duy M. H. Nguyen,
Omair Shahzad Bhatti,
Laszlo Kopacsi,
Thinh P. Ngo,
Binh T. Nguyen,
Michael Barz,
Daniel Sonntag
Abstract:
Comprehending how humans process visual information in dynamic settings is crucial for psychology and designing user-centered interactions. While mobile eye-tracking systems combining egocentric video and gaze signals can offer valuable insights, manual analysis of these recordings is time-intensive. In this work, we present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings. Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations. Such mechanisms enable us to learn embedding functions capable of generalizing to new object angle views, facilitating rapid adaptation and efficient reasoning in dynamic contexts as users navigate their environment. Through experiments conducted on three distinct video sequences, our interactive-based method showcases significant performance improvements over fixed training/testing algorithms, even when trained on considerably smaller annotated samples collected through user feedback. Furthermore, we demonstrate exceptional efficiency in data annotation processes and surpass prior interactive methods that use complete object detectors, combine detectors with convolutional networks, or employ interactive video segmentation.
Submitted 7 July, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
BioBERT-based Deep Learning and Merged ChemProt-DrugProt for Enhanced Biomedical Relation Extraction
Authors:
Bridget T. McInnes,
Jiawei Tang,
Darshini Mahendran,
Mai H. Nguyen
Abstract:
This paper presents a methodology for enhancing relation extraction from biomedical texts, focusing specifically on chemical-gene interactions. Leveraging the BioBERT model and a multi-layer fully connected network architecture, our approach integrates the ChemProt and DrugProt datasets using a novel merging strategy. Through extensive experimentation, we demonstrate significant performance improvements, particularly in CPR groups shared between the datasets. The findings underscore the importance of dataset merging in augmenting sample counts and improving model accuracy. Moreover, the study highlights the potential of automated information extraction in biomedical research and clinical practice.
Submitted 28 May, 2024;
originally announced May 2024.
-
Accelerating Transformers with Spectrum-Preserving Token Merging
Authors:
Hoai-Chau Tran,
Duy M. H. Nguyen,
Duy M. Nguyen,
Trung-Tin Nguyen,
Ngan Le,
Pengtao Xie,
Daniel Sonntag,
James Y. Zou,
Binh T. Nguyen,
Mathias Niepert
Abstract:
Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Prior works have proposed algorithms based on Bipartite Soft Matching (BSM), which divides tokens into distinct sets and merges the top k similar tokens. However, these methods have significant drawbacks, such as sensitivity to token-splitting strategies and damage to informative tokens in later layers. This paper presents a novel paradigm called PiToMe, which prioritizes the preservation of informative tokens using an additional metric termed the energy score. This score identifies large clusters of similar tokens as high-energy, indicating potential candidates for merging, while smaller (unique and isolated) clusters are considered low-energy and preserved. Experimental findings demonstrate that PiToMe saves 40-60% of the base models' FLOPs while exhibiting superior off-the-shelf performance on image classification (0.5% average performance drop of ViT-MAE-H compared to 2.6% for baselines), image-text retrieval (0.3% average performance drop of CLIP on Flickr30k compared to 4.5% for other methods), and analogously in visual question answering with LLaVa-7B. Furthermore, PiToMe is theoretically shown to preserve intrinsic spectral properties of the original token space under mild conditions.
Submitted 25 May, 2024;
originally announced May 2024.
-
Structure-Aware E(3)-Invariant Molecular Conformer Aggregation Networks
Authors:
Duy M. H. Nguyen,
Nina Lukashina,
Tai Nguyen,
An T. Le,
TrungTin Nguyen,
Nhat Ho,
Jan Peters,
Daniel Sonntag,
Viktor Zaverkin,
Mathias Niepert
Abstract:
A molecule's 2D representation consists of its atoms, their attributes, and the molecule's covalent bonds. A 3D (geometric) representation of a molecule is called a conformer and consists of its atom types and Cartesian coordinates. Every conformer has a potential energy, and the lower this energy, the more likely it occurs in nature. Most existing machine learning methods for molecular property prediction consider either 2D molecular graphs or 3D conformer structure representations in isolation. Inspired by recent work on using ensembles of conformers in conjunction with 2D graph representations, we propose $\mathrm{E}$(3)-invariant molecular conformer aggregation networks. The method integrates a molecule's 2D representation with that of multiple of its conformers. Contrary to prior work, we propose a novel 2D-3D aggregation mechanism based on a differentiable solver for the Fused Gromov-Wasserstein Barycenter problem and the use of an efficient conformer generation method based on distance geometry. We show that the proposed aggregation mechanism is $\mathrm{E}$(3) invariant and propose an efficient GPU implementation. Moreover, we demonstrate that the aggregation mechanism helps to significantly outperform state-of-the-art molecule property prediction methods on established datasets.
Submitted 10 June, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation
Authors:
Duy Minh Ho Nguyen,
Tan Ngoc Pham,
Nghiem Tuong Diep,
Nghi Quoc Phan,
Quang Pham,
Vinh Tong,
Binh T. Nguyen,
Ngan Hoang Le,
Nhat Ho,
Pengtao Xie,
Daniel Sonntag,
Mathias Niepert
Abstract:
Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. The foundational models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. They showcase impressive learning abilities across different tasks while requiring only a limited number of annotated samples. While numerous techniques have focused on developing better fine-tuning strategies to adapt these models for specific domains, we instead examine their robustness to domain shifts in the medical image segmentation task. To this end, we compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset and show that foundation-based models enjoy better robustness than other architectures. From here, we further developed a new Bayesian uncertainty estimation for frozen models and used it as an indicator to characterize the model's performance on out-of-distribution (OOD) data, proving particularly beneficial for real-world applications. Our experiments not only reveal the limitations of current indicators like accuracy on the line or agreement on the line commonly used in natural image applications but also emphasize the promise of the introduced Bayesian uncertainty. Specifically, predictions with lower uncertainty usually correspond to higher out-of-distribution (OOD) performance.
Submitted 18 November, 2023;
originally announced November 2023.
-
Functional Overlap Reranking for Neural Code Generation
Authors:
Hung Quoc To,
Minh Huynh Nguyen,
Nghi D. Q. Bui
Abstract:
Code Large Language Models (CodeLLMs) have ushered in a new era in code generation advancements. However, selecting the best code solutions from all possible CodeLLM outputs remains a challenge. Previous methods often overlooked the intricate functional similarities and interactions between solution clusters. We introduce SRank, a novel reranking strategy for selecting the best solutions from code generation, focusing on modeling the relationships between clusters of solutions. By quantifying the functional overlap between solution clusters, our approach provides a better ranking strategy for code solutions. Empirical results show that our method achieves remarkable results on the pass@1 score. For instance, on the Human-Eval benchmark, we achieve 69.66% in pass@1 with Codex002, 75.31% with WizardCoder, 53.99% with StarCoder, and 60.55% with CodeGen, surpassing state-of-the-art code generation reranking methods such as CodeT and Coder-Reviewer on the same CodeLLM by a significant margin (approximately 6.1% improvement on average). Even in scenarios with a limited number of sampled solutions and test cases, our approach demonstrates robustness and superiority, marking a new benchmark in code generation reranking. Our implementation can be found at https://github.com/FSoft-AI4Code/SRank-CodeRanker.
Submitted 22 June, 2024; v1 submitted 16 October, 2023;
originally announced November 2023.
-
Large Language Models for In-Context Student Modeling: Synthesizing Student's Behavior in Visual Programming
Authors:
Manh Hung Nguyen,
Sebastian Tschiatschek,
Adish Singla
Abstract:
Student modeling is central to many educational technologies as it enables predicting future learning outcomes and designing targeted instructional strategies. However, open-ended learning domains pose challenges for accurately modeling students due to the diverse behaviors and a large space of possible misconceptions. To approach these challenges, we explore the application of large language models (LLMs) for in-context student modeling in open-ended learning domains. More concretely, given a particular student's attempt on a reference task as observation, the objective is to synthesize the student's attempt on a target task. We introduce a novel framework, LLM for Student Synthesis (LLM-SS), that leverages LLMs for synthesizing a student's behavior. Our framework can be combined with different LLMs; moreover, we fine-tune LLMs to boost their student modeling capabilities. We instantiate several methods based on LLM-SS framework and evaluate them using an existing benchmark, StudentSyn, for student attempt synthesis in a visual programming domain. Experimental results show that our methods perform significantly better than the baseline method NeurSS provided in the StudentSyn benchmark. Furthermore, our method using a fine-tuned version of the GPT-3.5 model is significantly better than using the base GPT-3.5 model and gets close to human tutors' performance.
Submitted 3 May, 2024; v1 submitted 15 October, 2023;
originally announced October 2023.
-
Prescribed Fire Modeling using Knowledge-Guided Machine Learning for Land Management
Authors:
Somya Sharma Chatterjee,
Kelly Lindsay,
Neel Chatterjee,
Rohan Patil,
Ilkay Altintas De Callafon,
Michael Steinbach,
Daniel Giron,
Mai H. Nguyen,
Vipin Kumar
Abstract:
In recent years, the increasing threat of devastating wildfires has underscored the need for effective prescribed fire management. Process-based computer simulations have traditionally been employed to plan prescribed fires for wildfire prevention. However, even simplified process models like QUIC-Fire are too compute-intensive to be used for real-time decision-making, especially when weather conditions change rapidly. Traditional ML methods used for fire modeling offer computational speedup but struggle with physically inconsistent predictions, biased predictions due to class imbalance, biased estimates for fire spread metrics (e.g., burned area, rate of spread), and generalizability in out-of-distribution wind conditions. This paper introduces a novel machine learning (ML) framework that enables rapid emulation of prescribed fires while addressing these concerns. By incorporating domain knowledge, the proposed method helps reduce physical inconsistencies in fuel density estimates in data-scarce scenarios. To overcome the majority class bias in predictions, we leverage pre-existing source domain data to augment training data and learn the spread of fire more effectively. Finally, we overcome the problem of biased estimation of fire spread metrics by incorporating a hierarchical modeling structure to capture the interdependence in fuel density and burned area. Notably, improvement in fire metric (e.g., burned area) estimates offered by our framework makes it useful for fire managers, who often rely on these fire metric estimates to make decisions about prescribed burn management. Furthermore, our framework exhibits better generalization capabilities than the other ML-based fire modeling methods across diverse wind conditions and ignition patterns.
Submitted 2 October, 2023;
originally announced October 2023.
-
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching
Authors:
Duy M. H. Nguyen,
Hoang Nguyen,
Nghiem T. Diep,
Tan N. Pham,
Tri Cao,
Binh T. Nguyen,
Paul Swoboda,
Nhat Ho,
Shadi Albarqouni,
Pengtao Xie,
Daniel Sonntag,
Mathias Niepert
Abstract:
Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While pre-trained deep networks on ImageNet and vision-language foundation models trained on web-scale data are prevailing approaches, their effectiveness on medical tasks is limited due to the significant domain shift between natural and medical images. To bridge this gap, we introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets, covering a large number of organs and modalities such as CT, MRI, X-ray, and Ultrasound. We benchmark several state-of-the-art self-supervised algorithms on this dataset and propose a novel self-supervised contrastive learning algorithm using a graph-matching formulation. The proposed approach makes three contributions: (i) it integrates prior pair-wise image similarity metrics based on local and global information; (ii) it captures the structural constraints of feature embeddings through a loss function constructed via a combinatorial graph-matching objective; and (iii) it can be trained efficiently end-to-end using modern gradient-estimation techniques for black-box solvers. We thoroughly evaluate the proposed LVM-Med on 15 downstream medical tasks ranging from segmentation and classification to object detection, and both for the in and out-of-distribution settings. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models. For challenging tasks such as Brain Tumor Classification or Diabetic Retinopathy Grading, LVM-Med improves previous vision-language models trained on 1 billion masks by 6-7% while using only a ResNet-50.
Submitted 18 November, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
SALAD: Part-Level Latent Diffusion for 3D Shape Generation and Manipulation
Authors:
Juil Koo,
Seungwoo Yoo,
Minh Hieu Nguyen,
Minhyuk Sung
Abstract:
We present a cascaded diffusion model based on a part-level implicit 3D representation. Our model achieves state-of-the-art generation quality and also enables part-level shape editing and manipulation without any additional training in conditional setup. Diffusion models have demonstrated impressive capabilities in data generation as well as zero-shot completion and editing via a guided reverse process. Recent research on 3D diffusion models has focused on improving their generation capabilities with various data representations, while the absence of structural information has limited their capability in completion and editing tasks. We thus propose our novel diffusion model using a part-level implicit representation. To effectively learn diffusion with high-dimensional embedding vectors of parts, we propose a cascaded framework, learning diffusion first on a low-dimensional subspace encoding extrinsic parameters of parts and then on the other high-dimensional subspace encoding intrinsic attributes. In the experiments, we demonstrate the outperformance of our method compared with the previous ones both in generation and part-level completion and manipulation tasks.
Submitted 20 March, 2024; v1 submitted 21 March, 2023;
originally announced March 2023.
-
Integrable Systems Arising from Separation of Variables on $S^{3}$
Authors:
Diana M. H. Nguyen,
Sean R. Dawson,
Holger R. Dullin
Abstract:
We show that the space of orthogonally separable coordinates on the sphere $S^3$ induces a natural family of integrable systems, which after symplectic reduction leads to a family of integrable systems on $S^2 \times S^2$. The generic member of the family corresponds to ellipsoidal coordinates. We use the theory of compatible Poisson structures to study the critical points and critical values of the momentum map. Interesting structure arises because the ellipsoidal coordinate system can degenerate in a variety of ways, and all possible orthogonally separable coordinate systems on $S^3$ (including degenerations) have the topology of the Stasheff polytope $K^4$, which is a pentagon. We describe how the generic integrable system degenerates, and how the appearance of global $SO(2)$ and $SO(3)$ symmetries is the main feature that organises the various degenerate systems. For the whole family we show that there is an action map whose image is an equilateral triangle. When higher symmetry is present, this triangle "unfolds" into a semi-toric polygon (when there is one global $S^1$-action) or a Delzant polygon (when there are two global $S^1$-actions). We believe that this family of integrable systems is a natural playground for theories of global symplectic classification of integrable systems.
Submitted 26 February, 2023;
originally announced February 2023.
-
DRG-Net: Interactive Joint Learning of Multi-lesion Segmentation and Classification for Diabetic Retinopathy Grading
Authors:
Hasan Md Tusfiqur,
Duy M. H. Nguyen,
Mai T. N. Truong,
Triet A. Nguyen,
Binh T. Nguyen,
Michael Barz,
Hans-Juergen Profitlich,
Ngoc T. T. Than,
Ngan Le,
Pengtao Xie,
Daniel Sonntag
Abstract:
Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support an appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System to classify DR Grading, localize lesion areas, and provide visual explanations; (ii) DRG-Expert-Interaction to receive feedback from user-expert and improve the DRG-AI-System. To deal with sparse data, we utilize transfer learning mechanisms to extract invariant feature representations by using Wasserstein distance and adversarial learning-based entropy minimization. Besides, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and loss functions constraint between lesion features and classification features, our approach can be robust given a certain level of noise in the feedback of users. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly-supervised manner.
Submitted 30 December, 2022;
originally announced December 2022.
-
Multimodal Wildland Fire Smoke Detection
Authors:
Siddhant Baldota,
Shreyas Anantha Ramaprasad,
Jaspreet Kaur Bhamra,
Shane Luna,
Ravi Ramachandra,
Eugene Zen,
Harrison Kim,
Daniel Crawl,
Ismael Perez,
Ilkay Altintas,
Garrison W. Cottrell,
Mai H. Nguyen
Abstract:
Research has shown that climate change creates warmer temperatures and drier conditions, leading to longer wildfire seasons and increased wildfire risks in the United States. These factors have in turn led to increases in the frequency, extent, and severity of wildfires in recent years. Given the danger posed by wildland fires to people, property, wildlife, and the environment, there is an urgency to provide tools for effective wildfire management. Early detection of wildfires is essential to minimizing potentially catastrophic destruction. In this paper, we present our work on integrating multiple data sources in SmokeyNet, a deep learning model using spatio-temporal information to detect smoke from wildland fires. Camera image data is integrated with weather sensor measurements and processed by SmokeyNet to create a multimodal wildland fire smoke detection system. We present our results comparing performance in terms of both accuracy and time-to-detection for multimodal data vs. a single data source. With a time-to-detection of only a few minutes, SmokeyNet can serve as an automated early notification system, providing a useful tool in the fight against destructive wildfires.
Submitted 28 December, 2022;
originally announced December 2022.
-
Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering
Authors:
Duy M. H. Nguyen,
Hoang Nguyen,
Mai T. N. Truong,
Tri Cao,
Binh T. Nguyen,
Nhat Ho,
Paul Swoboda,
Shadi Albarqouni,
Pengtao Xie,
Daniel Sonntag
Abstract:
Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing a vector embedding at each slice and then assembling a holistic feature through deformable self-attention mechanisms in a Transformer, which allows long-range dependencies between slices inside 3D volumes to be incorporated. These holistic features are further utilized to define a novel 3D clustering agreement-based SSL task and masking embedding prediction inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structures segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches.
Submitted 4 December, 2022;
originally announced December 2022.
-
Efficient Integration of Multi-Order Dynamics and Internal Dynamics in Stock Movement Prediction
Authors:
Thanh Trung Huynh,
Minh Hieu Nguyen,
Thanh Tam Nguyen,
Phi Le Nguyen,
Matthias Weidlich,
Quoc Viet Hung Nguyen,
Karl Aberer
Abstract:
Advances in deep neural network (DNN) architectures have enabled new prediction techniques for stock market data. Unlike other multivariate time-series data, stock markets show two unique characteristics: (i) multi-order dynamics, as stock prices are affected by strong non-pairwise correlations (e.g., within the same industry); and (ii) internal dynamics, as each individual stock shows some particular behaviour. Recent DNN-based methods capture multi-order dynamics using hypergraphs, but rely on the Fourier basis in the convolution, which is both inefficient and ineffective. In addition, they largely ignore internal dynamics by adopting the same model for each stock, which implies a severe information loss.
In this paper, we propose a framework for stock movement prediction to overcome the above issues. Specifically, the framework includes temporal generative filters that implement a memory-based mechanism on top of an LSTM network in an attempt to learn individual patterns per stock. Moreover, we employ hypergraph attentions to capture the non-pairwise correlations. Here, using the wavelet basis instead of the Fourier basis enables us to simplify the message passing and focus on the localized convolution. Experiments with US market data over six years show that our framework outperforms state-of-the-art methods in terms of profit and stability. Our source code and data are available at https://github.com/thanhtrunghuynh93/estimate.
Submitted 24 November, 2022; v1 submitted 10 November, 2022;
originally announced November 2022.
-
Meta-learning from Learning Curves Challenge: Lessons learned from the First Round and Design of the Second Round
Authors:
Manh Hung Nguyen,
Lisheng Sun,
Nathan Grinsztajn,
Isabelle Guyon
Abstract:
Meta-learning from learning curves is an important yet often neglected research area in the Machine Learning community. We introduce a series of Reinforcement Learning-based meta-learning challenges, in which an agent searches for the best-suited algorithm for a given dataset, based on feedback of learning curves from the environment. The first round attracted participants from both academia and industry. This paper analyzes the results of the first round (accepted to the competition program of WCCI 2022) to draw insights into what makes a meta-learner successful at learning from learning curves. With the lessons learned from the first round and the feedback from the participants, we have designed the second round of our challenge with a new protocol and a new meta-dataset. The second round of our challenge has been accepted at AutoML-Conf 2022 and is currently ongoing.
Submitted 4 August, 2022;
originally announced August 2022.
-
Model-Agnostic and Diverse Explanations for Streaming Rumour Graphs
Authors:
Thanh Tam Nguyen,
Thanh Cong Phan,
Minh Hieu Nguyen,
Matthias Weidlich,
Hongzhi Yin,
Jun Jo,
Quoc Viet Hung Nguyen
Abstract:
The propagation of rumours on social media poses an important threat to societies, and various techniques for rumour detection have therefore been proposed recently. Yet, existing work focuses on what entities constitute a rumour, but provides little support to understand why the entities have been classified as such. This prevents an effective evaluation of the detected rumours as well as the design of countermeasures. In this work, we argue that explanations for detected rumours may be given in terms of examples of related rumours detected in the past. A diverse set of similar rumours helps users to generalize, i.e., to understand the properties that govern the detection of rumours. Since the spread of rumours in social media is commonly modelled using feature-annotated graphs, we propose a query-by-example approach that, given a rumour graph, extracts the $k$ most similar and diverse subgraphs from past rumours. The challenge is that all of the computations require fast assessment of similarities between graphs. To achieve an efficient and adaptive realization of the approach in a streaming setting, we present a novel graph representation learning technique and report on implementation considerations. Our evaluation experiments show that our approach outperforms baseline techniques in delivering meaningful explanations for various rumour propagation behaviours.
Submitted 17 July, 2022;
originally announced July 2022.
-
HierarchyNet: Learning to Summarize Source Code with Heterogeneous Representations
Authors:
Minh Huynh Nguyen,
Nghi D. Q. Bui,
Truong Son Hy,
Long Tran-Thanh,
Tien N. Nguyen
Abstract:
We propose a novel method for code summarization utilizing Heterogeneous Code Representations (HCRs) and our specially designed HierarchyNet. HCRs effectively capture essential code features at lexical, syntactic, and semantic levels by abstracting coarse-grained code elements and incorporating fine-grained program elements in a hierarchical structure. Our HierarchyNet method processes each layer of the HCR separately through a unique combination of the Heterogeneous Graph Transformer, a Tree-based CNN, and a Transformer Encoder. This approach preserves dependencies between code elements and captures relations through a novel Hierarchical-Aware Cross Attention layer. Our method surpasses current state-of-the-art techniques, such as PA-Former, CAST, and NeuralCodeSum.
Submitted 9 May, 2023; v1 submitted 30 May, 2022;
originally announced May 2022.
-
FIgLib & SmokeyNet: Dataset and Deep Learning Model for Real-Time Wildland Fire Smoke Detection
Authors:
Anshuman Dewangan,
Yash Pande,
Hans-Werner Braun,
Frank Vernon,
Ismael Perez,
Ilkay Altintas,
Garrison W. Cottrell,
Mai H. Nguyen
Abstract:
The size and frequency of wildland fires in the western United States have dramatically increased in recent years. On high-fire-risk days, a small fire ignition can rapidly grow and become out of control. Early detection of fire ignitions from initial smoke can assist the response to such fires before they become difficult to manage. Past deep learning approaches for wildfire smoke detection have suffered from small or unreliable datasets that make it difficult to extrapolate performance to real-world scenarios. In this work, we present the Fire Ignition Library (FIgLib), a publicly available dataset of nearly 25,000 labeled wildfire smoke images as seen from fixed-view cameras deployed in Southern California. We also introduce SmokeyNet, a novel deep learning architecture using spatiotemporal information from camera imagery for real-time wildfire smoke detection. When trained on the FIgLib dataset, SmokeyNet outperforms comparable baselines and rivals human performance. We hope that the availability of the FIgLib dataset and the SmokeyNet architecture will inspire further research into deep learning methods for wildfire smoke detection, leading to automated notification systems that reduce the time to wildfire response.
Submitted 14 May, 2022; v1 submitted 15 December, 2021;
originally announced December 2021.
-
The Harmonic Lagrange Top and the Confluent Heun Equation
Authors:
Sean R. Dawson,
Holger R. Dullin,
Diana M. H. Nguyen
Abstract:
The harmonic Lagrange top is the Lagrange top plus a quadratic (harmonic) potential term. We describe the top in the space fixed frame using a global description with a Poisson structure on $T^*S^3$. This global description naturally leads to a rational parametrisation of the set of critical values of the energy-momentum map. We show that there are 4 different topological types for generic parameter values. The quantum mechanics of the harmonic Lagrange top is described by the most general confluent Heun equation (also known as the generalised spheroidal wave equation). We derive formulas for an infinite pentadiagonal symmetric matrix representing the Hamiltonian from which the spectrum is computed.
Submitted 10 December, 2021; v1 submitted 28 November, 2021;
originally announced November 2021.
-
LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking
Authors:
Duy M. H. Nguyen,
Roberto Henschel,
Bodo Rosenhahn,
Daniel Sonntag,
Paul Swoboda
Abstract:
Multi-Camera Multi-Object Tracking is currently drawing attention in the computer vision field due to its superior performance in real-world applications such as video surveillance in crowded scenes or in wide spaces. In this work, we propose a mathematically elegant multi-camera multiple object tracking approach based on a spatial-temporal lifted multicut formulation. Our model utilizes state-of-the-art tracklets produced by single-camera trackers as proposals. As these tracklets may contain ID-Switch errors, we refine them through a novel pre-clustering obtained from 3D geometry projections. As a result, we derive a better tracking graph without ID switches and more precise affinity costs for the data association phase. Tracklets are then matched to multi-camera trajectories by solving a global lifted multicut formulation that incorporates short and long-range temporal interactions on tracklets located in the same camera as well as inter-camera ones. Experimental results on the WildTrack dataset yield near-perfect performance, outperforming state-of-the-art trackers on Campus while being on par on the PETS-09 dataset.
Submitted 3 May, 2022; v1 submitted 23 November, 2021;
originally announced November 2021.
-
Atomistic mechanisms of binary alloy surface segregation from nanoseconds to seconds using accelerated dynamics
Authors:
Richard B. Garza,
Jiyoung Lee,
Mai H. Nguyen,
Andrew Garmon,
Danny Perez,
Meng Li,
Judith C. Yang,
Graeme Henkelman,
Wissam A. Saidi
Abstract:
Although the equilibrium composition of many alloy surfaces is well understood, the rate of transient surface segregation during annealing is not known, despite its crucial effect on alloy corrosion and catalytic reactions occurring on overlapping timescales. In this work, CuNi bimetallic alloys representing (100) surface facets are annealed in vacuum using atomistic simulations to observe the effect of vacancy diffusion on surface separation. We employ multi-timescale methods to sample the early transient, intermediate, and equilibrium states of slab surfaces during the separation process, including standard MD as well as three methods to perform atomistic, long-time dynamics: parallel trajectory splicing (ParSplice), adaptive kinetic Monte Carlo (AKMC), and kinetic Monte Carlo (KMC). From nanosecond (ns) to second timescales, our multiscale computational methodology can observe rare stochastic events not typically seen with standard MD, closing the gap between computational and experimental timescales for surface segregation. Rapid diffusion of a vacancy to the slab is resolved by all four methods in tens of ns. Stochastic re-entry of vacancies into the subsurface, however, is only seen on the microsecond timescale in the two KMC methods. Kinetic vacancy trapping on the surface and its effect on the segregation rate are discussed. The equilibrium composition profile of CuNi after segregation during annealing is estimated to occur on a timescale of seconds as determined by KMC, a result directly comparable to nanoscale experiments.
Submitted 25 October, 2021;
originally announced October 2021.
-
Construction of minimal annuli in PSL2 via a variational method
Authors:
Pascal Collin,
Laurent Hauswirth,
Minh Hoang Nguyen
Abstract:
We construct complete, embedded minimal annuli asymptotic to vertical planes in the Riemannian 3-manifold PSL. The boundary of these annuli consists of 4 vertical lines at infinity. They are constructed by taking the limit of a sequence of compact minimal annuli. The compactness is obtained from an estimate of curvature which uses foliations by minimal surfaces. This estimate is independent of the index of the surface. We also prove the existence of a one-periodic family of Riemann's type examples. The difficulty of the construction comes from the lack of symmetry of the ambient space PSL.
Submitted 27 September, 2021;
originally announced September 2021.
-
Self-Supervised Domain Adaptation for Diabetic Retinopathy Grading using Vessel Image Reconstruction
Authors:
Duy M. H. Nguyen,
Truong T. N. Mai,
Ngoc T. T. Than,
Alexander Prange,
Daniel Sonntag
Abstract:
This paper investigates the problem of domain adaptation for diabetic retinopathy (DR) grading. We learn invariant target-domain features by defining a novel self-supervised task based on retinal vessel image reconstructions, inspired by medical domain knowledge. Then, a benchmark of current state-of-the-art unsupervised domain adaptation methods on the DR problem is provided. It can be shown that our approach outperforms existing domain adaptation strategies. Furthermore, when utilizing the entire training data in the target domain, we are able to compete with several state-of-the-art approaches in final classification accuracy just by applying standard network architectures and using image-level labels.
Submitted 20 July, 2021;
originally announced July 2021.
-
A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams
Authors:
Heitor Murilo Gomes,
Maciej Grzenda,
Rodrigo Mello,
Jesse Read,
Minh Huong Le Nguyen,
Albert Bifet
Abstract:
Unlabelled data appear in many domains and are particularly relevant to streaming applications, where even though data is abundant, labelled data is rare. To address the learning problems associated with such data, one can ignore the unlabelled data and focus only on the labelled data (supervised learning); use the labelled data and attempt to leverage the unlabelled data (semi-supervised learning…
▽ More
Unlabelled data appear in many domains and are particularly relevant to streaming applications, where even though data is abundant, labelled data is rare. To address the learning problems associated with such data, one can ignore the unlabelled data and focus only on the labelled data (supervised learning); use the labelled data and attempt to leverage the unlabelled data (semi-supervised learning); or assume some labels will be available on request (active learning). The first approach is the simplest, yet the amount of labelled data available will limit the predictive performance. The second relies on finding and exploiting the underlying characteristics of the data distribution. The third depends on an external agent to provide the required labels in a timely fashion. This survey pays special attention to methods that leverage unlabelled data in a semi-supervised setting. We also discuss the delayed labelling issue, which impacts both fully supervised and semi-supervised methods. We propose a unified problem setting, discuss the learning guarantees and existing methods, explain the differences between related problem settings. Finally, we review the current benchmarking practices and propose adaptations to enhance them.
△ Less
Submitted 16 June, 2021;
originally announced June 2021.
-
Optimal fire allocation in a combat model of mixed NCW type
Authors:
My A. Vu,
Nam H. Nguyen,
Hanh Le T. Nguyen,
Anh N. Ta,
Mong H. Nguyen
Abstract:
In this work, we introduce a nonlinear Lanchester model of NCW-type and study the problem of finding the optimal fire allocation for this model. A Blue party $B$ fights against a Red party consisting of $A$ and $R$, where $A$ is an independent force and $R$ fights with support from a supply unit $N$. A battle may consist of several stages, but we consider the problem of finding the optimal fire allocation for $B$ in the first stage only. An optimal fire allocation is a set of three non-negative numbers whose sum equals one, such that the remaining force of $B$ is maximal at any instant. In order to tackle this problem, we introduce the notion of threatening rates, which are computed for $A, R, N$ at the beginning of the battle. Numerical illustrations are presented to justify the theoretical findings.
Submitted 7 April, 2021;
originally announced April 2021.
-
TATL: Task Agnostic Transfer Learning for Skin Attributes Detection
Authors:
Duy M. H. Nguyen,
Thu T. Nguyen,
Huong Vu,
Quang Pham,
Manh-Duy Nguyen,
Binh T. Nguyen,
Daniel Sonntag
Abstract:
Existing skin attributes detection methods usually initialize with a pre-trained ImageNet network and then fine-tune on a medical target task. However, we argue that such approaches are suboptimal because medical datasets are largely different from ImageNet and often contain limited training samples. In this work, we propose Task Agnostic Transfer Learning (TATL), a novel framework motivated by dermatologists' behaviors in the skincare context. TATL learns an attribute-agnostic segmenter that detects lesion skin regions and then transfers this knowledge to a set of attribute-specific classifiers to detect each particular attribute. Since TATL's attribute-agnostic segmenter only detects skin attribute regions, it enjoys ample data from all attributes, allows transferring knowledge among features, and compensates for the lack of training data from rare attributes. We conduct extensive experiments to evaluate the proposed TATL transfer learning mechanism with various neural network architectures on two popular skin attributes detection benchmarks. The empirical results show that TATL not only works well with multiple architectures but can also achieve state-of-the-art performance while enjoying minimal model and computational complexity. We also provide theoretical insights and explanations for why our transfer learning framework performs well in practice.
Submitted 27 January, 2022; v1 submitted 4 April, 2021;
originally announced April 2021.
-
On the number of antipodal or strictly antipodal pairs of points in finite subsets of $\mathbb{R}^d$, III
Authors:
E. Makai Jr.,
H. Martini,
M. H. Nguyên,
V. Soltan,
I. Talata
Abstract:
We improve our earlier upper bound on the number of antipodal pairs of points among $n$ points in ${\mathbb{R}}^3$, to $2n^2/5+O(n^c)$, for some $c<2$. We prove that the minimal number of antipodal pairs among $n$ points in convex position in ${\mathbb{R}}^d$, affinely spanning ${\mathbb{R}}^d$, is $n + d(d - 1)/2 - 1$. Let ${\underline{sa}}^s_d(n)$ be the minimum of the number of strictly antipodal pairs of points among any $n$ points in ${\mathbb{R}}^d$, with affine hull ${\mathbb{R}}^d$, and in strictly convex position. The value of ${\underline{sa}}^s_d(n)$ was known for $d \le 3$ and any $n$. Moreover, ${\underline{sa}}^s_d(n) = \lceil n/2\rceil$ was known for $n \ge 2d$ even, and $n \ge 4d+1$ odd. We show ${\underline{sa}}^s_d(n) = 2d$ for $2d+1 \le n \le 4d-1$ odd, we determine ${\underline{sa}}^s_d(n)$ for $d=4$ and any $n$, and prove ${\underline{sa}}^s_d(2d -1) = 3(d - 1)$. The cases $d \ge 5$ and $d+2 \le n \le 2d - 2$ remain open, but we give a lower and an upper bound on ${\underline{sa}}^s_d(n)$ for them, which are of the same order of magnitude, namely $\Theta\left( (d-k)d \right)$. We present a simple example of a strictly antipodal set in ${\mathbb{R}}^d$, of cardinality const\,$\cdot 1.5874...^d$. We give simple proofs of the following statements: if $n$ segments in ${\mathbb{R}}^3$ are pairwise antipodal, or strictly antipodal, then $n \le 4$, or $n \le 3$, respectively, and these are sharp. We also describe the cases of equality.
Submitted 2 June, 2021; v1 submitted 24 March, 2021;
originally announced March 2021.
-
An Attention Mechanism with Multiple Knowledge Sources for COVID-19 Detection from CT Images
Authors:
Duy M. H. Nguyen,
Duy M. Nguyen,
Huong Vu,
Binh T. Nguyen,
Fabrizio Nunnari,
Daniel Sonntag
Abstract:
Until now, Coronavirus SARS-CoV-2 has caused more than 850,000 deaths and infected more than 27 million individuals in over 120 countries. Besides principal polymerase chain reaction (PCR) tests, automatically identifying positive samples based on computed tomography (CT) scans can present a promising option in the early diagnosis of COVID-19. Recently, there have been increasing efforts to utilize deep networks for COVID-19 diagnosis based on CT scans. While these approaches mostly focus on introducing novel architectures, transfer learning techniques, or constructing large-scale data, we propose a novel strategy to improve the performance of several baselines by leveraging multiple useful information sources relevant to doctors' judgments. Specifically, infected regions and heat maps extracted from learned networks are integrated with the global image via an attention mechanism during the learning process. This procedure not only makes our system more robust to noise but also guides the network to focus on local lesion areas. Extensive experiments illustrate the superior performance of our approach compared to recent baselines. Furthermore, our learned network guidance presents an explainable feature to doctors, as we can understand the connection between input and output in a grey-box model.
Submitted 1 December, 2020; v1 submitted 23 September, 2020;
originally announced September 2020.
-
Optical phonons of SnSe(1-x)Sx layered semiconductor alloys
Authors:
Tharith Sriv,
Thi Minh Hai Nguyen,
Yangjin Lee,
Soo Yeon Lim,
Van Quang Nguyen,
Kwanpyo Kim,
Sunglae Cho,
Hyeonsik Cheong
Abstract:
The evolution of the optical phonons in layered semiconductor alloys SnSe1-xSx is studied as a function of the composition by using polarized Raman spectroscopy with six different excitation wavelengths (784.8, 632.8, 532, 514.5, 488, and 441.6 nm). The polarization dependences of the phonon modes are compared with transmission electron diffraction measurements to determine the crystallographic orientation of the samples. Some of the Raman modes show significant variation in their polarization behavior depending on the excitation wavelengths. It is established that the maximum intensity direction of the Ag2 mode of SnSe1-xSx (0<=x<=1) does not depend on the excitation wavelength and corresponds to the armchair direction. It is additionally found that the lower-frequency Raman modes of Ag1, Ag2 and B3g1 in the alloys show the typical one-mode behavior of optical phonons, whereas the higher-frequency modes of B3g2, Ag3 and Ag4 show two-mode behavior.
Submitted 18 July, 2020;
originally announced July 2020.
-
Monodromy in Prolate Spheroidal Harmonics
Authors:
Sean R. Dawson,
Holger R. Dullin,
Diana M. H. Nguyen
Abstract:
We show that spheroidal wave functions, viewed as the essential part of the joint eigenfunctions of two commuting operators of $L_2(S^2)$, have a defect in the joint spectrum that makes a global labelling of the joint eigenfunctions by quantum numbers impossible. To our knowledge this is the first explicit demonstration that quantum monodromy exists in a class of classically known special functions. Using an analogue of the Laplace-Runge-Lenz vector we show that the corresponding classical Liouville integrable system is symplectically equivalent to the C. Neumann system. To prove the existence of this defect we construct a classical integrable system that is the semi-classical limit of the quantum integrable system of commuting operators. We show that this is a semi-toric system with a non-degenerate focus-focus point, such that there is monodromy in the classical and the quantum system.
Submitted 30 January, 2020;
originally announced January 2020.
-
Cardiac MRI Image Segmentation for Left Ventricle and Right Ventricle using Deep Learning
Authors:
Bosung Seo,
Daniel Mariano,
John Beckfield,
Vinay Madenur,
Yuming Hu,
Tony Reina,
Marcus Bobar,
Mai H. Nguyen,
Ilkay Altintas
Abstract:
The goal of this project is to use magnetic resonance imaging (MRI) data to provide an end-to-end analytics pipeline for left and right ventricle (LV and RV) segmentation. Another aim of the project is to find a model that would be generalizable across medical imaging datasets. We utilized a variety of models, datasets, and tests to determine which one is well suited to this purpose. Specifically, we implemented three models (2-D U-Net, 3-D U-Net, and DenseNet), and evaluated them on four datasets (Automated Cardiac Diagnosis Challenge, MICCAI 2009 LV, Sunnybrook Cardiac Data, MICCAI 2012 RV). While maintaining a consistent preprocessing strategy, we tested the performance of each model when trained on data from the same dataset as the test data, and when trained on data from a different dataset than the test dataset. Data augmentation was also used to increase the adaptability of the models. The results were compared to determine performance and generalizability.
Submitted 17 September, 2019;
originally announced September 2019.
-
Crowd Transformer Network
Authors:
Viresh Ranjan,
Mubarak Shah,
Minh Hoai Nguyen
Abstract:
In this paper, we tackle the problem of Crowd Counting, and present a crowd density estimation based approach for obtaining the crowd count. Most of the existing crowd counting approaches rely on local features for estimating the crowd density map. In this work, we investigate the usefulness of combining local with non-local features for crowd counting. We use convolution layers for extracting local features, and a type of self-attention mechanism for extracting non-local features. We combine the local and the non-local features, and use them for estimating the crowd density map. We conduct experiments on three publicly available Crowd Counting datasets, and achieve significant improvement over the previous approaches.
Submitted 4 April, 2019;
originally announced April 2019.
-
Ten Simple Rules for Reproducible Research in Jupyter Notebooks
Authors:
Adam Rule,
Amanda Birmingham,
Cristal Zuniga,
Ilkay Altintas,
Shih-Cheng Huang,
Rob Knight,
Niema Moshiri,
Mai H. Nguyen,
Sara Brin Rosenthal,
Fernando Pérez,
Peter W. Rose
Abstract:
Reproducibility of computational studies is a hallmark of scientific methodology. It enables researchers to build with confidence on the methods and findings of others, reuse and extend computational pipelines, and thereby drive scientific progress. Since many experimental studies rely on computational analyses, biologists need guidance on how to set up and document reproducible data analyses or simulations.
In this paper, we address several questions about reproducibility. For example, what are the technical and non-technical barriers to reproducible computational studies? What opportunities and challenges do computational notebooks offer to overcome some of these barriers? What tools are available and how can they be used effectively?
We have developed a set of rules to serve as a guide to scientists with a specific focus on computational notebook systems, such as Jupyter Notebooks, which have become a tool of choice for many applications. Notebooks combine detailed workflows with narrative text and visualization of results. Combined with software repositories and open source licensing, notebooks are powerful tools for transparent, collaborative, reproducible, and reusable data analyses.
Submitted 13 October, 2018;
originally announced October 2018.
-
Left Ventricle Segmentation and Volume Estimation on Cardiac MRI using Deep Learning
Authors:
Ehab Abdelmaguid,
Jolene Huang,
Sanjay Kenchareddy,
Disha Singla,
Laura Wilke,
Mai H. Nguyen,
Ilkay Altintas
Abstract:
In the United States, heart disease is the leading cause of death for both men and women, accounting for 610,000 deaths each year [1]. Physicians use Magnetic Resonance Imaging (MRI) scans to take images of the heart in order to non-invasively estimate its structural and functional parameters for cardiovascular diagnosis and disease management. The end-systolic volume (ESV) and end-diastolic volume (EDV) of the left ventricle (LV), and the ejection fraction (EF) are indicators of heart disease. These measures can be derived from the segmented contours of the LV; thus, consistent and accurate segmentation of the LV from MRI images is critical to the accuracy of the ESV, EDV, and EF, and to non-invasive cardiac disease detection.
In this work, various image preprocessing techniques, model configurations using the U-Net deep learning architecture, postprocessing methods, and approaches for volume estimation are investigated. An end-to-end analytics pipeline with multiple stages is provided for automated LV segmentation and volume estimation. First, image data are reformatted and processed from DICOM and NIfTI formats to raw images in array format. Secondly, raw images are processed with multiple image preprocessing methods and cropped to include only the Region of Interest (ROI). Thirdly, preprocessed images are segmented using U-Net models. Lastly, postprocessing of segmented images to remove extra contours and intelligent slice and frame selection are applied, followed by calculation of the ESV, EDV, and EF. This analytics pipeline is implemented and runs on a distributed computing environment with a GPU cluster at the San Diego Supercomputer Center at UCSD.
Submitted 21 November, 2018; v1 submitted 14 September, 2018;
originally announced September 2018.
-
Vietnamese Semantic Role Labelling
Authors:
Phuong Le-Hong,
Thai Hoang Pham,
Xuan Khoai Pham,
Thi Minh Huyen Nguyen,
Thi Luong Nguyen,
Minh Hiep Nguyen
Abstract:
In this paper, we study semantic role labelling (SRL), a subtask of semantic parsing of natural language sentences, and its application to the Vietnamese language. We present our effort in building Vietnamese PropBank, the first Vietnamese SRL corpus, and a software system for labelling semantic roles of Vietnamese texts. In particular, we present a novel constituent extraction algorithm in the argument candidate identification step which is more suitable and more accurate than the common node-mapping method. In the machine learning part, our system integrates distributed word features produced by two recent unsupervised learning models in two learned statistical classifiers and makes use of an integer linear programming inference procedure to improve the accuracy. The system is evaluated in a series of experiments and achieves a good result, an $F_1$ score of 74.77%. Our system, including corpus and software, is available as an open source project for free research and we believe that it is a good baseline for the development of future Vietnamese SRL systems.
Submitted 27 November, 2017;
originally announced November 2017.
-
The Lie-Poisson Structure of the Symmetry Reduced Regularised n-Body Problem
Authors:
Suntharan Arunasalam,
Holger R. Dullin,
Diana M. H. Nguyen
Abstract:
This paper investigates the symmetry reduction of the regularised n-body problem. The three body problem, regularised through quaternions, is examined in detail. We show that for a suitably chosen symmetry group action the space of quadratic invariants is closed and the Hamiltonian can be written in terms of the quadratic invariants. The corresponding Lie-Poisson structure is isomorphic to the Lie algebra u(3,3). Finally, we generalise this result to the n-body problem for n>3.
Submitted 13 August, 2014;
originally announced August 2014.
-
The Dirichlet problem for the minimal surface equation in $\rm Sol_3$, with possible infinite boundary data
Authors:
Minh Hoang Nguyen
Abstract:
In this paper, we study the Dirichlet problem for the minimal surface equation in $\rm Sol_3$ with possible infinite boundary data, where $\rm Sol_3$ is the non-abelian solvable $3$-dimensional Lie group equipped with its usual left-invariant metric that makes it into a model space for one of the eight Thurston geometries. Our main result is a Jenkins-Serrin type theorem which establishes necessary and sufficient conditions for the existence and uniqueness of certain minimal Killing graphs with a non-unitary Killing vector field in $\rm Sol_3$.
Submitted 27 January, 2014; v1 submitted 20 December, 2013;
originally announced December 2013.
-
From Parametric Model-based Optimization to robust PID Gain Scheduling
Authors:
Minh Hoang-Tuan Nguyen,
Kok Kiong Tan
Abstract:
In chemical process applications, model predictive control effectively deals with input and state constraints during transient operations. However, industrial PID controllers directly manipulate the actuators, so they play the key role in small perturbation robustness. This paper considers the problem of augmenting the commonplace PID with the constraint handling and optimization functionalities of MPC. First, we review the MPC framework, which employs a linear feedback gain in its unconstrained region. This linear gain can be any preexisting multiloop PID design, or based on the two stabilizing PI or PID designs for multivariable systems proposed in the paper. The resulting controller is a feedforward PID mapping, a straightforward form without the need of tuning PID to fit an optimal input. The parametrized solution of MPC under constraints further leverages a familiar PID gain scheduling structure. Steady state robustness is achieved along with the PID design so that additional robustness analysis is avoided.
Submitted 28 May, 2013;
originally announced May 2013.
-
Enhanced Predictive Ratio Control of Interacting Systems
Authors:
Minh Hoang-Tuan Nguyen,
Kok Kiong Tan,
Sunan Huang
Abstract:
Ratio control for two interacting processes is proposed with a PID feedforward design based on a model predictive control (MPC) scheme. At each sampling instant, the MPC control action minimizes a state-dependent performance index associated with a PID-type state vector, thus yielding a PID-type control structure. Compared to the standard MPC formulations with separated single-variable control, such a control action allows one to take into account the non-uniformity of the two process outputs. After reformulating the MPC control law as a PID control law, we provide conditions for prediction horizon and weighting matrices so that the closed-loop control is asymptotically stable, and show the effectiveness of the approach with simulation and experimental results.
Submitted 28 May, 2013;
originally announced May 2013.
-
Robust Precision Positioning Control on Linear Ultrasonic Motor
Authors:
Minh H-T Nguyen,
Kok Kiong Tan,
Wenyu Liang,
Chek Sing Teo
Abstract:
Ultrasonic motors used in high-precision mechatronics are characterized by strong frictional effects, which are among the main problems in precision motion control. The traditional methods apply model-based nonlinear feedforward to compensate for the friction, thus requiring closed-loop stability and safety constraint considerations. Implementation of these methods requires complex designed experiments. This paper introduces a systematic approach using piecewise affine models to emulate the friction effect of the motor motion. The well-known model predictive control method is employed to deal with piecewise affine models. The increased complexity of the model offers a higher tracking precision on a simpler gain scheduling scheme.
Submitted 28 May, 2013;
originally announced May 2013.