Search | arXiv e-print repository

Generalized knowledge-enhanced framework for biomedical entity and relation extraction

Abstract: In recent years, there has been an increasing number of frameworks developed for biomedical entity and relation extraction. This research effort aims to address the accelerating growth in biomedical publications and the intricate nature of biomedical texts, which are written mainly for domain experts. To handle these challenges, we develop a novel framework that utilizes external knowledge to construct a task-independent and reusable background knowledge graph for biomedical entity and relation extraction. The design of our model is inspired by how humans learn domain-specific topics. In particular, humans often first acquire the most basic and common knowledge regarding a field to build the foundational knowledge and then use that as a basis for extending to various specialized topics. Our framework employs such a common-knowledge-sharing mechanism to build a general neural-network knowledge graph that transfers effectively to different domain-specific biomedical texts. Experimental evaluations demonstrate that our model, equipped with this generalized and cross-transferable knowledge base, achieves competitive performance on benchmarks including BioRelEx for binding interaction detection and ADE for Adverse Drug Effect identification.

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2408.02859 [pdf, other]

Multistain Pretraining for Slide Representation Learning in Pathology

Authors: Guillaume Jaume, Anurag Vaidya, Andrew Zhang, Andrew H. Song, Richard J. Chen, Sharifa Sahai, Dandan Mo, Emilio Madrigal, Long Phi Le, Faisal Mahmood

Abstract: Developing self-supervised learning (SSL) models that can learn universal and transferable representations of H&E gigapixel whole-slide images (WSIs) is becoming increasingly valuable in computational pathology. These models hold the potential to advance critical tasks such as few-shot classification, slide retrieval, and patient stratification. Existing approaches for slide representation learning extend the principles of SSL from small images (e.g., 224 x 224 patches) to entire slides, usually by aligning two different augmentations (or views) of the slide. Yet the resulting representation remains constrained by the limited clinical and biological diversity of the views. Instead, we postulate that slides stained with multiple markers, such as immunohistochemistry, can be used as different views to form a rich task-agnostic training signal. To this end, we introduce Madeleine, a multimodal pretraining strategy for slide representation learning. Madeleine is trained with a dual global-local cross-stain alignment objective on large cohorts of breast cancer samples (N=4,211 WSIs across five stains) and kidney transplant samples (N=12,070 WSIs across four stains). We demonstrate the quality of slide representations learned by Madeleine on various downstream evaluations, ranging from morphological and molecular classification to prognostic prediction, comprising 21 tasks using 7,299 WSIs from multiple medical centers. Code is available at https://github.com/mahmoodlab/MADELEINE.

Submitted 5 August, 2024; originally announced August 2024.

Comments: ECCV'24

arXiv:2407.14974 [pdf, other]

Out of spuriousity: Improving robustness to spurious correlations without group annotations

Authors: Phuong Quynh Le, Jörg Schlötterer, Christin Seifert

Abstract: Machine learning models are known to learn spurious correlations, i.e., features having strong relations with class labels but no causal relation. Relying on those correlations leads to poor performance in the data groups without these correlations and poor generalization ability. To improve the robustness of machine learning models to spurious correlations, we propose an approach to extract a subnetwork from a fully trained network that does not rely on spurious correlations. The subnetwork is found by the assumption that data points with the same spurious attribute will be close to each other in the representation space when training with ERM, then we employ supervised contrastive loss in a novel way to force models to unlearn the spurious connections. The increase in the worst-group performance of our approach contributes to strengthening the hypothesis that there exists a subnetwork in a fully trained dense network that is responsible for using only invariant features in classification tasks, therefore erasing the influence of spurious features even in the setup of multi spurious attributes and no prior knowledge of attributes labels.

Submitted 20 July, 2024; originally announced July 2024.

arXiv:2407.05452 [pdf, other]

Semantic Segmentation for Real-World and Synthetic Vehicle's Forward-Facing Camera Images

Authors: Tuan T. Nguyen, Phan Le, Yasir Hassan, Mina Sartipi

Abstract: In this paper, we present the submission to the 5th Annual Smoky Mountains Computational Sciences Data Challenge, Challenge 3. This is the solution for the semantic segmentation problem in both real-world and synthetic images from a vehicle's forward-facing camera. We concentrate on building a robust model which performs well across various domains of different outdoor situations such as sunny, snowy, rainy, etc. In particular, our method is developed along two main directions: model development and domain adaptation. In model development, we use the High Resolution Network (HRNet) as the baseline. Then, this baseline's result is processed by two coarse-to-fine models, Object-Contextual Representations (OCR) and Hierarchical Multi-scale Attention (HMA), to obtain more robust features. For domain adaptation, we implement the Domain-Based Batch Normalization (DNB) to reduce the distribution shift from diverse domains. Our proposed method yields 81.259 mean intersection-over-union (mIoU) on the validation set. This paper studies the effectiveness of employing real-world and synthetic data to handle domain adaptation in the semantic segmentation problem.

Submitted 7 July, 2024; originally announced July 2024.

Comments: 13 pages

arXiv:2401.01108 [pdf, other]

Unveiling Comparative Sentiments in Vietnamese Product Reviews: A Sequential Classification Framework

Authors: Ha Le, Bao Tran, Phuong Le, Tan Nguyen, Dac Nguyen, Ngoan Pham, Dang Huynh

Abstract: Comparative opinion mining is a specialized field of sentiment analysis that aims to identify and extract sentiments expressed comparatively. To address this task, we propose an approach that consists of solving three sequential sub-tasks: (i) identifying comparative sentences, i.e., whether a sentence has a comparative meaning, (ii) extracting comparative elements, i.e., the comparison subjects, objects, aspects, and predicates, and (iii) classifying comparison types, which contributes to a deeper comprehension of user sentiments in Vietnamese product reviews. Our method is ranked fifth at the Vietnamese Language and Speech Processing (VLSP) 2023 challenge on Comparative Opinion Mining (ComOM) from Vietnamese Product Reviews.

Submitted 2 January, 2024; originally announced January 2024.

Comments: Accepted manuscript at VLSP 2023

arXiv:2312.07814 [pdf, other]

A Foundational Multimodal Vision Language AI Assistant for Human Pathology

Authors: Ming Y. Lu, Bowen Chen, Drew F. K. Williamson, Richard J. Chen, Kenji Ikamura, Georg Gerber, Ivy Liang, Long Phi Le, Tong Ding, Anil V Parwani, Faisal Mahmood

Abstract: The field of computational pathology has witnessed remarkable progress in the development of both task-specific predictive models and task-agnostic self-supervised vision encoders. However, despite the explosive growth of generative artificial intelligence (AI), there has been limited study on building general purpose, multimodal AI assistants tailored to pathology. Here we present PathChat, a vision-language generalist AI assistant for human pathology using an in-house developed foundational vision encoder pretrained on 100 million histology images from over 100,000 patient cases and 1.18 million pathology image-caption pairs. The vision encoder is then combined with a pretrained large language model and the whole system is finetuned on over 250,000 diverse disease agnostic visual language instructions. We compare PathChat against several multimodal vision language AI assistants as well as GPT4V, which powers the commercially available multimodal general purpose AI assistant ChatGPT-4. When relevant clinical context is provided with the histology image, PathChat achieved a diagnostic accuracy of 87% on multiple-choice questions based on publicly available cases of diverse tissue origins and disease models. Additionally, using open-ended questions and human expert evaluation, we found that overall PathChat produced more accurate and pathologist-preferable responses to diverse queries related to pathology. As an interactive and general vision language AI assistant that can flexibly handle both visual and natural language inputs, PathChat can potentially find impactful applications in pathology education, research, and human-in-the-loop clinical decision making.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.05279 [pdf]

Quantitative perfusion maps using a novelty spatiotemporal convolutional neural network

Authors: Anbo Cao, Pin-Yu Le, Zhonghui Qie, Haseeb Hassan, Yingwei Guo, Asim Zaman, Jiaxi Lu, Xueqiang Zeng, Huihui Yang, Xiaoqiang Miao, Taiyu Han, Guangtao Huang, Yan Kang, Yu Luo, Jia Guo

Abstract: Dynamic susceptibility contrast magnetic resonance imaging (DSC-MRI) is widely used to evaluate acute ischemic stroke to distinguish salvageable tissue and infarct core. For this purpose, traditional methods employ deconvolution techniques, like singular value decomposition, which are known to be vulnerable to noise, potentially distorting the derived perfusion parameters. However, deep learning technology could leverage it, which can accurately estimate clinical perfusion parameters compared to traditional clinical approaches. Therefore, this study presents a perfusion parameters estimation network that considers spatial and temporal information, the Spatiotemporal Network (ST-Net), for the first time. The proposed network comprises a designed physical loss function to enhance model performance further. The results indicate that the network can accurately estimate perfusion parameters, including cerebral blood volume (CBV), cerebral blood flow (CBF), and time to maximum of the residual function (Tmax). The structural similarity index (SSIM) mean values for CBV, CBF, and Tmax parameters were 0.952, 0.943, and 0.863, respectively. The DICE score for the hypo-perfused region reached 0.859, demonstrating high consistency. The proposed model also maintains time efficiency, closely approaching the performance of commercial gold-standard software.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2311.03630 [pdf, other]

Counterfactual Data Augmentation with Contrastive Learning

Authors: Ahmed Aloui, Juncheng Dong, Cat P. Le, Vahid Tarokh

Abstract: Statistical disparity between distinct treatment groups is one of the most significant challenges for estimating Conditional Average Treatment Effects (CATE). To address this, we introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals. Specifically, we utilize contrastive learning to learn a representation space and a similarity measure such that in the learned representation space close individuals identified by the learned similarity measure have similar potential outcomes. This property ensures reliable imputation of counterfactual outcomes for the individuals with close neighbors from the alternative treatment group. By augmenting the original dataset with these reliable imputations, we can effectively reduce the discrepancy between different treatment groups, while inducing minimal imputation error. The augmented dataset is subsequently employed to train CATE estimation models. Theoretical analysis and experimental studies on synthetic and semi-synthetic benchmarks demonstrate that our method achieves significant improvements in both performance and robustness to overfitting across state-of-the-art models.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.03383 [pdf, other]

Toward Reinforcement Learning-based Rectilinear Macro Placement Under Human Constraints

Authors: Tuyen P. Le, Hieu T. Nguyen, Seungyeol Baek, Taeyoun Kim, Jungwoo Lee, Seongjung Kim, Hyunjin Kim, Misu Jung, Daehoon Kim, Seokyong Lee, Daewoo Choi

Abstract: Macro placement is a critical phase in chip design, which becomes more intricate when involving general rectilinear macros and layout areas. Furthermore, macro placement that incorporates human-like constraints, such as design hierarchy and peripheral bias, has the potential to significantly reduce the amount of additional manual labor required from designers. This study proposes a methodology that leverages an approach suggested by Google's Circuit Training (G-CT) to provide a learning-based macro placer that not only supports placing rectilinear cases, but also adheres to crucial human-like design principles. Our experimental results demonstrate the effectiveness of our framework in achieving power-performance-area (PPA) metrics and in obtaining placements of high quality, comparable to those produced with human intervention. Additionally, our methodology shows potential as a generalized model to address diverse macro shapes and layout areas.

Submitted 2 November, 2023; originally announced November 2023.

Comments: Fast ML for Science @ ICCAD 2023

arXiv:2310.01720 [pdf, other]

Perceiver-based CDF Modeling for Time Series Forecasting

Authors: Cat P. Le, Chris Cannella, Ali Hasan, Yuting Ng, Vahid Tarokh

Abstract: Transformers have demonstrated remarkable efficacy in forecasting time series data. However, their extensive dependence on self-attention mechanisms demands significant computational resources, thereby limiting their practical applicability across diverse tasks, especially in multimodal problems. In this work, we propose a new architecture, called perceiver-CDF, for modeling cumulative distribution functions (CDF) of time series data. Our approach combines the perceiver architecture with a copula-based attention mechanism tailored for multimodal time series prediction. By leveraging the perceiver, our model efficiently transforms high-dimensional and multimodal data into a compact latent space, thereby significantly reducing computational demands. Subsequently, we implement a copula-based attention mechanism to construct the joint distribution of missing data for prediction. Further, we propose an output variance testing mechanism to effectively mitigate error propagation during prediction. To enhance efficiency and reduce complexity, we introduce midpoint inference for the local attention mechanism. This enables the model to efficiently capture dependencies within nearby imputed samples without considering all previous samples. The experiments on the unimodal and multimodal benchmarks consistently demonstrate a 20% improvement over state-of-the-art methods while utilizing less than half of the computational resources.

Submitted 24 June, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Accepted in Winter Simulation Conference 2024

arXiv:2310.00467 [pdf, ps, other]

New results on Erasure Combinatorial Batch Codes

Authors: Phuc-Lu Le, Son Hoang Dau, Hy Dinh Ngo, Thuc D. Nguyen

Abstract: We investigate in this work the problem of Erasure Combinatorial Batch Codes, in which $n$ files are stored on $m$ servers so that every set of $n-r$ servers allows a client to retrieve at most $k$ distinct files by downloading at most $t$ files from each server. Previous studies have solved this problem for the special case of $t=1$ using Combinatorial Batch Codes. We tackle the general case $t \geq 1$ using a generalization of Hall's theorem. Additionally, we address a realistic scenario in which the retrieved files are consecutive according to some order and provide a simple and optimal solution for this case.

Submitted 30 September, 2023; originally announced October 2023.

Comments: Allerton conference

arXiv:2308.15474 [pdf, other]

A General-Purpose Self-Supervised Model for Computational Pathology

Authors: Richard J. Chen, Tong Ding, Ming Y. Lu, Drew F. K. Williamson, Guillaume Jaume, Bowen Chen, Andrew Zhang, Daniel Shao, Andrew H. Song, Muhammad Shaban, Mane Williams, Anurag Vaidya, Sharifa Sahai, Lukas Oldenburg, Luca L. Weishaupt, Judy J. Wang, Walt Williams, Long Phi Le, Georg Gerber, Faisal Mahmood

Abstract: Tissue phenotyping is a fundamental computational pathology (CPath) task in learning objective characterizations of histopathologic biomarkers in anatomic pathology. However, whole-slide imaging (WSI) poses a complex computer vision problem in which the large-scale image resolutions of WSIs and the enormous diversity of morphological phenotypes preclude large-scale data annotation. Current efforts have proposed using pretrained image encoders with either transfer learning from natural image datasets or self-supervised pretraining on publicly-available histopathology datasets, but have not been extensively developed and evaluated across diverse tissue types at scale. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using over 100 million tissue patches from over 100,000 diagnostic haematoxylin and eosin-stained WSIs across 20 major tissue types, and evaluated on 33 representative clinical tasks in CPath of varying diagnostic difficulties. In addition to outperforming previous state-of-the-art models, we demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types in the OncoTree code classification system. UNI advances unsupervised representation learning at scale in CPath in terms of both pretraining data and downstream evaluation, enabling data-efficient AI models that can generalize and transfer to a gamut of diagnostically-challenging tasks and clinical workflows in anatomic pathology.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.00473 [pdf, other]

Is Last Layer Re-Training Truly Sufficient for Robustness to Spurious Correlations?

Authors: Phuong Quynh Le, Jörg Schlötterer, Christin Seifert

Abstract: Models trained with empirical risk minimization (ERM) are known to learn to rely on spurious features, i.e., their prediction is based on undesired auxiliary features which are strongly correlated with class labels but lack causal reasoning. This behavior particularly degrades accuracy in groups of samples of the correlated class that are missing the spurious feature or samples of the opposite class but with the spurious feature present. The recently proposed Deep Feature Reweighting (DFR) method improves accuracy of these worst groups. Based on the main argument that ERM models can learn core features sufficiently well, DFR only needs to retrain the last layer of the classification model with a small group-balanced data set. In this work, we examine the applicability of DFR to realistic data in the medical domain. Furthermore, we investigate the reasoning behind the effectiveness of last-layer retraining and show that even though DFR has the potential to improve the accuracy of the worst group, it remains susceptible to spurious correlations.

Submitted 9 January, 2024; v1 submitted 1 August, 2023; originally announced August 2023.

Comments: Accepted at IJCAI Workshop on XAI 2023

arXiv:2307.12914 [pdf, other]

Towards a Visual-Language Foundation Model for Computational Pathology

Authors: Ming Y. Lu, Bowen Chen, Drew F. K. Williamson, Richard J. Chen, Ivy Liang, Tong Ding, Guillaume Jaume, Igor Odintsov, Andrew Zhang, Long Phi Le, Georg Gerber, Anil V Parwani, Faisal Mahmood

Abstract: The accelerated adoption of digital pathology and advances in deep learning have enabled the development of powerful models for various pathology tasks across a diverse array of diseases and patient cohorts. However, model training is often difficult due to label scarcity in the medical domain and the model's usage is limited by the specific task and disease for which it is trained. Additionally, most models in histopathology leverage only image data, a stark contrast to how humans teach each other and reason about histopathologic entities. We introduce CONtrastive learning from Captions for Histopathology (CONCH), a visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and notably over 1.17 million image-caption pairs via task-agnostic pretraining. Evaluated on a suite of 13 diverse benchmarks, CONCH can be transferred to a wide range of downstream tasks involving either or both histopathology images and text, achieving state-of-the-art performance on histology image classification, segmentation, captioning, text-to-image and image-to-text retrieval. CONCH represents a substantial leap over concurrent visual-language pretrained systems for histopathology, with the potential to directly facilitate a wide array of machine learning-based workflows requiring minimal or no further supervised fine-tuning.

Submitted 25 July, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

arXiv:2306.16678 [pdf, other]

BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models

Authors: Phuoc-Hoan Charles Le, Xinlin Li

Abstract: With the increasing popularity and the increasing size of vision transformers (ViTs), there has been an increasing interest in making them more efficient and less computationally costly for deployment on edge devices with limited computing resources. Binarization can be used to help reduce the size of ViT models and their computational cost significantly, using popcount operations when the weights and the activations are in binary. However, ViTs suffer a larger performance drop when directly applying convolutional neural network (CNN) binarization methods or existing binarization methods to binarize ViTs compared to CNNs on datasets with a large number of classes such as ImageNet-1k. With extensive analysis, we find that binary vanilla ViTs such as DeiT miss out on a lot of key architectural properties that CNNs have that allow binary CNNs to have much higher representational capability than binary vanilla ViT. Therefore, we propose BinaryViT, in which inspired by the CNN architecture, we include operations from the CNN architecture into a pure ViT architecture to enrich the representational capability of a binary ViT without introducing convolutions. These include an average pooling layer instead of a token pooling layer, a block that contains multiple average pooling branches, an affine transformation right before the addition of each main residual connection, and a pyramid structure. Experimental results on the ImageNet-1k dataset show the effectiveness of these operations that allow a binary pure ViT model to be competitive with previous state-of-the-art (SOTA) binary CNN models.

Submitted 29 June, 2023; originally announced June 2023.

Comments: Accepted in CVPR 2023 Workshop on Efficient Deep Learning for Computer Vision (ECV)

arXiv:2306.07831 [pdf, other]

Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images

Authors: Ming Y. Lu, Bowen Chen, Andrew Zhang, Drew F. K. Williamson, Richard J. Chen, Tong Ding, Long Phi Le, Yung-Sung Chuang, Faisal Mahmood

Abstract: Contrastive visual language pretraining has emerged as a powerful method for either training new language-aware image encoders or augmenting existing pretrained models with zero-shot visual recognition capabilities. However, existing works typically train on large datasets of image-text pairs and have been designed to perform downstream tasks involving only small- to medium-sized images, neither of which are applicable to the emerging field of computational pathology where there are limited publicly available paired image-text datasets and each image can span up to 100,000 x 100,000 pixels. In this paper we present MI-Zero, a simple and intuitive framework for unleashing the zero-shot transfer capabilities of contrastively aligned image and text models on gigapixel histopathology whole slide images, enabling multiple downstream diagnostic tasks to be carried out by pretrained encoders without requiring any additional labels. MI-Zero reformulates zero-shot transfer under the framework of multiple instance learning to overcome the computational challenge of inference on extremely large images. We used over 550k pathology reports and other available in-domain text corpora to pre-train our text encoder. By effectively leveraging strong pre-trained encoders, our best model pretrained on over 33k histopathology image-caption pairs achieves an average median zero-shot accuracy of 70.2% across three different real-world cancer subtyping tasks. Our code is available at: https://github.com/mahmoodlab/MI-Zero.

Submitted 13 June, 2023; originally announced June 2023.

Comments: Accepted to CVPR 2023

arXiv:2306.04739 [pdf, other]

Automatic retrieval of corresponding US views in longitudinal examinations

Authors: Hamideh Kerdegari, Tran Huy Nhat Phung, Van Hao Nguyen, Thi Phuong Thao Truong, Ngoc Minh Thu Le, Thanh Phuong Le, Thi Mai Thao Le, Luigi Pisani, Linda Denehy, Vital Consortium, Reza Razavi, Louise Thwaites, Sophie Yacoub, Andrew P. King, Alberto Gomez

Abstract: Skeletal muscle atrophy is a common occurrence in critically ill patients in the intensive care unit (ICU) who spend long periods in bed. Muscle mass must be recovered through physiotherapy before patient discharge and ultrasound imaging is frequently used to assess the recovery process by measuring the muscle size over time. However, these manual measurements are subject to large variability, particularly since the scans are typically acquired on different days and potentially by different operators. In this paper, we propose a self-supervised contrastive learning approach to automatically retrieve similar ultrasound muscle views at different scan times. Three different models were compared using data from 67 patients acquired in the ICU. Results indicate that our contrastive model outperformed a supervised baseline model in the task of view retrieval with an AUC of 73.52% and when combined with an automatic segmentation model achieved 5.7% +/- 0.24% error in cross-sectional area. Furthermore, a user study survey confirmed the efficacy of our model for muscle view retrieval.

Submitted 7 June, 2023; originally announced June 2023.

Comments: 10 pages, 6 figures

arXiv:2305.11400 [pdf, other]

Mode-Aware Continual Learning for Conditional Generative Adversarial Networks

Authors: Cat P. Le, Juncheng Dong, Ahmed Aloui, Vahid Tarokh

Abstract: The main challenge in continual learning for generative models is to effectively learn new target modes with limited samples while preserving previously learned ones. To this end, we introduce a new continual learning approach for conditional generative adversarial networks by leveraging a mode-affinity score specifically designed for generative modeling. First, the generator produces samples of existing modes for subsequent replay. The discriminator is then used to compute the mode similarity measure, which identifies a set of closest existing modes to the target. Subsequently, a label for the target mode is generated and given as a weighted average of the labels within this set. We extend the continual learning model by training it on the target data with the newly-generated label, while performing memory replay to mitigate the risk of catastrophic forgetting. Experimental results on benchmark datasets demonstrate the gains of our continual learning approach over the state-of-the-art methods, even when using fewer training samples.

Submitted 23 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.
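
Illustrative note on the entry above: the abstract describes generating the target-mode label as a weighted average of the labels of the closest existing modes. The sketch below shows only that averaging step. The function name `target_label`, the inverse-distance weighting, and the choice of `k` are assumptions made for illustration; the abstract does not say how the weights are derived from the mode-affinity score.

```python
def target_label(distances, labels, k=2):
    """Weighted average of the labels of the k closest existing modes.

    distances: hypothetical affinity distances from the target mode to each
               existing mode (smaller means closer).
    labels:    one label vector (e.g. one-hot) per existing mode.
    Weights are inverse distances, normalised to sum to 1 (an assumption).
    """
    # Indices of the k existing modes closest to the target.
    closest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    # Inverse-distance weights; the small constant avoids division by zero.
    inv = [1.0 / (distances[i] + 1e-12) for i in closest]
    total = sum(inv)
    weights = [w / total for w in inv]
    dim = len(labels[0])
    return [sum(w * labels[i][d] for w, i in zip(weights, closest))
            for d in range(dim)]


existing_labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(target_label([1.0, 3.0, 10.0], existing_labels, k=2))
```

With these made-up distances, the two closest modes receive weights of roughly 0.75 and 0.25, so the generated label is a soft mixture of their one-hot labels.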

arXiv:2303.02828 [pdf, ps, other]

Robust Autoencoders for Collective Corruption Removal

Authors: Taihui Li, Hengkang Wang, Peng Le, XianE Tang, Ju Sun

Abstract: Robust PCA is a standard tool for learning a linear subspace in the presence of sparse corruption or rare outliers. What about robustly learning manifolds that are more realistic models for natural data, such as images? There have been several recent attempts to generalize robust PCA to manifold settings. In this paper, we propose $\ell_1$- and scaling-invariant $\ell_1/\ell_2$-robust autoencoders based on a surprisingly compact formulation built on the intuition that deep autoencoders perform manifold learning. We demonstrate on several standard image datasets that the proposed formulation significantly outperforms all previous methods in collectively removing sparse corruption, without clean images for training. Moreover, we also show that the learned manifold structures can be generalized to unseen data samples effectively.

Submitted 5 March, 2023; originally announced March 2023.

Comments: This paper has been accepted to ICASSP2023

arXiv:2301.13372 [pdf, other]

Improving Open-Domain Dialogue Evaluation with a Causal Inference Model

Authors: Cat P. Le, Luke Dai, Michael Johnston, Yang Liu, Marilyn Walker, Reza Ghanadan

Abstract: Effective evaluation methods remain a significant challenge for research on open-domain conversational dialogue systems. Explicit satisfaction ratings can be elicited from users, but users often do not provide ratings when asked, and those they give can be highly subjective. Post-hoc ratings by experts are an alternative, but these can be both expensive and complex to collect. Here, we explore the creation of automated methods for predicting both expert and user ratings of open-domain dialogues. We compare four different approaches. First, we train a baseline model using an end-to-end transformer to predict ratings directly from the raw dialogue text. The other three methods are variants of a two-stage approach in which we first extract interpretable features at the turn level that capture, among other aspects, user dialogue behaviors indicating contradiction, repetition, disinterest, compliments, or criticism. We project these features to the dialogue level and train a dialogue-level MLP regression model, a dialogue-level LSTM, and a novel causal inference model called counterfactual-LSTM (CF-LSTM) to predict ratings. The proposed CF-LSTM is a sequential model over turn-level features which predicts ratings using multiple regressors depending on hypotheses derived from the turn-level features. As a causal inference model, CF-LSTM aims to learn the underlying causes of a specific event, such as a low rating. We also bin the user ratings and perform classification experiments with all four models. In evaluation experiments on conversational data from the Alexa Prize SocialBot, we show that the CF-LSTM achieves the best performance for predicting dialogue ratings and classification.

Submitted 30 January, 2023; originally announced January 2023.

Comments: Accepted as a conference paper at IWSDS 2023

arXiv:2301.11716 [pdf, other]

Pre-training for Speech Translation: CTC Meets Optimal Transport

Authors: Phuong-Hang Le, Hongyu Gong, Changhan Wang, Juan Pino, Benjamin Lecouteux, Didier Schwab

Abstract: The gap between speech and text modalities is a major challenge in speech-to-text translation (ST). Different methods have been proposed to reduce this gap, but most of them require architectural changes in ST training. In this work, we propose to mitigate this issue at the pre-training stage, requiring no change in the ST model. First, we show that the connectionist temporal classification (CTC) loss can reduce the modality gap by design. We provide a quantitative comparison with the more common cross-entropy loss, showing that pre-training with CTC consistently achieves better final ST accuracy. Nevertheless, CTC is only a partial solution and thus, in our second contribution, we propose a novel pre-training method combining CTC and optimal transport to further reduce this gap. Our method pre-trains a Siamese-like model composed of two encoders, one for acoustic inputs and the other for textual inputs, such that they produce representations that are close to each other in the Wasserstein space. Extensive experiments on the standard CoVoST-2 and MuST-C datasets show that our pre-training method applied to the vanilla encoder-decoder Transformer achieves state-of-the-art performance under the no-external-data setting, and performs on par with recent strong multi-task learning systems trained with external data. Finally, our method can also be applied on top of these multi-task systems, leading to further improvements for these models. Code and pre-trained models are available at https://github.com/formiel/fairseq.

Submitted 5 June, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

Comments: ICML 2023 (oral presentation). This version fixed URLs, updated affiliations & acknowledgements, and improved formatting

arXiv:2210.15897 [pdf, other]

Single-Image HDR Reconstruction by Multi-Exposure Generation

Authors: Phuoc-Hieu Le, Quynh Le, Rang Nguyen, Binh-Son Hua

Abstract: High dynamic range (HDR) imaging is an indispensable technique in modern photography. Traditional methods focus on HDR reconstruction from multiple images, solving the core problems of image alignment, fusion, and tone mapping, yet lacking a perfect solution due to ghosting and other visual artifacts in the reconstruction. Recent attempts at single-image HDR reconstruction show a promising alternative: by learning to map pixel values to their irradiance using a neural network, one can bypass the align-and-merge pipeline completely yet still obtain a high-quality HDR image. In this work, we propose a weakly supervised learning method that inverts the physical image formation process for HDR reconstruction via learning to generate multiple exposures from a single image. Our neural network can invert the camera response to reconstruct pixel irradiance before synthesizing multiple exposures and hallucinating details in under- and over-exposed regions from a single input image. To train the network, we propose a representation loss, a reconstruction loss, and a perceptual loss applied on pairs of under- and over-exposure images and thus do not require HDR images for training. Our experiments show that our proposed model can effectively reconstruct HDR images. Our qualitative and quantitative results show that our method achieves state-of-the-art performance on the DrTMO dataset. Our code is available at https://github.com/VinAIResearch/single_image_hdr.

Submitted 28 October, 2022; originally announced October 2022.

Comments: WACV 2023 paper. 8 pages of content, 2 pages of references, 8 pages of supplementary material

arXiv:2210.00380 [pdf, other]

Transfer Learning for Individual Treatment Effect Estimation

Authors: Ahmed Aloui, Juncheng Dong, Cat P. Le, Vahid Tarokh

Abstract: This work considers the problem of transferring causal knowledge between tasks for Individual Treatment Effect (ITE) estimation. To this end, we theoretically assess the feasibility of transferring ITE knowledge and present a practical framework for efficient transfer. A lower bound is introduced on the ITE error of the target task to demonstrate that ITE knowledge transfer is challenging due to the absence of counterfactual information. Nevertheless, we establish generalization upper bounds on the counterfactual loss and ITE error of the target task, demonstrating the feasibility of ITE knowledge transfer. Subsequently, we introduce a framework with a new Causal Inference Task Affinity (CITA) measure for ITE knowledge transfer. Specifically, we use CITA to find the closest source task to the target task and utilize it for ITE knowledge transfer. Empirical studies are provided, demonstrating the efficacy of the proposed method. We observe that ITE knowledge transfer can significantly (up to 95%) reduce the amount of data required for ITE estimation.

Submitted 5 June, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

arXiv:2205.05211 [pdf, other]

TreePIR: Efficient Private Retrieval of Merkle Proofs via Tree Colorings with Fast Indexing and Zero Storage Overhead

Authors: Son Hoang Dau, Quang Cao, Rinaldo Gagiano, Duy Huynh, Xun Yi, Phuc Lu Le, Quang-Hung Luu, Emanuele Viterbo, Yu-Chih Huang, Jingge Zhu, Mohammad M. Jalalzai, Chen Feng

Abstract: A Batch Private Information Retrieval (batch-PIR) scheme allows a client to retrieve multiple data items from a database without revealing them to the storage server(s). Most existing approaches for batch-PIR are based on batch codes, in particular, probabilistic batch codes (PBC) (Angel et al. S&P'18), which incur large storage overheads. In this work, we show that zero storage overhead is achievable for tree-shaped databases. In particular, we develop TreePIR, a novel approach tailor-made for private retrieval of the set of nodes along an arbitrary root-to-leaf path in a Merkle tree with no storage redundancy. This type of tree has been widely implemented in many real-world systems such as Amazon DynamoDB, Google's Certificate Transparency, and blockchains. Tree nodes along a root-to-leaf path form the well-known Merkle proof. TreePIR, which employs a novel tree coloring, outperforms PBC, a fundamental component in state-of-the-art batch-PIR schemes (Angel et al. S&P'18, Mughees-Ren S&P'23, Liu et al. S&P'24), in all metrics, achieving $3\times$ lower total storage and $1.5$-$2\times$ lower computation and communication costs. Most notably, TreePIR has $8$-$160\times$ lower setup time and its polylog-complexity indexing algorithm is $19$-$160\times$ faster than PBC for trees of $2^{10}$-$2^{24}$ leaves.

Submitted 4 June, 2024; v1 submitted 10 May, 2022; originally announced May 2022.

Comments: 25 pages

MSC Class: 05C05; 05C15; 05C85; 05C90; ACM Class: G.2.2; F.2.0; E.1
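
Illustrative note on the entry above: the abstract refers to the Merkle proof associated with a root-to-leaf path. As background, the sketch below builds a small Merkle tree and checks a proof in the common formulation, where the proof consists of the sibling hashes of the nodes on the path. It is plain, non-private retrieval and does not implement TreePIR or its tree coloring. The helper names, the use of SHA-256, and the power-of-two leaf count are assumptions made for illustration.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest, used here for both leaves and internal nodes."""
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return all levels of the tree: levels[0] holds the hashed leaves,
    levels[-1] holds the single root. Assumes a power-of-two leaf count."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels, index):
    """Collect the sibling hash at every level below the root, together
    with a flag telling whether that sibling is the left child."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1  # flip the lowest bit to get the sibling index
        proof.append((level[sibling], sibling < index))
        index //= 2          # move up to the parent
    return proof


def verify(leaf, proof, root):
    """Recompute the path from the leaf up to the root and compare."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


leaves = [b"a", b"b", b"c", b"d"]
levels = build_tree(leaves)
root = levels[-1][0]
proof = merkle_proof(levels, 2)       # proof for leaf b"c"
print(verify(b"c", proof, root))      # the genuine leaf verifies
print(verify(b"x", proof, root))      # a different leaf does not
```

The proof has one entry per tree level below the root, so its length grows logarithmically with the number of leaves.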

arXiv:2110.02399 [pdf, other]

Task Affinity with Maximum Bipartite Matching in Few-Shot Learning

Authors: Cat P. Le, Juncheng Dong, Mohammadreza Soltani, Vahid Tarokh

Abstract: We propose an asymmetric affinity score for representing the complexity of utilizing the knowledge of one task for learning another one. Our method is based on the maximum bipartite matching algorithm and utilizes the Fisher Information matrix. We provide theoretical analyses demonstrating that the proposed score is mathematically well-defined, and subsequently use the affinity score to propose a novel algorithm for the few-shot learning problem. In particular, using this score, we find relevant training data labels to the test data and leverage the discovered relevant data for episodically fine-tuning a few-shot model. Results on various few-shot benchmark datasets demonstrate the efficacy of the proposed approach by improving the classification accuracy over the state-of-the-art methods even when using smaller models.

Submitted 21 January, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: Accepted as a conference paper at ICLR 2022

arXiv:2108.01806 [pdf, other]

Neural Scene Decoration from a Single Photograph

Authors: Hong-Wing Pang, Yingshu Chen, Phuoc-Hieu Le, Binh-Son Hua, Duc Thanh Nguyen, Sai-Kit Yeung

Abstract: Furnishing and rendering indoor scenes has been a long-standing task for interior design, where artists create a conceptual design for the space, build a 3D model of the space, decorate, and then perform rendering. Although the task is important, it is tedious and requires tremendous effort. In this paper, we introduce a new problem of domain-specific indoor scene image synthesis, namely neural scene decoration. Given a photograph of an empty indoor space and a list of decorations with layout determined by the user, we aim to synthesize a new image of the same space with the desired furnishing and decorations. Neural scene decoration can be applied to create conceptual interior designs in a simple yet effective manner. Our approach to this research problem is a novel scene generation architecture that transforms an empty scene and an object layout into a realistic furnished scene photograph. We demonstrate the performance of our proposed method by comparing it with conditional image synthesis baselines built upon prevailing image translation approaches both qualitatively and quantitatively. We conduct extensive experiments to further validate the plausibility and aesthetics of our generated scenes. Our implementation is available at https://github.com/hkust-vgd/neural_scene_decoration.

Submitted 25 July, 2022; v1 submitted 3 August, 2021; originally announced August 2021.

Comments: ECCV 2022 paper. 14 pages of main content, 4 pages of references, and 11 pages of appendix

arXiv:2103.13735 [pdf, ps, other]

doi 10.1145/3452143.3465546

Faster One Block Quantifier Elimination for Regular Polynomial Systems of Equations

Authors: Huu Phuoc Le, Mohab Safey El Din

Abstract: Quantifier elimination over the reals is a central problem in computational real algebraic geometry, polynomial system solving and symbolic computation. Given a semi-algebraic formula (whose atoms are polynomial constraints) with quantifiers on some variables, it consists in computing a logically equivalent formula involving only unquantified variables. When there is no alternation of quantifiers, one has a one block quantifier elimination problem. This paper studies a variant of the one block quantifier elimination in which we compute an almost equivalent formula of the input. We design a new probabilistic efficient algorithm for solving this variant when the input is a system of polynomial equations satisfying some regularity assumptions. When the input is generic, involves $s$ polynomials of degree bounded by $D$ with $n$ quantified variables and $t$ unquantified ones, we prove that this algorithm outputs semi-algebraic formulas of degree bounded by $\mathcal{D}$ using $\widetilde{O}\left((n-s+1)\ 8^{t}\ \mathcal{D}^{3t+2} \binom{t+\mathcal{D}}{t}\right)$ arithmetic operations in the ground field where $\mathcal{D} = 2(n+s)\ D^s(D-1)^{n-s+1}\ \binom{n}{s}$. In practice, it allows us to solve quantifier elimination problems which are out of reach of the state-of-the-art (up to $8$ variables).

Submitted 24 May, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

Comments: International Symposium on Symbolic and Algebraic Computation 2021, Jul. 2021, Saint-Petersbourg, Russia

ACM Class: I.1.2
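
Illustrative note on the entry above: the degree bound $\mathcal{D} = 2(n+s)\, D^s (D-1)^{n-s+1} \binom{n}{s}$ and the operation count quoted in the abstract can be evaluated directly to get a feel for how quickly they grow. The sketch below is plain arithmetic on those two formulas. The function names are made up for illustration, and the second function returns only the expression inside the soft-O, which by definition omits the logarithmic factors the notation hides.

```python
from math import comb


def degree_bound(n: int, s: int, D: int) -> int:
    """Evaluate 2(n+s) * D^s * (D-1)^(n-s+1) * C(n, s) from the abstract.

    n: number of quantified variables, s: number of polynomials,
    D: bound on the degree of the input polynomials.
    """
    return 2 * (n + s) * D**s * (D - 1) ** (n - s + 1) * comb(n, s)


def soft_o_argument(n: int, s: int, t: int, D: int) -> int:
    """Evaluate (n-s+1) * 8^t * Dcal^(3t+2) * C(t+Dcal, t), the expression
    inside the soft-O bound; t is the number of unquantified variables."""
    dcal = degree_bound(n, s, D)
    return (n - s + 1) * 8**t * dcal ** (3 * t + 2) * comb(t + dcal, t)


# Small made-up instance: 3 quantified variables, 2 quadratic equations.
print(degree_bound(3, 2, 2))        # 2 * 5 * 4 * 1 * 3 = 120
print(soft_o_argument(3, 2, 1, 2))  # grows very quickly even for t = 1
```

Even for this tiny instance the degree bound is 120, which illustrates why the exponent $3t+2$ makes the number of unquantified variables $t$ the dominant cost factor.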

arXiv:2103.12827 [pdf, other]

doi 10.1109/ACCESS.2022.3171741

Fisher Task Distance and Its Application in Neural Architecture Search

Authors: Cat P. Le, Mohammadreza Soltani, Juncheng Dong, Vahid Tarokh

Abstract: We formulate an asymmetric (or non-commutative) distance between tasks based on Fisher Information Matrices, called Fisher task distance. This distance represents the complexity of transferring the knowledge from one task to another. We provide a proof of consistency for our distance through theorems and experiments on various classification tasks from MNIST, CIFAR-10, CIFAR-100, ImageNet, and Taskonomy datasets. Next, we construct an online neural architecture search framework using the Fisher task distance, in which we have access to the past learned tasks. By using the Fisher task distance, we can identify the closest learned tasks to the target task, and utilize the knowledge learned from these related tasks for the target task. Here, we show how the proposed distance between a target task and a set of learned tasks can be used to reduce the neural architecture search space for the target task. The complexity reduction in search space for task-specific architectures is achieved by building on the optimized architectures for similar tasks instead of doing a full search and without using this side information. Experimental results for tasks in MNIST, CIFAR-10, CIFAR-100, ImageNet datasets demonstrate the efficacy of the proposed approach and its improvements, in terms of the performance and the number of parameters, over other gradient-based search methods, such as ENAS, DARTS, PC-DARTS.

Submitted 30 April, 2022; v1 submitted 23 March, 2021; originally announced March 2021.

Comments: Published in IEEE Access, Volume 10, 2022

arXiv:2103.00241 [pdf, other]

Improved Automated Machine Learning from Transfer Learning

Authors: Cat P. Le, Mohammadreza Soltani, Robert Ravier, Vahid Tarokh

Abstract: In this paper, we propose a neural architecture search framework based on a similarity measure between some baseline tasks and a target task. We first define the notion of the task similarity based on the log-determinant of the Fisher Information matrix. Next, we compute the task similarity from each of the baseline tasks to the target task. By utilizing the relation between a target and a set of learned baseline tasks, the search space of architectures for the target task can be significantly reduced, making the discovery of the best candidates in the set of possible architectures tractable and efficient, in terms of GPU days. This method eliminates the requirement for training the networks from scratch for a given target task as well as introducing the bias in the initialization of the search space from the human domain.

Submitted 29 January, 2022; v1 submitted 27 February, 2021; originally announced March 2021.

arXiv:2102.05924 [pdf, other]

Toward Improving Coherence and Diversity of Slogan Generation

Authors: Yiping Jin, Akshay Bhatia, Dittaya Wanvarie, Phu T. V. Le

Abstract: Previous work in slogan generation focused on utilising slogan skeletons mined from existing slogans. While some generated slogans can be catchy, they are often not coherent with the company's focus or style across their marketing communications because the skeletons are mined from other companies' slogans. We propose a sequence-to-sequence (seq2seq) transformer model to generate slogans from a brief company description. A naive seq2seq model fine-tuned for slogan generation is prone to introducing false information. We use company name delexicalisation and entity masking to alleviate this problem and improve the generated slogans' quality and truthfulness. Furthermore, we apply conditional training based on the first words' POS tag to generate syntactically diverse slogans. Our best model achieved a ROUGE-1/-2/-L F1 score of 35.58/18.47/33.32. Besides, automatic and human evaluations indicate that our method generates significantly more factual, diverse and catchy slogans than strong LSTM and transformer seq2seq baselines.

Submitted 7 September, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

Comments: Submitted to the NLE journal

arXiv:2012.10550 [pdf, other]

Grant-Free Random Access in Machine-Type Communication: Approaches and Challenges

Authors: Jinho Choi, Jie Ding, Ngoc Phuc Le, Zhiguo Ding

Abstract: Massive machine-type communication (MTC) is expected to play a key role in supporting Internet of Things (IoT) applications such as smart cities, smart factories, and connected vehicles through cellular networks. MTC is characterized by a large number of MTC devices and their sparse activities, which are difficult to support with conventional approaches and motivate the design of new access technologies. In particular, in the 5th generation (5G), grant-free or 2-step random access schemes are introduced for MTC to be more efficient by reducing signaling overhead. In this article, we first introduce grant-free random access and discuss how it can be modified with massive multiple-input multiple-output (MIMO) to exploit a high spatial multiplexing gain. We then explain preamble designs that can improve the performance and variations based on the notion of non-orthogonal multiple access (NOMA). Finally, design challenges of grant-free random access towards next generation cellular systems are presented.

Submitted 18 December, 2020; originally announced December 2020.

Comments: 6 pages, 3 figures

arXiv:2011.14136 [pdf, ps, other]

doi 10.1016/j.jsc.2021.12.002

Solving parametric systems of polynomial equations over the reals through Hermite matrices

Authors: Huu Phuoc Le, Mohab Safey El Din

Abstract: We design a new algorithm for solving parametric systems having finitely many complex solutions for generic values of the parameters. More precisely, let $f = (f_1, \ldots, f_m)\subset \mathbb{Q}[y][x]$ with $y = (y_1, \ldots, y_t)$ and $x = (x_1, \ldots, x_n)$, $V\subset \mathbb{C}^{t+n}$ be the algebraic set defined by $f$ and $\pi$ be the projection $(y, x) \to y$. Under the assumptions that $f$ admits finitely many complex roots for generic values of $y$ and that the ideal generated by $f$ is radical, we solve the following problem. On input $f$, we compute semi-algebraic formulas defining semi-algebraic subsets $S_1, \ldots, S_l$ of the $y$-space such that $\cup_{i=1}^l S_i$ is dense in $\mathbb{R}^t$ and the number of real points in $V\cap \pi^{-1}(\eta)$ is invariant when $\eta$ varies over each $S_i$. This algorithm exploits properties of some well chosen monomial bases in the algebra $\mathbb{Q}(y)[x]/I$ where $I$ is the ideal generated by $f$ in $\mathbb{Q}(y)[x]$ and the specialization property of the so-called Hermite matrices. This allows us to obtain compact representations of the sets $S_i$ by means of semi-algebraic formulas encoding the signature of a symmetric matrix. When $f$ satisfies extra genericity assumptions, we derive complexity bounds on the number of arithmetic operations in $\mathbb{Q}$ and the degree of the output polynomials. Let $d$ be the maximal degree of the $f_i$'s and $D = n(d-1)d^n$, we prove that, on a generic $f=(f_1,\ldots,f_n)$, one can compute those semi-algebraic formulas with $\widetilde{O}\left(\binom{t+D}{t}2^{3t}n^{2t+1} d^{3nt+2(n+t)+1}\right)$ operations in $\mathbb{Q}$ and that the polynomials involved have degree bounded by $D$. We report on practical experiments which illustrate the efficiency of our algorithm on generic systems and systems from applications. It allows us to solve problems which are out of reach of the state-of-the-art.

Submitted 16 December, 2021; v1 submitted 28 November, 2020; originally announced November 2020.

ACM Class: I.1.2

Journal ref: Journal of Symbolic Computation, 2021

arXiv:2011.05295 [pdf, other]

DoLFIn: Distributions over Latent Features for Interpretability

Authors: Phong Le, Willem Zuidema

Abstract: Interpreting the inner workings of neural models is a key step in ensuring the robustness and trustworthiness of the models, but work on neural network interpretability typically faces a trade-off: either the models are too constrained to be very useful, or the solutions found by the models are too complex to interpret. We propose a novel strategy for achieving interpretability that -- in our experiments -- avoids this trade-off. Our approach builds on the success of using probability as the central quantity, as, for instance, within the attention mechanism. In our architecture, DoLFIn (Distributions over Latent Features for Interpretability), we do not determine beforehand what each feature represents, and features go altogether into an unordered set. Each feature has an associated probability ranging from 0 to 1, weighing its importance for further processing. We show that, unlike attention and saliency map approaches, this set-up makes it straightforward to compute the probability with which an input component supports the decision the neural model makes. To demonstrate the usefulness of the approach, we apply DoLFIn to text classification, and show that DoLFIn not only provides interpretable solutions, but even slightly outperforms the classical CNN and BiLSTM text classifiers on the SST2 and AG-news datasets.

Submitted 10 November, 2020; originally announced November 2020.

Journal ref: COLING 2020

arXiv:2010.13962 [pdf, ps, other]

Task-Aware Neural Architecture Search

Authors: Cat P. Le, Mohammadreza Soltani, Robert Ravier, Vahid Tarokh

Abstract: The design of handcrafted neural networks requires a lot of time and resources. Recent techniques in Neural Architecture Search (NAS) have proven to be competitive with or better than traditional handcrafted design, although they require domain knowledge and have generally used limited search spaces. In this paper, we propose a novel framework for neural architecture search, utilizing a dictionary of models of base tasks and the similarity between the target task and the atoms of the dictionary; hence, generating an adaptive search space based on the base models of the dictionary. By introducing a gradient-based search algorithm, we can evaluate and discover the best architecture in the search space without fully training the networks. The experimental results show the efficacy of our proposed task-aware approach.

Submitted 15 March, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

arXiv:2010.13187 [pdf, other]

Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling

Authors: Akash Srivastava, Yamini Bansal, Yukun Ding, Cole Lincoln Hurwitz, Kai Xu, Bernhard Egger, Prasanna Sattigeri, Joshua B. Tenenbaum, Phuong Le, Arun Prakash R, Nengfeng Zhou, Joel Vaughan, Yaquan Wang, Anwesha Bhattacharyya, Kristjan Greenewald, David D. Cox, Dan Gutfreund

Abstract: Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors. This approach introduces a trade-off between disentangled representation learning and reconstruction quality since the model does not have enough capacity to learn correlated latent variables that capture detail information present in most image data. To overcome this trade-off, we present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method; then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables, adding detail information while maintaining conditioning on the previously learned disentangled factors. Taken together, our multi-stage modeling approach results in a single, coherent probabilistic model that is theoretically justified by the principle of D-separation and can be realized with a variety of model classes including likelihood-based models such as variational autoencoders, implicit models such as generative adversarial networks, and tractable models like normalizing flows or mixtures of Gaussians. We demonstrate that our multi-stage model has higher reconstruction quality than current state-of-the-art methods with equivalent disentanglement performance across multiple standard benchmarks.
In addition, we apply the multi-stage model to generate synthetic tabular datasets, showcasing an enhanced performance over benchmark models across a variety of metrics. The interpretability analysis further indicates that the multi-stage model can effectively uncover distinct and meaningful features of variations from which the original distribution can be recovered.

Submitted 3 April, 2024; v1 submitted 25 October, 2020; originally announced October 2020.

arXiv:2008.10331 [pdf, ps, other]

doi 10.1145/3373207.3404049

Computing the Real Isolated Points of an Algebraic Hypersurface

Authors: Huu Phuoc Le, Mohab Safey El Din, Timo de Wolff

Abstract: Let $\mathbb{R}$ be the field of real numbers. We consider the problem of computing the real isolated points of a real algebraic set in $\mathbb{R}^n$ given as the vanishing set of a polynomial system. This problem plays an important role in studying rigidity properties of mechanisms in material design. In this paper, we design an algorithm which solves this problem. It is based on the computations of critical points as well as roadmaps for answering connectivity queries in real algebraic sets. This leads to a probabilistic algorithm of complexity $(nd)^{O(n\log(n))}$ for computing the real isolated points of real algebraic hypersurfaces of degree $d$. It allows us to solve in practice instances which are out of reach of the state-of-the-art.

Submitted 24 August, 2020; originally announced August 2020.

Comments: Conference paper ISSAC 2020

arXiv:2005.00087 [pdf, other]

Revisiting Unsupervised Relation Extraction

Authors: Thy Thy Tran, Phong Le, Sophia Ananiadou

Abstract: Unsupervised relation extraction (URE) extracts relations between named entities from raw text without manually-labelled data and existing knowledge bases (KBs). URE methods can be categorised into generative and discriminative approaches, which rely either on hand-crafted features or surface form. However, we demonstrate that by using only named entities to induce relation types, we can outperform existing methods on two popular datasets. We conduct a comparison and evaluation of our findings with other URE techniques, to ascertain the important features in URE. We conclude that entity types provide a strong inductive bias for URE.

Submitted 30 April, 2020; originally announced May 2020.

Comments: 8 pages, 1 figure, 2 tables. Accepted in ACL 2020

arXiv:2003.01281 [pdf, ps, other]

Code-domain NOMA in Massive MIMO: When is it Needed?

Authors: Mai T. P. Le, Luca Sanguinetti, Emil Björnson, Maria-Gabriella Di Benedetto

Abstract: In overloaded Massive MIMO (mMIMO) systems, wherein the number $K$ of user equipments (UEs) exceeds the number of base station antennas $M$, it has recently been shown that non-orthogonal multiple access (NOMA) can increase the sum spectral efficiency. This paper aims at identifying cases where code-domain NOMA can improve the spectral efficiency of mMIMO in the classical regime where $K < M$. Novel spectral efficiency expressions are provided for the uplink and downlink with arbitrary spreading signatures and spatial correlation matrices. Particular attention is devoted to the planar arrays that are currently being deployed in pre-5G and 5G networks (in sub-6 GHz bands), which are characterized by limited spatial resolution. Numerical results show that mMIMO with such planar arrays can benefit from NOMA in scenarios where the UEs are spatially close to each other. A two-step UE grouping scheme is proposed for NOMA-aided mMIMO systems that is applicable to the spatial correlation matrices of the UEs that are currently active in each cell. Numerical results are used to investigate the performance of the algorithm under different operating conditions and types of spreading signatures (orthogonal, sparse and random sets). The analysis reveals that orthogonal signatures provide the highest average spectral efficiency.

Submitted 3 April, 2021; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: to appear IEEE Trans. Vehicular Technology. Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE

arXiv:1912.09421 [pdf, other]

Neural Design Network: Graphic Layout Generation with Constraints

Authors: Hsin-Ying Lee, Lu Jiang, Irfan Essa, Phuong B Le, Haifeng Gong, Ming-Hsuan Yang, Weilong Yang

Abstract: Graphic design is essential for visual communication with layouts being fundamental to composing attractive designs. Layout generation differs from pixel-level image synthesis and is unique in terms of the requirement of mutual relations among the desired components. We propose a method for design layout generation that can satisfy user-specified constraints. The proposed neural design network (NDN) consists of three modules. The first module predicts a graph with complete relations from a graph with user-specified relations. The second module generates a layout from the predicted graph. Finally, the third module fine-tunes the predicted layout. Quantitative and qualitative experiments demonstrate that the generated layouts are visually similar to real design layouts. We also construct real designs based on predicted layouts for a better understanding of the visual quality. Finally, we demonstrate a practical application on layout recommendation.

Submitted 16 July, 2020; v1 submitted 19 December, 2019; originally announced December 2019.

Comments: European Conference on Computer Vision (ECCV) 2020

arXiv:1910.11067 [pdf, ps, other]

doi 10.1109/ICASSP40776.2020.9054118

Supervised Encoding for Discrete Representation Learning

Authors: Cat P. Le, Yi Zhou, Jie Ding, Vahid Tarokh

Abstract: Classical supervised classification tasks search for a nonlinear mapping that maps each encoded feature directly to a probability mass over the labels. Such a learning framework typically lacks the intuition that encoded features from the same class tend to be similar and thus has little interpretability for the learned features. In this paper, we propose a novel supervised learning model named Supervised-Encoding Quantizer (SEQ). The SEQ applies a quantizer to cluster and classify the encoded features. We found that the quantizer provides an interpretable graph where each cluster in the graph represents a class of data samples that have a particular style. We also trained a decoder that can decode convex combinations of the encoded features from similar and different clusters and provide guidance on style transfer between sub-classes.

Submitted 14 October, 2019; originally announced October 2019.

arXiv:1907.02648 [pdf, other]

What is the Benefit of Code-domain NOMA in Massive MIMO?

Authors: Mai T. P. Le, Luca Sanguinetti, Emil Björnson, Maria-Gabriella Di Benedetto

Abstract: In overloaded Massive MIMO systems, wherein the number K of user equipments (UEs) exceeds the number of base station antennas M, it has recently been shown that non-orthogonal multiple access (NOMA) can increase performance. This paper aims at identifying cases of the classical operating regime K < M, where code-domain NOMA can also improve the spectral efficiency of Massive MIMO. Particular attention is given to use cases in which poor favorable propagation conditions are experienced. Numerical results show that Massive MIMO with planar antenna arrays can benefit from NOMA in practical scenarios where the UEs are spatially close to each other.

Submitted 3 July, 2019; originally announced July 2019.

Comments: To appear at the 2019 IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (IEEE PIMRC 2019), 5 pages, 5 figures

arXiv:1906.01250 [pdf, other]

Boosting Entity Linking Performance by Leveraging Unlabeled Documents

Authors: Phong Le, Ivan Titov

Abstract: Modern entity linking systems rely on large collections of documents specifically annotated for the task (e.g., AIDA CoNLL). In contrast, we propose an approach which exploits only naturally occurring information: unlabeled documents and Wikipedia. Our approach consists of two stages. First, we construct a high recall list of candidate entities for each mention in an unlabeled document. Second, we use the candidate lists as weak supervision to constrain our document-level entity linking model. The model treats entities as latent variables and, when estimated on a collection of unlabeled texts, learns to choose entities relying both on local context of each mention and on coherence with other entities in the document. The resulting approach rivals fully-supervised state-of-the-art systems on standard test sets. It also approaches their performance in a very challenging setting: when tested on a test set sampled from the data used to estimate the supervised systems. By comparing to Wikipedia-only training of our model, we demonstrate that modeling unlabeled documents is beneficial.

Submitted 4 June, 2019; originally announced June 2019.

Comments: ACL2019

arXiv:1905.07189 [pdf, other]

Distant Learning for Entity Linking with Automatic Noise Detection

Authors: Phong Le, Ivan Titov

Abstract: Accurate entity linkers have been produced for domains and languages where annotated data (i.e., texts linked to a knowledge base) is available. However, little progress has been made for the settings where no or very limited amounts of labeled data are present (e.g., legal or most scientific domains). In this work, we show how we can learn to link mentions without having any labeled examples, only a knowledge base and a collection of unannotated texts from the corresponding domain. In order to achieve this, we frame the task as a multi-instance learning problem and rely on surface matching to create initial noisy labels. As the learning signal is weak and our surrogate labels are noisy, we introduce a noise detection component in our model: it lets the model detect and disregard examples which are likely to be noisy. Our method, jointly learning to detect noise and link entities, greatly outperforms the surface matching baseline. For a subset of entity categories, it even approaches the performance of supervised learning.

Submitted 4 June, 2019; v1 submitted 17 May, 2019; originally announced May 2019.

Comments: ACL 2019

arXiv:1804.10637 [pdf, other]

Improving Entity Linking by Modeling Latent Relations between Mentions

Authors: Phong Le, Ivan Titov

Abstract: Entity linking involves aligning textual mentions of named entities to their corresponding entries in a knowledge base. Entity linking systems often exploit relations between textual mentions in a document (e.g., coreference) to decide if the linking decisions are compatible. Unlike previous approaches, which relied on supervised systems or heuristics to predict these relations, we treat relations as latent variables in our neural entity-linking model. We induce the relations without any supervision while optimizing the entity-linking system in an end-to-end fashion. Our multi-relational model achieves the best reported scores on the standard benchmark (AIDA-CoNLL) and substantially outperforms its relation-agnostic version. Its training also converges much faster, suggesting that the injected structural bias helps to explain regularities in the training data.

Submitted 27 April, 2018; originally announced April 2018.

Comments: ACL 2018

arXiv:1710.06499 [pdf, other]

Fundamental Limits of Low-Density Spreading NOMA with Fading

Authors: Mai T. P. Le, Guido Carlo Ferrante, Tony Q. S. Quek, Maria-Gabriella Di Benedetto

Abstract: Spectral efficiency of low-density spreading non-orthogonal multiple access channels in the presence of fading is derived for linear detection with independent decoding as well as optimum decoding. The large system limit, where both the number of users and number of signal dimensions grow with fixed ratio, called load, is considered. In the case of optimum decoding, it is found that low-density spreading underperforms dense spreading for all loads. Conversely, linear detection is characterized by different behaviors in the underloaded vs. overloaded regimes. In particular, it is shown that spectral efficiency changes smoothly as load increases. However, in the overloaded regime, the spectral efficiency of low-density spreading is higher than that of dense spreading.

Submitted 27 January, 2018; v1 submitted 17 October, 2017; originally announced October 2017.

arXiv:1708.00514 [pdf, other]

Dense Piecewise Planar RGB-D SLAM for Indoor Environments

Authors: Phi-Hung Le, Jana Kosecka

Abstract: The paper exploits weak Manhattan constraints to parse the structure of indoor environments from RGB-D video sequences in an online setting. We extend the previous approach for single view parsing of indoor scenes to video sequences and formulate the problem of recovering the floor plan of the environment as an optimal labeling problem solved using dynamic programming. The temporal continuity is enforced in a recursive setting, where labeling from previous frames is used as a prior term in the objective function. In addition to recovery of piecewise planar weak Manhattan structure of the extended environment, the orthogonality constraints are also exploited by visual odometry and pose graph optimization. This yields reliable estimates in the presence of large motions and absence of distinctive features to track. We evaluate our method on several challenging indoor sequences, demonstrating accurate SLAM and dense mapping of low texture environments. On the existing TUM benchmark, we achieve results competitive with the alternative approaches, which fail in our environments.

Submitted 1 August, 2017; originally announced August 2017.

Comments: International Conference on Intelligent Robots and Systems (IROS) 2017

arXiv:1704.04451 [pdf, other]

Optimizing Differentiable Relaxations of Coreference Evaluation Metrics

Authors: Phong Le, Ivan Titov

Abstract: Coreference evaluation metrics are hard to optimize directly as they are non-differentiable functions, not easily decomposable into elementary decisions. Consequently, most approaches optimize objectives only indirectly related to the end goal, resulting in suboptimal performance. Instead, we propose a differentiable relaxation that lends itself to gradient-based optimisation, thus bypassing the need for reinforcement learning or heuristic modification of cross-entropy. We show that by modifying the training objective of a competitive neural coreference system, we obtain a substantial gain in performance. This suggests that our approach can be regarded as a viable alternative to using reinforcement learning or more computationally expensive imitation learning.

Submitted 22 June, 2017; v1 submitted 14 April, 2017; originally announced April 2017.

Comments: 10 pages. CoNLL

arXiv:1703.03334 [pdf, other]

Fast Genetic Algorithms

Authors: Benjamin Doerr, Huu Phuoc Le, Régis Makhmara, Ta Duy Nguyen

Abstract: For genetic algorithms using a bit-string representation of length $n$, the general recommendation is to take $1/n$ as mutation rate. In this work, we discuss whether this is really justified for multimodal functions. Taking jump functions and the $(1+1)$ evolutionary algorithm as the simplest example, we observe that larger mutation rates give significantly better runtimes. For the $\mathrm{jump}_{m,n}$ function, any mutation rate between $2/n$ and $m/n$ leads to a speed-up at least exponential in $m$ compared to the standard choice. The asymptotically best runtime, obtained from using the mutation rate $m/n$ and leading to a speed-up super-exponential in $m$, is very sensitive to small changes of the mutation rate. Any deviation by a small $(1 \pm \varepsilon)$ factor leads to a slow-down exponential in $m$. Consequently, any fixed mutation rate gives strongly sub-optimal results for most jump functions. Building on this observation, we propose to use a random mutation rate $\alpha/n$, where $\alpha$ is chosen from a power-law distribution. We prove that the $(1+1)$ EA with this heavy-tailed mutation rate optimizes any $\mathrm{jump}_{m,n}$ function in a time that is only a small polynomial (in $m$) factor above the one stemming from the optimal rate for this $m$. Our heavy-tailed mutation operator yields similar speed-ups (over the best known performance guarantees) for the vertex cover problem in bipartite graphs and the matching problem in general graphs.
Following the example of fast simulated annealing, fast evolution strategies, and fast evolutionary programming, we propose to call genetic algorithms using a heavy-tailed mutation operator fast genetic algorithms.

Submitted 15 March, 2017; v1 submitted 9 March, 2017; originally announced March 2017.

Journal ref: Proceedings of GECCO 2017
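
Illustration (not part of the arXiv record): the heavy-tailed mutation idea described in the abstract above can be sketched in a few lines of Python. This is a minimal sketch and not the authors' code. The abstract states only that $\alpha$ is drawn from a power-law distribution; the support {1, ..., n//2}, the exponent beta = 1.5, the usual textbook definition of the jump function, and all function names below are assumptions made for this example.

```python
import random


def jump(x, m):
    """Usual Jump_{m,n} fitness (assumed definition; the abstract does not state it)."""
    n, ones = len(x), sum(x)
    if ones <= n - m or ones == n:
        return m + ones
    return n - ones  # the fitness valley just below the all-ones optimum


def sample_alpha(n, beta, rng):
    """Draw alpha from {1, ..., n//2} with P(alpha = i) proportional to i**(-beta).

    The support and the exponent are assumptions for this sketch.
    """
    support = list(range(1, n // 2 + 1))
    weights = [i ** (-beta) for i in support]
    return rng.choices(support, weights=weights, k=1)[0]


def heavy_tailed_mutation(x, beta, rng):
    """Flip each bit independently with rate alpha/n, where alpha is resampled per call."""
    p = sample_alpha(len(x), beta, rng) / len(x)
    return [bit ^ (rng.random() < p) for bit in x]


def one_plus_one_ea(n, m, beta=1.5, max_iters=100_000, seed=0):
    """(1+1) EA on Jump_{m,n}: keep the offspring if it is at least as fit as the parent."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    for t in range(max_iters):
        if sum(x) == n:  # all-ones string is the optimum
            return x, t
        y = heavy_tailed_mutation(x, beta, rng)
        if jump(y, m) >= jump(x, m):
            x = y
    return x, max_iters
```

Calling `one_plus_one_ea(20, 3)` returns the final bit string and the number of iterations used. Replacing `sample_alpha` with a constant 1 gives the standard $1/n$ mutation rate for comparison.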

arXiv:1609.07826 [pdf, other]

Multiview RGB-D Dataset for Object Instance Detection

Authors: Georgios Georgakis, Md Alimoor Reza, Arsalan Mousavian, Phi-Hung Le, Jana Kosecka

Abstract: This paper presents a new multi-view RGB-D dataset of nine kitchen scenes, each containing several objects in realistic cluttered environments including a subset of objects from the BigBird dataset. The viewpoints of the scenes are densely sampled and objects in the scenes are annotated with bounding boxes and in the 3D point cloud. Also, an approach for detection and recognition is presented, which is comprised of two parts: i) a new multi-view 3D proposal generation method and ii) the development of several recognition baselines using AlexNet to score our proposals, which is trained either on crops of the dataset or on synthetically composited training images. Finally, we compare the performance of the object proposals and a detection baseline to the Washington RGB-D Scenes (WRGB-D) dataset and demonstrate that our Kitchen scenes dataset is more challenging for object detection and recognition. The dataset is available at: http://cs.gmu.edu/~robot/gmu-kitchens.html.

Submitted 25 September, 2016; originally announced September 2016.

arXiv:1605.01652 [pdf, other]

LSTM-based Mixture-of-Experts for Knowledge-Aware Dialogues

Authors: Phong Le, Marc Dymetman, Jean-Michel Renders

Abstract: We introduce an LSTM-based method for dynamically integrating several word-prediction experts to obtain a conditional language model which can be good simultaneously at several subtasks. We illustrate this general approach with an application to dialogue where we integrate a neural chat model, good at conversational aspects, with a neural question-answering model, good at retrieving precise information from a knowledge-base, and show how the integration combines the strengths of the independent components. We hope that this focused contribution will draw attention to the benefits of using such mixtures of experts in NLP.

Submitted 5 May, 2016; originally announced May 2016.

Showing 1–50 of 58 results for author: Le, P