Search | arXiv e-print repository

PhoWhisper: Automatic Speech Recognition for Vietnamese

Authors: Thanh-Thien Le, Linh The Nguyen, Dat Quoc Nguyen

Abstract: We introduce PhoWhisper in five versions for Vietnamese automatic speech recognition. PhoWhisper's robustness is achieved through fine-tuning the Whisper model on an 844-hour dataset that encompasses diverse Vietnamese accents. Our experimental study demonstrates state-of-the-art performances of PhoWhisper on benchmark Vietnamese ASR datasets. We have open-sourced PhoWhisper at: https://github.com/VinAIResearch/PhoWhisper

Submitted 27 March, 2024; originally announced June 2024.

Comments: Accepted to ICLR 2024 Tiny Papers Track

arXiv:2405.14141 [pdf, other]

ViHateT5: Enhancing Hate Speech Detection in Vietnamese With A Unified Text-to-Text Transformer Model

Authors: Luan Thanh Nguyen

Abstract: Recent advancements in hate speech detection (HSD) in Vietnamese have made significant progress, primarily attributed to the emergence of transformer-based pre-trained language models, particularly those built on the BERT architecture. However, the necessity for specialized fine-tuned models has resulted in the complexity and fragmentation of developing a multitasking HSD system. Moreover, most current methodologies focus on fine-tuning general pre-trained models, primarily trained on formal textual datasets like Wikipedia, which may not accurately capture human behavior on online platforms. In this research, we introduce ViHateT5, a T5-based model pre-trained on our proposed large-scale domain-specific dataset named VOZ-HSD. By harnessing the power of a text-to-text architecture, ViHateT5 can tackle multiple tasks using a unified model and achieve state-of-the-art performance across all standard HSD benchmarks in Vietnamese. Our experiments also underscore the significance of label distribution in pre-training data on model efficacy. We provide our experimental materials for research purposes, including the VOZ-HSD dataset, pre-trained checkpoint, the unified HSD-multitask ViHateT5 model, and related source code on GitHub publicly.

Submitted 4 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: Accepted at ACL'2024 (Findings)

arXiv:2404.05276 [pdf, ps, other]

On the complexity of normalization for the planar $\lambda$-calculus

Authors: Anupam Das, Damiano Mazza, Lê Thành Dũng Nguyên, Noam Zeilberger

Abstract: We sketch a tentative proof of P-completeness for the $\beta$-convertibility problem on untyped planar (a.k.a. ordered or non-commutative) $\lambda$-terms.

Submitted 8 April, 2024; originally announced April 2024.

Comments: Abstract for the Trends in Linear Logic and Applications 2023 workshop, meant to be expanded into a proper paper in the future

arXiv:2404.05265 [pdf, other]

Function spaces for orbit-finite sets

Authors: Mikołaj Bojańczyk, Lê Thành Dũng Nguyên, Rafał Stefański

Abstract: Orbit-finite sets are a generalisation of finite sets, and as such support many operations allowed for finite sets, such as pairing, quotienting, or taking subsets. However, they do not support function spaces, i.e. if X and Y are orbit-finite sets, then the space of finitely supported functions from X to Y is not orbit-finite. In this paper we propose two solutions to this problem: one is obtained by generalising the notion of orbit-finite set, and the other one is obtained by restricting it. In both cases, function spaces and the original closure properties are retained. Curiously, both solutions are "linear": the generalisation is based on linear algebra, while the restriction is based on linear logic.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.14918 [pdf, other]

Deep learning-based method for weather forecasting: A case study in Itoshima

Authors: Yuzhong Cheng, Linh Thi Hoai Nguyen, Akinori Ozaki, Ton Viet Ta

Abstract: Accurate weather forecasting is of paramount importance for a wide range of practical applications, drawing substantial scientific and societal interest. However, the intricacies of weather systems pose substantial challenges to accurate predictions. This research introduces a multilayer perceptron model tailored for weather forecasting in Itoshima, Kyushu, Japan. Our meticulously designed architecture demonstrates superior performance compared to existing models, surpassing benchmarks such as Long Short-Term Memory and Recurrent Neural Networks.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2402.05854 [pdf, other]

(Almost) Affine Higher-Order Tree Transducers

Authors: Lê Thành Dũng Tito Nguyên, Gabriele Vanoni

Abstract: We investigate the tree-to-tree functions computed by "affine $\lambda$-transducers": tree automata whose memory consists of an affine $\lambda$-term instead of a finite state. They can be seen as variations on Gallot, Lemay and Salvati's Linear High-Order Deterministic Tree Transducers. When the memory is almost purely affine (à la Kanazawa), we show that these machines can be translated to tree-walking transducers (and with a purely affine memory, we get a reversible tree-walking transducer). This leads to a proof of an inexpressivity conjecture of \titocecilia on "implicit automata" in an affine $\lambda$-calculus. The key technical tool in our proofs is the Interaction Abstract Machine (IAM), an operational avatar of the "geometry of interaction" semantics of linear logic. We work with ad-hoc specializations to (almost) affine $\lambda$-terms of a tree-generating version of the IAM.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.01198 [pdf, other]

Physical Layer Location Privacy in SIMO Communication Using Fake Paths Injection

Authors: Trong Duy Tran, Maxime Ferreira Da Costa, Linh Trung Nguyen

Abstract: Fake path injection is an emerging paradigm for inducing privacy over wireless networks. In this paper, fake paths are injected by the transmitter into a SIMO multipath communication channel to preserve her physical location from an eavesdropper. A novel statistical privacy metric is defined as the ratio between the largest (resp. smallest) eigenvalues of Bob's (resp. Eve's) Cramér-Rao lower bound on the SIMO multipath channel parameters to assess the privacy enhancements. Leveraging the spectral properties of generalized Vandermonde matrices, bounds on the privacy margin of the proposed scheme are derived. Specifically, it is shown that the privacy margin increases quadratically in the inverse of the separation between the true and the fake paths under Eve's perspective. Numerical simulations further showcase the approach's benefit.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2311.11001 [pdf, other]

Gendec: A Machine Learning-based Framework for Gender Detection from Japanese Names

Authors: Duong Tien Pham, Luan Thanh Nguyen

Abstract: Every human has their own name, a fundamental aspect of their identity and cultural heritage. The name often conveys a wealth of information, including details about an individual's background, ethnicity, and, especially, their gender. By detecting gender through the analysis of names, researchers can unlock valuable insights into linguistic patterns and cultural norms, which can be applied to practical applications. Hence, this work presents a novel dataset for Japanese name gender detection comprising 64,139 full names in romaji, hiragana, and kanji forms, along with their biological genders. Moreover, we propose Gendec, a framework for gender detection from Japanese names that leverages diverse approaches, including traditional machine learning techniques or cutting-edge transfer learning models, to predict the gender associated with Japanese names accurately. Through a thorough investigation, the proposed framework is expected to be effective and serve potential applications in various domains.

Submitted 18 November, 2023; originally announced November 2023.

Comments: This paper is accepted for presentation at ISDA'23

arXiv:2311.02945 [pdf, ps, other]

PhoGPT: Generative Pre-training for Vietnamese

Authors: Dat Quoc Nguyen, Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen, Dinh Phung, Hung Bui

Abstract: We open-source a state-of-the-art 4B-parameter generative model series for Vietnamese, which includes the base pre-trained monolingual model PhoGPT-4B and its chat variant, PhoGPT-4B-Chat. The base model, PhoGPT-4B, with exactly 3.7B parameters, is pre-trained from scratch on a Vietnamese corpus of 102B tokens, with an 8192 context length, employing a vocabulary of 20480 token types. The chat variant, PhoGPT-4B-Chat, is the modeling output obtained by fine-tuning PhoGPT-4B on a dataset of 70K instructional prompts and their responses, along with an additional 290K conversations. In addition, we also demonstrate its superior performance compared to previous open-source models. Our PhoGPT models are available at: https://github.com/VinAIResearch/PhoGPT

Submitted 22 March, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: PhoGPT-4B Technical Report - 5 pages

arXiv:2310.18046 [pdf, other]

ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese

Authors: Khiem Vinh Tran, Hao Phu Phan, Kiet Van Nguyen, Ngan Luu Thuy Nguyen

Abstract: In recent years, Visual Question Answering (VQA) has gained significant attention for its diverse applications, including intelligent car assistance, aiding visually impaired individuals, and document image information retrieval using natural language queries. VQA requires effective integration of information from questions and images to generate accurate answers. Neural models for VQA have made remarkable progress on large-scale datasets, with a primary focus on resource-rich languages like English. To address this, we introduce the ViCLEVR dataset, a pioneering collection for evaluating various visual reasoning capabilities in Vietnamese while mitigating biases. The dataset comprises over 26,000 images and 30,000 question-answer pairs (QAs), each question annotated to specify the type of reasoning involved. Leveraging this dataset, we conduct a comprehensive analysis of contemporary visual reasoning systems, offering valuable insights into their strengths and limitations. Furthermore, we present PhoVIT, a comprehensive multimodal fusion that identifies objects in images based on questions. The architecture effectively employs transformers to enable simultaneous reasoning over textual and visual data, merging both modalities at an early model stage. The experimental findings demonstrate that our proposed model achieves state-of-the-art performance across four evaluation metrics. The accompanying code and dataset have been made publicly accessible at https://github.com/kvt0012/ViCLEVR. This provision seeks to stimulate advancements within the research community, fostering the development of more multimodal fusion algorithms, specifically tailored to address the nuances of low-resource languages, exemplified by Vietnamese.

Submitted 27 October, 2023; originally announced October 2023.

Comments: A pre-print version and submitted to journal

arXiv:2310.09811 [pdf, other]

Spacing distribution for quantum Rabi models

Authors: Daniel Braak, Linh Thi Hoai Nguyen, Cid Reyes-Bustos, Masato Wakayama

Abstract: The asymmetric quantum Rabi model (AQRM) is a fundamental model in quantum optics describing the interaction of light and matter. Besides its immediate physical interest, the AQRM possesses an intriguing mathematical structure which is far from being completely understood. In this paper, we focus on the distribution of the level spacing, the difference between consecutive eigenvalues of the AQRM in the limit of high energies, i.e. large quantum numbers. In the symmetric case, that is the quantum Rabi model (QRM), the spacing distribution for each parity (given by the $\mathbb{Z}_2$-symmetry) is fully clarified by an asymptotic expression derived by de Monvel and Zielinski, though some questions remain for the full spectrum spacing. However, in the general AQRM case, there is no parity decomposition for the eigenvalues. In connection with numerically exact studies for the first 40,000 eigenstates we describe the spacing distribution for the AQRM which is characterized by a new type of periodicity and symmetric behavior of the distribution with respect to the bias parameter. The results reflect the hidden symmetry of the AQRM known to appear for half-integer bias. In addition, we observe in the AQRM the excited state quantum phase transition for large values of the bias parameter, analogous to the QRM with large qubit energy, and an internal symmetry of the level spacing distribution for fixed bias. This novel symmetry is independent from the symmetry for half-integer bias and not explained by current theoretical knowledge.

Submitted 9 February, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

Comments: 28 pages. 15 figures. The conjecture in Section 4.4 (Theorem 4.5 in the current version) was proved using results published after the previous version. The rest of the manuscript was modified slightly according to this change

MSC Class: 47B06 (Primary) 81V73; 81R40 (Secondary)

arXiv:2308.00198 [pdf, other]

doi 10.4230/LIPIcs.CSL.2024.40

Syntactically and semantically regular languages of lambda-terms coincide through logical relations

Authors: Vincent Moreau, Lê Thành Dũng Nguyên

Abstract: A fundamental theme in automata theory is regular languages of words and trees, and their many equivalent definitions. Salvati has proposed a generalization to regular languages of simply typed $\lambda$-terms, defined using denotational semantics in finite sets. We provide here some evidence for its robustness. First, we give an equivalent syntactic characterization that naturally extends the seminal work of Hillebrand and Kanellakis connecting regular languages of words and syntactic $\lambda$-definability. Second, we show that any finitary extensional model of the simply typed $\lambda$-calculus, when used in Salvati's definition, recognizes exactly the same class of languages of $\lambda$-terms as the category of finite sets does. The proofs of these two results rely on logical relations and can be seen as instances of a more general construction of a categorical nature, inspired by previous categorical accounts of logical relations using the gluing construction.

Submitted 8 February, 2024; v1 submitted 31 July, 2023; originally announced August 2023.

Comments: The proofs on "finitely pointable" CCCs in versions 1 and 2 were wrong; we now make slightly weaker claims on well-pointed locally finite CCCs. New in this version: added reference [3] and official DOI (proceedings of CSL 2024)

arXiv:2307.15335 [pdf, other]

BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering

Authors: Khiem Vinh Tran, Kiet Van Nguyen, Ngan Luu Thuy Nguyen

Abstract: Visual Question Answering (VQA) is an intricate and demanding task that integrates natural language processing (NLP) and computer vision (CV), capturing the interest of researchers. The English language, renowned for its wealth of resources, has witnessed notable advancements in both datasets and models designed for VQA. However, there is a lack of models that target specific countries such as Vietnam. To address this limitation, we introduce a transformer-based Vietnamese model named BARTPhoBEiT. This model includes pre-trained Sequence-to-Sequence and bidirectional encoder representation from Image Transformers in Vietnamese and evaluates Vietnamese VQA datasets. Experimental results demonstrate that our proposed model outperforms the strong baseline and improves the state-of-the-art in six metrics: Accuracy, Precision, Recall, F1-score, WUPS 0.0, and WUPS 0.9.

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.11057 [pdf, other]

Two-way automata and transducers with planar behaviours are aperiodic

Authors: Lê Thành Dũng Nguyên, Camille Noûs, Cécilia Pradic

Abstract: We consider a notion of planarity for two-way finite automata and transducers, inspired by Temperley-Lieb monoids of planar diagrams. We show that this restriction captures star-free languages and first-order transductions.

Submitted 20 July, 2023; originally announced July 2023.

Comments: 18 pages, DMTCS submission

arXiv:2306.08798 [pdf, other]

MPSA-DenseNet: A novel deep learning model for English accent classification

Authors: Tianyu Song, Linh Thi Hoai Nguyen, Ton Viet Ta

Abstract: This paper presents three innovative deep learning models for English accent classification: Multi-DenseNet, PSA-DenseNet, and MPSA-DenseNet, that combine multi-task learning and the PSA module attention mechanism with DenseNet. We applied these models to data collected from six dialects of English across native English speaking regions (Britain, the United States, Scotland) and nonnative English speaking regions (China, Germany, India). Our experimental results show a significant improvement in classification accuracy, particularly with MPSA-DenseNet, which outperforms all other models, including DenseNet and EPSA models previously used for accent identification. Our findings indicate that MPSA-DenseNet is a highly promising model for accurately identifying English accents.

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2305.19709 [pdf, other]

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

Authors: Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen

Abstract: We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task. Our XPhoneBERT has the same model architecture as BERT-base, trained using the RoBERTa pre-training approach on 330M phoneme-level sentences from nearly 100 languages and locales. Experimental results show that employing XPhoneBERT as an input phoneme encoder significantly boosts the performance of a strong neural TTS model in terms of naturalness and prosody and also helps produce fairly high-quality speech with limited training data. We publicly release our pre-trained XPhoneBERT with the hope that it would facilitate future research and downstream TTS applications for multiple languages. Our XPhoneBERT model is available at https://github.com/VinAIResearch/XPhoneBERT

Submitted 31 May, 2023; originally announced May 2023.

Comments: In Proceedings of INTERSPEECH 2023 (to appear)

arXiv:2305.12601 [pdf, other]

Simply typed convertibility is TOWER-complete even for safe lambda-terms

Authors: Lê Thành Dũng Nguyên

Abstract: We consider the following decision problem: given two simply typed $\lambda$-terms, are they $\beta$-convertible? Equivalently, do they have the same normal form? It is famously non-elementary, but the precise complexity - namely TOWER-complete - is lesser known. One goal of this short paper is to popularize this fact. Our original contribution is to show that the problem stays TOWER-complete when the two input terms belong to Blum and Ong's safe $\lambda$-calculus, a fragment of the simply typed $\lambda$-calculus arising from the study of higher-order recursion schemes. Previously, the best known lower bound for this safe $\beta$-convertibility problem was PSPACE-hardness. Our proof proceeds by reduction from the star-free expression equivalence problem, taking inspiration from the author's work with Pradic on "implicit automata in typed $\lambda$-calculi". These results also hold for $\beta\eta$-convertibility.

Submitted 12 July, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: final revision after acceptance to Logical Methods in Computer Science

arXiv:2303.06546 [pdf, other]

Blockchain-Empowered Trustworthy Data Sharing: Fundamentals, Applications, and Challenges

Authors: Linh T. Nguyen, Lam Duc Nguyen, Thong Hoang, Dilum Bandara, Qin Wang, Qinghua Lu, Xiwei Xu, Liming Zhu, Petar Popovski, Shiping Chen

Abstract: Various data-sharing platforms have emerged with the growing public demand for open data and legislation mandating certain data to remain open. Most of these platforms remain opaque, leading to many questions about data accuracy, provenance and lineage, privacy implications, consent management, and the lack of fair incentives for data providers. With their transparency, immutability, non-repudiation, and decentralization properties, blockchains could not be more apt to answer these questions and enhance trust in a data-sharing platform. However, blockchains are not good at handling the four Vs of big data (i.e., volume, variety, velocity, and veracity) due to their limited performance, scalability, and high cost. Given that many related works propose blockchain-based trustworthy data-sharing solutions, there is increasing confusion and difficulty in understanding and selecting these technologies and platforms in terms of their sharing mechanisms, sharing services, quality of services, and applications. In this paper, we conduct a comprehensive survey on blockchain-based data-sharing architectures and applications to fill the gap. First, we present the foundations of blockchains and discuss the challenges of current data-sharing techniques. Second, we focus on the convergence of blockchain and data sharing to give a clear picture of this landscape and propose a reference architecture for blockchain-based data sharing. Third, we discuss the industrial applications of blockchain-based data sharing, ranging from healthcare and smart grid to transportation and decarbonization. For each application, we provide lessons learned for the deployment of blockchain-based data sharing. Finally, we discuss research challenges and open research directions.

Submitted 11 March, 2023; originally announced March 2023.

Comments: 40 pages, 15 figures, and 8 tables

arXiv:2303.05692 [pdf, ps, other]

Semantic-Preserving Augmentation for Robust Image-Text Retrieval

Authors: Sunwoo Kim, Kyuhong Shim, Luong Trung Nguyen, Byonghyo Shim

Abstract: Image text retrieval is a task to search for the proper textual descriptions of the visual world and vice versa. One challenge of this task is the vulnerability to input image and text corruptions. Such corruptions are often unobserved during the training, and degrade the retrieval model decision quality substantially. In this paper, we propose a novel image text retrieval technique, referred to as robust visual semantic embedding (RVSE), which consists of novel image-based and text-based augmentation techniques called semantic preserving augmentation for image (SPAugI) and text (SPAugT). Since SPAugI and SPAugT change the original data in a way that its semantic information is preserved, we enforce the feature extractors to generate semantic aware embedding vectors regardless of the corruption, improving the model robustness significantly. From extensive experiments using benchmark datasets, we show that RVSE outperforms conventional retrieval schemes in terms of image-text retrieval performance.

Submitted 9 March, 2023; originally announced March 2023.

Comments: Accepted to ICASSP 2023

arXiv:2303.01706 [pdf, other]

A Geometrical Structure for Predator-Avoidance Fish Schooling

Authors: Aditya Dewanto Hartono, Ton Viet Ta, Linh Thi Hoai Nguyen

Abstract: This paper conducts a numerical study of a geometrical structure called $\varepsilon$-school for predator-avoidance fish schools, based on our previous mathematical model. Our results show that during a predator attack, the number of $\varepsilon$-schools increases from one to a certain value. After the attack, the number of $\varepsilon$-schools decreases in the first two predator-avoidance patterns, but continues to increase in the third pattern. A constant value for the number of $\varepsilon$-schools is observed in the last pattern. These results suggest that when the predator is approaching, each individual in the school focuses more on avoiding the predator than on interacting with its schoolmates. Such a trait is in agreement with real-life behavior in the natural ecosystem.

Submitted 2 March, 2023; originally announced March 2023.

arXiv:2302.11789 [pdf, other]

Interval optimization problems on Hadamard manifolds: Solvability and Duality

Authors: Le Tram Nguyen, Yu-Lin Chang, Chu-Chin Hu, Jein-Shan Chen

Abstract: In this paper, we study the solvability and duality of interval optimization problems on Hadamard manifolds. The study covers the KKT conditions and the Wolfe dual problem with weak duality and strong duality. These results complement earlier results on the solvability of interval optimization problems on Hadamard manifolds.

Submitted 23 February, 2023; originally announced February 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2205.11793

arXiv:2301.09234 [pdf, other]

Refutations of pebble minimization via output languages

Authors: Sandra Kiefer, Lê Thành Dũng Nguyên, Cécilia Pradic

Abstract: Polyregular functions are the class of string-to-string functions definable by pebble transducers, an extension of finite-state automata with outputs and multiple two-way reading heads (pebbles) with a stack discipline. If a polyregular function can be computed with $k$ pebbles, then its output length is bounded by a polynomial of degree $k$ in the input length. But Bojańczyk has shown that the converse fails. In this paper, we provide two alternative easier proofs. The first establishes by elementary means that some quadratic polyregular function requires 3 pebbles. The second proof - just as short, albeit less elementary - shows a stronger statement: for every $k$, there exists some polyregular function with quadratic growth whose output language differs from that of any $k$-fold composition of macro tree transducers (and which therefore cannot be computed by a $k$-pebble transducer). Along the way, we also refute a conjectured logical characterization of polyblind functions.

Submitted 20 June, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

Comments: 20 pages, for submission to Fundamenta Informaticae; this version excludes some of the material in the v1, which may appear in other subsequent papers
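
As a small illustration of the degree bound stated in this abstract (not taken from the paper), the sketch below implements a string-to-string function whose output length is exactly the square of the input length. The two nested loops are meant to mirror a 2-pebble transducer: the outer loop plays the role of a pebble marking one position, and the inner loop plays the role of the head scanning the whole input. The function name `repeat_per_position` is hypothetical.

```python
def repeat_per_position(word: str) -> str:
    """Emit one full copy of `word` for every position of `word`.

    The output has length len(word) ** 2, matching the degree-2 bound
    for functions computable with two pebbles.
    """
    out = []
    for _outer in range(len(word)):      # outer pebble: marks one position
        for inner in range(len(word)):   # inner head: scans the whole input
            out.append(word[inner])
    return "".join(out)


if __name__ == "__main__":
    print(repeat_per_position("abc"))
```

Quadratic growth alone does not imply that two pebbles suffice, which is the point of the results summarized above; this example only shows the easy direction of the bound.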

arXiv:2210.03989 [pdf, other]

A Stochastic Differential Equation Model for Predator-Avoidance Fish Schooling

Authors: Aditya Dewanto Hartono, Linh Thi Hoai Nguyen, Ton Viet Ta

Abstract: This paper presents a system of stochastic differential equations (SDEs) as a mathematical model to describe the spatial-temporal dynamics of a predator-prey system in an artificial aquatic environment with schooling behavior imposed upon the associated prey. The proposed model follows the particle-like approach where interactions among the associated units are manifested through a combination of attractive and repulsive forces analogous to those occurring in molecular physics. Two hunting tactics of the predator are proposed and integrated into the general model, namely the center-attacking and the nearest-attacking strategy. Emphasis is placed upon demonstrating the capacity of the proposed model in: (i) discovering the predator-avoidance patterns of the schooling prey, and (ii) showing the benefit of constituting a large prey school in better escaping the predator's attack. Based on numerical simulations upon the proposed model, four predator-avoidance patterns of the schooling prey are discovered, namely Split and Reunion, Split and Separate into Two Groups, Scattered, and Maintain Formation and Distance. The proposed model also successfully demonstrates the benefit of constituting a large group of schooling prey in mitigating predation risk. Such findings are in agreement with real-life observations of the natural aquatic ecosystem, hence confirming the validity and exactitude of the proposed model.

Submitted 8 October, 2022; originally announced October 2022.

MSC Class: 92-10; 60H10; 68W10

arXiv:2209.10482 [pdf, other]

SMTCE: A Social Media Text Classification Evaluation Benchmark and BERTology Models for Vietnamese

Authors: Luan Thanh Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract: Text classification is a typical natural language processing or computational linguistics task with various interesting applications. As the number of users on social media platforms increases, data acceleration promotes emerging studies on Social Media Text Classification (SMTC) or social media text mining on these valuable resources. In contrast to English, Vietnamese, one of the low-resource languages, is still not concentrated on and exploited thoroughly. Inspired by the success of the GLUE, we introduce the Social Media Text Classification Evaluation (SMTCE) benchmark, as a collection of datasets and models across a diverse set of SMTC tasks. With the proposed benchmark, we implement and analyze the effectiveness of a variety of multilingual BERT-based models (mBERT, XLM-R, and DistilmBERT) and monolingual BERT-based models (PhoBERT, viBERT, vELECTRA, and viBERT4news) for tasks in the SMTCE benchmark. Monolingual models outperform multilingual models and achieve state-of-the-art results on all text classification tasks. It provides an objective assessment of multilingual and monolingual BERT-based models on the benchmark, which will benefit future studies about BERTology in the Vietnamese language.

Submitted 21 September, 2022; originally announced September 2022.

Comments: Accepted at The 36th annual Meeting of Pacific Asia Conference on Language, Information and Computation (PACLIC 36)

arXiv:2209.07825 [pdf, other]

doi 10.46298/lmcs-19(4:25)2023

A System of Interaction and Structure III: The Complexity of BV and Pomset Logic

Authors: Lê Thành Dũng Nguyên, Lutz Straßburger

Abstract: Pomset logic and BV are both logics that extend multiplicative linear logic (with Mix) with a third connective that is self-dual and non-commutative. Whereas pomset logic originates from the study of coherence spaces and proof nets, BV originates from the study of series-parallel orders, cographs, and proof systems. Both logics enjoy a cut-admissibility result, but for neither logic can this be done in the sequent calculus. Provability in pomset logic can be checked via a proof net correctness criterion and in BV via a deep inference proof system. It has long been conjectured that these two logics are the same. In this paper we show that this conjecture is false. We also investigate the complexity of the two logics, exhibiting a huge gap between the two. Whereas provability in BV is NP-complete, provability in pomset logic is $\Sigma_2^p$-complete. We also make some observations with respect to possible sequent systems for the two logics.

Submitted 15 December, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

Journal ref: Logical Methods in Computer Science, Volume 19, Issue 4 (December 18, 2023) lmcs:10057

arXiv:2209.01304 [pdf, other]

doi 10.25073/2588-1086/vnucsce.369

vieCap4H-VLSP 2021: Vietnamese Image Captioning for Healthcare Domain using Swin Transformer and Attention-based LSTM

Authors: Thanh Tin Nguyen, Long H. Nguyen, Nhat Truong Pham, Liu Tai Nguyen, Van Huong Do, Hai Nguyen, Ngoc Duy Nguyen

Abstract: This study presents our approach on the automatic Vietnamese image captioning for healthcare domain in text processing tasks of Vietnamese Language and Speech Processing (VLSP) Challenge 2021, as shown in Figure 1. In recent years, image captioning often employs a convolutional neural network-based architecture as an encoder and a long short-term memory (LSTM) as a decoder to generate sentences. These models perform remarkably well in different datasets. Our proposed model also has an encoder and a decoder, but we instead use a Swin Transformer in the encoder, and an LSTM combined with an attention module in the decoder. The study presents our training experiments and techniques used during the competition. Our model achieves a BLEU4 score of 0.293 on the vietCap4H dataset, and the score is ranked 3rd on the private leaderboard. Our code can be found at https://git.io/JDdJm.

Submitted 2 September, 2022; originally announced September 2022.

Comments: Accepted for publication in the VNU Journal of Science: Computer Science and Communication Engineering

Journal ref: VNU Journal of Science: Computer Science and Communication Engineering, 38(2), 2022

arXiv:2208.04243 [pdf, other]

A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation

Authors: Linh The Nguyen, Nguyen Luong Tran, Long Doan, Manh Luong, Dat Quoc Nguyen

Abstract: In this paper, we introduce a high-quality and large-scale benchmark dataset for English-Vietnamese speech translation with 508 audio hours, consisting of 331K triplets of (sentence-lengthed audio, English source transcript sentence, Vietnamese target subtitle sentence). We also conduct empirical experiments using strong baselines and find that the traditional "Cascaded" approach still outperforms the modern "End-to-End" approach. To the best of our knowledge, this is the first large-scale English-Vietnamese speech translation study. We hope both our publicly available dataset and study can serve as a starting point for future research and applications on English-Vietnamese speech translation. Our dataset is available at https://github.com/VinAIResearch/PhoST

Submitted 8 August, 2022; originally announced August 2022.

Comments: In Proceedings of INTERSPEECH 2022, to appear. The first three authors contributed equally to this work

arXiv:2206.09600 [pdf, other]

SPBERTQA: A Two-Stage Question Answering System Based on Sentence Transformers for Medical Texts

Authors: Nhung Thi-Hong Nguyen, Phuong Phan-Dieu Ha, Luan Thanh Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract: Question answering (QA) systems have gained explosive attention in recent years. However, QA tasks in Vietnamese do not have many datasets. Significantly, there is mostly no dataset in the medical domain. Therefore, we built a Vietnamese Healthcare Question Answering dataset (ViHealthQA), including 10,015 question-answer passage pairs for this task, in which questions from health-interested users were asked on prestigious health websites and answers from highly qualified experts. This paper proposes a two-stage QA system based on Sentence-BERT (SBERT) using multiple negatives ranking (MNR) loss combined with BM25. Then, we conduct diverse experiments with many bag-of-words models to assess our system's performance. With the obtained results, this system achieves better performance than traditional methods.

Submitted 20 June, 2022; originally announced June 2022.

arXiv:2205.11793 [pdf, other]

Interval Optimization Problems on Hadamard manifolds

Authors: L. T. Nguyen, Y. L Chang, C. C Hu, J. S Chen

Abstract: In this article, we introduce the interval optimization problems (IOPs) on Hadamard manifolds as well as study the relationship between them and the interval variational inequalities. To achieve the theoretical results, we build up some new concepts about $gH$-directional derivative and $gH$-Gâteaux differentiability of interval valued functions and their properties on the Hadamard manifolds. The obtained results pave a way to further study on Riemannian interval optimization problems (RIOPs).

Submitted 24 May, 2022; originally announced May 2022.

Comments: submitted

arXiv:2203.11400 [pdf, other]

doi 10.25073/2588-1086/vnucsce.340

VLSP 2021 - ViMRC Challenge: Vietnamese Machine Reading Comprehension

Authors: Kiet Van Nguyen, Son Quoc Tran, Luan Thanh Nguyen, Tin Van Huynh, Son T. Luu, Ngan Luu-Thuy Nguyen

Abstract: One of the emerging research trends in natural language understanding is machine reading comprehension (MRC) which is the task to find answers to human questions based on textual data. Existing Vietnamese datasets for MRC research concentrate solely on answerable questions. However, in reality, questions can be unanswerable for which the correct answer is not stated in the given textual data. To address the weakness, we provide the research community with a benchmark dataset named UIT-ViQuAD 2.0 for evaluating the MRC task and question answering systems for the Vietnamese language. We use UIT-ViQuAD 2.0 as a benchmark dataset for the challenge on Vietnamese MRC at the Eighth Workshop on Vietnamese Language and Speech Processing (VLSP 2021). This task attracted 77 participant teams from 34 universities and other organizations. In this article, we present details of the organization of the challenge, an overview of the methods employed by shared-task participants, and the results. The highest performances are 77.24% in F1-score and 67.43% in Exact Match on the private test set. The Vietnamese MRC systems proposed by the top 3 teams use XLM-RoBERTa, a powerful pre-trained language model based on the transformer architecture. The UIT-ViQuAD 2.0 dataset motivates researchers to further explore the Vietnamese machine reading comprehension task and related tasks such as question answering, question generation, and natural language inference.

Submitted 4 April, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

Comments: The 8th International Workshop on Vietnamese Language and Speech Processing (VLSP 2021)

arXiv:2110.12199 [pdf, other]

PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation

Authors: Long Doan, Linh The Nguyen, Nguyen Luong Tran, Thai Hoang, Dat Quoc Nguyen

Abstract: We introduce a high-quality and large-scale Vietnamese-English parallel dataset of 3.02M sentence pairs, which is 2.9M pairs larger than the benchmark Vietnamese-English machine translation corpus IWSLT15. We conduct experiments comparing strong neural baselines and well-known automatic translation engines on our dataset and find that in both automatic and human evaluations: the best performance is obtained by fine-tuning the pre-trained sequence-to-sequence denoising auto-encoder mBART. To our best knowledge, this is the first large-scale Vietnamese-English machine translation study. We hope our publicly available dataset and study can serve as a starting point for future research and applications on Vietnamese-English machine translation.

Submitted 23 October, 2021; originally announced October 2021.

Comments: To appear in Proceedings of EMNLP 2021 (main conference). The first three authors contribute equally to this work

arXiv:2110.05178 [pdf, other]

doi 10.1109/TSP.2021.3125137

Gradual Federated Learning with Simulated Annealing

Authors: Luong Trung Nguyen, Junhan Kim, Byonghyo Shim

Abstract: Federated averaging (FedAvg) is a popular federated learning (FL) technique that updates the global model by averaging local models and then transmits the updated global model to devices for their local model update. One main limitation of FedAvg is that the average-based global model is not necessarily better than local models in the early stage of the training process so that FedAvg might diverge in realistic scenarios, especially when the data is non-identically distributed across devices and the number of data samples varies significantly from device to device. In this paper, we propose a new FL technique based on simulated annealing. The key idea of the proposed technique, henceforth referred to as simulated annealing-based FL (SAFL), is to allow a device to choose its local model when the global model is immature. Specifically, by exploiting the simulated annealing strategy, we make each device choose its local model with high probability in early iterations when the global model is immature. From extensive numerical experiments using various benchmark datasets, we demonstrate that SAFL outperforms the conventional FedAvg technique in terms of the convergence speed and the classification accuracy.

Submitted 11 October, 2021; originally announced October 2021.
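
The idea described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not give the selection rule, so the exponentially decaying probability in `keep_local_probability` and its parameter `tau` are hypothetical, as are all function names. Only the sample-weighted averaging step reflects standard FedAvg.

```python
import math
import random


def fedavg(local_models, num_samples):
    """Sample-weighted average of local parameter vectors (FedAvg update)."""
    total = sum(num_samples)
    dim = len(local_models[0])
    return [
        sum(w[i] * n for w, n in zip(local_models, num_samples)) / total
        for i in range(dim)
    ]


def keep_local_probability(round_idx, tau=5.0):
    """ASSUMED schedule: probability of keeping the local model.

    Starts at 1.0 in round 0 and decays toward 0 as training proceeds,
    so devices favor their local model while the global model is immature.
    """
    return math.exp(-round_idx / tau)


def choose_model(local_model, global_model, round_idx, rng, tau=5.0):
    """Pick the local model with the annealed probability, else the global one."""
    if rng.random() < keep_local_probability(round_idx, tau):
        return local_model
    return global_model


if __name__ == "__main__":
    rng = random.Random(0)
    local_models = [[1.0, 2.0], [3.0, 4.0]]
    global_model = fedavg(local_models, num_samples=[1, 3])
    for t in (0, 5, 50):
        picked = choose_model(local_models[0], global_model, t, rng)
        print(t, round(keep_local_probability(t), 3), picked)
```

In round 0 the device always keeps its local model under this schedule; in late rounds it almost always adopts the averaged global model, which then behaves like plain FedAvg.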

arXiv:2109.03219 [pdf, other]

Fruit-CoV: An Efficient Vision-based Framework for Speedy Detection and Diagnosis of SARS-CoV-2 Infections Through Recorded Cough Sounds

Authors: Long H. Nguyen, Nhat Truong Pham, Van Huong Do, Liu Tai Nguyen, Thanh Tin Nguyen, Van Dung Do, Hai Nguyen, Ngoc Duy Nguyen

Abstract: SARS-CoV-2 is colloquially known as COVID-19 that had an initial outbreak in December 2019. The deadly virus has spread across the world, taking part in the global pandemic disease since March 2020. In addition, a recent variant of SARS-CoV-2 named Delta is intractably contagious and responsible for more than four million deaths over the world. Therefore, it is vital to possess a self-testing service of SARS-CoV-2 at home. In this study, we introduce Fruit-CoV, a two-stage vision framework, which is capable of detecting SARS-CoV-2 infections through recorded cough sounds. Specifically, we convert sounds into Log-Mel Spectrograms and use the EfficientNet-V2 network to extract its visual features in the first stage. In the second stage, we use 14 convolutional layers extracted from the large-scale Pretrained Audio Neural Networks for audio pattern recognition (PANNs) and the Wavegram-Log-Mel-CNN to aggregate feature representations of the Log-Mel Spectrograms. Finally, we use the combined features to train a binary classifier. In this study, we use a dataset provided by the AICovidVN 115M Challenge, which includes a total of 7371 recorded cough sounds collected throughout Vietnam, India, and Switzerland. Experimental results show that our proposed model achieves an AUC score of 92.8% and ranks the 1st place on the leaderboard of the AICovidVN Challenge. More importantly, our proposed framework can be integrated into a call center or a VoIP system to speed up detecting SARS-CoV-2 infections through online/recorded cough sounds.

Submitted 6 September, 2021; originally announced September 2021.

Comments: 4 pages

arXiv:2107.00742 [pdf, other]

doi 10.1103/PhysRevMaterials.5.105004

Singular angular magnetoresistance and sharp resonant features in a high-mobility metal with open orbits, ReO3

Authors: Nicholas P. Quirk, Loi T. Nguyen, Jiayi Hu, R. J. Cava, N. P. Ong

Abstract: We report high-resolution angular magnetoresistance (AMR) experiments performed on crystals of ReO$_3$ with high mobility (90,000 cm$^2$/Vs at 2 K) and extremely low residual resistivity (5-8 n$\Omega$cm). The Fermi surface, comprised of intersecting cylinders, supports open orbits. The resistivity $\rho_{xx}$ in a magnetic field $B$ = 9 T displays a singular pattern of behavior. With $\bf E\parallel \hat{x}$ and $\bf B$ initially $\parallel\bf\hat{z}$, tilting $\bf B$ in the longitudinal $k_z$-$k_x$ plane leads to a steep decrease in $\rho_{xx}$ by a factor of 40. However, if $\bf B$ is tilted in the transverse $k_y$-$k_z$ plane, $\rho_{xx}$ increases steeply by a factor of 8. Using the Shockley tube integral approach, we show that, in ReO$_3$, the singular behavior results from the rapid conversion of closed to open orbits, resulting in opposite signs for AMR in orthogonal planes. The floor values of $\rho_{xx}$ in both AMR scans are identified with specific sets of open and closed orbits. Also, the "completion angle" $\gamma_c$ detected in the AMR is shown to be an intrinsic geometric feature that provides a new way to measure the Fermi radius $k_F$. However, additional sharp resonant features which appear at very small tilt angles in the longitudinal AMR scans are not explained by the tube integral approach.

Submitted 1 July, 2021; originally announced July 2021.

Comments: 12 pages, 7 figures

Journal ref: Phys. Rev. M 5, 105004 (2021)

arXiv:2105.15079 [pdf, other]

SA2SL: From Aspect-Based Sentiment Analysis to Social Listening System for Business Intelligence

Authors: Luong Luc Phan, Phuc Huynh Pham, Kim Thi-Thanh Nguyen, Tham Thi Nguyen, Sieu Khai Huynh, Luan Thanh Nguyen, Tin Van Huynh, Kiet Van Nguyen

Abstract: In this paper, we present a process of building a social listening system based on aspect-based sentiment analysis in Vietnamese from creating a dataset to building a real application. Firstly, we create UIT-ViSFD, a Vietnamese Smartphone Feedback Dataset as a new benchmark corpus built based on a strict annotation scheme for evaluating aspect-based sentiment analysis, consisting of 11,122 human-annotated comments for mobile e-commerce, which is freely available for research purposes. We also present a proposed approach based on the Bi-LSTM architecture with the fastText word embeddings for the Vietnamese aspect-based sentiment task. Our experiments show that our approach achieves the best performances with the F1-score of 84.48% for the aspect task and 63.06% for the sentiment task, which outperforms several conventional machine learning and deep learning systems. Last but not least, we build SA2SL, a social listening system based on the best performance model on our dataset, which will inspire more social listening systems in future.

Submitted 10 June, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

arXiv:2105.08358 [pdf, other]

Comparison-free polyregular functions

Authors: Lê Thành Dũng Tito Nguyên, Camille Noûs, Cécilia Pradic

Abstract: This paper introduces a new automata-theoretic class of string-to-string functions with polynomial growth. Several equivalent definitions are provided: a machine model which is a restricted variant of pebble transducers, and a few inductive definitions that close the class of regular functions under certain operations. Our motivation for studying this class comes from another characterization, which we merely mention here but prove elsewhere, based on a $\lambda$-calculus with a linear type system. As their name suggests, these comparison-free polyregular functions form a subclass of polyregular functions; we prove that the inclusion is strict. We also show that they are incomparable with HDT0L transductions, closed under usual function composition -- but not under a certain "map" combinator -- and satisfy a comparison-free version of the pebble minimization theorem. On the broader topic of polynomial growth transductions, we also consider the recently introduced layered streaming string transducers (SSTs), or equivalently k-marble transducers. We prove that a function can be obtained by composing such transducers together if and only if it is polyregular, and that k-layered SSTs (or k-marble transducers) are closed under "map" and equivalent to a corresponding notion of (k+1)-layered HDT0L systems.

Submitted 22 February, 2023; v1 submitted 18 May, 2021; originally announced May 2021.

Journal ref: International Colloquium on Automata, Languages and Programming 2021, Jul 2021, Glasgow, United Kingdom

arXiv:2105.04722 [pdf, other]

doi 10.2528/PIERL22071401

On the Electrostatic Interaction between Point Charges due to Dielectrical Shielding

Authors: Long T. Nguyen, Kim Tuan Do, Duy V. Nguyen, Trung Phan

Abstract: How will the electrostatic interaction between two point charges change if they are shielded from each other by a dielectrical slab? While the physical setting of this electromagnetic problem is relatively simple, it is easy to get wrong and the correct solution is surprisingly complicated. Here we show a general answer using the method of images, in which the electric field is found not by solving Poisson's equation but by superposing an infinite number of image charges to recurrently satisfy the boundary conditions at all interfaces. We also obtain analytical and algebraic results in some special cases.

Submitted 31 October, 2022; v1 submitted 10 May, 2021; originally announced May 2021.

Journal ref: Progress In Electromagnetics Research Letters, Vol. 107, 111-118, 2022

arXiv:2104.11969 [pdf, ps, other]

Vietnamese Complaint Detection on E-Commerce Websites

Authors: Nhung Thi-Hong Nguyen, Phuong Phan-Dieu Ha, Luan Thanh Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract: Customer product reviews play a role in improving the quality of products and services for business organizations or their brands. Complaining is an attitude that expresses dissatisfaction with an event or a product not meeting customer expectations. In this paper, we build an Open-domain Complaint Detection dataset (UIT-ViOCD), including 5,485 human-annotated reviews on four categories about product reviews on e-commerce sites. After the data collection phase, we proceed to the annotation task and achieve the inter-annotator agreement Am of 87%. Then, we present an extensive methodology for the research purposes and achieve 92.16% by F1-score for identifying complaints. With the results, in the future, we aim to build a system for open-domain complaint detection in E-commerce websites.

Submitted 5 July, 2021; v1 submitted 24 April, 2021; originally announced April 2021.

arXiv:2104.07376 [pdf, other]

UIT-E10dot3 at SemEval-2021 Task 5: Toxic Spans Detection with Named Entity Recognition and Question-Answering Approaches

Authors: Phu Gia Hoang, Luan Thanh Nguyen, Kiet Van Nguyen

Abstract: The increase in toxic comments in online spaces is having a tremendous effect on vulnerable users. For this reason, considerable efforts are being made to deal with the problem, and SemEval-2021 Task 5: Toxic Spans Detection is one of them. This task asks competitors to extract toxic spans from the given texts, and we carried out several analyses to understand its structure before running experiments. We address this task with two approaches, Named Entity Recognition with the spaCy library and Question-Answering with RoBERTa combined with ToxicBERT; the former achieves the highest F1-score, 66.99%.

Submitted 15 April, 2021; originally announced April 2021.

Comments: Accepted at SemEval-2021 Task 5: Toxic Spans Detection, ACL-IJCNLP 2021

arXiv:2104.03539 [pdf, other]

doi 10.1093/mnras/stab1002

Black hole mass measurement using ALMA observations of [CI] and CO emissions in the Seyfert 1 galaxy NGC7469

Authors: Dieu D. Nguyen, Takuma Izumi, Sabine Thater, Masatoshi Imanishi, Taiki Kawamuro, Shunsuke Baba, Suzuka Nakano, Jean L. Turner, Kotaro Kohno, Satoki Matsushita, Sergio Martin, David S. Meier, Phuong M. Nguyen, Lam T. Nguyen

Abstract: We present a supermassive black hole (SMBH) mass measurement in the Seyfert 1 galaxy NGC7469 using Atacama Large Millimeter/submillimeter Array (ALMA) observations of the atomic-${\rm [CI]}$(1-0) and molecular-$^{12}$CO(1-0) emission lines at the spatial resolution of $\approx0.3$" (or $\approx$ 100 pc). These emissions reveal that NGC7469 hosts a circumnuclear gas disc (CND) with a ring-like structure and a two-arm/bi-symmetric spiral pattern within it, surrounded by a starbursting ring. The CND has a relatively low $\sigma/V\approx0.35$ ($r\sim0.5$") and $\sim0.19$ ($r>0.5"$), suggesting that the gas is dynamically settled and suitable for dynamically deriving the mass of its central source. As is expected from X-ray dominated region (XDR) effects that dramatically increase an atomic carbon abundance by dissociating CO molecules, we suggest that the atomic [CI](1-0) emission is a better probe of SMBH masses than CO emission in AGNs. Our dynamical model using the ${\rm [CI]}$(1-0) kinematics yields a $M_{\rm BH}=1.78^{+2.69}_{-1.10}\times10^7$M$_\odot$ and $M/L_{\rm F547M}=2.25^{+0.40}_{-0.43}$ (M$_\odot$/L$_\odot$). The model using the CO(1-0) kinematics also gives a consistent $M_{\rm BH}$ with a larger uncertainty, up to an order of magnitude, i.e. $M_{\rm BH}=1.60^{+11.52}_{-1.45}\times10^7$M$_\odot$. This new dynamical $M_{\rm BH}$ is $\approx$ 2 times higher than the mass determined from the reverberation mapped (RM) method using emissions arising in the unresolved broad-line region (BLR). Given this new $M_{\rm BH}$, we are able to constrain the specific RM dimensionless scaling factor of $f=7.2^{+4.2}_{-3.4}$ for the AGN BLR in NGC7469. The gas within the unresolved BLR thus has a Keplerian virial velocity component and the inclination of $i\approx11.0^\circ$$_{-2.5}^{+2.2}$, confirming its face-on orientation in a Seyfert 1 AGN by assuming a geometrically thin BLR model.

Submitted 8 April, 2021; originally announced April 2021.

Comments: 22 pages, 16 figures, 7 tables. Accepted for publication in MNRAS

arXiv:2104.02983 [pdf, other]

Optimal fire allocation in a combat model of mixed NCW type

Authors: My A. Vu, Nam H. Nguyen, Hanh Le T. Nguyen, Anh N. Ta, Mong H. Nguyen

Abstract: In this work, we introduce a nonlinear Lanchester model of NCW-type and study the problem of finding the optimal fire allocation for this model. A Blue party $B$ will fight against a Red party consisting of $A$ and $R$, where $A$ is an independent force and $R$ fights with support from a supply unit $N$. A battle may consist of several stages, but we consider the problem of finding the optimal fire allocation for $B$ in the first stage only. An optimal fire allocation is a set of three non-negative numbers whose sum equals one, such that the remaining force of $B$ is maximal at every instant. In order to tackle this problem, we introduce the notion of \textit{threatening rates}, which are computed for $A, R, N$ at the beginning of the battle. Numerical illustrations are presented to justify the theoretical findings.

Submitted 7 April, 2021; originally announced April 2021.

arXiv:2103.10069 [pdf, other]

doi 10.1007/978-3-030-79457-6_49

Constructive and Toxic Speech Detection for Open-domain Social Media Comments in Vietnamese

Authors: Luan Thanh Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract: The rise of social media has led to an increase in comments on online forums. However, there still exist invalid comments which are not informative for users. Moreover, those comments are also quite toxic and harmful to people. In this paper, we create a dataset for constructive and toxic speech detection, named UIT-ViCTSD (Vietnamese Constructive and Toxic Speech Detection dataset), with 10,000 human-annotated comments. For these tasks, we propose a system for constructive and toxic speech detection based on PhoBERT, the state-of-the-art transfer learning model in Vietnamese NLP. With this system, we obtain F1-scores of 78.59% and 59.40% for classifying constructive and toxic comments, respectively. In addition, we implement various baseline models, such as traditional Machine Learning and Deep Neural Network-based models, to evaluate the dataset. With these results, we can solve several tasks on online discussions and develop a framework for automatically identifying the constructiveness and toxicity of Vietnamese social media comments.

Submitted 6 September, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

Comments: IEA/AIE 2021: Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices pp 572-583

arXiv:2101.01476 [pdf, other]

PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing

Authors: Linh The Nguyen, Dat Quoc Nguyen

Abstract: We present the first multi-task learning model -- named PhoNLP -- for joint Vietnamese part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT (Nguyen and Nguyen, 2020) for each task independently. We publicly release PhoNLP as an open-source toolkit under the Apache License 2.0. Although we specify PhoNLP for Vietnamese, our PhoNLP training and evaluation command scripts in fact can directly work for other languages that have a pre-trained BERT-based language model and gold annotated corpora available for the three tasks of POS tagging, NER and dependency parsing. We hope that PhoNLP can serve as a strong baseline and useful toolkit for future NLP research and applications to not only Vietnamese but also the other languages. Our PhoNLP is available at: https://github.com/VinAIResearch/PhoNLP

Submitted 8 April, 2021; v1 submitted 5 January, 2021; originally announced January 2021.

Comments: To appear in Proceedings of NAACL 2021: Demonstrations

arXiv:2101.00862 [pdf]

doi 10.1002/adfm.202007960

NbIr$_2$B$_2$ and TaIr$_2$B$_2$ -- new low symmetry noncentrosymmetric superconductors with strong spin orbit coupling

Authors: Karolina Górnicka, Xin Gui, Bartlomiej Wiendlocha, Loi T. Nguyen, Weiwei Xie, Robert J. Cava, Tomasz Klimczuk

Abstract: Superconductivity was first observed more than a century ago, but the search for new superconducting materials remains a challenge. The Cooper pairs in superconductors are ideal embodiments of quantum entanglement. Thus, novel superconductors can be critical for both learning about electronic systems in condensed matter and for possible application in future quantum technologies. Here two previously unreported materials, NbIr$_2$B$_2$ and TaIr$_2$B$_2$, are presented with superconducting transitions at 7.2 and 5.2 K, respectively. They display a unique noncentrosymmetric crystal structure, and for both compounds the magnetic field that destroys the superconductivity at 0 K exceeds one of the fundamental characteristics of conventional superconductors (the Pauli limit), suggesting that the superconductivity may be unconventional. Supporting this experimentally based deduction, first-principle calculations show a spin split Fermi surface due to the presence of strong spin-orbit coupling. These materials may thus provide an excellent platform for the study of non-BCS superconductivity in intermetallic compounds.

Submitted 4 January, 2021; originally announced January 2021.

Comments: 36 pages, 11 figures

Journal ref: Adv. Funct. Mater. 2020, 2007960

arXiv:2012.14969 [pdf]

Van der Waals Heterostructure Magnetic Josephson Junction

Authors: H. Idzuchi, F. Pientka, K. -F. Huang, K. Harada, Ö. Gül, Y. J. Shin, L. T. Nguyen, N. H. Jo, D. Shindo, R. J. Cava, P. C. Canfield, P. Kim

Abstract: When two superconductors are connected across a ferromagnet, the spin configuration of the transferred Cooper pairs can be modulated due to magnetic exchange interaction. The resulting supercurrent can reverse its sign across the Josephson junction (JJ) [1-4]. Here we demonstrate Josephson phase modulation in van der Waals heterostructures when Cooper pairs from superconducting NbSe$_2$ tunnel through atomically thin magnetic insulator (MI) Cr$_2$Ge$_2$Te$_6$. Employing a superconducting quantum interference device based on MI JJs, we probe a doubly degenerate non-trivial JJ phase ($\phi$) originating from the magnetic barrier. This $\phi$-phase JJ is formed by momentum conserving tunneling of Ising Cooper pairs [5] across magnetic domains in the Cr$_2$Ge$_2$Te$_6$ barrier. The doubly degenerate ground states in MI JJs provide a two-level quantum system that can be utilized as a new dissipationless component for superconducting quantum devices, including phase batteries [6], memories [7,8], and quantum ratchets [9,10].

Submitted 29 December, 2020; originally announced December 2020.

arXiv:2011.00103 [pdf]

doi 10.1016/j.jssc.2019.06.037

Low Temperature Structural Phase Transition in the Perovskite Ba2CaMoO6

Authors: Loi T. Nguyen, Robert J. Cava, Allyson M. Fry-Petit

Abstract: Ba2CaMoO6 was synthesized by a solid-state method. The crystal structure adopts the cubic Fm-3m space group at room temperature with a lattice parameter of 8.378231(5) Å. Upon cooling, Ba2CaMoO6 was determined to have a structural phase transition to tetragonal I4/m (a=5.905763(6) Å and c=8.38817(1) Å) around 200 K. The phase transition was probed structurally by synchrotron and neutron diffraction and thermodynamically by specific heat and differential scanning calorimetry measurements. This structural phase transition will deepen our understanding of the perovskite family, especially the formation of perovskites that break corner-sharing networks.

Submitted 30 October, 2020; originally announced November 2020.

Comments: 5 figures, 3 tables

Journal ref: Journal of Solid State Chemistry (2019)

arXiv:2010.08232 [pdf, other]

WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets

Authors: Dat Quoc Nguyen, Thanh Vu, Afshin Rahimi, Mai Hoang Dao, Linh The Nguyen, Long Doan

Abstract: In this paper, we provide an overview of the WNUT-2020 shared task on the identification of informative COVID-19 English Tweets. We describe how we construct a corpus of 10K Tweets and organize the development and evaluation phases for this task. In addition, we also present a brief summary of results obtained from the final system evaluation submissions of 55 teams, finding that (i) many systems obtain very high performance, up to 0.91 F1 score, (ii) the majority of the submissions achieve substantially higher results than the baseline fastText (Joulin et al., 2017), and (iii) fine-tuning pre-trained language models on relevant language data followed by supervised training performs well in this task.

Submitted 16 October, 2020; originally announced October 2020.

Comments: In Proceedings of the 6th Workshop on Noisy User-generated Text

arXiv:2009.11215 [pdf]

Structure, Magnetism and First Principles Modeling of the Na0.5La0.5RuO3 Perovskite

Authors: Loi T. Nguyen, Matthieu Saubanère, Robert J. Cava

Abstract: High purity polycrystalline Na0.5La0.5RuO3 was synthesized by a solid state method, and its properties were studied by magnetic susceptibility, heat capacity and resistivity measurements. We find it to be a tetragonal perovskite, in contrast to an earlier report, with random La/Na mixing. With a Curie-Weiss temperature of -231 K and effective moment of 2.74 μB/mol-Ru, there is no magnetic ordering down to 1.8 K. A broad hump at 1.4 K in the heat capacity, however, indicates the presence of a glassy magnetic transition, which we attribute to the influence of the random distribution of Na and La on the perovskite A sites. Comparison to CaRuO3, a structurally ordered ruthenate perovskite with similar properties, is presented. First-principle calculations indicate that the Na-La distribution determines the local magnetic exchange interactions between Ru ions, favoring either antiferromagnetic or ferromagnetic coupling when the local environment is Na or La rich. Thus our data and analysis suggest that mixing cations with different charges and sizes on the A site in this perovskite results in magnetic frustration through a balance of local magnetic exchange interactions.

Submitted 23 September, 2020; originally announced September 2020.

Comments: 7 figures, 2 tables

arXiv:2009.07333 [pdf]

doi 10.1103/PhysRevMaterials.5.034419

Widely Spaced Planes of Magnetic Dimers in the Ba6Y2Rh2Ti2O17-δ Hexagonal Perovskite

Authors: Loi T. Nguyen, Daniel B. Straus, Q. Zhang, R. J. Cava

Abstract: We report the synthesis and initial characterization of Ba6Y2Rh2Ti2O17-δ, a previously unreported material with a hexagonal symmetry structure. Face-sharing RhO6 octahedra form triangular planes of Rh2O9 dimers that are widely separated in the perpendicular direction. The material displays a small effective magnetic moment, due to the Rh ions present, and a negative Curie-Weiss temperature. The charge transport and optical band gaps are very similar, near 0.16 eV. A large upturn in the heat capacity at temperatures below 1 K, suppressed by applied magnetic fields larger than μ0H = 2 Tesla, is observed. A large T-linear term in the specific heat (γ=166 mJ/mol f.u-K2) is seen, although the material is insulating at low temperatures. These results suggest the possibility of a spin liquid ground state in this material.

Submitted 19 December, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

Comments: 5 figures, 2 tables

Journal ref: Phys. Rev. Materials 5, 034419 (2021)

arXiv:2009.02671 [pdf, other]

doi 10.18653/v1/2020.wnut-1.50

BANANA at WNUT-2020 Task 2: Identifying COVID-19 Information on Twitter by Combining Deep Learning and Transfer Learning Models

Authors: Tin Van Huynh, Luan Thanh Nguyen, Son T. Luu

Abstract: The outbreak of the COVID-19 virus has had a significant impact on the health of people all over the world. Therefore, it is essential to provide everyone with constant and accurate information about the disease. This paper describes our prediction system for WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets. The dataset for this task contains 10,000 human-labeled tweets in English. An ensemble of our three transformer and deep learning models is used for the final prediction. The experimental results indicate that our system achieves an F1 score of 88.81% for the INFORMATIVE label on the test set.

Submitted 1 April, 2021; v1 submitted 6 September, 2020; originally announced September 2020.

Comments: Submitted to 2020 The 6th Workshop on Noisy User-generated Text (W-NUT)

Showing 1–50 of 78 results for author: Nguyen, L T