Search | arXiv e-print repository

SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading

Authors: Tu Anh Dinh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, Nathan Lerzer, Fabian Ternava, Jianfeng Gao, Tobias Röddiger, Alexander Waibel, Tamim Asfour, Michael Beigl, Rainer Stiefelhagen, Carsten Dachsbacher, Klemens Böhm, Jan Niehues

Abstract: With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper, we propose SciEx -… ▽ More With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper, we propose SciEx - a benchmark consisting of university computer science exam questions, to evaluate LLMs ability on solving scientific tasks. SciEx is (1) multilingual, containing both English and German exams, and (2) multi-modal, containing questions that involve images, and (3) contains various types of freeform questions with different difficulty levels, due to the nature of university exams. We evaluate the performance of various state-of-the-art LLMs on our new benchmark. Since SciEx questions are freeform, it is not straightforward to evaluate LLM performance. Therefore, we provide human expert grading of the LLM outputs on SciEx. We show that the free-form exams in SciEx remain challenging for the current LLMs, where the best LLM only achieves 59.4\% exam grade on average. We also provide detailed comparisons between LLM performance and student performance on SciEx. To enable future evaluation of new LLMs, we propose using LLM-as-a-judge to grade the LLM answers on SciEx. Our experiments show that, although they do not perform perfectly on solving the exams, LLMs are decent as graders, achieving 0.948 Pearson correlation with expert grading. △ Less

Submitted 12 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

ACM Class: I.2.7

arXiv:2405.07180 [pdf, other]

Repairing Reed-Solomon Codes with Side Information

Authors: Thi Xinh Dinh, Ba Thong Le, Son Hoang Dau, Serdar Boztas, Stanislav Kruglik, Han Mao Kiah, Emanuele Viterbo, Tuvi Etzion, Yeow Meng Chee

Abstract: We generalize the problem of recovering a lost/erased symbol in a Reed-Solomon code to the scenario in which some side information about the lost symbol is known. The side information is represented as a set $S$ of linearly independent combinations of the sub-symbols of the lost symbol. When $S = \varnothing$, this reduces to the standard problem of repairing a single codeword symbol. When $S$ is… ▽ More We generalize the problem of recovering a lost/erased symbol in a Reed-Solomon code to the scenario in which some side information about the lost symbol is known. The side information is represented as a set $S$ of linearly independent combinations of the sub-symbols of the lost symbol. When $S = \varnothing$, this reduces to the standard problem of repairing a single codeword symbol. When $S$ is a set of sub-symbols of the erased one, this becomes the repair problem with partially lost/erased symbol. We first establish that the minimum repair bandwidth depends on $|S|$ and not the content of $S$ and construct a lower bound on the repair bandwidth of a linear repair scheme with side information $S$. We then consider the well-known subspace-polynomial repair schemes and show that their repair bandwidths can be optimized by choosing the right subspaces. Finally, we demonstrate several parameter regimes where the optimal bandwidths can be achieved for full-length Reed-Solomon codes. △ Less

Submitted 12 May, 2024; originally announced May 2024.

MSC Class: 94B05; 94B60 ACM Class: E.4

arXiv:2404.18031 [pdf, other]

Quality Estimation with $k$-nearest Neighbors and Automatic Evaluation for Model-specific Quality Estimation

Authors: Tu Anh Dinh, Tobias Palzer, Jan Niehues

Abstract: Providing quality scores along with Machine Translation (MT) output, so-called reference-free Quality Estimation (QE), is crucial to inform users about the reliability of the translation. We propose a model-specific, unsupervised QE approach, termed $k$NN-QE, that extracts information from the MT model's training data using $k$-nearest neighbors. Measuring the performance of model-specific QE is n… ▽ More Providing quality scores along with Machine Translation (MT) output, so-called reference-free Quality Estimation (QE), is crucial to inform users about the reliability of the translation. We propose a model-specific, unsupervised QE approach, termed $k$NN-QE, that extracts information from the MT model's training data using $k$-nearest neighbors. Measuring the performance of model-specific QE is not straightforward, since they provide quality scores on their own MT output, thus cannot be evaluated using benchmark QE test sets containing human quality scores on premade MT output. Therefore, we propose an automatic evaluation method that uses quality scores from reference-based metrics as gold standard instead of human-generated ones. We are the first to conduct detailed analyses and conclude that this automatic method is sufficient, and the reference-based MetricX-23 is best for the task. △ Less

Submitted 27 April, 2024; originally announced April 2024.

Comments: Accepted to EAMT 2024

ACM Class: I.2.7

arXiv:2403.15509 [pdf, other]

Twin Auto-Encoder Model for Learning Separable Representation in Cyberattack Detection

Authors: Phai Vu Dinh, Quang Uy Nguyen, Thai Hoang Dinh, Diep N. Nguyen, Bao Son Pham, Eryk Dutkiewicz

Abstract: Representation Learning (RL) plays a pivotal role in the success of many problems including cyberattack detection. Most of the RL methods for cyberattack detection are based on the latent vector of Auto-Encoder (AE) models. An AE transforms raw data into a new latent representation that better exposes the underlying characteristics of the input data. Thus, it is very useful for identifying cyberat… ▽ More Representation Learning (RL) plays a pivotal role in the success of many problems including cyberattack detection. Most of the RL methods for cyberattack detection are based on the latent vector of Auto-Encoder (AE) models. An AE transforms raw data into a new latent representation that better exposes the underlying characteristics of the input data. Thus, it is very useful for identifying cyberattacks. However, due to the heterogeneity and sophistication of cyberattacks, the representation of AEs is often entangled/mixed resulting in the difficulty for downstream attack detection models. To tackle this problem, we propose a novel mod called Twin Auto-Encoder (TAE). TAE deterministically transforms the latent representation into a more distinguishable representation namely the \textit{separable representation} and the reconstructsuct the separable representation at the output. The output of TAE called the \textit{reconstruction representation} is input to downstream models to detect cyberattacks. We extensively evaluate the effectiveness of TAE using a wide range of bench-marking datasets. Experiment results show the superior accuracy of TAE over state-of-the-art RL models and well-known machine learning algorithms. Moreover, TAE also outperforms state-of-the-art models on some sophisticated and challenging attacks. We then investigate various characteristics of TAE to further demonstrate its superiority. △ Less

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2401.10447 [pdf, other]

Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition

Authors: Yu Yu, Chao-Han Huck Yang, Tuan Dinh, Sungho Ryu, Jari Kolehmainen, Roger Ren, Denis Filimonov, Prashanth G. Shivakumar, Ankur Gandhe, Ariya Rastow, Jia Xu, Ivan Bulyko, Andreas Stolcke

Abstract: The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasing popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to enhance model performance by introducing various LoRA training strategies, achieving relative word error rate reductions of 3.50\% on the public Librispeech dat… ▽ More The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasing popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to enhance model performance by introducing various LoRA training strategies, achieving relative word error rate reductions of 3.50\% on the public Librispeech dataset and of 3.67\% on an internal dataset in the messaging domain. To further characterize the stability of LoRA-based second-pass speech recognition models, we examine robustness against input perturbations. These perturbations are rooted in homophone replacements and a novel metric called N-best Perturbation-based Rescoring Robustness (NPRR), both designed to measure the relative degradation in the performance of rescoring models. Our experimental results indicate that while advanced variants of LoRA, such as dynamic rank-allocated LoRA, lead to performance degradation in $1$-best perturbation, they alleviate the degradation in $N$-best perturbation. This finding is in comparison to fully-tuned models and vanilla LoRA tuning baselines, suggesting that a comprehensive selection is needed when using LoRA-based adaptation for compute-cost savings and robust language modeling. △ Less

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2311.07289 [pdf, other]

A probabilistic forecast methodology for volatile electricity prices in the Australian National Electricity Market

Authors: Cameron Cornell, Nam Trong Dinh, S. Ali Pourmousavi

Abstract: The South Australia region of the Australian National Electricity Market (NEM) displays some of the highest levels of price volatility observed in modern electricity markets. This paper outlines an approach to probabilistic forecasting under these extreme conditions, including spike filtration and several post-processing steps. We propose using quantile regression as an ensemble tool for probabili… ▽ More The South Australia region of the Australian National Electricity Market (NEM) displays some of the highest levels of price volatility observed in modern electricity markets. This paper outlines an approach to probabilistic forecasting under these extreme conditions, including spike filtration and several post-processing steps. We propose using quantile regression as an ensemble tool for probabilistic forecasting, with our combined forecasts achieving superior results compared to all constituent models. Within our ensemble framework, we demonstrate that averaging models with varying training length periods leads to a more adaptive model and increased prediction accuracy. The applicability of the final model is evaluated by comparing our median forecasts with the point forecasts available from the Australian NEM operator, with our model outperforming these NEM forecasts by a significant margin. △ Less

Submitted 12 December, 2023; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: This manuscript has been accepted for publication in International Journal of Forecasting

arXiv:2310.02494 [pdf, other]

On the Financial Consequences of Simplified Battery Sizing Models without Considering Operational Details

Authors: Nam Trong Dinh, Sahand Karimi-Arpanahi, S. Ali Pourmousavi, Mingyu Guo, Julian Lemos-Vinasco, Jon A. R. Liisberg

Abstract: Optimal battery sizing studies tend to overly simplify the practical aspects of battery operation within the battery sizing framework. Such assumptions may lead to a suboptimal battery capacity, resulting in significant financial losses for a battery project that could last more than a decade. In this paper, we compare the most common existing sizing methods in the literature with a battery sizing… ▽ More Optimal battery sizing studies tend to overly simplify the practical aspects of battery operation within the battery sizing framework. Such assumptions may lead to a suboptimal battery capacity, resulting in significant financial losses for a battery project that could last more than a decade. In this paper, we compare the most common existing sizing methods in the literature with a battery sizing model that incorporates the practical operation of a battery, that is, receding horizon operation. Consequently, we quantify the financial losses caused by the suboptimal capacities obtained by these models for a realistic case study related to community battery storage (CBS). We develop the case study by constructing a mathematical framework for the CBS and local end users. Our results show that existing sizing methods can lead to financial losses of up to 22%. △ Less

Submitted 3 October, 2023; originally announced October 2023.

Comments: This manuscript has been submitted to PSCC 2024 for possible publication

arXiv:2309.15223 [pdf, other]

doi 10.1109/ASRU57964.2023.10389632

Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

Authors: Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth G. Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastow, Ivan Bulyko

Abstract: We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limit their practical use in rescoring. Here we p… ▽ More We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limit their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets with decreased training times by factors between 5.4 and 3.6. △ Less

Submitted 10 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

Comments: Accepted to IEEE ASRU 2023. Internal Review Approved. Revised 2nd version with Andreas and Huck. The first version is in Sep 29th. 8 pages

Journal ref: Proc. IEEE ASRU Workshop, Dec. 2023

arXiv:2309.09012 [pdf, other]

Modelling Irrational Behaviour of Residential End Users using Non-Stationary Gaussian Processes

Authors: Nam Trong Dinh, Sahand Karimi-Arpanahi, Rui Yuan, S. Ali Pourmousavi, Mingyu Guo, Jon A. R. Liisberg, Julian Lemos-Vinasco

Abstract: Demand response (DR) plays a critical role in ensuring efficient electricity consumption and optimal use of network assets. Yet, existing DR models often overlook a crucial element, the irrational behaviour of electricity end users. In this work, we propose a price-responsive model that incorporates key aspects of end-user irrationality, specifically loss aversion, time inconsistency, and bounded… ▽ More Demand response (DR) plays a critical role in ensuring efficient electricity consumption and optimal use of network assets. Yet, existing DR models often overlook a crucial element, the irrational behaviour of electricity end users. In this work, we propose a price-responsive model that incorporates key aspects of end-user irrationality, specifically loss aversion, time inconsistency, and bounded rationality. To this end, we first develop a framework that uses Multiple Seasonal-Trend decomposition using Loess (MSTL) and non-stationary Gaussian processes to model the randomness in the electricity consumption by residential consumers. The impact of this model is then evaluated through a community battery storage (CBS) business model. Additionally, we apply a chance-constrained optimisation model for CBS operation that deals with the unpredictability of the end-user irrationality. Our simulations using real-world data show that the proposed DR model provides a more realistic estimate of end-user price-responsive behaviour when considering irrationality. Compared to a deterministic model that cannot fully take into account the irrational behaviour of end users, the chance-constrained CBS operation model yields an additional 19% revenue. Lastly, the business model reduces the electricity costs of solar end users by 11%. △ Less

Submitted 26 March, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

Comments: This manuscript has been accepted for publication in IEEE Transactions on Smart Grid

arXiv:2308.03415 [pdf, other]

End-to-End Evaluation for Low-Latency Simultaneous Speech Translation

Authors: Christian Huber, Tu Anh Dinh, Carlos Mullov, Ngoc Quan Pham, Thai Binh Nguyen, Fabian Retkowski, Stefan Constantin, Enes Yavuz Ugan, Danni Liu, Zhaolin Li, Sai Koneru, Jan Niehues, Alexander Waibel

Abstract: The challenge of low-latency speech translation has recently draw significant interest in the research community as shown by several publications and shared tasks. Therefore, it is essential to evaluate these different approaches in realistic scenarios. However, currently only specific aspects of the systems are evaluated and often it is not possible to compare different approaches. In this work… ▽ More The challenge of low-latency speech translation has recently draw significant interest in the research community as shown by several publications and shared tasks. Therefore, it is essential to evaluate these different approaches in realistic scenarios. However, currently only specific aspects of the systems are evaluated and often it is not possible to compare different approaches. In this work, we propose the first framework to perform and evaluate the various aspects of low-latency speech translation under realistic conditions. The evaluation is carried out in an end-to-end fashion. This includes the segmentation of the audio as well as the run-time of the different components. Secondly, we compare different approaches to low-latency speech translation using this framework. We evaluate models with the option to revise the output as well as methods with fixed output. Furthermore, we directly compare state-of-the-art cascaded as well as end-to-end systems. Finally, the framework allows to automatically evaluate the translation quality as well as latency and also provides a web interface to show the low-latency model outputs to the user. △ Less

Submitted 17 July, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

Comments: Demo paper at EMNLP 2023

arXiv:2306.15860 [pdf, other]

Federated Deep Reinforcement Learning-based Bitrate Adaptation for Dynamic Adaptive Streaming over HTTP

Authors: Phuong L. Vo, Nghia T. Nguyen, Long Luu, Canh T. Dinh, Nguyen H. Tran, Tuan-Anh Le

Abstract: In video streaming over HTTP, the bitrate adaptation selects the quality of video chunks depending on the current network condition. Some previous works have applied deep reinforcement learning (DRL) algorithms to determine the chunk's bitrate from the observed states to maximize the quality-of-experience (QoE). However, to build an intelligent model that can predict in various environments, such… ▽ More In video streaming over HTTP, the bitrate adaptation selects the quality of video chunks depending on the current network condition. Some previous works have applied deep reinforcement learning (DRL) algorithms to determine the chunk's bitrate from the observed states to maximize the quality-of-experience (QoE). However, to build an intelligent model that can predict in various environments, such as 3G, 4G, Wifi, \textit{etc.}, the states observed from these environments must be sent to a server for training centrally. In this work, we integrate federated learning (FL) to DRL-based rate adaptation to train a model appropriate for different environments. The clients in the proposed framework train their model locally and only update the weights to the server. The simulations show that our federated DRL-based rate adaptations, called FDRLABR with different DRL algorithms, such as deep Q-learning, advantage actor-critic, and proximal policy optimization, yield better performance than the traditional bitrate adaptation methods in various environments. △ Less

Submitted 27 June, 2023; originally announced June 2023.

Comments: 13 pages, 1 column

arXiv:2306.09754 [pdf, other]

CroCoDai: A Stablecoin for Cross-Chain Commerce

Authors: Daniël Reijsbergen, Bretislav Hajek, Tien Tuan Anh Dinh, Jussi Keppo, Hank Korth, Anwitaman Datta

Abstract: Decentralized Finance (DeFi), in which digital assets are exchanged without trusted intermediaries, has grown rapidly in value in recent years. The global DeFi ecosystem is fragmented into multiple blockchains, fueling the demand for cross-chain commerce. Existing approaches for cross-chain transactions, e.g., bridges and cross-chain deals, achieve atomicity by locking assets in escrow. However, l… ▽ More Decentralized Finance (DeFi), in which digital assets are exchanged without trusted intermediaries, has grown rapidly in value in recent years. The global DeFi ecosystem is fragmented into multiple blockchains, fueling the demand for cross-chain commerce. Existing approaches for cross-chain transactions, e.g., bridges and cross-chain deals, achieve atomicity by locking assets in escrow. However, locking up assets increases the financial risks for the participants, especially due to price fluctuations and the long latency of cross-chain transactions. Stablecoins, which are pegged to a non-volatile asset such as the US dollar, help mitigate the risk associated with price fluctuations. However, existing stablecoin designs are tied to individual blockchain platforms, and trusted parties or complex protocols are needed to exchange stablecoin tokens between blockchains. Our goal is to design a practical stablecoin for cross-chain commerce. Realizing this goal requires addressing two challenges. The first challenge is to support a large and growing number of blockchains efficiently. The second challenge is to be resilient to price fluctuations and blockchain platform failures. We present CroCoDai to address these challenges. We also present three prototype implementations of our stablecoin system, and show that it incurs small execution overhead. △ Less

Submitted 20 June, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

arXiv:2306.09735 [pdf, other]

PIEChain -- A Practical Blockchain Interoperability Framework

Authors: Daniël Reijsbergen, Aung Maw, Jingchi Zhang, Tien Tuan Anh Dinh, Anwitaman Datta

Abstract: A plethora of different blockchain platforms have emerged in recent years, but many of them operate in silos. As such, there is a need for reliable cross-chain communication to enable blockchain interoperability. Blockchain interoperability is challenging because transactions can typically not be reverted - as such, if one transaction is committed then the protocol must ensure that all related tra… ▽ More A plethora of different blockchain platforms have emerged in recent years, but many of them operate in silos. As such, there is a need for reliable cross-chain communication to enable blockchain interoperability. Blockchain interoperability is challenging because transactions can typically not be reverted - as such, if one transaction is committed then the protocol must ensure that all related transactions are committed as well. Existing interoperability approaches, e.g., Cosmos and Polkadot, are limited in the sense that they only support interoperability between their own subchains, or require intrusive changes to existing blockchains. To overcome this limitation, we propose PIEChain, a general, Kafka-based cross-chain communication framework. We utilize PIEChain for a practical case study: a cross-chain auction in which users who hold tokens on multiple chains bid for a ticket sold on another chain. PIEChain is the first publicly available, practical implementation of a general framework for cross-chain communication. △ Less

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2306.05320 [pdf, other]

KIT's Multilingual Speech Translation System for IWSLT 2023

Authors: Danni Liu, Thai Binh Nguyen, Sai Koneru, Enes Yavuz Ugan, Ngoc-Quan Pham, Tuan-Nam Nguyen, Tu Anh Dinh, Carlos Mullov, Alexander Waibel, Jan Niehues

Abstract: Many existing speech translation benchmarks focus on native-English speech in high-quality recording conditions, which often do not match the conditions in real-life use-cases. In this paper, we describe our speech translation system for the multilingual track of IWSLT 2023, which evaluates translation quality on scientific conference talks. The test condition features accented input speech and te… ▽ More Many existing speech translation benchmarks focus on native-English speech in high-quality recording conditions, which often do not match the conditions in real-life use-cases. In this paper, we describe our speech translation system for the multilingual track of IWSLT 2023, which evaluates translation quality on scientific conference talks. The test condition features accented input speech and terminology-dense contents. The task requires translation into 10 languages of varying amounts of resources. In absence of training data from the target domain, we use a retrieval-based approach (kNN-MT) for effective adaptation (+0.8 BLEU for speech translation). We also use adapters to easily integrate incremental training data from data augmentation, and show that it matches the performance of re-training. We observe that cascaded systems are more easily adaptable towards specific target domains, due to their separate modules. Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation, although their performance remains similar on TED talks. △ Less

Submitted 12 July, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: IWSLT 2023

arXiv:2306.03438 [pdf, other]

Large Language Models of Code Fail at Completing Code with Potential Bugs

Authors: Tuan Dinh, Jinman Zhao, Samson Tan, Renato Negrinho, Leonard Lausen, Sheng Zha, George Karypis

Abstract: Large language models of code (Code-LLMs) have recently brought tremendous advances to code completion, a fundamental feature of programming assistance and code intelligence. However, most existing works ignore the possible presence of bugs in the code context for generation, which are inevitable in software development. Therefore, we introduce and study the buggy-code completion problem, inspired… ▽ More Large language models of code (Code-LLMs) have recently brought tremendous advances to code completion, a fundamental feature of programming assistance and code intelligence. However, most existing works ignore the possible presence of bugs in the code context for generation, which are inevitable in software development. Therefore, we introduce and study the buggy-code completion problem, inspired by the realistic scenario of real-time code suggestion where the code context contains potential bugs -- anti-patterns that can become bugs in the completed program. To systematically study the task, we introduce two datasets: one with synthetic bugs derived from semantics-altering operator changes (buggy-HumanEval) and one with realistic bugs derived from user submissions to coding problems (buggy-FixEval). We find that the presence of potential bugs significantly degrades the generation performance of the high-performing Code-LLMs. For instance, the passing rates of CODEGEN-2B-MONO on test cases of buggy-HumanEval drop more than 50% given a single potential bug in the context. Finally, we investigate several post-hoc methods for mitigating the adverse effect of potential bugs and find that there remains a significant gap in post-mitigation performance. △ Less

Submitted 30 November, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: 27 pages, accepted to NeurIPS 2023

arXiv:2305.07457 [pdf, other]

Perturbation-based QE: An Explainable, Unsupervised Word-level Quality Estimation Method for Blackbox Machine Translation

Authors: Tu Anh Dinh, Jan Niehues

Abstract: Quality Estimation (QE) is the task of predicting the quality of Machine Translation (MT) system output, without using any gold-standard translation references. State-of-the-art QE models are supervised: they require human-labeled quality of some MT system output on some datasets for training, making them domain-dependent and MT-system-dependent. There has been research on unsupervised QE, which r… ▽ More Quality Estimation (QE) is the task of predicting the quality of Machine Translation (MT) system output, without using any gold-standard translation references. State-of-the-art QE models are supervised: they require human-labeled quality of some MT system output on some datasets for training, making them domain-dependent and MT-system-dependent. There has been research on unsupervised QE, which requires glass-box access to the MT systems, or parallel MT data to generate synthetic errors for training QE models. In this paper, we present Perturbation-based QE - a word-level Quality Estimation approach that works simply by analyzing MT system output on perturbed input source sentences. Our approach is unsupervised, explainable, and can evaluate any type of blackbox MT systems, including the currently prominent large language models (LLMs) with opaque internal processes. For language directions with no labeled QE data, our approach has similar or better performance than the zero-shot supervised approach on the WMT21 shared task. Our approach is better at detecting gender bias and word-sense-disambiguation errors in translation than supervised QE, indicating its robustness to out-of-domain usage. The performance gap is larger when detecting errors on a nontraditional translation-prompting LLM, indicating that our approach is more generalizable to different MT systems. We give examples demonstrating our approach's explainability power, where it shows which input source words have influence on a certain MT output word. △ Less

Submitted 13 July, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

Comments: Accepted to MT Summit 2023

ACM Class: I.2.7

arXiv:2305.06600 [pdf, other]

Designing Compact Repair Groups for Reed-Solomon Codes

Authors: Thi Xinh Dinh, Serdar Boztas, Son Hoang Dau, Emanuele Viterbo

Abstract: Motivated by the application of Reed-Solomon codes to recently emerging decentralized storage systems such as Storj and Filebase/Sia, we study the problem of designing compact repair groups for recovering multiple failures in a decentralized manner. Here, compactness means that the corresponding trace repair schemes of these groups of helpers can be generated from a single or a few seed repair sch… ▽ More Motivated by the application of Reed-Solomon codes to recently emerging decentralized storage systems such as Storj and Filebase/Sia, we study the problem of designing compact repair groups for recovering multiple failures in a decentralized manner. Here, compactness means that the corresponding trace repair schemes of these groups of helpers can be generated from a single or a few seed repair schemes, thus saving the time and space required for finding and storing them. The goal is to design compact repair groups that can tolerate as many failures as possible. It turns out that the maximum number of failures a collection of repair groups can tolerate equals the size of a minimum hitting set of a collection of subsets of the finite field {\mathbb{F}_{q^{\ell}}} minus one. When the repair groups for each symbol are generated from a single subspace, we establish a pair of asymptotically tight lower bound and upper bound on the size of such a minimum hitting set. Using Burnside's Lemma and the Möbius inversion formula, we determine a number of subspaces that together attain the upper bound on the minimum hitting set size when the repair groups are generated from multiple subspaces. △ Less

Submitted 11 May, 2023; originally announced May 2023.

arXiv:2305.04605 [pdf]

doi 10.1109/ICEET53442.2021.9659578

Development of a Vision System to Enhance the Reliability of the Pick-and-Place Robot for Autonomous Testing of Camera Module used in Smartphones

Authors: Hoang-Anh Phan, Duy Nam Bui, Tuan Nguyen Dinh, Bao-Anh Hoang, An Nguyen Ngoc, Dong Tran Huu Quoc, Ha Tran Thi Thuy, Tung Thanh Bui, Van Nguyen Thi Thanh

Abstract: Pick-and-place robots are commonly used in modern industrial manufacturing. For complex devices/parts like camera modules used in smartphones, which contain optical parts, electrical components and interfacing connectors, the placement operation may not absolutely accurate, which may cause damage in the device under test during the mechanical movement to make good contact for electrical functions… ▽ More Pick-and-place robots are commonly used in modern industrial manufacturing. For complex devices/parts like camera modules used in smartphones, which contain optical parts, electrical components and interfacing connectors, the placement operation may not absolutely accurate, which may cause damage in the device under test during the mechanical movement to make good contact for electrical functions inspection. In this paper, we proposed an effective vision system including hardware and algorithm to enhance the reliability of the pick-and-place robot for autonomous testing memory of camera modules. With limited hardware based on camera and raspberry PI and using simplify image processing algorithm based on histogram information, the vision system can confirm the presence of the camera modules in feeding tray and the placement accuracy of the camera module in test socket. Through that, the system can work with more flexibility and avoid damaging the device under test. The system was experimentally quantified through testing approximately 2000 camera modules in a stable light condition. Experimental results demonstrate that the system achieves accuracy of more than 99.92%. With its simplicity and effectiveness, the proposed vision system can be considered as a useful solution for using in pick-and-place systems in industry. △ Less

Submitted 8 May, 2023; originally announced May 2023.

Comments: Published to 2021 International Conference on Engineering and Emerging Technologies (ICEET 2021). 6 pages

arXiv:2304.13671 [pdf, other]

Multiobjective Logistics Optimization for Automated ATM Cash Replenishment Process

Authors: Bui Tien Thanh, Dinh Van Tuan, Tuan Anh Chi, Nguyen Van Dai, Nguyen Tai Quang Dinh, Nguyen Thu Thuy, Nguyen Thi Xuan Hoa

Abstract: In the digital transformation era, integrating digital technology into every aspect of banking operations improves process automation, cost efficiency, and service level improvement. Although logistics for ATM cash is a crucial task that impacts operating costs and consumer satisfaction, there has been little effort to enhance it. Specifically, in Vietnam, with a market of more than 20,000 ATMs na… ▽ More In the digital transformation era, integrating digital technology into every aspect of banking operations improves process automation, cost efficiency, and service level improvement. Although logistics for ATM cash is a crucial task that impacts operating costs and consumer satisfaction, there has been little effort to enhance it. Specifically, in Vietnam, with a market of more than 20,000 ATMs nationally, research and technological solutions that can resolve this issue remain scarce. In this paper, we generalized the vehicle routing problem for ATM cash replenishment, suggested a mathematical model and then offered a tool to evaluate various situations. When being evaluated on the simulated dataset, our proposed model and method produced encouraging results with the benefits of cutting ATM cash operating costs. △ Less

Submitted 22 July, 2023; v1 submitted 23 April, 2023; originally announced April 2023.

arXiv:2304.03433 [pdf, other]

Multi-User Cooperation for Covert Communication Under Quasi-Static Fading

Authors: Jinyoung Lee, Duc Trung Dinh, Hyeonsik Yeom, Si-Hyeon Lee, Jeongseok Ha

Abstract: This work studies a covert communication scheme for an uplink multi-user scenario in which some users are opportunistically selected to help a covert user. In particular, the selected users emit interfering signals via an orthogonal resource dedicated to the covert user together with signals for their own communications using orthogonal resources allocated to the selected users, which helps the co… ▽ More This work studies a covert communication scheme for an uplink multi-user scenario in which some users are opportunistically selected to help a covert user. In particular, the selected users emit interfering signals via an orthogonal resource dedicated to the covert user together with signals for their own communications using orthogonal resources allocated to the selected users, which helps the covert user hide the presence of the covert communication. For the covert communication scheme, we carry out extensive analysis and find system parameters in closed forms. The analytic derivation for the system parameters allow one to find the optimal combination of system parameters by performing a simple one-dimensional search. In addition, the analytic results elucidate relations among the system parameters. In particular, it will be proved that the optimal strategy for the non-covert users is an on-off scheme with equal transmit power. The theoretical results derived in this work are confirmed by comparing them with numerical results obtained with exhaustive searches. Finally, we demonstrate that the results of work can be utilized in versatile ways by demonstrating a design of covert communication with energy efficiency into account. △ Less

Submitted 10 April, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

Comments: 13 pages, 8 figures, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2302.11426 [pdf, other]

Mining compact high utility sequential patterns

Authors: Tai Dinh, Philippe Fournier-Viger, Huynh Van Hong

Abstract: High utility sequential pattern mining (HUSPM) aims to mine all patterns that yield a high utility (profit) in a sequence dataset. HUSPM is useful for several applications such as market basket analysis, marketing, and website clickstream analysis. In these applications, users may also consider high utility patterns frequently appearing in the dataset to obtain more fruitful information. However,… ▽ More High utility sequential pattern mining (HUSPM) aims to mine all patterns that yield a high utility (profit) in a sequence dataset. HUSPM is useful for several applications such as market basket analysis, marketing, and website clickstream analysis. In these applications, users may also consider high utility patterns frequently appearing in the dataset to obtain more fruitful information. However, this task is high computation since algorithms may generate a combinatorial explosive number of candidates that may be redundant or of low importance. To reduce complexity and obtain a compact set of frequent high utility sequential patterns (FHUSPs), this paper proposes an algorithm named CHUSP for mining closed frequent high utility sequential patterns (CHUSPs). Such patterns keep a concise representation while preserving the same expressive power of the complete set of FHUSPs. The proposed algorithm relies on a CHUS data structure to maintain information during mining. It uses three pruning strategies to eliminate early low-utility and non-frequent patterns, thereby reducing the search space. An extensive experimental evaluation was performed on six real-life datasets to evaluate the performance of CHUSP in terms of execution time, memory usage, and the number of generated patterns. Experimental results show that CHUSP can efficiently discover the compact set of CHUSPs under different user-defined thresholds. △ Less

Submitted 22 February, 2023; originally announced February 2023.

Comments: Nippon (Japan) Applied Informatics Society Journal

arXiv:2302.05512 [pdf]

Composable Ledgers for Distributed Synchronic Web Archiving

Authors: Thien-Nam Dinh, Nicholas Pattengale

Abstract: The Synchronic Web is a highly scalable notary infrastructure that provides tamper-evident data provenance for historical web data. In this document, we describe the applicability of this infrastructure for web archiving across three envisioned stages of adoption. We codify the core mechanism enabling the value proposition: a procedure for splitting and merging cryptographic information fluidly ac… ▽ More The Synchronic Web is a highly scalable notary infrastructure that provides tamper-evident data provenance for historical web data. In this document, we describe the applicability of this infrastructure for web archiving across three envisioned stages of adoption. We codify the core mechanism enabling the value proposition: a procedure for splitting and merging cryptographic information fluidly across blockchain-backed ledgers. Finally, we present preliminary performance results that indicate the feasibility of our approach for modern web archiving scales. △ Less

Submitted 10 February, 2023; originally announced February 2023.

arXiv:2301.10733 [pdf]

The Synchronic Web

Authors: Thien-Nam Dinh, Nicholas Pattengale, Steven Elliott

Abstract: The Synchronic Web is a distributed network for securing data provenance on the World Wide Web. By enabling clients around the world to freely commit digital information into a single shared view of history, it provides a foundational basis of truth on which to build decentralized and scalable trust across the Internet. Its core cryptographical capability allows mutually distrusting parties to cre… ▽ More The Synchronic Web is a distributed network for securing data provenance on the World Wide Web. By enabling clients around the world to freely commit digital information into a single shared view of history, it provides a foundational basis of truth on which to build decentralized and scalable trust across the Internet. Its core cryptographical capability allows mutually distrusting parties to create and verify statements of the following form: "I commit to this information--and only this information--at this moment in time." The backbone of the Synchronic Web infrastructure is a simple, small, and semantic-free blockchain that is accessible to any Internet-enabled entity. The infrastructure is maintained by a permissioned network of well-known servers, called notaries, and accessed by a permissionless group of clients, called ledgers. Through an evolving stack of flexible and composable semantic specifications, the parties cooperate to generate synchronic commitments over arbitrary data. When integrated with existing infrastructures, adapted to diverse domains, and scaled across the breadth of cyberspace, the Synchronic Web provides a ubiquitous mechanism to lock the world's data into unique points in discrete time and digital space. △ Less

Submitted 10 June, 2024; v1 submitted 25 January, 2023; originally announced January 2023.

arXiv:2212.10723 [pdf, other]

Comparison and Evaluation of Methods for a Predict+Optimize Problem in Renewable Energy

Authors: Christoph Bergmeir, Frits de Nijs, Abishek Sriramulu, Mahdi Abolghasemi, Richard Bean, John Betts, Quang Bui, Nam Trong Dinh, Nils Einecke, Rasul Esmaeilbeigi, Scott Ferraro, Priya Galketiya, Evgenii Genov, Robert Glasgow, Rakshitha Godahewa, Yanfei Kang, Steffen Limmer, Luis Magdalena, Pablo Montero-Manso, Daniel Peralta, Yogesh Pipada Sunil Kumar, Alejandro Rosales-Pérez, Julian Ruddick, Akylas Stratigakos, Peter Stuckey , et al. (3 additional authors not shown)

Abstract: Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as in supply chains (inventory optimization), traffic, and in the transition towards carbon-free energy generation in battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends o… ▽ More Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as in supply chains (inventory optimization), traffic, and in the transition towards carbon-free energy generation in battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively few research has been done in this area. This paper presents the findings of the ``IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling," held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim to foster and facilitate research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It then focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that lead to the lowest cost of energy. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method. △ Less

Submitted 20 December, 2022; originally announced December 2022.

arXiv:2212.00981 [pdf, other]

QC-StyleGAN -- Quality Controllable Image Generation and Manipulation

Authors: Dat Viet Thanh Nguyen, Phong Tran The, Tan M. Dinh, Cuong Pham, Anh Tuan Tran

Abstract: The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN stru… ▽ More The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN structure that allows for generating images with controllable quality. The network can synthesize various image degradation and restore the sharp image via a quality control code. Our proposed QC-StyleGAN can directly edit LQ images without altering their quality by applying GAN inversion and manipulation techniques. It also provides for free an image restoration solution that can handle various degradations, including noise, blur, compression artifacts, and their mixtures. Finally, we demonstrate numerous other applications such as image degradation synthesis, transfer, and interpolation. The code is available at https://github.com/VinAIResearch/QC-StyleGAN. △ Less

Submitted 7 December, 2022; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: Accepted to NeurIPS 2022; The code is available at https://github.com/VinAIResearch/QC-StyleGAN

arXiv:2211.11001 [pdf]

F2SD: A dataset for end-to-end group detection algorithms

Authors: Giang Hoang, Tuan Nguyen Dinh, Tung Cao Hoang, Son Le Duy, Keisuke Hihara, Yumeka Utada, Akihiko Torii, Naoki Izumi, Long Tran Quoc

Abstract: The lack of large-scale datasets has been impeding the advance of deep learning approaches to the problem of F-formation detection. Moreover, most research works on this problem rely on input sensor signals of object location and orientation rather than image signals. To address this, we develop a new, large-scale dataset of simulated images for F-formation detection, called F-formation Simulation… ▽ More The lack of large-scale datasets has been impeding the advance of deep learning approaches to the problem of F-formation detection. Moreover, most research works on this problem rely on input sensor signals of object location and orientation rather than image signals. To address this, we develop a new, large-scale dataset of simulated images for F-formation detection, called F-formation Simulation Dataset (F2SD). F2SD contains nearly 60,000 images simulated from GTA-5, with bounding boxes and orientation information on images, making it useful for a wide variety of modelling approaches. It is also closer to practical scenarios, where three-dimensional location and orientation information are costly to record. It is challenging to construct such a large-scale simulated dataset while keeping it realistic. Furthermore, the available research utilizes conventional methods to detect groups. They do not detect groups directly from the image. In this work, we propose (1) a large-scale simulation dataset F2SD and a pipeline for F-formation simulation, (2) a first-ever end-to-end baseline model for the task, and experiments on our simulation dataset. △ Less

Submitted 20 November, 2022; originally announced November 2022.

Comments: Accepted at ICMV 2022

arXiv:2210.12990 [pdf, other]

Optimal activity and battery scheduling algorithm using load and solar generation forecasts

Authors: Yogesh Pipada Sunil Kumar, Rui Yuan, Nam Trong Dinh, S. Ali Pourmousavi

Abstract: Energy usage optimal scheduling has attracted great attention in the power system community, where various methodologies have been proposed. However, in real-world applications, the optimal scheduling problems require reliable energy forecasting, which is scarcely discussed as a joint solution to the scheduling problem. The 5\textsuperscript{th} IEEE Computational Intelligence Society (IEEE-CIS) c… ▽ More Energy usage optimal scheduling has attracted great attention in the power system community, where various methodologies have been proposed. However, in real-world applications, the optimal scheduling problems require reliable energy forecasting, which is scarcely discussed as a joint solution to the scheduling problem. The 5\textsuperscript{th} IEEE Computational Intelligence Society (IEEE-CIS) competition raised a practical problem of decreasing the electricity bill by scheduling building activities, where forecasting the solar energy generation and building consumption is a necessity. To solve this problem, we propose a technical sequence for tackling the solar PV and demand forecast and optimal scheduling problems, where solar generation prediction methods and an optimal university lectures scheduling algorithm are proposed. △ Less

Submitted 24 October, 2022; originally announced October 2022.

Comments: 6 pages, 4 figures, 3 tables. Accepted for IEEE proceedings as a conference paper for AUPEC 2022

arXiv:2210.11702 [pdf, other]

TAP: Transparent and Privacy-Preserving Data Services

Authors: Daniel Reijsbergen, Aung Maw, Zheng Yang, Tien Tuan Anh Dinh, Jianying Zhou

Abstract: Users today expect more security from services that handle their data. In addition to traditional data privacy and integrity requirements, they expect transparency, i.e., that the service's processing of the data is verifiable by users and trusted auditors. Our goal is to build a multi-user system that provides data privacy, integrity, and transparency for a large number of operations, while achie… ▽ More Users today expect more security from services that handle their data. In addition to traditional data privacy and integrity requirements, they expect transparency, i.e., that the service's processing of the data is verifiable by users and trusted auditors. Our goal is to build a multi-user system that provides data privacy, integrity, and transparency for a large number of operations, while achieving practical performance. To this end, we first identify the limitations of existing approaches that use authenticated data structures. We find that they fall into two categories: 1) those that hide each user's data from other users, but have a limited range of verifiable operations (e.g., CONIKS, Merkle2, and Proofs of Liabilities), and 2) those that support a wide range of verifiable operations, but make all data publicly visible (e.g., IntegriDB and FalconDB). We then present TAP to address the above limitations. The key component of TAP is a novel tree data structure that supports efficient result verification, and relies on independent audits that use zero-knowledge range proofs to show that the tree is constructed correctly without revealing user data. TAP supports a broad range of verifiable operations, including quantiles and sample standard deviations. We conduct a comprehensive evaluation of TAP, and compare it against two state-of-the-art baselines, namely IntegriDB and Merkle2, showing that the system is practical at scale. △ Less

Submitted 20 October, 2022; originally announced October 2022.

Comments: Accepted for USENIX Security 2023

arXiv:2210.06042 [pdf, ps, other]

Efficient Hamiltonian Reduction for Quantum Annealing on SatCom Beam Placement Problem

Authors: Thinh Q. Dinh, Son Hoang Dau, Eva Lagunas, Symeon Chatzinotas

Abstract: Beam Placement (BP) is a well-known problem in Low-Earth Orbit (LEO) satellite communication (SatCom) systems, which can be modelled as an NP-hard clique cover problem. Recently, quantum computing has emerged as a novel technology which revolutionizes how to solve challenging optimization problems by formulating Quadratic Unconstrained Binary Optimization (QUBO), then preparing Hamiltonians as inp… ▽ More Beam Placement (BP) is a well-known problem in Low-Earth Orbit (LEO) satellite communication (SatCom) systems, which can be modelled as an NP-hard clique cover problem. Recently, quantum computing has emerged as a novel technology which revolutionizes how to solve challenging optimization problems by formulating Quadratic Unconstrained Binary Optimization (QUBO), then preparing Hamiltonians as inputs for quantum computers. In this paper, we study how to use quantum computing to solve BP problems. However, due to limited hardware resources, existing quantum computers are unable to tackle large optimization spaces. Therefore, we propose an efficient Hamiltonian Reduction method that allows quantum processors to solve large BP instances encountered in LEO systems. We conduct our simulations on real quantum computers (D-Wave Advantage) using a real dataset of vessel locations in the US. Numerical results show that our algorithm outperforms commercialized solutions of D-Wave by allowing existing quantum annealers to solve 17.5 times larger BP instances while maintaining high solution quality. Although quantum computing cannot theoretically overcome the hardness of BP problems, this work contributes early efforts to applying quantum computing in satellite optimization problems, especially applications formulated as clique cover/graph coloring problems. △ Less

Submitted 12 October, 2022; originally announced October 2022.

arXiv:2208.04613 [pdf]

Res-Dense Net for 3D Covid Chest CT-scan classification

Authors: Quoc-Huy Trinh, Minh-Van Nguyen, Thien-Phuc Nguyen Dinh

Abstract: One of the most contentious areas of research in Medical Image Preprocessing is 3D CT-scan. With the rapid spread of COVID-19, the function of CT-scan in properly and swiftly diagnosing the disease has become critical. It has a positive impact on infection prevention. There are many tasks to diagnose the illness through CT-scan images, include COVID-19. In this paper, we propose a method that usin… ▽ More One of the most contentious areas of research in Medical Image Preprocessing is 3D CT-scan. With the rapid spread of COVID-19, the function of CT-scan in properly and swiftly diagnosing the disease has become critical. It has a positive impact on infection prevention. There are many tasks to diagnose the illness through CT-scan images, include COVID-19. In this paper, we propose a method that using a Stacking Deep Neural Network to detect the Covid 19 through the series of 3D CT-scans images . In our method, we experiment with two backbones are DenseNet 121 and ResNet 101. This method achieves a competitive performance on some evaluation metrics △ Less

Submitted 9 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2106.07524 by other authors

arXiv:2208.04609 [pdf, other]

E2EG: End-to-End Node Classification Using Graph Topology and Text-based Node Attributes

Authors: Tu Anh Dinh, Jeroen den Boef, Joran Cornelisse, Paul Groth

Abstract: Node classification utilizing text-based node attributes has many real-world applications, ranging from prediction of paper topics in academic citation graphs to classification of user characteristics in social media networks. State-of-the-art node classification frameworks, such as GIANT, use a two-stage pipeline: first embedding the text attributes of graph nodes then feeding the resulting embed… ▽ More Node classification utilizing text-based node attributes has many real-world applications, ranging from prediction of paper topics in academic citation graphs to classification of user characteristics in social media networks. State-of-the-art node classification frameworks, such as GIANT, use a two-stage pipeline: first embedding the text attributes of graph nodes then feeding the resulting embeddings into a node classification model. In this paper, we eliminate these two stages and develop an end-to-end node classification model that builds upon GIANT, called End-to-End-GIANT (E2EG). The tandem utilization of a main and an auxiliary classification objectives in our approach results in a more robust model, enabling the BERT backbone to be switched out for a distilled encoder with a 25% - 40% reduction in the number of parameters. Moreover, the model's end-to-end nature increases ease of use, as it avoids the need of chaining multiple models for node classification. Compared to a GIANT+MLP baseline on the ogbn-arxiv and ogbn-products datasets, E2EG obtains slightly better accuracy in the transductive setting (+0.5%), while reducing model training time by up to 40%. Our model is also applicable in the inductive setting, outperforming GIANT+MLP by up to +2.23%. △ Less

Submitted 26 September, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

Comments: Accepted to MLoG - IEEE International Conference on Data Mining Workshops ICDMW 2023

arXiv:2207.00944 [pdf, other]

GlassDB: An Efficient Verifiable Ledger Database System Through Transparency

Authors: Cong Yue, Tien Tuan Anh Dinh, Zhongle Xie, Meihui Zhang, Gang Chen, Beng Chin Ooi, Xiaokui Xiao

Abstract: Verifiable ledger databases protect data history against malicious tampering. Existing systems, such as blockchains and certificate transparency, are based on transparency logs -- a simple abstraction allowing users to verify that a log maintained by an untrusted server is append-only. They expose a simple key-value interface. Building a practical database from transparency logs, on the other hand… ▽ More Verifiable ledger databases protect data history against malicious tampering. Existing systems, such as blockchains and certificate transparency, are based on transparency logs -- a simple abstraction allowing users to verify that a log maintained by an untrusted server is append-only. They expose a simple key-value interface. Building a practical database from transparency logs, on the other hand, remains a challenge. In this paper, we explore the design space of verifiable ledger databases along three dimensions: abstraction, threat model, and performance. We survey existing systems and identify their two limitations, namely, the lack of transaction support and the inferior efficiency. We then present GlassDB, a distributed database that addresses these limitations under a practical threat model. GlassDB inherits the verifiability of transparency logs, but supports transactions and offers high performance. It extends a ledger-like key-value store with a data structure for efficient proofs, and adds a concurrency control mechanism for transactions. GlassDB batches independent operations from concurrent transactions when updating the core data structures. In addition, we design a new benchmark for evaluating verifiable ledger databases, by extending YCSB and TPC-C benchmarks. Using this benchmark, we compare GlassDB against four baselines: reimplemented versions of three verifiable databases, and a verifiable map backed by a transparency log. Experimental results demonstrate that GlassDB is an efficient, transactional, and verifiable ledger database. △ Less

Submitted 19 February, 2023; v1 submitted 2 July, 2022; originally announced July 2022.

arXiv:2206.06565 [pdf, other]

LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks

Authors: Tuan Dinh, Yuchen Zeng, Ruisu Zhang, Ziqian Lin, Michael Gira, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee

Abstract: Fine-tuning pretrained language models (LMs) without making any architectural changes has become a norm for learning various language downstream tasks. However, for non-language downstream tasks, a common practice is to employ task-specific designs for input, output layers, and loss functions. For instance, it is possible to fine-tune an LM into an MNIST classifier by replacing the word embedding… ▽ More Fine-tuning pretrained language models (LMs) without making any architectural changes has become a norm for learning various language downstream tasks. However, for non-language downstream tasks, a common practice is to employ task-specific designs for input, output layers, and loss functions. For instance, it is possible to fine-tune an LM into an MNIST classifier by replacing the word embedding layer with an image patch embedding layer, the word token output layer with a 10-way output layer, and the word prediction loss with a 10-way classification loss, respectively. A natural question arises: Can LM fine-tuning solve non-language downstream tasks without changing the model architecture or loss function? To answer this, we propose Language-Interfaced Fine-Tuning (LIFT) and study its efficacy and limitations by conducting an extensive empirical study on a suite of non-language classification and regression tasks. LIFT does not make any changes to the model architecture or loss function, and it solely relies on the natural language interface, enabling "no-code machine learning with LMs." We find that LIFT performs comparably well across a wide range of low-dimensional classification and regression tasks, matching the performances of the best baselines in many cases, especially for the classification tasks. We also report experimental results on the fundamental properties of LIFT, including inductive bias, robustness, and sample complexity. We also analyze the effect of pretraining on LIFT and a few properties/techniques specific to LIFT, e.g., context-aware learning via appropriate prompting, calibrated predictions, data generation, and two-stage fine-tuning. Our code is available at https://github.com/UW-Madison-Lee-Lab/LanguageInterfacedFineTuning. △ Less

Submitted 30 October, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2206.01432 [pdf, other]

On the Generalization of Wasserstein Robust Federated Learning

Authors: Tung-Anh Nguyen, Tuan Dung Nguyen, Long Tan Le, Canh T. Dinh, Nguyen H. Tran

Abstract: In federated learning, participating clients typically possess non-i.i.d. data, posing a significant challenge to generalization to unseen distributions. To address this, we propose a Wasserstein distributionally robust optimization scheme called WAFL. Leveraging its duality, we frame WAFL as an empirical surrogate risk minimization problem, and solve it using a local SGD-based algorithm with conv… ▽ More In federated learning, participating clients typically possess non-i.i.d. data, posing a significant challenge to generalization to unseen distributions. To address this, we propose a Wasserstein distributionally robust optimization scheme called WAFL. Leveraging its duality, we frame WAFL as an empirical surrogate risk minimization problem, and solve it using a local SGD-based algorithm with convergence guarantees. We show that the robustness of WAFL is more general than related approaches, and the generalization bound is robust to all adversarial distributions inside the Wasserstein ball (ambiguity set). Since the center location and radius of the Wasserstein ball can be suitably modified, WAFL shows its applicability not only in robustness but also in domain adaptation. Through empirical evaluation, we demonstrate that WAFL generalizes better than the vanilla FedAvg in non-i.i.d. settings, and is more robust than other related methods in distribution shift settings. Further, using benchmark datasets we show that WAFL is capable of generalizing to unseen target domains. △ Less

Submitted 3 June, 2022; originally announced June 2022.

arXiv:2205.11616 [pdf, other]

Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment

Authors: Tuan Dinh, Jy-yong Sohn, Shashank Rajput, Timothy Ossowski, Yifei Ming, Junjie Hu, Dimitris Papailiopoulos, Kangwook Lee

Abstract: Word translation without parallel corpora has become feasible, rivaling the performance of supervised methods. Recent findings have shown that the accuracy and robustness of unsupervised word translation (UWT) can be improved by making use of visual observations, which are universal representations across languages. In this work, we investigate the potential of using not only visual observations b… ▽ More Word translation without parallel corpora has become feasible, rivaling the performance of supervised methods. Recent findings have shown that the accuracy and robustness of unsupervised word translation (UWT) can be improved by making use of visual observations, which are universal representations across languages. In this work, we investigate the potential of using not only visual observations but also pretrained language-image models for enabling a more efficient and robust UWT. Specifically, we develop a novel UWT method dubbed Word Alignment using Language-Image Pretraining (WALIP), which leverages visual observations via the shared embedding space of images and texts provided by CLIP models (Radford et al., 2021). WALIP has a two-step procedure. First, we retrieve word pairs with high confidences of similarity, computed using our proposed image-based fingerprints, which define the initial pivot for the word alignment. Second, we apply our robust Procrustes algorithm to estimate the linear mapping between two embedding spaces, which iteratively corrects and refines the estimated alignment. Our extensive experiments show that WALIP improves upon the state-of-the-art performance of bilingual word alignment for a few language pairs across different word embeddings and displays great robustness to the dissimilarity of language pairs or training corpora for two word embeddings. △ Less

Submitted 7 November, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings)

arXiv:2205.11469 [pdf]

Advanced Transient Diagnostic with Ensemble Digital Twin Modeling

Authors: Edward Chen, Linyu Lin, Nam T. Dinh

Abstract: The use of machine learning (ML) model as digital-twins for reduced-order-modeling (ROM) in lieu of system codes has grown traction over the past few years. However, due to the complex and non-linear nature of nuclear reactor transients as well as the large range of tasks required, it is infeasible for a single ML model to generalize across all tasks. In this paper, we incorporate issue specific d… ▽ More The use of machine learning (ML) model as digital-twins for reduced-order-modeling (ROM) in lieu of system codes has grown traction over the past few years. However, due to the complex and non-linear nature of nuclear reactor transients as well as the large range of tasks required, it is infeasible for a single ML model to generalize across all tasks. In this paper, we incorporate issue specific digital-twin ML models with ensembles to enhance the prediction outcome. The ensemble also utilizes an indirect probabilistic tracking method of surrogate state variables to produce accurate predictions of unobservable safety goals. The unique method named Ensemble Diagnostic Digital-twin Modeling (EDDM) can select not only the most appropriate predictions from the incorporated diagnostic digital-twin models but can also reduce generalization error associated with training as opposed to single models. △ Less

Submitted 23 May, 2022; originally announced May 2022.

Comments: 9 pages, 4 figures, 3 tables, presented in the American Nuclear Society Mathematics and Computation 2021 Annual Conference

MSC Class: 93C57

arXiv:2205.11015 [pdf, other]

Practical Considerations in Repairing Reed-Solomon Codes

Authors: Thi Xinh Dinh, Luu Y Nhi Nguyen, Lakshmi J. Mohan, Serdar Boztas, Tran Thi Luong, Son Hoang Dau

Abstract: The issue of repairing Reed-Solomon codes currently employed in industry has been sporadically discussed in the literature. In this work we carry out a systematic study of these codes and investigate important aspects of repairing them under the trace repair framework, including which evaluation points to select and how to implement a trace repair scheme efficiently. In particular, we employ diffe… ▽ More The issue of repairing Reed-Solomon codes currently employed in industry has been sporadically discussed in the literature. In this work we carry out a systematic study of these codes and investigate important aspects of repairing them under the trace repair framework, including which evaluation points to select and how to implement a trace repair scheme efficiently. In particular, we employ different heuristic algorithms to search for low-bandwidth repair schemes for codes of short lengths with typical redundancies and establish three tables of current best repair schemes for $[n, k]$ Reed-Solomon codes over GF(256) with $4 \leq n \leq 16$ and $r = n - k \in \{2,3,4\}$. The tables cover most known codes currently used in the distributed storage industry. △ Less

Submitted 22 May, 2022; originally announced May 2022.

Comments: 6 pages, accepted to the IEEE International Symposium on Information Theory

MSC Class: 94B05; 94B60 ACM Class: E.4

arXiv:2205.06941 [pdf, ps, other]

Blockchain Goes Green? Part II: Characterizing the Performance and Cost of Blockchains on the Cloud and at the Edge

Authors: Dumitrel Loghin, Tien Tuan Anh Dinh, Aung Maw, Chen Gang, Yong Meng Teo, Beng Chin Ooi

Abstract: While state-of-the-art permissioned blockchains can achieve thousands of transactions per second on commodity hardware with x86/64 architecture, their performance when running on different architectures is not clear. The goal of this work is to characterize the performance and cost of permissioned blockchains on different hardware systems, which is important as diverse application domains are adop… ▽ More While state-of-the-art permissioned blockchains can achieve thousands of transactions per second on commodity hardware with x86/64 architecture, their performance when running on different architectures is not clear. The goal of this work is to characterize the performance and cost of permissioned blockchains on different hardware systems, which is important as diverse application domains are adopting t. To this end, we conduct extensive cost and performance evaluation of two permissioned blockchains, namely Hyperledger Fabric and ConsenSys Quorum, on five different types of hardware covering both x86/64 and ARM architecture, as well as, both cloud and edge computing. The hardware nodes include servers with Intel Xeon CPU, servers with ARM-based Amazon Graviton CPU, and edge devices with ARM-based CPU. Our results reveal a diverse profile of the two blockchains across different settings, demonstrating the impact of hardware choices on the overall performance and cost. We find that Graviton servers outperform Xeon servers in many settings, due to their powerful CPU and high memory bandwidth. Edge devices with ARM architecture, on the other hand, exhibit low performance. When comparing the cloud with the edge, we show that the cost of the latter is much smaller in the long run if manpower cost is not considered. △ Less

Submitted 13 May, 2022; originally announced May 2022.

Comments: 13 pages, 10 figures, 3 tables

arXiv:2205.05611 [pdf, other]

Blockchain-based Secure Client Selection in Federated Learning

Authors: Truc Nguyen, Phuc Thai, Tre' R. Jeter, Thang N. Dinh, My T. Thai

Abstract: Despite the great potential of Federated Learning (FL) in large-scale distributed learning, the current system is still subject to several privacy issues due to the fact that local models trained by clients are exposed to the central server. Consequently, secure aggregation protocols for FL have been developed to conceal the local models from the server. However, we show that, by manipulating the… ▽ More Despite the great potential of Federated Learning (FL) in large-scale distributed learning, the current system is still subject to several privacy issues due to the fact that local models trained by clients are exposed to the central server. Consequently, secure aggregation protocols for FL have been developed to conceal the local models from the server. However, we show that, by manipulating the client selection process, the server can circumvent the secure aggregation to learn the local models of a victim client, indicating that secure aggregation alone is inadequate for privacy protection. To tackle this issue, we leverage blockchain technology to propose a verifiable client selection protocol. Owing to the immutability and transparency of blockchain, our proposed protocol enforces a random selection of clients, making the server unable to control the selection process at its discretion. We present security proofs showing that our protocol is secure against this attack. Additionally, we conduct several experiments on an Ethereum-like blockchain to demonstrate the feasibility and practicality of our solution. △ Less

Submitted 11 May, 2022; originally announced May 2022.

Comments: IEEE ICBC 2022

arXiv:2205.05577 [pdf, ps, other]

Channel Estimation in RIS-assisted Downlink Massive MIMO: A Learning-Based Approach

Authors: Tung T. Vu, Trinh Van Chien, Canh T. Dinh, Hien Quoc Ngo, Michail Matthaiou

Abstract: For downlink massive multiple-input multiple-output (MIMO) operating in time-division duplex protocol, users can decode the signals effectively by only utilizing the channel statistics as long as channel hardening holds. However, in a reconfigurable intelligent surface (RIS)-assisted massive MIMO system, the propagation channels may be less hardened due to the extra random fluctuations of the effe… ▽ More For downlink massive multiple-input multiple-output (MIMO) operating in time-division duplex protocol, users can decode the signals effectively by only utilizing the channel statistics as long as channel hardening holds. However, in a reconfigurable intelligent surface (RIS)-assisted massive MIMO system, the propagation channels may be less hardened due to the extra random fluctuations of the effective channel gains. To address this issue, we propose a learning-based method that trains a neural network to learn a mapping between the received downlink signal and the effective channel gains. The proposed method does not require any downlink pilots and statistical information of interfering users. Numerical results show that, in terms of mean-square error of the channel estimation, our proposed learning-based method outperforms the state-of-the-art methods, especially when the light-of-sight (LoS) paths are dominated by non-LoS paths with a low level of channel hardening, e.g., in the cases of small numbers of RIS elements and/or base station antennas. △ Less

Submitted 15 May, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

Comments: accepted to appear in IEEE SPAWC'22, Oulu, Finland

arXiv:2205.05004 [pdf, other]

FastHare: Fast Hamiltonian Reduction for Large-scale Quantum Annealing

Authors: Phuc Thai, My T. Thai, Tam Vu, Thang N. Dinh

Abstract: Quantum annealing (QA) that encodes optimization problems into Hamiltonians remains the only near-term quantum computing paradigm that provides sufficient many qubits for real-world applications. To fit larger optimization instances on existing quantum annealers, reducing Hamiltonians into smaller equivalent Hamiltonians provides a promising approach. Unfortunately, existing reduction techniques a… ▽ More Quantum annealing (QA) that encodes optimization problems into Hamiltonians remains the only near-term quantum computing paradigm that provides sufficient many qubits for real-world applications. To fit larger optimization instances on existing quantum annealers, reducing Hamiltonians into smaller equivalent Hamiltonians provides a promising approach. Unfortunately, existing reduction techniques are either computationally expensive or ineffective in practice. To this end, we introduce a novel notion of non-separable~group, defined as a subset of qubits in a Hamiltonian that obtains the same value in optimal solutions. We develop a theoretical framework on non-separability accordingly and propose FastHare, a highly efficient reduction method. FastHare, iteratively, detects and merges non-separable groups into single qubits. It does so within a provable worst-case time complexity of only $O(αあるふぁn^2)$, for some user-defined parameter $αあるふぁ$. Our extensive benchmarks for the feasibility of the reduction are done on both synthetic Hamiltonians and 3000+ instances from the MQLIB library. The results show FastHare outperforms the roof duality, the implemented reduction method in D-Wave's SDK library, with 3.6x higher average reduction ratio. It demonstrates a high level of effectiveness with an average of 62% qubits saving and 0.3s processing time, advocating for Hamiltonian reduction as an inexpensive necessity for QA. △ Less

Submitted 10 May, 2022; originally announced May 2022.

arXiv:2205.00185 [pdf, other]

Protecting the Integrity of IoT Sensor Data and Firmware With A Feather-Light Blockchain Infrastructure

Authors: Daniel Reijsbergen, Aung Maw, Sarad Venugopalan, Dianshi Yang, Tien Tuan Anh Dinh, Jianying Zhou

Abstract: Smart cities deploy large numbers of sensors and collect a tremendous amount of data from them. For example, Advanced Metering Infrastructures (AMIs), which consist of physical meters that collect usage data about public utilities such as power and water, are an important building block in a smart city. In a typical sensor network, the measurement devices are connected through a computer network,… ▽ More Smart cities deploy large numbers of sensors and collect a tremendous amount of data from them. For example, Advanced Metering Infrastructures (AMIs), which consist of physical meters that collect usage data about public utilities such as power and water, are an important building block in a smart city. In a typical sensor network, the measurement devices are connected through a computer network, which exposes them to cyber attacks. Furthermore, the data is centrally managed at the operator's servers, making it vulnerable to insider threats. Our goal is to protect the integrity of data collected by large-scale sensor networks and the firmware in measurement devices from cyber attacks and insider threats. To this end, we first develop a comprehensive threat model for attacks against data and firmware integrity, which can target any of the stakeholders in the operation of the sensor network. Next, we use our threat model to analyze existing defense mechanisms, including signature checks, remote firmware attestation, anomaly detection, and blockchain-based secure logs. However, the large size of the Trusted Computing Base and a lack of scalability limit the applicability of these existing mechanisms. We propose the Feather-Light Blockchain Infrastructure (FLBI) framework to address these limitations. Our framework leverages a two-layer architecture and cryptographic threshold signature chains to support large networks of low-capacity devices such as meters and data aggregators. We have fully implemented the FLBI's end-to-end functionality on the Hyperledger Fabric and private Ethereum blockchain platforms. Our experiments show that the FLBI is able to support millions of end devices. △ Less

Submitted 30 April, 2022; originally announced May 2022.

arXiv:2203.01746 [pdf, other]

SaPHyRa: A Learning Theory Approach to Ranking Nodes in Large Networks

Authors: Phuc Thai, My T. Thai, Tam Vu, Thang N. Dinh

Abstract: Ranking nodes based on their centrality stands a fundamental, yet, challenging problem in large-scale networks. Approximate methods can quickly estimate nodes' centrality and identify the most central nodes, but the ranking for the majority of remaining nodes may be meaningless. For example, ranking for less-known websites in search queries is known to be noisy and unstable. To this end, we invest… ▽ More Ranking nodes based on their centrality stands a fundamental, yet, challenging problem in large-scale networks. Approximate methods can quickly estimate nodes' centrality and identify the most central nodes, but the ranking for the majority of remaining nodes may be meaningless. For example, ranking for less-known websites in search queries is known to be noisy and unstable. To this end, we investigate a new node ranking problem with two important distinctions: a) ranking quality, rather than the centrality estimation quality, as the primary objective; and b) ranking only nodes of interest, e.g., websites that matched search criteria. We propose Sample space Partitioning Hypothesis Ranking, or SaPHyRa, that transforms node ranking into a hypothesis ranking in machine learning. This transformation maps nodes' centrality to the expected risks of hypotheses, opening doors for theoretical machine learning (ML) tools. The key of SaPHyRa is to partition the sample space into exact and approximate subspaces. The exact subspace contains samples related to the nodes of interest, increasing both estimation and ranking qualities. The approximate space can be efficiently sampled with ML-based techniques to provide theoretical guarantees on the estimation error. Lastly, we present SaPHyRa_bc, an illustration of SaPHyRa on ranking nodes' betweenness centrality (BC). By combining a novel bi-component sampling, a 2-hop sample partitioning, and improved bounds on the Vapnik-Chervonenkis dimension, SaPHyRa_bc can effectively rank any node subset in BC. Its performance is up to 200x faster than state-of-the-art methods in approximating BC, while its rank correlation to the ground truth is improved by multifold. △ Less

Submitted 3 March, 2022; originally announced March 2022.

Comments: To appear in IEEE ICDE'22

arXiv:2202.04345 [pdf, other]

Securing Smart Grids Through an Incentive Mechanism for Blockchain-Based Data Sharing

Authors: Daniel Reijsbergen, Aung Maw, Tien Tuan Anh Dinh, Wen-Tai Li, Chau Yuen

Abstract: Smart grids leverage the data collected from smart meters to make important operational decisions. However, they are vulnerable to False Data Injection (FDI) attacks in which an attacker manipulates meter data to disrupt the grid operations. Existing works on FDI are based on a simple threat model in which a single grid operator has access to all the data, and only some meters can be compromised.… ▽ More Smart grids leverage the data collected from smart meters to make important operational decisions. However, they are vulnerable to False Data Injection (FDI) attacks in which an attacker manipulates meter data to disrupt the grid operations. Existing works on FDI are based on a simple threat model in which a single grid operator has access to all the data, and only some meters can be compromised. Our goal is to secure smart grids against FDI under a realistic threat model. To this end, we present a threat model in which there are multiple operators, each with a partial view of the grid, and each can be fully compromised. An effective defense against FDI in this setting is to share data between the operators. However, the main challenge here is to incentivize data sharing. We address this by proposing an incentive mechanism that rewards operators for uploading data, but penalizes them if the data is missing or anomalous. We derive formal conditions under which our incentive mechanism is provably secure against operators who withhold or distort measurement data for profit. We then implement the data sharing solution on a private blockchain, introducing several optimizations that overcome the inherent performance limitations of the blockchain. Finally, we conduct an experimental evaluation that demonstrates that our implementation has practical performance. △ Less

Submitted 9 February, 2022; originally announced February 2022.

arXiv:2201.11172 [pdf, other]

doi 10.1109/ICASSP43922.2022.9746815

Tackling data scarcity in speech translation using zero-shot multilingual machine translation techniques

Authors: Tu Anh Dinh, Danni Liu, Jan Niehues

Abstract: Recently, end-to-end speech translation (ST) has gained significant attention as it avoids error propagation. However, the approach suffers from data scarcity. It heavily depends on direct ST data and is less efficient in making use of speech transcription and text translation data, which is often more easily available. In the related field of multilingual text translation, several techniques have… ▽ More Recently, end-to-end speech translation (ST) has gained significant attention as it avoids error propagation. However, the approach suffers from data scarcity. It heavily depends on direct ST data and is less efficient in making use of speech transcription and text translation data, which is often more easily available. In the related field of multilingual text translation, several techniques have been proposed for zero-shot translation. A main idea is to increase the similarity of semantically similar sentences in different languages. We investigate whether these ideas can be applied to speech translation, by building ST models trained on speech transcription and text translation data. We investigate the effects of data augmentation and auxiliary loss function. The techniques were successfully applied to few-shot ST using limited ST data, with improvements of up to +12.9 BLEU points compared to direct end-to-end ST and +3.1 BLEU points compared to ST models fine-tuned from ASR model. △ Less

Submitted 26 January, 2022; originally announced January 2022.

Comments: 6 pages, 5 figures, accepted to IEEE ICASSP 2022. arXiv admin note: text overlap with arXiv:2107.06010

ACM Class: I.2.7

Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6222-6226

arXiv:2201.02692 [pdf, other]

Improved Input Reprogramming for GAN Conditioning

Authors: Tuan Dinh, Daewon Seo, Zhixu Du, Liang Shang, Kangwook Lee

Abstract: We study the GAN conditioning problem, whose goal is to convert a pretrained unconditional GAN into a conditional GAN using labeled data. We first identify and analyze three approaches to this problem -- conditional GAN training from scratch, fine-tuning, and input reprogramming. Our analysis reveals that when the amount of labeled data is small, input reprogramming performs the best. Motivated by… ▽ More We study the GAN conditioning problem, whose goal is to convert a pretrained unconditional GAN into a conditional GAN using labeled data. We first identify and analyze three approaches to this problem -- conditional GAN training from scratch, fine-tuning, and input reprogramming. Our analysis reveals that when the amount of labeled data is small, input reprogramming performs the best. Motivated by real-world scenarios with scarce labeled data, we focus on the input reprogramming approach and carefully analyze the existing algorithm. After identifying a few critical issues of the previous input reprogramming approach, we propose a new algorithm called InRep+. Our algorithm InRep+ addresses the existing issues with the novel uses of invertible neural networks and Positive-Unlabeled (PU) learning. Via extensive experiments, we show that InRep+ outperforms all existing methods, particularly when label information is scarce, noisy, and/or imbalanced. For instance, for the task of conditioning a CIFAR10 GAN with 1% labeled data, InRep+ achieves an average Intra-FID of 76.24, whereas the second-best method achieves 114.51. △ Less

Submitted 7 February, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

Comments: 24 pages, 7 figures

arXiv:2112.01398 [pdf, other]

TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation

Authors: Tan M. Dinh, Rang Nguyen, Binh-Son Hua

Abstract: In this paper, we conduct a study on the state-of-the-art methods for text-to-image synthesis and propose a framework to evaluate these methods. We consider syntheses where an image contains a single or multiple objects. Our study outlines several issues in the current evaluation pipeline: (i) for image quality assessment, a commonly used metric, e.g., Inception Score (IS), is often either miscali… ▽ More In this paper, we conduct a study on the state-of-the-art methods for text-to-image synthesis and propose a framework to evaluate these methods. We consider syntheses where an image contains a single or multiple objects. Our study outlines several issues in the current evaluation pipeline: (i) for image quality assessment, a commonly used metric, e.g., Inception Score (IS), is often either miscalibrated for the single-object case or misused for the multi-object case; (ii) for text relevance and object accuracy assessment, there is an overfitting phenomenon in the existing R-precision (RP) and Semantic Object Accuracy (SOA) metrics, respectively; (iii) for multi-object case, many vital factors for evaluation, e.g., object fidelity, positional alignment, counting alignment, are largely dismissed; (iv) the ranking of the methods based on current metrics is highly inconsistent with real images. To overcome these issues, we propose a combined set of existing and new metrics to systematically evaluate the methods. For existing metrics, we offer an improved version of IS named IS* by using temperature scaling to calibrate the confidence of the classifier used by IS; we also propose a solution to mitigate the overfitting issues of RP and SOA. For new metrics, we develop counting alignment, positional alignment, object-centric IS, and object-centric FID metrics for evaluating the multi-object case. We show that benchmarking with our bag of metrics results in a highly consistent ranking among existing methods that is well-aligned with human evaluation. As a by-product, we create AttnGAN++, a simple but strong baseline for the benchmark by stabilizing the training of AttnGAN using spectral normalization. We also release our toolbox, so-called TISE, for advocating fair and consistent evaluation of text-to-image models. △ Less

Submitted 19 July, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

Comments: Accepted to ECCV 2022; TISE toolbox is available at https://github.com/VinAIResearch/tise-toolbox

arXiv:2112.00719 [pdf, other]

HyperInverter: Improving StyleGAN Inversion via Hypernetwork

Authors: Tan M. Dinh, Anh Tuan Tran, Rang Nguyen, Binh-Son Hua

Abstract: Real-world image manipulation has achieved fantastic progress in recent years as a result of the exploration and utilization of GAN latent spaces. GAN inversion is the first step in this pipeline, which aims to map the real image to the latent code faithfully. Unfortunately, the majority of existing GAN inversion methods fail to meet at least one of the three requirements listed below: high recons… ▽ More Real-world image manipulation has achieved fantastic progress in recent years as a result of the exploration and utilization of GAN latent spaces. GAN inversion is the first step in this pipeline, which aims to map the real image to the latent code faithfully. Unfortunately, the majority of existing GAN inversion methods fail to meet at least one of the three requirements listed below: high reconstruction quality, editability, and fast inference. We present a novel two-phase strategy in this research that fits all requirements at the same time. In the first phase, we train an encoder to map the input image to StyleGAN2 $\mathcal{W}$-space, which was proven to have excellent editability but lower reconstruction quality. In the second phase, we supplement the reconstruction ability in the initial phase by leveraging a series of hypernetworks to recover the missing information during inversion. These two steps complement each other to yield high reconstruction quality thanks to the hypernetwork branch and excellent editability due to the inversion done in the $\mathcal{W}$-space. Our method is entirely encoder-based, resulting in extremely fast inference. Extensive experiments on two challenging datasets demonstrate the superiority of our method. △ Less

Submitted 4 April, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

Comments: Accepted to CVPR 2022; Project page is located at https://di-mi-ta.github.io/HyperInverter/

arXiv:2111.11604 [pdf, other]

Simultaneous face detection and 360 degree headpose estimation

Authors: Hoang Nguyen Viet, Linh Nguyen Viet, Tuan Nguyen Dinh, Duc Tran Minh, Long Tran Quoc

Abstract: With many practical applications in human life, including manufacturing surveillance cameras, analyzing and processing customer behavior, many researchers are noticing face detection and head pose estimation on digital images. A large number of proposed deep learning models have state-of-the-art accuracy such as YOLO, SSD, MTCNN, solving the problem of face detection or HopeNet, FSA-Net, RankPose… ▽ More With many practical applications in human life, including manufacturing surveillance cameras, analyzing and processing customer behavior, many researchers are noticing face detection and head pose estimation on digital images. A large number of proposed deep learning models have state-of-the-art accuracy such as YOLO, SSD, MTCNN, solving the problem of face detection or HopeNet, FSA-Net, RankPose model used for head pose estimation problem. According to many state-of-the-art methods, the pipeline of this task consists of two parts, from face detection to head pose estimation. These two steps are completely independent and do not share information. This makes the model clear in setup but does not leverage most of the featured resources extracted in each model. In this paper, we proposed the Multitask-Net model with the motivation to leverage the features extracted from the face detection model, sharing them with the head pose estimation branch to improve accuracy. Also, with the variety of data, the Euler angle domain representing the face is large, our model can predict with results in the 360 Euler angle domain. Applying the multitask learning method, the Multitask-Net model can simultaneously predict the position and direction of the human head. To increase the ability to predict the head direction of the model, we change there presentation of the human face from the Euler angle to vectors of the Rotation matrix. △ Less

Submitted 22 November, 2021; originally announced November 2021.

Comments: Accepted at The 13th International Conference on Knowledge and Systems Engineering (KSE 2021), 7 pages, 2 figures, 3 tables

arXiv:2111.07039 [pdf, other]

UET-Headpose: A sensor-based top-view head pose dataset

Authors: Linh Nguyen Viet, Tuan Nguyen Dinh, Hoang Nguyen Viet, Duc Tran Minh, Long Tran Quoc

Abstract: Head pose estimation is a challenging task that aims to solve problems related to predicting three dimensions vector, that serves for many applications in human-robot interaction or customer behavior. Previous researches have proposed some precise methods for collecting head pose data. But those methods require either expensive devices like depth cameras or complex laboratory environment setup. In… ▽ More Head pose estimation is a challenging task that aims to solve problems related to predicting three dimensions vector, that serves for many applications in human-robot interaction or customer behavior. Previous researches have proposed some precise methods for collecting head pose data. But those methods require either expensive devices like depth cameras or complex laboratory environment setup. In this research, we introduce a new approach with efficient cost and easy setup to collecting head pose images, namely UET-Headpose dataset, with top-view head pose data. This method uses an absolute orientation sensor instead of Depth cameras to be set up quickly and small cost but still ensure good results. Through experiments, our dataset has been shown the difference between its distribution and available dataset like CMU Panoptic Dataset \cite{CMU}. Besides using the UET-Headpose dataset and other head pose datasets, we also introduce the full-range model called FSANet-Wide, which significantly outperforms head pose estimation results by the UET-Headpose dataset, especially on top-view images. Also, this model is very lightweight and takes small size images. △ Less

Submitted 12 November, 2021; originally announced November 2021.

Showing 1–50 of 123 results for author: Dinh, T