Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models

Authors: Meiyun Wang, Masahiro Suzuki, Hiroki Sakaji, Kiyoshi Izumi

Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities across various machine learning (ML) tasks. Given the high costs of creating annotated datasets for supervised learning, LLMs offer a valuable alternative by enabling effective few-shot in-context learning. However, these models can produce hallucinations, particularly in domains with incomplete knowledge. Additionally, current methods for knowledge distillation using LLMs often struggle to enhance the effectiveness of both teacher and student models. To address these challenges, we introduce DualChecker, an innovative framework designed to mitigate hallucinations and improve the performance of both teacher and student models during knowledge distillation. DualChecker employs ContextAligner to ensure that the context provided by teacher models aligns with human labeling standards. It also features a dynamic checker system that enhances model interaction: one component re-prompts teacher models with more detailed content when they show low confidence, and another identifies borderline cases from student models to refine the teaching templates. This interactive process promotes continuous improvement and effective knowledge transfer between the models. We evaluate DualChecker using a green innovation textual dataset that includes binary, multiclass, and token classification tasks. The experimental results show that DualChecker significantly outperforms existing state-of-the-art methods, achieving up to a 17% improvement in F1 score for teacher models and 10% for student models. Notably, student models fine-tuned with LLM predictions perform comparably to those fine-tuned with actual data, even in a challenging domain. We make all datasets, models, and code from this research publicly available.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.08711 [pdf, ps, other]

Weighted Envy-free Allocation with Subsidy

Authors: Haris Aziz, Xin Huang, Kei Kimura, Indrajit Saha, Zhaohong Sun, Mashbat Suzuki, Makoto Yokoo

Abstract: We consider the problem of fair allocation with subsidy when agents have weighted entitlements. After highlighting several important differences from the unweighted cases, we present several results concerning weighted envy-freeability including general characterizations, algorithms for achieving and testing weighted envy-freeability, lower and upper bounds for worst case subsidy for non-wasteful and envy-freeable allocations, and algorithms for achieving weighted envy-freeability along with other properties.

Submitted 16 August, 2024; originally announced August 2024.

Comments: 20 pages, 1 Table

arXiv:2407.19391 [pdf, ps, other]

Approval-Based Committee Voting under Uncertainty

Authors: Haris Aziz, Venkateswara Rao Kagita, Baharak Rastegari, Mashbat Suzuki

Abstract: We study approval-based committee voting in which a target number of candidates are selected based on voters' approval preferences over candidates. In contrast to most of the work, we consider the setting where voters express uncertain approval preferences and explore four different types of uncertain approval preference models. For each model, we study the problems such as computing a committee with the highest probability of satisfying axioms such as justified representation.

Submitted 28 July, 2024; originally announced July 2024.

arXiv:2407.14727 [pdf, other]

Economy Watchers Survey provides Datasets and Tasks for Japanese Financial Domain

Authors: Masahiro Suzuki, Hiroki Sakaji

Abstract: Many natural language processing (NLP) tasks in English or general domains are widely available and are often used to evaluate pre-trained language models. In contrast, there are fewer tasks available for languages other than English and for the financial domain. In particular, tasks in Japanese and the financial domain are limited. We construct two large datasets using materials published by a Japanese central government agency. The datasets provide three Japanese financial NLP tasks, which include a 3-class and 12-class classification for categorizing sentences, as well as a 5-class classification task for sentiment analysis. Our datasets are designed to be comprehensive and up-to-date, leveraging an automatic update framework that ensures the latest task datasets are publicly available anytime.

Submitted 19 July, 2024; originally announced July 2024.

Comments: 10 pages

arXiv:2407.13300 [pdf, other]

Robust ASR Error Correction with Conservative Data Filtering

Authors: Takuma Udagawa, Masayuki Suzuki, Masayasu Muraoka, Gakuto Kurata

Abstract: Error correction (EC) based on large language models is an emerging technology to enhance the performance of automatic speech recognition (ASR) systems. Generally, training data for EC are collected by automatically pairing a large set of ASR hypotheses (as sources) and their gold references (as targets). However, the quality of such pairs is not guaranteed, and we observed various types of noise which can make the EC models brittle, e.g. inducing overcorrection in out-of-domain (OOD) settings. In this work, we propose two fundamental criteria that EC training data should satisfy: namely, EC targets should (1) improve linguistic acceptability over sources and (2) be inferable from the available context (e.g. source phonemes). Through these criteria, we identify low-quality EC pairs and train the models not to make any correction in such cases, the process we refer to as conservative data filtering. In our experiments, we focus on Japanese ASR using a strong Conformer-CTC as the baseline and finetune Japanese LLMs for EC. Through our evaluation on a suite of 21 internal benchmarks, we demonstrate that our approach can significantly reduce overcorrection and improve both the accuracy and quality of ASR results in the challenging OOD settings.

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.13171 [pdf, other]

Maximin Fair Allocation of Indivisible Items under Cost Utilities

Authors: Sirin Botan, Angus Ritossa, Mashbat Suzuki, Toby Walsh

Abstract: We study the problem of fairly allocating indivisible goods among a set of agents. Our focus is on the existence of allocations that give each agent their maximin fair share--the value they are guaranteed if they divide the goods into as many bundles as there are agents, and receive their lowest valued bundle. An MMS allocation is one where every agent receives at least their maximin fair share. We examine the existence of such allocations when agents have cost utilities. In this setting, each item has an associated cost, and an agent's valuation for an item is the cost of the item if it is useful to them, and zero otherwise. Our main results indicate that cost utilities are a promising restriction for achieving MMS. We show that for the case of three agents with cost utilities, an MMS allocation always exists. We also show that when preferences are restricted slightly further--to what we call laminar set approvals--we can guarantee MMS allocations for any number of agents. Finally, we explore if it is possible to guarantee each agent their maximin fair share while using a strategyproof mechanism.
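
The maximin fair share and cost utilities defined in this abstract can be illustrated with a small brute-force sketch. It is not the paper's method: the function names, the example costs, and the set of useful items are all hypothetical, and the exhaustive search is only practical for a handful of items.

```python
from itertools import product


def cost_utility(costs, useful):
    """Cost utilities: an item is worth its cost if useful to the agent, else zero."""
    return [c if i in useful else 0 for i, c in enumerate(costs)]


def maximin_share(values, n_agents):
    """Brute-force maximin share of a single agent.

    Tries every way of splitting the items into n_agents bundles and
    returns the largest value the agent can guarantee for the worst bundle.
    """
    best = 0
    for assignment in product(range(n_agents), repeat=len(values)):
        bundles = [0] * n_agents
        for item_value, bundle in zip(values, assignment):
            bundles[bundle] += item_value
        best = max(best, min(bundles))
    return best


# Hypothetical instance: four items with costs 4, 3, 2, 1;
# the agent finds items 0, 1 and 3 useful.
costs = [4, 3, 2, 1]
values = cost_utility(costs, useful={0, 1, 3})
print(values, maximin_share(values, n_agents=3))
```

An allocation is MMS in the abstract's sense when every agent's bundle is worth at least the number this function returns for that agent.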

Submitted 18 July, 2024; originally announced July 2024.

Comments: Appeared in SAGT 2023

arXiv:2407.12461 [pdf, ps, other]

Compatibility of Fairness and Nash Welfare under Subadditive Valuations

Authors: Siddharth Barman, Mashbat Suzuki

Abstract: We establish a compatibility between fairness and efficiency, captured via Nash Social Welfare (NSW), under the broad class of subadditive valuations. We prove that, for subadditive valuations, there always exists a partial allocation that is envy-free up to the removal of any good (EFx) and has NSW at least half of the optimal; here, optimality is considered across all allocations, fair or otherwise. We also prove, for subadditive valuations, the universal existence of complete allocations that are envy-free up to one good (EF1) and also achieve a factor $1/2$ approximation to the optimal NSW. Our EF1 result resolves an open question posed by Garg et al. (STOC 2023). In addition, we develop a polynomial-time algorithm which, given an arbitrary allocation Ã as input, returns an EF1 allocation with NSW at least $1/3$ times that of Ã. Therefore, our results imply that the EF1 criterion can be attained simultaneously with a constant-factor approximation to optimal NSW in polynomial time (with demand queries), for subadditive valuations. The previously best-known approximation factor for optimal NSW, under EF1 and among $n$ agents, was $O(n)$ - we improve this bound to $O(1)$. It is known that EF1 and exact Pareto efficiency (PO) are incompatible with subadditive valuations. Complementary to this negative result, the current work shows that we regain compatibility by just considering a factor $1/2$ approximation: EF1 can be achieved in conjunction with $\frac{1}{2}$-PO under subadditive valuations. As such, our results serve as a general tool that can be used as a black box to convert any efficient outcome into a fair one, with only a marginal decrease in efficiency.
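
As a rough illustration of the two notions this abstract combines, the sketch below checks EF1 and computes Nash Social Welfare (taken here as the geometric mean of the agents' bundle values) for a given allocation. It assumes additive valuations, which are a special case of subadditive ones, and it is not the paper's algorithm; the function names and the example instance are hypothetical.

```python
from math import prod


def bundle_value(valuation, bundle):
    """Additive value of a bundle (assumption: additive valuations)."""
    return sum(valuation[g] for g in bundle)


def is_ef1(valuations, allocation):
    """True if any envy between two agents vanishes after removing one good."""
    n = len(valuations)
    for i in range(n):
        own = bundle_value(valuations[i], allocation[i])
        for j in range(n):
            if i == j:
                continue
            envied = bundle_value(valuations[i], allocation[j])
            if own >= envied:
                continue  # agent i does not envy agent j
            # Envy must disappear after dropping some single good from j's bundle.
            if not any(own >= envied - valuations[i][g] for g in allocation[j]):
                return False
    return True


def nash_social_welfare(valuations, allocation):
    """Geometric mean of the agents' values for their own bundles."""
    n = len(valuations)
    return prod(bundle_value(valuations[i], allocation[i]) for i in range(n)) ** (1 / n)


# Hypothetical instance: two agents, three goods.
valuations = [[5, 3, 1], [4, 4, 2]]
allocation = [[0], [1, 2]]
print(is_ef1(valuations, allocation), nash_social_welfare(valuations, allocation))
```

For general subadditive valuations, `bundle_value` would have to be replaced by a query to each agent's valuation function, and the single-good removal check would evaluate the reduced bundle directly.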

Submitted 23 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.05240 [pdf, other]

Neighborhood Stability in Assignments on Graphs

Authors: Haris Aziz, Grzegorz Lisowski, Mashbat Suzuki, Jeremy Vollen

Abstract: We study the problem of assigning agents to the vertices of a graph such that no pair of neighbors can benefit from swapping assignments -- a property we term neighborhood stability. We further assume that agents' utilities are based solely on their preferences over the assignees of adjacent vertices and that those preferences are binary. Having shown that even this very restricted setting does not guarantee neighborhood stable assignments, we focus on special cases that provide such guarantees. We show that when the graph is a cycle or a path, a neighborhood stable assignment always exists for any preference profile. Furthermore, we give a general condition under which neighborhood stable assignments always exist. For each of these results, we give a polynomial-time algorithm to compute a neighborhood stable assignment.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2406.14907 [pdf, other]

Maximum Flow is Fair: A Network Flow Approach to Committee Voting

Authors: Mashbat Suzuki, Jeremy Vollen

Abstract: In the committee voting setting, a subset of $k$ alternatives is selected based on the preferences of voters. In this paper, our goal is to efficiently compute ex-ante fair probability distributions (or lotteries) over committees. Since it is not known whether a lottery satisfying the desirable fairness property of fractional core is polynomial-time computable, we introduce a new axiom called group resource proportionality (GRP), which strengthens other fairness notions in the literature. We characterize our fairness axiom by a correspondence with max flows on a network formulation of committee voting. Using the connection to flow networks revealed by this characterization, we then introduce voting rules which achieve fairness in conjunction with other desirable properties. The redistributive utilitarian rule satisfies ex-ante efficiency in addition to our fairness axiom. We also give a voting rule which maximizes social welfare subject to fairness by reducing to a minimum-cost maximum-flow problem. Lastly, we show our fairness property can be obtained in tandem with strong ex-post fairness properties -- an approach known as best-of-both-worlds fairness. We strengthen existing best-of-both-worlds fairness results in committee voting and resolve an open question posed by Aziz et al. (2023). These findings follow from an auxiliary result which may prove useful in obtaining best-of-both-worlds type results in future research on committee voting.

Submitted 21 June, 2024; originally announced June 2024.

Comments: To appear at EC 2024

arXiv:2406.00765 [pdf]

The Embodied World Model Based on LLM with Visual Information and Prediction-Oriented Prompts

Authors: Wakana Haijima, Kou Nakakubo, Masahiro Suzuki, Yutaka Matsuo

Abstract: In recent years, as machine learning, particularly for vision and language understanding, has been improved, research in embodied AI has also evolved. VOYAGER is a well-known LLM-based embodied AI that enables autonomous exploration in the Minecraft world, but it has issues such as underutilization of visual data and insufficient functionality as a world model. In this research, the possibility of utilizing visual data and the function of LLM as a world model were investigated with the aim of improving the performance of embodied AI. The experimental results revealed that LLM can extract necessary information from visual data, and the utilization of the information improves its performance as a world model. It was also suggested that devised prompts could bring out the LLM's function as a world model.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.01689 [pdf, other]

Investigation on optimal microstructure of dual-phase steel with high strength and ductility by machine learning

Authors: Misato Suzuki, Kazuyuki Shizawa, Mayu Muramatsu

Abstract: In this study, we developed an inverse analysis framework that proposes a microstructure for dual-phase (DP) steel that exhibits high strength and ductility. The inverse analysis method proposed in this study involves repeated random searches on a model that combines a generative adversarial network (GAN), which generates microstructures, and a convolutional neural network (CNN), which predicts the maximum stress and working limit strain from DP steel microstructures. GAN was trained using images of DP steel microstructures generated by the phase-field method. CNN was trained using images of DP steel microstructures, the maximum stress and the working limit strain calculated by the dislocation-crystal plasticity finite element method. The constructed framework made an efficient search for microstructures possible because of a low-dimensional search space by a latent variable of GAN. The multiple deformation modes were considered in this framework, which allowed the required microstructures to be explored under complex deformation modes. A microstructure with a fine grain size was proposed by using the developed framework.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 27 pages, 23 figures

arXiv:2404.09260 [pdf, other]

JaFIn: Japanese Financial Instruction Dataset

Authors: Kota Tanabe, Masahiro Suzuki, Hiroki Sakaji, Itsuki Noda

Abstract: We construct an instruction dataset for the large language model (LLM) in the Japanese finance domain. Domain adaptation of language models, including LLMs, is receiving more attention as language models become more popular. This study demonstrates the effectiveness of domain adaptation through instruction tuning. To achieve this, we propose an instruction tuning dataset in Japanese called JaFIn, the Japanese Financial Instruction Dataset. JaFIn is manually constructed based on multiple data sources, including Japanese government websites, which provide extensive financial knowledge. We then utilize JaFIn to apply instruction tuning for several LLMs, demonstrating that our models specialized in finance have better domain adaptability than the original models. The financial-specialized LLMs created were evaluated using a quantitative Japanese financial benchmark and qualitative response comparisons, showing improved performance over the originals.

Submitted 19 July, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: 10 pages, 1 figure. The paper is a camera-ready version for the 2024 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr)

arXiv:2404.05198 [pdf, ps, other]

Fair Lotteries for Participatory Budgeting

Authors: Haris Aziz, Xinhang Lu, Mashbat Suzuki, Jeremy Vollen, Toby Walsh

Abstract: In pursuit of participatory budgeting (PB) outcomes with broader fairness guarantees, we initiate the study of lotteries over discrete PB outcomes. As the projects have heterogeneous costs, the amount spent may not be equal ex ante and ex post. To address this, we develop a technique to bound the amount by which the ex-post spend differs from the ex-ante spend -- the property is termed budget balanced up to one project (BB1). With respect to fairness, we take a best-of-both-worlds perspective, seeking outcomes that are both ex-ante and ex-post fair. Towards this goal, we initiate a study of ex-ante fairness properties in PB, including Individual Fair Share (IFS), Unanimous Fair Share (UFS) and their stronger variants, as well as Group Fair Share (GFS). We show several incompatibility results between these ex-ante fairness notions and existing ex-post concepts based on justified representation. One of our main contributions is a randomized algorithm which simultaneously satisfies ex-ante Strong UFS, ex-post full justified representation (FJR) and ex-post BB1 for PB with binary utilities.

Submitted 11 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: Appears in the 38th AAAI Conference on Artificial Intelligence (AAAI), 2024

arXiv:2403.07711 [pdf, other]

SSM Meets Video Diffusion Models: Efficient Long-Term Video Generation with Structured State Spaces

Authors: Yuta Oshima, Shohei Taniguchi, Masahiro Suzuki, Yutaka Matsuo

Abstract: Given the remarkable achievements in image generation through diffusion models, the research community has shown increasing interest in extending these models to video generation. Recent diffusion models for video generation have predominantly utilized attention layers to extract temporal features. However, attention layers are limited by their computational costs, which increase quadratically with the sequence length. This limitation presents significant challenges when generating longer video sequences using diffusion models. To overcome this challenge, we propose leveraging state-space models (SSMs) as temporal feature extractors. SSMs (e.g., Mamba) have recently gained attention as promising alternatives due to their linear-time memory consumption relative to sequence length. In line with previous research suggesting that using bidirectional SSMs is effective for understanding spatial features in image generation, we found that bidirectionality is also beneficial for capturing temporal features in video data, rather than relying on traditional unidirectional SSMs. We conducted comprehensive evaluations on multiple long-term video datasets, such as MineRL Navigate, across various model sizes. For sequences up to 256 frames, SSM-based models require less memory to achieve the same FVD as attention-based models. Moreover, SSM-based models often deliver better performance with comparable GPU memory usage. Our codes are available at https://github.com/shim0114/SSM-Meets-Video-Diffusion-Models.

Submitted 3 September, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: Accepted as a workshop paper at ICLR 2024

arXiv:2402.14484 [pdf, other]

Is ChatGPT the Future of Causal Text Mining? A Comprehensive Evaluation and Analysis

Authors: Takehiro Takayanagi, Masahiro Suzuki, Ryotaro Kobayashi, Hiroki Sakaji, Kiyoshi Izumi

Abstract: Causality is fundamental in human cognition and has drawn attention in diverse research fields. With growing volumes of textual data, discerning causalities within text data is crucial, and causal text mining plays a pivotal role in extracting meaningful patterns. This study conducts comprehensive evaluations of ChatGPT's causal text mining capabilities. Firstly, we introduce a benchmark that extends beyond general English datasets, including domain-specific and non-English datasets. We also provide an evaluation framework to ensure fair comparisons between ChatGPT and previous approaches. Finally, our analysis outlines the limitations and future challenges in employing ChatGPT for causal text mining. Specifically, our analysis reveals that ChatGPT serves as a good starting point for various datasets. However, when equipped with a sufficient amount of training data, previous models still surpass ChatGPT's performance. Additionally, ChatGPT suffers from the tendency to falsely recognize non-causal sequences as causal sequences. These issues become even more pronounced with advanced versions of the model, such as GPT-4. In addition, we highlight the constraints of ChatGPT in handling complex causality types, including both intra/inter-sentential and implicit causality. The model also faces challenges with effectively leveraging in-context learning and domain adaptation. We release our code to support further research and development in this field.

Submitted 23 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

arXiv:2312.11286 [pdf, ps, other]

Envy-free House Allocation under Uncertain Preferences

Authors: Haris Aziz, Isaiah Iliffe, Bo Li, Angus Ritossa, Ankang Sun, Mashbat Suzuki

Abstract: We study the envy-free house allocation problem when agents have uncertain preferences over items and consider several well-studied preference uncertainty models. The central problem that we focus on is computing an allocation that has the highest probability of being envy-free. We show that each model leads to a distinct set of algorithmic and complexity results, including detailed results on (in-)approximability. En route, we consider two related problems of checking whether there exists an allocation that is possibly or necessarily envy-free. We give a complete picture of the computational complexity of these two problems for all the uncertainty models we consider.

Submitted 18 December, 2023; originally announced December 2023.

Comments: To appear in the proceedings of AAAI 2024

arXiv:2310.12900 [pdf]

Personalized human mobility prediction for HuMob challenge

Authors: Masahiro Suzuki, Shomu Furuta, Yusuke Fukazawa

Abstract: We explain the methodology used to create the data submitted to HuMob Challenge, a data analysis competition for human mobility prediction. We adopted a personalized model to predict the individual's movement trajectory from their data, instead of predicting from the overall movement, based on the hypothesis that human movement is unique to each person. We devised features such as the date and time, activity time, days of the week, time of day, and frequency of visits to POI (Point of Interest). As additional features, we incorporated the movement of other individuals with similar behavior patterns through the employment of clustering. The machine learning model we adopted was the Support Vector Regression (SVR). We assessed accuracy through offline evaluation and carried out feature selection and parameter tuning. Although the overall dataset provided consists of the trajectories of 100,000 users, our method uses only the data of the 20,000 target users and does not need the data of the other 80,000. Despite the personalized model's traditional feature engineering approach, this model yields reasonably good accuracy with lower computational cost.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.10083 [pdf, other]

JMedLoRA: Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning

Authors: Issey Sukeda, Masahiro Suzuki, Hiroki Sakaji, Satoshi Kodera

Abstract: In the ongoing wave of impact driven by large language models (LLMs) like ChatGPT, the adaptation of LLMs to the medical domain has emerged as a crucial research frontier. Since mainstream LLMs tend to be designed for general-purpose applications, constructing a medical LLM through domain adaptation is a huge challenge. While instruction-tuning is used to fine-tune some LLMs, its precise roles in domain adaptation remain unknown. Here we show the contribution of LoRA-based instruction-tuning to performance in Japanese medical question-answering tasks. In doing so, we employ a multifaceted evaluation for multiple-choice questions, including scoring based on "Exact match" and "Gestalt distance" in addition to the conventional accuracy. Our findings suggest that LoRA-based instruction-tuning can partially incorporate domain-specific knowledge into LLMs, with larger models demonstrating more pronounced effects. Furthermore, our results underscore the potential of adapting English-centric models for Japanese applications in domain adaptation, while also highlighting the persisting limitations of Japanese-centric models. This initiative represents a pioneering effort in enabling medical institutions to fine-tune and operate models without relying on external services.

Submitted 30 November, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 8 pages, 1 figure

arXiv:2309.04031 [pdf, other]

Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems

Authors: Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Masayasu Muraoka, George Saon

Abstract: Transferring the knowledge of large language models (LLMs) is a promising technique to incorporate linguistic knowledge into end-to-end automatic speech recognition (ASR) systems. However, existing works only transfer a single representation of LLM (e.g. the last layer of pretrained BERT), while the representation of a text is inherently non-unique and can be obtained variously from different layers, contexts and models. In this work, we explore a wide range of techniques to obtain and transfer multiple representations of LLMs into a transducer-based ASR system. While being conceptually simple, we show that transferring multiple representations of LLMs can be an effective alternative to transferring only a single representation.

Submitted 25 December, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: Accepted to ICASSP 2024

arXiv:2309.03412 [pdf, other]

From Base to Conversational: Japanese Instruction Dataset and Tuning Large Language Models

Authors: Masahiro Suzuki, Masanori Hirano, Hiroki Sakaji

Abstract: Instruction tuning is essential for large language models (LLMs) to become interactive. While many instruction tuning datasets exist in English, there is a noticeable lack in other languages. Also, their effectiveness has not been well verified in non-English languages. We construct a Japanese instruction dataset by expanding and filtering existing datasets and apply the dataset to a Japanese pre-trained base model. We performed Low-Rank Adaptation (LoRA) tuning on both Japanese and English existing models using our instruction dataset. We evaluated these models from both quantitative and qualitative perspectives. As a result, the effectiveness of Japanese instruction datasets is confirmed. The results also indicate that even with relatively small LLMs, performances in downstream tasks would be improved through instruction tuning. Our instruction dataset, tuned models, and implementation are publicly available online.

Submitted 5 November, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: 10 pages, 1 figure, 2 tables. The paper is a camera-ready version of IEEE BigData 2023

arXiv:2306.09564 [pdf, ps, other]

doi 10.1613/jair.1.15800

Mixed Fair Division: A Survey

Authors: Shengxin Liu, Xinhang Lu, Mashbat Suzuki, Toby Walsh

Abstract: Fair division considers the allocation of scarce resources among agents in such a way that every agent gets a fair share. It is a fundamental problem in society and has received significant attention and rapid developments from the game theory and artificial intelligence communities in recent years. The majority of the fair division literature can be divided along at least two orthogonal directions: goods versus chores, and divisible versus indivisible resources. In this survey, besides describing the state of the art, we outline a number of interesting open questions and future directions in three mixed fair division settings: (i) indivisible goods and chores, (ii) divisible and indivisible goods (mixed goods), and (iii) indivisible goods with subsidy which can be viewed like a divisible good.

Submitted 12 August, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Appears in the 38th AAAI Conference on Artificial Intelligence (AAAI), Senior Member Presentation Track, 2024

Journal ref: Journal of Artificial Intelligence Research (JAIR), 80:1373-1406, 2024

arXiv:2305.19684 [pdf, other]

End-to-end Training of Deep Boltzmann Machines by Unbiased Contrastive Divergence with Local Mode Initialization

Authors: Shohei Taniguchi, Masahiro Suzuki, Yusuke Iwasawa, Yutaka Matsuo

Abstract: We address the problem of biased gradient estimation in deep Boltzmann machines (DBMs). The existing method to obtain an unbiased estimator uses a maximal coupling based on a Gibbs sampler, but when the state is high-dimensional, it takes a long time to converge. In this study, we propose to use a coupling based on the Metropolis-Hastings (MH) and to initialize the state around a local mode of the target distribution. Because of the propensity of MH to reject proposals, the coupling tends to converge in only one step with a high probability, leading to high efficiency. We find that our method allows DBMs to be trained in an end-to-end fashion without greedy pretraining. We also propose some practical techniques to further improve the performance of DBMs. We empirically demonstrate that our training algorithm enables DBMs to show comparable generative performance to other deep generative models, achieving the FID score of 10.33 for MNIST.

Submitted 31 May, 2023; originally announced May 2023.

Comments: Accepted at ICML 2023

arXiv:2305.12720 [pdf, ps, other]

llm-japanese-dataset v0: Construction of Japanese Chat Dataset for Large Language Models and its Methodology

Authors: Masanori Hirano, Masahiro Suzuki, Hiroki Sakaji

Abstract: This study constructed a Japanese chat dataset for tuning large language models (LLMs), which consists of about 8.4 million records. Recently, LLMs have been developed and have been gaining popularity. However, high-performing LLMs are usually mainly for English. There are two ways to support languages other than English by those LLMs: constructing LLMs from scratch or tuning existing models. However, in both ways, datasets are necessary parts. In this study, we focused on supporting Japanese in those LLMs and making a dataset for training or tuning LLMs in Japanese. The dataset we constructed consisted of various tasks, such as translation and knowledge tasks. In our experiment, we tuned an existing LLM using our dataset and evaluated the performance qualitatively. The results suggest that our dataset is possibly beneficial for LLMs. However, we also revealed some difficulties in constructing LLMs in languages other than English.

Submitted 22 May, 2023; originally announced May 2023.

Comments: 12 pages

arXiv:2303.03642 [pdf, ps, other]

Best-of-Both-Worlds Fairness in Committee Voting

Authors: Haris Aziz, Xinhang Lu, Mashbat Suzuki, Jeremy Vollen, Toby Walsh

Abstract: The best-of-both-worlds paradigm advocates an approach that achieves desirable properties both ex-ante and ex-post. We launch a best-of-both-worlds fairness perspective for the important social choice setting of approval-based committee voting. To this end, we initiate work on ex-ante proportional representation properties in this domain and formalize a hierarchy of notions including Individual Fair Share (IFS), Unanimous Fair Share (UFS), Group Fair Share (GFS), and their stronger variants. We establish their compatibility with well-studied ex-post concepts such as extended justified representation (EJR) and fully justified representation (FJR). Our first main result is a polynomial-time algorithm that simultaneously satisfies ex-post EJR, ex-ante GFS and ex-ante Strong UFS. Subsequently, we strengthen our ex-post guarantee to FJR and present an algorithm that outputs a lottery which is ex-post FJR and ex-ante Strong UFS, but does not run in polynomial time.

Submitted 25 December, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: Appears in the 19th Conference on Web and Internet Economics (WINE), 2023

arXiv:2301.05832 [pdf, other]

World Models and Predictive Coding for Cognitive and Developmental Robotics: Frontiers and Challenges

Authors: Tadahiro Taniguchi, Shingo Murata, Masahiro Suzuki, Dimitri Ognibene, Pablo Lanillos, Emre Ugur, Lorenzo Jamone, Tomoaki Nakamura, Alejandra Ciria, Bruno Lara, Giovanni Pezzulo

Abstract: Creating autonomous robots that can actively explore the environment, acquire knowledge and learn skills continuously is the ultimate achievement envisioned in cognitive and developmental robotics. Their learning processes should be based on interactions with their physical and social world in the manner of human learning and cognitive development. Based on this context, in this paper, we focus on the two concepts of world models and predictive coding. Recently, world models have attracted renewed attention as a topic of considerable interest in artificial intelligence. Cognitive systems learn world models to better predict future sensory observations and optimize their policies, i.e., controllers. Alternatively, in neuroscience, predictive coding proposes that the brain continuously predicts its inputs and adapts to model its own dynamics and control behavior in its environment. Both ideas may be considered as underpinning the cognitive development of robots and humans capable of continual or lifelong learning. Although many studies have been conducted on predictive coding in cognitive robotics and neurorobotics, the relationship between world model-based approaches in AI and predictive coding in robotics has rarely been discussed.
Therefore, in this paper, we clarify the definitions, relationships, and status of current research on these topics, as well as missing pieces of world models and predictive coding in conjunction with crucially related concepts such as the free-energy principle and active inference in the context of cognitive and developmental robotics. Furthermore, we outline the frontiers and challenges involved in world models and predictive coding toward the further integration of AI and robotics, as well as the creation of robots with real cognitive and developmental capabilities in the future.

Submitted 14 January, 2023; originally announced January 2023.

Comments: 28 pages, 3 figures

arXiv:2211.00879 [pdf, ps, other]

Fair Allocation of Two Types of Chores

Authors: Haris Aziz, Jeremy Lindsay, Angus Ritossa, Mashbat Suzuki

Abstract: We consider the problem of fair allocation of indivisible chores under additive valuations. We assume that the chores are divided into two types and under this scenario, we present several results. Our first result is a new characterization of Pareto optimal allocations in our setting, and a polynomial-time algorithm to compute an envy-free up to one item (EF1) and Pareto optimal allocation. We then turn to the question of whether we can achieve a stronger fairness concept called envy-free up to any item (EFX). We present a polynomial-time algorithm that returns an EFX allocation. Finally, we show that for our setting, it can be checked in polynomial time whether an envy-free allocation exists or not.

Submitted 24 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

arXiv:2210.08703 [pdf, ps, other]

Spoken Dialogue System Based on Attribute Vector for Travel Agent Robot

Authors: Motoyuki Suzuki, Shintaro Sodeya, Taichi Nakamura

Abstract: In this study, we develop a dialogue system for a dialogue robot competition. In the system, the characteristics of sightseeing spots are expressed as "attribute vectors" in advance, and the user is questioned on the different attributes of the two candidate spots. Consequently, the system can make recommendations based on user intentions. A dialogue experiment is conducted during a preliminary round of competition. The overall satisfaction score obtained is 40.1 out of 63 points, which is a reasonable result. Analysis of the relationship between the system behavior and satisfaction scores reveals that satisfaction increases when the system correctly understands the user intention and responds appropriately. However, a negative correlation is observed between the number of user utterances and the satisfaction score. This implies that inappropriate responses reduce the usefulness of the system as a consultation partner.

Submitted 16 October, 2022; originally announced October 2022.

Comments: This paper is part of the proceedings of the Dialogue Robot Competition 2022

arXiv:2207.02127 [pdf, ps, other]

doi 10.1080/01691864.2022.2035253

A survey of multimodal deep generative models

Authors: Masahiro Suzuki, Yutaka Matsuo

Abstract: Multimodal learning is a framework for building models that make predictions based on different types of modalities. Important challenges in multimodal learning are the inference of shared representations from arbitrary modalities and cross-modal generation via these representations; however, achieving this requires taking the heterogeneous nature of multimodal data into account. In recent years, deep generative models, i.e., generative models in which distributions are parameterized by deep neural networks, have attracted much attention, especially variational autoencoders, which are suitable for accomplishing the above challenges because they can consider heterogeneity and infer good representations of data. Therefore, various multimodal generative models based on variational autoencoders, called multimodal deep generative models, have been proposed in recent years. In this paper, we provide a categorized survey of studies on multimodal deep generative models.

Submitted 5 July, 2022; originally announced July 2022.

Comments: Published in Advanced Robotics

Journal ref: Advanced Robotics, 36:5-6, 261-278, 2022

arXiv:2206.05966 [pdf, other]

Coordinating Monetary Contributions in Participatory Budgeting

Authors: Haris Aziz, Sujit Gujar, Manisha Padala, Mashbat Suzuki, Jeremy Vollen

Abstract: We formalize a framework for coordinating funding and selecting projects, the costs of which are shared among agents with quasi-linear utility functions and individual budgets. Our model contains the classical discrete participatory budgeting model as a special case, while capturing other useful scenarios. We propose several important axioms and objectives and study how well they can be simultaneously satisfied. We show that whereas welfare maximization admits an FPTAS, welfare maximization subject to a natural and very weak participation requirement leads to a strong inapproximability. This result is bypassed if we consider some natural restricted valuations, namely laminar single-minded valuations and symmetric valuations. Our analysis for the former restriction leads to the discovery of a new class of tractable instances for the Set Union Knapsack problem, a classical problem in combinatorial optimization.

Submitted 22 February, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: In this version, we include results regarding single minded valuations. We have also corrected a bug in the proof of Lemma 1

arXiv:2205.14798 [pdf, other]

Random Rank: The One and Only Strategyproof and Proportionally Fair Randomized Facility Location Mechanism

Authors: Haris Aziz, Alexander Lam, Mashbat Suzuki, Toby Walsh

Abstract: Proportionality is an attractive fairness concept that has been applied to a range of problems including the facility location problem, a classic problem in social choice. In our work, we propose a concept called Strong Proportionality, which ensures that when there are two groups of agents at different locations, both groups incur the same total cost. We show that although Strong Proportionality is a well-motivated and basic axiom, there is no deterministic strategyproof mechanism satisfying the property. We then identify a randomized mechanism called Random Rank (which uniformly selects a number $k$ between $1$ and $n$ and locates the facility at the $k$'th highest agent location) which satisfies Strong Proportionality in expectation. Our main theorem characterizes Random Rank as the unique mechanism that achieves universal truthfulness, universal anonymity, and Strong Proportionality in expectation among all randomized mechanisms. Finally, we show via the AverageOrRandomRank mechanism that even stronger ex-post fairness guarantees can be achieved by weakening universal truthfulness to strategyproofness in expectation.

Submitted 14 June, 2022; v1 submitted 29 May, 2022; originally announced May 2022.
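
The Random Rank rule quoted in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `random_rank`, the representation of agent locations as a list of floats, and the optional `rng` argument are all assumptions made for illustration.

```python
import random


def random_rank(locations, rng=random):
    """Sketch of the rule as stated in the abstract: draw k uniformly
    from 1..n and place the facility at the k-th highest reported location.

    `locations` (a list of numbers) and `rng` are illustrative assumptions.
    """
    if not locations:
        raise ValueError("need at least one agent location")
    ordered = sorted(locations, reverse=True)  # highest location first
    k = rng.randint(1, len(ordered))           # inclusive on both ends
    return ordered[k - 1]                      # k-th highest location


# Example: three agents on the unit interval, with a seeded generator
# so that repeated runs give the same draw.
facility = random_rank([0.1, 0.4, 0.9], rng=random.Random(0))
```

Because every rank is drawn with probability 1/n, each agent's reported location is selected with the same probability.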

arXiv:2204.00212 [pdf, ps, other]

Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems

Authors: Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon

Abstract: Large-scale language models (LLMs) such as GPT-2, BERT and RoBERTa have been successfully applied to ASR N-best rescoring. However, whether or how they can benefit competitive, near state-of-the-art ASR systems remains unexplored. In this study, we incorporate LLM rescoring into one of the most competitive ASR baselines: the Conformer-Transducer model. We demonstrate that consistent improvement is achieved by the LLM's bidirectionality, pretraining, in-domain finetuning and context augmentation. Furthermore, our lexical analysis sheds light on how each of these components may be contributing to the ASR performance.

Submitted 18 August, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

Comments: Accepted to Interspeech 2022

arXiv:2203.15176 [pdf, other]

Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing

Authors: Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata

Abstract: We introduce two techniques, length perturbation and n-best based label smoothing, to improve generalization of deep neural network (DNN) acoustic models for automatic speech recognition (ASR). Length perturbation is a data augmentation algorithm that randomly drops and inserts frames of an utterance to alter the length of the speech feature sequence. N-best based label smoothing randomly injects noise to ground truth labels during training in order to avoid overfitting, where the noisy labels are generated from n-best hypotheses. We evaluate these two techniques extensively on the 300-hour Switchboard (SWB300) dataset and an in-house 500-hour Japanese (JPN500) dataset using recurrent neural network transducer (RNNT) acoustic models for ASR. We show that both techniques improve the generalization of RNNT models individually and they can also be complementary. In particular, they yield good improvements over a strong SWB300 baseline and give state-of-the-art performance on SWB300 using RNNT models.

Submitted 28 March, 2022; originally announced March 2022.

Comments: Submitted to Interspeech 2022
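
The abstract above says only that length perturbation randomly drops and inserts frames; it gives no further detail. The following Python sketch shows one way such an augmentation could look. Everything specific here is an assumption for illustration and not the authors' method: the function name `length_perturb`, the per-frame probabilities `drop_prob` and `insert_prob`, and in particular the choice to insert a copy of the current frame.

```python
import random


def length_perturb(frames, drop_prob=0.1, insert_prob=0.1, rng=None):
    """Randomly drop and insert frames to change the sequence length.

    Hypothetical sketch: the probabilities and the "duplicate the current
    frame" insertion rule are assumptions, not taken from the paper.
    """
    rng = rng or random.Random()
    out = []
    for frame in frames:
        if rng.random() < drop_prob:
            continue               # drop this frame
        out.append(frame)
        if rng.random() < insert_prob:
            out.append(frame)      # assumed: the inserted frame is a copy
    return out


# Example: perturb a toy "utterance" of 10 one-dimensional feature frames.
utterance = [[float(t)] for t in range(10)]
augmented = length_perturb(utterance, rng=random.Random(0))
```

With both probabilities set to zero the sequence is returned unchanged, which makes the augmentation easy to switch off at evaluation time.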

arXiv:2112.00355 [pdf, other]

doi 10.1145/3469877.3490612

Score Transformer: Generating Musical Score from Note-level Representation

Authors: Masahiro Suzuki

Abstract: In this paper, we explore the tokenized representation of musical scores using the Transformer model to automatically generate musical scores. Thus far, sequence models have yielded fruitful results with note-level (MIDI-equivalent) symbolic representations of music. Although the note-level representations can comprise sufficient information to reproduce music aurally, they cannot contain adequate information to represent music visually in terms of notation. Musical scores contain various musical symbols (e.g., clef, key signature, and notes) and attributes (e.g., stem direction, beam, and tie) that enable us to visually comprehend musical content. However, automated estimation of these elements has yet to be comprehensively addressed. In this paper, we first design score token representation corresponding to the various musical elements. We then train the Transformer model to transcribe note-level representation into appropriate music notation. Evaluations of popular piano scores show that the proposed method significantly outperforms existing methods on all 12 musical aspects that were investigated. We also explore an effective notation-level token representation to work with the model and determine that our proposed representation produces the steadiest results.

Submitted 1 December, 2021; originally announced December 2021.

Comments: Accepted at ACM Multimedia Asia 2021 (MMAsia '21); Project page: https://score-transformer.github.io/

arXiv:2110.07031 [pdf, other]

Improving the Robustness to Variations of Objects and Instructions with a Neuro-Symbolic Approach for Interactive Instruction Following

Authors: Kazutoshi Shinoda, Yuki Takezawa, Masahiro Suzuki, Yusuke Iwasawa, Yutaka Matsuo

Abstract: An interactive instruction following task has been proposed as a benchmark for learning to map natural language instructions and first-person vision into sequences of actions to interact with objects in 3D environments. We found that an existing end-to-end neural model for this task tends to fail to interact with objects of unseen attributes and follow various instructions. We assume that this problem is caused by the high sensitivity of neural feature extraction to small changes in vision and language inputs. To mitigate this problem, we propose a neuro-symbolic approach that utilizes high-level symbolic features, which are robust to small changes in raw inputs, as intermediate representations. We verify the effectiveness of our model with the subtask evaluation on the ALFRED benchmark. Our experiments show that our approach significantly outperforms the end-to-end neural model by 9, 46, and 74 points in the success rate on the ToggleObject, PickupObject, and SliceObject subtasks in unseen environments respectively.

Submitted 15 November, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: Accepted to the 29th International Conference on MultiMedia Modeling (MMM 2023)

arXiv:2107.13109 [pdf, ps, other]

doi 10.1080/01691864.2023.2244568

Pixyz: a Python library for developing deep generative models

Authors: Masahiro Suzuki, Takaaki Kaneko, Yutaka Matsuo

Abstract: With the recent rapid progress in the study of deep generative models (DGMs), there is a need for a framework that can implement them in a simple and generic way. In this research, we focus on two features of DGMs: (1) deep neural networks are encapsulated by probability distributions, and (2) models are designed and learned based on an objective function. Taking these features into account, we propose a new Python library to implement DGMs called Pixyz. This library adopts a step-by-step implementation method with three APIs, which allows us to implement various DGMs more concisely and intuitively. In addition, the library introduces memoization to reduce the cost of duplicate computations in DGMs to speed up the computation. We demonstrate experimentally that this library is faster than existing probabilistic programming languages in training DGMs.

Submitted 21 September, 2023; v1 submitted 27 July, 2021; originally announced July 2021.

Comments: Published in Advanced Robotics

Journal ref: Advanced Robotics, 2023

arXiv:2106.00841 [pdf, ps, other]

Two Birds With One Stone: Fairness and Welfare via Transfers

Authors: Vishnu V. Narayan, Mashbat Suzuki, Adrian Vetta

Abstract: We study the question of dividing a collection of indivisible goods amongst a set of agents. The main objective of research in the area is to achieve one of two goals: fairness or efficiency. On the fairness side, envy-freeness is the central fairness criterion in economics, but envy-free allocations typically do not exist when the goods are indivisible. A recent line of research shows that envy-freeness can be achieved if a small quantity of a homogeneous divisible good (money) is introduced into the system, or equivalently, if transfer payments are allowed between the agents. A natural question to explore, then, is whether transfer payments can be used to provide high welfare in addition to envy-freeness, and if so, how much money is needed to be transferred. We show that for general monotone valuations, there always exists an allocation with transfers that is envy-free and whose Nash social welfare (NSW) is at least an $e^{-1/e}$-fraction of the optimal Nash social welfare. Additionally, when the agents have additive valuations, an envy-free allocation with negligible transfers and whose NSW is within a constant factor of optimal can be found in polynomial time. Consequently, we demonstrate that the seemingly incompatible objectives of fairness and high welfare can be achieved simultaneously via transfer payments, even for general valuations, when the welfare objective is NSW.
On the other hand, we show that a similar result is impossible for utilitarian social welfare: any envy-freeable allocation that achieves a constant fraction of the optimal welfare requires non-negligible transfers. To complement this result we present algorithms that compute an envy-free allocation with a given target welfare and with bounded transfers.

Submitted 1 June, 2021; originally announced June 2021.

arXiv:2105.05142 [pdf, other]

Pirates in Wonderland: Liquid Democracy has Bicriteria Guarantees

Authors: Jonathan A. Noel, Mashbat Suzuki, Adrian Vetta

Abstract: Liquid democracy has a natural graphical representation, the delegation graph. Consequently, the strategic aspects of liquid democracy can be studied as a game over delegation graphs, called the liquid democracy game. Our main result is that this game has bicriteria approximation guarantees, in terms of both rationality and social welfare. Specifically, we prove the price of stability for $\epsilon$-Nash equilibria is exactly $\epsilon$ in the liquid democracy game.

Submitted 22 November, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

arXiv:2103.08183 [pdf, other]

doi 10.1016/j.neunet.2022.02.026

A Whole Brain Probabilistic Generative Model: Toward Realizing Cognitive Architectures for Developmental Robots

Authors: Tadahiro Taniguchi, Hiroshi Yamakawa, Takayuki Nagai, Kenji Doya, Masamichi Sakagami, Masahiro Suzuki, Tomoaki Nakamura, Akira Taniguchi

Abstract: Building a humanlike integrative artificial cognitive system, that is, an artificial general intelligence (AGI), is the holy grail of the artificial intelligence (AI) field. Furthermore, a computational model that enables an artificial system to achieve cognitive development will be an excellent reference for brain and cognitive science. This paper describes an approach to develop a cognitive architecture by integrating elemental cognitive modules to enable the training of the modules as a whole. This approach is based on two ideas: (1) brain-inspired AI, learning human brain architecture to build human-level intelligence, and (2) a probabilistic generative model(PGM)-based cognitive system to develop a cognitive system for developmental robots by integrating PGMs. The development framework is called a whole brain PGM (WB-PGM), which differs fundamentally from existing cognitive architectures in that it can learn continuously through a system based on sensory-motor information. In this study, we describe the rationale of WB-PGM, the current status of PGM-based elemental cognitive modules, their relationship with the human brain, the approach to the integration of the cognitive modules, and future challenges. Our findings can serve as a reference for brain studies. As PGMs describe explicit informational relationships between variables, this description provides interpretable guidance from computational sciences to brain science.
By providing such information, researchers in neuroscience can provide feedback to researchers in AI and robotics on what the current models lack with reference to the brain. Further, it can facilitate collaboration among researchers in neuro-cognitive sciences as well as AI and robotics.

Submitted 9 January, 2022; v1 submitted 15 March, 2021; originally announced March 2021.

Comments: 62 pages, 9 figures, submitted to Neural Networks

Journal ref: Neural Networks, 2022, Volume 150, 293-312

arXiv:2101.00133 [pdf, other]

NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

Authors: Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, et al. (28 additional authors not shown)

Abstract: We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage contestants to explore the trade-off between storing retrieval corpora or the parameters of learned models. In this report, we describe the motivation and organization of the competition, review the best submissions, and analyze system predictions to inform a discussion of evaluation for open-domain QA.

Submitted 19 September, 2021; v1 submitted 31 December, 2020; originally announced January 2021.

Comments: 26 pages; Published in Proceedings of Machine Learning Research (PMLR), NeurIPS 2020 Competition and Demonstration Track

arXiv:2005.12505 [pdf, other]

How Many Freemasons Are There? The Consensus Voting Mechanism in Metric Spaces

Authors: Mashbat Suzuki, Adrian Vetta

Abstract: We study the evolution of a social group when admission to the group is determined via consensus or unanimity voting. In each time period, two candidates apply for membership and a candidate is selected if and only if all the current group members agree. We apply the spatial theory of voting where group members and candidates are located in a metric space and each member votes for its closest (most similar) candidate. Our interest focuses on the expected cardinality of the group after $T$ time periods. To evaluate this we study the geometry inherent in dynamic consensus voting over a metric space. This allows us to develop a set of techniques for lower bounding and upper bounding the expected cardinality of a group. We specialize these methods for two-dimensional metric spaces. For the unit ball the expected cardinality of the group after $T$ time periods is $\Theta(T^{1/8})$. In sharp contrast, for the unit square the expected cardinality is at least $\Omega(\ln T)$ but at most $O(\ln T \cdot \ln\ln T)$.

Submitted 25 May, 2020; originally announced May 2020.

arXiv:2003.03526 [pdf, ps, other]

Convergence of Q-value in case of Gaussian rewards

Authors: Konatsu Miyamoto, Masaya Suzuki, Yuma Kigami, Kodai Satake

Abstract: In this paper, as a study of reinforcement learning, we converge the Q function to unbounded rewards such as Gaussian distribution. From the central limit theorem, in some real-world applications it is natural to assume that rewards follow a Gaussian distribution, but existing proofs cannot guarantee convergence of the Q-function. Furthermore, in the distribution-type reinforcement learning and Bayesian reinforcement learning that have become popular in recent years, it is better to allow the reward to have a Gaussian distribution. Therefore, in this paper, we prove the convergence of the Q-function under the condition of $E[r(s,a)^2]<\infty$, which is much more relaxed than the existing research. Finally, as a bonus, a proof of the policy gradient theorem for distributed reinforcement learning is also posted.

Submitted 7 March, 2020; originally announced March 2020.

Comments: 10 pages

arXiv:1912.02797 [pdf, ps, other]

One Dollar Each Eliminates Envy

Authors: Johannes Brustle, Jack Dippel, Vishnu V. Narayan, Mashbat Suzuki, Adrian Vetta

Abstract: We study the fair division of a collection of $m$ indivisible goods amongst a set of $n$ agents. Whilst envy-free allocations typically do not exist in the indivisible goods setting, envy-freeness can be achieved if some amount of a divisible good (money) is introduced. Specifically, Halpern and Shah (SAGT 2019, pp.374-389) showed that, given additive valuation functions where the marginal value of each item is at most one dollar for each agent, there always exists an envy-free allocation requiring a subsidy of at most $(n-1)\cdot m$ dollars. The authors also conjectured that a subsidy of $n-1$ dollars is sufficient for additive valuations. We prove this conjecture. In fact, a subsidy of at most one dollar per agent is sufficient to guarantee the existence of an envy-free allocation. Further, we prove that for general monotonic valuation functions an envy-free allocation always exists with a subsidy of at most $2(n-1)$ dollars per agent. In particular, the total subsidy required for monotonic valuations is independent of the number of items.

Submitted 5 December, 2019; originally announced December 2019.

arXiv:1910.08918 [pdf, other]

doi 10.1007/s00354-019-00084-w

Neuro-SERKET: Development of Integrative Cognitive System through the Composition of Deep Probabilistic Generative Models

Authors: Tadahiro Taniguchi, Tomoaki Nakamura, Masahiro Suzuki, Ryo Kuniyasu, Kaede Hayashi, Akira Taniguchi, Takato Horii, Takayuki Nagai

Abstract: This paper describes a framework for the development of an integrative cognitive system based on probabilistic generative models (PGMs) called Neuro-SERKET. Neuro-SERKET is an extension of SERKET, which can compose elemental PGMs developed in a distributed manner and provide a scheme that allows the composed PGMs to learn throughout the system in an unsupervised way. In addition to the head-to-tail connection supported by SERKET, Neuro-SERKET supports tail-to-tail and head-to-head connections, as well as neural network-based modules, i.e., deep generative models. As an example of a Neuro-SERKET application, an integrative model was developed by composing a variational autoencoder (VAE), a Gaussian mixture model (GMM), latent Dirichlet allocation (LDA), and automatic speech recognition (ASR). The model is called VAE+GMM+LDA+ASR. The performance of VAE+GMM+LDA+ASR and the validity of Neuro-SERKET were demonstrated through a multimodal categorization task using image data and a speech signal of numerical digits.

Submitted 29 January, 2020; v1 submitted 20 October, 2019; originally announced October 2019.

Comments: New Gener. Comput. (2020)

Journal ref: New Generation Computing, 2020, volume 38, 23--48

arXiv:1904.13258 [pdf, other]

doi 10.1109/ICASSP.2019.8683211

English Broadcast News Speech Recognition by Humans and Machines

Authors: Samuel Thomas, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltan Tuske, George Saon, Brian Kingsbury, Michael Picheny, Tom Dibert, Alice Kaiser-Schatzlein, Bern Samko

Abstract: With recent advances in deep learning, considerable attention has been given to achieving automatic speech recognition performance close to human performance on tasks like conversational telephone speech (CTS) recognition. In this paper we evaluate the usefulness of these proposed techniques on broadcast news (BN), a similar challenging task. We also perform a set of recognition measurements to understand how close the achieved automatic speech recognition results are to human performance on this task. On two publicly available BN test sets, DEV04F and RT04, our speech recognition system using LSTM and residual network based acoustic models with a combination of n-gram and neural network language models performs at 6.5% and 5.9% word error rate. By achieving new performance milestones on these test sets, our experiments show that techniques developed on other related tasks, like CTS, can be transferred to achieve similar performance. In contrast, the best measured human recognition performance on these test sets is much lower, at 3.6% and 2.8% respectively, indicating that there is still room for new techniques and improvements in this space, to reach human performance levels.

Submitted 30 April, 2019; originally announced April 2019.

Comments: ©2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:1801.08702 [pdf, ps, other]

Improving Bi-directional Generation between Different Modalities with Variational Autoencoders

Authors: Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo

Abstract: We investigate deep generative models that can exchange multiple modalities bi-directionally, e.g., generating images from corresponding texts and vice versa. A major approach to achieve this objective is to train a model that integrates all the information of different modalities into a joint representation and then to generate one modality from the corresponding other modality via this joint representation. We simply applied this approach to variational autoencoders (VAEs), which we call a joint multimodal variational autoencoder (JMVAE). However, we found that when this model attempts to generate a large dimensional modality missing at the input, the joint representation collapses and this modality cannot be generated successfully. Furthermore, we confirmed that this difficulty cannot be resolved even using a known solution. Therefore, in this study, we propose two models to prevent this difficulty: JMVAE-kl and JMVAE-h. Results of our experiments demonstrate that these methods can prevent the difficulty above and that they generate modalities bi-directionally with equal or higher likelihood than conventional VAE methods, which generate in only one direction. Moreover, we confirm that these methods can obtain the joint representation appropriately, so that they can generate various variations of modality by moving over the joint representation or changing the value of another modality.

Submitted 26 January, 2018; originally announced January 2018.

Comments: Updated version of arXiv:1611.01891

arXiv:1611.08459 [pdf, other]

Neural Machine Translation with Latent Semantic of Image and Text

Authors: Joji Toyama, Masanori Misono, Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo

Abstract: Although attention-based Neural Machine Translation have achieved great success, attention-mechanism cannot capture the entire meaning of the source sentence because the attention mechanism generates a target word depending heavily on the relevant parts of the source sentence. The report of earlier studies has introduced a latent variable to capture the entire meaning of sentence and achieved improvement on attention-based Neural Machine Translation. We follow this approach and we believe that the capturing meaning of sentence benefits from image information because human beings understand the meaning of language not only from textual information but also from perceptual information such as that gained from vision. As described herein, we propose a neural machine translation model that introduces a continuous latent variable containing an underlying semantic extracted from texts and images. Our model, which can be trained end-to-end, requires image information only when training. Experiments conducted with an English--German translation task show that our model outperforms over the baseline.

Submitted 25 November, 2016; originally announced November 2016.

arXiv:1611.01891 [pdf, ps, other]

Joint Multimodal Learning with Deep Generative Models

Authors: Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo

Abstract: We investigate deep generative models that can exchange multiple modalities bi-directionally, e.g., generating images from corresponding texts and vice versa. Recently, some studies handle multiple modalities on deep generative models, such as variational autoencoders (VAEs). However, these models typically assume that modalities are forced to have a conditioned relation, i.e., we can only generate modalities in one direction. To achieve our objective, we should extract a joint representation that captures high-level concepts among all modalities and through which we can exchange them bi-directionally. As described herein, we propose a joint multimodal variational autoencoder (JMVAE), in which all modalities are independently conditioned on joint representation. In other words, it models a joint distribution of modalities. Furthermore, to be able to generate missing modalities from the remaining modalities properly, we develop an additional method, JMVAE-kl, that is trained by reducing the divergence between JMVAE's encoder and prepared networks of respective modalities. Our experiments show that our proposed method can obtain appropriate joint representation from multiple modalities and that it can generate and reconstruct them more properly than conventional VAEs. We further demonstrate that JMVAE can generate multiple modalities bi-directionally.

Submitted 6 November, 2016; originally announced November 2016.

arXiv:1411.3793 [pdf, ps, other]

doi 10.4204/EPTCS.168.4

A Language Support for Exhaustive Fault-Injection in Message-Passing System Models

Authors: Masaya Suzuki, Takuo Watanabe

Abstract: This paper presents an approach towards specifying and verifying adaptive distributed systems. We here take fault-handling as an example of adaptive behavior and propose a modeling language Sandal for describing fault-prone message-passing systems. One of the unique mechanisms of the language is a linguistic support for abstracting typical faults such as unexpected termination of processes and random loss of messages. The Sandal compiler translates a model into a set of NuSMV modules. During the compilation process, faults specified in the model will be woven into the output. One can thus enjoy full-automatic exhaustive fault-injection without writing faulty behaviors explicitly. We demonstrate the advantage of the language by verifying a model of the two-phase commit protocol under faulty environment.

Submitted 13 November, 2014; originally announced November 2014.

Comments: In Proceedings MOD* 2014, arXiv:1411.3453

ACM Class: D 2.4

Journal ref: EPTCS 168, 2014, pp. 45-58

Showing 1–48 of 48 results for author: Suzuki, M