Search | arXiv e-print repository

Decentralized multi-agent reinforcement learning algorithm using a cluster-synchronized laser network

Authors: Shun Kotoku, Takatomo Mihana, André Röhm, Ryoichi Horisaki

Abstract: Multi-agent reinforcement learning (MARL) studies crucial principles that are applicable to a variety of fields, including wireless networking and autonomous driving. We propose a photonic-based decision-making algorithm to address one of the most fundamental problems in MARL, called the competitive multi-armed bandit (CMAB) problem. Our numerical simulations demonstrate that chaotic oscillations… ▽ More Multi-agent reinforcement learning (MARL) studies crucial principles that are applicable to a variety of fields, including wireless networking and autonomous driving. We propose a photonic-based decision-making algorithm to address one of the most fundamental problems in MARL, called the competitive multi-armed bandit (CMAB) problem. Our numerical simulations demonstrate that chaotic oscillations and cluster synchronization of optically coupled lasers, along with our proposed decentralized coupling adjustment, efficiently balance exploration and exploitation while facilitating cooperative decision-making without explicitly sharing information among agents. Our study demonstrates how decentralized reinforcement learning can be achieved by exploiting complex physical processes controlled by simple algorithms. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: 16 pages, 8 figures

arXiv:2312.02537 [pdf, other]

Asymmetric leader-laggard cluster synchronization for collective decision-making with laser network

Authors: Shun Kotoku, Takatomo Mihana, André Röhm, Ryoichi Horisaki, Makoto Naruse

Abstract: Photonic accelerators have recently attracted soaring interest, harnessing the ultimate nature of light for information processing. Collective decision-making with a laser network, employing the chaotic and synchronous dynamics of optically interconnected lasers to address the competitive multi-armed bandit (CMAB) problem, is a highly compelling approach due to its scalability and experimental fea… ▽ More Photonic accelerators have recently attracted soaring interest, harnessing the ultimate nature of light for information processing. Collective decision-making with a laser network, employing the chaotic and synchronous dynamics of optically interconnected lasers to address the competitive multi-armed bandit (CMAB) problem, is a highly compelling approach due to its scalability and experimental feasibility. We investigated essential network structures for collective decision-making through quantitative stability analysis. Moreover, we demonstrated the asymmetric preferences of players in the CMAB problem, extending its functionality to more practical applications. Our study highlights the capability and significance of machine learning built upon chaotic lasers and photonic devices. △ Less

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2307.15373 [pdf, other]

doi 10.1038/s41598-024-54491-1

Conflict-free joint decision by lag and zero-lag synchronization in laser network

Authors: Hisako Ito, Takatomo Mihana, Ryoichi Horisaki, Makoto Naruse

Abstract: With the end of Moore's Law and the increasing demand for computing, photonic accelerators are garnering considerable attention. This is due to the physical characteristics of light, such as high bandwidth and multiplicity, and the various synchronization phenomena that emerge in the realm of laser physics. These factors come into play as computer performance approaches its limits. In this study,… ▽ More With the end of Moore's Law and the increasing demand for computing, photonic accelerators are garnering considerable attention. This is due to the physical characteristics of light, such as high bandwidth and multiplicity, and the various synchronization phenomena that emerge in the realm of laser physics. These factors come into play as computer performance approaches its limits. In this study, we explore the application of a laser network, acting as a photonic accelerator, to the competitive multi-armed bandit problem. In this context, conflict avoidance is key to maximizing environmental rewards. We experimentally demonstrate cooperative decision-making using zero-lag and lag synchronization within a network of four semiconductor lasers. Lag synchronization of chaos realizes effective decision-making and zero-delay synchronization is responsible for the realization of the collision avoidance function. We experimentally verified a low collision rate and high reward in a fundamental 2-player, 2-slot scenario, and showed the scalability of this system. This system architecture opens up new possibilities for intelligent functionalities in laser dynamics. △ Less

Submitted 28 July, 2023; originally announced July 2023.

Journal ref: Sci Rep 14, 4355 (2024)

arXiv:2304.10118 [pdf, ps, other]

doi 10.3390/e25060843

Bandit Algorithm Driven by a Classical Random Walk and a Quantum Walk

Authors: Tomoki Yamagami, Etsuo Segawa, Takatomo Mihana, André Röhm, Ryoichi Horisaki, Makoto Naruse

Abstract: Quantum walks (QWs) have a property that classical random walks (RWs) do not possess -- the coexistence of linear spreading and localization -- and this property is utilized to implement various kinds of applications. This paper proposes RW- and QW-based algorithms for multi-armed-bandit (MAB) problems. We show that, under some settings, the QW-based model realizes higher performance than the corr… ▽ More Quantum walks (QWs) have a property that classical random walks (RWs) do not possess -- the coexistence of linear spreading and localization -- and this property is utilized to implement various kinds of applications. This paper proposes RW- and QW-based algorithms for multi-armed-bandit (MAB) problems. We show that, under some settings, the QW-based model realizes higher performance than the corresponding RW-based one by associating the two operations that make MAB problems difficult -- exploration and exploitation -- with these two behaviors of QWs. △ Less

Submitted 25 May, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: 21 pages, 11 figures

Journal ref: Entropy, Vol. 25, Iss. 6, No. 843 (2023)

arXiv:2302.10761 [pdf, other]

doi 10.1063/5.0143846

Effect of temporal resolution on the reproduction of chaotic dynamics via reservoir computing

Authors: Kohei Tsuchiyama, André Röhm, Takatomo Mihana, Ryoichi Horisaki, Makoto Naruse

Abstract: Reservoir computing is a machine learning paradigm that uses a structure called a reservoir, which has nonlinearities and short-term memory. In recent years, reservoir computing has expanded to new functions such as the autonomous generation of chaotic time series, as well as time series prediction and classification. Furthermore, novel possibilities have been demonstrated, such as inferring the e… ▽ More Reservoir computing is a machine learning paradigm that uses a structure called a reservoir, which has nonlinearities and short-term memory. In recent years, reservoir computing has expanded to new functions such as the autonomous generation of chaotic time series, as well as time series prediction and classification. Furthermore, novel possibilities have been demonstrated, such as inferring the existence of previously unseen attractors. Sampling, in contrast, has a strong influence on such functions. Sampling is indispensable in a physical reservoir computer that uses an existing physical system as a reservoir because the use of an external digital system for the data input is usually inevitable. This study analyzes the effect of sampling on the ability of reservoir computing to autonomously regenerate chaotic time series. We found, as expected, that excessively coarse sampling degrades the system performance, but also that excessively dense sampling is unsuitable. Based on quantitative indicators that capture the local and global characteristics of attractors, we identify a suitable window of the sampling frequency and discuss its underlying mechanisms. △ Less

Submitted 22 February, 2023; v1 submitted 27 January, 2023; originally announced February 2023.

arXiv:2212.09926 [pdf, other]

Bandit approach to conflict-free multi-agent Q-learning in view of photonic implementation

Authors: Hiroaki Shinkawa, Nicolas Chauvet, André Röhm, Takatomo Mihana, Ryoichi Horisaki, Guillaume Bachelier, Makoto Naruse

Abstract: Recently, extensive studies on photonic reinforcement learning to accelerate the process of calculation by exploiting the physical nature of light have been conducted. Previous studies utilized quantum interference of photons to achieve collective decision-making without choice conflicts when solving the competitive multi-armed bandit problem, a fundamental example of reinforcement learning. Howev… ▽ More Recently, extensive studies on photonic reinforcement learning to accelerate the process of calculation by exploiting the physical nature of light have been conducted. Previous studies utilized quantum interference of photons to achieve collective decision-making without choice conflicts when solving the competitive multi-armed bandit problem, a fundamental example of reinforcement learning. However, the bandit problem deals with a static environment where the agent's action does not influence the reward probabilities. This study aims to extend the conventional approach to a more general multi-agent reinforcement learning targeting the grid world problem. Unlike the conventional approach, the proposed scheme deals with a dynamic environment where the reward changes because of agents' actions. A successful photonic reinforcement learning scheme requires both a photonic system that contributes to the quality of learning and a suitable algorithm. This study proposes a novel learning algorithm, discontinuous bandit Q-learning, in view of a potential photonic implementation. Here, state-action pairs in the environment are regarded as slot machines in the context of the bandit problem and an updated amount of Q-value is regarded as the reward of the bandit problem. We perform numerical simulations to validate the effectiveness of the bandit algorithm. In addition, we propose a multi-agent architecture in which agents are indirectly connected through quantum interference of light and quantum principles ensure the conflict-free property of state-action pair selections among agents. We demonstrate that multi-agent reinforcement learning can be accelerated owing to conflict avoidance among multiple agents. △ Less

Submitted 19 December, 2022; originally announced December 2022.

Comments: 19 pages, 8 figures, 1 table

arXiv:2211.01661 [pdf, other]

doi 10.3390/e25010146

Pairing optimization via statistics: Algebraic structure in pairing problems and its application to performance enhancement

Authors: Naoki Fujita, André Röhm, Takatomo Mihana, Ryoichi Horisaki, Aohan Li, Mikio Hasegawa, Makoto Naruse

Abstract: Fully pairing all elements of a set while attempting to maximize the total benefit is a combinatorically difficult problem. Such pairing problems naturally appear in various situations in science, technology, economics, and other fields. In our previous study, we proposed an efficient method to infer the underlying compatibilities among the entities, under the constraint that only the total compat… ▽ More Fully pairing all elements of a set while attempting to maximize the total benefit is a combinatorically difficult problem. Such pairing problems naturally appear in various situations in science, technology, economics, and other fields. In our previous study, we proposed an efficient method to infer the underlying compatibilities among the entities, under the constraint that only the total compatibility is observable. Furthermore, by transforming the pairing problem into a traveling salesman problem with a multi-layer architecture, a pairing optimization algorithm was successfully demonstrated to derive a high-total-compatibility pairing. However, there is substantial room for further performance enhancement by further exploiting the underlying mathematical properties. In this study, we prove the existence of algebraic structures in the pairing problem. We transform the initially estimated compatibility information into an equivalent form where the variance of the individual compatibilities is minimized. We then demonstrate that the total compatibility obtained when using the heuristic pairing algorithm on the transformed problem is significantly higher compared to the previous method. With this improved perspective on the pairing problem using fundamental mathematical properties, we can contribute to practical applications such as wireless communications beyond 5G, where efficient pairing is of critical importance. △ Less

Submitted 3 November, 2022; originally announced November 2022.

arXiv:2210.06976 [pdf]

Parallel photonic accelerator for decision making using optical spatiotemporal chaos

Authors: Kensei Morijiri, Kento Takehana, Takatomo Mihana, Kazutaka Kanno, Makoto Naruse, Atsushi Uchida

Abstract: Photonic accelerators have attracted increasing attention in artificial intelligence applications. The multi-armed bandit problem is a fundamental problem of decision making using reinforcement learning. However, the scalability of photonic decision making has not yet been demonstrated in experiments, owing to technical difficulties in physical realization. We propose a parallel photonic decision-… ▽ More Photonic accelerators have attracted increasing attention in artificial intelligence applications. The multi-armed bandit problem is a fundamental problem of decision making using reinforcement learning. However, the scalability of photonic decision making has not yet been demonstrated in experiments, owing to technical difficulties in physical realization. We propose a parallel photonic decision-making system for solving large-scale multi-armed bandit problems using optical spatiotemporal chaos. We solve a 512-armed bandit problem online, which is much larger than previous experiments by two orders of magnitude. The scaling property for correct decision making is examined as a function of the number of slot machines, evaluated as an exponent of 0.86. This exponent is smaller than that in previous work, indicating the superiority of the proposed parallel principle. This experimental demonstration facilitates photonic decision making to solve large-scale multi-armed bandit problems for future photonic accelerators. △ Less

Submitted 12 October, 2022; originally announced October 2022.

Comments: 20 pages, 6 figures

arXiv:2209.02943 [pdf, ps, other]

doi 10.1103/PhysRevA.107.012222

Skeleton structure inherent in discrete-time quantum walks

Authors: Tomoki Yamagami, Etsuo Segawa, Ken'ichiro Tanaka, Takatomo Mihana, André Röhm, Ryoichi Horisaki, Makoto Naruse

Abstract: In this paper, we claim that a common underlying structure--a skeleton structure--is present behind discrete-time quantum walks (QWs) on a one-dimensional lattice with a homogeneous coin matrix. This skeleton structure is independent of the initial state, and partially, even of the coin matrix. This structure is best interpreted in the context of quantum-walk-replicating random walks (QWRWs), i.e.… ▽ More In this paper, we claim that a common underlying structure--a skeleton structure--is present behind discrete-time quantum walks (QWs) on a one-dimensional lattice with a homogeneous coin matrix. This skeleton structure is independent of the initial state, and partially, even of the coin matrix. This structure is best interpreted in the context of quantum-walk-replicating random walks (QWRWs), i.e., random walks that replicate the probability distribution of quantum walks, where this newly found structure acts as a simplified formula for the transition probability. Additionally, we construct a random walk whose transition probabilities are defined by the skeleton structure and demonstrate that the resultant properties of the walkers are similar to both the original QWs and QWRWs. △ Less

Submitted 2 February, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

Comments: 26 pages, 9 figures

Journal ref: Physical Review A, Vol. 107, Issue 1, 012222 (2023)

arXiv:2208.03082 [pdf, other]

doi 10.1103/PhysRevApplied.18.064018

Conflict-free joint sampling for preference satisfaction through quantum interference

Authors: Hiroaki Shinkawa, Nicolas Chauvet, André Röhm, Takatomo Mihana, Ryoichi Horisaki, Guillaume Bachelier, Makoto Naruse

Abstract: Collective decision-making is vital for recent information and communications technologies. In our previous research, we mathematically derived conflict-free joint decision-making that optimally satisfies players' probabilistic preference profiles. However, two problems exist regarding the optimal joint decision-making method. First, as the number of choices increases, the computational cost of ca… ▽ More Collective decision-making is vital for recent information and communications technologies. In our previous research, we mathematically derived conflict-free joint decision-making that optimally satisfies players' probabilistic preference profiles. However, two problems exist regarding the optimal joint decision-making method. First, as the number of choices increases, the computational cost of calculating the optimal joint selection probability matrix explodes. Second, to derive the optimal joint selection probability matrix, all players must disclose their probabilistic preferences. Now, it is noteworthy that explicit calculation of the joint probability distribution is not necessarily needed; what is necessary for collective decisions is sampling. This study examines several sampling methods that converge to heuristic joint selection probability matrices that satisfy players' preferences. We show that they can significantly reduce the above problems of computational cost and confidentiality. We analyze the probability distribution each of the sampling methods converges to, as well as the computational cost required and the confidentiality secured. In particular, we introduce two conflict-free joint sampling methods through quantum interference of photons. The first system allows the players to hide their choices while satisfying the players' preferences almost perfectly when they have the same preferences. The second system, where the physical nature of light replaces the expensive computational cost, also conceals their choices under the assumption that they have a trusted third party. This paper has been published in Phys. Rev. Applied 18, 064018 (2022) (DOI: 10.1103/PhysRevApplied.18.064018). △ Less

Submitted 8 December, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

Comments: 14 pages, 6 figures

Journal ref: Phys. Rev. Appl. 18, 064018 (2022)

arXiv:2205.05987 [pdf]

Controlling chaotic itinerancy in laser dynamics for reinforcement learning

Authors: Ryugo Iwami, Takatomo Mihana, Kazutaka Kanno, Satoshi Sunada, Makoto Naruse, Atsushi Uchida

Abstract: Photonic artificial intelligence has attracted considerable interest in accelerating machine learning; however, the unique optical properties have not been fully utilized for achieving higher-order functionalities. Chaotic itinerancy, with its spontaneous transient dynamics among multiple quasi-attractors, can be employed to realize brain-like functionalities. In this paper, we propose a method fo… ▽ More Photonic artificial intelligence has attracted considerable interest in accelerating machine learning; however, the unique optical properties have not been fully utilized for achieving higher-order functionalities. Chaotic itinerancy, with its spontaneous transient dynamics among multiple quasi-attractors, can be employed to realize brain-like functionalities. In this paper, we propose a method for controlling the chaotic itinerancy in a multi-mode semiconductor laser to solve a machine learning task, known as the multi-armed bandit problem, which is fundamental to reinforcement learning. The proposed method utilizes ultrafast chaotic itinerant motion in mode competition dynamics controlled via optical injection. We found that the exploration mechanism is completely different from a conventional searching algorithm and is highly scalable, outperforming the conventional approaches for large-scale bandit problems. This study paves the way to utilize chaotic itinerancy for effectively solving complex machine learning tasks as photonic hardware accelerators. △ Less

Submitted 12 May, 2022; originally announced May 2022.

Comments: 22 pages, 9 figures, 1 table

arXiv:1803.09425 [pdf]

Scalable photonic reinforcement learning by time-division multiplexing of laser chaos

Authors: Makoto Naruse, Takatomo Mihana, Hirokazu Hori, Hayato Saigo, Kazuya Okamura, Mikio Hasegawa, Atsushi Uchida

Abstract: Reinforcement learning involves decision making in dynamic and uncertain environments and constitutes a crucial element of artificial intelligence. In our previous work, we experimentally demonstrated that the ultrafast chaotic oscillatory dynamics of lasers can be used to solve the two-armed bandit problem efficiently, which requires decision making concerning a class of difficult trade-offs call… ▽ More Reinforcement learning involves decision making in dynamic and uncertain environments and constitutes a crucial element of artificial intelligence. In our previous work, we experimentally demonstrated that the ultrafast chaotic oscillatory dynamics of lasers can be used to solve the two-armed bandit problem efficiently, which requires decision making concerning a class of difficult trade-offs called the exploration-exploitation dilemma. However, only two selections were employed in that research; thus, the scalability of the laser-chaos-based reinforcement learning should be clarified. In this study, we demonstrated a scalable, pipelined principle of resolving the multi-armed bandit problem by introducing time-division multiplexing of chaotically oscillated ultrafast time-series. The experimental demonstrations in which bandit problems with up to 64 arms were successfully solved are presented in this report. Detailed analyses are also provided that include performance comparisons among laser chaos signals generated in different physical conditions, which coincide with the diffusivity inherent in the time series. This study paves the way for ultrafast reinforcement learning by taking advantage of the ultrahigh bandwidths of light wave and practical enabling technologies. △ Less

Submitted 26 March, 2018; originally announced March 2018.

Showing 1–12 of 12 results for author: Mihana, T