Application of the Policy Gradient Method and Neural Fitted Q Iteration to the Hanafuda Game Koi-Koi

Naoyuki Sato, Kokolo Ikeda

Koi-Koi, played with Hanafuda cards, is a traditional Japanese card game classified as a two-player, turn-based, zero-sum game with imperfect information. Although it is widely played across many media, it has received little research attention, and we know of no artificial player reported to match strong human players. We therefore attempted to build a strong Koi-Koi player using two reinforcement learning methods: the policy gradient method and Neural Fitted Q Iteration. Each method used an artificial neural network, taking 268 low-level board features as input, to estimate state-action values, and learned its parameters through repeated games against a simple rule-based artificial player that we constructed. Over 1,000 games, the policy gradient player gained an average of -0.3 points per game from this opponent, and the Neural Fitted Q Iteration player gained an average of 0.5 points per game.

identifier:https://dspace.jaist.ac.jp/dspace/handle/10119/16089
