{{short description|Artificial intelligence that plays Go}}
{{use dmy dates|date=October 2017}}
'''AlphaGo Zero''' is a version of [[DeepMind]]'s [[Go software]] [[AlphaGo]]. AlphaGo's team published an article in the journal ''[[Nature (journal)|Nature]]'' on 19 October 2017, introducing AlphaGo Zero, a version created without using data from human games, and stronger than any previous version.<ref name="Nature2017">{{cite journal|first1= David|last1= Silver|author-link1= David Silver (programmer)|first2= Julian|last2= Schrittwieser|first3= Karen|last3= Simonyan|first4= Ioannis|last4= Antonoglou|first5= Aja|last5= Huang|author-link5= Aja Huang|first6= Arthur|last6= Guez|first7= Thomas|last7= Hubert|first8= Lucas|last8= Baker|first9= Matthew|last9= Lai|first10= Adrian|last10= Bolton|first11= Yutian|last11= Chen|author-link11= Chen Yutian|first12= Timothy|last12= Lillicrap|first13= Hui|last13= Fan|author-link13= Fan Hui|first14= Laurent|last14= Sifre|first15= George van den|last15= Driessche|first16= Thore|last16= Graepel|first17= Demis|last17= Hassabis|author-link17= Demis Hassabis|title= Mastering the game of Go without human knowledge|journal= [[Nature (journal)|Nature]]|issn= 0028-0836|pages= 354–359|volume= 550|issue= 7676|doi= 10.1038/nature24270|pmid= 29052630|date= 19 October 2017|bibcode= 2017Natur.550..354S|s2cid= 205261034|url= https://discovery.ucl.ac.uk/10045895/1/agz_unformatted_nature.pdf|access-date=17 June 2023}}{{closed access}}</ref> By playing games against itself, AlphaGo Zero surpassed the strength of [[AlphaGo Lee]] in three days by winning 100 games to 0, reached the level of [[AlphaGo Master]] in 21 days, and exceeded all the old versions in 40 days.<ref name="Deepmind20171018">{{cite web|first1=Demis|last1=Hassabis|author-link=Demis Hassabis|first2=David|last2=Silver|author-link2=David Silver (programmer)|url=https://deepmind.com/blog/alphago-zero-learning-scratch/|title=AlphaGo Zero: Learning from scratch|date=18 October 2017|publisher=[[DeepMind]] official website|accessdate=19 October 2017|archive-date=19 October 2017|archive-url=https://web.archive.org/web/20171019220157/https://deepmind.com/blog/alphago-zero-learning-scratch/|url-status=dead}}</ref>


Training [[artificial intelligence]] (AI) without datasets derived from human experts has significant implications for the development of AI with superhuman skills because expert data is "often expensive, unreliable or simply unavailable."<ref>{{cite web|url=https://finance.yahoo.com/news/google-apos-alphago-breakthrough-could-095332226.html|title=Google's New AlphaGo Breakthrough Could Take Algorithms Where No Humans Have Gone|publisher=[[Yahoo! Finance]]|date=19 October 2017|accessdate=19 October 2017|archive-date=19 October 2017|archive-url=https://web.archive.org/web/20171019220713/https://finance.yahoo.com/news/google-apos-alphago-breakthrough-could-095332226.html|url-status=live}}</ref> [[Demis Hassabis]], the co-founder and CEO of DeepMind, said that AlphaGo Zero was so powerful because it was "no longer constrained by the limits of human knowledge".<ref>{{cite news|url=https://www.telegraph.co.uk/science/2017/10/18/alphago-zero-google-deepmind-supercomputer-learns-3000-years/|title=AlphaGo Zero: Google DeepMind supercomputer learns 3,000 years of human knowledge in 40 days|journal=The Telegraph|date=18 October 2017|accessdate=19 October 2017|last1=Knapton|first1=Sarah|archive-date=19 October 2017|archive-url=https://web.archive.org/web/20171019111837/http://www.telegraph.co.uk/science/2017/10/18/alphago-zero-google-deepmind-supercomputer-learns-3000-years/|url-status=live}}</ref> Furthermore, AlphaGo Zero performed better than standard deep reinforcement learning models (such as DQN implementations<ref>{{Citation|last=mnj12|title=mnj12/chessDeepLearning|date=2021-07-07|url=https://github.com/mnj12/chessDeepLearning|access-date=2021-07-07}}</ref>) due to its integration of Monte Carlo tree search.
[[David Silver (programmer)|David Silver]], one of the first authors of DeepMind's papers published in ''Nature'' on AlphaGo, said that it is possible to have generalised AI algorithms by removing the need to learn from humans.<ref>{{cite web|url=https://www.zdnet.com/article/deepmind-alphago-zero-learns-on-its-own-without-meatbag-intervention/|title=DeepMind AlphaGo Zero learns on its own without meatbag intervention|publisher=[[ZDNet]]|date=18 October 2017|access-date=17 June 2023}}</ref>
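The Monte Carlo tree search mentioned above chooses which move to explore with the "PUCT" rule from the ''Nature'' paper: it maximizes Q(s,a) + U(s,a), where U is proportional to the network's prior and decays with visit count. The sketch below is an illustrative simplification (the function name and the tuple layout for child statistics are hypothetical, not DeepMind code):

```python
import math

def puct_select(children, c_puct=1.0):
    """Pick the child edge maximizing Q + U, as in AlphaGo Zero's
    MCTS selection step. Each child is a tuple (P, N, W):
    prior probability P, visit count N, total action value W.
    Q = W / N, and U = c_puct * P * sqrt(sum_b N_b) / (1 + N)."""
    total_visits = sum(n for _, n, _ in children)
    best, best_score = None, -float("inf")
    for i, (p, n, w) in enumerate(children):
        q = w / n if n > 0 else 0.0
        u = c_puct * p * math.sqrt(total_visits) / (1 + n)
        if q + u > best_score:
            best, best_score = i, q + u
    return best
```

The U term is what lets the search initially follow the network's prior and progressively shift toward the empirically best moves as visit counts grow, which is how the search "amplifies" the raw network policy.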


Google later developed [[AlphaZero]], a generalized version of AlphaGo Zero that could play [[chess]] and [[Computer shogi|Shōgi]] in addition to Go. In December 2017, AlphaZero beat the 3-day version of AlphaGo Zero by winning 60 games to 40, and with 8 hours of training it outperformed [[AlphaGo Lee]] on an [[Elo rating system|Elo scale]]. AlphaZero also defeated a top chess program ([[Stockfish (chess)|Stockfish]]) and a top Shōgi program ([[Elmo (shogi engine)|Elmo]]).<ref name="preprint">{{Cite arXiv|author-link1=David Silver (programmer)|first1=David|last1= Silver|first2=Thomas|last2= Hubert|first3= Julian|last3=Schrittwieser|first4= Ioannis|last4=Antonoglou |first5= Matthew|last5= Lai|first6= Arthur|last6= Guez|first7= Marc|last7= Lanctot|first8= Laurent|last8= Sifre|first9= Dharshan|last9= Kumaran|author-link9=Dharshan Kumaran|first10= Thore|last10= Graepel|first11= Timothy|last11= Lillicrap|first12= Karen|last12= Simonyan|first13=Demis |last13=Hassabis|author-link13=Demis Hassabis |eprint=1712.01815|title=Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm|class=cs.AI|date=5 December 2017}}</ref><ref>{{cite news|url=https://www.telegraph.co.uk/science/2017/12/06/entire-human-chess-knowledge-learned-surpassed-deepminds-alphazero/|title=Entire human chess knowledge learned and surpassed by DeepMind's AlphaZero in four hours|journal=The Telegraph|date=2017-12-06|last1=Knapton|first1=Sarah|last2=Watson|first2=Leon|access-date=5 April 2018|archive-date=2 December 2020|archive-url=https://web.archive.org/web/20201202114556/https://www.telegraph.co.uk/science/2017/12/06/entire-human-chess-knowledge-learned-surpassed-deepminds-alphazero/|url-status=live}}</ref>
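For context on the Elo comparison above: under the standard Elo model, a rating gap maps to an expected score through a logistic curve. A short helper illustrating the standard formula (not DeepMind code):

```python
def elo_expected(rating_a, rating_b):
    """Expected score of player A against player B under the
    standard Elo model (logistic curve on a 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
```

A 400-point gap corresponds to an expected score of roughly 0.91, so the large Elo margins reported between successive AlphaGo versions imply near-certain wins in head-to-head play.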
==Training==
AlphaGo Zero's neural network was trained using [[TensorFlow]], with 64 GPU workers and 19 CPU parameter servers.
Only four [[Tensor processing unit|TPU]]s were used for inference. The [[neural network]] initially knew nothing about [[Go (game)|Go]] beyond the [[rules of Go|rules]]. Unlike earlier versions of AlphaGo, Zero only perceived the board's stones, rather than having some rare human-programmed edge cases to help recognize unusual Go board positions. The AI engaged in [[reinforcement learning]], playing against itself until it could anticipate its own moves and how those moves would affect the game's outcome.<ref name="Scientific American">{{cite news|last1=Greenemeier|first1=Larry|title=AI versus AI: Self-Taught AlphaGo Zero Vanquishes Its Predecessor|url=https://www.scientificamerican.com/article/ai-versus-ai-self-taught-alphago-zero-vanquishes-its-predecessor/|accessdate=20 October 2017|work=Scientific American|language=en|archive-date=19 October 2017|archive-url=https://web.archive.org/web/20171019230611/https://www.scientificamerican.com/article/ai-versus-ai-self-taught-alphago-zero-vanquishes-its-predecessor/|url-status=live}}</ref> In the first three days AlphaGo Zero played 4.9 million games against itself in quick succession.<ref name=npr>{{cite news|title=Computer Learns To Play Go At Superhuman Levels 'Without Human Knowledge'|url=https://www.npr.org/sections/thetwo-way/2017/10/18/558519095/computer-learns-to-play-go-at-superhuman-levels-without-human-knowledge|accessdate=20 October 2017|work=NPR|date=18 October 2017|language=en|archive-date=20 October 2017|archive-url=https://web.archive.org/web/20171020075313/http://www.npr.org/sections/thetwo-way/2017/10/18/558519095/computer-learns-to-play-go-at-superhuman-levels-without-human-knowledge|url-status=live}}</ref> It appeared to develop the skills required to beat top humans within just a few days, whereas the earlier AlphaGo took months of training to achieve the same level.<ref>{{cite news|title=Google's New AlphaGo Breakthrough Could Take Algorithms Where No Humans Have 
Gone|url=http://fortune.com/2017/10/19/google-alphago-zero-deepmind-artificial-intelligence/|accessdate=20 October 2017|work=Fortune|date=19 October 2017|language=en|archive-date=19 October 2017|archive-url=https://web.archive.org/web/20171019202849/http://fortune.com/2017/10/19/google-alphago-zero-deepmind-artificial-intelligence/|url-status=live}}</ref>
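The self-play objective described above can be made concrete. Per the ''Nature'' paper, the network is trained so that its value head v matches the eventual game outcome z and its policy head p matches the visit distribution π produced by the tree search; a minimal sketch of that combined loss for one position (the paper's L2 weight-regularization term is omitted here):

```python
import math

def alphago_zero_loss(z, v, pi, p):
    """Per-position loss from the AlphaGo Zero paper: (z - v)^2 - pi . log(p).
    z: game outcome (+1 or -1), v: predicted value,
    pi: MCTS visit distribution over moves, p: predicted move probabilities."""
    value_loss = (z - v) ** 2
    policy_loss = -sum(pi_a * math.log(p_a) for pi_a, p_a in zip(pi, p))
    return value_loss + policy_loss
```

Gradient descent on this loss pulls the network toward the search's own improved policy and toward accurate outcome prediction, which is what allows training to proceed entirely from self-play without human games.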


For comparison, the researchers also trained a version of AlphaGo Zero using human games, AlphaGo Master, and found that it learned more quickly, but actually performed more poorly in the long run.<ref>{{cite news|title=This computer program can beat humans at Go—with no human instruction|url=https://www.science.org/content/article/computer-program-can-beat-humans-go-no-human-instruction|accessdate=20 October 2017|work=Science {{!}} AAAS|date=18 October 2017|language=en|archive-date=2 February 2022|archive-url=https://web.archive.org/web/20220202142717/https://www.science.org/content/article/computer-program-can-beat-humans-go-no-human-instruction|url-status=live}}</ref> DeepMind submitted its initial findings in a paper to ''Nature'' in April 2017, which was then published in October 2017.<ref name="Nature2017" />


==Reception==
AlphaGo Zero was widely regarded as a significant advance, even when compared with its groundbreaking predecessor, AlphaGo. [[Oren Etzioni]] of the [[Allen Institute for Artificial Intelligence]] called AlphaGo Zero "a very impressive technical result" in "both their ability to do it—and their ability to train the system in 40 days, on four TPUs".<ref name="Scientific American"/> [[The Guardian]] called it a "major breakthrough for artificial intelligence", citing Eleni Vasilaki of [[Sheffield University]] and Tom Mitchell of [[Carnegie Mellon University]], who called it an impressive feat and an "outstanding engineering accomplishment" respectively.<ref name=guardian>{{cite news|last1=Sample|first1=Ian|title='It's able to create knowledge itself': Google unveils AI that learns on its own|url=https://www.theguardian.com/science/2017/oct/18/its-able-to-create-knowledge-itself-google-unveils-ai-learns-all-on-its-own|accessdate=20 October 2017|work=The Guardian|date=18 October 2017|archive-date=19 October 2017|archive-url=https://web.archive.org/web/20171019213849/https://www.theguardian.com/science/2017/oct/18/its-able-to-create-knowledge-itself-google-unveils-ai-learns-all-on-its-own|url-status=live}}</ref> [[Mark Pesce]] of the University of Sydney called AlphaGo Zero "a big technological advance" taking us into "undiscovered territory".<ref>{{cite news|title=Google DeepMind's AI teaches itself to beat human players of complex Chinese game in three days|url=https://www.abc.net.au/news/2017-10-19/googles-new-ai-learns-without-human-input-to-win-complex-game/9065562|access-date=17 June 2023|work=[[Australian Broadcasting Corporation]]|date=19 October 2017}}</ref>


[[Gary Marcus]], a psychologist at [[New York University]], has cautioned that for all we know, AlphaGo may contain "implicit knowledge that the programmers have about how to construct machines to play problems like Go" and will need to be tested in other domains before being sure that its base architecture is effective at much more than playing Go. In contrast, DeepMind is "confident that this approach is generalisable to a large number of domains".<ref name=npr />
Mok has reportedly already begun analyzing the playing style of AlphaGo Zero along with players from the national team.
"Though having watched only a few matches, we received the impression that AlphaGo Zero plays more like a human than its predecessors," Mok said.<ref>{{cite news|title=Go Players Excited About 'More Humanlike' AlphaGo Zero|url=http://koreabizwire.com/go-players-excited-about-more-humanlike-alphago-zero/98282|accessdate=21 October 2017|work=[[Korea Bizwire]]|date=19 October 2017|language=en|archive-date=21 October 2017|archive-url=https://web.archive.org/web/20171021063517/http://koreabizwire.com/go-players-excited-about-more-humanlike-alphago-zero/98282|url-status=live}}</ref>
Chinese Go professional [[Ke Jie]] commented on the remarkable accomplishments of the new program: "A pure self-learning AlphaGo is the strongest. Humans seem redundant in front of its self-improvement."<ref>{{cite news|title=New version of AlphaGo can master Weiqi without human help|url=https://www.ecns.cn/2017/10-19/277691.shtml|access-date=17 June 2023|work=[[China News Service]]|date=19 October 2017|language=en}}</ref>


==Comparison with predecessors==


{| class="wikitable sortable" style="text-align:center"
|+Configuration and strength<ref name="sohu0524">{{cite web|url=https://www.sohu.com/a/143092581_473283|title=【柯洁战败解密】AlphaGo Master最新架构和算法,谷歌云与TPU拆解|date=24 May 2017|publisher=[[Sohu]]|language=zh|access-date=17 June 2023}}</ref>
!Versions
!Playing hardware<ref>Hardware used during training may be substantially more powerful</ref>
* {{cite web |url=https://deepmind.com/blog/alphago-zero-learning-scratch/ |title=AlphaGo Zero: Starting from scratch |archive-url=https://web.archive.org/web/20200103035714/https://deepmind.com/blog/article/alphago-zero-starting-scratch |archive-date=3 January 2020 |url-status=dead}}
* {{cite journal|pmid=29052631|title=AOP|journal=Nature|volume=550|issue=7676|pages=336–337|year=2017|last1=Singh|first1=S.|last2=Okun|first2=A.|last3=Jackson|first3=A.|doi=10.1038/550336a|bibcode=2017Natur.550..336S|s2cid=4447445|doi-access=free}}
* {{cite journal|doi=10.1038/nature24270|pmid=29052630|title=Mastering the game of Go without human knowledge|journal=Nature|volume=550|issue=7676|pages=354–359|year=2017|last1=Silver|first1=David|last2=Schrittwieser|first2=Julian|last3=Simonyan|first3=Karen|last4=Antonoglou|first4=Ioannis|last5=Huang|first5=Aja|last6=Guez|first6=Arthur|last7=Hubert|first7=Thomas|last8=Baker|first8=Lucas|last9=Lai|first9=Matthew|last10=Bolton|first10=Adrian|last11=Chen|first11=Yutian|last12=Lillicrap|first12=Timothy|last13=Hui|first13=Fan|last14=Sifre|first14=Laurent|last15=Van Den Driessche|first15=George|last16=Graepel|first16=Thore|last17=Hassabis|first17=Demis|bibcode=2017Natur.550..354S|s2cid=205261034|url=https://discovery.ucl.ac.uk/10045895/1/agz_unformatted_nature.pdf}}
* [https://www.alphago-games.com AlphaGo Zero Games]
* [https://www.reddit.com/r/MachineLearning/comments/76xjb5/ama_we_are_david_silver_and_julian_schrittwieser/ AMA on Reddit]



Revision as of 16:04, 17 June 2023

AlphaGo Zero is a version of DeepMind's Go software AlphaGo. AlphaGo's team published an article in the journal Nature on 19 October 2017, introducing AlphaGo Zero, a version created without using data from human games, and stronger than any previous version.[1] By playing games against itself, AlphaGo Zero surpassed the strength of AlphaGo Lee in three days by winning 100 games to 0, reached the level of AlphaGo Master in 21 days, and exceeded all the old versions in 40 days.[2]

Training artificial intelligence (AI) without datasets derived from human experts has significant implications for the development of AI with superhuman skills because expert data is "often expensive, unreliable or simply unavailable."[3] Demis Hassabis, the co-founder and CEO of DeepMind, said that AlphaGo Zero was so powerful because it was "no longer constrained by the limits of human knowledge".[4] Furthermore, AlphaGo Zero performed better than standard reinforcement deep learning models (such as DQN implementations[5]) due to its integration of Monte Carlo tree search. David Silver, one of the first authors of DeepMind's papers published in Nature on AlphaGo, said that it is possible to have generalised AI algorithms by removing the need to learn from humans.[6]

Google later developed AlphaZero, a generalized version of AlphaGo Zero that could play chess and shogi in addition to Go. In December 2017, AlphaZero beat the 3-day version of AlphaGo Zero by winning 60 games to 40, and with 8 hours of training it outperformed AlphaGo Lee on an Elo scale. AlphaZero also defeated a top chess program (Stockfish) and a top shogi program (Elmo).[7][8]

Training

AlphaGo Zero's neural network was trained using TensorFlow, with 64 GPU workers and 19 CPU parameter servers. Only four TPUs were used for inference. The neural network initially knew nothing about Go beyond the rules. Unlike earlier versions of AlphaGo, Zero perceived only the stones on the board, without the human-programmed rules that earlier versions used to recognize rare, unusual board positions. The AI engaged in reinforcement learning, playing against itself until it could anticipate its own moves and how those moves would affect the game's outcome.[9] In the first three days AlphaGo Zero played 4.9 million games against itself in quick succession.[10] It appeared to develop the skills required to beat top humans within just a few days, whereas the earlier AlphaGo took months of training to achieve the same level.[11]
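The self-play training signal described above can be made concrete with a small sketch. The helper below is hypothetical rather than DeepMind's code: it labels every position of a finished self-play game with the game's final outcome from the perspective of the player to move, which is the kind of target the value output of the network is trained to predict.

```python
def value_targets(num_positions: int, outcome: float) -> list[float]:
    """Label each position of a finished self-play game with the game's
    final outcome as seen by the player to move at that position.
    `outcome` is +1.0 if the first player won and -1.0 if they lost.
    Since the players alternate, the sign of the target flips every ply."""
    return [outcome if i % 2 == 0 else -outcome for i in range(num_positions)]

# A five-position game lost by the first player:
print(value_targets(5, -1.0))  # [-1.0, 1.0, -1.0, 1.0, -1.0]
```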

For comparison, the researchers also trained a version of AlphaGo Zero using human games, AlphaGo Master, and found that it learned more quickly, but actually performed more poorly in the long run.[12] DeepMind submitted its initial findings in a paper to Nature in April 2017, which was then published in October 2017.[1]

Hardware cost

The hardware cost for a single AlphaGo Zero system in 2017, including the four TPUs, has been quoted as around $25 million.[13]

Applications

According to Hassabis, AlphaGo's algorithms are likely to be of the most benefit to domains that require an intelligent search through an enormous space of possibilities, such as protein folding (see AlphaFold) or accurately simulating chemical reactions.[14] AlphaGo's techniques are probably less useful in domains that are difficult to simulate, such as learning how to drive a car.[15] DeepMind stated in October 2017 that it had already started active work on attempting to use AlphaGo Zero technology for protein folding, and stated it would soon publish new findings.[16][17]

Reception

AlphaGo Zero was widely regarded as a significant advance, even when compared with its groundbreaking predecessor, AlphaGo. Oren Etzioni of the Allen Institute for Artificial Intelligence called AlphaGo Zero "a very impressive technical result" in "both their ability to do it—and their ability to train the system in 40 days, on four TPUs".[9] The Guardian called it a "major breakthrough for artificial intelligence", citing Eleni Vasilaki of Sheffield University and Tom Mitchell of Carnegie Mellon University, who called it an impressive feat and an "outstanding engineering accomplishment", respectively.[15] Mark Pesce of the University of Sydney called AlphaGo Zero "a big technological advance" taking us into "undiscovered territory".[18]

Gary Marcus, a psychologist at New York University, has cautioned that for all we know, AlphaGo may contain "implicit knowledge that the programmers have about how to construct machines to play problems like Go" and will need to be tested in other domains before being sure that its base architecture is effective at much more than playing Go. In contrast, DeepMind is "confident that this approach is generalisable to a large number of domains".[10]

In response to the reports, South Korean Go professional Lee Sedol said, "The previous version of AlphaGo wasn't perfect, and I believe that's why AlphaGo Zero was made." On the potential for AlphaGo's development, Lee said he will have to wait and see but also said it will affect young Go players. Mok Jin-seok, who directs the South Korean national Go team, said the Go world has already been imitating the playing styles of previous versions of AlphaGo and creating new ideas from them, and he is hopeful that new ideas will come out from AlphaGo Zero. Mok also added that general trends in the Go world are now being influenced by AlphaGo's playing style. "At first, it was hard to understand and I almost felt like I was playing against an alien. However, having had a great amount of experience, I've become used to it," Mok said. "We are now past the point where we debate the gap between the capability of AlphaGo and humans. It's now between computers." Mok has reportedly already begun analyzing the playing style of AlphaGo Zero along with players from the national team. "Though having watched only a few matches, we received the impression that AlphaGo Zero plays more like a human than its predecessors," Mok said.[19] Chinese Go professional Ke Jie commented on the remarkable accomplishments of the new program: "A pure self-learning AlphaGo is the strongest. Humans seem redundant in front of its self-improvement."[20]

Comparison with predecessors

Configuration and strength[21]

Version                | Playing hardware[22]       | Elo rating      | Matches
AlphaGo Fan            | 176 GPUs,[2] distributed   | 3,144[1]        | 5:0 against Fan Hui
AlphaGo Lee            | 48 TPUs,[2] distributed    | 3,739[1]        | 4:1 against Lee Sedol
AlphaGo Master         | 4 TPUs,[2] single machine  | 4,858[1]        | 60:0 against professional players; Future of Go Summit
AlphaGo Zero (40 days) | 4 TPUs,[2] single machine  | 5,185[1]        | 100:0 against AlphaGo Lee; 89:11 against AlphaGo Master
AlphaZero (34 hours)   | 4 TPUs, single machine[7]  | 4,430 (est.)[7] | 60:40 against a 3-day AlphaGo Zero
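The Elo ratings in the table can be read as expected scores via the standard Elo formula, E_A = 1 / (1 + 10^((R_B - R_A) / 400)). This is a general property of Elo ratings, not anything specific to DeepMind's evaluation:

```python
def elo_expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

# AlphaGo Zero (5,185) against AlphaGo Lee (3,739): the 1,446-point gap
# gives an expected score above 0.999, consistent with the 100:0 result.
print(elo_expected_score(5185, 3739))
```

For scale, a 400-point gap already implies an expected score of about 0.91; the gaps in the table are several times larger.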

AlphaZero

On 5 December 2017, the DeepMind team released a preprint on arXiv introducing AlphaZero, a program that used a generalized version of AlphaGo Zero's approach and achieved, within 24 hours, a superhuman level of play in chess, shogi, and Go, defeating the world-champion programs Stockfish and Elmo as well as a 3-day version of AlphaGo Zero.[7]

AlphaZero (AZ) is a more generalized variant of the AlphaGo Zero (AGZ) algorithm, and is able to play shogi and chess as well as Go. Differences between AZ and AGZ include:[7]

  • AZ has hard-coded rules for setting search hyperparameters.
  • The neural network is now updated continually.
  • Chess (unlike Go) can end in a tie; therefore AZ can take into account the possibility of a tie game.

An open-source program, Leela Zero, based on the ideas from the AlphaGo papers, is available. It uses a GPU instead of the TPUs that recent versions of AlphaGo rely on.

References

  1. ^ a b c d e f Silver, David; Schrittwieser, Julian; Simonyan, Karen; Antonoglou, Ioannis; Huang, Aja; Guez, Arthur; Hubert, Thomas; Baker, Lucas; Lai, Matthew; Bolton, Adrian; Chen, Yutian; Lillicrap, Timothy; Fan, Hui; Sifre, Laurent; Driessche, George van den; Graepel, Thore; Hassabis, Demis (19 October 2017). "Mastering the game of Go without human knowledge" (PDF). Nature. 550 (7676): 354–359. Bibcode:2017Natur.550..354S. doi:10.1038/nature24270. ISSN 0028-0836. PMID 29052630. S2CID 205261034. Retrieved 17 June 2023.
  2. ^ a b c d e Hassabis, Demis; Silver, David (18 October 2017). "AlphaGo Zero: Learning from scratch". DeepMind official website. Archived from the original on 19 October 2017. Retrieved 19 October 2017.
  3. ^ "Google's New AlphaGo Breakthrough Could Take Algorithms Where No Humans Have Gone". Yahoo! Finance. 19 October 2017. Archived from the original on 19 October 2017. Retrieved 19 October 2017.
  4. ^ Knapton, Sarah (18 October 2017). "AlphaGo Zero: Google DeepMind supercomputer learns 3,000 years of human knowledge in 40 days". The Telegraph. Archived from the original on 19 October 2017. Retrieved 19 October 2017.
  5. ^ mnj12 (7 July 2021), mnj12/chessDeepLearning, retrieved 7 July 2021
  6. ^ "DeepMind AlphaGo Zero learns on its own without meatbag intervention". ZDNet. 18 October 2017. Retrieved 17 June 2023.
  7. ^ a b c d e Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (5 December 2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". arXiv:1712.01815 [cs.AI].
  8. ^ Knapton, Sarah; Watson, Leon (6 December 2017). "Entire human chess knowledge learned and surpassed by DeepMind's AlphaZero in four hours". The Telegraph. Archived from the original on 2 December 2020. Retrieved 5 April 2018.
  9. ^ a b Greenemeier, Larry. "AI versus AI: Self-Taught AlphaGo Zero Vanquishes Its Predecessor". Scientific American. Archived from the original on 19 October 2017. Retrieved 20 October 2017.
  10. ^ a b "Computer Learns To Play Go At Superhuman Levels 'Without Human Knowledge'". NPR. 18 October 2017. Archived from the original on 20 October 2017. Retrieved 20 October 2017.
  11. ^ "Google's New AlphaGo Breakthrough Could Take Algorithms Where No Humans Have Gone". Fortune. 19 October 2017. Archived from the original on 19 October 2017. Retrieved 20 October 2017.
  12. ^ "This computer program can beat humans at Go—with no human instruction". Science | AAAS. 18 October 2017. Archived from the original on 2 February 2022. Retrieved 20 October 2017.
  13. ^ Gibney, Elizabeth (18 October 2017). "Self-taught AI is best yet at strategy game Go". Nature News. doi:10.1038/nature.2017.22858. Archived from the original on 1 May 2020. Retrieved 10 May 2020.
  14. ^ "The latest AI can work things out without being taught". The Economist. Archived from the original on 19 October 2017. Retrieved 20 October 2017.
  15. ^ a b Sample, Ian (18 October 2017). "'It's able to create knowledge itself': Google unveils AI that learns on its own". The Guardian. Archived from the original on 19 October 2017. Retrieved 20 October 2017.
  16. ^ "'It's able to create knowledge itself': Google unveils AI that learns on its own". The Guardian. 18 October 2017. Archived from the original on 19 October 2017. Retrieved 26 December 2017.
  17. ^ Knapton, Sarah (18 October 2017). "AlphaGo Zero: Google DeepMind supercomputer learns 3,000 years of human knowledge in 40 days". The Telegraph. Archived from the original on 15 December 2017. Retrieved 26 December 2017.
  18. ^ "Google DeepMind's AI teaches itself to beat human players of complex Chinese game in three days". Australian Broadcasting Corporation. 19 October 2017. Retrieved 17 June 2023.
  19. ^ "Go Players Excited About 'More Humanlike' AlphaGo Zero". Korea Bizwire. 19 October 2017. Archived from the original on 21 October 2017. Retrieved 21 October 2017.
  20. ^ "New version of AlphaGo can master Weiqi without human help". China News Service. 19 October 2017. Retrieved 17 June 2023.
  21. ^ "【柯洁战败解密】AlphaGo Master最新架构和算法,谷歌云与TPU拆解" [Decoding Ke Jie's defeat: AlphaGo Master's latest architecture and algorithms, with a breakdown of Google Cloud and TPU] (in Chinese). Sohu. 24 May 2017. Retrieved 17 June 2023.
  22. ^ Hardware used during training may be substantially more powerful.