
Detailed Record

Author: 王泰翔
Author (English): Tai-Xiang Wang
Title: 基於Muzero演算法的黑白棋程式
Title (English): An Othello Program Based on Muzero Algorithm
Advisor: 顏士淨
Advisor (English): Shi-Jim Yen
Committee Members: 林紋正, 陳志昌
Committee Members (English): Wen-Cheng Lin, Jr-Chang Chen
Degree: Master
Institution: National Dong Hwa University
Department: Department of Computer Science and Information Engineering
Student ID: 610821202
Year of Publication (ROC): 109 (2020)
Academic Year of Graduation: 109
Language: Chinese
Number of Pages: 32
Keywords: Computer Games, Monte-Carlo Tree Search, Othello, Muzero, Machine Learning
Abstract: Following the great success of the AlphaZero algorithm across a variety of board games, the DeepMind team went on to propose the MuZero algorithm, which not only produced strong results in board games but also crossed into video games, reaching state-of-the-art performance on 57 Atari games. MuZero does not rely on the rules of the game; it learns them with neural networks, offering another line of thought for computer game playing. In this thesis we apply the MuZero algorithm to Othello, training it with a convolutional neural network and observing how the program's playing strength changes over the course of training. After a period of training, the win rate reaches 85% against a random-move program and 73% against a program using two-ply alpha-beta pruning. In addition, we use multiple processes to accelerate game simulation; on a single GPU this runs 42% faster than a single process.
Abstract (English): After the huge success of the AlphaZero algorithm in various board games, Google DeepMind proposed the MuZero algorithm. It not only achieved fruitful results in board games but also crossed into the field of video games, reaching a new state of the art on 57 different Atari games. MuZero does not need the game rules to expand the Monte Carlo tree; it uses neural networks to learn the rules of the game. In this paper, we implement 6*6 Othello based on the MuZero algorithm and observe its performance during training.
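
The abstract's key idea is that MuZero plans inside a learned model rather than the real game. Below is a minimal PyTorch sketch of that three-network structure, assuming a 6*6 board encoded as two planes (own and opponent stones) and 37 actions (36 squares plus a pass move); the layer sizes and encodings are illustrative assumptions, not the thesis's actual architecture.

# A minimal PyTorch sketch of MuZero's three learned functions. Board
# encoding, action space, and layer sizes are assumptions for illustration.
import torch
import torch.nn as nn

BOARD = 6
N_ACTIONS = BOARD * BOARD + 1  # assumed action space: 36 squares + pass

class Representation(nn.Module):
    # h: real observation -> hidden state
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU())

    def forward(self, obs):
        return self.net(obs)

class Dynamics(nn.Module):
    # g: (hidden state, action plane) -> next hidden state and reward
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(hidden + 1, hidden, 3, padding=1), nn.ReLU())
        self.reward = nn.Linear(hidden * BOARD * BOARD, 1)

    def forward(self, state, action_plane):
        nxt = self.net(torch.cat([state, action_plane], dim=1))
        return nxt, self.reward(nxt.flatten(1))

class Prediction(nn.Module):
    # f: hidden state -> policy logits and value in [-1, 1]
    def __init__(self, hidden=64):
        super().__init__()
        self.policy = nn.Linear(hidden * BOARD * BOARD, N_ACTIONS)
        self.value = nn.Linear(hidden * BOARD * BOARD, 1)

    def forward(self, state):
        flat = state.flatten(1)
        return self.policy(flat), torch.tanh(self.value(flat))

# One imagined step: encode the board once, then search entirely inside the
# learned model without calling the real Othello rules again.
h, g, f = Representation(), Dynamics(), Prediction()
obs = torch.zeros(1, 2, BOARD, BOARD)   # placeholder empty board
s = h(obs)
policy_logits, value = f(s)
a = torch.zeros(1, 1, BOARD, BOARD)     # placeholder action plane
s_next, r = g(s, a)

During tree search, f supplies move priors and values at each simulated node while g advances the hidden state, which is what lets the search run without hand-coded game rules.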
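The abstract also reports a 42% speedup from simulating games with multiple processes on one GPU. The sketch below shows the general pattern only, with a hypothetical placeholder for the game logic: worker processes generate self-play games independently and push them onto a shared queue for the trainer.

# A minimal multi-process self-play sketch. self_play_game is a hypothetical
# placeholder for the real simulation, not the thesis's implementation.
import multiprocessing as mp
import random

def self_play_game(seed):
    # Placeholder: pretend to play one game and return a move list.
    rng = random.Random(seed)
    return [rng.randrange(36) for _ in range(rng.randrange(20, 33))]

def worker(worker_id, n_games, queue):
    # Each process simulates its share of games independently of the others.
    for i in range(n_games):
        queue.put(self_play_game(seed=worker_id * 10_000 + i))

if __name__ == "__main__":
    n_workers, games_per_worker = 4, 8
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(w, games_per_worker, queue))
             for w in range(n_workers)]
    for p in procs:
        p.start()
    # Drain the queue before joining so workers are never blocked on a put.
    games = [queue.get() for _ in range(n_workers * games_per_worker)]
    for p in procs:
        p.join()
    print(f"collected {len(games)} self-play games")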
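Finally, the baseline opponent mentioned in the abstract is a two-ply alpha-beta search. A generic depth-limited version is sketched below; legal_moves, apply_move, and evaluate are hypothetical stand-ins for real Othello rules (the pass rule is omitted for brevity), and calling it with depth=2 gives the kind of shallow baseline the thesis measures against.

# Generic fail-hard alpha-beta with a depth limit. The three callbacks are
# hypothetical placeholders for real Othello rules and evaluation.
def alphabeta(state, depth, alpha, beta, maximizing,
              legal_moves, apply_move, evaluate):
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state), None
    best_move = moves[0]
    if maximizing:
        for m in moves:
            score, _ = alphabeta(apply_move(state, m), depth - 1, alpha, beta,
                                 False, legal_moves, apply_move, evaluate)
            if score > alpha:
                alpha, best_move = score, m
            if alpha >= beta:
                break  # beta cutoff: the minimizer will avoid this line
        return alpha, best_move
    for m in moves:
        score, _ = alphabeta(apply_move(state, m), depth - 1, alpha, beta,
                             True, legal_moves, apply_move, evaluate)
        if score < beta:
            beta, best_move = score, m
        if alpha >= beta:
            break  # alpha cutoff: the maximizer will avoid this line
    return beta, best_move

# Example call for a 2-ply baseline (with real rule functions supplied):
# score, move = alphabeta(root, 2, float("-inf"), float("inf"), True,
#                         legal_moves, apply_move, evaluate)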
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Introduction to Othello 3
1.3 Motivation and Objectives 5
1.4 Thesis Overview 5
Chapter 2 Literature Review 6
2.1 The UCT Algorithm 6
2.2 Monte-Carlo Tree Search 7
2.3 The MuZero Algorithm 9
2.4 Convolutional Neural Networks 11
2.5 PyTorch 13
Chapter 3 Methodology 14
3.2 Program Flow 14
3.3 Self-Play Process 16
3.4 Neural Network 18
3.4.1 Input and Output 18
3.4.2 Network Architecture 20
Chapter 4 Experimental Results 21
4.1 Environment Setup 21
4.2 Parameter Settings 22
4.3 Effect of Multiprocessing on Game Simulation 23
4.4 Strength Evaluation 23
4.4.1 Versus a Random Opponent 24
4.4.2 Versus Shallow Alpha-Beta Pruning 25
Chapter 5 Conclusions and Future Work 26
References 27