
Detailed Record

Author: 林璟農
Author (English): Ching-Nung Lin
Title: 以機器學習強化棋類遊戲
Title (English): Optimize Program Strength on Board Games using Machine Learning
Advisor: 顏士淨
Advisor (English): Shi-Jim Yen
Committee Members: 陳志昌, 林紋正, 顏士淨, 楊茂村, 陳旻秀
Committee Members (English): Jr-Chang Chen, Wen-Cheng Lin, Shi-Jim Yen, Mau-Tsuen Yang, Min-Xiou Chen
Degree: Ph.D.
University: National Dong Hwa University
Department: Department of Computer Science and Information Engineering
Student ID: 810221001
Year of Publication (ROC calendar): 108 (2019)
Academic Year of Graduation: 107
Language: English
Number of Pages: 74
Keywords: 深度學習, 機器學習, 亂數種子最佳化, 棋類遊戲
Keywords (English): Deep Learning, Machine Learning, Seed Optimization, Board Games
Abstract (Chinese, translated): When the computer program AlphaGo defeated expert human players, deep learning became a popular research topic. Many interesting challenges remain in applying deep learning to board games. Because the field is still young, there is no established theory for configuring a network architecture and its training parameters; they are generally found by trial and error.
However, deep learning inference is usually slower than traditional algorithms, which can impose many restrictions in practice. Inference speed is closely tied to playing strength; for example, shrinking the trained model is one way to speed up inference. Another difficulty is combining deep learning with earlier, well-developed algorithms; the combination of Monte Carlo tree search and deep learning has been very successful. Beyond deep learning, strengthening a program without changing its architecture is also an interesting research topic. For board games running on mobile devices, improving playing strength without increasing power consumption is an important issue.

This research provides several algorithms to accelerate deep learning inference on CPUs, Xeon Phi, and GPUs. Furthermore, an algorithm is proposed to reduce the size of trained deep learning models. We then describe how to design a board game program with deep learning and Monte Carlo tree search, choosing Block Go as the target game, and explore the relationship between testing accuracy, batch size, and number of epochs. We also experiment with changing the weights of the deep learning inputs to study their effect on playing strength. Finally, based on seed optimization with fixed random seeds, a new algorithm is proposed to strengthen a Draughts program. The Block Go and Draughts programs of this research both won championships at the Computer Olympiad.
Abstract (English): Deep learning has become a hot research topic since the computer program AlphaGo beat a professional human Go player. Many interesting challenges remain in applying deep learning to board games. Since deep learning is a young field, how to design a network architecture and how to tune its parameters are open questions, usually settled by trial and error. Because deep learning inference is typically slower than traditional methods, this can limit its use in real-world applications. Accelerating inference has a huge impact on the strength of a program, and one possible solution is to reduce the size of the trained model. It is also difficult to merge well-developed earlier algorithms with deep learning; combining Monte Carlo tree search with deep learning is one of the successful cases. Besides deep learning, optimizing the strength of an existing program without modifying its structure is another interesting topic. Improving strength while keeping power consumption unchanged is difficult, and this matters for board games running on widely used mobile devices.
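The combination of Monte Carlo tree search and deep learning mentioned above typically works by letting a policy network's move probabilities guide node selection in the search tree. A minimal sketch of that idea, using the PUCT-style selection rule popularized by AlphaGo, is below; the child statistics and prior values are toy stand-ins, not the networks or games studied in this thesis.

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child maximizing Q + U, where U scales the network's
    prior probability by the parent's visit count (PUCT selection)."""
    total_visits = sum(ch["visits"] for ch in children)

    def score(ch):
        # Q: mean value of this child from previous simulations.
        q = ch["value"] / ch["visits"] if ch["visits"] else 0.0
        # U: exploration bonus, large for high-prior, rarely visited moves.
        u = c_puct * ch["prior"] * math.sqrt(total_visits + 1) / (1 + ch["visits"])
        return q + u

    return max(children, key=score)

# Toy child nodes: the prior comes from a (hypothetical) policy network.
children = [
    {"move": "D4",  "prior": 0.6, "visits": 10, "value": 4.0},
    {"move": "Q16", "prior": 0.3, "visits": 2,  "value": 1.5},
    {"move": "K10", "prior": 0.1, "visits": 0,  "value": 0.0},
]
best = puct_select(children)
```

Note how the rule balances the two sources: `D4` has the highest prior and visit count, but `Q16` wins selection here because its mean value is high and its exploration bonus is still large.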

This research provides several methods to accelerate deep learning inference on CPUs, Xeon Phi, and graphics processing units (GPUs). On top of that, we provide an algorithm to reduce the size of a deep learning model. We then show how to design a program using deep learning and Monte Carlo tree search, and present a Block Go program. We investigate testing accuracy against deep learning parameters including batch size and number of epochs. In addition, we adjust the values on the deep learning input feature planes. Finally, a new algorithm is described that applies seed optimization to Draughts, improving the strength of a program by fixing its random seeds. The Block Go and Draughts programs both won first place in Computer Olympiad tournaments.
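The seed optimization idea above exploits the fact that an engine with randomized tie-breaking plays differently, but reproducibly, under different RNG seeds. One can therefore play candidate seeds against each other, tabulate the results in a self-play matrix, and fix the seed with the best row score. The sketch below illustrates only that selection loop; `play_game` is a hypothetical toy stand-in for an actual engine match, not the thesis's Draughts program or its Best Promise method.

```python
import random

def play_game(seed_a, seed_b):
    """Toy deterministic 'match': derive a reproducible result from the
    two seeds. A real system would run two engine instances instead."""
    rng = random.Random(seed_a * 1009 + seed_b)
    return 1 if rng.random() < 0.5 else 0  # 1 = seed_a wins

def best_seed(candidates):
    """Build the self-play result matrix row by row and return the seed
    with the highest total score against all other candidate seeds."""
    def row_score(s):
        return sum(play_game(s, t) for t in candidates if t != s)
    return max(candidates, key=row_score)

chosen = best_seed(range(8))
```

Because every match result is a pure function of the two seeds, the whole procedure is reproducible, which is what makes fixing the chosen seed in the shipped program meaningful.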
Chapter 1 Introduction
Chapter 2 Accelerating Deep Learning Inference with Monte Carlo Tree Search in the Game of Go
Section 2.1 Introduction
Section 2.2 Architectures of GPU, Intel CPU and Xeon Phi
Section 2.3 Method
Subsection 2.3.1 Knowledge Planes
Subsection 2.3.2 Direct Connection
Subsection 2.3.3 Sharing a Single Hardware with Multiple Threads
Subsection 2.3.4 Using Hardware-Specific Instructions to Accelerate Deep Learning Inference
Section 2.4 Experimental Results
Subsection 2.4.1 Knowledge Planes Performance
Subsection 2.4.2 Direct Connection Performance
Subsection 2.4.3 Performance of Using Multiple CPU Threads and GPUs with Direct Connection
Subsection 2.4.4 Performance Using a Single Hardware on GPU, CPU and PHI
Subsection 2.4.5 Doing DCNN Inference on MCTS Expanding and Simulating
Section 2.5 Conclusion
Chapter 3 Design of a Block Go Program Using Deep Learning and Monte Carlo Tree Search
Section 3.1 Introduction
Section 3.2 Background
Subsection 3.2.1 Rules and Complexity of Block Go
Subsection 3.2.2 Monte Carlo Tree Search
Section 3.3 Deep Learning for Block Go
Subsection 3.3.1 Feature Planes and Labelling
Subsection 3.3.2 Knowledge Planes with Adjustable Weights
Subsection 3.3.3 The DCNN Framework
Subsection 3.3.4 MCTS Combined with DCNN
Section 3.4 Experimental Results
Subsection 3.4.1 The Input Layer with Different Feature Planes
Subsection 3.4.2 Testing Accuracy and Epochs
Subsection 3.4.3 The Selection of Batch Sizes
Subsection 3.4.4 Different Combinations of Techniques
Subsection 3.4.5 Knowledge Planes with Adjustable Weights
Section 3.5 Conclusion
Chapter 4 Seed Optimization Framework on Draughts
Section 4.1 Introduction
Section 4.2 Related Work
Subsection 4.2.1 Computer Draughts
Subsection 4.2.2 Scan
Subsection 4.2.3 Seed Optimization
Section 4.3 Best Promise Seed Optimization Framework
Subsection 4.3.1 Self-play Matrix
Subsection 4.3.2 Cross-validation
Subsection 4.3.3 Seed Selection
Section 4.4 Experimental Results
Subsection 4.4.1 Best Promise Seed Method Performance against Other Seed Methods
Subsection 4.4.2 Different Time Settings
Subsection 4.4.3 The Effect of Increasing the Matrix Size
Subsection 4.4.4 Draw Rate and Playing-Second Benefit
Section 4.5 Conclusion
Chapter 5 Conclusions and Future Work
Bibliography