
Detailed Record

Author: 廖維恩
Author (English): Wei-An Liao
Title: 深度學習之三維手指關節點定位與手勢辨識
Title (English): 3-D Finger Joint Position Estimation and Gesture Recognition with Deep Learning
Advisor: 江政欽
Advisor (English): Cheng-Chin Chiang
Committee members: 鄭錫齊, 顏士淨
Committee members (English): Shyi-Chy Cheng, Shi-Jim Yen
Degree: Master's
Institution: National Dong Hwa University (國立東華大學)
Department: Department of Computer Science and Information Engineering
Student ID: 610421206
Year of publication (ROC calendar): 107 (2018)
Graduating academic year: 106
Language: Chinese
Number of pages: 35
Keywords (Chinese): 深度學習, 卷積神經網路, 手勢辨識, 關節點估測, 隨機森林, Fast R-CNN, 最近鄰居
Keywords (English): Deep learning, Convolution Neural Network (CNN), Gesture recognition, Hand Joint Position Estimation, Random forests, Fast R-CNN, Nearest Neighbor
In recent years, advances in technology and new products have enabled more diverse and widespread forms of human–machine interaction; for example, operating a computer with hand gestures is far more convenient than using a traditional keyboard and mouse. Hand gesture recognition, which enables control through natural gesture commands, has become a popular topic in human–machine interaction. Gesture recognition often requires accurate finger-joint positions in order to extract effective spatial and temporal features for classifying different gesture commands. In this thesis, we propose a deep neural network called the Nearest Neighbor Convolutional Neural Network (NN-CNN) to estimate finger-joint positions, and we verify the accuracy of the estimates with a hand gesture recognition application based on a random forest classifier. Our approach first uses a Faster R-CNN to detect the hand region in depth images captured by an RGB-D camera; the NN-CNN then locates the finger joints in the extracted depth image. Finally, we train and test a random forest on six dynamic gestures: opening the fist, grasping, pinching, turning over the palm, clockwise rotation, and counterclockwise rotation. The recognition results, together with comparisons against other methods, show that the proposed NN-CNN achieves better accuracy.
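The pipeline described above ends with a random forest classifying gestures from finger-joint trajectories. The following sketch illustrates only that final stage, using scikit-learn's `RandomForestClassifier` as a stand-in for the thesis's random forest; the 14-joint count, 10-frame window, synthetic joint data, and gesture labels are illustrative assumptions, not the thesis's actual configuration.

```python
# Final stage of the pipeline: classify dynamic gestures from per-frame
# finger-joint positions (here: synthetic stand-ins for NN-CNN output).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_JOINTS, N_FRAMES = 14, 10  # assumed joint count and temporal window
GESTURES = ["open_fist", "grasp", "pinch", "turn_palm", "cw_rotate", "ccw_rotate"]

rng = np.random.default_rng(0)

def make_sequence(label_idx):
    """Fake a (N_FRAMES, N_JOINTS, 3) joint trajectory; each class gets a
    distinct offset so the toy data is separable."""
    return rng.normal(size=(N_FRAMES, N_JOINTS, 3)) + label_idx

# Toy dataset: 20 sequences per gesture, each flattened into a single
# spatio-temporal feature vector (joint positions stacked over time).
X = np.stack([make_sequence(c).ravel() for c in range(6) for _ in range(20)])
y = np.repeat(np.arange(6), 20)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(GESTURES[clf.predict(X[:1])[0]])
```

In the thesis the feature vectors would instead come from the NN-CNN's estimated 3-D joint positions over a sliding temporal window, but the classification step itself is the same fit/predict pattern.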
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation and Objectives
1.2 System Flow
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Joint Position Estimation
2.2 Random Forests
2.3 Convolutional Neural Networks (CNN)
Chapter 3 Nearest Neighbor Convolutional Neural Network for Joint Position Estimation
3.1 Depth Image Preprocessing
3.2 Joint Position Estimation
Chapter 4 Dynamic Gesture Recognition
4.1 Preprocessing for Dynamic Gesture Recognition
4.2 Dynamic Gesture Recognition
Chapter 5 Experimental Results and Discussion
5.1 NYU Hand Pose Dataset
5.2 Joint Position Estimation Experiments
5.3 Analysis and Comparison of Experimental Results
5.4 Dynamic Gesture Recognition
Chapter 6 Conclusions and Future Work
References

[1]LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
[2]Li, G. (2015, October). Deep Learning Techniques and Applications. Course materials, Institute of Software Engineering, Peking University.
[3]Stanford CS231n course notes: Convolutional Neural Networks for Visual Recognition.
[4]Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32.
[5]Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Wadsworth.
[6]Chen, T. Y., Ting, P. W., Wu, M. Y., & Fu, L. C. (2017). Learning a deep network with spherical part model for 3D hand pose estimation. In Robotics and Automation (ICRA), 2017 IEEE International Conference on (pp. 2600-2605). IEEE.
[7]Sinha, A., Choi, C., & Ramani, K. (2016). Deephand: Robust hand pose estimation by completing a matrix imputed with deep features. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4150-4158).
[8]Oberweger, M., Wohlhart, P., & Lepetit, V. (2015). Hands deep in deep learning for hand pose estimation. arXiv preprint arXiv:1502.06807.
[9]Tompson, J., Stein, M., Lecun, Y., & Perlin, K. (2014). Real-time continuous pose recovery of human hands using convolutional networks. ACM Transactions on Graphics (ToG), 33(5), 169.
[10]Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (pp. 91-99).
[11]Girshick, R. (2015). Fast R-CNN. arXiv preprint arXiv:1504.08083.
[12]Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016, October). SSD: Single shot multibox detector. In European Conference on Computer Vision (pp. 21-37). Springer, Cham.
[13]Erhan, D., Szegedy, C., Toshev, A., & Anguelov, D. (2014). Scalable object detection using deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2147-2154).
[14]Fan, J., Xu, W., Wu, Y., & Gong, Y. (2010). Human tracking using convolutional neural networks. IEEE Transactions on Neural Networks, 21(10), 1610-1623.
[15]Molchanov, P., Gupta, S., Kim, K., & Kautz, J. (2015). Hand gesture recognition with 3D convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 1-7).
[16]Garg, A., Noyola, J., & Bagadia, S. (2016). Lip reading using CNN and LSTM. Technical report, Stanford University, CS231n project report.
[17]Chen, C. P., Chen, Y. T., Lee, P. H., Tsai, Y. P., & Lei, S. (2011). Real-time hand tracking on depth images. Visual Communications and Image Processing (VCIP), 2011 IEEE, 1-4.
[18]Garg, P., Aggarwal, N., & Sofat, S. (2009). Vision based hand gesture recognition. World Academy of Science, Engineering and Technology, 49(1), 972-977.
[19]Park, S., Yu, S., Kim, J., Kim, S., & Lee, S. (2012). 3D hand tracking using Kalman filter in depth space. EURASIP Journal on Advances in Signal Processing, 2012(1), 36.
[20]Athitsos, V., & Sclaroff, S. (2003, June). Estimating 3D hand pose from a cluttered image. In Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on (Vol. 2, pp. II-432). IEEE.
[21]Yeo, H. S., Lee, B. G., & Lim, H. (2015). Hand tracking and gesture recognition system for human-computer interaction using low-cost hardware. Multimedia Tools and Applications, 74(8), 2687-2715.
[22]Wand, M., Koutník, J., & Schmidhuber, J. (2016, March). Lipreading with long short-term memory. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on (pp. 6115-6119). IEEE.
[23]Breiman, L. (1996). Bagging predictors. Machine learning, 24(2), 123-140.
[24]Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. CRC press.
[25]He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
[26]Sinha, A., Choi, C., & Ramani, K. (2016). DeepHand: Robust hand pose estimation by completing a matrix imputed with deep features: Supplementary material. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.