Author: 鄭滄宇
Author (English): Tsang-Yu Cheng
Title: 深度學習之唇語辨識系統
Title (English): A Lip-Reading System with Deep Learning
Advisor: 江政欽
Advisor (English): Cheng-Chin Chiang
Committee Members: 鄭錫齊, 顏士淨
Committee Members (English): Shyi-Chy Cheng, Shi-Jim Yen
Degree: Master's
Institution: National Dong Hwa University (國立東華大學)
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Student ID: 610421229
Year of Publication: 107 (ROC calendar; 2018)
Academic Year of Graduation: 106 (2017-2018)
Language: Chinese
Number of Pages: 36
Keywords: 深度學習, 卷積神經網路, 整體學習法, 唇語辨識 (deep learning, convolutional neural network, ensemble method, lip reading)
Keywords (English): deep learning, convolution neural network, holistic learning method, lip-language recognition
Abstract: In recent years, with the rapid development of technology, highly convenient high-tech products have become increasingly diverse, yet many of them expose users to the risk of identity theft. Some products require identity authentication, and when a user types a password it can be watched or covertly recorded by bystanders. As face recognition has matured, products such as smartphones now use it for authentication and have spawned convenient services such as electronic payment. However, accessories worn on the face can cause recognition to fail, and two people who look alike may be able to unlock the same phone. To reduce the risk of a face being impersonated, this thesis designs a lip-reading system that lets users enter a password by silently mouthing it: the password is recognized from consecutive frames of the lip region captured while the user "speaks". Because a silent password cannot be heard and is difficult for bystanders to read, the chance of the password being observed or recorded is reduced, and because different people produce different lip motion even for the same password, the reliability of identity authentication is further improved. The MIRACL-VC1 database is used for training and testing. In the proposed method, the lip region is first detected in each frame, and the cropped lip images are fed to several convolutional neural networks (CNNs), each of which extracts its own features and is trained as a separate model. Using any single camera, the user "speaks" a command; each pre-trained model outputs a probability for the lip sequence, and the voting scheme of the ensemble method then combines all models to decide which command was given. Compared with machine-learning approaches that require large amounts of data, the complementary network models trained on different features achieve good performance without a large training set. On the database's two sets of ten lip commands (word-based and phrase-based), the system achieves recognition rates of 62% and 58%, respectively, with an overall average of 60%.
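This record contains no code, but the preprocessing step the abstract describes, detecting the lip region in each frame before feeding it to the CNNs, can be illustrated with a short sketch. Everything below is an illustrative assumption rather than the author's actual pipeline: OpenCV's stock Haar face detector stands in for the thesis's lip detector, the mouth is approximated as the lower third of the detected face box, and the 64x64 crop size and 30-frame limit are arbitrary.

```python
# Minimal sketch of lip-region preprocessing, assuming OpenCV's Haar face
# detector and a lower-third mouth crop; not the thesis's actual detector.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_lip_region(frame_bgr, out_size=(64, 64)):
    """Return a grayscale crop of the approximate mouth area, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Keep the largest detected face.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    # Approximate the mouth as the lower third of the face box.
    mouth = gray[y + 2 * h // 3 : y + h, x : x + w]
    return cv2.resize(mouth, out_size)

def lip_sequence_from_video(path, max_frames=30):
    """Stack per-frame lip crops from a video into an array shaped (T, 64, 64)."""
    cap = cv2.VideoCapture(path)
    crops = []
    while len(crops) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        crop = extract_lip_region(frame)
        if crop is not None:
            crops.append(crop.astype(np.float32) / 255.0)
    cap.release()
    return np.stack(crops) if crops else None
```

A sequence produced this way would then be passed to each trained CNN for classification, as the abstract outlines.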
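The final decision step described in the abstract is the voting scheme of the ensemble method applied to several independently trained CNNs. The sketch below shows only that combination step in plain NumPy; the number of models, the ten-class command set, and hard (argmax) voting with ties broken toward the lower class index are assumptions for illustration, not the thesis's exact setup.

```python
# Minimal sketch of hard voting over the per-class probability vectors
# produced by several trained CNNs (e.g., softmax outputs), one per model.
import numpy as np

def ensemble_vote(prob_list):
    """Majority vote over per-model probability vectors of shape (num_classes,)."""
    votes = [int(np.argmax(p)) for p in prob_list]
    counts = np.bincount(votes, minlength=len(prob_list[0]))
    return int(np.argmax(counts))  # ties resolve toward the lower class index

# Example with three hypothetical models and ten lip commands:
rng = np.random.default_rng(0)
fake_outputs = [rng.dirichlet(np.ones(10)) for _ in range(3)]  # stand-in softmax outputs
print("predicted command index:", ensemble_vote(fake_outputs))
```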
Table of Contents:
Chapter 1: Introduction (p. 1)
Chapter 2: Literature Review (p. 4)
Chapter 3: Convolutional Neural Networks for Continuous Lip Shapes (p. 10)
Chapter 4: Experimental Results and Discussion (p. 18)
Chapter 5: Conclusions and Future Research Directions (p. 32)

