單目人臉多視角生成應用在裸視3D立體電視之實現__國立東華大學博碩士論文全文影像系統

帳號：guest(52.14.209.100) 離開系統

字體大小：

詳目顯示

第 1 筆 / 共 1 筆

/1頁

論文基本資料
摘要
外文摘要
論文目次
參考文獻
電子全文

作者:	林洛渝
作者(英文):	Luo-Yu Lin
論文名稱:	單目人臉多視角生成應用在裸視3D立體電視之實現
論文名稱(英文):	Multi-view generation via monocular face in 3D stereoscopic TV base on PSM-Net
指導教授:	陳偉銘
指導教授(英文):	Wei-Ming Chen
口試委員:	張耀中簡暐哲
口試委員(英文):	Yao-Chung Chang Wei-Che Chien
學位類別:	碩士
校院名稱:	國立東華大學
系所名稱:	資訊管理學系
學號:	611035103
出版年(民國):	112
畢業學年度:	111
語文別:	中文
論文頁數:	72
關鍵詞:	立體匹配、2D轉3D、深度圖預處理、深度學習、人臉多角度視圖
關鍵詞(英文):	stereo matching、2D to 3D、pre-processing of the depth map、deep learning、multi-view face image
相關次數:	推薦:0 點閱:25 評分: 下載:35 收藏:0

近年來3D應用十分普及，不論是在3D電影、動畫和VR(虛擬實境)上，或是在工業和製造業等都有許多幫助，不僅提高了大眾的娛樂性還幫助產業減少人力和縮短生產的時間，讓許多產業都有顯著的進步。然而在製作裸眼3D電影和動畫以及電視等產業方面還是頗具挑戰性的，其過程不僅耗時、耗費金錢和難以製作且呈現出來的效果難以讓大眾有長時間舒適的觀看效果。因此本研究欲研究透過單張圖片生成3D人臉並提升裸眼3D(Auto stereoscopic)的穩定性及效果。其作法為將單張圖片透過無監督式單目估計深度方法去生成6張不同視角的人臉圖像。本文使用深度圖預測圖片視差只需要輸入單張左視圖就能生成右邊視圖，透過調整視差值以獲取6張不同視角的人臉圖像，最後將圖像輸入到3D立體電視中。若能完整呈現3D人臉在3D立體電視上，能讓製作過程變得不複雜、不需要大量設備就能得到良好的3D圖像，並且本文所產生的人臉部不容易扭曲變形或是更改樣貌，而且是對於整體的視角去進行轉換的因此連同背景也會更改，本研究所提出的方法其效益，除了在娛樂業製程方面會簡便許多，還能有效改善許多2D人臉衍伸出的資安問題，像是現在很多使用照片取代人臉的問題，透過3D人臉更可以將實際人臉與2D圖像做區別，對於人臉識別這區塊的議題有實質性的幫助。相信在現今資訊發達的時代立體人臉也能有更廣泛的應用以及不可限量的發展。

3D applications are currently becoming more and more popular, no matter in 3D movies or animation. However, producing 3D videos and pictures remains a challenge, because of the difficulty and the time-consuming process. Therefore, this research intends to generate 3D faces from a single image, called left view, to improve the stability and the effect of Autostereoscopy. In this paper we propose a method using a single image to generate the face images of six different perspectives by PSMNet, it is a pyramid stereo matching network. We predict the disparity map and combine it with the left view to generate the right view. By adjusting the disparity value, we can obtain the face images of six different perspectives or more. And showing the images by 3D stereoscopic TV. Our method can simplify the process with the high quality 3D image. Compared to other paper about rotation, such as CR-GAN , our paper won't distort the face when switching viewpoints and also produce good viewpoint changes when including backgrounds.The advantages of our method are: Simplify the process of the entertainment industry, Effectively improve the security problems of 2D faces, such as the problem of using photos instead of real human faces.
Through 3D faces, the real face can be distinguished from the 2D images. I believe that in today's era of advanced information, the three-dimensional face can also have a wider range of applications and unlimited development.

第一章緒論　　 1
1.1 研究動機與背景　　 1
1.2 研究目的　　 2
第二章文獻探討　　 4
2.1 人類雙眼立體視覺　　 4
2.1.1 視差(Disparity)　　 4
2.1.2 立體顯示器　　 5
2.2 裸眼3D (Autostereoscopy）　　 6
2.2.1 2D多工式(Multiplexed 2D)　　 6
2.2.2 分時多工式(time multiplexed)　　 8
2.3 其它 3D 成像技術　　 9
2.3.1 多平面式(multi-plane type)　　 9
2.4 立體匹配(Stereo Matching)　　 9
2.4.1 單目深度估計　　 12
2.5 合成新視圖　　 15
2.6 空洞產生　　 18
2.7 頭部旋轉　　 18
2.8 圖像修復(Image Inpainting)　　 20
第三章研究方法　　 22
3.1 研究架構　　 22
3.2 研究工具　　 29
3.2.1 軟硬體開發工具　　 29
3.3 深度前處理　　 30
3.3.1 正規化　　 30
3.3.2 高斯平滑　　 32
3.3.3 倒轉和縮放　　 33
3.4 訓練過程　　 34
3.4.1 視差範圍值及孤立像素點　　 34
3.4.2 向前扭曲　　 35
3.4.3 圖像修復　　 36
第四章實驗結果　　 37
4.1 實驗過程與結果　　 37
4.1.1 前處理的比較　　 38
4.2 與其他方法比較　　 41
4.3 與原始右視圖比較　　 45
4.4 帶有複雜背景的情況下　　 47
4.5 視差圖驗證　　 48
4.6 研究限制　　 49
第五章結論與未來展望　　 50
參考文獻　　 51

英文文獻
[1] Motion Picture Association of America: Theatrical market statistics. (2014)
[2] Laga, H., Jospin, L. V., Boussaid, F., & Bennamoun, M. (2020). A survey on deep learning techniques for stereo-based depth estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] Fehn, C. (2004, May). Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV. In Stereoscopic displays and virtual reality systems XI (Vol. 5291, pp. 93-104). SPIE.
[4] Sithara, R., & Rajasree, R. (2019, March). A survey on Face Recognition Technique. In 2019 IEEE International Conference on Innovations in Communication, Computing and Instrumentation (ICCI) (pp. 189-192). IEEE.
[5] Tong, X., Wang, L., Pan, X., & gya Wang, J. (2020, July). An Overview of Deepfake: The Sword of Damocles in AI. In 2020 International Conference on Computer Vision, Image and Deep Learning (CVIDL) (pp. 265-273). IEEE.
[6] Uçan, A. S., Buçak, F. M., Tutuk, M. A. H., Aydin, H. İ., Semiz, E., & Bahtiyar, Ş. (2021, September). Deepfake and Security of Video Conferences. In 2021 6th International Conference on Computer Science and Engineering (UBMK) (pp. 36-41). IEEE.
[7] Wang, Y., Wang, L., Yang, J., An, W., & Guo, Y. (2019). Flickr1024: A large-scale dataset for stereo image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (pp. 0-0).
[8] Geiger, A., Lenz, P., & Urtasun, R. (2012, June). Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE conference on computer vision and pattern recognition (pp. 3354-3361). IEEE.
[9] Liu, Z., Luo, P., Wang, X., & Tang, X. (2018). Large-scale celebfaces attributes (celeba) dataset. Retrieved August, 15(2018), 11.
[10] Watson, J., Aodha, O. M., Turmukhambetov, D., Brostow, G. J., & Firman, M. (2020, August). Learning stereo from single images. In European Conference on Computer Vision (pp. 722-740). Springer, Cham.
[11] Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., & Koltun, V. (2020). Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE transactions on pattern analysis and machine intelligence.
[12] Xie, J., Girshick, R., & Farhadi, A. (2016, October). Deep3d: Fully automatic 2d-to-3d video conversion with deep convolutional neural networks. In European conference on computer vision (pp. 842-857). Springer, Cham.
[13] Norling, J. A. (1953). The stereoscopic art—a reprint. Journal of the Society of Motion Picture and Television Engineers, 60(3), 268-308.
[14] P. Hohenstatt, “Leonardo da Vinci,” Konemann Verlag, 1998, pp. 1452-1519.
[15] Holliman, N. (2009). 3D display systems. Department of Computer Science, University of Durham.
[16] Cao, J., Hu, Y., Yu, B., He, R., & Sun, Z. (2019). 3D aided duet GANs for multi-view face image synthesis. IEEE Transactions on Information Forensics and Security, 14(8), 2028-2042.
[17] Benzie, P., Watson, J., Surman, P., Rakkolainen, I., Hopf, K., Urey, H., ... & Von Kopylow, C. (2007). A survey of 3DTV displays: techniques and technologies. IEEE transactions on circuits and systems for video technology, 17(11), 1647-1658.
[18] Tian, Y., Peng, X., Zhao, L., Zhang, S., & Metaxas, D. N. (2018). CR-GAN: learning complete representations for multi-view generation. arXiv preprint arXiv:1806.11191.
[19] Lee, H. J., Nam, H., Lee, J. D., Jang, H. W., Song, M. S., Kim, B. S., ... & Choi, K. H. (2006, June). 8.2: A high resolution autostereoscopic display employing a time division parallax barrier. In SID Symposium Digest of technical papers (Vol. 37, No. 1, pp. 81-84). Oxford, UK: Blackwell Publishing Ltd.
[20] Lee, B., Choi, H., & Cho, S. W. (2009). 3-D to 2-D Convertible Displays Using Liquid Crystal Devices. In Three-dimensional Imaging, Visualization, and Display (pp. 55-77). Springer, New York, NY.
[21] Hodges, L. F. (1992). Tutorial: Time-multiplexed stereoscopic computer graphics. IEEE computer graphics and applications, 12(02), 20-30.
[22] Mather, J., Winlow, R., Nakagawa, A., Kean, D. U., & Bourhill, G. (2010). U.S. Patent No. 7,813,042. Washington, DC: U.S. Patent and Trademark Office.
[23] Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., & Efros, A. A. (2016). Context encoders: Feature learning by inpainting. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2536-2544).
[24] Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., & Huang, T. S. (2018). Generative image inpainting with contextual attention. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5505-5514).
[25] Sullivan, A. (2000). U.S. Patent No. 6,100,862. Washington, DC: U.S. Patent and Trademark Office.
[26] Huang, R., Zhang, S., Li, T., & He, R. (2017). Beyond face rotation: Global and local perception gan for photorealistic and identity preserving frontal view synthesis. In Proceedings of the IEEE international conference on computer vision (pp. 2439-2448).
[27] Zhang, X., & Liu, Z. (2014, June). A survey on stereo vision matching algorithms. In Proceeding of the 11th World Congress on Intelligent Control and Automation (pp. 2026-2031). IEEE.
[28] Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International journal of computer vision, 47(1), 7-42.
[29] Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International journal of computer vision, 47(1), 7-42.
[30] Yin, Z., Darrell, T., & Yu, F. (2019). Hierarchical discrete distribution decomposition for match density estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 6044-6053).
[31] Guo, X., Yang, K., Yang, W., Wang, X., & Li, H. (2019). Group-wise correlation stereo network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 3273-3282).
[32] Chang, J. R., & Chen, Y. S. (2018). Pyramid stereo matching network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5410-5418).
[33] Zhang, F., Prisacariu, V., Yang, R., & Torr, P. H. (2019). Ga-net: Guided aggregation net for end-to-end stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 185-194).
[34] Hirschmuller, H. (2005, June). Accurate and efficient stereo processing by semi-global matching and mutual information. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) (Vol. 2, pp. 807-814). IEEE. [35] Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., & Brox, T. (2016). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4040-4048).
[36] Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., & Bry, A. (2017). End-to-end learning of geometry and context for deep stereo regression. In Proceedings of the IEEE international conference on computer vision (pp. 66-75).
[37] Marr, D. (2010). Vision: A computational investigation into the human representation and processing of visual information. MIT press.
[38] Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International journal of computer vision, 47(1), 7-42.
[39] He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE transactions on pattern analysis and machine intelligence, 37(9), 1904-1916.
[40] Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2881-2890).
[41] Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4), 834-848.
[42] Yu, F., & Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122.
[43] Furukawa, Y., & Hernández, C. (2015). Multi-view stereo: A tutorial. Foundations and Trends® in Computer Graphics and Vision, 9(1-2), 1-148.
[44] Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International journal of computer vision, 59(2), 167-181.
[45] Guo, X., Li, H., Yi, S., Ren, J., & Wang, X. (2018). Learning monocular depth by distilling cross-domain stereo networks. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 484-500).
[46] Tosi, F., Aleotti, F., Poggi, M., & Mattoccia, S. (2019). Learning monocular depth estimation infusing traditional stereo knowledge. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 9799-9809).
[47] Watson, J., Firman, M., Brostow, G. J., & Turmukhambetov, D. (2019). Self-supervised monocular depth hints. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 2162-2171).
[48] Eigen, D., Puhrsch, C., & Fergus, R. (2014). Depth map prediction from a single image using a multi-scale deep network. Advances in neural information processing systems, 27.
[49] Dwibedi, D., Misra, I., & Hebert, M. (2017). Cut, paste and learn: Surprisingly easy synthesis for instance detection. In Proceedings of the IEEE international conference on computer vision (pp. 1301-1310).
[50] Saxena, A., Sun, M., & Ng, A. Y. (2008). Make3d: Learning 3d scene structure from a single still image. IEEE transactions on pattern analysis and machine intelligence, 31(5), 824-840.
[51] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431-3440).
[52] Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., ... & Brox, T. (2015). Flownet: Learning optical flow with convolutional networks. In Proceedings of the IEEE international conference on computer vision (pp. 2758-2766).
[53] Flynn, J., Neulander, I., Philbin, J., & Snavely, N. (2016). Deepstereo: Learning to predict new views from the world's imagery. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5515-5524).
[54] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556..
[55] Bae, S. H., Elgharib, M., Hefeeda, M., & Matusik, W. (2017). Efficient and Scalable View Generation from a Single Image using Fully Convolutional Networks. arXiv preprint arXiv:1705.03737.
[56] Nazeri, K., Ng, E., Joseph, T., Qureshi, F. Z., & Ebrahimi, M. (2019). Edgeconnect: Generative image inpainting with adversarial edge learning. arXiv preprint arXiv:1901.00212.
[57] Miangoleh, S. M. H., Dille, S., Mai, L., Paris, S., & Aksoy, Y. (2021). Boosting monocular depth estimation models to high-resolution via content-adaptive multi-resolution merging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 9685-9694).
[58] Zhang, Z., Song, Y., & Qi, H. (2017). Age progression/regression by conditional adversarial autoencoder. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5810-5818).
[59] Hua, Y., Kohli, P., Uplavikar, P., Ravi, A., Gunaseelan, S., Orozco, J., & Li, E. (2020). Holopix50k: A large-scale in-the-wild stereo image dataset. arXiv preprint arXiv:2003.11172.
[60] Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., & Brox, T. (2016). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4040-4048).
[61] Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., & Huang, T. S. (2019). Free-form image inpainting with gated convolution. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 4471-4480).
[62] Godard, C., Mac Aodha, O., Firman, M., & Brostow, G. J. (2019). Digging into self-supervised monocular depth estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 3828-3838).
[63] Ranftl, R., Bochkovskiy, A., & Koltun, V. (2021). Vision transformers for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 12179-12188).
[64] Godard, C., Mac Aodha, O., & Brostow, G. J. (2017). Unsupervised monocular depth estimation with left-right consistency. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 270-279).
[65] Khan, F., Hussain, S., Basak, S., Lemley, J., & Corcoran, P. (2021). An efficient encoder–decoder model for portrait depth estimation from single images trained on pixel-accurate synthetic data. Neural Networks, 142, 479-491.
[66] Gross, R., Matthews, I., Cohn, J., Kanade, T., & Baker, S. (2010). Multi-pie. Image and vision computing, 28(5), 807-813.
[67] Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4401-4410).
資料集來源: https://github.com/NVlabs/ffhq-dataset
中文文獻
[68] 許精益, & 黃乙白. (2007). 3D 立體顯示技術之發展與研究. 光學工程, (98), 53-60.
[69] 林智祥. (2014). 3D 影像之立體感與舒適度探討.
[70] 黃則銘, & 林昇甫. (2010). 柱狀透鏡之即時裸視立體顯像操控之實現 (Doctoral dissertation).
[71] 黃怡菁、黃乙白、謝漢萍，「3D 立體顯示技術」，《科學發展》，451 期，46-52頁，2010 年 7 月。
[72] 楊皓婷. (2014). 3D 立體成像於無人水下載具上之實現.
[73] 愛的力量(1916年全球首部3D電影) - 中文百科全書
https://www.newton.com.tw/wiki/%E6%84%9B%E7%9A%84%E5%8A%9B%E9%87%8F/2892894
[74] 3D電視 - 中文百科知識
https://www.easyatm.com.tw/wiki/3D%E9%9B%BB%E8%A6%96

01.pdf

推文
推薦
評分
引用網址
轉寄

top

詳目顯示

相關論文