Author: Paulo Enrique Linares Otoya
Thesis title: Head Pose Classification and Facial Landmark Based Ensemble Learning for Pose-invariant Face Recognition
Advisor: Shin-Feng Lin
Oral examination committee: Chia-Hung Yeh, Cheng-Chin Chiang
Keywords: Ensemble learning, facial landmarks, local feature descriptors, pose-invariant face recognition, non-linear vector transformations
Face recognition has been an active research field in computer vision for over three decades due to its applications in biometric authentication and surveillance. Pose-invariant face recognition (PIFR) addresses the task of identifying individuals from face images captured in diverse poses; the primary objective is to recognize a person from a non-frontal face image of that person. However, the large variations in facial appearance caused by pose make accurate recognition across poses a significant challenge. In recent years, PIFR research has predominantly followed a holistic approach: deep convolutional neural networks (DCNNs) such as ArcFace, Elastic Face, and FaceNet have been used to generate face image embeddings, which are then employed for face recognition with promising results.

This thesis addresses PIFR from a novel local approach, in which pose-invariant face recognition is performed by combining head pose estimation (HPE), ensemble learning systems, and local feature descriptors. One ensemble system is trained for each subject in a face database. Each ensemble system comprises a set of base learners, where each base learner is trained exclusively with feature vectors extracted from the region surrounding a specific facial landmark in gallery face images. To perform PIFR on an input image, the following steps are carried out. First, the face and its facial landmarks are detected in the image. Second, the facial landmark locations are processed to obtain a head pose descriptor, which is used to perform HPE. According to the HPE result, a subset of landmarks is selected for local feature description, since some landmarks may be self-occluded under the estimated head pose. Third, each computed descriptor is passed to its corresponding base learner, as every base learner is linked to a specific facial landmark. Fourth, for each ensemble system, the outputs of its base learners are combined to compute the ensemble decision support value. Finally, the identity of the input face image is determined by selecting the ensemble system with the highest decision support value.
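The recognition procedure above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis implementation: the pose classes, the 30° yaw threshold, the `VISIBLE_LANDMARKS` table, the toy `make_learner` similarity and the mean fusion rule are all hypothetical stand-ins, because the abstract does not specify how HPE results map to landmark subsets or how base learner outputs are combined.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical: landmark indices assumed visible (not self-occluded) per pose class.
VISIBLE_LANDMARKS: Dict[str, List[int]] = {
    "frontal": [0, 1, 2, 3, 4],
    "left": [0, 2, 3],
    "right": [1, 2, 4],
}

# A base learner maps one landmark descriptor to a decision support score.
BaseLearner = Callable[[Sequence[float]], float]


def classify_pose(yaw_deg: float, threshold: float = 30.0) -> str:
    """Bin an estimated yaw angle into a coarse pose class (hypothetical bins)."""
    if yaw_deg < -threshold:
        return "left"
    if yaw_deg > threshold:
        return "right"
    return "frontal"


def make_learner(template: Sequence[float]) -> BaseLearner:
    """Toy base learner: inverse-distance similarity to one gallery descriptor."""
    def learner(x: Sequence[float]) -> float:
        dist = sum((a - b) ** 2 for a, b in zip(x, template)) ** 0.5
        return 1.0 / (1.0 + dist)
    return learner


def identify(
    descriptors: Dict[int, Sequence[float]],
    yaw_deg: float,
    ensembles: Dict[str, Dict[int, BaseLearner]],
) -> Optional[str]:
    """Return the subject whose ensemble gives the highest decision support."""
    visible = VISIBLE_LANDMARKS[classify_pose(yaw_deg)]
    best_subject: Optional[str] = None
    best_support = float("-inf")
    for subject, learners in ensembles.items():
        # Route each visible landmark's descriptor to that landmark's base learner.
        scores = [learners[k](descriptors[k])
                  for k in visible if k in learners and k in descriptors]
        support = mean(scores) if scores else 0.0  # mean fusion is an assumption
        if support > best_support:
            best_subject, best_support = subject, support
    return best_subject
```

The design point the sketch tries to capture is that pose classification acts as a gate: descriptors of landmarks assumed to be self-occluded never reach their base learners, so they cannot drag down the support of the correct subject's ensemble.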

In this work, three novel head pose descriptors (FAV, EFAV, NFCV) are proposed, in conjunction with a non-linear regression model, to perform HPE. Experimental results on the BIWI and Pointing’04 databases show that the proposed HPE approach outperforms several state-of-the-art methods evaluated on the same databases. In addition, a novel local feature descriptor called Landmark-specific SIFT (LS-SIFT) is proposed to improve the robustness of SIFT against changes in viewpoint and illumination. Three innovative base learner models (RD-CS, GMM, and Mahalanobis Similarity) are also developed. The complete PIFR framework is tested on the CMU-PIE, Multi-PIE, and FERET databases, and the obtained results are comparable to those of state-of-the-art methods. Finally, the combination of LS-SIFT with Mahalanobis Similarity yields the best recognition results.
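The abstract names Mahalanobis Similarity as the best-performing base learner but gives no formula. As a hedged illustration only, the sketch below assumes one plausible form: a base learner fits the mean and a ridge-regularised covariance of one subject's gallery descriptors for one landmark, and scores a probe descriptor with `exp(-d^2 / 2)`, where `d` is the Mahalanobis distance. The class `MahalanobisLearner`, the `reg` parameter and the exponential mapping are hypothetical; pure Python is used so the example has no dependencies.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


class MahalanobisLearner:
    """Hypothetical base learner for one landmark of one subject."""

    def __init__(self, gallery: Sequence[Sequence[float]], reg: float = 1e-3):
        n, dim = len(gallery), len(gallery[0])
        self.mu = [sum(v[j] for v in gallery) / n for j in range(dim)]
        centered = [[v[j] - self.mu[j] for j in range(dim)] for v in gallery]
        denom = max(n - 1, 1)
        # Sample covariance plus reg * I so the matrix stays invertible.
        self.cov = [
            [sum(c[i] * c[j] for c in centered) / denom + (reg if i == j else 0.0)
             for j in range(dim)]
            for i in range(dim)
        ]

    def distance(self, x: Sequence[float]) -> float:
        """Mahalanobis distance sqrt((x - mu)^T cov^-1 (x - mu))."""
        diff = [xi - mi for xi, mi in zip(x, self.mu)]
        y = _solve(self.cov, diff)  # y = cov^-1 (x - mu), without forming the inverse
        return math.sqrt(max(sum(d * yi for d, yi in zip(diff, y)), 0.0))

    def support(self, x: Sequence[float]) -> float:
        """Map the distance to a support score in (0, 1] (assumed mapping)."""
        return math.exp(-0.5 * self.distance(x) ** 2)
```

Unlike a Euclidean distance, the Mahalanobis distance discounts deviations along directions in which the gallery descriptors already vary, so a probe that differs from the mean mainly along such directions receives a higher support score.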
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Organization

Chapter 2 Backgrounds
2.1 Local feature descriptors
2.2 Facial landmark detection
2.3 Head Pose Estimation
2.4 Ensemble learning
2.5 Non-linear vector transformations via Neural Networks
2.6 Face recognition

Chapter 3 Related work
3.1 Works on head pose estimation using facial landmarks
3.1.1 Head Pose Estimation in the Wild Assisted by Facial Landmarks Based on Convolutional Neural Networks
3.1.2 Web-Shaped Model for Head Pose Estimation: An Approach for Best Exemplar Selection
3.2 Works on face recognition using ensemble learning
3.2.1 Research on face recognition based on ensemble learning
3.2.2 Ensemble of deep convolutional neural networks with gabor face representations for face recognition

Chapter 4 The proposed method
4.1 Head pose description, estimation, and classification
4.2 Learned non-linear mapping for enhancing the robustness of facial landmark descriptors against pose and illumination variability (LS-SIFT)
4.3 Facial landmark description
4.4 Pose-invariant face recognition using ensemble systems
4.4.1 Base learners
4.4.2 Ensemble face recognition models

Chapter 5 Experimental results and performance analysis
5.1 Head pose estimation databases
5.1.1 BIWI database
5.1.2 Pointing’04 database
5.2 Face recognition databases
5.2.1 CMU-PIE database
5.2.2 CMU Multi-PIE database
5.2.3 FERET database
5.3 Experiments on head pose estimation and classification
5.4 Experiments on Pose-Invariant Face Recognition

Chapter 6 Conclusions

Appendices
A Appendix: Published papers
A.1 Large Pose Detection and Facial Landmark Description for Pose-invariant Face Recognition
A.2 Pose-invariant Face Recognition via Facial Landmark based Ensemble Learning
B Appendix: Obtained Awards

References
