Author (English): Yi-Le Liu
Thesis Title (English): Leather Retrieval System with TSDP-CVAE: Triplet-based Symmetric Dual-Path Conditional Variational Autoencoder using Metric Learning
Advisor (English): Cheng-Chin Chiang
Oral Examination Committee (English): Jun-Wei Hsieh, Shin-Feng Lin
Keywords (English): Conditional Variational AutoEncoder (CVAE); Triplet Loss; Leather Texture Retrieval; Image Retrieval; Edge Detection; Traditional Texture Features; Metric Learning; Cluster Analysis
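To resist color-spot interference, we designed a method that blends leather textures with color spots to synthesize additional training samples, helping the deep neural network learn to resist such interference effectively. For the neural network model, we propose a modified Conditional Variational Autoencoder (CVAE) that incorporates Gray-level Co-occurrence Matrix (GLCM) features, which are commonly used to recognize texture images, and combines them with a Triplet Loss Function so that the CVAE learns a transformation into an embedding feature space. In this space, leather textures of the same series cluster well while textures of different series move away from each other, so that a Gaussian Mixture Model (GMM) can be used to retrieve leather in the subsequent recognition stage. We call the neural network that extracts these embedding features the Triplet Loss Conditional Variational Autoencoder (TCVAE).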
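As an illustration of the GLCM features mentioned above, the following is a minimal pure-Python sketch, not the thesis implementation. The function names `glcm` and `glcm_features`, the single horizontal offset, the symmetric counting, and the choice of three descriptors (contrast, angular second moment, homogeneity) are all assumptions made for the example; the abstract does not state which offsets or descriptors were actually used.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray-level co-occurrence matrix for one pixel offset.

    image: 2-D list of integer gray levels in [0, levels).
    (dx, dy): column/row offset between the two pixels of each pair.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:  # also count the pair in the reverse direction
                    counts[j][i] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Three common texture descriptors computed from a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),  # angular second moment
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


# Hypothetical 4x4 patch quantized to 4 gray levels (illustrative data only).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = glcm_features(glcm(patch, levels=4))
print(features)
```

In practice such descriptors would be computed per leather patch, typically over several offsets and directions, and the resulting vector could serve as the conditioning input of a CVAE; that wiring is an assumption here and is not shown.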
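To let the proposed TCVAE overcome the poor learning caused by an increased number of layers, we also added Symmetric Skip Connections and Dense Connections to the network architecture. We call the resulting network the Triplet-based Symmetric Dual-Path CVAE (TSDP-CVAE). In addition, we integrated the Improved Triplet Loss technique so that the network can better learn to extract the features common to various leather textures. In our experiments, the proposed TSDP-CVAE achieved quite satisfactory accuracy in leather retrieval.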
Leather products have exquisite and diverse textures. Identifying these textures by eye is costly in both manpower and time, so an automatic, machine-based way to retrieve leather textures is desirable. A piece of leather commonly carries two types of patterns. One is the leather texture, which is the main cue for identifying the leather and usually consists of deep dents or fine cracks. The other is the color spots, which appear after a customized dyeing process. Color spots have no dents but interfere with leather texture identification. This research designs and develops a deep learning technique for leather texture retrieval. The technique is intended to resist the interference of color spots effectively and to require no time-consuming retraining of the deep neural network when new leather textures are added.
To resist the interference of color spots, we design a synthesis method that generates additional training patterns by mixing leather textures with color spots, so that the deep neural network learns to handle noisy color spots better. For the neural network model, we propose a modified Conditional Variational Autoencoder (CVAE) that incorporates Gray-level Co-occurrence Matrix (GLCM) features, which are commonly used to identify texture images. Combined with a Triplet Loss Function, the CVAE learns an embedded feature space in which the feature representations of intra-class leather samples are well clustered while those of inter-class leather samples are pushed apart. This embedded feature space is well suited to the Gaussian Mixture Model (GMM) used in the later retrieval of leather textures. The proposed network that extracts the embedded features is called the Triplet Loss Conditional Variational Autoencoder (TCVAE).
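The clustering behavior described above can be illustrated with a minimal sketch of the standard hinge-style triplet loss. This is an assumption-laden example, not the thesis's formulation: the helper names, the use of plain (non-squared) Euclidean distance, and the margin value are all chosen for illustration only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss for one (anchor, positive, negative) triple.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


def mean_triplet_loss(triplets, margin=1.0):
    """Average the per-triple loss over a batch of triples."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)


# Hypothetical 2-D embeddings (illustrative data only).
anchor = [0.0, 0.0]
same_series = [0.0, 1.0]   # distance 1 from the anchor
far_other = [3.0, 4.0]     # distance 5 from the anchor
near_other = [1.0, 0.0]    # distance 1 from the anchor

easy = triplet_loss(anchor, same_series, far_other)    # already separated
hard = triplet_loss(anchor, same_series, near_other)   # violates the margin
batch = mean_triplet_loss([
    (anchor, same_series, far_other),
    (anchor, same_series, near_other),
])
print(easy, hard, batch)
```

Only triples that violate the margin contribute to the loss, which is what drives same-class embeddings together and different-class embeddings apart during training.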
To overcome the poor learning caused by the increased number of layers in autoencoders, we also introduce Symmetric Skip Connections and Dense Connections into our network architecture. We call this network the Triplet-based Symmetric Dual-Path CVAE (TSDP-CVAE). Additionally, we integrate the Improved Triplet Loss (ITL) into the network so that the common features of leather textures can be extracted. The experimental results verify that TSDP-CVAE achieves satisfactory accuracy in leather texture retrieval.
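Acknowledgments
Abstract (in Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives and Problems
1.2 Problems Encountered and Solutions
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Related Techniques and Background
2.1.1 Current State of Image-Based Recognition Research
2.1.2 GLCM
2.1.3 GMM
2.2 Processing Pipeline of the Proposed System
Chapter 3 Image Acquisition and Preprocessing
3.1 Image Acquisition Method
3.2 Valid Region Cropping
3.3 Patch Sampling
3.4 Patch Selection
3.5 Preprocessing Algorithms
Chapter 4 Sample Augmentation Algorithms
4.1 Sample Augmentation by Image Rotation
4.2 Hybrid Leather Synthesis Algorithm
Chapter 5 TSDP-CVAE Neural Network Architecture
5.1 Triplet Loss Function
5.2 Variational Autoencoder Integrating the Triplet Loss Function and GMM
5.3 Improving the Autoencoder Architecture with Symmetric Learning
5.4 Incorporating GLCM Features into TVAE Learning
5.5 Improved Triplet Loss (ITL)
5.6 Improving the Autoencoder Architecture with DenseNet
Chapter 6 Experimental Data and Methods
6.1 Experimental Methods
6.2 Experimental Results
6.2.1 Comparison of Preprocessing Effects
6.2.2 Performance Comparison of GLCM with GMM versus AlexNet
Chapter 7 Conclusions and Future Outlook
7.1 Conclusions
7.2 Future Work
References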
