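Integrating Multi-attention Features by the DB-BiLSTM Network for Fine-grained Image Classification (以DB-BiLSTM 網路整合多專注特徵實現細粒度影像分類), National Dong Hwa University Theses and Dissertations Full-Text Image System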


Author:	Kuei-Han Li (李奎翰)
Thesis title:	以DB-BiLSTM 網路整合多專注特徵實現細粒度影像分類
Thesis title (English):	Integrating Multi-attention Features by the DB-BiLSTM Network for Fine-grained Image Classification
Advisor:	Cheng-Chin Chiang (江政欽)
Oral defense committee:	Der-Lor Way (魏德樂), Wen-Chieh Fang (方文杰)
Degree:	Master's
Institution:	National Dong Hwa University
Department:	Department of Computer Science and Information Engineering
Student ID:	610921234
Year of publication (ROC):	112 (2023)
Academic year of graduation:	111
Language:	Chinese
Number of pages:	70
Keywords:	fine-grained image classification, image segmentation, part attention, image feature extraction

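Abstract (translated from the Chinese):
Fine-grained image classification (FGIC) is a long-standing research challenge in computer vision. Typical tasks include distinguishing cars by make, model, and year (for example, the Lamborghini subcategories Aventador Coupe 2012 and Gallardo LP 570-4 Superleggera in Stanford Cars) or distinguishing bird species (for example, Yellow Warbler versus Scott's Oriole in CUB-Birds). With the help of deep neural network models, the field has made considerable progress in recent years and is attracting growing attention. However, objects are often partially occluded, which lowers classification accuracy. This study designs an architecture, the "Multi-Attention Region Integration Classification Method for Fine-grained Image Classification", that estimates the subcategory of an object from an input RGB image. The network first uses the "Object Detection and Multi-Attention Region Localization Module" to segment the object from the image and to localize several part attention regions with finer discriminative power. These are passed to the "Multi-Attention Region Feature Integration Module", which uses a bidirectional long short-term memory network (BiLSTM) to extract integrated features from the object image and from each attention region for the subsequent fine-grained classification. The extracted features then serve as input to the next stage, the "DB-BiLSTM Net Feature Integration and Classification Module". Given the integrated and individual features, we additionally design a Decomposed Bilinear-Bidirectional LSTM Net (DB-BiLSTM Net) classification architecture to perform the final fine-grained classification. The DB-BiLSTM Net uses a Decomposed Bilinear Layer (DBLayer), designed in this study, to learn the interactions between the object's global features and its local part attention features, which effectively improves the final classification accuracy. Our model achieves accuracies of 93.71%, 87.59%, and 91.14% on the Stanford Cars, CUB-Birds, and FGVC-Aircraft datasets, respectively. Compared with the 92.75%, 86.58%, and 89.90% of the previously proposed Multi-Attention Convolutional Neural Network (MA-CNN), these are improvements of 0.96, 1.01, and 1.24 percentage points, respectively.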
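The gains quoted in the abstract are absolute differences in accuracy (percentage points), not relative changes. The short Python sketch below simply recomputes them from the accuracy figures given in the abstract; the variable and function names are illustrative and do not come from the thesis.

```python
# Accuracies (%) as quoted in the abstract.
ours = {"Stanford Cars": 93.71, "CUB-Birds": 87.59, "FGVC-Aircraft": 91.14}
ma_cnn = {"Stanford Cars": 92.75, "CUB-Birds": 86.58, "FGVC-Aircraft": 89.90}


def gains(model, baseline):
    """Absolute accuracy gain per dataset, in percentage points."""
    return {name: round(model[name] - baseline[name], 2) for name in model}


improvement = gains(ours, ma_cnn)
for name, delta in improvement.items():
    print(f"{name}: +{delta:.2f} percentage points")
```

The recomputed differences agree with the 0.96, 1.01, and 1.24 stated in the abstract.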

Abstract (English):
Fine-grained image classification (FGIC) is a long-standing research challenge in computer vision. Typical tasks include distinguishing cars by make, model, and year (for example, the Lamborghini subcategories Aventador Coupe 2012 and Gallardo LP 570-4 Superleggera in Stanford Cars) or distinguishing bird species (for example, Yellow Warbler vs. Scott's Oriole in CUB-Birds). Deep neural network models have brought significant progress in recent years, and the problem is attracting increasing attention. However, objects are often partially occluded, which reduces classification accuracy. This study designs a "Multi-Attention Region Integration Classification Method for Fine-grained Image Classification" architecture that recognizes the subcategory of an object from an input RGB image. The network first uses the "Object Detection and Multi-Attention Region Localization Module" to segment the object from the image and to locate multiple part regions with finer discriminative power. A "Multi-Attention Region Feature Integration Module" then extracts integrated features from the object image and each attention region using a bidirectional long short-term memory network (BiLSTM). The extracted features are used as input to the next stage, the "DB-BiLSTM Net Feature Integration and Classification Module", for which we design a Decomposed Bilinear-Bidirectional LSTM Net (DB-BiLSTM Net) that performs the final fine-grained classification. The DB-BiLSTM Net uses a decomposed bilinear layer (DBLayer) developed in this study to learn the interaction between the global features of the object and the local attention features, which improves the accuracy of the final classification. In our experiments, the model achieves accuracies of 93.71%, 87.59%, and 91.14% on the Stanford Cars, CUB-Birds, and FGVC-Aircraft datasets, respectively. Compared with the previously proposed Multi-Attention Convolutional Neural Network (MA-CNN), this is an improvement of 0.96, 1.01, and 1.24 percentage points, respectively.
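The abstract does not say how the DBLayer decomposes the bilinear interaction. Purely as an illustration, the sketch below shows one common low-rank factorization of a bilinear form between a global feature vector `g` and a part feature vector `p`. It is an assumed stand-in, not the thesis's DBLayer, and every name in it (`project`, `low_rank_bilinear`, `full_bilinear`, `U`, `V`, `P`) is hypothetical.

```python
def project(mat, vec):
    """Compute mat^T @ vec, where mat is a list of len(vec) rows."""
    return [sum(row[j] * x for row, x in zip(mat, vec))
            for j in range(len(mat[0]))]


def low_rank_bilinear(g, p, U, V, P):
    """Factorized bilinear interaction: P^T ((U^T g) * (V^T p))."""
    a = project(U, g)                      # global feature in rank-r space
    b = project(V, p)                      # part feature in rank-r space
    h = [x * y for x, y in zip(a, b)]      # element-wise interaction
    return project(P, h)                   # map to the output dimension


def full_bilinear(g, p, U, V, P):
    """Same result via the explicit weights W_k[i][j] = sum_r P[r][k] U[i][r] V[j][r]."""
    rank, out_dim = len(P), len(P[0])
    out = []
    for k in range(out_dim):
        total = 0
        for i, gi in enumerate(g):
            for j, pj in enumerate(p):
                w = sum(P[r][k] * U[i][r] * V[j][r] for r in range(rank))
                total += gi * w * pj
        out.append(total)
    return out


# Toy inputs (assumed values, chosen only so the result is easy to check by hand).
g = [1, 2]                       # global (whole-object) feature, dimension 2
p = [3, 4, 5]                    # part attention feature, dimension 3
U = [[1, 0], [0, 1]]             # 2 x rank
V = [[1, 0], [0, 1], [1, 1]]     # 3 x rank
P = [[1], [1]]                   # rank x output dimension

z = low_rank_bilinear(g, p, U, V, P)
print(z)  # [26]
```

With input dimensions m and n, output dimension k, and rank r, this factorized form stores r(m + n + k) weights instead of the m·n·k needed by a full bilinear tensor, which is the usual motivation for decomposing a bilinear layer.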

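Acknowledgments
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Research Background
1.2. Motivation and Objectives
1.3. Chapter Organization
Chapter 2 Literature Review
2.1. Localization of Key Object Part Regions
2.2. Feature Integration of Local Part Attention Regions
2.3. Integrated Classification of Local Part Attention Regions
2.4. Overview of the Problems Addressed and the Proposed Solution
Chapter 3 Methodology
3.1. Object Detection and Multi-Attention Region Localization Module
3.1.1. Module Design
3.1.2. Model Training
3.2. Multi-Attention Region Feature Integration Module
3.2.1. Module Design
3.2.1.1. Feature Sequence Integration with a Bidirectional Long Short-Term Memory Network
3.2.1.2. Feature Integration with the Decomposed Bilinear Layer
3.2.2. Model Training
3.3. DB-BiLSTM Net Feature Integration and Classification Module
3.3.1. Module Design
3.3.2. Model Training
Chapter 4 Experimental Results
4.1. Comparison of Results on Stanford Cars
4.2. Comparison of Results on CUB-Birds
4.3. Comparison of Results on FGVC-Aircraft
4.4. Ablation Experiments
4.4.1. Comparison of Different Detection Scaling Ratios
4.4.2. Comparison of Different Resolutions
4.4.3. Comparison of Different Data Augmentations
4.4.4. Comparison of Data Border Padding
4.5. Comparison of BiLSTM, DB-BiLSTM Net, and MA-CNN
Chapter 5 Conclusion
References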

[1] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, ..., and N. Houlsby, "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.
[2] X.-S. Wei, C.-W. Xie, J. Wu, and C. Shen, "Mask-CNN: Localizing parts and selecting descriptors for fine-grained bird species categorization," Pattern Recognition, vol. 76, pp. 704–714, 2018.
[3] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in CVPR, 2015, pp. 3431–3440.
[4] H. Zheng, J. Fu, T. Mei, and J. Luo, "Learning multi-attention convolutional neural network for fine-grained image recognition," in ICCV, 2017, pp. 5209–5217.
[5] M. Lin, Q. Chen, and S. Yan, "Network in network," CoRR, vol. abs/1312.4400, 2013. [Online]. Available: http://arxiv.org/abs/1312.4400
[6] T.-Y. Lin, A. RoyChowdhury, and S. Maji, "Bilinear CNN models for fine-grained visual recognition," in ICCV, Santiago, Chile, 2015, pp. 1449–1457, doi: 10.1109/ICCV.2015.170. arXiv:1504.07889.
[7] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in ICLR, 2015. arXiv:1409.1556.
[8] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in ICCV, Venice, Italy, 2017, pp. 2980–2988, doi: 10.1109/ICCV.2017.322.
[9] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[10] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in CVPR, Las Vegas, NV, USA, 2016, pp. 770–778.
[11] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in CVPR, Honolulu, HI, 2017, pp. 936–944.
[12] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in NIPS, 2015, pp. 91–99.
[13] J. Krause, M. Stark, J. Deng, and L. Fei-Fei, "3D object representations for fine-grained categorization," in IEEE International Conference on Computer Vision Workshops, Sydney, NSW, Australia, 2013, pp. 554–561.
[14] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, "The Caltech-UCSD Birds-200-2011 dataset," California Institute of Technology, Tech. Rep. CNS-TR-2011-001, 2011.
[15] S. Maji, E. Rahtu, J. Kannala, M. Blaschko, and A. Vedaldi, "Fine-grained visual classification of aircraft," arXiv:1306.5151, 2013.

(The full text will be open to external access after 2026-06-02.)
01.pdf
