Author (English): Ting-Yu Chang
Thesis title (Chinese): 基於 YOLO 偵測器之降低遮蔽影響的多物件追蹤
Thesis title (English): Multiple Object Tracking with Occlusion Effect Reduction using YOLO-based Detector
Advisor (English): Shin-Feng Lin
Committee members (English): Kuo-Cheng Liu
I-Cheng Chang
Keywords (English): Multiple object tracking, YOLO, Re-identification, Motion prediction, Occlusion
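In computer vision, multiple object tracking (MOT) plays an important role in many significant problems, such as autonomous vehicles, crowd behavior analysis, and human-computer interaction. MOT also poses many challenges that remain to be overcome, such as ID re-identification and the handling of occluded objects. Tracking-by-detection is a commonly used approach in MOT: a detector is first applied to detect objects, and the resulting detections are used to perform tracking, object identification, object re-identification, and motion prediction. A set of detections that guides the tracking process is extracted from the video, and the detections are associated so that bounding boxes containing the same target are assigned the same identity. In this thesis, MOT uses YOLO in place of conventional detectors to obtain better detection results from the outset, which in turn improves the subsequent tracking stage.
This thesis proposes a YOLO-detector-based multiple object tracking method that reduces the effect of occlusion. The goal is to achieve good results in a variety of scenes using only a small number of target features, including crowded squares, night scenes, moving cameras in shopping malls, and crowded indoor train stations. These scenes appear in the 2DMOT15, MOT16, and MOT20 video sets. The method has two stages: a YOLO detection stage and an occlusion handling stage. In the detection stage, we use the advanced YOLOv4, YOLOv5, and YOLOv7 in place of the public detectors to obtain better results than conventional methods. The goal is to develop a system that uses only a few target features and can re-identify objects even under occlusion. These powerful detectors substantially increase the number of objects detected in the video, which allows the tracker to associate them better and improves the experimental results. In the occlusion handling stage, we extract and store the features of all objects so that an object can be re-identified after it has been occluded. Experimental data show that, in challenging scenes such as crowded squares, night scenes, and crowded train stations, the method performs better than many state-of-the-art methods.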
Multiple Object Tracking (MOT) plays a crucial role in computer vision problems such as autonomous vehicles, crowd behavior analysis, and human-computer interaction. Despite its significance, MOT faces several challenges, including ID re-identification and the handling of occluded objects. Tracking-by-detection is a common approach in MOT, incorporating object re-identification and motion prediction. A set of detections is extracted from the video frames to guide the tracking process, and these detections are then associated so that bounding boxes containing the same target are assigned the same identity. This thesis employs YOLO for object proposals and uses bounding box regression and association to predict object positions.
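The association step of tracking-by-detection can be illustrated with a minimal sketch. It assumes boxes in `(x1, y1, x2, y2)` format and a greedy matching by Intersection over Union; the function names `iou` and `associate` and the threshold of 0.3 are hypothetical choices for illustration, not the association procedure used in the thesis.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, iou_threshold=0.3):
    """Greedily match track boxes to detection boxes by descending IoU.

    tracks: dict mapping track ID -> last known box (assumed format)
    detections: list of boxes detected in the current frame
    Returns (matches, unmatched), where matches maps a detection index
    to a track ID and unmatched lists detection indices with no track.
    """
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks = {}, set()
    for score, tid, j in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap even less
        if tid in used_tracks or j in matches:
            continue  # each track and each detection is matched at most once
        matches[j] = tid
        used_tracks.add(tid)
    unmatched = [j for j in range(len(detections)) if j not in matches]
    return matches, unmatched


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
matches, unmatched = associate(tracks, detections)
```

In this example the first two detections inherit the identities of the overlapping tracks, while the third has no overlapping track and would start a new identity. Greedy matching is chosen here only for brevity; an optimal assignment solver is a common alternative.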
This thesis proposes Multiple Object Tracking with Occlusion Effect Reduction using YOLO-based Detector. The objective is to achieve high accuracy in MOT in challenging scenes such as crowded squares, night scenes, moving cameras in shopping malls, and crowded indoor train stations, as found in the 2DMOT15, MOT16, and MOT20 sequences.
The proposed system has two stages: YOLO detection and occlusion reduction. The detection stage uses the advanced YOLOv4, YOLOv5, and YOLOv7 detectors to obtain better results than conventional methods. The goal is to develop a system that uses only a few target features and can still re-identify objects under occlusion.
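The idea of re-identifying an object after occlusion from stored features can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the `FeatureGallery` class, the use of cosine similarity, and the threshold of 0.8 are all hypothetical, and the feature vectors are assumed to come from some appearance extractor that is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


class FeatureGallery:
    """Stores the latest feature vector of every track (hypothetical helper)."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed minimum similarity for a match
        self.features = {}          # track ID -> last stored feature vector

    def store(self, track_id, feature):
        self.features[track_id] = list(feature)

    def reidentify(self, feature, active_ids=()):
        """Return the most similar inactive track ID, or None if none qualifies."""
        best_id, best_score = None, self.threshold
        for tid, stored in self.features.items():
            if tid in active_ids:
                continue  # tracks visible in this frame cannot be reclaimed
            score = cosine_similarity(feature, stored)
            if score >= best_score:
                best_id, best_score = tid, score
        return best_id


gallery = FeatureGallery(threshold=0.8)
gallery.store(1, [1.0, 0.0, 0.0])
gallery.store(2, [0.0, 1.0, 0.0])
recovered = gallery.reidentify([0.9, 0.1, 0.0])  # resembles track 1
unknown = gallery.reidentify([0.0, 0.0, 1.0])    # resembles no stored track
```

A detection that reappears after occlusion is compared against the stored features of tracks that are currently not visible; if the similarity is high enough it takes over the old identity, otherwise it would be treated as a new object.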
The proposed method is compared with state-of-the-art techniques through experiments, demonstrating its robustness against various challenges. It shows good performance in challenging scenes such as crowded squares, night scenes, and crowded train stations.
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Organization
Chapter 2 Background
2.1 Deep Learning in Object Detection
2.2 Image Classification and Tracking as a Graph Problem
2.3 Appearance Models and Re-identification
2.4 Intersection over Union
2.5 Non-Maximum Suppression
2.6 Detection with Transformers
Chapter 3 Related Work
3.1 Eliminating Exposure Bias and Metric Mismatch in Multiple Object Tracking
3.2 Tracking without Bells and Whistles
3.3 MOTR: End-to-End Multiple-Object Tracking with Transformer
Chapter 4 The Proposed Method
4.1 Feature Extraction
4.2 Object Detection
4.3 Bounding Box Regression and Association
4.4 Occlusion Effect Reduction
Chapter 5 Experimental Results
5.1 Metrics of MOT
5.1.1 Clear MOT metrics
5.1.2 ID scores
5.1.3 Classical metrics
5.2 Experiment Databases
5.3 Comparison with Other Methods
Chapter 6 Conclusions
References
