Author (English): Zih-Han Luo
Thesis Title (English): A Fitness Action Tracking System Based on Pose Sequence Recognition
Advisor (English): I-Cheng Chang
Oral Examination Committee (English): Yuan-Kai Wang
Huang-Chia Shih
Yi-Cheng Chen
Keywords (English): fitness exercises; human pose estimation; action recognition
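This thesis develops a fitness action tracking system based on human pose sequence recognition, comprising four techniques: human detection, human tracking, human pose estimation, and action recognition. The system tracks and recognizes the fitness exercises of multiple people and analyzes the tracked exercises, recording for each person the time spent on each fitness exercise, the body parts exercised, and the calories burned, to provide exercise feedback to users. We construct a fitness action dataset captured from multiple viewing angles so that the trained model is not affected by the viewing angle during recognition. We construct a new human pose estimation network, RAHPNet (Residual Attention Heatmap Prediction Network), to generate pose sequences from fitness exercise videos, and combine it with spatial-temporal graph convolutional networks to recognize those pose sequences. In the experiments, RAHPNet is evaluated on the MPII Multi-Person Dataset and the MSCOCO Keypoints Challenge, reaching 74.5 mAP and 63.5 mAP respectively and surpassing the other methods compared. Besides recognizing the fitness actions performed by the people in the video, the system can also track each person in a multi-person environment and record the relevant information during their exercise. The fitness action recognition system is also evaluated on the Fitness Sport Dataset constructed in this study, where its accuracy reaches 90.2%.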
In recent years, fitness exercise has become a popular trend in Taiwan. Regular exercise not only helps people maintain their physique and muscles, but also improves sleep quality and fosters positive emotions. Deep learning has advanced rapidly in computer vision because, given large amounts of training data, it outperforms traditional machine learning techniques. Exercise action recognition based on deep learning has therefore received growing attention.
Current research on skeleton-based action recognition falls into two categories: methods based on 2D human skeletons and methods based on 3D human skeletons. 2D skeleton sequences can be extracted from ordinary videos, so they are easy to acquire and are supported by large datasets covering a wide variety of actions; however, recognition based on them is affected by the camera angle. 3D skeleton sequences carry three-dimensional motion information and can be used to recognize complex movements; however, data acquisition is relatively costly, which makes these methods difficult to generalize to practical applications.
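The view dependence of 2D skeletons can be illustrated with a small sketch. It assumes a hypothetical three-joint skeleton and a simplified orthographic camera (neither comes from the thesis) and shows that the same 3D pose produces different 2D coordinates when the body is seen from the front and from the side.

```python
import math

# Hypothetical 3-joint skeleton in metres (x: right, y: up, z: depth).
# These coordinates are illustrative, not data from the thesis.
SKELETON_3D = {
    "head": (0.0, 1.7, 0.0),
    "left_shoulder": (-0.2, 1.4, 0.0),
    "right_shoulder": (0.2, 1.4, 0.0),
}


def project_2d(joints_3d, yaw_deg):
    """Rotate the body about the vertical axis, then drop depth.

    Dropping depth models an orthographic camera, a simplifying assumption.
    """
    yaw = math.radians(yaw_deg)
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    return {
        name: (cos_y * x + sin_y * z, y)
        for name, (x, y, z) in joints_3d.items()
    }


def shoulder_width(joints_2d):
    """Distance between the two shoulder joints in the image plane."""
    lx, ly = joints_2d["left_shoulder"]
    rx, ry = joints_2d["right_shoulder"]
    return math.hypot(rx - lx, ry - ly)


# Same 3D pose, two camera angles: frontal view and side view.
front_width = shoulder_width(project_2d(SKELETON_3D, 0))
side_width = shoulder_width(project_2d(SKELETON_3D, 90))
print(f"frontal view: {front_width:.3f} m, side view: {side_width:.3f} m")
```

In the frontal view the shoulders are 0.4 m apart in the image, while in the side view they nearly coincide, so a model trained on 2D skeletons from a single view sees quite different inputs for the same action.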
This thesis develops a fitness action tracking system based on pose sequence recognition, which consists of four main techniques: human detection, human tracking, human pose estimation, and action recognition. The system tracks and identifies the fitness exercises of multiple people, and records for each person the time spent on each exercise, the body parts exercised, and the calories burned, to provide users with exercise feedback. We build a multi-view dataset of fitness exercises so that the trained model can recognize actions regardless of the viewing angle. We also build a new pose estimation network, the Residual Attention Heatmap Prediction Network (RAHPNet), to generate pose sequences from fitness exercise videos. These sequences are then classified into fitness actions by spatial-temporal graph convolutional networks. In the experiments, RAHPNet is evaluated on two datasets, the MPII Multi-Person Dataset and the MSCOCO Keypoints Challenge, where it achieves 74.5 mAP and 63.5 mAP, respectively, outperforming the other methods compared. The action recognition accuracy reaches 90.2% on the fitness action dataset. We also show that the system can track each exerciser's activities in a multi-person environment.
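The per-person record described above (time, body parts, and calories) could be accumulated from per-frame tracking and recognition results roughly as follows. This is a minimal sketch, not the thesis implementation: the function `summarize`, the `MET` and `BODY_PARTS` tables, and all their values are hypothetical placeholders, and calories are estimated with the common approximation kcal = MET × body weight (kg) × duration (h).

```python
from collections import Counter, defaultdict

# Placeholder lookup tables for illustration only; NOT taken from the thesis.
MET = {"squat": 5.0, "push_up": 4.0}
BODY_PARTS = {
    "squat": ["glutes", "quadriceps"],
    "push_up": ["chest", "triceps"],
}


def summarize(frame_labels, fps, weight_kg):
    """Build a per-person exercise report.

    frame_labels: iterable of (track_id, action), one entry per person per frame.
    fps: video frame rate, used to convert frame counts to seconds.
    weight_kg: mapping from track_id to body weight in kilograms.
    """
    frames = defaultdict(Counter)
    for track_id, action in frame_labels:
        frames[track_id][action] += 1

    report = {}
    for track_id, counts in frames.items():
        seconds = {action: n / fps for action, n in counts.items()}
        # kcal = MET * kg * hours, summed over the actions performed
        kcal = sum(
            MET[action] * weight_kg[track_id] * s / 3600.0
            for action, s in seconds.items()
        )
        parts = sorted({p for action in seconds for p in BODY_PARTS[action]})
        report[track_id] = {"seconds": seconds, "body_parts": parts, "kcal": kcal}
    return report


# Two tracked people in a 30 fps video: 60 s of squats and 30 s of push-ups.
labels = [(1, "squat")] * 1800 + [(2, "push_up")] * 900
report = summarize(labels, fps=30, weight_kg={1: 72.0, 2: 60.0})
print(report)
```

Counting frames per track and converting to seconds only at the end avoids accumulating floating-point error over long videos; a real system would also need to handle track identity switches and frames without a confident action label.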
Abstract (Chinese)
Abstract
Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Background and Motivation
1.2 System Overview
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Human Pose Estimation
2.1.1 Single-Person Pose Estimation
2.1.2 Multi-Person Pose Estimation
2.2 Human Action Recognition
2.2.1 Image-based Human Action Recognition
2.2.2 Skeleton-based Human Action Recognition
Chapter 3 Multi-Person Pose Estimation
3.1 Human Detection
3.2 Human Tracking
3.3 Pose Sequence Generation
3.3.1 Residual Attention Heatmap Prediction Network
3.3.2 Human Block Augmentation and Pose NMS
Chapter 4 Multi-View Action Recognition
4.1 Multi-View Fitness Action Dataset
4.2 ST-GCN (Spatial-Temporal Graph Convolutional Networks)
Chapter 5 Experimental Results
5.1 Performance of Pose Estimation
5.1.1 MSCOCO Keypoints Challenge
5.1.2 MPII Multi-Person Dataset
5.3 Performance of Action Recognition
5.3.1 Kinetics Dataset
5.3.2 Multi-View Fitness Action Dataset
5.4 System Performance
Chapter 6 Conclusion
References
