Time Series Data Analysis and Behavior Recognition with Integrated Shape and Symbolic Aggregate Approximation

Author:	Yu-Hen Chen (陳昱衡)
Title:	Time Series Data Analysis and Behavior Recognition with Integrated Shape and Symbolic Aggregate Approximation
Original title (Chinese):	以整合式形狀與符號聚合近似法為基礎之時間序列資料分析與行為辨識
Advisor:	Shiow-Yang Wu (吳秀陽)
Committee members:	Yao-Chung Chang (張耀中), Sun-Zong Ying (孫宗瀛)
Degree:	Master's
Institution:	National Dong Hwa University (國立東華大學)
Department:	Department of Computer Science and Information Engineering
Student ID:	610621206
Publication year (ROC calendar):	109 (2020)
Graduation academic year:	108
Language:	Chinese
Number of pages:	60
Keywords:	real-time streaming analysis, time series representation, shape and symbolic aggregate approximation, similarity calculation, behavior identification and tracking

With the rapid development of the Internet of Things (IoT) and cloud computing, streaming big data analysis has become an active research area, in which time series analysis and behavior pattern prediction are important topics. Most data analysis has traditionally relied on batch processing, which cannot meet the needs of real-time streaming big data analysis. This thesis explores real-time analysis of streaming big data and studies the recognition and tracking of behavior patterns in time series. The main idea is to compare a behavior time series against known frequent patterns and use the result as the basis for classification. To reduce data dimensionality and processing complexity, we examine time series approximation methods and propose a new integrated shape and symbolic aggregate approximation. To improve the efficiency of real-time analysis, we also propose a prefix tree index and a dynamic search strategy. By analyzing the similarity between a behavior time series and the frequent patterns, the method determines whether the behavior of the series is normal or deviates from the norm, and continues to track how the behavior changes. We conduct experiments on real vehicle trajectories and simulated data to test the correctness and recognition performance of the proposed method. The results show that the proposed recognition and tracking strategy correctly identifies the frequent pattern to which a series belongs, and successfully distinguishes two time series that have similar values but different shape trends, or similar trends but different values.
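
The abstract's point about series with similar values but different shapes can be illustrated with a small sketch. It assumes the standard symbolic aggregate approximation (z-normalize, average each segment, map each average to a letter using equiprobable breakpoints of the standard normal distribution). The function names `sax` and `trend_word`, the four-letter alphabet, and the up/down/flat trend symbol are hypothetical choices made for illustration only; they are not the thesis's integrated approximation, whose details this record does not give.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Standard SAX word: z-normalize, segment means (PAA), discretize."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    mu, sigma = mean(series), pstdev(series)
    # z-normalize; a constant series maps to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # piecewise aggregate approximation: mean of each segment
    paa = [mean(z[i * n // n_segments:(i + 1) * n // n_segments])
           for i in range(n_segments)]
    # breakpoints split N(0, 1) into len(alphabet) equiprobable regions
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # the symbol index is the number of breakpoints below the segment mean
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


def trend_word(series, n_segments):
    """Hypothetical shape descriptor: u (up), d (down), f (flat) per segment."""
    n = len(series)
    out = ""
    for i in range(n_segments):
        seg = series[i * n // n_segments:(i + 1) * n // n_segments]
        delta = seg[-1] - seg[0]
        out += "u" if delta > 0 else "d" if delta < 0 else "f"
    return out


if __name__ == "__main__":
    rising = [1, 3] * 4   # every segment goes up
    falling = [3, 1] * 4  # same segment means, every segment goes down
    print(sax(rising, 4), sax(falling, 4))
    print(trend_word(rising, 4), trend_word(falling, 4))
```

In this sketch the two series receive the same SAX word because their segment means coincide, while the trend symbols differ. This is the kind of case where a value-only representation is not enough and shape information is needed.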

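Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Methods
1.3 Research Results
1.4 Thesis Organization
Chapter 2 Related Work and Techniques
2.1 Internet of Things
2.2 Streaming Big Data Analysis
2.3 Cloud Distributed Computing
2.4 Time Series Feature Approximation
2.5 Longest Common Subsequence
2.6 Behavior Recognition and Tracking
Chapter 3 Time Series Data Processing and Similarity Analysis
3.1 Symbolic Aggregate Approximation
3.2 Shape Aggregate Approximation
3.2 Integrated Shape and Symbolic Aggregate Approximation
3.3 Similarity Analysis Algorithms
3.3.1 Existing LCSS Research and Its Problems
3.3.2 Spatial and Temporal Longest Common Subsequence
Chapter 4 Behavior Tracking and Pattern Prediction Mechanism
4.1 Building the Frequent Pattern Index
4.2 Dynamic Search Strategy
4.3 Behavior Recognition and Pattern Tracking
Chapter 5 System Implementation and Performance Evaluation
5.1 Experimental Environment
5.2 Experimental Data and Simulation Method
5.3 Experimental Results
5.3.1 Correctness Experiments
5.3.2 Data Scalability Experiments
5.3.3 Recognition Capability Experiments
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References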

