Author: Syue-Yuan Jheng (鄭學遠)
Thesis title: Two-stage Fuzzy C-means Cluster Analysis (兩階段模糊C均值聚類演算法)
Advisors: Tsung-Ying Sun (孫宗瀛), Hsin-Jang Shieh (謝欣然)
Oral defense committee: Horng-Lin Shieh (謝鴻琳), Hsin-Jang Shieh (謝欣然), Tsung-Ying Sun (孫宗瀛)
Degree: Master's
Institution: National Dong Hwa University
Department: Department of Electrical Engineering
Student ID: 610823012
Year of publication (ROC calendar): 111 (2022)
Academic year of graduation: 110
Language: Chinese
Pages: 67
Keywords: grid-based clustering; Mahalanobis-based fuzzy C-means clustering; data distribution feature analysis
Cluster analysis is a widely used unsupervised machine learning method: a descriptive process that measures how similar or different sets of statistical measurements are. Each clustering algorithm has its own limitations, such as requiring the number of clusters to be specified in advance, difficulty handling arbitrarily shaped data distributions, sensitivity to parameter settings, or high computational complexity on large data sets.
No single clustering algorithm addresses all of these problems at once. The literature shows that grid-based clustering needs no preset number of clusters, has relatively low computational complexity, and can find clusters of arbitrary shape, while the fuzzy C-means (FCM) algorithm gives better results on convex data sets. This study therefore combines the two and proposes a two-stage fuzzy C-means clustering algorithm (Two-Stage Fuzzy C-means Cluster Analysis, TS-FCM). The first stage uses grid-based clustering to form a preliminary grouping and to extract features of the data distribution. The algorithm then judges whether the grid-clustering result is suitable as the initial value for the second stage, a Mahalanobis-distance-based fuzzy C-means clustering, which is run to obtain a better clustering result.
The experimental results show that the two-stage analysis does not require a preset number of clusters and produces good clustering results across different data distribution features. It also clearly improves on the initial-value setting and iteration-count problems of the traditional fuzzy C-means algorithm.
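The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the grid resolution `bins`, the density threshold `min_pts`, the neighbouring-cell flood fill, and the fuzzifier `m = 2` are all assumptions chosen for illustration, and the suitability check between the two stages is omitted. The second stage uses a Gustafson-Kessel-style, volume-normalized fuzzy covariance as a stand-in for the Mahalanobis-based distance, since the record does not give the exact formulation. The function names and the helper `two_bands` are hypothetical.

```python
import numpy as np


def grid_stage(X, bins=8, min_pts=3):
    """Stage 1 sketch: dense grid cells joined by adjacency give clusters and centers."""
    n_dim = X.shape[1]
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Cell index of each point per axis; the epsilon guards against a zero range.
    idx = np.minimum(((X - lo) / (hi - lo + 1e-12) * bins).astype(int), bins - 1)
    counts = np.zeros((bins,) * n_dim, dtype=int)
    np.add.at(counts, tuple(idx.T), 1)
    dense = counts >= min_pts          # assumed density rule
    cell_label = np.full(dense.shape, -1)
    n_clusters = 0                     # found by the grid, not preset
    for start in zip(*np.nonzero(dense)):
        if cell_label[start] != -1:
            continue
        cell_label[start] = n_clusters
        stack = [start]
        while stack:                   # flood fill over neighbouring dense cells
            cell = stack.pop()
            for offset in np.ndindex((3,) * n_dim):
                nb = tuple(int(c) + o - 1 for c, o in zip(cell, offset))
                if all(0 <= v < bins for v in nb) and dense[nb] and cell_label[nb] == -1:
                    cell_label[nb] = n_clusters
                    stack.append(nb)
        n_clusters += 1
    labels = cell_label[tuple(idx.T)]  # -1 marks points in sparse cells
    centers = np.array([X[labels == k].mean(axis=0) for k in range(n_clusters)])
    return labels, centers


def mahalanobis_fcm(X, centers, m=2.0, max_iter=100, tol=1e-5):
    """Stage 2 sketch: FCM seeded with stage-1 centers, covariance-based distance."""
    n_dim = X.shape[1]
    V = centers.astype(float).copy()
    # Start from squared Euclidean distances to the stage-1 centers.
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
    for n_iter in range(1, max_iter + 1):
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)      # membership update
        W = U ** m
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]    # center update
        for k in range(len(V_new)):
            diff = X - V_new[k]
            cov = (W[:, k, None] * diff).T @ diff / W[:, k].sum()
            cov += 1e-6 * np.eye(n_dim)               # regularize for inversion
            # Assumed Gustafson-Kessel normalization: unit-determinant metric.
            A = np.linalg.det(cov) ** (1.0 / n_dim) * np.linalg.inv(cov)
            d2[:, k] = np.einsum("ij,jk,ik->i", diff, A, diff) + 1e-12
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True), V, n_iter


def two_bands():
    """Hypothetical demo data: two elongated, parallel bands of 100 points each."""
    xs = np.linspace(0.0, 4.0, 20)
    band = lambda y0: np.array([(x, y) for y in np.linspace(y0, y0 + 1.0, 5) for x in xs])
    return np.vstack([band(0.0), band(3.0)])
```

A typical call would be `labels, centers = grid_stage(X)` followed by `U, V, n_iter = mahalanobis_fcm(X, centers)`, with `U.argmax(axis=1)` giving hard assignments. Seeding the second stage from the grid result is what removes the need to choose the number of clusters or random initial centers by hand.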
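Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Preface
1-2 Literature Review
1-2-1 Clustering Algorithms
1-2-2 Cluster Validity Indices
1-3 Algorithm Analysis
1-4 Thesis Organization
Chapter 2 Research Methods and Background
2-1 Grid-Based Clustering Methods
2-2 Fuzzy C-Means Clustering Algorithm
2-3 Mahalanobis Distance
2-4 Cluster Validity Indices
2-4-1 Adjusted Rand Index
2-4-2 RT Index
Chapter 3 Two-Stage Fuzzy C-Means Cluster Analysis
3-1 Algorithm Concept
3-2 Grid-Clustering-Based Data Preprocessing
3-3 Data Distribution Feature Testing
3-4 Mahalanobis-Based Fuzzy C-Means Clustering Algorithm
3-5 Two-Stage Fuzzy C-Means Clustering Algorithm
Chapter 4 Experimental Simulation
4-1 Initial Centers and Mahalanobis Distance Improvement
4-1-1 Initial Center Comparison
4-1-2 Mahalanobis Distance Improvement
4-2 Scikit-learn Experimental Comparison
4-2-1 Algorithm Comparison
4-2-2 Cluster Validity Indices
Chapter 5 Conclusions and Future Work
5-1 Conclusions
5-2 Future Work
References
Author Biography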