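Use MDS and K-means to analyze the accommodation evaluation of tourists in Taiwan during the epidemic - National Dong Hwa University Master's and Doctoral Theses Full-Text Image System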

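Author:	施博鈞
Author (English):	Bor-Jiun Shih
Thesis title:	疫情期間使用MDS及K-means對全台遊客之住宿評價進行分析
Thesis title (English):	Use MDS and K-means to analyze the accommodation evaluation of tourists in Taiwan during the epidemic
Advisor:	陳林志
Advisor (English):	Lin-Chih Chen
Oral defense committee:	葉國暉, 陳大仁
Oral defense committee (English):	Kuo-Hui Yeh, Da-Ren Chen
Degree:	Master's
Institution:	National Dong Hwa University (國立東華大學)
Department:	Department of Information Management
Student ID:	610735010
Year of publication (ROC calendar):	111 (2022)
Graduation academic year:	110
Language:	English
Number of pages:	99
Keywords (English):	K-means, MDS, COVID-19, hotel industry, NLP, TF-IDF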

In modern society, the rapid development of transportation not only speeds the flow of people's resources and information but also supports the rapid progress of human society. However, related negative effects, such as the spread of avian influenza (bird flu) or coronavirus disease (COVID-19), also cause major damage to public activities and daily life. COVID-19, which is caused by a novel coronavirus, was first detected in Wuhan, China, in December 2019. With the movement of people, COVID-19 became the world's largest epidemic of the past two years (2020 and 2021). According to relevant statistics, as of October 2021, at least 240 million people had been diagnosed with COVID-19 and at least 4.89 million people had died from it.
In this thesis, we analyze and discuss the impact of the COVID-19 epidemic on Taiwan's hotel and tourism industry. First, we use Natural Language Processing (NLP) techniques to extract analysis data from the hotel information we collected. NLP helps us obtain important and meaningful information from a large amount of hotel data, such as guest reviews, hotel software and hardware facilities, and hotel location. Next, we apply Multidimensional Scaling (MDS) and the K-means clustering algorithm to this information to build a multidimensional model. Based on this model, we analyze and discuss how guest reviews of different hotels are classified, providing a basis for future hotel transformation.
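The TF-IDF and K-means stages named in the keywords and outline can be illustrated with a minimal sketch. It is not the thesis's implementation: the four toy reviews, the whitespace tokenizer, the smoothed IDF formula, and the hand-picked initial centroids are all assumptions made for illustration, and the MDS stage and the choice of K (elbow method, gap statistic) are left out.

```python
import math
from collections import Counter

# Hypothetical toy reviews; the thesis corpus is not reproduced here.
reviews = [
    "clean room friendly staff",
    "friendly staff clean lobby",
    "noisy street small room",
    "small room noisy neighbors",
]


def tfidf(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors), one vector per document."""
    tokenized = [d.split() for d in docs]  # assumption: whitespace tokenization
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed IDF (an assumed variant) so common terms keep a non-zero weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vec = [tf[t] / len(doc) * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


vocab, vectors = tfidf(reviews)
# Assumption: K = 2, seeded with the first and third reviews for determinism.
labels, centroids = kmeans(vectors, init_centroids=[vectors[0], vectors[2]])
print(labels)  # [0, 0, 1, 1]
```

On this toy input the two reviews about staff and cleanliness fall in one cluster and the two about noise and room size in the other. In practice the initial centroids would normally be chosen randomly or with a seeding heuristic, and K would be selected by a criterion such as those listed in the outline.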

1. Introduction
2. Literature Review
2.1. Natural Language Processing
2.2. Document Clustering
2.2.1. K-means Clustering
2.2.2. Expectation-Maximization Algorithm
2.2.3. Gaussian Mixture Model
2.3. Multidimensional Scaling
3. Experimental Steps
3.1. Web crawler
3.1.1. Standard hotel
3.2. NLP preprocessing
3.3. Term Frequency Inverse Document Frequency
3.4. K-means
3.4.1. Use the elbow method to determine the K value of K-means
3.4.2. Gap statistic
3.4.3. K-means
3.5. MDS
4. Research results
4.1. Clustering of K-means
4.2. MDS analysis results
5. Conclusion and future research directions
5.1. Future research directions
6. Reference
7. Appendix
