Author (English): Bor-Jiun Shih
Thesis title (English): Use MDS and K-means to analyze the accommodation evaluation of tourists in Taiwan during the epidemic
Advisor (English): Lin-Chih Chen
Oral defense committee (English): Kuo-Hui Yeh, Da-Ren Chen
Keywords (English): K-means; MDS; COVID-19; hotel industry; NLP; TF-IDF
In modern society, the rapid development of transportation not only speeds the flow of resources and information but also supports the rapid progress of human society. However, it also brings negative effects: viruses such as avian influenza (bird flu) and the coronavirus behind COVID-19 continue to cause major damage to public activities and daily life. COVID-19, a disease caused by a novel coronavirus, was first detected in Wuhan, China, in December 2019. With the movement of people, COVID-19 became the world's largest epidemic of the past two years (2020 and 2021). According to relevant statistics, as of October 2021 at least 240 million people had been diagnosed with COVID-19 and at least 4.89 million people had died from it.
In this thesis, we analyze and discuss the impact of the COVID-19 epidemic on Taiwan's hotel and tourism industries. First, we use Natural Language Processing (NLP) techniques to extract analysis data from the hotel information we collected. NLP helps us obtain important and meaningful information from a large volume of hotel data, such as guest reviews, hotel software and hardware facilities, and hotel location. Next, we apply Multi-Dimensional Scaling (MDS) and the K-means clustering algorithm to this information to build a multi-dimensional model. Based on this model, we analyze and discuss how guest reviews of different hotels are classified, providing a basis for future hotel transformation.
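As a rough illustration of the kind of pipeline the abstract and keywords describe, the following minimal sketch turns a few review texts into TF-IDF vectors and groups them with K-means. It is not the thesis's implementation: the four reviews are invented toy data, the function names `tfidf_vectors` and `kmeans` are hypothetical, the smoothed IDF formula and the fixed initial centroids are assumptions chosen to keep the example deterministic, and the MDS step is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for tokenised docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumption), so every term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] / len(doc) * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented toy reviews, for illustration only (not data from the thesis).
reviews = [
    "clean room friendly staff",
    "friendly staff clean lobby",
    "noisy street poor breakfast",
    "poor breakfast noisy room",
]
docs = [review.split() for review in reviews]
vocab, X = tfidf_vectors(docs)
labels, centroids = kmeans(X, init_centroids=[X[0], X[2]])
print(labels)  # [0, 0, 1, 1]
```

In practice the number of clusters would not be fixed by hand as it is here; the outline lists the elbow method and the gap statistic as the ways the thesis chooses K.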
1. Introduction
2. Literature Review
2.1. Natural Language Processing
2.2. Document Clustering
2.2.1. K-means Clustering
2.2.2. Expectation-Maximization Algorithm
2.2.3. Gaussian Mixture Model
2.3. Multidimensional Scaling
3. Experimental Steps
3.1. Web crawler
3.1.1. Standard hotel
3.2. NLP preprocessing
3.3. Term Frequency Inverse Document Frequency
3.4. K-means
3.4.1. Use the elbow method to determine the K value of K-means
3.4.2. Gap statistic
3.4.3. K-means
3.5. MDS
4. Research results
4.1. Clustering of K-means
4.2. MDS analysis results
5. Conclusion and future research directions
5.1. Future research directions
6. Reference
7. Appendix

