Author: Shih-Hao Lu
Thesis Title: Sentiment Analysis Based on Traditional NLP and BERT Model - Taking Food Reviews as an Example
Advisor: Guan-ling Lee
Oral Defense Committee: Yao-Chung Chang, Shou-Chih Lo
Keywords: online reviews, TF-IDF, Word2vec, BERT, SMOTE, sentiment analysis
With the continuous innovation of network-related technologies, people can now freely express their subjective opinions about people and things on the Internet. Although this information exists in the virtual online world, it has a real impact on society. Take online reviews on shopping websites as an example: if some consumers leave negative comments, the merchant's electronic word-of-mouth will inevitably suffer, which in turn indirectly affects other consumers' purchase intention. This shows the importance of sentiment analysis.
This thesis uses the Amazon food review dataset from the Kaggle platform for its experiments. Based on the ratings in the dataset, the reviews are divided into three sentiment categories: negative, neutral, and positive. Because the data are imbalanced, with most reviews being positive, this thesis applies the SMOTE method to mitigate the problem. TF-IDF, Word2vec, and BERT models are used to extract features from the review text, and the resulting text vectors are used to train machine learning classifiers that predict the sentiment category of each review.
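To make three steps of this pipeline concrete (turning star ratings into sentiment labels, building TF-IDF vectors, and SMOTE-style oversampling of a minority class), here is a minimal, dependency-free Python sketch. It is not the thesis's code. The rating thresholds (1-2 negative, 3 neutral, 4-5 positive) are an assumption, since the abstract does not state them; the TF-IDF uses the plain tf × log(N/df) weighting rather than any particular library's smoothed variant; and the helper names `rating_to_sentiment`, `tfidf`, and `smote` are hypothetical.

```python
import math
import random
from collections import Counter


def rating_to_sentiment(score: int) -> str:
    """Map a 1-5 star rating to a sentiment class.

    ASSUMPTION: 1-2 -> negative, 3 -> neutral, 4-5 -> positive;
    the thesis abstract does not give the exact thresholds.
    """
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"


def tfidf(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return (vocabulary, vectors) with tf = count / len(doc), idf = log(N / df)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors


def smote(minority: list[list[float]], n_new: int, k: int = 5,
          seed: int = 0) -> list[list[float]]:
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a random minority sample and one of its k nearest minority
    neighbours (Euclidean distance). Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Toy usage with made-up reviews (not data from the thesis).
reviews = [
    (5, "great taste love it"),
    (4, "good snack"),
    (5, "love the flavor"),
    (1, "stale and bad"),
    (2, "bad taste"),
    (3, "okay snack"),
]
labels = [rating_to_sentiment(score) for score, _ in reviews]
vocab, X = tfidf([text.split() for _, text in reviews])
negatives = [x for x, y in zip(X, labels) if y == "negative"]
X_extra = smote(negatives, n_new=2, k=1)  # two synthetic negative vectors
```

In practice one would more likely rely on library implementations such as scikit-learn's `TfidfVectorizer` and imbalanced-learn's `SMOTE`; the sketch only exposes the underlying arithmetic. Oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.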
The experimental results show that before SMOTE is applied, most predictions fall into the positive category, a clear effect of the data imbalance. After SMOTE is applied, the evaluation metrics for the negative and neutral categories improve substantially. The experiments also show that combining either Word2vec or BERT features with a classifier yields similar performance on this review text, and that the random forest classifier achieves the best accuracy, suggesting that it is well suited to multi-class problems such as this one.
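Why overall accuracy can hide the imbalance problem described above is easiest to see with per-class metrics. The following minimal Python sketch uses hypothetical helper names (`accuracy`, `per_class_report`) and toy labels, not the thesis's results. It computes accuracy and per-class precision, recall, and F1 for a degenerate classifier that predicts "positive" for every review.

```python
def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_report(y_true: list[str],
                     y_pred: list[str]) -> dict[str, dict[str, float]]:
    """Precision, recall and F1 for each class, using one-vs-rest counts."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted c
        fp = sum(t != c and p == c for t, p in pairs)  # predicted c, was not c
        fn = sum(t == c and p != c for t, p in pairs)  # was c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Toy, imbalanced ground truth and a classifier that always says "positive".
y_true = ["positive"] * 8 + ["neutral", "negative"]
y_pred = ["positive"] * 10

print(accuracy(y_true, y_pred))  # 0.8
for label, scores in per_class_report(y_true, y_pred).items():
    print(label, scores)
```

Accuracy is 0.8 even though the model never identifies a negative or neutral review; the zero recall for those two classes exposes this. That is the kind of gap the thesis reports narrowing for the negative and neutral categories after applying SMOTE.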
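Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Thesis Structure
Chapter 2 Literature Review
2.1 Natural Language Processing
2.2 Sentiment Analysis
2.3 Related Work on Sentiment Analysis and the Importance of Reviews
2.4 Traditional Sentiment Analysis Methods
2.5 Newer Sentiment Analysis Methods
2.6 Methods for Handling Imbalanced Data
Chapter 3 Research Methods
3.1 Data Description
3.2 Research Design
3.2.1 Data Preprocessing
3.2.2 Feature Extraction
3.2.3 Classifiers
3.2.4 Data Augmentation
3.3 Comparison of Strengths and Weaknesses
Chapter 4 Experimental Results and Discussion
4.1 Data Source
4.2 Evaluation Metrics
4.3 Experimental Results
4.3.1 Classification Results without SMOTE
4.3.2 Classification Results with SMOTE
4.4 Summary of Experiments
Chapter 5 Conclusion
References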

