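Sentiment Analysis Based on Traditional NLP and BERT Model - Taking Food Reviews as an Example (National Dong Hwa University thesis and dissertation full-text system)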

Author:	盧詩皓 (Shih-Hao Lu)
Thesis title:	Sentiment Analysis Based on Traditional NLP and BERT Model - Taking Food Reviews as an Example (基於傳統NLP與BERT模型實作情感分析-以美食評論為例)
Advisor:	李官陵 (Guan-ling Lee)
Committee members:	張耀中 (Yao-Chung Chang), 羅壽之 (Shou-Chih Lo)
Degree:	Master's
Institution:	National Dong Hwa University (國立東華大學)
Department:	Department of Computer Science and Information Engineering (資訊工程學系)
Student ID:	611021215
Year of publication (ROC calendar):	112 (2023)
Graduation academic year:	111
Language:	Chinese
Pages:	45
Keywords:	online reviews, TF-IDF, Word2vec, BERT, SMOTE, sentiment analysis

With the continual advance of network technologies, people can now freely express subjective opinions about people, events, and products on the Internet. Although this information exists in a virtual space, it has a real impact on society. Take online reviews on shopping sites: when a consumer leaves a negative review, it inevitably affects the merchant's electronic word-of-mouth and indirectly affects other consumers' purchase intentions. This shows the importance of sentiment analysis.
This thesis experiments on the Amazon food review dataset from the Kaggle platform. Using the ratings in the dataset, reviews are divided into negative, neutral, and positive sentiment classes. Because the data is imbalanced, with most reviews being positive, the thesis applies the SMOTE method to try to address this problem. TF-IDF, Word2vec, and BERT are used to extract features from the review text, and the resulting text vectors are used to train machine learning classifiers that predict each review's sentiment class.
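The labelling and oversampling steps described above can be illustrated with a minimal pure-Python sketch. It is not the thesis's implementation: the rating thresholds (1-2 negative, 3 neutral, 4-5 positive), the function names, and the toy vectors are all assumptions made for illustration, and the `smote` function only reproduces the core idea of SMOTE, which is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours.

```python
import random


def rating_to_sentiment(rating):
    """Map a 1-5 star rating to a sentiment class (thresholds are assumed)."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic vectors from a list of minority-class vectors.

    Each synthetic vector lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data, not taken from the thesis.
ratings = [5, 5, 4, 1, 3, 5, 2, 4]
labels = [rating_to_sentiment(r) for r in ratings]

# Hypothetical 2-dimensional text vectors for the negative (minority) class.
negative_vectors = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.7]]
new_points = smote(negative_vectors, n_new=4, k=2)
```

Because every synthetic vector is a convex combination of two real minority vectors, it always stays inside the region already occupied by that class. In practice the oversampling would be applied only to the training split, so that no synthetic sample leaks into the evaluation data.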
The experimental results show that, before SMOTE is applied, most predictions fall into the positive class, clearly reflecting the data imbalance. After SMOTE is applied, the evaluation metrics for the negative and neutral classes improve substantially. The experiments also indicate that Word2vec and BERT, each combined with a classifier, perform similarly on this review text, and that the random forest classifier achieves the best accuracy, making it well suited to multi-class problems.
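To see why per-class metrics matter under imbalance, the following sketch computes accuracy alongside per-class precision, recall, and F1 for a classifier that always predicts the majority class. The abstract does not name the exact metrics used, so the choice of precision, recall, and F1 is an assumption, and the labels below are toy data rather than results from the thesis.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return precision, recall, and F1 for each class label."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy imbalanced labels: 8 positive, 1 neutral, 1 negative.
y_true = ["positive"] * 8 + ["neutral", "negative"]
# A classifier that always predicts the majority class.
y_pred = ["positive"] * 10

scores = per_class_metrics(y_true, y_pred, ["negative", "neutral", "positive"])
acc = accuracy(y_true, y_pred)
```

In this toy case the accuracy is 0.8 even though recall for the negative and neutral classes is zero, which is the kind of behaviour the abstract reports before oversampling.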

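Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Natural Language Processing
2.2 Sentiment Analysis
2.3 Related Work on Sentiment Analysis and the Importance of Reviews
2.4 Traditional Sentiment Analysis Methods
2.5 Newer Sentiment Analysis Methods
2.6 Methods for Handling Imbalanced Data
Chapter 3 Methodology
3.1 Data Description
3.2 Research Design
3.2.1 Data Preprocessing
3.2.2 Feature Extraction
3.2.3 Classifiers
3.2.4 Data Augmentation
3.3 Comparison of Advantages and Disadvantages
Chapter 4 Experimental Results and Discussion
4.1 Data Source
4.2 Evaluation Metrics
4.3 Experimental Results
4.3.1 Classification Results without SMOTE
4.3.2 Classification Results with SMOTE
4.4 Summary of Experiments
Chapter 5 Conclusion
References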
