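Research on Expanding Chinese Emotional Lexicon for Marketing Recommendation Analysis (擴充中文情感詞庫於行銷推薦分析之研究) - National Dong Hwa University Electronic Theses and Dissertations Full-Text System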

Author:	許庭瑄
Author (English):	Ting-Hsuan Hsu
Title:	擴充中文情感詞庫於行銷推薦分析之研究
Title (English):	Research on Expanding Chinese Emotional Lexicon for Marketing Recommendation Analysis
Advisor:	侯佳利
Advisor (English):	Jia-Li Hou
Committee members:	林耀堂、劉英和
Committee members (English):	Lin-Yai Tang, Ying-Ho Liu
Degree:	Master's
University:	National Dong Hwa University
Department:	Department of Information Management
Student ID:	610735007
Year of publication (ROC calendar):	108 (2019)
Academic year of graduation:	107
Language:	Chinese
Pages:	49
Keywords:	Sentiment Analysis, Emotion Lexicon, Naïve Bayes, K-Nearest Neighbor Algorithm, Support Vector Machine

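Text is a tool people use to communicate with one another. The growth of social networking sites and messaging software has changed how people communicate and, at the same time, has produced large volumes of text data worth analyzing. Sentiment analysis, a branch of text mining, can analyze this text and extract value from it. Analyzing Chinese sentences, however, is particularly difficult. The word is the basic unit of Chinese, but unlike English, Chinese has no spaces separating one word from the next, so words must be separated through word segmentation. How to segment text correctly for sentiment analysis has therefore long been a key problem in correctly determining the meaning a sentence expresses.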
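To make the segmentation problem concrete, here is a minimal sketch of dictionary-based forward maximum matching. This is a deliberately simple stand-in and not the method of CKIP or Jieba, the segmenters listed in the thesis's table of contents. The function name `segment`, the `max_len` parameter, and the toy vocabulary are all assumptions made for illustration.

```python
def segment(text, vocabulary, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest vocabulary entry that matches;
    if nothing matches, emit a single character and move on.
    """
    tokens = []
    i = 0
    while i < len(text):
        longest = min(max_len, len(text) - i)
        for size in range(longest, 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"手機", "螢幕", "很", "漂亮", "電池", "不", "耐用"}

print(segment("手機螢幕很漂亮", VOCAB))
```

With a different vocabulary the same string can split differently, which is why segmentation quality affects every later step of sentiment analysis.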

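Emotion lexicons play a very important role in sentiment analysis. Many sentiment analysis studies exist and many emotion lexicons have been built. However, because Chinese is written in both Simplified and Traditional characters and usage differs with regional culture, sentiment analysis based on a single emotion lexicon may not produce the expected results.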

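This thesis takes the emotion lexicon compiled in Analysis of Sentiment Vocabulary in Chinese Multilingualism (中文多語境情感詞彙分析研究; Tu Xinyu 凃欣妤, 2018) as its base and expands it with two further resources: the Chinese dictionary of commendatory and derogatory terms (中文褒貶意詞典) built by Li Jun (李軍) of Tsinghua University in Beijing, and the Chinese dimensional emotion lexicon (中文維度型情感詞典, Chinese Valence-Arousal Words) built by Professor Liang-Chih Yu (禹良治) of Yuan Ze University. Reviews from JD.com (京東商城) and Mobile01 serve as the Simplified and Traditional Chinese experimental corpora, and three algorithms (support vector machine, K-nearest neighbor, and Naïve Bayes) are used for sentiment classification, in order to examine whether combining Chinese emotion lexicons from different linguistic contexts allows the sentiment of a sentence to be analyzed more accurately. The experimental results show that combining emotion lexicons from different contexts improves sentiment classification, raising accuracy to 97%.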
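The idea of expanding a base lexicon with lexicons from other contexts can be sketched as follows. The record does not state how the thesis actually merges entries, so everything here is an assumption: the helper names, the rule that the base lexicon wins on conflicts, the valence midpoint of 5.0 used to turn valence scores into polarity, and all toy words and scores.

```python
def from_word_lists(positive, negative):
    """Turn positive/negative word lists into a word -> polarity map."""
    lexicon = {word: 1 for word in positive}
    lexicon.update({word: -1 for word in negative})
    return lexicon


def from_valence(scores, midpoint=5.0):
    """Turn valence scores into polarity; words exactly at the midpoint are dropped."""
    return {
        word: (1 if value > midpoint else -1)
        for word, value in scores.items()
        if value != midpoint
    }


def merge_lexicons(base, *extras):
    """Union several lexicons; entries from earlier lexicons win on conflict."""
    merged = dict(base)
    for extra in extras:
        for word, polarity in extra.items():
            merged.setdefault(word, polarity)
    return merged


def matched_words(tokens, lexicon):
    """List the (token, polarity) pairs that the lexicon recognizes."""
    return [(t, lexicon[t]) for t in tokens if t in lexicon]


# Toy data, invented for illustration only.
base = from_word_lists(["漂亮", "好用"], ["糟糕"])
polarity_lexicon = from_word_lists(["不错"], ["垃圾"])
valence_lexicon = from_valence({"耐用": 6.8, "卡頓": 2.4, "普通": 5.0})
merged = merge_lexicons(base, polarity_lexicon, valence_lexicon)

review = ["螢幕", "漂亮", "電池", "耐用", "但", "系統", "卡頓"]
print(matched_words(review, base))
print(matched_words(review, merged))
```

In this toy review the base lexicon recognizes one sentiment word while the merged lexicon recognizes three, which is the kind of coverage gain the expansion aims for.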

Emotion lexicons play a very important role in sentiment analysis. Many sentiment analysis studies have been conducted and many emotion lexicons have been built. However, Chinese is written in both Simplified and Traditional characters, and usage differs across regional cultures. As a result, sentiment analysis based on a single emotion lexicon may not produce the expected results.

Based on the emotion lexicon compiled in Analysis of Sentiment Vocabulary in Chinese Multilingualism (Tu Xinyu, 2018), this thesis expands the lexicon by adding the Chinese dictionary constructed at Tsinghua University in Beijing and the Chinese Valence-Arousal Words constructed by Professor Liang-Chih Yu of Yuan Ze University. The experimental corpus consists of comments from Jingdong and Mobile 01. Support vector machine, K-nearest neighbor, and Naïve Bayes algorithms are used as the sentiment analysis models.
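Of the three classifiers named above, Naïve Bayes is the simplest to show end to end. The following is a minimal sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing over segmented tokens. It is not the thesis's pipeline: the table of contents indicates that the thesis derives feature word vectors with Word2Vec, whereas this sketch uses raw token counts. The class name and the four training reviews are invented for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                if token in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set, invented for illustration only.
train_docs = [["螢幕", "漂亮"], ["電池", "耐用"], ["系統", "卡頓"], ["品質", "糟糕"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)

print(model.predict(["電池", "漂亮"]))
print(model.predict(["系統", "糟糕"]))
```

A lexicon can feed such a classifier by restricting or weighting the tokens it sees, but how the thesis combines the two is not described in this record.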

Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Sentiment Analysis
2.2 Emotion Lexicons
2.3 CKIP
2.4 Jieba
2.5 Word2Vec
2.6 K-Nearest Neighbor (KNN)
2.7 Naïve Bayes (NB)
2.8 Support Vector Machines (SVM)
2.9 Receiver Operating Characteristic (ROC) Curve
Chapter 3 Research Method
3.1 Experimental Procedure
3.2 Chinese Emotion Lexicon
3.3 Corpus Collection
3.4 Text Preprocessing
3.5 Obtaining Feature Word Vectors
Chapter 4 Experimental Results
4.1 Description of Experimental Data
4.2 Experiment 1
4.3 Experiment 2
4.4 Experiment 3
4.5 Experiment 4
4.6 Experiment 5
Chapter 5 Conclusion
Chapter 6 Suggestions for Future Research
References

Z. Yan and S. Ping, “Sentiment Classification of Internet Commodity Reviews Based on the Extended Chinese Sentiment Lexicon,” 2017 2nd International Conference on Artificial Intelligence: Techniques and Applications (AITA 2017).
H. Bai, G. Yu and X.Y. Tian, “Study on the Classification of Negative Sentiment Weibo Messages in the Post-Disaster Situation,” Journal of Digital Information Management, Vol. 14, No. 2, April 2016, pp. 136-142.
J. Li, Y. Xu, H. Xiong and Y. Wang, “Chinese text emotion classification based on emotion dictionary,” 2010 IEEE 2nd Symposium on Web Society, 2010, pp. 170-174.
S.B. Tan, H.F. Tang and X.Q. Cheng, “Research on Sentiment Classification of Chinese Reviews Based on Supervised Machine Learning Techniques,” Journal of Chinese Information Processing, Vol. 21, No. 6, 2007.
S. Zhang, Z. Wei, Y. Wang and T. Liao, “Sentiment analysis of Chinese micro-blog text based on extended Sentiment Dictionary,” Future Generation Computer Systems, Vol. 81, April 2018, pp. 395-403.
Y.Y. Zhao, B. Qing, Q.H. Shi and T. Liu, “Large-scale Sentiment Lexicon Collection and Its Application in Sentiment Classification,” Journal of Chinese Information Processing, Vol. 31, No. 2, March 2017.
B. Pang, L. Lee and S. Vaithyanathan, “Thumbs up? Sentiment classification using machine learning techniques,” Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, 2002, pp. 79-86.
L.W. Ku and H.H. Chen, “Mining Opinions from the Web: Beyond Relevance Retrieval,” Mining Web Resources for Enhancing Information Retrieval, Vol. 58, No. 12, August 2007, pp. 1838-1850.
P.Y. Lu, “Affective Lexicon in Chinese - Construction and Annotation,” 2015.
H.C. Yu, T.H. Huang and H.H. Chen, “Domain Dependent Word Polarity Analysis for Sentiment Classification,” Computational Linguistics and Chinese Language Processing, Vol. 17, No. 4, December 2012, pp. 33-48.
D.W. Zhang, H. Xu, Z.C. Su and Y.F. Xu, “Chinese Comments Sentiment Classification Based on Word2Vec and SVMperf,” Expert Systems with Applications, Vol. 42, No. 4, March 2015, pp. 1857-1863.
Y. Zhu, Y.Q. Zhang and J. Lilleberg, “Support Vector Machines and Word2Vec for Text Classification with Semantic Features,” 2015 IEEE 14th Int’l Conf. on Cognitive Informatics & Cognitive Computing.
E. Altszyler, M. Sigman, S. Ribeiro and D.F. Slezak, “Comparative Study of LSA vs Word2Vec Embeddings in Small Corpora: a case study in dreams database,” Consciousness and Cognition, Vol. 56, November 2017, pp. 178-187.
L. Ma and Y. Zhang, “Using Word2Vec to Process Big Text Data,” 2015 IEEE International Conference on Big Data.
W. Li, L. Zhu, K. Guo, Y. Shi and Y. Zheng, “Build a Tourism-specific Sentiment Lexicon via Word2Vec,” Annals of Data Science, Vol. 5, No. 1, March 2018, pp. 1-7.
X.P. Yang, Z.X. Zhang, L. Wang, Y. Zhang and Q. Ma, “Automatic Construction and Optimization of Sentiment Lexicon Based on Word2Vec,” Computer Science, 2017.
X. Ge, X. Jin and Y. Xu, “Research on Sentiment Analysis of Multiple Classifiers Based on Word2Vec,” 2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics.
Q. Shuai, Y. Huang, L. Jin and L. Pang, “Sentiment Analysis on Chinese Hotel Reviews with Doc2Vec and Classifiers,” 2018 IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference.
V. Bijalwan, V. Kumar and P. Kumari, “KNN based Machine Learning Approach for Text and Document Mining,” International Journal of Database Theory and Application, Vol. 7, No. 1, 2014.
B. Trstenjak, S. Mikac and D. Donko, “KNN with TF-IDF Based Framework for Text Categorization,” Procedia Engineering, Vol. 69, 2014, pp. 1356-1364.
M. Bilal, H. Israr, M. Shahid and A. Khan, “Sentiment Classification of Roman-Urdu Opinions using Naïve Bayesian, Decision Tree and KNN Classification Techniques,” Journal of King Saud University-Computer and Information Sciences, Vol. 28, No. 3, July 2016, pp. 330-334.
S. Goyal, “Review paper on Sentiment Analysis of Twitter Data Using Text Mining and Hybrid Classification Approach,” International Journal of Engineering Development and Research, Vol. 5, No. 2, 2017, pp. 197-199.
J. Li, S. Fong and Y. Zhuang, “Sentiment Analysis of Online News Using MALLET.”
X.M. Zou, H.Y. Peng and E. Cambria, “Radical-Based Hierarchical Embedding for Chinese Sentiment Analysis at Sentence Level,” Proceedings of the Thirtieth International Florida Artificial Intelligence Research Society Conference.
P. Domingos and M. Pazzani, “On the optimality of the simple Bayesian classifier under zero-one loss,” Machine Learning, Vol. 29, 1997, pp. 103-130.
P. Ficamos, Y. Liu and W. Chen, “A Naive Bayes and Maximum Entropy approach to sentiment analysis: Capturing domain-specific data in Weibo,” 2017 IEEE International Conference on Big Data and Smart Computing.
L. Yang and Y. Xiang, “Naive Bayes and BiLSTM Ensemble for Discriminating between Mainland and Taiwan Variation of Mandarin Chinese,” Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, 2019.
F. Wang, X. Deng and L. Hou, “Chinese News Text Multi Classification Based on Naive Bayes Algorithm,” Proceedings of the 2nd International Symposium on Computer Science and Intelligent Control, 2018.
Q. Jiang, W. Wang, X. Han and S. Zhang, “Deep feature weighting in Naive Bayes for Chinese text classification,” 2016 4th International Conference on Cloud Computing and Intelligence Systems.
T. Joachims, “Text categorization with support vector machines: learning with many relevant features,” ECML'98 Proceedings of the 10th European Conference on Machine Learning, 1998, pp. 137-142.
W.J. Chen, T.H. Shao, C.N. Li and N.Y. Deng, “MLTSVM: A Novel Twin Support Vector Machine to Multi-label Learning,” Pattern Recognition, Vol. 52, April 2016, pp. 61-74.
F. Luo, C. Li and Z. Cao, “Affective-feature-based Sentiment Analysis Using SVM Classifier,” 2016 IEEE 20th International Conference on Computer Supported Cooperative Work in Design.
H. Xu, H. Lu, G. Yang and C. Zhang, “Sentiment Analysis of Chinese Version Using SVM & RNN,” Proceedings of the 6th International Conference on Information Engineering, 2017.
K. Lu and J. Wu, “Sentiment Analysis of Film Review Texts Based on Sentiment Dictionary and SVM,” Proceedings of the 2019 3rd International Conference on Innovation in Artificial Intelligence, 2019, pp. 73-77.
L. Xing, L. Yuan, W. Qinglin and L. Yu, “An approach to sentiment analysis of short Chinese texts based on SVMs,” 2015 IEEE 34th Chinese Control Conference.
L. Tang, S. Zhang, L. He and H. Fan, “Research on Stock Prediction in China based on Social Network and SVM Algorithm,” Proceedings of the 2018 2nd International Conference on Economic Development and Education Management.
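Tu Xinyu (凃欣妤), “Analysis of Sentiment Vocabulary in Chinese Multilingualism” (中文多語境情感詞彙分析研究), Taiwan Internet Conference (台灣網際網路研討會), October 26, 2018.
簡之文, “A Study of Sentiment Analysis of Blog Articles” (部落格文章情感分析之研究), master's thesis, Tamkang University, 2012.
陳建美, “Construction and Application of a Chinese Affective Lexicon Ontology” (中文情感詞彙本體的構建及其應用), doctoral dissertation, Dalian University of Technology, China, 2009.
楊邵為, “Design and Implementation of a Sentiment Classifier for Facebook Posts” (Facebook 文章情緒分類器之設計與實作), master's thesis, National Chung Cheng University, 2017.
王力弘, “A New-Word Detection System for Social Media: The Case of the PTT Gossiping Board” (社群媒體新詞偵測系統以PTT八卦版為例), master's thesis, In-service Master's Program, Department of Computer Science, National Chengchi University, 2015.
丁晟春, 王穎 and 李霄, “Research on SVM-Based Sentiment Analysis of Chinese Microblogs” (基於SVM的中文微博情緒分析研究), 情報資料工作, No. 3, 2016.
Jieba word segmentation system, https://github.com/fxsjy/jieba, retrieved on 2019/3/15.
Hidden Markov model, http://zh.wikipedia.org/wiki/隐马尔可夫模型, retrieved on 2019/3/15.
Viterbi algorithm, http://zh.wikipedia.org/wiki/维特比算法, retrieved on 2019/3/15.
ROC curve, http://zh.wikipedia.org/wiki/ROC曲线, retrieved on 2019/7/18.
Mobile phone market share in mainland China, https://technews.tw/2019/05/06/samsung-smart-phone-market-share-in-china-rise/, retrieved on 2019/8/05.
Mobile phone market share in Taiwan, https://www.eprice.com.tw/mobile/talk/102/5356008/1/, retrieved on 2019/8/05.
Building a Word2Vec model, https://www.jianshu.com/p/ec27062bd453, retrieved on 2019/7/2.
Sentiment classification of e-commerce review data based on Word2Vec+SVM, https://github.com/maowankuiDji/Word2Vec-sentiment, retrieved on 2019/7/2.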

