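Analyzing the Media Reporting Style of Medigen COVID-19 Vaccine News Content Based on Machine Learning (基於機器學習分析高端疫苗新聞內容之媒體報導風格), National Dong Hwa University Theses and Dissertations Full-Text System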

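Author:	邱英傑
Author (English):	Ying-Chieh Chiu
Thesis title:	基於機器學習分析高端疫苗新聞內容之媒體報導風格
Thesis title (English):	Analyzing the Media Reporting Style of Medigen COVID-19 Vaccine News Content Based on Machine Learning
Advisor:	李官陵
Advisor (English):	Guan-Ling Lee
Committee members:	羅壽之、張耀中
Committee members (English):	Shou-Chih Lo, Yao-Chung Chang
Degree:	Master's
Institution:	National Dong Hwa University (國立東華大學)
Department:	Department of Computer Science and Information Engineering (資訊工程學系)
Student ID:	610921214
Publication year (ROC calendar):	112 (2023)
Academic year of graduation:	111
Language:	Chinese
Pages:	49
Keywords (Chinese):	自然語言處理、Word2Vec、TF-IDF、SMOTE、機器學習、報導風格預測
Keywords (English):	Natural Language Processing, Word2Vec, TF-IDF, SMOTE, Machine Learning, Predicting Reporting Style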

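Since December 2019, COVID-19 has spread around the world. Confirmed cases in Taiwan subsequently rose and community spread followed, which pushed up the death toll, so vaccination was needed to reduce severe illness and strengthen protection. Because COVID-19 is caused by a novel virus, no existing vaccine could prevent it. Vaccine developers worldwide rushed to develop vaccines, which received emergency authorization after testing. Supply was too limited to vaccinate everyone in Taiwan, so Taiwanese researchers quickly developed a domestic vaccine, the Medigen vaccine (高端疫苗), so that more people could be vaccinated sooner. With the outbreak, news coverage of vaccines also drew greater attention.
News reports can resemble one another in content and can also be biased. This study collected reports on the Medigen vaccine published between July 1, 2021 and July 31, 2022 by four outlets: Apple Daily (蘋果日報), ETtoday News Cloud (ETtoday新聞雲), Formosa TV News (民視新聞網), and United Daily News (聯合新聞網). Prediction was carried out on each pair of outlets. Among Apple Daily, ETtoday News Cloud, and Formosa TV News, prediction showed no clear effect. When United Daily News reports were predicted against those of the other three outlets, the data were imbalanced, so the SMOTE method was applied; the synthetic samples were then removed from the test data so that only real samples were used for testing. Three machine learning algorithms, K-Nearest Neighbors (KNN), Random Forest, and Support Vector Machine (SVM), were used to predict media reporting style. The experiments show that precision and recall improved and that the outlets' styles became easier to distinguish.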

Since December 2019, the COVID-19 pandemic has spread worldwide, and confirmed cases and deaths in Taiwan subsequently rose. Vaccination became crucial for reducing severe illness and providing protection. However, COVID-19 is caused by a novel virus, and initially no vaccine was available to prevent it. Vaccine researchers worldwide therefore developed vaccines urgently, and these received emergency authorization after testing. Because supply was limited, Taiwan could not provide vaccines to its entire population. In response, Taiwanese researchers quickly developed a domestic vaccine, the Medigen vaccine, so that more people could be vaccinated promptly. The outbreak also drew greater attention to vaccine reporting in the news media.
News reports can be similar in content and can also be biased. In this study, reports on the Medigen vaccine published between July 1, 2021, and July 31, 2022, were collected from four media outlets: Apple Daily, ETtoday News Cloud, Formosa TV News, and United Daily News. Predictions were made for each pair of outlets. Among Apple Daily, ETtoday News Cloud, and Formosa TV News, the predictions showed no clear effect. When reports from United Daily News were predicted against those of the other three outlets, the data were imbalanced, so the SMOTE method was used; the synthetic samples were removed from the test data so that only real data were used for testing. Three machine learning algorithms, K-Nearest Neighbors (KNN), Random Forest, and Support Vector Machine (SVM), were used to predict media reporting style. The experimental results showed improved precision and recall, making the media styles easier to distinguish.
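
The evaluation idea in the abstract (oversample the minority outlet with SMOTE, but test on real samples only) can be sketched as follows. This is a minimal illustration, not the thesis code: the 2-D toy vectors stand in for TF-IDF or Word2Vec features, the helper names `smote`, `knn_predict`, and `precision_recall` are hypothetical, and oversampling only the training split is one assumed way to keep synthetic samples out of the test data.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda t: math.dist(t[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class labelled `positive`."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy feature vectors (assumed data, not from the thesis).
# Label 0 = majority outlets, label 1 = minority outlet.
majority_train = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.0]]
minority_train = [[2.0, 2.0], [2.2, 1.9], [1.8, 2.1]]

# Oversample the minority class in the training split only.
synthetic = smote(minority_train, n_new=len(majority_train) - len(minority_train), k=2)
train_x = majority_train + minority_train + synthetic
train_y = [0] * len(majority_train) + [1] * (len(minority_train) + len(synthetic))

# The test split contains real samples only; no synthetic point is evaluated.
test_x = [[0.1, 0.1], [0.3, 0.3], [2.1, 2.0], [1.9, 1.9]]
test_y = [0, 0, 1, 1]
pred = [knn_predict(train_x, train_y, x) for x in test_x]
precision, recall = precision_recall(test_y, pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Keeping synthetic points out of the test split means precision and recall are measured only on articles that actually exist, so the scores are not inflated by interpolated samples that resemble the training data.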

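Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
List of Equations
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Web Crawlers
2.2 Natural Language Processing
2.2.1 Jieba Word Segmentation
2.2.2 CKIP Word Segmentation
2.3 Word2Vec Word Embeddings
2.4 TF-IDF Vectors
2.5 Machine Learning
2.5.1 K-Nearest Neighbors (KNN)
2.5.2 Random Forest
2.5.3 Support Vector Machine (SVM)
2.6 SMOTE (Synthetic Minority Oversampling Technique)
Chapter 3 Research Method
3.1 Research Framework
3.2 Data Collection
3.3 Text Preprocessing
3.4 Jieba Word Segmentation
3.5 CKIP Word Segmentation
3.6 Building Word2Vec Word Embeddings
3.7 Building TF-IDF Vectors
3.8 Model Architecture
3.8.1 K-Nearest Neighbors (KNN)
3.8.2 Random Forest
3.8.3 Support Vector Machine (SVM)
3.9 Applying SMOTE
Chapter 4 Experimental Results
4.1 Experimental Dataset
4.2 Evaluation Methods
4.2.1 Confusion Matrix
4.2.2 Precision
4.2.3 Recall
4.2.4 Precision-Recall Curve (PR Curve)
4.3 Feature Vector and Machine Learning Algorithm Parameter Settings
4.4 Comparison of Jieba and CKIP
4.5 Comparison of PR Curves across Feature Spaces and Machine Learning Models
4.6 Predicting Media Reporting Style across News Outlets
4.7 Predicting Media Reporting Style with SMOTE after Removing Synthetic Test Data
Chapter 5 Conclusion and Future Work
References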

