Author (English): Ying-Chieh Chiu
Thesis title (English): Analyzing the Media Reporting Style of Medigen COVID-19 Vaccine News Content Based on Machine Learning
Advisor (English): Guan-Ling Lee
Oral defense committee (English): Shou-Chih Lo
Yao-Chung Chang
Keywords (English): Natural Language Processing; Word2Vec; TF-IDF; SMOTE; Machine Learning; Predicting Reporting Style
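News reports in the media show both similarity and bias in their content. This study collected reports on the Medigen vaccine published from July 1, 2021 to July 31, 2022 by four outlets: Apple Daily, ETtoday News Cloud, Formosa TV News network, and United Daily News. Predictions were made for every pair of outlets. For the reports of Apple Daily, ETtoday News Cloud, and Formosa TV News network, prediction showed no clear effect. When United Daily News reports were paired with those of the other three outlets, the data were imbalanced, so the SMOTE method was used; the augmented samples were removed from the test data, and only real data was kept for testing. Machine learning models built with three algorithms, K-Nearest Neighbors (KNN), Random Forest, and Support Vector Machine (SVM), were used to predict media reporting style. The experiments showed that precision and recall improved and that media styles became easier to tell apart.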
Since December 2019, the COVID-19 pandemic has spread worldwide, and confirmed cases and deaths have risen in Taiwan as well. Vaccination became crucial for reducing severe illness and providing protection. Because COVID-19 is caused by a novel virus, no vaccine was available at first; researchers worldwide developed vaccines urgently, and these received emergency authorization after testing. Limited supply made it difficult for Taiwan to vaccinate its entire population. In response, Taiwanese vaccine researchers quickly developed the Medigen vaccine so that more people could be vaccinated promptly. The pandemic also made vaccine reporting in the news media more important.
News reports from different media outlets can be both similar and biased in their content. This study collected reports on the Medigen vaccine published between July 1, 2021, and July 31, 2022, by four outlets: Apple Daily, ETtoday News Cloud, Formosa TV News network, and United Daily News. Predictions were made for each pair of outlets. For pairs drawn from Apple Daily, ETtoday News Cloud, and Formosa TV News network, prediction showed no clear effect. When United Daily News was paired with each of the other three outlets, the data were imbalanced, so the SMOTE method was applied; the synthetic samples it generated were removed from the test data so that testing used only real reports. Three machine learning algorithms, K-Nearest Neighbors (KNN), Random Forest, and Support Vector Machine (SVM), were used to predict media reporting style. The experimental results showed improved precision and recall, making the outlets' reporting styles easier to distinguish.
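
The handling of imbalanced data described above can be illustrated with a minimal, standard-library-only sketch. It is not the thesis's implementation: the 2-D toy vectors stand in for document features (assumed here, in place of TF-IDF or Word2Vec vectors), the helper names `smote`, `knn_predict`, and `precision_recall` are hypothetical, and the real test set is held out before oversampling, which is one way to ensure that no synthetic sample is used for testing.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """SMOTE-style oversampling: each synthetic point is interpolated between
    a minority sample and one of its k nearest minority-class neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (binary labels, odd k)."""
    nearest = sorted(zip(train_X, train_y), key=lambda t: math.dist(t[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (assumption): class 0 is the larger outlet, class 1 the smaller one.
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]

# Hold out real samples for testing BEFORE any oversampling.
test_X = majority[:10] + minority[:3]
test_y = [0] * 10 + [1] * 3
train_major, train_minor = majority[10:], minority[3:]

# Oversample only the training minority until both classes are the same size.
synthetic = smote(train_minor, len(train_major) - len(train_minor), rng=rng)
train_X = train_major + train_minor + synthetic
train_y = [0] * len(train_major) + [1] * (len(train_minor) + len(synthetic))

pred = [knn_predict(train_X, train_y, x) for x in test_X]
precision, recall = precision_recall(test_y, pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Random Forest and SVM classifiers could be swapped in for `knn_predict` without changing the split or the oversampling step; only the training data is balanced, so precision and recall are always measured on real reports.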
