
Detailed Record

Author: Kuan-Wei Chen (陳冠瑋)
Title: Popularity Prediction for Forum Posts using Double BERT Decomposed Bilinear Network – a Case Study with the PTT BBS (以雙BERT分解式雙線性網路預測論壇文章熱門度 — 以「批踢踢」BBS為例)
Advisor: Cheng-Chin Chiang (江政欽)
Oral defense committee: Wen-Chieh Fang (方文杰), Der-Lor Way (魏德樂)
Degree: Master's
Institution: National Dong Hwa University
Department: Department of Computer Science and Information Engineering
Student ID: 610921249
Year of publication (ROC calendar): 112 (2023)
Academic year of graduation: 111
Language: Chinese
Number of pages: 37
Keywords: natural language processing, text classification, web crawlers, popularity prediction
Record statistics:
  • Recommendations: 0
  • Views: 7
  • Downloads: 2
  • Bookmarks: 0
With the rapid growth of the Internet and of mobile technology, online forums have become increasingly popular and influential. Predicting which discussion topics on an online forum will attract large amounts of attention is of considerable value for business, politics, and social research. This study uses deep learning to develop a neural network model that predicts how actively a post will be discussed on Taiwan's well-known bulletin board system, PTT (批踢踢實業坊), from the post's title and content.
First, we collected and cleaned a large number of posts from several PTT boards, extracting important features such as each post's title, content, number of reply comments, and push/boo counts. We then applied natural language processing techniques to segment and vectorize the post titles so that they could be analyzed by a neural network. After this preprocessing, the textual data were modeled and analyzed with the Double BERT Decomposed Bilinear Net (DB2 Net) developed in this study, and the model's predictive ability was evaluated through cross-validation and a held-out test set.
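The record does not detail the crawling step mentioned above, so the following is only a minimal sketch of how post metadata could be collected from one index page of a PTT board through the public web interface at https://www.ptt.cc. The requests and BeautifulSoup libraries, the over18 age-check cookie, and the CSS class names (r-ent, nrec, title, meta) are assumptions about that interface, and the board and field names are illustrative; this is not the thesis's actual crawler.

# Minimal, illustrative PTT board-index crawler (not the thesis's pipeline).
import requests
from bs4 import BeautifulSoup

def fetch_board_index(board="Gossiping", page="index.html"):
    url = f"https://www.ptt.cc/bbs/{board}/{page}"
    # Some boards require the age-confirmation cookie before serving content.
    resp = requests.get(url, cookies={"over18": "1"}, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    posts = []
    for entry in soup.select("div.r-ent"):          # one block per listed post
        link = entry.select_one("div.title a")
        if link is None:                            # deleted posts have no link
            continue
        posts.append({
            "title": link.get_text(strip=True),
            "url": "https://www.ptt.cc" + link["href"],
            # Raw push marker: a number, "爆" (very hot), or "X1".."XX" (net boos);
            # cleaning would later map this to a numeric score or popularity class.
            "push_mark": entry.select_one("div.nrec").get_text(strip=True),
            "author": entry.select_one("div.meta div.author").get_text(strip=True),
            "date": entry.select_one("div.meta div.date").get_text(strip=True),
        })
    return posts

if __name__ == "__main__":
    for post in fetch_board_index()[:5]:
        print(post["push_mark"], post["title"])

Fetching each post's own page in the same way would yield the body text and the individual push/boo comments that the abstract lists as features.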
The experimental results show that the proposed DB2 Net predicts the discussion popularity of PTT posts with high accuracy and reliability, demonstrating its effectiveness. We also found that keywords in a post's title, the strength of its emotional tone, trending vocabulary, and the publication time all have a significant influence on the predicted popularity. Based on these findings, this study provides an effective method for predicting the popularity of forum posts and offers a useful reference for future work on other online platforms.
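The internals of DB2 Net are not described in this record. As a rough illustration of the idea its name suggests (two BERT encoders, one for the title and one for the body, fused by a decomposed, i.e. low-rank factorized, bilinear layer before classification), a PyTorch sketch might look like the following. The class names, hidden sizes, rank, number of popularity classes, and the choice of the bert-base-chinese checkpoint are all assumptions, not the thesis's actual architecture.

# Illustrative double-BERT model with a low-rank (decomposed) bilinear fusion layer.
import torch.nn as nn
from transformers import BertModel

class DecomposedBilinear(nn.Module):
    """Approximates a full bilinear interaction x^T W_k y with two rank-r projections."""
    def __init__(self, dim_x, dim_y, rank, out_dim):
        super().__init__()
        self.proj_x = nn.Linear(dim_x, rank, bias=False)
        self.proj_y = nn.Linear(dim_y, rank, bias=False)
        self.mix = nn.Linear(rank, out_dim)

    def forward(self, x, y):
        # Element-wise product of the two projections captures pairwise interactions
        # with far fewer parameters than a full bilinear tensor.
        return self.mix(self.proj_x(x) * self.proj_y(y))

class DoubleBertBilinearNet(nn.Module):
    """Title encoder + body encoder, fused by the decomposed bilinear layer."""
    def __init__(self, num_classes=2, rank=64, pretrained="bert-base-chinese"):
        super().__init__()
        self.title_bert = BertModel.from_pretrained(pretrained)
        self.body_bert = BertModel.from_pretrained(pretrained)
        hidden = self.title_bert.config.hidden_size
        self.fusion = DecomposedBilinear(hidden, hidden, rank, 256)
        self.head = nn.Sequential(nn.ReLU(), nn.Dropout(0.1), nn.Linear(256, num_classes))

    def forward(self, title_inputs, body_inputs):
        # Pooled [CLS]-based sentence vectors from each encoder.
        t = self.title_bert(**title_inputs).pooler_output
        b = self.body_bert(**body_inputs).pooler_output
        return self.head(self.fusion(t, b))   # logits over popularity classes

In this sketch the title and body would be tokenized separately (for example with BertTokenizerFast for the same checkpoint) and passed in as the two input dictionaries; the low-rank factorization keeps the pairwise title–body interactions of a bilinear form while using far fewer parameters.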
Chapter 1  Introduction   1
  Section 1  Research Background   1
  Section 2  Research Motivation and Objectives   3
Chapter 2  Related Work   6
  Section 1  Text Feature Extraction   6
  Section 2  Popularity Prediction Analysis and Modeling   8
Chapter 3  Research Methods   10
  Section 1  Decomposed Bilinear Layer   10
  Section 2  Architecture Design of the Forum Post Popularity Model   14
Chapter 4  Experimental Results and Discussion   18
  Section 1  Experimental Environment and Data Processing   18
  Section 2  Comparison of DB2 Net with Other Models   23
Chapter 5  Conclusions and Future Work   35
 
 
 
 