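Author: 洪孟煬 (Meng-Yang Hong)
Thesis title: 辨識惡意彈出式之廣告:使用隨機森林演算法
Thesis title (English): Identify malicious pop up ads: using random forest algorithm
Advisor: 許芳銘 (Fang-Ming Hsu)
Committee members: 陳偉銘 (Wei-Ming Chen), 鄭仁亮 (Ren-Liang Cheng)
Degree: Master's
Institution: National Dong Hwa University (國立東華大學)
Department: Department of Information Management (資訊管理學系)
Student ID: 611035108
Year of publication (ROC calendar): 111 (2022)
Graduation academic year: 110
Language: Chinese
Number of pages: 48
Keywords: python, machine learning, web crawler, pop-up ads, random forest algorithm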
As the Internet has grown, so has the number of people browsing the web, and advertisers increasingly use web pages to deliver information to consumers. Pop-up advertisements are one such channel. For many consumers, and for people who simply want to browse, pop-up ads are a nuisance: they cover the content the user is trying to read and seriously degrade the browsing experience.
This study uses machine learning to detect pop-up advertisements automatically. A Python crawler first collects advertising page data: the 5,000 most-visited pages worldwide are taken from the Alexa top 100K list, the HTML structure of each site is crawled, and the usable information is extracted. Changes to the DOM are monitored from JavaScript through the Mutation Observer API. The collected data are then labeled and classified, distinct features are derived, and the random forest algorithm is applied to determine whether a page contains malicious pop-up ads and which kinds of features identify an element as a pop-up ad.
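The feature-extraction step described in the abstract could look roughly like the following minimal sketch. It uses only Python's standard-library `html.parser` rather than the thesis's actual crawler. The four features (inline `position: fixed`, a large inline `z-index`, `iframe` count, `script` count), the threshold of 1000, and the names `PopupFeatureParser`, `z_index`, and `extract_features` are illustrative assumptions, not the features defined in the thesis.

```python
from html.parser import HTMLParser

# Hypothetical threshold: inline z-index values at or above this count as "high".
HIGH_Z_INDEX = 1000


def z_index(style):
    """Return the inline z-index as an int, or 0 when absent or non-numeric.

    Expects a style string that is already lowercased with spaces removed.
    """
    for declaration in style.split(";"):
        name, _, value = declaration.partition(":")
        if name == "z-index":
            try:
                return int(value)
            except ValueError:
                return 0
    return 0


class PopupFeatureParser(HTMLParser):
    """Counts a few assumed popup-related signals while parsing raw HTML."""

    def __init__(self):
        super().__init__()
        self.features = {
            "iframes": 0,
            "fixed_position": 0,
            "high_z_index": 0,
            "scripts": 0,
        }

    def handle_starttag(self, tag, attrs):
        # Attribute values may be None (e.g. <div hidden>), so fall back to "".
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        if tag == "iframe":
            self.features["iframes"] += 1
        if tag == "script":
            self.features["scripts"] += 1
        if "position:fixed" in style:
            self.features["fixed_position"] += 1
        if z_index(style) >= HIGH_Z_INDEX:
            self.features["high_z_index"] += 1


def extract_features(html):
    """Parse one HTML document and return its feature dictionary."""
    parser = PopupFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.features


# Made-up page fragment used only to exercise the sketch.
sample = """
<html><body>
  <div style="position: fixed; z-index: 9999">Subscribe now!</div>
  <iframe src="https://ads.example/banner"></iframe>
  <script>console.log("hi")</script>
</body></html>
"""
features = extract_features(sample)
print(features)
```

Each crawled page would yield one such feature dictionary. After labeling, the rows could be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute offers one way to see which features drive the prediction.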
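Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Chapter Overview
Chapter 2 Literature Review
2.1 Pop-up Ads
2.2 Existing Methods and Studies for Identifying Pop-up Ads
2.3 Machine Learning Applied to Prediction Problems
2.3.1 Machine Learning
2.3.2 Decision Trees
2.3.3 Random Forest Algorithm
Chapter 3 Research Method
3.1 Research Framework
3.2 Sample Selection
3.3 Research Tools
3.3.1 Python Tools, Packages, and Modules to Be Used
3.3.2 DOM and the Mutation Observer API
3.4 Data Crawling
3.5 Data Preprocessing
3.5.1 Attribute Feature Definitions
3.5.2 Attribute Extraction
3.5.3 Attribute Labeling
3.6 Machine Learning Algorithm
3.7 Model Building
Chapter 4 Experimental Results
4.1 Data Statistics
4.2 Model Evaluation Metrics
4.3 Model Prediction Results
4.4 Important Parameter Analysis
4.5 Algorithm Comparison
4.6 Comparison with Similar Studies
Chapter 5 Conclusion and Future Work
References
Appendix 1 Creating Pop-up Ads
Appendix 2 Extracting Web Page HTML Code
Appendix 3 Mutation Observer API
Appendix 4 Random Forest Algorithm
Appendix 5 Important Parameter Analysis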
(Full text will be open to external access after 2027-09-15)