

Author (English): Wan-Qing Hsu
Thesis title (English): A Web Visitor Behavior Analysis of A University Website Based on Big Data Analytics Techniques
Advisor (English): Chung Yung
Committee members (English): Yu-Lan Yuan, Guan-Ling Lee
Keywords (English): Big data; Web mining; Discriminant analysis

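In this thesis, we use the analysis results to divide the countries of the website's visitors into three categories: 1) number of applications = 0, 2) number of applications = 1, 3) number of applications > 1. Our method has three phases. First, we use Incidental and Frequent User (IFU) analysis to divide all visitors into two groups: frequent visitors and incidental visitors. Second, we define seven influential factors, combine them with the IFU results, and compute their impact in the Visitor Browsing Behaviors (VBB) analysis. Finally, we perform discriminant analysis on the results of the VBB analysis and identify the factors with the greatest influence.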


The goal of this thesis is to use big data analytics to examine the relationship between the number of applications from each country and the web log of browsing behavior on university department websites. To explore this relationship, we divide the countries into three groups and apply discriminant analysis. We build a model from the data of the first four semesters and verify it against the results of the 2021 spring semester; the experimental accuracy rate is 79.6%. Based on the results, we propose feasible strategies and draw conclusions.

In this thesis, we use the analysis results to classify the countries from which the website's visitors come into three categories by number of applications received: 1) = 0, we receive no application from the country; 2) = 1, we receive exactly one application from the country; and 3) > 1, we receive more than one application from the country. Our methodology includes three phases. First, we use Incidental and Frequent User (IFU) analysis to classify all visitors into two groups, namely frequent visitors and incidental visitors. Second, we define seven influential factors for IFU and examine their impact in the Visitor Browsing Behaviors (VBB) analysis. Finally, we perform discriminant analysis on the influential factors that give the best indication (indices) in the VBB analysis.
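The two classification steps described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not state how a frequent visitor is defined, so the `min_frequent_visits` cut-off, the function names, and all sample data are assumptions made for the example.

```python
def application_category(num_applications: int) -> int:
    """Map a country's application count to category 1, 2 or 3.

    Category 1: no application; 2: exactly one; 3: more than one.
    """
    if num_applications < 0:
        raise ValueError("application count cannot be negative")
    if num_applications == 0:
        return 1
    if num_applications == 1:
        return 2
    return 3


def split_visitors(visit_counts, min_frequent_visits=2):
    """Split visitors into (frequent, incidental) sets.

    min_frequent_visits is a hypothetical cut-off chosen for this
    sketch; the thesis's own IFU definition is not given in the abstract.
    """
    frequent = {v for v, n in visit_counts.items() if n >= min_frequent_visits}
    incidental = set(visit_counts) - frequent
    return frequent, incidental


# Hypothetical sample data, for illustration only.
applications = {"Country A": 3, "Country B": 1, "Country C": 0}
categories = {c: application_category(n) for c, n in applications.items()}

visits = {"visitor_a": 5, "visitor_b": 1, "visitor_c": 2}
frequent, incidental = split_visitors(visits)
```

Keeping the cut-off as a parameter makes it easy to substitute whatever IFU definition the full thesis uses.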

We analyze the behavior patterns for spring semesters and fall semesters as well as the overall behavior pattern. We use data from the 2019 spring semester through the 2020 fall semester for the experiments, a total of 11,182,613 records and 2,246,882 visitors, and use these data to predict the number of applications for the 2021 spring semester. Finally, we use the analysis of overall visitor behavior across the four semesters to predict the classification of countries by number of applications. The resulting accuracy rate is 79.6%; it drops to 77.7% when only the 2019 and 2020 spring data are used.
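The evaluation described above, training on four semesters and checking predictions against a later one, can be sketched as below. This is only an illustration under stated assumptions: the semester labels, row layout, function names, and sample values are hypothetical, and the discriminant model itself (built in SPSS according to the table of contents) is not reproduced here.

```python
# Hypothetical semester labels; the thesis's actual data format is not given.
TRAIN_SEMESTERS = {"2019-spring", "2019-fall", "2020-spring", "2020-fall"}
TEST_SEMESTER = "2021-spring"


def split_by_semester(rows):
    """Separate rows of (semester, country, category) into train and test."""
    train = [r for r in rows if r[0] in TRAIN_SEMESTERS]
    test = [r for r in rows if r[0] == TEST_SEMESTER]
    return train, test


def accuracy_rate(predicted, actual):
    """Share of countries whose predicted category matches the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up rows and predictions, for illustration only.
rows = [
    ("2019-spring", "Country A", 3),
    ("2020-fall", "Country B", 1),
    ("2021-spring", "Country A", 3),
    ("2021-spring", "Country B", 2),
]
train, test = split_by_semester(rows)

predicted = [1, 1, 2, 3, 1]
actual = [1, 2, 2, 3, 1]
rate = accuracy_rate(predicted, actual)
```

Splitting by semester rather than at random keeps the test data strictly later than the training data, which matches the prediction setting the abstract describes.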

We hope that, with the results of this prediction, we can promptly learn the application status of the countries from which visitors come. If a country's visit status during a semester is not as expected, we can use online advertisements and videos for promotion in that country during the remaining two months of the application period. In the future, we hope to add data from other universities or departments and to compare different universities or the same department, in order to observe differences in browsing patterns and publicity strategies.
1 Introduction
2 Background
2.1 Big Data
2.2 Big Data Analytics Architecture
2.3 Web Mining
2.4 Discriminant Analysis
3 Visitor Classification
3.1 Overall Analysis Framework
3.2 Incidentals and Frequents User Analysis
3.2.1 Definition
3.2.2 Analysis Algorithm
3.3 Seven Influential Factors
3.4 Visitor Browsing Behavior Analysis: Based on Seven Influential Factors
4 Visitor Behavior Analysis
4.1 Combinations of Independent Variables
4.2 Partial Least Squares Regression Analysis
4.3 Discussion on PLS Regression
4.4 Preparation for the Discriminant Analysis
4.5 Discriminant Analysis in SPSS
5 Strategy Based on Analysis Results
5.1 Strategy of Pattern B
5.1.1 Case 1: Indonesia
5.1.2 Case 2: Kyrgyzstan
5.1.3 Case 3: Malawi
5.1.4 Case 4: Mongolia
5.1.5 Case 5: Pakistan
5.1.6 Case 6: South Africa
5.2 Strategy of Pattern C
5.2.1 Case 1: Gambia
5.2.2 Case 2: Haiti
5.2.3 Case 3: Nigeria
6 Discussion and Conclusion
6.1 Program Running
6.2 Analysis Result
6.3 Conclusion