Author (English): Che-Ming Chen
Thesis Title (English): The Valuation of Selling Price of Real Estate in Taipei Metro Area with Machine Learning
Advisor (English): Jin-Long Lin
Oral Defense Committee (English): Chieh-Tse Hou
Chung-Shu Wu
Chia-Hui Huang
Chien-Fu Lin
Keywords (English): Model-based Recursive Partitioning; Machine Learning; Actual Selling Price of Real Estate
This thesis applies model-based recursive partitioning (MOB) to analyze the key factors affecting housing prices and to construct a housing price prediction model. MOB is a hybrid of regression and decision-tree models: factors with a known linear or nonlinear effect enter the regression model as regressors, while factors whose effects are highly complex are placed in the decision tree as partitioning variables. We apply MOB to the “actual selling price of real estate” database, whose registrations are highly accurate and whose transaction details are complete. Beyond house prices and basic housing information, the database is augmented with the variables “distance to the nearest MRT station”, “distance to the nearest park”, “area of the park”, “transaction floor”, “purpose of use”, “latitude and longitude”, and “transaction year”. These variables strongly affect housing prices and are largely complete. Some of them, such as the floor area of the house, are suited to a parametric regression model; others, such as latitude and longitude, are better included in the decision tree as partitioning variables. The hybrid model retains estimation efficiency, is highly flexible, and handles both continuous and discrete variables. Empirical analysis shows that MOB predicts housing prices well.
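The division of labour between regressors and partitioning variables can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it makes a single split by exhaustively searching for the threshold that minimizes the summed squared error of two leaf regressions, whereas MOB proper chooses splits through parameter-instability tests and recurses. The data are synthetic, and the names `fit_ols`, `best_split`, `area`, `lon`, and `log_price` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, SSE)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(resid @ resid)

def best_split(X, y, z, min_leaf=20):
    """Search thresholds on the partitioning variable z and return the
    (threshold, total SSE) that minimizes the SSE of two leaf regressions."""
    best_t, best_sse = None, np.inf
    for t in np.unique(z)[:-1]:
        left = z <= t
        if left.sum() < min_leaf or (~left).sum() < min_leaf:
            continue  # skip splits that leave a leaf too small to fit
        sse = fit_ols(X[left], y[left])[1] + fit_ols(X[~left], y[~left])[1]
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t, best_sse

# Synthetic illustration (assumed data, not from the thesis):
# the effect of floor area on log price differs east and west of 121.55.
rng = np.random.default_rng(0)
n = 400
area = rng.uniform(20, 200, n)          # regressor
lon = rng.uniform(121.45, 121.65, n)    # partitioning variable
slope = np.where(lon <= 121.55, 0.004, 0.008)
log_price = 2.0 + slope * area + rng.normal(0, 0.05, n)

X = area.reshape(-1, 1)
threshold, split_sse = best_split(X, log_price, lon)
_, pooled_sse = fit_ols(X, log_price)
print(f"split at lon <= {threshold:.4f}")
print(f"SSE pooled = {pooled_sse:.2f}, SSE with split = {split_sse:.2f}")
```

In this toy setting the regression inside each leaf stays a simple linear model in area, while longitude only decides which leaf an observation falls into, which mirrors the roles the abstract assigns to the two kinds of variables.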
In addition to the “recursive partition regression tree model”, we add a large number of supervised and unsupervised machine learning models for comparison. Some of them output only categorical estimates, while others produce continuous estimates. The main output variable is the natural logarithm of the house price per unit area. The models under investigation are multiple regression analysis, regression tree analysis, recursive partition regression tree model, adabag, adaBoost, BlackBoostModel, C50Model, EarthModel, FDAModel, GBMModel, GLMBoostModel, GLMModel, GLMNetModel, KNNModel, LARSModel, LDAModel, LMModel, MDAModel, NNetModel, PLSModel, RangerModel, and RpartModel. The last nineteen models are all taken from the R package MachineShop, while the first three use other R packages. The model evaluation criteria are RMSE, MAE, and MAPE. Empirical analysis finds that the random forest model performs best, closely followed by the recursive partition regression tree model. Our analysis sheds light on selecting the right machine learning model and predictors for valuing real estate. It could give traders a sensible valuation model for real estate prices and could improve the efficiency of the real estate market.
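The three evaluation criteria named above can be written out in a short Python sketch using their standard definitions. The abstract does not say whether the errors are computed on the log scale or on the original price scale, so the log-scale values below are an assumption, and the numbers are made up purely for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; y_true must be nonzero."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Hypothetical log unit prices and predictions (illustrative values only).
y_true = np.array([3.0, 4.0, 5.0])
y_pred = np.array([2.5, 4.0, 6.0])

print("RMSE:", rmse(y_true, y_pred))
print("MAE: ", mae(y_true, y_pred))
print("MAPE:", mape(y_true, y_pred))
```

RMSE penalizes large errors more heavily than MAE, and MAPE expresses the error relative to the observed value, so the three criteria can rank models differently.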
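Chapter 1 Introduction
Section 1 Research Motivation and Objectives
Section 2 Research Data
Chapter 2 Literature Review
Chapter 3 Research Methods
Chapter 4 Empirical Analysis
Chapter 5 Conclusions and Future Development
Section 1 Conclusions
Section 2 Future Development
References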
