HOUSING PRICE PREDICTION USING XGBOOST AND RANDOM FOREST METHODS

Eighth International Scientific Conference Contemporary Issues in Economics, Business and Management (EBM 2024), pp. 417–424

AUTHOR(S): Ljiljana Matić, Zoran Kalinić


DOI: 10.46793/EBM24.417M

ABSTRACT:

Real estate prices often vary significantly for reasons such as shifts in land value and local infrastructure development. An application that predicts housing prices from property characteristics, neighborhood characteristics, and infrastructure is essential for giving customers accurate insight into where and when to invest in property. This research focuses on apartment price prediction in the real estate market of Belgrade, employing two advanced machine learning algorithms, XGBoost and Random Forest, to estimate property values. These algorithms are highly effective at processing large datasets and uncovering complex relationships among key predictors.

Data preprocessing plays a crucial role in ensuring model accuracy. It includes data cleaning, handling missing values, and addressing outliers; categorical features are transformed through one-hot encoding to enhance model performance. Both XGBoost and Random Forest models are then applied to predict apartment prices, providing insight into the most influential factors in the market. A comparative analysis of the two methods highlights their respective strengths and effectiveness in the context of the Serbian real estate market and demonstrates their utility in improving prediction accuracy.
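
The preprocessing steps named above (missing-value handling, outlier treatment, one-hot encoding) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' pipeline, which the abstract does not describe in detail. The listings, the column names (`area_m2`, `rooms`, `heating`, `price_eur`), the median imputation, and the 1.5 x IQR outlier rule are all assumptions chosen for illustration.

```python
from statistics import median, quantiles

# Hypothetical listings; column names and values are illustrative only.
listings = [
    {"area_m2": 55.0, "rooms": 2, "heating": "central", "price_eur": 120000.0},
    {"area_m2": 72.0, "rooms": 3, "heating": "gas", "price_eur": 165000.0},
    {"area_m2": None, "rooms": 2, "heating": "central", "price_eur": 110000.0},
    {"area_m2": 48.0, "rooms": 1, "heating": "electric", "price_eur": 95000.0},
    {"area_m2": 60.0, "rooms": 2, "heating": "gas", "price_eur": 2500000.0},
    {"area_m2": 65.0, "rooms": 3, "heating": "central", "price_eur": 140000.0},
]


def impute_median(rows, column):
    """Replace missing (None) values in a numeric column with the column median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def drop_iqr_outliers(rows, column, k=1.5):
    """Keep only rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles([r[column] for r in rows], n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if low <= r[column] <= high]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {key: value for key, value in r.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if r[column] == category else 0
        encoded.append(new_row)
    return encoded


# Assumed order: impute first, then filter outliers, then encode categoricals.
cleaned = one_hot(
    drop_iqr_outliers(impute_median(listings, "area_m2"), "price_eur"),
    "heating",
)

for row in cleaned:
    print(row)
```

The resulting rows contain only numeric columns, so under these assumptions they could be passed to a Random Forest or XGBoost regressor. Dropping outliers rather than clipping them is one of several reasonable choices; in practice the same steps are usually written with a data-frame library.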

The results of this study contribute to a more transparent and efficient real estate market in Belgrade, offering valuable tools for all stakeholders, including sellers, prospective buyers, policymakers, and local administration. By integrating these advanced predictive models, this research also lays the groundwork for future advances in real estate analytics, presenting a robust framework for market forecasting and strategic decision-making. Although applied here to apartment prices, the models can, with minor modifications, be used to predict prices of other types of real estate (houses, commercial and industrial properties, land plots, etc.). Additionally, the findings suggest areas for further research to refine the prediction models and adapt them to evolving market conditions.

KEYWORDS:

Machine Learning, XGBoost, Random Forest, Housing Price Prediction, Real Estate
