Player score prediction based on multiple linear regression model

Authors

  • Siyuan Wang
  • Jinming Chen
  • Haolong Chen

DOI:

https://doi.org/10.56028/aetr.4.1.246.2023

Keywords:

Multiple linear regression model; Box-Cox method; hypothesis testing; FIFA.

Abstract

The World Cup in Qatar has recently come to an end. A player's performance is key to winning a tournament, so a prediction of a player's score during the season, based on various metrics and performance data, largely determines whether or not he or she will play. Our main work is to build a linear regression model based on the linear relationships we found in the data set. First, we used Pearson's correlation coefficient to filter out variables with correlations greater than 0.9, which eliminated the collinearity problem, and identified the 11 variables we used. Then a random sample was extracted, the data set was split, and null values were removed. We then screened the variables again by forward selection to build the first regression model, with an R² of 0.67. Since some of the data showed nonlinearity, we compared a transformation of the global data (the Box-Cox method) with a transformation of the local data (transformation of …) and found that adding the latter worked better. After removing the non-significant variables, we ran the regression again and obtained our final model, with an R² above 0.97. We then carried out model diagnostics and confirmed, through five hypothesis tests and collinearity tests, that our model satisfies the assumptions of linear regression. Finally, we evaluated the model on the test set and found that the results were better on the test set.
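
The correlation-based screening step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the greedy keep-or-drop rule, the function names `pearson` and `drop_collinear`, and the toy data are all assumptions made for illustration; only the 0.9 threshold comes from the abstract.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_collinear(columns, threshold=0.9):
    """Greedily keep a column only if |r| <= threshold against every kept column.

    `columns` maps a variable name to its list of values (hypothetical layout).
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up toy data (not from the paper): "dribbling" is nearly a copy of "pace".
toy = {
    "pace": [60, 70, 80, 90, 75],
    "dribbling": [61, 69, 82, 91, 74],
    "strength": [80, 55, 70, 60, 90],
}
print(drop_collinear(toy))
```

Which variable of a highly correlated pair is kept depends here on column order; the abstract does not say how the authors made that choice. A constant column would cause a division by zero in `pearson` and would need to be removed beforehand.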

Published

2023-03-21