Bias-Variance Trade-off
For example, linear regression is an inflexible model, since it can only detect linear relationships between the dependent and independent variables. As a result, its estimators will change very little if one of the observations changes. However, when it comes to a model that is able to detect complicated nonlinearities, its estimators will be strongly affected by changes in the observations.
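This sensitivity difference can be sketched with a small simulation (the data, degrees, and perturbation size below are illustrative assumptions, not from the post): fit a straight line and a high-degree polynomial to the same points, nudge a single observation, and compare how much each fitted curve moves at a nearby point.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + rng.normal(0.0, 0.1, size=x.size)  # the true relationship is linear

# Perturb a single observation.
y_perturbed = y.copy()
y_perturbed[10] += 1.0

x_new = 0.55  # a point near the perturbed observation
shift = {}
for degree in (1, 15):
    # Least-squares polynomial fits before and after the perturbation.
    fit_before = np.polynomial.Polynomial.fit(x, y, degree)
    fit_after = np.polynomial.Polynomial.fit(x, y_perturbed, degree)
    shift[degree] = abs(fit_after(x_new) - fit_before(x_new))

print(shift)  # the flexible degree-15 fit moves far more than the linear fit
```

The linear fit's prediction barely moves, because one observation only slightly tilts the fitted line; the flexible fit nearly interpolates the data, so it chases the perturbed point.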
Let’s assume that the relationship we are going to evaluate is linear. If we use a more complicated model instead of a linear one, there is no difference in terms of unbiasedness, since the linear model is already able to detect the relationship. However, because the complicated model will overfit the dataset, its estimates for future values will be less accurate. This happens because the statistical learning procedure of the complicated model is working too hard to find patterns in the training data, and may be picking up patterns that are caused by random chance rather than by true properties of the unknown function.
We have to point out that one should select the model whose estimators are unbiased, even if it has higher variance. This is because the bias cannot be gotten rid of once a model is constructed. The variance of the model, however, can be lowered by collecting more observations, which increases the degrees of freedom.
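The claim that variance falls as observations accumulate can be checked with a quick Monte Carlo sketch (sample sizes, noise level, and trial count below are my own illustrative choices): repeatedly draw samples of size n, fit a simple regression each time, and watch the variance of the slope estimate shrink as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def slope_variance(n, trials=2000):
    """Monte Carlo variance of the OLS slope estimate over many samples of size n."""
    slopes = np.empty(trials)
    for t in range(trials):
        x = rng.uniform(0.0, 1.0, n)
        y = 2.0 * x + rng.normal(0.0, 1.0, n)  # linear truth plus noise
        slopes[t] = np.polyfit(x, y, 1)[0]     # fitted slope coefficient
    return slopes.var()

variances = {n: slope_variance(n) for n in (10, 100, 1000)}
print(variances)  # the slope's variance shrinks roughly like 1/n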
Overall, we should be aware of the fact that the model with the lowest training SSR (Sum of Squared Residuals) might not have the lowest test SSR. This is because, as the complexity of the model increases, given that the relationship we are looking for is not that complicated, the model will also detect patterns in the random error term, which will give us less accurate predictions for future values. Thus one should have a priori assumptions about the relationship between the independent and dependent variables, and then construct the prediction model accordingly.
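The training-versus-test SSR gap can be sketched directly (the linear truth, noise level, and polynomial degrees below are illustrative assumptions): fit polynomials of increasing degree to one noisy sample, then score them on a second sample drawn from the same linear relationship.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
x = np.sort(rng.uniform(0.0, 1.0, n))
true_f = lambda t: 1.0 + 3.0 * t               # the true relationship is linear
y_train = true_f(x) + rng.normal(0.0, 0.3, n)  # noisy training sample
y_test = true_f(x) + rng.normal(0.0, 0.3, n)   # fresh noise at the same x values

train_ssr, test_ssr = {}, {}
for degree in (1, 3, 10):
    fit = np.polynomial.Polynomial.fit(x, y_train, degree)
    train_ssr[degree] = float(np.sum((y_train - fit(x)) ** 2))
    test_ssr[degree] = float(np.sum((y_test - fit(x)) ** 2))

print(train_ssr)  # always decreases as the degree grows
print(test_ssr)   # does not keep improving once the fit chases noise
```

Training SSR is guaranteed to fall as the degree rises, since higher-degree fits nest the lower ones; test SSR is what reveals that the extra flexibility was spent fitting the error term.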