Bias-Variance Trade-off
For example, linear regression is an inflexible model, since it can only detect linear relationships between the dependent and independent variables. As a result, its estimators will change very little if one of the observations changes. However, when it comes to a model that is able to detect complicated nonlinearities, its estimators will be strongly affected by changes in the observations.
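This sensitivity difference can be sketched with a small simulation (the data, degrees, and perturbation size below are illustrative assumptions, not from the post): fit a straight line and a high-degree polynomial to the same points, nudge a single observation, and compare how much each fitted curve moves at a nearby point.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + rng.normal(0.0, 0.1, size=x.size)  # the true relationship is linear

# Perturb a single observation.
y_perturbed = y.copy()
y_perturbed[10] += 1.0

x_new = 0.55  # a point near the perturbed observation
shift = {}
for degree in (1, 15):
    # Least-squares polynomial fits before and after the perturbation.
    fit_before = np.polynomial.Polynomial.fit(x, y, degree)
    fit_after = np.polynomial.Polynomial.fit(x, y_perturbed, degree)
    shift[degree] = abs(fit_after(x_new) - fit_before(x_new))

print(shift)  # the flexible degree-15 fit moves far more than the linear fit
```

The linear fit's prediction barely moves, because one observation only slightly tilts the fitted line; the flexible fit nearly interpolates the data, so it chases the perturbed point.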
Let’s assume that the relationship we are going to evaluate is linear. If we use a more complicated model instead of a linear one, there is no difference in terms of unbiasedness, since the linear model is already able to detect the relationship. However, because the complicated model will overfit the dataset, its estimates for future values will be less accurate. This happens because the statistical learning procedure of the complicated model is working too hard to find patterns in the training data, and may be picking up patterns that are caused by random chance rather than by true properties of the unknown function.
We have to point out that one should select the model whose estimators are unbiased, even if it has higher variance. This is because the bias cannot be gotten rid of once a model is constructed. The variance of the model, however, can be lowered by collecting more observations, which increases the degrees of freedom.
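The claim that variance falls as observations accumulate can be checked with a quick Monte Carlo sketch (sample sizes, noise level, and trial count below are my own illustrative choices): repeatedly draw samples of size n, fit a simple regression each time, and watch the variance of the slope estimate shrink as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def slope_variance(n, trials=2000):
    """Monte Carlo variance of the OLS slope estimate over many samples of size n."""
    slopes = np.empty(trials)
    for t in range(trials):
        x = rng.uniform(0.0, 1.0, n)
        y = 2.0 * x + rng.normal(0.0, 1.0, n)  # linear truth plus noise
        slopes[t] = np.polyfit(x, y, 1)[0]     # fitted slope coefficient
    return slopes.var()

variances = {n: slope_variance(n) for n in (10, 100, 1000)}
print(variances)  # the slope's variance shrinks roughly like 1/n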
Overall, we should be aware of the fact that the model with the lowest training SSR (Sum of Squared Residuals) might not have the lowest test SSR. This is because, as the complexity of the model increases, given that the relationship we are looking for is not that complicated, the model will also detect patterns in the random error term, which will give us less accurate predictions for future values. Thus one should have a priori assumptions about the relationship between the independent and dependent variables, and then construct the prediction model accordingly.
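The training-versus-test SSR gap can be sketched directly (the linear truth, noise level, and polynomial degrees below are illustrative assumptions): fit polynomials of increasing degree to one noisy sample, then score them on a second sample drawn from the same linear relationship.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
x = np.sort(rng.uniform(0.0, 1.0, n))
true_f = lambda t: 1.0 + 3.0 * t               # the true relationship is linear
y_train = true_f(x) + rng.normal(0.0, 0.3, n)  # noisy training sample
y_test = true_f(x) + rng.normal(0.0, 0.3, n)   # fresh noise at the same x values

train_ssr, test_ssr = {}, {}
for degree in (1, 3, 10):
    fit = np.polynomial.Polynomial.fit(x, y_train, degree)
    train_ssr[degree] = float(np.sum((y_train - fit(x)) ** 2))
    test_ssr[degree] = float(np.sum((y_test - fit(x)) ** 2))

print(train_ssr)  # always decreases as the degree grows
print(test_ssr)   # does not keep improving once the fit chases noise
```

Training SSR is guaranteed to fall as the degree rises, since higher-degree fits nest the lower ones; test SSR is what reveals that the extra flexibility was spent fitting the error term.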