One possible reason is that getting far from the center has grave implications. With a penalty that grows linearly with the error, the regression assigns less weight to outliers than it does with a penalty that grows with the squared error.

However, the absolute loss has the disadvantage that it is not differentiable at a = 0.
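To make the differentiability point concrete, here is a short sketch comparing the two losses as functions of the error a. The notation L_abs, L_sq, and the subdifferential symbol are introduced here for illustration and do not come from the original text.

```latex
\[
L_{\mathrm{abs}}(a) = |a|, \qquad
\frac{d}{da} L_{\mathrm{abs}}(a) =
\begin{cases}
-1, & a < 0,\\
+1, & a > 0,
\end{cases}
\]
% At a = 0 the one-sided derivatives are -1 and +1, so no derivative exists there;
% only the subdifferential is defined:
\[
\partial L_{\mathrm{abs}}(0) = [-1, 1].
\]
% The squared loss, by contrast, is smooth everywhere:
\[
L_{\mathrm{sq}}(a) = a^2, \qquad \frac{d}{da} L_{\mathrm{sq}}(a) = 2a .
\]
```

This is why gradient-based methods handle the squared loss directly, while the absolute loss usually needs subgradients or a smoothed surrogate.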

Historically, Laplace first considered the maximum observed error as a measure of the correctness of a model.

Other measures of cost are possible, for example mortality or morbidity in the field of public health or safety engineering. Therefore errors are not 'equally bad' but 'proportionally bad' as twice the error gets twice the penalty.

Choose the decision rule with the lowest average loss. Both absolute and squared errors are used, depending on the use case.
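As a rough illustration of choosing the rule with the lowest average loss, the sketch below scores a few candidate rules on a small made-up data set. The data, the rule names, and the helper functions `average_loss` and `choose_rule` are all hypothetical, and the average is the empirical one over the sample rather than a true expectation.

```python
def average_loss(rule, observations, loss):
    """Mean loss of a decision rule over (input, target) pairs."""
    return sum(loss(rule(x), y) for x, y in observations) / len(observations)


def choose_rule(rules, observations, loss):
    """Return the name of the rule with the lowest average loss."""
    return min(rules, key=lambda name: average_loss(rules[name], observations, loss))


def squared_loss(prediction, target):
    return (prediction - target) ** 2


def absolute_loss(prediction, target):
    return abs(prediction - target)


# Hypothetical observations: (input, observed value).
observations = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]

# Hypothetical candidate decision rules.
rules = {
    "identity": lambda x: x,
    "always_two": lambda x: 2.0,
    "double": lambda x: 2.0 * x,
}

best_under_squared = choose_rule(rules, observations, squared_loss)
best_under_absolute = choose_rule(rules, observations, absolute_loss)
print(best_under_squared, best_under_absolute)
```

On this data both losses pick the same rule; with outliers in the sample the two criteria can disagree.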

Squared error penalizes large errors more than absolute error does and is more forgiving of small errors (those below one unit) than absolute error is. In economics, when an agent is risk neutral, the objective function is simply expressed in monetary terms, such as profit, income, or end-of-period wealth. But for risk-averse (or risk-loving) agents, loss is measured as the negative of a utility function, which represents satisfaction and is usually interpreted in ordinal terms rather than in cardinal (absolute) terms.

Values of MSE may be used for comparative purposes. Minimizing MSE is a key criterion in selecting estimators: see minimum mean-square error. Basically, MAE is more robust to outliers than MSE is.
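The robustness claim can be checked with a small sketch. For a single constant prediction, the sample mean minimizes MSE and the sample median minimizes MAE, so comparing the two on data with and without an outlier shows how each criterion reacts. The data values and helper names below are made up for illustration.

```python
import statistics


def mean_squared_error(values, c):
    """MSE of predicting the constant c for every value."""
    return sum((v - c) ** 2 for v in values) / len(values)


def mean_absolute_error(values, c):
    """MAE of predicting the constant c for every value."""
    return sum(abs(v - c) for v in values) / len(values)


# Hypothetical samples: identical except for one extreme value.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 100.0]

# The mean is the MSE-optimal constant; the median is the MAE-optimal constant.
mse_fit_clean = statistics.mean(clean)
mae_fit_clean = statistics.median(clean)
mse_fit_outlier = statistics.mean(with_outlier)
mae_fit_outlier = statistics.median(with_outlier)

print("clean:       ", mse_fit_clean, mae_fit_clean)
print("with outlier:", mse_fit_outlier, mae_fit_outlier)
```

The single outlier drags the MSE-optimal fit from 3.0 to 22.0, while the MAE-optimal fit stays at 3.0.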

There is no really "good" reason that the square is used instead of higher powers (or, indeed, non-polynomial penalty functions). Minimizing squared error is often preferred because it guards against large errors better.

Since the value of this variable is uncertain, so is the value of the utility function; it is the expected value of utility that is maximized. Depending on the application, that may not characterize people's opinions as closely as this: one 7-unit loss is just as bad as forty-nine 1-unit losses (the squared-loss view, since 7² = 49). In financial risk management, the function is precisely mapped to a monetary loss.

It's the projection of Y onto the column space of X. The result for $S_{n-1}^2$ follows easily from the $\chi_{n-1}^2$ variance, which is $2n - 2$. With such a function, each deviation from the mean is given a proportional corresponding error.
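The step from the chi-squared variance to the result for the sample variance can be sketched as follows, assuming an i.i.d. Gaussian sample of size n with variance σ² and writing S²_{n-1} for the unbiased sample variance (these assumptions are stated here and are not spelled out in the text above).

```latex
\[
\frac{(n-1)\,S_{n-1}^2}{\sigma^2} \sim \chi_{n-1}^2,
\qquad
\operatorname{Var}\!\left(\chi_{n-1}^2\right) = 2(n-1) = 2n - 2 .
\]
% Rescale by \sigma^2/(n-1) to get the variance of the sample variance:
\[
\operatorname{Var}\!\left(S_{n-1}^2\right)
= \frac{\sigma^4}{(n-1)^2}\,\bigl(2n - 2\bigr)
= \frac{2\sigma^4}{n-1} .
\]
% S_{n-1}^2 is unbiased for \sigma^2, so its MSE equals its variance:
\[
\operatorname{MSE}\!\left(S_{n-1}^2\right) = \frac{2\sigma^4}{n-1} .
\]
```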

Suppose the sample units were chosen with replacement. The goal of estimation is to find a function that models its input well: if it were applied to the training set, it should predict the values (or class labels) associated

For an infinite family of models, it is a set of parameters to the family of distributions. He soon moved to considering MAD instead.

Robust regression methods exist, but they require iterative solutions to obtain estimates that are, in general, neither unique nor available in closed form, and they can be computationally expensive. If deviations become worse for you the farther away you are from the optimum and you don't care about whether the deviation is positive or negative, then the squared loss function

However the statistical properties of your solution might be hard to assess. The MSE is the second moment (about the origin) of the error, and thus incorporates both the variance of the estimator and its bias.
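The statement that the MSE incorporates both variance and bias can be written out as a short derivation. The symbols are assumptions of this sketch: θ is the true parameter and θ̂ is an estimator of it.

```latex
\begin{align*}
\operatorname{MSE}(\hat\theta)
&= \mathbb{E}\!\left[(\hat\theta - \theta)^2\right] \\
&= \mathbb{E}\!\left[\bigl(\hat\theta - \mathbb{E}[\hat\theta]
   + \mathbb{E}[\hat\theta] - \theta\bigr)^2\right] \\
&= \mathbb{E}\!\left[(\hat\theta - \mathbb{E}[\hat\theta])^2\right]
   + 2\,\bigl(\mathbb{E}[\hat\theta] - \theta\bigr)\,
     \mathbb{E}\!\left[\hat\theta - \mathbb{E}[\hat\theta]\right]
   + \bigl(\mathbb{E}[\hat\theta] - \theta\bigr)^2 \\
&= \operatorname{Var}(\hat\theta) + \operatorname{Bias}(\hat\theta)^2 ,
\end{align*}
% The cross term vanishes because E[\hat\theta - E[\hat\theta]] = 0.
```

For an unbiased estimator the bias term is zero, so the MSE reduces to the variance.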

