9.3 Regression Evaluation


Estimated time: 30 minutes

(James et al. 2021), p. 29 (MSE) and p. 69 (\text{R}^2).

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning: With Applications in R and Python. 2nd ed. Springer Texts in Statistics. Springer. https://doi.org/10.1007/978-1-0716-1418-1.

A regression task involves predicting a quantitative (numerical) outcome based on one or more input variables. For example, predicting someone’s income based on their age, education, and job title.

The goal is to model the relationship between predictors and a continuous response.

Y = f(X) + \varepsilon

Y is the quantitative response, X represents the predictor(s), f(X) is an unknown function capturing the relationship between X and Y, and \varepsilon is the irreducible error term.
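This setup can be simulated directly. The sketch below assumes a simple linear f chosen purely for illustration; the observed response is the systematic part f(X) plus irreducible noise \varepsilon:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical true relationship, assumed linear for illustration
    return 2.0 * x + 1.0

x = rng.uniform(0, 10, size=100)    # predictor X
eps = rng.normal(0, 1, size=100)    # irreducible error term epsilon
y = f(x) + eps                      # observed response Y = f(X) + eps
```

Because of \varepsilon, the observed y values scatter around f(x) even though the underlying relationship is deterministic; no model can remove this irreducible component.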

Loss Functions

To evaluate the performance of a statistical learning method on a given data set, we need some way to measure how well its predictions actually match the observed data. That is, we need to quantify the extent to which the predicted response value for a given observation is close to the true response value for that observation.

Mean Squared Error (MSE)

The most commonly used performance measure in the regression setting is the Mean Squared Error (MSE).

\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{f}(x_i))^2

y_i is the true response for the i-th observation, \hat{f}(x_i) is the predicted value from the model \hat{f}, and n is the number of observations.

The MSE quantifies the average squared difference between the predicted values and the actual outcomes.

  • A small \downarrow \text{MSE} indicates that predictions are close to the true values.
  • A large \uparrow \text{MSE} suggests substantial prediction error for at least some observations.
Note: Example

Let’s say we have true values y = [3, -0.5, 2, 7] and predictions \hat{y} = [2.5, 0.0, 2, 8]. We can compute the Mean Squared Error (MSE) manually:

1. Compute squared errors for each observation:

(3 - 2.5)^2 = 0.25,\quad (-0.5 - 0.0)^2 = 0.25,\quad (2 - 2)^2 = 0,\quad (7 - 8)^2 = 1

2. Sum the squared errors:

0.25 + 0.25 + 0 + 1 = 1.5

3. Compute the Mean Squared Error:

\text{MSE} = \frac{1.5}{4} = 0.375
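The same computation can be checked in code. The sketch below uses NumPy for the manual version and, assuming scikit-learn is available, its `mean_squared_error` for comparison:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Manual computation: average of the squared errors
mse_manual = np.mean((y_true - y_pred) ** 2)

# scikit-learn equivalent
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual, mse_sklearn)  # 0.375 0.375
```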

Mean Absolute Error (MAE)

The Mean Absolute Error (MAE) is another widely used regression performance metric:

\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{f}(x_i) \right|

y_i is the true response for the i-th observation, \hat{f}(x_i) is the predicted value, and n is the number of observations.

The MAE captures the average absolute deviation between predicted values and actual outcomes.

  • A smaller \downarrow \text{MAE} implies more accurate predictions.
  • A larger \uparrow \text{MAE} indicates that, on average, predictions deviate substantially from the true values.
Warning: When MAE might be better to use

MAE is less sensitive to outliers than MSE, making it preferable when large errors should not be overly penalized.
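This sensitivity difference is easy to see numerically. In the sketch below, a single prediction from the earlier example is corrupted into a large miss (a hypothetical outlier): the MSE grows quadratically with that one error, while the MAE grows only linearly:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_good = np.array([2.5, 0.0, 2.0, 8.0])
y_outl = np.array([2.5, 0.0, 2.0, 17.0])   # one prediction off by 10

def mse(a, b):
    return np.mean((a - b) ** 2)

def mae(a, b):
    return np.mean(np.abs(a - b))

# The single outlier inflates MSE far more than MAE
print(mse(y_true, y_good), mse(y_true, y_outl))  # 0.375 25.125
print(mae(y_true, y_good), mae(y_true, y_outl))  # 0.5 2.75
```

The outlier multiplies the MSE by 67 but the MAE by only 5.5, which is why MAE is the more robust choice when occasional large errors are acceptable.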

Note: Example

Let’s say we have true values y = [3, -0.5, 2, 7] and predictions \hat{y} = [2.5, 0.0, 2, 8]. We can compute the Mean Absolute Error (MAE) manually:

1. Compute absolute errors for each observation:

|3 - 2.5| = 0.5,\quad |-0.5 - 0.0| = 0.5,\quad |2 - 2| = 0,\quad |7 - 8| = 1

2. Sum the absolute errors:

0.5 + 0.5 + 0 + 1 = 2

3. Compute the Mean Absolute Error:

\text{MAE} = \frac{2}{4} = 0.5
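As with MSE, this worked example can be verified in code, here with NumPy and (assuming scikit-learn is available) `mean_absolute_error`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Manual computation: average of the absolute errors
mae_manual = np.mean(np.abs(y_true - y_pred))

# scikit-learn equivalent
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(mae_manual, mae_sklearn)  # 0.5 0.5
```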

Metrics

To evaluate the performance of a regression model, we need a way to quantify how well its predictions align with the observed outcomes. Regression metrics provide numerical summaries of the discrepancy between the predicted and actual response values across a data set. These metrics help us assess whether the model makes accurate predictions and guide us in comparing different models or tuning their parameters.

\text{R}^2 Score

The \text{R}^2 metric provides a scale-free measure of fit, taking values between 0 and 1.

\text{R}^2 = 1 - \frac{\text{RSS}}{\text{TSS}}

Total Sum of Squares (TSS) is the total variability in the response.

\text{TSS} = \sum_{i=1}^{n}(y_i - \bar{y})^2

Residual Sum of Squares (RSS) is the variability left unexplained by the model.

\text{RSS} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2

\text{R}^2 measures the proportion of variance in the response that is explained by the predictors.

  • \text{R}^2 \approx 1: model explains most of the variance.
  • \text{R}^2 \approx 0: model explains very little.

When interpreting this metric, keep in mind that what counts as a “good” \text{R}^2 depends on the context.

  • High \uparrow \text{R}^2 is expected in physics/engineering.
  • Low \downarrow \text{R}^2 is common in fields like psychology or marketing due to high inherent variability.

In simple linear regression, \text{R}^2 is directly connected to the correlation between X and Y:

\text{R}^2 = r^2 \quad \text{where} \quad r = \text{Cor}(X, Y)

In the simple linear regression setting, \text{R}^2 therefore equals the squared correlation between X and Y; in multiple regression, \text{R}^2 generalizes this idea.
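This identity can be confirmed empirically. The sketch below fits a least-squares line to simulated data (coefficients chosen arbitrarily for illustration) and compares \text{R}^2 with the squared Pearson correlation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 50)
y = 3 * x + 2 + rng.normal(0, 2, 50)   # simulated linear data with noise

# Fit simple linear regression and compute R^2 on the same data
model = LinearRegression().fit(x.reshape(-1, 1), y)
r2 = r2_score(y, model.predict(x.reshape(-1, 1)))

# Squared Pearson correlation between X and Y
r = np.corrcoef(x, y)[0, 1]

print(np.isclose(r2, r ** 2))  # True
```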

Note: Example

Let’s say we have true values y = [3, -0.5, 2, 7] and predictions \hat{y} = [2.5, 0.0, 2, 8]. We can compute \text{R}^2 manually:

1. Compute the mean of y:

\bar{y} = \frac{3 + (-0.5) + 2 + 7}{4} = \frac{11.5}{4} = 2.875

2. Compute Total Sum of Squares (TSS):

\text{TSS} = \sum (y_i - \bar{y})^2 = (3 - 2.875)^2 + (-0.5 - 2.875)^2 + (2 - 2.875)^2 + (7 - 2.875)^2 = 0.0156 + 11.3906 + 0.7656 + 17.0156 = 29.1875

3. Compute Residual Sum of Squares (RSS):

\text{RSS} = \sum (y_i - \hat{y}_i)^2 = (3 - 2.5)^2 + (-0.5 - 0.0)^2 + (2 - 2)^2 + (7 - 8)^2 = 0.25 + 0.25 + 0 + 1 = 1.5

4. Compute \text{R}^2:

\text{R}^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{1.5}{29.1875} \approx 0.9486
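The four steps above translate directly into code. The sketch below computes TSS and RSS manually and compares the result with scikit-learn's `r2_score` (assuming scikit-learn is available):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

tss = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
rss = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
r2_manual = 1 - rss / tss

# scikit-learn equivalent
r2_sklearn = r2_score(y_true, y_pred)

print(round(r2_manual, 4), round(r2_sklearn, 4))  # 0.9486 0.9486
```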