"""
===========================================
Ordinary Least Squares and Ridge Regression
===========================================
1. Ordinary Least Squares:
We illustrate how to use the ordinary least squares (OLS) model,
:class:`~sklearn.linear_model.LinearRegression`, on a single feature of
the diabetes dataset. We train on a subset of the data, evaluate on a
test set, and visualize the predictions.
2. Ordinary Least Squares and Ridge Regression Variance:
We then show how OLS can have high variance when the data is sparse or
noisy, by fitting on a very small synthetic sample repeatedly. Ridge
regression, :class:`~sklearn.linear_model.Ridge`, reduces this variance
by penalizing (shrinking) the coefficients, leading to more stable
predictions.
"""
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
# %%
# Data Loading and Preparation
# ----------------------------
#
# Load the diabetes dataset. For simplicity, we only keep a single feature in the data.
# Then, we split the data and target into training and test sets.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
X, y = load_diabetes(return_X_y=True)
X = X[:, [2]] # Use only one feature
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=20, shuffle=False)
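# %%
# As a quick sanity check (not needed for the rest of the example), we can look
# at the sizes of the resulting train and test sets.
print(f"Training set size: {X_train.shape[0]} samples")
print(f"Test set size: {X_test.shape[0]} samples")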
# %%
# Linear regression model
# -----------------------
#
# We create a linear regression model and fit it on the training data. Note that by
# default, an intercept is added to the model. We can control this behavior by setting
# the `fit_intercept` parameter.
from sklearn.linear_model import LinearRegression
regressor = LinearRegression().fit(X_train, y_train)
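# %%
# The fitted model exposes the learned slope and intercept. As a small
# illustration of the `fit_intercept` behavior mentioned above, we also fit a
# model without an intercept and compare the slopes.
print(f"Slope: {regressor.coef_[0]:.2f}, intercept: {regressor.intercept_:.2f}")
regressor_no_intercept = LinearRegression(fit_intercept=False).fit(X_train, y_train)
print(f"Slope without an intercept: {regressor_no_intercept.coef_[0]:.2f}")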
# %%
# Model evaluation
# ----------------
#
# We evaluate the model's performance on the test set using the mean squared error
# and the coefficient of determination.
from sklearn.metrics import mean_squared_error, r2_score
y_pred = regressor.predict(X_test)
print(f"Mean squared error: {mean_squared_error(y_test, y_pred):.2f}")
print(f"Coefficient of determination: {r2_score(y_test, y_pred):.2f}")
# %%
# Plotting the results
# --------------------
#
# Finally, we visualize the results on the train and test data.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(ncols=2, figsize=(10, 5), sharex=True, sharey=True)
ax[0].scatter(X_train, y_train, label="Train data points")
ax[0].plot(
    X_train,
    regressor.predict(X_train),
    linewidth=3,
    color="tab:orange",
    label="Model predictions",
)
ax[0].set(xlabel="Feature", ylabel="Target", title="Train set")
ax[0].legend()
ax[1].scatter(X_test, y_test, label="Test data points")
ax[1].plot(X_test, y_pred, linewidth=3, color="tab:orange", label="Model predictions")
ax[1].set(xlabel="Feature", ylabel="Target", title="Test set")
ax[1].legend()
fig.suptitle("Linear Regression")
plt.show()
# %%
#
# OLS on this single-feature subset learns a linear function that minimizes
# the mean squared error on the training data. We can see how well (or poorly)
# it generalizes by looking at the R^2 score and mean squared error on the
# test set. In higher dimensions, pure OLS often overfits, especially if the
# data is noisy. Regularization techniques (like Ridge or Lasso) can help
# reduce that.
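# %%
# As a quick, illustrative comparison (the `alpha` value below is an arbitrary
# choice, not a tuned one), we can fit :class:`~sklearn.linear_model.Ridge` on
# the same single-feature split and print its test error next to the OLS one.
from sklearn.linear_model import Ridge

ridge = Ridge(alpha=0.5).fit(X_train, y_train)
y_pred_ridge = ridge.predict(X_test)
print(f"OLS test MSE:   {mean_squared_error(y_test, y_pred):.2f}")
print(f"Ridge test MSE: {mean_squared_error(y_test, y_pred_ridge):.2f}")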
# %%
# Ordinary Least Squares and Ridge Regression Variance
# ----------------------------------------------------------
#
# Next, we illustrate the high-variance problem more clearly on a tiny
# synthetic dataset. We take just two data points, then repeatedly add small
# Gaussian noise to them and refit both OLS and Ridge. Plotting each refit
# line shows how much the OLS fit jumps around, whereas Ridge remains more
# stable thanks to its penalty term.
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
X_train = np.c_[0.5, 1].T
y_train = [0.5, 1]
X_test = np.c_[0, 2].T
np.random.seed(0)
models = dict(
    ols=linear_model.LinearRegression(), ridge=linear_model.Ridge(alpha=0.1)
)
for name, model in models.items():
    fig, ax = plt.subplots(figsize=(4, 3))
    for _ in range(6):
        # Perturb the two training points with small Gaussian noise and refit
        this_X = 0.1 * np.random.normal(size=(2, 1)) + X_train
        model.fit(this_X, y_train)
        ax.plot(X_test, model.predict(X_test), color="gray")
        ax.scatter(this_X, y_train, s=3, c="gray", marker="o", zorder=10)
    # Fit on the original, noise-free points for reference
    model.fit(X_train, y_train)
    ax.plot(X_test, model.predict(X_test), linewidth=2, color="blue")
    ax.scatter(X_train, y_train, s=30, c="red", marker="+", zorder=10)
    ax.set_title(name)
    ax.set_xlim(0, 2)
    ax.set_ylim((0, 1.6))
    ax.set_xlabel("X")
    ax.set_ylabel("y")
    fig.tight_layout()
plt.show()
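# %%
# To make the shrinkage concrete, we can refit both models on the noise-free
# pair of points and print the learned slopes: the Ridge penalty pulls the
# slope towards zero compared to the OLS fit.
ols_slope = linear_model.LinearRegression().fit(X_train, y_train).coef_[0]
ridge_slope = linear_model.Ridge(alpha=0.1).fit(X_train, y_train).coef_[0]
print(f"OLS slope: {ols_slope:.2f}, Ridge slope: {ridge_slope:.2f}")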
# %%
# Conclusion
# ----------
#
# - In the first example, we applied OLS to a real dataset, showing
# how a plain linear model can fit the data by minimizing the squared error
# on the training set.
#
# - In the second example, OLS lines varied drastically each time noise
# was added, reflecting its high variance when data is sparse or noisy. By
# contrast, **Ridge** regression introduces a regularization term that shrinks
# the coefficients, stabilizing predictions.
#
# Regularized models such as :class:`~sklearn.linear_model.Ridge` (L2 penalty)
# and :class:`~sklearn.linear_model.Lasso` (L1 penalty) are common ways to
# improve generalization and reduce overfitting. A well-tuned Ridge or Lasso
# often outperforms pure OLS when features are correlated, the data is noisy,
# or the sample size is small.
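# %%
# As a final, minimal sketch (the synthetic setup below is arbitrary and meant
# only to illustrate the point, not to benchmark the estimators), we compare
# OLS, Ridge and Lasso through cross-validated R^2 scores on noisy data with
# correlated features.
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(42)
n_samples, n_features = 60, 10
# Correlated features: a shared latent factor plus small independent noise
latent = rng.normal(size=(n_samples, 1))
X_corr = latent + 0.1 * rng.normal(size=(n_samples, n_features))
y_corr = X_corr @ rng.normal(size=n_features) + rng.normal(scale=2.0, size=n_samples)
for name, model in {
    "ols": linear_model.LinearRegression(),
    "ridge": linear_model.Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}.items():
    scores = cross_val_score(model, X_corr, y_corr, cv=5)
    print(f"{name}: mean cross-validated R^2 = {scores.mean():.2f}")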