File: plot_linear_model.py

"""
Create a linear model
=====================
In this example, we build a metamodel using a linear model approximation
with the :class:`~openturns.LinearModelAlgorithm` class.
We show how the :class:`~openturns.LinearModelAnalysis` class
can be used to produce the statistical analysis of the least squares
regression model.
"""

# %%
# Introduction
# ~~~~~~~~~~~~
#
# The following 2-d function is used in this example:
#
# .. math::
#
#     \model(x,y) = 3 + 2x - y
#
# for any :math:`x, y \in \Rset`.
#
# Notice that this model is linear:
#
# .. math::
#
#     \model(x,y) = \beta_1 + \beta_2 x + \beta_3 y
#
# where :math:`\beta_1 = 3`, :math:`\beta_2 = 2` and :math:`\beta_3 = -1`.
#
# We consider noisy observations of the output:
#
# .. math::
#
#     Y = \model(x,y) + \epsilon
#
# where :math:`\epsilon \sim \cN(0, \sigma^2)` and :math:`\sigma > 0`
# is the standard deviation of the noise.
# In our example, we use :math:`\sigma = 0.1`.
#
# Finally, we use :math:`n = 1000` independent observations of the model.
#

# %%
import openturns as ot
import openturns.viewer as otv

# %%
# Simulate the data set
# ~~~~~~~~~~~~~~~~~~~~~
#
# We first generate the input data and add noise to the output observations:

# %%
distribution = ot.Normal(2)
distribution.setDescription(["x", "y"])
sampleSize = 1000
func = ot.SymbolicFunction(["x", "y"], ["3 + 2 * x - y"])
input_sample = distribution.getSample(sampleSize)
epsilon = ot.Normal(0, 0.1).getSample(sampleSize)
output_sample = func(input_sample) + epsilon

# %%
# Linear regression
# ~~~~~~~~~~~~~~~~~
#
# Let us run the linear model algorithm using the :class:`~openturns.LinearModelAlgorithm`
# class and get the associated results:

# %%
algo = ot.LinearModelAlgorithm(input_sample, output_sample)
result = algo.getResult()
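
# %%
# As a quick sanity check, we can compare the estimated coefficients with the
# true values :math:`\beta_1 = 3`, :math:`\beta_2 = 2` and :math:`\beta_3 = -1`.
# This sketch assumes the ``getCoefficients`` accessor of
# :class:`~openturns.LinearModelResult`.

# %%
print("Estimated coefficients:", result.getCoefficients())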

# %%
# Residuals analysis
# ~~~~~~~~~~~~~~~~~~
#
# We can now analyze the residuals of the regression on the training data.
# For clarity, only the first five residual values are printed.

# %%
residuals = result.getSampleResiduals()
residuals[:5]

# %%
# Alternatively, the standardized residuals can be used:

# %%
stdresiduals = result.getStandardizedResiduals()
stdresiduals[:5]

# %%
# We can also get the noise distribution, which is assumed to be Gaussian:

# %%
result.getNoiseDistribution()
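
# %%
# As an illustrative check, the standard deviation of the estimated noise
# distribution can be compared with the true value :math:`\sigma = 0.1`
# (the generic ``getStandardDeviation`` distribution accessor is assumed here):

# %%
print(
    "Estimated noise standard deviation:",
    result.getNoiseDistribution().getStandardDeviation(),
)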


# %%
# Analysis of the results
# ~~~~~~~~~~~~~~~~~~~~~~~
#
# In order to post-process the linear regression results, the :class:`~openturns.LinearModelAnalysis`
# class can be used:

# %%
analysis = ot.LinearModelAnalysis(result)
analysis

# %%
# The results seem to indicate that the linear model is satisfactory.
#
# - The basis uses the three functions :math:`1` (which is called the intercept),
#   :math:`x` and :math:`y`.
# - Each row of the table of coefficients tests whether a single coefficient is zero.
#   For every coefficient, the probability of observing a T statistic this large
#   under the null hypothesis is close to zero.
#   Hence, for each coefficient, we reject the hypothesis that it is zero.
#   In other words, all the coefficients are significantly nonzero.
# - The :math:`R^2` score is close to 1.
#   Furthermore, the adjusted :math:`R^2` value, which takes into account the size
#   of the data set and the number of coefficients, is similar to the
#   unadjusted :math:`R^2` score (both scores can also be retrieved
#   programmatically, see the snippet after this list).
#   Most of the variance is explained by the linear model.
# - The F-test tests whether all coefficients are simultaneously zero.
#   The `Fisher-Snedecor` p-value is lower than 1%.
#   Hence, there is at least one nonzero coefficient.
# - The normality test checks that the residuals have a normal distribution.
#   The normality assumption can be rejected if the p-value is close to zero.
#   The p-values are larger than 0.05: the normality assumption of the
#   residuals is not rejected.
#
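
# %%
# The quantities discussed above can also be retrieved programmatically.
# The sketch below assumes the ``getRSquared`` and ``getAdjustedRSquared``
# accessors of :class:`~openturns.LinearModelResult`; please refer to the API
# documentation of your OpenTURNS version.

# %%
print("R2:", result.getRSquared())
print("Adjusted R2:", result.getAdjustedRSquared())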

# %%
# Graphical analyses
# ~~~~~~~~~~~~~~~~~~
#
# Let us compare the model and the fitted values:

# %%
graph = analysis.drawModelVsFitted()
view = otv.View(graph)

# %%
# The previous figure seems to indicate that the linearity hypothesis
# is reasonable.

# %%
# Residuals can be plotted against the fitted values.

# %%
graph = analysis.drawResidualsVsFitted()
view = otv.View(graph)

# %%
graph = analysis.drawScaleLocation()
view = otv.View(graph)

# %%
graph = analysis.drawQQplot()
view = otv.View(graph)

# %%
# On this QQ plot, the empirical distribution of the residuals is very close
# to the reference normal distribution: there is no obvious outlier.
#
# Cook's distance measures the impact of every individual data point on the
# linear regression, and can be plotted as follows:

# %%
graph = analysis.drawCookDistance()
view = otv.View(graph)

# %%
# This graph shows the indices of the points with a disproportionate influence.
#
# One of the ingredients in the computation of Cook's distance at a given
# point is that point's *leverage*.
# High-leverage points lie far from the bulk of the input data, so the fitted
# linear regression model tends to pass close to them.

# sphinx_gallery_thumbnail_number = 6
graph = analysis.drawResidualsVsLeverages()
view = otv.View(graph)

# %%
# In this case, there seems to be no obvious influential outlier characterized
# by large leverage and residual values.
#
# Similarly, we can also plot Cook's distances as a function of the sample
# leverages:

# %%
graph = analysis.drawCookVsLeverages()
view = otv.View(graph)
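
# %%
# The leverages and Cook's distances shown in the previous plots can also be
# accessed numerically. The sketch below assumes the ``getLeverages`` and
# ``getCookDistances`` accessors of :class:`~openturns.LinearModelResult`.

# %%
leverages = result.getLeverages()
cookDistances = result.getCookDistances()
print("Maximum leverage:", max(leverages))
print("Maximum Cook distance:", max(cookDistances))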

# %%
# Finally, we compute the 95% confidence interval of each estimated coefficient:

# %%
alpha = 0.95
interval = analysis.getCoefficientsConfidenceInterval(alpha)
print("confidence intervals with level=%1.2f: " % (alpha))
print("%s" % (interval))


# %%
otv.View.ShowAll()