<!DOCTYPE html>
<html>
<!-- Created by GNU Texinfo 7.1.1, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Correlation and Regression Analysis (GNU Octave (version 10.3.0))</title>

<meta name="description" content="Correlation and Regression Analysis (GNU Octave (version 10.3.0))">
<meta name="keywords" content="Correlation and Regression Analysis (GNU Octave (version 10.3.0))">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">

<link href="index.html" rel="start" title="Top">
<link href="Concept-Index.html" rel="index" title="Concept Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Statistics.html" rel="up" title="Statistics">
<link href="Distributions.html" rel="next" title="Distributions">
<link href="Basic-Statistical-Functions.html" rel="prev" title="Basic Statistical Functions">
<style type="text/css">
<!--
a.copiable-link {visibility: hidden; text-decoration: none; line-height: 0em}
div.example {margin-left: 3.2em}
span:hover a.copiable-link {visibility: visible}
strong.def-name {font-family: monospace; font-weight: bold; font-size: larger}
-->
</style>
<link rel="stylesheet" type="text/css" href="octave.css">


</head>

<body lang="en">
<div class="section-level-extent" id="Correlation-and-Regression-Analysis">
<div class="nav-panel">
<p>
Next: <a href="Distributions.html" accesskey="n" rel="next">Distributions</a>, Previous: <a href="Basic-Statistical-Functions.html" accesskey="p" rel="prev">Basic Statistical Functions</a>, Up: <a href="Statistics.html" accesskey="u" rel="up">Statistics</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<h3 class="section" id="Correlation-and-Regression-Analysis-1"><span>26.4 Correlation and Regression Analysis<a class="copiable-link" href="#Correlation-and-Regression-Analysis-1"> &para;</a></span></h3>


<a class="anchor" id="XREFcov"></a><span style="display:block; margin-top:-4.5ex;">&nbsp;</span>


<dl class="first-deftypefn">
<dt class="deftypefn" id="index-cov"><span><code class="def-type"><var class="var">c</var> =</code> <strong class="def-name">cov</strong> <code class="def-code-arguments">(<var class="var">x</var>)</code><a class="copiable-link" href="#index-cov"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-cov-1"><span><code class="def-type"><var class="var">c</var> =</code> <strong class="def-name">cov</strong> <code class="def-code-arguments">(<var class="var">x</var>, <var class="var">y</var>)</code><a class="copiable-link" href="#index-cov-1"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-cov-2"><span><code class="def-type"><var class="var">c</var> =</code> <strong class="def-name">cov</strong> <code class="def-code-arguments">(&hellip;, <var class="var">opt</var>)</code><a class="copiable-link" href="#index-cov-2"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-cov-3"><span><code class="def-type"><var class="var">c</var> =</code> <strong class="def-name">cov</strong> <code class="def-code-arguments">(&hellip;, <var class="var">nanflag</var>)</code><a class="copiable-link" href="#index-cov-3"> &para;</a></span></dt>
<dd><p>Compute the covariance matrix.
</p>
<p>The covariance between two variable vectors <var class="var">a</var> and <var class="var">b</var> is
calculated as:
</p>
<div class="example">
<pre class="example-preformatted">cov (<var class="var">a</var>,<var class="var">b</var>) = 1/(N-1) * SUM_i (<var class="var">a</var>(i) - mean (<var class="var">a</var>)) * (<var class="var">b</var>(i) - mean (<var class="var">b</var>))
</pre></div>

<p>where <em class="math">N</em> is the length of the vectors <var class="var">a</var> and <var class="var">b</var>.
</p>
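<p>As a minimal worked sketch of this formula (the vectors <code class="code">a</code> and <code class="code">b</code> are illustrative data, not taken from the manual), the sum can be evaluated directly:
</p>
<div class="example">
<pre class="example-preformatted">a = [1; 2; 3; 4];
b = [2; 4; 6; 9];
N = numel (a);
## Sum of the products of deviations from the means, divided by N-1
c_ab = sum ((a - mean (a)) .* (b - mean (b))) / (N - 1)
## c_ab is 11.5 / 3, approximately 3.8333
</pre></div>
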
<p>If called with one argument, compute <code class="code">cov (<var class="var">x</var>, <var class="var">x</var>)</code>.  If
<var class="var">x</var> is a vector, this is the scalar variance of <var class="var">x</var>.  If <var class="var">x</var> is
a matrix, each row of <var class="var">x</var> is treated as an observation, and each column
as a variable, and the (<var class="var">i</var>,&nbsp;<var class="var">j</var>)-th<!-- /@w -->&nbsp;entry of
<code class="code">cov (<var class="var">x</var>)</code> is the covariance between the <var class="var">i</var>-th and
<var class="var">j</var>-th columns in <var class="var">x</var>.  If <var class="var">x</var> has dimensions n x m, the output
<var class="var">c</var> will be an m x m square covariance matrix.
</p>
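<p>The following sketch illustrates the matrix case with made-up data (four observations of two variables); the values in the comments were worked out by hand from the formula and are approximate:
</p>
<div class="example">
<pre class="example-preformatted">x = [1 2; 2 4; 3 6; 4 9];   # rows are observations, columns are variables
c = cov (x);
size (c)    # 2 2, one row and one column per variable
c(1,1)      # variance of column 1, approximately 1.6667
c(1,2)      # covariance of columns 1 and 2, approximately 3.8333
c(2,2)      # variance of column 2, approximately 8.9167
</pre></div>
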
<p>If called with two arguments, compute <code class="code">cov (<var class="var">x</var>, <var class="var">y</var>)</code>, the
covariance between two random variables <var class="var">x</var> and <var class="var">y</var>.  <var class="var">x</var> and
<var class="var">y</var> must have the same number of elements, and will be treated as
vectors with the covariance computed as
<code class="code">cov (<var class="var">x</var>(:), <var class="var">y</var>(:))</code>.  The output will be a 2 x 2
covariance matrix.
</p>
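<p>A short sketch of the two-argument form, assuming the current behavior described above (inputs of any shape are flattened to vectors); the matrices are illustrative only:
</p>
<div class="example">
<pre class="example-preformatted">x = [1 2; 3 4];
y = [2 4; 6 9];
c = cov (x, y);
size (c)                     # 2 2, regardless of the shape of x and y
c2 = cov (x(:), y(:));       # the documented equivalent form
max (abs (c(:) - c2(:)))     # expected to be zero, or at rounding level
</pre></div>
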
<p>The optional argument <var class="var">opt</var> determines the type of normalization to
use.  Valid values are
</p>
<dl class="table">
<dt>0 [default]:</dt>
<dd><p>Normalize with <em class="math">N-1</em>.  This provides the best unbiased estimator of
the covariance.
</p>
</dd>
<dt>1:</dt>
<dd><p>Normalize with <em class="math">N</em>.  This provides the second moment around the
mean.  <var class="var">opt</var> is set to 1 for N = 1.
</p></dd>
</dl>
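
<p>The effect of <var class="var">opt</var> can be checked on a small vector. This is only a sketch with illustrative data; the sum of squared deviations here is 5, so the two normalizations give 5/3 and 5/4:
</p>
<div class="example">
<pre class="example-preformatted">x = [1; 2; 3; 4];      # deviations from the mean: -1.5 -0.5 0.5 1.5
v0 = cov (x)           # default, normalized by N-1 = 3: approximately 1.6667
v0b = cov (x, 0)       # same as the default
v1 = cov (x, 1)        # normalized by N = 4: 1.25
</pre></div>
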

<p>The optional argument <var class="var">nanflag</var> must appear last in the argument list
and controls how NaN values are handled by <code class="code">cov</code>.  The three valid
values are:
</p>
<dl class="table">
<dt>includenan [default]:</dt>
<dd><p>Leave NaN values in <var class="var">x</var> and <var class="var">y</var>.  Output will follow the normal
rules for handling NaN values in arithmetic operations.
</p>
</dd>
<dt>omitrows:</dt>
<dd><p>Rows containing NaN values are trimmed from both <var class="var">x</var> and <var class="var">y</var>
prior to calculating the covariance.  A NaN in one variable will remove
that row from both <var class="var">x</var> and <var class="var">y</var>.
</p>
</dd>
<dt>partialrows:</dt>
<dd><p>Rows containing NaN values are omitted from <var class="var">x</var> and <var class="var">y</var>
separately for each covariance calculation between the <var class="var">i</var>-th and
<var class="var">j</var>-th variables.  This may result in a different number of observations,
<em class="math">N</em>, being used to calculate each element of the covariance matrix.
</p></dd>
</dl>
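
<p>A hedged sketch comparing the three <var class="var">nanflag</var> settings on illustrative data with a single NaN; the comments restate the descriptions above rather than verified output:
</p>
<div class="example">
<pre class="example-preformatted">x = [1 2; 2 NaN; 3 6; 4 9];            # NaN in row 2, column 2
c_inc  = cov (x)                       # &quot;includenan&quot;: NaN propagates into
                                       # the entries that involve column 2
c_omit = cov (x, &quot;omitrows&quot;)           # row 2 is dropped for every entry
c_part = cov (x, &quot;partialrows&quot;)        # row 2 is dropped only for entries
                                       # that involve column 2
## With &quot;omitrows&quot; the result should match cov of the NaN-free rows
c_ref = cov (x([1 3 4], :))
</pre></div>
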

<p>Compatibility Note: Before Octave v9.1.0, <code class="code">cov</code> treated rows of
<var class="var">x</var> and <var class="var">y</var> as multivariate random variables.  Newer versions
attempt to maintain full compatibility with <small class="sc">MATLAB</small> by treating
<var class="var">x</var> and <var class="var">y</var> as two univariate distributions regardless of shape,
resulting in a 2 x 2 output matrix.  Code relying on Octave&rsquo;s previous
definition will need to be modified when running this newer version of
<code class="code">cov</code>.  The previous behavior can be obtained by using the
NaN package&rsquo;s <code class="code">covm</code> function as <code class="code">covm (<var class="var">x</var>, <var class="var">y</var>, &quot;D&quot;)</code>.
</p>
<p><strong class="strong">See also:</strong> <a class="ref" href="#XREFcorr">corr</a>.
</p></dd></dl>


<a class="anchor" id="XREFcorr"></a><span style="display:block; margin-top:-4.5ex;">&nbsp;</span>


<dl class="first-deftypefn">
<dt class="deftypefn" id="index-corr"><span><code class="def-type"><var class="var">r</var> =</code> <strong class="def-name">corr</strong> <code class="def-code-arguments">(<var class="var">x</var>)</code><a class="copiable-link" href="#index-corr"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-corr-1"><span><code class="def-type"><var class="var">r</var> =</code> <strong class="def-name">corr</strong> <code class="def-code-arguments">(<var class="var">x</var>, <var class="var">y</var>)</code><a class="copiable-link" href="#index-corr-1"> &para;</a></span></dt>
<dd><p>Compute matrix of correlation coefficients.
</p>
<p>If each row of <var class="var">x</var> and <var class="var">y</var> is an observation and each column is
a variable, then the (<var class="var">i</var>,&nbsp;<var class="var">j</var>)-th<!-- /@w -->&nbsp;entry of
<code class="code">corr (<var class="var">x</var>, <var class="var">y</var>)</code> is the correlation between the
<var class="var">i</var>-th variable in <var class="var">x</var> and the <var class="var">j</var>-th variable in <var class="var">y</var>.
<var class="var">x</var> and <var class="var">y</var> must have the same number of rows (observations).  The
correlation coefficient is calculated for two variable vectors <var class="var">A</var> and
<var class="var">B</var> (columns of <var class="var">x</var> and <var class="var">y</var>) as:
</p>
<div class="example">
<pre class="example-preformatted">corr (<var class="var">A</var>,<var class="var">B</var>) = cov (<var class="var">A</var>,<var class="var">B</var>) / (std (<var class="var">A</var>) * std (<var class="var">B</var>))
</pre></div>

<p>The output variable <var class="var">r</var> will have size n x m, where n and m are the
number of variables (columns) in <var class="var">x</var> and <var class="var">y</var>, respectively.  Note
that as the standard deviation of any scalar is zero, the correlation
coefficient will be returned as NaN for any scalar or single-row inputs.
</p>
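<p>The formula can be checked by hand on illustrative data. In this sketch the scalar covariance is taken from the (1,&nbsp;2) entry of the 2 x 2 matrix that <code class="code">cov</code> returns for two vector inputs in current Octave; the variable names are arbitrary:
</p>
<div class="example">
<pre class="example-preformatted">x = [1 2; 2 4; 3 6; 4 9];
A = x(:,1);
B = x(:,2);
c = cov (A, B);                              # 2 x 2 covariance matrix
r_manual = c(1,2) / (std (A) * std (B))      # approximately 0.9944
r = corr (A, B)                              # should agree with r_manual
## Two variables in x against three variables in y gives a 2 x 3 result
size (corr (x, [A, B, A]))                   # 2 3
</pre></div>
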
<p>If called with one argument, compute <code class="code">corr (<var class="var">x</var>, <var class="var">x</var>)</code>,
the correlation between each pair of columns of <var class="var">x</var>.
</p>
<p><strong class="strong">See also:</strong> <a class="ref" href="#XREFcov">cov</a>, <a class="ref" href="#XREFcorrcoef">corrcoef</a>.
</p></dd></dl>


<a class="anchor" id="XREFcorrcoef"></a><span style="display:block; margin-top:-4.5ex;">&nbsp;</span>


<dl class="first-deftypefn">
<dt class="deftypefn" id="index-corrcoef"><span><code class="def-type"><var class="var">r</var> =</code> <strong class="def-name">corrcoef</strong> <code class="def-code-arguments">(<var class="var">x</var>)</code><a class="copiable-link" href="#index-corrcoef"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-corrcoef-1"><span><code class="def-type"><var class="var">r</var> =</code> <strong class="def-name">corrcoef</strong> <code class="def-code-arguments">(<var class="var">x</var>, <var class="var">y</var>)</code><a class="copiable-link" href="#index-corrcoef-1"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-corrcoef-2"><span><code class="def-type"><var class="var">r</var> =</code> <strong class="def-name">corrcoef</strong> <code class="def-code-arguments">(&hellip;, <var class="var">param</var>, <var class="var">value</var>, &hellip;)</code><a class="copiable-link" href="#index-corrcoef-2"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-corrcoef-3"><span><code class="def-type">[<var class="var">r</var>, <var class="var">p</var>] =</code> <strong class="def-name">corrcoef</strong> <code class="def-code-arguments">(&hellip;)</code><a class="copiable-link" href="#index-corrcoef-3"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-corrcoef-4"><span><code class="def-type">[<var class="var">r</var>, <var class="var">p</var>, <var class="var">lci</var>, <var class="var">hci</var>] =</code> <strong class="def-name">corrcoef</strong> <code class="def-code-arguments">(&hellip;)</code><a class="copiable-link" href="#index-corrcoef-4"> &para;</a></span></dt>
<dd><p>Compute a matrix of correlation coefficients.
</p>
<p><var class="var">x</var> is an array where each column contains a variable and each row is
an observation.
</p>
<p>If a second input <var class="var">y</var> (of the same size as <var class="var">x</var>) is given then
calculate the correlation coefficients between <var class="var">x</var> and <var class="var">y</var>.
</p>
<p><var class="var">param</var>, <var class="var">value</var> are optional pairs of parameters and values which
modify the calculation.  Valid options are:
</p>
<dl class="table">
<dt><code class="code">&quot;alpha&quot;</code></dt>
<dd><p>Significance level used for the bounds of the confidence interval, <var class="var">lci</var>
and <var class="var">hci</var>.  Default is 0.05, i.e., a 95% confidence interval.
</p>
</dd>
<dt><code class="code">&quot;rows&quot;</code></dt>
<dd><p>Determine processing of NaN values.  Acceptable values are <code class="code">&quot;all&quot;</code>,
<code class="code">&quot;complete&quot;</code>, and <code class="code">&quot;pairwise&quot;</code>.  Default is <code class="code">&quot;all&quot;</code>.
With <code class="code">&quot;complete&quot;</code>, only the rows without NaN values are considered.
With <code class="code">&quot;pairwise&quot;</code>, the selection of NaN-free rows is made for each
pair of variables.
</p></dd>
</dl>

<p>Output <var class="var">r</var> is a matrix of Pearson&rsquo;s product moment correlation
coefficients for each pair of variables.
</p>
<p>Output <var class="var">p</var> is a matrix of pair-wise p-values testing for the null
hypothesis of a correlation coefficient of zero.
</p>
<p>Outputs <var class="var">lci</var> and <var class="var">hci</var> are matrices containing, respectively, the
lower and upper bounds of the confidence interval of each correlation
coefficient (95% unless <code class="code">&quot;alpha&quot;</code> is changed).
</p>
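<p>A usage sketch covering the optional outputs and both parameters. The data are illustrative and no particular numeric results are asserted:
</p>
<div class="example">
<pre class="example-preformatted">x = [1 2; 2 4; 3 6; 4 9; 5 9];
[r, p, lci, hci] = corrcoef (x);                         # default alpha = 0.05
[r99, p99, lci99, hci99] = corrcoef (x, &quot;alpha&quot;, 0.01);  # 99% interval
r(1,2)                       # Pearson correlation of columns 1 and 2
[lci(1,2), hci(1,2)]         # 95% bounds
[lci99(1,2), hci99(1,2)]     # 99% bounds, expected to be at least as wide
## NaN handling: keep only the rows that contain no NaN
xn = [x; 6 NaN];
rc = corrcoef (xn, &quot;rows&quot;, &quot;complete&quot;);   # should match corrcoef (x)
</pre></div>
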
<p><strong class="strong">See also:</strong> <a class="ref" href="#XREFcorr">corr</a>, <a class="ref" href="#XREFcov">cov</a>, <a class="ref" href="Descriptive-Statistics.html#XREFstd">std</a>.
</p></dd></dl>


<a class="anchor" id="XREFspearman"></a><span style="display:block; margin-top:-4.5ex;">&nbsp;</span>


<dl class="first-deftypefn">
<dt class="deftypefn" id="index-spearman"><span><code class="def-type"><var class="var">rho</var> =</code> <strong class="def-name">spearman</strong> <code class="def-code-arguments">(<var class="var">x</var>)</code><a class="copiable-link" href="#index-spearman"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-spearman-1"><span><code class="def-type"><var class="var">rho</var> =</code> <strong class="def-name">spearman</strong> <code class="def-code-arguments">(<var class="var">x</var>, <var class="var">y</var>)</code><a class="copiable-link" href="#index-spearman-1"> &para;</a></span></dt>
<dd><a class="index-entry-id" id="index-Spearman_0027s-Rho"></a>
<p>Compute Spearman&rsquo;s rank correlation coefficient
<var class="var">rho</var>.
</p>
<p>For two data vectors <var class="var">x</var> and <var class="var">y</var>, Spearman&rsquo;s
<var class="var">rho</var>
is the correlation coefficient of the ranks of <var class="var">x</var> and <var class="var">y</var>.
</p>
<p>If <var class="var">x</var> and <var class="var">y</var> are drawn from independent distributions,
<var class="var">rho</var>
has zero mean and variance
<code class="code">1 / (N - 1)</code>,
where <em class="math">N</em> is the length of the <var class="var">x</var> and <var class="var">y</var> vectors, and is
asymptotically normally distributed.
</p>
<p><code class="code">spearman (<var class="var">x</var>)</code> is equivalent to
<code class="code">spearman (<var class="var">x</var>, <var class="var">x</var>)</code>.
</p>
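<p>The definition can be illustrated by comparing <code class="code">spearman</code> with the ordinary correlation of the ranks. This is a sketch on made-up data; the z-score at the end simply applies the variance <code class="code">1 / (N - 1)</code> quoted above and is a rough large-sample approximation, not a test provided by this function:
</p>
<div class="example">
<pre class="example-preformatted">x = [10; 20; 30; 40; 50];
y = [1; 4; 9; 16; 100];                   # increasing in x, but not linearly
rho = spearman (x, y)                     # ranks agree, so rho is 1 up to rounding
r_ranks = corr (ranks (x), ranks (y))     # same quantity computed from the ranks
r_pearson = corr (x, y)                   # Pearson correlation, below 1 here
N = numel (x);
z = rho * sqrt (N - 1)                    # rho divided by sqrt (1 / (N - 1))
</pre></div>
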
<p><strong class="strong">See also:</strong> <a class="ref" href="Basic-Statistical-Functions.html#XREFranks">ranks</a>, <a class="ref" href="#XREFkendall">kendall</a>.
</p></dd></dl>


<a class="anchor" id="XREFkendall"></a><span style="display:block; margin-top:-4.5ex;">&nbsp;</span>


<dl class="first-deftypefn">
<dt class="deftypefn" id="index-kendall"><span><code class="def-type"><var class="var">tau</var> =</code> <strong class="def-name">kendall</strong> <code class="def-code-arguments">(<var class="var">x</var>)</code><a class="copiable-link" href="#index-kendall"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-kendall-1"><span><code class="def-type"><var class="var">tau</var> =</code> <strong class="def-name">kendall</strong> <code class="def-code-arguments">(<var class="var">x</var>, <var class="var">y</var>)</code><a class="copiable-link" href="#index-kendall-1"> &para;</a></span></dt>
<dd><a class="index-entry-id" id="index-Kendall_0027s-Tau"></a>
<p>Compute Kendall&rsquo;s
<var class="var">tau</var>.
</p>
<p>For two data vectors <var class="var">x</var>, <var class="var">y</var> of common length <em class="math">N</em>, Kendall&rsquo;s
<var class="var">tau</var>
is the correlation of the signs of all rank differences of
<var class="var">x</var> and <var class="var">y</var>; i.e., if both <var class="var">x</var> and <var class="var">y</var> have distinct
entries, then
</p>
<div class="example">
<div class="group"><pre class="example-preformatted">         1
<var class="var">tau</var> = -------   SUM sign (<var class="var">q</var>(i) - <var class="var">q</var>(j)) * sign (<var class="var">r</var>(i) - <var class="var">r</var>(j))
      N (N-1)   i,j
</pre></div></div>

<p>in which the
<var class="var">q</var>(i) and <var class="var">r</var>(i)
are the ranks of <var class="var">x</var> and <var class="var">y</var>, respectively.
</p>
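<p>The sum can be evaluated directly for a small example with distinct entries. The sketch below uses illustrative data and relies on broadcasting to form all pairwise rank differences; of the 10 unordered pairs, 8 are concordant and 2 are discordant, so the double sum is 12 and the result is 12/20:
</p>
<div class="example">
<pre class="example-preformatted">x = [1; 2; 3; 4; 5];
y = [2; 1; 4; 3; 5];
N = numel (x);
q = ranks (x);
r = ranks (y);
## N x N matrices of sign (q(i) - q(j)) and sign (r(i) - r(j))
S = sum (sum (sign (q - q') .* sign (r - r')));
tau_manual = S / (N * (N - 1))     # 0.6
tau = kendall (x, y)               # should agree with tau_manual
</pre></div>
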
<p>If <var class="var">x</var> and <var class="var">y</var> are drawn from independent distributions,
Kendall&rsquo;s
<var class="var">tau</var>
is asymptotically normal with mean 0 and variance
<code class="code">(2 * (2N+5)) / (9 * N * (N-1))</code>.
</p>
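<p>As a sketch of how this variance might be used (the normal approximation is asymptotic, so treat it as rough for small <em class="math">N</em>; the data and variable names are illustrative), a z-score and two-sided p-value can be formed as:
</p>
<div class="example">
<pre class="example-preformatted">x = [1; 2; 3; 4; 5; 6; 7; 8];
y = [2; 1; 4; 3; 5; 8; 6; 7];
N = numel (x);
tau = kendall (x, y);
v = (2 * (2*N + 5)) / (9 * N * (N - 1));   # variance under independence, 1/12 here
z = tau / sqrt (v)
p = erfc (abs (z) / sqrt (2))              # two-sided normal tail probability
</pre></div>
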
<p><code class="code">kendall (<var class="var">x</var>)</code> is equivalent to <code class="code">kendall (<var class="var">x</var>,
<var class="var">x</var>)</code>.
</p>
<p><strong class="strong">See also:</strong> <a class="ref" href="Basic-Statistical-Functions.html#XREFranks">ranks</a>, <a class="ref" href="#XREFspearman">spearman</a>.
</p></dd></dl>


</div>
<hr>
<div class="nav-panel">
<p>
Next: <a href="Distributions.html">Distributions</a>, Previous: <a href="Basic-Statistical-Functions.html">Basic Statistical Functions</a>, Up: <a href="Statistics.html">Statistics</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html" title="Index" rel="index">Index</a>]</p>
</div>



</body>
</html>