<!DOCTYPE html>
<html>
<!-- Created by GNU Texinfo 7.1.1, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Correlation and Regression Analysis (GNU Octave (version 10.3.0))</title>
<meta name="description" content="Correlation and Regression Analysis (GNU Octave (version 10.3.0))">
<meta name="keywords" content="Correlation and Regression Analysis (GNU Octave (version 10.3.0))">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="index.html" rel="start" title="Top">
<link href="Concept-Index.html" rel="index" title="Concept Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Statistics.html" rel="up" title="Statistics">
<link href="Distributions.html" rel="next" title="Distributions">
<link href="Basic-Statistical-Functions.html" rel="prev" title="Basic Statistical Functions">
<style type="text/css">
<!--
a.copiable-link {visibility: hidden; text-decoration: none; line-height: 0em}
div.example {margin-left: 3.2em}
span:hover a.copiable-link {visibility: visible}
strong.def-name {font-family: monospace; font-weight: bold; font-size: larger}
-->
</style>
<link rel="stylesheet" type="text/css" href="octave.css">
</head>
<body lang="en">
<div class="section-level-extent" id="Correlation-and-Regression-Analysis">
<div class="nav-panel">
<p>
Next: <a href="Distributions.html" accesskey="n" rel="next">Distributions</a>, Previous: <a href="Basic-Statistical-Functions.html" accesskey="p" rel="prev">Basic Statistical Functions</a>, Up: <a href="Statistics.html" accesskey="u" rel="up">Statistics</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<h3 class="section" id="Correlation-and-Regression-Analysis-1"><span>26.4 Correlation and Regression Analysis<a class="copiable-link" href="#Correlation-and-Regression-Analysis-1"> ¶</a></span></h3>
<a class="anchor" id="XREFcov"></a><span style="display:block; margin-top:-4.5ex;"> </span>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-cov"><span><code class="def-type"><var class="var">c</var> =</code> <strong class="def-name">cov</strong> <code class="def-code-arguments">(<var class="var">x</var>)</code><a class="copiable-link" href="#index-cov"> ¶</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-cov-1"><span><code class="def-type"><var class="var">c</var> =</code> <strong class="def-name">cov</strong> <code class="def-code-arguments">(<var class="var">x</var>, <var class="var">y</var>)</code><a class="copiable-link" href="#index-cov-1"> ¶</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-cov-2"><span><code class="def-type"><var class="var">c</var> =</code> <strong class="def-name">cov</strong> <code class="def-code-arguments">(…, <var class="var">opt</var>)</code><a class="copiable-link" href="#index-cov-2"> ¶</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-cov-3"><span><code class="def-type"><var class="var">c</var> =</code> <strong class="def-name">cov</strong> <code class="def-code-arguments">(…, <var class="var">nanflag</var>)</code><a class="copiable-link" href="#index-cov-3"> ¶</a></span></dt>
<dd><p>Compute the covariance matrix.
</p>
<p>The covariance between two variable vectors <var class="var">a</var> and <var class="var">b</var> is
calculated as:
</p>
<div class="example">
<pre class="example-preformatted">cov (<var class="var">a</var>,<var class="var">b</var>) = 1/(N-1) * SUM_i (<var class="var">a</var>(i) - mean (<var class="var">a</var>)) * (<var class="var">b</var>(i) - mean (<var class="var">b</var>))
</pre></div>
<p>where <em class="math">N</em> is the length of the vectors <var class="var">a</var> and <var class="var">b</var>.
</p>
<p>If called with one argument, compute <code class="code">cov (<var class="var">x</var>, <var class="var">x</var>)</code>. If
<var class="var">x</var> is a vector, this is the scalar variance of <var class="var">x</var>. If <var class="var">x</var> is
a matrix, each row of <var class="var">x</var> is treated as an observation, and each column
as a variable, and the (<var class="var">i</var>, <var class="var">j</var>)-th<!-- /@w --> entry of
<code class="code">cov (<var class="var">x</var>)</code> is the covariance between the <var class="var">i</var>-th and
<var class="var">j</var>-th columns in <var class="var">x</var>. If <var class="var">x</var> has dimensions n x m, the output
<var class="var">c</var> will be an m x m square covariance matrix.
</p>
<p>If called with two arguments, compute <code class="code">cov (<var class="var">x</var>, <var class="var">y</var>)</code>, the
covariance between two random variables <var class="var">x</var> and <var class="var">y</var>. <var class="var">x</var> and
<var class="var">y</var> must have the same number of elements, and will be treated as
vectors with the covariance computed as
<code class="code">cov (<var class="var">x</var>(:), <var class="var">y</var>(:))</code>. The output will be a 2 x 2
covariance matrix.
</p>
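<p>As a minimal sketch of the one- and two-argument forms described above
(the data and the variable names <code class="code">c1</code>, <code class="code">c2</code>, and <code class="code">c_ab</code> are
illustrative assumptions, and the two-argument behavior is the one documented
for Octave 9.1.0 and later), the off-diagonal entry can be checked against
the defining sum:
</p>
<div class="example">
<pre class="example-preformatted">x = [1 2; 2 4; 3 7; 4 8];   # 4 observations (rows), 2 variables (columns)
c1 = cov (x);               # 2 x 2 covariance matrix of the columns

a = x(:,1);
b = x(:,2);
N = numel (a);
# Defining formula: 1/(N-1) * SUM_i (a(i) - mean (a)) * (b(i) - mean (b))
c_ab = sum ((a - mean (a)) .* (b - mean (b))) / (N - 1);

c2 = cov (a, b);            # two vectors: also a 2 x 2 matrix
</pre></div>
<p>Here <code class="code">c1(1,2)</code>, <code class="code">c2(1,2)</code>, and <code class="code">c_ab</code> should all agree,
and the diagonal of <code class="code">c2</code> holds the variances of <code class="code">a</code> and <code class="code">b</code>.
</p>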
<p>The optional argument <var class="var">opt</var> determines the type of normalization to
use. Valid values are
</p>
<dl class="table">
<dt>0 [default]:</dt>
<dd><p>Normalize with <em class="math">N-1</em>. This provides the best unbiased estimator of
the covariance.
</p>
</dd>
<dt>1:</dt>
<dd><p>Normalize with <em class="math">N</em>. This provides the second moment around the
mean. <var class="var">opt</var> is set to 1 for N = 1.
</p></dd>
</dl>
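<p>The effect of <var class="var">opt</var> can be illustrated with a short sketch. The
data below are made up for illustration, and the three-argument call is
used so that the trailing <code class="code">1</code> is unambiguously the normalization
option:
</p>
<div class="example">
<pre class="example-preformatted">a = [2; 4; 4; 6];
b = [1; 3; 2; 6];
N = numel (a);

c0 = cov (a, b);      # default opt = 0: normalize with N-1
c1 = cov (a, b, 1);   # opt = 1: normalize with N

# The two results differ only by the factor (N-1)/N.
ratio = (N - 1) / N;
</pre></div>
<p>Every entry of <code class="code">c1</code> should equal the corresponding entry of
<code class="code">c0</code> multiplied by <code class="code">ratio</code>.
</p>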
<p>The optional argument <var class="var">nanflag</var> must appear last in the argument list
and controls how NaN values are handled by <code class="code">cov</code>. The three valid
values are:
</p>
<dl class="table">
<dt>includenan [default]:</dt>
<dd><p>Leave NaN values in <var class="var">x</var> and <var class="var">y</var>. Output will follow the normal
rules for handling NaN values in arithmetic operations.
</p>
</dd>
<dt>omitrows:</dt>
<dd><p>Rows containing NaN values are trimmed from both <var class="var">x</var> and <var class="var">y</var>
prior to calculating the covariance. A NaN in one variable will remove
that row from both <var class="var">x</var> and <var class="var">y</var>.
</p>
</dd>
<dt>partialrows:</dt>
<dd><p>Rows containing NaN values are excluded from both <var class="var">x</var> and <var class="var">y</var>
independently for each <var class="var">i</var>-th and <var class="var">j</var>-th covariance
calculation. This may result in a different number of observations,
<em class="math">N</em>, being used to calculate each element of the covariance matrix.
</p></dd>
</dl>
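<p>A small sketch of the <var class="var">nanflag</var> options, assuming the flag strings
listed above are passed as the last argument. The data and the names
<code class="code">c_inc</code>, <code class="code">c_omit</code>, and <code class="code">c_ref</code> are illustrative only:
</p>
<div class="example">
<pre class="example-preformatted">x = [1; 2; NaN; 4; 5];
y = [2; 4; 6; 8; 11];

c_inc  = cov (x, y);               # default "includenan": NaN propagates
c_omit = cov (x, y, "omitrows");   # drop row 3 from both x and y first

# Reference: remove the NaN row by hand, then call cov normally.
keep  = ! isnan (x);
c_ref = cov (x(keep), y(keep));
</pre></div>
<p>With the default, the entries of <code class="code">c_inc</code> that involve <code class="code">x</code> are
NaN, while <code class="code">c_omit</code> should match <code class="code">c_ref</code>.
</p>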
<p>Compatibility Note: Before Octave v9.1.0, <code class="code">cov</code> treated rows of
<var class="var">x</var> and <var class="var">y</var> as multivariate random variables. Newer versions
attempt to maintain full compatibility with <small class="sc">MATLAB</small> by treating
<var class="var">x</var> and <var class="var">y</var> as two univariate distributions regardless of shape,
resulting in a 2x2 output matrix. Code relying on Octave’s previous
definition will need to be modified when running this newer version of
<code class="code">cov</code>. The previous behavior can be obtained by using the
NaN package’s <code class="code">covm</code> function as <code class="code">covm (<var class="var">x</var>, <var class="var">y</var>, "D")</code>.
</p>
<p><strong class="strong">See also:</strong> <a class="ref" href="#XREFcorr">corr</a>.
</p></dd></dl>
<a class="anchor" id="XREFcorr"></a><span style="display:block; margin-top:-4.5ex;"> </span>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-corr"><span><code class="def-type"><var class="var">r</var> =</code> <strong class="def-name">corr</strong> <code class="def-code-arguments">(<var class="var">x</var>)</code><a class="copiable-link" href="#index-corr"> ¶</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-corr-1"><span><code class="def-type"><var class="var">r</var> =</code> <strong class="def-name">corr</strong> <code class="def-code-arguments">(<var class="var">x</var>, <var class="var">y</var>)</code><a class="copiable-link" href="#index-corr-1"> ¶</a></span></dt>
<dd><p>Compute matrix of correlation coefficients.
</p>
<p>If each row of <var class="var">x</var> and <var class="var">y</var> is an observation and each column is
a variable, then the (<var class="var">i</var>, <var class="var">j</var>)-th<!-- /@w --> entry of
<code class="code">corr (<var class="var">x</var>, <var class="var">y</var>)</code> is the correlation between the
<var class="var">i</var>-th variable in <var class="var">x</var> and the <var class="var">j</var>-th variable in <var class="var">y</var>.
<var class="var">x</var> and <var class="var">y</var> must have the same number of rows (observations). The
correlation coefficient is calculated for two variable vectors <var class="var">A</var> and
<var class="var">B</var> (columns of <var class="var">x</var> and <var class="var">y</var>) as:
</p>
<div class="example">
<pre class="example-preformatted">corr (<var class="var">A</var>,<var class="var">B</var>) = cov (<var class="var">A</var>,<var class="var">B</var>) / (std (<var class="var">A</var>) * std (<var class="var">B</var>))
</pre></div>
<p>The output variable <var class="var">r</var> will have size n x m, where n and m are the
number of variables (columns) in <var class="var">x</var> and <var class="var">y</var>, respectively. Note
that as the standard deviation of any scalar is zero, the correlation
coefficient will be returned as NaN for any scalar or single-row inputs.
</p>
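<p>The formula and the output size can be checked with a brief sketch. The
data and the names <code class="code">c</code> and <code class="code">r11</code> are illustrative assumptions:
</p>
<div class="example">
<pre class="example-preformatted">x = [1 2; 2 4; 3 7; 4 8];   # 2 variables, 4 observations
y = [1; 3; 2; 5];           # 1 variable, 4 observations

r = corr (x, y);            # size 2 x 1: columns of x against columns of y

# Same value from the definition, for the first column of x and y.
A = x(:,1);
B = y;
c = cov (A, B);             # 2 x 2 matrix; the covariance is c(1,2)
r11 = c(1,2) / (std (A) * std (B));
</pre></div>
<p>The value <code class="code">r(1)</code> should agree with <code class="code">r11</code> up to rounding.
</p>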
<p>If called with one argument, compute <code class="code">corr (<var class="var">x</var>, <var class="var">x</var>)</code>,
the correlation between each pair of columns of <var class="var">x</var>.
</p>
<p><strong class="strong">See also:</strong> <a class="ref" href="#XREFcov">cov</a>, <a class="ref" href="#XREFcorrcoef">corrcoef</a>.
</p></dd></dl>
<a class="anchor" id="XREFcorrcoef"></a><span style="display:block; margin-top:-4.5ex;"> </span>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-corrcoef"><span><code class="def-type"><var class="var">r</var> =</code> <strong class="def-name">corrcoef</strong> <code class="def-code-arguments">(<var class="var">x</var>)</code><a class="copiable-link" href="#index-corrcoef"> ¶</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-corrcoef-1"><span><code class="def-type"><var class="var">r</var> =</code> <strong class="def-name">corrcoef</strong> <code class="def-code-arguments">(<var class="var">x</var>, <var class="var">y</var>)</code><a class="copiable-link" href="#index-corrcoef-1"> ¶</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-corrcoef-2"><span><code class="def-type"><var class="var">r</var> =</code> <strong class="def-name">corrcoef</strong> <code class="def-code-arguments">(…, <var class="var">param</var>, <var class="var">value</var>, …)</code><a class="copiable-link" href="#index-corrcoef-2"> ¶</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-corrcoef-3"><span><code class="def-type">[<var class="var">r</var>, <var class="var">p</var>] =</code> <strong class="def-name">corrcoef</strong> <code class="def-code-arguments">(…)</code><a class="copiable-link" href="#index-corrcoef-3"> ¶</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-corrcoef-4"><span><code class="def-type">[<var class="var">r</var>, <var class="var">p</var>, <var class="var">lci</var>, <var class="var">hci</var>] =</code> <strong class="def-name">corrcoef</strong> <code class="def-code-arguments">(…)</code><a class="copiable-link" href="#index-corrcoef-4"> ¶</a></span></dt>
<dd><p>Compute a matrix of correlation coefficients.
</p>
<p><var class="var">x</var> is an array where each column contains a variable and each row is
an observation.
</p>
<p>If a second input <var class="var">y</var> (of the same size as <var class="var">x</var>) is given then
calculate the correlation coefficients between <var class="var">x</var> and <var class="var">y</var>.
</p>
<p><var class="var">param</var>, <var class="var">value</var> are optional pairs of parameters and values which
modify the calculation. Valid options are:
</p>
<dl class="table">
<dt><code class="code">"alpha"</code></dt>
<dd><p>Significance level used for the bounds of the confidence interval, <var class="var">lci</var>
and <var class="var">hci</var>. Default is 0.05, i.e., a 95% confidence interval.
</p>
</dd>
<dt><code class="code">"rows"</code></dt>
<dd><p>Determine processing of NaN values. Acceptable values are <code class="code">"all"</code>,
<code class="code">"complete"</code>, and <code class="code">"pairwise"</code>. Default is <code class="code">"all"</code>.
With <code class="code">"complete"</code>, only the rows without NaN values are considered.
With <code class="code">"pairwise"</code>, the selection of NaN-free rows is made for each
pair of variables.
</p></dd>
</dl>
<p>Output <var class="var">r</var> is a matrix of Pearson’s product moment correlation
coefficients for each pair of variables.
</p>
<p>Output <var class="var">p</var> is a matrix of pair-wise p-values testing for the null
hypothesis of a correlation coefficient of zero.
</p>
<p>Outputs <var class="var">lci</var> and <var class="var">hci</var> are matrices containing, respectively, the
lower and upper bounds of the confidence interval of each correlation
coefficient (95% by default; see the <code class="code">"alpha"</code> option).
</p>
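<p>A hedged usage sketch of the multiple-output form and of the
<code class="code">"alpha"</code> option follows. The data are invented for illustration and
the output names with the suffix <code class="code">90</code> are hypothetical:
</p>
<div class="example">
<pre class="example-preformatted">x = [1 2 3 4 5 6]';
y = [2 1 4 3 6 5]';

# Default alpha = 0.05, i.e., 95% confidence bounds.
[r, p, lci, hci] = corrcoef (x, y);

# alpha = 0.1 requests 90% bounds instead.
[r90, p90, lci90, hci90] = corrcoef (x, y, "alpha", 0.1);

width95 = hci(1,2) - lci(1,2);
width90 = hci90(1,2) - lci90(1,2);
</pre></div>
<p>For two vectors, <code class="code">r</code> is a 2 x 2 matrix whose off-diagonal entry is
the correlation between <code class="code">x</code> and <code class="code">y</code>. The coefficient itself
does not depend on <code class="code">"alpha"</code>, and <code class="code">width90</code> is expected to be
smaller than <code class="code">width95</code>.
</p>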
<p><strong class="strong">See also:</strong> <a class="ref" href="#XREFcorr">corr</a>, <a class="ref" href="#XREFcov">cov</a>, <a class="ref" href="Descriptive-Statistics.html#XREFstd">std</a>.
</p></dd></dl>
<a class="anchor" id="XREFspearman"></a><span style="display:block; margin-top:-4.5ex;"> </span>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-spearman"><span><code class="def-type"><var class="var">rho</var> =</code> <strong class="def-name">spearman</strong> <code class="def-code-arguments">(<var class="var">x</var>)</code><a class="copiable-link" href="#index-spearman"> ¶</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-spearman-1"><span><code class="def-type"><var class="var">rho</var> =</code> <strong class="def-name">spearman</strong> <code class="def-code-arguments">(<var class="var">x</var>, <var class="var">y</var>)</code><a class="copiable-link" href="#index-spearman-1"> ¶</a></span></dt>
<dd><a class="index-entry-id" id="index-Spearman_0027s-Rho"></a>
<p>Compute Spearman’s rank correlation coefficient
<var class="var">rho</var>.
</p>
<p>For two data vectors <var class="var">x</var> and <var class="var">y</var>, Spearman’s
<var class="var">rho</var>
is the correlation coefficient of the ranks of <var class="var">x</var> and <var class="var">y</var>.
</p>
<p>If <var class="var">x</var> and <var class="var">y</var> are drawn from independent distributions,
<var class="var">rho</var>
has zero mean and variance
<code class="code">1 / (N - 1)</code>,
where <em class="math">N</em> is the length of the <var class="var">x</var> and <var class="var">y</var> vectors, and is
asymptotically normally distributed.
</p>
<p><code class="code">spearman (<var class="var">x</var>)</code> is equivalent to
<code class="code">spearman (<var class="var">x</var>, <var class="var">x</var>)</code>.
</p>
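<p>The definition can be illustrated by correlating the ranks directly. This
is a sketch with made-up data, and <code class="code">rho_ranks</code> is an illustrative
name:
</p>
<div class="example">
<pre class="example-preformatted">x = [10; 20; 30; 40; 50];
y = [1; 27; 8; 64; 100];   # nonlinear in x, with one swapped pair

rho = spearman (x, y);

# Same quantity from the definition: correlation of the ranks.
rho_ranks = corr (ranks (x), ranks (y));
</pre></div>
<p>Both values should agree. Because only the ranks matter, applying any
strictly increasing transformation to <code class="code">y</code> leaves <code class="code">rho</code>
unchanged.
</p>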
<p><strong class="strong">See also:</strong> <a class="ref" href="Basic-Statistical-Functions.html#XREFranks">ranks</a>, <a class="ref" href="#XREFkendall">kendall</a>.
</p></dd></dl>
<a class="anchor" id="XREFkendall"></a><span style="display:block; margin-top:-4.5ex;"> </span>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-kendall"><span><code class="def-type"><var class="var">tau</var> =</code> <strong class="def-name">kendall</strong> <code class="def-code-arguments">(<var class="var">x</var>)</code><a class="copiable-link" href="#index-kendall"> ¶</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-kendall-1"><span><code class="def-type"><var class="var">tau</var> =</code> <strong class="def-name">kendall</strong> <code class="def-code-arguments">(<var class="var">x</var>, <var class="var">y</var>)</code><a class="copiable-link" href="#index-kendall-1"> ¶</a></span></dt>
<dd><a class="index-entry-id" id="index-Kendall_0027s-Tau"></a>
<p>Compute Kendall’s
<var class="var">tau</var>.
</p>
<p>For two data vectors <var class="var">x</var>, <var class="var">y</var> of common length <em class="math">N</em>, Kendall’s
<var class="var">tau</var>
is the correlation of the signs of all rank differences of
<var class="var">x</var> and <var class="var">y</var>; i.e., if both <var class="var">x</var> and <var class="var">y</var> have distinct
entries, then
</p>
<div class="example">
<div class="group"><pre class="example-preformatted">         1
<var class="var">tau</var> = -------   SUM sign (<var class="var">q</var>(i) - <var class="var">q</var>(j)) * sign (<var class="var">r</var>(i) - <var class="var">r</var>(j))
      N (N-1)   i,j
</pre></div></div>
<p>in which the
<var class="var">q</var>(i) and <var class="var">r</var>(i)
are the ranks of <var class="var">x</var> and <var class="var">y</var>, respectively.
</p>
<p>If <var class="var">x</var> and <var class="var">y</var> are drawn from independent distributions,
Kendall’s
<var class="var">tau</var>
is asymptotically normal with mean 0 and variance
<code class="code">(2 * (2N+5)) / (9 * N * (N-1))</code>.
</p>
<p><code class="code">kendall (<var class="var">x</var>)</code> is equivalent to <code class="code">kendall (<var class="var">x</var>,
<var class="var">x</var>)</code>.
</p>
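<p>For data without ties, the double sum in the formula can be evaluated
directly and compared with <code class="code">kendall</code>. This is a sketch with
illustrative data; the names <code class="code">s</code> and <code class="code">tau_direct</code> are assumptions
made for the example:
</p>
<div class="example">
<pre class="example-preformatted">x = [1; 2; 3; 4; 5];
y = [2; 1; 4; 3; 5];
N = numel (x);

q = ranks (x);
r = ranks (y);

# Broadcasting q - q' forms all N x N rank differences q(i) - q(j).
s = sign (q - q') .* sign (r - r');
tau_direct = sum (s(:)) / (N * (N - 1));

tau = kendall (x, y);
</pre></div>
<p>Of the 10 pairs in this data set, 8 are concordant and 2 are discordant, so
both <code class="code">tau_direct</code> and <code class="code">tau</code> should be (8 - 2) / 10.
</p>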
<p><strong class="strong">See also:</strong> <a class="ref" href="Basic-Statistical-Functions.html#XREFranks">ranks</a>, <a class="ref" href="#XREFspearman">spearman</a>.
</p></dd></dl>
</div>
</body>
</html>