<!DOCTYPE html>

<html>

<head>

<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta http-equiv="X-UA-Compatible" content="IE=EDGE" />

<meta name="viewport" content="width=device-width, initial-scale=1" />

<meta name="author" content="Jonah Gabry and Ben Goodrich" />

<meta name="date" content="2020-07-19" />

<title>Estimating Generalized (Non-)Linear Models with Group-Specific Terms with rstanarm</title>



<style type="text/css">code{white-space: pre;}</style>
<style type="text/css" data-origin="pandoc">
a.sourceLine { display: inline-block; line-height: 1.25; }
a.sourceLine { pointer-events: none; color: inherit; text-decoration: inherit; }
a.sourceLine:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode { white-space: pre; position: relative; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
code.sourceCode { white-space: pre-wrap; }
a.sourceLine { text-indent: -1em; padding-left: 1em; }
}
pre.numberSource a.sourceLine
  { position: relative; left: -4em; }
pre.numberSource a.sourceLine::before
  { content: attr(title);
    position: relative; left: -1em; text-align: right; vertical-align: baseline;
    border: none; pointer-events: all; display: inline-block;
    -webkit-touch-callout: none; -webkit-user-select: none;
    -khtml-user-select: none; -moz-user-select: none;
    -ms-user-select: none; user-select: none;
    padding: 0 4px; width: 4em;
    color: #aaaaaa;
  }
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa;  padding-left: 4px; }
div.sourceCode
  {  }
@media screen {
a.sourceLine::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */

</style>
<script>
// apply pandoc div.sourceCode style to pre.sourceCode instead
(function() {
  var sheets = document.styleSheets;
  for (var i = 0; i < sheets.length; i++) {
    if (sheets[i].ownerNode.dataset["origin"] !== "pandoc") continue;
    try { var rules = sheets[i].cssRules; } catch (e) { continue; }
    for (var j = 0; j < rules.length; j++) {
      var rule = rules[j];
      // check if there is a div.sourceCode rule
      if (rule.type !== rule.STYLE_RULE || rule.selectorText !== "div.sourceCode") continue;
      var style = rule.style.cssText;
      // check if color or background-color is set
      if (rule.style.color === '' && rule.style.backgroundColor === '') continue;
      // replace div.sourceCode by a pre.sourceCode rule
      sheets[i].deleteRule(j);
      sheets[i].insertRule('pre.sourceCode{' + style + '}', j);
    }
  }
})();
</script>



<style type="text/css">body {
background-color: #fff;
margin: 1em auto;
max-width: 700px;
overflow: visible;
padding-left: 2em;
padding-right: 2em;
font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
font-size: 14px;
line-height: 1.35;
}
#TOC {
clear: both;
margin: 0 0 10px 10px;
padding: 4px;
width: 400px;
border: 1px solid #CCCCCC;
border-radius: 5px;
background-color: #f6f6f6;
font-size: 13px;
line-height: 1.3;
}
#TOC .toctitle {
font-weight: bold;
font-size: 15px;
margin-left: 5px;
}
#TOC ul {
padding-left: 40px;
margin-left: -1.5em;
margin-top: 5px;
margin-bottom: 5px;
}
#TOC ul ul {
margin-left: -2em;
}
#TOC li {
line-height: 16px;
}
table {
margin: 1em auto;
border-width: 1px;
border-color: #DDDDDD;
border-style: outset;
border-collapse: collapse;
}
table th {
border-width: 2px;
padding: 5px;
border-style: inset;
}
table td {
border-width: 1px;
border-style: inset;
line-height: 18px;
padding: 5px 5px;
}
table, table th, table td {
border-left-style: none;
border-right-style: none;
}
table thead, table tr.even {
background-color: #f7f7f7;
}
p {
margin: 0.5em 0;
}
blockquote {
background-color: #f6f6f6;
padding: 0.25em 0.75em;
}
hr {
border-style: solid;
border: none;
border-top: 1px solid #777;
margin: 28px 0;
}
dl {
margin-left: 0;
}
dl dd {
margin-bottom: 13px;
margin-left: 13px;
}
dl dt {
font-weight: bold;
}
ul {
margin-top: 0;
}
ul li {
list-style: circle outside;
}
ul ul {
margin-bottom: 0;
}
pre, code {
background-color: #f7f7f7;
border-radius: 3px;
color: #333;
white-space: pre-wrap; 
}
pre {
border-radius: 3px;
margin: 5px 0px 10px 0px;
padding: 10px;
}
pre:not([class]) {
background-color: #f7f7f7;
}
code {
font-family: Consolas, Monaco, 'Courier New', monospace;
font-size: 85%;
}
p > code, li > code {
padding: 2px 0px;
}
div.figure {
text-align: center;
}
img {
background-color: #FFFFFF;
padding: 2px;
border: 1px solid #DDDDDD;
border-radius: 3px;
border: 1px solid #CCCCCC;
margin: 0 5px;
}
h1 {
margin-top: 0;
font-size: 35px;
line-height: 40px;
}
h2 {
border-bottom: 4px solid #f7f7f7;
padding-top: 10px;
padding-bottom: 2px;
font-size: 145%;
}
h3 {
border-bottom: 2px solid #f7f7f7;
padding-top: 10px;
font-size: 120%;
}
h4 {
border-bottom: 1px solid #f7f7f7;
margin-left: 8px;
font-size: 105%;
}
h5, h6 {
border-bottom: 1px solid #ccc;
font-size: 105%;
}
a {
color: #0033dd;
text-decoration: none;
}
a:hover {
color: #6666ff; }
a:visited {
color: #800080; }
a:visited:hover {
color: #BB00BB; }
a[href^="http:"] {
text-decoration: underline; }
a[href^="https:"] {
text-decoration: underline; }

code > span.kw { color: #555; font-weight: bold; } 
code > span.dt { color: #902000; } 
code > span.dv { color: #40a070; } 
code > span.bn { color: #d14; } 
code > span.fl { color: #d14; } 
code > span.ch { color: #d14; } 
code > span.st { color: #d14; } 
code > span.co { color: #888888; font-style: italic; } 
code > span.ot { color: #007020; } 
code > span.al { color: #ff0000; font-weight: bold; } 
code > span.fu { color: #900; font-weight: bold; } 
code > span.er { color: #a61717; background-color: #e3d2d2; } 
</style>




</head>

<body>




<h1 class="title toc-ignore">Estimating Generalized (Non-)Linear Models with Group-Specific Terms with rstanarm</h1>
<h4 class="author">Jonah Gabry and Ben Goodrich</h4>
<h4 class="date">2020-07-19</h4>


<div id="TOC">
<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#glms-with-group-specific-terms">GLMs with group-specific terms</a></li>
<li><a href="#priors-on-covariance-matrices">Priors on covariance matrices</a><ul>
<li><a href="#overview">Overview</a></li>
<li><a href="#details">Details</a></li>
</ul></li>
<li><a href="#comparison-with-lme4">Comparison with <strong>lme4</strong></a><ul>
<li><a href="#advantage-better-uncertainty-estimates">Advantage: better uncertainty estimates</a></li>
<li><a href="#advantage-incorporate-prior-information">Advantage: incorporate prior information</a></li>
<li><a href="#disadvantage-speed">Disadvantage: speed</a></li>
</ul></li>
<li><a href="#relationship-to-glmer">Relationship to <code>glmer</code></a></li>
<li><a href="#relationship-to-gamm4">Relationship to <code>gamm4</code></a></li>
<li><a href="#relationship-to-nlmer">Relationship to <code>nlmer</code></a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>

<!--
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{stan_glmer: GLMs with Group-Specific Terms}
-->
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb1-1" title="1"><span class="kw">library</span>(ggplot2)</a>
<a class="sourceLine" id="cb1-2" title="2"><span class="kw">library</span>(bayesplot)</a>
<a class="sourceLine" id="cb1-3" title="3"><span class="kw">theme_set</span>(bayesplot<span class="op">::</span><span class="kw">theme_default</span>())</a></code></pre></div>
<div id="introduction" class="section level1">
<h1>Introduction</h1>
<p>This vignette explains how to use the <code>stan_lmer</code>, <code>stan_glmer</code>, <code>stan_nlmer</code>, and <code>stan_gamm4</code> functions in the <strong>rstanarm</strong> package to estimate linear and generalized (non-)linear models with parameters that may vary across groups. Before continuing, we recommend reading the vignettes (navigate up one level) for the various ways to use the <code>stan_glm</code> function. The <em>Hierarchical Partial Pooling</em> vignette also has examples of both <code>stan_glm</code> and <code>stan_glmer</code>.</p>
</div>
<div id="glms-with-group-specific-terms" class="section level1">
<h1>GLMs with group-specific terms</h1>
<p>Models with this structure are referred to by many names: multilevel models, (generalized) linear mixed (effects) models (GLMM), hierarchical (generalized) linear models, etc. In the simplest case, the model for an outcome can be written as <span class="math display">\[\mathbf{y} = \alpha + \mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \mathbf{b} + \boldsymbol{\epsilon},\]</span> where <span class="math inline">\(\mathbf{X}\)</span> is a matrix of predictors that is analogous to that in Generalized Linear Models and <span class="math inline">\(\mathbf{Z}\)</span> is a matrix that encodes deviations in the predictors across specified groups.</p>
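<p>As a hypothetical illustration of this notation, the base R sketch below builds <span class="math inline">\(\mathbf{X}\)</span> and <span class="math inline">\(\mathbf{Z}\)</span> for simulated data with one predictor <code>x</code> and a three-level grouping factor <code>g</code>, where both the intercept and the slope on <code>x</code> vary by group. The data, names, parameter values, and the column ordering of <span class="math inline">\(\mathbf{Z}\)</span> are our own choices for illustration. In practice the fitting functions build these matrices from the model formula.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1)
# Simulated data: 3 groups with 4 observations each
g = factor(rep(c("a", "b", "c"), each = 4))
x = rnorm(12)
# X holds the common predictor (alpha is kept separate, as in the equation)
X = cbind(x = x)
# Z: group indicators (varying intercepts) followed by x times each
# indicator (varying slopes); this column ordering is our own choice
Z = cbind(model.matrix(~ 0 + g), model.matrix(~ 0 + g:x))
dim(Z)
# Hypothetical parameter values and the simulated outcome
alpha = 1
beta = 0.5
b = c(0.3, -0.2, -0.1, 0.05, 0.10, -0.15)
epsilon = rnorm(12, sd = 0.5)
y = drop(alpha + X %*% beta + Z %*% b + epsilon)</code></pre></div>
<p>Row <span class="math inline">\(i\)</span> of <span class="math inline">\(\mathbf{Z}\)</span> is nonzero only in the two columns belonging to the group of observation <span class="math inline">\(i\)</span>. As a result, <span class="math inline">\((\mathbf{Z}\mathbf{b})_i\)</span> adds that group's intercept deviation plus its slope deviation times <span class="math inline">\(x_i\)</span>.</p>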
<p>The terminology for the unknowns in the model is diverse. To frequentists, the error term consists of <span class="math inline">\(\mathbf{Z}\mathbf{b} + \boldsymbol{\epsilon}\)</span> and the observations within each group are <em>not</em> independent conditional on <span class="math inline">\(\mathbf{X}\)</span> alone. Since <span class="math inline">\(\mathbf{b}\)</span> is considered part of the random error term, frequentists allow themselves to make distributional assumptions about <span class="math inline">\(\mathbf{b}\)</span>, invariably that it is distributed multivariate normal with mean vector zero and structured covariance matrix <span class="math inline">\(\boldsymbol{\Sigma}\)</span>. If <span class="math inline">\(\epsilon_i\)</span> is also distributed (univariate) normal with mean zero and standard deviation <span class="math inline">\(\sigma\)</span>, then <span class="math inline">\(\mathbf{b}\)</span> can be integrated out. Because <span class="math inline">\(\mathbf{Z}\mathbf{b}\)</span> has covariance matrix <span class="math inline">\(\mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^\top\)</span>, this implies <span class="math display">\[\mathbf{y} \thicksim \mathcal{N}\left(\alpha + \mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I}+\mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^\top \right),\]</span> and it is possible to maximize this likelihood function by choosing proposals for the parameters <span class="math inline">\(\alpha\)</span>, <span class="math inline">\(\boldsymbol{\beta}\)</span>, <span class="math inline">\(\sigma\)</span>, and (the free elements of) <span class="math inline">\(\boldsymbol{\Sigma}\)</span>.</p>
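<p>To make the marginal covariance concrete, the base R sketch below computes <span class="math inline">\(\sigma^2 \mathbf{I} + \mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^\top\)</span> for a hypothetical model with a single varying intercept. The example has two groups of three observations and made-up values of <span class="math inline">\(\sigma\)</span> and <span class="math inline">\(\tau\)</span>, so that <span class="math inline">\(\boldsymbol{\Sigma} = \tau^2 \mathbf{I}\)</span>. It relies only on the fact that the covariance matrix of <span class="math inline">\(\mathbf{Z}\mathbf{b}\)</span> is <span class="math inline">\(\mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^\top\)</span>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical random-intercept model: 2 groups with 3 observations each
g = factor(rep(1:2, each = 3))
Z = model.matrix(~ 0 + g)       # 6 x 2 matrix of group indicators
sigma = 1                       # residual standard deviation
tau = 2                         # standard deviation of the intercept deviations
Sigma = diag(tau^2, 2)          # covariance matrix of b
# Marginal covariance of y after integrating out b
V = sigma^2 * diag(6) + Z %*% Sigma %*% t(Z)
V
cov2cor(V)[1, 2]                # within-group correlation</code></pre></div>
<p>Observations in the same group share covariance <span class="math inline">\(\tau^2 = 4\)</span>, so their correlation is <span class="math inline">\(\tau^2 / (\sigma^2 + \tau^2) = 0.8\)</span>. Observations in different groups are uncorrelated. This is the sense in which observations within a group are not independent once <span class="math inline">\(\mathbf{b}\)</span> has been integrated out.</p>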
<p>Consequently, frequentists refer to <span class="math inline">\(\mathbf{b}\)</span> as the <em>random effects</em> because they capture the random deviation in the effects of predictors from one group to the next. In contradistinction, <span class="math inline">\(\alpha\)</span> and <span class="math inline">\(\boldsymbol{\beta}\)</span> are referred to as <em>fixed effects</em> because they are the same for all groups. Moreover, <span class="math inline">\(\alpha\)</span> and <span class="math inline">\(\boldsymbol{\beta}\)</span> persist in the model in hypothetical replications of the analysis that draw the members of the groups afresh every time, whereas <span class="math inline">\(\mathbf{b}\)</span> would differ from one replication to the next. Consequently, <span class="math inline">\(\mathbf{b}\)</span> is not a “parameter” to be estimated because parameters are unknown constants that are fixed in repeated sampling.</p>
<p>Bayesians condition on the data in hand without reference to repeated sampling and describe their <em>beliefs</em> about the unknowns with prior distributions before observing the data. Thus, the likelihood in a simple hierarchical model in <strong>rstanarm</strong> is <span class="math display">\[\mathbf{y} \thicksim \mathcal{N}\left(\alpha + \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b}, \sigma^2 \mathbf{I}\right)\]</span> and the observations are independent conditional on <span class="math inline">\(\mathbf{X}\)</span>, <span class="math inline">\(\mathbf{Z}\)</span>, and <span class="math inline">\(\mathbf{b}\)</span>. In this formulation, there are</p>
<ul>
<li>intercept(s) and coefficients that are <em>common across groups</em></li>
<li>deviations in the intercept(s) and / or coefficients that <em>vary across groups</em></li>
</ul>
<p>Bayesians are compelled to state their prior beliefs about all unknowns and the usual assumption (which is maintained in <strong>rstanarm</strong>) is that <span class="math inline">\(\mathbf{b} \thicksim \mathcal{N}\left(\mathbf{0},\boldsymbol{\Sigma}\right),\)</span> but it is then necessary to state prior beliefs about <span class="math inline">\(\boldsymbol{\Sigma}\)</span>, in addition to <span class="math inline">\(\alpha\)</span>, <span class="math inline">\(\boldsymbol{\beta}\)</span>, and <span class="math inline">\(\sigma\)</span>.</p>
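<p>Because the observations are conditionally independent given <span class="math inline">\(\mathbf{b}\)</span>, the log-likelihood is simply a sum of univariate normal log densities. The base R sketch below shows this. <code>log_joint</code> is a hypothetical helper of our own, not an <strong>rstanarm</strong> function, and it covers only a single varying intercept with <span class="math inline">\(\boldsymbol{\Sigma} = \tau^2\)</span>. It adds the normal prior on <span class="math inline">\(\mathbf{b}\)</span>, omits the priors on <span class="math inline">\(\alpha\)</span>, <span class="math inline">\(\sigma\)</span>, and <span class="math inline">\(\tau\)</span>, and does not reflect how the package's compiled Stan programs are written.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical log joint density for a Gaussian varying-intercept model:
# log p(y | alpha, b, sigma) + log p(b | tau); other priors omitted
log_joint = function(y, g, alpha, b, sigma, tau) {
  Z = model.matrix(~ 0 + g)               # one indicator column per group
  eta = alpha + drop(Z %*% b)             # linear predictor given b
  # Conditional on b, observations are independent: sum univariate log densities
  loglik = sum(dnorm(y, mean = eta, sd = sigma, log = TRUE))
  # Prior b ~ N(0, tau^2 I) for the intercept deviations
  logprior_b = sum(dnorm(b, mean = 0, sd = tau, log = TRUE))
  loglik + logprior_b
}
g = factor(c(1, 1, 2, 2))
y = c(0.5, 1.2, -0.3, 0.1)
log_joint(y, g, alpha = 0.4, b = c(0.4, -0.5), sigma = 1, tau = 1)</code></pre></div>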
<p>One of the many challenges of fitting models to data comprising multiple groupings is confronting the tradeoff between validity and precision. An analysis that disregards between-group heterogeneity yields relatively precise estimates, but those estimates can be wrong if such heterogeneity is actually present. Group-by-group analyses, on the other hand, are valid but produce estimates that are relatively imprecise. While complete pooling or no pooling of data across groups is sometimes called for, models that ignore the grouping structures in the data tend to underfit or overfit (Gelman et al., 2013). Hierarchical modeling provides a compromise by allowing parameters to vary by group at lower levels of the hierarchy while estimating common parameters at higher levels. Inference for each group-level parameter is informed not only by the group-specific information contained in the data but also by the data for other groups. This is commonly referred to as <em>borrowing strength</em> or <em>shrinkage</em>.</p>
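<p>A minimal base R sketch of this compromise is shown below. It rests on simplifying assumptions we add for illustration: a normal model with known within-group standard deviation <span class="math inline">\(\sigma\)</span> and between-group standard deviation <span class="math inline">\(\tau\)</span>, with the complete-pooling mean plugged in for the common mean <span class="math inline">\(\mu\)</span>. A full Bayesian analysis would estimate <span class="math inline">\(\sigma\)</span>, <span class="math inline">\(\tau\)</span>, and <span class="math inline">\(\mu\)</span> instead. Under these assumptions, the partial-pooling estimate for group <span class="math inline">\(j\)</span> is the precision-weighted average <span class="math inline">\(w_j \bar{y}_j + (1 - w_j)\mu\)</span>, with <span class="math inline">\(w_j = \frac{n_j/\sigma^2}{n_j/\sigma^2 + 1/\tau^2}\)</span>. All numbers are made up.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical group means, group sizes, and (assumed known) variance components
ybar = c(a = 2.0, b = 0.5, c = -1.0)   # group-specific sample means
n = c(a = 20, b = 5, c = 2)            # group sizes
sigma = 1                              # within-group standard deviation
tau = 0.5                              # between-group standard deviation
# Complete pooling: one size-weighted grand mean for every group
mu = sum(n * ybar) / sum(n)
# No pooling: each group's own mean
no_pool = ybar
# Partial pooling: weight each group's mean by its relative precision
w = (n / sigma^2) / (n / sigma^2 + 1 / tau^2)
partial = w * ybar + (1 - w) * mu
round(rbind(no_pool, partial, complete = mu), 3)</code></pre></div>
<p>Group <code>c</code>, with only two observations, is shrunk furthest toward <span class="math inline">\(\mu\)</span>. Group <code>a</code>, with twenty observations, stays close to its own mean.</p>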
<p>In <strong>rstanarm</strong>, these models can be estimated using the <code>stan_lmer</code> and <code>stan_glmer</code> functions, which are similar in syntax to the <code>lmer</code> and <code>glmer</code> functions in the <strong>lme4</strong> package. However, rather than performing (restricted) maximum likelihood (RE)ML estimation, Bayesian estimation is performed via MCMC. The Bayesian model adds independent prior distributions on the regression coefficients (in the same way as <code>stan_glm</code>) as well as priors on the terms of a decomposition of the covariance matrices of the group-specific parameters. These priors are discussed in greater detail below.</p>
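<p>For example, the sketch below fits the same varying-intercept logistic regression to the <code>cbpp</code> data from <strong>lme4</strong> with both <code>glmer</code> and <code>stan_glmer</code>. The model formula is written identically in both calls. The small numbers of chains and iterations only keep the sketch quick to run and are not a recommendation.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(lme4)
library(rstanarm)
data(cbpp, package = "lme4")
# glmer maximizes a Laplace approximation to the marginal likelihood
fit_ml = glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
               data = cbpp, family = binomial)
# Same formula syntax, Bayesian estimation via MCMC (few iterations for speed)
fit_bayes = stan_glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                       data = cbpp, family = binomial,
                       chains = 2, iter = 1000, seed = 12345, refresh = 0)
fixef(fit_ml)      # maximum likelihood estimates
fixef(fit_bayes)   # posterior medians of the common coefficients</code></pre></div>
<p>The two sets of numbers summarize different quantities: maximum likelihood estimates versus posterior medians under the default priors of <strong>rstanarm</strong>. They therefore need not match exactly.</p>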
</div>
<div id="priors-on-covariance-matrices" class="section level1">
<h1>Priors on covariance matrices</h1>
<p>In this section we discuss a flexible family of prior distributions for the unknown covariance matrices of the group-specific coefficients.</p>
<div id="overview" class="section level3">
<h3>Overview</h3>
<p>For each group, we assume the vector of varying slopes and intercepts is a zero-mean random vector following a multivariate Gaussian distribution with an unknown covariance matrix to be estimated. Unfortunately, expressing prior information about a covariance matrix is not intuitive and can also be computationally challenging. When the covariance matrix is not <span class="math inline">\(1\times 1\)</span>, it is often both much more intuitive and efficient to work instead with the <strong>correlation</strong> matrix and variances. When the covariance matrix is <span class="math inline">\(1\times 1\)</span>, we still denote it as <span class="math inline">\(\boldsymbol{\Sigma}\)</span> but most of the details in this section do not apply.</p>
<p>The variances are in turn decomposed into the product of a simplex vector (probability vector) and the trace of the implied covariance matrix, which is defined as the sum of its diagonal elements. Finally, this trace is set equal to the product of the order of the matrix and the square of a scale parameter. This implied prior on a covariance matrix is represented by the <code>decov</code> (short for decomposition of covariance) function in <strong>rstanarm</strong>.</p>
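<p>The decomposition can be checked numerically. In the base R sketch below, a hypothetical <span class="math inline">\(2 \times 2\)</span> covariance matrix for a varying intercept and slope is split into a correlation matrix, a scale parameter <span class="math inline">\(\tau\)</span>, and a simplex vector, and then reassembled. The values are made up, and the code only illustrates the algebra; it does not show how <code>decov</code> is implemented.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># A hypothetical 2 x 2 covariance matrix for (intercept, slope) deviations
Sigma = matrix(c(4.0, 0.6,
                 0.6, 1.0), nrow = 2)
J = nrow(Sigma)                   # order of the matrix
Omega = cov2cor(Sigma)            # correlation matrix
trace = sum(diag(Sigma))          # sum of the variances
tau = sqrt(trace / J)             # scale parameter: trace = J * tau^2
pi_vec = diag(Sigma) / trace      # simplex: share of the trace per variable
# Reassemble: variances = J * tau^2 * pi, then rescale the correlations
sds = sqrt(J * tau^2 * pi_vec)
Sigma_rebuilt = diag(sds) %*% Omega %*% diag(sds)
all.equal(Sigma_rebuilt, Sigma)</code></pre></div>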
</div>
<div id="details" class="section level3">
<h3>Details</h3>
<p>Using the decomposition described above, the prior used for a correlation matrix <span class="math inline">\(\Omega\)</span> is called the LKJ distribution and has a probability density function proportional to the determinant of the correlation matrix raised to a power of <span class="math inline">\(\zeta\)</span> minus one:</p>
<p><span class="math display">\[ f(\Omega | \zeta) \propto \text{det}(\Omega)^{\zeta - 1}, \quad \zeta &gt; 0. \]</span></p>
<p>The shape of this prior depends on the value of the regularization parameter, <span class="math inline">\(\zeta\)</span>, in the following ways:</p>
<ul>
<li>If <span class="math inline">\(\zeta = 1\)</span> (the default), then the LKJ prior is jointly uniform over all correlation matrices of the same dimension as <span class="math inline">\(\Omega\)</span>.</li>
<li>If <span class="math inline">\(\zeta &gt; 1\)</span>, then the mode of the distribution is the identity matrix. The larger the value of <span class="math inline">\(\zeta\)</span> the more sharply peaked the density is at the identity matrix.</li>
<li>If <span class="math inline">\(0 &lt; \zeta &lt; 1\)</span>, then the density has a trough at the identity matrix.</li>
</ul>
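<p>The three cases can be seen by evaluating the unnormalized log density <span class="math inline">\((\zeta - 1)\log\det(\Omega)\)</span> at two <span class="math inline">\(2 \times 2\)</span> correlation matrices: the identity, and a matrix with correlation 0.8 (determinant 0.36). The normalizing constant depends only on <span class="math inline">\(\zeta\)</span> and the dimension, so it does not affect comparisons between matrices at a fixed <span class="math inline">\(\zeta\)</span>. <code>lkj_log_kernel</code> is a hypothetical helper written for this sketch.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Unnormalized LKJ log density: (zeta - 1) * log(det(Omega))
lkj_log_kernel = function(Omega, zeta) (zeta - 1) * log(det(Omega))
identity2 = diag(2)
correlated = matrix(c(1, 0.8, 0.8, 1), nrow = 2)   # det = 1 - 0.8^2 = 0.36
for (zeta in c(0.5, 1, 2)) {
  cat("zeta =", zeta,
      " identity:", lkj_log_kernel(identity2, zeta),
      " correlated:", round(lkj_log_kernel(correlated, zeta), 3), "\n")
}</code></pre></div>
<p>For <span class="math inline">\(\zeta = 0.5\)</span> the correlated matrix has the higher density (a trough at the identity). For <span class="math inline">\(\zeta = 1\)</span> the two are equal (uniform). For <span class="math inline">\(\zeta = 2\)</span> the identity is favored.</p>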
<p>The <span class="math inline">\(J \times J\)</span> covariance matrix <span class="math inline">\(\Sigma\)</span> of a random vector <span class="math inline">\(\boldsymbol{\theta} = (\theta_1, \dots, \theta_J)\)</span> has diagonal entries <span class="math inline">\({\Sigma}_{jj} = \sigma^2_j = \text{var}(\theta_j)\)</span>. Therefore, the trace of the covariance matrix is equal to the sum of the variances. We set the trace equal to the product of the order of the covariance matrix and the square of a positive scale parameter <span class="math inline">\(\tau\)</span>:</p>
<p><span class="math display">\[\text{tr}(\Sigma) = \sum_{j=1}^{J} \Sigma_{jj} = J\tau^2.\]</span></p>
<p>The vector of variances is set equal to the product of a simplex vector <span class="math inline">\(\boldsymbol{\pi}\)</span> — which is non-negative and sums to 1 — and the scalar trace: <span class="math inline">\(J \tau^2 \boldsymbol{\pi}\)</span>. Each element <span class="math inline">\(\pi_j\)</span> of <span class="math inline">\(\boldsymbol{\pi}\)</span> then represents the proportion of the trace (total variance) attributable to the corresponding variable <span class="math inline">\(\theta_j\)</span>.</p>
<p>For the simplex vector <span class="math inline">\(\boldsymbol{\pi}\)</span> we use a symmetric Dirichlet prior, which has a single <em>concentration</em> parameter <span class="math inline">\(\gamma &gt; 0\)</span>:</p>
<ul>
<li>If <span class="math inline">\(\gamma = 1\)</span> (the default), then the prior is jointly uniform over the space of simplex vectors with <span class="math inline">\(J\)</span> elements.</li>
<li>If <span class="math inline">\(\gamma &gt; 1\)</span>, then the prior mode corresponds to all variables having the same (proportion of total) variance, which can be used to ensure that the posterior variances are not zero. As the concentration parameter approaches infinity, this mode becomes more pronounced.</li>
<li>If <span class="math inline">\(0 &lt; \gamma &lt; 1\)</span>, then the variances are more polarized.</li>
</ul>
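<p>One standard way to simulate from a symmetric Dirichlet distribution is to normalize independent Gamma(<span class="math inline">\(\gamma\)</span>, 1) draws. The hypothetical helper <code>rdirichlet_sym</code> below uses this construction to show how <span class="math inline">\(\gamma\)</span> controls the spread of the simplex. Each element has variance <span class="math inline">\((J - 1)/(J^2 (J\gamma + 1))\)</span>, which shrinks as <span class="math inline">\(\gamma\)</span> grows.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Symmetric Dirichlet draws by normalizing independent Gamma(concentration, 1) draws
rdirichlet_sym = function(n, J, concentration) {
  draws = matrix(rgamma(n * J, shape = concentration, rate = 1), nrow = n, ncol = J)
  draws / rowSums(draws)          # each row is a simplex vector
}
set.seed(2020)
J = 3
for (concentration in c(0.5, 1, 10)) {
  p = rdirichlet_sym(10000, J, concentration)
  theory = (J - 1) / (J^2 * (J * concentration + 1))
  cat("concentration =", concentration,
      " empirical var:", round(var(p[, 1]), 4),
      " theoretical var:", round(theory, 4), "\n")
}</code></pre></div>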
<p>If all the elements of <span class="math inline">\(\boldsymbol{\theta}\)</span> were multiplied by the same number <span class="math inline">\(k\)</span>, the trace of their covariance matrix would increase by a factor of <span class="math inline">\(k^2\)</span>. For this reason, it is sensible to use a scale-invariant prior for <span class="math inline">\(\tau\)</span>. We choose a Gamma distribution, with shape and scale parameters both set to <span class="math inline">\(1\)</span> by default, implying a unit-exponential distribution. Users can set the shape hyperparameter to some value greater than one to ensure that the posterior trace is not zero. In the case where <span class="math inline">\(\boldsymbol{\Sigma}\)</span> is <span class="math inline">\(1\times 1\)</span>, <span class="math inline">\(J = 1\)</span> implies <span class="math inline">\(\boldsymbol{\Sigma} = \text{tr}(\boldsymbol{\Sigma}) = \tau^2\)</span>, so <span class="math inline">\(\tau\)</span> is the cross-group standard deviation in the parameters and its square is the variance.</p>
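<p>Putting the pieces together, the base R sketch below draws one <span class="math inline">\(2 \times 2\)</span> covariance matrix from this prior. It uses the fact that, for <span class="math inline">\(J = 2\)</span>, the LKJ density <span class="math inline">\((1 - r^2)^{\zeta - 1}\)</span> for the single correlation <span class="math inline">\(r\)</span> corresponds to <span class="math inline">\((r + 1)/2 \thicksim \text{Beta}(\zeta, \zeta)\)</span>. <code>draw_decov_2x2</code> is a hypothetical helper. Its argument names mirror those of the <code>decov</code> function in <strong>rstanarm</strong>, but it is a sketch of the prior, not the package's implementation.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical prior draw of a 2 x 2 covariance matrix under the decomposition:
# tau ~ Gamma(shape, scale), pi ~ Dirichlet(concentration), Omega ~ LKJ(regularization)
draw_decov_2x2 = function(regularization = 1, concentration = 1, shape = 1, scale = 1) {
  J = 2
  tau = rgamma(1, shape = shape, scale = scale)
  g = rgamma(J, shape = concentration)
  pi_vec = g / sum(g)                               # simplex of variance shares
  # For J = 2, LKJ(zeta) is equivalent to (r + 1) / 2 ~ Beta(zeta, zeta)
  r = 2 * rbeta(1, regularization, regularization) - 1
  Omega = matrix(c(1, r, r, 1), nrow = 2)
  sds = sqrt(J * tau^2 * pi_vec)                    # variances = J * tau^2 * pi
  list(tau = tau, Sigma = diag(sds) %*% Omega %*% diag(sds))
}
set.seed(123)
d = draw_decov_2x2(regularization = 2)
d$Sigma
sum(diag(d$Sigma)) / (2 * d$tau^2)   # ratio of trace to J * tau^2</code></pre></div>
<p>The last line checks that the trace of the drawn matrix equals <span class="math inline">\(J\tau^2\)</span>, as the construction requires.</p>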
</div>
</div>
<div id="comparison-with-lme4" class="section level1">
<h1>Comparison with <strong>lme4</strong></h1>
<p>There are several advantages to estimating these models using <strong>rstanarm</strong> rather than the <strong>lme4</strong> package. There are also a few drawbacks. In this section we briefly discuss what we find to be the two most important advantages as well as an important disadvantage.</p>
<div id="advantage-better-uncertainty-estimates" class="section level3">
<h3>Advantage: better uncertainty estimates</h3>
<p>While <strong>lme4</strong> uses (restricted) maximum likelihood (RE)ML estimation, <strong>rstanarm</strong> performs full Bayesian inference via MCMC. It is well known that (RE)ML tends to underestimate uncertainties because it relies on point estimates of hyperparameters. Full Bayes, on the other hand, propagates the uncertainty in the hyperparameters throughout all levels of the model and provides more appropriate estimates of uncertainty for models that consist of a mix of common and group-specific parameters.</p>
</div>
<div id="advantage-incorporate-prior-information" class="section level3">
<h3>Advantage: incorporate prior information</h3>
<p>The <code>stan_glmer</code> and <code>stan_lmer</code> functions allow the user to specify prior distributions over the regression coefficients as well as any unknown covariance matrices. There are various reasons to specify priors, from helping to stabilize computation to incorporating important information into an analysis that does not enter through the data.</p>
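<p>The sketch below shows how priors are passed, reusing the <code>cbpp</code> example from <strong>lme4</strong>. Priors are set through the <code>prior_intercept</code>, <code>prior</code>, and <code>prior_covariance</code> arguments of <code>stan_glmer</code>, and the particular values are illustrative rather than recommendations. Because <code>(1 | herd)</code> has a single varying intercept, <span class="math inline">\(\boldsymbol{\Sigma}\)</span> is <span class="math inline">\(1 \times 1\)</span> here, so only the <code>shape</code> and <code>scale</code> of the Gamma prior in <code>decov()</code> matter. Setting <code>shape = 2</code> follows the earlier suggestion to use a shape greater than one.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(rstanarm)
data(cbpp, package = "lme4")
fit = stan_glmer(
  cbind(incidence, size - incidence) ~ period + (1 | herd),
  data = cbpp, family = binomial,
  prior_intercept = normal(0, 2.5),  # intercept (applied after centering predictors)
  prior = normal(0, 1),              # common coefficients on period
  # Sigma is 1 x 1 here, so only shape and scale (the Gamma prior on tau) matter
  prior_covariance = decov(regularization = 1, concentration = 1,
                           shape = 2, scale = 1),
  chains = 2, iter = 1000, seed = 12345, refresh = 0
)
prior_summary(fit)  # lists the priors that were actually used</code></pre></div>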
</div>
<div id="disadvantage-speed" class="section level3">
<h3>Disadvantage: speed</h3>
<p>The benefits of full Bayesian inference (via MCMC) come with a cost. Fitting models with (RE)ML will tend to be much faster than fitting a similar model using MCMC. Speed comparable to <strong>lme4</strong> can be obtained with <strong>rstanarm</strong> using approximate Bayesian inference via the mean-field and full-rank variational algorithms (see <code>help(&quot;rstanarm-package&quot;, &quot;rstanarm&quot;)</code> for details). These algorithms can be useful to narrow the set of candidate models in large problems, but MCMC should always be used for final statistical inference.</p>
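<p>A sketch of this workflow, again with the <code>cbpp</code> example from <strong>lme4</strong>, is shown below. The <code>algorithm</code> argument of <code>stan_glmer</code> selects mean-field variational inference; <code>"fullrank"</code> is the other variational option. The chosen model is then refit with MCMC (<code>algorithm = "sampling"</code>, the default) before any final conclusions are drawn. The iteration counts are kept small only for speed.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(rstanarm)
data(cbpp, package = "lme4")
# Fast approximate fit via mean-field variational inference, for screening
fit_vb = stan_glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                    data = cbpp, family = binomial,
                    algorithm = "meanfield", seed = 12345)
fixef(fit_vb)
# For the final analysis, refit the chosen model with MCMC
fit_mcmc = update(fit_vb, algorithm = "sampling", chains = 2, iter = 1000)
fixef(fit_mcmc)</code></pre></div>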
</div>
</div>
<div id="relationship-to-glmer" class="section level1">
<h1>Relationship to <code>glmer</code></h1>
<p>In the <strong>lme4</strong> package, there is a fundamental distinction between the way that Linear Mixed Models and Generalized Linear Mixed Models are estimated. In Linear Mixed Models, <span class="math inline">\(\mathbf{b}\)</span> can be integrated out analytically, leaving a likelihood function that can be maximized over proposals for the parameters. To estimate a Linear Mixed Model, one can call the <code>lmer</code> function.</p>
<p>Generalized Linear Mixed Models are appropriate when the conditional mean of the outcome is determined by an inverse link function, <span class="math inline">\(\boldsymbol{\mu} = g\left(\alpha + \mathbf{X} \boldsymbol{\beta} + \mathbf{Z}\mathbf{b}\right)\)</span>. If <span class="math inline">\(g\left(\cdot\right)\)</span> is not the identity function, then it is not possible to integrate out <span class="math inline">\(\mathbf{b}\)</span> analytically and numerical integration must be used. To estimate a Generalized Linear Mixed Model, one can call the <code>glmer</code> function and specify the <code>family</code> argument.</p>
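<p>To see the difference, the base R sketch below considers a single hypothetical group with a random intercept <span class="math inline">\(b \thicksim \mathcal{N}(0, \tau^2)\)</span> and made-up data. With a logit link, the group's contribution to the likelihood, <span class="math inline">\(\int \prod_i p(y_i \mid b)\, \mathcal{N}(b \mid 0, \tau^2)\, db\)</span>, has no closed form, so we approximate it with <code>integrate()</code>. <strong>lme4</strong> itself uses a Laplace approximation or adaptive Gauss-Hermite quadrature (see its <code>nAGQ</code> argument), not <code>integrate()</code>. With an identity link and Gaussian outcomes, the same numerical integral can be checked against the closed-form multivariate normal marginal.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># One hypothetical group with a random intercept b ~ N(0, tau^2)
alpha = 0.2
tau = 1
# Logit link: no closed form, so integrate b out numerically
y_bin = c(1, 0, 1)
integrand_logit = function(b) {
  sapply(b, function(bb) prod(dbinom(y_bin, 1, plogis(alpha + bb)))) * dnorm(b, 0, tau)
}
lik_logit = integrate(integrand_logit, -Inf, Inf)$value
# Identity link with Gaussian noise: numerical and closed-form answers agree
sigma = 0.5
y_gau = c(0.4, -0.1, 0.9)
integrand_gau = function(b) {
  sapply(b, function(bb) prod(dnorm(y_gau, alpha + bb, sigma))) * dnorm(b, 0, tau)
}
lik_numeric = integrate(integrand_gau, -Inf, Inf)$value
# Closed form: y ~ N(alpha, sigma^2 I + tau^2 times a matrix of ones)
V = sigma^2 * diag(3) + tau^2 * matrix(1, 3, 3)
r = y_gau - alpha
lik_closed = exp(-0.5 * (3 * log(2 * pi) + log(det(V)) + sum(r * solve(V, r))))
c(logit = lik_logit, gaussian_numeric = lik_numeric, gaussian_closed = lik_closed)</code></pre></div>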
<p>In the <strong>rstanarm</strong> package, there is no such fundamental distinction; in fact <code>stan_lmer</code> simply calls <code>stan_glmer</code> with <code>family = gaussian(link = &quot;identity&quot;)</code>. Bayesians do not (have to) integrate <span class="math inline">\(\mathbf{b}\)</span> out of the likelihood and if <span class="math inline">\(\mathbf{b}\)</span> is not of interest, then the margins of its posterior distribution can simply be ignored.</p>
</div>
<div id="relationship-to-gamm4" class="section level1">
<h1>Relationship to <code>gamm4</code></h1>
<p>The <strong>rstanarm</strong> package includes a <code>stan_gamm4</code> function that is similar to the <code>gamm4</code> function in the <strong>gamm4</strong> package, which is in turn similar to the <code>gamm</code> function in the <strong>mgcv</strong> package. The substring <code>gamm</code> stands for Generalized Additive Mixed Models, which differ from Generalized Additive Models (GAMs) due to the presence of group-specific terms that can be specified with the syntax of <strong>lme4</strong>. Both GAMs and GAMMs include nonlinear functions of (non-categorical) predictors called “smooths”. In the example below, so-called “thin-plate splines” are used to model counts of roaches where we might fear that the number of roaches in the current period is an exponentially increasing function of the number of roaches in the previous period. Unlike <code>stan_glmer</code>, in <code>stan_gamm4</code> it is necessary to specify group-specific terms as a one-sided formula that is passed to the <code>random</code> argument as in the <code>lme</code> function in the <strong>nlme</strong> package.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb2-1" title="1"><span class="kw">library</span>(rstanarm)</a>
<a class="sourceLine" id="cb2-2" title="2"><span class="kw">data</span>(roaches)</a>
<a class="sourceLine" id="cb2-3" title="3">roaches<span class="op">$</span>roach1 &lt;-<span class="st"> </span>roaches<span class="op">$</span>roach1 <span class="op">/</span><span class="st"> </span><span class="dv">100</span></a>
<a class="sourceLine" id="cb2-4" title="4">roaches<span class="op">$</span>log_exposure2 &lt;-<span class="st"> </span><span class="kw">log</span>(roaches<span class="op">$</span>exposure2)</a>
<a class="sourceLine" id="cb2-5" title="5">post &lt;-<span class="st"> </span><span class="kw">stan_gamm4</span>(</a>
<a class="sourceLine" id="cb2-6" title="6">  y <span class="op">~</span><span class="st"> </span><span class="kw">s</span>(roach1) <span class="op">+</span><span class="st"> </span>treatment <span class="op">+</span><span class="st"> </span>log_exposure2,</a>
<a class="sourceLine" id="cb2-7" title="7">  <span class="dt">random =</span> <span class="op">~</span>(<span class="dv">1</span> <span class="op">|</span><span class="st"> </span>senior),</a>
<a class="sourceLine" id="cb2-8" title="8">  <span class="dt">data =</span> roaches, </a>
<a class="sourceLine" id="cb2-9" title="9">  <span class="dt">family =</span> neg_binomial_<span class="dv">2</span>, </a>
<a class="sourceLine" id="cb2-10" title="10">  <span class="dt">QR =</span> <span class="ot">TRUE</span>,</a>
<a class="sourceLine" id="cb2-11" title="11">  <span class="dt">cores =</span> <span class="dv">2</span>,</a>
<a class="sourceLine" id="cb2-12" title="12">  <span class="dt">chains =</span> <span class="dv">2</span>, </a>
<a class="sourceLine" id="cb2-13" title="13">  <span class="dt">adapt_delta =</span> <span class="fl">0.99</span>,</a>
<a class="sourceLine" id="cb2-14" title="14">  <span class="dt">seed =</span> <span class="dv">12345</span></a>
<a class="sourceLine" id="cb2-15" title="15">)</a></code></pre></div>
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb3-1" title="1"><span class="kw">plot_nonlinear</span>(post)</a></code></pre></div>
<p>Here we see that the relationship between past and present roaches is estimated to be nonlinear. For a small number of past roaches, the function is steep and then it appears to flatten out, although we become highly uncertain about the function in the rare cases where the number of past roaches is large.</p>
</div>
<div id="relationship-to-nlmer" class="section level1">
<h1>Relationship to <code>nlmer</code></h1>
<p>The <code>stan_gamm4</code> function allows designated predictors to have a nonlinear effect on what would otherwise be called the “linear” predictor in Generalized Linear Models. The <code>stan_nlmer</code> function is similar to the <code>nlmer</code> function in the <strong>lme4</strong> package, and essentially allows a wider range of nonlinear functions that relate the linear predictor to the conditional expectation of a Gaussian outcome.</p>
<p>To estimate an example model with the <code>nlmer</code> function in the <strong>lme4</strong> package, we start by rescaling the outcome and the main predictor, dividing each by a constant (here 100):</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" title="1"><span class="kw">data</span>(<span class="st">&quot;Orange&quot;</span>, <span class="dt">package =</span> <span class="st">&quot;datasets&quot;</span>)</a>
<a class="sourceLine" id="cb4-2" title="2">Orange<span class="op">$</span>age &lt;-<span class="st"> </span>Orange<span class="op">$</span>age <span class="op">/</span><span class="st"> </span><span class="dv">100</span></a>
<a class="sourceLine" id="cb4-3" title="3">Orange<span class="op">$</span>circumference &lt;-<span class="st"> </span>Orange<span class="op">$</span>circumference <span class="op">/</span><span class="st"> </span><span class="dv">100</span></a></code></pre></div>
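<p>As a quick sanity check (a sketch using only <code>SSlogis</code> from the <strong>stats</strong> package, not part of the original analysis), the code below confirms that this rescaling changes only the units of the parameters. Dividing circumference by 100 divides <code>Asym</code> by 100, and dividing age by 100 divides <code>xmid</code> and <code>scal</code> by 100, so the logistic curve itself is unchanged. The parameter values are illustrative assumptions (roughly 100 times the starting values used in <code>startvec</code>), not estimates.</p>

```r
# Illustrative values on the original scale (age in days, circumference in mm),
# roughly 100 times the starting values in startvec; these are not estimates
Asym <- 200
xmid <- 725
scal <- 350
age  <- c(118, 484, 664, 1004, 1231, 1372, 1582)  # ages (days) observed in Orange

# Logistic growth curve on the original scale (mm);
# as.numeric() drops the "gradient" attribute SSlogis may attach
curve_mm <- as.numeric(SSlogis(age, Asym, xmid, scal))

# Same curve on the rescaled variables: Asym / 100 because circumference / 100,
# xmid / 100 and scal / 100 because age / 100; multiply by 100 to return to mm
curve_rescaled <- 100 * as.numeric(SSlogis(age / 100, Asym / 100, xmid / 100, scal / 100))

all.equal(curve_mm, curve_rescaled)  # TRUE
```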
<p>Although doing so has no substantive effect on the inferences obtained, it is numerically much easier for Stan and for <strong>lme4</strong> to work with variables whose units are such that the estimated parameters tend to be single-digit numbers that are not too close to zero. The <code>nlmer</code> function requires the user to pass starting values to the ironically named self-starting nonlinear function:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb5-1" title="1">startvec &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="dt">Asym =</span> <span class="dv">2</span>, <span class="dt">xmid =</span> <span class="fl">7.25</span>, <span class="dt">scal =</span> <span class="fl">3.5</span>)</a>
<a class="sourceLine" id="cb5-2" title="2"><span class="kw">library</span>(lme4)</a>
<a class="sourceLine" id="cb5-3" title="3">nm1 &lt;-<span class="st"> </span><span class="kw">nlmer</span>(circumference <span class="op">~</span><span class="st"> </span><span class="kw">SSlogis</span>(age, Asym, xmid, scal) <span class="op">~</span><span class="st"> </span>Asym<span class="op">|</span>Tree,</a>
<a class="sourceLine" id="cb5-4" title="4">             <span class="dt">data =</span> Orange, <span class="dt">start =</span> startvec)</a>
<a class="sourceLine" id="cb5-5" title="5"><span class="kw">summary</span>(nm1)</a></code></pre></div>
<pre><code>Warning in vcov.merMod(object, use.hessian = use.hessian): variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX</code></pre>
<pre><code>Warning in vcov.merMod(object, correlation = correlation, sigm = sig): variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX</code></pre>
<pre><code>Nonlinear mixed model fit by maximum likelihood  ['nlmerMod']
Formula: circumference ~ SSlogis(age, Asym, xmid, scal) ~ Asym | Tree
   Data: Orange

     AIC      BIC   logLik deviance df.resid 
   -49.2    -41.4     29.6    -59.2       30 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9170 -0.5421  0.1754  0.7116  1.6820 

Random effects:
 Groups   Name Variance Std.Dev.
 Tree     Asym 0.100149 0.31646 
 Residual      0.006151 0.07843 
Number of obs: 35, groups:  Tree, 5

Fixed effects:
     Estimate Std. Error t value
Asym   1.9205     0.1558   12.32
xmid   7.2791     0.3444   21.14
scal   3.4807     0.2631   13.23

Correlation of Fixed Effects:
     Asym  xmid 
xmid 0.384      
scal 0.362 0.762</code></pre>
<p>Note the warning messages indicating difficulty estimating the variance-covariance matrix. Although <strong>lme4</strong> has a fallback mechanism, the need to use it suggests that the sample (35 observations on 5 trees) is too small to sustain the asymptotic assumptions underlying the maximum likelihood estimator.</p>
<p>In the above example, we use the <code>SSlogis</code> function, which is a lot like the logistic CDF but has an additional <code>Asym</code> argument that need not be one and indicates the value the function approaches for large values of its first argument. In this case, we can interpret the asymptote as the maximum possible circumference of an orange tree. However, this asymptote is allowed to vary from tree to tree using the <code>Asym | Tree</code> syntax, which reflects an assumption that the deviation of a randomly selected tree’s asymptote from the asymptote for the population of orange trees is Gaussian with mean zero and an unknown standard deviation.</p>
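<p>To make the comparison with the logistic CDF concrete, the base R sketch below checks that <code>SSlogis(x, Asym, xmid, scal)</code> equals <code>Asym / (1 + exp((xmid - x) / scal))</code>, which is <code>Asym</code> times the logistic CDF <code>plogis(x, location = xmid, scale = scal)</code>. The parameter values are illustrative assumptions on the rescaled axis, chosen to match <code>startvec</code>.</p>

```r
x <- seq(0, 16, by = 0.5)  # ages on the rescaled (age / 100) axis

# Illustrative parameter values, equal to startvec; not estimates
Asym <- 2
xmid <- 7.25
scal <- 3.5

by_hand   <- Asym / (1 + exp((xmid - x) / scal))
via_cdf   <- Asym * plogis(x, location = xmid, scale = scal)  # Asym times logistic CDF
via_stats <- as.numeric(SSlogis(x, Asym, xmid, scal))         # drop "gradient" attribute

all.equal(by_hand, via_stats)  # TRUE
all.equal(via_cdf, via_stats)  # TRUE
```

<p>Because <code>plogis</code> rises from 0 to 1, the curve passes through <code>Asym / 2</code> at <code>x = xmid</code> and approaches <code>Asym</code> for large <code>x</code>, with <code>scal</code> controlling how quickly it gets there.</p>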
<p>The <code>nlmer</code> function supports user-defined nonlinear functions, whereas the <code>stan_nlmer</code> function only supports the pre-defined nonlinear functions starting with <code>SS</code> in the <strong>stats</strong> package, which are:</p>
<pre><code> [1] &quot;SSasymp&quot;     &quot;SSasympOff&quot;  &quot;SSasympOrig&quot; &quot;SSbiexp&quot;     &quot;SSfol&quot;      
 [6] &quot;SSfpl&quot;       &quot;SSgompertz&quot;  &quot;SSlogis&quot;     &quot;SSmicmen&quot;    &quot;SSweibull&quot;  </code></pre>
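<p>The list above can be reproduced with a short base R query. This is a sketch: the regular expression is our own choice, and the <code>[[:lower:]]</code> part is there to exclude <code>SSD</code>, an unrelated <strong>stats</strong> function for multivariate linear models whose name also starts with <code>SS</code>.</p>

```r
# Objects in the attached stats package whose names start with "SS"
# followed by a lowercase letter (this excludes stats::SSD)
ss_models <- grep("^SS[[:lower:]]+", ls("package:stats"), value = TRUE)
ss_models

# Each of them should be a self-starting model object of class "selfStart"
all(vapply(ss_models,
           function(nm) inherits(get(nm, envir = asNamespace("stats")), "selfStart"),
           logical(1)))
```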
<p>To fit essentially the same model using Stan’s implementation of MCMC, we add a <code>stan_</code> prefix:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb10-1" title="1">post1 &lt;-<span class="st"> </span><span class="kw">stan_nlmer</span>(circumference <span class="op">~</span><span class="st"> </span><span class="kw">SSlogis</span>(age, Asym, xmid, scal) <span class="op">~</span><span class="st"> </span>Asym<span class="op">|</span>Tree,</a>
<a class="sourceLine" id="cb10-2" title="2">                    <span class="dt">data =</span> Orange, <span class="dt">cores =</span> <span class="dv">2</span>, <span class="dt">seed =</span> <span class="dv">12345</span>, <span class="dt">init_r =</span> <span class="fl">0.5</span>)</a></code></pre></div>
<div class="sourceCode" id="cb11"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb11-1" title="1">post1</a></code></pre></div>
<pre><code>stan_nlmer
 family:       gaussian [inv_SSlogis]
 formula:      circumference ~ SSlogis(age, Asym, xmid, scal) ~ Asym | Tree
 observations: 35
------
     Median MAD_SD
Asym 1.9    0.1   
xmid 7.2    0.4   
scal 3.4    0.3   

Auxiliary parameter(s):
      Median MAD_SD
sigma 0.1    0.0   

Error terms:
 Groups   Name Std.Dev.
 Tree     Asym 0.310   
 Residual      0.089   
Num. levels: Tree 5 

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg</code></pre>
<p>In <code>stan_nlmer</code>, it is not necessary to supply starting values; however, in this case it was necessary to specify the <code>init_r</code> argument so that the randomly chosen starting values were no more than <span class="math inline">\(0.5\)</span> away from zero (in the unconstrained parameter space). The default value of <span class="math inline">\(2.0\)</span> produced suboptimal results.</p>
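<p>To see what <code>init_r</code> controls, recall that Stan’s random initialization draws each unconstrained parameter uniformly from <span class="math inline">\((-r, r)\)</span>, where <span class="math inline">\(r\)</span> is <code>init_r</code> (2 by default), and <strong>rstanarm</strong> passes <code>init_r</code> through to <strong>rstan</strong>. The base R sketch below mimics that rule for a single positive parameter, such as a standard deviation, which Stan maps to the unconstrained scale with a log transform. The helper <code>draw_positive_inits</code> is hypothetical and only illustrates the rule; it is not <strong>rstanarm</strong> code.</p>

```r
set.seed(12345)

# Hypothetical helper: mimic Stan's random inits for a parameter constrained
# to be positive, i.e. Uniform(-init_r, init_r) on the log scale, then exp()
draw_positive_inits <- function(n, init_r) {
  exp(runif(n, min = -init_r, max = init_r))
}

default_inits <- draw_positive_inits(1000, init_r = 2)    # default init_r
narrow_inits  <- draw_positive_inits(1000, init_r = 0.5)  # as in post1

# Implied bounds on the constrained (positive) scale
round(exp(c(-2, 2)), 3)      # 0.135 7.389
round(exp(c(-0.5, 0.5)), 3)  # 0.607 1.649

range(default_inits)
range(narrow_inits)
```

<p>A smaller <code>init_r</code> keeps every chain’s starting point in a narrower region, which can help when some starting points far from zero lead the sampler into numerically difficult parts of the parameter space.</p>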
<p>Comparing the two printouts, the posterior medians and MAD_SDs in the MCMC case are quite similar to the maximum likelihood estimates and estimated standard errors. However, <code>stan_nlmer</code> also produces uncertainty estimates for the tree-specific deviations in the asymptote, and these are considerable.</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb13-1" title="1"><span class="kw">plot</span>(post1, <span class="dt">regex_pars =</span> <span class="st">&quot;^[b]&quot;</span>)</a></code></pre></div>
<p><img src="" width="60%" style="display: block; margin: auto;" /></p>
<p>The age of the tree has a nonlinear effect on its predicted circumference, as the posterior predictive distribution for an out-of-sample tree shows:</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb14-1" title="1"><span class="kw">library</span>(ggplot2)</a>
<a class="sourceLine" id="cb14-2" title="2"><span class="co"># ages 1 to 20 (age / 100 scale) for a sixth tree that is not in the data</span></a>
<a class="sourceLine" id="cb14-3" title="3">nd &lt;-<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">age =</span> <span class="dv">1</span><span class="op">:</span><span class="dv">20</span>, <span class="dt">Tree =</span> <span class="kw">factor</span>(<span class="st">&quot;6&quot;</span>, <span class="dt">levels =</span> <span class="dv">1</span><span class="op">:</span><span class="dv">6</span>))</a>
<a class="sourceLine" id="cb14-4" title="4"><span class="co"># matrix of posterior predictive draws: one row per draw, one column per row of nd</span></a>
<a class="sourceLine" id="cb14-5" title="5">PPD &lt;-<span class="st"> </span><span class="kw">posterior_predict</span>(post1, <span class="dt">newdata =</span> nd)</a>
<a class="sourceLine" id="cb14-6" title="6"><span class="co"># long format for plotting: one row per (draw, age) pair</span></a>
<a class="sourceLine" id="cb14-7" title="7">PPD_df &lt;-<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">age =</span> <span class="kw">as.factor</span>(<span class="kw">rep</span>(<span class="dv">1</span><span class="op">:</span><span class="dv">20</span>, <span class="dt">each =</span> <span class="kw">nrow</span>(PPD))),</a>
<a class="sourceLine" id="cb14-8" title="8">                     <span class="dt">circumference =</span> <span class="kw">c</span>(PPD))</a>
<a class="sourceLine" id="cb14-9" title="9"><span class="kw">ggplot</span>(PPD_df, <span class="kw">aes</span>(age, circumference)) <span class="op">+</span><span class="st"> </span><span class="kw">geom_boxplot</span>()</a></code></pre></div>
<p><img src="" width="60%" style="display: block; margin: auto;" /></p>
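<p>The idea behind predicting for a tree that is not in the data can be sketched in base R. This is a simplified plug-in approximation, not what <code>posterior_predict</code> does internally: it fixes <code>Asym</code>, <code>xmid</code>, <code>scal</code>, and the two standard deviations at the rounded values printed for <code>post1</code> above, whereas <code>posterior_predict</code> repeats the calculation for every posterior draw and so also propagates uncertainty in those parameters. For each simulated draw, a new tree-level deviation in the asymptote is drawn from a normal distribution with mean zero, and Gaussian noise is added to the resulting logistic curve.</p>

```r
set.seed(12345)

# Rounded values taken from the printed post1 summary (plug-in, not posterior draws)
Asym    <- 1.9    # population-level asymptote
xmid    <- 7.2
scal    <- 3.4
sd_tree <- 0.310  # Std.Dev. of the tree-specific deviation in Asym
sigma   <- 0.089  # residual Std.Dev.

age     <- 1:20   # same ages as nd (age / 100 scale)
n_draws <- 4000

# One deviation per simulated draw for the new tree, shared across all ages
b_new <- rnorm(n_draws, mean = 0, sd = sd_tree)

# Rows = draws, columns = ages, mirroring the layout of posterior_predict()
sim <- t(vapply(b_new, function(b) {
  mu <- SSlogis(age, Asym + b, xmid, scal)        # tree-specific mean curve
  mu + rnorm(length(age), mean = 0, sd = sigma)   # add residual noise
}, numeric(length(age))))

# Median and 90% interval of simulated circumference at each age
round(apply(sim, 2, quantile, probs = c(0.05, 0.5, 0.95)), 2)
```

<p>Because the new tree’s deviation is unknown, the spread of the simulated circumferences grows with age along with the curve itself, which is the pattern the boxplots display.</p>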
<p>If we were pharmacologists, we could model drug concentration using a first-order compartment model, such as:</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb15-1" title="1">post3 &lt;-<span class="st"> </span><span class="kw">stan_nlmer</span>(conc <span class="op">~</span><span class="st"> </span><span class="kw">SSfol</span>(Dose, Time, lKe, lKa, lCl) <span class="op">~</span><span class="st"> </span></a>
<a class="sourceLine" id="cb15-2" title="2"><span class="st">                    </span>(<span class="dv">0</span> <span class="op">+</span><span class="st"> </span>lKe <span class="op">+</span><span class="st"> </span>lKa <span class="op">+</span><span class="st"> </span>lCl <span class="op">|</span><span class="st"> </span>Subject), <span class="dt">data =</span> Theoph,</a>
<a class="sourceLine" id="cb15-3" title="3">                    <span class="dt">cores =</span> <span class="dv">2</span>, <span class="dt">seed =</span> <span class="dv">12345</span>, </a>
<a class="sourceLine" id="cb15-4" title="4">                    <span class="dt">QR =</span> <span class="ot">TRUE</span>, <span class="dt">init_r =</span> <span class="fl">0.25</span>, <span class="dt">adapt_delta =</span> <span class="fl">0.999</span>)</a>
<a class="sourceLine" id="cb15-5" title="5"><span class="kw">pairs</span>(post3, <span class="dt">regex_pars =</span> <span class="st">&quot;^l&quot;</span>)</a>
<a class="sourceLine" id="cb15-6" title="6"><span class="kw">pairs</span>(post3, <span class="dt">regex_pars =</span> <span class="st">&quot;igma&quot;</span>)</a></code></pre></div>
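<p>In this case, however, the posterior distribution is bimodal, and a single chain can easily spend all of its iterations near one mode without revealing the other. Thus, you should always run many chains when using Stan, especially with <code>stan_nlmer</code>.</p>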
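<p>For reference, the <strong>stats</strong> documentation for <code>SSfol</code> gives the first-order compartment model as <code>Dose * exp(lKe + lKa - lCl) * (exp(-exp(lKe) * input) - exp(-exp(lKa) * input)) / (exp(lKa) - exp(lKe))</code>, where <code>lKe</code>, <code>lKa</code>, and <code>lCl</code> are the natural logarithms of the elimination rate constant, the absorption rate constant, and the clearance. The sketch below checks that formula against <code>SSfol</code> for one subject of the <code>Theoph</code> data; the parameter values are arbitrary assumptions, not estimates.</p>

```r
# One subject from the Theoph data (datasets package, attached by default)
subj1 <- Theoph[Theoph$Subject == "1", ]

# Arbitrary illustrative log-scale parameters (not estimates)
lKe <- -2.5  # log elimination rate constant
lKa <-  0.5  # log absorption rate constant
lCl <- -3.2  # log clearance

# First-order compartment model written out by hand
fol_by_hand <- with(subj1,
  Dose * exp(lKe + lKa - lCl) *
    (exp(-exp(lKe) * Time) - exp(-exp(lKa) * Time)) /
    (exp(lKa) - exp(lKe)))

# Same model via stats::SSfol(Dose, input, lKe, lKa, lCl)
fol_stats <- as.numeric(SSfol(subj1$Dose, subj1$Time, lKe, lKa, lCl))

all.equal(fol_by_hand, fol_stats)  # TRUE
```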
</div>
<div id="conclusion" class="section level1">
<h1>Conclusion</h1>
<p>There are model-fitting functions in the <strong>rstanarm</strong> package that can do essentially all of what can be done in the <strong>lme4</strong> and <strong>gamm4</strong> packages (in the sense that they can fit models with multilevel structure and/or nonlinear relationships) and that also propagate the uncertainty in the parameter estimates to the predictions and other functions of interest. The documentation of <strong>lme4</strong> and <strong>gamm4</strong> has various warnings acknowledging that the estimated standard errors, confidence intervals, etc. are not entirely correct, even from a frequentist perspective.</p>
<p>A frequentist point estimate would also completely miss the second mode in the last example with <code>stan_nlmer</code>. Thus, there is considerable reason to prefer the <strong>rstanarm</strong> variants of these functions for regression modeling. The only disadvantage is the execution time required to produce an answer that properly captures the uncertainty in the estimates of complicated models such as these.</p>
</div>



<!-- code folding -->


<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    script.src  = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>

</body>
</html>