---
title: "Bias reduction in generalized linear models"
author: "[Ioannis Kosmidis](https://www.ikosmidis.com), Euloge Clovis Kenne Pagui"
date: "29 April 2017"
output: rmarkdown::html_vignette
bibliography: brglm2.bib
vignette: >
%\VignetteIndexEntry{Bias reduction in generalized linear models}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width = 6,
fig.height = 6
)
```
# The **brglm2** package
[**brglm2**](https://github.com/ikosmidis/brglm2) provides tools for estimation and inference in [generalized linear models](https://en.wikipedia.org/wiki/Generalized_linear_model) using various methods for bias reduction or maximum penalized likelihood with powers of the Jeffreys prior as penalty. Reduction of estimation bias is achieved either through the mean bias-reducing adjusted score equations in @firth:93 and @kosmidis:09, through the direct subtraction of an estimate of the bias of the maximum likelihood estimator from the maximum likelihood estimates as prescribed in @cordeiro:91, or through the median bias-reducing adjusted score equations in @kenne:17.
In the special case of generalized linear models for binomial, Poisson and multinomial responses (both nominal and ordinal), mean and median bias reduction and maximum penalized likelihood return estimates with improved frequentist properties that are also always finite, even in cases where the maximum likelihood estimates are infinite, as under the complete and quasi-complete separation settings defined in @albert:84.
The workhorse function is [`brglmFit()`](https://cran.r-project.org/package=brglm2), which can be passed directly to the `method` argument of the `glm` function. `brglmFit` implements a quasi-[Fisher scoring](https://en.wikipedia.org/wiki/Scoring_algorithm) procedure, whose special cases result in various explicit and implicit bias reduction methods for generalized linear models [the classification of bias reduction methods into explicit and implicit is given in @kosmidis:14].
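For a first illustration, the chunk below fits a logistic regression to the `endometrial` data set that ships with **brglm2**, a well-known example of quasi-complete separation, first by maximum likelihood and then with mean bias reduction. The fitted objects `modML` and `modBR` are reused in the short numerical sketches later in this vignette.
```{r}
library("brglm2")
data("endometrial", package = "brglm2")
modML <- glm(HG ~ NV + PI + EH, family = binomial("logit"),
             data = endometrial)
modBR <- update(modML, method = "brglmFit", type = "AS_mean")
## The ML estimate of the NV coefficient is, in effect, infinite;
## the mean bias-reduced estimates are all finite
coef(modML)
coef(modBR)
```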
# This vignette
This vignette
+ presents the supported bias-reducing adjustments to the score functions for generalized linear models
+ describes the fitting algorithm at the core of **brglm2**
# Other resources
The bias-reducing quasi-Fisher scoring iteration is also described in detail in the [bias vignette](https://cran.r-project.org/package=enrichwith/vignettes/bias.html) of the [**enrichwith**](https://cran.r-project.org/package=enrichwith) R package. @kosmidis:10 describe a parallel quasi-[Newton-Raphson](https://en.wikipedia.org/wiki/Newton%27s_method) procedure.
Most of the material in this vignette comes from a presentation by [Ioannis Kosmidis](https://www.ikosmidis.com) at the useR! 2016 international conference at Stanford University on 16 June 2016. The presentation was titled "Reduced-bias inference in generalized linear models" and can be watched online at this [link](https://channel9.msdn.com/Events/useR-international-R-User-conference/useR2016/brglm-Reduced-bias-inference-in-generalized-linear-models).
# Generalized linear models
### Model
Suppose that $y_1, \ldots, y_n$ are observations on independent random variables $Y_1, \ldots, Y_n$, each with probability density/mass function of the form
$$
f_{Y_i}(y) = \exp\left\{\frac{y \theta_i - b(\theta_i) - c_1(y)}{\phi/m_i} - \frac{1}{2}a\left(-\frac{m_i}{\phi}\right) + c_2(y) \right\}
$$
for some sufficiently smooth functions $b(.)$, $c_1(.)$, $a(.)$ and $c_2(.)$, and fixed observation weights $m_1, \ldots, m_n$. The expected value and the variance of $Y_i$ are then
\begin{align*}
E(Y_i) & = \mu_i = b'(\theta_i) \\
Var(Y_i) & = \frac{\phi}{m_i}b''(\theta_i) = \frac{\phi}{m_i}V(\mu_i)
\end{align*}
Hence, in this parameterization, $\phi$ is a dispersion parameter.
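In R, the variance function $V(\mu)$ is readily available as the `variance` component of `family` objects. For example,
```{r}
binomial()$variance(0.5)  # V(mu) = mu (1 - mu)
poisson()$variance(2)     # V(mu) = mu
Gamma()$variance(2)       # V(mu) = mu^2
```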
A generalized linear model links the mean $\mu_i$ to a linear predictor $\eta_i$ as
$$
g(\mu_i) = \eta_i = \sum_{t=1}^p \beta_t x_{it}
$$
where $g(.)$ is a monotone, sufficiently smooth link function, taking values on $\Re$, $x_{it}$ is the $(i, t)$th component of a model matrix $X$, and $\beta = (\beta_1, \ldots, \beta_p)^\top$.
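The link function, its inverse, and the derivative $d\mu_i/d\eta_i$ that appears repeatedly below are all available in R through `make.link`. For example, for the logit link,
```{r}
logit <- make.link("logit")
logit$linkfun(0.25)  # g(mu) = log(mu / (1 - mu))
logit$linkinv(0)     # g^{-1}(eta) = 1 / (1 + exp(-eta))
logit$mu.eta(0)      # d mu / d eta = exp(eta) / (1 + exp(eta))^2
```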
### Score functions and information matrix
Suppressing the dependence of the various quantities on the model parameters and the data, the derivatives of the log-likelihood with respect to $\beta$ and $\phi$ (the score functions) are
\begin{align*}
s_\beta & = \frac{1}{\phi}X^\top W D^{-1}(y - \mu) \\
s_\phi & = \frac{1}{2\phi^2}\sum_{i = 1}^n (q_i - \rho_i)
\end{align*}
with $y = (y_1, \ldots, y_n)^\top$, $\mu = (\mu_1, \ldots, \mu_n)^\top$, $W = {\rm diag}\left\{w_1, \ldots, w_n\right\}$ and $D = {\rm diag}\left\{d_1, \ldots, d_n\right\}$, where $w_i = m_i d_i^2/v_i$ is the $i$th working weight, with $d_i = d\mu_i/d\eta_i$ and $v_i = V(\mu_i)$. Furthermore, $q_i = -2 m_i \{y_i\theta_i - b(\theta_i) - c_1(y_i)\}$ and $\rho_i = m_i a'_i$ with $a'_i = a'(-m_i/\phi)$.
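As a sanity check on the notation, the following minimal sketch computes $s_\beta$ from a fitted `glm` object; `beta_score` is a hypothetical helper written for this vignette, not part of **brglm2**. At the maximum likelihood estimates the score vector is zero up to the convergence tolerance of `glm`.
```{r}
beta_score <- function(fit, phi = 1) {
  X <- model.matrix(fit)
  mu <- fitted(fit)
  d <- fit$family$mu.eta(fit$linear.predictors)   # d mu / d eta
  v <- fit$family$variance(mu)                    # V(mu)
  w <- fit$prior.weights * d^2 / v                # working weights
  drop(crossprod(X, w * (fit$y - mu) / d)) / phi  # X' W D^{-1} (y - mu) / phi
}
beta_score(modML)
```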
The expected information matrix about $\beta$ and $\phi$ is
$$
i =
\left[
\begin{array}{cc}
i_{\beta\beta} & 0_p \\
0_p^\top & i_{\phi\phi}
\end{array}
\right]
=
\left[
\begin{array}{cc}
\frac{1}{\phi} X^\top W X & 0_p \\
0_p^\top & \frac{1}{2\phi^4}\sum_{i = 1}^n m_i^2 a''_i
\end{array}
\right]\,,
$$
where $0_p$ is a $p$-vector of zeros, and $a''_i = a''(-m_i/\phi)$.
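The $\beta$ block of the expected information is equally direct to compute from a fitted `glm` object, whose `weights` component stores the working weights $w_i$; again, `beta_expected_info` is a hypothetical helper for illustration only.
```{r}
beta_expected_info <- function(fit, phi = 1) {
  X <- model.matrix(fit)
  crossprod(X, fit$weights * X) / phi  # X' W X / phi
}
## The inverse of i_betabeta is the usual estimated covariance matrix of
## beta; compare with vcov(modML)
solve(beta_expected_info(modML))
```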
### Maximum likelihood estimation
The maximum likelihood estimators $\hat\beta$ and $\hat\phi$ of $\beta$ and $\phi$, respectively, can be found by solving the score equations $s_\beta = 0_p$ and $s_\phi = 0$.
### Mean bias-reducing adjusted score functions
Let $A_\beta = -i_{\beta\beta} b_\beta$ and $A_\phi = -i_{\phi\phi} b_\phi$, where $b_\beta$ and $b_\phi$ are the first terms in the expansion of the mean bias of the maximum likelihood estimator of the regression parameters $\beta$ and dispersion $\phi$, respectively. The results in @firth:93 can be used to show that the solution of the adjusted score equations
\begin{align*}
s_\beta + A_\beta & = 0_p \\
s_\phi + A_\phi & = 0
\end{align*}
results in estimators $\tilde\beta$ and $\tilde\phi$ with bias of smaller asymptotic order than the maximum likelihood estimator.
The results in either @kosmidis:09 or @cordeiro:91 can then be used to re-express the adjustments in forms that are convenient for implementation. In particular, after some algebra, the mean bias-reducing adjustments for generalized linear models are
\begin{align*}
A_\beta & = X^\top W \xi \,, \\
A_\phi & = \frac{(p - 2)}{2\phi} + \frac{\sum_{i = 1}^n m_i^3 a'''_i}{2\phi^2\sum_{i = 1}^n m_i^2 a''_i}
\end{align*}
where $\xi = (\xi_1, \ldots, \xi_n)^\top$ with $\xi_i = h_id_i'/(2d_iw_i)$, $d_i' = d^2\mu_i/d\eta_i^2$, $a''_i = a''(-m_i/\phi)$, $a'''_i = a'''(-m_i/\phi)$, and $h_i$ is the "hat" value for the $i$th observation (see, e.g., `?hatvalues`).
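For the logit link, $d_i = \mu_i (1 - \mu_i)$, $d_i' = d_i (1 - 2 \mu_i)$ and $w_i = m_i d_i$, so $w_i \xi_i = h_i (1/2 - \mu_i)$ and $A_\beta$ recovers the familiar form of the adjustment in @firth:93 for logistic regression. Below is a minimal, logit-specific sketch; `mean_bias_adjustment` is a hypothetical helper, not part of **brglm2**.
```{r}
mean_bias_adjustment <- function(fit) {
  X <- model.matrix(fit)
  mu <- fitted(fit)
  w <- fit$weights                  # working weights m_i d_i^2 / v_i
  d <- mu * (1 - mu)                # d mu / d eta for the logit link
  dprime <- d * (1 - 2 * mu)        # d^2 mu / d eta^2
  xi <- hatvalues(fit) * dprime / (2 * d * w)
  drop(crossprod(X, w * xi))        # A_beta = X' W xi
}
mean_bias_adjustment(modML)
```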
### Median bias-reducing adjusted score functions
The results in @kenne:17 can be used to show that if
\begin{align*}
A_\beta & = X^\top W (\xi + X u) \\
A_\phi & = \frac{p}{2\phi}+\frac{ \sum_{i = 1}^n m_i^3
a'''_i}{6\phi^2\sum_{i = 1}^n m_i^2 a''_i} \, ,
\end{align*}
then the solution of the adjusted score equations $s_\beta + A_\beta = 0_p$ and $s_\phi + A_\phi = 0$ results in estimators $\tilde\beta$ and $\tilde\phi$ with median bias of smaller asymptotic order than the maximum likelihood estimator. In the above expression, $u = (u_1, \ldots, u_p)^\top$ with
\begin{align*}
u_j = [(X^\top W X)^{-1}]_{j}^\top X^\top \left[
\begin{array}{c}
\tilde{h}_{j,1} \left\{d_1 v'_1 / (6 v_1) - d'_1/(2 d_1)\right\} \\
\vdots \\
\tilde{h}_{j,n} \left\{d_n v'_n / (6 v_n) - d'_n/(2 d_n)\right\}
\end{array}
\right]
\end{align*}
where $[A]_j$ denotes the $j$th row of matrix $A$ as a column vector, $v'_i = V'(\mu_i)$, and $\tilde{h}_{j,i}$ is the $i$th diagonal element of $X K_j X^\top W$, with $K_j = [(X^\top W X)^{-1}]_{j} [(X^\top W X)^{-1}]_{j}^\top / [(X^\top W X)^{-1}]_{jj}$.
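Despite its appearance, $u$ is short to transcribe. For the logit link the bracketed term simplifies to $-(1 - 2\mu_i)/3$, because $d_i v_i'/v_i = d_i'/d_i = 1 - 2\mu_i$. The following logit-specific sketch (`median_bias_u` is a hypothetical helper) computes $u$ from a fitted `glm` object:
```{r}
median_bias_u <- function(fit) {
  X <- model.matrix(fit)
  mu <- fitted(fit)
  w <- fit$weights                         # working weights
  Finv <- solve(crossprod(X, w * X))       # (X' W X)^{-1}
  zeta <- -(1 - 2 * mu) / 3                # d v'/(6 v) - d'/(2 d) for logit
  sapply(seq_len(ncol(X)), function(j) {
    Kj <- tcrossprod(Finv[, j]) / Finv[j, j]
    htilde <- rowSums((X %*% Kj) * X) * w  # diagonal of X K_j X' W
    sum(Finv[, j] * crossprod(X, htilde * zeta))
  })
}
median_bias_u(modML)
```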
### Mixed adjustments
The results in @kosmidis:2019 can be used to show that if
\begin{align*}
A_\beta & = X^\top W \xi \,, \\
A_\phi & = \frac{p}{2\phi}+\frac{ \sum_{i = 1}^n m_i^3
a'''_i}{6\phi^2\sum_{i = 1}^n m_i^2 a''_i} \, ,
\end{align*}
then the solution of the adjusted score equations $s_\beta + A_\beta = 0_p$ and $s_\phi + A_\phi = 0$ results in estimators $\tilde\beta$ with mean bias of smaller asymptotic order than the maximum likelihood estimator and $\tilde\phi$ with median bias of smaller asymptotic order than the maximum likelihood estimator.
### Maximum penalized likelihood with powers of the Jeffreys prior as penalty
The likelihood penalized by a power of the Jeffreys prior
\[
|i_{\beta\beta}|^a |i_{\phi\phi}|^a \quad a > 0
\]
can be maximized by solving the adjusted score equations $s_\beta + A_\beta = 0_p$ and $s_\phi + A_\phi = 0$ with
\begin{align*}
A_\beta & = X^\top W \rho \,, \\
A_\phi & = -\frac{p + 4}{2\phi}+\frac{ \sum_{i = 1}^n m_i^3
a'''_i}{6\phi^2\sum_{i = 1}^n m_i^2 a''_i} \, ,
\end{align*}
where $\rho = (\rho_1, \ldots, \rho_n)^\top$ with $\rho_i = h_i \{2 d_i'/(d_i w_i) - v_i' d_i/(v_i w_i)\}$.
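Once more, the logit link simplifies things considerably: $2 d_i'/(d_i w_i) - v_i' d_i/(v_i w_i) = (1 - 2\mu_i)/w_i$, so that, with `jeffreys_adjustment` a hypothetical helper,
```{r}
jeffreys_adjustment <- function(fit) {
  X <- model.matrix(fit)
  mu <- fitted(fit)
  w <- fit$weights
  rho <- hatvalues(fit) * (1 - 2 * mu) / w
  drop(crossprod(X, w * rho))      # A_beta = X' W rho
}
jeffreys_adjustment(modML)
```
This is twice the mean bias-reducing adjustment computed earlier; indeed, for logistic regression, penalization by the Jeffreys prior ($a = 1/2$) is equivalent to mean bias reduction [@firth:93].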
# Fitting algorithm in `brglmFit`
`brglmFit()` implements a quasi-Fisher scoring procedure for solving the adjusted score equations $s_\beta + A_\beta = 0_p$ and $s_\phi + A_\phi = 0$. The iteration consists of an outer loop and an inner loop that implements step-halving. The algorithm is as follows:
### Input
+ $s_\beta$, $i_{\beta\beta}$, $A_\beta$
+ $s_\phi$, $i_{\phi\phi}$, $A_\phi$
+ Starting values $\beta^{(0)}$ and $\phi^{(0)}$
+ $\epsilon > 0$: tolerance for the $L^\infty$ norm of the search direction before reporting convergence
+ $M$: maximum number of halving steps that can be taken within each inner loop
+ $K$: maximum number of outer iterations
### Output
+ $\tilde\beta$, $\tilde\phi$
### Iteration
*Initialize outer loop*
1. $k \leftarrow 0$
2. $\upsilon_\beta^{(0)} \leftarrow \left\{i_{\beta\beta}\left(\beta^{(0)}, \phi^{(0)}\right)\right\}^{-1} \left\{s_\beta\left(\beta^{(0)}, \phi^{(0)}\right) + A_\beta\left(\beta^{(0)}, \phi^{(0)}\right)\right\}$
3. $\upsilon_\phi^{(0)} \leftarrow \left\{i_{\phi\phi}\left(\beta^{(0)}, \phi^{(0)}\right)\right\}^{-1} \left\{s_\phi\left(\beta^{(0)}, \phi^{(0)}\right) + A_\phi\left(\beta^{(0)}, \phi^{(0)}\right)\right\}$
*Initialize inner loop*
4. $m \leftarrow 0$
5. $b^{(m)} \leftarrow \beta^{(k)}$
6. $f^{(m)} \leftarrow \phi^{(k)}$
7. $v_\beta^{(m)} \leftarrow \upsilon_\beta^{(k)}$
8. $v_\phi^{(m)} \leftarrow \upsilon_\phi^{(k)}$
9. $d \leftarrow \left|\left|(v_\beta^{(m)}, v_\phi^{(m)})\right|\right|_\infty$
*Update parameters*
10. $b^{(m + 1)} \leftarrow b^{(m)} + 2^{-m} v_\beta^{(m)}$
11. $f^{(m + 1)} \leftarrow f^{(m)} + 2^{-m} v_\phi^{(m)}$
*Update direction*
12. $v_\beta^{(m + 1)} \leftarrow \left\{i_{\beta\beta}\left(b^{(m + 1)}, f^{(m + 1)}\right)\right\}^{-1} \left\{s_\beta\left(b^{(m + 1)}, f^{(m + 1)}\right) + A_\beta\left(b^{(m + 1)}, f^{(m + 1)}\right)\right\}$
13. $v_\phi^{(m + 1)} \leftarrow \left\{i_{\phi\phi}\left(b^{(m + 1)}, f^{(m + 1)}\right)\right\}^{-1} \left\{s_\phi\left(b^{(m + 1)}, f^{(m + 1)}\right) + A_\phi\left(b^{(m + 1)}, f^{(m + 1)}\right)\right\}$
*Continue or break halving within inner loop*
14. if $m + 1 < M$ and $\left|\left|(v_\beta^{(m + 1)}, v_\phi^{(m + 1)})\right|\right|_\infty > d$
14.1. $m \leftarrow m + 1$
14.2. GO TO 10
15. else
15.1. $\beta^{(k + 1)} \leftarrow b^{(m + 1)}$
15.2. $\phi^{(k + 1)} \leftarrow f^{(m + 1)}$
15.3. $\upsilon_\beta^{(k + 1)} \leftarrow v_\beta^{(m + 1)}$
15.4. $\upsilon_\phi^{(k + 1)} \leftarrow v_\phi^{(m + 1)}$
*Continue or break outer loop*
16. if $k + 1 < K$ and $\left|\left|(\upsilon_\beta^{(k + 1)}, \upsilon_\phi^{(k + 1)})\right|\right|_\infty > \epsilon$
16.1. $k \leftarrow k + 1$
16.2. GO TO 4
17. else
17.1. $\tilde\beta \leftarrow \beta^{(k + 1)}$
17.2. $\tilde\phi \leftarrow \phi^{(k + 1)}$
17.3. STOP
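For concreteness, the iteration for models with known dispersion, where only $\beta$ is updated, can be transcribed into a few lines of R. The sketch below is purely illustrative and is not the **brglm2** implementation; `adjusted_score` and `expected_info` stand for functions that return $s_\beta + A_\beta$ and $i_{\beta\beta}$ at a given $\beta$.
```{r}
quasi_fisher <- function(beta0, adjusted_score, expected_info,
                         K = 100, M = 25, epsilon = 1e-10) {
  beta <- beta0
  v <- solve(expected_info(beta), adjusted_score(beta))  # steps 1-3
  for (k in seq_len(K)) {                                # outer loop
    b <- beta
    d <- max(abs(v))                                     # steps 4-9
    for (m in 0:(M - 1)) {                               # inner loop
      b <- b + 2^(-m) * v                                # steps 10-11
      v <- solve(expected_info(b), adjusted_score(b))    # steps 12-13
      if (max(abs(v)) <= d) break                        # step 14
    }
    beta <- b                                            # step 15
    if (max(abs(v)) <= epsilon) break                    # step 16
  }
  beta                                                   # step 17
}
```
With `K = M = 1` and `beta0` at the maximum likelihood estimates, a single full quasi-Fisher scoring step is taken, which is how the bias-corrected estimates in the first note below are computed.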
# Notes
+ For $K = M = 1$, $\beta^{(0)} = \hat\beta$ and $\phi^{(0)} = \hat\phi$, the above iteration computes the bias-corrected estimates proposed in @cordeiro:91. This is achieved when the `brglmFit()` function is called with `type = "correction"` (see `?brglmFit`).
+ The mean bias-reducing adjusted score equations are solved when the `brglmFit()` function is called with `type = "AS_mean"`, and the median bias-reducing adjusted score equations with `type = "AS_median"` (see `?brglmFit`). Estimation using mixed adjustments is through `type = "AS_mixed"`, and `type = "MPL_Jeffreys"` does maximum penalized likelihood with a power of the Jeffreys prior as penalty. A worked illustration of the various `type` values is given after these notes.
+ The steps where $\phi$ and the $\phi$ direction are updated are ignored for generalized linear models with known dispersion parameter, as is the case for models with binomial and Poisson responses. In that case, $v_\phi^{(.)}$ and $\upsilon_\phi^{(.)}$ in steps 9, 14 and 16 are set to zero.
+ The implementation of the adjusted score functions requires ready implementations of $d^2\mu_i/d\eta_i^2$, $a'(.)$, $a''(.)$ and $a'''(.)$. The [**enrichwith**](https://cran.r-project.org/package=enrichwith) R package is used internally to enrich the base `family` and `link-glm` objects with implementations of those functions (see `?enrich.family` and `?enrich.link-glm`).
+ The above iteration can be used to implement a variety of additive adjustments to the score function, by supplying the algorithm with appropriate adjustment functions $A_\beta$ and $A_\phi$.
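To illustrate the `type` argument, the chunk below refits the endometrial model from the beginning of this vignette with each of the methods above. Note that `type = "correction"` relies on finite maximum likelihood estimates, so its output is not reliable under the quasi-complete separation in these data.
```{r}
types <- c("AS_mean", "AS_median", "AS_mixed", "MPL_Jeffreys", "correction")
estimates <- sapply(types, function(ty)
  coef(update(modML, method = "brglmFit", type = ty)))
round(estimates, 3)
```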
# Contributions to this vignette
The first version of this vignette was written by Ioannis Kosmidis. Euloge Clovis Kenne Pagui and Nicola Sartori contributed the first version of the section "Median bias-reducing adjusted score functions", and Ioannis Kosmidis brought the expressions for the median bias-reducing adjustments into the reduced form that is shown above and is implemented in `brglmFit()`.
@kosmidis:2019 provides more details about mean and median bias reduction in generalized linear models.
# Citation
If you found this vignette or **brglm2** useful, please consider citing **brglm2** and the associated paper. You can find information on how to do this by typing `citation("brglm2")`.
# References