1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201
|
---
title: Model components for fitted models with plm
author:
- name: Yves Croissant
date: '`r Sys.Date()`'
output: rmarkdown::html_vignette
bibliography: ../inst/REFERENCES.bib
vignette: >
%\VignetteIndexEntry{Model components for fitted models with plm}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, echo=FALSE}
library("knitr")
opts_chunk$set(message = FALSE, warning = FALSE)
```
plm tries to follow as close as possible the way models are fitted
using `lm`. This relies on the following steps, using the
`formula`-`data` with some modifications:
- compute internally the `model.frame` by getting the relevant
arguments (`formula`, `data`, `subset`, `weights`, `na.action` and
`offset`) and the supplementary argument,
- extract from the `model.frame` the response `y` (with `pmodel.response`) and
the model matrix `X` (with `model.matrix`),
- call the (non-exported) estimation function `plm.fit` with `X` and `y` as
arguments.
Panel data has a special structure which is described by an
`index` argument. This argument can be used in the `pdata.frame` function which
returns a `pdata.frame` object. A `pdata.frame` can be used as input to the `data`
argument of `plm`. If the `data` argument of `plm` is an ordinary
`data.frame`, the `index` argument can also be supplied as an argument of
`plm`. In this case, the `pdata.frame` function is called internally to
transform the data.
Next, the `formula`, which is the first and mandatory argument of
`plm` is coerced to a `Formula` object.
`model.frame` is then called, but with the `data` argument in the
first position (a `pdata.frame` object) and the `formula` in the
second position. This unusual order of the arguments enables to use a
specific `model.frame.pdata.frame` method defined in `plm`.
As for the `model.frame.formula` method, a `data.frame` is returned,
with a `terms` attribute.
Next, the `X` matrix is extracted using `model.matrix`. The usual way
to do so is to feed the function with two arguments, a `formula` or a
`terms` object and a `data.frame` created with `model.frame`. `lm` uses
something like `model.matrix(terms(mf), mf)` where `mf` is a
`data.frame` created with `model.frame`. Therefore, `model.matrix`
needs actually one argument and not two and we therefore wrote a
`model.matrix.pdata.frame` which does the job ; the method first checks
that the argument has a `term` attribute, extracts the `terms`
(actually the `formula`) and then computes the model's matrix `X`.
The response `y` is usually extracted using `model.response`, with a
`data.frame` created with `model.frame` as first argument, but it is
not generic. We therefore created a generic called `pmodel.response`
and provide a `pmodel.response.pdata.frame` method. We illustrate
these features using a simplified (in terms of covariates) example
with the `SeatBelt` data set:
```{r }
library("plm")
data("SeatBelt", package = "pder")
SeatBelt$occfat <- with(SeatBelt, log(farsocc / (vmtrural + vmturban)))
pSB <- pdata.frame(SeatBelt)
```
We start with an OLS (pooling) specification:
```{r }
formols <- occfat ~ log(usage) + log(percapin)
mfols <- model.frame(pSB, formols)
Xols <- model.matrix(mfols)
y <- pmodel.response(mfols)
coef(lm.fit(Xols, y))
```
which is equivalent to:
```{r }
coef(plm(formols, SeatBelt, model = "pooling"))
```
Next, we use an instrumental variables specification. Variable `usage` is
endogenous and instrumented by three variables indicating the law
context: `ds`, `dp`, and `dsp`.
The model is described using a two-parts formula, the first part of
the RHS describing the covariates and the second part the
instruments. The following two formulations can be used:
```{r }
formiv1 <- occfat ~ log(usage) + log(percapin) | log(percapin) + ds + dp + dsp
formiv2 <- occfat ~ log(usage) + log(percapin) | . - log(usage) + ds + dp + dsp
```
The second formulation has two advantages:
- in the common case when a lot of covariates are instruments, these
covariates don't need to be indicated in the second RHS part of the
formula,
- the endogenous variables clearly appear as they are proceeded by a
`-` sign in the second RHS part of the formula.
The formula is coerced to a `Formula`, using the `Formula`
package. `model.matrix.pdata.frame` then internally calls
`model.matrix.Formula` in order to extract the covariates and
instruments model matrices:
```{r }
mfSB1 <- model.frame(pSB, formiv1)
X1 <- model.matrix(mfSB1, rhs = 1)
W1 <- model.matrix(mfSB1, rhs = 2)
head(X1, 3) ; head(W1, 3)
```
For the second (and preferred formulation), the `dot` argument should
be set and is passed to the `Formula` methods. `.` has actually two
meanings:
- all available covariates,
- the previous covariates used while updating a formula.
which correspond respectively to `dot = "seperate"` (the default) and
`dot = "previous"`. See the difference between the following two examples:
```{r }
library("Formula")
head(model.frame(Formula(formiv2), SeatBelt), 3)
head(model.frame(Formula(formiv2), SeatBelt, dot = "previous"), 3)
```
In the first case, all the covariates are returned by `model.frame` as
the `.` is understood by default as "everything".
In `plm`, the `dot` argument is internally set to
`previous` so that the end-user doesn't have to worry about these
subtleties.
```{r }
mfSB2 <- model.frame(pSB, formiv2)
X2 <- model.matrix(mfSB2, rhs = 1)
W2 <- model.matrix(mfSB2, rhs = 2)
head(X2, 3) ; head(W2, 3)
```
The IV estimator can then be obtained as a 2SLS estimator: First,
regress the covariates on the instruments and get the fitted values:
```{r }
HX1 <- lm.fit(W1, X1)$fitted.values
head(HX1, 3)
```
Next, regress the response on these fitted values:
```{r }
coef(lm.fit(HX1, y))
```
The same can be achieved in one command by using the `formula`-`data` interface
with `plm`:
```{r }
coef(plm(formiv1, SeatBelt, model = "pooling"))
```
or with the `ivreg` function from package `AER` (or with the newer function `ivreg`
in package `ivreg` superseding `AER::ivreg()`):
```{r }
coef(AER::ivreg(formiv1, data = SeatBelt))
```
```{r eval = FALSE, include = FALSE}
X2 <- model.matrix(Formula(form1), mfSB, rhs = 2, dot = "previous")
formols <- occfat ~ log(usage) + log(percapin) | . - log(usage) + ds + dp + dsp
form1 <- occfat ~ log(usage) + log(percapin) + log(unemp) + log(meanage) +
log(precentb) + log(precenth) + log(densrur) + log(densurb) +
log(viopcap) + log(proppcap) + log(vmtrural) + log(vmturban) +
log(fueltax) + lim65 + lim70p + mlda21 + bac08
form2 <- . ~ . | . - log(usage) + ds + dp +dsp
jorm1 <- occfat ~ log(usage) + log(percapin) + log(unemp) + log(meanage) +
log(precentb) + log(precenth) + log(densrur) + log(densurb) +
log(viopcap) + log(proppcap) + log(vmtrural) + log(vmturban) +
log(fueltax) + lim65 + lim70p + mlda21 + bac08 | . - log(usage) +
ds + dp + dsp
jorm2 <- noccfat ~ . | .
```
|