File: C_plmModelComponents.Rmd

package info (click to toggle)
r-cran-plm 2.6-2%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 6,032 kB
  • sloc: sh: 13; makefile: 4
file content (201 lines) | stat: -rw-r--r-- 7,139 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
---
title: Model components for fitted models with plm
author:
- name: Yves Croissant
date: '`r Sys.Date()`'
output: rmarkdown::html_vignette
bibliography: ../inst/REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{Model components for fitted models with plm}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, echo=FALSE}
library("knitr")
opts_chunk$set(message = FALSE, warning = FALSE)
```

plm tries to follow as close as possible the way models are fitted
using `lm`. This relies on the following steps, using the
`formula`-`data` with some modifications:

- compute internally the `model.frame` by getting the relevant
  arguments (`formula`, `data`, `subset`, `weights`, `na.action` and
  `offset`) and the supplementary argument,
- extract from the `model.frame` the response `y` (with `pmodel.response`) and
  the model matrix `X` (with `model.matrix`),
- call the (non-exported) estimation function `plm.fit` with `X` and `y` as
  arguments.



Panel data has a special structure which is described by an
`index` argument. This argument can be used in the `pdata.frame` function which
returns a `pdata.frame` object. A `pdata.frame` can be used as input to the `data`
argument of `plm`. If the `data` argument of `plm` is an ordinary
`data.frame`, the `index` argument can also be supplied as an argument of
`plm`. In this case, the `pdata.frame` function is called internally to
transform the data.

Next, the `formula`, which is the first and mandatory argument of
`plm` is coerced to a `Formula` object. 

`model.frame` is then called, but with the `data` argument in the
first position (a `pdata.frame` object) and the `formula` in the
second position. This unusual order of the arguments enables to use a
specific `model.frame.pdata.frame` method defined in `plm`.

As for the `model.frame.formula` method, a `data.frame` is returned,
with a `terms` attribute.

Next, the `X` matrix is extracted using `model.matrix`. The usual way
to do so is to feed the function with two arguments, a `formula` or a
`terms` object and a `data.frame` created with `model.frame`. `lm` uses
something like `model.matrix(terms(mf), mf)` where `mf` is a
`data.frame` created with `model.frame`. Therefore, `model.matrix`
needs actually one argument and not two and we therefore wrote a
`model.matrix.pdata.frame` which does the job ; the method first checks
that the argument has a `term` attribute, extracts the `terms`
(actually the `formula`) and then computes the model's matrix `X`.

The response `y` is usually extracted using `model.response`, with a
`data.frame` created with `model.frame` as first argument, but it is
not generic. We therefore created a generic called `pmodel.response`
and provide a `pmodel.response.pdata.frame` method. We illustrate
these features using a simplified (in terms of covariates) example
with the `SeatBelt` data set:

```{r }
library("plm")
data("SeatBelt", package = "pder")
SeatBelt$occfat <- with(SeatBelt, log(farsocc / (vmtrural + vmturban)))
pSB <- pdata.frame(SeatBelt)
```

We start with an OLS (pooling) specification: 

```{r }
formols <- occfat ~ log(usage) + log(percapin)
mfols <- model.frame(pSB, formols)
Xols <- model.matrix(mfols)
y <- pmodel.response(mfols)
coef(lm.fit(Xols, y))
```

which is equivalent to:

```{r }
coef(plm(formols, SeatBelt, model = "pooling"))
```

Next, we use an instrumental variables specification. Variable `usage` is
endogenous and instrumented by three variables indicating the law
context: `ds`, `dp`, and `dsp`. 

The model is described using a two-parts formula, the first part of
the RHS describing the covariates and the second part the
instruments. The following two formulations can be used:

```{r }
formiv1 <- occfat ~ log(usage) + log(percapin) | log(percapin) + ds + dp + dsp
formiv2 <- occfat ~ log(usage) + log(percapin) | . - log(usage) + ds + dp + dsp
```
The second formulation has two advantages:

- in the common case when a lot of covariates are instruments, these
  covariates don't need to be indicated in the second RHS part of the
  formula,
- the endogenous variables clearly appear as they are proceeded by a
  `-` sign in the second RHS part of the formula.

The formula is coerced to a `Formula`, using the `Formula`
package. `model.matrix.pdata.frame` then internally calls 
`model.matrix.Formula` in order to extract the covariates and
instruments model matrices:


```{r }
mfSB1 <- model.frame(pSB, formiv1)
X1 <- model.matrix(mfSB1, rhs = 1)
W1 <- model.matrix(mfSB1, rhs = 2)
head(X1, 3) ; head(W1, 3)
```
For the second (and preferred formulation), the `dot` argument should
be set and is passed to the `Formula` methods. `.` has actually two
meanings: 

- all available covariates,
- the previous covariates used while updating a formula.

which correspond respectively to `dot = "seperate"` (the default) and
`dot = "previous"`. See the difference between the following two examples:


```{r }
library("Formula")
head(model.frame(Formula(formiv2), SeatBelt), 3)
head(model.frame(Formula(formiv2), SeatBelt, dot = "previous"), 3)
```
In the first case, all the covariates are returned by `model.frame` as
the `.` is understood by default as "everything".

In `plm`, the `dot` argument is internally set to
`previous` so that the end-user doesn't have to worry about these
subtleties. 

```{r }
mfSB2 <- model.frame(pSB, formiv2)
X2 <- model.matrix(mfSB2, rhs = 1)
W2 <- model.matrix(mfSB2, rhs = 2)
head(X2, 3) ; head(W2, 3)
```
The IV estimator can then be obtained as a 2SLS estimator: First,
regress the covariates on the instruments and get the fitted values:

```{r }
HX1 <- lm.fit(W1, X1)$fitted.values
head(HX1, 3)
```

Next, regress the response on these fitted values:

```{r }
coef(lm.fit(HX1, y))
```

The same can be achieved in one command by using the `formula`-`data` interface
with `plm`:

```{r }
coef(plm(formiv1, SeatBelt, model = "pooling"))
```


or with the `ivreg` function from package `AER` (or with the newer function `ivreg`
in package `ivreg` superseding `AER::ivreg()`):

```{r }
coef(AER::ivreg(formiv1, data = SeatBelt))
```



```{r eval = FALSE, include = FALSE}
X2 <- model.matrix(Formula(form1), mfSB, rhs = 2, dot = "previous")

formols <- occfat ~ log(usage) + log(percapin)  | . - log(usage) +  ds + dp + dsp

form1 <- occfat ~ log(usage) + log(percapin) + log(unemp) + log(meanage) + 
    log(precentb) + log(precenth) + log(densrur) + log(densurb) + 
    log(viopcap) + log(proppcap) + log(vmtrural) + log(vmturban) + 
    log(fueltax) + lim65 + lim70p + mlda21 + bac08
form2 <- . ~ . |  . - log(usage) + ds + dp +dsp

jorm1 <- occfat ~ log(usage) + log(percapin) + log(unemp) + log(meanage) + 
    log(precentb) + log(precenth) + log(densrur) + log(densurb) + 
    log(viopcap) + log(proppcap) + log(vmtrural) + log(vmturban) + 
    log(fueltax) + lim65 + lim70p + mlda21 + bac08 | . - log(usage) + 
    ds + dp + dsp
jorm2 <- noccfat ~ . | .
```