\chapter{Discrete and censored dependent variables}
\label{chap:probit}

This chapter deals with models for dependent variables that are
discrete or censored or otherwise limited (as in event counts or
durations, which must be positive) and that therefore call for
estimation methods other than the classical linear model. We discuss
several estimators (mostly based on the Maximum Likelihood principle),
adding some details and examples to complement the material on
these methods in the \emph{Gretl Command Reference}.

\section{Logit and probit models}
\label{sec:logit-probit}

It often happens that one wants to specify and estimate a model in
which the dependent variable is not continuous, but discrete. A
typical example is a model in which the dependent variable is the
occupational status of an individual (1 = employed, 0 = unemployed). A
convenient way of formalizing this situation is to consider the
variable $y_i$ as a Bernoulli random variable and analyze its
distribution conditional on the explanatory variables $x_i$.  That is,
%
\begin{equation}
  \label{eq:qr-Bernoulli}
  y_i = \left\{ 
    \begin{array}{ll} 
      1 & P_i \\ 0 & 1 - P_i 
    \end{array}
    \right.
\end{equation}
%
where $P_i = P(y_i = 1 | x_i) $ is a given function of the explanatory
variables $x_i$.

In most cases, the function $P_i$ is a cumulative distribution
function $F$, applied to a linear combination of the $x_i$s. In the
probit model, the normal cdf is used, while the logit model employs
the logistic function $\Lambda()$. Therefore, we have
%
\begin{eqnarray}
  \label{eq:qr-link}
  \textrm{probit} & \qquad & P_i = F(z_i) = \Phi(z_i)  \\
  \textrm{logit}  & \qquad & P_i = F(z_i) = \Lambda(z_i) = \frac{1}{1 + e^{-z_i}} \\
  & &z_i = \sum_{j=1}^k x_{ij} \beta_j
\end{eqnarray}
%
where $z_i$ is commonly known as the \emph{index} function. Note that
in this case the coefficients $\beta_j$ cannot be interpreted as the
partial derivatives of $E(y_i | x_i)$ with respect to
$x_{ij}$.  However, for a given value of $x_i$ it is possible to
compute the vector of ``slopes'', that is
\[
  \mathrm{slope}_j(\bar{x}) = \left. \pder{F(z)}{x_j} \right|_{z =
    \bar{z}}
\]
Gretl automatically computes the slopes, setting each
explanatory variable at its sample mean.
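
For the probit case, for instance, the slope for regressor $j$ at the
sample means is just $\phi(\bar{z}) \hat{\beta}_j$, where $\bar{z}$ is
the index evaluated at the means of the regressors. As a check, the
following sketch (using the \texttt{greene19\_1} dataset employed in
Listing~\ref{simple-QR} below) should reproduce, up to rounding, the
slope of \texttt{GPA} reported in Table \ref{tab:logit-probit}:
\begin{code}
# compute the probit "slope at the mean" for GPA by hand
open greene19_1
probit GRADE const GPA TUCE PSI
matrix b = $coeff
scalar zbar = b[1] + b[2]*mean(GPA) + b[3]*mean(TUCE) + b[4]*mean(PSI)
printf "slope of GPA at the mean: %g\n", dnorm(zbar) * b[2]
\end{code}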

Another, equivalent way of thinking about this model is in terms of
an unobserved variable $y^*_i$ which can be described thus:
%
\begin{equation}
  \label{eq:qr-latent}
  y^*_i = \sum_{j=1}^k x_{ij} \beta_j + \varepsilon_i = z_i  +
  \varepsilon_i 
\end{equation}
%
We observe $y_i = 1$ whenever $y^*_i > 0$ and $y_i = 0$ otherwise. If
$\varepsilon_i$ is assumed to be normal, then we have the probit
model. The logit model arises if we assume that the density function
of $\varepsilon_i$ is
%
\[
  \lambda(\varepsilon_i) =
  \pder{\Lambda(\varepsilon_i)}{\varepsilon_i} =
  \frac{e^{-\varepsilon_i}}{(1 + e^{-\varepsilon_i})^2}
\]

Both the probit and logit models are estimated in gretl via
maximum likelihood, where the log-likelihood can be written as
\begin{equation}
  \label{eq:qr-loglik}
  L(\beta) = \sum_{y_i=0} \ln [ 1 - F(z_i)] + \sum_{y_i=1} \ln F(z_i),
\end{equation}
which is always negative, since $0 < F(\cdot) < 1$.  Since the score
equations do not have a closed-form solution, numerical optimization
is used; in most cases this is entirely transparent to the user, as
usually only a few iterations are needed to ensure convergence. The
\option{verbose} switch can be used to track the maximization
algorithm.

\begin{script}[htbp]
  \caption{Estimation of simple logit and probit models}
  \label{simple-QR}
\begin{scode}
open greene19_1

logit GRADE const GPA TUCE PSI
probit GRADE const GPA TUCE PSI
\end{scode}
\end{script}

As an example, we reproduce the results given in chapter 21 of
\cite{greene00}, where the effectiveness of a program for teaching
economics is evaluated by the improvement in students' grades.
Running the code in Listing \ref{simple-QR} gives the output reported
in Table \ref{tab:logit-probit}. Note that for the probit model a
conditional moment test on skewness and kurtosis \citep*{JBL84} is
printed automatically as a test for normality.

\begin{table}
  \centering
\begin{code}
Model 1: Logit estimates using the 32 observations 1-32
Dependent variable: GRADE

      VARIABLE       COEFFICIENT        STDERROR      T STAT       SLOPE
                                                                  (at mean)
  const               -13.0213           4.93132      -2.641
  GPA                   2.82611          1.26294       2.238      0.533859   
  TUCE                  0.0951577        0.141554      0.672      0.0179755  
  PSI                   2.37869          1.06456       2.234      0.449339   

  Mean of GRADE = 0.344
  Number of cases 'correctly predicted' = 26 (81.2%)
  f(beta'x) at mean of independent vars = 0.189
  McFadden's pseudo-R-squared = 0.374038
  Log-likelihood = -12.8896
  Likelihood ratio test: Chi-square(3) = 15.4042 (p-value 0.001502)
  Akaike information criterion (AIC) = 33.7793
  Schwarz Bayesian criterion (BIC) = 39.6422
  Hannan-Quinn criterion (HQC) = 35.7227

           Predicted
             0    1
  Actual 0  18    3
         1   3    8

Model 2: Probit estimates using the 32 observations 1-32
Dependent variable: GRADE

      VARIABLE       COEFFICIENT        STDERROR      T STAT       SLOPE
                                                                  (at mean)
  const                -7.45232          2.54247      -2.931
  GPA                   1.62581          0.693883      2.343      0.533347   
  TUCE                  0.0517288        0.0838903     0.617      0.0169697  
  PSI                   1.42633          0.595038      2.397      0.467908   

  Mean of GRADE = 0.344
  Number of cases 'correctly predicted' = 26 (81.2%)
  f(beta'x) at mean of independent vars = 0.328
  McFadden's pseudo-R-squared = 0.377478
  Log-likelihood = -12.8188
  Likelihood ratio test: Chi-square(3) = 15.5459 (p-value 0.001405)
  Akaike information criterion (AIC) = 33.6376
  Schwarz Bayesian criterion (BIC) = 39.5006
  Hannan-Quinn criterion (HQC) = 35.581

           Predicted
             0    1
  Actual 0  18    3
         1   3    8

Test for normality of residual -
  Null hypothesis: error is normally distributed
  Test statistic: Chi-square(2) = 3.61059
  with p-value = 0.164426
\end{code}
  \caption{Example logit and probit}
  \label{tab:logit-probit}
\end{table}

In this context, the \dollar{uhat} accessor takes on a special
meaning: it returns generalized residuals as defined in
\citet*{gourieroux87}, which can be interpreted as unbiased estimators
of the latent disturbances $\varepsilon_i$. These are defined as
%
\begin{equation}
  \label{eq:QR-genres}
  u_i = \left\{
    \begin{array}{ll}
      y_i - \hat{P}_i & \textrm{for the logit model} \\
      y_i\cdot \frac{\phi(\hat{z}_i)}{\Phi(\hat{z}_i)} - 
      ( 1 - y_i ) \cdot \frac{\phi(\hat{z}_i)}{1 - \Phi(\hat{z}_i)}
      & \textrm{for the probit model} \\
    \end{array}
    \right.
\end{equation}

Generalized residuals are often used for diagnostic purposes.  For
example, it is very easy to set up an omitted-variables test
equivalent to the familiar LM test in the context of a linear
regression; Listing \ref{QR-add} shows how to perform a variable
addition test.

\begin{script}[htbp]
  \caption{Variable addition test in a probit model}
  \label{QR-add}
\begin{scode}
open greene19_1

probit GRADE const GPA PSI
series u = $uhat 
ols u const GPA PSI TUCE -q
printf "Variable addition test for TUCE:\n"
printf "Rsq * T = %g (p. val. = %g)\n", $trsq, pvalue(X,1,$trsq) 
\end{scode}
\end{script}
%$

\subsection{The perfect prediction problem}
\label{sec:perfpred}

One curious characteristic of logit and probit models is that (quite
paradoxically) estimation is not feasible if a model fits the data
perfectly; this is called the \emph{perfect prediction problem}. The
reason why this problem arises is easy to see by considering equation
(\ref{eq:qr-loglik}): if for some vector $\beta$ and scalar $k$ it is
the case that $z_i < k$ whenever $y_i=0$ and $z_i > k$ whenever
$y_i=1$, the same holds for any positive multiple of $\beta$. Hence,
$L(\beta)$ can be made arbitrarily close to 0 simply by choosing
enormous values for $\beta$; as a consequence, the log-likelihood has
no maximum, despite being bounded above by zero.

Gretl has a mechanism for preventing the algorithm from
iterating endlessly in search of a non-existent maximum. One sub-case
of interest is when the perfect prediction problem arises because of
a single binary explanatory variable. In this case, the offending
variable is dropped from the model and estimation proceeds with the
reduced specification. Nevertheless, it may happen that no single
``perfect classifier'' exists among the regressors, in which case
estimation is simply impossible and the algorithm stops with an
error. This behavior is triggered during the iteration process if
\[
  \stackunder{i: y_i = 0}{\max z_i} \, < \,
  \stackunder{i: y_i = 1}{\min z_i}  
\]
If this happens, unless your model is trivially mis-specified (such
as predicting whether a country is an oil exporter on the basis of oil
revenues), it is normally a small-sample problem: you probably just
don't have enough data to estimate your model, and you may want to
drop some of your explanatory variables.

This problem is well analyzed in \cite{stokes04}; the results therein
are replicated in the example script \texttt{murder\_rates.inp}. 
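
As a minimal artificial illustration of the single binary ``perfect
classifier'' case discussed above, the following sketch constructs a
dummy that predicts the outcome exactly; gretl should detect the
problem and drop \texttt{d} before estimating (the exact message
printed may vary):
\begin{code}
nulldata 50
set seed 101
series x = normal()
series d = (normal() > 0)   # a binary regressor
series y = d                # d is a perfect classifier for y
probit y const x d          # d should be flagged and dropped
\end{code}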

\section{Ordered response models}
\label{sec:ordered}

These models constitute a simple variation on ordinary logit/probit
models, and are usually applied when the dependent variable is a
discrete and ordered measurement---not simply binary, but on an
ordinal rather than an interval scale.  For example, this sort of
model may be applied when the dependent variable is a qualitative
assessment such as ``Good'', ``Average'' and ``Bad''.  

In the general case, consider an ordered response variable, $y$, that
can take on any of the $J+1$ values ${0,1,2,\dots,J}$.  We suppose, as
before, that underlying the observed response is a latent variable,
\[
  y^* = X\beta + \varepsilon = z + \varepsilon
\]
Now define ``cut points'', $\alpha_1 < \alpha_2 < \cdots < \alpha_J$,
such that
%
\begin{equation*}
  \begin{array}{ll}
    y = 0 & \textrm{if } \, y^* \leq \alpha_1 \\
    y = 1 & \textrm{if } \, \alpha_1 < y^* \leq \alpha_2 \\
    \vdots \\
    y = J & \textrm{if } \, y^* > \alpha_J \\
  \end{array}
\end{equation*}
For example, if the response takes on three values there will be two
such cut points, $\alpha_1$ and $\alpha_2$. 

The probability that individual $i$ exhibits response $j$, conditional
on the characteristics $x_i$, is then given by
%
\begin{equation}
  \label{eq:QR-ordered}
  P(y_i = j \,|\, x_i) = \left\{
    \begin{array}{ll}
      P(y^* \leq \alpha_1 \,|\, x_i) = F(\alpha_1 - z_i) & \textrm{for }\, j = 0 \\
      P(\alpha_j < y^* \leq \alpha_{j+1} \,|\, x_i) = 
      F(\alpha_{j+1} - z_i) -  F(\alpha_j - z_i) & \textrm{for }\, 0 < j < J \\
      P(y^* > \alpha_J \,|\, x_i) = 1 -  F(\alpha_J - z_i) & \textrm{for }\, j = J 
    \end{array}
    \right.
\end{equation}
%
The unknown parameters $\alpha_j$ are estimated jointly with the
$\beta$s via maximum likelihood.  The $\hat{\alpha}_j$ estimates are
reported by gretl as \texttt{cut1}, \texttt{cut2} and so on. For
the probit variant, a conditional moment test for normality
constructed in the spirit of \cite{chesher-irish87} is also included.

Note that the $\alpha_j$ parameters can be shifted arbitrarily by
adding a constant to $z_i$, so the model is under-identified if there
is some linear combination of the explanatory variables which is
constant. The most obvious case in which this occurs is when the model
contains a constant term; for this reason, gretl automatically drops
the intercept if one is present. However, the user may inadvertently
specify a list of regressors that can be combined in such a way as to
produce a constant (for example, by using a full set of dummy
variables for a discrete factor). If this happens, gretl will also
drop the offending regressors.

In order to apply these models in gretl, the dependent variable
must either take on only non-negative integer values, or be explicitly
marked as discrete.  (If the variable has non-integer values, it
will be recoded internally.)  Note that gretl does not provide a
separate command for ordered models: the \texttt{logit} and
\texttt{probit} commands automatically estimate the ordered version if
the dependent variable is acceptable, but not binary.

\begin{script}[htbp]
  \caption{Ordered probit model}
  \label{ex:oprobit}
\begin{scode}
/*
  Replicate the results in Wooldridge, Econometric Analysis of Cross
  Section and Panel Data, section 15.10, using pension-plan data from
  Papke (AER, 1998).

  The dependent variable, pctstck (percent stocks), codes the asset
  allocation responses of "mostly bonds", "mixed" and "mostly stocks"
  as {0, 50, 100}.

  The independent variable of interest is "choice", a dummy indicating
  whether individuals are able to choose their own asset allocations.
*/

open pension.gdt

# demographic characteristics of participant
list DEMOG = age educ female black married
# dummies coding for income level
list INCOME = finc25 finc35 finc50 finc75 finc100 finc101

# Papke's OLS approach
ols pctstck const choice DEMOG INCOME wealth89 prftshr
# save the OLS choice coefficient 
choice_ols = $coeff(choice)

# estimate ordered probit
probit pctstck choice DEMOG INCOME wealth89 prftshr

k = $ncoeff
matrix b = $coeff[1:k-2]
a1 = $coeff[k-1]
a2 = $coeff[k]

/* 
   Wooldridge illustrates the 'choice' effect in the ordered probit 
   by reference to a single, non-black male aged 60, with 13.5 years 
   of education, income in the range $50K - $75K and wealth of $200K, 
   participating in a plan with profit sharing.
*/
matrix X = {60, 13.5, 0, 0, 0, 0, 0, 0, 1, 0, 0, 200, 1}

# with 'choice' = 0
scalar Xb = (0 ~ X) * b
P0 = cdf(N, a1 - Xb)
P50 = cdf(N, a2 - Xb) - P0
P100 = 1 - cdf(N, a2 - Xb)
E0 = 50 * P50 + 100 * P100

# with 'choice' = 1
Xb = (1 ~ X) * b
P0 = cdf(N, a1 - Xb)
P50 = cdf(N, a2 - Xb) - P0
P100 = 1 - cdf(N, a2 - Xb)
E1 = 50 * P50 + 100 * P100

printf "\nWith choice, E(y) = %.2f, without E(y) = %.2f\n", E1, E0
printf "Estimated choice effect via ML = %.2f (OLS = %.2f)\n", E1 - E0,
  choice_ols
\end{scode}
\end{script}

Listing \ref{ex:oprobit} reproduces the results presented in section
15.10 of \cite{wooldridge-panel}.  The question of interest in this
analysis is what difference it makes, to the allocation of assets
in pension funds, whether individual plan participants have a
choice in the matter.  The response variable is an ordinal measure of
the weight of stocks in the pension portfolio.  Having reported the
results of estimation of the ordered model, Wooldridge illustrates the
effect of the \texttt{choice} variable by reference to an ``average''
participant.  The example script shows how one can compute this effect
in gretl.  

After estimating ordered models, the \dollar{uhat} accessor yields
generalized residuals as in binary models; additionally, the
\dollar{yhat} accessor function returns $\hat{z}_i$, so it is
possible to compute an unbiased estimator of the latent variable
$y^*_i$ simply by adding the two together.
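
A minimal sketch, re-using the specification of Listing
\ref{ex:oprobit}:
\begin{code}
open pension.gdt
list DEMOG = age educ female black married
list INCOME = finc25 finc35 finc50 finc75 finc100 finc101
probit pctstck choice DEMOG INCOME wealth89 prftshr
# generalized residuals plus fitted index = estimate of latent y*
series ystar_hat = $yhat + $uhat
\end{code}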

\section{Multinomial logit}
\label{sec:mlogit}

When the dependent variable is not binary and does not have a natural
ordering, \emph{multinomial} models are used.  Multinomial logit is
supported in gretl via the \verb|--multinomial| option to the
\texttt{logit} command.  Simple models can also be handled via the
\texttt{mle} command (see chapter \ref{chap:mle}). We give here an
example of such a model.  Let the dependent variable, $y_i$, take on
integer values $0,1,\dots,p$.  The probability that $y_i = k$ is given
by
\[
  P(y_i = k |  x_i) = \frac{\exp(x_i \beta_k)}{\sum_{j=0}^p \exp(x_i \beta_j)}
\]
For the purpose of identification one of the outcomes must be taken as
the ``baseline''; it is usually assumed that $\beta_0 = 0$, in which case
\[
  P(y_i = k |  x_i) = \frac{\exp(x_i \beta_k)}{1 + \sum_{j=1}^p \exp(x_i \beta_j)} 
\]
and
\[
  P(y_i = 0 |  x_i) = \frac{1}{1 + \sum_{j=1}^p \exp(x_i \beta_j)} .
\]

\begin{script}[htbp]
  \caption{Multinomial logit}
  \label{ex:mlogit}
Input:
\begin{scodebit}
open keane.gdt
smpl (year=87) --restrict
logit status 0 educ exper expersq black --multinomial
\end{scodebit}
Output (selected portions):
\begin{scodebit}
Model 1: Multinomial Logit, using observations 1-1738 (n = 1717)
Missing or incomplete observations dropped: 21
Dependent variable: status
Standard errors based on Hessian

             coefficient   std. error      z      p-value 
  --------------------------------------------------------
  status = 2
  const      10.2779       1.13334       9.069    1.20e-19 ***
  educ       -0.673631     0.0698999    -9.637    5.57e-22 ***
  exper      -0.106215     0.173282     -0.6130   0.5399  
  expersq    -0.0125152    0.0252291    -0.4961   0.6199  
  black       0.813017     0.302723      2.686    0.0072   ***
  status = 3
  const       5.54380      1.08641       5.103    3.35e-07 ***
  educ       -0.314657     0.0651096    -4.833    1.35e-06 ***
  exper       0.848737     0.156986      5.406    6.43e-08 ***
  expersq    -0.0773003    0.0229217    -3.372    0.0007   ***
  black       0.311361     0.281534      1.106    0.2687  

Mean dependent var   2.691322   S.D. dependent var   0.573502
Log-likelihood      -907.8572   Akaike criterion     1835.714
Schwarz criterion    1890.198   Hannan-Quinn         1855.874

Number of cases 'correctly predicted' = 1366 (79.6%)
Likelihood ratio test: Chi-square(8) = 583.722 [0.0000]  
\end{scodebit}
\end{script}

Listing~\ref{ex:mlogit} reproduces Table 15.2 in
\cite{wooldridge-panel}, based on data on career choice from
\cite{keane97}.  The dependent variable is the occupational status of
an individual (1 = in school; 2 = not in school and not working; 3 =
working), and the explanatory variables are education and work
experience (linear and square) plus a ``black'' binary variable.  The
full data set is a panel; here the analysis is confined to a
cross-section for 1987. 
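
The fitted probabilities implied by the estimates can be
reconstructed directly from the formulas above. The following sketch
does so for a single observation, using a placeholder coefficient
matrix \texttt{B} (in practice its columns would hold the estimated
$\beta_j$ for the non-baseline outcomes):
\begin{code}
# x is 1 x k; B is k x p, one column per non-baseline outcome
matrix x = {1, 12, 4, 16, 0}   # illustrative values only
matrix B = mnormal(5, 2)       # placeholder for estimated coefficients
matrix num = exp(x*B)          # exp(x*beta_j), j = 1,...,p
scalar den = 1 + sumr(num)
matrix P = (1 ~ num) / den     # probabilities for outcomes 0,...,p
print P
\end{code}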

\section{Bivariate probit}
\label{sec:biprobit}

The bivariate probit model is simply a two-equation system in which
each equation is a probit model, but the two disturbance terms may not
be independent. In formulae,
\begin{eqnarray}
  y^*_{1,i} = \sum_{j=1}^{k_1} x_{ij} \beta_j + \varepsilon_{1,i}  & \qquad &
  y_{1,i}=1  \Longleftrightarrow y^*_{1,i}>0 \\
  y^*_{2,i} = \sum_{j=1}^{k_2} z_{ij} \gamma_j + \varepsilon_{2,i} & \qquad &
  y_{2,i}=1  \Longleftrightarrow y^*_{2,i}>0 \\
  \left[ \begin{array}{c}
      \varepsilon_{1,i} \\ \varepsilon_{2,i}
    \end{array} \right] \sim 
  N \left[ 0, \left( \begin{array}{cc}
      1 & \rho \\ \rho & 1
    \end{array} \right) \right] 
\end{eqnarray}

\begin{itemize}
\item The explanatory variables for the first equation ($x$) and for
  the second equation ($z$) may overlap.
\item Estimation is carried out via the \cmd{biprobit} command; a
  bare-bones usage sketch is shown below, and a worked example is
  contained in the \texttt{biprobit.inp} sample script.
\item After estimation, the \dollar{uhat} and \dollar{yhat} accessors
  return matrices rather than series.
\end{itemize}

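In the sketch below the variable names are purely illustrative; the
list after the semicolon holds the regressors for the second equation
(see the \texttt{biprobit.inp} script and the \GCR{} for details):
\begin{code}
# y1 and y2 are binary dependent variables
biprobit y1 y2 const x1 x2 ; const x1 z1
\end{code}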


\section{Panel estimators}
\label{sec:REprobit}

When your dataset is a panel, the traditional choice for binary
dependent variable models was, for many years, to use logit with fixed
effects and probit with random effects (see \ref{sec:FE-vs-RE} for a
brief discussion of this dichotomy in the context of linear
models). Nowadays, the choice is somewhat wider, but the two
traditional models are by and large what practitioners use as routine
tools.

Gretl provides FE logit as a function package\footnote{Although this
  may change in the near future.} and RE probit natively. Provided
your dataset has a panel structure, the latter estimator can be obtained
by adding the \option{random} option to the \cmd{probit} command:
\begin{code}
  probit depvar const indvar1 indvar2 --random
\end{code}
as exemplified in the \texttt{reprobit.inp} sample script. The
numerical technique used for this particular estimator is
\emph{Gauss-Hermite quadrature}, which we'll now briefly
describe. Generalizing equation \eqref{eq:qr-latent} to a panel
context, we get
\begin{equation}
  y^*_{i,t} = \sum_{j=1}^k x_{ijt} \beta_j + \alpha_i + \varepsilon_{i,t} = z_{i,t}  +
  \omega_{i,t} 
\end{equation}
in which we assume that the individual effect, $\alpha_i$, and the
disturbance term, $\varepsilon_{i,t}$, are mutually independent
zero-mean Gaussian random variables, with $V(\varepsilon_{i,t}) = 1$
and $V(\alpha_i) = \sigma^2_{\alpha}$. The composite error term,
$\omega_{i,t} = \alpha_i + \varepsilon_{i,t}$, is therefore a normal
random variable with mean zero and variance $1 +
\sigma^2_{\alpha}$. Because of
the individual effect, $\alpha_i$, observations for the same unit are
not independent; the likelihood therefore has to be evaluated on a
per-unit basis, as
\[
\LogLik_i = \log P\left[ y_{i,1}, y_{i,2}, \ldots , y_{i,T}\right]
\]
and there is no way to write the above probability as a product of
individual terms.

However, the above probability \emph{could} be written as a product if
we were to treat $\alpha_i$ as a constant; in that case we would have
\[
P\left[ y_{i,1}, \ldots , y_{i,T} \,|\, \alpha_i \right] = \prod_{t=1}^T 
\Phi\left[ 
(2 y_{i,t} - 1) \left( z_{i,t} + \alpha_i \right)
\right]
\]
so that we can compute $\LogLik_i$ by integrating $\alpha_i$ out
against its $N(0, \sigma^2_{\alpha})$ density, as
\[
\LogLik_i = \log E\left( P\left[ y_{i,1}, \ldots , y_{i,T} \,|\,
    \alpha_i \right] \right) =
\log \int_{-\infty}^{\infty} 
P\left[ y_{i,1}, \ldots , y_{i,T} \,|\, \alpha_i \right]
\frac{1}{\sigma_{\alpha}}
\varphi\!\left(\frac{\alpha_i}{\sigma_{\alpha}}\right)
\mathrm{d} \alpha_i 
\]

The technique known as Gauss--Hermite quadrature is simply a way of
approximating the above integral via a weighted sum of the integrand
evaluated at carefully chosen points:\footnote{Some have suggested
  using a more refined method called \emph{adaptive} Gauss--Hermite
  quadrature; this is not implemented in \app{gretl}.}
\[
\LogLik_i \simeq \log \sum_{k=1}^{m} 
P\left[ y_{i,1}, \ldots , y_{i,T} \,|\, \alpha_i = n_k \right] w_k
\]
where the numbers $n_k$ and $w_k$ are known as \emph{quadrature
  points} and \emph{weights}, respectively. Of course, accuracy
improves with higher values of $m$, but so does CPU usage. Note that
this technique can also be used in more general cases by using the
\cmd{quadtable()} function and the \cmd{mle} command via the apparatus
described in chapter \ref{chap:mle}. Here, however, the calculations
were hard-coded in C for maximal speed and efficiency.
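
As an illustration of the quadrature idea (not of gretl's internal
code), the sketch below uses \cmd{quadtable()} to approximate
$E\left[\Phi(a + sZ)\right]$ for $Z \sim N(0,1)$, whose exact value is
$\Phi\!\left(a/\sqrt{1+s^2}\right)$. We assume here that
\cmd{quadtable(m)} returns an $m\times 2$ matrix of Gauss--Hermite
nodes and weights adapted to the standard normal weight function;
check the function's documentation for the exact conventions.
\begin{code}
scalar a = 0.5
scalar s = 1.2
matrix Q = quadtable(32)    # assumed: nodes in col 1, weights in col 2
scalar approx = sumc(cnorm(a + s*Q[,1]) .* Q[,2])
scalar exact = cnorm(a/sqrt(1 + s^2))
printf "quadrature: %g, exact: %g\n", approx, exact
\end{code}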

Experience shows that a reasonable compromise can be achieved in most
cases by choosing $m$ on the order of 20 or so; \app{gretl} uses 32 as
a default value, but this can be changed via the \option{quadpoints}
option, as in
\begin{code}
  probit y const x1 x2 x3 --random --quadpoints=48
\end{code}

 
\section{The Tobit model}
\label{sec:tobit}

The Tobit model is used when the dependent variable of a model is
\emph{censored}. Assume a latent variable $y^*_i$ can be described
as
\[
  y^*_i = \sum_{j=1}^k x_{ij} \beta_j + \varepsilon_i ,
\]
where $\varepsilon_i \sim N(0,\sigma^2)$. If $y^*_i$ were observable,
the model's parameters could be estimated via ordinary least squares.
Suppose, however, that we observe $y_i$, defined as
%
\begin{equation}
  \label{eq:tobit}
  y_i = \left\{ 
    \begin{array}{ll}
      a & \mathrm{for} \quad y^*_i \le a \\
      y^*_i & \mathrm{for} \quad a < y^*_i < b \\ 
      b & \mathrm{for} \quad y^*_i \ge b 
    \end{array}
    \right. 
\end{equation}
In most cases found in the applied literature, $a=0$ and $b=\infty$,
so in practice negative values of $y^*_i$ are not observed and are
replaced by zeros.

In this case, regressing $y_i$ on the $x_i$s does not yield consistent
estimates of the parameters $\beta$, because the conditional mean
$E(y_i|x_i)$ is not equal to $\sum_{j=1}^k x_{ij} \beta_j$.  It can be
shown that restricting the sample to non-zero observations would not
yield consistent estimates either. The solution is to estimate the
parameters via maximum likelihood. The syntax is simply
\begin{code}
tobit depvar indvars
\end{code}
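
A simple simulation may help fix ideas: with data censored at zero,
OLS on the observed $y_i$ gives inconsistent estimates, while Tobit
should recover values close to the true parameters (1 and 2 by
construction in the sketch below).
\begin{code}
nulldata 200
set seed 20161
series x = normal()
series ystar = 1 + 2*x + normal()   # latent variable
series y = (ystar > 0) * ystar      # observed: censored at zero
ols y const x      # inconsistent estimates
tobit y const x    # ML estimates, consistent for (1, 2)
\end{code}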

As usual, progress of the maximization algorithm can be tracked via
the \option{verbose} switch, while \dollar{uhat} returns the
generalized residuals. Note that in this case the generalized residual
is defined as $\hat{u}_i = E(\varepsilon_i | y_i = 0)$ for censored
observations, so the familiar equality $\hat{u}_i = y_i - \hat{y}_i$
only holds for uncensored observations, that is, when $y_i>0$.

An important difference between the Tobit estimator and OLS is that
the consequences of non-normality of the disturbance term are much
more severe: non-normality implies inconsistency for the Tobit
estimator. For this reason, the output for the Tobit model includes
the \cite{chesher-irish87} normality test by default.

The general case in which $a$ is nonzero and/or $b$ is finite can be
handled by using the options \option{llimit} and \option{rlimit}. So,
for example,
\begin{code}
tobit depvar indvars --llimit=10
\end{code}
would tell \app{gretl} that the left bound $a$ is set to 10.

\section{Interval regression}
\label{sec:intreg}

The interval regression model arises when the dependent variable is
unobserved for some (possibly all) observations; what we observe
instead is an interval in which the dependent variable lies.  In other
words, the data generating process is assumed to be
\[
  y^*_i = x_i \beta + \epsilon_i
\] 
but we only know that $m_i \le y^*_i \le M_i$, where the interval may
be left- or right-unbounded (but not both). If $m_i = M_i$, we
effectively observe $y^*_i$ and no information loss occurs. In
practice, each observation belongs to one of four categories:
\begin{enumerate}
\item left-unbounded, when $m_i = -\infty$,
\item right-unbounded, when $M_i = \infty$,
\item bounded, when $-\infty < m_i < M_i <\infty$ and
\item point observations when $m_i = M_i$.
\end{enumerate}

It is interesting to note that this model bears similarities to other
models in several special cases:
\begin{itemize}
\item When all observations are point observations the model trivially
  reduces to the ordinary linear regression model.
\item The interval model can be thought of as an ordered probit model
  (see \ref{sec:ordered}) in which the cut points (the $\alpha_j$
  coefficients in eq. \ref{eq:QR-ordered}) are observed and don't need
  to be estimated.
\item The Tobit model (see \ref{sec:tobit}) is a special case of the
  interval model in which $m_i$ and $M_i$ do not depend on $i$, that
  is, the censoring limits are the same for all observations. As a
  matter of fact, gretl's \texttt{tobit} command is handled
  internally as a special case of the interval model.
\end{itemize}

The gretl command \texttt{intreg} estimates interval models by
maximum likelihood, assuming normality of the disturbance term
$\epsilon_i$.  Its syntax is
%
\begin{code}
intreg minvar maxvar X
\end{code}
%
where \texttt{minvar} contains the $m_i$ series, with \texttt{NA}s for
left-unbounded observations, and \texttt{maxvar} contains $M_i$, with
\texttt{NA}s for right-unbounded observations.  By default, standard
errors are computed using the negative inverse of the Hessian.  If the
\option{robust} flag is given, then QML or Huber--White standard
errors are calculated instead. In this case the estimated covariance
matrix is a ``sandwich'' of the inverse of the estimated Hessian and
the outer product of the gradient.

If the model specification contains regressors other than just a
constant, the output includes a chi-square statistic for testing the
joint null hypothesis that none of these regressors has any effect on
the outcome.  This is a Wald statistic based on the estimated
covariance matrix.  If you wish to construct a likelihood ratio test,
this is easily done by estimating both the full model and the null
model (containing only the constant), saving the log-likelihood in
both cases via the \dollar{lnl} accessor, and then referring twice the
difference between the two log-likelihoods to the chi-square
distribution with $k$ degrees of freedom, where $k$ is the number of
additional regressors (see the \texttt{pvalue} command in the
\GCR). Also included is a conditional moment normality test, similar
to those provided for the probit, ordered probit and Tobit models (see
above).  An example is contained in the sample script
\texttt{wtp.inp}, provided with the gretl distribution.
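
A sketch of the likelihood-ratio computation just described, using
artificial data along the lines of Listing \ref{ex:interval} below:
\begin{code}
nulldata 100
set seed 443
series x1 = normal()
series x2 = normal()
series ystar = 1 + x1 + 0.5*x2 + 0.3*normal()
series lo_b = floor(ystar)    # interval bounds bracketing y*
series hi_b = ceil(ystar)
intreg lo_b hi_b const x1 x2  # full model
scalar ll_full = $lnl
intreg lo_b hi_b const        # null model: constant only
scalar ll_null = $lnl
scalar LR = 2*(ll_full - ll_null)
pvalue X 2 LR                 # 2 extra regressors => 2 df
\end{code}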

\begin{script}[ht]
  \caption{Interval model on artificial data}
  \label{ex:interval}
Input:
\begin{scodebit}
nulldata 100
# generate artificial data
set seed 201449 
x = normal()
epsilon = 0.2*normal()
ystar = 1 + x + epsilon
lo_bound = floor(ystar)
hi_bound = ceil(ystar)

# run the interval model
intreg lo_bound hi_bound const x

# estimate ystar
gen_resid = $uhat
yhat = $yhat + gen_resid
corr ystar yhat 
\end{scodebit}
Output (selected portions):
\begin{scodebit}
Model 1: Interval estimates using the 100 observations 1-100
Lower limit: lo_bound, Upper limit: hi_bound

             coefficient   std. error   t-ratio    p-value 
  ---------------------------------------------------------
  const       0.993762     0.0338325     29.37    1.22e-189 ***
  x           0.986662     0.0319959     30.84    8.34e-209 ***

Chi-square(1)        950.9270   p-value              8.3e-209
Log-likelihood      -44.21258   Akaike criterion     94.42517
Schwarz criterion    102.2407   Hannan-Quinn         97.58824

sigma = 0.223273
Left-unbounded observations: 0
Right-unbounded observations: 0
Bounded observations: 100
Point observations: 0

...

corr(ystar, yhat) = 0.98960092
Under the null hypothesis of no correlation:
 t(98) = 68.1071, with two-tailed p-value 0.0000
\end{scodebit}
\end{script}

As with the probit and Tobit models, after a model has been estimated
the \dollar{uhat} accessor returns the generalized residual, which is
an estimate of $\epsilon_i$: more precisely, it equals $y_i - x_i
\hat{\beta}$ for point observations and $E(\epsilon_i| m_i, M_i, x_i)$
otherwise. Note that it is possible to compute an unbiased predictor
of $y^*_i$ by adding this estimate to $x_i \hat{\beta}$. Listing
\ref{ex:interval} shows an example. As a further similarity with
Tobit, the interval regression model may deliver inconsistent
estimates if the disturbances are non-normal; hence, the
\cite{chesher-irish87} test for normality is included by default here
too.


\section{Sample selection model}
\label{sec:heckit}

In the sample selection model (also known as ``Tobit II'' model),
there are two latent variables:
%
\begin{eqnarray}
  \label{eq:heckit1}
  y^*_i & = & \sum_{j=1}^k x_{ij} \beta_j + \varepsilon_i \\
  \label{eq:heckit2}
  s^*_i & = & \sum_{j=1}^p z_{ij} \gamma_j + \eta_i 
\end{eqnarray}
%
and the observation rule is given by
%
\begin{equation}
  \label{eq:tobitII}
  y_i = \left\{ 
    \begin{array}{ll} 
      y^*_i & \mathrm{for} \quad s^*_i > 0 \\ 
      \diamondsuit & \mathrm{for} \quad s^*_i \le 0 
    \end{array}
    \right. 
\end{equation}

In this context, the $\diamondsuit$ symbol indicates that for some
observations we simply do not have data on $y$: $y_i$ may be 0, or
missing, or anything else. A dummy variable $d_i$ is normally used to
set censored observations apart.

One of the most popular applications of this model in econometrics is
a wage equation coupled with a labor force participation equation: we
only observe the wage for the employed. If $y^*_i$ and $s^*_i$ were
(conditionally) independent, there would be no reason not to use OLS
for estimating equation (\ref{eq:heckit1}); otherwise, OLS does not
yield consistent estimates of the parameters $\beta_j$.

Since conditional independence between $y^*_i$ and $s^*_i$ is
equivalent to conditional independence between $\varepsilon_i$ and
$\eta_i$, one may model the co-dependence between $\varepsilon_i$ and
$\eta_i$ as 
\[
  \varepsilon_i = \lambda \eta_i + v_i ;
\]
substituting the above expression in (\ref{eq:heckit1}), you obtain
the model that is actually estimated:
\[
  y_i = \sum_{j=1}^k x_{ij} \beta_j + \lambda \hat{\eta}_i + v_i ,
\]
so the hypothesis that censoring does not matter is equivalent to the
hypothesis $H_0: \lambda = 0$, which can be easily tested.

The parameters can be estimated via maximum likelihood under the
assumption of joint normality of $\varepsilon_i$ and $\eta_i$;
however, a widely used alternative method yields the so-called
\emph{Heckit} estimator, named after \cite{heckman79}. The procedure
can be briefly outlined as follows: first, a probit model is fit on
equation (\ref{eq:heckit2}); next, the generalized residuals are
inserted in equation (\ref{eq:heckit1}) to correct for the effect of
sample selection.

Gretl provides the \texttt{heckit} command to carry out
estimation; its syntax is
%
\begin{code}
heckit y X ; d Z
\end{code}
%
where \texttt{y} is the dependent variable, \texttt{X} is a list of
regressors, \texttt{d} is a dummy variable holding 1 for uncensored
observations and \texttt{Z} is a list of explanatory variables for the
censoring equation.

Since in most cases maximum likelihood is the method of
choice, by default gretl computes ML estimates. The 2-step
Heckit estimates can be obtained by using the \option{two-step}
option. After estimation, the \dollar{uhat} accessor contains the
generalized residuals. As in the ordinary Tobit model, the residuals
equal the difference between actual and fitted $y_i$ only for
uncensored observations (those for which $d_i = 1$).

\begin{script}[htbp]
  \caption{Heckit model}
  \label{ex:heckit}
\begin{scode}
open mroz87.gdt

genr EXP2 = AX^2
genr WA2 = WA^2
genr KIDS = (KL6+K618)>0

# Greene's specification

list X = const AX EXP2 WE CIT
list Z = const WA WA2 FAMINC KIDS WE

heckit WW X ; LFP Z --two-step 
heckit WW X ; LFP Z 

# Wooldridge's specification

series NWINC = FAMINC - WW*WHRS
series lww = log(WW)
list X = const WE AX EXP2
list Z = X NWINC WA KL6 K618

heckit lww X ; LFP Z --two-step 
\end{scode}
\end{script}

Listing \ref{ex:heckit} shows two estimates from the dataset used in
\cite{mroz87}: the first one replicates Table 22.7 in
\cite{greene03},\footnote{Note that the estimates given by gretl
  do not coincide with those found in the printed volume.  They do,
  however, match those found on the errata web page for Greene's book:
  \url{http://pages.stern.nyu.edu/~wgreene/Text/Errata/ERRATA5.htm}.}
while the second one replicates table 17.1 in \cite{wooldridge-panel}.

\section{Count data}
\label{sec:count}

Here the dependent variable is assumed to be a non-negative integer,
so a probabilistic description of $y_i | x_i$ must hinge on some
discrete distribution. The most common model is the Poisson model, in
which
\begin{eqnarray*}
  P(y_i = Y | x_i) & = & e^{-\mu_i} \frac{\mu_i^Y}{Y!} \\
  \mu_i & = & \exp\left( \sum_j x_{ij} \beta_j \right)
\end{eqnarray*}

In some cases, an ``offset'' variable is needed: the count $y_i$
refers to events occurring over some span, and its expected value is
assumed to be proportional to the offset variable $n_i$. In the epidemiology
literature, the offset is known as ``population at risk''. In this
case, the model becomes
\[
  \mu_i = n_i \exp\left( \sum_j x_{ij} \beta_j \right)
\]
Another way to look at the offset variable is to consider its natural
log as just another explanatory variable whose coefficient is
constrained to be one.

Estimation is carried out by maximum likelihood and follows the syntax
\begin{code}
  poisson depvar indep
\end{code}
If an offset variable is needed, it has to be specified at the end of
the command, separated from the list of explanatory variables by a
semicolon, as in
\begin{code}
  poisson depvar indep ; offset
\end{code}
It should be noted that the \texttt{poisson} command does not use,
internally, the same optimization engines as most other gretl
commands, such as \cmd{arma} or \cmd{tobit}. As a consequence, some
details may differ: the \option{verbose} option will yield different
output and settings such as \texttt{bfgs\_toler} will not work.

\subsection{Overdispersion}

In the Poisson model, $E(y_i | x_i) = V(y_i | x_i) = \mu_i$, that is,
the conditional mean equals the conditional variance by
construction. In many cases, this feature is at odds with the data, as
the conditional variance is often larger than the mean; this
phenomenon is called ``overdispersion''. The output from the
\texttt{poisson} command includes a conditional moment test for
overdispersion (as per \citet{davidson-mackinnon04}, section 11.5),
which is printed automatically after estimation.

Overdispersion can be attributed to unmodeled heterogeneity between
individuals. Two data points with the same observable characteristics
$x_i = x_j$ may differ because of some unobserved scale factor $s_i
\ne s_j$ so that
\[
   E(y_i | x_i, s_i) = \mu_i s_i \ne \mu_j s_j = E(y_j | x_j, s_j)
\]
even though $\mu_i = \mu_j$. In other words, $y_i$ is a Poisson random
variable conditional on both $x_i$ and $s_i$, but since $s_i$ is
unobservable, the only distribution we can use, $P(y_i | x_i)$, will
not follow the Poisson distribution.

It is often assumed that $s_i$ can be represented as a gamma random
variable with mean 1 and variance $\alpha$: the parameter $\alpha$ is
estimated together with the vector $\beta$, and measures the degree of
heterogeneity between individuals.

In this case, the conditional probability for $y_i$ given
$x_i$ can be shown to be
  \begin{equation}
    \label{eq:negbin2}
  P(y_i | x_i) = 
  \frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(\alpha^{-1})\Gamma(y_i + 1)}
  \left[ \frac{\mu_i} {\mu_i + \alpha^{-1} }\right]^{y_i}
  \left[ \frac{\alpha^{-1}} {\mu_i + \alpha^{-1} }\right]^{\alpha^{-1}}
\end{equation}
which is known as the ``Negative Binomial Model''. The conditional
mean is still $E(y_i | x_i) = \mu_i$, but the variance equals $V(y_i |
x_i) = \mu_i \left( 1 + \mu_i \alpha \right)$. The gretl command
for this model is \texttt{negbin depvar indep}.

\begin{itemize}
\item There is also a less used variant of the negative binomial
  model, in which the conditional variance is a scalar multiple of the
  conditional mean, that is $V(y_i | x_i) = \mu_i \left( 1 + \gamma
  \right)$. To distinguish between the two, the model
  \eqref{eq:negbin2} is termed ``Type 2''.  Gretl implements
  model 1 via the option \option{model1}.
\item A script which exemplifies the above models is included among
  gretl's sample scripts, under the name \texttt{camtriv.inp}.
\end{itemize}


\section{Duration models}
\label{sec:duration}

In some contexts we wish to apply econometric methods to measurements
of the duration of certain states. Classic examples include the
following:
\begin{itemize}
\item From engineering, the ``time to failure'' of electronic or
  mechanical components: how long do, say, computer hard drives
  last until they malfunction?
\item From the medical realm: how does a new treatment affect the
  time from diagnosis of a certain condition to exit from that
  condition (where ``exit'' might mean death or full recovery)?
\item From economics: the duration of strikes, or of spells of
  unemployment.
\end{itemize}

In each case we may be interested in how the durations are
distributed, and how they are affected by relevant covariates.  There
are several approaches to this problem; the one we discuss
here---which is currently the only one supported by gretl---is
estimation of a parametric model by means of Maximum Likelihood.  In
this approach we hypothesize that the durations follow some definite
probability law and we seek to estimate the parameters of that law,
factoring in the influence of covariates.

We may express the density (PDF) of the durations as $f(t, X,
\theta)$, where $t$ is the length of time in the state in question,
$X$ is a matrix of covariates, and $\theta$ is a vector of parameters.
The likelihood for a sample of $n$ observations indexed by $i$ is then
\[
L = \prod_{i=1}^n f(t_i, x_i, \theta)
\]

Rather than working with the density directly, however, it is standard
practice to factor $f(\cdot)$ into two components, namely a
\emph{hazard function}, $\lambda$, and a \emph{survivor function},
$S$.  The survivor function gives the probability that a state lasts
at least as long as $t$; it is therefore $1 - F(t, X, \theta)$ where
$F$ is the CDF corresponding to the density $f(\cdot)$. The hazard
function addresses this question: given that a state has persisted as
long as $t$, what is the likelihood that it ends within a short
increment of time beyond $t$---that is, it ends between $t$ and $t +
\Delta$?  Taking the limit as $\Delta$ goes to zero, we end up with
the ratio of the density to the survivor function:\footnote{For a
  fuller discussion see, for example, \cite{davidson-mackinnon04}.}
\begin{equation}
\label{eq:surv-decomp}
\lambda(t, X, \theta) = \frac{f(t, X, \theta)}{S(t, X, \theta)}
\end{equation}
so the log-likelihood can be written as
\begin{equation}
\label{eq:surv-loglik}
\ell = \sum_{i=1}^n \log f(t_i, x_i, \theta) = 
\sum_{i=1}^n \left[ \log \lambda(t_i, x_i, \theta) + 
\log S(t_i, x_i, \theta) \right]
\end{equation}

One point of interest is the shape of the hazard function, in
particular its dependence (or not) on time since the state began.  If
$\lambda$ does not depend on $t$ we say the process in question exhibits
\emph{duration independence}: the probability of exiting the state at
any given moment neither increases nor decreases based simply on how
long the state has persisted to date. The alternatives are positive
duration dependence (the likelihood of exiting the state rises, the
longer the state has persisted) or negative duration dependence (exit
becomes less likely, the longer it has persisted).  Finally, the
behavior of the hazard with respect to time need not be monotonic;
some parameterizations allow for this possibility and some do not.

Since durations are inherently positive the probability distribution
used in modeling must respect this requirement, giving a density of
zero for $t \leq 0$.  Four common candidates are the exponential,
Weibull, log-logistic and log-normal, the Weibull being the most
common choice. The table below displays the density and the hazard
function for each of these distributions as they are commonly
parameterized, written as functions of $t$ alone. ($\phi$ and $\Phi$
denote, respectively, the Gaussian PDF and CDF.)

\begin{center}
\setlength\tabcolsep{1.5em}
\begin{tabular}{lll}
 & density, $f(t)$ & hazard, $\lambda(t)$ \\ [1ex]
Exponential & $\displaystyle 
\gamma \exp\,(-\gamma t)$ &$\displaystyle
\gamma$ \\ [1ex]
Weibull & $\displaystyle
\alpha\gamma^{\alpha}t^{\alpha-1}\exp\left[-(\gamma t)^\alpha\right]$ 
& $\displaystyle \alpha\gamma^{\alpha}t^{\alpha-1}$ \\ [1ex]
Log-logistic & $\displaystyle \gamma\alpha
\frac{(\gamma t)^{\alpha-1}}
{\left[1 + (\gamma t)^\alpha\right]^2}$ 
& $\displaystyle \gamma\alpha
\frac{(\gamma t)^{\alpha-1}}
{\left[1 + (\gamma t)^\alpha\right]}$ \\ [2.5ex]
Log-normal & $\displaystyle
\frac{1}{\sigma t} \phi\left[(\log t - \mu)/\sigma \right]$ 
& $\displaystyle
\frac{1}{\sigma t} \frac{\phi\left[(\log t - \mu)/\sigma \right]}
{\Phi\left[-(\log t - \mu)/\sigma \right]}$
\end{tabular}
\end{center}

The hazard is constant for the exponential distribution.  For the
Weibull, it is monotone increasing in $t$ if $\alpha > 1$, or
monotone decreasing for $\alpha < 1$. (If $\alpha = 1$ the Weibull
collapses to the exponential.)  The log-logistic and log-normal
distributions allow the hazard to vary with $t$ in a non-monotonic
fashion.

Covariates are brought into the picture by allowing them to govern one
of the parameters of the density, so that the durations are not
identically distributed across cases.  For example, when using the
log-normal distribution it is natural to make $\mu$, the expected
value of $\log t$, depend on the covariates, $X$.  This is typically
done via a linear index function: $\mu = X\beta$.

Note that the expressions for the log-normal density and hazard
contain the term $(\log t - \mu)/\sigma$.  Replacing $\mu$ with
$X\beta$ this becomes $(\log t - X\beta)/\sigma$.  It turns out that
this constitutes a useful simplifying change of variables for all of
the distributions discussed here. As in \cite{kalbfleisch02}, we
define
\[
w_i \equiv (\log t_i - x_i\beta)/\sigma
\]
The interpretation of the scale factor, $\sigma$, in this expression
depends on the distribution. For the log-normal, $\sigma$ represents
the standard deviation of $\log t$; for the Weibull and the
log-logistic it corresponds to $1/\alpha$; and for the exponential it
is fixed at unity. For distributions other than the log-normal,
$X\beta$ corresponds to $-\log \gamma$, or in other words $\gamma =
\exp(-X\beta)$.

With this change of variables, the density and survivor functions may
be written compactly as follows (the exponential is the same as the
Weibull).

\begin{center}
\setlength\tabcolsep{1.5em}
\begin{tabular}{lll}
 & density, $f(w_i)$ & survivor, $S(w_i)$ \\ [4pt]
Weibull & 
$\exp\left(w_i - e^{w_i}\right)$ & $\exp(-e^{w_i})$
\\ [4pt]
Log-logistic & 
$e^{w_i} \left(1 + e^{w_i}\right)^{-2}$ 
& $\left(1 + e^{w_i}\right)^{-1}$ \\ [1ex]
Log-normal & $\phi(w_i)$ & $\Phi(-w_i)$
\end{tabular}
\end{center}
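
To see how the two parameterizations line up, take the Weibull case:
since $\sigma = 1/\alpha$ and $X\beta = -\log\gamma$,
\[
  w_i = \frac{\log t_i - x_i\beta}{\sigma} = \alpha \log (\gamma t_i)
  \quad\Longrightarrow\quad (\gamma t_i)^{\alpha} = e^{w_i},
\]
so the Weibull survivor function $\exp\left[-(\gamma
  t_i)^{\alpha}\right]$ becomes $\exp(-e^{w_i})$, as shown in the
table.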

In light of the above we may think of the generic parameter vector
$\theta$, as in $f(t, X, \theta)$, as composed of the coefficients on
the covariates, $\beta$, plus (in all cases apart from the exponential)
the additional parameter $\sigma$.

A complication in estimation of $\theta$ is posed by ``incomplete
spells''. That is, in some cases the state in question may not have
ended at the time the observation is made (e.g.\ some workers remain
unemployed, some components have not yet failed).  If we use $t_i$ to
denote the time from entering the state to either (a) exiting the
state or (b) the observation window closing, whichever comes first,
then all we know of the ``right-censored'' cases (b) is that the
duration was at least as long as $t_i$. This can be handled by
rewriting the log-likelihood (compare \ref{eq:surv-loglik}) as
\begin{equation}
\label{eq:surv-loglik2}
\ell = \sum_{i=1}^n \left\{ \delta_i\log S\left(w_i\right)
+ \left(1-\delta_i\right) 
\left[-\log\sigma + \log f\left(w_i\right)\right] \right\}
\end{equation}
where $\delta_i$ equals 1 for censored cases (incomplete spells), and
0 for complete observations. The rationale for this is that the
log-density equals the sum of the log hazard and the log survivor
function, but for the incomplete spells only the survivor function
contributes to the likelihood. So in (\ref{eq:surv-loglik2}) we are
adding up the log survivor function alone for the incomplete cases,
plus the full log density for the completed cases.

\subsection{Implementation in gretl and illustration}

The \texttt{duration} command accepts a list of series on the usual
pattern: dependent variable followed by covariates. If right-censoring
is present in the data this should be represented by a dummy variable
corresponding to $\delta_i$ above, separated from the covariates by
a semicolon. For example,
\begin{code}
duration durat 0 X ; cens
\end{code}
where \texttt{durat} measures durations, \texttt{0} represents the
constant (which is required for such models), \texttt{X} is a named
list of regressors, and \texttt{cens} is the censoring dummy.

By default the Weibull distribution is used; you can substitute any of
the other three distributions discussed here by appending one of the
option flags \verb|--exponential|, \verb|--loglogistic| or 
\verb|--lognormal|.
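
As a cross-check on equation (\ref{eq:surv-loglik2}), the Weibull
model can also be estimated ``by hand'' via the \cmd{mle} command
(see chapter \ref{chap:mle}). The sketch below anticipates the
\texttt{recid.gdt} data used in the illustration that follows, and
should reproduce, up to numerical details, the built-in Weibull
estimates; it is meant purely as an illustration of the
log-likelihood, not as a substitute for the \texttt{duration} command.
\begin{code}
open recid.gdt
list X = const workprg priors tserved felon alcohol drugs black \
  married educ age
matrix b = zeros(nelem(X), 1)
b[1] = mean(log(durat))      # crude initialization for the constant
scalar sigma = 1
mle ll = cens*lnS + (1-cens)*(-log(sigma) + w + lnS)
  series ndx = lincomb(X, b)
  series w = (log(durat) - ndx)/sigma
  series lnS = -exp(w)       # Weibull: log S(w) = -exp(w)
  params b sigma
end mle
\end{code}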

Interpreting the coefficients in a duration model requires some care,
and we will work through an illustrative case. The example comes from
section 20.3 of \cite{wooldridge-panel}, and it concerns criminal
recidivism.\footnote{Germ\'an Rodr\'iguez of Princeton University has a
  page discussing this example and displaying estimates from
  \app{Stata} at \url{http://data.princeton.edu/pop509a/recid1.html}.}
The data (filename \texttt{recid.gdt}) pertain to a sample of 1,445
convicts released from prison between July 1, 1977 and June 30,
1978. The dependent variable is the time in months until they are
again arrested. The information was gathered retrospectively by
examining records in April 1984; the maximum possible length of
observation is 81 months.  Right-censoring is important: when the data
were compiled, about 62 percent had not been arrested.  The dataset
contains several covariates, which are described in the data file; we
will focus below on interpretation of the \texttt{married} variable, a
dummy which equals 1 if the respondent was married when imprisoned.

Listing~\ref{ex:duration1} shows the gretl commands for a
Weibull model along with most of the output.  Consider first the scale
factor, $\sigma$. The estimate is 1.241 with a standard error of
0.048.  (We don't print a $z$ score and $p$-value for this term since
$H_0: \sigma = 0$ is not of interest.)  Recall that $\sigma$
corresponds to $1/\alpha$; we can be confident that $\alpha$ is less
than 1, so recidivism displays negative duration dependence.  This
makes sense: it is plausible that if a past offender manages to stay
out of trouble for an extended period his risk of engaging in crime
again diminishes. (The exponential model would therefore not be
appropriate in this case.)

\begin{script}[htbp]
  \caption{Weibull model for recidivism data}
Input:
\begin{scodebit}
open recid.gdt
list X = workprg priors tserved felon alcohol drugs \
 black married educ age
duration durat 0 X ; cens
duration durat 0 X ; cens --lognormal
\end{scodebit}
Partial output:
\begin{scodebit}
Model 1: Duration (Weibull), using observations 1-1445
Dependent variable: durat

             coefficient   std. error      z      p-value 
  --------------------------------------------------------
  const       4.22167      0.341311      12.37    3.85e-35 ***
  workprg    -0.112785     0.112535      -1.002   0.3162  
  priors     -0.110176     0.0170675     -6.455   1.08e-10 ***
  tserved    -0.0168297    0.00213029    -7.900   2.78e-15 ***
  felon       0.371623     0.131995       2.815   0.0049   ***
  alcohol    -0.555132     0.132243      -4.198   2.69e-05 ***
  drugs      -0.349265     0.121880      -2.866   0.0042   ***
  black      -0.563016     0.110817      -5.081   3.76e-07 ***
  married     0.188104     0.135752       1.386   0.1659  
  educ        0.0289111    0.0241153      1.199   0.2306  
  age         0.00462188   0.000664820    6.952   3.60e-12 ***
  sigma       1.24090      0.0482896                      

Chi-square(10)       165.4772   p-value              2.39e-30
Log-likelihood      -1633.032   Akaike criterion     3290.065

Model 2: Duration (log-normal), using observations 1-1445
Dependent variable: durat

             coefficient   std. error       z      p-value 
  ---------------------------------------------------------
  const       4.09939      0.347535      11.80     4.11e-32 ***
  workprg    -0.0625693    0.120037      -0.5213   0.6022  
  priors     -0.137253     0.0214587     -6.396    1.59e-10 ***
  tserved    -0.0193306    0.00297792    -6.491    8.51e-11 ***
  felon       0.443995     0.145087       3.060    0.0022   ***
  alcohol    -0.634909     0.144217      -4.402    1.07e-05 ***
  drugs      -0.298159     0.132736      -2.246    0.0247   **
  black      -0.542719     0.117443      -4.621    3.82e-06 ***
  married     0.340682     0.139843       2.436    0.0148   **
  educ        0.0229194    0.0253974      0.9024   0.3668  
  age         0.00391028   0.000606205    6.450    1.12e-10 ***
  sigma       1.81047      0.0623022                       

Chi-square(10)       166.7361   p-value              1.31e-30
Log-likelihood      -1597.059   Akaike criterion     3218.118

\end{scodebit}
\label{ex:duration1}
\end{script}

On a priori grounds, however, we may doubt the monotonic decline in
hazard that is implied by the Weibull specification. Even if a person
is liable to return to crime, it seems relatively unlikely that he
would do so straight out of prison. In the data, we find that only 2.6
percent of those followed were rearrested within 3 months. The
log-normal specification, which allows the hazard to rise and then
fall, may be more appropriate.  Using the \texttt{duration} command
again with the same covariates but the \verb|--lognormal| flag, we get
a log-likelihood of $-1597$ as against $-1633$ for the Weibull,
confirming that the log-normal gives a better fit.
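
The 2.6 percent figure is easily verified, on the assumption that
\texttt{cens} equals 1 for the censored (never rearrested) cases:
\begin{code}
open recid.gdt -q
# fraction rearrested (uncensored) within 3 months
scalar p3 = mean(durat <= 3 && cens == 0)
printf "rearrested within 3 months: %.1f percent\n", 100 * p3
\end{code}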

Let us now focus on the \texttt{married} coefficient, which is
positive in both specifications but larger and more sharply estimated
in the log-normal variant. The first thing is to get the
interpretation of the sign right.  Recall that $X\beta$ enters
negatively into the intermediate variable $w$. The Weibull hazard is 
$\lambda(w_i) = e^{w_i}$, so being married reduces the hazard of
re-offending, or in other words lengthens the expected duration out of
prison.  The same qualitative interpretation applies for the
log-normal.  

To get a better sense of the married effect, it is useful to show its
impact on the hazard across time. We can do this by plotting the
hazard for two values of the index function $X\beta$: in each case the
values of all the covariates other than \texttt{married} are set to
their means (or some chosen values) while \texttt{married} is set
first to 0 then to 1. Listing~\ref{ex:hazard-plots} provides a script
that does this, and the resulting plots are shown in
Figure~\ref{fig:hazard-plots}. Note that when computing the hazards we
need to multiply by the Jacobian of the transformation from $t_i$ to
$w_i = (\log t_i - x_i\beta)/\sigma$, namely $1/t$.  Note also that
the estimate of $\sigma$ is available via the accessor \verb|$sigma|,
but it is also present as the last element in the coefficient vector
obtained via \verb|$coeff|.

A further difference between the Weibull and log-normal specifications
is illustrated in the plots. The Weibull is an instance of a
\emph{proportional hazard} model. This means that for any sets of
values of the covariates, $x_i$ and $x_j$, the ratio of the associated
hazards is invariant with respect to duration. In this example the
Weibull hazard for unmarried individuals is always 1.1637 times that
for married. In the log-normal variant, on the other hand, this ratio
gradually declines from 1.6703 at one month to 1.1766 at 100 months.
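
To see where the Weibull figure comes from, note that the two index
values differ only by the \texttt{married} coefficient, so the ratio
of the hazards is
\[
\frac{\lambda(w_i^{\mathrm{unmarried}})}{\lambda(w_i^{\mathrm{married}})}
= \exp\left(\frac{\hat{\beta}_{\mathrm{married}}}{\hat{\sigma}}\right)
= \exp\left(\frac{0.188104}{1.24090}\right) \approx 1.1637
\]
regardless of $t$. For the log-normal the corresponding ratio depends
on $w_i$ itself, and hence on duration.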


\begin{script}[htbp]
  \caption{Create plots showing conditional hazards}
\begin{scode}
open recid.gdt -q

# leave 'married' separate for analysis
list X = workprg priors tserved felon alcohol drugs \
 black educ age

# Weibull variant
duration durat 0 X married ; cens
# coefficients on all Xs apart from married
matrix beta_w = $coeff[1:$ncoeff-2]
# married coefficient
scalar mc_w = $coeff[$ncoeff-1]
scalar s_w = $sigma

# Log-normal variant
duration durat 0 X married ; cens --lognormal
matrix beta_n = $coeff[1:$ncoeff-2]
scalar mc_n = $coeff[$ncoeff-1]
scalar s_n = $sigma

list allX = 0 X
# evaluate X\beta at means of all variables except marriage
scalar Xb_w = meanc({allX}) * beta_w
scalar Xb_n = meanc({allX}) * beta_n

# construct two plot matrices
matrix mat_w = zeros(100, 3)
matrix mat_n = zeros(100, 3)

loop t=1..100 -q
  # first column, duration
  mat_w[t, 1] = t
  mat_n[t, 1] = t
  wi_w = (log(t) - Xb_w)/s_w
  wi_n = (log(t) - Xb_n)/s_n
  # second col: hazard with married = 0
  mat_w[t, 2] = (1/t) * exp(wi_w)
  mat_n[t, 2] = (1/t) * pdf(z, wi_n) / cdf(z, -wi_n)
  wi_w = (log(t) - (Xb_w + mc_w))/s_w
  wi_n = (log(t) - (Xb_n + mc_n))/s_n
  # third col: hazard with married = 1
  mat_w[t, 3] = (1/t) * exp(wi_w)
  mat_n[t, 3] = (1/t) * pdf(z, wi_n) / cdf(z, -wi_n)
endloop

colnames(mat_w, "months unmarried married")
colnames(mat_n, "months unmarried married")

gnuplot 2 3 1 --with-lines --supp --matrix=mat_w --output=weibull.plt
gnuplot 2 3 1 --with-lines --supp --matrix=mat_n --output=lognorm.plt
\end{scode}
\label{ex:hazard-plots}
\end{script}

\begin{figure}[htbp]
\centering
\includegraphics[scale=0.85]{figures/weibull}
\includegraphics[scale=0.85]{figures/lognorm}
\caption{Recidivism hazard estimates for married and unmarried
  ex-convicts}
\label{fig:hazard-plots}
\end{figure}

\subsection{Alternative representations of the Weibull model}

One point to watch out for with the Weibull duration model is that the
estimates may be represented in different ways.  The representation
given by gretl is sometimes called the \textit{accelerated
  failure-time} (AFT) metric. An alternative that one sometimes sees
is the \textit{log relative-hazard} metric; in fact this is the metric
used in Wooldridge's presentation of the recidivism example.  To get
from AFT estimates to log relative-hazard form it is necessary to
multiply the coefficients by $-\sigma^{-1}$. For example, the
\texttt{married} coefficient in the Weibull specification as shown
here is 0.188104 and $\hat{\sigma}$ is 1.24090, so the alternative
value is $-0.152$, which is what Wooldridge shows
(\citeyear{wooldridge-panel}, Table 20.1).
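
This conversion is easily performed after estimation. A minimal
sketch, with the data and list \texttt{X} as in
Listing~\ref{ex:duration1} and relying on the fact noted above that
$\sigma$ occupies the last slot of the coefficient vector:
\begin{code}
duration durat 0 X ; cens
# drop sigma, then rescale to the log relative-hazard metric
matrix b_aft = $coeff[1:$ncoeff-1]
matrix b_lrh = -b_aft / $sigma
print b_lrh
\end{code}
For \texttt{married} this reproduces $-0.188104/1.24090 \approx -0.152$.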

\subsection{Fitted values and residuals}

By default, gretl computes fitted values (accessible via
\dollar{yhat}) as the conditional mean of duration.  The formulae
are shown below (where $\Gamma$ denotes the gamma function, and the
exponential variant is just Weibull with $\sigma = 1$).

\begin{center}
\setlength\tabcolsep{1em}
\begin{tabular}{ccc}
Weibull & Log-logistic & Log-normal \\ [4pt]
$\exp(X\beta)\Gamma(1 + \sigma)$ &
$\displaystyle \exp(X\beta)\frac{\pi \sigma}{\sin(\pi \sigma)}$ &
$\exp(X\beta + \sigma^2/2)$
\end{tabular}
\end{center}

The expression given for the log-logistic mean, however, is valid only
for $\sigma < 1$; otherwise the expectation is undefined, a point that
is not noted in all software.\footnote{The \texttt{predict} adjunct to
  the \texttt{streg} command in \app{Stata} 10, for example, gaily
  produces large negative values for the log-logistic mean in duration
  models with $\sigma > 1$.}

Alternatively, if the \verb|--medians| option is given, gretl's
\texttt{duration} command will produce conditional medians as the
content of \dollar{yhat}.  For the Weibull the median is
$\exp(X\beta)(\log 2)^\sigma$; for the log-logistic and log-normal it
is just $\exp(X\beta)$.
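
These formulae are easily reproduced ``by hand''. The following
sketch assumes that the data and the list \texttt{X} from
Listing~\ref{ex:duration1} are in place; it rebuilds the Weibull
conditional mean and median and compares the former with the default
\dollar{yhat}.
\begin{code}
duration durat 0 X ; cens         # Weibull, AFT metric
list allX = 0 X
matrix b = $coeff[1:$ncoeff-1]    # slope coefficients, excluding sigma
scalar s = $sigma
series Xb = lincomb(allX, b)      # the linear index
series mean_w = exp(Xb) * exp(lngamma(1 + s))  # conditional mean
series med_w = exp(Xb) * log(2)^s              # conditional median
printf "max. abs. deviation from default fitted values: %g\n", \
  max(abs($yhat - mean_w))
\end{code}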

The values we give for the accessor \dollar{uhat} are generalized
(Cox--Snell) residuals, computed as the integrated hazard function,
which equals the negative of the log of the survivor function:
\[
\hat{u}_i = \Lambda(t_i, x_i, \theta) = -\log S(t_i, x_i, \theta)
\]
Under the null of correct specification of the model these generalized
residuals should follow the unit exponential distribution, which has
mean and variance both equal to 1 and density $f(u) = e^{-u}$ for
$u \geq 0$. See
\cite{cameron-trivedi05} for further discussion.
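
A quick informal check of this property is to compare the first two
moments of the generalized residuals with their theoretical value of
1, for example:
\begin{code}
duration durat 0 X ; cens
series gres = $uhat
printf "Cox-Snell residuals: mean %g, variance %g\n", mean(gres), var(gres)
\end{code}
(With heavy censoring, as in the recidivism data, this can only serve
as a rough diagnostic.)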

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End: