1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
|
---
title: "Regularized SEM"
author: "Joshua Pritikin and Ross Jacobucci"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{Regularization}
%\usepackage[UTF-8]{inputenc}
---
```{r, include = FALSE}
is_CRAN <- !identical(Sys.getenv("NOT_CRAN"), "true")
if (!is_CRAN) {
options(mc.cores = parallel::detectCores())
} else {
knitr::opts_chunk$set(eval = FALSE)
knitr::knit_hooks$set(evaluate.inline = function(x, envir) x)
}
```
### Simulate Data
We will first simulate data
```{r}
library(OpenMx)
manifests <- c(paste0('x',1:8), paste0('y',1:5))
set.seed(12)
sim.mod <- mxModel(
"sim", type="RAM", manifestVars = manifests, latentVars = 'f1',
mxPath(paste0('x',1:8), 'f1', values=c(0,0,0,0,0,.2,.5,.8),
labels=paste0('c',1:8)),
mxPath('f1', paste0('y',1:5), values=1),
mxPath(paste0('x',1:8), arrows=2, connect = "unique.bivariate",
values=rnorm(8*7/2, sd=.2)),
mxPath(paste0('x',1:8), arrows=2, values=1),
mxPath(paste0('y',1:5), arrows=2, values=1),
mxPath('f1', arrows=2, values=1, free=FALSE),
mxPath('one', manifests),
mxPath('one', 'f1', free=FALSE))
dat.sim = mxGenerateData(sim.mod, nrows = 100)
```
### Run the Model
And then run the model so we can better see the structure.
```{r}
run.mod <- mxModel(sim.mod, mxData(dat.sim, type="raw"))
fit <- mxRun(run.mod)
#summary(fit)
summary(fit)$parameters[1:8,]
```
### Regularize
One of the difficult pieces in using regularization is that the penalty has to be calibrated for each particular problem. In running this code, I first tested the default, but this was too small given that there were some parameters > 0.4. After plotting this initial run, I saw that some parameters weren't penalized enough, therefore, I increased the penalty step to 1.2 and with 41 different values. This tested a model that had most estimates as zero. In some cases, it isn't necessary to test a sequence of penalties that would set "large" parameters to zero, as either the model could fail to converge then, or there is not uncertainty about those parameters inclusion.
```{r,results='hide'}
regFit <- mxPenaltySearch(mxModel(
fit, mxPenaltyLASSO(paste0('c',1:8),"lasso",lambda.step=1.2),
mxMatrix('Full',1,1,free=TRUE,values=0, labels="lambda")))
```
A status code 6 warning is issued because the parameters affected
by regularization have relatively large gradients.
```{r}
round(regFit$output$gradient, 2)
```
The gradient check is only done for the model with the best EBIC.
Next, we can get a summary of the models tested to check if there
were any optimization failures.
```{r}
detail <- regFit$compute$steps$PS$output$detail
table(detail$statusCode)
```
Looks good. We can also look at summaries of some of the results,
```{r}
range(detail$lambda)
range(detail$EBIC)
best <- which(detail$EBIC == min(detail$EBIC))
detail[best, 'lambda']
```
## Plot the parameter trajectories
```{r,fig.width=5,fig.height=5}
library(reshape2)
library(ggplot2)
est <- detail[,c(paste0('c',1:8), 'lambda')]
ggplot(melt(est, id.vars = 'lambda')) +
geom_line(aes(x=lambda, y=value, color=variable)) +
geom_vline(aes(xintercept=coef(regFit)['lambda']),
linetype="dashed", alpha=.5)
```
Here we can see that we used a large enough penalty to set most parameter estimates to zero. The best fitting model is indicated by the dashed lines.
OpenMx uses EBIC to choose a final model. See what the best fitting parameter estimates are.
```{r}
summary(regFit)$parameters[1:8,]
```
In this final model, we set the regression paths for x1, x2, x3, x4, x5, and x6 to zero. We also correctly identify x7 and x8 as true paths. Compare these results with the maximum likelihood estimates.
|