1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335
|
\name{segmented}
\alias{segmented}
\alias{segmented.lm}
\alias{segmented.glm}
\alias{segmented.default}
\alias{segmented.Arima}
\alias{segmented.numeric}
%\alias{print.segmented}
%\alias{summary.segmented}
%\alias{print.summary.segmented}
%- Also NEED an '\alias' for EACH other topic documented here.
\title{
Segmented relationships in regression models
}
\description{
Fits regression models with segmented relationships between the response
and one or more explanatory variables. Break-point estimates are provided.
}
\usage{
segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
model = TRUE, ...)
\method{segmented}{default}(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
model = TRUE, keep.class=FALSE, ...)
\method{segmented}{lm}(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
model = TRUE, keep.class=FALSE, ...)
\method{segmented}{glm}(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
model = TRUE, keep.class=FALSE, ...)
\method{segmented}{Arima}(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
model = TRUE, keep.class=FALSE, ...)
\method{segmented}{numeric}(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
model = TRUE, keep.class=FALSE, adjX=FALSE, weights=NULL, ...)
}
%- maybe also 'usage' for other objects documented here.
\arguments{
\item{obj}{ standard `linear' model of class "lm", "glm" or "Arima", or potentially any regression
fit may be supplied since version 0.5-0 (see 'Details'). \code{obj} can include any covariate understood to have a linear (i.e. no break-points) effect on the response. If \code{obj} also includes the segmented covariate specified in \code{seg.Z}, then all the slopes of the fitted segmented relationship will be estimated. On the other hand, if \code{obj} misses the segmented variable, then the 1st (the leftmost) slope is assumed to be zero. Since version 1.5.0, \code{obj} can be a simple numeric or \code{ts} object but with only a single segmented variable (\code{segmented.numeric}) see examples below.}
\item{seg.Z}{ the segmented variable(s), i.e. the continuous covariate(s) understood to have a piecewise-linear relationship with response. It is a formula with no response variable, such as \code{seg.Z=~x} or \code{seg.Z=~x1+x2}. It can be missing when \code{obj} includes only one covariate which is taken as segmented variable. Currently, formulas involving functions,
such as \code{seg.Z=~log(x1)}, or selection operators, such as \code{seg.Z=~d[,"x1"]} or \code{seg.Z=~d$x1}, are \emph{not} allowed. Also, variable names formed by \verb{U} or \verb{V} only (with or without numbers) are not permitted.}
\item{psi}{ starting values for the breakpoints to be estimated. If there is a single segmented variable specified in \code{seg.Z}, \code{psi} is a numeric vector, and it can be missing when 1 breakpoint has to be estimated (and the median of the segmented variable is used as a starting value). If \code{seg.Z} includes several covariates, \code{psi} has be specified as a \emph{named} list of vectors whose names have to match the variables in the \code{seg.Z} argument. Each vector of such list includes starting values for the break-point(s) for the corresponding variable in \code{seg.Z}. A \code{NA} value means that `\code{K}' quantiles (or equally spaced values) are used as starting values; \code{K} is fixed via the \code{\link{seg.control}} auxiliary function. See \code{npsi} as an alternative to specify just the number of breakpoints.
}
\item{npsi}{
A named vector or list meaning the \emph{number} (and not locations) of breakpoints to be estimated. The starting values will be internally computed via the quantiles or equally spaced values, as specified in argument \code{quant} in \code{\link{seg.control}}. \code{npsi} can be missing and \code{npsi=1} is assumed for all variables specified in \code{seg.Z}. If \code{psi} is provided, \code{npsi} is ignored.
}
\item{fixed.psi}{An optional named list meaning the breakpoints to be kept fixed during the estimation procedure. The names should be a subset of (or even the same) variables specified in \code{seg.Z}. If there is a single variable in \code{seg.Z}, a simple numeric vector can be specified. Note that, in addition to the values specified here, \code{segmented} will estimate additional breakpoints. To keep fixed all breakpoints (to be specified in \code{psi}) use \code{it.max=0} in \code{\link{seg.control}}
}
\item{control}{ a list of parameters for controlling the fitting process.
See the documentation for \code{\link{seg.control}} for details. }
\item{model}{logical value indicating if the model.frame should be returned.}
\item{keep.class}{logical value indicating if the final fit returned by \code{segmented.default} should keep the class '\code{segmented}' (along with the class of the original fit \code{obj}). Ignored by the segmented methods. }
\item{\dots}{ optional arguments (to be ignored safely). Notice specific arguments relevant to the original call (via \code{lm} or \code{glm} for instance), such as \code{weights} or \code{offet}, have to be included in the starting model \code{obj}
}
\item{adjX}{if \code{obj} is a \code{ts}, the segmented variable (if not specified in \code{seg.Z}) is computed by taking information from the time series (e.g., years starting from 2000, say). If \code{adjX=TRUE}, the segmented variable is shifted such that its min equals zero.
Default is using the unshifted values, but if there are several breakpoints to be estimated , it is strongly suggested to set \code{adjX=TRUE}.
}
\item{weights}{the weights if \code{obj} is a vector or a ts object, otherwise the weights should be specified in the
starting fit \code{obj}.}
}
\details{
Given a linear regression model usually of class "lm" or "glm" (or even a simple numeric/ts vector), segmented tries to estimate
a new regression model having broken-line relationships with the variables specified in \code{seg.Z}.
A segmented (or broken-line) relationship is defined by the slope
parameters and the break-points where the linear relation changes. The number of breakpoints
of each segmented relationship is fixed via the \code{psi} (or \code{npsi}) argument, where initial
values for the break-points (or simply their number via \code{npsi}) must be specified. The model
is estimated simultaneously yielding point estimates and relevant approximate
standard errors of all the model parameters, including the break-points.
Since version 0.2-9.0 \code{segmented} implements the bootstrap restarting algorithm described in Wood (2001).
The bootstrap restarting is expected to escape the local optima of the objective function when the
segmented relationship is flat and the log likelihood can have multiple local optima.
Since version 0.5-0.0 the default method \code{segmented.default} has been added to estimate segmented relationships in
general (besides "lm" and "glm" fits) regression models, such as Cox regression or quantile regression (for a single percentile).
The objective function to be minimized is the (minus) value extracted by the \code{logLik} function or it may be passed on via
the \code{fn.obj} argument in \code{seg.control}. See example below. While the default method is expected to work with any regression
fit (where the usual \code{coef()}, \code{update()}, and \code{logLik()} returns appropriate results), it is not recommended for
"lm" or "glm" fits (as \code{segmented.default} is slower than the specific methods \code{segmented.lm} and \code{segmented.glm}), although
final results are the same. However the object returned by \code{segmented.default} is \emph{not} of class "segmented", as currently
the segmented methods are not guaranteed to work for `generic' (i.e., besides "lm" and "glm") regression fits. The user
could try each "segmented" method on the returned object by calling it explicitly (e.g. via \code{plot.segmented()} or \code{confint.segmented()} wherein the regression coefficients and relevant covariance matrix have to be specified, see \code{.coef} and \code{.vcov} in \code{plot.segmented()}, \code{confint.segmented()}, \code{slope()}).
}
\value{
segmented returns an object of class "segmented" which inherits
from the class of \code{obj}, for instance "lm" or "glm". \cr
An object of class "segmented" is a list containing the components of the
original object \code{obj} with additionally the followings:
\item{psi}{estimated break-points (sorted) and relevant (approximate) standard errors}
\item{it}{number of iterations employed}
\item{epsilon}{difference in the objective function when the algorithm stops}
\item{model}{the model frame}
\item{psi.history}{a list or a vector including the breakpoint estimates at each step}
\item{seed}{the integer vector containing the seed just before the bootstrap resampling.
Returned only if bootstrap restart is employed}
\item{..}{Other components are not of direct interest of the user}
}
\section{ Warning }{
At convergence, if the estimated breakpoints are too close each other or at the boundaries, the parameter point estimate could be returned, but without finite standard errors. To avoid that, \code{segmented} revises the final breakpoint estimates to allow that at least \code{min.nj} are within each interval of the segmented covariate. A warning message is printed if such adjustment is made. See \code{min.nj} in \code{\link{seg.control}}.
}
%It is well-known that the log-likelihood function for the
%break-point may be not concave, especially
%for poor clear-cut kink-relationships. In these circumstances the initial guess
% for the break-point, i.e. the \code{psi} argument, must be provided with care. For instance visual
%inspection of a, possibly smoothed, scatter-plot is usually a good way to obtain some idea on breakpoint location.
%However bootstrap restarting, implemented since version 0.2-9.0, is relatively more robust to starting values specified
%in \code{psi}. Alternatively an automatic procedure may be implemented by specifying \code{psi=NA} and
%\code{fix.npsi=FALSE} in \code{\link{seg.control}}: experience suggests to increase the number of iterations
%via \code{it.max} in \code{seg.control()}. This automatic procedure, however, is expected to overestimate
%the number of breakpoints.
%}
\note{
\enumerate{
\item The algorithm will start if the \code{it.max} argument returned by \code{seg.control}
is greater than zero. If \code{it.max=0} \code{segmented} will estimate a new linear model with
break-point(s) fixed at the values reported in \code{psi}.Alternatively, it is also possible to set \code{h=0} in \code{seg.control()}. In this case, bootstrap restarting is unncessary, then to have breakpoints at \code{mypsi} type \cr
\code{segmented(.., psi=mypsi, control=seg.control(h=0, n.boot=0, it.max=1))}
\item In the returned fit object, `U.' is put before the name of the segmented
variable to mean the difference-in-slopes coefficient.
\item Methods specific to the class \code{"segmented"} are
\itemize{
\item \code{\link{print.segmented}}
\item \code{\link{summary.segmented}}
\item \code{\link{print.summary.segmented}}
\item \code{\link{plot.segmented}}
\item \code{\link{lines.segmented}}
\item \code{\link{confint.segmented}}
\item \code{\link{vcov.segmented}}
\item \code{\link{predict.segmented}}
\item \code{\link{points.segmented}}
\item \code{\link{coef.segmented}}
}
Others are inherited from the class \code{"lm"} or \code{"glm"} depending on the
class of \code{obj}.
}
}
\references{
Muggeo, V.M.R. (2003) Estimating regression models with unknown
break-points. \emph{Statistics in Medicine} \bold{22}, 3055--3071.
Muggeo, V.M.R. (2008) Segmented: an R package to fit regression
models with broken-line relationships. \emph{R News} \bold{8/1}, 20--25.
}
\author{ Vito M. R. Muggeo, \email{vito.muggeo@unipa.it} }
\seealso{ \code{\link{segmented.glm}} for segmented GLM and \code{\link{segreg}} to fit the models via a formula interface. \code{\link{segmented.lme}} fits random changepoints (segmented mixed) models. }
\examples{
set.seed(12)
xx<-1:100
zz<-runif(100)
yy<-2+1.5*pmax(xx-35,0)-1.5*pmax(xx-70,0)+15*pmax(zz-.5,0)+rnorm(100,0,2)
dati<-data.frame(x=xx,y=yy,z=zz)
out.lm<-lm(y~x,data=dati)
#the simplest example: the starting model includes just 1 covariate
#.. and 1 breakpoint has to be estimated for that
o<-segmented(out.lm) #1 breakpoint for x
#the single segmented variable is not in the starting model, and thus..
#... you need to specify it via seg.Z, but no starting value for psi
o<-segmented(out.lm, seg.Z=~z)
#note the leftmost slope is constrained to be zero (since out.lm does not include z)
#2 segmented variables, 1 breakpoint each (again no need to specify npsi or psi)
o<-segmented(out.lm,seg.Z=~z+x)
#1 segmented variable, but 2 breakpoints: you have to specify starting values (vector) for psi:
o<-segmented(out.lm,seg.Z=~x,psi=c(30,60), control=seg.control(display=FALSE))
#.. or you can specify just the *number* of breakpoints
#o<-segmented(out.lm,seg.Z=~x, npsi=2, control=seg.control(display=FALSE))
slope(o) #the slopes of the segmented relationship
#2 segmented variables: starting values requested via a named list
out.lm<-lm(y~z,data=dati)
o1<-update(o,seg.Z=~x+z,psi=list(x=c(30,60),z=.3))
#..or by specifying just the *number* of breakpoints
#o1<-update(o,seg.Z=~x+z, npsi=c(x=2,z=1))
#the default method leads to the same results (but it is slower)
#o1<-segmented.default(out.lm,seg.Z=~x+z,psi=list(x=c(30,60),z=.3))
#o1<-segmented.default(out.lm,seg.Z=~x+z,psi=list(x=c(30,60),z=.3),
# control=seg.control(fn.obj="sum(x$residuals^2)"))
#automatic procedure to estimate breakpoints in the covariate x (starting from K quantiles)
# Hint: increases number of iterations. Notice: bootstrap restart is not allowed!
# However see ?selgmented for a better approach
#o<-segmented.lm(out.lm,seg.Z=~x+z,psi=list(x=NA,z=.3),
# control=seg.control(fix.npsi=FALSE, n.boot=0, tol=1e-7, it.max = 50, K=5, display=TRUE))
#assess the progress of the breakpoint estimates throughout the iterations
\dontrun{
par(mfrow=c(1,2))
draw.history(o, "x")
draw.history(o, "z")
}
#try to increase the number of iterations and re-assess the
#convergence diagnostics
# A simple segmented model with continuous responses and no linear covariates
# No need to fit the starting lm model:
segmented(yy, npsi=2) #NOTE: subsetting the vector works ( segmented(yy[-1],..) )
#only a single segmented covariate is allowed in seg.Z, and if seg.Z is unspecified,
# the segmented variable is taken as 1:n/n
# An example using the Arima method:
\dontrun{
n<-50
idt <-1:n #the time index
mu<-50-idt +1.5*pmax(idt-30,0)
set.seed(6969)
y<-mu+arima.sim(list(ar=.5),n)*3.5
o<-arima(y, c(1,0,0), xreg=idt)
os1<-segmented(o, ~idt, control=seg.control(display=TRUE))
#note using the .coef argument is mandatory!
slope(os1, .coef=os1$coef)
plot(y)
plot(os1, add=TRUE, .coef=os1$coef, col=2)
}
################################################################
################################################################
######Four examples using the default method:
################################################################
################################################################
################################################################
#==> 1. Cox regression with a segmented relationship
################################################################
\dontrun{
library(survival)
data(stanford2)
o<-coxph(Surv(time, status)~age, data=stanford2)
os<-segmented(o, ~age, psi=40) #estimate the breakpoint in the age effect
summary(os) #actually it means summary.coxph(os)
plot(os) #it does not work
plot.segmented(os) #call explicitly plot.segmented() to plot the fitted piecewise lines
################################################################
# ==> 2. Linear mixed model via the nlme package
################################################################
dati$g<-gl(10,10) #the cluster 'id' variable
library(nlme)
o<-lme(y~x+z, random=~1|g, data=dati)
os<-segmented.default(o, ~x+z, npsi=list(x=2, z=1))
#summarizing results (note the '.coef' argument)
slope(os, .coef=fixef(os))
plot.segmented(os, "x", .coef=fixef(os), conf.level=.95)
confint.segmented(os, "x", .coef=fixef(os))
dd<-data.frame(x=c(20,50),z=c(.2,.6), g=1:2)
predict.segmented(os, newdata=dd, .coef=fixef(os))
################################################################
# ==> 3. segmented quantile regression via the quantreg package
################################################################
library(quantreg)
data(Mammals)
y<-with(Mammals, log(speed))
x<-with(Mammals, log(weight))
o<-rq(y~x, tau=.9)
os<-segmented.default(o, ~x) #it does NOT work. It cannot compute the vcov matrix..
#Let's define the vcov.rq function.. (I don't know if it is the best option..)
vcov.rq<-function(x,...) {
V<-summary(x,cov=TRUE,se="nid",...)$cov
rownames(V)<-colnames(V)<-names(x$coef)
V}
os<-segmented.default(o, ~x) #now it does work
plot.segmented(os, res=TRUE, col=2, conf.level=.95)
################################################################
# ==> 4. segmented regression with the svyglm() (survey package)
################################################################
library(survey)
data(api)
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
o<-svyglm(api00~ell, design=dstrat)
#specify as a string the objective function to be minimized. It can be obtained via svyvar()
fn.x<- 'as.numeric(svyvar(resid(x, "pearson"), x$survey.design, na.rm = TRUE))'
os<-segmented.default(o, ~ell, control=seg.control(fn.obj=fn.x, display=TRUE))
slope(os)
plot.segmented(os, res=TRUE, conf.level=.9, shade=TRUE)
}
}
% Add one or more standard keywords, see file 'KEYWORDS' in the
% R documentation directory.
\keyword{regression}
\keyword{nonlinear }
|