File: bivariate.density.Rd

package info (click to toggle)
r-cran-sparr 2.3-16-2
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 884 kB
  • sloc: makefile: 2
file content (270 lines) | stat: -rw-r--r-- 12,474 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bivariate.density.R
\name{bivariate.density}
\alias{bivariate.density}
\alias{bivden}
\title{Bivariate kernel density/intensity estimation}
\usage{
bivariate.density(
  pp,
  h0,
  hp = NULL,
  adapt = FALSE,
  resolution = 128,
  gamma.scale = "geometric",
  edge = c("uniform", "diggle", "none"),
  weights = NULL,
  intensity = FALSE,
  trim = 5,
  xy = NULL,
  pilot.density = NULL,
  leaveoneout = FALSE,
  parallelise = NULL,
  davies.baddeley = NULL,
  verbose = TRUE
)
}
\arguments{
\item{pp}{An object of class \code{\link[spatstat.geom]{ppp}} giving the observed
2D data set to be smoothed.}

\item{h0}{Global bandwidth for adaptive smoothing or fixed bandwidth for
constant smoothing. A numeric value > 0.}

\item{hp}{Pilot bandwidth (scalar, numeric > 0) to be used for fixed
bandwidth estimation of a pilot density in the case of adaptive smoothing.
If \code{NULL} (default), it will take on the value of \code{h0}. Ignored
when \code{adapt = FALSE} or if \code{pilot.density} is supplied as a
pre-defined pixel image.}

\item{adapt}{Logical value indicating whether to perform adaptive kernel
estimation. See `Details'.}

\item{resolution}{Numeric value > 0. Resolution of evaluation grid; the
density/intensity will be returned on a [\code{resolution} \eqn{\times}{x}
\code{resolution}] grid.}

\item{gamma.scale}{Scalar, numeric value > 0; controls rescaling of the
variable bandwidths. Defaults to the geometric mean of the bandwidth factors
given the pilot density (as per Silverman, 1986). See `Details'.}

\item{edge}{Character string giving the type of edge correction to employ.
\code{"uniform"} (default) corrects based on evaluation grid coordinate and
\code{"diggle"} reweights each observation-specific kernel. Setting
\code{edge = "none"} requests no edge correction. Further details can be
found in the documentation for \code{\link[spatstat.explore]{density.ppp}}.}

\item{weights}{Optional numeric vector of nonnegative weights corresponding to
each observation in \code{pp}. Must have length equal to \code{npoints(pp)}.}

\item{intensity}{Logical value indicating whether to return an intensity
estimate (integrates to the sample size over the study region), or a density
estimate (default, integrates to 1).}

\item{trim}{Numeric value > 0; controls bandwidth truncation for adaptive
estimation. See `Details'.}

\item{xy}{Optional alternative specification of the evaluation grid; matches
the argument of the same tag in \code{\link[spatstat.geom]{as.mask}}. If
supplied, \code{resolution} is ignored.}

\item{pilot.density}{An optional pixel image (class
  \code{\link[spatstat.geom]{im}}) giving the pilot density to be used for
  calculation of the variable bandwidths in adaptive estimation, \bold{or} a
  \code{\link[spatstat.geom:ppp]{ppp.object}} giving the data upon which to base a
  fixed-bandwidth pilot estimate using \code{hp}. If used, the pixel image
  \emph{must} be defined over the same domain as the data given
\code{resolution} or the supplied pre-set \code{xy} evaluation grid;
  \bold{or} the planar point pattern data must be defined with respect to the
  same polygonal study region as in \code{pp}.}

\item{leaveoneout}{Logical value indicating whether to compute and return
the value of the density/intensity at each data point for an adaptive
estimate. See `Details'.}

\item{parallelise}{Numeric argument to invoke parallel processing, giving
the number of CPU cores to use when \code{leaveoneout = TRUE}. Experimental.
Test your system first using \code{parallel::detectCores()} to identify the
number of cores available to you.}

\item{davies.baddeley}{An optional numeric vector of length 3 to control
bandwidth partitioning for approximate adaptive estimation, giving the
quantile step values for the variable bandwidths for density/intensity and
edge correction surfaces and the resolution of the edge correction surface.
May also be provided as a single numeric value. See `Details'.}

\item{verbose}{Logical value indicating whether to print a function progress
bar to the console when \code{adapt = TRUE}.}
}
\value{
If \code{leaveoneout = FALSE}, an object of class \code{"bivden"}.
This is effectively a list with the following components:
\item{z}{The
resulting density/intensity estimate, a pixel image object of class
\code{\link[spatstat.geom]{im}}.}

\item{h0}{A copy of the value of \code{h0}
used.} \item{hp}{A copy of the value of \code{hp} used.}

\item{h}{A numeric
vector of length equal to the number of data points, giving the bandwidth
used for the corresponding observation in \code{pp}.}

\item{him}{A pixel
image (class \code{\link[spatstat.geom]{im}}), giving the `hypothetical' Abramson
bandwidth at each pixel coordinate conditional upon the observed data.
\code{NULL} for fixed-bandwidth estimates.}

\item{q}{Edge-correction
weights; a pixel \code{\link[spatstat.geom]{im}}age if \code{edge = "uniform"}, a
numeric vector if \code{edge = "diggle"}, and \code{NULL} if \code{edge =
"none"}.}

\item{gamma}{The value of \eqn{\gamma} used in scaling the
bandwidths. \code{NA} if a fixed bandwidth estimate is computed.}

\item{geometric}{The geometric mean \eqn{G} of the untrimmed bandwidth
factors \eqn{\tilde{f}(x_i)^{-1/2}}. \code{NA} if a fixed bandwidth estimate
is computed.}

\item{pp}{A copy of the \code{\link[spatstat.geom:ppp]{ppp.object}}
initially passed to the \code{pp} argument, containing the data that were
smoothed.}


Else, if \code{leaveoneout = TRUE}, simply a numeric vector of length equal to the
number of data points, giving the leave-one-out value of the function at the
corresponding coordinate.
}
\description{
Provides an isotropic adaptive or fixed bandwidth kernel density/intensity
estimate of bivariate/planar/2D data.
}
\details{
Given a data set \eqn{x_1,\dots,x_n} in 2D, the isotropic kernel estimate of
its probability density function, \eqn{\hat{f}(x)}{\hat{f}(x)}, is given by
\deqn{\hat{f}(y)=n^{-1}\sum_{i=1}^{n}h(x_i)^{-2}K((y-x_i)/h(x_i)) }
where \eqn{h(x)}{h(x)} is the bandwidth function, and \eqn{K(.)} is the
bivariate standard normal smoothing kernel. Edge-correction factors (not
shown above) are also implemented.

\describe{
  \item{\bold{Fixed}}{
    The classic fixed bandwidth kernel estimator is used when
    \code{adapt = FALSE}. This amounts to setting \eqn{h(u)=}\code{h0} for all \eqn{u}.
    Further details can be found in the documentation for \code{\link[spatstat.explore]{density.ppp}}.}
  \item{\bold{Adaptive}}{Setting \code{adapt = TRUE} requests computation of Abramson's (1982)
    variable-bandwidth estimator. Under this framework, we have
    \eqn{h(u)=}\code{h0}*min[\eqn{\tilde{f}(u)^{-1/2}},\eqn{G*}\code{trim}]/\eqn{\gamma},
    where \eqn{\tilde{f}(u)} is a fixed-bandwidth kernel density estimate
    computed using the pilot bandwidth \code{hp}.
    \itemize{
      \item Global smoothing of the variable bandwidths is controlled with the global bandwidth
        \code{h0}.
      \item In the above statement, \eqn{G} is the geometric mean of the
        ``bandwidth factors'' \eqn{\tilde{f}(x_i)^{-1/2}}; \eqn{i=1,\dots,n}. By
        default, the variable bandwidths are rescaled by \eqn{\gamma=G}, which is
        set with \code{gamma.scale = "geometric"}. This allows \code{h0} to be
        considered on the same scale as the smoothing parameter in a fixed-bandwidth
        estimate i.e. on the scale of the recorded data. You can use any other
        rescaling of \code{h0} by setting \code{gamma.scale} to be any scalar
        positive numeric value; though note this only affects \eqn{\gamma} -- see
        the next bullet. When using a scale-invariant \code{h0}, set
        \code{gamma.scale = 1}.
     \item The variable bandwidths must be trimmed to
        prevent excessive values (Hall and Marron, 1988). This is achieved through
        \code{trim}, as can be seen in the equation for \eqn{h(u)} above. The
        trimming of the variable bandwidths is universally enforced by the geometric
        mean of the bandwidth factors \eqn{G} independent of the choice of
        \eqn{\gamma}. By default, the function truncates bandwidth factors at five
        times their geometric mean. For stricter trimming, reduce \code{trim}, for
        no trimming, set \code{trim = Inf}.
     \item For even moderately sized data sets
        and evaluation grid \code{resolution}, adaptive kernel estimation can be
        rather computationally expensive. The argument \code{davies.baddeley} is
        used to approximate an adaptive kernel estimate by a sum of fixed bandwidth
        estimates operating on appropriate subsets of \code{pp}. These subsets are
        defined by ``bandwidth bins'', which themselves are delineated by a quantile
        step value \eqn{0<\delta<1}. E.g. setting \eqn{\delta=0.05} will create 20
        bandwidth bins based on the 0.05th quantiles of the Abramson variable
        bandwidths. Adaptive edge-correction also utilises the partitioning, with
        pixel-wise bandwidth bins defined using the value \eqn{0<\beta<1}, and the
        option to decrease the resolution of the edge-correction surface for
        computation to a [\eqn{L} \eqn{\times}{x} \eqn{L}] grid, where \eqn{0 <L
        \le} \code{resolution}. If \code{davies.baddeley} is supplied as a vector of
        length 3, then the values \code{[1], [2], and [3]} correspond to the
        parameters \eqn{\delta}, \eqn{\beta}, and \eqn{L_M=L_N} in Davies and
        Baddeley (2018). If the argument is simply a single numeric value, it is
        used for both \eqn{\delta} and \eqn{\beta}, with
        \eqn{L_M=L_N=}\code{resolution} (i.e. no edge-correction surface
        coarsening).
     \item Computation of leave-one-out values (when
        \code{leaveoneout = TRUE}) is done by brute force, and is therefore very
        computationally expensive for adaptive smoothing. This is because the
        leave-one-out mechanism is applied to both the pilot estimation and the
        final estimation stages. Experimental code to do this via parallel
        processing using the \code{\link[foreach]{foreach}} routine is implemented.
        Fixed-bandwidth leave-one-out can be performed directly in
        \code{\link[spatstat.explore]{density.ppp}}.
    }
  }
}
}
\examples{

data(chorley) # Chorley-Ribble data from package 'spatstat'

# Fixed bandwidth kernel density; uniform edge correction
chden1 <- bivariate.density(chorley,h0=1.5) 

# Fixed bandwidth kernel density; diggle edge correction; coarser resolution
chden2 <- bivariate.density(chorley,h0=1.5,edge="diggle",resolution=64) 

\donttest{
# Adaptive smoothing; uniform edge correction
chden3 <- bivariate.density(chorley,h0=1.5,hp=1,adapt=TRUE)

# Adaptive smoothing; uniform edge correction; partitioning approximation
chden4 <- bivariate.density(chorley,h0=1.5,hp=1,adapt=TRUE,davies.baddeley=0.025)
 
oldpar <- par(mfrow=c(2,2))
plot(chden1);plot(chden2);plot(chden3);plot(chden4)
par(oldpar)
}

}
\references{
Abramson, I. (1982). On bandwidth variation in kernel estimates
--- a square root law, \emph{Annals of Statistics}, \bold{10}(4),
1217-1223.

Davies, T.M. and Baddeley A. (2018), Fast computation of
spatially adaptive kernel estimates, \emph{Statistics and Computing}, \bold{28}(4), 937-956.

Davies, T.M. and Hazelton, M.L. (2010), Adaptive kernel estimation of spatial relative
risk, \emph{Statistics in Medicine}, \bold{29}(23) 2423-2437.

Davies, T.M., Jones, K. and Hazelton, M.L. (2016), Symmetric adaptive smoothing
regimens for estimation of the spatial relative risk function,
\emph{Computational Statistics & Data Analysis}, \bold{101}, 12-28.

Diggle, P.J. (1985), A kernel method for smoothing point process data,
\emph{Journal of the Royal Statistical Society, Series C}, \bold{34}(2),
138-147.

Hall P. and Marron J.S. (1988) Variable window width kernel
density estimates of probability densities. \emph{Probability Theory and
Related Fields}, \bold{80}, 37-49.

Marshall, J.C. and Hazelton, M.L. (2010) Boundary kernels for adaptive density
estimators on regions with irregular boundaries, \emph{Journal of Multivariate
Analysis}, \bold{101}, 949-963.

Silverman, B.W. (1986), \emph{Density Estimation for
Statistics and Data Analysis}, Chapman & Hall, New York.

Wand, M.P. and Jones, C.M., 1995. \emph{Kernel Smoothing}, Chapman & Hall, London.
}
\author{
T.M. Davies and J.C. Marshall
}