1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
|
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/logLik.R
\name{logLik.classIntervals}
\alias{logLik.classIntervals}
\title{Log-likelihood for classIntervals objects}
\usage{
\method{logLik}{classIntervals}(object, ...)
}
\arguments{
\item{object}{A classIntervals object}
\item{...}{Ignored.}
}
\value{
A `logLik` object (see `stats::logLik`).
}
\description{
Log-likelihood for classIntervals objects
}
\details{
Generally, the likelihood is a method for minimizing the standard deviation
within an interval, and with the AIC, a per-interval penalty can be used to
maximize the information and self-similarity of data in the interval.
Based on Birge 2006 and Davies 2009 (see references), interval binning
selections may be compared by likelihood to optimize the number of intervals
selected for a set of data. The `logLik()` function (and associated `AIC()`
function) can be used to optimize binning by maximizing the likelihood across
choices of intervals.
As illustrated by the examples below (the AIC comparison does not
specifically select 3 intervals when comparing 2, 3, and 4 intervals for data
with 3 intervals), while likelihood-based methods can provide evidence toward
optimization of binning, they are not infallible for bin selection.
}
\examples{
x <- classIntervals(rnorm(100), n=5, style="fisher")
logLik(x)
AIC(x) # By having a logLik method, AIC.default is used.
# When the intervals are made of a limited number of discrete values, the
# logLik is zero by definition (the standard deviation is zero giving a dirac
# function at the discrete value indicating a density of 1 and a log-density
# of zero).
x <- classIntervals(rep(1:2, each=10), n=2, style="jenks")
logLik(x)
x <- classIntervals(rep(1:3, each=10), n=2, style="jenks")
logLik(x)
# With slight jitter but notable categorical intervals (at 1, 2, and 3), the
# AIC will make selection of the optimal intervals easier.
data <- rep(1:3, each=100) + runif(n=300, min=-0.01, max=0.01)
x_2 <- classIntervals(data, n=2, style="jenks")
x_3 <- classIntervals(data, n=3, style="jenks")
x_4 <- classIntervals(data, n=4, style="jenks")
AIC(x_2, x_3, x_4)
}
\references{
Lucien Birge, Yves Rozenholc. How many bins should be put in a regular
histogram. ESAIM: Probability and Statistics. 31 January 2006. 10:24-45.
url: https://www.esaim-ps.org/articles/ps/abs/2006/01/ps0322/ps0322.html.
doi:10.1051/ps:2006001
Laurie Davies, Ursula Gather, Dan Nordman, Henrike Weinert. A comparison of
automatic histogram constructions. ESAIM: Probability and Statistics. 11
June 2009. 13:181-196. url:
https://www.esaim-ps.org/articles/ps/abs/2009/01/ps0721/ps0721.html
doi:10.1051/ps:2008005
}
|