% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/model-frame.R
\name{model_frame}
\alias{model_frame}
\title{Construct a model frame}
\usage{
model_frame(formula, data)
}
\arguments{
\item{formula}{A formula or terms object representing the terms of the
model frame.}

\item{data}{A data frame or matrix containing the terms of \code{formula}.}
}
\value{
A named list with two elements:
\itemize{
\item \code{"data"}: A tibble containing the model frame.
\item \code{"terms"}: A terms object containing the terms for the model frame.
}
}
\description{
\code{model_frame()} is a stricter version of \code{\link[stats:model.frame]{stats::model.frame()}}. There are
a number of differences, the main ones being that rows are \emph{never} dropped
and that the return value is a list with the frame and terms separated into
two distinct objects.
}
\details{
The following explains the rationale for some of the differences in arguments
compared to \code{\link[stats:model.frame]{stats::model.frame()}}:
\itemize{
\item \code{subset}: Not allowed because the number of rows before and after
\code{model_frame()} has been run should always be the same.
\item \code{na.action}: Not allowed and is forced to \code{"na.pass"} because the
number of rows before and after \code{model_frame()} has been run should always
be the same.
\item \code{drop.unused.levels}: Not allowed because it seems inconsistent for
\code{data} and the result of \code{model_frame()} to ever have the same factor column
but with different levels, unless specified through \code{original_levels}. If
this is required, it should be done explicitly through a recipe step.
\item \code{xlev}: Not allowed because this check should have been done ahead of
time. Use \code{\link[=scream]{scream()}} to check the integrity of \code{data} against a training
set if that is required.
\item \code{...}: Not exposed because offsets are handled separately, and
it is not necessary to pass weights here any more because rows are never
dropped (so weights don't have to be subset alongside the rest of the
design matrix). If other non-predictor columns are required, use the
"roles" features of recipes.
}

It is important to always use the results of \code{model_frame()} with
\code{\link[=model_matrix]{model_matrix()}} rather than \code{\link[stats:model.matrix]{stats::model.matrix()}} because the tibble
in the result of \code{model_frame()} does \emph{not} have a terms object attached.
If \verb{model.matrix(<terms>, <tibble>)} is called directly, then a call to
\code{model.frame()} will be made automatically, which can give faulty results.
}
\examples{
# ---------------------------------------------------------------------------
# Example usage

framed <- model_frame(Species ~ Sepal.Width, iris)

framed$data

framed$terms

# ---------------------------------------------------------------------------
# Missing values never result in dropped rows

iris2 <- iris
iris2$Sepal.Width[1] <- NA

framed2 <- model_frame(Species ~ Sepal.Width, iris2)

head(framed2$data)

nrow(framed2$data) == nrow(iris2)
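
# ---------------------------------------------------------------------------
# Passing the results on to model_matrix()

# A minimal sketch of the workflow described in the details section,
# assuming hardhat is attached. The tibble in `framed3$data` carries no
# terms object, so the terms are supplied separately to model_matrix()
# rather than calling stats::model.matrix() on the tibble.
# `framed3` is a name chosen only for this illustration.
framed3 <- model_frame(Sepal.Width ~ Species, iris)

model_matrix(framed3$terms, framed3$data)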
}