File: validate_column_names.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/validation.R
\name{validate_column_names}
\alias{validate_column_names}
\alias{check_column_names}
\title{Ensure that \code{data} contains required column names}
\usage{
validate_column_names(data, original_names)

check_column_names(data, original_names)
}
\arguments{
\item{data}{A data frame to check.}

\item{original_names}{A character vector. The original column names.}
}
\value{
\code{validate_column_names()} returns \code{data} invisibly.

\code{check_column_names()} returns a named list of two components,
\code{ok} and \code{missing_names}.
}
\description{
validate - asserts the following:
\itemize{
\item The column names of \code{data} must contain all \code{original_names}.
}

check - returns the following:
\itemize{
\item \code{ok} A logical. Does the check pass?
\item \code{missing_names} A character vector. The missing column names.
}
}
\details{
A special error is thrown if the missing column is named \code{".outcome"}. This
only happens when \code{\link[=mold]{mold()}} is called using the xy-method and
a \emph{vector} \code{y} value is supplied rather than a data frame or matrix. In that
case, \code{y} is coerced to a data frame with the automatic column name
\code{".outcome"}, and \code{\link[=forge]{forge()}} later looks for a column with that name.
If the user then requests outcomes using \code{forge(..., outcomes = TRUE)} but
the supplied \code{new_data} does not contain the required \code{".outcome"} column,
a special error is thrown that tells them what to do. See the examples.
}
\section{Validation}{


hardhat provides validation functions at two levels.
\itemize{
\item \verb{check_*()}:  \emph{check a condition, and return a list}. The list
always contains at least one element, \code{ok}, a logical that specifies whether the
check passed. Each check also has check-specific elements in the returned
list that can be used to construct meaningful error messages.
\item \verb{validate_*()}: \emph{check a condition, and error if it does not pass}. These
functions call their corresponding check function and
then provide a default error message. If you, as a developer, want a
different error message, call the \verb{check_*()} function yourself
and provide your own validation function.
}
}

\examples{
# ---------------------------------------------------------------------------

original_names <- colnames(mtcars)

test <- mtcars
bad_test <- test[, -c(3, 4)]

# All good
check_column_names(test, original_names)

# Missing 2 columns
check_column_names(bad_test, original_names)
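
# A closer look at the returned list. With columns 3 and 4 of `mtcars`
# dropped above, the missing names should be "disp" and "hp".
# (`missing_check` is just an illustrative variable name.)
missing_check <- check_column_names(bad_test, original_names)
missing_check$ok
missing_check$missing_names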

# Will error
try(validate_column_names(bad_test, original_names))
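
# ---------------------------------------------------------------------------
# A minimal sketch of a custom validator built on check_column_names(), in
# the spirit of the Validation section. `my_validate_column_names()` is a
# hypothetical helper written for this example; it is not part of hardhat.

my_validate_column_names <- function(data, original_names) {
  check <- check_column_names(data, original_names)

  if (!check$ok) {
    # Build a custom message from the check-specific `missing_names` element
    stop(
      "`data` is missing required columns: ",
      paste0("`", check$missing_names, "`", collapse = ", "),
      call. = FALSE
    )
  }

  # Mirror validate_column_names() by returning `data` invisibly
  invisible(data)
}

# Errors with the custom message instead of the default one
try(my_validate_column_names(mtcars[, -c(3, 4)], colnames(mtcars)))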

# ---------------------------------------------------------------------------
# Special error when `.outcome` is missing

train <- iris[1:100, ]
test <- iris[101:150, ]

train_x <- subset(train, select = -Species)
train_y <- train$Species

# Here, y is a vector
processed <- mold(train_x, train_y)

# So the default column name is `".outcome"`
processed$outcomes

# It doesn't affect forge() normally
forge(test, processed$blueprint)

# But if the outcome is requested, and `".outcome"`
# is not present in `new_data`, an error is thrown
# with very specific instructions
try(forge(test, processed$blueprint, outcomes = TRUE))

# To get this to work, just create an .outcome column in new_data
test$.outcome <- test$Species

forge(test, processed$blueprint, outcomes = TRUE)
}
\seealso{
Other validation functions: 
\code{\link{validate_no_formula_duplication}()},
\code{\link{validate_outcomes_are_binary}()},
\code{\link{validate_outcomes_are_factors}()},
\code{\link{validate_outcomes_are_numeric}()},
\code{\link{validate_outcomes_are_univariate}()},
\code{\link{validate_prediction_size}()},
\code{\link{validate_predictors_are_numeric}()}
}
\concept{validation functions}