File: impute.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Impute.R
\name{impute}
\alias{impute}
\title{Impute and re-impute data}
\usage{
impute(
  obj,
  target = character(0L),
  classes = list(),
  cols = list(),
  dummy.classes = character(0L),
  dummy.cols = character(0L),
  dummy.type = "factor",
  force.dummies = FALSE,
  impute.new.levels = TRUE,
  recode.factor.levels = TRUE
)
}
\arguments{
\item{obj}{(\link{data.frame} | \link{Task})\cr
Input data.}

\item{target}{(\link{character})\cr
Name of the column(s) specifying the response.
Default is \code{character(0)}.}

\item{classes}{(named \link{list})\cr
Named list containing imputation techniques for classes of columns.
E.g. \code{list(numeric = imputeMedian())}.}

\item{cols}{(named \link{list})\cr
Named list of imputation methods. Each element imputes missing values
in the data column referenced by the element's name. Overrules imputation set via
\code{classes}.}

\item{dummy.classes}{(\link{character})\cr
Classes of columns to create dummy columns for.
Default is \code{character(0)}.}

\item{dummy.cols}{(\link{character})\cr
Names of columns to create dummy columns (each containing a binary missing-value indicator) for.
Default is \code{character(0)}.}

\item{dummy.type}{(\code{character(1)})\cr
How dummy columns are encoded. Either as 0/1 with type \dQuote{numeric}
or as \dQuote{factor}.
Default is \dQuote{factor}.}

\item{force.dummies}{(\code{logical(1)})\cr
Force dummy creation even if the respective data column does not
contain any NAs. Note that (a) most learners will complain about the
constant columns created this way, but (b) if you turn this off, your feature
set may differ between data sets, depending on which columns happen to contain NAs.
Default is \code{FALSE}.}

\item{impute.new.levels}{(\code{logical(1)})\cr
If new, previously unencountered factor levels occur during reimputation,
should these be treated as NAs and then imputed in the same way?
Default is \code{TRUE}.}

\item{recode.factor.levels}{(\code{logical(1)})\cr
Recode factor levels after reimputation so that they match the respective element of
\code{lvls} (in the description object) and therefore match the levels of the
feature factor in the training data after imputation?
Default is \code{TRUE}.}
}
\value{
(\link{list})
\itemize{
\item data (\link{data.frame}): Imputed data.
\item desc (\code{ImputationDesc}): Description object.
}
}
\description{
Allows imputation of missing feature values through various techniques.
A data set can also be re-imputed in the same way as the imputation was
performed during training.
This is especially useful during resampling, when the same imputation
should be applied to the test set as to the training set.

The function \code{impute} performs the imputation on a data set and returns,
along with the imputed data set, an \dQuote{ImputationDesc} object,
which can contain \dQuote{learned} coefficients and helpful data.
This object can then be passed, together with a new data set, to \link{reimpute}.

The imputation techniques can be specified for individual features or for feature classes;
see the arguments \code{cols} and \code{classes}.

As an imputation technique you can provide an arbitrary object (e.g. a constant
such as \code{99}, as in the example), use a built-in imputation method listed
under \link{imputations}, or create one yourself using \link{makeImputeMethod}.
}
\details{
The description object contains these slots:
\itemize{
\item target (\link{character}): See argument
\item features (\link{character}): Feature names (column names of \code{data})
\item classes (\link{character}): Feature classes (storage type of \code{data})
\item lvls (named \link{list}): Mapping of column names of factor features to their levels, including levels newly created during imputation
\item impute (named \link{list}): Mapping of column names to imputation functions
\item dummies (named \link{list}): Mapping of column names to imputation functions
\item impute.new.levels (\code{logical(1)}): See argument
\item recode.factor.levels (\code{logical(1)}): See argument
}
}
\examples{
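# Toy data: x has one missing value, y is a complete factor.
df = data.frame(x = c(1, 1, NA), y = factor(c("a", "a", "b")), z = 1:3)
# Impute x with the constant 99 and y with its mode.
imputed = impute(df, target = character(0), cols = list(x = 99, y = imputeMode()))
print(imputed$data)
# Re-impute new data using the stored description object.
reimpute(data.frame(x = NA_real_), imputed$desc)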
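
# A minimal sketch of the train/test workflow mentioned in the description,
# assuming a simple manual split. The data and the names train, test, imp and
# test.imputed are illustrative only and not part of the package.
train = data.frame(x = c(1, 2, NA, 4), y = factor(c("a", NA, "b", "b")))
test = data.frame(x = c(NA, 3), y = factor(c("b", NA), levels = c("a", "b")))
# Learn the imputation on the training data: median for numeric columns,
# mode for factor columns, plus missing indicators for numeric columns.
imp = impute(train,
  classes = list(numeric = imputeMedian(), factor = imputeMode()),
  dummy.classes = "numeric")
print(imp$data)
# Apply the imputation learned on the training data to the test data.
test.imputed = reimpute(test, imp$desc)
print(test.imputed)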
}
\seealso{
Other impute: 
\code{\link{imputations}},
\code{\link{makeImputeMethod}()},
\code{\link{makeImputeWrapper}()},
\code{\link{reimpute}()}
}
\concept{impute}