#' @title Create control structures for feature selection.
#'
#' @description
#' Feature selection method used by [selectFeatures].\cr
#' The methods used here follow a wrapper approach, described in
#' Kohavi and John (1997) (see references).
#'
#' The following optimization algorithms are available:
#' \describe{
#' \item{FeatSelControlExhaustive}{Exhaustive search. All feature sets (up to a certain number
#' of features `max.features`) are searched.}
#' \item{FeatSelControlRandom}{Random search. Feature vectors are randomly drawn,
#' up to a certain number of features `max.features`.
#' A feature is included in the current set with probability `prob`,
#' so we are effectively drawing 0-1 membership vectors whose elements
#' are Bernoulli(`prob`) distributed.}
#' \item{FeatSelControlSequential}{Deterministic forward or backward search. That means extending
#' (forward) or shrinking (backward) a feature set.
#' Depending on the given `method`, different approaches are taken.\cr
#' `sfs` Sequential Forward Search: Starting from an empty model, in each step the feature increasing
#' the performance measure the most is added to the model.\cr
#' `sbs` Sequential Backward Search: Starting from a model with all features, in each step the feature
#' decreasing the performance measure the least is removed from the model.\cr
#' `sffs` Sequential Floating Forward Search: Starting from an empty model, in each step the algorithm
#' chooses the best model from all models with one additional feature and from all models with one
#' feature less.\cr
#' `sfbs` Sequential Floating Backward Search: Similar to `sffs` but starting with a full model.}
#' \item{FeatSelControlGA}{Search via genetic algorithm.
#' The GA is a simple (`mu`, `lambda`) or (`mu` + `lambda`) algorithm,
#' depending on the `comma` setting.
#' A comma strategy selects a new population of size `mu` out of the
#' `lambda` > `mu` offspring.
#' A plus strategy uses the joint pool of `mu` parents and `lambda` offspring
#' for selecting `mu` new candidates.
#' From those `mu` parents, the `lambda` new offspring are generated
#' by randomly choosing pairs of parents. These are crossed over, and `crossover.rate`
#' is the probability of taking a bit from the first parent instead of
#' the second parent.
#' The resulting offspring are mutated, i.e., their bits are flipped with
#' probability `mutation.rate`. If `max.features` is set, offspring are
#' regenerated until they satisfy this constraint.
#' A minimal sketch of one generation follows this list.}
#' }
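#'
#' As a minimal sketch (illustrative only, not mlr's internal implementation),
#' one GA generation step on 0-1 membership vectors could look like this:
#' ```
#' p1 = c(1, 0, 1, 1, 0); p2 = c(0, 0, 1, 0, 1) # two parent membership vectors
#' take.p1 = runif(5) < 0.5                     # crossover.rate = 0.5
#' child = ifelse(take.p1, p1, p2)              # uniform crossover
#' flip = runif(5) < 0.05                       # mutation.rate = 0.05
#' child = ifelse(flip, 1 - child, child)       # mutation: flip bits
#' ```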
#'
#' @param same.resampling.instance (`logical(1)`)\cr
#' Should the same resampling instance be used for all evaluations to reduce variance?
#' Default is `TRUE`.
#' @template arg_imputey
#' @param maxit (`integer(1)`)\cr
#' Maximal number of iterations. Note that this is usually not equal to the number
#' of function evaluations.
#' @param max.features (`integer(1)`)\cr
#' Maximal number of features.
#' @param tune.threshold (`logical(1)`)\cr
#' Should the threshold be tuned for the measure at hand, after each feature set evaluation,
#' via [tuneThreshold]?
#' Only works for classification if the predict type is \dQuote{prob}.
#' Default is `FALSE`.
#' @param tune.threshold.args ([list])\cr
#' Further arguments for threshold tuning that are passed down to [tuneThreshold].
#' Default is none.
#' @template arg_log_fun
#' @param prob (`numeric(1)`)\cr
#' Parameter of the random feature selection. Probability of choosing a feature.
#' @param method (`character(1)`)\cr
#' Parameter of the sequential feature selection. A character representing the method. Possible
#' values are `sfs` (forward search), `sbs` (backward search), `sffs`
#' (floating forward search) and `sfbs` (floating backward search).
#' @param alpha (`numeric(1)`)\cr
#' Parameter of the sequential feature selection.
#' Minimal required value of improvement difference for a forward / adding step.
#' Default is 0.01.
#' @param beta (`numeric(1)`)\cr
#' Parameter of the sequential feature selection.
#' Minimal required value of improvement difference for a backward / removing step.
#' Negative values imply that you allow a slight decrease for the removal of a feature.
#' Default is -0.001.
#' @param mu (`integer(1)`)\cr
#' Parameter of the GA feature selection. Size of the parent population.
#' @param lambda (`integer(1)`)\cr
#' Parameter of the GA feature selection. Size of the offspring population.
#' For the plus strategy `lambda` is typically smaller than or equal to `mu`;
#' the comma strategy requires `lambda` to be greater than `mu`.
#' @param crossover.rate (`numeric(1)`)\cr
#' Parameter of the GA feature selection. Probability of choosing a bit from the first parent
#' during crossover.
#' @param mutation.rate (`numeric(1)`)\cr
#' Parameter of the GA feature selection. Probability of flipping a feature bit,
#' i.e., switching between selecting and deselecting a feature.
#' @param comma (`logical(1)`)\cr
#' Parameter of the GA feature selection, indicating whether to use a (`mu`, `lambda`)
#' or (`mu` + `lambda`) GA. The default is `FALSE`.
#' @return ([FeatSelControl]). The specific subclass is one of
#' [FeatSelControlExhaustive], [FeatSelControlRandom],
#' [FeatSelControlSequential], [FeatSelControlGA].
#' @references Ron Kohavi and George H. John (1997).
#' Wrappers for feature subset selection. Artificial Intelligence, 97(1-2), 273-324.
#' <http://ai.stanford.edu/~ronnyk/wrappersPrint.pdf>
#' @family featsel
#' @name FeatSelControl
#' @rdname FeatSelControl
#' @aliases FeatSelControlExhaustive FeatSelControlRandom FeatSelControlSequential FeatSelControlGA
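#' @examples
#' # Illustrative control objects; the parameter values are arbitrary:
#' ctrl = makeFeatSelControlRandom(maxit = 20L, prob = 0.5)
#' ctrl = makeFeatSelControlSequential(method = "sfs", alpha = 0.01)
#' ctrl = makeFeatSelControlGA(maxit = 10L, mu = 10L, lambda = 5L)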
NULL
makeFeatSelControl = function(same.resampling.instance, impute.val = NULL, maxit, max.features,
  tune.threshold = FALSE, tune.threshold.args = list(), log.fun = "default", ..., cl) {

  maxit = asCount(maxit, na.ok = TRUE, positive = TRUE)
  max.features = asCount(max.features, na.ok = TRUE, positive = TRUE)

  # map the character shortcuts to the actual logging functions
  if (identical(log.fun, "default")) {
    log.fun = logFunFeatSel
  } else if (identical(log.fun, "memory")) {
    log.fun = logFunTuneMemory
  }

  # build the generic optimization control and attach the featsel-specific slots
  x = makeOptControl(same.resampling.instance, impute.val, tune.threshold, tune.threshold.args, log.fun, ...)
  x$maxit = maxit
  x$max.features = max.features
  addClasses(x, c(cl, "FeatSelControl"))
}
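
# A hedged sketch of how a concrete constructor in this family might wrap the
# helper above. The name `makeFeatSelControlRandomSketch` is hypothetical and the
# defaults are illustrative; the real exported constructors (e.g.
# makeFeatSelControlRandom) live elsewhere in the package.
makeFeatSelControlRandomSketch = function(same.resampling.instance = TRUE, impute.val = NULL,
  maxit = 100L, max.features = NA_integer_, prob = 0.5) {
  makeFeatSelControl(same.resampling.instance = same.resampling.instance, impute.val = impute.val,
    maxit = maxit, max.features = max.features, prob = prob, cl = "FeatSelControlRandom")
}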
#' @export
print.FeatSelControl = function(x, ...) {
  catf("FeatSel control: %s", class(x)[1])
  catf("Same resampling instance: %s", x$same.resampling.instance)
  catf("Imputation value: %s", ifelse(is.null(x$impute.val), "<worst>", sprintf("%g", x$impute.val)))
  if (is.na(x$max.features)) {
    catf("Max. features: <not used>")
  } else {
    catf("Max. features: %i", x$max.features)
  }
  catf("Max. iterations: %i", x$maxit)
  catf("Tune threshold: %s", x$tune.threshold)
  if (length(x$extra.args)) {
    catf("Further arguments: %s", convertToShortString(x$extra.args))
  } else {
    catf("Further arguments: <not used>")
  }
}
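
# Usage sketch (illustrative): the printed lines mirror the catf calls above.
#   ctrl = makeFeatSelControlRandom(maxit = 20L)
#   print(ctrl)
#   #> FeatSel control: FeatSelControlRandom
#   #> Same resampling instance: TRUE
#   #> ...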