File: reduce_parameters.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/reduce_parameters.R
\name{reduce_parameters}
\alias{reduce_parameters}
\alias{reduce_data}
\title{Dimensionality Reduction (DR) / Feature Reduction}
\usage{
reduce_parameters(x, method = "PCA", n = "max", distance = "euclidean", ...)

reduce_data(x, method = "PCA", n = "max", distance = "euclidean", ...)
}
\arguments{
\item{x}{A data frame or a statistical model.}

\item{method}{The feature reduction method. Can be one of \code{"PCA"}, \code{"cMDS"},
\code{"DRR"}, \code{"ICA"} (see the 'Details' section).}

\item{n}{Number of components to extract. If \code{n="all"}, then \code{n} is set to
the number of variables minus 1 (\code{ncol(x)-1}). If \code{n="auto"} or
\code{n=NULL}, the number of components is selected through \code{\link[=n_factors]{n_factors()}} or
\code{\link[=n_components]{n_components()}}, respectively. Otherwise, if \code{n} is a number, \code{n} components
are extracted. If \code{n} exceeds the number of variables in the data, it is
automatically set to the maximum number (i.e., \code{ncol(x)}). In
\code{\link[=reduce_parameters]{reduce_parameters()}}, \code{n} can also be \code{"max"} (the default), in which case
all components that are maximally pseudo-loaded (i.e., correlated) by at
least one variable are selected.}

\item{distance}{The distance measure to be used. Only applies when
\code{method = "cMDS"}. This must be one of \code{"euclidean"}, \code{"maximum"},
\code{"manhattan"}, \code{"canberra"}, \code{"binary"} or \code{"minkowski"}. Any unambiguous
substring can be given.}

\item{...}{Arguments passed to or from other methods.}
}
\description{
This function performs a reduction in the parameter space (the number of
variables). It starts by creating a new set of variables, based on the given
method (the default method is \code{"PCA"}, but others are available via the
\code{method} argument, such as \code{"cMDS"}, \code{"DRR"} or \code{"ICA"}). It then names these
new dimensions after the original variables that correlate most with them.
For instance, a variable named \code{'V1_0.97/V4_-0.88'} means that the V1 and
V4 variables correlate maximally (with respective coefficients of .97 and
-.88) with this dimension. Although this function can be useful in
exploratory data analysis, it is best to perform dimension reduction as a
separate, dedicated stage, as this is an important step in the data analysis
workflow. \code{reduce_data()} is an alias for
\code{reduce_parameters.data.frame()}.
}
\details{
The different methods available are described below:
\subsection{Unsupervised Methods}{
\itemize{
\item \strong{PCA}: See \code{\link[=principal_components]{principal_components()}}.
\item \strong{cMDS / PCoA}: Classical Multidimensional Scaling (cMDS) takes a
set of dissimilarities (i.e., a distance matrix) and returns a set of points
such that the distances between the points are approximately equal to the
dissimilarities.
\item \strong{DRR}: Dimensionality Reduction via Regression (DRR) is a
technique extending PCA (\emph{Laparra et al., 2015}). Starting from a
rotated PCA, it predicts redundant information from the remaining components
using non-linear regression. Notable advantages of DRR include the avoidance
of multicollinearity between predictors and the mitigation of overfitting.
DRR tends to perform well when the first principal component is enough to
explain most of the variation in the predictors. Requires the \strong{DRR}
package to be installed.
\item \strong{ICA}: Performs an Independent Component Analysis using the
FastICA algorithm. Contrary to PCA, which attempts to find uncorrelated
sources (through least squares minimization), ICA attempts to find
independent sources, i.e., the source space that maximizes the
"non-gaussianity" of all sources. Unlike PCA, ICA does not rank the
extracted sources, which makes it a poor tool for dimensionality reduction.
Requires the \strong{fastICA} package to be installed.
}

See also \href{https://easystats.github.io/parameters/articles/parameters_reduction.html}{package vignette}.
}
}
\examples{
data(iris)
model <- lm(Sepal.Width ~ Species * Sepal.Length + Petal.Width, data = iris)
model
reduce_parameters(model)

out <- reduce_data(iris, method = "PCA", n = "max")
head(out)
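
# A sketch of the other methods: "cMDS" operates on a distance matrix, so
# only numeric columns are passed here; the distance measure is an assumed
# illustration. "ICA" requires the fastICA package, hence \dontrun.
head(reduce_data(iris[1:4], method = "cMDS", n = 2, distance = "euclidean"))
\dontrun{
head(reduce_data(iris[1:4], method = "ICA", n = 2))
}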
}
\references{
\itemize{
\item Nguyen, L. H., and Holmes, S. (2019). Ten quick tips for effective
dimensionality reduction. PLOS Computational Biology, 15(6).
\item Laparra, V., Malo, J., and Camps-Valls, G. (2015). Dimensionality
reduction via regression in hyperspectral imagery. IEEE Journal of Selected
Topics in Signal Processing, 9(6), 1026-1036.
}
}