.. _data_reduction:
=====================================
Unsupervised dimensionality reduction
=====================================
If the number of features is high, it may be useful to reduce it with an
unsupervised step prior to supervised steps. Many of the
:ref:`unsupervised-learning` methods implement a ``transform`` method that
can be used to reduce the dimensionality. Below we discuss three specific
examples of this pattern that are heavily used.
.. topic:: **Pipelining**
The unsupervised data reduction and the supervised estimator can be
chained in one step. See :ref:`pipeline`.
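
As a minimal sketch of such chaining, the snippet below puts a PCA step in front of a classifier. The digits dataset, the choice of ``LogisticRegression``, and ``n_components=16`` are illustrative assumptions, not recommendations from this guide.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative data: 1797 samples with 64 features each.
X, y = load_digits(return_X_y=True)

# Unsupervised reduction followed by a supervised estimator.
# n_components=16 is an arbitrary value chosen for the example.
clf = make_pipeline(PCA(n_components=16), LogisticRegression(max_iter=1000))
clf.fit(X, y)

# The fitted reduction step can still be used on its own.
X_reduced = clf.named_steps["pca"].transform(X)
print(X_reduced.shape)
```

Fitting the pipeline fits both steps in order, so the reduction is learned only from the data passed to ``fit``.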
.. currentmodule:: sklearn
PCA: principal component analysis
----------------------------------
:class:`decomposition.PCA` looks for a combination of features that
captures the variance of the original features well. See :ref:`decompositions`.
.. topic:: **Examples**
* :ref:`sphx_glr_auto_examples_applications_plot_face_recognition.py`
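
The following sketch illustrates the idea on synthetic data that is assumed, for the sake of the example, to be driven by two latent factors plus a small amount of noise. The data-generating setup and variable names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)

# Synthetic data: 10 observed features that are noisy linear
# mixtures of 2 latent factors (an assumption made for illustration).
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
# Fraction of the total variance retained by the two components.
print(pca.explained_variance_ratio_.sum())
```

Because the data here is nearly rank two by construction, two components retain almost all of the variance; real data will usually need inspection of ``explained_variance_ratio_`` to pick a suitable number.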
Random projections
-------------------
The :mod:`~sklearn.random_projection` module provides several tools for data
reduction by random projections. See the relevant section of the
documentation: :ref:`random_projection`.
.. topic:: **Examples**
* :ref:`sphx_glr_auto_examples_miscellaneous_plot_johnson_lindenstrauss_bound.py`
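
A minimal sketch of a Gaussian random projection is shown below. The random input data and the distortion tolerance ``eps=0.5`` are assumptions picked for the example; the target dimension is taken from the Johnson-Lindenstrauss bound rather than chosen by hand.

```python
import numpy as np
from sklearn.random_projection import (
    GaussianRandomProjection,
    johnson_lindenstrauss_min_dim,
)

rng = np.random.RandomState(0)
# Illustrative high-dimensional data: 100 samples, 5000 features.
X = rng.normal(size=(100, 5000))

# Minimum number of components that keeps pairwise distances within
# the tolerance eps, according to the Johnson-Lindenstrauss lemma.
min_dim = int(johnson_lindenstrauss_min_dim(n_samples=100, eps=0.5))

transformer = GaussianRandomProjection(n_components=min_dim, random_state=0)
X_reduced = transformer.fit_transform(X)
print(X_reduced.shape)
```

A smaller ``eps`` gives tighter distance preservation but requires more components, so the reduction is only worthwhile when the original dimensionality is well above the bound.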
Feature agglomeration
------------------------
:class:`cluster.FeatureAgglomeration` applies
:ref:`hierarchical_clustering` to group together features that behave
similarly.
.. topic:: **Examples**
* :ref:`sphx_glr_auto_examples_cluster_plot_feature_agglomeration_vs_univariate_selection.py`
* :ref:`sphx_glr_auto_examples_cluster_plot_digits_agglomeration.py`
.. topic:: **Feature scaling**
Note that if features have very different scaling or statistical
properties, :class:`cluster.FeatureAgglomeration` may not be able to
capture the links between related features. Using a
:class:`preprocessing.StandardScaler` can be useful in these settings.
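
One way to combine the two, sketched here under the assumption that the wine dataset (whose features span very different ranges) is a reasonable stand-in, is to scale the features before agglomerating them. The value ``n_clusters=4`` is arbitrary.

```python
from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import load_wine
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data: 178 samples, 13 features on very different scales.
X, _ = load_wine(return_X_y=True)

# Standardize first so that no single feature dominates the distances
# used by the hierarchical clustering of features.
agglo = FeatureAgglomeration(n_clusters=4)
reducer = make_pipeline(StandardScaler(), agglo)
X_reduced = reducer.fit_transform(X)

print(X_reduced.shape)
# Cluster index assigned to each of the 13 original features.
print(agglo.labels_)
```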