.. _data_reduction:

=====================================
Unsupervised dimensionality reduction
=====================================

If your number of features is high, it may be useful to reduce it with an
unsupervised step prior to supervised steps. Many of the
:ref:`unsupervised-learning` methods implement a ``transform`` method that
can be used to reduce the dimensionality. Below we discuss three specific
examples of this pattern that are heavily used.

.. topic:: **Pipelining**

    The unsupervised data reduction and the supervised estimator can be
    chained in one step. See :ref:`pipeline`.
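
As a minimal sketch of this chaining, the snippet below reduces the digits
dataset with PCA before fitting a classifier. The dataset, the number of
components and the choice of ``LogisticRegression`` are assumptions made for
illustration only.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative data: 1797 samples with 64 pixel features each.
X, y = load_digits(return_X_y=True)

# Unsupervised reduction (64 -> 16 features) followed by a supervised step.
model = make_pipeline(
    PCA(n_components=16, random_state=0),
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)

# The fitted PCA step is reachable by its auto-generated name.
n_reduced = model.named_steps["pca"].n_components_
predictions = model.predict(X[:5])
```

Because both steps live in one estimator, the reduction is refit on each
training split when the pipeline is cross-validated.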

.. currentmodule:: sklearn

PCA: principal component analysis
----------------------------------

:class:`decomposition.PCA` looks for combinations of features that
capture the variance of the original features well. See :ref:`decompositions`.
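
The following sketch illustrates the idea on synthetic data, where ten
features are generated from two latent factors plus a little noise. The data
generation and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)

# Synthetic data (assumed for illustration): 10 observed features that are
# linear mixtures of 2 latent factors, plus a small amount of noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the total variance retained by the two components.
explained = pca.explained_variance_ratio_.sum()
```

Here two components should retain almost all of the variance, since the
remaining directions contain only the added noise.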

.. topic:: **Examples**

   * :ref:`sphx_glr_auto_examples_applications_plot_face_recognition.py`

Random projections
-------------------

The :mod:`~sklearn.random_projection` module provides several tools for data
reduction by random projections. See the relevant section of the
documentation: :ref:`random_projection`.
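
A minimal sketch using a Gaussian random projection is shown below. The data
shape and the distortion tolerance ``eps=0.5`` are arbitrary choices made for
illustration.

```python
import numpy as np
from sklearn.random_projection import (
    GaussianRandomProjection,
    johnson_lindenstrauss_min_dim,
)

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 5000))  # few samples, many features

# Number of components that the Johnson-Lindenstrauss bound suggests for
# 50 samples at the chosen distortion tolerance.
min_dim = int(johnson_lindenstrauss_min_dim(n_samples=50, eps=0.5))

projector = GaussianRandomProjection(n_components=min_dim, random_state=0)
X_small = projector.fit_transform(X)
```

Unlike PCA, the projection matrix does not depend on the values in ``X``, only
on its shape, which keeps fitting cheap for very wide data.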

.. topic:: **Examples**

   * :ref:`sphx_glr_auto_examples_miscellaneous_plot_johnson_lindenstrauss_bound.py`

Feature agglomeration
------------------------

:class:`cluster.FeatureAgglomeration` applies
:ref:`hierarchical_clustering` to group together features that behave
similarly.
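
As a rough sketch, the snippet below merges the 64 pixel features of the
digits dataset into 16 groups. The dataset and the number of clusters are
assumptions chosen for illustration.

```python
from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)  # shape (1797, 64)

agglo = FeatureAgglomeration(n_clusters=16)
X_reduced = agglo.fit_transform(X)  # one pooled value per feature cluster

# labels_ gives the cluster index assigned to each original feature.
n_groups = len(set(agglo.labels_))

# Map the pooled values back onto the original feature space.
X_restored = agglo.inverse_transform(X_reduced)
```

With the default pooling function, each reduced feature is the mean of the
original features in its cluster.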

.. topic:: **Examples**

   * :ref:`sphx_glr_auto_examples_cluster_plot_feature_agglomeration_vs_univariate_selection.py`
   * :ref:`sphx_glr_auto_examples_cluster_plot_digits_agglomeration.py`

.. topic:: **Feature scaling**

   Note that if features have very different scaling or statistical
   properties, :class:`cluster.FeatureAgglomeration` may not be able to
   capture the links between related features. Using a
   :class:`preprocessing.StandardScaler` can be useful in these settings.
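
The effect of scaling can be sketched on a small synthetic example with two
strongly related features on very different scales and one unrelated feature.
The data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
base = rng.normal(size=(300, 1))

# Feature 0 and feature 1 are nearly proportional but differ in scale by a
# factor of 1000; feature 2 is independent noise.
X = np.hstack([
    base,
    1000 * base + rng.normal(size=(300, 1)),
    rng.normal(size=(300, 1)),
])

# Without scaling, distances between features are dominated by magnitude.
raw = FeatureAgglomeration(n_clusters=2).fit(X)
raw_labels = raw.labels_

# With standardization first, correlated features end up close together.
scaled = make_pipeline(StandardScaler(), FeatureAgglomeration(n_clusters=2))
scaled.fit(X)
scaled_labels = scaled[-1].labels_
```

In this example the scaled pipeline groups the two related features together,
whereas the unscaled fit groups the two small-magnitude features instead.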