.. _data_reduction:

=====================================
Unsupervised dimensionality reduction
=====================================

If the number of features is high, it can be useful to reduce it with an
unsupervised step before the supervised steps. Many of the
:ref:`unsupervised-learning` methods implement a ``transform`` method that
can be used to reduce the dimensionality. Below we discuss three specific
examples of this pattern that are heavily used.

.. topic:: **Pipelining**

    The unsupervised data reduction and the supervised estimator can be
    chained in one step. See :ref:`pipeline`.
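
As a minimal sketch of such a chain, the snippet below combines a PCA step
with a classifier through ``make_pipeline``. The digits dataset, the choice of
16 components and the use of ``LogisticRegression`` are illustrative
assumptions, not recommendations.

.. code-block:: python

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Illustrative data: 1797 samples with 64 pixel features each.
    X, y = load_digits(return_X_y=True)

    # Unsupervised reduction followed by a supervised estimator.
    # n_components=16 is an arbitrary value chosen for the example.
    reduce_then_classify = make_pipeline(
        PCA(n_components=16, random_state=0),
        LogisticRegression(max_iter=2000),
    )
    reduce_then_classify.fit(X, y)

    # The fitted reduction step can be reused on its own.
    X_reduced = reduce_then_classify.named_steps["pca"].transform(X)
    print(X_reduced.shape)  # (1797, 16)

Fitting the whole pipeline at once keeps the reduction step inside any
cross-validation loop, so it is only ever learned from the training folds.
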

.. currentmodule:: sklearn

PCA: principal component analysis
----------------------------------

:class:`decomposition.PCA` looks for combinations of features that
capture the variance of the original features well. See :ref:`decompositions`.
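
The following sketch shows one way to use it as a reduction step. The iris
dataset and the choice of two components are assumptions made only for
illustration.

.. code-block:: python

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    # Illustrative data: 150 samples with 4 features each.
    X, _ = load_iris(return_X_y=True)

    # Keep the two directions of largest variance (arbitrary choice).
    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)
    print(X_reduced.shape)  # (150, 2)

    # Fraction of the total variance captured by each kept component,
    # sorted from largest to smallest.
    print(pca.explained_variance_ratio_)

Inspecting ``explained_variance_ratio_`` is one common way to judge how many
components are worth keeping.
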

.. rubric:: Examples

* :ref:`sphx_glr_auto_examples_applications_plot_face_recognition.py`

Random projections
-------------------

The :mod:`~sklearn.random_projection` module provides several tools for data
reduction by random projections. See the relevant section of the
documentation: :ref:`random_projection`.
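
As a rough sketch, a Gaussian random projection can be applied as shown
below. The synthetic data, its shape and the distortion tolerance ``eps=0.5``
are assumptions picked for the example.

.. code-block:: python

    import numpy as np
    from sklearn.random_projection import (
        GaussianRandomProjection,
        johnson_lindenstrauss_min_dim,
    )

    # Synthetic data (an assumption): 50 samples with 5000 features.
    rng = np.random.RandomState(0)
    X = rng.rand(50, 5000)

    # Target dimension given by the Johnson-Lindenstrauss bound for
    # 50 samples and a distortion tolerance of eps=0.5.
    min_dim = int(johnson_lindenstrauss_min_dim(n_samples=50, eps=0.5))

    projector = GaussianRandomProjection(n_components=min_dim, random_state=0)
    X_reduced = projector.fit_transform(X)
    print(X.shape, "->", X_reduced.shape)

The bound depends only on the number of samples and on ``eps``, not on the
original number of features.
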

.. rubric:: Examples

* :ref:`sphx_glr_auto_examples_miscellaneous_plot_johnson_lindenstrauss_bound.py`

Feature agglomeration
------------------------

:class:`cluster.FeatureAgglomeration` applies
:ref:`hierarchical_clustering` to group together features that behave
similarly.
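
A minimal sketch of this reduction is given below. The digits dataset and the
choice of 16 clusters are illustrative assumptions.

.. code-block:: python

    from sklearn.cluster import FeatureAgglomeration
    from sklearn.datasets import load_digits

    # Illustrative data: 1797 samples with 64 pixel features each.
    X, _ = load_digits(return_X_y=True)

    # Merge the 64 features into 16 groups (arbitrary choice); each group
    # is summarized by the mean of its members by default.
    agglo = FeatureAgglomeration(n_clusters=16)
    X_reduced = agglo.fit_transform(X)
    print(X_reduced.shape)  # (1797, 16)

    # labels_ holds the cluster index assigned to each original feature.
    print(agglo.labels_.shape)  # (64,)

    # Map the reduced data back to the original feature space.
    X_restored = agglo.inverse_transform(X_reduced)
    print(X_restored.shape)  # (1797, 64)
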

.. rubric:: Examples

* :ref:`sphx_glr_auto_examples_cluster_plot_feature_agglomeration_vs_univariate_selection.py`
* :ref:`sphx_glr_auto_examples_cluster_plot_digits_agglomeration.py`

.. topic:: **Feature scaling**

    Note that if features have very different scaling or statistical
    properties, :class:`cluster.FeatureAgglomeration` may not be able to
    capture the links between related features. Using a
    :class:`preprocessing.StandardScaler` can be useful in these settings.
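
The effect of scaling can be illustrated with a small synthetic experiment.
Everything in the data below is an assumption built for the example: two
independent signals, each observed twice with a little noise, where one copy
of each signal is multiplied by 1000.

.. code-block:: python

    import numpy as np
    from sklearn.cluster import FeatureAgglomeration
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.RandomState(0)
    signal_a = rng.normal(size=200)
    signal_b = rng.normal(size=200)

    # Columns 0 and 1 follow signal_a, columns 2 and 3 follow signal_b.
    # Columns 1 and 3 are on a scale 1000 times larger.
    X = np.column_stack([
        signal_a + 0.05 * rng.normal(size=200),
        1000 * (signal_a + 0.05 * rng.normal(size=200)),
        signal_b + 0.05 * rng.normal(size=200),
        1000 * (signal_b + 0.05 * rng.normal(size=200)),
    ])

    # Without scaling, the grouping is driven by magnitude.
    raw = FeatureAgglomeration(n_clusters=2).fit(X)

    # With standardization first, related features can be grouped together.
    scaled = make_pipeline(
        StandardScaler(), FeatureAgglomeration(n_clusters=2)
    ).fit(X)

    print("without scaling:", raw.labels_)
    print("with scaling:   ", scaled[-1].labels_)

On this data, the unscaled run puts the two small-scale columns in the same
cluster even though they are unrelated, whereas the scaled run pairs each
column with the other observation of the same signal.
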