.. currentmodule:: sklearn.preprocessing

.. _preprocessing_targets:

==========================================
Transforming the prediction target (``y``)
==========================================

These are transformers that are not intended to be used on features, only on
supervised learning targets. See also :ref:`transformed_target_regressor` if
you want to transform the prediction target for learning, but evaluate the
model in the original (untransformed) space.

Label binarization
==================

LabelBinarizer
--------------

:class:`LabelBinarizer` is a utility class to help create a :term:`label
indicator matrix` from a list of :term:`multiclass` labels::

  >>> from sklearn import preprocessing
  >>> lb = preprocessing.LabelBinarizer()
  >>> lb.fit([1, 2, 6, 4, 2])
  LabelBinarizer()
  >>> lb.classes_
  array([1, 2, 4, 6])
  >>> lb.transform([1, 6])
  array([[1, 0, 0, 0],
         [0, 0, 0, 1]])

Using this format can enable multiclass classification in estimators
that support the label indicator matrix format.
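
A fitted binarizer can also map indicator rows back to the original class
labels. A minimal sketch, reusing the ``lb`` instance fitted above and its
``inverse_transform`` method::

  >>> lb.inverse_transform(lb.transform([1, 6]))
  array([1, 6])
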

.. warning::

    LabelBinarizer is not needed if you are using an estimator that
    already supports :term:`multiclass` data.

For more information about multiclass classification, refer to
:ref:`multiclass_classification`.

MultiLabelBinarizer
-------------------

In :term:`multilabel` learning, the joint set of binary classification tasks is
expressed with a label binary indicator array: each sample is one row of a 2d
array of shape (n_samples, n_classes) with binary values where the ones, i.e.
the non-zero elements, correspond to the subset of labels for that sample. An
array such as ``np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])`` represents label 0
in the first sample, labels 1 and 2 in the second sample, and no labels in the
third sample.
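
As a quick illustration with plain NumPy (not part of the scikit-learn API, just
a way to read the format), the non-zero columns of each row give back the labels
of the corresponding sample::

  >>> import numpy as np
  >>> Y = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
  >>> [np.flatnonzero(row).tolist() for row in Y]
  [[0], [1, 2], []]
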

Producing multilabel data as a list of sets of labels may be more intuitive.
The :class:`MultiLabelBinarizer <sklearn.preprocessing.MultiLabelBinarizer>`
transformer can be used to convert between a collection of collections of
labels and the indicator format::

  >>> from sklearn.preprocessing import MultiLabelBinarizer
  >>> y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]
  >>> MultiLabelBinarizer().fit_transform(y)
  array([[0, 0, 1, 1, 1],
         [0, 0, 1, 0, 0],
         [1, 1, 0, 1, 0],
         [1, 1, 1, 1, 1],
         [1, 1, 1, 0, 0]])
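
Keeping a reference to the fitted transformer exposes the learned ``classes_``
(one entry per column of the indicator array) and lets additional samples be
encoded with the same columns; a short sketch continuing with ``y`` from above::

  >>> mlb = MultiLabelBinarizer()
  >>> mlb.fit(y)
  MultiLabelBinarizer()
  >>> mlb.classes_
  array([0, 1, 2, 3, 4])
  >>> mlb.transform([[0, 4]])
  array([[1, 0, 0, 0, 1]])
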

For more information about multilabel classification, refer to
:ref:`multilabel_classification`.

Label encoding
==============

:class:`LabelEncoder` is a utility class to help normalize labels such that
they contain only values between 0 and ``n_classes-1``. This is sometimes
useful for writing efficient Cython routines. :class:`LabelEncoder` can be
used as follows::

  >>> from sklearn import preprocessing
  >>> le = preprocessing.LabelEncoder()
  >>> le.fit([1, 2, 2, 6])
  LabelEncoder()
  >>> le.classes_
  array([1, 2, 6])
  >>> le.transform([1, 1, 2, 6])
  array([0, 0, 1, 2])
  >>> le.inverse_transform([0, 0, 1, 2])
  array([1, 1, 2, 6])

It can also be used to transform non-numerical labels (as long as they are
hashable and comparable) to numerical labels::

  >>> le = preprocessing.LabelEncoder()
  >>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
  LabelEncoder()
  >>> list(le.classes_)
  ['amsterdam', 'paris', 'tokyo']
  >>> le.transform(["tokyo", "tokyo", "paris"])
  array([2, 2, 1])
  >>> list(le.inverse_transform([2, 2, 1]))
  ['tokyo', 'tokyo', 'paris']
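
As with the other label transformers, fitting and transforming can be combined
in a single ``fit_transform`` call; a short sketch reusing the labels above::

  >>> le = preprocessing.LabelEncoder()
  >>> le.fit_transform(["paris", "paris", "tokyo", "amsterdam"])
  array([1, 1, 2, 0])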