.. include:: _contributors.rst

.. currentmodule:: sklearn

.. _release_notes_1_3:

===========
Version 1.3
===========

For a short description of the main highlights of the release, please refer to
:ref:`sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_3_0.py`.

.. include:: changelog_legend.inc

.. _changes_1_3_2:

Version 1.3.2
=============

**October 2023**

Changelog
---------

:mod:`sklearn.datasets`
.......................

- |Fix| All dataset fetchers now accept `data_home` as any object that implements
  the :class:`os.PathLike` interface, for instance, :class:`pathlib.Path`.
  :pr:`27468` by :user:`Yao Xiao <Charlie-XIAO>`.

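Accepting :class:`os.PathLike` objects usually comes down to normalizing the
argument with `os.fspath` before treating it as a string path. The sketch below
illustrates that idea; `resolve_data_home` is a hypothetical helper, not
scikit-learn's internal code, and its fallback directory is an assumption made
for the example:

```python
import os
from pathlib import Path


def resolve_data_home(data_home=None):
    """Hypothetical helper: turn `data_home` (None, str or PathLike) into a str."""
    if data_home is None:
        # Assumed fallback for this sketch only.
        data_home = os.path.join("~", "scikit_learn_data")
    # os.fspath accepts str and any os.PathLike (e.g. pathlib.Path);
    # expanduser then resolves a leading "~".
    return os.path.expanduser(os.fspath(data_home))


print(resolve_data_home(Path("datasets") / "openml"))  # datasets/openml on POSIX
```
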
:mod:`sklearn.decomposition`
............................

- |Fix| Fixes a bug in :class:`decomposition.KernelPCA` by forcing the output of
  the internal :class:`preprocessing.KernelCenterer` to be a default array. When the
  arpack solver is used, it expects an array with a `dtype` attribute.
  :pr:`27583` by :user:`Guillaume Lemaitre <glemaitre>`.

:mod:`sklearn.metrics`
......................

- |Fix| Fixes a bug for metrics using `zero_division=np.nan`
  (e.g. :func:`~metrics.precision_score`) within a parallel loop
  (e.g. :func:`~model_selection.cross_val_score`), where the `np.nan` singleton
  differs in the sub-processes.
  :pr:`27573` by :user:`Guillaume Lemaitre <glemaitre>`.

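The pitfall behind this fix can be reproduced with the standard library alone.
In the sketch below, `math.nan` stands in for `np.nan` (both are plain Python
floats), pickling simulates sending a parameter to a worker process, and
`is_nan_marker` is a hypothetical helper showing an identity-free check rather
than scikit-learn's actual patch:

```python
import math
import pickle


def is_nan_marker(value):
    """Hypothetical check for a NaN `zero_division` marker.

    `value is math.nan` relies on object identity, which is not preserved when
    the value is pickled, and `value == math.nan` is always False (NaN != NaN).
    """
    return isinstance(value, float) and math.isnan(value)


# Simulate shipping the parameter to a worker process.
restored = pickle.loads(pickle.dumps(math.nan))

print(restored is math.nan)     # identity check: unreliable across processes
print(is_nan_marker(restored))  # True
```
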
:mod:`sklearn.tree`
...................

- |Fix| Do not leak data via non-initialized memory in decision tree pickle files and make
  the generation of those files deterministic. :pr:`27580` by :user:`Loïc Estève <lesteve>`.

.. _changes_1_3_1:

Version 1.3.1
=============

**September 2023**

Changed models
--------------

The following estimators and functions, when fit with the same data and
parameters, may produce different models from the previous version. This often
occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.

- |Fix| Ridge models with `solver='sparse_cg'` may have slightly different
  results with scipy>=1.12, because of an underlying change in the scipy solver
  (see `scipy#18488 <https://github.com/scipy/scipy/pull/18488>`_ for more
  details).
  :pr:`26814` by :user:`Loïc Estève <lesteve>`.

Changes impacting all modules
-----------------------------

- |Fix| The `set_output` API correctly works with list input. :pr:`27044` by
  `Thomas Fan`_.

Changelog
---------

:mod:`sklearn.calibration`
..........................

- |Fix| :class:`calibration.CalibratedClassifierCV` can now handle models that
  produce large prediction scores. Previously, it was numerically unstable.
  :pr:`26913` by :user:`Omar Salman <OmarManzoor>`.

:mod:`sklearn.cluster`
......................

- |Fix| :class:`cluster.BisectingKMeans` could crash when predicting on data
  with a different scale than the data used to fit the model.
  :pr:`27167` by `Olivier Grisel`_.

- |Fix| :class:`cluster.BisectingKMeans` now works with data that has a single feature.
  :pr:`27243` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

:mod:`sklearn.cross_decomposition`
..................................

- |Fix| :class:`cross_decomposition.PLSRegression` now automatically ravels the output
  of `predict` if fitted with one-dimensional `y`.
  :pr:`26602` by :user:`Yao Xiao <Charlie-XIAO>`.

:mod:`sklearn.ensemble`
.......................

- |Fix| Fix a bug in :class:`ensemble.AdaBoostClassifier` with `algorithm="SAMME"`
  where the decision function of each weak learner should be symmetric (i.e.
  the scores for a sample should sum to zero).
  :pr:`26521` by :user:`Guillaume Lemaitre <glemaitre>`.

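A weak learner's decision function is symmetric in this sense when its scores
for a sample sum to zero across classes. One common way to get there, taken from
the multi-class SAMME formulation, gives the predicted class the learner's weight
and every other class minus that weight divided by `n_classes - 1`. The sketch
below is a plain-Python illustration of that coding; `samme_symmetric_scores` is
hypothetical and not scikit-learn's implementation:

```python
def samme_symmetric_scores(predicted_class, n_classes, weight=1.0):
    """Zero-sum score vector for one weak learner's prediction on one sample.

    The predicted class gets +weight and each of the other n_classes - 1
    classes gets -weight / (n_classes - 1), so the scores sum to zero.
    """
    if n_classes < 2:
        raise ValueError("SAMME needs at least two classes")
    other = -weight / (n_classes - 1)
    return [weight if k == predicted_class else other for k in range(n_classes)]


scores = samme_symmetric_scores(predicted_class=2, n_classes=4, weight=0.7)
print(scores)
print(abs(sum(scores)) < 1e-12)  # True: zero up to floating-point rounding
```
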
:mod:`sklearn.feature_selection`
................................

- |Fix| :func:`feature_selection.mutual_info_regression` now correctly computes the
  result when `X` is of integer dtype. :pr:`26748` by :user:`Yao Xiao <Charlie-XIAO>`.

:mod:`sklearn.impute`
.....................

- |Fix| :class:`impute.KNNImputer` now correctly adds a missing indicator column in
  ``transform`` when ``add_indicator`` is set to ``True`` and missing values are observed
  during ``fit``. :pr:`26600` by :user:`Shreesha Kumar Bhat <Shreesha3112>`.

:mod:`sklearn.metrics`
......................

- |Fix| Scorers used with :func:`metrics.get_scorer` now properly handle
  multilabel-indicator matrices.
  :pr:`27002` by :user:`Guillaume Lemaitre <glemaitre>`.

:mod:`sklearn.mixture`
......................

- |Fix| The initialization of :class:`mixture.GaussianMixture` from user-provided
  `precisions_init` for `covariance_type` of `full` or `tied` was not correct,
  and has been fixed.
  :pr:`26416` by :user:`Yang Tao <mchikyt3>`.

:mod:`sklearn.model_selection`
..............................

- |Fix| :class:`model_selection.HalvingRandomSearchCV` no longer raises
  when the input to the `param_distributions` parameter is a list of dicts.
  :pr:`26893` by :user:`Stefanie Senger <StefanieSenger>`.

:mod:`sklearn.neighbors`
........................

- |Fix| :meth:`neighbors.KNeighborsClassifier.predict` no longer raises an
  exception for `pandas.DataFrame` input.
  :pr:`26772` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Fix| Reintroduce `sklearn.neighbors.BallTree.valid_metrics` and
  `sklearn.neighbors.KDTree.valid_metrics` as public class attributes.
  :pr:`26754` by :user:`Julien Jerphanion <jjerphan>`.

- |Fix| Neighbors-based estimators now correctly work when `metric="minkowski"` and the
  metric parameter `p` is in the range `0 < p < 1`, regardless of the `dtype` of `X`.
  :pr:`26760` by :user:`Shreesha Kumar Bhat <Shreesha3112>`.

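For `0 < p < 1` the Minkowski formula is still well defined but no longer
satisfies the triangle inequality, so it is a dissimilarity rather than a true
metric. A plain-Python sketch of the formula follows; the `minkowski` function is
hypothetical and is not scikit-learn's Cython code:

```python
def minkowski(x, y, p):
    """(sum_i |x_i - y_i| ** p) ** (1 / p), defined for any p > 0."""
    if p <= 0:
        raise ValueError("p must be strictly positive")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


a, b, c = [0.0, 0.0], [1.0, 0.0], [1.0, 1.0]
print(minkowski(a, c, p=1))    # 2.0 (Manhattan distance)
print(minkowski(a, c, p=0.5))  # 4.0
# With p=0.5 the triangle inequality fails: d(a, c) > d(a, b) + d(b, c).
print(minkowski(a, c, 0.5) > minkowski(a, b, 0.5) + minkowski(b, c, 0.5))  # True
```
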
:mod:`sklearn.preprocessing`
............................

- |Fix| :class:`preprocessing.LabelEncoder` correctly accepts `y` as a keyword
  argument. :pr:`26940` by `Thomas Fan`_.

- |Fix| :class:`preprocessing.OneHotEncoder` shows a more informative error message
  when `sparse_output=True` and the output is configured to be pandas.
  :pr:`26931` by `Thomas Fan`_.

:mod:`sklearn.tree`
...................

- |Fix| :func:`tree.plot_tree` now accepts `class_names=True` as documented.
  :pr:`26903` by :user:`Thomas Roehr <2maz>`.

- |Fix| The `feature_names` parameter of :func:`tree.plot_tree` now accepts any kind of
  array-like instead of just a list. :pr:`27292` by :user:`Rahil Parikh <rprkh>`.

.. _changes_1_3:

Version 1.3.0
=============

**June 2023**

Changed models
--------------

The following estimators and functions, when fit with the same data and
parameters, may produce different models from the previous version. This often
occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.

- |Enhancement| :meth:`multiclass.OutputCodeClassifier.predict` now uses a more
  efficient pairwise distance reduction. As a consequence, the tie-breaking
  strategy is different and thus the predicted labels may be different.
  :pr:`25196` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Enhancement| The `fit_transform` method of :class:`decomposition.DictionaryLearning`
  is more efficient but may produce different results than in previous versions when
  `transform_algorithm` is not the same as `fit_algorithm` and the number of iterations
  is small. :pr:`24871` by :user:`Omar Salman <OmarManzoor>`.

- |Enhancement| The `sample_weight` parameter is now used in centroid
  initialization for :class:`cluster.KMeans`, :class:`cluster.BisectingKMeans`
  and :class:`cluster.MiniBatchKMeans`.
  This change breaks backward compatibility, since numbers generated
  from the same random seeds will be different.
  :pr:`25752` by :user:`Gleb Levitski <glevv>`,
  :user:`Jérémie du Boisberranger <jeremiedbb>`,
  :user:`Guillaume Lemaitre <glemaitre>`.

- |Fix| Small values in the `W` and `H` matrices are now treated more consistently
  during the `fit` and `transform` steps of :class:`decomposition.NMF` and
  :class:`decomposition.MiniBatchNMF`, which can produce different results than
  previous versions. :pr:`25438` by :user:`Yotam Avidar-Constantini <yotamcons>`.

- |Fix| :class:`decomposition.KernelPCA` may produce different results through
  `inverse_transform` if `gamma` is `None`. Now it will be chosen correctly as
  `1/n_features` of the data that it is fitted on, while previously it might be
  incorrectly chosen as `1/n_features` of the data passed to `inverse_transform`.
  A new attribute `gamma_` is provided for revealing the actual value of `gamma`
  used each time the kernel is called.
  :pr:`26337` by :user:`Yao Xiao <Charlie-XIAO>`.

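The idea behind the `KernelPCA` fix is to resolve the default `gamma` once, from
the number of features seen during `fit`, and to reuse that stored value in every
later kernel evaluation. A minimal sketch assuming an RBF kernel; `resolve_gamma`
and `rbf_kernel` are hypothetical plain-Python helpers, not the
:class:`decomposition.KernelPCA` internals:

```python
import math


def resolve_gamma(gamma, n_features_fit):
    """Hypothetical helper: default gamma is 1 / n_features of the *fit* data."""
    return 1.0 / n_features_fit if gamma is None else gamma


def rbf_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||**2) for two plain lists."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


X_fit = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]]  # 4 features at fit time
gamma_ = resolve_gamma(None, n_features_fit=len(X_fit[0]))
print(gamma_)  # 0.25, reused even if later inputs have a different width
print(rbf_kernel(X_fit[0], X_fit[1], gamma_))
```
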
Changed displays
----------------

- |Enhancement| :class:`model_selection.LearningCurveDisplay` displays both the
  train and test curves by default. You can set `score_type="test"` to keep the
  past behaviour.
  :pr:`25120` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Fix| :class:`model_selection.ValidationCurveDisplay` now accepts passing a
  list to the `param_range` parameter.
  :pr:`27311` by :user:`Arturo Amor <ArturoAmorQ>`.

Changes impacting all modules
-----------------------------

- |Enhancement| The `get_feature_names_out` method of the following classes now
  raises a `NotFittedError` if the instance is not fitted. This ensures the error is
  consistent in all estimators with the `get_feature_names_out` method.

  - :class:`impute.MissingIndicator`
  - :class:`feature_extraction.DictVectorizer`
  - :class:`feature_extraction.text.TfidfTransformer`
  - :class:`feature_selection.GenericUnivariateSelect`
  - :class:`feature_selection.RFE`
  - :class:`feature_selection.RFECV`
  - :class:`feature_selection.SelectFdr`
  - :class:`feature_selection.SelectFpr`
  - :class:`feature_selection.SelectFromModel`
  - :class:`feature_selection.SelectFwe`
  - :class:`feature_selection.SelectKBest`
  - :class:`feature_selection.SelectPercentile`
  - :class:`feature_selection.SequentialFeatureSelector`
  - :class:`feature_selection.VarianceThreshold`
  - :class:`kernel_approximation.AdditiveChi2Sampler`
  - :class:`impute.IterativeImputer`
  - :class:`impute.KNNImputer`
  - :class:`impute.SimpleImputer`
  - :class:`isotonic.IsotonicRegression`
  - :class:`preprocessing.Binarizer`
  - :class:`preprocessing.KBinsDiscretizer`
  - :class:`preprocessing.MaxAbsScaler`
  - :class:`preprocessing.MinMaxScaler`
  - :class:`preprocessing.Normalizer`
  - :class:`preprocessing.OrdinalEncoder`
  - :class:`preprocessing.PowerTransformer`
  - :class:`preprocessing.QuantileTransformer`
  - :class:`preprocessing.RobustScaler`
  - :class:`preprocessing.SplineTransformer`
  - :class:`preprocessing.StandardScaler`
  - :class:`random_projection.GaussianRandomProjection`
  - :class:`random_projection.SparseRandomProjection`

  The `NotFittedError` displays an informative message asking to fit the instance
  with the appropriate arguments.

  :pr:`25294`, :pr:`25308`, :pr:`25291`, :pr:`25367` and :pr:`25402`
  by :user:`John Pangas <jpangas>`, :user:`Rahil Parikh <rprkh>`,
  and :user:`Alex Buzenet <albuzenet>`.

- |Enhancement| Added a multi-threaded Cython routine to compute squared
  Euclidean distances (sometimes followed by a fused reduction operation) for a
  pair of datasets consisting of a sparse CSR matrix and a dense NumPy array.

  This can improve the performance of the following functions and estimators:

  - :func:`sklearn.metrics.pairwise_distances_argmin`
  - :func:`sklearn.metrics.pairwise_distances_argmin_min`
  - :class:`sklearn.cluster.AffinityPropagation`
  - :class:`sklearn.cluster.Birch`
  - :class:`sklearn.cluster.MeanShift`
  - :class:`sklearn.cluster.OPTICS`
  - :class:`sklearn.cluster.SpectralClustering`
  - :func:`sklearn.feature_selection.mutual_info_regression`
  - :class:`sklearn.neighbors.KNeighborsClassifier`
  - :class:`sklearn.neighbors.KNeighborsRegressor`
  - :class:`sklearn.neighbors.RadiusNeighborsClassifier`
  - :class:`sklearn.neighbors.RadiusNeighborsRegressor`
  - :class:`sklearn.neighbors.LocalOutlierFactor`
  - :class:`sklearn.neighbors.NearestNeighbors`
  - :class:`sklearn.manifold.Isomap`
  - :class:`sklearn.manifold.LocallyLinearEmbedding`
  - :class:`sklearn.manifold.TSNE`
  - :func:`sklearn.manifold.trustworthiness`
  - :class:`sklearn.semi_supervised.LabelPropagation`
  - :class:`sklearn.semi_supervised.LabelSpreading`

  A typical example of this performance improvement happens when passing a sparse
  CSR matrix to the `predict` or `transform` method of estimators that rely on
  a dense NumPy representation to store their fitted parameters (or the reverse).

  For instance, :meth:`sklearn.neighbors.NearestNeighbors.kneighbors` is now up
  to 2 times faster for this case on commonly available laptops.

  :pr:`25044` by :user:`Julien Jerphanion <jjerphan>`.

- |Enhancement| All estimators that internally rely on OpenMP multi-threading
  (via Cython) now use a number of threads equal to the number of physical
  (instead of logical) cores by default. In the past, we observed that using as
  many threads as logical cores on SMT hosts could sometimes cause severe
  performance problems depending on the algorithms and the shape of the data.
  Note that it is still possible to manually adjust the number of threads used
  by OpenMP as documented in :ref:`parallelism`.

  :pr:`26082` by :user:`Jérémie du Boisberranger <jeremiedbb>` and
  :user:`Olivier Grisel <ogrisel>`.

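Sparse/dense squared Euclidean routines like the one described above commonly
rely on the expansion `||x - y||^2 = ||x||^2 - 2 <x, y> + ||y||^2`, so that the
sparse operand only contributes its non-zero entries to the dot product. The
plain-Python sketch below illustrates that expansion, with a dict standing in for
one CSR row; the function name is hypothetical and this is not the multi-threaded
Cython routine itself:

```python
def sq_euclidean_sparse_dense(sparse_row, dense_row, dense_sq_norm=None):
    """Squared Euclidean distance between a sparse row and a dense row.

    `sparse_row` maps column index -> non-zero value (one CSR row);
    `dense_row` is a plain list. Uses ||x||^2 - 2 <x, y> + ||y||^2, so only
    the non-zero entries of x are visited. `dense_sq_norm` may be precomputed
    once per dense row and reused across many sparse rows.
    """
    if dense_sq_norm is None:
        dense_sq_norm = sum(v * v for v in dense_row)
    sparse_sq_norm = sum(v * v for v in sparse_row.values())
    dot = sum(v * dense_row[j] for j, v in sparse_row.items())
    # Clip small negative results caused by floating-point cancellation.
    return max(sparse_sq_norm - 2.0 * dot + dense_sq_norm, 0.0)


x = {0: 1.0, 3: 2.0}      # non-zeros of the sparse row [1, 0, 0, 2]
y = [0.0, 1.0, 0.0, 2.0]  # dense row
print(sq_euclidean_sparse_dense(x, y))  # 2.0
```

Precomputing the dense norms once is what makes the expansion attractive when
many sparse rows are compared against the same dense rows.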
Experimental / Under Development
--------------------------------

- |MajorFeature| :ref:`Metadata routing <metadata_routing>`'s related base
  methods are included in this release. This feature is only available via the
  `enable_metadata_routing` feature flag which can be enabled using
  :func:`sklearn.set_config` and :func:`sklearn.config_context`. For now this
  feature is mostly useful for third-party developers to prepare their code
  base for metadata routing, and we strongly recommend that they also hide it
  behind the same feature flag, rather than having it enabled by default.
  :pr:`24027` by `Adrin Jalali`_, :user:`Benjamin Bossan <BenjaminBossan>`, and
  :user:`Omar Salman <OmarManzoor>`.

Changelog
---------

..
    Entries should be grouped by module (in alphabetic order) and prefixed with
    one of the labels: |MajorFeature|, |Feature|, |Efficiency|, |Enhancement|,
    |Fix| or |API| (see whats_new.rst for descriptions).
    Entries should be ordered by those labels (e.g. |Fix| after |Efficiency|).
    Changes not specific to a module should be listed under *Multiple Modules*
    or *Miscellaneous*.
    Entries should end with:
    :pr:`123456` by :user:`Joe Bloggs <joeongithub>`.
    where 123456 is the *pull request* number, not the issue number.

`sklearn`
.........

- |Feature| Added a new option `skip_parameter_validation` to the function
  :func:`sklearn.set_config` and the context manager :func:`sklearn.config_context`
  that allows skipping the validation of the parameters passed to the estimators and
  public functions. This can be useful to speed up the code but should be used with
  care because it can lead to unexpected behaviors or raise obscure error messages
  when setting invalid parameters.
  :pr:`25815` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

:mod:`sklearn.base`
...................

- |Feature| A `__sklearn_clone__` protocol is now available to override the
  default behavior of :func:`base.clone`. :pr:`24568` by `Thomas Fan`_.

- |Fix| :class:`base.TransformerMixin` now correctly keeps a namedtuple's class
  if `transform` returns a namedtuple. :pr:`26121` by `Thomas Fan`_.

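The `__sklearn_clone__` protocol lets an estimator instance decide what cloning
means for it. The sketch below pictures the dispatch with a hypothetical
`toy_clone` function and two hypothetical classes. Note that :func:`base.clone`
itself falls back to rebuilding an unfitted estimator from `get_params`, which is
replaced here by `copy.deepcopy` purely for brevity:

```python
import copy
import inspect


def toy_clone(estimator):
    """Hypothetical, simplified clone showing how the protocol is dispatched."""
    # An *instance* that defines __sklearn_clone__ decides how it is cloned.
    if hasattr(estimator, "__sklearn_clone__") and not inspect.isclass(estimator):
        return estimator.__sklearn_clone__()
    # Generic fallback; scikit-learn instead rebuilds an unfitted estimator
    # from get_params(), which deepcopy merely stands in for here.
    return copy.deepcopy(estimator)


class FrozenModel:
    """Hypothetical estimator that keeps its fitted state when cloned."""

    def __init__(self):
        self.coef_ = [1.0, 2.0]

    def __sklearn_clone__(self):
        return self


class PlainModel:
    """Hypothetical estimator without the protocol."""

    def __init__(self):
        self.coef_ = [1.0, 2.0]


frozen, plain = FrozenModel(), PlainModel()
print(toy_clone(frozen) is frozen)  # True: __sklearn_clone__ was used
print(toy_clone(plain) is plain)    # False: generic copy
```
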
:mod:`sklearn.calibration`
..........................

- |Fix| :class:`calibration.CalibratedClassifierCV` no longer enforces sample
  alignment on `fit_params`. :pr:`25805` by `Adrin Jalali`_.

:mod:`sklearn.cluster`
......................

- |MajorFeature| Added :class:`cluster.HDBSCAN`, a modern hierarchical density-based
  clustering algorithm. Similar to :class:`cluster.OPTICS`, it can be seen as a
  generalization of :class:`cluster.DBSCAN` by allowing for hierarchical instead of flat
  clustering; however, it differs in its approach from :class:`cluster.OPTICS`. This
  algorithm is very robust with respect to its hyperparameters' values and can
  be used on a wide variety of data without much, if any, tuning.

  This implementation is an adaptation of the original implementation of HDBSCAN in
  `scikit-learn-contrib/hdbscan <https://github.com/scikit-learn-contrib/hdbscan>`_,
  by :user:`Leland McInnes <lmcinnes>` et al.

  :pr:`26385` by :user:`Meekail Zain <micky774>`.

- |Enhancement| The `sample_weight` parameter is now used in centroid
  initialization for :class:`cluster.KMeans`, :class:`cluster.BisectingKMeans`
  and :class:`cluster.MiniBatchKMeans`.
  This change breaks backward compatibility, since numbers generated
  from the same random seeds will be different.
  :pr:`25752` by :user:`Gleb Levitski <glevv>`,
  :user:`Jérémie du Boisberranger <jeremiedbb>`,
  :user:`Guillaume Lemaitre <glemaitre>`.

- |Fix| :class:`cluster.KMeans`, :class:`cluster.MiniBatchKMeans` and
  :func:`cluster.k_means` now correctly handle the combination of `n_init="auto"`
  and `init` being an array-like, running one initialization in that case.
  :pr:`26657` by :user:`Binesh Bannerjee <bnsh>`.

- |API| The `sample_weight` parameter in `predict` for
  :meth:`cluster.KMeans.predict` and :meth:`cluster.MiniBatchKMeans.predict`
  is now deprecated and will be removed in v1.5.
  :pr:`25251` by :user:`Gleb Levitski <glevv>`.

- |API| The `Xred` argument in :meth:`cluster.FeatureAgglomeration.inverse_transform`
  is renamed to `Xt`; `Xred` will be removed in v1.5. :pr:`26503` by `Adrin Jalali`_.

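The `n_init="auto"` handling mentioned above can be summarized as a small
resolution step. The sketch assumes that `"k-means++"` and an array-like `init`
lead to a single run (an explicit array would otherwise restart from the same
centers every time), while `"random"` and callables keep ten runs;
`resolve_n_init` is a hypothetical helper, not scikit-learn's code:

```python
def resolve_n_init(n_init, init):
    """Hypothetical helper resolving n_init="auto" into a number of runs."""
    if n_init != "auto":
        return n_init  # an explicit integer is used as-is
    if isinstance(init, str):
        return 1 if init == "k-means++" else 10  # "random" -> 10
    if callable(init):
        return 10
    return 1  # array-like of initial centers: one deterministic start


print(resolve_n_init("auto", [[0.0, 0.0], [5.0, 5.0]]))  # 1
print(resolve_n_init("auto", "random"))                  # 10
```
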
:mod:`sklearn.compose`
......................

- |Fix| :class:`compose.ColumnTransformer` raises an informative error when the individual
  transformers of `ColumnTransformer` output pandas dataframes with indexes that are
  not consistent with each other and the output is configured to be pandas.
  :pr:`26286` by `Thomas Fan`_.

- |Fix| :class:`compose.ColumnTransformer` correctly sets the output of the
  remainder when `set_output` is called. :pr:`26323` by `Thomas Fan`_.

:mod:`sklearn.covariance`
.........................

- |Fix| Allows `alpha=0` in :class:`covariance.GraphicalLasso` to be
  consistent with :func:`covariance.graphical_lasso`.
  :pr:`26033` by :user:`Genesis Valencia <genvalen>`.

- |Fix| :func:`covariance.empirical_covariance` now gives an informative
  error message when input is not appropriate.
  :pr:`26108` by :user:`Quentin Barthélemy <qbarthelemy>`.

- |API| Deprecates `cov_init` in :func:`covariance.graphical_lasso` in 1.3 since
  the parameter has no effect. It will be removed in 1.5.
  :pr:`26033` by :user:`Genesis Valencia <genvalen>`.

- |API| Adds `costs_` fitted attribute in :class:`covariance.GraphicalLasso` and
  :class:`covariance.GraphicalLassoCV`.
  :pr:`26033` by :user:`Genesis Valencia <genvalen>`.

- |API| Adds `covariance` parameter in :class:`covariance.GraphicalLasso`.
  :pr:`26033` by :user:`Genesis Valencia <genvalen>`.

- |API| Adds `eps` parameter in :class:`covariance.GraphicalLasso`,
  :func:`covariance.graphical_lasso`, and :class:`covariance.GraphicalLassoCV`.
  :pr:`26033` by :user:`Genesis Valencia <genvalen>`.

:mod:`sklearn.datasets`
.......................

- |Enhancement| Allows overriding the parameters used to open the ARFF file through
  the parameter `read_csv_kwargs` in :func:`datasets.fetch_openml` when using the
  pandas parser.
  :pr:`26433` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Fix| :func:`datasets.fetch_openml` returns improved data types when
  `as_frame=True` and `parser="liac-arff"`. :pr:`26386` by `Thomas Fan`_.

- |Fix| Following the ARFF specs, only the marker `"?"` is now considered as a missing
  value when opening ARFF files fetched using :func:`datasets.fetch_openml` with
  the pandas parser. The parameter `read_csv_kwargs` can be used to override this
  behaviour.
  :pr:`26551` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Fix| :func:`datasets.fetch_openml` will consistently use `np.nan` as missing marker
  with both parsers `"pandas"` and `"liac-arff"`.
  :pr:`26579` by :user:`Guillaume Lemaitre <glemaitre>`.

- |API| The `data_transposed` argument of :func:`datasets.make_sparse_coded_signal`
  is deprecated and will be removed in v1.5.
  :pr:`25784` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

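The ARFF rule above amounts to treating only the literal token `"?"` as missing,
while strings such as `"NA"` stay ordinary values. A plain-Python sketch of that
rule follows; `parse_arff_rows` is hypothetical. With the pandas parser a similar
effect can be expressed through `pandas.read_csv` options such as `na_values` and
`keep_default_na`, which is the kind of setting `read_csv_kwargs` lets callers
override:

```python
import csv
import io

ARFF_MISSING_MARKER = "?"


def parse_arff_rows(text):
    """Hypothetical parser for comma-separated ARFF data rows.

    Only the "?" token becomes None (missing); tokens such as "NA" or ""
    are kept as ordinary strings.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        rows.append(
            [None if tok.strip() == ARFF_MISSING_MARKER else tok.strip() for tok in record]
        )
    return rows


data = "1.5,NA,red\n?,2.0,?\n"
print(parse_arff_rows(data))  # [['1.5', 'NA', 'red'], [None, '2.0', None]]
```
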
:mod:`sklearn.decomposition`
............................

- |Efficiency| :class:`decomposition.MiniBatchDictionaryLearning` and
  :class:`decomposition.MiniBatchSparsePCA` are now faster for small batch sizes by
  avoiding duplicate validations.
  :pr:`25490` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Enhancement| :class:`decomposition.DictionaryLearning` now accepts the parameter
  `callback` for consistency with the function :func:`decomposition.dict_learning`.
  :pr:`24871` by :user:`Omar Salman <OmarManzoor>`.

- |Fix| Small values in the `W` and `H` matrices are now treated more consistently
  during the `fit` and `transform` steps of :class:`decomposition.NMF` and
  :class:`decomposition.MiniBatchNMF`, which can produce different results than
  previous versions. :pr:`25438` by :user:`Yotam Avidar-Constantini <yotamcons>`.

- |API| The `W` argument in :meth:`decomposition.NMF.inverse_transform` and
  :meth:`decomposition.MiniBatchNMF.inverse_transform` is renamed to `Xt`; `W`
  will be removed in v1.5. :pr:`26503` by `Adrin Jalali`_.

:mod:`sklearn.discriminant_analysis`
....................................

- |Enhancement| :class:`discriminant_analysis.LinearDiscriminantAnalysis` now
  supports `PyTorch <https://pytorch.org/>`__ inputs. See
  :ref:`array_api` for more details. :pr:`25956` by `Thomas Fan`_.

:mod:`sklearn.ensemble`
.......................

- |Feature| :class:`ensemble.HistGradientBoostingRegressor` now supports
  the Gamma deviance loss via `loss="gamma"`.
  Using the Gamma deviance as loss function comes in handy for modelling
  strictly positive targets with a skewed distribution.
  :pr:`22409` by :user:`Christian Lorentzen <lorentzenchr>`.

- |Feature| Compute a custom out-of-bag score by passing a callable to
  :class:`ensemble.RandomForestClassifier`, :class:`ensemble.RandomForestRegressor`,
  :class:`ensemble.ExtraTreesClassifier` and :class:`ensemble.ExtraTreesRegressor`.
  :pr:`25177` by `Tim Head`_.

- |Feature| :class:`ensemble.GradientBoostingClassifier` now exposes
  out-of-bag scores via the `oob_scores_` or `oob_score_` attributes.
  :pr:`24882` by :user:`Ashwin Mathur <awinml>`.

- |Efficiency| :class:`ensemble.IsolationForest` predict time is now faster
  (typically by a factor of 8 or more). Internally, the estimator now precomputes
  decision path lengths per tree at `fit` time. It is therefore not possible
  to load an estimator trained with scikit-learn 1.2 to make it predict with
  scikit-learn 1.3: retraining with scikit-learn 1.3 is required.
  :pr:`25186` by :user:`Felipe Breve Siola <fsiola>`.

- |Efficiency| :class:`ensemble.RandomForestClassifier` and
  :class:`ensemble.RandomForestRegressor` with `warm_start=True` now only
  recompute out-of-bag scores when there are actually more `n_estimators`
  in subsequent `fit` calls.
  :pr:`26318` by :user:`Joshua Choo Yun Keat <choo8>`.

- |Enhancement| :class:`ensemble.BaggingClassifier` and
  :class:`ensemble.BaggingRegressor` expose the `allow_nan` tag from the
  underlying estimator. :pr:`25506` by `Thomas Fan`_.

- |Fix| :meth:`ensemble.RandomForestClassifier.fit` sets `max_samples = 1`
  when `max_samples` is a float and `round(n_samples * max_samples) < 1`.
  :pr:`25601` by :user:`Jan Fidor <JanFidor>`.

- |Fix| :meth:`ensemble.IsolationForest.fit` no longer warns about missing
  feature names when called with `contamination` not `"auto"` on a pandas
  dataframe.
  :pr:`25931` by :user:`Yao Xiao <Charlie-XIAO>`.

- |Fix| :class:`ensemble.HistGradientBoostingRegressor` and
  :class:`ensemble.HistGradientBoostingClassifier` treat negative values for
  categorical features consistently as missing values, following LightGBM's and
  pandas' conventions.
  :pr:`25629` by `Thomas Fan`_.

- |Fix| Fix deprecation of `base_estimator` in :class:`ensemble.AdaBoostClassifier`
  and :class:`ensemble.AdaBoostRegressor` that was introduced in :pr:`23819`.
  :pr:`26242` by :user:`Marko Toplak <markotoplak>`.

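The Gamma deviance depends only on the ratio between target and prediction, so
it penalizes relative rather than absolute errors, which is one reason it suits
skewed, strictly positive targets. A plain-Python sketch of the mean Gamma
deviance, the quantity reported by :func:`metrics.mean_gamma_deviance`, follows;
the helper name is hypothetical, and the loss optimized internally by
:class:`ensemble.HistGradientBoostingRegressor` may differ from it by constant
factors:

```python
import math


def mean_gamma_deviance_sketch(y_true, y_pred):
    """Mean Gamma deviance: mean of 2 * (log(mu / y) + y / mu - 1).

    Requires strictly positive targets and predictions.
    """
    if min(y_true) <= 0 or min(y_pred) <= 0:
        raise ValueError("Gamma deviance needs strictly positive values")
    terms = [2.0 * (math.log(mu / y) + y / mu - 1.0) for y, mu in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


y, mu = [1.0, 2.0, 10.0], [2.0, 2.0, 5.0]
print(mean_gamma_deviance_sketch(y, y))  # 0.0: perfect predictions
# Scaling targets and predictions by the same factor leaves the deviance unchanged.
scaled = mean_gamma_deviance_sketch([10 * v for v in y], [10 * v for v in mu])
print(math.isclose(scaled, mean_gamma_deviance_sketch(y, mu)))  # True
```
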
:mod:`sklearn.exceptions`
.........................

- |Feature| Added :class:`exceptions.InconsistentVersionWarning` which is raised
  when a scikit-learn estimator is unpickled with a scikit-learn version that is
  inconsistent with the scikit-learn version the estimator was pickled with.
  :pr:`25297` by `Thomas Fan`_.

:mod:`sklearn.feature_extraction`
.................................

- |API| :class:`feature_extraction.image.PatchExtractor` now follows the
  transformer API of scikit-learn. This class is defined as a stateless transformer,
  meaning that it is not required to call `fit` before calling `transform`.
  Parameter validation only happens at `fit` time.
  :pr:`24230` by :user:`Guillaume Lemaitre <glemaitre>`.

:mod:`sklearn.feature_selection`
................................

- |Enhancement| All selectors in :mod:`sklearn.feature_selection` will preserve
  a DataFrame's dtype when transformed. :pr:`25102` by `Thomas Fan`_.

- |Fix| :class:`feature_selection.SequentialFeatureSelector`'s `cv` parameter
  now supports generators. :pr:`25973` by :user:`Yao Xiao <Charlie-XIAO>`.

:mod:`sklearn.impute`
.....................

- |Enhancement| Added the parameter `fill_value` to :class:`impute.IterativeImputer`.
  :pr:`25232` by :user:`Thijs van Weezel <ValueInvestorThijs>`.

- |Fix| :class:`impute.IterativeImputer` now correctly preserves the Pandas
  Index when the output is configured with `set_config(transform_output="pandas")`.
  :pr:`26454` by `Thomas Fan`_.

:mod:`sklearn.inspection`
.........................

- |Enhancement| Added support for `sample_weight` in
  :func:`inspection.partial_dependence` and
  :meth:`inspection.PartialDependenceDisplay.from_estimator`. This allows for
  weighted averaging when aggregating over the samples for each value of the grid
  on which the inspection is performed. The option is only available when `method`
  is set to `"brute"`.
  :pr:`25209` and :pr:`26644` by :user:`Carlo Lemos <vitaliset>`.

- |API| :func:`inspection.partial_dependence` returns a :class:`utils.Bunch` with
  a new key: `grid_values`. The `values` key is deprecated in favor of `grid_values`
  and will be removed in 1.5.
  :pr:`21809` and :pr:`25732` by `Thomas Fan`_.

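With `method="brute"`, partial dependence averages the model's predictions over
the dataset for each grid value, and `sample_weight` turns that plain mean into a
weighted mean. A plain-Python sketch under that reading; `weighted_partial_dependence`
and `toy_model` are hypothetical and not the library's code:

```python
def weighted_partial_dependence(predict, X, feature, grid, sample_weight=None):
    """Brute-force partial dependence with optional sample weights.

    For each grid value, overwrite `feature` in every sample, predict, and take
    the (weighted) average of the predictions.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(X)
    total_weight = sum(sample_weight)
    averages = []
    for value in grid:
        preds = []
        for row in X:
            modified = list(row)
            modified[feature] = value
            preds.append(predict(modified))
        averages.append(sum(w * p for w, p in zip(sample_weight, preds)) / total_weight)
    return averages


def toy_model(row):
    # Hypothetical model whose prediction depends on both features.
    return 2.0 * row[0] + row[1]


X = [[0.0, 0.0], [1.0, 10.0]]
print(weighted_partial_dependence(toy_model, X, 0, [0.0, 1.0]))  # [5.0, 7.0]
print(weighted_partial_dependence(toy_model, X, 0, [0.0, 1.0], [3.0, 1.0]))  # [2.5, 4.5]
```
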
:mod:`sklearn.kernel_approximation`
...................................

- |Fix| :class:`kernel_approximation.AdditiveChi2Sampler` is now stateless.
  The `sample_interval_` attribute is deprecated and will be removed in 1.5.
  :pr:`25190` by :user:`Vincent Maladière <Vincent-Maladiere>`.

:mod:`sklearn.linear_model`
...........................

- |Efficiency| Avoid data scaling when `sample_weight=None`, as well as other
  unnecessary data copies and unexpected dense-to-sparse data conversions, in
  :class:`linear_model.LinearRegression`.
  :pr:`26207` by :user:`Olivier Grisel <ogrisel>`.

- |Enhancement| :class:`linear_model.SGDClassifier`,
  :class:`linear_model.SGDRegressor` and :class:`linear_model.SGDOneClassSVM`
  now preserve dtype for `numpy.float32`.
  :pr:`25587` by :user:`Omar Salman <OmarManzoor>`.

- |Enhancement| The `n_iter_` attribute has been included in
  :class:`linear_model.ARDRegression` to expose the actual number of iterations
  required to reach the stopping criterion.
  :pr:`25697` by :user:`John Pangas <jpangas>`.

- |Fix| Use a more robust criterion to detect convergence of
  :class:`linear_model.LogisticRegression` with `penalty="l1"` and `solver="liblinear"`
  on linearly separable problems.
  :pr:`25214` by `Tom Dupre la Tour`_.

- |Fix| Fix a crash when calling `fit` on
  :class:`linear_model.LogisticRegression` with `solver="newton-cholesky"` and
  `max_iter=0` which failed to inspect the state of the model prior to the
  first parameter update.
  :pr:`26653` by :user:`Olivier Grisel <ogrisel>`.

- |API| Deprecates `n_iter` in favor of `max_iter` in
  :class:`linear_model.BayesianRidge` and :class:`linear_model.ARDRegression`.
  `n_iter` will be removed in scikit-learn 1.5. This change makes those
  estimators consistent with the other estimators.
  :pr:`25697` by :user:`John Pangas <jpangas>`.

:mod:`sklearn.manifold`
.......................

- |Fix| :class:`manifold.Isomap` now correctly preserves the Pandas
  Index when the output is configured with `set_config(transform_output="pandas")`.
  :pr:`26454` by `Thomas Fan`_.

:mod:`sklearn.metrics`
......................

- |Feature| Adds `zero_division=np.nan` to multiple classification metrics:
  :func:`metrics.precision_score`, :func:`metrics.recall_score`,
  :func:`metrics.f1_score`, :func:`metrics.fbeta_score`,
  :func:`metrics.precision_recall_fscore_support`,
  :func:`metrics.classification_report`. When `zero_division=np.nan` and there is a
  zero division, the metric is undefined and is excluded from averaging. When not used
  for averages, the value returned is `np.nan`.
  :pr:`25531` by :user:`Marc Torrellas Socastro <marctorsoc>`.

- |Feature| :func:`metrics.average_precision_score` now supports the
  multiclass case.
  :pr:`17388` by :user:`Geoffrey Bolmier <gbolmier>` and
  :pr:`24769` by :user:`Ashwin Mathur <awinml>`.

- |Efficiency| The computation of the expected mutual information in
|
|
:func:`metrics.adjusted_mutual_info_score` is now faster when the number of
|
|
unique labels is large and its memory usage is reduced in general.
|
|
:pr:`25713` by :user:`Kshitij Mathur <Kshitij68>`,
|
|
:user:`Guillaume Lemaitre <glemaitre>`, :user:`Omar Salman <OmarManzoor>` and
|
|
:user:`Jérémie du Boisberranger <jeremiedbb>`.
|
|
|
|
- |Enhancement| :class:`metrics.silhouette_samples` nows accepts a sparse
|
|
matrix of pairwise distances between samples, or a feature array.
|
|
:pr:`18723` by :user:`Sahil Gupta <sahilgupta2105>` and
|
|
:pr:`24677` by :user:`Ashwin Mathur <awinml>`.
|
|
|
|
- |Enhancement| A new parameter `drop_intermediate` was added to
  :func:`metrics.precision_recall_curve`,
  :meth:`metrics.PrecisionRecallDisplay.from_estimator`, and
  :meth:`metrics.PrecisionRecallDisplay.from_predictions`. It drops some
  suboptimal thresholds to create lighter precision-recall curves.
  :pr:`24668` by :user:`dberenbaum`.

- |Enhancement| :meth:`metrics.RocCurveDisplay.from_estimator` and
  :meth:`metrics.RocCurveDisplay.from_predictions` now accept two new keywords,
  `plot_chance_level` and `chance_level_kw`, to plot the baseline chance
  level. This line is exposed in the `chance_level_` attribute.
  :pr:`25987` by :user:`Yao Xiao <Charlie-XIAO>`.

- |Enhancement| :meth:`metrics.PrecisionRecallDisplay.from_estimator` and
  :meth:`metrics.PrecisionRecallDisplay.from_predictions` now accept two new
  keywords, `plot_chance_level` and `chance_level_kw`, to plot the baseline
  chance level. This line is exposed in the `chance_level_` attribute.
  :pr:`26019` by :user:`Yao Xiao <Charlie-XIAO>`.

- |Fix| :func:`metrics.pairwise.manhattan_distances` now supports read-only sparse datasets.
  :pr:`25432` by :user:`Julien Jerphanion <jjerphan>`.

- |Fix| Fixed :func:`metrics.classification_report` so that empty input will return
  `np.nan`. Previously, the "macro avg" and "weighted avg" rows would return
  inconsistent values, e.g. `f1-score=np.nan` and `f1-score=0.0`. Now, they
  both return `np.nan`.
  :pr:`25531` by :user:`Marc Torrellas Socastro <marctorsoc>`.

- |Fix| :func:`metrics.ndcg_score` now gives a meaningful error message for input of
  length 1.
  :pr:`25672` by :user:`Lene Preuss <lene>` and :user:`Wei-Chun Chu <wcchu>`.

- |Fix| :func:`metrics.log_loss` now raises a warning if the values of the parameter
  `y_pred` are not normalized, instead of silently normalizing them in the metric.
  Starting from 1.5 this will raise an error.
  :pr:`25299` by :user:`Omar Salman <OmarManzoor>`.

- |Fix| In :func:`metrics.roc_curve`, use the threshold value `np.inf` instead of
  the arbitrary `max(y_score) + 1`. This threshold is associated with the ROC curve
  point `tpr=0` and `fpr=0`.
  :pr:`26194` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Fix| The `'matching'` metric has been removed when using SciPy>=1.9
  to be consistent with `scipy.spatial.distance`, which no longer supports
  `'matching'`.
  :pr:`26264` by :user:`Barata T. Onggo <magnusbarata>`.

- |API| The `eps` parameter of :func:`metrics.log_loss` has been deprecated and
  will be removed in 1.5. :pr:`25299` by :user:`Omar Salman <OmarManzoor>`.

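The `zero_division=np.nan` entry above states that undefined per-class scores are excluded from averaging. The following is a minimal, stdlib-only sketch of that rule for macro-averaged precision. It is not scikit-learn's implementation: the helper names `per_class_precision` and `macro_precision` are hypothetical, and `math.nan` stands in for `np.nan`.

```python
import math


def per_class_precision(y_true, y_pred, labels):
    """Precision per label; NaN when the label is never predicted (0/0)."""
    scores = []
    for label in labels:
        # True labels of the samples that were predicted as `label`.
        predicted = [t for t, p in zip(y_true, y_pred) if p == label]
        if not predicted:
            # Zero division: mimic zero_division=np.nan with a NaN score.
            scores.append(math.nan)
        else:
            true_pos = sum(1 for t in predicted if t == label)
            scores.append(true_pos / len(predicted))
    return scores


def macro_precision(y_true, y_pred, labels):
    """Macro average that leaves undefined (NaN) per-class scores out."""
    scores = per_class_precision(y_true, y_pred, labels)
    defined = [s for s in scores if not math.isnan(s)]
    # If every class is undefined, the average itself is undefined.
    return sum(defined) / len(defined) if defined else math.nan


y_true = [0, 1, 1, 2]
y_pred = [0, 1, 0, 1]
print(per_class_precision(y_true, y_pred, labels=[0, 1, 2]))  # [0.5, 0.5, nan]
print(macro_precision(y_true, y_pred, labels=[0, 1, 2]))  # 0.5
```

Here class `2` is never predicted, so its precision is a 0/0 division and is left out of the macro average instead of counting as 0 and pulling the average down to 1/3.
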
:mod:`sklearn.gaussian_process`
...............................

- |Fix| :class:`gaussian_process.GaussianProcessRegressor` has a new argument
  `n_targets`, which is used to decide the number of outputs when sampling
  from the prior distributions. :pr:`23099` by :user:`Zhehao Liu <MaxwellLZH>`.

:mod:`sklearn.mixture`
......................

- |Efficiency| :class:`mixture.GaussianMixture` is now more efficient and bypasses
  unnecessary initialization if the weights, means, and precisions are
  given by users.
  :pr:`26021` by :user:`Jiawei Zhang <jiawei-zhang-a>`.

:mod:`sklearn.model_selection`
..............................

- |MajorFeature| Added the class :class:`model_selection.ValidationCurveDisplay`
  that allows easy plotting of validation curves obtained by the function
  :func:`model_selection.validation_curve`.
  :pr:`25120` by :user:`Guillaume Lemaitre <glemaitre>`.

- |API| The parameter `log_scale` in the class
  :class:`model_selection.LearningCurveDisplay` has been deprecated in 1.3 and
  will be removed in 1.5. The default scale can be overridden by setting it
  directly on the `ax` object and will be set automatically from the spacing
  of the data points otherwise.
  :pr:`25120` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Enhancement| :func:`model_selection.cross_validate` accepts a new parameter
  `return_indices` to return the train-test indices of each cv split.
  :pr:`25659` by :user:`Guillaume Lemaitre <glemaitre>`.

:mod:`sklearn.multioutput`
..........................

- |Fix| :func:`getattr` on :meth:`multioutput.MultiOutputRegressor.partial_fit`
  and :meth:`multioutput.MultiOutputClassifier.partial_fit` now correctly raise
  an `AttributeError` if done before calling `fit`. :pr:`26333` by `Adrin Jalali`_.

:mod:`sklearn.naive_bayes`
..........................

- |Fix| :class:`naive_bayes.GaussianNB` no longer raises a `ZeroDivisionError`
  when the provided `sample_weight` reduces the problem to a single class in `fit`.
  :pr:`24140` by :user:`Jonathan Ohayon <Johayon>` and :user:`Chiara Marmo <cmarmo>`.

:mod:`sklearn.neighbors`
........................

- |Enhancement| The performance of :meth:`neighbors.KNeighborsClassifier.predict`
  and of :meth:`neighbors.KNeighborsClassifier.predict_proba` has been improved
  when `n_neighbors` is large and `algorithm="brute"` with non-Euclidean metrics.
  :pr:`24076` by :user:`Meekail Zain <micky774>` and :user:`Julien Jerphanion <jjerphan>`.

- |Fix| Remove support for `KulsinskiDistance` in :class:`neighbors.BallTree`. This
  dissimilarity is not a metric and cannot be supported by the BallTree.
  :pr:`25417` by :user:`Guillaume Lemaitre <glemaitre>`.

- |API| Support for metrics other than `euclidean` and `manhattan` and for
  callables in :class:`neighbors.NearestNeighbors` is deprecated and will be removed in
  version 1.5. :pr:`24083` by :user:`Valentin Laurent <Valentin-Laurent>`.

:mod:`sklearn.neural_network`
.............................

- |Fix| :class:`neural_network.MLPRegressor` and :class:`neural_network.MLPClassifier`
  now report the correct `n_iter_` when `warm_start=True`. It corresponds to the number
  of iterations performed on the current call to `fit` instead of the total number
  of iterations performed since the initialization of the estimator.
  :pr:`25443` by :user:`Marvin Krawutschke <Marvvxi>`.

:mod:`sklearn.pipeline`
.......................

- |Feature| :class:`pipeline.FeatureUnion` can now use indexing notation (e.g.
  `feature_union["scalar"]`) to access transformers by name. :pr:`25093` by
  `Thomas Fan`_.

- |Feature| :class:`pipeline.FeatureUnion` can now access the
  `feature_names_in_` attribute if the `X` value seen during `.fit` has a
  `columns` attribute and all columns are strings, e.g. when `X` is a
  `pandas.DataFrame`.
  :pr:`25220` by :user:`Ian Thompson <it176131>`.

- |Fix| :meth:`pipeline.Pipeline.fit_transform` now raises an `AttributeError`
  if the last step of the pipeline does not support `fit_transform`.
  :pr:`26325` by `Adrin Jalali`_.

:mod:`sklearn.preprocessing`
............................

- |MajorFeature| Introduces :class:`preprocessing.TargetEncoder` which is a
  categorical encoding based on target mean conditioned on the value of the
  category. :pr:`25334` by `Thomas Fan`_.

- |Feature| :class:`preprocessing.OrdinalEncoder` now supports grouping
  infrequent categories into a single feature. Grouping infrequent categories
  is enabled by specifying how to select infrequent categories with
  `min_frequency` or `max_categories`. :pr:`25677` by `Thomas Fan`_.

- |Enhancement| :class:`preprocessing.PolynomialFeatures` now calculates the
  number of expanded terms a priori when dealing with sparse `csr` matrices
  in order to optimize the choice of `dtype` for `indices` and `indptr`. It
  can now output `csr` matrices with `np.int32` `indices/indptr` components
  when there are few enough elements, and will automatically use `np.int64`
  for sufficiently large matrices.
  :pr:`20524` by :user:`niuk-a <niuk-a>` and
  :pr:`23731` by :user:`Meekail Zain <micky774>`.

- |Enhancement| A new parameter `sparse_output` was added to
  :class:`preprocessing.SplineTransformer`, available as of SciPy 1.8. If
  `sparse_output=True`, :class:`preprocessing.SplineTransformer` returns a sparse
  CSR matrix. :pr:`24145` by :user:`Christian Lorentzen <lorentzenchr>`.

- |Enhancement| Adds a `feature_name_combiner` parameter to
  :class:`preprocessing.OneHotEncoder`. This specifies a custom callable to
  create feature names to be returned by
  :meth:`preprocessing.OneHotEncoder.get_feature_names_out`. The callable
  combines input arguments `(input_feature, category)` into a string.
  :pr:`22506` by :user:`Mario Kostelac <mariokostelac>`.

- |Enhancement| Added support for `sample_weight` in
  :class:`preprocessing.KBinsDiscretizer`. This allows specifying a weight
  for each sample to be used while fitting. The option is only
  available when `strategy` is set to `quantile` or `kmeans`.
  :pr:`24935` by :user:`Seladus <seladus>`, :user:`Guillaume Lemaitre <glemaitre>`, and
  :user:`Dea María Léon <deamarialeon>`, :pr:`25257` by :user:`Gleb Levitski <glevv>`.

- |Enhancement| Subsampling through the `subsample` parameter can now be used in
  :class:`preprocessing.KBinsDiscretizer` regardless of the strategy used.
  :pr:`26424` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Fix| :class:`preprocessing.PowerTransformer` now correctly preserves the pandas
  index when `set_config(transform_output="pandas")` is used. :pr:`26454` by `Thomas Fan`_.

- |Fix| :class:`preprocessing.PowerTransformer` now correctly raises an error when
  using `method="box-cox"` on data with a constant `np.nan` column.
  :pr:`26400` by :user:`Yao Xiao <Charlie-XIAO>`.

- |Fix| :class:`preprocessing.PowerTransformer` with `method="yeo-johnson"` now leaves
  constant features unchanged instead of transforming with an arbitrary value for
  the `lambdas_` fitted parameter.
  :pr:`26566` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |API| The default value of the `subsample` parameter of
  :class:`preprocessing.KBinsDiscretizer` will change from `None` to `200_000` in
  version 1.5 when `strategy="kmeans"` or `strategy="uniform"`.
  :pr:`26424` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

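As a rough illustration of the idea behind :class:`preprocessing.TargetEncoder`, the sketch below replaces each category with the mean target value observed for it. It is deliberately simplified and hypothetical (`fit_target_means` and `transform` are illustrative names, not scikit-learn API): the estimator itself also blends category means with the global target mean and uses cross fitting in `fit_transform`, both of which this sketch omits. Falling back to the global mean for unseen categories is an assumption of this sketch.

```python
from collections import defaultdict


def fit_target_means(categories, y):
    """Map each category to the mean target value observed for it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, target in zip(categories, y):
        sums[cat] += target
        counts[cat] += 1
    global_mean = sum(y) / len(y)
    encoding = {cat: sums[cat] / counts[cat] for cat in sums}
    return encoding, global_mean


def transform(categories, encoding, global_mean):
    """Replace categories by their target mean; unseen ones get the global mean."""
    return [encoding.get(cat, global_mean) for cat in categories]


colors = ["red", "blue", "red", "green", "blue", "red"]
y = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
encoding, global_mean = fit_target_means(colors, y)
# "red" maps to 1/3, and the unseen "purple" maps to the global mean 0.5.
print(transform(["red", "purple"], encoding, global_mean))
```

Without the shrinkage and cross fitting mentioned above, a rare category's encoding can memorize its few target values, which is the kind of overfitting the real estimator is designed to limit.
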
:mod:`sklearn.svm`
..................

- |API| The `dual` parameter of :class:`svm.LinearSVC` and :class:`svm.LinearSVR`
  now accepts the `"auto"` option.
  :pr:`26093` by :user:`Gleb Levitski <glevv>`.

:mod:`sklearn.tree`
...................

- |MajorFeature| :class:`tree.DecisionTreeRegressor` and
  :class:`tree.DecisionTreeClassifier` support missing values when
  `splitter='best'` and the criterion is `gini`, `entropy`, or `log_loss`
  for classification, or `squared_error`, `friedman_mse`, or `poisson`
  for regression. :pr:`23595`, :pr:`26376` by `Thomas Fan`_.

- |Enhancement| Adds a `class_names` parameter to
  :func:`tree.export_text`. This allows specifying the name of each
  target class, given in ascending numerical order.
  :pr:`25387` by :user:`William M <Akbeeh>` and :user:`crispinlogan <crispinlogan>`.

- |Fix| :func:`tree.export_graphviz` and :func:`tree.export_text` now accept
  `feature_names` and `class_names` as any array-like rather than only as lists.
  :pr:`26289` by :user:`Yao Xiao <Charlie-XIAO>`.

:mod:`sklearn.utils`
....................

- |Fix| Fixes :func:`utils.check_array` to properly convert pandas
  extension arrays. :pr:`25813` and :pr:`26106` by `Thomas Fan`_.

- |Fix| :func:`utils.check_array` now supports pandas DataFrames with
  extension arrays and object dtypes by returning an ndarray with object dtype.
  :pr:`25814` by `Thomas Fan`_.

- |API| `utils.estimator_checks.check_transformers_unfitted_stateless` has been
  introduced to ensure stateless transformers don't raise `NotFittedError`
  during `transform` with no prior call to `fit` or `fit_transform`.
  :pr:`25190` by :user:`Vincent Maladière <Vincent-Maladiere>`.

- |API| A `FutureWarning` is now raised when instantiating a class which inherits from
  a deprecated base class (i.e. decorated by :class:`utils.deprecated`) and which
  overrides the `__init__` method.
  :pr:`25733` by :user:`Brigitta Sipőcz <bsipocz>` and
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

:mod:`sklearn.semi_supervised`
..............................

- |Enhancement| :meth:`semi_supervised.LabelSpreading.fit` and
  :meth:`semi_supervised.LabelPropagation.fit` now accept sparse matrices.
  :pr:`19664` by :user:`Kaushik Amar Das <cozek>`.

Miscellaneous
.............

- |Enhancement| Replace the obsolete exceptions `EnvironmentError`, `IOError` and
  `WindowsError` with `OSError`.
  :pr:`26466` by :user:`Dimitri Papadopoulos Orfanos <DimitriPapadopoulos>`.

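The Miscellaneous entry above concerns legacy exception names. In Python 3, `EnvironmentError`, `IOError` and (on Windows only) `WindowsError` are aliases of `OSError`, so catching `OSError` covers all of them. A minimal sketch of the pattern, with a hypothetical `read_text` helper that is not part of scikit-learn:

```python
import builtins

# The legacy names are aliases of OSError in Python 3, not subclasses.
assert EnvironmentError is OSError
assert IOError is OSError
# WindowsError only exists on Windows; fall back to OSError elsewhere.
assert getattr(builtins, "WindowsError", OSError) is OSError


def read_text(path):
    """Return the file's contents, or None if it cannot be read (hypothetical helper)."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError:  # Covers what older code spelled IOError or EnvironmentError.
        return None


print(read_text("/nonexistent/dir/file.txt"))  # None
```

Because the names are identical objects, the replacement changes no runtime behavior; it only removes spellings kept for backward compatibility.
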
.. rubric:: Code and documentation contributors

Thanks to everyone who has contributed to the maintenance and improvement of
the project since version 1.2, including:

2357juan, Abhishek Singh Kushwah, Adam Handke, Adam Kania, Adam Li, adienes,
Admir Demiraj, adoublet, Adrin Jalali, A.H.Mansouri, Ahmedbgh, Ala-Na, Alex
Buzenet, AlexL, Ali H. El-Kassas, amay, András Simon, André Pedersen, Andrew
Wang, Ankur Singh, annegnx, Ansam Zedan, Anthony22-dev, Artur Hermano, Arturo
Amor, as-90, ashah002, Ashish Dutt, Ashwin Mathur, AymericBasset, Azaria
Gebremichael, Barata Tripramudya Onggo, Benedek Harsanyi, Benjamin Bossan,
Bharat Raghunathan, Binesh Bannerjee, Boris Feld, Brendan Lu, Brevin Kunde,
cache-missing, Camille Troillard, Carla J, carlo, Carlo Lemos, c-git, Changyao
Chen, Chiara Marmo, Christian Lorentzen, Christian Veenhuis, Christine P. Chai,
crispinlogan, Da-Lan, DanGonite57, Dave Berenbaum, davidblnc, david-cortes,
Dayne, Dea María Léon, Denis, Dimitri Papadopoulos Orfanos, Dimitris
Litsidis, Dmitry Nesterov, Dominic Fox, Dominik Prodinger, Edern, Ekaterina
Butyugina, Elabonga Atuo, Emir, farhan khan, Felipe Siola, futurewarning, Gael
Varoquaux, genvalen, Gleb Levitski, Guillaume Lemaitre, gunesbayir, Haesun
Park, hujiahong726, i-aki-y, Ian Thompson, Ido M, Ily, Irene, Jack McIvor,
jakirkham, James Dean, JanFidor, Jarrod Millman, JB Mountford, Jérémie du
Boisberranger, Jessicakk0711, Jiawei Zhang, Joey Ortiz, JohnathanPi, John
Pangas, Joshua Choo Yun Keat, Joshua Hedlund, JuliaSchoepp, Julien Jerphanion,
jygerardy, ka00ri, Kaushik Amar Das, Kento Nozawa, Kian Eliasi, Kilian Kluge,
Lene Preuss, Linus, Logan Thomas, Loic Esteve, Louis Fouquet, Lucy Liu, Madhura
Jayaratne, Marc Torrellas Socastro, Maren Westermann, Mario Kostelac, Mark
Harfouche, Marko Toplak, Marvin Krawutschke, Masanori Kanazu, mathurinm, Matt
Haberland, Max Halford, maximeSaur, Maxwell Liu, m. bou, mdarii, Meekail Zain,
Mikhail Iljin, murezzda, Nawazish Alam, Nicola Fanelli, Nightwalkx, Nikolay
Petrov, Nishu Choudhary, NNLNR, npache, Olivier Grisel, Omar Salman, ouss1508,
PAB, Pandata, partev, Peter Piontek, Phil, pnucci, Pooja M, Pooja Subramaniam,
precondition, Quentin Barthélemy, Rafal Wojdyla, Raghuveer Bhat, Rahil Parikh,
Ralf Gommers, ram vikram singh, Rushil Desai, Sadra Barikbin, SANJAI_3, Sashka
Warner, Scott Gigante, Scott Gustafson, searchforpassion, Seoeun
Hong, Shady el Gewily, Shiva chauhan, Shogo Hida, Shreesha Kumar Bhat, sonnivs,
Sortofamudkip, Stanislav (Stanley) Modrak, Stefanie Senger, Steven Van
Vaerenbergh, Tabea Kossen, Théophile Baranger, Thijs van Weezel, Thomas A
Caswell, Thomas Germer, Thomas J. Fan, Tim Head, Tim P, Tom Dupré la Tour,
tomiock, tspeng, Valentin Laurent, Veghit, VIGNESH D, Vijeth Moudgalya, Vinayak
Mehta, Vincent M, Vincent-violet, Vyom Pathak, William M, windiana42, Xiao
Yuan, Yao Xiao, Yaroslav Halchenko, Yotam Avidar-Constantini, Yuchen Zhou,
Yusuf Raji, zeeshan lone