.. include:: _contributors.rst

.. currentmodule:: sklearn

============
Version 0.14
============

.. _changes_0_14:

Version 0.14
===============

**August 7, 2013**

Changelog
---------

- Missing values with sparse and dense matrices can be imputed with the
  transformer `preprocessing.Imputer` by `Nicolas Trésegnie`_.

- The core implementation of decision trees has been rewritten from
  scratch, allowing for faster tree induction and lower memory
  consumption in all tree-based estimators. By `Gilles Louppe`_.

- Added :class:`ensemble.AdaBoostClassifier` and
  :class:`ensemble.AdaBoostRegressor`, by `Noel Dawe`_ and
  `Gilles Louppe`_. See the :ref:`AdaBoost <adaboost>` section of the user
  guide for details and examples.

- Added `grid_search.RandomizedSearchCV` and
  `grid_search.ParameterSampler` for randomized hyperparameter
  optimization. By `Andreas Müller`_.

- Added :ref:`biclustering <biclustering>` algorithms
  (`sklearn.cluster.bicluster.SpectralCoclustering` and
  `sklearn.cluster.bicluster.SpectralBiclustering`), data
  generation methods (:func:`sklearn.datasets.make_biclusters` and
  :func:`sklearn.datasets.make_checkerboard`), and scoring metrics
  (:func:`sklearn.metrics.consensus_score`). By `Kemal Eren`_.

- Added :ref:`Restricted Boltzmann Machines<rbm>`
  (:class:`neural_network.BernoulliRBM`). By `Yann Dauphin`_.

- Python 3 support by :user:`Justin Vincent <justinvf>`, `Lars Buitinck`_,
  :user:`Subhodeep Moitra <smoitra87>` and `Olivier Grisel`_. All tests now pass under
  Python 3.3.

- Ability to pass one penalty (alpha value) per target in
  :class:`linear_model.Ridge`, by @eickenberg and `Mathieu Blondel`_.

- Fixed `sklearn.linear_model.stochastic_gradient.py` L2 regularization
  issue (minor practical significance).
  By :user:`Norbert Crombach <norbert>` and `Mathieu Blondel`_.

- Added an interactive version of `Andreas Müller`_'s
  `Machine Learning Cheat Sheet (for scikit-learn)
  <https://peekaboo-vision.blogspot.de/2013/01/machine-learning-cheat-sheet-for-scikit.html>`_
  to the documentation. See :ref:`Choosing the right estimator <ml_map>`.
  By `Jaques Grobler`_.

- `grid_search.GridSearchCV` and
  `cross_validation.cross_val_score` now support the use of advanced
  scoring functions such as area under the ROC curve and f-beta scores.
  See :ref:`scoring_parameter` for details. By `Andreas Müller`_
  and `Lars Buitinck`_.
  Passing a function from :mod:`sklearn.metrics` as ``score_func`` is
  deprecated.

- Multi-label classification output is now supported by
  :func:`metrics.accuracy_score`, :func:`metrics.zero_one_loss`,
  :func:`metrics.f1_score`, :func:`metrics.fbeta_score`,
  :func:`metrics.classification_report`,
  :func:`metrics.precision_score` and :func:`metrics.recall_score`
  by `Arnaud Joly`_.

- Two new metrics :func:`metrics.hamming_loss` and
  `metrics.jaccard_similarity_score`
  are added with multi-label support by `Arnaud Joly`_.

- Speed and memory usage improvements in
  :class:`feature_extraction.text.CountVectorizer` and
  :class:`feature_extraction.text.TfidfVectorizer`,
  by Jochen Wersdörfer and Roman Sinayev.

- The ``min_df`` parameter in
  :class:`feature_extraction.text.CountVectorizer` and
  :class:`feature_extraction.text.TfidfVectorizer`, which used to be 2,
  has been reset to 1 to avoid unpleasant surprises (empty vocabularies)
  for novice users who try it out on tiny document collections.
  A value of at least 2 is still recommended for practical use.

- :class:`svm.LinearSVC`, :class:`linear_model.SGDClassifier` and
  :class:`linear_model.SGDRegressor` now have a ``sparsify`` method that
  converts their ``coef_`` into a sparse matrix, meaning stored models
  trained using these estimators can be made much more compact.

- :class:`linear_model.SGDClassifier` now produces multiclass probability
  estimates when trained under log loss or modified Huber loss.

- Hyperlinks to documentation in example code on the website by
  :user:`Martin Luessi <mluessi>`.

- Fixed bug in :class:`preprocessing.MinMaxScaler` causing incorrect scaling
  of the features for non-default ``feature_range`` settings.
  By `Andreas Müller`_.

- ``max_features`` in :class:`tree.DecisionTreeClassifier`,
  :class:`tree.DecisionTreeRegressor` and all derived ensemble estimators
  now supports percentage values. By `Gilles Louppe`_.

- Performance improvements in :class:`isotonic.IsotonicRegression` by
  `Nelle Varoquaux`_.

- :func:`metrics.accuracy_score` has a ``normalize`` option to return
  either the fraction or the number of correctly classified samples.
  By `Arnaud Joly`_.

- Added :func:`metrics.log_loss` that computes log loss, aka cross-entropy
  loss. By Jochen Wersdörfer and `Lars Buitinck`_.

- A bug that caused :class:`ensemble.AdaBoostClassifier` to output
  incorrect probabilities has been fixed.

- Feature selectors now share a mixin providing consistent ``transform``,
  ``inverse_transform`` and ``get_support`` methods. By `Joel Nothman`_.

- A fitted `grid_search.GridSearchCV` or
  `grid_search.RandomizedSearchCV` can now generally be pickled.
  By `Joel Nothman`_.

- Refactored and vectorized implementation of :func:`metrics.roc_curve`
  and :func:`metrics.precision_recall_curve`. By `Joel Nothman`_.

- The new estimator :class:`sklearn.decomposition.TruncatedSVD`
  performs dimensionality reduction using SVD on sparse matrices,
  and can be used for latent semantic analysis (LSA).
  By `Lars Buitinck`_.

- Added self-contained example of out-of-core learning on text data
  :ref:`sphx_glr_auto_examples_applications_plot_out_of_core_classification.py`.
  By :user:`Eustache Diemert <oddskool>`.

- The default number of components for
  `sklearn.decomposition.RandomizedPCA` is now correctly documented
  to be ``n_features``. This was the default behavior, so programs using it
  will continue to work as they did.

- :class:`sklearn.cluster.KMeans` now fits several orders of magnitude
  faster on sparse data (the speedup depends on the sparsity). By
  `Lars Buitinck`_.

- Reduced the memory footprint of FastICA. By `Denis Engemann`_ and
  `Alexandre Gramfort`_.

- Verbose output in `sklearn.ensemble.gradient_boosting` now uses
  a column format and prints progress with decreasing frequency.
  It also shows the remaining time. By `Peter Prettenhofer`_.

- `sklearn.ensemble.gradient_boosting` now provides the out-of-bag
  improvement ``oob_improvement_`` rather than the OOB score for model
  selection. An example that shows how to use OOB estimates to select
  the number of trees was added.
  By `Peter Prettenhofer`_.

- Most metrics now support string labels for multiclass classification
  by `Arnaud Joly`_ and `Lars Buitinck`_.

- New OrthogonalMatchingPursuitCV class by `Alexandre Gramfort`_
  and `Vlad Niculae`_.

- Fixed a bug in `sklearn.covariance.GraphLassoCV`: the
  'alphas' parameter now works as expected when given a list of
  values. By Philippe Gervais.

- Fixed an important bug in `sklearn.covariance.GraphLassoCV`
  that prevented all folds provided by a CV object from being used (only
  the first 3 were used). When providing a CV object, execution
  time may thus increase significantly compared to the previous
  version (but results are correct now). By Philippe Gervais.

- `cross_validation.cross_val_score` and the `grid_search`
  module are now tested with multi-output data. By `Arnaud Joly`_.

- :func:`datasets.make_multilabel_classification` can now return
  the output in label indicator multilabel format by `Arnaud Joly`_.

- K-nearest neighbors, :class:`neighbors.KNeighborsRegressor`
  and :class:`neighbors.KNeighborsClassifier`,
  and radius neighbors, :class:`neighbors.RadiusNeighborsRegressor` and
  :class:`neighbors.RadiusNeighborsClassifier` support multioutput data
  by `Arnaud Joly`_.

- Random state in LibSVM-based estimators (:class:`svm.SVC`, :class:`svm.NuSVC`,
  :class:`svm.OneClassSVM`, :class:`svm.SVR`, :class:`svm.NuSVR`) can now be
  controlled. This is useful to ensure consistency in the probability
  estimates for the classifiers trained with ``probability=True``. By
  `Vlad Niculae`_.

- Out-of-core learning support for discrete naive Bayes classifiers
  :class:`sklearn.naive_bayes.MultinomialNB` and
  :class:`sklearn.naive_bayes.BernoulliNB` by adding the ``partial_fit``
  method by `Olivier Grisel`_.

- New website design and navigation by `Gilles Louppe`_, `Nelle Varoquaux`_,
  Vincent Michel and `Andreas Müller`_.

- Improved documentation on :ref:`multi-class, multi-label and multi-output
  classification <multiclass>` by `Yannick Schwartz`_ and `Arnaud Joly`_.

- Better input and error handling in the :mod:`sklearn.metrics` module by
  `Arnaud Joly`_ and `Joel Nothman`_.

- Speed optimization of the `hmm` module by :user:`Mikhail Korobov <kmike>`.

- Significant speed improvements for :class:`sklearn.cluster.DBSCAN`
  by `cleverless <https://github.com/cleverless>`_.

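
The two metrics added in this release, Hamming loss and log loss, can be
illustrated with a small pure-Python sketch. The functions below are
hypothetical stand-ins written for illustration only; they are not the
library's implementation and they assume list-based inputs (0/1 label
indicator rows for the multi-label case, class indices plus per-class
probability rows for log loss).

```python
import math


def hamming_loss(y_true, y_pred):
    """Fraction of individual labels that are predicted incorrectly.

    Both arguments are lists of equal-length 0/1 label-indicator rows
    (an assumed input format for this sketch).
    """
    n_labels = sum(len(row) for row in y_true)
    n_wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return n_wrong / n_labels


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true holds class indices; y_prob holds one probability row per sample.
    """
    total = 0.0
    for label, row in zip(y_true, y_prob):
        p = min(max(row[label], eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


# Two samples with three labels each; two of the six labels are wrong.
print(hamming_loss([[1, 0, 1], [0, 1, 0]], [[1, 1, 1], [0, 1, 1]]))

# A classifier that always predicts 50/50 scores log(2) on binary data.
print(log_loss([0, 1], [[0.5, 0.5], [0.5, 0.5]]))
```

Lower values are better for both quantities; a Hamming loss of 0 means every
individual label was predicted correctly.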
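
The ``min_df`` change noted in the list above is easier to see on a toy
corpus. The sketch below is a simplified illustration, not the vectorizer's
code: ``build_vocabulary`` is a hypothetical helper, and it assumes plain
whitespace tokenization, which differs from the real tokenizer.

```python
from collections import Counter


def build_vocabulary(documents, min_df=1):
    """Keep terms that occur in at least ``min_df`` documents."""
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    return sorted(term for term, df in doc_freq.items() if df >= min_df)


docs = ["the cat sat", "a dog barked"]

# With min_df=1 every term is kept.
print(build_vocabulary(docs, min_df=1))
# ['a', 'barked', 'cat', 'dog', 'sat', 'the']

# With the old default of 2, no term appears in two documents,
# so the vocabulary is empty.
print(build_vocabulary(docs, min_df=2))
# []
```

On a realistic corpus a threshold of 2 or more mostly removes typos and
one-off tokens, which is why it is still the recommended setting in practice.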

API changes summary
-------------------

- `auc_score` was renamed to :func:`metrics.roc_auc_score`.

- Testing scikit-learn with ``sklearn.test()`` is deprecated. Use
  ``nosetests sklearn`` from the command line.

- Feature importances in :class:`tree.DecisionTreeClassifier`,
  :class:`tree.DecisionTreeRegressor` and all derived ensemble estimators
  are now computed on the fly when accessing the ``feature_importances_``
  attribute. Setting ``compute_importances=True`` is no longer required.
  By `Gilles Louppe`_.

- :func:`linear_model.lasso_path` and
  :func:`linear_model.enet_path` can return their results in the same
  format as that of :func:`linear_model.lars_path`. This is done by
  setting the ``return_models`` parameter to ``False``. By
  `Jaques Grobler`_ and `Alexandre Gramfort`_.

- `grid_search.IterGrid` was renamed to `grid_search.ParameterGrid`.

- Fixed bug in `KFold` causing imperfect class balance in some
  cases. By `Alexandre Gramfort`_ and Tadej Janež.

- :class:`sklearn.neighbors.BallTree` has been refactored, and a
  :class:`sklearn.neighbors.KDTree` has been
  added which shares the same interface. The Ball Tree now works with
  a wide variety of distance metrics. Both classes have many new
  methods, including single-tree and dual-tree queries, breadth-first
  and depth-first searching, and more advanced queries such as
  kernel density estimation and 2-point correlation functions.
  By `Jake Vanderplas`_.

- Support for ``scipy.spatial.cKDTree`` within neighbors queries has been
  removed, and the functionality replaced with the new
  :class:`sklearn.neighbors.KDTree` class.

- :class:`sklearn.neighbors.KernelDensity` has been added, which performs
  efficient kernel density estimation with a variety of kernels.

- :class:`sklearn.decomposition.KernelPCA` now always returns output with
  ``n_components`` components, unless the new parameter ``remove_zero_eig``
  is set to ``True``. This new behavior is consistent with the way
  kernel PCA was always documented; previously, the removal of components
  with zero eigenvalues was tacitly performed on all data.

- ``gcv_mode="auto"`` no longer tries to perform SVD on a densified
  sparse matrix in :class:`sklearn.linear_model.RidgeCV`.

- Sparse matrix support in `sklearn.decomposition.RandomizedPCA`
  is now deprecated in favor of the new ``TruncatedSVD``.

- `cross_validation.KFold` and
  `cross_validation.StratifiedKFold` now enforce ``n_folds >= 2``;
  otherwise a ``ValueError`` is raised. By `Olivier Grisel`_.

- :func:`datasets.load_files`'s ``charset`` and ``charset_errors``
  parameters were renamed ``encoding`` and ``decode_errors``.

- Attribute ``oob_score_`` in :class:`sklearn.ensemble.GradientBoostingRegressor`
  and :class:`sklearn.ensemble.GradientBoostingClassifier`
  is deprecated and has been replaced by ``oob_improvement_``.

- Attributes in ``OrthogonalMatchingPursuit`` (``copy_X``, ``Gram``, ...)
  have been deprecated, and ``precompute_gram`` has been renamed
  ``precompute`` for consistency. See #2224.

- :class:`sklearn.preprocessing.StandardScaler` now converts integer input
  to float, and raises a warning. Previously it rounded for dense integer
  input.

- :class:`sklearn.multiclass.OneVsRestClassifier` now has a
  ``decision_function`` method. This will return the distance of each
  sample from the decision boundary for each class, as long as the
  underlying estimators implement the ``decision_function`` method.
  By `Kyle Kastner`_.

- Better input validation, warning on unexpected shapes for y.

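
The ``n_folds >= 2`` check listed above can be pictured with a minimal
k-fold splitter. This is only a sketch under stated assumptions:
``kfold_indices`` is a hypothetical helper using contiguous, unshuffled
folds, and it does not reproduce the library's classes or error messages.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for contiguous, near-equal folds.

    Being a generator, the function performs its validation when the
    first split is requested, not when it is called.
    """
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2, got %d" % n_folds)
    if n_folds > n_samples:
        raise ValueError("n_folds cannot exceed n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds receive one additional sample.
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


for train, test in kfold_indices(5, 2):
    print(train, test)
# [3, 4] [0, 1, 2]
# [0, 1, 2] [3, 4]
```

A single fold would leave no data to train on, which is why a value below 2
is rejected rather than silently accepted.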

People
------
List of contributors for release 0.14 by number of commits.

* 277 Gilles Louppe
* 245 Lars Buitinck
* 187 Andreas Mueller
* 124 Arnaud Joly
* 112 Jaques Grobler
* 109 Gael Varoquaux
* 107 Olivier Grisel
* 102 Noel Dawe
* 99 Kemal Eren
* 79 Joel Nothman
* 75 Jake VanderPlas
* 73 Nelle Varoquaux
* 71 Vlad Niculae
* 65 Peter Prettenhofer
* 64 Alexandre Gramfort
* 54 Mathieu Blondel
* 38 Nicolas Trésegnie
* 35 eustache
* 27 Denis Engemann
* 25 Yann N. Dauphin
* 19 Justin Vincent
* 17 Robert Layton
* 15 Doug Coleman
* 14 Michael Eickenberg
* 13 Robert Marchman
* 11 Fabian Pedregosa
* 11 Philippe Gervais
* 10 Jim Holmström
* 10 Tadej Janež
* 10 syhw
* 9 Mikhail Korobov
* 9 Steven De Gryze
* 8 sergeyf
* 7 Ben Root
* 7 Hrishikesh Huilgolkar
* 6 Kyle Kastner
* 6 Martin Luessi
* 6 Rob Speer
* 5 Federico Vaggi
* 5 Raul Garreta
* 5 Rob Zinkov
* 4 Ken Geis
* 3 A. Flaxman
* 3 Denton Cockburn
* 3 Dougal Sutherland
* 3 Ian Ozsvald
* 3 Johannes Schönberger
* 3 Robert McGibbon
* 3 Roman Sinayev
* 3 Szabo Roland
* 2 Diego Molla
* 2 Imran Haque
* 2 Jochen Wersdörfer
* 2 Sergey Karayev
* 2 Yannick Schwartz
* 2 jamestwebber
* 1 Abhijeet Kolhe
* 1 Alexander Fabisch
* 1 Bastiaan van den Berg
* 1 Benjamin Peterson
* 1 Daniel Velkov
* 1 Fazlul Shahriar
* 1 Felix Brockherde
* 1 Félix-Antoine Fortin
* 1 Harikrishnan S
* 1 Jack Hale
* 1 JakeMick
* 1 James McDermott
* 1 John Benediktsson
* 1 John Zwinck
* 1 Joshua Vredevoogd
* 1 Justin Pati
* 1 Kevin Hughes
* 1 Kyle Kelley
* 1 Matthias Ekman
* 1 Miroslav Shubernetskiy
* 1 Naoki Orii
* 1 Norbert Crombach
* 1 Rafael Cunha de Almeida
* 1 Rolando Espinoza La fuente
* 1 Seamus Abshere
* 1 Sergey Feldman
* 1 Sergio Medina
* 1 Stefano Lattarini
* 1 Steve Koch
* 1 Sturla Molden
* 1 Thomas Jarosch
* 1 Yaroslav Halchenko