.. include:: _contributors.rst

.. currentmodule:: sklearn

============
Version 0.13
============

.. _changes_0_13_1:

Version 0.13.1
==============

**February 23, 2013**

The 0.13.1 release only fixes some bugs and does not add any new functionality.

Changelog
---------

- Fixed a testing error caused by the function `cross_validation.train_test_split` being
interpreted as a test, by `Yaroslav Halchenko`_.
- Fixed a bug in the reassignment of small clusters in :class:`cluster.MiniBatchKMeans`,
by `Gael Varoquaux`_.
- Fixed default value of ``gamma`` in :class:`decomposition.KernelPCA` by `Lars Buitinck`_.
- Updated joblib to ``0.7.0d`` by `Gael Varoquaux`_.
- Fixed scaling of the deviance in :class:`ensemble.GradientBoostingClassifier` by `Peter Prettenhofer`_.
- Better tie-breaking in :class:`multiclass.OneVsOneClassifier` by `Andreas Müller`_.
- Other small improvements to tests and documentation.

People
------

List of contributors for release 0.13.1 by number of commits.

* 16 `Lars Buitinck`_
* 12 `Andreas Müller`_
* 8 `Gael Varoquaux`_
* 5 Robert Marchman
* 3 `Peter Prettenhofer`_
* 2 Hrishikesh Huilgolkar
* 1 Bastiaan van den Berg
* 1 Diego Molla
* 1 `Gilles Louppe`_
* 1 `Mathieu Blondel`_
* 1 `Nelle Varoquaux`_
* 1 Rafael Cunha de Almeida
* 1 Rolando Espinoza La fuente
* 1 `Vlad Niculae`_
* 1 `Yaroslav Halchenko`_

.. _changes_0_13:

Version 0.13
============

**January 21, 2013**

New Estimator Classes
---------------------

- :class:`dummy.DummyClassifier` and :class:`dummy.DummyRegressor`, two
data-independent predictors by `Mathieu Blondel`_. Useful to sanity-check
your estimators. See :ref:`dummy_estimators` in the user guide.
Multioutput support added by `Arnaud Joly`_.
- :class:`decomposition.FactorAnalysis`, a transformer implementing the
classical factor analysis, by `Christian Osendorfer`_ and `Alexandre
Gramfort`_. See :ref:`FA` in the user guide.
- :class:`feature_extraction.FeatureHasher`, a transformer implementing the
"hashing trick" for fast, low-memory feature extraction from string fields
by `Lars Buitinck`_ and :class:`feature_extraction.text.HashingVectorizer`
for text documents by `Olivier Grisel`_. See :ref:`feature_hashing` and
:ref:`hashing_vectorizer` for the documentation and sample usage.
- :class:`pipeline.FeatureUnion`, a transformer that concatenates
results of several other transformers by `Andreas Müller`_. See
:ref:`feature_union` in the user guide.
- :class:`random_projection.GaussianRandomProjection`,
:class:`random_projection.SparseRandomProjection` and the function
:func:`random_projection.johnson_lindenstrauss_min_dim`. The first two are
transformers implementing Gaussian and sparse random projection matrices
by `Olivier Grisel`_ and `Arnaud Joly`_.
See :ref:`random_projection` in the user guide.
- :class:`kernel_approximation.Nystroem`, a transformer for approximating
arbitrary kernels by `Andreas Müller`_. See
:ref:`nystroem_kernel_approx` in the user guide.
- :class:`preprocessing.OneHotEncoder`, a transformer that computes binary
encodings of categorical features by `Andreas Müller`_. See
:ref:`preprocessing_categorical_features` in the user guide.
- :class:`linear_model.PassiveAggressiveClassifier` and
:class:`linear_model.PassiveAggressiveRegressor`, predictors implementing
an efficient stochastic optimization for linear models by `Rob Zinkov`_ and
`Mathieu Blondel`_. See :ref:`passive_aggressive` in the user
guide.
- :class:`ensemble.RandomTreesEmbedding`, a transformer for creating high-dimensional
sparse representations using ensembles of totally random trees by `Andreas Müller`_.
See :ref:`random_trees_embedding` in the user guide.
- :class:`manifold.SpectralEmbedding` and function
:func:`manifold.spectral_embedding`, implementing the "laplacian
eigenmaps" transformation for non-linear dimensionality reduction by Wei
Li. See :ref:`spectral_embedding` in the user guide.
- :class:`isotonic.IsotonicRegression` by `Fabian Pedregosa`_, `Alexandre Gramfort`_
and `Nelle Varoquaux`_.
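
As a rough illustration of the sanity-check use of :class:`dummy.DummyClassifier`
mentioned above, the sketch below fits a majority-class baseline on toy data.
The data and variable names are made up for illustration, and the import path
and ``strategy`` value assume a reasonably recent scikit-learn release.

.. code-block:: python

    import numpy as np
    from sklearn.dummy import DummyClassifier

    # Toy data (illustrative only): the features carry no information,
    # and class 0 is the majority class.
    X = np.zeros((6, 1))
    y = np.array([0, 0, 0, 0, 1, 1])

    # The baseline ignores X and always predicts the most frequent label.
    baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
    predictions = baseline.predict(X)

    # Accuracy of the baseline: the share of the majority class (4 out of 6).
    accuracy = baseline.score(X, y)

A real estimator whose score does not exceed ``accuracy`` on the same data is
probably not learning anything useful from the features.
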

Changelog
---------

- :func:`metrics.zero_one_loss` (formerly ``metrics.zero_one``) now has an
option for normalized output that reports the fraction of
misclassifications, rather than the raw number of misclassifications. By
Kyle Beauchamp.
- :class:`tree.DecisionTreeClassifier` and all derived ensemble models now
support sample weighting, by `Noel Dawe`_ and `Gilles Louppe`_.
- Improved speed when using bootstrap samples in forests of randomized
trees, by `Peter Prettenhofer`_ and `Gilles Louppe`_.
- Partial dependence plots for :ref:`gradient_boosting` in
`ensemble.partial_dependence.partial_dependence` by `Peter
Prettenhofer`_. See :ref:`sphx_glr_auto_examples_inspection_plot_partial_dependence.py` for an
example.
- The table of contents on the website has now been made expandable by
`Jaques Grobler`_.
- :class:`feature_selection.SelectPercentile` now breaks ties
deterministically instead of returning all equally ranked features.
- :class:`feature_selection.SelectKBest` and
:class:`feature_selection.SelectPercentile` are more numerically stable
since they use scores, rather than p-values, to rank results. This means
that they might sometimes select different features than they did
previously.
- Ridge regression and ridge classification fitting with ``sparse_cg`` solver
no longer has quadratic memory complexity, by `Lars Buitinck`_ and
`Fabian Pedregosa`_.
- Ridge regression and ridge classification now support a new fast solver
called ``lsqr``, by `Mathieu Blondel`_.
- Speed up of :func:`metrics.precision_recall_curve` by Conrad Lee.
- Added support for reading/writing svmlight files with pairwise
preference attribute (qid in svmlight file format) in
:func:`datasets.dump_svmlight_file` and
:func:`datasets.load_svmlight_file` by `Fabian Pedregosa`_.
- Faster and more robust :func:`metrics.confusion_matrix` and
:ref:`clustering_evaluation` by Wei Li.
- `cross_validation.cross_val_score` now works with precomputed kernels
and affinity matrices, by `Andreas Müller`_.
- The LARS algorithm was made more numerically stable, with heuristics to drop
regressors that are too correlated and to stop the path when
numerical noise becomes predominant, by `Gael Varoquaux`_.
- New kernel `metrics.chi2_kernel` by `Andreas Müller`_, often used
in computer vision applications.
- Fixed a longstanding bug in :class:`naive_bayes.BernoulliNB`, by
Shaun Jackman.
- Implemented ``predict_proba`` in :class:`multiclass.OneVsRestClassifier`,
by Andrew Winterman.
- Improve consistency in gradient boosting: estimators
:class:`ensemble.GradientBoostingRegressor` and
:class:`ensemble.GradientBoostingClassifier` use the estimator
:class:`tree.DecisionTreeRegressor` instead of the
`tree._tree.Tree` data structure by `Arnaud Joly`_.
- Fixed a floating point exception in the :ref:`decision trees <tree>`
module, by Seberg.
- Fixed a failure in :func:`metrics.roc_curve` when ``y_true`` has only one class,
by Wei Li.
- Add the :func:`metrics.mean_absolute_error` function which computes the
mean absolute error. The :func:`metrics.mean_squared_error`,
:func:`metrics.mean_absolute_error` and
:func:`metrics.r2_score` metrics support multioutput by `Arnaud Joly`_.
- Fixed ``class_weight`` support in :class:`svm.LinearSVC` and
:class:`linear_model.LogisticRegression` by `Andreas Müller`_. In earlier
releases the meaning of ``class_weight`` was reversed: a higher weight
erroneously meant fewer positives of a given class.
- Improve narrative documentation and consistency in
:mod:`sklearn.metrics` for regression and classification metrics
by `Arnaud Joly`_.
- Fixed a bug in :class:`sklearn.svm.SVC` when using csr-matrices with
unsorted indices by Xinfan Meng and `Andreas Müller`_.
- :class:`cluster.MiniBatchKMeans`: Add random reassignment of cluster centers
with few observations attached to them, by `Gael Varoquaux`_.
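
The ``normalize`` option of :func:`metrics.zero_one_loss` and the new
:func:`metrics.mean_absolute_error` described above can be sketched as follows.
The labels and values are invented for illustration, and the calls assume the
signatures found in recent scikit-learn releases.

.. code-block:: python

    from sklearn.metrics import mean_absolute_error, zero_one_loss

    # Toy labels (illustrative only): two of the four predictions are wrong.
    y_true = [1, 0, 1, 1]
    y_pred = [1, 1, 1, 0]

    # Normalized output: the fraction of misclassified samples.
    fraction_wrong = zero_one_loss(y_true, y_pred, normalize=True)

    # Raw output: the number of misclassified samples.
    count_wrong = zero_one_loss(y_true, y_pred, normalize=False)

    # Mean absolute error on toy regression targets: (|3 - 2| + |5 - 7|) / 2.
    mae = mean_absolute_error([3.0, 5.0], [2.0, 7.0])

With these inputs ``fraction_wrong`` is 0.5, ``count_wrong`` is 2 and ``mae``
is 1.5.
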

API changes summary
-------------------

- Renamed all occurrences of ``n_atoms`` to ``n_components`` for consistency.
This applies to :class:`decomposition.DictionaryLearning`,
:class:`decomposition.MiniBatchDictionaryLearning`,
:func:`decomposition.dict_learning`, :func:`decomposition.dict_learning_online`.
- Renamed all occurrences of ``max_iters`` to ``max_iter`` for consistency.
This applies to `semi_supervised.LabelPropagation` and
`semi_supervised.label_propagation.LabelSpreading`.
- Renamed all occurrences of ``learn_rate`` to ``learning_rate`` for
consistency in `ensemble.BaseGradientBoosting` and
:class:`ensemble.GradientBoostingRegressor`.
- The module ``sklearn.linear_model.sparse`` is gone. Sparse matrix support
was already integrated into the "regular" linear models.
- `sklearn.metrics.mean_square_error`, which incorrectly returned the
accumulated error, was removed. Use :func:`metrics.mean_squared_error` instead.
- Passing ``class_weight`` parameters to ``fit`` methods is no longer
supported. Pass them to estimator constructors instead.
- GMMs no longer have ``decode`` and ``rvs`` methods. Use the ``score``,
``predict`` or ``sample`` methods instead.
- The ``solver`` fit option in Ridge regression and classification is now
deprecated and will be removed in v0.14. Use the constructor option
instead.
- `feature_extraction.DictVectorizer` now returns sparse
matrices in the CSR format, instead of COO.
- Renamed ``k`` in `cross_validation.KFold` and
`cross_validation.StratifiedKFold` to ``n_folds``, renamed
``n_bootstraps`` to ``n_iter`` in ``cross_validation.Bootstrap``.
- Renamed all occurrences of ``n_iterations`` to ``n_iter`` for consistency.
This applies to `cross_validation.ShuffleSplit`,
`cross_validation.StratifiedShuffleSplit`,
:func:`utils.extmath.randomized_range_finder` and
:func:`utils.extmath.randomized_svd`.
- Replaced ``rho`` in :class:`linear_model.ElasticNet` and
:class:`linear_model.SGDClassifier` by ``l1_ratio``. The ``rho`` parameter
had different meanings; ``l1_ratio`` was introduced to avoid confusion.
It has the same meaning as previously ``rho`` in
:class:`linear_model.ElasticNet` and ``(1-rho)`` in
:class:`linear_model.SGDClassifier`.
- :class:`linear_model.LassoLars` and :class:`linear_model.Lars` now
store a list of paths in the case of multiple targets, rather than
an array of paths.
- The attribute ``gmm`` of `hmm.GMMHMM` was renamed to ``gmm_``
to adhere more strictly to the API.
- `cluster.spectral_embedding` was moved to
:func:`manifold.spectral_embedding`.
- Renamed ``eig_tol`` to ``eigen_tol`` and ``mode`` to ``eigen_solver`` in
:func:`manifold.spectral_embedding` and
:class:`cluster.SpectralClustering`.
- ``classes_`` and ``n_classes_`` attributes of
:class:`tree.DecisionTreeClassifier` and all derived ensemble models are
now flat in case of single output problems and nested in case of
multi-output problems.
- The ``estimators_`` attribute of
:class:`ensemble.GradientBoostingRegressor` and
:class:`ensemble.GradientBoostingClassifier` is now an
array of :class:`tree.DecisionTreeRegressor`.
- Renamed ``chunk_size`` to ``batch_size`` in
:class:`decomposition.MiniBatchDictionaryLearning` and
:class:`decomposition.MiniBatchSparsePCA` for consistency.
- :class:`svm.SVC` and :class:`svm.NuSVC` now provide a ``classes_``
attribute and support arbitrary dtypes for labels ``y``.
Also, the dtype returned by ``predict`` now reflects the dtype of
``y`` during ``fit`` (used to be ``np.float``).
- Changed the default ``test_size`` in `cross_validation.train_test_split`
to ``None``, and added the possibility to infer ``test_size`` from ``train_size`` in
`cross_validation.ShuffleSplit` and
`cross_validation.StratifiedShuffleSplit`.
- Renamed function `sklearn.metrics.zero_one` to
`sklearn.metrics.zero_one_loss`. Be aware that the default behavior
of `sklearn.metrics.zero_one_loss` differs from that of
`sklearn.metrics.zero_one`: the default changed from ``normalize=False``
to ``normalize=True``.
- Renamed function `metrics.zero_one_score` to
:func:`metrics.accuracy_score`.
- :func:`datasets.make_circles` now has the same number of inner and outer points.
- In the Naive Bayes classifiers, the ``class_prior`` parameter was moved
from ``fit`` to ``__init__``.
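
A few of the changes listed above can be sketched in code: ``class_weight``
and ``class_prior`` are passed to constructors rather than to ``fit``, and
``l1_ratio`` replaces ``rho``. The toy data and the helper
``l1_ratio_from_rho`` are hypothetical and exist only for illustration; the
estimator calls assume the signatures of recent scikit-learn releases.

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import ElasticNet
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC

    # Toy data (illustrative only): two samples per class, non-negative features.
    X = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
    y = np.array([0, 0, 1, 1])

    # class_weight goes to the constructor, not to fit().
    svc = LinearSVC(class_weight={0: 1.0, 1: 5.0}).fit(X, y)

    # class_prior goes to the constructor, not to fit().
    nb = MultinomialNB(class_prior=[0.3, 0.7]).fit(X, y)


    def l1_ratio_from_rho(rho, estimator):
        """Hypothetical helper: translate an old ``rho`` value into ``l1_ratio``.

        Follows the mapping stated in the list above: ``rho`` for ElasticNet,
        ``1 - rho`` for SGDClassifier.
        """
        if estimator == "ElasticNet":
            return rho
        if estimator == "SGDClassifier":
            return 1.0 - rho
        raise ValueError("unsupported estimator: %r" % (estimator,))


    # An old ElasticNet(rho=0.5) becomes ElasticNet(l1_ratio=0.5).
    enet = ElasticNet(alpha=0.1, l1_ratio=l1_ratio_from_rho(0.5, "ElasticNet"))
    enet.fit(X, y)

Code that passed ``rho`` to :class:`linear_model.SGDClassifier` needs the
``1 - rho`` conversion, so a plain rename of the keyword would silently change
the penalty mix.
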

People
------

List of contributors for release 0.13 by number of commits.

* 364 `Andreas Müller`_
* 143 `Arnaud Joly`_
* 137 `Peter Prettenhofer`_
* 131 `Gael Varoquaux`_
* 117 `Mathieu Blondel`_
* 108 `Lars Buitinck`_
* 106 Wei Li
* 101 `Olivier Grisel`_
* 65 `Vlad Niculae`_
* 54 `Gilles Louppe`_
* 40 `Jaques Grobler`_
* 38 `Alexandre Gramfort`_
* 30 `Rob Zinkov`_
* 19 Aymeric Masurelle
* 18 Andrew Winterman
* 17 `Fabian Pedregosa`_
* 17 Nelle Varoquaux
* 16 `Christian Osendorfer`_
* 14 `Daniel Nouri`_
* 13 :user:`Virgile Fritsch <VirgileFritsch>`
* 13 syhw
* 12 `Satrajit Ghosh`_
* 10 Corey Lynch
* 10 Kyle Beauchamp
* 9 Brian Cheung
* 9 Immanuel Bayer
* 9 mr.Shu
* 8 Conrad Lee
* 8 `James Bergstra`_
* 7 Tadej Janež
* 6 Brian Cajes
* 6 `Jake Vanderplas`_
* 6 Michael
* 6 Noel Dawe
* 6 Tiago Nunes
* 6 cow
* 5 Anze
* 5 Shiqiao Du
* 4 Christian Jauvin
* 4 Jacques Kvam
* 4 Richard T. Guy
* 4 `Robert Layton`_
* 3 Alexandre Abraham
* 3 Doug Coleman
* 3 Scott Dickerson
* 2 ApproximateIdentity
* 2 John Benediktsson
* 2 Mark Veronda
* 2 Matti Lyra
* 2 Mikhail Korobov
* 2 Xinfan Meng
* 1 Alejandro Weinstein
* 1 `Alexandre Passos`_
* 1 Christoph Deil
* 1 Eugene Nizhibitsky
* 1 Kenneth C. Arnold
* 1 Luis Pedro Coelho
* 1 Miroslav Batchkarov
* 1 Pavel
* 1 Sebastian Berg
* 1 Shaun Jackman
* 1 Subhodeep Moitra
* 1 bob
* 1 dengemann
* 1 emanuele
* 1 x006