
.. include:: _contributors.rst

.. currentmodule:: sklearn

============
Version 0.13
============

.. _changes_0_13_1:

Version 0.13.1
==============

**February 23, 2013**

The 0.13.1 release only fixes some bugs and does not add any new functionality.

Changelog
---------

- Fixed a testing error caused by the function `cross_validation.train_test_split`
  being interpreted as a test, by `Yaroslav Halchenko`_.

- Fixed a bug in the reassignment of small clusters in :class:`cluster.MiniBatchKMeans`
  by `Gael Varoquaux`_.

- Fixed default value of ``gamma`` in :class:`decomposition.KernelPCA` by `Lars Buitinck`_.

- Updated joblib to ``0.7.0d`` by `Gael Varoquaux`_.

- Fixed scaling of the deviance in :class:`ensemble.GradientBoostingClassifier` by `Peter Prettenhofer`_.

- Better tie-breaking in :class:`multiclass.OneVsOneClassifier` by `Andreas Müller`_.

- Other small improvements to tests and documentation.

People
------
List of contributors for release 0.13.1 by number of commits.

* 16  `Lars Buitinck`_
* 12  `Andreas Müller`_
*  8  `Gael Varoquaux`_
*  5  Robert Marchman
*  3  `Peter Prettenhofer`_
*  2  Hrishikesh Huilgolkar
*  1  Bastiaan van den Berg
*  1  Diego Molla
*  1  `Gilles Louppe`_
*  1  `Mathieu Blondel`_
*  1  `Nelle Varoquaux`_
*  1  Rafael Cunha de Almeida
*  1  Rolando Espinoza La fuente
*  1  `Vlad Niculae`_
*  1  `Yaroslav Halchenko`_


.. _changes_0_13:

Version 0.13
============

**January 21, 2013**

New Estimator Classes
---------------------

- :class:`dummy.DummyClassifier` and :class:`dummy.DummyRegressor`, two
  data-independent predictors by `Mathieu Blondel`_. Useful to sanity-check
  your estimators. See :ref:`dummy_estimators` in the user guide.
  Multioutput support added by `Arnaud Joly`_.

- :class:`decomposition.FactorAnalysis`, a transformer implementing the
  classical factor analysis, by `Christian Osendorfer`_ and `Alexandre
  Gramfort`_. See :ref:`FA` in the user guide.

- :class:`feature_extraction.FeatureHasher`, a transformer implementing the
  "hashing trick" for fast, low-memory feature extraction from string fields
  by `Lars Buitinck`_, and :class:`feature_extraction.text.HashingVectorizer`
  for text documents by `Olivier Grisel`_. See :ref:`feature_hashing` and
  :ref:`hashing_vectorizer` for the documentation and sample usage.

- :class:`pipeline.FeatureUnion`, a transformer that concatenates
  results of several other transformers by `Andreas Müller`_. See
  :ref:`feature_union` in the user guide.

- :class:`random_projection.GaussianRandomProjection`,
  :class:`random_projection.SparseRandomProjection` and the function
  :func:`random_projection.johnson_lindenstrauss_min_dim`. The first two are
  transformers implementing Gaussian and sparse random projection matrices,
  by `Olivier Grisel`_ and `Arnaud Joly`_.
  See :ref:`random_projection` in the user guide.

- :class:`kernel_approximation.Nystroem`, a transformer for approximating
  arbitrary kernels by `Andreas Müller`_. See
  :ref:`nystroem_kernel_approx` in the user guide.

- :class:`preprocessing.OneHotEncoder`, a transformer that computes binary
  encodings of categorical features by `Andreas Müller`_. See
  :ref:`preprocessing_categorical_features` in the user guide.

- :class:`linear_model.PassiveAggressiveClassifier` and
  :class:`linear_model.PassiveAggressiveRegressor`, predictors implementing
  an efficient stochastic optimization for linear models by `Rob Zinkov`_ and
  `Mathieu Blondel`_. See :ref:`passive_aggressive` in the user
  guide.

- :class:`ensemble.RandomTreesEmbedding`, a transformer for creating high-dimensional
  sparse representations using ensembles of totally random trees by `Andreas Müller`_.
  See :ref:`random_trees_embedding` in the user guide.

- :class:`manifold.SpectralEmbedding` and the function
  :func:`manifold.spectral_embedding`, implementing the "Laplacian
  eigenmaps" transformation for non-linear dimensionality reduction by Wei
  Li. See :ref:`spectral_embedding` in the user guide.

- :class:`isotonic.IsotonicRegression` by `Fabian Pedregosa`_, `Alexandre Gramfort`_
  and `Nelle Varoquaux`_.
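
The bound behind :func:`random_projection.johnson_lindenstrauss_min_dim` can be illustrated with a minimal pure-Python sketch. It assumes the conservative Johnson-Lindenstrauss bound ``n_components >= 4 * log(n_samples) / (eps**2 / 2 - eps**3 / 3)``; the name ``jl_min_dim`` is hypothetical and the sketch only handles scalar inputs, unlike the library function.

```python
import math


def jl_min_dim(n_samples, eps=0.1):
    """Hypothetical scalar sketch of the Johnson-Lindenstrauss bound.

    Returns a number of random projection components that preserves all
    pairwise distances within a factor (1 +/- eps) with good probability.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must be in the open interval (0, 1)")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    denominator = (eps ** 2 / 2) - (eps ** 3 / 3)
    # Truncate the real-valued bound to an integer number of components.
    return int(4 * math.log(n_samples) / denominator)


# One million samples: a looser distortion tolerance needs far fewer components.
print(jl_min_dim(1e6, eps=0.5))  # 663
print(jl_min_dim(1e6, eps=0.1))  # 11841
```

The bound depends only on the number of samples and on ``eps``, not on the original number of features, which is why random projections pay off mostly for very high-dimensional data.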


Changelog
---------

- :func:`metrics.zero_one_loss` (formerly ``metrics.zero_one``) now has an
  option for normalized output that reports the fraction of
  misclassifications rather than the raw number of misclassifications. By
  Kyle Beauchamp.

- :class:`tree.DecisionTreeClassifier` and all derived ensemble models now
  support sample weighting, by `Noel Dawe`_ and `Gilles Louppe`_.

- Speed improvement when using bootstrap samples in forests of randomized
  trees, by `Peter Prettenhofer`_ and `Gilles Louppe`_.

- Partial dependence plots for :ref:`gradient_boosting` in
  `ensemble.partial_dependence.partial_dependence` by `Peter
  Prettenhofer`_. See :ref:`sphx_glr_auto_examples_inspection_plot_partial_dependence.py` for an
  example.

- The table of contents on the website has now been made expandable by
  `Jaques Grobler`_.

- :class:`feature_selection.SelectPercentile` now breaks ties
  deterministically instead of returning all equally ranked features.

- :class:`feature_selection.SelectKBest` and
  :class:`feature_selection.SelectPercentile` are more numerically stable
  since they use scores, rather than p-values, to rank results. This means
  that they might sometimes select different features than they did
  previously.

- Ridge regression and ridge classification fitting with the ``sparse_cg``
  solver no longer has quadratic memory complexity, by `Lars Buitinck`_ and
  `Fabian Pedregosa`_.

- Ridge regression and ridge classification now support a new fast solver
  called ``lsqr``, by `Mathieu Blondel`_.

- Speedup of :func:`metrics.precision_recall_curve` by Conrad Lee.

- Added support for reading/writing svmlight files with the pairwise
  preference attribute (``qid`` in the svmlight file format) in
  :func:`datasets.dump_svmlight_file` and
  :func:`datasets.load_svmlight_file` by `Fabian Pedregosa`_.

- Faster and more robust :func:`metrics.confusion_matrix` and
  :ref:`clustering_evaluation` by Wei Li.

- `cross_validation.cross_val_score` now works with precomputed kernels
  and affinity matrices, by `Andreas Müller`_.

- The LARS algorithm was made more numerically stable, with heuristics to
  drop overly correlated regressors and to stop the path when numerical
  noise becomes predominant, by `Gael Varoquaux`_.

- Faster implementation of :func:`metrics.precision_recall_curve` by
  Conrad Lee.

- New kernel `metrics.chi2_kernel`, often used in computer vision
  applications, by `Andreas Müller`_.

- Fixed a longstanding bug in :class:`naive_bayes.BernoulliNB`, by
  Shaun Jackman.

- Implemented ``predict_proba`` in :class:`multiclass.OneVsRestClassifier`,
  by Andrew Winterman.

- Improve consistency in gradient boosting: estimators
  :class:`ensemble.GradientBoostingRegressor` and
  :class:`ensemble.GradientBoostingClassifier` use the estimator
  :class:`tree.DecisionTreeRegressor` instead of the
  `tree._tree.Tree` data structure by `Arnaud Joly`_.

- Fixed a floating point exception in the :ref:`decision trees <tree>`
  module, by Seberg.

- Fixed :func:`metrics.roc_curve` failing when ``y_true`` has only one class,
  by Wei Li.

- Added the :func:`metrics.mean_absolute_error` function, which computes the
  mean absolute error. The :func:`metrics.mean_squared_error`,
  :func:`metrics.mean_absolute_error` and :func:`metrics.r2_score` metrics
  now support multioutput, by `Arnaud Joly`_.

- Fixed ``class_weight`` support in :class:`svm.LinearSVC` and
  :class:`linear_model.LogisticRegression` by `Andreas Müller`_. In earlier
  releases the meaning of ``class_weight`` was erroneously reversed: a higher
  weight for a class meant fewer predicted positives of that class.

- Improve narrative documentation and consistency in
  :mod:`sklearn.metrics` for regression and classification metrics
  by `Arnaud Joly`_.

- Fixed a bug in :class:`sklearn.svm.SVC` when using CSR matrices with
  unsorted indices, by Xinfan Meng and `Andreas Müller`_.

- :class:`cluster.MiniBatchKMeans`: Added random reassignment of cluster centers
  with few observations attached to them, by `Gael Varoquaux`_.
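
The ``normalize`` option of :func:`metrics.zero_one_loss` described in this changelog can be illustrated with a small pure-Python sketch. ``zero_one_loss_sketch`` is a hypothetical stand-in written for illustration only, not the library implementation, and it ignores options such as sample weights.

```python
def zero_one_loss_sketch(y_true, y_pred, normalize=True):
    """Hypothetical sketch of the zero-one loss.

    normalize=True returns the fraction of misclassified samples;
    normalize=False returns the raw count of misclassifications.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one sample")
    n_errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    if normalize:
        return n_errors / len(y_true)
    return n_errors


y_true = [1, 2, 3, 4]
y_pred = [2, 2, 3, 4]
print(zero_one_loss_sketch(y_true, y_pred))                   # 0.25
print(zero_one_loss_sketch(y_true, y_pred, normalize=False))  # 1
```

With ``normalize=True`` the result is one minus the accuracy reported by :func:`metrics.accuracy_score` (formerly ``metrics.zero_one_score``).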


API changes summary
-------------------
- Renamed all occurrences of ``n_atoms`` to ``n_components`` for consistency.
  This applies to :class:`decomposition.DictionaryLearning`,
  :class:`decomposition.MiniBatchDictionaryLearning`,
  :func:`decomposition.dict_learning`, :func:`decomposition.dict_learning_online`.

- Renamed all occurrences of ``max_iters`` to ``max_iter`` for consistency.
  This applies to `semi_supervised.LabelPropagation` and
  `semi_supervised.label_propagation.LabelSpreading`.

- Renamed all occurrences of ``learn_rate`` to ``learning_rate`` for
  consistency in `ensemble.BaseGradientBoosting` and
  :class:`ensemble.GradientBoostingRegressor`.

- The module ``sklearn.linear_model.sparse`` is gone. Sparse matrix support
  was already integrated into the "regular" linear models.

- `sklearn.metrics.mean_square_error`, which incorrectly returned the
  accumulated error, was removed. Use :func:`metrics.mean_squared_error` instead.

- Passing ``class_weight`` parameters to ``fit`` methods is no longer
  supported. Pass them to estimator constructors instead.

- GMMs no longer have ``decode`` and ``rvs`` methods. Use the ``score``,
  ``predict`` or ``sample`` methods instead.

- The ``solver`` fit option in Ridge regression and classification is now
  deprecated and will be removed in v0.14. Use the constructor option
  instead.

- :class:`feature_extraction.DictVectorizer` now returns sparse
  matrices in the CSR format, instead of COO.

- Renamed ``k`` in `cross_validation.KFold` and
  `cross_validation.StratifiedKFold` to ``n_folds``, renamed
  ``n_bootstraps`` to ``n_iter`` in ``cross_validation.Bootstrap``.

- Renamed all occurrences of ``n_iterations`` to ``n_iter`` for consistency.
  This applies to `cross_validation.ShuffleSplit`,
  `cross_validation.StratifiedShuffleSplit`,
  :func:`utils.extmath.randomized_range_finder` and
  :func:`utils.extmath.randomized_svd`.

- Replaced ``rho`` in :class:`linear_model.ElasticNet` and
  :class:`linear_model.SGDClassifier` by ``l1_ratio``. The ``rho`` parameter
  had different meanings in the two estimators; ``l1_ratio`` was introduced
  to avoid confusion. It has the same meaning as the former ``rho`` in
  :class:`linear_model.ElasticNet` and as ``(1 - rho)`` in
  :class:`linear_model.SGDClassifier`.

- :class:`linear_model.LassoLars` and :class:`linear_model.Lars` now
  store a list of paths in the case of multiple targets, rather than
  an array of paths.

- The attribute ``gmm`` of `hmm.GMMHMM` was renamed to ``gmm_``
  to adhere more strictly to the API conventions.

- `cluster.spectral_embedding` was moved to
  :func:`manifold.spectral_embedding`.

- Renamed ``eig_tol`` in :func:`manifold.spectral_embedding` and
  :class:`cluster.SpectralClustering` to ``eigen_tol``, and renamed ``mode``
  to ``eigen_solver``.

- ``classes_`` and ``n_classes_`` attributes of
  :class:`tree.DecisionTreeClassifier` and all derived ensemble models are
  now flat in case of single output problems and nested in case of
  multi-output problems.

- The ``estimators_`` attribute of
  :class:`ensemble.GradientBoostingRegressor` and
  :class:`ensemble.GradientBoostingClassifier` is now an
  array of :class:`tree.DecisionTreeRegressor`.

- Renamed ``chunk_size`` to ``batch_size`` in
  :class:`decomposition.MiniBatchDictionaryLearning` and
  :class:`decomposition.MiniBatchSparsePCA` for consistency.

- :class:`svm.SVC` and :class:`svm.NuSVC` now provide a ``classes_``
  attribute and support arbitrary dtypes for labels ``y``.
  Also, the dtype returned by ``predict`` now reflects the dtype of
  ``y`` during ``fit`` (used to be ``np.float``).

- Changed the default ``test_size`` in `cross_validation.train_test_split`
  to ``None``, and added the possibility to infer ``test_size`` from
  ``train_size`` in `cross_validation.ShuffleSplit` and
  `cross_validation.StratifiedShuffleSplit`.

- Renamed function `sklearn.metrics.zero_one` to
  `sklearn.metrics.zero_one_loss`. Be aware that the default behavior
  of `sklearn.metrics.zero_one_loss` differs from that of
  `sklearn.metrics.zero_one`: the default changed from ``normalize=False``
  to ``normalize=True``.

- Renamed function `metrics.zero_one_score` to
  :func:`metrics.accuracy_score`.

- :func:`datasets.make_circles` now has the same number of inner and outer points.

- In the Naive Bayes classifiers, the ``class_prior`` parameter was moved
  from ``fit`` to ``__init__``.
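
The ``rho`` to ``l1_ratio`` migration for :class:`linear_model.ElasticNet` and :class:`linear_model.SGDClassifier` listed above can be sketched in plain Python. The helper ``l1_ratio_from_rho`` is hypothetical (no such function ships with scikit-learn), and ``elastic_net_penalty`` assumes the penalty form ``alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2`` as written in the :class:`linear_model.ElasticNet` documentation.

```python
def l1_ratio_from_rho(rho, estimator):
    """Hypothetical migration helper: translate a pre-0.13 ``rho`` value.

    In ElasticNet the old ``rho`` already weighted the L1 term; in
    SGDClassifier it weighted the L2 term, hence the ``1 - rho``.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    if estimator == "ElasticNet":
        return rho
    if estimator == "SGDClassifier":
        return 1.0 - rho
    raise ValueError("unknown estimator: %r" % (estimator,))


def elastic_net_penalty(w, alpha, l1_ratio):
    """Elastic-net penalty: l1_ratio=1 is pure L1, l1_ratio=0 is pure L2."""
    l1_norm = sum(abs(wi) for wi in w)
    l2_norm_sq = sum(wi * wi for wi in w)
    return alpha * l1_ratio * l1_norm + 0.5 * alpha * (1.0 - l1_ratio) * l2_norm_sq


# An old SGDClassifier(rho=0.85) maps to l1_ratio=0.15 (up to float rounding).
print(l1_ratio_from_rho(0.85, "SGDClassifier"))
print(elastic_net_penalty([1.0, -2.0], alpha=1.0, l1_ratio=0.5))  # 2.75
```

Because ``l1_ratio`` always weights the L1 term, the same value now means the same mix of L1 and L2 regularization in both estimators.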

People
------
List of contributors for release 0.13 by number of commits.

* 364  `Andreas Müller`_
* 143  `Arnaud Joly`_
* 137  `Peter Prettenhofer`_
* 131  `Gael Varoquaux`_
* 117  `Mathieu Blondel`_
* 108  `Lars Buitinck`_
* 106  Wei Li
* 101  `Olivier Grisel`_
*  65  `Vlad Niculae`_
*  54  `Gilles Louppe`_
*  40  `Jaques Grobler`_
*  38  `Alexandre Gramfort`_
*  30  `Rob Zinkov`_
*  19  Aymeric Masurelle
*  18  Andrew Winterman
*  17  `Fabian Pedregosa`_
*  17  Nelle Varoquaux
*  16  `Christian Osendorfer`_
*  14  `Daniel Nouri`_
*  13  :user:`Virgile Fritsch <VirgileFritsch>`
*  13  syhw
*  12  `Satrajit Ghosh`_
*  10  Corey Lynch
*  10  Kyle Beauchamp
*   9  Brian Cheung
*   9  Immanuel Bayer
*   9  mr.Shu
*   8  Conrad Lee
*   8  `James Bergstra`_
*   7  Tadej Janež
*   6  Brian Cajes
*   6  `Jake Vanderplas`_
*   6  Michael
*   6  Noel Dawe
*   6  Tiago Nunes
*   6  cow
*   5  Anze
*   5  Shiqiao Du
*   4  Christian Jauvin
*   4  Jacques Kvam
*   4  Richard T. Guy
*   4  `Robert Layton`_
*   3  Alexandre Abraham
*   3  Doug Coleman
*   3  Scott Dickerson
*   2  ApproximateIdentity
*   2  John Benediktsson
*   2  Mark Veronda
*   2  Matti Lyra
*   2  Mikhail Korobov
*   2  Xinfan Meng
*   1  Alejandro Weinstein
*   1  `Alexandre Passos`_
*   1  Christoph Deil
*   1  Eugene Nizhibitsky
*   1  Kenneth C. Arnold
*   1  Luis Pedro Coelho
*   1  Miroslav Batchkarov
*   1  Pavel
*   1  Sebastian Berg
*   1  Shaun Jackman
*   1  Subhodeep Moitra
*   1  bob
*   1  dengemann
*   1  emanuele
*   1  x006