.. include:: _contributors.rst
.. currentmodule:: sklearn
============
Version 0.14
============
.. _changes_0_14:
Version 0.14
===============
**August 7, 2013**
Changelog
---------
- Missing values with sparse and dense matrices can be imputed with the
transformer `preprocessing.Imputer` by `Nicolas Trésegnie`_.
- The core implementation of decision trees has been rewritten from
scratch, allowing for faster tree induction and lower memory
consumption in all tree-based estimators. By `Gilles Louppe`_.
- Added :class:`ensemble.AdaBoostClassifier` and
:class:`ensemble.AdaBoostRegressor`, by `Noel Dawe`_ and
`Gilles Louppe`_. See the :ref:`AdaBoost <adaboost>` section of the user
guide for details and examples.
- Added `grid_search.RandomizedSearchCV` and
`grid_search.ParameterSampler` for randomized hyperparameter
optimization. By `Andreas Müller`_.
- Added :ref:`biclustering <biclustering>` algorithms
(`sklearn.cluster.bicluster.SpectralCoclustering` and
`sklearn.cluster.bicluster.SpectralBiclustering`), data
generation methods (:func:`sklearn.datasets.make_biclusters` and
:func:`sklearn.datasets.make_checkerboard`), and scoring metrics
(:func:`sklearn.metrics.consensus_score`). By `Kemal Eren`_.
- Added :ref:`Restricted Boltzmann Machines<rbm>`
(:class:`neural_network.BernoulliRBM`). By `Yann Dauphin`_.
- Python 3 support by :user:`Justin Vincent <justinvf>`, `Lars Buitinck`_,
:user:`Subhodeep Moitra <smoitra87>` and `Olivier Grisel`_. All tests now pass under
Python 3.3.
- Ability to pass one penalty (alpha value) per target in
:class:`linear_model.Ridge`, by @eickenberg and `Mathieu Blondel`_.
- Fixed `sklearn.linear_model.stochastic_gradient.py` L2 regularization
issue (minor practical significance).
By :user:`Norbert Crombach <norbert>` and `Mathieu Blondel`_.
- Added an interactive version of `Andreas Müller`_'s
`Machine Learning Cheat Sheet (for scikit-learn)
<https://peekaboo-vision.blogspot.de/2013/01/machine-learning-cheat-sheet-for-scikit.html>`_
to the documentation. See :ref:`Choosing the right estimator <ml_map>`.
By `Jaques Grobler`_.
- `grid_search.GridSearchCV` and
`cross_validation.cross_val_score` now support the use of advanced
scoring functions such as area under the ROC curve and f-beta scores.
See :ref:`scoring_parameter` for details. By `Andreas Müller`_
and `Lars Buitinck`_.
Passing a function from :mod:`sklearn.metrics` as ``score_func`` is
deprecated.
- Multi-label classification output is now supported by
:func:`metrics.accuracy_score`, :func:`metrics.zero_one_loss`,
:func:`metrics.f1_score`, :func:`metrics.fbeta_score`,
:func:`metrics.classification_report`,
:func:`metrics.precision_score` and :func:`metrics.recall_score`
by `Arnaud Joly`_.
- Two new metrics :func:`metrics.hamming_loss` and
`metrics.jaccard_similarity_score`
are added with multi-label support by `Arnaud Joly`_.
- Speed and memory usage improvements in
:class:`feature_extraction.text.CountVectorizer` and
:class:`feature_extraction.text.TfidfVectorizer`,
by Jochen Wersdörfer and Roman Sinayev.
- The ``min_df`` parameter in
:class:`feature_extraction.text.CountVectorizer` and
:class:`feature_extraction.text.TfidfVectorizer`, which used to be 2,
has been reset to 1 to avoid unpleasant surprises (empty vocabularies)
for novice users who try it out on tiny document collections.
A value of at least 2 is still recommended for practical use.
- :class:`svm.LinearSVC`, :class:`linear_model.SGDClassifier` and
:class:`linear_model.SGDRegressor` now have a ``sparsify`` method that
converts their ``coef_`` into a sparse matrix, meaning stored models
trained using these estimators can be made much more compact.
- :class:`linear_model.SGDClassifier` now produces multiclass probability
estimates when trained under log loss or modified Huber loss.
- Hyperlinks to documentation in example code on the website by
:user:`Martin Luessi <mluessi>`.
- Fixed bug in :class:`preprocessing.MinMaxScaler` causing incorrect scaling
of the features for non-default ``feature_range`` settings. By `Andreas
Müller`_.
- ``max_features`` in :class:`tree.DecisionTreeClassifier`,
:class:`tree.DecisionTreeRegressor` and all derived ensemble estimators
now supports percentage values. By `Gilles Louppe`_.
- Performance improvements in :class:`isotonic.IsotonicRegression` by
`Nelle Varoquaux`_.
- :func:`metrics.accuracy_score` has a ``normalize`` option to return
either the fraction or the number of correctly classified samples,
by `Arnaud Joly`_.
- Added :func:`metrics.log_loss` that computes log loss, aka cross-entropy
loss. By Jochen Wersdörfer and `Lars Buitinck`_.
- A bug that caused :class:`ensemble.AdaBoostClassifier` to output
incorrect probabilities has been fixed.
- Feature selectors now share a mixin providing consistent ``transform``,
``inverse_transform`` and ``get_support`` methods. By `Joel Nothman`_.
- A fitted `grid_search.GridSearchCV` or
`grid_search.RandomizedSearchCV` can now generally be pickled.
By `Joel Nothman`_.
- Refactored and vectorized implementation of :func:`metrics.roc_curve`
and :func:`metrics.precision_recall_curve`. By `Joel Nothman`_.
- The new estimator :class:`sklearn.decomposition.TruncatedSVD`
performs dimensionality reduction using SVD on sparse matrices,
and can be used for latent semantic analysis (LSA).
By `Lars Buitinck`_.
- Added self-contained example of out-of-core learning on text data
:ref:`sphx_glr_auto_examples_applications_plot_out_of_core_classification.py`.
By :user:`Eustache Diemert <oddskool>`.
- The default number of components for
`sklearn.decomposition.RandomizedPCA` is now correctly documented
to be ``n_features``. This was the default behavior, so programs using it
will continue to work as they did.
- :class:`sklearn.cluster.KMeans` now fits several orders of magnitude
faster on sparse data (the speedup depends on the sparsity). By
`Lars Buitinck`_.
- Reduce memory footprint of FastICA by `Denis Engemann`_ and
`Alexandre Gramfort`_.
- Verbose output in `sklearn.ensemble.gradient_boosting` now uses
a column format and prints progress in decreasing frequency.
It also shows the remaining time. By `Peter Prettenhofer`_.
- `sklearn.ensemble.gradient_boosting` provides out-of-bag improvement
`oob_improvement_`
rather than the OOB score for model selection. An example that shows
how to use OOB estimates to select the number of trees was added.
By `Peter Prettenhofer`_.
- Most metrics now support string labels for multiclass classification
by `Arnaud Joly`_ and `Lars Buitinck`_.
- New OrthogonalMatchingPursuitCV class by `Alexandre Gramfort`_
and `Vlad Niculae`_.
- Fixed a bug in `sklearn.covariance.GraphLassoCV`: the
'alphas' parameter now works as expected when given a list of
values. By Philippe Gervais.
- Fixed an important bug in `sklearn.covariance.GraphLassoCV`
that prevented the folds provided by a CV object from all being used (only
the first 3 were used). When providing a CV object, execution
time may thus increase significantly compared to the previous
version (but results are correct now). By Philippe Gervais.
- `cross_validation.cross_val_score` and the `grid_search`
module are now tested with multi-output data by `Arnaud Joly`_.
- :func:`datasets.make_multilabel_classification` can now return
the output in label indicator multilabel format by `Arnaud Joly`_.
- K-nearest neighbors, :class:`neighbors.KNeighborsRegressor`,
and radius neighbors, :class:`neighbors.RadiusNeighborsRegressor` and
:class:`neighbors.RadiusNeighborsClassifier`, support multioutput data
by `Arnaud Joly`_.
- Random state in LibSVM-based estimators (:class:`svm.SVC`, :class:`svm.NuSVC`,
:class:`svm.OneClassSVM`, :class:`svm.SVR`, :class:`svm.NuSVR`) can now be
controlled. This is useful to ensure consistency in the probability
estimates for the classifiers trained with ``probability=True``. By
`Vlad Niculae`_.
- Out-of-core learning support for discrete naive Bayes classifiers
:class:`sklearn.naive_bayes.MultinomialNB` and
:class:`sklearn.naive_bayes.BernoulliNB` by adding the ``partial_fit``
method by `Olivier Grisel`_.
- New website design and navigation by `Gilles Louppe`_, `Nelle Varoquaux`_,
Vincent Michel and `Andreas Müller`_.
- Improved documentation on :ref:`multi-class, multi-label and multi-output
classification <multiclass>` by `Yannick Schwartz`_ and `Arnaud Joly`_.
- Better input and error handling in the :mod:`sklearn.metrics` module by
`Arnaud Joly`_ and `Joel Nothman`_.
- Speed optimization of the `hmm` module by :user:`Mikhail Korobov <kmike>`.
- Significant speed improvements for :class:`sklearn.cluster.DBSCAN`
by `cleverless <https://github.com/cleverless>`_.
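
Several entries above concern classification metrics: the ``normalize`` option of :func:`metrics.accuracy_score`, :func:`metrics.hamming_loss` and :func:`metrics.log_loss`. As a rough illustration of what these quantities measure, here is a plain-Python sketch. The names ``accuracy_sketch``, ``hamming_loss_sketch`` and ``log_loss_sketch`` are hypothetical helpers written for this note, not scikit-learn's implementation, and the clipping constant ``eps`` is an assumption.

```python
import math


def accuracy_sketch(y_true, y_pred, normalize=True):
    """Fraction (normalize=True) or count (normalize=False) of exact matches.

    For multi-label input, pass one tuple of labels per sample; a sample
    only counts as correct when the whole tuple matches.
    """
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true) if normalize else correct


def hamming_loss_sketch(y_true, y_pred):
    """Fraction of individual labels that differ (label indicator rows)."""
    wrong = sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    total = sum(len(row) for row in y_true)
    return wrong / total


def log_loss_sketch(y_true, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class index."""
    total = 0.0
    for label, row in zip(y_true, proba):
        # Clip to avoid log(0); eps is an assumed constant for this sketch.
        p = min(max(row[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


frac = accuracy_sketch([0, 1, 2, 3], [0, 2, 1, 3])
count = accuracy_sketch([0, 1, 2, 3], [0, 2, 1, 3], normalize=False)
ham = hamming_loss_sketch([(1, 0, 1), (0, 1, 0)], [(1, 1, 1), (0, 1, 1)])
ll = log_loss_sketch([0, 1], [[0.5, 0.5], [0.5, 0.5]])
```

In the multi-label case the two measures differ in strictness: the accuracy sketch requires every label of a sample to match, while the Hamming loss counts each wrong label separately.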
API changes summary
-------------------
- `auc_score` was renamed to :func:`metrics.roc_auc_score`.
- Testing scikit-learn with ``sklearn.test()`` is deprecated. Use
``nosetests sklearn`` from the command line.
- Feature importances in :class:`tree.DecisionTreeClassifier`,
:class:`tree.DecisionTreeRegressor` and all derived ensemble estimators
are now computed on the fly when accessing the ``feature_importances_``
attribute. Setting ``compute_importances=True`` is no longer required.
By `Gilles Louppe`_.
- :class:`linear_model.lasso_path` and
:class:`linear_model.enet_path` can return their results in the same
format as that of :class:`linear_model.lars_path`. This is done by
setting the ``return_models`` parameter to ``False``. By
`Jaques Grobler`_ and `Alexandre Gramfort`_.
- `grid_search.IterGrid` was renamed to `grid_search.ParameterGrid`.
- Fixed bug in `KFold` causing imperfect class balance in some
cases. By `Alexandre Gramfort`_ and Tadej Janež.
- :class:`sklearn.neighbors.BallTree` has been refactored, and a
:class:`sklearn.neighbors.KDTree` has been
added which shares the same interface. The Ball Tree now works with
a wide variety of distance metrics. Both classes have many new
methods, including single-tree and dual-tree queries, breadth-first
and depth-first searching, and more advanced queries such as
kernel density estimation and 2-point correlation functions.
By `Jake Vanderplas`_.
- Support for scipy.spatial.cKDTree within neighbors queries has been
removed, and the functionality replaced with the new
:class:`sklearn.neighbors.KDTree` class.
- :class:`sklearn.neighbors.KernelDensity` has been added, which performs
efficient kernel density estimation with a variety of kernels.
- :class:`sklearn.decomposition.KernelPCA` now always returns output with
``n_components`` components, unless the new parameter ``remove_zero_eig``
is set to ``True``. This new behavior is consistent with the way
kernel PCA was always documented; previously, the removal of components
with zero eigenvalues was tacitly performed on all data.
- ``gcv_mode="auto"`` no longer tries to perform SVD on a densified
sparse matrix in :class:`sklearn.linear_model.RidgeCV`.
- Sparse matrix support in `sklearn.decomposition.RandomizedPCA`
is now deprecated in favor of the new ``TruncatedSVD``.
- `cross_validation.KFold` and
`cross_validation.StratifiedKFold` now enforce ``n_folds >= 2``;
otherwise a ``ValueError`` is raised. By `Olivier Grisel`_.
- :func:`datasets.load_files`'s ``charset`` and ``charset_errors``
parameters were renamed ``encoding`` and ``decode_errors``.
- Attribute ``oob_score_`` in :class:`sklearn.ensemble.GradientBoostingRegressor`
and :class:`sklearn.ensemble.GradientBoostingClassifier`
is deprecated and has been replaced by ``oob_improvement_``.
- Attributes in OrthogonalMatchingPursuit have been deprecated
(``copy_X``, ``Gram``, ...) and ``precompute_gram`` has been renamed to
``precompute`` for consistency. See #2224.
- :class:`sklearn.preprocessing.StandardScaler` now converts integer input
to float, and raises a warning. Previously it rounded for dense integer
input.
- :class:`sklearn.multiclass.OneVsRestClassifier` now has a
``decision_function`` method. This will return the distance of each
sample from the decision boundary for each class, as long as the
underlying estimators implement the ``decision_function`` method.
By `Kyle Kastner`_.
- Better input validation, warning on unexpected shapes for y.
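
The rename of `grid_search.IterGrid` to `grid_search.ParameterGrid` concerns the object that enumerates every combination of values in a parameter grid. A minimal plain-Python sketch of that idea follows. ``iter_param_grid`` is a hypothetical helper, the example grid values are made up, and the iteration order shown (keys sorted, last key varying fastest) is an assumption of this sketch rather than a statement about the library.

```python
from itertools import product


def iter_param_grid(param_grid):
    """Yield one dict per combination of the listed parameter values."""
    keys = sorted(param_grid)  # fixed key order keeps the output reproducible
    for values in product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))


# Illustrative grid; the parameter names and values are placeholders.
grid = {"C": [1, 10], "kernel": ["linear", "rbf"]}
combos = list(iter_param_grid(grid))
```

An exhaustive search fits one model per combination, so its cost grows with the product of the list lengths. This is the motivation for sampling-based alternatives such as `grid_search.RandomizedSearchCV`, listed in the changelog above.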
People
------
List of contributors for release 0.14 by number of commits.
* 277 Gilles Louppe
* 245 Lars Buitinck
* 187 Andreas Mueller
* 124 Arnaud Joly
* 112 Jaques Grobler
* 109 Gael Varoquaux
* 107 Olivier Grisel
* 102 Noel Dawe
* 99 Kemal Eren
* 79 Joel Nothman
* 75 Jake VanderPlas
* 73 Nelle Varoquaux
* 71 Vlad Niculae
* 65 Peter Prettenhofer
* 64 Alexandre Gramfort
* 54 Mathieu Blondel
* 38 Nicolas Trésegnie
* 35 eustache
* 27 Denis Engemann
* 25 Yann N. Dauphin
* 19 Justin Vincent
* 17 Robert Layton
* 15 Doug Coleman
* 14 Michael Eickenberg
* 13 Robert Marchman
* 11 Fabian Pedregosa
* 11 Philippe Gervais
* 10 Jim Holmström
* 10 Tadej Janež
* 10 syhw
* 9 Mikhail Korobov
* 9 Steven De Gryze
* 8 sergeyf
* 7 Ben Root
* 7 Hrishikesh Huilgolkar
* 6 Kyle Kastner
* 6 Martin Luessi
* 6 Rob Speer
* 5 Federico Vaggi
* 5 Raul Garreta
* 5 Rob Zinkov
* 4 Ken Geis
* 3 A. Flaxman
* 3 Denton Cockburn
* 3 Dougal Sutherland
* 3 Ian Ozsvald
* 3 Johannes Schönberger
* 3 Robert McGibbon
* 3 Roman Sinayev
* 3 Szabo Roland
* 2 Diego Molla
* 2 Imran Haque
* 2 Jochen Wersdörfer
* 2 Sergey Karayev
* 2 Yannick Schwartz
* 2 jamestwebber
* 1 Abhijeet Kolhe
* 1 Alexander Fabisch
* 1 Bastiaan van den Berg
* 1 Benjamin Peterson
* 1 Daniel Velkov
* 1 Fazlul Shahriar
* 1 Felix Brockherde
* 1 Félix-Antoine Fortin
* 1 Harikrishnan S
* 1 Jack Hale
* 1 JakeMick
* 1 James McDermott
* 1 John Benediktsson
* 1 John Zwinck
* 1 Joshua Vredevoogd
* 1 Justin Pati
* 1 Kevin Hughes
* 1 Kyle Kelley
* 1 Matthias Ekman
* 1 Miroslav Shubernetskiy
* 1 Naoki Orii
* 1 Norbert Crombach
* 1 Rafael Cunha de Almeida
* 1 Rolando Espinoza La fuente
* 1 Seamus Abshere
* 1 Sergey Feldman
* 1 Sergio Medina
* 1 Stefano Lattarini
* 1 Steve Koch
* 1 Sturla Molden
* 1 Thomas Jarosch
* 1 Yaroslav Halchenko