File: v0.13.rst

package info (click to toggle)
scikit-learn 0.23.2-5
  • links: PTS, VCS
  • area: main
  • in suites: bookworm, bullseye, sid
  • size: 21,892 kB
  • sloc: python: 132,020; cpp: 5,765; javascript: 2,201; ansic: 831; makefile: 213; sh: 44
file content (391 lines) | stat: -rw-r--r-- 14,368 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
.. include:: _contributors.rst

.. currentmodule:: sklearn

.. _changes_0_13_1:

Version 0.13.1
==============

**February 23, 2013**

The 0.13.1 release only fixes some bugs and does not add any new functionality.

Changelog
---------

- Fixed a testing error caused by the function :func:`cross_validation.train_test_split` being
  interpreted as a test by `Yaroslav Halchenko`_.

- Fixed a bug in the reassignment of small clusters in the :class:`cluster.MiniBatchKMeans`
  by `Gael Varoquaux`_.

- Fixed default value of ``gamma`` in :class:`decomposition.KernelPCA` by `Lars Buitinck`_.

- Updated joblib to ``0.7.0d`` by `Gael Varoquaux`_.

- Fixed scaling of the deviance in :class:`ensemble.GradientBoostingClassifier` by `Peter Prettenhofer`_.

- Better tie-breaking in :class:`multiclass.OneVsOneClassifier` by `Andreas Müller`_.

- Other small improvements to tests and documentation.

People
------
List of contributors for release 0.13.1 by number of commits.
 * 16  `Lars Buitinck`_
 * 12  `Andreas Müller`_
 *  8  `Gael Varoquaux`_
 *  5  Robert Marchman
 *  3  `Peter Prettenhofer`_
 *  2  Hrishikesh Huilgolkar
 *  1  Bastiaan van den Berg
 *  1  Diego Molla
 *  1  `Gilles Louppe`_
 *  1  `Mathieu Blondel`_
 *  1  `Nelle Varoquaux`_
 *  1  Rafael Cunha de Almeida
 *  1  Rolando Espinoza La fuente
 *  1  `Vlad Niculae`_
 *  1  `Yaroslav Halchenko`_


.. _changes_0_13:

Version 0.13
============

**January 21, 2013**

New Estimator Classes
---------------------

- :class:`dummy.DummyClassifier` and :class:`dummy.DummyRegressor`, two
  data-independent predictors by `Mathieu Blondel`_. Useful to sanity-check
  your estimators. See :ref:`dummy_estimators` in the user guide.
  Multioutput support added by `Arnaud Joly`_.

- :class:`decomposition.FactorAnalysis`, a transformer implementing the
  classical factor analysis, by `Christian Osendorfer`_ and `Alexandre
  Gramfort`_. See :ref:`FA` in the user guide.

- :class:`feature_extraction.FeatureHasher`, a transformer implementing the
  "hashing trick" for fast, low-memory feature extraction from string fields
  by `Lars Buitinck`_ and :class:`feature_extraction.text.HashingVectorizer`
  for text documents by `Olivier Grisel`_  See :ref:`feature_hashing` and
  :ref:`hashing_vectorizer` for the documentation and sample usage.

- :class:`pipeline.FeatureUnion`, a transformer that concatenates
  results of several other transformers by `Andreas Müller`_. See
  :ref:`feature_union` in the user guide.

- :class:`random_projection.GaussianRandomProjection`,
  :class:`random_projection.SparseRandomProjection` and the function
  :func:`random_projection.johnson_lindenstrauss_min_dim`. The first two are
  transformers implementing Gaussian and sparse random projection matrix
  by `Olivier Grisel`_ and `Arnaud Joly`_.
  See :ref:`random_projection` in the user guide.

- :class:`kernel_approximation.Nystroem`, a transformer for approximating
  arbitrary kernels by `Andreas Müller`_. See
  :ref:`nystroem_kernel_approx` in the user guide.

- :class:`preprocessing.OneHotEncoder`, a transformer that computes binary
  encodings of categorical features by `Andreas Müller`_. See
  :ref:`preprocessing_categorical_features` in the user guide.

- :class:`linear_model.PassiveAggressiveClassifier` and
  :class:`linear_model.PassiveAggressiveRegressor`, predictors implementing
  an efficient stochastic optimization for linear models by `Rob Zinkov`_ and
  `Mathieu Blondel`_. See :ref:`passive_aggressive` in the user
  guide.

- :class:`ensemble.RandomTreesEmbedding`, a transformer for creating high-dimensional
  sparse representations using ensembles of totally random trees by  `Andreas Müller`_.
  See :ref:`random_trees_embedding` in the user guide.

- :class:`manifold.SpectralEmbedding` and function
  :func:`manifold.spectral_embedding`, implementing the "laplacian
  eigenmaps" transformation for non-linear dimensionality reduction by Wei
  Li. See :ref:`spectral_embedding` in the user guide.

- :class:`isotonic.IsotonicRegression` by `Fabian Pedregosa`_, `Alexandre Gramfort`_
  and `Nelle Varoquaux`_,


Changelog
---------

- :func:`metrics.zero_one_loss` (formerly ``metrics.zero_one``) now has
  option for normalized output that reports the fraction of
  misclassifications, rather than the raw number of misclassifications. By
  Kyle Beauchamp.

- :class:`tree.DecisionTreeClassifier` and all derived ensemble models now
  support sample weighting, by `Noel Dawe`_  and `Gilles Louppe`_.

- Speedup improvement when using bootstrap samples in forests of randomized
  trees, by `Peter Prettenhofer`_  and `Gilles Louppe`_.

- Partial dependence plots for :ref:`gradient_boosting` in
  :func:`ensemble.partial_dependence.partial_dependence` by `Peter
  Prettenhofer`_. See :ref:`sphx_glr_auto_examples_inspection_plot_partial_dependence.py` for an
  example.

- The table of contents on the website has now been made expandable by
  `Jaques Grobler`_.

- :class:`feature_selection.SelectPercentile` now breaks ties
  deterministically instead of returning all equally ranked features.

- :class:`feature_selection.SelectKBest` and
  :class:`feature_selection.SelectPercentile` are more numerically stable
  since they use scores, rather than p-values, to rank results. This means
  that they might sometimes select different features than they did
  previously.

- Ridge regression and ridge classification fitting with ``sparse_cg`` solver
  no longer has quadratic memory complexity, by `Lars Buitinck`_ and
  `Fabian Pedregosa`_.

- Ridge regression and ridge classification now support a new fast solver
  called ``lsqr``, by `Mathieu Blondel`_.

- Speed up of :func:`metrics.precision_recall_curve` by Conrad Lee.

- Added support for reading/writing svmlight files with pairwise
  preference attribute (qid in svmlight file format) in
  :func:`datasets.dump_svmlight_file` and
  :func:`datasets.load_svmlight_file` by `Fabian Pedregosa`_.

- Faster and more robust :func:`metrics.confusion_matrix` and
  :ref:`clustering_evaluation` by Wei Li.

- :func:`cross_validation.cross_val_score` now works with precomputed kernels
  and affinity matrices, by `Andreas Müller`_.

- LARS algorithm made more numerically stable with heuristics to drop
  regressors too correlated as well as to stop the path when
  numerical noise becomes predominant, by `Gael Varoquaux`_.

- Faster implementation of :func:`metrics.precision_recall_curve` by
  Conrad Lee.

- New kernel :class:`metrics.chi2_kernel` by `Andreas Müller`_, often used
  in computer vision applications.

- Fix of longstanding bug in :class:`naive_bayes.BernoulliNB` fixed by
  Shaun Jackman.

- Implemented ``predict_proba`` in :class:`multiclass.OneVsRestClassifier`,
  by Andrew Winterman.

- Improve consistency in gradient boosting: estimators
  :class:`ensemble.GradientBoostingRegressor` and
  :class:`ensemble.GradientBoostingClassifier` use the estimator
  :class:`tree.DecisionTreeRegressor` instead of the
  :class:`tree._tree.Tree` data structure by `Arnaud Joly`_.

- Fixed a floating point exception in the :ref:`decision trees <tree>`
  module, by Seberg.

- Fix :func:`metrics.roc_curve` fails when y_true has only one class
  by Wei Li.

- Add the :func:`metrics.mean_absolute_error` function which computes the
  mean absolute error. The :func:`metrics.mean_squared_error`,
  :func:`metrics.mean_absolute_error` and
  :func:`metrics.r2_score` metrics support multioutput by `Arnaud Joly`_.

- Fixed ``class_weight`` support in :class:`svm.LinearSVC` and
  :class:`linear_model.LogisticRegression` by `Andreas Müller`_. The meaning
  of ``class_weight`` was reversed as erroneously higher weight meant less
  positives of a given class in earlier releases.

- Improve narrative documentation and consistency in
  :mod:`sklearn.metrics` for regression and classification metrics
  by `Arnaud Joly`_.

- Fixed a bug in :class:`sklearn.svm.SVC` when using csr-matrices with
  unsorted indices by Xinfan Meng and `Andreas Müller`_.

- :class:`MiniBatchKMeans`: Add random reassignment of cluster centers
  with little observations attached to them, by `Gael Varoquaux`_.


API changes summary
-------------------
- Renamed all occurrences of ``n_atoms`` to ``n_components`` for consistency.
  This applies to :class:`decomposition.DictionaryLearning`,
  :class:`decomposition.MiniBatchDictionaryLearning`,
  :func:`decomposition.dict_learning`, :func:`decomposition.dict_learning_online`.

- Renamed all occurrences of ``max_iters`` to ``max_iter`` for consistency.
  This applies to :class:`semi_supervised.LabelPropagation` and
  :class:`semi_supervised.label_propagation.LabelSpreading`.

- Renamed all occurrences of ``learn_rate`` to ``learning_rate`` for
  consistency in :class:`ensemble.BaseGradientBoosting` and
  :class:`ensemble.GradientBoostingRegressor`.

- The module ``sklearn.linear_model.sparse`` is gone. Sparse matrix support
  was already integrated into the "regular" linear models.

- :func:`sklearn.metrics.mean_square_error`, which incorrectly returned the
  accumulated error, was removed. Use ``mean_squared_error`` instead.

- Passing ``class_weight`` parameters to ``fit`` methods is no longer
  supported. Pass them to estimator constructors instead.

- GMMs no longer have ``decode`` and ``rvs`` methods. Use the ``score``,
  ``predict`` or ``sample`` methods instead.

- The ``solver`` fit option in Ridge regression and classification is now
  deprecated and will be removed in v0.14. Use the constructor option
  instead.

- :class:`feature_extraction.text.DictVectorizer` now returns sparse
  matrices in the CSR format, instead of COO.

- Renamed ``k`` in :class:`cross_validation.KFold` and
  :class:`cross_validation.StratifiedKFold` to ``n_folds``, renamed
  ``n_bootstraps`` to ``n_iter`` in ``cross_validation.Bootstrap``.

- Renamed all occurrences of ``n_iterations`` to ``n_iter`` for consistency.
  This applies to :class:`cross_validation.ShuffleSplit`,
  :class:`cross_validation.StratifiedShuffleSplit`,
  :func:`utils.randomized_range_finder` and :func:`utils.randomized_svd`.

- Replaced ``rho`` in :class:`linear_model.ElasticNet` and
  :class:`linear_model.SGDClassifier` by ``l1_ratio``. The ``rho`` parameter
  had different meanings; ``l1_ratio`` was introduced to avoid confusion.
  It has the same meaning as previously ``rho`` in
  :class:`linear_model.ElasticNet` and ``(1-rho)`` in
  :class:`linear_model.SGDClassifier`.

- :class:`linear_model.LassoLars` and :class:`linear_model.Lars` now
  store a list of paths in the case of multiple targets, rather than
  an array of paths.

- The attribute ``gmm`` of :class:`hmm.GMMHMM` was renamed to ``gmm_``
  to adhere more strictly with the API.

- :func:`cluster.spectral_embedding` was moved to
  :func:`manifold.spectral_embedding`.

- Renamed ``eig_tol`` in :func:`manifold.spectral_embedding`,
  :class:`cluster.SpectralClustering` to ``eigen_tol``, renamed ``mode``
  to ``eigen_solver``.

- Renamed ``mode`` in :func:`manifold.spectral_embedding` and
  :class:`cluster.SpectralClustering` to ``eigen_solver``.

- ``classes_`` and ``n_classes_`` attributes of
  :class:`tree.DecisionTreeClassifier` and all derived ensemble models are
  now flat in case of single output problems and nested in case of
  multi-output problems.

- The ``estimators_`` attribute of
  :class:`ensemble.gradient_boosting.GradientBoostingRegressor` and
  :class:`ensemble.gradient_boosting.GradientBoostingClassifier` is now an
  array of :class:'tree.DecisionTreeRegressor'.

- Renamed ``chunk_size`` to ``batch_size`` in
  :class:`decomposition.MiniBatchDictionaryLearning` and
  :class:`decomposition.MiniBatchSparsePCA` for consistency.

- :class:`svm.SVC` and :class:`svm.NuSVC` now provide a ``classes_``
  attribute and support arbitrary dtypes for labels ``y``.
  Also, the dtype returned by ``predict`` now reflects the dtype of
  ``y`` during ``fit`` (used to be ``np.float``).

- Changed default test_size in :func:`cross_validation.train_test_split`
  to None, added possibility to infer ``test_size`` from ``train_size`` in
  :class:`cross_validation.ShuffleSplit` and
  :class:`cross_validation.StratifiedShuffleSplit`.

- Renamed function :func:`sklearn.metrics.zero_one` to
  :func:`sklearn.metrics.zero_one_loss`. Be aware that the default behavior
  in :func:`sklearn.metrics.zero_one_loss` is different from
  :func:`sklearn.metrics.zero_one`: ``normalize=False`` is changed to
  ``normalize=True``.

- Renamed function :func:`metrics.zero_one_score` to
  :func:`metrics.accuracy_score`.

- :func:`datasets.make_circles` now has the same number of inner and outer points.

- In the Naive Bayes classifiers, the ``class_prior`` parameter was moved
  from ``fit`` to ``__init__``.

People
------
List of contributors for release 0.13 by number of commits.

 * 364  `Andreas Müller`_
 * 143  `Arnaud Joly`_
 * 137  `Peter Prettenhofer`_
 * 131  `Gael Varoquaux`_
 * 117  `Mathieu Blondel`_
 * 108  `Lars Buitinck`_
 * 106  Wei Li
 * 101  `Olivier Grisel`_
 *  65  `Vlad Niculae`_
 *  54  `Gilles Louppe`_
 *  40  `Jaques Grobler`_
 *  38  `Alexandre Gramfort`_
 *  30  `Rob Zinkov`_
 *  19  Aymeric Masurelle
 *  18  Andrew Winterman
 *  17  `Fabian Pedregosa`_
 *  17  Nelle Varoquaux
 *  16  `Christian Osendorfer`_
 *  14  `Daniel Nouri`_
 *  13  :user:`Virgile Fritsch <VirgileFritsch>`
 *  13  syhw
 *  12  `Satrajit Ghosh`_
 *  10  Corey Lynch
 *  10  Kyle Beauchamp
 *   9  Brian Cheung
 *   9  Immanuel Bayer
 *   9  mr.Shu
 *   8  Conrad Lee
 *   8  `James Bergstra`_
 *   7  Tadej Janež
 *   6  Brian Cajes
 *   6  `Jake Vanderplas`_
 *   6  Michael
 *   6  Noel Dawe
 *   6  Tiago Nunes
 *   6  cow
 *   5  Anze
 *   5  Shiqiao Du
 *   4  Christian Jauvin
 *   4  Jacques Kvam
 *   4  Richard T. Guy
 *   4  `Robert Layton`_
 *   3  Alexandre Abraham
 *   3  Doug Coleman
 *   3  Scott Dickerson
 *   2  ApproximateIdentity
 *   2  John Benediktsson
 *   2  Mark Veronda
 *   2  Matti Lyra
 *   2  Mikhail Korobov
 *   2  Xinfan Meng
 *   1  Alejandro Weinstein
 *   1  `Alexandre Passos`_
 *   1  Christoph Deil
 *   1  Eugene Nizhibitsky
 *   1  Kenneth C. Arnold
 *   1  Luis Pedro Coelho
 *   1  Miroslav Batchkarov
 *   1  Pavel
 *   1  Sebastian Berg
 *   1  Shaun Jackman
 *   1  Subhodeep Moitra
 *   1  bob
 *   1  dengemann
 *   1  emanuele
 *   1  x006