File: v0.23.rst

package info (click to toggle)
scikit-learn 0.23.2-5
  • links: PTS, VCS
  • area: main
  • in suites: bookworm, bullseye, sid
  • size: 21,892 kB
  • sloc: python: 132,020; cpp: 5,765; javascript: 2,201; ansic: 831; makefile: 213; sh: 44
file content (854 lines) | stat: -rw-r--r-- 38,380 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
.. include:: _contributors.rst

.. currentmodule:: sklearn

.. _changes_0_23_2:

Version 0.23.2
==============

**August 3 2020**

Changed models
--------------

The following estimators and functions, when fit with the same data and
parameters, may produce different models from the previous version. This often
occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.

- |Fix| ``inertia_`` attribute of :class:`cluster.KMeans` and
  :class:`cluster.MiniBatchKMeans`.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we
cannot assure that this list is complete.)

Changelog
---------

:mod:`sklearn.cluster`
......................

- |Fix| Fixed a bug in :class:`cluster.KMeans` where rounding errors could
  prevent convergence to be declared when `tol=0`. :pr:`17959` by
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Fix| Fixed a bug in :class:`cluster.KMeans` and
  :class:`cluster.MiniBatchKMeans` where the reported inertia was incorrectly
  weighted by the sample weights. :pr:`17848` by
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Fix| Fixed a bug in :class:`cluster.MeanShift` with `bin_seeding=True`. When
  the estimated bandwidth is 0, the behavior is equivalent to
  `bin_seeding=False`.
  :pr:`17742` by :user:`Jeremie du Boisberranger <jeremiedbb>`.

- |Fix| Fixed a bug in :class:`cluster.AffinityPropagation`, that
  gives incorrect clusters when the array dtype is float32.
  :pr:`17995` by :user:`Thomaz Santana  <Wikilicious>` and
  :user:`Amanda Dsouza <amy12xx>`.

:mod:`sklearn.decomposition`
............................

- |Fix| Fixed a bug in
  :func:`decomposition.MiniBatchDictionaryLearning.partial_fit` which should
  update the dictionary by iterating only once over a mini-batch.
  :pr:`17433` by :user:`Chiara Marmo <cmarmo>`.

- |Fix| Avoid overflows on Windows in
  :func:`decomposition.IncrementalPCA.partial_fit` for large ``batch_size`` and
  ``n_samples`` values.
  :pr:`17985` by :user:`Alan Butler <aldee153>` and
  :user:`Amanda Dsouza <amy12xx>`.

:mod:`sklearn.ensemble`
.......................

- |Fix| Fixed bug in :class:`ensemble.MultinomialDeviance` where the
  average of logloss was incorrectly calculated as sum of logloss.
  :pr:`17694` by :user:`Markus Rempfler <rempfler>` and
  :user:`Tsutomu Kusanagi <t-kusanagi2>`.

- |Fix| Fixes :class:`ensemble.StackingClassifier` and
  :class:`ensemble.StackingRegressor` compatibility with estimators that
  do not define `n_features_in_`. :pr:`17357` by `Thomas Fan`_.

:mod:`sklearn.feature_extraction`
.................................

- |Fix| Fixes bug in :class:`feature_extraction.text.CountVectorizer` where
  sample order invariance was broken when `max_features` was set and features
  had the same count. :pr:`18016` by `Thomas Fan`_, `Roman Yurchak`_, and
  `Joel Nothman`_.

:mod:`sklearn.linear_model`
...........................

- |Fix| :func:`linear_model.lars_path` does not overwrite `X` when
  `X_copy=True` and `Gram='auto'`. :pr:`17914` by `Thomas Fan`_.

:mod:`sklearn.manifold`
.......................

- |Fix| Fixed a bug where :func:`metrics.pairwise_distances` would raise an
  error if ``metric='seuclidean'`` and ``X`` is not type ``np.float64``.
  :pr:`15730` by :user:`Forrest Koch <ForrestCKoch>`.

:mod:`sklearn.metrics`
......................

- |Fix| Fixed a bug in :func:`metrics.mean_squared_error` where the
  average of multiple RMSE values was incorrectly calculated as the root of the
  average of multiple MSE values.
  :pr:`17309` by :user:`Swier Heeres <swierh>`.

:mod:`sklearn.pipeline`
.......................

- |Fix| :class:`pipeline.FeatureUnion` raises a deprecation warning when
  `None` is included in `transformer_list`. :pr:`17360` by `Thomas Fan`_.

:mod:`sklearn.utils`
....................

- |Fix| Fix :func:`utils.estimator_checks.check_estimator` so that all test
  cases support the `binary_only` estimator tag.
  :pr:`17812` by :user:`Bruno Charron <brcharron>`.

.. _changes_0_23_1:

Version 0.23.1
==============

**May 18 2020**

Changelog
---------

:mod:`sklearn.cluster`
......................

- |Efficiency| :class:`cluster.KMeans` efficiency has been improved for very
  small datasets. In particular it cannot spawn idle threads any more.
  :pr:`17210` and :pr:`17235` by :user:`Jeremie du Boisberranger <jeremiedbb>`.

- |Fix| Fixed a bug in :class:`cluster.KMeans` where the sample weights
  provided by the user were modified in place. :pr:`17204` by
  :user:`Jeremie du Boisberranger <jeremiedbb>`.


Miscellaneous
.............

- |Fix| Fixed a bug in the `repr` of third-party estimators that use a
  `**kwargs` parameter in their constructor, when `changed_only` is True
  which is now the default. :pr:`17205` by `Nicolas Hug`_.

.. _changes_0_23:

Version 0.23.0
==============

**May 12 2020**

For a short description of the main highlights of the release, please
refer to
:ref:`sphx_glr_auto_examples_release_highlights_plot_release_highlights_0_23_0.py`.


.. include:: changelog_legend.inc

Enforcing keyword-only arguments
--------------------------------

In an effort to promote clear and non-ambiguous use of the library, most
constructor and function parameters are now expected to be passed as keyword
arguments (i.e. using the `param=value` syntax) instead of positional. To
ease the transition, a `FutureWarning` is raised if a keyword-only parameter
is used as positional. In version 0.25, these parameters will be strictly
keyword-only, and a `TypeError` will be raised.
:issue:`15005` by `Joel Nothman`_, `Adrin Jalali`_, `Thomas Fan`_, and
`Nicolas Hug`_. See `SLEP009
<https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep009/proposal.html>`_
for more details.

Changed models
--------------

The following estimators and functions, when fit with the same data and
parameters, may produce different models from the previous version. This often
occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.

- |Fix| :class:`ensemble.BaggingClassifier`, :class:`ensemble.BaggingRegressor`,
  and :class:`ensemble.IsolationForest`.
- |Fix| :class:`cluster.KMeans` with ``algorithm="elkan"`` and
  ``algorithm="full"``.
- |Fix| :class:`cluster.Birch`
- |Fix| :func:`compose.ColumnTransformer.get_feature_names`
- |Fix| :func:`compose.ColumnTransformer.fit`
- |Fix| :func:`datasets.make_multilabel_classification`
- |Fix| :class:`decomposition.PCA` with `n_components='mle'`
- |Enhancement| :class:`decomposition.NMF` and
  :func:`decomposition.non_negative_factorization` with float32 dtype input.
- |Fix| :func:`decomposition.KernelPCA.inverse_transform`
- |API| :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegrerssor`
- |Fix| ``estimator_samples_`` in :class:`ensemble.BaggingClassifier`,
  :class:`ensemble.BaggingRegressor` and :class:`ensemble.IsolationForest`
- |Fix| :class:`ensemble.StackingClassifier` and
  :class:`ensemble.StackingRegressor` with `sample_weight`
- |Fix| :class:`gaussian_process.GaussianProcessRegressor`
- |Fix| :class:`linear_model.RANSACRegressor` with ``sample_weight``.
- |Fix| :class:`linear_model.RidgeClassifierCV`
- |Fix| :func:`metrics.mean_squared_error` with `squared` and
  `multioutput='raw_values'`.
- |Fix| :func:`metrics.mutual_info_score` with negative scores.
- |Fix| :func:`metrics.confusion_matrix` with zero length `y_true` and `y_pred`
- |Fix| :class:`neural_network.MLPClassifier`
- |Fix| :class:`preprocessing.StandardScaler` with `partial_fit` and sparse
  input.
- |Fix| :class:`preprocessing.Normalizer` with norm='max'
- |Fix| Any model using the :func:`svm.libsvm` or the :func:`svm.liblinear` solver,
  including :class:`svm.LinearSVC`, :class:`svm.LinearSVR`,
  :class:`svm.NuSVC`, :class:`svm.NuSVR`, :class:`svm.OneClassSVM`,
  :class:`svm.SVC`, :class:`svm.SVR`, :class:`linear_model.LogisticRegression`.
- |Fix| :class:`tree.DecisionTreeClassifier`, :class:`tree.ExtraTreeClassifier` and
  :class:`ensemble.GradientBoostingClassifier` as well as ``predict`` method of
  :class:`tree.DecisionTreeRegressor`, :class:`tree.ExtraTreeRegressor`, and
  :class:`ensemble.GradientBoostingRegressor` and read-only float32 input in
  ``predict``, ``decision_path`` and ``predict_proba``.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we
cannot assure that this list is complete.)

Changelog
---------

..
    Entries should be grouped by module (in alphabetic order) and prefixed with
    one of the labels: |MajorFeature|, |Feature|, |Efficiency|, |Enhancement|,
    |Fix| or |API| (see whats_new.rst for descriptions).
    Entries should be ordered by those labels (e.g. |Fix| after |Efficiency|).
    Changes not specific to a module should be listed under *Multiple Modules*
    or *Miscellaneous*.
    Entries should end with:
    :pr:`123456` by :user:`Joe Bloggs <joeongithub>`.
    where 123456 is the *pull request* number, not the issue number.

:mod:`sklearn.cluster`
......................

- |Efficiency| :class:`cluster.Birch` implementation of the predict method
  avoids high memory footprint by calculating the distances matrix using
  a chunked scheme.
  :pr:`16149` by :user:`Jeremie du Boisberranger <jeremiedbb>` and
  :user:`Alex Shacked <alexshacked>`.

- |Efficiency| |MajorFeature| The critical parts of :class:`cluster.KMeans`
  have a more optimized implementation. Parallelism is now over the data
  instead of over initializations allowing better scalability. :pr:`11950` by
  :user:`Jeremie du Boisberranger <jeremiedbb>`.

- |Enhancement| :class:`cluster.KMeans` now supports sparse data when
  `solver = "elkan"`. :pr:`11950` by
  :user:`Jeremie du Boisberranger <jeremiedbb>`.

- |Enhancement| :class:`cluster.AgglomerativeClustering` has a faster and more
  memory efficient implementation of single linkage clustering.
  :pr:`11514` by :user:`Leland McInnes <lmcinnes>`.

- |Fix| :class:`cluster.KMeans` with ``algorithm="elkan"`` now converges with
  ``tol=0`` as with the default ``algorithm="full"``. :pr:`16075` by
  :user:`Erich Schubert <kno10>`.

- |Fix| Fixed a bug in :class:`cluster.Birch` where the `n_clusters` parameter
  could not have a `np.int64` type. :pr:`16484`
  by :user:`Jeremie du Boisberranger <jeremiedbb>`.

- |Fix| :class:`cluster.AgglomerativeCluClustering` add specific error when
  distance matrix is not square and `affinity=precomputed`.
  :pr:`16257` by :user:`Simona Maggio <simonamaggio>`.

- |API| The ``n_jobs`` parameter of :class:`cluster.KMeans`,
  :class:`cluster.SpectralCoclustering` and
  :class:`cluster.SpectralBiclustering` is deprecated. They now use OpenMP
  based parallelism. For more details on how to control the number of threads,
  please refer to our :ref:`parallelism` notes. :pr:`11950` by
  :user:`Jeremie du Boisberranger <jeremiedbb>`.

- |API| The ``precompute_distances`` parameter of :class:`cluster.KMeans` is
  deprecated. It has no effect. :pr:`11950` by
  :user:`Jeremie du Boisberranger <jeremiedbb>`.

- |API| The ``random_state`` parameter has been added to
  :class:`cluster.AffinityPropagation`. :pr:`16801` by :user:`rcwoolston`
  and :user:`Chiara Marmo <cmarmo>`.

:mod:`sklearn.compose`
......................

- |Efficiency| :class:`compose.ColumnTransformer` is now faster when working
  with dataframes and strings are used to specific subsets of data for
  transformers. :pr:`16431` by `Thomas Fan`_.

- |Enhancement| :class:`compose.ColumnTransformer` method ``get_feature_names``
  now supports `'passthrough'` columns, with the feature name being either
  the column name for a dataframe, or `'xi'` for column index `i`.
  :pr:`14048` by :user:`Lewis Ball <lrjball>`.

- |Fix| :class:`compose.ColumnTransformer` method ``get_feature_names`` now
  returns correct results when one of the transformer steps applies on an
  empty list of columns :pr:`15963` by `Roman Yurchak`_.

- |Fix| :func:`compose.ColumnTransformer.fit` will error when selecting
  a column name that is not unique in the dataframe. :pr:`16431` by
  `Thomas Fan`_.

:mod:`sklearn.datasets`
.......................

- |Efficiency| :func:`datasets.fetch_openml` has reduced memory usage because
  it no longer stores the full dataset text stream in memory. :pr:`16084` by
  `Joel Nothman`_.

- |Feature| :func:`datasets.fetch_california_housing` now supports
  heterogeneous data using pandas by setting `as_frame=True`. :pr:`15950`
  by :user:`Stephanie Andrews <gitsteph>` and
  :user:`Reshama Shaikh <reshamas>`.

- |Feature| embedded dataset loaders :func:`load_breast_cancer`,
  :func:`load_diabetes`, :func:`load_digits`, :func:`load_iris`,
  :func:`load_linnerud` and :func:`load_wine` now support loading as a pandas
  ``DataFrame`` by setting `as_frame=True`. :pr:`15980` by :user:`wconnell` and
  :user:`Reshama Shaikh <reshamas>`.

- |Enhancement| Added ``return_centers`` parameter  in
  :func:`datasets.make_blobs`, which can be used to return
  centers for each cluster.
  :pr:`15709` by :user:`shivamgargsya` and
  :user:`Venkatachalam N <venkyyuvy>`.

- |Enhancement| Functions :func:`datasets.make_circles` and
  :func:`datasets.make_moons` now accept two-element tuple.
  :pr:`15707` by :user:`Maciej J Mikulski <mjmikulski>`.

- |Fix| :func:`datasets.make_multilabel_classification` now generates
  `ValueError` for arguments `n_classes < 1` OR `length < 1`.
  :pr:`16006` by :user:`Rushabh Vasani <rushabh-v>`.

- |API| The `StreamHandler` was removed from `sklearn.logger` to avoid
  double logging of messages in common cases where a hander is attached
  to the root logger, and to follow the Python logging documentation
  recommendation for libraries to leave the log message handling to
  users and application code. :pr:`16451` by :user:`Christoph Deil <cdeil>`.

:mod:`sklearn.decomposition`
............................

- |Enhancement| :class:`decomposition.NMF` and
  :func:`decomposition.non_negative_factorization` now preserves float32 dtype.
  :pr:`16280` by :user:`Jeremie du Boisberranger <jeremiedbb>`.

- |Enhancement| :func:`TruncatedSVD.transform` is now faster on given sparse
  ``csc`` matrices. :pr:`16837` by :user:`wornbb`.

- |Fix| :class:`decomposition.PCA` with a float `n_components` parameter, will
  exclusively choose the components that explain the variance greater than
  `n_components`. :pr:`15669` by :user:`Krishna Chaitanya <krishnachaitanya9>`

- |Fix| :class:`decomposition.PCA` with `n_components='mle'` now correctly
  handles small eigenvalues, and does not infer 0 as the correct number of
  components. :pr:`16224` by :user:`Lisa Schwetlick <lschwetlick>`, and
  :user:`Gelavizh Ahmadi <gelavizh1>` and :user:`Marija Vlajic Wheeler
  <marijavlajic>` and :pr:`16841` by `Nicolas Hug`_.

- |Fix| :class:`decomposition.KernelPCA` method ``inverse_transform`` now
  applies the correct inverse transform to the transformed data. :pr:`16655`
  by :user:`Lewis Ball <lrjball>`.

- |Fix| Fixed bug that was causing :class:`decomposition.KernelPCA` to sometimes
  raise `invalid value encountered in multiply` during `fit`.
  :pr:`16718` by :user:`Gui Miotto <gui-miotto>`.

- |Feature| Added `n_components_` attribute to :class:`decomposition.SparsePCA`
  and :class:`decomposition.MiniBatchSparsePCA`. :pr:`16981` by
  :user:`Mateusz Górski <Reksbril>`.

:mod:`sklearn.ensemble`
.......................

- |MajorFeature|  :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor` now support
  :term:`sample_weight`. :pr:`14696` by `Adrin Jalali`_ and `Nicolas Hug`_.

- |Feature| Early stopping in
  :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor` is now determined with a
  new `early_stopping` parameter instead of `n_iter_no_change`. Default value
  is 'auto', which enables early stopping if there are at least 10,000
  samples in the training set. :pr:`14516` by :user:`Johann Faouzi
  <johannfaouzi>`.

- |MajorFeature| :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor` now support monotonic
  constraints, useful when features are supposed to have a positive/negative
  effect on the target. :pr:`15582` by `Nicolas Hug`_.

- |API| Added boolean `verbose` flag to classes:
  :class:`ensemble.VotingClassifier` and :class:`ensemble.VotingRegressor`.
  :pr:`16069` by :user:`Sam Bail <spbail>`,
  :user:`Hanna Bruce MacDonald <hannahbrucemacdonald>`,
  :user:`Reshama Shaikh <reshamas>`, and
  :user:`Chiara Marmo <cmarmo>`.

- |API| Fixed a bug in :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegrerssor` that would not respect the
  `max_leaf_nodes` parameter if the criteria was reached at the same time as
  the `max_depth` criteria. :pr:`16183` by `Nicolas Hug`_.

- |Fix|  Changed the convention for `max_depth` parameter of
  :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor`. The depth now corresponds to
  the number of edges to go from the root to the deepest leaf.
  Stumps (trees with one split) are now allowed.
  :pr:`16182` by :user:`Santhosh B <santhoshbala18>`

- |Fix| Fixed a bug in :class:`ensemble.BaggingClassifier`,
  :class:`ensemble.BaggingRegressor` and :class:`ensemble.IsolationForest`
  where the attribute `estimators_samples_` did not generate the proper indices
  used during `fit`.
  :pr:`16437` by :user:`Jin-Hwan CHO <chofchof>`.

- |Fix| Fixed a bug in :class:`ensemble.StackingClassifier` and
  :class:`ensemble.StackingRegressor` where the `sample_weight`
  argument was not being passed to `cross_val_predict` when
  evaluating the base estimators on cross-validation folds
  to obtain the input to the meta estimator.
  :pr:`16539` by :user:`Bill DeRose <wderose>`.

- |Feature| Added additional option `loss="poisson"` to
  :class:`ensemble.HistGradientBoostingRegressor`, which adds Poisson deviance
  with log-link useful for modeling count data.
  :pr:`16692` by :user:`Christian Lorentzen <lorentzenchr>`

- |Fix| Fixed a bug where :class:`ensemble.HistGradientBoostingRegressor` and
  :class:`ensemble.HistGradientBoostingClassifier` would fail with multiple
  calls to fit when `warm_start=True`, `early_stopping=True`, and there is no
  validation set. :pr:`16663` by `Thomas Fan`_.

:mod:`sklearn.feature_extraction`
.................................

- |Efficiency| :class:`feature_extraction.text.CountVectorizer` now sorts
  features after pruning them by document frequency. This improves performances
  for datasets with large vocabularies combined with ``min_df`` or ``max_df``.
  :pr:`15834` by :user:`Santiago M. Mola <smola>`.

:mod:`sklearn.feature_selection`
................................

- |Enhancement| Added support for multioutput data in
  :class:`feature_selection.RFE` and :class:`feature_selection.RFECV`.
  :pr:`16103` by :user:`Divyaprabha M <divyaprabha123>`.

- |API| Adds :class:`feature_selection.SelectorMixin` back to public API.
  :pr:`16132` by :user:`trimeta`.

:mod:`sklearn.gaussian_process`
...............................

- |Enhancement| :func:`gaussian_process.kernels.Matern` returns the RBF kernel when ``nu=np.inf``.
  :pr:`15503` by :user:`Sam Dixon <sam-dixon>`.

- |Fix| Fixed bug in :class:`gaussian_process.GaussianProcessRegressor` that
  caused predicted standard deviations to only be between 0 and 1 when
  WhiteKernel is not used. :pr:`15782`
  by :user:`plgreenLIRU`.

:mod:`sklearn.impute`
.....................

- |Enhancement| :class:`impute.IterativeImputer` accepts both scalar and array-like inputs for
  ``max_value`` and ``min_value``. Array-like inputs allow a different max and min to be specified
  for each feature. :pr:`16403` by :user:`Narendra Mukherjee <narendramukherjee>`.

- |Enhancement| :class:`impute.SimpleImputer`, :class:`impute.KNNImputer`, and
  :class:`impute.IterativeImputer` accepts pandas' nullable integer dtype with
  missing values. :pr:`16508` by `Thomas Fan`_.

:mod:`sklearn.inspection`
.........................

- |Feature| :func:`inspection.partial_dependence` and
  :func:`inspection.plot_partial_dependence` now support the fast 'recursion'
  method for :class:`ensemble.RandomForestRegressor` and
  :class:`tree.DecisionTreeRegressor`. :pr:`15864` by
  `Nicolas Hug`_.

:mod:`sklearn.linear_model`
...........................

- |MajorFeature| Added generalized linear models (GLM) with non normal error
  distributions, including :class:`linear_model.PoissonRegressor`,
  :class:`linear_model.GammaRegressor` and :class:`linear_model.TweedieRegressor`
  which use Poisson, Gamma and Tweedie distributions respectively.
  :pr:`14300` by :user:`Christian Lorentzen <lorentzenchr>`, `Roman Yurchak`_,
  and `Olivier Grisel`_.

- |MajorFeature| Support of `sample_weight` in
  :class:`linear_model.ElasticNet` and :class:`linear_model.Lasso` for dense
  feature matrix `X`. :pr:`15436` by :user:`Christian Lorentzen
  <lorentzenchr>`.

- |Efficiency| :class:`linear_model.RidgeCV` and
  :class:`linear_model.RidgeClassifierCV` now does not allocate a
  potentially large array to store dual coefficients for all hyperparameters
  during its `fit`, nor an array to store all error or LOO predictions unless
  `store_cv_values` is `True`.
  :pr:`15652` by :user:`Jérôme Dockès <jeromedockes>`.

- |Enhancement| :class:`linear_model.LassoLars` and
  :class:`linear_model.Lars` now support a `jitter` parameter that adds
  random noise to the target. This might help with stability in some edge
  cases. :pr:`15179` by :user:`angelaambroz`.

- |Fix| Fixed a bug where if a `sample_weight` parameter was passed to the fit
  method of :class:`linear_model.RANSACRegressor`, it would not be passed to
  the wrapped `base_estimator` during the fitting of the final model.
  :pr:`15773` by :user:`Jeremy Alexandre <J-A16>`.

- |Fix| add `best_score_` attribute to :class:`linear_model.RidgeCV` and
  :class:`linear_model.RidgeClassifierCV`.
  :pr:`15653` by :user:`Jérôme Dockès <jeromedockes>`.

- |Fix| Fixed a bug in :class:`linear_model.RidgeClassifierCV` to pass a
  specific scoring strategy. Before the internal estimator outputs score
  instead of predictions.
  :pr:`14848` by :user:`Venkatachalam N <venkyyuvy>`.

- |Fix| :class:`linear_model.LogisticRegression` will now avoid an unnecessary
  iteration when `solver='newton-cg'` by checking for inferior or equal instead
  of strictly inferior for maximum of `absgrad` and `tol` in `utils.optimize._newton_cg`.
  :pr:`16266` by :user:`Rushabh Vasani <rushabh-v>`.

- |API| Deprecated public attributes `standard_coef_`, `standard_intercept_`,
  `average_coef_`, and `average_intercept_` in
  :class:`linear_model.SGDClassifier`,
  :class:`linear_model.SGDRegressor`,
  :class:`linear_model.PassiveAggressiveClassifier`,
  :class:`linear_model.PassiveAggressiveRegressor`.
  :pr:`16261` by :user:`Carlos Brandt <chbrandt>`.

- |Fix| |Efficiency| :class:`linear_model.ARDRegression` is more stable and
  much faster when `n_samples > n_features`. It can now scale to hundreds of
  thousands of samples. The stability fix might imply changes in the number
  of non-zero coefficients and in the predicted output. :pr:`16849` by
  `Nicolas Hug`_.

- |Fix| Fixed a bug in :class:`linear_model.ElasticNetCV`,
  :class:`linear_model.MultitaskElasticNetCV`, :class:`linear_model.LassoCV`
  and :class:`linear_model.MultitaskLassoCV` where fitting would fail when
  using joblib loky backend. :pr:`14264` by
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Efficiency| Speed up :class:`linear_model.MultiTaskLasso`,
  :class:`linear_model.MultiTaskLassoCV`, :class:`linear_model.MultiTaskElasticNet`,
  :class:`linear_model.MultiTaskElasticNetCV` by avoiding slower
  BLAS Level 2 calls on small arrays
  :pr:`17021` by :user:`Alex Gramfort <agramfort>` and
  :user:`Mathurin Massias <mathurinm>`.

:mod:`sklearn.metrics`
......................

- |Enhancement| :func:`metrics.pairwise.pairwise_distances_chunked` now allows
  its ``reduce_func`` to not have a return value, enabling in-place operations.
  :pr:`16397` by `Joel Nothman`_.

- |Fix| Fixed a bug in :func:`metrics.mean_squared_error` to not ignore
  argument `squared` when argument `multioutput='raw_values'`.
  :pr:`16323` by :user:`Rushabh Vasani <rushabh-v>`

- |Fix| Fixed a bug in :func:`metrics.mutual_info_score` where negative
  scores could be returned. :pr:`16362` by `Thomas Fan`_.

- |Fix| Fixed a bug in :func:`metrics.confusion_matrix` that would raise
  an error when `y_true` and `y_pred` were length zero and `labels` was
  not `None`. In addition, we raise an error when an empty list is given to
  the `labels` parameter.
  :pr:`16442` by :user:`Kyle Parsons <parsons-kyle-89>`.

- |API| Changed the formatting of values in
  :meth:`metrics.ConfusionMatrixDisplay.plot` and
  :func:`metrics.plot_confusion_matrix` to pick the shorter format (either '2g'
  or 'd'). :pr:`16159` by :user:`Rick Mackenbach <Rick-Mackenbach>` and
  `Thomas Fan`_.

- |API| From version 0.25, :func:`metrics.pairwise.pairwise_distances` will no
  longer automatically compute the ``VI`` parameter for Mahalanobis distance
  and the ``V`` parameter for seuclidean distance if ``Y`` is passed. The user
  will be expected to compute this parameter on the training data of their
  choice and pass it to `pairwise_distances`. :pr:`16993` by `Joel Nothman`_.

:mod:`sklearn.model_selection`
..............................

- |Enhancement| :class:`model_selection.GridSearchCV` and
  :class:`model_selection.RandomizedSearchCV` yields stack trace information
  in fit failed warning messages in addition to previously emitted
  type and details.
  :pr:`15622` by :user:`Gregory Morse <GregoryMorse>`.

- |Fix| :func:`model_selection.cross_val_predict` supports
  `method="predict_proba"` when `y=None`. :pr:`15918` by
  :user:`Luca Kubin <lkubin>`.

- |Fix| :func:`model_selection.fit_grid_point` is deprecated in 0.23 and will
  be removed in 0.25. :pr:`16401` by
  :user:`Arie Pratama Sutiono <ariepratama>`

:mod:`sklearn.multioutput`
..........................

- |Enhancement| :class:`multioutput.RegressorChain` now supports `fit_params`
  for `base_estimator` during `fit`.
  :pr:`16111` by :user:`Venkatachalam N <venkyyuvy>`.

:mod:`sklearn.naive_bayes`
.............................

- |Fix| A correctly formatted error message is shown in
  :class:`naive_bayes.CategoricalNB` when the number of features in the input
  differs between `predict` and `fit`.
  :pr:`16090` by :user:`Madhura Jayaratne <madhuracj>`.

:mod:`sklearn.neural_network`
.............................

- |Efficiency| :class:`neural_network.MLPClassifier` and
  :class:`neural_network.MLPRegressor` has reduced memory footprint when using
  stochastic solvers, `'sgd'` or `'adam'`, and `shuffle=True`. :pr:`14075` by
  :user:`meyer89`.

- |Fix| Increases the numerical stability of the logistic loss function in
  :class:`neural_network.MLPClassifier` by clipping the probabilities.
  :pr:`16117` by `Thomas Fan`_.

:mod:`sklearn.inspection`
.........................

- |Enhancement| :class:`inspection.PartialDependenceDisplay` now exposes the
  deciles lines as attributes so they can be hidden or customized. :pr:`15785`
  by `Nicolas Hug`_

:mod:`sklearn.preprocessing`
............................

- |Feature| argument `drop` of :class:`preprocessing.OneHotEncoder`
  will now accept value 'if_binary' and will drop the first category of
  each feature with two categories. :pr:`16245`
  by :user:`Rushabh Vasani <rushabh-v>`.

- |Enhancement| :class:`preprocessing.OneHotEncoder`'s `drop_idx_` ndarray
  can now contain `None`, where `drop_idx_[i] = None` means that no category
  is dropped for index `i`. :pr:`16585` by :user:`Chiara Marmo <cmarmo>`.

- |Enhancement| :class:`preprocessing.MaxAbsScaler`,
  :class:`preprocessing.MinMaxScaler`, :class:`preprocessing.StandardScaler`,
  :class:`preprocessing.PowerTransformer`,
  :class:`preprocessing.QuantileTransformer`,
  :class:`preprocessing.RobustScaler` now supports pandas' nullable integer
  dtype with missing values. :pr:`16508` by `Thomas Fan`_.

- |Efficiency| :class:`preprocessing.OneHotEncoder` is now faster at
  transforming. :pr:`15762` by `Thomas Fan`_.

- |Fix| Fix a bug in :class:`preprocessing.StandardScaler` which was incorrectly
  computing statistics when calling `partial_fit` on sparse inputs.
  :pr:`16466` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Fix| Fix a bug in :class:`preprocessing.Normalizer` with norm='max',
  which was not taking the absolute value of the maximum values before
  normalizing the vectors. :pr:`16632` by
  :user:`Maura Pintor <Maupin1991>` and :user:`Battista Biggio <bbiggio>`.

:mod:`sklearn.semi_supervised`
..............................

- |Fix| :class:`semi_supervised.LabelSpreading` and
  :class:`semi_supervised.LabelPropagation` avoids divide by zero warnings
  when normalizing `label_distributions_`. :pr:`15946` by :user:`ngshya`.

:mod:`sklearn.svm`
..................

- |Fix| |Efficiency| Improved ``libsvm`` and ``liblinear`` random number
  generators used to randomly select coordinates in the coordinate descent
  algorithms. Platform-dependent C ``rand()`` was used, which is only able to
  generate numbers up to ``32767`` on windows platform (see this `blog
  post <https://codeforces.com/blog/entry/61587>`_) and also has poor
  randomization power as suggested by `this presentation
  <https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful>`_.
  It was replaced with C++11 ``mt19937``, a Mersenne Twister that correctly
  generates 31bits/63bits random numbers on all platforms. In addition, the
  crude "modulo" postprocessor used to get a random number in a bounded
  interval was replaced by the tweaked Lemire method as suggested by `this blog
  post <http://www.pcg-random.org/posts/bounded-rands.html>`_.
  Any model using the :func:`svm.libsvm` or the :func:`svm.liblinear` solver,
  including :class:`svm.LinearSVC`, :class:`svm.LinearSVR`,
  :class:`svm.NuSVC`, :class:`svm.NuSVR`, :class:`svm.OneClassSVM`,
  :class:`svm.SVC`, :class:`svm.SVR`, :class:`linear_model.LogisticRegression`,
  is affected. In particular users can expect a better convergence when the
  number of samples (LibSVM) or the number of features (LibLinear) is large.
  :pr:`13511` by :user:`Sylvain Marié <smarie>`.

- |Fix| Fix use of custom kernel not taking float entries such as string
  kernels in :class:`svm.SVC` and :class:`svm.SVR`. Note that custom kennels
  are now expected to validate their input where they previously received
  valid numeric arrays.
  :pr:`11296` by `Alexandre Gramfort`_ and  :user:`Georgi Peev <georgipeev>`.

- |API| :class:`svm.SVR` and :class:`svm.OneClassSVM` attributes, `probA_` and
  `probB_`, are now deprecated as they were not useful. :pr:`15558` by
  `Thomas Fan`_.

:mod:`sklearn.tree`
...................

- |Fix| :func:`tree.plot_tree` `rotate` parameter was unused and has been
  deprecated.
  :pr:`15806` by :user:`Chiara Marmo <cmarmo>`.

- |Fix| Fix support of read-only float32 array input in ``predict``,
  ``decision_path`` and ``predict_proba`` methods of
  :class:`tree.DecisionTreeClassifier`, :class:`tree.ExtraTreeClassifier` and
  :class:`ensemble.GradientBoostingClassifier` as well as ``predict`` method of
  :class:`tree.DecisionTreeRegressor`, :class:`tree.ExtraTreeRegressor`, and
  :class:`ensemble.GradientBoostingRegressor`.
  :pr:`16331` by :user:`Alexandre Batisse <batalex>`.

:mod:`sklearn.utils`
....................

- |MajorFeature| Estimators can now be displayed with a rich html
  representation. This can be enabled in Jupyter notebooks by setting
  `display='diagram'` in :func:`~sklearn.set_config`. The raw html can be
  returned by using :func:`utils.estimator_html_repr`.
  :pr:`14180` by `Thomas Fan`_.

- |Enhancement| improve error message in :func:`utils.validation.column_or_1d`.
  :pr:`15926` by :user:`Loïc Estève <lesteve>`.

- |Enhancement| add warning in :func:`utils.check_array` for
  pandas sparse DataFrame.
  :pr:`16021` by :user:`Rushabh Vasani <rushabh-v>`.

- |Enhancement| :func:`utils.check_array` now constructs a sparse
  matrix from a pandas DataFrame that contains only `SparseArray` columns.
  :pr:`16728` by `Thomas Fan`_.

- |Enhancement| :func:`utils.validation.check_array` supports pandas'
  nullable integer dtype with missing values when `force_all_finite` is set to
  `False` or `'allow-nan'` in which case the data is converted to floating
  point values where `pd.NA` values are replaced by `np.nan`. As a consequence,
  all :mod:`sklearn.preprocessing` transformers that accept numeric inputs with
  missing values represented as `np.nan` now also accepts being directly fed
  pandas dataframes with `pd.Int* or `pd.Uint*` typed columns that use `pd.NA`
  as a missing value marker. :pr:`16508` by `Thomas Fan`_.

- |API| Passing classes to :func:`utils.estimator_checks.check_estimator` and
  :func:`utils.estimator_checks.parametrize_with_checks` is now deprecated,
  and support for classes will be removed in 0.24. Pass instances instead.
  :pr:`17032` by `Nicolas Hug`_.

- |API| The private utility `_safe_tags` in `utils.estimator_checks` was
  removed, hence all tags should be obtained through `estimator._get_tags()`.
  Note that Mixins like `RegressorMixin` must come *before* base classes
  in the MRO for `_get_tags()` to work properly.
  :pr:`16950` by `Nicolas Hug`_.

- |FIX| :func:`utils.all_estimators` now only returns public estimators.
  :pr:`15380` by `Thomas Fan`_.

Miscellaneous
.............

- |MajorFeature| Adds a HTML representation of estimators to be shown in
  a jupyter notebook or lab. This visualization is acitivated by setting the
  `display` option in :func:`sklearn.set_config`. :pr:`14180` by
  `Thomas Fan`_.

- |Enhancement| ``scikit-learn`` now works with ``mypy`` without errors.
  :pr:`16726` by `Roman Yurchak`_.

- |API| Most estimators now expose a `n_features_in_` attribute. This
  attribute is equal to the number of features passed to the `fit` method.
  See `SLEP010
  <https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep010/proposal.html>`_
  for details. :pr:`16112` by `Nicolas Hug`_.

- |API| Estimators now have a `requires_y` tags which is False by default
  except for estimators that inherit from `~sklearn.base.RegressorMixin` or
  `~sklearn.base.ClassifierMixin`. This tag is used to ensure that a proper
  error message is raised when y was expected but None was passed.
  :pr:`16622` by `Nicolas Hug`_.

- |API| The default setting `print_changed_only` has been changed from False
  to True. This means that the `repr` of estimators is now more concise and
  only shows the parameters whose default value has been changed when
  printing an estimator. You can restore the previous behaviour by using
  `sklearn.set_config(print_changed_only=False)`. Also, note that it is
  always possible to quickly inspect the parameters of any estimator using
  `est.get_params(deep=False)`. :pr:`17061` by `Nicolas Hug`_.

Code and Documentation Contributors
-----------------------------------

Thanks to everyone who has contributed to the maintenance and improvement of the
project since version 0.22, including:

Abbie Popa, Adrin Jalali, Aleksandra Kocot, Alexandre Batisse, Alexandre
Gramfort, Alex Henrie, Alex Itkes, Alex Liang, alexshacked, Alonso Silva
Allende, Ana Casado, Andreas Mueller, Angela Ambroz, Ankit810, Arie Pratama
Sutiono, Arunav Konwar, Baptiste Maingret, Benjamin Beier Liu, bernie gray,
Bharathi Srinivasan, Bharat Raghunathan, Bibhash Chandra Mitra, Brian Wignall,
brigi, Brigitta Sipőcz, Carlos H Brandt, CastaChick, castor, cgsavard, Chiara
Marmo, Chris Gregory, Christian Kastner, Christian Lorentzen, Corrie
Bartelheimer, Daniël van Gelder, Daphne, David Breuer, david-cortes, dbauer9,
Divyaprabha M, Edward Qian, Ekaterina Borovikova, ELNS, Emily Taylor, Erich
Schubert, Eric Leung, Evgeni Chasnovski, Fabiana, Facundo Ferrín, Fan,
Franziska Boenisch, Gael Varoquaux, Gaurav Sharma, Geoffrey Bolmier, Georgi
Peev, gholdman1, Gonthier Nicolas, Gregory Morse, Gregory R. Lee, Guillaume
Lemaitre, Gui Miotto, Hailey Nguyen, Hanmin Qin, Hao Chun Chang, HaoYin, Hélion
du Mas des Bourboux, Himanshu Garg, Hirofumi Suzuki, huangk10, Hugo van
Kemenade, Hye Sung Jung, indecisiveuser, inderjeet, J-A16, Jérémie du
Boisberranger, Jin-Hwan CHO, JJmistry, Joel Nothman, Johann Faouzi, Jon Haitz
Legarreta Gorroño, Juan Carlos Alfaro Jiménez, judithabk6, jumon, Kathryn
Poole, Katrina Ni, Kesshi Jordan, Kevin Loftis, Kevin Markham,
krishnachaitanya9, Lam Gia Thuan, Leland McInnes, Lisa Schwetlick, lkubin, Loic
Esteve, lopusz, lrjball, lucgiffon, lucyleeow, Lucy Liu, Lukas Kemkes, Maciej J
Mikulski, Madhura Jayaratne, Magda Zielinska, maikia, Mandy Gu, Manimaran,
Manish Aradwad, Maren Westermann, Maria, Mariana Meireles, Marie Douriez,
Marielle, Mateusz Górski, mathurinm, Matt Hall, Maura Pintor, mc4229, meyer89,
m.fab, Michael Shoemaker, Michał Słapek, Mina Naghshhnejad, mo, Mohamed
Maskani, Mojca Bertoncelj, narendramukherjee, ngshya, Nicholas Won, Nicolas
Hug, nicolasservel, Niklas, @nkish, Noa Tamir, Oleksandr Pavlyk, olicairns,
Oliver Urs Lenz, Olivier Grisel, parsons-kyle-89, Paula, Pete Green, Pierre
Delanoue, pspachtholz, Pulkit Mehta, Qizhi  Jiang, Quang Nguyen, rachelcjordan,
raduspaimoc, Reshama Shaikh, Riccardo Folloni, Rick Mackenbach, Ritchie Ng,
Roman Feldbauer, Roman Yurchak, Rory Hartong-Redden, Rüdiger Busche, Rushabh
Vasani, Sambhav Kothari, Samesh Lakhotia, Samuel Duan, SanthoshBala18, Santiago
M. Mola, Sarat Addepalli, scibol, Sebastian Kießling, SergioDSR, Sergul Aydore,
Shiki-H, shivamgargsya, SHUBH CHATTERJEE, Siddharth Gupta, simonamaggio,
smarie, Snowhite, stareh, Stephen Blystone, Stephen Marsh, Sunmi Yoon,
SylvainLan, talgatomarov, tamirlan1, th0rwas, theoptips, Thomas J Fan, Thomas
Li, Thomas Schmitt, Tim Nonner, Tim Vink, Tiphaine Viard, Tirth Patel, Titus
Christian, Tom Dupré la Tour, trimeta, Vachan D A, Vandana Iyer, Venkatachalam
N, waelbenamara, wconnell, wderose, wenliwyan, Windber, wornbb, Yu-Hang "Maxin"
Tang