.. include:: _contributors.rst
.. currentmodule:: sklearn
.. _release_notes_1_0:
===========
Version 1.0
===========
For a short description of the main highlights of the release, please refer to
:ref:`sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_0_0.py`.
.. include:: changelog_legend.inc
.. _changes_1_0_2:
Version 1.0.2
=============
**December 2021**
- |Fix| :class:`cluster.Birch`,
:class:`feature_selection.RFECV`, :class:`ensemble.RandomForestRegressor`,
:class:`ensemble.RandomForestClassifier`,
:class:`ensemble.GradientBoostingRegressor`, and
:class:`ensemble.GradientBoostingClassifier` no longer raise a warning when
fitted on a pandas DataFrame. :pr:`21578` by `Thomas Fan`_.
Changelog
---------
:mod:`sklearn.cluster`
......................
- |Fix| Fixed an infinite loop in :class:`cluster.SpectralClustering` by
moving an iteration counter from try to except.
:pr:`21271` by :user:`Tyler Martin <martintb>`.
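As a rough illustration of why the counter placement matters, the sketch below retries a failing step and increments the restart counter inside the ``except`` branch, so repeated failures always reach the loop bound. The names ``run_with_restarts``, ``flaky_step`` and ``max_restarts`` are hypothetical and are not taken from the scikit-learn source.

```python
def run_with_restarts(step, max_restarts=3):
    """Call `step` until it succeeds or it has failed `max_restarts` times."""
    restarts = 0
    while restarts < max_restarts:
        try:
            return step()
        except ArithmeticError:
            # Counting here guarantees progress toward the loop bound even
            # when `step` fails on every attempt. Counting only after a
            # successful call would never advance the counter on failure.
            restarts += 1
    raise RuntimeError(f"step failed {max_restarts} times")


def flaky_step():
    # Hypothetical stand-in for a numerical routine that never converges.
    raise ArithmeticError("did not converge")


try:
    run_with_restarts(flaky_step)
except RuntimeError as exc:
    outcome = str(exc)
```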
:mod:`sklearn.datasets`
.......................
- |Fix| :func:`datasets.fetch_openml` is now thread safe. Data is first
downloaded to a temporary subfolder and then renamed.
:pr:`21833` by :user:`Siavash Rezazadeh <siavrez>`.
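The download-then-rename pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the code used by ``fetch_openml``; the helper name ``atomic_write`` and the file name ``data.arff`` are assumptions made for the example.

```python
import os
import tempfile


def atomic_write(final_path, payload):
    """Write `payload` (bytes) to `final_path` without exposing partial files."""
    target_dir = os.path.dirname(os.path.abspath(final_path))
    # A private temporary subfolder next to the target keeps the rename on
    # the same filesystem, so concurrent writers never share a partial file.
    with tempfile.TemporaryDirectory(dir=target_dir) as tmp_dir:
        tmp_path = os.path.join(tmp_dir, os.path.basename(final_path))
        with open(tmp_path, "wb") as handle:
            handle.write(payload)
        # Move the finished file into place in a single step.
        os.replace(tmp_path, final_path)
    return final_path


with tempfile.TemporaryDirectory() as cache_dir:
    path = atomic_write(os.path.join(cache_dir, "data.arff"), b"@relation demo\n")
    with open(path, "rb") as handle:
        content = handle.read()
```

Readers of the cache then see either no file or a complete one, which is what makes concurrent use safe in this sketch.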
:mod:`sklearn.decomposition`
............................
- |Fix| Fixed the constraint on the objective function of
:class:`decomposition.DictionaryLearning`,
:class:`decomposition.MiniBatchDictionaryLearning`, :class:`decomposition.SparsePCA`
and :class:`decomposition.MiniBatchSparsePCA` to be convex and match the referenced
article. :pr:`19210` by :user:`Jérémie du Boisberranger <jeremiedbb>`.
:mod:`sklearn.ensemble`
.......................
- |Fix| :class:`ensemble.RandomForestClassifier`,
:class:`ensemble.RandomForestRegressor`,
:class:`ensemble.ExtraTreesClassifier`, :class:`ensemble.ExtraTreesRegressor`,
and :class:`ensemble.RandomTreesEmbedding` now raise a ``ValueError`` when
``bootstrap=False`` and ``max_samples`` is not ``None``.
:pr:`21295` by :user:`Haoyin Xu <PSSF23>`.
- |Fix| Solve a bug in :class:`ensemble.GradientBoostingClassifier` where the
exponential loss was computing the positive gradient instead of the
negative one.
:pr:`22050` by :user:`Guillaume Lemaitre <glemaitre>`.
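To make the sign issue concrete, the sketch below writes the exponential loss for a single sample and checks the analytic negative gradient against a finite difference. It assumes labels in {0, 1} mapped to {-1, +1} and a scalar raw prediction; the function names are illustrative and do not mirror scikit-learn internals.

```python
import math


def exponential_loss(y, raw):
    """Exponential loss for a label y in {0, 1} and a raw prediction."""
    y_signed = 2 * y - 1  # map {0, 1} to {-1, +1}
    return math.exp(-y_signed * raw)


def negative_gradient(y, raw):
    """Negative derivative of the loss with respect to the raw prediction."""
    y_signed = 2 * y - 1
    return y_signed * math.exp(-y_signed * raw)


# Central finite difference of -dL/d(raw), used to confirm the sign.
y, raw, eps = 1, 0.3, 1e-6
numeric = -(exponential_loss(y, raw + eps) - exponential_loss(y, raw - eps)) / (2 * eps)
analytic = negative_gradient(y, raw)
```

Gradient boosting fits each new tree to the negative gradient, so a flipped sign would push predictions away from the labels.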
:mod:`sklearn.feature_selection`
................................
- |Fix| Fixed :class:`feature_selection.SelectFromModel` by improving support
for base estimators that do not set `feature_names_in_`. :pr:`21991` by
`Thomas Fan`_.
:mod:`sklearn.linear_model`
...........................
- |Fix| Fix a bug in :class:`linear_model.RidgeClassifierCV` where the method
`predict` was performing an `argmax` on the scores obtained from
`decision_function` instead of returning the multilabel indicator matrix.
:pr:`19869` by :user:`Guillaume Lemaitre <glemaitre>`.
- |Fix| :class:`linear_model.LassoLarsIC` now correctly computes AIC
and BIC. An error is now raised when `n_features > n_samples` and
when the noise variance is not provided.
:pr:`21481` by :user:`Guillaume Lemaitre <glemaitre>` and
:user:`Andrés Babino <ababino>`.
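For orientation, the sketch below computes AIC and BIC from the usual Gaussian log-likelihood, given residuals, a noise variance and a number of degrees of freedom. It is a hedged illustration of the textbook definitions, not a copy of the ``LassoLarsIC`` implementation; the function name and the sample numbers are made up.

```python
import math


def information_criteria(residuals, noise_variance, degrees_of_freedom):
    """Return (AIC, BIC) for a Gaussian model with known noise variance."""
    n_samples = len(residuals)
    rss = sum(r * r for r in residuals)
    # -2 * log-likelihood of a Gaussian with variance `noise_variance`.
    minus_two_log_lik = (
        n_samples * math.log(2 * math.pi * noise_variance) + rss / noise_variance
    )
    aic = minus_two_log_lik + 2 * degrees_of_freedom
    bic = minus_two_log_lik + math.log(n_samples) * degrees_of_freedom
    return aic, bic


aic, bic = information_criteria(
    [0.5, -0.2, 0.1, -0.4], noise_variance=0.1, degrees_of_freedom=2
)
```

Both criteria share the likelihood term and differ only in how strongly they penalize the degrees of freedom.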
:mod:`sklearn.manifold`
.......................
- |Fix| Fixed an unnecessary error when fitting :class:`manifold.Isomap` with a
precomputed dense distance matrix where the neighbors graph has multiple
disconnected components. :pr:`21915` by `Tom Dupre la Tour`_.
:mod:`sklearn.metrics`
......................
- |Fix| All :class:`sklearn.metrics.DistanceMetric` subclasses now correctly support
read-only buffer attributes.
This fixes a regression introduced in 1.0.0 with respect to 0.24.2.
:pr:`21694` by :user:`Julien Jerphanion <jjerphan>`.
- |Fix| `sklearn.metrics.MinkowskiDistance` now accepts a weight
parameter, which makes it possible to write code that behaves consistently both
with scipy 1.8 and earlier versions. In turn, this means that all
neighbors-based estimators (except those that use `algorithm="kd_tree"`) now
accept a weight parameter with `metric="minkowski"` to yield results that
are always consistent with `scipy.spatial.distance.cdist`.
:pr:`21741` by :user:`Olivier Grisel <ogrisel>`.
:mod:`sklearn.multiclass`
.........................
- |Fix| :meth:`multiclass.OneVsRestClassifier.predict_proba` does not error when
fitted on constant integer targets. :pr:`21871` by `Thomas Fan`_.
:mod:`sklearn.neighbors`
........................
- |Fix| :class:`neighbors.KDTree` and :class:`neighbors.BallTree` correctly support
read-only buffer attributes. :pr:`21845` by `Thomas Fan`_.
:mod:`sklearn.preprocessing`
............................
- |Fix| Fixes a compatibility bug with NumPy 1.22 in :class:`preprocessing.OneHotEncoder`.
:pr:`21517` by `Thomas Fan`_.
:mod:`sklearn.tree`
...................
- |Fix| Prevents :func:`tree.plot_tree` from drawing out of the boundary of
the figure. :pr:`21917` by `Thomas Fan`_.
- |Fix| Support loading pickles of decision tree models when the pickle has
been generated on a platform with a different bitness. A typical example is
to train and pickle the model on a 64-bit machine and load the model on a
32-bit machine for prediction. :pr:`21552` by :user:`Loïc Estève <lesteve>`.
:mod:`sklearn.utils`
....................
- |Fix| :func:`utils.estimator_html_repr` now escapes all the estimator
descriptions in the generated HTML. :pr:`21493` by
:user:`Aurélien Geron <ageron>`.
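The kind of escaping involved can be illustrated with the standard library's ``html.escape``. The ``Demo`` class and the ``describe`` helper below are hypothetical and much simpler than the real HTML representation; they only show why unescaped descriptions are a problem.

```python
import html


class Demo:
    """Hypothetical object whose repr contains HTML-like markup."""

    def __repr__(self):
        return "Demo(tag='<b>bold</b>')"


def describe(estimator):
    # Escape the textual description before embedding it in markup, so
    # characters such as "<" and ">" are shown literally by the browser.
    return f"<pre>{html.escape(repr(estimator))}</pre>"


rendered = describe(Demo())
```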
.. _changes_1_0_1:
Version 1.0.1
=============
**October 2021**
Fixed models
------------
- |Fix| Non-fit methods in the following classes do not raise a UserWarning
when fitted on DataFrames with valid feature names:
:class:`covariance.EllipticEnvelope`, :class:`ensemble.IsolationForest`,
:class:`ensemble.AdaBoostClassifier`, :class:`neighbors.KNeighborsClassifier`,
:class:`neighbors.KNeighborsRegressor`,
:class:`neighbors.RadiusNeighborsClassifier`,
:class:`neighbors.RadiusNeighborsRegressor`. :pr:`21199` by `Thomas Fan`_.
:mod:`sklearn.calibration`
..........................
- |Fix| Fixed :class:`calibration.CalibratedClassifierCV` to take into account
`sample_weight` when computing the base estimator prediction when
`ensemble=False`.
:pr:`20638` by :user:`Julien Bohné <JulienB-78>`.
- |Fix| Fixed a bug in :class:`calibration.CalibratedClassifierCV` with
`method="sigmoid"` that was ignoring the `sample_weight` when computing
the Bayesian priors.
:pr:`21179` by :user:`Guillaume Lemaitre <glemaitre>`.
:mod:`sklearn.cluster`
......................
- |Fix| Fixed a bug in :class:`cluster.KMeans`, ensuring reproducibility and equivalence
between sparse and dense input. :pr:`21195`
by :user:`Jérémie du Boisberranger <jeremiedbb>`.
:mod:`sklearn.ensemble`
.......................
- |Fix| Fixed a bug that could produce a segfault in rare cases for
:class:`ensemble.HistGradientBoostingClassifier` and
:class:`ensemble.HistGradientBoostingRegressor`.
:pr:`21130` by :user:`Christian Lorentzen <lorentzenchr>`.
:mod:`sklearn.gaussian_process`
...............................
- |Fix| Compute `y_std` properly with multi-target in
:class:`sklearn.gaussian_process.GaussianProcessRegressor` allowing
proper normalization in the multi-target case.
:pr:`20761` by :user:`Patrick de C. T. R. Ferreira <patrickctrf>`.
:mod:`sklearn.feature_extraction`
.................................
- |Efficiency| Fixed an efficiency regression introduced in version 1.0.0 in the
`transform` method of :class:`feature_extraction.text.CountVectorizer` which no
longer checks for uppercase characters in the provided vocabulary. :pr:`21251`
by :user:`Jérémie du Boisberranger <jeremiedbb>`.
- |Fix| Fixed a bug in :class:`feature_extraction.text.CountVectorizer` and
:class:`feature_extraction.text.TfidfVectorizer` by raising an
error when `min_df` or `max_df` are floating-point numbers greater than 1.
:pr:`20752` by :user:`Alek Lefebvre <AlekLefebvre>`.
:mod:`sklearn.linear_model`
...........................
- |Fix| Improves stability of :class:`linear_model.LassoLars` for different
versions of openblas. :pr:`21340` by `Thomas Fan`_.
- |Fix| :class:`linear_model.LogisticRegression` now raises a better error
message when the solver does not support sparse matrices with int64 indices.
:pr:`21093` by `Tom Dupre la Tour`_.
:mod:`sklearn.neighbors`
........................
- |Fix| :class:`neighbors.KNeighborsClassifier`,
:class:`neighbors.KNeighborsRegressor`,
:class:`neighbors.RadiusNeighborsClassifier`,
:class:`neighbors.RadiusNeighborsRegressor` with `metric="precomputed"` raises
an error for `bsr` and `dok` sparse matrices in methods: `fit`, `kneighbors`
and `radius_neighbors`, due to handling of explicit zeros in `bsr` and `dok`
:term:`sparse graph` formats. :pr:`21199` by `Thomas Fan`_.
:mod:`sklearn.pipeline`
.......................
- |Fix| :meth:`pipeline.Pipeline.get_feature_names_out` correctly passes feature
names out from one step of a pipeline to the next. :pr:`21351` by
`Thomas Fan`_.
:mod:`sklearn.svm`
..................
- |Fix| :class:`svm.SVC` and :class:`svm.SVR` check for an inconsistency
in its internal representation and raise an error instead of segfaulting.
This fix also resolves
`CVE-2020-28975 <https://nvd.nist.gov/vuln/detail/CVE-2020-28975>`__.
:pr:`21336` by `Thomas Fan`_.
:mod:`sklearn.utils`
....................
- |Enhancement| `utils.validation._check_sample_weight` can perform a
non-negativity check on the sample weights. It can be turned on
using the `only_non_negative` bool parameter.
Estimators that check for non-negative weights are updated:
:class:`linear_model.LinearRegression` (here the previous
error message was misleading),
:class:`ensemble.AdaBoostClassifier`,
:class:`ensemble.AdaBoostRegressor`,
:class:`neighbors.KernelDensity`.
:pr:`20880` by :user:`Guillaume Lemaitre <glemaitre>`
and :user:`András Simon <simonandras>`.
- |Fix| Solve a bug in ``sklearn.utils.metaestimators.if_delegate_has_method``
where the underlying check for an attribute did not work with NumPy arrays.
:pr:`21145` by :user:`Zahlii <Zahlii>`.
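The optional non-negativity check on sample weights mentioned in this section can be pictured with a small pure-Python stand-in. The function below is hypothetical: it borrows the ``only_non_negative`` flag name from the entry, but its signature, return type and error messages are assumptions, not the private scikit-learn helper.

```python
def check_sample_weight(sample_weight, n_samples, only_non_negative=False):
    """Return validated weights as a list of floats (illustrative only)."""
    if sample_weight is None:
        # No weights given: every sample counts equally.
        return [1.0] * n_samples
    weights = [float(w) for w in sample_weight]
    if len(weights) != n_samples:
        raise ValueError(
            f"expected {n_samples} sample weights, got {len(weights)}"
        )
    # The check is opt-in so callers that tolerate negative weights
    # keep their previous behavior.
    if only_non_negative and any(w < 0 for w in weights):
        raise ValueError("negative values found in sample_weight")
    return weights


try:
    check_sample_weight([1.0, -0.5], 2, only_non_negative=True)
except ValueError as exc:
    message = str(exc)
```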
Miscellaneous
.............
- |Fix| Refitting an estimator on a dataset without feature names, after it was
previously fitted on a dataset with feature names, no longer keeps the old feature
names stored in the `feature_names_in_` attribute. :pr:`21389` by
:user:`Jérémie du Boisberranger <jeremiedbb>`.
.. _changes_1_0:
Version 1.0.0
=============
**September 2021**
Minimal dependencies
--------------------
Version 1.0.0 of scikit-learn requires Python 3.7+, NumPy 1.14.6+ and
scipy 1.1.0+. Optional minimal dependency is matplotlib 2.2.2+.
Enforcing keyword-only arguments
--------------------------------
In an effort to promote clear and non-ambiguous use of the library, most
constructor and function parameters must now be passed as keyword arguments
(i.e. using the `param=value` syntax) instead of positional. If a keyword-only
parameter is used as positional, a `TypeError` is now raised.
:issue:`15005` :pr:`20002` by `Joel Nothman`_, `Adrin Jalali`_, `Thomas Fan`_,
`Nicolas Hug`_, and `Tom Dupre la Tour`_. See `SLEP009
<https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep009/proposal.html>`_
for more details.
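The behavior follows Python's own keyword-only syntax. In the minimal sketch below, ``make_estimator`` is a hypothetical function (not a scikit-learn API) whose parameters after ``*`` can only be passed by name; passing them positionally raises ``TypeError``.

```python
def make_estimator(n_clusters, *, tol=1e-4, max_iter=300):
    """Hypothetical constructor: everything after `*` is keyword-only."""
    return {"n_clusters": n_clusters, "tol": tol, "max_iter": max_iter}


# Passing `tol` by keyword works as expected.
params = make_estimator(3, tol=1e-3)

# Passing it positionally is rejected by Python itself.
try:
    make_estimator(3, 1e-3)
except TypeError as exc:
    error = exc
else:
    error = None
```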
Changed models
--------------
The following estimators and functions, when fit with the same data and
parameters, may produce different models from the previous version. This often
occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.
- |Fix| :class:`manifold.TSNE` now avoids numerical underflow issues during
affinity matrix computation.
- |Fix| :class:`manifold.Isomap` now connects disconnected components of the
neighbors graph along some minimum distance pairs, instead of changing
every infinite distance to zero.
- |Fix| The splitting criterion of :class:`tree.DecisionTreeClassifier` and
:class:`tree.DecisionTreeRegressor` can be impacted by a fix in the handling
of rounding errors. Previously some extra spurious splits could occur.
- |Fix| :func:`model_selection.train_test_split` with a `stratify` parameter
and :class:`model_selection.StratifiedShuffleSplit` may lead to slightly
different results.
Details are listed in the changelog below.
(While we are trying to better inform users by providing this information, we
cannot assure that this list is complete.)
Changelog
---------
..
Entries should be grouped by module (in alphabetic order) and prefixed with
one of the labels: |MajorFeature|, |Feature|, |Efficiency|, |Enhancement|,
|Fix| or |API| (see whats_new.rst for descriptions).
Entries should be ordered by those labels (e.g. |Fix| after |Efficiency|).
Changes not specific to a module should be listed under *Multiple Modules*
or *Miscellaneous*.
Entries should end with:
:pr:`123456` by :user:`Joe Bloggs <joeongithub>`.
where 123456 is the *pull request* number, not the issue number.
- |API| The option for using the squared error via ``loss`` and
``criterion`` parameters was made more consistent. The preferred way is by
setting the value to `"squared_error"`. Old option names are still valid,
produce the same models, but are deprecated and will be removed in version
1.2.
:pr:`19310` by :user:`Christian Lorentzen <lorentzenchr>`.
- For :class:`ensemble.ExtraTreesRegressor`, `criterion="mse"` is deprecated,
use `"squared_error"` instead which is now the default.
- For :class:`ensemble.GradientBoostingRegressor`, `loss="ls"` is deprecated,
use `"squared_error"` instead which is now the default.
- For :class:`ensemble.RandomForestRegressor`, `criterion="mse"` is deprecated,
use `"squared_error"` instead which is now the default.
- For :class:`ensemble.HistGradientBoostingRegressor`, `loss="least_squares"`
is deprecated, use `"squared_error"` instead which is now the default.
- For :class:`linear_model.RANSACRegressor`, `loss="squared_loss"` is
deprecated, use `"squared_error"` instead.
- For :class:`linear_model.SGDRegressor`, `loss="squared_loss"` is
deprecated, use `"squared_error"` instead which is now the default.
- For :class:`tree.DecisionTreeRegressor`, `criterion="mse"` is deprecated,
use `"squared_error"` instead which is now the default.
- For :class:`tree.ExtraTreeRegressor`, `criterion="mse"` is deprecated,
use `"squared_error"` instead which is now the default.
- |API| The option for using the absolute error via ``loss`` and
``criterion`` parameters was made more consistent. The preferred way is by
setting the value to `"absolute_error"`. Old option names are still valid,
produce the same models, but are deprecated and will be removed in version
1.2.
:pr:`19733` by :user:`Christian Lorentzen <lorentzenchr>`.
- For :class:`ensemble.ExtraTreesRegressor`, `criterion="mae"` is deprecated,
use `"absolute_error"` instead.
- For :class:`ensemble.GradientBoostingRegressor`, `loss="lad"` is deprecated,
use `"absolute_error"` instead.
- For :class:`ensemble.RandomForestRegressor`, `criterion="mae"` is deprecated,
use `"absolute_error"` instead.
- For :class:`ensemble.HistGradientBoostingRegressor`,
`loss="least_absolute_deviation"` is deprecated, use `"absolute_error"`
instead.
- For :class:`linear_model.RANSACRegressor`, `loss="absolute_loss"` is
deprecated, use `"absolute_error"` instead which is now the default.
- For :class:`tree.DecisionTreeRegressor`, `criterion="mae"` is deprecated,
use `"absolute_error"` instead.
- For :class:`tree.ExtraTreeRegressor`, `criterion="mae"` is deprecated,
use `"absolute_error"` instead.
- |API| `np.matrix` usage is deprecated in 1.0 and will raise a `TypeError` in
1.2. :pr:`20165` by `Thomas Fan`_.
- |API| :term:`get_feature_names_out` has been added to the transformer API
to get the names of the output features. `get_feature_names` has in
turn been deprecated. :pr:`18444` by `Thomas Fan`_.
- |API| All estimators store `feature_names_in_` when fitted on pandas DataFrames.
These feature names are compared to names seen in non-`fit` methods, e.g.
`transform`, and a `FutureWarning` is raised if they are not consistent.
These ``FutureWarning`` s will become ``ValueError`` s in 1.2. :pr:`18010` by
`Thomas Fan`_.
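The squared-error and absolute-error option renames listed earlier in this changelog can be summarized as a lookup table, which may help when migrating configuration files. The mapping is taken from the entries above; the ``modernize`` helper itself is a hypothetical convenience, not part of scikit-learn.

```python
# Deprecated option name -> preferred name, for `loss` and `criterion`.
RENAMED_OPTIONS = {
    "mse": "squared_error",
    "ls": "squared_error",
    "least_squares": "squared_error",
    "squared_loss": "squared_error",
    "mae": "absolute_error",
    "lad": "absolute_error",
    "least_absolute_deviation": "absolute_error",
    "absolute_loss": "absolute_error",
}


def modernize(value):
    """Return the preferred spelling, leaving unknown values untouched."""
    return RENAMED_OPTIONS.get(value, value)


old_config = {"criterion": "mse", "loss": "lad", "other": "poisson"}
new_config = {key: modernize(value) for key, value in old_config.items()}
```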
:mod:`sklearn.base`
...................
- |Fix| :func:`config_context` is now threadsafe. :pr:`18736` by `Thomas Fan`_.
:mod:`sklearn.calibration`
..........................
- |Feature| :class:`calibration.CalibrationDisplay` added to plot
calibration curves. :pr:`17443` by :user:`Lucy Liu <lucyleeow>`.
- |Fix| The ``predict`` and ``predict_proba`` methods of
:class:`calibration.CalibratedClassifierCV` can now properly be used on
prefitted pipelines. :pr:`19641` by :user:`Alek Lefebvre <AlekLefebvre>`.
- |Fix| Fixed an error when using a :class:`ensemble.VotingClassifier`
as `base_estimator` in :class:`calibration.CalibratedClassifierCV`.
:pr:`20087` by :user:`Clément Fauchereau <clement-f>`.
:mod:`sklearn.cluster`
......................
- |Efficiency| The ``"k-means++"`` initialization of :class:`cluster.KMeans`
and :class:`cluster.MiniBatchKMeans` is now faster, especially in multicore
settings. :pr:`19002` by :user:`Jon Crall <Erotemic>` and :user:`Jérémie du
Boisberranger <jeremiedbb>`.
- |Efficiency| :class:`cluster.KMeans` with `algorithm='elkan'` is now faster
in multicore settings. :pr:`19052` by
:user:`Yusuke Nagasaka <YusukeNagasaka>`.
- |Efficiency| :class:`cluster.MiniBatchKMeans` is now faster in multicore
settings. :pr:`17622` by :user:`Jérémie du Boisberranger <jeremiedbb>`.
- |Efficiency| :class:`cluster.OPTICS` can now cache the output of the
computation of the tree, using the `memory` parameter. :pr:`19024` by
:user:`Frankie Robertson <frankier>`.
- |Enhancement| The `predict` and `fit_predict` methods of
:class:`cluster.AffinityPropagation` now accept sparse data type for input
data.
:pr:`20117` by :user:`Venkatachalam Natchiappan <venkyyuvy>`.
- |Fix| Fixed a bug in :class:`cluster.MiniBatchKMeans` where the sample
weights were partially ignored when the input is sparse. :pr:`17622` by
:user:`Jérémie du Boisberranger <jeremiedbb>`.
- |Fix| Improved convergence detection based on center change in
:class:`cluster.MiniBatchKMeans` which was almost never achievable.
:pr:`17622` by :user:`Jérémie du Boisberranger <jeremiedbb>`.
- |Fix| :class:`cluster.AgglomerativeClustering` now supports read-only
memory-mapped datasets.
:pr:`19883` by :user:`Julien Jerphanion <jjerphan>`.
- |Fix| :class:`cluster.AgglomerativeClustering` correctly connects components
when connectivity and affinity are both precomputed and the number
of connected components is greater than 1. :pr:`20597` by
`Thomas Fan`_.
- |Fix| :class:`cluster.FeatureAgglomeration` does not accept a ``**params`` kwarg in
the ``fit`` function anymore, resulting in a more concise error message. :pr:`20899`
by :user:`Adam Li <adam2392>`.
- |Fix| Fixed a bug in :class:`cluster.KMeans`, ensuring reproducibility and equivalence
between sparse and dense input. :pr:`20200`
by :user:`Jérémie du Boisberranger <jeremiedbb>`.
- |API| :class:`cluster.Birch` attributes, `fit_` and `partial_fit_`, are
deprecated and will be removed in 1.2. :pr:`19297` by `Thomas Fan`_.
- |API| The default value for the `batch_size` parameter of
:class:`cluster.MiniBatchKMeans` was changed from 100 to 1024 for
efficiency reasons. The `n_iter_` attribute of
:class:`cluster.MiniBatchKMeans` now reports the number of started epochs and
the `n_steps_` attribute reports the number of mini batches processed.
:pr:`17622` by :user:`Jérémie du Boisberranger <jeremiedbb>`.
- |API| :func:`cluster.spectral_clustering` raises an improved error when passed
a `np.matrix`. :pr:`20560` by `Thomas Fan`_.
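The relationship between the ``n_steps_`` and ``n_iter_`` attributes of :class:`cluster.MiniBatchKMeans` described in this section can be pictured with a little arithmetic. The sketch assumes that an epoch counts as started as soon as any of its samples has been drawn; the helper name and the numbers are illustrative only.

```python
import math


def started_epochs(n_steps, batch_size, n_samples):
    """Number of passes over the data that have been started after n_steps."""
    return math.ceil(n_steps * batch_size / n_samples)


# 25 mini batches of 1024 samples (the new default size) on 10,000 samples.
n_steps = 25
n_epochs = started_epochs(n_steps, batch_size=1024, n_samples=10_000)
```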
:mod:`sklearn.compose`
......................
- |Enhancement| :class:`compose.ColumnTransformer` now records the output
of each transformer in `output_indices_`. :pr:`18393` by
:user:`Luca Bittarello <lbittarello>`.
- |Enhancement| :class:`compose.ColumnTransformer` now allows DataFrame input to
have its columns appear in a changed order in `transform`. Further, columns that
are dropped will not be required in transform, and additional columns will be
ignored if `remainder='drop'`. :pr:`19263` by `Thomas Fan`_.
- |Enhancement| Adds `**predict_params` keyword argument to
:meth:`compose.TransformedTargetRegressor.predict` that passes keyword
argument to the regressor.
:pr:`19244` by :user:`Ricardo <ricardojnf>`.
- |Fix| `compose.ColumnTransformer.get_feature_names` supports
non-string feature names returned by any of its transformers. However, note
that ``get_feature_names`` is deprecated, use ``get_feature_names_out``
instead. :pr:`18459` by :user:`Albert Villanova del Moral <albertvillanova>`
and :user:`Alonso Silva Allende <alonsosilvaallende>`.
- |Fix| :class:`compose.TransformedTargetRegressor` now takes nD targets with
an adequate transformer.
:pr:`18898` by :user:`Oras Phongpanagnam <panangam>`.
- |API| Adds `verbose_feature_names_out` to :class:`compose.ColumnTransformer`.
This flag controls the prefixing of feature names out in
:term:`get_feature_names_out`. :pr:`18444` and :pr:`21080` by `Thomas Fan`_.
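The effect of the flag on output names can be sketched without scikit-learn. The helper below is hypothetical and assumes the ``<transformer name>__<feature name>`` prefix convention; it does not reproduce the additional validation the real class may perform.

```python
def output_names(transformers, verbose_feature_names_out=True):
    """Build output feature names from (name, feature_names) pairs."""
    names = []
    for transformer_name, features in transformers:
        for feature in features:
            if verbose_feature_names_out:
                # Prefix with the transformer name to keep names unambiguous.
                names.append(f"{transformer_name}__{feature}")
            else:
                names.append(feature)
    return names


steps = [("num", ["age", "income"]), ("cat", ["city"])]
prefixed = output_names(steps)
plain = output_names(steps, verbose_feature_names_out=False)
```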
:mod:`sklearn.covariance`
.........................
- |Fix| Adds arrays check to :func:`covariance.ledoit_wolf` and
:func:`covariance.ledoit_wolf_shrinkage`. :pr:`20416` by :user:`Hugo Defois
<defoishugo>`.
- |API| Deprecates the following keys in `cv_results_`: `'mean_score'`,
`'std_score'`, and `'split(k)_score'` in favor of `'mean_test_score'`,
`'std_test_score'`, and `'split(k)_test_score'`. :pr:`20583` by `Thomas Fan`_.
:mod:`sklearn.datasets`
.......................
- |Enhancement| :func:`datasets.fetch_openml` now supports categories with
missing values when returning a pandas dataframe. :pr:`19365` by
`Thomas Fan`_ and :user:`Amanda Dsouza <amy12xx>` and
:user:`EL-ATEIF Sara <elateifsara>`.
- |Enhancement| :func:`datasets.fetch_kddcup99` raises a better message
when the cached file is invalid. :pr:`19669` by `Thomas Fan`_.
- |Enhancement| Replace usages of ``__file__`` related to resource file I/O
with ``importlib.resources`` to avoid the assumption that these resource
files (e.g. ``iris.csv``) already exist on a filesystem, and by extension
to enable compatibility with tools such as ``PyOxidizer``.
:pr:`20297` by :user:`Jack Liu <jackzyliu>`.
- |Fix| Shorten data file names in the openml tests to better support
installing on Windows and its default 260 character limit on file names.
:pr:`20209` by `Thomas Fan`_.
- |Fix| :func:`datasets.fetch_kddcup99` returns dataframes when
`return_X_y=True` and `as_frame=True`. :pr:`19011` by `Thomas Fan`_.
- |API| Deprecates `datasets.load_boston` in 1.0 and it will be removed
in 1.2. Alternative code snippets to load similar datasets are provided.
Please refer to the docstring of the function for details.
:pr:`20729` by `Guillaume Lemaitre`_.
:mod:`sklearn.decomposition`
............................
- |Enhancement| Added a new approximate solver (randomized SVD, available with
`eigen_solver='randomized'`) to :class:`decomposition.KernelPCA`. This
significantly accelerates computation when the number of samples is much
larger than the desired number of components.
:pr:`12069` by :user:`Sylvain Marié <smarie>`.
- |Fix| Fixes incorrect multiple data-conversion warnings when clustering
boolean data. :pr:`19046` by :user:`Surya Prakash <jdsurya>`.
- |Fix| Fixed :func:`decomposition.dict_learning`, used by
:class:`decomposition.DictionaryLearning`, to ensure determinism of the
output. Achieved by flipping signs of the SVD output which is used to
initialize the code. :pr:`18433` by :user:`Bruno Charron <brcharron>`.
- |Fix| Fixed a bug in :class:`decomposition.MiniBatchDictionaryLearning`,
:class:`decomposition.MiniBatchSparsePCA` and
:func:`decomposition.dict_learning_online` where the update of the dictionary
was incorrect. :pr:`19198` by :user:`Jérémie du Boisberranger <jeremiedbb>`.
- |Fix| Fixed a bug in :class:`decomposition.DictionaryLearning`,
:class:`decomposition.SparsePCA`,
:class:`decomposition.MiniBatchDictionaryLearning`,
:class:`decomposition.MiniBatchSparsePCA`,
:func:`decomposition.dict_learning` and
:func:`decomposition.dict_learning_online` where the restart of unused atoms
during the dictionary update was not working as expected. :pr:`19198` by
:user:`Jérémie du Boisberranger <jeremiedbb>`.
- |API| In :class:`decomposition.DictionaryLearning`,
:class:`decomposition.MiniBatchDictionaryLearning`,
:func:`decomposition.dict_learning` and
:func:`decomposition.dict_learning_online`, `transform_alpha` will be equal
to `alpha` instead of 1.0 by default starting from version 1.2. :pr:`19159` by
:user:`Benoît Malézieux <bmalezieux>`.
- |API| Rename variable names in :class:`decomposition.KernelPCA` to improve
readability. `lambdas_` and `alphas_` are renamed to `eigenvalues_`
and `eigenvectors_`, respectively. `lambdas_` and `alphas_` are
deprecated and will be removed in 1.2.
:pr:`19908` by :user:`Kei Ishikawa <kstoneriv3>`.
- |API| The `alpha` and `regularization` parameters of :class:`decomposition.NMF` and
:func:`decomposition.non_negative_factorization` are deprecated and will be removed
in 1.2. Use the new parameters `alpha_W` and `alpha_H` instead. :pr:`20512` by
:user:`Jérémie du Boisberranger <jeremiedbb>`.
:mod:`sklearn.dummy`
....................
- |API| Attribute `n_features_in_` in :class:`dummy.DummyClassifier` and
:class:`dummy.DummyRegressor` is deprecated and will be removed in 1.2.
:pr:`20960` by `Thomas Fan`_.
:mod:`sklearn.ensemble`
.......................
- |Enhancement| :class:`~sklearn.ensemble.HistGradientBoostingClassifier` and
:class:`~sklearn.ensemble.HistGradientBoostingRegressor` take cgroups quotas
into account when deciding the number of threads used by OpenMP. This
avoids performance problems caused by over-subscription when using those
classes in a docker container for instance. :pr:`20477`
by `Thomas Fan`_.
- |Enhancement| :class:`~sklearn.ensemble.HistGradientBoostingClassifier` and
:class:`~sklearn.ensemble.HistGradientBoostingRegressor` are no longer
experimental. They are now considered stable and are subject to the same
deprecation cycles as all other estimators. :pr:`19799` by `Nicolas Hug`_.
- |Enhancement| Improve the HTML rendering of the
:class:`ensemble.StackingClassifier` and :class:`ensemble.StackingRegressor`.
:pr:`19564` by `Thomas Fan`_.
- |Enhancement| Added Poisson criterion to
:class:`ensemble.RandomForestRegressor`. :pr:`19836` by :user:`Brian Sun
<bsun94>`.
- |Fix| Computing the out-of-bag (OOB) score is no longer allowed in
:class:`ensemble.RandomForestClassifier` and
:class:`ensemble.ExtraTreesClassifier` with multiclass-multioutput target
since scikit-learn does not provide any metric supporting this type of
target. Additional private refactoring was performed.
:pr:`19162` by :user:`Guillaume Lemaitre <glemaitre>`.
- |Fix| Improve numerical precision for weights boosting in
:class:`ensemble.AdaBoostClassifier` and :class:`ensemble.AdaBoostRegressor`
to avoid underflows.
:pr:`10096` by :user:`Fenil Suchak <fenilsuchak>`.
- |Fix| Fixed the range of the argument ``max_samples`` to be ``(0.0, 1.0]``
in :class:`ensemble.RandomForestClassifier` and
:class:`ensemble.RandomForestRegressor`, where `max_samples=1.0` is
interpreted as using all `n_samples` for bootstrapping. :pr:`20159` by
:user:`murata-yu`.
- |Fix| Fixed a bug in :class:`ensemble.AdaBoostClassifier` and
:class:`ensemble.AdaBoostRegressor` where the `sample_weight` parameter
got overwritten during `fit`.
:pr:`20534` by :user:`Guillaume Lemaitre <glemaitre>`.
- |API| Removes `tol=None` option in
:class:`ensemble.HistGradientBoostingClassifier` and
:class:`ensemble.HistGradientBoostingRegressor`. Please use `tol=0` for
the same behavior. :pr:`19296` by `Thomas Fan`_.
:mod:`sklearn.feature_extraction`
.................................
- |Fix| Fixed a bug in :class:`feature_extraction.text.HashingVectorizer`
where some input strings would result in negative indices in the transformed
data. :pr:`19035` by :user:`Liu Yu <ly648499246>`.
- |Fix| Fixed a bug in :class:`feature_extraction.DictVectorizer` by raising an
error with unsupported value type.
:pr:`19520` by :user:`Jeff Zhao <kamiyaa>`.
- |Fix| Fixed a bug in :func:`feature_extraction.image.img_to_graph`
and :func:`feature_extraction.image.grid_to_graph` where singleton connected
components were not handled properly, resulting in a wrong vertex indexing.
:pr:`18964` by `Bertrand Thirion`_.
- |Fix| Raise a warning in :class:`feature_extraction.text.CountVectorizer`
with `lowercase=True` when there are vocabulary entries with uppercase
characters to avoid silent misses in the resulting feature vectors.
:pr:`19401` by :user:`Zito Relova <zitorelova>`.
:mod:`sklearn.feature_selection`
................................
- |Feature| :func:`feature_selection.r_regression` computes Pearson's R
correlation coefficients between the features and the target.
:pr:`17169` by :user:`Dmytro Lituiev <DSLituiev>`
and :user:`Julien Jerphanion <jjerphan>`.
- |Enhancement| :meth:`feature_selection.RFE.fit` accepts additional estimator
parameters that are passed directly to the estimator's `fit` method.
:pr:`20380` by :user:`Iván Pulido <ijpulidos>`, :user:`Felipe Bidu <fbidu>`,
:user:`Gil Rutter <g-rutter>`, and :user:`Adrin Jalali <adrinjalali>`.
- |Fix| Fix a bug in :func:`isotonic.isotonic_regression` where the
`sample_weight` passed by a user was overwritten during ``fit``.
:pr:`20515` by :user:`Carsten Allefeld <allefeld>`.
- |Fix| :class:`feature_selection.SequentialFeatureSelector` now allows
unsupervised modelling: its `fit` method no longer validates `y` and
accepts `y=None`.
:pr:`19568` by :user:`Shyam Desai <ShyamDesai>`.
- |API| Raises an error in :class:`feature_selection.VarianceThreshold`
when the variance threshold is negative.
:pr:`20207` by :user:`Tomohiro Endo <europeanplaice>`.
- |API| Deprecates `grid_scores_` in favor of split scores in `cv_results_` in
:class:`feature_selection.RFECV`. `grid_scores_` will be removed in
version 1.2.
:pr:`20161` by :user:`Shuhei Kayawari <wowry>` and :user:`arka204`.
:mod:`sklearn.inspection`
.........................
- |Enhancement| Add `max_samples` parameter in
:func:`inspection.permutation_importance`. It allows drawing a subset of the
samples to compute the permutation importance. This is useful to keep the
method tractable when evaluating feature importance on large datasets.
:pr:`20431` by :user:`Oliver Pfaffel <o1iv3r>`.
- |Enhancement| Add kwargs to format ICE and PD lines separately in partial
dependence plots `inspection.plot_partial_dependence` and
:meth:`inspection.PartialDependenceDisplay.plot`. :pr:`19428` by :user:`Mehdi
Hamoumi <mhham>`.
- |Fix| Allow multiple scorers input to
:func:`inspection.permutation_importance`. :pr:`19411` by :user:`Simona
Maggio <simonamaggio>`.
- |API| :class:`inspection.PartialDependenceDisplay` exposes a class method:
:func:`~inspection.PartialDependenceDisplay.from_estimator`.
`inspection.plot_partial_dependence` is deprecated in favor of the
class method and will be removed in 1.2. :pr:`20959` by `Thomas Fan`_.
:mod:`sklearn.kernel_approximation`
...................................
- |Fix| Fix a bug in :class:`kernel_approximation.Nystroem`
where the attribute `component_indices_` did not correspond to the subset of
sample indices used to generate the approximated kernel. :pr:`20554` by
:user:`Xiangyin Kong <kxytim>`.
:mod:`sklearn.linear_model`
...........................
- |MajorFeature| Added :class:`linear_model.QuantileRegressor` which implements
linear quantile regression with L1 penalty.
:pr:`9978` by :user:`David Dale <avidale>` and
:user:`Christian Lorentzen <lorentzenchr>`.
- |Feature| The new :class:`linear_model.SGDOneClassSVM` provides an SGD
implementation of the linear One-Class SVM. Combined with kernel
approximation techniques, this implementation approximates the solution of
a kernelized One-Class SVM while benefitting from a linear
complexity in the number of samples.
:pr:`10027` by :user:`Albert Thomas <albertcthomas>`.
- |Feature| Added `sample_weight` parameter to
:class:`linear_model.LassoCV` and :class:`linear_model.ElasticNetCV`.
:pr:`16449` by :user:`Christian Lorentzen <lorentzenchr>`.
- |Feature| Added new solver `lbfgs` (available with `solver="lbfgs"`)
and `positive` argument to :class:`linear_model.Ridge`. When `positive` is
set to `True`, the coefficients are forced to be positive (only supported by
`lbfgs`). :pr:`20231` by :user:`Toshihiro Nakae <tnakae>`.
- |Efficiency| The implementation of :class:`linear_model.LogisticRegression`
has been optimised for dense matrices when using `solver='newton-cg'` and
`multi_class!='multinomial'`.
:pr:`19571` by :user:`Julien Jerphanion <jjerphan>`.
- |Enhancement| The `fit` method preserves dtype for numpy.float32 in
:class:`linear_model.Lars`, :class:`linear_model.LassoLars`,
:class:`linear_model.LarsCV` and
:class:`linear_model.LassoLarsCV`. :pr:`20155` by :user:`Takeshi Oura
<takoika>`.
- |Enhancement| Validate user-supplied gram matrix passed to linear models
via the `precompute` argument. :pr:`19004` by :user:`Adam Midvidy <amidvidy>`.
- |Fix| :meth:`linear_model.ElasticNet.fit` no longer modifies `sample_weight`
in place. :pr:`19055` by `Thomas Fan`_.
- |Fix| :class:`linear_model.Lasso` and :class:`linear_model.ElasticNet` no
longer have a `dual_gap_` not corresponding to their objective. :pr:`19172`
by :user:`Mathurin Massias <mathurinm>`.
- |Fix| `sample_weight` is now fully taken into account in linear models
when `normalize=True` for both feature centering and feature
scaling.
:pr:`19426` by :user:`Alexandre Gramfort <agramfort>` and
:user:`Maria Telenczuk <maikia>`.
- |Fix| Points with residuals equal to ``residual_threshold`` are now considered
as inliers for :class:`linear_model.RANSACRegressor`. This allows fitting
a model perfectly on some datasets when `residual_threshold=0`.
:pr:`19499` by :user:`Gregory Strubel <gregorystrubel>`.
- |Fix| Sample weight invariance for :class:`linear_model.Ridge` was fixed in
:pr:`19616` by :user:`Olivier Grisel <ogrisel>` and :user:`Christian Lorentzen
<lorentzenchr>`.
- |Fix| The dictionary `params` in :func:`linear_model.enet_path` and
:func:`linear_model.lasso_path` should only contain parameters of the
coordinate descent solver. Otherwise, an error will be raised.
:pr:`19391` by :user:`Shao Yang Hong <hongshaoyang>`.
- |API| Raise a warning in :class:`linear_model.RANSACRegressor` that from
version 1.2, `min_samples` needs to be set explicitly for models other than
:class:`linear_model.LinearRegression`. :pr:`19390` by :user:`Shao Yang Hong
<hongshaoyang>`.
- |API| The parameter ``normalize`` of :class:`linear_model.LinearRegression`
is deprecated and will be removed in 1.2. Motivation for this deprecation:
the ``normalize`` parameter had no effect if ``fit_intercept`` was set
to False and was therefore deemed confusing. The behavior of the deprecated
``LinearModel(normalize=True)`` can be reproduced with a
:class:`~sklearn.pipeline.Pipeline` with ``LinearModel`` (where
``LinearModel`` is :class:`~linear_model.LinearRegression`,
:class:`~linear_model.Ridge`, :class:`~linear_model.RidgeClassifier`,
:class:`~linear_model.RidgeCV` or :class:`~linear_model.RidgeClassifierCV`)
as follows: ``make_pipeline(StandardScaler(with_mean=False),
LinearModel())``. The ``normalize`` parameter in
:class:`~linear_model.LinearRegression` was deprecated in :pr:`17743` by
:user:`Maria Telenczuk <maikia>` and :user:`Alexandre Gramfort <agramfort>`.
Same for :class:`~linear_model.Ridge`,
:class:`~linear_model.RidgeClassifier`, :class:`~linear_model.RidgeCV`, and
:class:`~linear_model.RidgeClassifierCV`, in: :pr:`17772` by :user:`Maria
Telenczuk <maikia>` and :user:`Alexandre Gramfort <agramfort>`. Same for
:class:`~linear_model.BayesianRidge`, :class:`~linear_model.ARDRegression`
in: :pr:`17746` by :user:`Maria Telenczuk <maikia>`. Same for
:class:`~linear_model.Lasso`, :class:`~linear_model.LassoCV`,
:class:`~linear_model.ElasticNet`, :class:`~linear_model.ElasticNetCV`,
:class:`~linear_model.MultiTaskLasso`,
:class:`~linear_model.MultiTaskLassoCV`,
:class:`~linear_model.MultiTaskElasticNet`,
:class:`~linear_model.MultiTaskElasticNetCV`, in: :pr:`17785` by :user:`Maria
Telenczuk <maikia>` and :user:`Alexandre Gramfort <agramfort>`.
- |API| The ``normalize`` parameter of
:class:`~linear_model.OrthogonalMatchingPursuit` and
:class:`~linear_model.OrthogonalMatchingPursuitCV` will default to False in
1.2 and will be removed in 1.4. :pr:`17750` by :user:`Maria Telenczuk
<maikia>` and :user:`Alexandre Gramfort <agramfort>`. Same for
:class:`~linear_model.Lars`, :class:`~linear_model.LarsCV`,
:class:`~linear_model.LassoLars`, :class:`~linear_model.LassoLarsCV` and
:class:`~linear_model.LassoLarsIC`, in :pr:`17769` by :user:`Maria Telenczuk
<maikia>` and :user:`Alexandre Gramfort <agramfort>`.
- |API| Keyword validation has moved from `__init__` and `set_params` to `fit`
for the following estimators conforming to scikit-learn's conventions:
:class:`~linear_model.SGDClassifier`,
:class:`~linear_model.SGDRegressor`,
:class:`~linear_model.SGDOneClassSVM`,
:class:`~linear_model.PassiveAggressiveClassifier`, and
:class:`~linear_model.PassiveAggressiveRegressor`.
:pr:`20683` by `Guillaume Lemaitre`_.
:mod:`sklearn.manifold`
.......................
- |Enhancement| Implement `'auto'` heuristic for the `learning_rate` in
:class:`manifold.TSNE`. It will become default in 1.2. The default
initialization will change to `pca` in 1.2. PCA initialization will
be scaled to have standard deviation 1e-4 in 1.2.
:pr:`19491` by :user:`Dmitry Kobak <dkobak>`.
- |Fix| Change numerical precision to prevent underflow issues
during affinity matrix computation for :class:`manifold.TSNE`.
:pr:`19472` by :user:`Dmitry Kobak <dkobak>`.
- |Fix| :class:`manifold.Isomap` now uses `scipy.sparse.csgraph.shortest_path`
to compute the graph shortest path. It also connects disconnected components
of the neighbors graph along some minimum distance pairs, instead of changing
every infinite distance to zero. :pr:`20531` by `Roman Yurchak`_ and `Tom
Dupre la Tour`_.
- |Fix| Decrease the numerical default tolerance in the lobpcg call
in :func:`manifold.spectral_embedding` to prevent numerical instability.
:pr:`21194` by :user:`Andrew Knyazev <lobpcg>`.
:mod:`sklearn.metrics`
......................
- |Feature| :func:`metrics.mean_pinball_loss` exposes the pinball loss for
quantile regression. :pr:`19415` by :user:`Xavier Dupré <sdpython>`
and :user:`Olivier Grisel <ogrisel>`.
- |Feature| :func:`metrics.d2_tweedie_score` calculates the D^2 regression
score for Tweedie deviances with power parameter ``power``. This is a
generalization of the `r2_score` and can be interpreted as percentage of
Tweedie deviance explained.
:pr:`17036` by :user:`Christian Lorentzen <lorentzenchr>`.
- |Feature| :func:`metrics.mean_squared_log_error` now supports
`squared=False`.
:pr:`20326` by :user:`Uttam kumar <helper-uttam>`.
- |Efficiency| Improved speed of :func:`metrics.confusion_matrix` when labels
are integral.
:pr:`9843` by :user:`Jon Crall <Erotemic>`.
- |Enhancement| :func:`metrics.hinge_loss` now raises an error when
``pred_decision`` is 1d although the problem is multiclass, or when the
``pred_decision`` parameter is not consistent with the ``labels`` parameter.
:pr:`19643` by :user:`Pierre Attard <PierreAttard>`.
- |Fix| :meth:`metrics.ConfusionMatrixDisplay.plot` uses the correct max
for colormap. :pr:`19784` by `Thomas Fan`_.
- |Fix| Samples with zero `sample_weight` values do not affect the results
from :func:`metrics.det_curve`, :func:`metrics.precision_recall_curve`
and :func:`metrics.roc_curve`.
:pr:`18328` by :user:`Albert Villanova del Moral <albertvillanova>` and
:user:`Alonso Silva Allende <alonsosilvaallende>`.
- |Fix| Avoid overflow in :func:`metrics.adjusted_rand_score` with
large amounts of data. :pr:`20312` by :user:`Divyanshu Deoli
<divyanshudeoli>`.
- |API| :class:`metrics.ConfusionMatrixDisplay` exposes two class methods
:func:`~metrics.ConfusionMatrixDisplay.from_estimator` and
:func:`~metrics.ConfusionMatrixDisplay.from_predictions` that create
a confusion matrix plot from an estimator or from predictions.
`metrics.plot_confusion_matrix` is deprecated in favor of these two
class methods and will be removed in 1.2.
:pr:`18543` by `Guillaume Lemaitre`_.
- |API| :class:`metrics.PrecisionRecallDisplay` exposes two class methods
:func:`~metrics.PrecisionRecallDisplay.from_estimator` and
:func:`~metrics.PrecisionRecallDisplay.from_predictions` that create
a precision-recall curve from an estimator or from predictions.
`metrics.plot_precision_recall_curve` is deprecated in favor of these
two class methods and will be removed in 1.2.
:pr:`20552` by `Guillaume Lemaitre`_.
- |API| :class:`metrics.DetCurveDisplay` exposes two class methods
:func:`~metrics.DetCurveDisplay.from_estimator` and
:func:`~metrics.DetCurveDisplay.from_predictions` that create
a DET curve plot from an estimator or from predictions.
`metrics.plot_det_curve` is deprecated in favor of these two
class methods and will be removed in 1.2.
:pr:`19278` by `Guillaume Lemaitre`_.
:mod:`sklearn.mixture`
......................
- |Fix| Ensure that the best parameters are set appropriately
in the case of divergence for :class:`mixture.GaussianMixture` and
:class:`mixture.BayesianGaussianMixture`.
:pr:`20030` by :user:`Tingshan Liu <tliu68>` and
:user:`Benjamin Pedigo <bdpedigo>`.
:mod:`sklearn.model_selection`
..............................
- |Feature| Added :class:`model_selection.StratifiedGroupKFold`, which combines
:class:`model_selection.StratifiedKFold` and
:class:`model_selection.GroupKFold`, providing an ability to split data
preserving the distribution of classes in each split while keeping each
group within a single split.
:pr:`18649` by :user:`Leandro Hermida <hermidalc>` and
:user:`Rodion Martynov <marrodion>`.
- |Enhancement| Warn only once in the main process for per-split fit failures
in cross-validation. :pr:`20619` by :user:`Loïc Estève <lesteve>`.
- |Enhancement| The `model_selection.BaseShuffleSplit` base class is
now public. :pr:`20056` by :user:`pabloduque0`.
- |Fix| Avoid premature overflow in :func:`model_selection.train_test_split`.
:pr:`20904` by :user:`Tomasz Jakubek <t-jakubek>`.
:mod:`sklearn.naive_bayes`
..........................
- |Fix| The `fit` and `partial_fit` methods of the discrete naive Bayes
classifiers (:class:`naive_bayes.BernoulliNB`,
:class:`naive_bayes.CategoricalNB`, :class:`naive_bayes.ComplementNB`,
and :class:`naive_bayes.MultinomialNB`) now correctly handle the degenerate
case of a single class in the training set.
:pr:`18925` by :user:`David Poznik <dpoznik>`.
- |API| The attribute ``sigma_`` is now deprecated in
:class:`naive_bayes.GaussianNB` and will be removed in 1.2.
Use ``var_`` instead.
:pr:`18842` by :user:`Hong Shao Yang <hongshaoyang>`.
:mod:`sklearn.neighbors`
........................
- |Enhancement| The worst-case time complexity of building
:class:`neighbors.KDTree` and :class:`neighbors.BallTree` has been improved
from :math:`\mathcal{O}(n^2)` to :math:`\mathcal{O}(n)`.
:pr:`19473` by :user:`jiefangxuanyan <jiefangxuanyan>` and
:user:`Julien Jerphanion <jjerphan>`.
- |Fix| `neighbors.DistanceMetric` subclasses now support readonly
memory-mapped datasets. :pr:`19883` by :user:`Julien Jerphanion <jjerphan>`.
- |Fix| :class:`neighbors.NearestNeighbors`, :class:`neighbors.KNeighborsClassifier`,
:class:`neighbors.RadiusNeighborsClassifier`, :class:`neighbors.KNeighborsRegressor`
and :class:`neighbors.RadiusNeighborsRegressor` no longer validate `weights` in
`__init__`; `weights` is validated in `fit` instead. :pr:`20072` by
:user:`Juan Carlos Alfaro Jiménez <alfaro96>`.
- |API| The parameter `kwargs` of :class:`neighbors.RadiusNeighborsClassifier` is
deprecated and will be removed in 1.2.
:pr:`20842` by :user:`Juan Martín Loyola <jmloyola>`.
:mod:`sklearn.neural_network`
.............................
- |Fix| :class:`neural_network.MLPClassifier` and
:class:`neural_network.MLPRegressor` now correctly support continued training
when loading from a pickled file. :pr:`19631` by `Thomas Fan`_.
:mod:`sklearn.pipeline`
.......................
- |API| The `predict_proba` and `predict_log_proba` methods of the
:class:`pipeline.Pipeline` now support passing prediction kwargs to the final
estimator. :pr:`19790` by :user:`Christopher Flynn <crflynn>`.
:mod:`sklearn.preprocessing`
............................
- |Feature| The new :class:`preprocessing.SplineTransformer` is a feature
preprocessing tool for the generation of B-splines, parametrized by the
polynomial ``degree`` of the splines, number of knots ``n_knots`` and knot
positioning strategy ``knots``.
:pr:`18368` by :user:`Christian Lorentzen <lorentzenchr>`.
:class:`preprocessing.SplineTransformer` also supports periodic
splines via the ``extrapolation`` argument.
:pr:`19483` by :user:`Malte Londschien <mlondschien>`.
:class:`preprocessing.SplineTransformer` supports sample weights for
knot position strategy ``"quantile"``.
:pr:`20526` by :user:`Malte Londschien <mlondschien>`.
- |Feature| :class:`preprocessing.OrdinalEncoder` supports passing through
missing values by default. :pr:`19069` by `Thomas Fan`_.
- |Feature| :class:`preprocessing.OneHotEncoder` now supports
`handle_unknown='ignore'` and dropping categories. :pr:`19041` by
`Thomas Fan`_.
- |Feature| :class:`preprocessing.PolynomialFeatures` now supports passing
a tuple to `degree`, i.e. `degree=(min_degree, max_degree)`.
:pr:`20250` by :user:`Christian Lorentzen <lorentzenchr>`.
- |Efficiency| :class:`preprocessing.StandardScaler` is faster and more memory
efficient. :pr:`20652` by `Thomas Fan`_.
- |Efficiency| Changed ``algorithm`` argument for :class:`cluster.KMeans` in
:class:`preprocessing.KBinsDiscretizer` from ``auto`` to ``full``.
:pr:`19934` by :user:`Gleb Levitskiy <GLevV>`.
- |Efficiency| The implementation of `fit` for
:class:`preprocessing.PolynomialFeatures` transformer is now faster. This is
especially noticeable on large sparse input. :pr:`19734` by :user:`Fred
Robinson <frrad>`.
- |Fix| The :func:`preprocessing.StandardScaler.inverse_transform` method
now raises an error when the input data is 1D. :pr:`19752` by :user:`Zhehao Liu
<Max1993Liu>`.
- |Fix| :func:`preprocessing.scale`, :class:`preprocessing.StandardScaler`
and similar scalers detect near-constant features to avoid scaling them to
very large values. This problem happens in particular when using a scaler on
sparse data with a constant column with sample weights, in which case
centering is typically disabled. :pr:`19527` by :user:`Olivier Grisel
<ogrisel>` and :user:`Maria Telenczuk <maikia>` and :pr:`19788` by
:user:`Jérémie du Boisberranger <jeremiedbb>`.
- |Fix| :meth:`preprocessing.StandardScaler.inverse_transform` now
correctly handles integer dtypes. :pr:`19356` by :user:`makoeppel`.
- |Fix| :meth:`preprocessing.OrdinalEncoder.inverse_transform` does not
support sparse matrices and now raises the appropriate error message.
:pr:`19879` by :user:`Guillaume Lemaitre <glemaitre>`.
- |Fix| The `fit` method of :class:`preprocessing.OrdinalEncoder` will not
raise an error when `handle_unknown='ignore'` and unknown categories are given
to `fit`.
:pr:`19906` by :user:`Zhehao Liu <MaxwellLZH>`.
- |Fix| Fix a regression in :class:`preprocessing.OrdinalEncoder` where large
Python numbers would raise an error due to overflow when cast to a C type
(`np.float64` or `np.int64`).
:pr:`20727` by `Guillaume Lemaitre`_.
- |Fix| :class:`preprocessing.FunctionTransformer` does not set `n_features_in_`
based on the input to `inverse_transform`. :pr:`20961` by `Thomas Fan`_.
- |API| The `n_input_features_` attribute of
:class:`preprocessing.PolynomialFeatures` is deprecated in favor of
`n_features_in_` and will be removed in 1.2. :pr:`20240` by
:user:`Jérémie du Boisberranger <jeremiedbb>`.
:mod:`sklearn.svm`
...................
- |API| The parameter `**params` of :func:`svm.OneClassSVM.fit` is
deprecated and will be removed in 1.2.
:pr:`20843` by :user:`Juan Martín Loyola <jmloyola>`.
:mod:`sklearn.tree`
...................
- |Enhancement| Add `fontname` argument in :func:`tree.export_graphviz`
for non-English characters. :pr:`18959` by :user:`Zero <Zeroto521>`
and :user:`wstates <wstates>`.
- |Fix| Improves compatibility of :func:`tree.plot_tree` with high DPI screens.
:pr:`20023` by `Thomas Fan`_.
- |Fix| Fixed a bug in :class:`tree.DecisionTreeClassifier`,
:class:`tree.DecisionTreeRegressor` where a node could be split whereas it
should not have been due to incorrect handling of rounding errors.
:pr:`19336` by :user:`Jérémie du Boisberranger <jeremiedbb>`.
- |API| The `n_features_` attribute of :class:`tree.DecisionTreeClassifier`,
:class:`tree.DecisionTreeRegressor`, :class:`tree.ExtraTreeClassifier` and
:class:`tree.ExtraTreeRegressor` is deprecated in favor of `n_features_in_`
and will be removed in 1.2. :pr:`20272` by
:user:`Jérémie du Boisberranger <jeremiedbb>`.
:mod:`sklearn.utils`
....................
- |Enhancement| The default value `random_state=0` of
:func:`~sklearn.utils.extmath.randomized_svd` is deprecated. Starting in 1.2,
the default value of `random_state` will be set to `None`.
:pr:`19459` by :user:`Cindy Bezuidenhout <cinbez>` and
:user:`Clifford Akai-Nettey <cliffordEmmanuel>`.
- |Enhancement| Added helper decorator :func:`utils.metaestimators.available_if`
to provide flexibility in metaestimators making methods available or
unavailable on the basis of state, in a more readable way.
:pr:`19948` by `Joel Nothman`_.
- |Enhancement| :func:`utils.validation.check_is_fitted` now uses
``__sklearn_is_fitted__`` if available, instead of checking for attributes
ending with an underscore. This also makes :class:`pipeline.Pipeline` and
:class:`preprocessing.FunctionTransformer` pass
``check_is_fitted(estimator)``. :pr:`20657` by `Adrin Jalali`_.
- |Fix| Fixed a bug in :func:`utils.sparsefuncs.mean_variance_axis` where the
precision of the computed variance was very poor when the real variance is
exactly zero. :pr:`19766` by :user:`Jérémie du Boisberranger <jeremiedbb>`.
- |Fix| The docstrings of properties that are decorated with
:func:`utils.deprecated` are now properly wrapped. :pr:`20385` by `Thomas
Fan`_.
- |Fix| `utils.stats._weighted_percentile` now correctly ignores
zero-weighted observations smaller than the smallest observation with
positive weight for ``percentile=0``. Affected classes are
:class:`dummy.DummyRegressor` for ``quantile=0`` and
`ensemble.HuberLossFunction`
for ``alpha=0``. :pr:`20528` by :user:`Malte Londschien <mlondschien>`.
- |Fix| :func:`utils._safe_indexing` explicitly takes a dataframe copy when
integer indices are provided, which avoids a warning from pandas. This
warning was previously raised in resampling utilities and functions using
those utilities (e.g. :func:`model_selection.train_test_split`,
:func:`model_selection.cross_validate`,
:func:`model_selection.cross_val_score`,
:func:`model_selection.cross_val_predict`).
:pr:`20673` by :user:`Joris Van den Bossche <jorisvandenbossche>`.
- |Fix| Fix a regression in `utils.is_scalar_nan` where large Python
numbers would raise an error due to overflow in C types (`np.float64` or
`np.int64`).
:pr:`20727` by `Guillaume Lemaitre`_.
- |Fix| Support for `np.matrix` is deprecated in
:func:`~sklearn.utils.check_array` in 1.0 and will raise a `TypeError` in
1.2. :pr:`20165` by `Thomas Fan`_.
- |API| `utils._testing.assert_warns` and `utils._testing.assert_warns_message`
are deprecated in 1.0 and will be removed in 1.2. Use the `pytest.warns` context
manager instead. Note that these functions were not documented and were not
part of the public API. :pr:`20521` by :user:`Olivier Grisel <ogrisel>`.
- |API| Fixed several bugs in `utils.graph.graph_shortest_path`, which is
now deprecated. Use `scipy.sparse.csgraph.shortest_path` instead. :pr:`20531`
by `Tom Dupre la Tour`_.
.. rubric:: Code and documentation contributors
Thanks to everyone who has contributed to the maintenance and improvement of
the project since version 0.24, including:
Abdulelah S. Al Mesfer, Abhinav Gupta, Adam J. Stewart, Adam Li, Adam Midvidy,
Adrian Garcia Badaracco, Adrian Sadłocha, Adrin Jalali, Agamemnon Krasoulis,
Alberto Rubiales, Albert Thomas, Albert Villanova del Moral, Alek Lefebvre,
Alessia Marcolini, Alexandr Fonari, Alihan Zihna, Aline Ribeiro de Almeida,
Amanda, Amanda Dsouza, Amol Deshmukh, Ana Pessoa, Anavelyz, Andreas Mueller,
Andrew Delong, Ashish, Ashvith Shetty, Atsushi Nukariya, Aurélien Geron, Avi
Gupta, Ayush Singh, baam, BaptBillard, Benjamin Pedigo, Bertrand Thirion,
Bharat Raghunathan, bmalezieux, Brian Rice, Brian Sun, Bruno Charron, Bryan
Chen, bumblebee, caherrera-meli, Carsten Allefeld, CeeThinwa, Chiara Marmo,
chrissobel, Christian Lorentzen, Christopher Yeh, Chuliang Xiao, Clément
Fauchereau, cliffordEmmanuel, Conner Shen, Connor Tann, David Dale, David Katz,
David Poznik, Dimitri Papadopoulos Orfanos, Divyanshu Deoli, dmallia17,
Dmitry Kobak, DS_anas, Eduardo Jardim, EdwinWenink, EL-ATEIF Sara, Eleni
Markou, EricEllwanger, Eric Fiegel, Erich Schubert, Ezri-Mudde, Fatos Morina,
Felipe Rodrigues, Felix Hafner, Fenil Suchak, flyingdutchman23, Flynn, Fortune
Uwha, Francois Berenger, Frankie Robertson, Frans Larsson, Frederick Robinson,
frellwan, Gabriel S Vicente, Gael Varoquaux, genvalen, Geoffrey Thomas,
geroldcsendes, Gleb Levitskiy, Glen, Glòria Macià Muñoz, gregorystrubel,
groceryheist, Guillaume Lemaitre, guiweber, Haidar Almubarak, Hans Moritz
Günther, Haoyin Xu, Harris Mirza, Harry Wei, Harutaka Kawamura, Hassan
Alsawadi, Helder Geovane Gomes de Lima, Hugo DEFOIS, Igor Ilic, Ikko Ashimine,
Isaack Mungui, Ishaan Bhat, Ishan Mishra, Iván Pulido, iwhalvic, J Alexander,
Jack Liu, James Alan Preiss, James Budarz, James Lamb, Jannik, Jeff Zhao,
Jennifer Maldonado, Jérémie du Boisberranger, Jesse Lima, Jianzhu Guo, jnboehm,
Joel Nothman, JohanWork, John Paton, Jonathan Schneider, Jon Crall, Jon Haitz
Legarreta Gorroño, Joris Van den Bossche, José Manuel Nápoles Duarte, Juan
Carlos Alfaro Jiménez, Juan Martin Loyola, Julien Jerphanion, Julio Batista
Silva, julyrashchenko, JVM, Kadatatlu Kishore, Karen Palacio, Kei Ishikawa,
kmatt10, kobaski, Kot271828, Kunj, KurumeYuta, kxytim, lacrosse91, LalliAcqua,
Laveen Bagai, Leonardo Rocco, Leonardo Uieda, Leopoldo Corona, Loic Esteve,
LSturtew, Luca Bittarello, Luccas Quadros, Lucy Jiménez, Lucy Liu, ly648499246,
Mabu Manaileng, Manimaran, makoeppel, Marco Gorelli, Maren Westermann,
Mariangela, Maria Telenczuk, marielaraj, Martin Hirzel, Mateo Noreña, Mathieu
Blondel, Mathis Batoul, mathurinm, Matthew Calcote, Maxime Prieur, Maxwell,
Mehdi Hamoumi, Mehmet Ali Özer, Miao Cai, Michal Karbownik, michalkrawczyk,
Mitzi, mlondschien, Mohamed Haseeb, Mohamed Khoualed, Muhammad Jarir Kanji,
murata-yu, Nadim Kawwa, Nanshan Li, naozin555, Nate Parsons, Neal Fultz, Nic
Annau, Nicolas Hug, Nicolas Miller, Nico Stefani, Nigel Bosch, Nikita Titov,
Nodar Okroshiashvili, Norbert Preining, novaya, Ogbonna Chibuike Stephen,
OGordon100, Oliver Pfaffel, Olivier Grisel, Oras Phongpanangam, Pablo Duque,
Pablo Ibieta-Jimenez, Patric Lacouth, Paulo S. Costa, Paweł Olszewski, Peter
Dye, PierreAttard, Pierre-Yves Le Borgne, PranayAnchuri, Prince Canuma,
putschblos, qdeffense, RamyaNP, ranjanikrishnan, Ray Bell, Rene Jean Corneille,
Reshama Shaikh, ricardojnf, RichardScottOZ, Rodion Martynov, Rohan Paul, Roman
Lutz, Roman Yurchak, Samuel Brice, Sandy Khosasi, Sean Benhur J, Sebastian
Flores, Sebastian Pölsterl, Shao Yang Hong, shinehide, shinnar, shivamgargsya,
Shooter23, Shuhei Kayawari, Shyam Desai, simonamaggio, Sina Tootoonian,
solosilence, Steven Kolawole, Steve Stagg, Surya Prakash, swpease, Sylvain
Marié, Takeshi Oura, Terence Honles, TFiFiE, Thomas A Caswell, Thomas J. Fan,
Tim Gates, TimotheeMathieu, Timothy Wolodzko, Tim Vink, t-jakubek, t-kusanagi,
tliu68, Tobias Uhmann, tom1092, Tomás Moreyra, Tomás Ronald Hughes, Tom
Dupré la Tour, Tommaso Di Noto, Tomohiro Endo, TONY GEORGE, Toshihiro NAKAE,
tsuga, Uttam kumar, vadim-ushtanit, Vangelis Gkiastas, Venkatachalam N, Vilém
Zouhar, Vinicius Rios Fuck, Vlasovets, waijean, Whidou, xavier dupré,
xiaoyuchai, Yasmeen Alsaedy, yoch, Yosuke KOBAYASHI, Yu Feng, YusukeNagasaka,
yzhenman, Zero, ZeyuSun, ZhaoweiWang, Zito, Zito Relova
|