1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112
|
"""
Base class for the under-sampling method.
"""
# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>
# License: MIT
import numbers
from collections.abc import Mapping
from ..base import BaseSampler
from ..utils._param_validation import Interval, StrOptions
class BaseUnderSampler(BaseSampler):
"""Base class for under-sampling algorithms.
Warning: This class should not be used directly. Use the derive classes
instead.
"""
_sampling_type = "under-sampling"
_sampling_strategy_docstring = """sampling_strategy : float, str, dict, callable, default='auto'
Sampling information to sample the data set.
- When ``float``, it corresponds to the desired ratio of the number of
samples in the minority class over the number of samples in the
majority class after resampling. Therefore, the ratio is expressed as
:math:`\\alpha_{us} = N_{m} / N_{rM}` where :math:`N_{m}` is the
number of samples in the minority class and
:math:`N_{rM}` is the number of samples in the majority class
after resampling.
.. warning::
``float`` is only available for **binary** classification. An
error is raised for multi-class classification.
- When ``str``, specify the class targeted by the resampling. The
number of samples in the different classes will be equalized.
Possible choices are:
``'majority'``: resample only the majority class;
``'not minority'``: resample all classes but the minority class;
``'not majority'``: resample all classes but the majority class;
``'all'``: resample all classes;
``'auto'``: equivalent to ``'not minority'``.
- When ``dict``, the keys correspond to the targeted classes. The
values correspond to the desired number of samples for each targeted
class.
- When callable, function taking ``y`` and returns a ``dict``. The keys
correspond to the targeted classes. The values correspond to the
desired number of samples for each class.
""".rstrip() # noqa: E501
_parameter_constraints: dict = {
"sampling_strategy": [
Interval(numbers.Real, 0, 1, closed="right"),
StrOptions({"auto", "majority", "not minority", "not majority", "all"}),
Mapping,
callable,
],
}
class BaseCleaningSampler(BaseSampler):
"""Base class for under-sampling algorithms.
Warning: This class should not be used directly. Use the derive classes
instead.
"""
_sampling_type = "clean-sampling"
_sampling_strategy_docstring = """sampling_strategy : str, list or callable
Sampling information to sample the data set.
- When ``str``, specify the class targeted by the resampling. Note the
the number of samples will not be equal in each. Possible choices
are:
``'majority'``: resample only the majority class;
``'not minority'``: resample all classes but the minority class;
``'not majority'``: resample all classes but the majority class;
``'all'``: resample all classes;
``'auto'``: equivalent to ``'not minority'``.
- When ``list``, the list contains the classes targeted by the
resampling.
- When callable, function taking ``y`` and returns a ``dict``. The keys
correspond to the targeted classes. The values correspond to the
desired number of samples for each class.
""".rstrip()
_parameter_constraints: dict = {
"sampling_strategy": [
Interval(numbers.Real, 0, 1, closed="right"),
StrOptions({"auto", "majority", "not minority", "not majority", "all"}),
list,
callable,
],
}
|