File: plot_pipeline_classification.py

package info (click to toggle)
imbalanced-learn 0.12.4-1
  • links: PTS, VCS
  • area: main
  • in suites: sid, trixie
  • size: 2,160 kB
  • sloc: python: 17,221; sh: 481; makefile: 187; javascript: 50
file content (75 lines) | stat: -rw-r--r-- 2,006 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
"""
====================================
Usage of pipeline embedding samplers
====================================

An example of the :class:~imblearn.pipeline.Pipeline` object (or
:func:`~imblearn.pipeline.make_pipeline` helper function) working with
transformers and resamplers.
"""

# Authors: Christos Aridas
#          Guillaume Lemaitre <g.lemaitre58@gmail.com>
# License: MIT

# %%
print(__doc__)

# %% [markdown]
# Let's first create an imbalanced dataset and split in to two sets.

# %%
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_classes=2,
    class_sep=1.25,
    weights=[0.3, 0.7],
    n_informative=3,
    n_redundant=1,
    flip_y=0,
    n_features=5,
    n_clusters_per_class=1,
    n_samples=5000,
    random_state=10,
)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# %% [markdown]
# Now, we will create each individual steps that we would like later to combine

# %%
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import EditedNearestNeighbours

pca = PCA(n_components=2)
enn = EditedNearestNeighbours()
smote = SMOTE(random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)

# %% [markdown]
# Now, we can finally create a pipeline to specify in which order the different
# transformers and samplers should be executed before to provide the data to
# the final classifier.

# %%
from imblearn.pipeline import make_pipeline

model = make_pipeline(pca, enn, smote, knn)

# %% [markdown]
# We can now use the pipeline created as a normal classifier where resampling
# will happen when calling `fit` and disabled when calling `decision_function`,
# `predict_proba`, or `predict`.

# %%
from sklearn.metrics import classification_report

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))