# Normalizing labels

mlpack classifiers and other algorithms require labels to be in the range `0` to
`numClasses - 1`.  A vector of labels with arbitrary (`size_t`) values can be
normalized to the required range with the
[`NormalizeLabels()`](#datanormalizelabels) function, and reverted to the
original range with the [`RevertLabels()`](#datarevertlabels) function.

---

## `data::NormalizeLabels()`

 * `data::NormalizeLabels(labelsIn, labelsOut, mappings)`
    - Map the label vector `labelsIn` into the range `0` to `numClasses - 1`,
      storing the result in `labelsOut` (of type `arma::Row<size_t>`).
      * `numClasses` is automatically detected using the number of unique values
        in `labelsIn`. 

    - The column vector `mappings` will be filled with the reverse mappings to
      convert back to the old labels; this can be used by `RevertLabels()`.

    - `mappings[i]` contains the original class label for the mapped label `i`.
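
The relationship between `labelsIn`, `labelsOut`, and `mappings` can be
illustrated with a minimal standard-library sketch. This is not mlpack's
implementation: `NormalizeLabelsSketch()` is a hypothetical helper, it uses
`std::vector` instead of Armadillo types, and it assumes new labels are assigned
in order of first appearance, which matches the example output shown in this
document.

```c++
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of mlpack): relabel `labelsIn` into the range
// 0 to numClasses - 1, numbering classes in order of first appearance
// (an assumption made for this sketch).
void NormalizeLabelsSketch(const std::vector<std::size_t>& labelsIn,
                           std::vector<std::size_t>& labelsOut,
                           std::vector<std::size_t>& mappings)
{
  labelsOut.clear();
  mappings.clear();
  labelsOut.reserve(labelsIn.size());

  // Maps each original label to the new label assigned to it so far.
  std::unordered_map<std::size_t, std::size_t> seen;
  for (const std::size_t label : labelsIn)
  {
    auto it = seen.find(label);
    if (it == seen.end())
    {
      // First time this label appears: it becomes the next class index.
      it = seen.emplace(label, mappings.size()).first;
      mappings.push_back(label);
    }
    labelsOut.push_back(it->second);
  }

  // Invariant: mappings[i] is the original label for mapped label i.
  for (std::size_t i = 0; i < labelsIn.size(); ++i)
    assert(mappings[labelsOut[i]] == labelsIn[i]);
}
```

Under these assumptions, the input `{ 3, 7, 3, 3, 5 }` yields mapped labels
`{ 0, 1, 0, 0, 2 }` and mappings `{ 3, 7, 5 }`, so `numClasses` is
`mappings.size()`, here `3`.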

---

## `data::RevertLabels()`

 * `data::RevertLabels(labelsIn, mappings, labelsOut)`
    - Unmap normalized labels `labelsIn` using `mappings` into `labelsOut`.

    - Performs the reverse operation of `NormalizeLabels()`; `mappings` should
      be the same vector output by `NormalizeLabels()`.
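
Conceptually, reverting is a lookup of each mapped label in `mappings`. The
following is a minimal standard-library sketch of that idea, not mlpack's
implementation: `RevertLabelsSketch()` is a hypothetical helper that uses
`std::vector` instead of Armadillo types, and the bounds check is an addition
for this sketch.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mlpack): undo a label normalization by
// looking up each mapped label in `mappings`.
std::vector<std::size_t> RevertLabelsSketch(
    const std::vector<std::size_t>& labelsIn,
    const std::vector<std::size_t>& mappings)
{
  std::vector<std::size_t> labelsOut;
  labelsOut.reserve(labelsIn.size());
  for (const std::size_t label : labelsIn)
  {
    // A mapped label must be a valid index into `mappings`.
    assert(label < mappings.size());
    labelsOut.push_back(mappings[label]);
  }
  return labelsOut;
}
```

With mappings `{ 3, 7, 5 }`, the mapped labels `{ 0, 1, 0, 0, 2 }` revert to
`{ 3, 7, 3, 3, 5 }`.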

---

## Example

Map arbitrary labels into the range `0` to `2`, train a model on the mapped
labels, then convert the model's predictions back to the original label values.

```c++
#include <mlpack.hpp>

#include <iostream>

int main()
{
  // Create a random dataset with 5 points in 10 dimensions.
  arma::mat dataset(10, 5, arma::fill::randu);

  // Manually assemble labels vector: [3, 7, 3, 3, 5]
  arma::Row<size_t> labels = { 3, 7, 3, 3, 5 };

  // Note that these labels are not in the range `0` to `2`, and thus cannot be
  // used directly by mlpack classifiers!
  // We will map them to that range using NormalizeLabels().
  arma::Row<size_t> mappedLabels;
  arma::Col<size_t> mappings;
  mlpack::data::NormalizeLabels(labels, mappedLabels, mappings);
  const size_t numClasses = mappedLabels.max() + 1;

  // Print the mapped values:
  // [3, 7, 3, 3, 5] maps to [0, 1, 0, 0, 2].
  // The `mappings` vector will be [3, 7, 5].
  std::cout << "Original labels: " << labels;
  std::cout << "Mapped labels:   " << mappedLabels;
  std::cout << std::endl;
  std::cout << "Mappings: " << mappings.t();
  std::cout << std::endl << std::endl;

  // Learn a model with the mapped labels.
  mlpack::DecisionTree d(dataset, mappedLabels, numClasses, 1 /* leaf size */);

  // Make predictions on the training dataset.
  arma::Row<size_t> mappedPredictions;
  d.Classify(dataset, mappedPredictions);

  // The predictions use mapped labels (0, 1, 2), which we will need to map
  // back to the original labels using RevertLabels().
  arma::Row<size_t> predictions;
  mlpack::data::RevertLabels(mappedPredictions, mappings, predictions);

  // Print the predictions before and after unmapping.
  // The mapped predictions will take values 0, 1, or 2; the predictions will
  // take values 3, 7, or 5 (like the original data).
  std::cout << "Mapped predictions: " << mappedPredictions;
  std::cout << "Predictions:        " << predictions;

  return 0;
}
```