File: amf.md

package info (click to toggle)
mlpack 4.6.2-1
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 31,272 kB
  • sloc: cpp: 226,039; python: 1,934; sh: 1,198; lisp: 414; makefile: 85
file content (185 lines) | stat: -rw-r--r-- 6,966 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
# Alternating Matrix Factorization tutorial

Alternating matrix factorization decomposes a matrix `V` in the form `V ~ WH`
where `W` is called the basis matrix and `H` is called the encoding matrix. `V`
is taken to be of size `n x m` and the obtained `W` is `n x r` and `H` is `r x
m`. The size `r` is called the *rank* of the factorization. Factorization is
done by alternately calculating `W` and `H` respectively while holding the other
matrix constant.

mlpack provides a simple C++ interface to perform Alternating Matrix
Factorization.

## The `AMF` class

The `AMF` class is templatized with 3 parameters; the first contains the policy
used to determine when the algorithm has converged; the second contains the
initialization rule for the `W` and `H` matrix; the last contains the update
rule to be used during each iteration. This templatization allows the user to
try various update rules, initialization rules, and termination policies
(including ones not supplied with mlpack) for factorization.

The class provides the following method that performs factorization

```c++
template<typename MatType> double Apply(const MatType& V,
                                        const size_t r,
                                        arma::mat& W,
                                        arma::mat& H);
```

## Using different termination policies

The `AMF` implementation comes with different termination policies to support
many implemented algorithms. Every termination policy implements the following
method which returns the status of convergence.

```c++
bool IsConverged(arma::mat& W, arma::mat& H)
```

Below is a list of all the termination policies that mlpack contains.

 - `SimpleResidueTermination`
 - `SimpleToleranceTermination`
 - `ValidationRMSETermination`

In `SimpleResidueTermination`, the termination decision depends on two factors,
value of residue and number of iteration. If the current value of residue drops
below the threshold or the number of iterations goes beyond the threshold,
positive termination signal is passed to AMF.

In `SimpleToleranceTermination`, termination criterion is met when the increase
in residue value drops below the given tolerance. To accommodate spikes, certain
number of successive residue drops are accepted. Secondary termination criterion
terminates algorithm when iteration count goes beyond the threshold.

`ValidationRMSETermination` divides the data into 2 sets, training set and
validation set. Entries of the validation set are nullifed in the input matrix.
Termination criterion is met when increase in validation set RMSe value drops
below the given tolerance. To accommodate spikes certain number of successive
validation RMSE drops are accepted. This upper imit on successive drops can be
adjusted with `reverseStepCount`. A secondary termination criterion terminates
the algorithm when the iteration count goes above the threshold. Though this
termination policy is better measure of convergence than the above 2 termination
policies, it may cause a decrease in performance since it is computationally
expensive.

On the other hand, `CompleteIncrementalTermination` and
`IncompleteIncrementalTermination` are just wrapper classes for other
termination policies. These policies are used when AMF is applied with
`SVDCompleteIncrementalLearning` and `SVDIncompleteIncrementalLearning`,
respectively.

## Using different initialization policies

mlpack currently has 2 initialization policies implemented for AMF:

 - `RandomInitialization`
 - `RandomAcolInitialization`

`RandomInitialization` initializes matrices `W` and `H` with random uniform
distribution while `RandomAcolInitialization` initializes the `W` matrix by
averaging p randomly chosen columns of `V`.  In the case of
`RandomAcolInitialization`, `p` is a template parameter.

To implement their own initialization policy, users need to define the following
function in their class.

```c++
template<typename MatType>
inline static void Initialize(const MatType& V,
                              const size_t r,
                              arma::mat& W,
                              arma::mat& H)
```

## Using different update rules

mlpack implements the following update rules for the AMF class:

 - `NMFALSUpdate`
 - `NMFMultiplicativeDistanceUpdate`
 - `NMFMultiplicativeDivergenceUpdate`
 - `SVDBatchLearning`
 - `SVDIncompleteIncrementalLearning`
 - `SVDCompleteIncrementalLearning`

Non-Negative Matrix factorization can be achieved with `NMFALSUpdate`,
`NMFMultiplicativeDivergenceUpdate` or `NMFMultiplicativeDivergenceUpdate`.
`NMFALSUpdate` implements a simple Alternating Least Squares optimization while
the other rules implement algorithms given in the paper 'Algorithms for
Non-negative Matrix Factorization'.

The remaining update rules perform the singular value decomposition of the
matrix `V`.  This SVD factorization is optimized for use by mlpack's
collaborative filtering code (see the [collaborative filtering
tutorial](cf.md)). This use of SVD factorizers for collaborative filtering is
described in the paper 'A Guide to Singular Value Decomposition for
Collaborative Filtering' by Chih-Chao Ma. For further details about the
algorithms refer to the respective class documentation.

## Using Non-Negative Matrix Factorization with `AMF`

The use of `AMF` for Non-Negative Matrix factorization is simple. The AMF module
defines `NMFALSFactorizer` which can be used directly without knowing the
internal structure of `AMF`. For example:

```c++
#include <mlpack.hpp>

using namespace std;
using namespace arma;
using namespace mlpack;

int main()
{
  NMFALSFactorizer nmf;
  mat W, H;
  mat V = randu<mat>(100, 100);
  size_t r = 10;
  double residue = nmf.Apply(V, r, W, H);
}
```

`NMFALSFactorizer` uses `SimpleResidueTermination`, which is most preferred with
Non-Negative Matrix factorizers.  The initialization of `W` and `H` in
`NMFALSFactorizer` is random. The `Apply()` function returns the residue
obtained by comparing the constructed matrix `W * H` with the original matrix
`V`.

## Using Singular Value Decomposition with `AMF`

mlpack has the following SVD factorizers implemented for AMF:

 - `SVDBatchFactorizer`
 - `SVDIncompleteIncrementalFactorizer`
 - `SVDCompleteIncrementalFactorizer`

Each of these factorizers takes a template parameter `MatType`, which specifies
the type of the matrix `V` (dense or sparse---these have types `arma::mat` and
`arma::sp_mat`, respectively).  When the matrix to be factorized is relatively
sparse, specifying `MatType = arma::sp_mat` can provide a runtime boost.

```c++
#include <mlpack.hpp>

using namespace std;
using namespace arma;
using namespace mlpack;

int main()
{
  sp_mat V = sprandu<sp_mat>(100,100,0.1);
  size_t r = 10;
  mat W, H;

  SVDBatchFactorizer<sp_mat> svd;
  double residue = svd.Apply(V, r, W, H);
}
```

## Further documentation

For further documentation on the `AMF` class, consult the `AMF`
source code comments.