## `LARS`
The `LARS` class implements the least-angle regression (LARS) algorithm for
L1-penalized and L2-penalized linear regression. `LARS` can also solve the
LASSO (least absolute shrinkage and selection operator) problem. The LARS
algorithm is a *path* algorithm, and thus will recover solutions for *all* L1
penalty parameters greater than or equal to the given L1 penalty parameter.
#### Simple usage example:
```c++
// Train a LARS model on random numeric data and make predictions.
// All data and responses are uniform random; this uses 10 dimensional data.
// Replace with a Load() call or similar for a real application.
arma::mat dataset(10, 1000, arma::fill::randu); // 1000 points.
arma::rowvec responses = arma::randn<arma::rowvec>(1000);
arma::mat testDataset(10, 500, arma::fill::randu); // 500 test points.
mlpack::LARS lars(true, 0.1 /* L1 penalty */); // Step 1: create model.
lars.Train(dataset, responses); // Step 2: train model.
arma::rowvec predictions;
lars.Predict(testDataset, predictions); // Step 3: use model to predict.
// Print some information about the test predictions.
std::cout << arma::accu(predictions > 0.7) << " test points predicted to have"
    << " responses greater than 0.7." << std::endl;
std::cout << arma::accu(predictions < 0) << " test points predicted to have "
    << "negative responses." << std::endl;
```
<p style="text-align: center; font-size: 85%"><a href="#simple-examples">More examples...</a></p>
#### Quick links:
* [Constructors](#constructors): create `LARS` objects.
* [`Train()`](#training): train model.
* [`Predict()`](#prediction): predict with a trained model.
* [Other functionality](#other-functionality) for loading, saving, and
inspecting.
* [The LARS path](#the-lars-path): use models from the LARS path with different
L1 penalty values.
* [Examples](#simple-examples) of simple usage and links to detailed example
projects.
* [Template parameters](#advanced-functionality-different-element-types) for
using different element types for a model.
#### See also:
* [`LinearRegression`](linear_regression.md)
* [mlpack regression techniques](../modeling.md#regression)
* [Least-angle Regression on Wikipedia](https://en.wikipedia.org/wiki/Least-angle_regression)
### Constructors
* `lars = LARS(useCholesky=false, lambda1=0.0, lambda2=0.0, tolerance=1e-16, fitIntercept=true, normalizeData=true)`
- Initialize the model without training.
- You will need to call [`Train()`](#training) later to train the model
before calling [`Predict()`](#prediction).
---
* `lars = LARS(data, responses, colMajor=true, useCholesky=false, lambda1=0.0, lambda2=0.0, tolerance=1e-16, fitIntercept=true, normalizeData=true)`
- Train model on the given data and responses, using the given settings for
hyperparameters.
---
* `lars = LARS(data, responses, colMajor, useCholesky, gramMatrix, lambda1=0.0, lambda2=0.0, tolerance=1e-16, fitIntercept=true, normalizeData=true)`
- *(Advanced constructor)*.
- Train model on the given data and responses, using a precomputed Gram
matrix (`gramMatrix`, equivalent to `data * data.t()`).
- Using a precomputed Gram matrix can save time, if it has already been
computed.
- ***Note:*** any precomputed Gram matrix must also match the settings of
`fitIntercept` and `normalizeData`; so, if both are `true`, then
`gramMatrix` must be computed on mean-centered data whose features are
normalized to have unit variance. In addition, if `lambda2 > 0`, then
it is expected that `lambda2` is added to each element on the diagonal of
`gramMatrix`.
---
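To make the Gram matrix requirements above concrete, here is a minimal sketch
(on random data, following the note above rather than any real workflow) of
computing a Gram matrix compatible with `fitIntercept = true`,
`normalizeData = true`, and a nonzero `lambda2`:
```c++
// Minimal sketch: build a Gram matrix that matches fitIntercept = true and
// normalizeData = true, with lambda2 added to the diagonal.
arma::mat data(10, 1000, arma::fill::randu);
arma::rowvec responses = arma::randn<arma::rowvec>(1000);
const double lambda2 = 0.1;

// Mean-center each dimension (dimensions are rows in column-major data).
arma::mat centered = data.each_col() - arma::mean(data, 1);
// Scale each dimension to unit variance.
centered.each_col() /= arma::stddev(centered, 0, 1);

// Compute the Gram matrix and add lambda2 to each diagonal element.
arma::mat gramMatrix = centered * centered.t();
gramMatrix.diag() += lambda2;

// fitIntercept and normalizeData keep their default values of true, matching
// how gramMatrix was computed.
mlpack::LARS lars(data, responses, true, true, gramMatrix, 0.0, lambda2);
```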
#### Constructor Parameters:
| **name** | **type** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `data` | [`arma::mat`](../matrices.md) | Training matrix. | _(N/A)_ |
| `responses` | [`arma::rowvec`](../matrices.md) | Training responses (e.g. values to predict). Should have length `data.n_cols`. | _(N/A)_ |
| `colMajor` | `bool` | Should be set to `true` if `data` is [column-major](../matrices.md#representing-data-in-mlpack). Passing row-major data can avoid a transpose operation. | `true` |
| `useCholesky` | `bool` | If `true`, use the Cholesky decomposition of the Gram matrix to solve linear systems (as opposed to the full Gram matrix). | `false` |
| `gramMatrix` | [`arma::mat`](../matrices.md) | Precomputed Gram matrix of `data` (i.e. `data * data.t()` for column-major data). | _(N/A)_ |
| `lambda1` | `double` | L1 regularization penalty parameter. | `0.0` |
| `lambda2` | `double` | L2 regularization penalty parameter. | `0.0` |
| `tolerance` | `double` | Tolerance on feature correlations for convergence. | `1e-16` |
| `fitIntercept` | `bool` | If `true`, an intercept term will be included in the model. | `true` |
| `normalizeData` | `bool` | If `true`, data will be normalized before fitting the model. | `true` |
As an alternative to passing hyperparameters, each hyperparameter can be set
with a standalone method. The following functions can be used before calling
`Train()` to set hyperparameters:
* `lars.UseCholesky() = useChol;` will set whether or not the Cholesky
decomposition will be used during training to `useChol`.
* `lars.Lambda1() = lambda1;` will set the L1 regularization penalty parameter
to `lambda1`.
* `lars.Lambda2() = lambda2;` will set the L2 regularization penalty parameter
to `lambda2`.
* `lars.Tolerance() = tol;` will set the convergence tolerance to `tol`.
* `lars.FitIntercept(fitIntercept);` will set whether an intercept will be fit
to `fitIntercept`. If an external Gram matrix has been specified, this will
throw an exception.
* `lars.NormalizeData(normalizeData);` will set whether data should be
normalized to `normalizeData`. If an external Gram matrix has been
specified, this will throw an exception.
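For example, a model might be configured with these methods and then trained
(a minimal sketch on random data):
```c++
// Minimal sketch: set hyperparameters with standalone methods, then train.
arma::mat data(5, 200, arma::fill::randu);
arma::rowvec responses = arma::randn<arma::rowvec>(200);

mlpack::LARS lars;
lars.UseCholesky() = true;
lars.Lambda1() = 0.05;
lars.Lambda2() = 0.001;
lars.Tolerance() = 1e-10;
lars.FitIntercept(true);
lars.NormalizeData(true);

lars.Train(data, responses);
```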
***Notes:***
- The `lambda1` parameter implicitly controls the sparsity of the model; for
more sparse models (i.e. fewer nonzero weights), specify a larger `lambda1`.
- Specifying a too-small `lambda1` or `lambda2` value may cause the model to
overfit; setting either too large may cause the model to underfit. Because
LARS is a path algorithm, the [`SelectBeta()`](#the-lars-path) function can be
used to select models with different values of `lambda1` without retraining.
For tuning `lambda2`,
[automatic hyperparameter tuning](../hpt.md) can be used.
<!-- TODO: fix the link to the hyperparameter tuner -->
- It is recommended to leave `fitIntercept` and `normalizeData` set to `true`,
in accordance with the original LARS algorithm. `fitIntercept` can be set to
`false` if the features and responses are already mean-centered, and
`normalizeData` can be set to `false` if the features already have unit
variance. Using `false` for either option provides a small speedup.
- `useCholesky` should generally be set to `true` and in most situations will
result in faster training.
### Training
If training is not done as part of the constructor call, it can be done with the
`Train()` function:
* `lars.Train(data, responses, colMajor=true, useCholesky=false, lambda1=0.0, lambda2=0.0, tolerance=1e-16, fitIntercept=true, normalizeData=true)`
- Train the model on the given data.
---
* `lars.Train(data, responses, colMajor, useCholesky, gramMatrix, lambda1=0.0, lambda2=0.0, tolerance=1e-16, fitIntercept=true, normalizeData=true)`
- *(Advanced training.)*
- Train model on the given data and responses, using a precomputed Gram
matrix (`gramMatrix`, equivalent to `data * data.t()`).
- Using a precomputed Gram matrix can save time, if it has already been
computed.
- ***Note:*** any precomputed Gram matrix must also match the settings of
`fitIntercept` and `normalizeData`; so, if both are `true`, then
`gramMatrix` must be computed on mean-centered data whose features are
normalized to have unit variance. In addition, if `lambda2 > 0`, then
it is expected that `lambda2` is added to each element on the diagonal of
`gramMatrix`.
---
Types of each argument are the same as in the table for constructors
[above](#constructor-parameters).
***Notes:***
* Training is not incremental. A second call to `Train()` will retrain the
model from scratch.
* `Train()` returns the squared error (loss) of the model on the training set
as a `double`. To obtain the MSE, divide by the number of training points.
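For instance, the training-set MSE can be computed directly from the value
returned by `Train()` (a minimal sketch on random data):
```c++
// Minimal sketch: convert the squared error returned by Train() to an MSE.
arma::mat data(10, 500, arma::fill::randu);
arma::rowvec responses = arma::randn<arma::rowvec>(500);

mlpack::LARS lars;
const double loss = lars.Train(data, responses);
std::cout << "Training MSE: " << (loss / data.n_cols) << "." << std::endl;
```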
### Prediction
Once a `LARS` model is trained, the `Predict()` member function
can be used to make predictions for new data.
* `double predictedValue = lars.Predict(point)`
- ***(Single-point)***
- Make a prediction for a single point, returning the predicted value.
---
* `lars.Predict(data, predictions, colMajor=true)`
- ***(Multi-point)***
- Make predictions for a set of points.
- The prediction for data point `i` can be accessed with `predictions[i]`.
---
#### Prediction Parameters:
| **usage** | **name** | **type** | **description** |
|-----------|----------|----------|-----------------|
| _single-point_ | `point` | [`arma::vec`](../matrices.md) | Single point for prediction. |
||||
| _multi-point_ | `data` | [`arma::mat`](../matrices.md) | Set of [column-major](../matrices.md) points for prediction. |
| _multi-point_ | `predictions` | [`arma::rowvec&`](../matrices.md) | Vector of `double`s to store predictions into. Will be set to length `data.n_cols`. |
| _multi-point_ | `colMajor` | `bool` | Should be set to `true` if `data` is [column-major](../matrices.md). Passing row-major data can avoid a transpose operation. (Default `true`.) |
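As a sketch of both calling conventions on random data:
```c++
// Minimal sketch: single-point and multi-point prediction.
arma::mat data(10, 300, arma::fill::randu);
arma::rowvec responses = arma::randn<arma::rowvec>(300);
mlpack::LARS lars(data, responses);

// Single-point prediction.
arma::vec point(10, arma::fill::randu);
const double value = lars.Predict(point);
std::cout << "Prediction for a single point: " << value << "." << std::endl;

// Multi-point prediction; predictions will have length 50 afterwards.
arma::mat testData(10, 50, arma::fill::randu);
arma::rowvec predictions;
lars.Predict(testData, predictions);
```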
### Other Functionality
* A `LARS` model can be serialized with
[`Save()` and `Load()`](../load_save.md#mlpack-models-and-objects).
* `lars.Beta()` will return an `arma::vec` with the model parameters. This
will have length equal to the dimensionality of the model. Note that
`lars.Beta()` can be changed to a different model on the LARS path using the
[`lars.SelectBeta()` method](#the-lars-path).
* `lars.Intercept()` will return a `double` representing the fitted intercept
term, or 0 if `lars.FitIntercept()` is `false`.
* `lars.ActiveSet()` will return a `std::vector<size_t>&` containing the
indices of nonzero dimensions in the model parameters (`lars.Beta()`).
* `lars.ComputeError(data, responses, colMajor=true)` will return a `double`
containing the squared error of the model on `data`, given that
the true responses are `responses`. To obtain the MSE, divide by the number
of points in `data`.
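For example, the active set can be used to print only the nonzero model
weights (a minimal sketch on random data):
```c++
// Minimal sketch: inspect the nonzero weights of a trained model.
arma::mat data(10, 400, arma::fill::randu);
arma::rowvec responses = arma::randn<arma::rowvec>(400);
mlpack::LARS lars(data, responses, true, true, 0.1 /* lambda1 */);

for (size_t dim : lars.ActiveSet())
{
  std::cout << "Dimension " << dim << " has weight " << lars.Beta()[dim]
      << "." << std::endl;
}
```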
### The LARS Path
LARS is a *path* (or stepwise) algorithm, meaning it adds one feature at a time
to the model. This in turn means that when we train a LARS model with
`lambda1` set to `l`, we also recover every possible LARS model on the same data
with a `lambda1` greater than `l`.
The `LARS` class provides a way to access all of the models on the path, and
switch between them for prediction purposes:
* `lars.BetaPath()` returns a `std::vector<arma::vec>&` containing each set of
model weights on the LARS path.
* `lars.InterceptPath()` returns a `std::vector<double>&` containing each
intercept value on the LARS path. These values are only meaningful if
`lars.FitIntercept()` is `true`.
* `lars.LambdaPath()` returns a `std::vector<double>&` containing each
`lambda1` value that is associated with each element in `lars.BetaPath()` and
`lars.InterceptPath()`. That is, `lars.LambdaPath()[i]` is the `lambda1`
value corresponding to the model defined by `lars.BetaPath()[i]` and
`lars.InterceptPath()[i]`.
* `lars.SelectBeta(lambda1)` will set the model weights (`lars.ActiveSet()`,
`lars.Beta()` and `lars.Intercept()`) to the path location with L1 penalty
`lambda1`. This is equivalent to calling `lars.Train(data, responses,
colMajor, useCholesky, lambda1)`---but much more efficient! `lambda1`
cannot be less than `lars.Lambda1()`, or an exception will be thrown.
* `lars.SelectedLambda1()` returns the currently selected L1 regularization
penalty parameter.
* For any value `lambda1` between `lars.LambdaPath()[i]` and
`lars.LambdaPath()[i + 1]`, the corresponding model is a linear interpolation
between `lars.BetaPath()[i]` and `lars.BetaPath()[i + 1]` (and
`lars.InterceptPath()[i]` and `lars.InterceptPath()[i + 1]`). This exact
linear interpolation is what is computed by `lars.SelectBeta(lambda1)`.
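As a short sketch, a single training pass makes the whole path available, and
`SelectBeta()` can then move between (or interpolate within) path locations
without retraining:
```c++
// Minimal sketch: train once, then move along the LARS path.
arma::mat data(10, 400, arma::fill::randu);
arma::rowvec responses = arma::randn<arma::rowvec>(400);
mlpack::LARS lars(data, responses, true, true, 0.001 /* lambda1 */);

// Select a more heavily penalized model; if 0.1 falls between two lambda1
// values on the path, the corresponding weights are linearly interpolated.
lars.SelectBeta(0.1);
std::cout << "Model with lambda1 = " << lars.SelectedLambda1() << " has "
    << lars.ActiveSet().size() << " nonzero weights." << std::endl;
```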
### Simple Examples
See also the [simple usage example](#simple-usage-example) for a trivial usage
of the `LARS` class.
---
Train a LARS model in the constructor, and print the MSE on training and
test data for each set of weights in the path.
```c++
// See https://datasets.mlpack.org/wave_energy_farm_100.csv.
arma::mat data;
mlpack::Load("wave_energy_farm_100.csv", data, mlpack::Fatal);
// Split the last row off: it is the responses. Also, normalize the responses
// to [0, 1].
arma::rowvec responses = data.row(data.n_rows - 1);
responses /= responses.max();
data.shed_row(data.n_rows - 1);
// Split into a training and test dataset. 20% of the data is held out as a
// test set.
arma::mat trainingData, testData;
arma::rowvec trainingResponses, testResponses;
mlpack::Split(data, responses, trainingData, testData, trainingResponses,
    testResponses, 0.2);

// Train a LARS model with lambda1 = 1e-5 and lambda2 = 1e-6.
mlpack::LARS lars(trainingData, trainingResponses, true, true, 1e-5, 1e-6);

// Iterate over all the models in the path.
const size_t pathLength = lars.BetaPath().size();
for (size_t i = 0; i < pathLength; ++i)
{
  // Use the i'th model in the path.
  lars.SelectBeta(lars.LambdaPath()[i]);

  // ComputeError() returns the total loss, which we need to divide by the
  // number of points to get the MSE.
  const double trainMSE = lars.ComputeError(trainingData, trainingResponses) /
      trainingData.n_cols;
  const double testMSE = lars.ComputeError(testData, testResponses) /
      testData.n_cols;

  std::cout << "L1 penalty parameter: " << lars.SelectedLambda1() << std::endl;
  std::cout << " MSE on training set: " << trainMSE << "." << std::endl;
  std::cout << " MSE on test set: " << testMSE << "." << std::endl;
}
```
---
Train a LARS model, print predictions for a random point, and save to a file.
```c++
// See https://datasets.mlpack.org/admission_predict.csv.
arma::mat data;
mlpack::Load("admission_predict.csv", data, mlpack::Fatal);
// See https://datasets.mlpack.org/admission_predict.responses.csv.
arma::rowvec responses;
mlpack::Load("admission_predict.responses.csv", responses, mlpack::Fatal);
// Train a LARS model with only L2 regularization.
mlpack::LARS lars(data, responses, true, true, 0.0, 0.1 /* lambda2 */);
// Predict on a random point.
arma::vec point = arma::randu<arma::vec>(data.n_rows);
const double prediction = lars.Predict(point);
std::cout << "Prediction on random point: " << prediction << "." << std::endl;
// Save the model to "lars_model.bin".
mlpack::Save("lars_model.bin", lars, mlpack::Fatal);
```
---
Load a LARS model from disk and print some information about it.
```c++
// This assumes a LARS model has previously been saved to
// "lars_model.bin".
mlpack::LARS lars;
mlpack::Load("lars_model.bin", lars, mlpack::Fatal);
if (lars.BetaPath().size() == 0)
{
  std::cout << "lars_model.bin contains an untrained LARS model." << std::endl;
}
else
{
  std::cout << "Information on the LARS model in lars_model.bin:" << std::endl;
  std::cout << " - Model dimensionality: " << lars.Beta().n_elem << "."
      << std::endl;
  std::cout << " - Has intercept: "
      << (lars.FitIntercept() ? std::string("yes") : std::string("no")) << "."
      << std::endl;
  std::cout << " - Current L1 regularization penalty parameter value: "
      << lars.SelectedLambda1() << "." << std::endl;
  std::cout << " - L2 regularization penalty parameter: " << lars.Lambda2()
      << "." << std::endl;
  std::cout << " - Number of nonzero elements in model: "
      << lars.ActiveSet().size() << "." << std::endl;
  std::cout << " - Number of models in LARS path: " << lars.BetaPath().size()
      << "." << std::endl;
  std::cout << " - Model weight for dimension 0: " << lars.Beta()[0] << "."
      << std::endl;

  if (lars.FitIntercept())
  {
    std::cout << " - Intercept value: " << lars.Intercept() << "." << std::endl;
  }
}
```
---
Train several models with different L2 regularization penalty parameters, using
a precomputed Gram matrix.
```c++
// See https://datasets.mlpack.org/admission_predict.csv.
arma::mat data;
mlpack::Load("admission_predict.csv", data, mlpack::Fatal);
// See https://datasets.mlpack.org/admission_predict.responses.csv.
arma::rowvec responses;
mlpack::Load("admission_predict.responses.csv", responses, mlpack::Fatal);
// Precompute Gram matrix.
arma::mat gramMatrix = data * data.t();
std::vector<double> lambda2Values = { 0.01, 0.1, 1.0, 10.0, 100.0 };
for (double lambda2 : lambda2Values)
{
  // Build a LARS model using the precomputed Gram matrix. Since lambda2 is
  // nonzero, it must be added to each element of the diagonal of the Gram
  // matrix. We also did not normalize or center the data before computing the
  // Gram matrix, so we have to set fitIntercept and normalizeData accordingly.
  arma::mat penalizedGram = gramMatrix;
  penalizedGram.diag() += lambda2;

  mlpack::LARS lars(data, responses, true, true, penalizedGram, 0.01, lambda2,
      1e-16, false, false);

  std::cout << "MSE with L2 penalty " << lambda2 << ": "
      << (lars.ComputeError(data, responses) / data.n_cols) << "."
      << std::endl;
}
```
### Advanced Functionality: Different Element Types
The `LARS` class has one template parameter that can be used to
control the element type of the model. The full signature of the class is:
```
LARS<ModelMatType>
```
`ModelMatType` specifies the type of matrix used for the internal representation
of model parameters. Any matrix type that implements the Armadillo API can be
used.
Note that the `Train()` and `Predict()` functions themselves are templatized
and will accept any matrix type with the same element type. So, for instance,
a `LARS<arma::sp_mat>` can accept an `arma::mat` for training.
The example below trains a LARS model on 32-bit precision data, using
`arma::sp_fmat` to store the model parameters.
```c++
// Create random 1000-dimensional data (5000 points).
arma::fmat dataset(1000, 5000, arma::fill::randu);
// Generate noisy responses from random data.
arma::fvec trueWeights(1000, arma::fill::randu);
arma::frowvec responses = trueWeights.t() * dataset +
    0.01 * arma::randu<arma::frowvec>(5000) /* noise term */;
mlpack::LARS<arma::sp_fmat> lars;
lars.Lambda1() = 0.1;
lars.Lambda2() = 0.01;
lars.Train(dataset, responses);
// Compute the MSE on the training set and a random test set.
arma::fmat testDataset(1000, 2500, arma::fill::randu);
arma::frowvec testResponses = trueWeights.t() * testDataset +
    0.01 * arma::randu<arma::frowvec>(2500) /* noise term */;
const float trainMSE = lars.ComputeError(dataset, responses) / dataset.n_cols;
const float testMSE = lars.ComputeError(testDataset, testResponses) /
    testDataset.n_cols;
std::cout << "MSE on training set: " << trainMSE << "." << std::endl;
std::cout << "MSE on test set: " << testMSE << "." << std::endl;
```
***Note:*** it is generally only more efficient to use a sparse type (e.g.
`arma::sp_mat`) for `ModelMatType` when the L1 regularization parameter is set
such that a highly sparse model is produced.