/*********************************************************************
MLDemos: A User-Friendly visualization toolkit for machine learning
Copyright (C) 2010 Basilio Noris
Contact: mldemos@b4silio.com
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free
Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*********************************************************************/
#include "public.h"
#include "basicMath.h"
#include "regressorGB.h"
RegressorGB::RegressorGB()
: boostIters(0), boostLossType(0), boostTreeDepths(0), gbt(0)
{
}
RegressorGB::~RegressorGB()
{
DEL(gbt);
}
void RegressorGB::Train(std::vector< fvec > samples, ivec labels)
{
u32 sampleCnt = samples.size();
if(!sampleCnt) return;
dim = samples[0].size();
if(outputDim != -1 && outputDim < dim -1)
{
// we need to swap the current last dimension with the desired output
FOR(i, samples.size())
{
float val = samples[i][dim-1];
samples[i][dim-1] = samples[i][outputDim];
samples[i][outputDim] = val;
}
}
DEL(gbt);
dim = samples[0].size()-1; // the last dimension is the regression target
// convert the samples into CvMat* format
CvMat *trainSamples = cvCreateMat(sampleCnt, dim, CV_32FC1);
CvMat *trainOutputs = cvCreateMat(sampleCnt, 1, CV_32FC1);
const CvMat *sampleIdx = 0; // 0: use all samples
const CvMat *varIdx = 0; // 0: use all variables (no subset selected)
const CvMat *varType = 0;
const CvMat *missingDataMask = 0;
int tflag = CV_ROW_SAMPLE;
//loss_function_type – Type of the loss function used for training (see Training the GBT model). It must be one of the following types: CvGBTrees::SQUARED_LOSS, CvGBTrees::ABSOLUTE_LOSS, CvGBTrees::HUBER_LOSS, CvGBTrees::DEVIANCE_LOSS. The first three types are used for regression problems, and the last one for classification.
int activationFunction;
switch(boostLossType)
{
case 2:
activationFunction = CvGBTrees::ABSOLUTE_LOSS;
break;
case 3:
activationFunction = CvGBTrees::HUBER_LOSS;
break;
case 1:
default: // fall back to squared loss so the value is always initialized
activationFunction = CvGBTrees::SQUARED_LOSS;
break;
}
//weak_count – Count of boosting algorithm iterations. weak_count*K is the total count of trees in the GBT model, where K is the output classes count (equal to one in case of a regression).
int weak_count = boostIters;
//shrinkage – Regularization parameter (see Training the GBT model).
float shrinkage = 0.1f;
//subsample_portion – Portion of the whole training set used for each algorithm iteration. Subset is generated randomly. For more information see http://www.salfordsystems.com/doc/StochasticBoostingSS.pdf.
float subsample_portion = 0.5f;
//max_depth – Maximal depth of each decision tree in the ensemble (see CvDTree).
int max_depth = boostTreeDepths;
//use_surrogates – If true, surrogate splits are built (see CvDTree).
bool use_surrogates = false;
CvGBTreesParams params = CvGBTreesParams(activationFunction, weak_count, shrinkage, subsample_portion, max_depth, use_surrogates);
u32 *perm = randPerm(sampleCnt);
FOR(i, sampleCnt)
{
FOR(j, dim) cvSetReal2D(trainSamples, i, j, samples[perm[i]][j]);
cvSet1D(trainOutputs, i, cvScalar(samples[perm[i]][dim]));
}
delete [] perm;
gbt = new CvGBTrees();
gbt->train(trainSamples, tflag, trainOutputs, varIdx, sampleIdx, varType, missingDataMask, params);
cvReleaseMat(&trainSamples);
cvReleaseMat(&trainOutputs);
}
fvec RegressorGB::Test(const fvec &sample)
{
fvec res;
res.resize(2);
if(!gbt) return res;
float *_input = new float[dim];
if(outputDim != -1 && outputDim < (int)sample.size())
{
fvec newSample = sample;
newSample[outputDim] = sample[sample.size()-1];
newSample[sample.size()-1] = sample[outputDim];
FOR(d, min(dim,(u32)sample.size())) _input[d] = newSample[d];
for(u32 d=min(dim,(u32)sample.size()); d<dim; d++) _input[d] = 0;
}
else
{
FOR(d, min(dim,(u32)sample.size())) _input[d] = sample[d];
for(u32 d=min(dim,(u32)sample.size()); d<dim; d++) _input[d] = 0;
}
CvMat input = cvMat(1,dim,CV_32FC1, _input);
float output = gbt->predict(&input);
res[0] = output;
res[1] = 0;
delete [] _input;
return res;
}
void RegressorGB::SetParams(int boostIters, int boostLossType, int boostTreeDepths)
{
this->boostIters = boostIters;
this->boostLossType = boostLossType;
this->boostTreeDepths = boostTreeDepths;
}
const char *RegressorGB::GetInfoString()
{
// the caller takes ownership of the returned buffer
char *text = new char[1024];
snprintf(text, 1024, "Gradient Boosting Tree\n");
return text;
}