=head1 NAME
heri-split - splits the dataset into training and testing sets
=head1 SYNOPSIS
B<heri-split> [OPTIONS] I<dataset1> [I<dataset2>...]
=head1 DESCRIPTION
B<heri-split> splits a dataset into several training and testing
sets, as required for N-fold cross-validation. The dataset contains
one object per line, as in svmlight format. By default,
stratified sampling is used, that is, every fold contains
approximately the same number of objects for each label.
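The stratified assignment described above can be illustrated with a minimal
Python sketch. It is not the code used by B<heri-split>: the names
C<stratified_folds> and C<label_of>, the round-robin strategy, and the
1-based fold numbering are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def label_of(line):
    # In svmlight format the label is the first whitespace-separated token.
    return line.split(None, 1)[0]


def stratified_folds(labels, n_folds, seed=0):
    """Return one fold number (1..n_folds, assumed numbering) per object."""
    rng = random.Random(seed)

    # Group object positions by label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    folds = [0] * len(labels)
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal the objects of this label round-robin across the folds,
        # so per-label counts differ by at most one between folds.
        for position, index in enumerate(indices):
            folds[index] = position % n_folds + 1
    return folds


# Hypothetical dataset: four "+1" objects and two "-1" objects.
dataset = [
    "+1 1:0.5 2:1.0",
    "+1 1:0.4 2:0.9",
    "-1 1:0.1 2:0.2",
    "+1 1:0.6 2:0.8",
    "-1 1:0.2 2:0.1",
    "+1 1:0.7 2:0.7",
]
folds = stratified_folds([label_of(line) for line in dataset], n_folds=2, seed=42)
for fold, line in zip(folds, dataset):
    print(fold, line)
```

With two folds, each fold in this example receives two C<+1> objects and one
C<-1> object; which objects land in which fold depends on the seed.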
If option B<-c> is specified, testI<N>.txt and trainI<N>.txt files
(also in svmlight format) are created, where I<N> is the fold number.
If option B<-R> is specified, test.txt and train.txt files
are created for the same purpose.
In addition, the file testing_fold.txt is created with one line per object.
If option B<-c> is applied, each line specifies the testing fold number
of the corresponding object.
If option B<-R> is applied, each line contains 1 for an object in the
testing set and 0 for an object in the training set.
=head1 OPTIONS
=over 6
=item B<-h, --help>
Display help information.
=item B<-c, --folds> I<count>
Set the number of folds. This is a mandatory option.
=item B<-d, --output-dir> I<dir>
Set the output directory. This is a mandatory option.
=item B<-r, --random>
Use random sampling instead of stratified sampling.
=item B<-R, --ratio>
Split the input dataset into training and testing sets in the specified ratio (in percent).
=item B<-s, --seed> I<seed>
Set the seed value for the pseudorandom number generator.
=back
=head1 HOME
L<http://github.com/cheusov/herisvm>
=head1 SEE ALSO
L<heri-eval(1)>
L<heri-stat(1)>