File: batchmark.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/batchmark.R
\name{batchmark}
\alias{batchmark}
\title{Run machine learning benchmarks as distributed experiments.}
\usage{
batchmark(
  learners,
  tasks,
  resamplings,
  measures,
  keep.pred = TRUE,
  keep.extract = FALSE,
  models = FALSE,
  reg = batchtools::getDefaultRegistry()
)
}
\arguments{
\item{learners}{(list of \link{Learner} | \link{character})\cr
Learning algorithms which should be compared, can also be a single learner.
If you pass strings the learners will be created via \link{makeLearner}.}

\item{tasks}{(list of \link{Task})\cr
Tasks that learners should be run on.}

\item{resamplings}{((list of) \link{ResampleDesc})\cr
Resampling strategy for each task.
If only one is provided, it will be replicated to match the number of tasks.
If missing, a 10-fold cross validation is used.}

\item{measures}{(list of \link{Measure})\cr
Performance measures for all tasks.
If missing, the default measure of the first task is used.}

\item{keep.pred}{(\code{logical(1)})\cr
Keep the prediction data in the \code{pred} slot of the result object.
If you run many experiments (on larger data sets), these objects can unnecessarily increase
object size and memory usage if you do not need them.
The default is set to \code{TRUE}.}

\item{keep.extract}{(\code{logical(1)})\cr
Keep the \code{extract} slot of the result object. When creating a lot of
benchmark results with extensive tuning, the resulting R objects can become
very large in size. That is why the tuning results stored in the \code{extract}
slot are removed by default (\code{keep.extract = FALSE}). Note that when
\code{keep.extract = FALSE} you will not be able to analyze the
tuning results.}

\item{models}{(\code{logical(1)})\cr
Should all fitted models be stored in the \link{ResampleResult}?
Default is \code{FALSE}.}

\item{reg}{(\link[batchtools:makeRegistry]{batchtools::Registry})\cr
Registry, created by \link[batchtools:makeExperimentRegistry]{batchtools::makeExperimentRegistry}. If not
explicitly passed, uses the last created registry.}
}
\value{
(\link{data.table}). Generated job ids are stored in the column
\dQuote{job.id}.
}
\description{
This function is a parallel version of \link{benchmark} built on
\pkg{batchtools}. For each combination of learners, tasks and resamplings,
experiments are created and stored in the provided registry. The runs can
then be started via
\link[batchtools:submitJobs]{batchtools::submitJobs}. A job is one train/test split of the outer
resampling. In case of nested resampling (e.g. with \link{makeTuneWrapper}), each
job is a full run of inner resampling, which can be parallelized in a second
step with \pkg{parallelMap}.

For details on the usage and the supported backends, have a look at the
\pkg{batchtools} project page: \url{https://github.com/mllg/batchtools}.

The general workflow with \code{batchmark} looks like this:
\enumerate{
\item Create an ExperimentRegistry using \link[batchtools:makeExperimentRegistry]{batchtools::makeExperimentRegistry}.
\item Call \code{batchmark(...)} which defines jobs for all learners and tasks in an \link[base:expand.grid]{base::expand.grid} fashion.
\item Submit jobs using \link[batchtools:submitJobs]{batchtools::submitJobs}.
\item Babysit the computation and wait for all jobs to finish using \link[batchtools:waitForJobs]{batchtools::waitForJobs}.
\item Call \code{reduceBatchmarkResults()} to reduce results into a \link{BenchmarkResult}.
}

If you want to use this with \pkg{OpenML} datasets you can generate tasks
from a vector of dataset IDs easily with \code{tasks = lapply(data.ids, function(x) convertOMLDataSetToMlr(getOMLDataSet(x)))}.
}
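\examples{
# A minimal sketch of the five-step workflow from the description; it is
# meant as an illustration under stated assumptions, not a definitive recipe.
# Assumptions (not taken from this page): the example tasks iris.task and
# sonar.task shipped with mlr, the learners "classif.lda" and "classif.rpart",
# a 2-fold cross validation, the mmce measure, and a temporary registry
# (file.dir = NA) that runs jobs on the local machine.
\dontrun{
library(mlr)
library(batchtools)

# 1. Create an ExperimentRegistry; file.dir = NA uses a temporary directory.
reg = makeExperimentRegistry(file.dir = NA, seed = 1)

# 2. Define the jobs for all learners and tasks. The single resampling
#    description is replicated to match the number of tasks.
lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
tasks = list(iris.task, sonar.task)
rdesc = makeResampleDesc("CV", iters = 2)
ids = batchmark(lrns, tasks, rdesc, measures = list(mmce), reg = reg)

# 3. Submit the jobs.
submitJobs(ids, reg = reg)

# 4. Wait for all jobs to finish.
waitForJobs(ids, reg = reg)

# 5. Reduce the finished jobs into a BenchmarkResult and inspect it.
bmr = reduceBatchmarkResults(reg = reg)
getBMRAggrPerformances(bmr, as.df = TRUE)
}
}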
\seealso{
Other benchmark: 
\code{\link{BenchmarkResult}},
\code{\link{benchmark}()},
\code{\link{convertBMRToRankMatrix}()},
\code{\link{friedmanPostHocTestBMR}()},
\code{\link{friedmanTestBMR}()},
\code{\link{generateCritDifferencesData}()},
\code{\link{getBMRAggrPerformances}()},
\code{\link{getBMRFeatSelResults}()},
\code{\link{getBMRFilteredFeatures}()},
\code{\link{getBMRLearnerIds}()},
\code{\link{getBMRLearnerShortNames}()},
\code{\link{getBMRLearners}()},
\code{\link{getBMRMeasureIds}()},
\code{\link{getBMRMeasures}()},
\code{\link{getBMRModels}()},
\code{\link{getBMRPerformances}()},
\code{\link{getBMRPredictions}()},
\code{\link{getBMRTaskDescs}()},
\code{\link{getBMRTaskIds}()},
\code{\link{getBMRTuneResults}()},
\code{\link{plotBMRBoxplots}()},
\code{\link{plotBMRRanksAsBarChart}()},
\code{\link{plotBMRSummary}()},
\code{\link{plotCritDifferences}()},
\code{\link{reduceBatchmarkResults}()}
}
\concept{benchmark}