############
DART booster
############
XGBoost typically combines a large number of regression trees with a small learning rate.
In this situation, trees added early in training are significant, while trees added late are relatively unimportant.

Vinayak and Gilad-Bachrach proposed a new method that applies dropout techniques from the deep neural network community to boosted trees, and reported better results in some situations.

This page introduces the new tree booster ``dart``.

**************
Original paper
**************
Rashmi Korlakai Vinayak, Ran Gilad-Bachrach. "DART: Dropouts meet Multiple Additive Regression Trees." [`PMLR <http://proceedings.mlr.press/v38/korlakaivinayak15.pdf>`_, `arXiv <https://arxiv.org/abs/1505.01866>`_].

********
Features
********
- Drop trees in order to address over-fitting.

  - Trivial trees (added only to correct trivial errors) may be prevented.

Because of the randomness introduced during training, expect the following differences:

- Training can be slower than ``gbtree`` because the random dropout prevents usage of the prediction buffer.
- Early stopping might not be stable, due to the randomness; see the sketch below.
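
As an illustration of the second point, here is a small sketch that trains a ``dart`` booster with early stopping on synthetic data. It is only illustrative: ``make_classification`` from scikit-learn merely generates toy data, and the parameter values are arbitrary. Because the set of dropped trees changes from round to round, the best iteration chosen by early stopping can be less stable than with ``gbtree``.

.. code-block:: python

  import xgboost as xgb
  from sklearn.datasets import make_classification
  from sklearn.model_selection import train_test_split

  # Synthetic data purely for illustration.
  X, y = make_classification(n_samples=2000, random_state=0)
  X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

  dtrain = xgb.DMatrix(X_train, label=y_train)
  dvalid = xgb.DMatrix(X_valid, label=y_valid)

  # Illustrative parameter values.
  params = {
      "booster": "dart",
      "objective": "binary:logistic",
      "eta": 0.1,
      "rate_drop": 0.1,
      "skip_drop": 0.5,
  }

  # Early stopping monitors the validation metric; with dart the dropout
  # noise can make the selected best iteration less stable than with gbtree.
  bst = xgb.train(
      params,
      dtrain,
      num_boost_round=200,
      evals=[(dvalid, "validation")],
      early_stopping_rounds=10,
      verbose_eval=False,
  )
  print(bst.best_iteration)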

************
How it works
************
- In the :math:`m`-th training round, suppose :math:`k` trees are selected to be dropped, and let :math:`\mathbf{K}` denote the set of their indices.
- Let :math:`D = \sum_{i \in \mathbf{K}} F_i` be the leaf scores of the dropped trees and :math:`F_m = \eta \tilde{F}_m` be the leaf scores of the new tree.
- The objective function is as follows:

.. math::

  \mathrm{Obj}
  = \sum_{j=1}^n L \left( y_j, \hat{y}_j^{m-1} - D_j + \tilde{F}_m \right)
  + \Omega \left( \tilde{F}_m \right).

- Both :math:`D` and the new tree :math:`F_m` try to close the same gap, so adding them back unchanged would overshoot; a scale factor is therefore applied (a small numerical sketch follows the formula below):

.. math::

  \hat{y}_j^m = \sum_{i \not\in \mathbf{K}} F_i + a \left( \sum_{i \in \mathbf{K}} F_i + b F_m \right) .
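
A minimal numerical sketch of this drop-and-scale step is given below. The per-tree scores and the new tree are random placeholders rather than XGBoost internals, and the ``tree`` normalization constants from the next section (:math:`b = 1/k`, :math:`a = k/(k+\eta)`) are assumed:

.. code-block:: python

  import numpy as np

  rng = np.random.default_rng(0)

  # Hypothetical leaf scores: tree_scores[i, j] stands in for F_i evaluated
  # on sample j, for an ensemble of 10 trees and 5 samples.
  tree_scores = rng.normal(scale=0.1, size=(10, 5))
  eta = 0.1        # learning rate
  rate_drop = 0.3  # dropout rate

  # Select the set K of trees to drop in this round.
  dropped = rng.random(tree_scores.shape[0]) < rate_drop
  k = max(int(dropped.sum()), 1)

  # D: summed leaf scores of the dropped trees.  The new tree \tilde{F}_m
  # would be fit against the objective above; here it is a random stand-in.
  D = tree_scores[dropped].sum(axis=0)
  F_m = eta * rng.normal(scale=0.1, size=5)   # F_m = eta * \tilde{F}_m

  # "tree" normalization: b = 1/k scales the new tree like a single dropped
  # tree, and a = k / (k + eta) shrinks the group so the combined
  # contribution stays on the scale of D.
  a, b = k / (k + eta), 1.0 / k
  y_hat = tree_scores[~dropped].sum(axis=0) + a * (D + b * F_m)
  print(y_hat)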

**********
Parameters
**********

The ``dart`` booster inherits from the ``gbtree`` booster, so it supports all of the parameters that ``gbtree`` supports, such as ``eta``, ``gamma``, and ``max_depth``.

Additional parameters are noted below:

* ``sample_type``: type of sampling algorithm.

  - ``uniform``: (default) dropped trees are selected uniformly.
  - ``weighted``: dropped trees are selected in proportion to weight.

* ``normalize_type``: type of normalization algorithm.

  - ``tree``: (default) New trees have the same weight as each of the dropped trees.

  .. math::

    a \left( \sum_{i \in \mathbf{K}} F_i + \frac{1}{k} F_m \right)
    &= a \left( \sum_{i \in \mathbf{K}} F_i + \frac{\eta}{k} \tilde{F}_m \right) \\
    &\sim a \left( 1 + \frac{\eta}{k} \right) D \\
    &= a \frac{k + \eta}{k} D = D , \\
    &\quad a = \frac{k}{k + \eta}

  - ``forest``: New trees have the same weight as the sum of the dropped trees (the forest).

  .. math::

    a \left( \sum_{i \in \mathbf{K}} F_i + F_m \right)
    &= a \left( \sum_{i \in \mathbf{K}} F_i + \eta \tilde{F}_m \right) \\
    &\sim a \left( 1 + \eta \right) D = D , \\
    &\quad a = \frac{1}{1 + \eta} .

* ``rate_drop``: dropout rate (the fraction of previous trees to drop during the dropout).

  - range: [0.0, 1.0]

* ``skip_drop``: probability of skipping the dropout procedure during a boosting iteration.

  - If a dropout is skipped, new trees are added in the same manner as ``gbtree``.
  - range: [0.0, 1.0]
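
These parameters can also be passed through the scikit-learn wrapper. The sketch below is illustrative only: ``make_classification`` merely produces toy data, and the parameter values are arbitrary; the DART-specific keywords are forwarded to the booster alongside the usual ``gbtree`` ones.

.. code-block:: python

  import xgboost as xgb
  from sklearn.datasets import make_classification

  # Toy data purely for illustration.
  X, y = make_classification(n_samples=1000, random_state=0)

  # gbtree parameters (max_depth, learning_rate, ...) work as usual;
  # the dart-specific keywords are passed alongside them.
  clf = xgb.XGBClassifier(
      booster="dart",
      sample_type="uniform",
      normalize_type="tree",
      rate_drop=0.1,
      skip_drop=0.5,
      n_estimators=50,
      max_depth=5,
      learning_rate=0.1,
  )
  clf.fit(X, y)
  print(clf.predict(X[:5]))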

*************
Sample Script
*************

.. code-block:: python

  import xgboost as xgb
  # read in data
  dtrain = xgb.DMatrix('demo/data/agaricus.txt.train?format=libsvm')
  dtest = xgb.DMatrix('demo/data/agaricus.txt.test?format=libsvm')
  # specify parameters via map
  param = {'booster': 'dart',
           'max_depth': 5, 'learning_rate': 0.1,
           'objective': 'binary:logistic',
           'sample_type': 'uniform',
           'normalize_type': 'tree',
           'rate_drop': 0.1,
           'skip_drop': 0.5}
  num_round = 50
  bst = xgb.train(param, dtrain, num_round)
  preds = bst.predict(dtest)
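
As a short, optional continuation of the script above: with ``binary:logistic`` the predictions are probabilities, so they can be thresholded (0.5 is an illustrative choice) and compared with the labels stored in the test ``DMatrix`` to obtain an error rate.

.. code-block:: python

  import numpy as np

  # Predictions from binary:logistic are probabilities in [0, 1].
  labels = dtest.get_label()
  # Threshold at 0.5 (illustrative) and compute the misclassification rate.
  error = float(np.mean((preds > 0.5) != labels))
  print(f"test error: {error:.4f}")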