File: java_intro.rst

##############################
Getting Started with XGBoost4J
##############################
This tutorial introduces the Java API for XGBoost.

**************
Data Interface
**************
Like the XGBoost Python module, XGBoost4J uses ``DMatrix`` to handle data.
LIBSVM text format files, sparse matrices in CSR/CSC format, and dense
matrices are supported.
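
The LIBSVM text format stores one instance per line: a label followed by
space-separated ``index:value`` pairs for the non-zero features. The fragment
below is purely illustrative (the labels and values are made up, not from a
real data set); it encodes the same sparse matrix used in the examples below:

.. code-block:: none

  1 0:1.0 2:2.0
  0 0:4.0 3:3.0
  1 0:3.0 1:1.0 2:2.0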

* The first step is to import DMatrix:

  .. code-block:: java

    import ml.dmlc.xgboost4j.java.DMatrix;

* Use the ``DMatrix`` constructor to load data from a LIBSVM text format file:

  .. code-block:: java

    DMatrix dmat = new DMatrix("train.svm.txt");

* Pass arrays to the ``DMatrix`` constructor to load from a sparse matrix.

  Suppose we have a sparse matrix

  .. code-block:: none

    1 0 2 0
    4 0 0 3
    3 1 2 0

  We can express the sparse matrix in `Compressed Sparse Row (CSR) <https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_(CSR,_CRS_or_Yale_format)>`_ format:

  .. code-block:: java

    long[] rowHeaders = new long[] {0,2,4,7};
    float[] data = new float[] {1f,2f,4f,3f,3f,1f,2f};
    int[] colIndex = new int[] {0,2,0,3,0,1,2};
    int numColumn = 4;
    DMatrix dmat = new DMatrix(rowHeaders, colIndex, data, DMatrix.SparseType.CSR, numColumn);

  ... or in `Compressed Sparse Column (CSC) <https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_(CSC_or_CCS)>`_ format:

  .. code-block:: java

    long[] colHeaders = new long[] {0,3,4,6,7};
    float[] data = new float[] {1f,4f,3f,1f,2f,2f,3f};
    int[] rowIndex = new int[] {0,1,2,2,0,2,1};
    int numRow = 3;
    DMatrix dmat = new DMatrix(colHeaders, rowIndex, data, DMatrix.SparseType.CSC, numRow);

* You may also load your data from a dense matrix. Let's assume we have a matrix of the form

  .. code-block:: none

    1    2
    3    4
    5    6

  Using `row-major layout <https://en.wikipedia.org/wiki/Row-_and_column-major_order>`_, we specify the dense matrix as follows:

  .. code-block:: java

    float[] data = new float[] {1f,2f,3f,4f,5f,6f};
    int nrow = 3;
    int ncol = 2;
    float missing = 0.0f;
    DMatrix dmat = new DMatrix(data, nrow, ncol, missing);

* To set instance weights:

  .. code-block:: java

    float[] weights = new float[] {1f,2f,1f};
    dmat.setWeight(weights);
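
The CSR arrays above can also be derived mechanically from a dense matrix.
The following plain-Java sketch (``CsrSketch`` is a hypothetical helper, not
part of XGBoost4J) reconstructs ``rowHeaders``, ``colIndex`` and ``data`` for
the example matrix, treating zeros as missing entries:

.. code-block:: java

  import java.util.ArrayList;
  import java.util.List;

  public class CsrSketch {
    // Returns {long[] rowHeaders, int[] colIndex, float[] data}
    public static Object[] toCsr(float[][] dense) {
      List<Integer> cols = new ArrayList<>();
      List<Float> vals = new ArrayList<>();
      long[] headers = new long[dense.length + 1];
      for (int i = 0; i < dense.length; i++) {
        for (int j = 0; j < dense[i].length; j++) {
          if (dense[i][j] != 0f) {       // keep only non-zero entries
            cols.add(j);
            vals.add(dense[i][j]);
          }
        }
        headers[i + 1] = cols.size();    // cumulative non-zero count per row
      }
      int[] colIndex = new int[cols.size()];
      float[] data = new float[vals.size()];
      for (int k = 0; k < cols.size(); k++) {
        colIndex[k] = cols.get(k);
        data[k] = vals.get(k);
      }
      return new Object[] {headers, colIndex, data};
    }

    public static void main(String[] args) {
      float[][] m = {{1f, 0f, 2f, 0f}, {4f, 0f, 0f, 3f}, {3f, 1f, 2f, 0f}};
      Object[] csr = toCsr(m);
      System.out.println(java.util.Arrays.toString((long[]) csr[0]));  // [0, 2, 4, 7]
      System.out.println(java.util.Arrays.toString((int[]) csr[1]));   // [0, 2, 0, 3, 0, 1, 2]
    }
  }

Running it on the example matrix reproduces exactly the three arrays passed to
the ``DMatrix`` CSR constructor above.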

******************
Setting Parameters
******************
Parameters are specified as a ``Map``:

.. code-block:: java

  Map<String, Object> params = new HashMap<String, Object>() {
    {
      put("eta", 1.0);
      put("max_depth", 2);
      put("objective", "binary:logistic");
      put("eval_metric", "logloss");
    }
  };
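
The double-brace initializer above creates an anonymous ``HashMap`` subclass.
A plain map built with ``put`` calls is equivalent and avoids the extra class;
a minimal sketch (the ``Params``/``buildParams`` names are illustrative, not
part of XGBoost4J):

.. code-block:: java

  import java.util.HashMap;
  import java.util.Map;

  public class Params {
    // Same parameters as above, built with plain put calls
    public static Map<String, Object> buildParams() {
      Map<String, Object> params = new HashMap<>();
      params.put("eta", 1.0);
      params.put("max_depth", 2);
      params.put("objective", "binary:logistic");
      params.put("eval_metric", "logloss");
      return params;
    }

    public static void main(String[] args) {
      System.out.println(buildParams().size()); // prints 4
    }
  }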

**************
Training Model
**************
With parameters and data ready, you can train a booster model.

* Import Booster and XGBoost:

  .. code-block:: java

    import ml.dmlc.xgboost4j.java.Booster;
    import ml.dmlc.xgboost4j.java.XGBoost;

* Training

  .. code-block:: java

    DMatrix trainMat = new DMatrix("train.svm.txt");
    DMatrix validMat = new DMatrix("valid.svm.txt");
    // Specify a watch list to monitor evaluation metrics on both data sets
    Map<String, DMatrix> watches = new HashMap<String, DMatrix>() {
      {
        put("train", trainMat);
        put("valid", validMat);
      }
      }
    };
    int nround = 2;
    Booster booster = XGBoost.train(trainMat, params, nround, watches, null, null);

* Saving model

  After training, you can save the model and dump its text representation.

  .. code-block:: java

    booster.saveModel("model.json");

* Generating model dump with feature map

  .. code-block:: java

    // dump without feature map
    String[] model_dump = booster.getModelDump(null, false);
    // dump with feature map
    String[] model_dump_with_feature_map = booster.getModelDump("featureMap.txt", false);

* Loading a model

  .. code-block:: java

    Booster booster = XGBoost.loadModel("model.json");

**********
Prediction
**********
After training and loading a model, you can use it to make predictions on other data. The result is a two-dimensional float array of shape ``(nsample, nclass)``; for ``predictLeaf()``, the result has shape ``(nsample, nclass*ntrees)``.

.. code-block:: java

  DMatrix dtest = new DMatrix("test.svm.txt");
  // predict
  float[][] predicts = booster.predict(dtest);
  // predict leaf
  float[][] leafPredicts = booster.predictLeaf(dtest, 0);
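
For a multi-class model, each row of the ``(nsample, nclass)`` result holds
per-class scores, and picking the highest-scoring class is a simple argmax.
The helper below is illustrative post-processing, not part of the XGBoost4J
API:

.. code-block:: java

  public class ArgMax {
    // Map each prediction row to the index of its largest score
    public static int[] argmaxRows(float[][] predicts) {
      int[] labels = new int[predicts.length];
      for (int i = 0; i < predicts.length; i++) {
        int best = 0;
        for (int j = 1; j < predicts[i].length; j++) {
          if (predicts[i][j] > predicts[i][best]) {
            best = j;
          }
        }
        labels[i] = best;
      }
      return labels;
    }

    public static void main(String[] args) {
      float[][] predicts = {{0.1f, 0.7f, 0.2f}, {0.6f, 0.3f, 0.1f}};
      System.out.println(java.util.Arrays.toString(argmaxRows(predicts))); // [1, 0]
    }
  }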