1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142
|
xgboost-predictor-java
======================
[](https://travis-ci.org/komiya-atsushi/xgboost-predictor-java)
[ ](https://bintray.com/komiya-atsushi/maven/xgboost-predictor/_latestVersion)
Pure Java implementation of [XGBoost](https://github.com/dmlc/xgboost/) predictor for online prediction tasks.
# Getting started
## Adding to dependencies
If you use **Maven**:
```xml
<repositories>
<repository>
<id>bintray-komiya-atsushi-maven</id>
<url>http://dl.bintray.com/komiya-atsushi/maven</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>biz.k11i</groupId>
<artifactId>xgboost-predictor</artifactId>
<version>0.3.0</version>
</dependency>
</dependencies>
```
Or **Gradle**:
```groovy
repositories {
// Use jcenter instead of mavenCentral
jcenter()
}
dependencies {
compile group: 'biz.k11i', name: 'xgboost-predictor', version: '0.3.0'
}
```
Or **sbt**:
```scala
resolvers += Resolver.jcenterRepo
libraryDependencies ++= Seq(
"biz.k11i" % "xgboost-predictor" % "0.3.0"
)
```
## Using Predictor in Java
```java
package biz.k11i.xgboost.demo;
import biz.k11i.xgboost.Predictor;
import biz.k11i.xgboost.util.FVec;
public class HowToUseXgboostPredictor {
public static void main(String[] args) throws java.io.IOException {
// If you want to use faster exp() calculation, uncomment the line below
// ObjFunction.useFastMathExp(true);
// Load model and create Predictor
Predictor predictor = new Predictor(
new java.io.FileInputStream("/path/to/xgboost-model-file"));
// Create feature vector from dense representation by array
double[] denseArray = {0, 0, 32, 0, 0, 16, -8, 0, 0, 0};
FVec fVecDense = FVec.Transformer.fromArray(
denseArray,
true /* treat zero element as N/A */);
// Create feature vector from sparse representation by map
FVec fVecSparse = FVec.Transformer.fromMap(
new java.util.HashMap<Integer, Double>() {{
put(2, 32.);
put(5, 16.);
put(6, -8.);
}});
// Predict probability or classification
double[] prediction = predictor.predict(fVecDense);
// prediction[0] has
// - probability ("binary:logistic")
// - class label ("multi:softmax")
// Predict leaf index of each tree
int[] leafIndexes = predictor.predictLeaf(fVecDense);
// leafIndexes[i] has a leaf index of i-th tree
}
}
```
## Apache Spark integration
See detail [xgboost-predictor-spark](https://github.com/komiya-atsushi/xgboost-predictor-java/tree/master/xgboost-predictor-spark).
# Benchmark
Throughput comparison to [xgboost4j 1.1](https://github.com/dmlc/xgboost/tree/master/java/xgboost4j) by [xgboost-predictor-benchmark](https://github.com/komiya-atsushi/xgboost-predictor-benchmark).
| Feature | xgboost-predictor | xgboost4j |
| ----------------- | ----------------: | -------------: |
| Model loading | 49017.60 ops/s | 39669.36 ops/s |
| Single prediction | 6016955.46 ops/s | 1018.01 ops/s |
| Batch prediction | 44985.71 ops/s | 5.04 ops/s |
| Leaf prediction | 11115853.34 ops/s | 1076.54 ops/s |
Xgboost-predictor-java is about **6,000 to 10,000 times faster than** xgboost4j on prediction tasks.
# Supported models, objective functions and API
- Models
- "gblinear"
- "gbtree"
- "dart"
- Objective functions
- "binary:logistic"
- "binary:logitraw"
- "multi:softmax"
- "multi:softprob"
- "reg:linear"
- "rank:pairwise"
- API
- Predicts probability or classification
- `Predictor#predict(FVec)`
- Outputs margin
- `Predictor#predict(FVec, true /* output margin */)`
- Predicts leaf index
- `Predictor#predictLeaf(FVec)`
|