# Deploy to a Docker container
In many machine learning applications, it can be useful to deploy a model as a
standalone Docker container that can return predictions. This tutorial shows
how to build a Docker container with an mlpack model serving predictions. Here
the model serves predictions in a very primitive way: it reads input directly
from a terminal and prints the result. It would be straightforward, however, to
adapt the container to provide a full REST API or similar.
mlpack applications inside of Docker containers can be built in a way that the
resulting container is extremely small---sometimes even less than 1 MB!
*See also*:
- [Installing mlpack](install.md)
- [Compile an mlpack program](compile.md)
- [Deploying mlpack on Windows](deploy_windows.md)
- [Setting up an mlpack cross-compilation environment](../embedded/supported_boards.md)
## General workflow
To make a Docker container that serves predictions, we must first train a model.
Therefore, our workflow to build this container will be:
* [Decide on the problem to solve](#problem-statement)
* [Write a program to train the model](#model-training-program)
* [Write a program to make predictions with the trained model](#prediction-program)
* [Build the Docker container with the prediction program](#building-the-container)
* [Run the Docker container](#run-the-container)
## Problem statement
For this simple example, we will solve a problem from the cybersecurity world:
[DGA detection](https://en.wikipedia.org/wiki/Domain_generation_algorithm).
The example here is based on (and heavily uses the code from) mlpack's
[DGA detection LSTM example](https://github.com/mlpack/examples/tree/master/cpp/lstm/dga_detection).
Malware authors often write malware that communicates with a centralized
command-and-control server. This can allow the malware to update itself, or to
receive commands from an operator (e.g., 'start Bitcoin mining', or 'lock system
and display ransomware message').
The malware author cannot hardcode a domain name into their malware, because
this would be easily blocked by any antivirus software. So, instead, a *domain
generation algorithm* is used to generate a series of domain names. The malware
will try to contact a server at each of these domain names. The individual DGA
domain names can look very random; for instance, some DGA domains generated by
the `matsnu` malware family are:
* `brothernerveplacebringconsult.com`
* `screencatchdishtellproposed.com`
* `balladoptwelladdinfluence.com`
* `capitalhuntdealsmokeboxclue.com`
Other malware families may generate very random looking names, like
`i828ywu0ywqs.net` or similar. Since the set of domain names that can be
generated by a DGA is huge, blocking individual domain names is not a realistic
mitigation strategy. However, we can use machine learning techniques to detect
DGA-generated domain names with a high level of accuracy!
This strategy has been shown to be effective in
[some previous work](https://www.arxiv.org/pdf/1611.00791).
Adapting that approach for simplicity, we will train simple recurrent neural
networks with LSTMs to detect benign domains and DGA domains, and then the
Docker container application will read domain names as input and output a score
indicating the likelihood that the domain name was generated by a DGA.
## Model training program
Training a model can be done as a separate standalone program; since our goal is
just to provide a container that produces predictions, the program in the
container does not need to support training.
The C++ code for training a DGA detector is available as the standalone program
[`lstm_dga_detection_train.cpp`](https://github.com/mlpack/examples/blob/master/cpp/lstm/dga_detection/lstm_dga_detection_train.cpp).
We can compile it with a call to `g++`, following the instructions from
[the compilation guide](compile.md):
```sh
g++ -O3 -o lstm_dga_detection_train lstm_dga_detection_train.cpp -fopenmp -larmadillo
```
Some modification of the command above may be necessary if mlpack is installed
on your system in a nonstandard location, or if you are not using the Armadillo
wrapper. See [Configuring mlpack with compile-time
definitions](compile.md#configuring-mlpack-with-compile-time-definitions) and
[Linking without the Armadillo
wrapper](compile.md#linking-without-the-armadillo-wrapper) for more details.
Once the program is compiled, we can train on a
[dataset of domain names](https://datasets.mlpack.org/dga_domains.csv.gz). The
commands below will download the prepared dataset from the mlpack website, and
then run the training process.
```sh
wget https://datasets.mlpack.org/dga_domains.csv.gz
gunzip dga_domains.csv.gz
./lstm_dga_detection_train dga_domains.csv
```
The training process may take a while, but when it is finished, performance
statistics about the models will be printed, and the models will be saved to
`lstm_dga_detector_benign.bin` and `lstm_dga_detector_malicious.bin`, the two
files used in the rest of this tutorial. The models should achieve 99%+
accuracy on the held-out test data.
## Prediction program
The examples repository also provides the standalone prediction program
[`lstm_dga_detection_predict.cpp`](https://github.com/mlpack/examples/blob/master/cpp/lstm/dga_detection/lstm_dga_detection_predict.cpp).
We can also compile this with a call to `g++`, following the instructions from
[the compilation guide](compile.md):
```sh
g++ -O3 -o lstm_dga_detection_predict lstm_dga_detection_predict.cpp -fopenmp -larmadillo -static
```
As with the training program [above](#model-training-program), some modification
of the compilation command may be necessary depending on your configuration.
We also specified the `-static` option here, so that the produced program is
statically linked. This will help us deploy into a Docker container, since we
can just run the program directly and do not need to ensure that supporting
libraries are available. Note that you will need a statically-compiled version
of Armadillo available to link against, or instead define the compiler option
`-DARMA_DONT_USE_WRAPPER` and link with static OpenBLAS using `-lopenblas`.
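Because the `static` distroless base image used later in this tutorial is
intended for statically linked programs, it can be worth confirming that the
`-static` flag took effect before building the container. The sketch below is
one way to do that with the standard `file` utility. The helper name
`is_statically_linked` is made up for this example, and the check assumes that
`file` is installed and describes such binaries with the phrase
`statically linked`.

```sh
# Hypothetical helper: succeeds if `file` reports the binary as statically
# linked. Assumes the `file` utility is installed.
is_statically_linked() {
  file -L "$1" | grep -q 'statically linked'
}

# Only run the check if the prediction program has already been compiled.
if [ -e lstm_dga_detection_predict ]; then
  if is_statically_linked lstm_dga_detection_predict; then
    echo "lstm_dga_detection_predict is statically linked"
  else
    echo "lstm_dga_detection_predict still depends on shared libraries"
  fi
fi
```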
The prediction program reads from stdin, and once a domain is entered, a
prediction is computed and the word `malicious` or `benign` is emitted, along
with a score indicating the model's "confidence". A sample transcript of the
program is below:
```text
$ ./lstm_dga_detection_predict lstm_dga_detector_benign.bin lstm_dga_detector_malicious.bin
www.mlpack.org
benign (score 44.5417)
mdfvkejbqoxg.ru
malicious (score 22.2786)
arma.sourceforge.net
benign (score 19.4883)
11b5n854ublnv152.net
malicious (score 7.89951)
```
## Building the container
Building a Docker container that runs `lstm_dga_detection_predict` is very
simple; we only need to put the prediction program and model in the container.
In fact, to save additional space, we can use a
[distroless](https://github.com/GoogleContainerTools/distroless) container.
This code can be used as the `Dockerfile`:
```dockerfile
FROM gcr.io/distroless/static-debian12
ADD lstm_dga_detection_predict .
ADD lstm_dga_detector_benign.bin .
ADD lstm_dga_detector_malicious.bin .
ENTRYPOINT ["./lstm_dga_detection_predict", \
"lstm_dga_detector_benign.bin", \
"lstm_dga_detector_malicious.bin"]
```
Building the container is simple (and nearly instantaneous):
```sh
docker build -t lstm_dga_detector .
```
And once it is built, it is easy to see that the container is relatively small:
```text
$ docker images | grep -B 1 lstm_dga_detector
REPOSITORY TAG IMAGE ID CREATED SIZE
lstm_dga_detector latest 9fc931a2cdae 35 seconds ago 33.7MB
```
However, we haven't even tried to optimize for size; see the [Reducing the size
of the container further](#reducing-the-size-of-the-container-further) section
below.
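To track the image size while experimenting with size optimizations, the size
can also be queried directly instead of being filtered out of the
`docker images` table. The sketch below uses `docker image inspect` with a
Go-template `--format` string. `image_size_mb` is a hypothetical helper name,
and the conversion assumes decimal megabytes, which appears to be the unit that
the `docker images` listing uses.

```sh
# Hypothetical helper: print the size of a local Docker image in megabytes.
image_size_mb() {
  # `docker image inspect` reports the image size in bytes.
  bytes=$(docker image inspect --format '{{.Size}}' "$1") || return 1
  awk -v b="$bytes" 'BEGIN { printf "%.2f MB\n", b / 1000000 }'
}

# Usage, once the image has been built:
#   image_size_mb lstm_dga_detector
```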
## Run the container
The container can now be deployed or run in any standard Docker environment
(including on Kubernetes, although that seems like overkill for such a simple
example).
Running the container locally (and interacting with the prediction service) is a
simple command:
```sh
docker run --rm -it lstm_dga_detector
```
Once the container has started, simply type a domain name, hit enter, and a
prediction (plus score) will be printed.
To build the example into a more complex application, you can use
`lstm_dga_detection_predict.cpp` as a starting point.
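Short of writing a larger application, the container can also be driven
non-interactively. The sketch below feeds a file of domain names (one per line)
into the container and pairs each domain with its prediction. `score_domains`
and `domains.txt` are hypothetical names, and the approach assumes that the
prediction program prints exactly one line per input line and exits at end of
input, as the transcript above suggests.

```sh
# Hypothetical helper: score every domain in a file (one domain per line).
# `-i` keeps stdin open for the redirected input; `-t` is omitted because
# no terminal is attached.
score_domains() {
  docker run --rm -i lstm_dga_detector < "$1" | paste "$1" -
}

# Usage, once the image has been built:
#   printf 'www.mlpack.org\nmdfvkejbqoxg.ru\n' > domains.txt
#   score_domains domains.txt
```

Each output row contains the domain, a tab, and the line printed by the
prediction program.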
## Reducing the size of the container further
Although 33.7 MB for a container is already orders of magnitude smaller than an
equivalent unoptimized Python container, it is readily possible via compilation
options and a few other tricks to get the size to be significantly smaller. The
vast majority of the size of the container is just the size of the compiled
`lstm_dga_detection_predict`:
```text
$ ls -lh
-rwxrwxr-x 1 ryan ryan 31M Mar 6 20:46 lstm_dga_detection_predict
-rw-rw-r-- 1 ryan ryan 80K Mar 5 19:10 lstm_dga_detector_benign.bin
-rw-rw-r-- 1 ryan ryan 80K Mar 5 19:10 lstm_dga_detector_malicious.bin
```
31 MB for a compiled program is quite large! But, as it turns out, most of this
is due to dependencies. We can see this by running a command that does not
perform linking, like this:
```sh
g++ -O3 -c -o lstm_dga_detection_predict.o lstm_dga_detection_predict.cpp
```
This compiled (but not linked) object file is much smaller:
```text
$ ls -lh lstm_dga_detection_predict.o
-rw-rw-r-- 1 ryan ryan 1.6M Mar 6 20:48 lstm_dga_detection_predict.o
```
That implies that our primary size issue is not with our own code, but instead
with our dependencies. Specifically, on most systems, OpenBLAS is compiled to
support any architecture and this results in the statically-linked OpenBLAS
library being *quite* large. On a Debian system:
```text
$ cd /usr/lib/x86_64-linux-gnu/openblas-pthread/
$ ls -lh libopenblasp-r0.3.28.a
-rw-r--r-- 1 root root 61M Nov 20 05:52 libopenblasp-r0.3.28.a
```
61 MB is very large! Although not all of OpenBLAS is used by our DGA detection
program, a significant portion is, and this is the primary culprit for our large
program size. Two alternatives to reduce this size are:
#### Use reference BLAS and LAPACK instead
The reference BLAS and LAPACK implementations are often slower, but much smaller
in size.
* On a Debian or Ubuntu system, install `libblas-dev` and `liblapack-dev`
  instead of `libopenblas-dev`.
* This step alone reduces the size of the statically-linked
  `lstm_dga_detection_predict` to ***3.4 MB***, but the use of reference BLAS
  and LAPACK is likely to be slower than OpenBLAS.
#### Compile OpenBLAS manually
Another option is to compile OpenBLAS manually
[from source](https://github.com/OpenMathLib/OpenBLAS) to reduce its size.
OpenBLAS is compiled with a `make` command; options for this can be found in the
[OpenBLAS manual](http://www.openmathlib.org/OpenBLAS/docs/build_system/#important-variables).
As an example (***note:*** do not use this command directly on your system; see
the description of the options and decide which are right for your system), the
following command produces an OpenBLAS library that is only 16 MB:
```sh
make \
TARGET=NEHALEM \
DYNAMIC_ARCH=0 \
COMMON_OPTS="-Os -ffunction-sections -fdata-sections" \
NO_SHARED=1 \
BUILD_DOUBLE=0 \
BUILD_COMPLEX=0 \
BUILD_COMPLEX16=0 \
BUILD_BFLOAT16=0
```
Looking at each option:
* `TARGET=NEHALEM` and `DYNAMIC_ARCH=0` specify that this version of OpenBLAS
  can ***only*** be run on
  [Nehalem](https://en.wikipedia.org/wiki/Nehalem_(microarchitecture)) or newer
  Intel processors.
  - ***This causes the code to be significantly less portable.***
  - However, the `DYNAMIC_ARCH=0` option *significantly* reduces the size of
    OpenBLAS, and therefore of downstream code too, by compiling only for the
    processor of interest.
  - For other `TARGET` options, see `TargetList.txt` in the OpenBLAS source
    code.
* `COMMON_OPTS="-Os -ffunction-sections -fdata-sections"` specifies that the
  compiler should aim to keep the compiled code as small as possible, and
  place each function and data item in its own section so that unused ones can
  be stripped later for further size reduction.
* `NO_SHARED=1` specifies that only the static version of OpenBLAS
  (`libopenblas.a`) should be built.
* `BUILD_DOUBLE=0 BUILD_COMPLEX=0 BUILD_COMPLEX16=0 BUILD_BFLOAT16=0` disables
  all OpenBLAS functions for data types we are not using in our program.
  - The `lstm_dga_detection_predict.cpp` code only uses `arma::fmat` (i.e.,
    matrices of `float`), so we can omit the other functions.
With this stripped-down version of OpenBLAS, the size of
`lstm_dga_detection_predict` with no further compilation modifications is
reduced to ***3.5 MB***.
***Note:*** when compiling `lstm_dga_detection_predict`, linking against the
hand-compiled OpenBLAS will require specifying the `-L/path/to/openblas/` option
so that the linker finds the manually-compiled OpenBLAS version.
### Compilation options for further size reduction
We have gone from 31 MB to roughly 3 MB just by replacing our OpenBLAS
implementation with either reference BLAS/LAPACK or a hand-compiled version.
However, we can specify some additional compiler options to reduce the size
further. Assuming that we have placed the manually-compiled `libopenblas.a` in
the same directory as `lstm_dga_detection_predict.cpp`, we can compile with the
following command to reduce size even further:
```sh
g++ -o lstm_dga_detection_predict lstm_dga_detection_predict.cpp \
-Os \
-DNDEBUG \
-DARMA_DONT_USE_WRAPPER \
-ffunction-sections -fdata-sections \
-static \
-L. -lopenblas \
-Wl,--gc-sections \
-Wl,--strip-all
```
Looking at each option:
* `-Os` tells the compiler to optimize for small code size.
* `-DNDEBUG` defines the `NDEBUG` macro, which disables `assert()` and any
  other debug-only checks or code paths guarded by it. (It does not affect
  debugging symbols; those are removed by `--strip-all` below.)
* `-DARMA_DONT_USE_WRAPPER` is an Armadillo directive that tells Armadillo not
  to link against the Armadillo runtime library but instead directly against
  OpenBLAS.
  - This is why we use `-lopenblas` later instead of `-larmadillo`.
* `-ffunction-sections -fdata-sections` place each function and data item in
  its own section, so that the linker can later remove unused sections.
* `-Wl,--gc-sections -Wl,--strip-all` tell the linker to discard unused
  sections and strip all symbol information from the final program.
* Note that we have *omitted* `-fopenmp`; this will cause compiled code to be
  somewhat smaller, but it will execute serially. Given that our prediction
  program only predicts a single domain at a time, this is acceptable.
This shaves off more than 1 MB:
```text
$ ls -lh lstm_dga_detection_predict
-rwxrwxr-x 1 ryan ryan 1.9M Mar 6 21:37 lstm_dga_detection_predict
```
That is a size reduction of roughly ***16x***, obtained entirely through
compiler options. When the Docker container is rebuilt with the new version
of `lstm_dga_detection_predict`, the size is significantly improved:
```text
$ docker images | grep -B 1 lstm
REPOSITORY TAG IMAGE ID CREATED SIZE
lstm_dga_detector latest 0be5a677fd9c 6 seconds ago 4.09MB
```
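The 16x figure follows from the two binary sizes shown earlier (31 MB before,
1.9 MB after). To measure the same ratio for your own builds, a small helper
along these lines could be used. `size_ratio` and the `.orig` file name are
hypothetical, and the sketch assumes that a copy of the unoptimized binary was
kept.

```sh
# Hypothetical helper: print how many times larger the first file is than
# the second, based on their sizes in bytes.
size_ratio() {
  before=$(wc -c < "$1")
  after=$(wc -c < "$2")
  awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1fx\n", b / a }'
}

# Usage, assuming the unoptimized binary was kept as a `.orig` copy:
#   size_ratio lstm_dga_detection_predict.orig lstm_dga_detection_predict
```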
But there is still room for additional size reduction, either through
further compiler options or more intrusive code modifications. Likely the
largest contributors to code size after the optimizations above are
serialization and the standard libraries:
* Avoiding the use of `data::Load()` (and thus the
  [Cereal](https://uscilab.github.io/cereal/) serialization library) by
  directly saving the weights of the RNN using Armadillo's built-in save
  functionality would be effective.
* Replacing the standard `libc` and `libstdc++` implementations with
  lightweight replacements, like [`musl`](https://musl.libc.org/) and others,
  would also reduce the size.
These further optimizations, however, are beyond the scope of this tutorial.