# Deploy to a Docker container

In many machine learning applications, it can be useful to deploy a model as a
standalone Docker container that can return predictions.  This tutorial shows
how to build a Docker container with an mlpack model serving predictions.  Here
the model serves predictions in a very primitive way: it reads input typed at a
terminal and prints a prediction for each line, but it would be straightforward
to adapt the container to provide a full REST API or similar.

mlpack applications inside of Docker containers can be built in a way that the
resulting container is extremely small---sometimes even less than 1 MB!

*See also*:

 - [Installing mlpack](install.md)
 - [Compile an mlpack program](compile.md)
 - [Deploying mlpack on Windows](deploy_windows.md)
 - [Setting up an mlpack cross-compilation environment](../embedded/supported_boards.md)

## General workflow

To make a Docker container that serves predictions, we must first train a model.
Therefore, our workflow to build this container will be:

 * [Decide on the problem to solve](#problem-statement)
 * [Write a program to train the model](#model-training-program)
 * [Write a program to make predictions with the trained model](#prediction-program)
 * [Build the Docker container with the prediction program](#building-the-container)
 * [Run the Docker container](#run-the-container)

## Problem statement

For this simple example, we will solve a problem from the cybersecurity world:
[DGA detection](https://en.wikipedia.org/wiki/Domain_generation_algorithm).
The example here is based on (and heavily uses the code from) mlpack's
[DGA detection LSTM example](https://github.com/mlpack/examples/tree/master/cpp/lstm/dga_detection).

Malware authors often write malware that communicates with a centralized
command-and-control server.  This can allow the malware to update itself, or to
receive commands from an operator (e.g., 'start Bitcoin mining', or 'lock system
and display ransomware message').

The malware author cannot hardcode a domain name into their malware, because
this would be easily blocked by any antivirus software.  So, instead, a *domain
generation algorithm* is used to generate a series of domain names.  The malware
will try to contact a server at each of these domain names.  The individual DGA
domain names can look very random; for instance, some DGA domains generated by
the `matsnu` malware family are:

 * `brothernerveplacebringconsult.com`
 * `screencatchdishtellproposed.com`
 * `balladoptwelladdinfluence.com`
 * `capitalhuntdealsmokeboxclue.com`

Other malware families may generate much more random-looking names, like
`i828ywu0ywqs.net` or similar.  Since the set of domain names that can be
generated by a DGA is huge, blocking individual domain names is not a realistic
mitigation strategy.  However, we can use machine learning techniques to detect
DGA-generated domain names with a high level of accuracy!

This strategy has been shown to be effective in
[some previous work](https://www.arxiv.org/pdf/1611.00791).

Adapting that approach for simplicity, we will train simple recurrent neural
networks with LSTMs to detect benign domains and DGA domains, and then the
Docker container application will read domain names as input and output a score
indicating the likelihood that the domain name was generated by a DGA.
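
A recurrent network consumes a domain name one character at a time, so each
character first has to be turned into a numeric vector.  The sketch below shows
one common way to do that, one-hot encoding.  It is an illustration only: the
alphabet, `CharIndex()`, and `EncodeDomain()` are assumptions made for this
sketch, not the encoding used by the mlpack example code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical alphabet: letters (case-folded), digits, '-', '.', and one
// catch-all slot for any other character.
constexpr std::size_t alphabetSize = 39;

inline std::size_t CharIndex(const char c)
{
  if (c >= 'a' && c <= 'z') return static_cast<std::size_t>(c - 'a');      // 0..25
  if (c >= 'A' && c <= 'Z') return static_cast<std::size_t>(c - 'A');      // 0..25
  if (c >= '0' && c <= '9') return 26 + static_cast<std::size_t>(c - '0'); // 26..35
  if (c == '-') return 36;
  if (c == '.') return 37;
  return 38; // Anything else.
}

// Produce one one-hot vector of length `alphabetSize` per character.
inline std::vector<std::vector<float>> EncodeDomain(const std::string& domain)
{
  std::vector<std::vector<float>> steps;
  steps.reserve(domain.size());
  for (const char c : domain)
  {
    std::vector<float> oneHot(alphabetSize, 0.0f);
    oneHot[CharIndex(c)] = 1.0f;
    steps.push_back(oneHot);
  }
  return steps;
}
```

Each inner vector corresponds to one time step of the sequence; a real
implementation would typically store the steps in an Armadillo matrix or cube
instead of nested `std::vector`s.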

## Model training program

Training a model can be done as a separate standalone program; since our goal is
just to provide a container that produces predictions, the program in the
container does not need to support training.

The C++ code for training a DGA detector is available as the standalone program
[`lstm_dga_detection_train.cpp`](https://github.com/mlpack/examples/blob/master/cpp/lstm/dga_detection/lstm_dga_detection_train.cpp).
We can compile it with a call to `g++`, following the instructions from
[the compilation guide](compile.md):

```sh
g++ -O3 -o lstm_dga_detection_train lstm_dga_detection_train.cpp -fopenmp -larmadillo
```

Some modification of the command above may be necessary if mlpack is installed
on your system in a nonstandard location, or if you are not using the Armadillo
wrapper.  See [Configuring mlpack with compile-time
definitions](compile.md#configuring-mlpack-with-compile-time-definitions) and
[Linking without the Armadillo
wrapper](compile.md#linking-without-the-armadillo-wrapper) for more details.

Once the program is compiled, we can train on a
[dataset of domain names](https://datasets.mlpack.org/dga_domains.csv.gz).  The
commands below will download the prepared dataset from the mlpack website, and
then run the training process.

```sh
wget https://datasets.mlpack.org/dga_domains.csv.gz
gunzip dga_domains.csv.gz
./lstm_dga_detection_train dga_domains.csv
```

The training process may take a while, but when it is finished, performance
statistics about the models will be printed, and the trained models will be
saved to disk.  The rest of this tutorial uses the two model files
`lstm_dga_detector_benign.bin` and `lstm_dga_detector_malicious.bin`.  The
models should achieve 99%+ accuracy on the held-out test data.

## Prediction program

The examples repository also provides the standalone prediction program
[`lstm_dga_detection_predict.cpp`](https://github.com/mlpack/examples/blob/master/cpp/lstm/dga_detection/lstm_dga_detection_predict.cpp).
We can also compile this with a call to `g++`, following the instructions from
[the compilation guide](compile.md):

```sh
g++ -O3 -o lstm_dga_detection_predict lstm_dga_detection_predict.cpp -fopenmp -larmadillo -static
```

As with the training program [above](#model-training-program), some modification
of the compilation command may be necessary depending on your configuration.

We also specified the `-static` option here, so that the produced program is
statically linked.  This will help us deploy into a Docker container, since we
can just run the program directly and do not need to ensure that supporting
libraries are available.  Note that you will need a statically-compiled version
of Armadillo available to link against, or instead define the compiler option
`-DARMA_DONT_USE_WRAPPER` and link with static OpenBLAS using `-lopenblas`.

The prediction program reads from stdin, and once a domain is entered, a
prediction is computed and the word `malicious` or `benign` is emitted, along
with a score indicating the model's "confidence".  A sample transcript of the
program is below:

```text
$ ./lstm_dga_detection_predict lstm_dga_detector_benign.bin lstm_dga_detector_malicious.bin
www.mlpack.org
benign (score 44.5417)
mdfvkejbqoxg.ru
malicious (score 22.2786)
arma.sourceforge.net
benign (score 19.4883)
11b5n854ublnv152.net
malicious (score 7.89951)
```

## Building the container

Building a Docker container that runs `lstm_dga_detection_predict` is very
simple; we only need to put the prediction program and models in the container.
In fact, because the program is statically linked, we can keep the image small
by using a [distroless](https://github.com/GoogleContainerTools/distroless)
base image.  The following can be used as the `Dockerfile`:

```dockerfile
FROM gcr.io/distroless/static-debian12

ADD lstm_dga_detection_predict .
ADD lstm_dga_detector_benign.bin .
ADD lstm_dga_detector_malicious.bin .

ENTRYPOINT ["./lstm_dga_detection_predict", \
            "lstm_dga_detector_benign.bin", \
            "lstm_dga_detector_malicious.bin"]
```

Building the container is simple (and nearly instantaneous):

```sh
docker build -t lstm_dga_detector .
```

And once it is built, it is easy to see that the container is relatively small:

```text
$ docker images | grep -B 1 lstm_dga_detector
REPOSITORY                          TAG       IMAGE ID       CREATED          SIZE
lstm_dga_detector                   latest    9fc931a2cdae   35 seconds ago   33.7MB
```

However, we haven't even tried to optimize for size---see the [Reducing the size
of the container further](#reducing-the-size-of-the-container-further) section
below.

## Run the container

The container can now be deployed or run in any standard Docker environment
(including on Kubernetes, although that seems like overkill for such a simple
example).

Running the container locally (and interacting with the prediction service) is a
simple command:

```sh
docker run --rm -it lstm_dga_detector
```

Once the container has started, simply type a domain name, hit enter, and a
prediction (plus score) will be printed.

To build the example into a more complex application, you can use
`lstm_dga_detection_predict.cpp` as a starting point.
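
As one small step in that direction, a service would usually return
machine-readable output instead of the plain-text lines printed here.  The
sketch below formats a single prediction as a JSON object.  The field names
and the helper functions `JsonEscape()` and `PredictionToJson()` are
assumptions made for illustration; they are not part of mlpack or of the
example program.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Escape the characters that may not appear raw inside a JSON string.
inline std::string JsonEscape(const std::string& s)
{
  std::string out;
  for (const unsigned char c : s)
  {
    if (c == '"' || c == '\\')
    {
      out += '\\';
      out += static_cast<char>(c);
    }
    else if (c < 0x20)
    {
      // Control characters are written as \u00XX escapes.
      char buf[7];
      std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
      out += buf;
    }
    else
    {
      out += static_cast<char>(c);
    }
  }
  return out;
}

// Hypothetical response format: one JSON object per prediction.
// Assumes `score` is finite (NaN and infinity are not valid JSON numbers).
inline std::string PredictionToJson(const std::string& domain,
                                    const bool malicious,
                                    const double score)
{
  std::ostringstream oss;
  oss << "{\"domain\":\"" << JsonEscape(domain) << "\","
      << "\"label\":\"" << (malicious ? "malicious" : "benign") << "\","
      << "\"score\":" << score << "}";
  return oss.str();
}
```

An HTTP front end could then return the result of `PredictionToJson()` as the
response body for each request.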

## Reducing the size of the container further

Although 33.7 MB for a container is already orders of magnitude smaller than an
equivalent unoptimized Python container, it is readily possible via compilation
options and a few other tricks to get the size to be significantly smaller.  The
vast majority of the size of the container is just the size of the compiled
`lstm_dga_detection_predict`:

```text
$ ls -lh
-rwxrwxr-x 1 ryan ryan  31M Mar  6 20:46 lstm_dga_detection_predict
-rw-rw-r-- 1 ryan ryan  80K Mar  5 19:10 lstm_dga_detector_benign.bin
-rw-rw-r-- 1 ryan ryan  80K Mar  5 19:10 lstm_dga_detector_malicious.bin
```

31 MB for a compiled program is quite large!  But, as it turns out, most of this
is due to dependencies.  We can see this by running a command that does not
perform linking, like this:

```sh
g++ -O3 -c -o lstm_dga_detection_predict.o lstm_dga_detection_predict.cpp
```

This compiled (but not linked) object file is much smaller:

```text
$ ls -lh lstm_dga_detection_predict.o
-rw-rw-r-- 1 ryan ryan 1.6M Mar  6 20:48 lstm_dga_detection_predict.o
```

That implies that our primary size issue is not with our own code, but instead
with our dependencies.  Specifically, on most systems, OpenBLAS is compiled to
support any architecture and this results in the statically-linked OpenBLAS
library being *quite* large.  On a Debian system:

```text
$ cd /usr/lib/x86_64-linux-gnu/openblas-pthread/
$ ls -lh libopenblasp-r0.3.28.a
-rw-r--r-- 1 root root 61M Nov 20 05:52 libopenblasp-r0.3.28.a
```

61 MB is very large!  Although not all of OpenBLAS is used by our DGA detection
program, a significant portion is, and this is the primary culprit for our large
program size.  Two alternatives to reduce this size are:

#### Use reference BLAS and LAPACK instead

The reference BLAS and LAPACK implementations are often slower, but much smaller
in size.

 * On a Debian or Ubuntu system, install `libblas-dev` and `liblapack-dev`
   instead of `libopenblas-dev`.

 * This step alone reduces the size of the statically-linked
   `lstm_dga_detection_predict` to ***3.4 MB***, but the use of reference BLAS
   and LAPACK is likely to be slower than OpenBLAS.

#### Compile OpenBLAS manually

Another option is to compile OpenBLAS manually
[from source](https://github.com/OpenMathLib/OpenBLAS) to reduce its size.
OpenBLAS is compiled with a `make` command; options for this can be found in the
[OpenBLAS manual](http://www.openmathlib.org/OpenBLAS/docs/build_system/#important-variables).

As an example (***note:*** do not use this command directly on your system; see
the description of the options and decide which are right for your system), the
following command produces an OpenBLAS library that is only 16 MB:

```sh
make \
    TARGET=NEHALEM \
    DYNAMIC_ARCH=0 \
    COMMON_OPTS="-Os -ffunction-sections -fdata-sections" \
    NO_SHARED=1 \
    BUILD_DOUBLE=0 \
    BUILD_COMPLEX=0 \
    BUILD_COMPLEX16=0 \
    BUILD_BFLOAT16=0
```

Looking at each option:

 * `TARGET=NEHALEM` and `DYNAMIC_ARCH=0` specify that this version of OpenBLAS
   can ***only*** be run on
   [Nehalem](https://en.wikipedia.org/wiki/Nehalem_(microarchitecture)) or newer
   Intel processors.
   - ***This causes the code to be significantly less portable.***
   - However, the `DYNAMIC_ARCH=0` option *significantly* reduces the size of
     OpenBLAS and therefore downstream code too by compiling only for the
     processor of interest.
   - For other `TARGET` options, see `TargetList.txt` in the OpenBLAS source
     code.

 * `COMMON_OPTS="-Os -ffunction-sections -fdata-sections"` specifies that the
   compiler should aim to keep the compiled code as small as possible, and
   include section information for later stripping and further size minimization
   of code.

 * `NO_SHARED=1` specifies that only the static version of OpenBLAS
   (`libopenblas.a`) should be built.

 * `BUILD_DOUBLE=0 BUILD_COMPLEX=0 BUILD_COMPLEX16=0 BUILD_BFLOAT16=0` disables
   all OpenBLAS functions for data types we are not using in our program.
   - The `lstm_dga_detection_predict.cpp` code only uses `arma::fmat` (i.e.
     matrices with `float`), so we can omit the other functions.

With this stripped-down version of OpenBLAS, the size of
`lstm_dga_detection_predict` with no further compilation modifications is
reduced to ***3.5 MB***.

***Note:*** when compiling `lstm_dga_detection_predict`, linking against the
hand-compiled OpenBLAS will require specifying the `-L/path/to/openblas/` option
so that the linker finds the manually-compiled OpenBLAS version.

### Compilation options for further size reduction

We have gone from 31 MB to roughly 3 MB just by replacing our OpenBLAS
implementation with either reference BLAS/LAPACK or a hand-compiled version.
However, we can specify some additional compiler options to reduce the size
further.  Assuming that we have placed the manually-compiled `libopenblas.a` in
the same directory as `lstm_dga_detection_predict.cpp`, we can compile with the
following command to reduce size even further:

```sh
g++ -o lstm_dga_detection_predict lstm_dga_detection_predict.cpp \
    -Os \
    -DNDEBUG \
    -DARMA_DONT_USE_WRAPPER \
    -ffunction-sections -fdata-sections \
    -static \
    -L. -lopenblas \
    -Wl,--gc-sections \
    -Wl,--strip-all
```

Looking at each option:

 * `-Os` tells the compiler to optimize each function to have a small size.
 * `-DNDEBUG` disables `assert()` and other debug-only checks and code paths.
   It does not by itself remove debugging symbols; `--strip-all` does that.
 * `-DARMA_DONT_USE_WRAPPER` is an Armadillo directive that tells Armadillo not
   to link against the Armadillo runtime library but instead directly against
   OpenBLAS.
   - This is why we use `-lopenblas` later instead of `-larmadillo`.
 * `-ffunction-sections -fdata-sections` are used so that the linker can later
   remove unused sections from the code.
 * `-Wl,--gc-sections -Wl,--strip-all` tell the linker to discard unused
   sections and to strip all symbol information from the final program.
 * Note that we have *omitted* `-fopenmp`; this will cause compiled code to be
   somewhat smaller, but it will execute serially.  Given that our prediction
   program only predicts a single domain at a time, this is acceptable.

This shaves off more than 1 MB:

```text
$ ls -lh lstm_dga_detection_predict
-rwxrwxr-x 1 ryan ryan 1.9M Mar  6 21:37 lstm_dga_detection_predict
```

That is a size reduction of roughly ***16x***, obtained entirely through build
and compiler options.  When the Docker container is rebuilt with the new version
of `lstm_dga_detection_predict`, the size is significantly improved:

```text
$ docker images | grep -B 1 lstm
REPOSITORY                          TAG       IMAGE ID       CREATED          SIZE
lstm_dga_detector                   latest    0be5a677fd9c   6 seconds ago   4.09MB
```

But there is still room for additional size reduction, either through
further compiler options or more intrusive code modifications.  Likely the
largest contributors to code size after the optimizations above are
serialization and the standard libraries:

 * Avoiding the use of `data::Load()` (and thus the
   [Cereal](https://uscilab.github.io/cereal/) serialization library) by
   directly saving the weights of the RNN using Armadillo's built-in save
   functionality would be effective.

 * Replacing the standard `libc` and `libstdc++` implementations with
   lightweight alternatives, like [`musl`](https://musl.libc.org/) and others,
   would also reduce the size.

These further optimizations, however, are beyond the scope of this tutorial.