1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107
|
---
title: "Using docker containers"
description: >
A guide for arrow developers wanting to use docker
output: rmarkdown::html_vignette
---
Arrow is compatible with a huge number of combinations of OSs, OS versions,
compilers, R versions, and other variables. Sometimes these combinations of
variables means that behaviours are found in some environments which cannot be
replicated in others. In addition, there are different ways of building Arrow,
for example, using environment variables to specify the building of optional
components.
What all this means is that you may need to use a different setup to the one in
which you are working, when diagnosing a bug or testing out a new feature which
you have reason to believe may be affected by these variables. One way to do
this is so spin up a Docker image containing the desired setup.
This article provides a basic guide to using Docker in your R development.
## How do I run a Docker container?
There are a number of images which have been created for the convenience of
Arrow devs and you can find them on [the DockerHub repo](https://hub.docker.com/r/apache/arrow-dev/tags).
The code below shows an example command you could use to run a Docker container.
This should be run in the root directory of a checkout of the arrow repo.
```shell
docker run -it -e ARROW_DEPENDENCY_SOURCE=AUTO -v $(pwd):/arrow apache/arrow-dev:r-rhub-ubuntu-release-latest
```
Components:
* `docker run` - command to run the container
* `-it` - run with an interactive terminal so you can run commands on the containers
* `-e ARROW_DEPENDENCY_SOURCE=AUTO` - set the environment variable `ARROW_DEPENDENCY_SOURCE` to the value `AUTO`
* `-v $(pwd):/arrow` - mount the current directory at `/arrow` in the container
* `apache/arrow-dev` - the DockerHub repo to get this container from
* `r-rhub-ubuntu-release-latest` - the image tag
Once you run this command, if you don't have a copy of that particular image
saved locally, it will first be downloaded before a container is spun up.
In the example above, mounting the directory in which the Arrow repo was stored
on the local machine, meant that that code could be built and tested on the
container.
## How do I exit this image?
On Linux, press Ctrl+D.
## How do I show all images saved?
```shell
docker images
```
## How do I show all running containers?
```shell
docker ps
```
## How do I show all containers?
```shell
sudo docker ps -a
```
## Running existing workflows from compose.yaml
There are a number of workflows outlined in the file `compose.yaml` in the
arrow repo root directory. For example, you can use the workflow called `r` to
test building and installing the R package. This is advantageous as you can use
existing utility scripts and install it onto a container which already has R on
it.
These workflows are also parameterized, which means you can specify different
options (or just use the defaults, which can be found in `.env`)
### Example - The manual way
If you wanted to run [RHub's latest `ubuntu-release` image](https://hub.docker.com/r/rhub/ubuntu-release), you could
run:
```shell
R_ORG=rhub R_IMAGE=ubuntu-release R_TAG=latest docker compose build r
R_ORG=rhub R_IMAGE=ubuntu-release R_TAG=latest docker compose run r
```
### Example - Using Archery
Alternatively, you may prefer to use the [Archery tool to run docker images](https://arrow.apache.org/docs/developers/docker.html).
This has the advantage of making it simpler to build some of the existing Arrow
CI jobs which have hierarchical dependencies, and so for example, you could
build the R package on a container which already has the C++ code pre-built.
This is the same tool which our CI uses - via a tool called [Crossbow](https://arrow.apache.org/docs/developers/crossbow.html).
If you want to run the `r` workflow discussed above, you could run:
```shell
R_ORG=rhub R_IMAGE=ubuntu-release R_TAG=latest archery docker run r
```
|