This directory contains the YAML pipeline(s) and a bunch of ancillary scripts
for our gitlab.com CI.
Running CI locally
------------------
Because the gitlab.com jobs run in Docker containers, you can get a pretty
close match to the CI environment locally, which is useful for debugging.
These steps mount your Charliecloud source directory live inside the
container, so you can actually iterate on the code.
WARNING: CHARLIECLOUD DEVELOPERS SHOULD NOT USE THIS AS A SUBSTITUTE FOR A
HOST ENVIRONMENT THAT PASSES TESTS.
To run the test suite interactively in such an environment:
1. Build the Docker image you need, with ./build-images or manually.
2. Get a shell with e.g.:
$ docker run --privileged -it -w /src \
--mount type=bind,src=.,dst=/src ci_debian
NOTE: This assumes that the “docker” command can deal with your web
proxy, if any. See the FAQ.
WARNING re. Git worktrees: The error “fatal: not a git repository” does
not immediately stop the build but will break the tests. In Git
worktrees, .git is not the repository contents but rather a plain file
containing the host absolute path of the repository. Because we only
bind-mounted “.”, this directory is not available in the container. There
are workarounds but I haven’t figured out a good one.
3. Run something pretty close to the “test” job:
> cd /builds/charliecloud/charliecloud
> test/gitlab.com/testjob.sh
4. Or, because you have an interactive shell, you can test whatever. The
script above (and the one it calls) has steps to consider. The container
path contains “/builds/charliecloud/main/bin” for your convenience.
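The worktree problem from step 2 can be spotted on the host before starting the container. The following is a minimal sketch, not part of Charliecloud; the function name “is_worktree” is hypothetical. It relies only on the fact noted above that in a worktree .git is a plain file, whose single line has the form “gitdir: PATH”.

```shell
#!/bin/sh
# Hypothetical helper (not part of Charliecloud): succeed if the given
# directory (default ".") is a Git worktree, i.e. its .git is a plain file
# rather than a directory, and print the host path that file points to.
is_worktree () {
    dir=${1:-.}
    if [ -f "$dir/.git" ]; then
        # The file holds one line of the form "gitdir: /abs/host/path".
        sed -n 's/^gitdir: //p' "$dir/.git"
        return 0
    fi
    return 1
}
```

If it prints a path, that path is the host directory that will be missing inside the container when only “.” is bind-mounted.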
According to the internet, you can also run the actual GitLab runner in a
local Docker container [8]. I have not tested this.
Misc useful things to know
---------------------------
* Each “job” gets its own ephemeral VM [1], which is different from GitHub
Actions. You can pass things between jobs with:
1. “artifacts”, which are basically zip files that the runners pass around
and zip/unzip automatically [2], or
2. the “cache”, which is also a zip file but manipulated using different
rules that seem a bit tricky to me [3]. The cache limit is 5 GiB.
Each job creates at most one artifact.
Notes/gotchas:
1. The Git working directory ($CI_PROJECT_DIR) is *not* automatically
transferred between jobs. Use the “artifacts:untracked” keyword.
2. Only files below the Git working directory may be put in the artifact.
3. Artifacts kind of don’t work with “parallel:matrix”. See below.
* A “pipeline” is a DAG of jobs. A project has one pipeline unless you use
“downstream pipelines” [11], which seemed overly complicated to me.
Therefore we only have one pipeline.yaml, but we have a MODE variable that
gates some of the jobs.
* The “workflow” keyword controls whether the pipeline is created at all,
while “rules” keywords within each job control whether that job is added to
the pipeline.
* Pushing to a branch associated with a MR causes two pipelines:
1. A “branch pipeline” for the tip of the branch, labeled “branch” in the
web UI.
2. A “merge request pipeline” for the result of merging the branch to the
target branch (usually main), labeled “merge request”.
We want the second because this will catch bugs introduced by the new code
as well as the merge itself. Therefore we exclude the first in the
“workflow” keyword [12].
Notes/gotchas:
1. If the branch doesn’t merge automatically, there is no pipeline at all.
2. We have no access to the actual commit message, only a fake one
generated by GitLab for the merge commit.
* Within the ephemeral VM, each job *also* runs inside a Docker container
using an image you select. Notably, this is “--privileged” Docker, so
“Docker-In-Docker” (DIND) works, as does Charliecloud [4].
In our pipeline, we first build some container images suitable for building
and running Charliecloud (if they need updating), then use those for the
rest of the pipeline.
Dockerfile notes/gotchas:
* These Dockerfiles must use HTTP(S) transport, not SSH, because CI (and
also maybe Docker) will not have your SSH keys, which are needed for
pulling even public repositories.
* Unless otherwise specified, we use the most recent tagged version of
dependencies.
* When iterating, run build-images locally to take advantage of your
Docker cache (the CI workers start fresh every time).
* Writing shell scripts within YAML is tricky, mostly because of nasty
interactions between shell and YAML quoting [5]. We have various workarounds
for this.
* Standard output and standard error lines can be out of order (e.g. “set -x”
traces).
* The GitLab web UI *does* interpret ANSI escape codes.
* While we do define stages, if a job specifies “needs”, then it runs when all
the needed jobs finish, regardless of stage [10]. Otherwise, it runs when
all jobs in the prior stage finish.
FIXME — no artifact transfer between corresponding parallel jobs
- generate the .yml file (via “trigger”?)
- put Charliecloud build/test in the same job so no artifact needed
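The Dockerfile note above about HTTP(S) versus SSH transport can be checked mechanically before pushing. This is a rough sketch only: the function name “find_ssh_urls” is hypothetical, nothing like it exists in this directory, and the pattern is an assumption about what SSH-style Git URLs usually look like (“ssh://...” or “git@host:...”), so it may miss or over-match unusual cases.

```shell
#!/bin/sh
# Hypothetical lint helper (not part of this directory): print the lines in
# the given Dockerfiles that look like SSH-transport Git URLs.
# Exit status follows grep: 0 if any such line is found, 1 if none.
find_ssh_urls () {
    # /dev/null is listed first so grep always prefixes matches with the
    # file name, even when only one Dockerfile is given.
    grep -nE '(ssh://|git@[A-Za-z0-9._-]+:)' /dev/null "$@"
}
```

For example, “find_ssh_urls *.df” run in this directory would list any offending lines as FILE:LINE:TEXT.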
Matrix
------
GitLab runs jobs in parallel using a “matrix”, which is a set of variable-value
vectors specified using Cartesian products or one-off entries (i.e., it’s a
sparse matrix so we are good citizens with CI resources and don’t clutter the
results with distracting values).
Notably, matrix jobs cannot pass a single artifact to a later, corresponding
matrix job [6]. For this reason, we build Charliecloud and run the entire test
suite in a single job. Some of the jobs exit after building Charliecloud.
Alternatives include:
1. Generate some/all of the .yml files (via “trigger”?), which would allow
us to have parallel jobs via generated names rather than matrix. This
seemed too complicated for the benefit.
2. Download and unpack artifacts via the API (in “script”) rather than YAML
directives. This also seemed too complicated, especially since we need to
preserve file metadata.
Notes/gotchas:
1. Values are strings and cannot be duplicates [7]. This leads to some
contrived values.
2. All-caps variables prefixed by “CH_” are used by Charliecloud, while
lower-case “ci_” are used by this pipeline. Hence the more descriptive
values for lower-case.
Matrix dimensions with variable names and possible values:
1. CH_TEST_BUILDER (ch-image, docker)
2. CH_IMAGE_CACHE (enabled, rebuild, disabled)
3. CH_TEST_PACK_FMT (squash-mount, squash-unpack, tar-unpack)
4. ci_arch (amd64, arm64)
Architecture. Values need to match the gitlab.com runner tags [9].
5. ci_distro (see *.df)
Linux distribution. The main considerations are compiler and libc.
6. ci_sudo (sudo-yes, sudo-no)
Whether the “charlie” user that the tests run as has sudo for the test
suite. The images set up the user with sudo and then “before_script”
deletes it if appropriate.
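To see why the matrix is kept sparse, it helps to count what a full Cartesian product of the dimensions above would produce. The sketch below is illustrative only and is not how the pipeline defines its matrix; the function name “matrix_full” is hypothetical. It crosses every listed value with every other and leaves out ci_distro, whose values live in the *.df files rather than in this document.

```shell
#!/bin/sh
# Hypothetical illustration (not used by the pipeline): print every
# combination of the five dimensions whose values are listed above, one
# combination per line. ci_distro is omitted; see the *.df files.
matrix_full () {
    for builder in ch-image docker; do
        for cache in enabled rebuild disabled; do
            for pack in squash-mount squash-unpack tar-unpack; do
                for arch in amd64 arm64; do
                    for sudo_mode in sudo-yes sudo-no; do
                        echo "$builder $cache $pack $arch $sudo_mode"
                    done
                done
            done
        done
    done
}

# Count the combinations: 2 * 3 * 3 * 2 * 2 = 72.
matrix_full | grep -c ''
```

That is 72 jobs for each distribution if nothing were pruned, which is the clutter and resource use the sparse matrix avoids.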
References
----------
[1]: https://docs.gitlab.com/ee/ci/runners
[2]: https://docs.gitlab.com/ee/ci/jobs/job_artifacts.html
[3]: https://docs.gitlab.com/ee/ci/caching
[4]: https://hub.docker.com/_/docker
[5]: https://yaml-multiline.info
[6]: https://forum.gitlab.com/t/how-to-matrix-subsequent-jobs-referencing-artifacts-from-previous-matrix-job/66849/2
[7]: https://docs.gitlab.com/ee/ci/yaml/index.html#parallelmatrix
[8]: https://stackoverflow.com/a/65920577
[9]: https://docs.gitlab.com/ee/ci/runners/hosted_runners/linux.html
[10]: https://docs.gitlab.com/ee/ci/yaml/needs.html
[11]: https://docs.gitlab.com/ee/ci/pipelines/downstream_pipelines.html
[12]: https://docs.gitlab.com/ci/yaml/workflow/#switch-between-branch-pipelines-and-merge-request-pipelines
--
LocalWords: parallelmatrix FMT ci needsparallelmatrix readonly dst testjob