---
stage: Deploy
group: Environments
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments
---

# Troubleshooting the GitLab agent for Kubernetes

When you are using the GitLab agent for Kubernetes, you might experience issues you need to troubleshoot.

You can start by viewing the service logs:

```shell
kubectl logs -f -l=app.kubernetes.io/name=gitlab-agent -n gitlab-agent
```

If you are a GitLab administrator, you can also view the [GitLab agent server logs](../../../administration/clusters/kas.md#troubleshooting).

## Transport: Error while dialing failed to WebSocket dial

```json
{
  "level": "warn",
  "time": "2020-11-04T10:14:39.368Z",
  "msg": "GetConfiguration failed",
  "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing failed to WebSocket dial: failed to send handshake request: Get \\\"https://gitlab-kas:443/-/kubernetes-agent\\\": dial tcp: lookup gitlab-kas on 10.60.0.10:53: no such host\""
}
```

This error occurs when there are connectivity issues between the `kas-address`
and your agent pod. To fix this issue, make sure the `kas-address` is accurate.
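
To check whether the host in `kas-address` resolves, it can help to isolate the host name first. The following is a minimal sketch, not part of the agent tooling: `kas_host` is a hypothetical helper, and it assumes an address of the form `scheme://host[:port]/path` with no IPv6 literal or embedded credentials.

```shell
# Hypothetical helper: print only the host part of a kas-address.
# Assumes the form scheme://host[:port]/path (no IPv6 literal, no user info).
kas_host() {
  address=${1#*://}               # strip the scheme, for example "wss://"
  address=${address%%/*}          # strip the path
  printf '%s\n' "${address%%:*}"  # strip the port, if any
}

# Example: kas_host "wss://gitlab-kas:443/-/kubernetes-agent/" prints "gitlab-kas"
```

You can then try to resolve the printed name from a pod in the same cluster, for example with `nslookup`, if the image you use provides it.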

```json
{
  "level": "error",
  "time": "2021-06-25T21:15:45.335Z",
  "msg": "Reverse tunnel",
  "mod_name": "reverse_tunnel",
  "error": "Connect(): rpc error: code = Unavailable desc = connection error: desc= \"transport: Error while dialing failed to WebSocket dial: expected handshake response status code 101 but got 301\""
}
```

This error occurs when the `kas-address` doesn't include a trailing slash. To fix this issue, make sure that the
`wss` or `ws` URL ends with a trailing slash, like `wss://GitLab.host.tld:443/-/kubernetes-agent/`
or `ws://GitLab.host.tld:80/-/kubernetes-agent/`.
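
As a quick pre-check before you deploy, the trailing-slash rule can be sketched as a small shell function. `check_kas_address` is a hypothetical name, and this sketch recognizes only the `ws` and `wss` schemes discussed in this section.

```shell
# Hypothetical helper: classify a kas-address by scheme and trailing slash.
# Only the ws and wss schemes discussed in this section are recognized.
check_kas_address() {
  case "$1" in
    ws://*/ | wss://*/) echo "ok" ;;
    ws://* | wss://*)   echo "missing trailing slash" ;;
    *)                  echo "not a ws or wss URL" ;;
  esac
}

# Example: check_kas_address "wss://GitLab.host.tld:443/-/kubernetes-agent"
# prints "missing trailing slash"
```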

## Error while dialing failed to WebSocket dial: failed to send handshake request

```json
{
  "level": "warn",
  "time": "2020-10-30T09:50:51.173Z",
  "msg": "GetConfiguration failed",
  "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing failed to WebSocket dial: failed to send handshake request: Get \\\"https://GitLabhost.tld:443/-/kubernetes-agent\\\": net/http: HTTP/1.x transport connection broken: malformed HTTP response \\\"\\\\x00\\\\x00\\\\x06\\\\x04\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x05\\\\x00\\\\x00@\\\\x00\\\"\""
}
```

This error occurs when the `kas-address` on the agent side uses the `wss` scheme,
but the agent server does not serve `wss`. To fix this issue, make sure the
same scheme is configured on both sides.

## Decompressor is not installed for grpc-encoding

```json
{
  "level": "warn",
  "time": "2020-11-05T05:25:46.916Z",
  "msg": "GetConfiguration.Recv failed",
  "error": "rpc error: code = Unimplemented desc = grpc: Decompressor is not installed for grpc-encoding \"gzip\""
}
```

This error occurs when the version of the agent is newer than the version of the agent server (KAS).
To fix it, make sure that both `agentk` and the agent server are the same version.

## Certificate signed by unknown authority

```json
{
  "level": "error",
  "time": "2021-02-25T07:22:37.158Z",
  "msg": "Reverse tunnel",
  "mod_name": "reverse_tunnel",
  "error": "Connect(): rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing failed to WebSocket dial: failed to send handshake request: Get \\\"https://GitLabhost.tld:443/-/kubernetes-agent/\\\": x509: certificate signed by unknown authority\""
}
```

This error occurs when your GitLab instance is using a certificate signed by an internal
certificate authority that is unknown to the agent.

To fix this issue, you can present the CA certificate file to the agent
by [customizing the Helm installation](install/index.md#customize-the-helm-installation).
Add `--set-file config.kasCaCert=my-custom-ca.pem` to the `helm install` command. The file should be a valid PEM or DER-encoded certificate.
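
Before you pass the file to Helm, you might want to confirm that it looks like a certificate. The following is a rough sketch only: `ca_cert_format` is a hypothetical helper that checks for a PEM header and nothing more, so a file it reports as having no PEM header might still be a valid DER-encoded certificate.

```shell
# Hypothetical helper: report whether a file contains a PEM certificate header.
# A file without the header might still be a valid DER-encoded certificate.
ca_cert_format() {
  if [ ! -r "$1" ]; then
    echo "unreadable"
  elif grep -q -e '-----BEGIN CERTIFICATE-----' "$1"; then
    echo "pem"
  else
    echo "no PEM header"
  fi
}

# Example: ca_cert_format my-custom-ca.pem
```

If `openssl` is available, `openssl x509 -in my-custom-ca.pem -noout -subject` is a stricter check for PEM files, because it parses the certificate instead of only looking for the header.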

When you deploy `agentk` with a set `config.kasCaCert` value, the certificate is added to `configmap` and the certificate file is mounted in `/etc/ssl/certs`.

```yaml
# Output of: kubectl get configmap -lapp=gitlab-agent -o yaml
apiVersion: v1
items:
- apiVersion: v1
  data:
    ca.crt: |-
      -----BEGIN CERTIFICATE-----
      MIIFmzCCA4OgAwIBAgIUE+FvXfDpJ869UgJitjRX7HHT84cwDQYJKoZIhvcNAQEL
      ...truncated certificate...
      GHZCTQkbQyUwBWJOUyOxW1lro4hWqtP4xLj8Dpq1jfopH72h0qTGkX0XhFGiSaM=
      -----END CERTIFICATE-----
  kind: ConfigMap
  metadata:
    annotations:
      meta.helm.sh/release-name: self-signed
      meta.helm.sh/release-namespace: gitlab-agent-self-signed
    creationTimestamp: "2023-03-07T20:12:26Z"
    labels:
      app: gitlab-agent
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: gitlab-agent
      app.kubernetes.io/version: v15.9.0
      helm.sh/chart: gitlab-agent-1.11.0
    name: self-signed-gitlab-agent
    resourceVersion: "263184207"
kind: List
```

You might see a similar error in the [agent server (KAS) logs](../../../administration/logs/index.md#gitlab-agent-server) of your GitLab application server:

```json
{"level":"error","time":"2023-03-07T20:19:48.151Z","msg":"AgentInfo()","grpc_service":"gitlab.agent.agent_configuration.rpc.AgentConfiguration","grpc_method":"GetConfiguration","error":"Get \"https://gitlab.example.com/api/v4/internal/kubernetes/agent_info\": x509: certificate signed by unknown authority"}
```

To fix it, [install the public certificate of your internal CA](https://docs.gitlab.com/omnibus/settings/ssl/#install-custom-public-certificates) in the `/etc/gitlab/trusted-certs` directory.

Alternatively, you can configure the agent server (KAS) to read the certificate from a custom directory.
Add the following configuration to `/etc/gitlab/gitlab.rb`:

```ruby
gitlab_kas['env'] = {
  'SSL_CERT_DIR' => "/opt/gitlab/embedded/ssl/certs/"
}
```

To apply the changes:

1. Reconfigure GitLab.

   ```shell
   sudo gitlab-ctl reconfigure
   ```

1. Restart `gitlab-kas`.

   ```shell
   sudo gitlab-ctl restart gitlab-kas
   ```

## Failed to perform vulnerability scan on workload: jobs.batch already exists

```json
{
  "level": "error",
  "time": "2022-06-22T21:03:04.769Z",
  "msg": "Failed to perform vulnerability scan on workload",
  "mod_name": "starboard_vulnerability",
  "error": "running scan job: creating job: jobs.batch \"scan-vulnerabilityreport-b8d497769\" already exists"
}
```

The GitLab agent performs vulnerability scans by creating a job to scan each workload. If a scan
is interrupted, these jobs may be left behind and need to be cleaned up before more jobs can
be run. You can clean up these jobs by running:

```shell
kubectl delete jobs -l app.kubernetes.io/managed-by=starboard -n gitlab-agent
```

[We're working on making the cleanup of these jobs more robust.](https://gitlab.com/gitlab-org/gitlab/-/issues/362016)

## Parse error during installation

When you install the agent, you might encounter an error that states:

```shell
Error: parse error at (gitlab-agent/templates/observability-secret.yaml:1): unclosed action
```

This error is typically caused by an incompatible version of Helm. To resolve the issue, ensure that you are using a version of Helm [compatible with your version of Kubernetes](index.md#supported-kubernetes-versions-for-gitlab-features).

## `GitLab Agent Server: Unauthorized` error on Dashboard for Kubernetes

An error like `GitLab Agent Server: Unauthorized. Trace ID: <...>`
on the [Dashboard for Kubernetes](../../../ci/environments/kubernetes_dashboard.md) page
might be caused by one of the following:

- The `user_access` entry in the agent configuration file doesn't exist or is wrong.
  To resolve, see [Grant users Kubernetes access](user_access.md).
- There are multiple [`_gitlab_kas` cookies](../../../administration/clusters/kas.md#kubernetes-api-proxy-cookie)
  in the browser and sent to KAS. The most likely cause is multiple GitLab instances hosted
  on the same site.

  For example, `gitlab.com` sets a `_gitlab_kas` cookie targeted at `kas.gitlab.com`,
  but the cookie is also sent to `kas.staging.gitlab.com`, which causes the error on `staging.gitlab.com`.

  To temporarily resolve, delete the `_gitlab_kas` cookie for `gitlab.com` from the browser cookie store.
  [Issue 418998](https://gitlab.com/gitlab-org/gitlab/-/issues/418998) proposes a fix for this known issue.
- GitLab and KAS run on different sites. For example, GitLab on `gitlab.example.com` and KAS on `kas.example.com`.
  GitLab does not support this use case. For details, see [issue 416436](https://gitlab.com/gitlab-org/gitlab/-/issues/416436).

## Agent version mismatch

In GitLab, on the **Agent** tab of the Kubernetes clusters page, you might see
a warning that says `Agent version mismatch: The agent versions do not match each other across your cluster's pods.`

This warning might be caused by an older version of the agent being cached by the agent server for Kubernetes (`kas`).
Because `kas` periodically deletes outdated agent versions, you should wait at least 20 minutes for the agent
and GitLab to reconcile.

If the warning persists, update the agent installed on your cluster.
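
To see whether the pods really run different agent images, you can count the distinct image references in the cluster. This is a minimal sketch: `count_distinct_images` is a hypothetical helper, and the namespace and label in the usage comment are assumed to match the log command at the top of this page.

```shell
# Hypothetical helper: read one image reference per line on stdin and
# print the number of distinct, non-empty references.
count_distinct_images() {
  sort -u | grep -c .
}

# Possible usage; the namespace and label are assumptions taken from the
# log command at the top of this page:
# kubectl get pods -n gitlab-agent -l app.kubernetes.io/name=gitlab-agent \
#   -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' \
#   | count_distinct_images
```

A result greater than `1` suggests that the pods run different images, so the warning is probably not caused by caching alone.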