File: moving_repositories.md

package info (click to toggle)
gitlab 17.6.5-19
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 629,368 kB
  • sloc: ruby: 1,915,304; javascript: 557,307; sql: 60,639; xml: 6,509; sh: 4,567; makefile: 1,239; python: 406
file content (259 lines) | stat: -rw-r--r-- 12,174 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
---
stage: Systems
group: Gitaly
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments
---

# Moving repositories managed by GitLab

DETAILS:
**Tier:** Free, Premium, Ultimate
**Offering:** Self-managed

You can move all repositories managed by GitLab to another file system or another server.

## Moving data in a GitLab instance

The GitLab API is the recommended way to move Git repositories:

- Between servers.
- Between different storage.
- From single-node Gitaly to Gitaly Cluster.

For more information, see:

- [Configuring additional storage for Gitaly](../gitaly/configure_gitaly.md#network-architecture). This
  example configures additional storage called `storage1` and `storage2`.
- [The API documentation](../../api/project_repository_storage_moves.md) details the endpoints for
  querying and scheduling project repository moves.
- [The API documentation](../../api/snippet_repository_storage_moves.md) details the endpoints for
  querying and scheduling snippet repository moves.
- [The API documentation](../../api/group_repository_storage_moves.md) details the endpoints for
  querying and scheduling group repository moves.
- [Migrate to Gitaly Cluster](../gitaly/index.md#migrate-to-gitaly-cluster).

### Moving Repositories

GitLab repositories can be associated with projects, groups, and snippets. Each of these types
has a separate API to schedule the respective repositories to move. To move all repositories
on a GitLab instance, each of these types must be scheduled to move for each storage.

Each repository is made read-only for the duration of the move. The repository is not writable
until the move has completed.

To move repositories:

1. Ensure all [local and cluster storages](../gitaly/configure_gitaly.md#mixed-configuration) are accessible to the GitLab instance. In
   this example, these are `<original_storage_name>` and `<cluster_storage_name>`.
1. [Configure repository storage weights](../repository_storage_paths.md#configure-where-new-repositories-are-stored)
   so that the new storages receives all new projects. This stops new projects from being created
   on existing storages while the migration is in progress.
1. Schedule repository moves for:
   - [All projects](#move-all-projects) or
     [individual projects](../../api/project_repository_storage_moves.md#schedule-a-repository-storage-move-for-a-project).
   - [All snippets](#move-all-snippets) or
     [individual snippets](../../api/snippet_repository_storage_moves.md#schedule-a-repository-storage-move-for-a-snippet).
   - [All groups](#move-all-groups) or
     [individual groups](../../api/group_repository_storage_moves.md#schedule-a-repository-storage-move-for-a-group).
1. If [Geo](../geo/index.md) is enabled,
   [resync all repositories](../geo/replication/troubleshooting/synchronization_verification.md#queue-up-all-repositories-for-resync).

#### Move all projects

To move all projects by using the API:

1. [Schedule repository storage moves for all projects on a storage shard](../../api/project_repository_storage_moves.md#schedule-repository-storage-moves-for-all-projects-on-a-storage-shard)
   using the API. For example:

   ```shell
   curl --request POST --header "Private-Token: <your_access_token>" \
        --header "Content-Type: application/json" \
        --data '{"source_storage_name":"<original_storage_name>","destination_storage_name":"<cluster_storage_name>"}' \
        "https://gitlab.example.com/api/v4/project_repository_storage_moves"
   ```

1. [Query the most recent repository moves](../../api/project_repository_storage_moves.md#retrieve-all-project-repository-storage-moves)
   using the API. The response indicates either:
   - The moves have completed successfully. The `state` field is `finished`.
   - The moves are in progress. Re-query the repository move until it completes successfully.
   - The moves have failed. Most failures are temporary and are solved by rescheduling the move.

1. After the moves are complete, use the API to [query projects](../../api/projects.md#list-all-projects) and confirm that all projects have moved. None of the projects should be returned with the
   `repository_storage` field set to the old storage. For example:

   ```shell
   curl --header "Private-Token: <your_access_token>" --header "Content-Type: application/json" \
   "https://gitlab.example.com/api/v4/projects?repository_storage=<original_storage_name>"
   ```

   Alternatively use [the rails console](../operations/rails_console.md) to confirm that all
   projects have moved. Run the following in the rails console:

   ```ruby
   ProjectRepository.for_repository_storage('<original_storage_name>')
   ```

1. Repeat for each storage as required.

#### Move all snippets

To move all snippets by using the API:

1. [Schedule repository storage moves for all snippets on a storage shard](../../api/snippet_repository_storage_moves.md#schedule-repository-storage-moves-for-all-snippets-on-a-storage-shard). For example:

   ```shell
   curl --request POST --header "PRIVATE-TOKEN: <your_access_token>" \
        --header "Content-Type: application/json" \
        --data '{"source_storage_name":"<original_storage_name>","destination_storage_name":"<cluster_storage_name>"}' \
        "https://gitlab.example.com/api/v4/snippet_repository_storage_moves"
   ```

1. [Query the most recent repository moves](../../api/snippet_repository_storage_moves.md#retrieve-all-snippet-repository-storage-moves).
   The response indicates either:
   - The moves have completed successfully. The `state` field is `finished`.
   - The moves are in progress. Re-query the repository move until it completes successfully.
   - The moves have failed. Most failures are temporary and are solved by rescheduling the move.

1. After the moves are complete, use [the rails console](../operations/rails_console.md) to confirm
   that all snippets have moved. No snippets should be returned for the original storage. Run the
   following in the rails console:

   ```ruby
   SnippetRepository.for_repository_storage('<original_storage_name>')
   ```

1. Repeat for each storage as required.

#### Move all groups

DETAILS:
**Tier:** Premium, Ultimate
**Offering:** Self-managed

To move all groups by using the API:

1. [Schedule repository storage moves for all groups on a storage shard](../../api/group_repository_storage_moves.md#schedule-repository-storage-moves-for-all-groups-on-a-storage-shard).
   For example:

   ```shell
   curl --request POST --header "PRIVATE-TOKEN: <your_access_token>" \
        --header "Content-Type: application/json" \
        --data '{"source_storage_name":"<original_storage_name>","destination_storage_name":"<cluster_storage_name>"}' \
        "https://gitlab.example.com/api/v4/group_repository_storage_moves"
   ```

1. [Query the most recent repository moves](../../api/group_repository_storage_moves.md#retrieve-all-group-repository-storage-moves).
   The response indicates either:
   - The moves have completed successfully. The `state` field is `finished`.
   - The moves are in progress. Re-query the repository move until it completes successfully.
   - The moves have failed. Most failures are temporary and are solved by rescheduling the move.

1. After the moves are complete, use [the rails console](../operations/rails_console.md) to confirm
   that all groups have moved. No groups should be returned for the original storage. Run the
   following in the rails console:

   ```ruby
   GroupWikiRepository.for_repository_storage('<original_storage_name>')
   ```

1. Repeat for each storage as required.

## Migrating to another GitLab instance

[Using the API](#moving-data-in-a-gitlab-instance) isn't an option if you are migrating to a new
GitLab environment, for example:

- From a single-node GitLab to a scaled-out architecture.
- From a GitLab instance in your private data center to a cloud provider.

The rest of the document looks
at some of the ways you can copy all your repositories from
`/var/opt/gitlab/git-data/repositories` to `/mnt/gitlab/repositories`.

We look at three scenarios:

- The target directory is empty.
- The target directory contains an outdated copy of the repositories.
- How to deal with thousands of repositories.

WARNING:
Each of the approaches we list can or does overwrite data in the target directory
`/mnt/gitlab/repositories`. Do not mix up the source and the target.

### Recommended approach in all cases

For either Gitaly or Gitaly Cluster targets, the GitLab [backup and restore capability](../../administration/backup_restore/index.md)
should be used. Git repositories are accessed, managed, and stored on GitLab servers by Gitaly as a database. Data loss
can result from directly accessing and copying Gitaly files using tools like `rsync`.

- Backup performance can be improved by
  [processing multiple repositories concurrently](../../administration/backup_restore/backup_gitlab.md#back-up-git-repositories-concurrently).
- Backups can be created of just the repositories using the
  [skip feature](../../administration/backup_restore/backup_gitlab.md#excluding-specific-data-from-the-backup).

No other method works for Gitaly Cluster targets.

### Target directory is empty: use a `tar` pipe

For Gitaly targets (use [recommended approach](#recommended-approach-in-all-cases) for Gitaly Cluster targets), if the
target directory `/mnt/gitlab/repositories` is empty the simplest thing to do is to use a `tar` pipe. This method has
low overhead and `tar` is almost always already installed on your system.

However, it is not possible to resume an interrupted `tar` pipe; if that happens then all data must be copied again.

```shell
sudo -u git sh -c 'tar -C /var/opt/gitlab/git-data/repositories -cf - -- . |\
  tar -C /mnt/gitlab/repositories -xf -'
```

If you want to see progress, replace `-xf` with `-xvf`.

#### `tar` pipe to another server

For Gitaly targets (use [recommended approach](#recommended-approach-in-all-cases) for Gitaly Cluster targets), you can
also use a `tar` pipe to copy data to another server. If your `git` user has SSH access to the new server as
`git@newserver`, you can pipe the data through SSH.

```shell
sudo -u git sh -c 'tar -C /var/opt/gitlab/git-data/repositories -cf - -- . |\
  ssh git@newserver tar -C /mnt/gitlab/repositories -xf -'
```

If you want to compress the data before it goes over the network
(which costs you CPU cycles) you can replace `ssh` with `ssh -C`.

### The target directory contains an outdated copy of the repositories: use `rsync`

WARNING:
Using `rsync` to migrate Git data can cause data loss and repository corruption.
[These instructions are being reviewed](https://gitlab.com/gitlab-org/gitlab/-/issues/270422).

If the target directory already contains a partial or outdated copy of the repositories it may be wasteful to copy all
the data again with `tar`. In this scenario it is better to use `rsync` for Gitaly targets (use
[recommended approach](#recommended-approach-in-all-cases) for Gitaly Cluster targets).

This utility is either already installed on your system, or installable using `apt` or `yum`.

```shell
sudo -u git  sh -c 'rsync -a --delete /var/opt/gitlab/git-data/repositories/. \
  /mnt/gitlab/repositories'
```

The `/.` in the command above is very important, without it you can
get the wrong directory structure in the target directory.
If you want to see progress, replace `-a` with `-av`.

#### Single `rsync` to another server

WARNING:
Using `rsync` to migrate Git data can cause data loss and repository corruption.
[These instructions are being reviewed](https://gitlab.com/gitlab-org/gitlab/-/issues/270422).

For Gitaly targets (use [recommended approach](#recommended-approach-in-all-cases) for Gitaly Cluster targets), if the
`git` user on your source system has SSH access to the target server you can send the repositories over the network with
`rsync`.

```shell
sudo -u git sh -c 'rsync -a --delete /var/opt/gitlab/git-data/repositories/. \
  git@newserver:/mnt/gitlab/repositories'
```