File: examples-from-user-filed-issues.md

# Examples from user-filed issues

Lots of people have filed issues against git-filter-repo, and many times their
issue boils down to a question of "How do I?" or "Why doesn't this work?"

Below is a collection of example repository filterings that answer those
questions and may be of interest to others.

## Table of Contents

  * [Adding files to root commits](#adding-files-to-root-commits)
  * [Purge a large list of files](#purge-a-large-list-of-files)
  * [Extracting a library from a repo](#Extracting-a-library-from-a-repo)
  * [Replace words in all commit messages](#Replace-words-in-all-commit-messages)
  * [Only keep files from two branches](#Only-keep-files-from-two-branches)
  * [Renormalize end-of-line characters and add a .gitattributes](#Renormalize-end-of-line-characters-and-add-a-gitattributes)
  * [Remove spaces at the end of lines](#Remove-spaces-at-the-end-of-lines)
  * [Having both exclude and include rules for filenames](#Having-both-exclude-and-include-rules-for-filenames)
  * [Removing paths with a certain extension](#Removing-paths-with-a-certain-extension)
  * [Removing a directory](#Removing-a-directory)
  * [Convert from NFD filenames to NFC](#Convert-from-NFD-filenames-to-NFC)
  * [Set the committer of the last few commits to myself](#Set-the-committer-of-the-last-few-commits-to-myself)
  * [Handling special characters, e.g. accents and umlauts in names](#Handling-special-characters-eg-accents-and-umlauts-in-names)
  * [Handling repository corruption](#Handling-repository-corruption)
  * [Removing all files with a backslash in them](#Removing-all-files-with-a-backslash-in-them)
  * [Replace a binary blob in history](#Replace-a-binary-blob-in-history)
  * [Remove commits older than N days](#Remove-commits-older-than-N-days)
  * [Replacing pngs with compressed alternative](#Replacing-pngs-with-compressed-alternative)
  * [Updating submodule hashes](#Updating-submodule-hashes)
  * [Using multi-line strings in callbacks](#Using-multi-line-strings-in-callbacks)


## Adding files to root commits

<!-- https://github.com/newren/git-filter-repo/issues/21 -->

Here's an example that will take `/path/to/existing/README.md` and
store it as `README.md` in the repository, and take
`/home/myusers/mymodule.gitignore` and store it as `src/.gitignore` in
the repository:

```
git filter-repo --commit-callback "if not commit.parents: commit.file_changes += [
    FileChange(b'M', b'README.md', b'$(git hash-object -w '/path/to/existing/README.md')', b'100644'), 
    FileChange(b'M', b'src/.gitignore', b'$(git hash-object -w '/home/myusers/mymodule.gitignore')', b'100644')]"
```

Alternatively, you could also use the [insert-beginning](../contrib/filter-repo-demos/insert-beginning) contrib script:

```
mv /path/to/existing/README.md README.md
mv /home/myusers/mymodule.gitignore src/.gitignore
insert-beginning --file README.md
insert-beginning --file src/.gitignore
```

## Purge a large list of files

<!-- https://github.com/newren/git-filter-repo/issues/63 -->

List all the paths you want purged in some file (one per line),
e.g. `../DELETED_FILENAMES.txt`, and then run

```
git filter-repo --invert-paths --paths-from-file ../DELETED_FILENAMES.txt
```

## Extracting a library from a repo

<!-- https://github.com/newren/git-filter-repo/issues/80 -->

If you want to pick out some subdirectory to keep
(e.g. `src/some-folder/some-feature/`), but don't want it moved to the
repository root (so that `--subdirectory-filter` isn't applicable) and
instead want it to become some other higher-level directory
(e.g. `src/`):

```
git filter-repo \
    --path src/some-folder/some-feature/ \
    --path-rename src/some-folder/some-feature/:src/
```

## Replace words in all commit messages

<!-- https://github.com/newren/git-filter-repo/issues/83 -->

Replace "stuff" in any commit message with "task".

```
git filter-repo --message-callback 'return message.replace(b"stuff", b"task")'
```

## Only keep files from two branches

<!-- https://github.com/newren/git-filter-repo/issues/91 -->

Let's say you know that the files currently present on two branches
are the only files that matter.  Files that used to exist in either of
these branches, or files that only exist on some other branch, should
all be deleted from all versions of history.  This can be accomplished
by getting a list of files from each branch, combining them, sorting
the list and picking out just the unique entries, then passing the
result to `--paths-from-file`:

```
git ls-tree -r --name-only ${BRANCH1} >../my-files
git ls-tree -r --name-only ${BRANCH2} >>../my-files
sort ../my-files | uniq >../my-relevant-files
git filter-repo --paths-from-file ../my-relevant-files
```

## Renormalize end-of-line characters and add a .gitattributes

<!-- https://github.com/newren/git-filter-repo/issues/122 -->

```
contrib/filter-repo-demos/lint-history dos2unix
[edit .gitattributes]
contrib/filter-repo-demos/insert-beginning --file .gitattributes
```

## Remove spaces at the end of lines

<!-- https://github.com/newren/git-filter-repo/issues/145 -->

Removing all spaces at the end of lines of non-binary files, including
converting CRLF to LF:

```
git filter-repo --replace-text <(echo 'regex:[\r\t ]+(\n|$)==>\n')
```
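
To get a feel for what that expression matches, here is a rough sketch of the same substitution in plain Python. It only uses the standard `re` module; the function name `strip_trailing_whitespace` is made up for illustration, and this is not filter-repo's own code.

```python
import re

# Same pattern as in the --replace-text rule above: a run of CR, tab, or
# space characters that is followed by a newline or by the end of the data.
TRAILING_WS = re.compile(rb"[\r\t ]+(\n|$)")


def strip_trailing_whitespace(data: bytes) -> bytes:
    """Replace each trailing-whitespace run (plus its line ending) with one LF."""
    return TRAILING_WS.sub(rb"\n", data)


if __name__ == "__main__":
    sample = b"first line  \r\nsecond line\t\nthird line\n"
    print(strip_trailing_whitespace(sample))
```

Since the replacement is always a newline, data that ends in whitespace without a final newline gains one in this sketch.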

## Having both exclude and include rules for filenames

<!-- https://github.com/newren/git-filter-repo/issues/230 -->

If you want to have rules to both include and exclude filenames, you
can simply invoke `git filter-repo` multiple times.  Alternatively,
you can do it in one run if you dispense with `--path` arguments and
instead use the more generic `--filename-callback`.  For example to
include all files under `src/` except for `src/README.md`:

```
git filter-repo --filename-callback '
    if filename == b"src/README.md":
        return None
    if filename.startswith(b"src/"):
        return filename
    return None'
```
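
Before rewriting history, it can help to try the rule on a few sample names. Below is a minimal sketch that copies the callback body into an ordinary Python function; the name `keep_filename` and the sample paths are hypothetical.

```python
def keep_filename(filename: bytes):
    """Same logic as the callback body: return the name to keep, None to drop."""
    if filename == b"src/README.md":
        return None          # explicit exclude rule
    if filename.startswith(b"src/"):
        return filename      # include rule
    return None              # everything else is dropped


if __name__ == "__main__":
    for name in [b"src/main.c", b"src/README.md", b"docs/index.md"]:
        print(name, "->", keep_filename(name))
```

Putting the exclude check first is what lets the more specific rule win over the broader include.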

## Removing paths with a certain extension

<!-- https://github.com/newren/git-filter-repo/issues/274 -->

```
git filter-repo --invert-paths --path-glob '*.xsa'
```

or

```
git filter-repo --filename-callback '
    if filename.endswith(b".xsa"):
        return None
    return filename'
```

## Removing a directory

<!-- https://github.com/newren/git-filter-repo/issues/278 -->

```
git filter-repo --path node_modules/electron/dist/ --invert-paths
```

## Convert from NFD filenames to NFC

<!-- https://github.com/newren/git-filter-repo/issues/296 -->

Given that Mac does utf-8 normalization of filenames, and has
historically switched which kind of normalization it does, users may
have committed files with alternative normalizations to their
repository.  If someone wants to convert filenames in NFD form to NFC,
they could run

```
git filter-repo --filename-callback '
    try: 
        return subprocess.check_output("iconv -f utf-8-mac -t utf-8".split(),
                                       input=filename)
    except:
        return filename
'
```

or instead of relying on the system iconv utility and spawning separate
processes, doing it within python:

```
git filter-repo --filename-callback '
    import unicodedata
    try:
        return unicodedata.normalize("NFC", filename.decode("utf-8")).encode("utf-8")
    except UnicodeDecodeError:
        return filename
'
```
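
To see what the normalization does to an individual name, here is a small standalone sketch using only the standard library. The helper name `to_nfc` and the sample filename are made up for illustration.

```python
import unicodedata


def to_nfc(filename: bytes) -> bytes:
    """Return the NFC form of a UTF-8 filename; leave non-UTF-8 names alone."""
    try:
        text = filename.decode("utf-8")
    except UnicodeDecodeError:
        return filename
    return unicodedata.normalize("NFC", text).encode("utf-8")


if __name__ == "__main__":
    # NFD spelling: a plain "e" followed by a combining acute accent (U+0301).
    nfd_name = "cafe\u0301.txt".encode("utf-8")
    nfc_name = to_nfc(nfd_name)
    print(len(nfd_name), len(nfc_name))  # the NFC form is one byte shorter
```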
  
## Set the committer of the last few commits to myself

<!-- https://github.com/newren/git-filter-repo/issues/379 -->

```
git filter-repo --refs main~5..main --commit-callback '
    commit.committer_name = b"My Wonderful Self"
    commit.committer_email = b"my@self.org"
'
```

## Handling special characters, e.g. accents and umlauts in names

<!-- https://github.com/newren/git-filter-repo/issues/383 -->

Since characters like ë and á are multi-byte characters in UTF-8 and Python
won't allow you to directly place those in a bytestring
(e.g. `b"Raphaël González"` would result in a `SyntaxError: bytes can
only contain ASCII literal characters` error from Python), you just
need to make a normal (UTF-8) string and then convert to a bytestring
to handle these.  For example, changing the author name and email
where the author email is currently `example@test.com`:

```
git filter-repo --refs main~5..main --commit-callback '
    if commit.author_email == b"example@test.com":
        commit.author_name = "Raphaël González".encode()
        commit.author_email = b"rgonzalez@test.com"
'
```
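
The conversion itself can be checked in a plain Python session. This is just a sketch of the string-to-bytestring step, reusing the name from the example above.

```python
name = "Raphaël González"

# str.encode() defaults to UTF-8, so each accented character
# becomes a two-byte sequence in the resulting bytestring.
encoded = name.encode()

print(encoded)
print(len(name), "characters ->", len(encoded), "bytes")
```

The resulting `bytes` object is what fields such as `commit.author_name` expect.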

## Handling repository corruption

<!-- https://github.com/newren/git-filter-repo/issues/420 -->

First, run fsck to get a list of the corrupt objects, e.g.:
```
$ git fsck
error in commit 166f57b3fbe31257100361ecaf735f305b533b21: missingSpaceBeforeDate: invalid author/committer line - missing space before date
Checking object directories: 100% (256/256), done.
```

Then print out that object literally to a temporary file:
```
$ git cat-file -p 166f57b3fbe31257100361ecaf735f305b533b21 >tmp
```

Taking a look at the file would show, for example:
```
$ cat tmp
tree e1d871155fce791680ec899fe7869067f2b4ffd2
author My Name <my@email.com>1673287380 -0800
committer My Name <my@email.com> 1673287380 -0800

Initial
```

Edit that file to fix the error (in this case, the missing space
between author email and author date):

```
tree e1d871155fce791680ec899fe7869067f2b4ffd2
author My Name <my@email.com> 1673287380 -0800
committer My Name <my@email.com> 1673287380 -0800

Initial
```

Save the updated file, then use `git-replace` to make a replace reference
for it.
```
$ git replace -f 166f57b3fbe31257100361ecaf735f305b533b21 $(git hash-object -t commit -w tmp)
```

Then remove the temporary file `tmp` and run `filter-repo` to consume
the replace reference and make it permanent:

```
$ rm tmp
$ git filter-repo --proceed
```

Note that if you have multiple corrupt objects, you only need to run
filter-repo once; that is, so long as you create all the replacements
before you run filter-repo.
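
If many commits share this same defect, hand-editing each dump gets tedious. Below is a rough sketch of doing the edit in Python instead, assuming the only problem is a missing space between the closing `>` of the email and the timestamp. The function name `fix_missing_space` is hypothetical, and the result would still need to go through `git hash-object -t commit -w` and `git replace` as described above.

```python
import re


def fix_missing_space(raw_commit: bytes) -> bytes:
    """Insert the missing space before the date on author/committer lines."""
    # Only touch the header; the commit message after the first blank
    # line is left exactly as it was.
    header, sep, message = raw_commit.partition(b"\n\n")
    fixed = re.sub(
        rb"^((?:author|committer) .*?>)(?=\d)",  # '>' directly followed by a digit
        rb"\1 ",
        header,
        flags=re.MULTILINE,
    )
    return fixed + sep + message


if __name__ == "__main__":
    broken = (
        b"tree e1d871155fce791680ec899fe7869067f2b4ffd2\n"
        b"author My Name <my@email.com>1673287380 -0800\n"
        b"committer My Name <my@email.com> 1673287380 -0800\n"
        b"\n"
        b"Initial\n"
    )
    print(fix_missing_space(broken).decode())
```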

## Removing all files with a backslash in them

<!-- https://github.com/newren/git-filter-repo/issues/427 -->

```
git filter-repo --filename-callback 'return None if b"\\" in filename else filename'
```

## Replace a binary blob in history

<!-- https://github.com/newren/git-filter-repo/issues/436 -->

Let's say you committed a binary blob, perhaps an image file, with
sensitive data, and never modified it.  You want to replace it with
the contents of some alternate file, currently found at
`../alternative-file.jpg` (it can have a different filename than what
is stored in the repository).  Let's also say the hash of the old file
was `f4ede2e944868b9a08401dafeb2b944c7166fd0a`.  You can replace it
with either

```
git filter-repo --blob-callback '
    if blob.original_id == b"f4ede2e944868b9a08401dafeb2b944c7166fd0a":
        blob.data = open("../alternative-file.jpg", "rb").read()
'
```

or

```
git replace -f f4ede2e944868b9a08401dafeb2b944c7166fd0a $(git hash-object -w ../alternative-file.jpg)
git filter-repo --proceed
```
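
The blob hash used above is what `git hash-object` prints for a file. As a sketch of where that value comes from, the same ID can be computed in Python, assuming a repository that uses the default SHA-1 object format. The function name `git_blob_id` is made up for illustration.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed
    # by the raw file contents.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known ID.
    print(git_blob_id(b""))
```

Note that `blob.original_id` in the callback is a bytestring, so a value computed this way would need `.encode()` before comparing.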

## Remove commits older than N days

<!-- https://github.com/newren/git-filter-repo/issues/300 -->

This is such a bad use case.  I'm tempted to leave it out, but it has
come up multiple times, and there are people who are totally fine with
changing every commit hash in their repository and throwing away
history periodically.  First, identify an `${OLD_COMMIT}` that you want
to be a new root commit, then run:

```
git replace --graft ${OLD_COMMIT}
git filter-repo --proceed
```

(The trick here is that `git replace --graft` takes a commit to replace, and
a list of new parents for the commit.  Since ${OLD_COMMIT} is the final
positional argument, it means the list of new parents is an empty list, i.e.
we are turning it into a new root commit.)

## Replacing pngs with compressed alternative

<!-- https://github.com/newren/git-filter-repo/issues/492 -->

Let's say you committed thousands of pngs that were poorly compressed,
but later aggressively recompressed the pngs and committed and pushed.
Unfortunately, clones are slow because they still contain the poorly
compressed pngs and you'd like to rewrite history to pretend that the
aggressively compressed versions were used when the files were first
introduced.

First, take a look at the commit that aggressively recompressed the pngs:

```
git log -1 --raw --no-abbrev ${COMMIT_WHERE_YOU_COMPRESSED_PNGS}
```

That will show output like
```
:100755 100755 edf570fde099c0705432a389b96cb86489beda09 9cce52ae0806d695956dcf662cd74b497eaa7b12 M      resources/foo.png
:100755 100755 644f7c55e1a88a29779dc86b9ff92f512bf9bc11 88b02e9e45c0a62db2f1751b6c065b0c2e538820 M      resources/bar.png
```

Use that to make a --file-info-callback to fix up the original versions:
```
git filter-repo --file-info-callback '
    if filename == b"resources/foo.png" and blob_id == b"edf570fde099c0705432a389b96cb86489beda09":
        blob_id = b"9cce52ae0806d695956dcf662cd74b497eaa7b12"
    if filename == b"resources/bar.png" and blob_id == b"644f7c55e1a88a29779dc86b9ff92f512bf9bc11":
        blob_id = b"88b02e9e45c0a62db2f1751b6c065b0c2e538820"
    return (filename, mode, blob_id)
'
```
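
With thousands of pngs, writing those `if` statements by hand is impractical. Here is a sketch of turning the `git log --raw --no-abbrev` output into a lookup table instead. The function name `blob_replacements` is hypothetical, and the parsing assumes the raw line layout shown above.

```python
def blob_replacements(raw_log: str) -> dict:
    """Map (path, old_blob_id) -> new_blob_id for every modified file."""
    mapping = {}
    for line in raw_log.splitlines():
        if not line.startswith(":"):
            continue  # skip the commit header and message lines
        # Layout: ":<old mode> <new mode> <old id> <new id> <status>  <path>"
        old_mode, new_mode, old_id, new_id, rest = line[1:].split(" ", 4)
        status, path = rest.split(None, 1)
        if status == "M":
            mapping[(path.encode(), old_id.encode())] = new_id.encode()
    return mapping


if __name__ == "__main__":
    sample = (
        ":100755 100755 edf570fde099c0705432a389b96cb86489beda09 "
        "9cce52ae0806d695956dcf662cd74b497eaa7b12 M\tresources/foo.png\n"
        ":100755 100755 644f7c55e1a88a29779dc86b9ff92f512bf9bc11 "
        "88b02e9e45c0a62db2f1751b6c065b0c2e538820 M\tresources/bar.png\n"
    )
    for key, new_id in blob_replacements(sample).items():
        print(key, "->", new_id)
```

A callback could then look up `(filename, blob_id)` in such a table and fall back to the original `blob_id` when there is no entry.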

## Updating submodule hashes

<!-- https://github.com/newren/git-filter-repo/issues/537 -->

Let's say you have a repo with a submodule at `src/my-submodule`, and
that you feel the wrong commit hashes of the submodule were committed
within your project and you want them updated according to the
following table:
```
old                                      new
edf570fde099c0705432a389b96cb86489beda09 9cce52ae0806d695956dcf662cd74b497eaa7b12
644f7c55e1a88a29779dc86b9ff92f512bf9bc11 88b02e9e45c0a62db2f1751b6c065b0c2e538820
```

You could do this as follows:
```
git filter-repo --file-info-callback '
    if filename == b"src/my-submodule" and blob_id == b"edf570fde099c0705432a389b96cb86489beda09":
        blob_id = b"9cce52ae0806d695956dcf662cd74b497eaa7b12"
    if filename == b"src/my-submodule" and blob_id == b"644f7c55e1a88a29779dc86b9ff92f512bf9bc11":
        blob_id = b"88b02e9e45c0a62db2f1751b6c065b0c2e538820"
    return (filename, mode, blob_id)
'
```

Yes, `blob_id` is kind of a misnomer here since the file's hash
actually refers to a commit from the sub-project.  But `blob_id` is
the name of the parameter passed to the --file-info-callback, so that
is what must be used.

## Using multi-line strings in callbacks

<!-- https://lore.kernel.org/git/CABPp-BFqbiS8xsbLouNB41QTc5p0hEOy-EoV0Sjnp=xJEShkTw@mail.gmail.com/ -->

Since the text for callbacks has spaces inserted at the front of every
line, multi-line strings are normally munged.  For example, the command

```
git filter-repo --blob-callback '
  blob.data = bytes("""\
This is the new
file that I am
replacing every blob
with.  It is great.\n""", "utf-8")
'
```

would result in a file with extra spaces at the front of every line:
```
  This is the new
  file that I am
  replacing every blob
  with.  It is great.
```

The two spaces at the beginning of every line were inserted into the
callback text when trying to compile it as a function.
However, you can use textwrap.dedent to fix this; in fact, using it
will even allow you to add more leading space so that it looks nicely
indented.  For example:

```
git filter-repo --blob-callback '
  import textwrap
  blob.data = bytes(textwrap.dedent("""\
    This is the new
    file that I am
    replacing every blob
    with.  It is great.\n"""), "utf-8")
'
```

That will result in a file with contents
```
This is the new
file that I am
replacing every blob
with.  It is great.
```

which has no leading spaces on any lines.
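
The effect of `textwrap.dedent` can also be checked on its own, outside of any callback. Here is a minimal sketch using the same text as above; the variable names are just for illustration.

```python
import textwrap

# The same text as in the callback, indented by four spaces on every line.
indented = """\
    This is the new
    file that I am
    replacing every blob
    with.  It is great.\n"""

# dedent() strips the whitespace that is common to the start of all lines.
clean = textwrap.dedent(indented)

print(clean, end="")
```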