File: rdfind.1

package info (click to toggle)
rdfind 1.4.1-1
  • links: PTS, VCS
  • area: main
  • in suites: buster, sid
  • size: 648 kB
  • sloc: cpp: 1,670; sh: 1,622; makefile: 32
file content (186 lines) | stat: -rw-r--r-- 6,689 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
.\" View this file with
.\" groff -man -Tutf8 rdfind.1 |less
.\"
.\" Author Paul Dreik 2006
.\" see LICENSE for details.
.TH rdfind "1" 1.4.1 "Nov 2018" rdfind
.SH NAME
rdfind \- finds duplicate files
.SH SYNOPSIS
.B rdfind [ options ] 
.I directory1 | file1
.B [
.I directory2 | file2
.B ] ...
.SH DESCRIPTION
.B rdfind
finds duplicate files across and/or within several directories. It calculates
checksum only if necessary.
rdfind runs in O(Nlog(N)) time with N being the number of files. 

If two (or more) equal files are found, the program decides which of
them is the original and the rest are considered duplicates. This
is done by ranking the files to each other and deciding which has the
highest rank. See section RANKING for details.

By default, no action is taken besides creating a file with the
detected files and showing the possible amount of saved space. 

If you need better control over the ranking than given, you can use
some preprocessor which sorts the file names in desired order and then
run the program using xargs. See examples below for how to use find
and xargs in conjunction with rdfind.

To include files or directories that have names starting with -, use 
rdfind ./- to not confuse them with options.

.SH RANKING
Given two or more equal files, the one with the highest rank is
selected to be the original and the rest are duplicates. The rules of
ranking are given below, where the rules are executed from start until
an original has been found. Given two files A and B which have equal
size and content, the ranking is as follows: 

If A was found while scanning an input argument earlier than than B, A
is higher ranked.

If A was found at a directory depth lower than B, A is higher ranked
(A is closer to the root).

if A and B are found during scanning of the same input argument and share
the same directory depth, the one that ranks highest depends on if
deterministic operation is enabled. This is on by default, see option
\fB-deterministic\fR). If enabled, which one ranks highest is
unspecified but deterministic. If disabled, the one that was reported
first from the file system is highest ranked.

.SH OPTIONS
Searching options etc:
.TP
.BR \-ignoreempty " " \fItrue\fR|\fIfalse\fR
Ignore empty files. Setting this to true (the default) is equivalent to
-minsize 1, false is equivalent to -minsize 0.
.TP
.BR \-minsize " "\fIN\fR
Ignores files with less than N bytes. Default is 1, meaning empty files
are ignored.
.TP
.BR \-followsymlinks " " \fItrue\fR|\fIfalse\fR
Follow symlinks. Default is false.
.TP
.BR \-removeidentinode " " \fItrue\fR|\fIfalse\fR
Removes items found which have identical inode and device ID. Default
is true.
.TP
.BR \-checksum " " \fImd5\fR|\fIsha1\fR|\fIsha256\fR
What type of checksum to be used: md5, sha1 or sha256. The default is
sha1 since version 1.4.0.
.TP
.BR \-deterministic " " \fItrue\fR|\fIfalse\fR
If set (the default), sort files of equal rank in an unspecified but
deterministic order. This makes the behaviour independent of in which
order files are listed when querying the file system.
.PP
Action options:
.TP
.BR \-makesymlinks " " \fItrue\fR|\fIfalse\fR
Replace duplicate files with symbolic links. Default is false.
.TP
.BR \-makehardlinks " " \fItrue\fR|\fIfalse\fR
Replace duplicate files with hard links. Default is false.
.TP
.BR \-makeresultsfile " " \fItrue\fR|\fIfalse\fR
Make a results file in the current directory. Default is true. If the
file exists, it is overwritten.
.TP
.BR \-outputname " " \fIname\fR
Make the results file name to be "name" instead of the default
results.txt.
.TP
.BR \-deleteduplicates " " \fItrue\fR|\fIfalse\fR
Delete (unlink) files. Default is false.
.PP
General options:
.TP
.BR \-sleep " " \fIX\fRms
Sleeps X milliseconds between reading each file, to reduce
load. Default is 0 (no sleep). Note that only a few values are
supported at present: 0,1-5,10,25,50,100 milliseconds. 
.TP
.BR \-n ", " \-dryrun " " \fItrue\fR|\fIfalse\fR
Displays what should have been done, don't actually delete or link
anything. Default is false.
.TP
.BR \-h ", " \-help ", " \-\-help
Displays a brief help message.
.TP
.BR \-v ", " \-version ", " \-\-version
Displays the version number.
.SH EXAMPLES
.TP
Search for duplicate files in the home directory and a backup directory:
.B rdfind ~ /mnt/backup
.TP
Delete duplicates in a backup directory:
.B rdfind -deleteduplicates true /mnt/backup
.TP
Search for duplicate files in directories called foo:
.B find . -type d -name foo -print0 |xargs -0 rdfind
.SH FILES
.I results.txt
(the default name is results.txt and can be changed with option outputname,
see above) The results file results.txt will contain one row per duplicate file
found, along with a header row explaining the columns.
A text describes why the file is considered a duplicate:

DUPTYPE_UNKNOWN some internal error

DUPTYPE_FIRST_OCCURRENCE the file that is considered to be the original.

DUPTYPE_WITHIN_SAME_TREE files in the same tree (found when processing
the directory in the same input argument as the original)

DUPTYPE_OUTSIDE_TREE the file is found during processing another input
argument than the original. 
.SH ENVIRONMENT
.SH DIAGNOSTICS
.SH EXIT VALUES
0 on success, nonzero otherwise.
.SH BUGS/FEATURES
When specifying the same directory twice, it keeps the first
encountered as the most important (original), and the rest as
duplicates. This might not be what you want.

The symlink creates absolute links. This might not be what you
want. To create relative links instead, you may use the symlinks (2)
command, which is able to convert absolute links to relative links.

Older versions unfortunately contained a misspelling on the word
occurrence. This is now corrected (since 1.3), which might affect
user scripts parsing the output file written by rdfind.

.SH SECURITY CONSIDERATIONS
Avoid manipulating the directories while rdfind is reading.
rdfind is quite brittle in that case. Especially, when deleting
or making links, rdfind can be subject to a symlink attack.
Use with care!
.SH AUTHOR
Paul Dreik 2006-2018, reachable at rdfind@pauldreik.se
Rdfind can be found at https://rdfind.pauldreik.se/

Do you find rdfind useful? Drop me a line! It is always fun to
hear from people who actually use it and what data collections
they run it on.
.SH THANKS
Several persons have helped with suggestions and improvements:
Niels Möller, Carl Payne and Salvatore Ansani. Thanks also to you
who tested the program and sent me feedback.
.SH VERSION
1.4.1 (release date 2018-11-12)
.SH COPYRIGHT
This program is distributed under GPLv2 or later, at your option.
.SH "SEE ALSO"
.BR md5sum (1),
.BR sha1sum (1),
.BR find (1),
.BR symlinks(2)