\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename find-maint.info
@settitle Maintaining Findutils
@c For double-sided printing, uncomment:
@c @setchapternewpage odd
@c %**end of header

@include versionmaint.texi

@iftex
@finalout
@end iftex

@dircategory GNU organization
@direntry
* Maintaining Findutils: (find-maint).        Maintaining GNU findutils
@end direntry

@copying
This manual explains how GNU findutils is maintained, how changes should
be made and tested, and what resources exist to help developers.

This is edition @value{EDITION}, for findutils version @value{VERSION}.

Copyright @copyright{} 2007-2019 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts.
A copy of the license is included in the section entitled ``GNU
Free Documentation License''.
@end copying

@titlepage
@title Maintaining Findutils
@subtitle Edition @value{EDITION}, for GNU findutils version @value{VERSION}
@subtitle @value{UPDATED}
@author by James Youngman

@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents

@ifnottex
@node Top, Introduction, (dir), (dir)
@top Maintaining GNU Findutils

@insertcopying
@end ifnottex

@menu
* Introduction::
* Maintaining GNU Programs::
* Design Issues::
* Coding Conventions::
* Tools::
* Using the GNU Portability Library::
* Documentation::
* Testing::
* Bugs::
* Distributions::
* Internationalisation::
* Security::
* Making Releases::
* GNU Free Documentation License::
@end menu





@node Introduction
@chapter Introduction

This document explains how to contribute to and maintain GNU
Findutils.  It concentrates on developer-specific issues.  For
information about how to use the software please refer to
@ref{Introduction, ,Introduction,find,The Findutils manual}.

This manual aims to be useful without necessarily being verbose.  It's
also a recent document, so there will be many areas in which
improvements can be made.  If you find that the document omits
important information, or that any part of it is so terse as to
be unhelpful, please ask for help on the @email{bug-findutils@@gnu.org}
mailing list.  We'll try to improve this document too.


@node Maintaining GNU Programs
@chapter Maintaining GNU Programs

GNU Findutils is part of the GNU Project and so there are a number of
documents which set out standards for the maintenance of GNU
software.

@table @file
@item standards.texi
GNU Project Coding Standards.  All changes to findutils should comply
with these standards.  In some areas we go somewhat beyond the
requirements of the standards, but these cases are explained in this
manual.
@item maintain.texi
Information for Maintainers of GNU Software.  This document provides
guidance for GNU maintainers.  Everybody with commit access should
read this document.   Everybody else is welcome to do so too, of
course.
@end table



@node Design Issues
@chapter Design Issues

The findutils package is installed on a great many systems, usually as a
fundamental component.  The programs in the package are often used in
order to successfully boot or fix the system.

This means that for findutils we bear in mind considerations that
may not apply to the same extent to other packages.  For example, the fact
that findutils is often a base component motivates us to
@itemize
@item Limit dependencies on libraries
@item Avoid dependencies on other large packages (for example, interpreters)
@item Be conservative when making changes to the 'stable' release branch
@end itemize

All those considerations come before functionality.  Functional
enhancements are still made to findutils, but these are almost
exclusively introduced in the 'development' release branch, to allow
extensive testing and proving.

Sometimes it is useful to have a priority list to provide guidance
when making design trade-offs.   For findutils, that priority list is:

@enumerate
@item Correctness
@item Standards compliance
@item Security
@item Backward compatibility
@item Performance
@item Functionality
@end enumerate

For example, we support the @code{-exec} action because POSIX
compliance requires this, even though there are security problems with
it and we would otherwise prefer people to use @code{-execdir}.  There
are also cases where some performance is sacrificed in the name of
security.  For example, the sanity checks that @code{find} performs
while traversing a directory tree may slow it down.   We adopt
functional changes, and functional changes are allowed to make
@code{find} slower, but only if there is no detectable impact on users
who don't use the feature.

Backward-incompatible changes do get made in order to comply with
standards (for example the behaviour of @code{-perm -...} changed in
order to comply with POSIX).  However, they don't get made in order to
provide better ease of use; for example the semantics of @code{-size
-2G} are almost always unexpected by users, but we retain the current
behaviour because of backward compatibility and for its similarity to
the block-rounding behaviour of @code{-size -30}.  We might introduce
a change which does not have the unfortunate rounding behaviour, but
we would choose another syntax (for example @code{-size '<2G'}) for
this.
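
The rounding behaviour mentioned above can be illustrated with a small
sketch.  It assumes that the file size is rounded up to a whole number
of units before the comparison is made; the function names
@code{size_in_units} and @code{size_less_than} are invented for this
illustration and are not taken from the findutils source.

@verbatim
```c
#include <stdbool.h>
#include <stdint.h>

/* Round SIZE up to a whole number of UNIT-byte units. */
uintmax_t size_in_units(uintmax_t size, uintmax_t unit)
{
  return size / unit + (0 != size % unit);
}

/* Simplified model of "-size -N<unit>": true if the rounded-up
   size is strictly less than N units.  */
bool size_less_than(uintmax_t size, uintmax_t unit, uintmax_t n)
{
  return size_in_units(size, unit) < n;
}
```
@end verbatim

Under this model a file of exactly 1 GiB satisfies @code{-size -2G},
but a file just one byte larger counts as two units and does not,
which is probably not what most users expect.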

In a general sense, we try to do test-driven development of the
findutils code; that is, we try to implement test cases for new
features and bug fixes before modifying the code to make the test
pass.  Some features of the code are tested well, but the test
coverage for other features is less good.  If you are about to modify
the code for a predicate and aren't sure about the test coverage, use
@code{grep} on the test directories and measure the coverage with
@code{lcov} or another test coverage tool.

You should be able to use the @code{coverage} Makefile target (it's
defined in @file{maint.mk}) to generate a test coverage report for
findutils.   Due to limitations in @code{lcov}, this only works if
your build directory is the same as the source directory (that is,
you're not using a VPATH build configuration).

Lastly, we try not to depend on having a ``working system''.  The
findutils suite is used for diagnosis of problems, and this applies
especially to @code{find}.  We should ensure that @code{find} still
works on relatively broken systems, for example systems with damaged
@file{/etc/passwd} or @file{/etc/fstab} files.  Another interesting
example is the case where a system is a client of one or more
unresponsive NFS servers.  On such a system, if you try to stat all
mount points, your program will hang indefinitely, waiting for the
remote NFS server to respond.

Another interesting but unusual case is broken NFS servers and corrupt
filesystems; sometimes they return `impossible' file modes.  It's
important that find does not entirely fail when encountering such a
file.


@node Coding Conventions
@chapter Coding Conventions

Coding style documents which set out to establish a uniform look and
feel to source code have worthy goals, for example greater ease of
maintenance and readability.  However, I do not believe that in
general coding style guide authors can envisage every situation, and
it is always possible that it might on occasion be necessary to break
the letter of the style guide in order to honour its spirit, or to
better achieve the style guide's goals.

I've certainly seen many style guides outside the free software world
which make bald statements such as ``functions shall have exactly one
return statement''.  The desire to ensure consistency and obviousness
of control flow is laudable, but it is all too common for such bald
requirements to be followed unthinkingly.  Certainly I've seen such
coding standards result in unmaintainable code with terrible
infelicities such as functions containing @code{if} statements nested
nine levels deep.  I suppose such coding standards don't survive in
free software projects because they tend to drive away potential
contributors or tend to generate heated discussions on mailing lists.
Equally, a nine-level-deep function in a free software program would
quickly get refactored, assuming it is obvious what the function is
supposed to do...

Be that as it may, the approach I will take for this document is to
explain some idioms and practices in use in the findutils source code,
and leave it up to the reader's engineering judgement to decide which
considerations apply to the code they are working on, and whether or
not there is sufficient reason to ignore the guidance in current
circumstances.


@menu
* Make the Compiler Find the Bugs::
* Factor Out Repeated Code::
* Debugging is For Users Too::
* Don't Trust the File System Contents::
* The File System Is Being Modified::
@end menu

@node    Make the Compiler Find the Bugs
@section Make the Compiler Find the Bugs

Finding bugs is tedious.  If you have a filesystem containing two
million files, and a find command line should print one million of
them, but in fact it misses out 1%, you can tell the program is
printing the wrong result only if you know the right answer for that
filesystem at that time.  If you don't know this, you may never
find out about that bug.  For this reason it is important to have a
comprehensive test suite.

The test suite is of course not the only way to find the bugs.  The
findutils source code makes liberal use of the @code{assert} macro.
While assertions might be a performance drain, the performance
impact of most of them is negligible compared to the time taken to
fetch even one sector from a disk drive.

Assertions should not be used to check the results of operations which
may be affected by the program's external environment.  For example,
never assert that a file could be opened successfully.  Errors
relating to problems with the program's execution environment should
be diagnosed with a user-oriented error message.  An assertion failure
should always denote a bug in the program.
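
As a minimal sketch of this distinction, the hypothetical function
@code{count_bytes} below (it is not part of findutils) uses
@code{assert} only for a condition that can be false solely because
of a programming error, and reports a failure to open the file with an
ordinary error message.

@verbatim
```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the number of bytes in the file
   named PATH, or -1 if it cannot be opened.  */
long count_bytes(const char *path)
{
  FILE *fp;
  long n = 0;

  /* A null PATH can only be the caller's bug, so assert.  */
  assert(path != NULL);

  fp = fopen(path, "r");
  if (NULL == fp)
    {
      /* The environment let us down: tell the user, don't abort.  */
      fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
      return -1;
    }
  while (EOF != fgetc(fp))
    ++n;
  fclose(fp);
  return n;
}
```
@end verbatim

A missing file therefore produces a diagnostic and a failure return
value, while an assertion failure would still point at a defect in the
calling code.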

Avoid using @code{assert} to mark not-fully-implemented features of
your code as such.  Finish the implementation, disable the code, or
leave the unfinished version on a local branch.

Several programs in the findutils suite perform self-checks.  See for
example the function @code{pred_sanity_check} in @file{find/pred.c}.
This is generally desirable.

There are also a number of small ways in which we can help the
compiler to find the bugs for us.

@subsection Constants in Equality Testing

It's a common error to write @code{=} when @code{==} is meant.
Sometimes this happens in new code and is simply due to finger
trouble.  Sometimes it is the result of the inadvertent deletion of a
character.  In any case, there is a subset of cases where we can
persuade the compiler to generate an error message when we make this
mistake; this is where the equality test is with a constant.

This is an example of a vulnerable piece of code.

@example
if (x == 2)
 ...
@end example

A simple typo converts the above into

@example
if (x = 2)
 ...
@end example

We've introduced a bug; the condition is always true, and the value of
@code{x} has been changed.  However, a simple change to our practice
would have made us immune to this problem:

@example
if (2 == x)
 ...
@end example

Usually, the Emacs keystroke @kbd{M-t} can be used to swap the operands.


@subsection Spelling of ASCII NUL

Strings in C are just sequences of characters terminated by a NUL.
The ASCII NUL character has the numerical value zero.  It is normally
represented in C code as @samp{\0}.  Here is a typical piece of C
code:

@example
*p = '\0';
@end example

Consider what happens if there is an unfortunate typo:

@example
*p = '0';
@end example

We have changed the meaning of our program and the compiler cannot
diagnose this as an error.  Our string is no longer terminated.  Bad
things will probably happen.  It would be better if the compiler could
help us diagnose this problem.

In C, the type of @code{'\0'} is in fact int, not char.  This provides
us with a simple way to avoid this error.  The constant @code{0} has
the same value and type as the constant @code{'\0'}.  However, it is
not as vulnerable to typos.    For this reason I normally prefer to
use this code:

@example
*p = 0;
@end example


@node    Factor Out Repeated Code
@section Factor Out Repeated Code

Repeated code imposes a greater maintenance burden and increases the
exposure to bugs.  For example, if you discover that something you
want to implement has some similarity with an existing piece of code,
don't cut and paste it.  Instead, factor the code out.  The risk of
cutting and pasting the code, particularly if you do this several
times, is that you end up with several copies of the same code.

If the original code had a bug, you now have N places where this needs
to be fixed.  It's all too easy to miss some out when trying to fix the
bug.  Equally, it's quite possible that when pasting the code into
some function, the pasted code was not quite adapted correctly to its
new environment.  To pick a contrived example, perhaps the pasted code
modifies a global variable which it shouldn't be touching
in its new home.  Worse, perhaps it makes some unstated assumption about
the nature of the input arguments which is in fact not true for the
context of the now duplicated code.

A good example of the use of refactoring in findutils is the
@code{collect_arg} function in @file{find/parser.c}.  A less clear-cut
but larger example is the factoring out of code which would otherwise
have been duplicated between @file{find/oldfind.c} and
@file{find/ftsfind.c}.
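
To illustrate the kind of helper that results from such factoring,
here is a sketch of a function which fetches the next command-line
argument on behalf of several option parsers.  The name
@code{take_arg} and its signature are assumptions made for this
example; consult @file{find/parser.c} for the real
@code{collect_arg}.

@verbatim
```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: if ARGV[*ARG_PTR] exists, store it in
   *COLLECTED, advance *ARG_PTR and return true.  Otherwise set
   *COLLECTED to NULL and return false.  */
bool take_arg(char **argv, int *arg_ptr, const char **collected)
{
  if (NULL == argv || NULL == argv[*arg_ptr])
    {
      *collected = NULL;
      return false;
    }
  *collected = argv[*arg_ptr];
  ++*arg_ptr;
  return true;
}
```
@end verbatim

Every parser that needs an argument can then call the one helper, so
the check for a missing argument lives in exactly one place.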

The findutils test suite is comprehensive enough that refactoring code
should not generally be a daunting prospect from a testing point of
view.  Nevertheless there are some areas which are only
lightly-tested:

@enumerate
@item Tests on the ages of files
@item Code which deals with the values returned by operating system calls (for example handling of ENOENT)
@item Code dealing with OS limits (for example, limits on path length
or exec arguments)
@item Code relating to features not all systems have (for example
Solaris Doors)
@end enumerate

Please exercise caution when working in those areas.


@node    Debugging is For Users Too
@section Debugging is For Users Too

Debug and diagnostic code is often used to verify that a program is
working in the way its author thinks it should be.  But users are
often uncertain about what a program is doing, too.  Exposing a
little more diagnostic information to them can help.  Much of the diagnostic
code in @code{find}, for example, is controlled by the @samp{-D} flag,
as opposed to C preprocessor directives.
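
A run-time switch of this kind might look like the following sketch.
The flag names, the variable @code{debug_flags} and the three
functions are illustrative assumptions and do not reproduce the actual
implementation of @samp{-D} in @code{find}.

@verbatim
```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustrative debug categories, one bit each.  */
enum { DEBUG_STAT = 1 << 0, DEBUG_TREE = 1 << 1 };

static unsigned int debug_flags = 0;

/* Enable the category called NAME.  Return 0 on success, or -1
   if NAME is not a known category.  */
int enable_debug(const char *name)
{
  if (0 == strcmp(name, "stat"))
    debug_flags |= DEBUG_STAT;
  else if (0 == strcmp(name, "tree"))
    debug_flags |= DEBUG_TREE;
  else
    return -1;
  return 0;
}

bool debug_enabled(unsigned int flag)
{
  return 0 != (debug_flags & flag);
}

/* Print MSG on stderr only if the category FLAG is enabled.  */
void debug_message(unsigned int flag, const char *msg)
{
  if (debug_enabled(flag))
    fprintf(stderr, "debug: %s\n", msg);
}
```
@end verbatim

Because the check happens at run time, the same binary that a user
already has installed can produce the diagnostics; no rebuild with
different preprocessor settings is needed.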

Making diagnostic messages available to users also means that the
phrasing of the diagnostic messages becomes important, too.


@node    Don't Trust the File System Contents
@section Don't Trust the File System Contents

People use @code{find} to search in directories created by other
people.  Sometimes they do this to check for suspicious activity (for
example to look for new setuid binaries).  This means that it would be
bad if @code{find} were vulnerable to, say, a security problem
exploitable by constructing a specially-crafted filename.  The same
consideration would apply to @code{locate} and @code{updatedb}.

Henry Spencer said this well in his fifth commandment:
@quotation
Thou shalt check the array bounds of all strings (indeed, all arrays),
for surely where thou typest @samp{foo} someone someday shall type
@samp{supercalifragilisticexpialidocious}.
@end quotation
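
In practice, heeding this advice means never copying a file name into
a fixed-size buffer without checking that it fits.  The helper
@code{join_path} below is a hypothetical example, not findutils code;
it relies on the fact that @code{snprintf} returns the length the
complete result would have had.

@verbatim
```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write "DIR/NAME" into BUF, which has room
   for BUFSIZE bytes.  Return 0 on success, or -1 if the result
   would not fit (BUF then holds a truncated string).  */
int join_path(char *buf, size_t bufsize, const char *dir, const char *name)
{
  int n = snprintf(buf, bufsize, "%s/%s", dir, name);

  if (n < 0 || (size_t) n >= bufsize)
    return -1;
  return 0;
}
```
@end verbatim

A caller that receives -1 can allocate a larger buffer or report an
error, rather than silently operating on a truncated name.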

Symbolic links can often be a problem.  If @code{find} calls
@code{lstat} on something and discovers that it is a directory, it's
normal for @code{find} to recurse into it.  Even if the @code{chdir}
system call is used immediately, there is still a window of
opportunity between the @code{lstat} and the @code{chdir} in which a
malicious person could rename the directory and substitute a symbolic
link to some other directory.
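
One common way to narrow this window is sketched below: open the
directory without following symbolic links, confirm that the open
file descriptor refers to the same device and inode that @code{lstat}
reported, and only then change directory using the descriptor.  The
function @code{chdir_checked} is hypothetical and is not claimed to be
the method @code{find} itself uses; it also assumes a system which
provides @code{O_DIRECTORY} and @code{O_NOFOLLOW}.

@verbatim
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: change into directory NAME.  Return 0 on
   success, or -1 on failure or if NAME was replaced between the
   lstat and the open.  */
int chdir_checked(const char *name)
{
  struct stat before, after;
  int fd;

  if (0 != lstat(name, &before) || !S_ISDIR(before.st_mode))
    return -1;

  /* O_NOFOLLOW makes the open fail if NAME is now a symlink.  */
  fd = open(name, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
  if (fd < 0)
    return -1;

  /* Verify that we opened the object we examined earlier.  */
  if (0 != fstat(fd, &after)
      || before.st_dev != after.st_dev
      || before.st_ino != after.st_ino
      || 0 != fchdir(fd))
    {
      close(fd);
      return -1;
    }
  close(fd);
  return 0;
}
```
@end verbatim

Because @code{fchdir} acts on the descriptor rather than on the name,
a rename performed after the check can no longer redirect the
traversal.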

@node    The File System Is Being Modified
@section The File System Is Being Modified

The filesystem gets modified while you are traversing it.  For
example, it's normal for files to get deleted while @code{find} is
traversing a directory.  Issuing an error message seems helpful when a
file is deleted from the one directory you are interested in, but if
@code{find} is searching 15000 directories, such a message becomes
less helpful.

Bear in mind also that it is possible for the directory @code{find} is
searching to be concurrently moved elsewhere in the file system,
and that the directory in which @code{find} was started could be
deleted.

Henry Spencer's sixth commandment is also apposite here:
@quotation
If a function be advertised to return an error code in the event of
difficulties, thou shalt check for that code, yea, even though the
checks triple the size of thy code and produce aches in thy typing
fingers, for if thou thinkest ``it cannot happen to me'', the gods
shall surely punish thee for thy arrogance.
@end quotation
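
The sketch below combines both points: it checks the result of
@code{lstat}, but distinguishes a file which has simply vanished
during the traversal from a genuine error.  The names
@code{visit_result} and @code{examine} are invented for this example,
and whether a vanished file should be reported at all is a policy
decision left to the caller.

@verbatim
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

enum visit_result { VISIT_OK, VISIT_VANISHED, VISIT_ERROR };

/* Hypothetical helper: examine PATH, filling in *ST on success.  */
enum visit_result examine(const char *path, struct stat *st)
{
  if (0 == lstat(path, st))
    return VISIT_OK;

  /* The file, or one of its parent directories, has gone away.  */
  if (ENOENT == errno || ENOTDIR == errno)
    return VISIT_VANISHED;

  fprintf(stderr, "cannot examine %s: %s\n", path, strerror(errno));
  return VISIT_ERROR;
}
```
@end verbatim

A caller traversing thousands of directories can then choose to stay
silent about @code{VISIT_VANISHED} while still treating
@code{VISIT_ERROR} as a failure.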

There are a lot of files out there.  They come in all dates and
sizes.  There is a condition out there in the real world to exercise
every bit of the code base.  So we try to test that code base before
someone falls over a bug.


@node Tools
@chapter Tools
Most of the tools required to build findutils are mentioned in the
file @file{README-hacking}.  We also use some other tools:

@table @asis
@item System call traces
Much of the execution time of find is spent waiting for filesystem
operations.  A system call trace (for example, that provided by
@code{strace}) shows what system calls are being made.   Using this
information we can work to remove unnecessary file system operations.

@item Valgrind
Valgrind is a tool which dynamically verifies the memory accesses a
program makes to ensure that they are valid (for example, that the
behaviour of the program does not in any way depend on the contents of
uninitialized memory).

@item DejaGnu
DejaGnu is the test framework used to run the findutils test suite
(the @code{runtest} program is part of DejaGnu).  It would be ideal if
everybody building @code{findutils} also ran the test suite, but many
people don't have DejaGnu installed.  When changes are made to
findutils, DejaGnu is invoked a lot. @xref{Testing}, for more
information.
@end table

@node Using the GNU Portability Library
@chapter Using the GNU Portability Library
The Gnulib library (@url{https://www.gnu.org/software/gnulib/}) makes a
variety of systems look more like a GNU/Linux system and also applies
a bunch of automatic bug fixes and workarounds.  Some of these
apply to GNU/Linux systems too.  For example, the Gnulib regex
implementation is used when we determine that we are building on a
GNU libc system with a bug in the regex implementation.


@section How and Why we Import the Gnulib Code
Gnulib does not have a release process which results in a source
tarball you can download.  Instead, the code is simply made available
via Git, so we import Gnulib using the git submodule feature.  The
bootstrap script performs the necessary steps.

Findutils does not use all the Gnulib code.  The modules we need are
listed in the file @file{bootstrap.conf}.

The upshot of all this is that we can use the findutils git repository
to track which version of Gnulib every findutils release uses.

A small number of files are installed by automake and will therefore
vary according to which version of automake was used to generate a
release.  This includes for example boiler-plate GNU files such as
@file{ABOUT-NLS}, @file{INSTALL} and @file{COPYING}.


@section How We Fix Gnulib Bugs
Gnulib is used by quite a number of GNU projects, and this means that
it gets plenty of testing.  Therefore there are relatively few bugs in
the Gnulib code, but it does happen from time to time.

However, since there is no waiting around for a Gnulib source release
tarball, Gnulib bugs are generally fixed quickly.  Here is an outline
of the way we would contribute a fix to Gnulib (assuming you know it
is not already fixed in the current Gnulib git tree):

@table @asis
@item Check you have already completed a copyright assignment for Gnulib
@item Begin with a vanilla git tree
Download the Findutils source code from git (or use the tree you have
already)
@item Run the bootstrap script
@item Run configure
@item Build findutils
Build findutils and run the test suite, which should pass.  In our
example we assume you have just noticed a bug in Gnulib, not that
recent Gnulib changes broke the findutils regression tests.
@item Write a test case
If in fact Gnulib did break the findutils regression tests, you can probably
skip this step, since you already have a test case demonstrating the problem.
Otherwise, write a findutils test case for the bug and/or a Gnulib test case.
@item Fix the Gnulib bug
Make sure your editor follows symbolic links so that your changes to
@file{gnulib/...} actually affect the files in the git working
directory you checked out earlier.   Observe that your test now passes.
@item Prepare a Gnulib patch
In the gnulib subdirectory, use @code{git format-patch} to prepare the
patch.  Follow the normal usage for checkin comments (take a look at
the output of @code{git log}).  Check that the patch conforms with the
GNU coding standards, and email it to the Gnulib mailing list.
@item Wait for the patch to be applied
Once your bug fix has been applied, you can update your gnulib
directory from git, and then check in the change to the submodule as
normal (you can check @code{git help submodule} for details).
@end table

There is an alternative to the method above; it is possible to store
local diffs to be patched into gnulib beneath the
@file{gnulib-local} directory.  Normally however, there is no need for
this, since gnulib updates are very prompt.

@section How to update Gnulib to latest
With a non-dirty working tree, the command @code{make update-gnulib-to-latest}
(or the shorter alias @code{make gnulib-sync}) updates the
gnulib submodule.  In detail, that is:
@enumerate
@item Fetching the latest upstream gnulib reference.
@item Copying the files which should stay in sync, like
@file{bootstrap}, from gnulib into the findutils working tree.
@item And finally showing the @code{git status} for the gnulib submodule
and the above copied files.
@end enumerate
After that, the maintainer checks that all is correct and that findutils
builds and runs correctly, and finally commits the new gnulib version,
e.g. via @code{git gui}.

The @code{gnulib-sync} target can be run at any time after a
@code{configure} run, and only refuses to run if the working tree is dirty.

@node Documentation
@chapter Documentation

The findutils git tree includes several different types of
documentation.

@section git change log
The git change log for the source tree contains check-in messages
which describe each check-in.   These have a standard format:

@smallexample
Summary of the change.

(ChangeLog-style detail)
@end smallexample

Here, the format of the detail part follows the standard GNU ChangeLog
style, but without whitespace in the left margin and without
author/date headers.   Take a look at the output of @code{git log} to
see some examples.   The README-hacking file also contains an example
with an explanation.

@section User Documentation
User-oriented documentation is provided as manual pages and in
Texinfo.  See
@ref{Introduction,,Introduction,find,The Findutils manual}.

Please make sure both sets of documentation are updated if you make a
change to the code.  The GNU coding standards do not normally call for
maintaining manual pages on the grounds of effort duplication.
However, the manual page format is more convenient for quick
reference, and so it's worth maintaining both types of documentation.
However, the manual pages are normally rather more terse than the
Texinfo documentation.  The manual pages are suitable for reference
use, but the Texinfo manual should also include introductory and
tutorial material.


@section Build Guidance

@table @file
@item ABOUT-NLS
Describes the Free Translation Project, the translation status of
various GNU projects, and how to participate by translating an
application.
@item AUTHORS
Lists the authors of findutils.
@item COPYING
The copyright license covering findutils; currently, the GNU GPL,
version 3.
@item INSTALL
Generic installation instructions for installing GNU programs.
@item README
Information about how to compile findutils in particular.
@item README-hacking
Describes how to build findutils from the code in git.
@item THANKS
Thanks to people who contributed to findutils.  Generally, if
someone's contribution was significant enough to need a copyright
assignment, their name should go in here.
@item TODO
Mainly obsolete.  Please add bugs to the Savannah bug tracker instead
of adding entries to this file.
@end table


@section Release Information
@table @file
@item NEWS
Enumerates the user-visible changes in each release.  Typical changes
are fixed bugs, functionality changes and documentation changes.
Include the date when a release is made.
@item ChangeLog
This file enumerates all changes to the findutils source code (with
the possible exception of @file{.cvsignore} and @file{.gitignore}
changes).  The level of detail used for this file should be sufficient
to answer the questions ``what changed?'' and ``why was it changed?''.
The file is generated from the git commit messages during @code{make dist}.
If a change fixes a bug, always give the bug reference number in the
@file{NEWS} file and of course also in the checkin message.
In general, it should be possible to enumerate all
material changes to a function by searching for its name in
@file{ChangeLog}.  Mention when each release is made.
@end table

@node Testing
@chapter Testing
This chapter will explain the general procedures for adding tests to
the test suite, and the functions defined in the findutils-specific
DejaGnu configuration.  Where appropriate, references will be made to
the DejaGnu documentation.

@node Bugs
@chapter Bugs

Bugs are logged in the Savannah bug tracker
@url{https://savannah.gnu.org/bugs/?group=findutils}.  The tracker
offers several fields but their use is largely obvious.  The
life-cycle of a bug is like this:


@table @asis
@item Open
Someone, usually a maintainer, a distribution maintainer or a user,
creates a bug by filling in the form.   They fill in field values as
they see fit.  This will generate an email to
@email{bug-findutils@@gnu.org}.

@item Triage
The bug hangs around with @samp{Status=None} until someone begins to
work on it.  At that point they set the ``Assigned To'' field and will
sometimes set the status to @samp{In Progress}, especially if the bug
will take a while to fix.

@item Non-bugs
Quite a lot of reports are not actually bugs; for these the usual
procedure is to explain why the problem is not a bug, set the status
to @samp{Invalid} and close the bug.   Make sure you set the
@samp{Assigned to} field to yourself before closing the bug.

@item Fixing
When you commit a bug fix into git (or in the case of a contributed
patch, commit the change), mark the bug as @samp{Fixed}.  Make sure
you include a new test case where this is relevant.  If you can figure
out which releases are affected, please also set the @samp{Release}
field to the earliest release which is affected by the bug.
Indicate which source branch the fix is included in (for example,
4.2.x or 4.3.x).  Don't close the bug yet.

@item Release
When a release is made which includes the bug fix, make sure the bug
is listed in the @file{NEWS} file.  Once the release is made, fill in the
@samp{Fixed Release} field and close the bug.
@end table


@node Distributions
@chapter Distributions
Almost all GNU/Linux distributions include findutils, but only some of
them have a package maintainer who is a member of the mailing list.
Distributions don't often feed back patches to the
@email{bug-findutils@@gnu.org} list, but on the other hand many of
their patches relate only to standards for file locations and so
forth, and are therefore distribution specific.  On an irregular basis
I check the current patches being used by one or two distributions,
but the total number of GNU/Linux distributions is large enough that
we could not hope to cover them all.

Often, bugs are raised against a distribution's bug tracker instead of
GNU's.    Periodically (about every six months) I take a look at some
of the more accessible bug trackers to indicate which bugs have been
fixed upstream.

Many distributions include both findutils and the slocate package,
which provides a replacement @code{locate}.


@node Internationalisation
@chapter Internationalisation
Translation is essentially automated from the maintainer's point of
view.  The TP mails the maintainer when a new PO file is available,
and we just download it and check it in.  The @file{bootstrap} script
copies @file{.po} files into the working tree.  For more information,
please see
@url{https://translationproject.org/domain/findutils.html}.


@node Security
@chapter Security

See @ref{Security Considerations, ,Security Considerations,find,The
Findutils manual}, for a full description of the findutils approach to
security considerations and discussion of particular tools.

If someone reports a security bug publicly, we should fix it as
rapidly as possible.  If necessary, this can mean issuing a fixed
release containing just the one bug fix.  We try to avoid issuing
releases which include both significant security fixes and functional
changes.

Where someone reports a security problem privately, we generally try
to construct and test a patch without pushing the intermediate code to
the public repository.

Once everything has been tested, we can make a release and push the
patch.  The advantage of doing things this way is that we
avoid situations where people watching for git commits can figure out
and exploit a security problem before a fixed release is available.

It's important that security problems be fixed promptly, but don't
rush so much that things go wrong.  Make sure the new release really
fixes the problem.  It's usually best not to include functional
changes in your security-fix release.

If the security problem is serious, send an alert to
@email{vendor-sec@@lst.de}.  The members of the list include most
GNU/Linux distributions.  The point of doing this is to allow them to
prepare to release your security fix to their customers, once the fix
becomes available.  Here is an example alert:

@smallexample
GNU findutils heap buffer overrun (potential privilege escalation)



I. BACKGROUND
=============

GNU findutils is a set of programs which search for files on Unix-like
systems.  It is maintained by the GNU Project of the Free Software
Foundation.  For more information, see
@url{https://www.gnu.org/software/findutils}.


II. DESCRIPTION
===============

When GNU locate reads filenames from an old-format locate database,
they are read into a fixed-length buffer allocated on the heap.
Filenames longer than the 1026-byte buffer can cause a buffer overrun.
The overrunning data can be chosen by any person able to control the
names of files created on the local system.  This will normally
include all local users, but in many cases also remote users (for
example in the case of FTP servers allowing uploads).

III. ANALYSIS
=============

Findutils supports three different formats of locate database, its
native format "LOCATE02", the slocate variant of LOCATE02, and a
traditional ("old") format that locate uses on other Unix systems.

When locate reads filenames from a LOCATE02 database (the default
format), the buffer into which data is read is automatically extended
to accommodate the length of the filenames.

This automatic buffer extension does not happen for old-format
databases.  Instead a 1026-byte buffer is used.  When a longer
pathname appears in the locate database, the end of this buffer is
overrun.  The buffer is allocated on the heap (not the stack).

If the locate database is in the default LOCATE02 format, the locate
program does perform automatic buffer extension, and the program is
not vulnerable to this problem.  The software used to build the
old-format locate database is not itself vulnerable to the same
attack.

Most installations of GNU findutils do not use the old database
format, and so will not be vulnerable.


IV. DETECTION
=============

Software
--------
All existing releases of findutils are affected.


Installations
-------------

To discover the longest path name on a given system, you can use the
following command (requires GNU findutils and GNU coreutils):

@verbatim
find / -print0 | tr -c '\0' 'x' | tr '\0' '\n' | wc -L
@end verbatim

V. EXAMPLE
==========

This section includes a shell script which determines which of a list
of locate binaries is vulnerable to the problem.  The shell script has
been tested only on glibc based systems having a mktemp binary.

NOTE: This script deliberately overruns the buffer in order to
determine if a binary is affected.  Therefore running it on your
system may have undesirable effects.  We recommend that you read the
script before running it.

@verbatim
#! /bin/sh
set +m
if vanilla_db="$(mktemp nicedb.XXXXXX)" ; then
    if updatedb --prunepaths="" --old-format --localpaths="/tmp" \
	--output="${vanilla_db}" ; then
	true
    else
	rm -f "${vanilla_db}"
	vanilla_db=""
	echo "Failed to create old-format locate database; skipping the sanity checks" >&2
    fi
fi

make_overrun_db() {
    # Start with a valid database
    cat "${vanilla_db}"
    # Make the final entry really long
    dd if=/dev/zero  bs=1 count=1500 2>/dev/null | tr '\000' 'x'
}



ulimit -c 0

usage() { echo "usage: $0 binary [binary...]" >&2; exit $1; }
[ $# -eq 0 ] && usage 1

bad=""
good=""
ugly=""
if dbfile="$(mktemp nasty.XXXXXX)"
then
    make_overrun_db > "$dbfile"
    for locate ; do
      ver="$locate = $("$locate"  --version | head -1)"
      if [ -z "$vanilla_db" ] || "$locate" -d "$vanilla_db" "" >/dev/null ; then
	  "$locate" -d "$dbfile" "" >/dev/null
	  if [ $? -gt 128 ] ; then
	      bad="$bad
vulnerable: $ver"
	  else
	      good="$good
good: $ver"
	  fi
       else
	  # the regular locate failed
	  ugly="$ugly
buggy, may or may not be vulnerable: $ver"
       fi
    done
    rm -f "${dbfile}" "${vanilla_db}"
    # good: unaffected.  bad: affected (vulnerable).
    # ugly: doesn't even work for a normal old-format database.
    echo "$good"
    echo "$bad"
    echo "$ugly"
else
  exit 1
fi
@end verbatim




VI. VENDOR RESPONSE
===================

The GNU project discovered the problem while 'locate' was being worked
on; this is the first public announcement of the problem.

The GNU findutils maintainer has issued a patch as part of this
announcement.  The patch appears below.

A source release of findutils-4.2.31 will be issued on 2007-05-30.
That release will of course include the patch.  The patch will be
committed to the public CVS repository at the same time.  Public
announcements of the release, including a description of the bug, will
be made at the same time as the release.

A release of findutils-4.3.x will follow and will also include the
patch.


VII. PATCH
==========

This patch should apply to findutils-4.2.23 and later.
Findutils-4.2.23 was released almost two years ago.
@verbatim
Index: locate/locate.c
===================================================================
RCS file: /cvsroot/findutils/findutils/locate/locate.c,v
retrieving revision 1.58.2.2
diff -u -p -r1.58.2.2 locate.c
--- locate/locate.c	22 Apr 2007 16:57:42 -0000	1.58.2.2
+++ locate/locate.c	28 May 2007 10:18:16 -0000
@@ -124,9 +124,9 @@ extern int errno;

 #include "locatedb.h"
 #include <getline.h>
-#include "../gnulib/lib/xalloc.h"
-#include "../gnulib/lib/error.h"
-#include "../gnulib/lib/human.h"
+#include "xalloc.h"
+#include "error.h"
+#include "human.h"
 #include "dirname.h"
 #include "closeout.h"
 #include "nextelem.h"
@@ -468,10 +468,36 @@ visit_justprint_unquoted(struct process_
   return VISIT_CONTINUE;
 }

+static void
+toolong (struct process_data *procdata)
+{
+  error (EXIT_FAILURE, 0,
+	 _("locate database %s contains a "
+	   "filename longer than locate can handle"),
+	 procdata->dbfile);
+}
+
+static void
+extend (struct process_data *procdata, size_t siz1, size_t siz2)
+{
+  /* Figure out if the addition operation is safe before performing it. */
+  if (SIZE_MAX - siz1 < siz2)
+    {
+      toolong (procdata);
+    }
+  else if (procdata->pathsize < (siz1+siz2))
+    {
+      procdata->pathsize = siz1+siz2;
+      procdata->original_filename = x2nrealloc (procdata->original_filename,
+						&procdata->pathsize,
+						1);
+    }
+}
+
 static int
 visit_old_format(struct process_data *procdata, void *context)
 {
-  register char *s;
+  register size_t i;
   (void) context;

   /* Get the offset in the path where this path info starts.  */
@@ -479,20 +505,35 @@ visit_old_format(struct process_data *pr
     procdata->count += getw (procdata->fp) - LOCATEDB_OLD_OFFSET;
   else
     procdata->count += procdata->c - LOCATEDB_OLD_OFFSET;
+  assert(procdata->count > 0);

-  /* Overlay the old path with the remainder of the new.  */
-  for (s = procdata->original_filename + procdata->count;
+  /* Overlay the old path with the remainder of the new.  Read
+   * more data until we get to the next filename.
+   */
+  for (i=procdata->count;
        (procdata->c = getc (procdata->fp)) > LOCATEDB_OLD_ESCAPE;)
-    if (procdata->c < 0200)
-      *s++ = procdata->c;		/* An ordinary character.  */
-    else
-      {
-	/* Bigram markers have the high bit set. */
-	procdata->c &= 0177;
-	*s++ = procdata->bigram1[procdata->c];
-	*s++ = procdata->bigram2[procdata->c];
-      }
-  *s-- = '\0';
+    {
+      if (procdata->c < 0200)
+	{
+	  /* An ordinary character. */
+	  extend (procdata, i, 1u);
+	  procdata->original_filename[i++] = procdata->c;
+	}
+      else
+	{
+	  /* Bigram markers have the high bit set. */
+	  extend (procdata, i, 2u);
+	  procdata->c &= 0177;
+	  procdata->original_filename[i++] = procdata->bigram1[procdata->c];
+	  procdata->original_filename[i++] = procdata->bigram2[procdata->c];
+	}
+    }
+
+  /* Consider the case where we executed the loop body zero times; we
+   * still need space for the terminating null byte.
+   */
+  extend (procdata, i, 1u);
+  procdata->original_filename[i] = 0;

   procdata->munged_filename = procdata->original_filename;
@end verbatim


VIII. THANKS
============

Thanks to Rob Holland <rob@@inversepath.com> and Tavis Ormandy.


IX. CVE INFORMATION
===================

No CVE candidate number has yet been assigned for this vulnerability.
If someone provides one, I will include it in the public announcement
and change logs.
@end smallexample

The original announcement above was sent out with a cleartext PGP
signature, of course, but that has been omitted from the example.
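
The detection command in the example alert scans the whole file system
and relies on @code{wc -L} from GNU coreutils.  As an illustrative
sketch, the same idea can be restricted to one directory tree, with
@code{awk} standing in for @code{wc -L}.  The function name
@code{longest_path_length} is hypothetical and not part of findutils;
the sketch still assumes a @code{find} that supports @code{-print0}.

@verbatim
```shell
#!/bin/sh
# Hypothetical helper, not part of findutils: print the length in bytes
# of the longest path name found under the directory given as $1.
longest_path_length() {
    # Turn every non-NUL byte into 'x' so that unusual characters in
    # file names (including newlines) cannot confuse line-based tools,
    # then turn each NUL terminator into a newline and keep the maximum.
    find "$1" -print0 | LC_ALL=C tr -c '\0' 'x' | tr '\0' '\n' |
        awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}
```
@end verbatim

The number printed for a tree can then be compared by hand with the
1026-byte buffer size quoted in the alert.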

Once a fixed release is available, announce the new release using the
normal channels.  Any CVE number assigned for the problem should be
included in the @file{ChangeLog} and @file{NEWS} entries. See
@url{https://cve.mitre.org/} for an explanation of CVE numbers.



@node Making Releases
@chapter Making Releases
This chapter explains how to make a findutils release.  Here is a
terse description of the main steps:

@enumerate
@item Commit changes; make sure your working directory has no
uncommitted changes.
@item Test; make sure that all changes you have made have tests, and
that the tests pass.  Verify this with @code{make distcheck}.
@item Bugs; make sure all Savannah bug entries fixed in this release
are marked as @samp{Fixed}.
@item NEWS; make sure that the @file{NEWS} file is updated with the new release
number (and checked in).
@item Tag the release; findutils releases are tagged like this for
example: v4.5.5.  Previously a different format was in use:
FINDUTILS_4_3_8-1.  You can create a tag with a command like this:
@code{git tag -s -m "Findutils release v4.5.7" v4.5.7}.
@item Build the release tarball; do this with @code{make distcheck}.
Copy the tarball somewhere safe.
@item Prepare the upload and upload it.
@xref{Automated FTP Uploads, ,Automated FTP
Uploads, maintain, Information for Maintainers of GNU Software},
for detailed upload instructions.
@item Make a release announcement; include an extract from the NEWS
file which explains what's changed.  Announcements for test releases
should just go to @email{bug-findutils@@gnu.org}.  Announcements for
stable releases should go to @email{info-gnu@@gnu.org} as well.
@item Post-release administration: add a new dummy release header in @file{NEWS}:

@code{* Major changes in release ?.?.?, YYYY-MM-DD}

and update the @code{old_NEWS_hash} in @file{cfg.mk} with
@code{make update-NEWS-hash}.
Commit both changes.
@item Close bugs; any bugs recorded on Savannah which were fixed in this
release should now be marked as closed.   Update the @samp{Fixed
Release} field of these bugs appropriately and make sure the
@samp{Assigned to} field is populated.
@end enumerate
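
The commit, tagging and tarball steps above can be sketched as a few
shell helpers.  This is only an illustration under stated assumptions:
the function names are hypothetical and do not exist in the findutils
source tree, and the tag pattern is inferred from the @samp{v4.5.5}
example rather than taken from any official rule.

@verbatim
```shell
#!/bin/sh
# Hypothetical helpers, not part of the findutils source tree.

# Print the tag name for release $1, following the vX.Y.Z convention.
release_tag() {
    printf 'v%s\n' "$1"
}

# Succeed only if $1 looks like a current-style tag such as v4.5.5;
# old-style names such as FINDUTILS_4_3_8-1 are rejected.
is_release_tag() {
    case "$1" in
        v[0-9]*.[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed only if the git working tree has no uncommitted changes.
tree_is_clean() {
    [ -z "$(git status --porcelain)" ]
}

# Print, without running them, the commands for the tarball and
# tagging steps of release $1.
show_release_commands() {
    tag=$(release_tag "$1")
    printf 'make distcheck\n'
    printf 'git tag -s -m "Findutils release %s" %s\n' "$tag" "$tag"
}
```
@end verbatim

Running @code{show_release_commands 4.5.7} only prints the commands, so
they can be reviewed before anything is tagged or built.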


@node GNU Free Documentation License
@appendix GNU Free Documentation License
@include fdl.texi

@bye

@comment texi related words used by Emacs' spell checker ispell.el

@comment LocalWords: texinfo setfilename settitle setchapternewpage
@comment LocalWords: iftex finalout ifinfo DIR titlepage vskip pt
@comment LocalWords: filll dir samp dfn noindent xref pxref
@comment LocalWords: var deffn texi deffnx itemx emph asis
@comment LocalWords: findex smallexample subsubsection cindex
@comment LocalWords: dircategory direntry itemize

@comment other words used by Emacs' spell checker ispell.el
@comment LocalWords: README fred updatedb xargs Plett Rendell akefile
@comment LocalWords: args grep Filesystems fo foo fOo wildcards iname
@comment LocalWords: ipath regex iregex expr fubar regexps
@comment LocalWords: metacharacters macs sr sc inode lname ilname
@comment LocalWords: sysdep noleaf ls inum xdev filesystems usr atime
@comment LocalWords: ctime mtime amin cmin mmin al daystart Sladkey rm
@comment LocalWords: anewer cnewer bckw rf xtype uname gname uid gid
@comment LocalWords: nouser nogroup chown chgrp perm ch maxdepth
@comment LocalWords: mindepth cpio src CD AFS statted stat fstype ufs
@comment LocalWords: nfs tmp mfs printf fprint dils rw djm Nov lwall
@comment LocalWords: POSIXLY fls fprintf strftime locale's EDT GMT AP
@comment LocalWords: EST diff perl backquotes sprintf Falstad Oct cron
@comment LocalWords: eg vmunix mkdir afs allexec allwrite ARG bigram
@comment LocalWords: bigrams cd chmod comp crc CVS dbfile eof
@comment LocalWords: fileserver filesystem fn frcode Ghazi Hnewc iXX
@comment LocalWords: joeuser Kaveh localpaths localuser LOGNAME
@comment LocalWords: Meyering mv netpaths netuser nonblank nonblanks
@comment LocalWords: ois ok Pinard printindex proc procs prunefs
@comment LocalWords: prunepaths pwd RFS rmadillo rmdir rsh sbins str
@comment LocalWords: su Timar ubins ug unstripped vf VM Weitzel
@comment LocalWords: wildcard zlogout basename execdir wholename iwholename
@comment LocalWords: timestamp timestamps Solaris FreeBSD OpenBSD POSIX