                         Mirror 2.9 Reference Manual

                               Lee McLoughlin

                                     and

                                  Zo Leech

                                 1 June 1998
                            lmjm@icparc.ic.ac.uk
                             zl@icparc.ic.ac.uk

   * Introduction
   * Description
   * Flags
   * Package Files
        o Keywords
   * Filestores
   * Examples
   * Temporary Filenames
   * Regular Expressions
   * Hints
   * Netiquette
   * See Also
   * Bugs
   * Remember!
   * Author

Introduction

Mirror is a package written in Perl that uses the FTP protocol to duplicate
a directory hierarchy between the machine it is run on and a remote host. It
avoids copying files unnecessarily by comparing the file time-stamps and
file sizes before transferring. Amongst other things, it can optionally
rename, compress, gzip, and split files.

Mirror was written by Lee McLoughlin <lmjm@icparc.ic.ac.uk> for use by
archive maintainers but can be used by anyone wanting to transfer a lot of
files via FTP.  Although originally only available on Un*x, with version 2.9
mirror will also run on Wind*ws 95 and Wind*ws NT.


The latest version of mirror can always be found at either:

     ftp://sunsite.org.uk/packages/mirror/mirror.tar.gz
     ftp://sunsite.org.uk/packages/mirror/mirror.zip

The latest version of this guide can always be found at:

     http://sunsite.org.uk/packages/mirror/

Description

Mirror is called in one of two ways (see also mirror master):

     mirror [flags] -gsite:pathname

     mirror [flags] [package-files]

The first method is used to retrieve a remote file or directory into the
current directory. If you are mirroring a directory it is best either to end
the pathname in a slash ('/'), as this makes the remote recursive listing
smaller, or to use the -r flag to suppress recursion (see -g below). The
mirror.defaults file is not used.
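
As an illustration only, the way a -g argument could be split into a site, a
directory and a filename pattern can be sketched in Python, following the
rules given for -g under Flags. The function name split_get_arg is
hypothetical, rejecting a path with no slash is an assumption (the manual
does not cover that case), and returning '.' to mean "everything" simply
borrows the get_patt default.

```python
def split_get_arg(arg):
    """Split 'site:path' into (site, directory, filename_pattern)."""
    site, colon, path = arg.partition(":")
    if not colon or "/" not in path:
        # Assumption: the manual only describes paths containing a slash.
        raise ValueError(f"expected site:/path, got {arg!r}")
    directory, _, pattern = path.rpartition("/")
    # A trailing slash leaves no pattern, so fetch the whole directory.
    return site, directory + "/", pattern or "."


print(split_get_arg("host:/fred"))    # lists / and matches names against 'fred'
print(split_get_arg("host:/fred/"))   # lists /fred/ only
```

The first call shows why a trailing slash matters: without it the directory
to be listed is / rather than /fred/.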

In the second method given above, a minimal number of arguments are required
and mirror is controlled by keyword=value lines read from the package
files. If a file named mirror.defaults is found in either the directory
containing the mirror executable or in the PERLLIB path, then it is loaded
before any of the package-files.  mirror.defaults normally just contains
the package of keyword settings called defaults that is used to provide
common defaults for all package-files.  If no mirror.defaults file is
found, the default settings built into mirror are used.

Each package file is read in turn, looking for named packages.  If the
package is not named defaults, then mirror will perform the following steps.

If mirror is already connected to a site, other than the target site, it
will disconnect from the site.  It then changes to the given local
directory, creating it if necessary, and scans it to get the details of the
local files that are already there.  Mirror then attempts to connect to the
remote site's FTP daemon. It will then login using the given remote_user and
remote_password.  The remote directory is then scanned. Mirror does this by
changing to the remote directory (remote_dir) and running the FTP LIST
command, passing the flags_recursive or flags_nonrecursive options
depending on the value of recursive.  Alternatively a file containing the
directory listing may be retrieved (see ls_lR_file and local_ls_lR_file).
Each remote pathname will have any required mappings performed on it to
create a local pathname. Then any checks specified by the exclude_patt,
max_days, get_newer and get_size_change keywords are applied to names of
files or symlinks. max_days, get_newer and get_size_change are not applied
to directories.  This creates a list of all required remote files and the
local pathnames to store them in.
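
The checks described above can be sketched in Python for a single file or
symlink. This is a rough model, not mirror's actual code: the names FileInfo
and should_fetch are hypothetical, and the order in which the checks are
combined is an assumption. Only the meaning of each keyword is taken from
this manual.

```python
import re
from dataclasses import dataclass

SECONDS_PER_DAY = 86400


@dataclass
class FileInfo:
    size: int      # bytes
    mtime: float   # seconds since the epoch


def should_fetch(path, remote, local, now, exclude_patt=None,
                 max_days=0, get_newer=True, get_size_change=True):
    """Decide whether a remote file needs transferring (local may be None)."""
    if exclude_patt and re.search(exclude_patt, path):
        return False                      # excluded by pattern
    if max_days > 0 and now - remote.mtime > max_days * SECONDS_PER_DAY:
        return False                      # too old, ignore it
    if local is None:
        return True                       # nothing local yet
    if get_newer and remote.mtime > local.mtime:
        return True                       # remote is more recent
    if get_size_change and remote.size != local.size:
        return True                       # size differs
    return False


now = 100 * SECONDS_PER_DAY
remote = FileInfo(size=2048, mtime=now - 3600)
local = FileInfo(size=2048, mtime=now - 7200)
print(should_fetch("pub/README", remote, local, now))
```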

Local versions of all required directories are then created.  Then all
required files are fetched from the remote site into their local pathnames.
This is done by retrieving the file into a temporary file in the target
directory. The transfer is normally done in binary mode (see
vms_xfer_text).  If required the temporary file may be compressed, gzip'ed
or split. The file's time-stamps are reset to match those of the remote
file.  Finally the temporary file is renamed to have the correct name.
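
The retrieve-then-rename step can be illustrated with a short Python sketch.
It only models the local side (no FTP transfer, compression or splitting),
the function name store_via_tempfile and the "tmp-" prefix are hypothetical,
and mirror itself is written in Perl, so this is an analogy rather than its
implementation.

```python
import os
import tempfile


def store_via_tempfile(data, target, remote_mtime):
    """Write data next to target, fix its time-stamps, then rename it."""
    directory = os.path.dirname(os.path.abspath(target))
    # The temporary file lives in the target directory so that the final
    # rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix="tmp-")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        os.utime(tmp, (remote_mtime, remote_mtime))   # match the remote file
        os.replace(tmp, target)                       # give it the real name
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


workdir = tempfile.mkdtemp()
store_via_tempfile(b"hello\n", os.path.join(workdir, "README"), 1000000000)
print(os.listdir(workdir))
```

Because the file only gets its final name once it is complete, an
interrupted transfer never leaves a partial file under the real pathname.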

Once all files have been transferred any required symbolic links are created
(where supported by your operating system) and any unnecessary pathnames in
the mirror are deleted.

Unless an internal failure is detected, any error will cause the current
package to be skipped and the next one tried.

Mirror can handle symbolic links but not hard links. It does not duplicate
owner or group information as usually this is meaningless over a network
(but see user and group). If you require any of these options and you are on
Un*x use rdist(1) instead.

Mirror was written to mirror remote Un*x archives, but has grown (like
Topsy).

Flags

Although mirror has a large number of command line flags most should only
really be used when doing a very simple mirror as a one-time event.  If you
intend to maintain a mirror area it is much better to put all the details
into a mirror package file and then run mirror on that file.

The only flags you should use often are -n and, if you like to see what
mirror is up to, -d.

 -d           Enable debugging. If this argument is given more than once
              (e.g. -d -d) the debugging level will increase. Currently the
              maximum useful level is four.
 -n           Do nothing except compare local and remote directories, no
              file transfers are done. Sets debug level to two, so that you
              are shown a trace of what would be done.
 -g site:path Get all files matching path, which is a regexp, on the given
              site. If path matches .*/.+ (e.g. /fred or /fred/bloggs) then
              everything up to the last / is the name of the directory and
              everything after it is the pattern of filenames to get. If
              path ends with / then it is the name of a directory and all
              its contents are retrieved.  One note of caution: if you use
              host:/fred, a full directory listing of / on the remote host
              will be done. If all you wanted was the contents of the
              directory /fred then specify host:/fred/.
 -p package   When using multiple package files only mirror the given
              package. This option may be given multiple times in which
              case all the given packages will be mirrored. Without this
              option, all packages will be mirrored. Package is a regexp
              matched against the package name following the -p.
 -R package   Similar to -p but skips all packages until it reaches the
              given package. Useful for restarting failed mirror runs from
              where they left off.
 -F           Use temporary dbm files for the information about files. This
              is useful if you mirror a very large directory.  See the
              variable use_files.
 -r           Equivalent to -k recursive=false
 -v           Print the version details of mirror and exit.
 -T           Do not do any file transfers just force the time-stamps of
              any local files to be reset to be the same as the remote
              files. Normally only used when initialising a mirror that
              already contains files retrieved another way (e.g. from
              CDROM).
 -Ufilename   Record all files transferred by mirror into the given
              filename. Remember that mirror changes into local_dir to do
              its work, so it should be a full pathname. If no filename is
              given, it defaults to upload_log.day.month.year.
 -k key=value Override any default key/value.  See below.
 -m           Equivalent to -k mode_copy=true
 -t           Equivalent to -k text_mode=true
 -f           Equivalent to -k force=true
 -s site      Equivalent to -k site=site
 -u user      Equivalent to -k remote_user=user. You are then prompted for a
              password, with echo turned off. The password is used as the
              remote_password.
 -L           Just generate a pretty printed version of the input and exit.
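
How -p and -R narrow down the packages to be mirrored can be sketched in
Python as follows. The function name select_packages is hypothetical, and
treating the -R argument as a regexp (like -p) is an assumption; the manual
only says that -R is "similar to -p".

```python
import re


def select_packages(names, only=(), restart_at=None):
    """Return the package names that would be mirrored, in file order.

    only       -- regexps given with -p (empty means every package)
    restart_at -- regexp given with -R (skip packages until one matches)
    """
    selected = []
    skipping = restart_at is not None
    for name in names:
        if skipping:
            if not re.search(restart_at, name):
                continue
            skipping = False          # reached the restart point
        if only and not any(re.search(patt, name) for patt in only):
            continue
        selected.append(name)
    return selected


packages = ["gnu", "x11", "perl", "perl-docs"]
print(select_packages(packages, only=["^perl"]))
print(select_packages(packages, restart_at="x11"))
```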

Package Files

Each group of keywords defines how to mirror a particular package and should
begin with a unique package line. The package name is used in report
generation and by the -p argument, so pick something mnemonic. The minimum
needed for each package is package, site, remote_dir and local_dir. On
finding a package line, all the default values are reset to the values from
the defaults package (or to the built-in values if defaults has not been
set).  A package ends at either the next package statement or at the
end of file.

Package files are parsed as a series of statements. Blank lines and lines
beginning with a hash are ignored. Each statement is of the form

     keyword=value

or

     keyword+value

You can add whitespace before the keyword and the equals/plus. Everything
immediately following the equals/plus is the value, including any leading or
trailing whitespace. The equals version sets the keyword to this value,
while the plus version concatenates the value onto the end of the existing
value (normally set in defaults package).

A statement can be continued over multiple lines by ending every line except
the last with an ampersand ('&'). The line following the ampersand is
appended to the current line with all leading whitespace removed.
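
The statement syntax above is simple enough to model in a few lines of
Python. The sketch below is only an approximation of what mirror (a Perl
program) does: the function names are hypothetical, and treating an indented
'#' line as a comment is an assumption based on the sample mirror.defaults.

```python
import re

STATEMENT = re.compile(r"\s*(\w+)\s*([=+])(.*)$")


def join_continuations(text):
    """Merge lines ending in '&' with the line that follows them."""
    joined, pending = [], None
    for raw in text.splitlines():
        # A continued line has its leading whitespace removed.
        line = raw if pending is None else pending + raw.lstrip()
        if line.endswith("&"):
            pending = line[:-1]
        else:
            joined.append(line)
            pending = None
    if pending is not None:
        joined.append(pending)
    return joined


def parse_statements(text):
    """Return a list of (keyword, operator, value) tuples."""
    statements = []
    for line in join_continuations(text):
        if not line.strip() or line.lstrip().startswith("#"):
            continue                      # blank line or comment
        match = STATEMENT.match(line)
        if match is None:
            raise ValueError(f"not a keyword statement: {line!r}")
        # The value keeps any leading or trailing whitespace.
        statements.append(match.groups())
    return statements


sample = "package=demo\n  # a comment\nname_mappings=s:old:&\n     new:\n"
print(parse_statements(sample))
```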

Although there are a lot of keywords that can be set, the built-in defaults
will handle most cases. Normally only package, site, remote_dir and
local_dir need to be set.

Setting Defaults

If the package name is defaults, then no site is contacted, but the default
values given for any keywords are changed. Normally all the defaults are in
the file mirror.defaults which will be automatically loaded before any
package files (see Description).

# Sample mirror.defaults
package=defaults
        # The LOCAL hostname - if not the same as `hostname` returns
        # (I advertise the name sunsite.org.uk but the machine is
        #  really swallow.doc.ic.ac.uk.)
        hostname=sunsite.org.uk
        # Keep all local_dirs relative to here
        local_dir=/public/
        remote_password=wizards@sunsite.org.uk
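
To make the interaction between the defaults package and ordinary packages
concrete, here is a small Python sketch. It takes statements that have
already been split into (keyword, operator, value) tuples. The function name
build_packages and the two-entry BUILTIN table are hypothetical stand-ins;
the real built-in defaults are those listed under Keywords.

```python
BUILTIN = {"remote_user": "anonymous", "timeout": "40"}   # illustrative subset


def build_packages(statements, builtin=BUILTIN):
    """Group statements into packages, applying defaults as described."""
    defaults = dict(builtin)
    packages = {}
    current = None
    for keyword, op, value in statements:
        if keyword == "package":
            if value == "defaults":
                current = defaults            # later lines change defaults
            else:
                current = dict(defaults)      # reset to current defaults
                packages[value] = current
            continue
        if current is None:
            raise ValueError("keyword before any package line")
        if op == "=":
            current[keyword] = value
        else:                                 # '+' appends to existing value
            current[keyword] = current.get(keyword, "") + value
    return packages


statements = [
    ("package", "=", "defaults"), ("local_dir", "=", "/public/"),
    ("package", "=", "gnu"), ("site", "=", "ftp.example.org"),
    ("local_dir", "+", "gnu/"),
]
print(build_packages(statements)["gnu"]["local_dir"])
```

Here ftp.example.org and the package name gnu are made-up values; the point
is that local_dir+ extends the value inherited from defaults.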

Keywords

The following is a list of all the available keywords and the default values
built into mirror.  To change these defaults it is usually best to change
your mirror.defaults file.

The keywords are grouped into the following sections:

   * Required Keywords
   * FTP Related
   * File Copying
   * Local File Attributes
   * File Deletion
   * File Compression
   * File Splitting
   * Directory Listings
   * Logging
   * Special


 Required Keywords
 keyword              default                    Description
 package              none                       A name for the package to
                                                 be mirrored.  Should be
                                                 different from all other
                                                 package names you use.
 site                 none                       Hostname or IP address of
                                                 the remote site to mirror
                                                 from.
 remote_dir           none                       Remote directory to
                                                 mirror. See also
                                                 recurse_hard.
 local_dir            none                       Local directory.

 FTP Related
 keyword              default                    Description
 remote_user          anonymous                  Username to use at remote
                                                 site.
 remote_password      localuser@localhostname    Password to use at remote
                                                 site.  Note: localuser
                                                 will be your name and
                                                 localhostname will be the
                                                 name of the local machine
                                                 (if it can be found, see
                                                 hostname)
 remote_account       none                       Account name/password to
                                                 use at remote site, after
                                                 logging in anonymously
                                                 (for systems that require
                                                 it).
 remote_group         none                       If present set the remote
                                                 'site group'.
 remote_gpass         none                       If present set the remote
                                                 'site gpass'.
 timeout              40                         Timeout FTP requests after
                                                 this many seconds.
 failed_gets_excl     none                       Regexp of error messages
                                                 to skip reporting, when
                                                 the FTP GET command
                                                 fails.  (E.g. permission
                                                 denied.)
 ftp_port             21                         Port number of remote FTP
                                                 daemon.
 proxy                false                      Set to true to use proxy
                                                 FTP service.
 proxy_ftp_port       4514                       Port number of
                                                 proxy-service FTP daemon.
                                                 This value should be
                                                 changed depending on which
                                                 proxy library you are
                                                 using.
 proxy_gateway        internet-gateway           Name of proxy-service, may
                                                 also be supplied by the
                                                 environment variable
                                                 INTERNET_HOST.
 using_socks          false                      Set to true if you are
                                                 using a SOCKS version of
                                                 Perl.
 passive_ftp          false                      Set to true if you want to
                                                 use the PASV extension of
                                                 the FTP protocol.
                                                 Especially useful with
                                                 firewalls, other proxy FTP
                                                 servers, and the variable
                                                 using_socks.
 retry_call           true                       If initial connect fails,
                                                 retry ONCE after ONE
                                                 minute. This is to handle
                                                 sites which reverse lookup
                                                 the incoming host but
                                                 sometimes timeout on the
                                                 first attempt.
 disconnect           false                      Disconnect from remote
                                                 site at end of package.
                                                 Normally only disconnects
                                                 if the next package
                                                 specifies a different
                                                 site.  (Some sites will
                                                 not let you change to
                                                 certain directories except
                                                 when first connecting in.)
 remote_idle          none                       If set, try to set the
                                                 remote idle timer to this.
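
The behaviour described for retry_call (retry once, after one minute) can be
sketched in Python. The function name connect_with_retry is hypothetical,
the connection attempt is passed in as a callable so the sketch stays
independent of any FTP library, and treating only OSError as a connection
failure is an assumption.

```python
import time


def connect_with_retry(connect, retry_call=True, delay=60, sleep=time.sleep):
    """Call connect(); on failure wait `delay` seconds and try once more."""
    try:
        return connect()
    except OSError:
        if not retry_call:
            raise
        sleep(delay)          # give a slow reverse lookup time to finish
        return connect()      # a second failure is passed to the caller


attempts = []


def flaky_connect():
    """Stand-in for a real connection: fails the first time only."""
    attempts.append(1)
    if len(attempts) == 1:
        raise OSError("timed out")
    return "connected"


print(connect_with_retry(flaky_connect, sleep=lambda seconds: None))
```

In real use the callable might wrap something like Python's ftplib, with
timeout and ftp_port supplied from the package settings.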

 File Copying
 keyword              default                    Description
 get_patt             .                          Regexp of remote pathnames
                                                 to retrieve.
 exclude_patt         none                       Regexp of remote pathnames
                                                 to ignore.
 local_ignore         none                       Regexp of local pathnames
                                                 to ignore. Useful to skip
                                                 restricted local
                                                 directories.
 get_newer            true                       Get the remote file if it
                                                 is more recent than the
                                                 local file.
 get_size_change      true                       Get the file if the size
                                                 is different from local.
                                                 If the file is to be
                                                 compressed after being
                                                 fetched get_size_change is
                                                 automatically set to
                                                 false.
 make_bad_symlinks    false                      If true, symlinks will be
                                                 made to invalid
                                                 (non-existent) pathnames.
                                                 (In older versions of
                                                 mirror this defaulted to
                                                 true.)
 follow_local_symlinks none                      Regexp of pathnames of
                                                 local symbolic links.
                                                 Rather than treating them
                                                 as symlinks the target
                                                 files or directories they
                                                 reference are used
                                                 instead. This makes local
                                                 symlinks invisible to
                                                 mirror.
 get_missing          true                       Really get files. When set
                                                 to false, only deletions
                                                 and symlinking will be
                                                 done. Used to delete
                                                 expired files older than
                                                 max_days without
                                                 retrieving older files.
 get_file             true                       Get files.  If set to
                                                 false mirror will try to
                                                 put files.
 text_mode            false                      If true, all files are
                                                 transferred in TEXT mode.
                                                 Un*x prefers binary so
                                                 that is the default.
 strip_cr             false                      Strip carriage returns
                                                 from any file as it is
                                                 retrieved.
 vms_keep_versions    true                       When mirroring VMS files,
                                                 keep the version numbers.
                                                 If false, the versions are
                                                 stripped off and only the
                                                 base filenames are kept.
 vms_xfer_text        (readme|info|listing|\.c)$ Pattern of VMS files to
                                                 transfer in TEXT mode
                                                 (case insensitive).
 name_mappings        none                       Remote to local pathname
                                                 mappings (a Perl
                                                 substitute command, e.g.
                                                 s:old:new:).
 external_mapping     none                       Specifies a file that
                                                 should contain a Perl
                                                 module called extmap
                                                 containing at least a
                                                 function called map.  This
                                                 function is used as the
                                                 name_mappings function.
 update_local         false                      Set get_patt to be all the
                                                 files and directories
                                                 already present in
                                                 local_dir.
 max_days             0                          If >0, ignore files older
                                                 than this many days.  Any
                                                 ignored files will not be
                                                 transferred or deleted.
 max_size             0                          If >0, do not transfer any
                                                 files any larger than this
                                                 many bytes.
 chmod                true                       By default try and set the
                                                 file attributes (e.g.
                                                 time-stamps) of the copied
                                                 file.  If false do not set
                                                 attributes.
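
As a small illustration of how text_mode and vms_xfer_text might combine,
the following Python sketch picks a transfer mode for one pathname. The
function name transfer_mode and the remote_is_vms parameter are
hypothetical; the manual does not say how mirror detects a VMS site. The
default pattern is the one listed in the table above and is matched case
insensitively.

```python
import re

VMS_XFER_TEXT = r"(readme|info|listing|\.c)$"


def transfer_mode(path, text_mode=False, remote_is_vms=False,
                  vms_xfer_text=VMS_XFER_TEXT):
    """Return 'text' or 'binary' for the given remote pathname."""
    if text_mode:
        return "text"                 # text_mode applies to all files
    if remote_is_vms and re.search(vms_xfer_text, path, re.IGNORECASE):
        return "text"                 # VMS file matching the text pattern
    return "binary"                   # the normal case


for name in ("pub/README", "SRC/MAIN.C", "DATA.ZIP"):
    print(name, transfer_mode(name, remote_is_vms=True))
```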

 Local File Attributes
 keyword              default                    Description
 user                 none                       User name or uid to give
                                                 to local pathnames.
 group                none                       Group name or gid to give
                                                 to local pathnames.
 mode_copy            false                      Flag indicating if we need
                                                 to copy the file/dir
                                                 modes.  If this is false
                                                 then file_mode and
                                                 dir_mode will be used
                                                 instead.
 file_mode            0444                       Mode to give files created
                                                 locally if mode_copy is
                                                 false.
 dir_mode             0755                       Mode to give directories
                                                 created locally if
                                                 mode_copy is false.
 force                false                      If true, all files will be
                                                 transferred regardless of
                                                 the results from size or
                                                 time-stamp comparisons.
 umask                07000                      Do not create setuid files
                                                 by default (see chmod(1)
                                                 on Un*x).
 use_timelocal        true                       Time-stamp files to local
                                                 time zone. If false, the
                                                 time zone is set to GMT
                                                 (older versions of mirror
                                                 had a bug setting all
                                                 files to GMT).
 force_times          yes                        Force local times to match
                                                 remote times.
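
The relationship between mode_copy, file_mode, dir_mode and umask can be
shown with a short Python sketch. The function name local_mode is
hypothetical, and applying umask to file_mode and dir_mode as well as to
copied modes is an assumption; the defaults are those in the table above.

```python
def local_mode(remote_mode, is_dir, mode_copy=False,
               file_mode=0o444, dir_mode=0o755, umask=0o7000):
    """Work out the permission bits to give a local file or directory."""
    if mode_copy:
        mode = remote_mode                    # copy the remote mode
    else:
        mode = dir_mode if is_dir else file_mode
    return mode & ~umask                      # clear bits named in umask


# A remote setuid program (04755) loses its setuid bit when copied.
print(oct(local_mode(0o4755, is_dir=False, mode_copy=True)))
```

With the default umask of 07000 only the setuid, setgid and sticky bits are
ever cleared, which matches the description "do not create setuid files".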

 File Deletion
 keyword              default                    Description
 do_deletes           false                      Delete destination files
                                                 if not in source tree.
 delete_patt          .                          Regexp of local pathnames
                                                 to check for deletions.
                                                 Names that are not matched
                                                 are not checked. The match
                                                 by delete_excl is done to
                                                 all files selected by this
                                                 pattern.
 delete_get_patt      false                      Set delete_patt to be
                                                 get_patt.
 delete_excl          none                       Regexp of local pathnames
                                                 that mirror will not
                                                 delete.
 max_delete_files     10%                        If this is set to just a
                                                 number and there are more
                                                 than this many files to
                                                 delete, do not delete,
                                                 just warn. If this is set
                                                 to number% and the
                                                 percentage of files that
                                                 would be deleted is
                                                 greater than the number,
                                                 do not delete, just warn.
 max_delete_dirs      10%                        As max_delete_files except
                                                 applies to directories.
 save_deletes         false                      Instead of deleting local
                                                 files move them into
                                                 save_dir.
 save_dir             Old                        Where local files no
                                                 longer on remote site are
                                                 moved to.  Either begins
                                                 with / or is relative to
                                                 local_dir.  Only used when
                                                 save_deletes is true.
 store_remote_listing none                       Local pathname where
                                                 remote listings are kept.
                                                 Useful if you have a slow
                                                 network or want to perform
                                                 several operations on the
                                                 same package without
                                                 retrieving the index every
                                                 time.
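The two forms of max_delete_files and max_delete_dirs (a plain count or a
percentage) can be illustrated with a small Python sketch. This is not
mirror's own code: the function name deletion_allowed is hypothetical, and
taking the percentage relative to the total number of local entries is an
assumption.

```python
def deletion_allowed(limit: str, to_delete: int, total: int) -> bool:
    """Return True if deleting `to_delete` of `total` entries is within `limit`.

    `limit` is either a plain count such as "25" or a percentage such as "10%".
    (Assumption: percentages are taken relative to `total` local entries.)
    """
    limit = limit.strip()
    if limit.endswith("%"):
        if total == 0:
            return True  # nothing exists locally, so nothing can be deleted
        percent = 100.0 * to_delete / total
        return percent <= float(limit[:-1])
    return to_delete <= int(limit)


print(deletion_allowed("10%", 5, 100))   # 5% of the files: deletion goes ahead
print(deletion_allowed("10%", 20, 100))  # 20% of the files: warn only
print(deletion_allowed("25", 30, 1000))  # 30 files is more than 25: warn only
```

When the function returns False, the behaviour described in the table is to
leave the files alone and only issue a warning.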

 File Compression
 keyword              default                    Description
 compress_patt        none                       Regexp of files to
                                                 compress before storing
                                                 locally. See
                                                 get_size_change.
 compress_excl        \.(z|gz)$                  Regexp of files not to
                                                 compress (case
                                                 insensitive).
 compress_prog        compress                   Program to compress files.
                                                 If set to the word
                                                 compress or gzip, the full
                                                 pathname for the program
                                                 and correct
                                                 compress_suffix will
                                                 automatically be set. When
                                                 using gzip, level -9 is
                                                 used. Note that
                                                 compress_suffix can be
                                                 reset to a non-standard
                                                 value by setting it after
                                                 compress_prog.
 compress_suffix      none                       Character(s) the compress
                                                 program appends to files.
                                                 If compress_prog is
                                                 compress, this defaults to
                                                 .Z. If compress_prog is
                                                 gzip, this defaults to
                                                 .gz.
 compress_conv_patt   (\.Z|\.taz)$               If compress_prog is gzip,
                                                 files matching this
                                                 pattern are uncompressed
                                                 and gzip'ed before storing
                                                 locally. Compression
                                                 conversion is only meant
                                                 to do compress to gzip
                                                 conversion.
 compress_conv_expr   s/\.Z$/\.gz/;              Perl expression to convert
                      s/\.taz$/\.tgz/            suffix from compress to
                                                 gzip style. Change .Z to
                                                 .gz and .taz to .tgz.
 compress_size_floor  0                          Do not compress files
                                                 smaller than this size, in
                                                 bytes.
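As an illustration of how the compression keywords fit together, here is a
minimal Python sketch. It is not taken from mirror itself: the function
names are hypothetical, the order of the checks is an assumption, and
Python's re module stands in for Perl regexps (the two agree for these
simple patterns).

```python
import re


def gzip_style_name(name: str) -> str:
    """Mimic the default compress_conv_expr: .Z becomes .gz, .taz becomes .tgz."""
    name = re.sub(r"\.Z$", ".gz", name)
    name = re.sub(r"\.taz$", ".tgz", name)
    return name


def should_compress(name, size, patt=None, excl=r"\.(z|gz)$", size_floor=0):
    """Decide whether a retrieved file would be compressed before storing.

    patt       -- compress_patt (None means the default of "none")
    excl       -- compress_excl, matched case insensitively
    size_floor -- compress_size_floor in bytes
    """
    if not patt or re.search(patt, name) is None:
        return False
    if excl and re.search(excl, name, re.IGNORECASE):
        return False
    return size >= size_floor


print(gzip_style_name("emacs.tar.Z"))                # emacs.tar.gz
print(gzip_style_name("tools.taz"))                  # tools.tgz
print(should_compress("notes.txt", 5000, patt="."))  # True
print(should_compress("notes.GZ", 5000, patt="."))   # False
```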

 File Splitting
 keyword              default                    Description
 split_max            0                          If >0 and the size of the
                                                 file is greater than this
                                                 many bytes, the file is
                                                 split up to be stored
                                                 locally (filename must
                                                 also match split_patt).
                                                 The name of the file being
                                                 split up is used as the
                                                 directory name and each
                                                 part is stored in a file
                                                 called part1, part2... in
                                                 that directory.
 split_patt           none                       Regexp of remote pathnames
                                                 to split up before storing
                                                 locally.
 split_chunk          102400                     Size, in bytes, of chunks
                                                 to split files into.
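The splitting rule can be sketched in Python as follows. This is only an
illustration of the description above, not mirror's implementation: the
function name split_plan is hypothetical and the split_patt check is left
out for brevity.

```python
def split_plan(size, split_max=0, split_chunk=102400):
    """Return a list of (part_name, byte_count) pairs for a file of `size` bytes.

    An empty list means the file is stored whole, either because split_max
    is 0 or because the file is not larger than split_max.
    """
    if split_max <= 0 or size <= split_max:
        return []
    parts = []
    offset = 0
    index = 1
    while offset < size:
        length = min(split_chunk, size - offset)
        parts.append((f"part{index}", length))
        offset += length
        index += 1
    return parts


# A 250000 byte file with split_max=200000 and the default chunk size.
for name, length in split_plan(250000, split_max=200000):
    print(name, length)
```

With the values shown, the file would be stored as a directory holding
part1 and part2 of 102400 bytes each and part3 with the remaining 45200
bytes.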

 Directory Listings
 keyword              default                    Description
 remote_fs            unix                       File store type. Currently
                                                 can be one of unix, dls,
                                                 netware, vms, dosftp,
                                                 dosish, macos, lsparse and
                                                 infomac. See the
                                                 Filestores section for
                                                 more details.
 ls_lR_file           none                       Remote file containing
                                                 ls-lR (result of running
                                                 ls -lR on that machine),
                                                 otherwise run remote ls
                                                 command.
 local_ls_lR_file     none                       Local file containing
                                                 ls-lR, otherwise use
                                                 remote ls_lR_file. This is
                                                 useful when first
                                                 mirroring a large package.
 recursive            true                       Mirror both the contents
                                                 of local_dir and sub
                                                 directories of local_dir.
 recurse_hard         false                      Generate remote ls by
                                                 doing CWD and ls for each
                                                 sub directory. In this
                                                 case remote_dir must be
                                                 absolute (begin with a /)
                                                 not relative. Use the CWD
                                                 command in FTP to find the
                                                 path for the start of the
                                                 remote archive area. (Not
                                                 available if remote_fs is
                                                 VMS.)
 flags_recursive      -lRat                      Flags to send to remote ls
                                                 to do a recursive listing.
 flags_nonrecursive   -lat                       Flags to send to remote ls
                                                 to do a non-recursive
                                                 listing.
 ls_fix_mappings      none                       Edit pathnames in remote
                                                 directory listings (a Perl
                                                 substitute command, e.g.
                                                 s:/usr/spool/pub:/:).

 Logging
 keyword              default                    Description
 update_log           none                       Filename, relative to
                                                 local_dir, where mirror
                                                 will write a report of all
                                                 it does to maintain a
                                                 package.
 mail_to              none                       Mail a log of the work
                                                 done to this comma
                                                 separated list of
                                                 addresses (currently only
                                                 supported on Un*x).
 mail_prog            none                       Program called to send
                                                 mail to the mail_to list.
                                                 May be passed the argument
                                                 mail_subject. Defaults to
                                                 mailx, Mail, or mail. (Not
                                                 supported under Wind*ws.)
 mail_subject         -s "mirror update"         This can contain
                                                 $keyword.  These will be
                                                 replaced by the current
                                                 value for that keyword
                                                 (e.g.: -s "mirror update:
                                                 $package")
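The $keyword replacement in mail_subject can be pictured with this short
Python sketch. It is an illustration only: the function name
expand_keywords is hypothetical, and leaving unknown keywords untouched is
an assumption, not documented behaviour.

```python
import re


def expand_keywords(template: str, settings: dict) -> str:
    """Replace each $keyword in `template` with its value from `settings`.

    Keywords that are not in `settings` are left as they are (an assumption).
    """
    def lookup(match):
        name = match.group(1)
        return str(settings.get(name, match.group(0)))

    return re.sub(r"\$(\w+)", lookup, template)


print(expand_keywords('-s "mirror update: $package"', {"package": "gnu"}))
```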

 Special
 keyword              default                    Description
 hostname             none                       Mirror automatically skips
                                                 packages whose site
                                                 variable matches this
                                                 host. Defaults to the
                                                 local hostname.  This is
                                                 normally only ever set in
                                                 the defaults package.
                                                 Useful if you are sharing
                                                 mirror package files with
                                                 others.
 comment              none                       Used in reports.
 use_files            false                      Put the associative arrays
                                                 that mirror uses into
                                                 temporary files (currently
                                                 only supported on Un*x).
                                                 The files are created in
                                                 /var/tmp with names:
                                                 local_map and remote_map.
                                                 The suffixes will depend
                                                 on which DBM library was
                                                 set as default when Perl
                                                 was installed on your
                                                 machine.
 interactive          false                      A non-batch transfer.
                                                 Implied by -g flag.
 skip                 none                       If set causes this package
                                                 to be skipped.  The value
                                                 is reported as the reason
                                                 for skipping.
 verbose              false                      Verbose messages.
 algorithm            0                          Sets the basic algorithm
                                                 that mirror uses.

                                                 Algorithm=0 mirrors an
                                                 entire site at a time.
                                                 This is very friendly on
                                                 the remote site as it uses
                                                 few of its resources.
                                                 However it can chew up a
                                                 lot of memory on the local
                                                 machine.

                                                 Algorithm=1 mirrors a site
                                                 directory-by-directory.
                                                 Should ONLY be used for
                                                 true mirrors (i.e.: no
                                                 differences between this
                                                 mirror copy and the
                                                 original). This uses up a
                                                 lot less local resources.
                                                 However it is very
                                                 unfriendly to the remote
                                                 site as it requires remote
                                                 site to run an ls command
                                                 for each directory
                                                 mirrored.   Mirror will
                                                 only "see" the one
                                                 directory it is mirroring
                                                 so it will not know that
                                                 files outside this
                                                 directory exist, so
                                                 symlinks outside this
                                                 directory are considered
                                                 bad, see
                                                 make_bad_symlinks.
                                                 Deletions are done on a
                                                 directory by directory
                                                 basis so be extra careful
                                                 about the settings of
                                                 max_delete_files and
                                                 max_delete_dirs.  get_patt
                                                 is applied to just the
                                                 filename in this directory
                                                 not the full path, as are
                                                 other name checks. You
                                                 will almost certainly need
                                                 to set remote_dir to be an
                                                 absolute pathname
                                                 (beginning with /).
 local_dir_check      false                      If true and the local_dir
                                                 does not exist, skip this
                                                 package.  By default the
                                                 local_dir will be created
                                                 if it does not already
                                                 exist.

Filestores

Mirror uses the remote directory listing to work out what files are
available. Mirror was originally targeted at connecting to Un*x FTP daemons
using a standard ls command. To use a Un*x host with a non-standard ls or a
non-Un*x host it is necessary to set the remote_fs variable to match the kind
of directory listing that will be returned. There is some interaction between
remote_fs and other variables, in particular flags_nonrecursive, recurse_hard
and get_size_change. The following sections show examples of the results of
running the FTP DIR command on the various kinds of archive and
recommendations for related variables. With some unusual archive set-ups you
may have to vary from the recommended variable set-ups.

remote_fs=unix

total 65
-rw-r--r-- 1 nobody nobody   2245 Jan 28 20:06 README
-rw-r--r-- 1 nobody nobody  45881 Jan 29 19:13 mirror.html

This is the default and you should not normally have to reset any other
related variables.
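To show what information a unix style listing line carries, here is a
minimal Python sketch that picks one apart. It is not the parser mirror
uses: the regular expression and the name parse_unix_line are illustrative
assumptions, and real listings have more variations than this handles.

```python
import re

# mode, link count, owner, group, size, date, name
LS_LINE = re.compile(
    r"^(?P<mode>[-dlbcps][-rwxsStT]{9})\s+"
    r"(?P<links>\d+)\s+(?P<owner>\S+)\s+(?P<group>\S+)\s+"
    r"(?P<size>\d+)\s+"
    r"(?P<date>\w{3}\s+\d{1,2}\s+[\d:]+)\s+"
    r"(?P<name>.+)$"
)


def parse_unix_line(line):
    """Return a dict describing one `ls -l` line, or None for other lines."""
    match = LS_LINE.match(line.strip())
    if match is None:
        return None  # e.g. the "total 65" header line
    entry = match.groupdict()
    entry["size"] = int(entry["size"])
    entry["links"] = int(entry["links"])
    return entry


listing = """total 65
-rw-r--r-- 1 nobody nobody   2245 Jan 28 20:06 README
-rw-r--r-- 1 nobody nobody  45881 Jan 29 19:13 mirror.html"""

for line in listing.splitlines():
    print(parse_unix_line(line))
```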

remote_fs=dls

00index.txt      189916
0readme            5793
1_x/                  =  OS/2 1.x-specific files

This is an ls variant used on some Un*x archives. It provides descriptions
of known items in the listing. Set flags_recursive to -dtR.

remote_fs=netware

- [R----F--] jrd                  1646       May 07 21:43    index
d [R----F--] jrd                   512       Sep 09 10:52    netwire
d [R----F--] jrd                   512       Sep 02 01:31    pktdrvr
d [RWCE-F--] jrd                   512       Sep 04 10:55    incoming

or

-[R----F--] 1 jrd                  1646       May 07 21:43    index
d[R----F--] 1 jrd                   512       Sep 09 10:52    netwire
d[R----F--] 1 jrd                   512       Sep 02 01:31    pktdrvr

This is used by Novell archives. Set recurse_hard to true and set
flags_nonrecursive to be nothing. See also remote_dir.

remote_fs=dosftp

00-index.txt  6,471 13:54  7/20/93   alabama.txt   1,246 23:29  5/08/97
alaska.txt      873 23:29  5/08/92   alberta.txt   2,162 23:29  5/08/97

dosftp is for an FTP daemon on D*S boxes. Set recurse_hard to true and set
flags_nonrecursive to nothing. See also remote_dir.

remote_fs=macos

-------r--      0      127   127 Aug 27 13:53 !Gopher Links
drwxrwxr-x          folder    32 Sep  9 16:30 FAQ
drwxrwx-wx          folder     0 Sep  9 09:59 incoming

macos is for one of the Macintosh FTP daemon variants. Although the output is
similar to Un*x, the unix remote_fs type cannot cope with it because there
are three file sizes for each file. Set recurse_hard to true,
flags_nonrecursive to nothing, get_size_change to false and compress_patt to
nothing (this last setting is due to the unusual file names upsetting the
shell used to run compress). See also remote_dir.

remote_fs=vms

USERS:[ANONYMOUS.PUBLIC]

1-README.FIRST;13     9  14-JUN-1993 13:09 [ANONYMOUS] (RWE,RWE,RE,RE)
PALTER.DIR;1          1  18-JAN-1993 11:56 [ANONYMOUS] (RWE,RWE,RE,RE)
PRESS-RELEASES.DIR;1
                      1  11-AUG-1992 20:05 [ANONYMOUS] (RWE,RWE,,)

alternatively:

[VMSSERV.FILES]ALARM.DIR;1      1/3          5-MAR-1993 18:09
[VMSSERV.FILES]ALARM.TXT;1      1/3          4-FEB-1993 12:20

Set flags_recursive to '[...]' and get_size_change to false. recurse_hard is
not available with VMS. See also the vms_keep_versions and vms_xfer_text
variables.


remote_fs=infomac

-r     1974 Jul 21 00:06 00readme.txt
lr        3 Sep  8 08:34 AntiVirus -> vir

This is a special case just meant to handle the sumex-aim.stanford.edu
info-mac directory listing stored on that archive in help/all-files.
recurse_hard should be set to true.

remote_fs=dosish

This is for a D*S/Wind*ws FTP server with faintly DOS-like output:

03-04-94  08:45PM       <DIR>          .
03-04-94  08:45PM       <DIR>          ..
03-04-94  09:58AM                 9718 Conduit
03-04-94  09:59AM                 8745 Eve

recurse_hard should be set to true and flags_nonrecursive to nothing.
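For comparison with the unix format, a dosish line could be taken apart
along these lines. This Python sketch is only an illustration (the pattern
and the name parse_dosish_line are assumptions, not mirror's code); it
keeps the date and time as plain strings rather than guessing at their
interpretation.

```python
import re

DOSISH_LINE = re.compile(
    r"^(?P<date>\d\d-\d\d-\d\d)\s+"
    r"(?P<time>\d\d:\d\d[AP]M)\s+"
    r"(?P<size><DIR>|\d+)\s+"
    r"(?P<name>.+)$"
)


def parse_dosish_line(line):
    """Return a dict for one dosish listing line, or None if it does not fit."""
    match = DOSISH_LINE.match(line.strip())
    if match is None:
        return None
    is_dir = match.group("size") == "<DIR>"
    return {
        "name": match.group("name"),
        "is_dir": is_dir,
        "size": 0 if is_dir else int(match.group("size")),
        "date": match.group("date"),
        "time": match.group("time"),
    }


print(parse_dosish_line("03-04-94  08:45PM       <DIR>          .."))
print(parse_dosish_line("03-04-94  09:58AM                 9718 Conduit"))
```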

remote_fs=lsparse

Allows reparsing of a listing generated by mirror with debugging turned up
to a high level. Meant only for mirror wizards.

Examples

Here is the mirror.defaults file from the archive on sunsite.org.uk:

# This is the default mirror settings used by my site:
# sunsite.org.uk (193.63.255.4)

package=defaults
        # The LOCAL hostname - if not the same as `hostname`
        # (I advertise the name sunsite.org.uk but the machine is
        #  really swallow.sunsite.org.uk)
        hostname=sunsite.org.uk
        # Keep all local_dirs relative to here
        local_dir=/public/Mirrors
        remote_password=wizards@sunsite.org.uk
        mail_to=
        # Don't mirror file modes.  Set all dirs/files to these
        dir_mode=0755
        file_mode=0444
        # By default, files are owned by root.zero
        user=0
        group=0
#       # Keep a log file in each updated directory
#       update_log=.mirror
        update_log=
        # Don't overwrite my mirror log with the remote one.
        # Don't retrieve any of their mirror temporary files.
        # Don't touch anything whose name begins with a space!
        # nor any FSP or gopher files...
        exclude_patt=(^|/)(\.mirror$|\.in\..*\.$|MIRROR.LOG|#.*#|\.FSP|\.cache|\.zipped|lost\+found/|\ )
        # Try to compress everything
        compress_patt=.
        compress_prog=compress
        # Don't compress information files, files that don't benefit from
        # being compressed, files that tell ftpd, gopher, wais... to do things,
        # the sources for compression programs...
        # (Note this is the only regexp that is case insensitive.)
        compress_excl+|^\.notar$|-z|\.gz$|\.taz$|\.tar.Z|\.arc$|\.zip$|\.lzh$|\.zoo$|\.exe$|\.lha$|\.zom$|\.gif$|\.jpeg$|\.jpg$|\.mpeg$|\.au$|read.*me|index|\.message|info|faq|gzip|compress
        # Don't delete own mirror log or any .notar files (incl in subdirs)
        delete_excl=(^|/)\.(mirror|notar)$
        # Ignore any local readme files
        local_ignore=README.doc.ic
        # Automatically delete local copies of files that the
        # remote site has zapped
        do_deletes=true

Here are some sample package descriptions:

package=gnu
        comment=Powerful and free Un*x utilities
        site=prep.ai.mit.edu
        remote_dir=/pub/gnu
        # Local_dir+ causes gnu to be appended to the default local_dir
        # so making /public/gnu
        local_dir+gnu
        exclude_patt+|^ListArchives/|^lost\+found/|^scheme-7.0/|^\.history
        # I tend to only keep the latest couple of versions of things
        # this stops mirror from retrieving the older versions I've removed
        max_days=30
        do_deletes=false

package=X11R6
        comment=X Windows (windowing graphics system for Un*x)
        site=ftp.x.org
        remote_dir=/pub/R6
        local_dir+ftp.x.org/pub/R6
        # This is a local symlink to the free-for-all contrib area
        # and is mirrored elsewhere
        local_ignore=^contrib$
        # Don't compress a thing.  It is already compressed
        # but doesn't look it.
        compress_patt=

# THIS IS JUST A TEST
package=test vms site
        site=vmsbox.somewhere.ac.uk
        local_dir=/tmp/copy4
        remote_dir=vmsserv/files
        remote_fs=vms
        # Must do these settings for VMS
        flags_recursive=[...]
        get_size_change=false

# and on, and on ...

Temporary Filenames

By default when mirror creates a temporary filename it takes the real
filename and puts .in. at the start.
If your system severely limits the length of filenames (some older Un*xes
were limited to 14 characters) then look for:

  LIMITED NAMELEN

which is about 75% of the way through mirror.pl, for a note on how to reduce
temporary filename length.  I only know of one site using this.

Regular Expressions

This is a short explanation of regular expressions.  For a more
comprehensive guide see the Perl manual pages or the O'Reilly book
"Mastering Regular Expressions".

A regular expression, or regexp, is a way of describing patterns to match in
text strings.  For example the regexp:

      ^s

would match any string that begins with an s.  The ^ is a special character
that means beginning of string.  There are a number of specials possible in
a regexp; everything that is not special is taken as a literal character,
such as the s in the example above.  To turn off a special character put a
backslash, \, in front of it.  This only affects the special character
immediately following it.

A word of warning: although very similar to Un*x shell (and D*S COMMAND)
wildcards there are differences.  For example both Un*x and D*S would treat
*.ZIP as any filename ending in .ZIP, but *.ZIP as a regular expression is an
error!  The * is a special that must follow something (see below).
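The difference between a shell wildcard and a regexp can be tried out with
a few lines of Python. Python's re module is used here as a stand-in for
Perl regexps (an assumption that holds for this simple case), and the
helper name is_valid_regexp is hypothetical.

```python
import fnmatch
import re


def is_valid_regexp(pattern: str) -> bool:
    """Return True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# As a shell-style wildcard, *.ZIP matches any name ending in .ZIP.
print(fnmatch.fnmatchcase("FILES.ZIP", "*.ZIP"))

# As a regexp, *.ZIP is rejected because * has nothing to repeat.
print(is_valid_regexp("*.ZIP"))

# The regexp that expresses "ends in .ZIP" escapes the dot and anchors the end.
print(re.search(r"\.ZIP$", "FILES.ZIP") is not None)
```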

Regexp Specials

 ^            beginning of string
 $            end of string
 .            any character

 [r]          a range of characters, either as a list abcef or a
              hyphen-separated range a-f
 [^r]         anything not in the given list or range
 (p1|p2|p3...) patterns p1 or p2 or p3 ... (the patterns may be specials)
 *            zero or more of the preceding item (which may be a special)
 +            one or more of the preceding item (which may be a special)
 \d           any digit (same as [0-9])
 \D           any non-digit (same as [^0-9])
 \s           any whitespace character
 \S           any non-whitespace character

Regexp Examples

 abc                     matches abc, also xxxabcyyy but not xabbcy
 ^abc$                   matches only abc
 a.*z                    matches a, any string, then z, e.g. asdkjfhaksdjfhz

 index.html              matches index.html AND indexXhtml, index/html (.
                         matches any character)

 index\.html             matches index.html (the backslash stops . matching
                         any character)
 [rR][eE][aA][dD][mM][eE] matches readme, Readme, README ...
 \.(gz|Z)$               matches strings ending in .gz or .Z
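The examples above can be checked mechanically. The Python sketch below
uses the re module in place of Perl (an assumption: the two behave alike
for these patterns); the helper name matches and the extra test strings
are illustrative.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches anywhere in `text`."""
    return re.search(pattern, text) is not None


EXAMPLES = [
    (r"abc", "xxxabcyyy", True),
    (r"abc", "xabbcy", False),
    (r"^abc$", "abc", True),
    (r"^abc$", "xabc", False),
    (r"a.*z", "asdkjfhaksdjfhz", True),
    (r"index.html", "indexXhtml", True),
    (r"index\.html", "indexXhtml", False),
    (r"[rR][eE][aA][dD][mM][eE]", "ReadMe", True),
    (r"\.(gz|Z)$", "mirror.tar.gz", True),
    (r"\.(gz|Z)$", "mirror.tar", False),
]

for pattern, text, expected in EXAMPLES:
    result = matches(pattern, text)
    assert result == expected, (pattern, text)
    print(f"{pattern!r:32} {text!r:20} {'match' if result else 'no match'}")
```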

Hints

When adding a new package, first test it by running mirror with the -n
option.

If you are adding to an existing archive that was not created by mirror
(perhaps you copied the files from a CDROM) then it is usually best to force
the time-stamps of the existing local files so time comparisons with the
remote files show the files as identical (see -T).

Try and keep all packages that are being retrieved from the same site
together in the same package file. That way mirror will only have to login
once.

Remember that all regexps are Perl regular expressions.

If the remote site contains symlinks that you want to "flatten out" into the
corresponding files, then do this by adding L to the flags passed to the
remote ls (either flags_recursive or flags_nonrecursive). First test this by
trying ls -lRatL on the remote site under the FTP command to check whether
the remote filestore has any symlink loops.  These cause ls to go into an
infinite loop; if this happens you will have to talk to the manager of the
remote area about removing them.

If you are mirroring a very large site that changes infrequently, add
max_days=7 to the settings after it is initially mirrored. That way mirror
will only have to consider recent files when updating. Then once a week, or
whenever necessary, call mirror with -k max_days=0 to force a full update.
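The effect of max_days on which files are considered can be pictured with
this Python sketch. It is an illustration under stated assumptions, not
mirror's code: the function name recent_enough is hypothetical, the age is
measured from the file's modification time, and a value of 0 is taken to
mean no age limit.

```python
from datetime import datetime, timedelta


def recent_enough(mtime: datetime, now: datetime, max_days: int = 0) -> bool:
    """Return True if a file modified at `mtime` would still be considered.

    max_days=0 disables the age check, so every file is considered.
    """
    if max_days <= 0:
        return True
    return now - mtime <= timedelta(days=max_days)


now = datetime(1998, 6, 15, 3, 0)
print(recent_enough(datetime(1998, 6, 12), now, max_days=7))  # 3 days old
print(recent_enough(datetime(1998, 5, 1), now, max_days=7))   # over 6 weeks old
print(recent_enough(datetime(1998, 5, 1), now, max_days=0))   # full update
```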

If you don't want to compress anything from the remote site the easiest way
to do this is to set the compress_patt to nothing.

If you want to run a command at the end of mirroring a package a useful
trick is to reset the mail_prog variable to be the program name and mail_to
to be the arguments.

For netware, dosftp, macos and VMS you should normally set remote_dir to be
the home directory of the remote FTP daemon. Connect manually and, before
changing directory, use the pwd command to find where home is. If you are
only mirroring part of the tree then give the full pathname including this
home directory at the start.

macos names can sometimes contain characters that make it hard to pass them
through Un*x shells. Since compressing files is done via a shell it would be
best to turn off compression with compress_patt=

macos files seem to always change size when transferred, in either binary or
text mode. So it would be best to set get_size_change=false

Netiquette

If you are going to mirror a remote site, please obey any restrictions that
the site administrators place on access. You can generally find the
restrictions by connecting to the archive using the standard FTP command.
Any restrictions are normally given as a login banner or in a (hopefully)
obvious file.

Here are, what I hope are, some good general rules:

You should probably get permission from the remote site before setting up a
mirror of it.  Some sites require detailed logs.  Unauthorised mirrors would
take traffic from the site generating the logs and so ruin their
statistics.  There may also be SERIOUS LEGAL REASONS why mirrors are
unwanted.

Only mirror a site well outside the working hours of both the local and
remote sites.

It is probably unfriendly to try to mirror a remote site more than once a
day.

Before trying to mirror a remote site, try and find the packages you want
from local archives, as no one will be pleased if you soak up a lot of
network bandwidth needlessly.

If you have a local archive, then tell people about it so they don't have to
waste bandwidth and CPU at the remote site.

Do remember to check your package-files from time to time in case the remote
archive has changed their access restrictions.

See Also

perl(1), ftp(1), mm(1)

Bugs

Some of the netiquette guidelines should be enforced.

Should be able to cope with links as well as symlinks.

Suffers from creeping featurism. (Actually more like galloping featurism!)

Remember!

Objects in a mirror are closer than you think!

Author

Mirror was written by Lee McLoughlin <lmjm@icparc.ic.ac.uk>. It uses a
heavily rewritten and extended version of the ftp.pl package originally by
Alan R. Martello <al@ee.pitt.edu>, which uses lchat.pl, which is based on the
chat2.pl package by Randal L. Schwartz <merlyn@ora.com>.

Special thanks to the following people for patches, comments and other
suggestions that have helped to improve mirror. If I have omitted anyone,
please contact me.

Zo Leech <zl@icparc.ic.ac.uk>
James Revell <revell@uunet.uu.net>
Chris Myers <chris@wugate.wustl.edu>
Amos Shapira <amoss@cs.huji.ac.il>
Paul A Vixie <vixie@pa.dec.com>
Jonathan Kamens <jik@pit-manager.mit.edu>
Christian Andretzky <casys@otto.mb3.tu-chemnitz.de>
Kean Stump <kean@ucs.orst.edu>
Anita Eijs <anita@hermes.bouw.tno.nl>
Simon E Sperro <S.E.Sperro@gdr.bath.ac.uk>
Aaron Wohl <aw0g+@andrew.cmu.edu>
Michael Meissner <meissner@osf.org>
Michael Graff <explorer@iastate.edu>
Bradley Rhoades <us267388@mail.mmmg.com>
Edwards Reed <eer@cinops.xerox.com>
Joachim Schrod <schrod@iti.informatik.th-darmstadt.de>
David Woodgate <David.Woodgate@mel.dit.csiro.au>
Pieter Immelman <pi@itu1.sun.ac.za>
Jost Krieger <x920031@bus072.rz.ruhr-uni-bochum.de>
Erez Zadok <ezk@cs.columbia.edu>


Copyright

Mirror, both the software and all the accompanying documentation including
this document, is under the following copyright.

Copyright 1990 - 1998 Lee McLoughlin

Permission to use, copy, and distribute this software and its documentation
for any purpose with or without fee is hereby granted, provided that the
above copyright notice appear in all copies and that both that copyright
notice and this permission notice appear in supporting documentation.

Permission to modify the software is granted, but not the right to
distribute the modified code. Modifications are to be distributed as patches
to released version.

This software is provided "as is" without express or implied warranty.