File: nlcvt.pl

#!/usr/bin/perl -w
# nlcvt - convert newline notations
# Tom Christiansen, 9 March 1999

#   "The most brilliant decision in all of Unix was
#    the choice of a *single* character for the
#    newline sequence."     --Mike O'Dell, only half jokingly

use strict;

END {
    close STDOUT            || die "$0: can't close stdout: $!\n";
    $? = 1 if $? == 255;    # from die
} 

my(
    $src,		# input format style
    $dst,		# output format style
    %format,		# table of conversion
    $errors, 		# file input errors
);

$errors = 0;

%format = (

    # the good...

    "unix"		=> "\cJ",	# CANON
    "plan9"		=> "\cJ",
    "inferno"		=> "\cJ",
    "linux"		=> "\cJ",	# some people don't get it
    "bsd"		=> "\cJ",	# some people don't get it
    "be"		=> "\cJ",
    "beos"		=> "\cJ",

    # the not so good, but still ok...

    "mac"		=> "\cM", 	# CANON
    "apple"		=> "\cM",
    "macintosh"		=> "\cM", 

    # and the really unbelievably idiotic...

    "cpm"		=> "\cM\cJ",	# CANON
    "cp/m"		=> "\cM\cJ",	# could be in first arg
    "dos"		=> "\cM\cJ",
    "windows"		=> "\cM\cJ",
    "microsoft"		=> "\cM\cJ",
    "nt"		=> "\cM\cJ",
    "win"		=> "\cM\cJ",

);

sub usage {    
    warn "$0: @_\n" if @_;
    # group names by terminator, then alphabetically within each group
    my @names = sort {
        $format{$a} cmp $format{$b}
                    ||
                $a cmp $b
    } keys %format;
    my $fmts = "@names";
    print STDERR "usage: $0 src2dst [file ...]\n";
    format STDERR = 
    where src and dst are both one of:
~~      ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
	$fmts
.
    write STDERR;
    exit(1);
}

($src, $dst) = ($0 =~ /(\w+)2(\w+)/);

usage("insufficient args") unless @ARGV || ($src && $dst);

if (@ARGV && $ARGV[0] =~ /(\w+)2(\w+)/) {
    ($src, $dst) = ($1, $2);
    shift @ARGV;
} 

usage("no conversion specified") unless $src && $dst;

usage("unknown input format: $src")  unless $/ = $format{lc $src};
usage("unknown output format: $dst") unless $\ = $format{lc $dst};

binmode(STDOUT);

unshift @ARGV, '-' unless @ARGV;

for my $infile (@ARGV) {
    unless (open(INPUT, $infile)) {
	warn "$0: cannot open $infile: $!\n";
	$errors++;
	next;
    } 


    binmode(INPUT);

    unless (-T INPUT) {
	warn "$0: WARNING: $infile appears to be a binary file.\n";
	$errors++;
    } 

    while (<INPUT>) {
	unless (chomp) {
	    $errors++;
	    warn "$0: WARNING: last line of $infile truncated, correcting\n";
	} 
	print;
    } 

    unless (close INPUT) {
	warn "$0: cannot close $infile: $!\n";
	$errors++;
	next;
    } 
} 

exit ($errors != 0);

__END__

=head1 NAME

nlcvt - convert foreign line terminators

=head1 SYNOPSIS

B<nlcvt> I<src>2I<dst> [I<file> ...]

B<unix2mac> [I<file> ...]

B<unix2cpm> [I<file> ...]

B<cpm2unix> [I<file> ...]

B<cpm2mac> [I<file> ...]

B<mac2unix> [I<file> ...]

B<mac2cpm> [I<file> ...]

=head1 DESCRIPTION

Mike O'Dell said, only half-jokingly, that "the most brilliant decision
in all of Unix was the choice of a I<single> character for the newline
sequence."  But legacy systems live on past their days, and these programs
can help you cope with them.  Note, however, that if you've downloaded a
binary file in "text" mode rather than "binary", your mileage may vary.

The B<nlcvt> program, or any of its many aliases, is a filter to convert
from one system's notion of proper line terminators to that of another.
The need for such a conversion usually arises because you've downloaded
or otherwise directly transferred a text file in so-called "binary"
rather than "text" mode.

Unix format considers a lone Control-J to be the end of line.  Mac format
considers a lone Control-M to be the end of line.  The archaic CP/M
format considers a Control-M followed by a Control-J to be the end of line.
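
As a minimal sketch of how these terminators can drive a conversion (in the
spirit of the program's use of C<$/> and C<$\>, but not the code B<nlcvt>
itself runs), the snippet below reads records with C<$/> set to the source
terminator and rejoins them with the destination terminator.  The helper
name C<convert_newlines>, the C<%nl> table, and the in-memory filehandle
are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Terminators for the three canonical formats described above.
my %nl = (unix => "\cJ", mac => "\cM", cpm => "\cM\cJ");

# convert_newlines is a hypothetical helper, not part of nlcvt.
sub convert_newlines {
    my ($text, $src, $dst) = @_;
    my $in  = $nl{$src} or die "unknown input format: $src\n";
    my $out = $nl{$dst} or die "unknown output format: $dst\n";

    local $/ = $in;     # records end at the source terminator
    open(my $fh, '<', \$text) or die "can't open in-memory handle: $!\n";

    my $result = '';
    while (my $line = <$fh>) {
        chomp $line;                # strips $/ if it is present
        $result .= $line . $out;    # always emit the destination terminator
    }
    close $fh;
    return $result;
}

# Usage: a CP/M-style string comes out with Unix terminators.
print convert_newlines("one\cM\cJtwo\cM\cJ", "cpm", "unix");
```

Like B<nlcvt>, this sketch appends a terminator to a final record that
lacks one.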

This program expects its first argument to be of the form I<src>2I<dst>,
where I<src> and I<dst> are both one of B<unix>, B<mac>, or B<cpm>.
(That's speaking canonically--many aliases for those systems exist: call
B<nlcvt> without arguments to see what names are accepted.)  The converted
data is written to the standard output.  B<nlcvt> does I<not> do 
destructive, in-place modification of its source files.  Do this
instead:

    cpm2unix < file.bad > file.good 
    mv file.good file.bad

This program can also be called by the name of the conversion itself.
Just create links to the B<nlcvt> program for each pair of systems, and the
program uses its own name to determine the conversion.  For example:

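    #!/usr/bin/perl
    # make nlcvt links in the current directory
    chomp($path = `which nlcvt`);
    @systems = qw(unix mac cpm);
    for $src (@systems) {
        for $dst (@systems) {
            next if $src eq $dst;
            # link() is Perl's builtin for hard links; there is no ln()
            link($path, "${src}2$dst") || die "can't link ${src}2$dst: $!\n";
        }
    }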
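
To illustrate how a link name selects the conversion, here is a small
sketch that applies the same pattern the program uses, C</(\w+)2(\w+)/>,
to a few sample names.  The helper C<parse_conversion> and the sample
paths are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_conversion is a hypothetical helper: it applies the pattern that
# nlcvt matches against $0 and against its first argument.
sub parse_conversion {
    my ($name) = @_;
    return $name =~ /(\w+)2(\w+)/;    # (src, dst), or an empty list
}

for my $name (qw(/usr/local/bin/cpm2unix mac2cpm nlcvt)) {
    my ($src, $dst) = parse_conversion($name);
    printf "%-25s => %s\n", $name,
        defined $src ? "$src to $dst" : "no conversion in name";
}
```

Because the pattern is not anchored, a directory component that happens
to contain a C<2> between word characters would match first, so it is
probably safest to keep such names out of the installation path.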

=head1 DIAGNOSTICS

Any of the following diagnostics causes B<nlcvt>
to exit non-zero.

=over

=item C<insufficient args>

You called the program by its canonical name, 
and supplied no other arguments.
You must supply a conversion argument.

=item C<no conversion specified>

Neither the name of the program nor its
first argument was of the form I<src>2I<dst>.

=item C<unknown input format: %s>

The specified input format, C<%s>, was unknown.
Call B<nlcvt> without arguments for a list of
valid conversion formats.

=item C<unknown output format: %s>

The specified output format, C<%s>, was unknown.
Call B<nlcvt> without arguments for a list of
valid conversion formats.

=item C<cannot open %s: %m>

The input file C<%s> could not be opened for the reason
listed in C<%m>.

=item C<cannot close %s: %m>

The input file C<%s> could not be closed for the reason
listed in C<%m>.  This error is rare.

=item C<can't close stdout: %m>

The filter could not finish writing to its standard output for the
reason listed in C<%m>.  This could be caused by a full or temporarily
unreachable file system.

=item C<WARNING: last line of %s truncated, correcting>

Text files contain zero or more variable-length, newline-terminated
records.  Occasionally, the final record terminator is missing,
perhaps due to an incomplete transfer, perhaps due to an aberrant
I<emacs> user.  A newline sequence appropriate to the destination
system is appended.  This would be a valid use of a I<unix2unix>
conversion.  And no, you can't call it as B<emacs2vi>.

=item C<WARNING: %s appears to be a binary file>

Perl's C<-T> operator did not think the input file was a text file.
The conversion is still performed, but is of dubious value.  If
the file really was binary, the resulting output may be mangled.
Garbage in, garbage out.

=back
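
The "last line truncated" diagnostic relies on the return value of
C<chomp>, which is the number of characters it removed.  The following
sketch shows that check in isolation; the helper C<count_unterminated>
and the in-memory filehandle are assumptions for illustration, not part
of B<nlcvt>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_unterminated is a hypothetical helper: it returns how many
# records in $text do not end with the terminator $nl.
sub count_unterminated {
    my ($text, $nl) = @_;
    local $/ = $nl;     # chomp removes whatever $/ currently holds
    open(my $fh, '<', \$text) or die "can't open in-memory handle: $!\n";

    my $missing = 0;
    while (my $line = <$fh>) {
        # chomp returns 0 when the record had no terminator to remove
        $missing++ unless chomp $line;
    }
    close $fh;
    return $missing;
}

print count_unterminated("first\cJsecond", "\cJ"), "\n";
```

Only the final record can lack its terminator, so the count is either
zero or one for any given input.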

=head1 AUTHOR

Tom Christiansen, I<tchrist@perl.com>.

=head1 COPYRIGHT

This program is copyright (c) 1999 by Tom Christiansen.

This program is free and open software. You may use, copy, modify,
distribute, and sell this program (and any modified variants) in any
way you wish, provided you do not restrict others from doing the same.