File: IdiotCSV.pm

package info (click to toggle)
libdata-tablereader-perl 0.021-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 392 kB
  • sloc: perl: 2,340; makefile: 2; sh: 1
file content (86 lines) | stat: -rw-r--r-- 2,184 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
package Data::TableReader::Decoder::IdiotCSV;
use Moo 2;
use Try::Tiny;
use Carp;
use Log::Any '$log';

extends 'Data::TableReader::Decoder::CSV';

# ABSTRACT: Access rows of a badly formatted comma-delimited text file
our $VERSION = '0.021'; # VERSION


sub _build_parser {
	my $args= shift->_parser_args || {};
	Data::TableReader::Decoder::CSV->default_csv_module->new({
		binary => 1,
		allow_loose_quotes => 1,
		allow_whitespace => 1,
		auto_diag => 1,
		escape_char => undef,
		%$args,
	});
}

1;

__END__

=pod

=encoding UTF-8

=head1 NAME

Data::TableReader::Decoder::IdiotCSV - Access rows of a badly formatted comma-delimited text file

=head1 VERSION

version 0.021

=head1 DESCRIPTION

This decoder is like L<::Decoder::CSV|Data::TableReader::Decoder::CSV>, but can additionally
parse the garbage resulting from those special people who write "CSV Export" code
that looks like

  print join(",", map qq{"$_"}, @record)."\n";

(or rather, the equivalent code in Visual Basic or PHP which is what they're
probably using)  regardless of their data containing quote characters or newlines,
resulting in garbage like:

  "First Name","Last Name","email"
  "Joseph "Joe","Smith",""Smith, Joe" <jsmith@example.com>"

This can actually be processed by (recent versions of) the L<Text::CSV> module
with the following configuration:

  {
    binary => 1,
    allow_loose_quotes => 1,
    allow_whitespace => 1,
    escape_char => undef,
  }

And so this module is simply a subclass of L<Data::TableReader::Decoder::CSV>
which provides those defaults to the parser.

How does the parsing work though?  Well, some guesswork and patterns.  It's not super
reliable, and you should always complain loudly to whoever generated that data,
unless they're a much larger company than you and would never listen, or went
out of business a while back, in which case you can justify using this module
in production.

=head1 AUTHOR

Michael Conrad <mike@nrdvana.net>

=head1 COPYRIGHT AND LICENSE

This software is copyright (c) 2024 by Michael Conrad.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

=cut