File: Filter.pm

package Web::Scraper::Filter;
use strict;
use warnings;

sub new {
    my $class = shift;
    bless {}, $class;
}

1;

__END__

=for stopwords namespace inline callback

=head1 NAME

Web::Scraper::Filter - Base class for Web::Scraper filters

=head1 SYNOPSIS

  package Web::Scraper::Filter::YAML;
  use base qw( Web::Scraper::Filter );
  use YAML ();

  sub filter {
      my($self, $value) = @_;
      YAML::Load($value);
  }

  1;

  use Web::Scraper;

  my $scraper = scraper {
      process ".yaml-code", data => [ 'TEXT', 'YAML' ];
  };

=head1 DESCRIPTION

Web::Scraper::Filter is a base class for text filters in
Web::Scraper. You can create your own text filter by subclassing this
module.

There are two ways to name and use a custom filter. If your filter
class is named Web::Scraper::Filter::Something, refer to it by its
short name:

  process $exp, $key => [ 'TEXT', 'Something' ];

If you declare your filter under your own namespace, such as
MyApp::Filter::Foo, prefix the full class name with C<+>:

  process $exp, $key => [ 'TEXT', '+MyApp::Filter::Foo' ];
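
The two naming rules above can be summarised in a few lines of Perl.
The following is only a sketch of the convention as described here,
not Web::Scraper's actual code; C<filter_class_for> is a hypothetical
helper name introduced for illustration.

  use strict;
  use warnings;

  # Hypothetical helper: map a filter name from a process() rule
  # to the class name it refers to.
  sub filter_class_for {
      my $name = shift;
      # A leading "+" marks a fully qualified class name.
      return $name if $name =~ s/^\+//;
      # Otherwise the name is relative to Web::Scraper::Filter.
      return "Web::Scraper::Filter::$name";
  }

  print filter_class_for('Something'), "\n";           # Web::Scraper::Filter::Something
  print filter_class_for('+MyApp::Filter::Foo'), "\n"; # MyApp::Filter::Foo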

You can also inline your filter function or regexp without creating a
filter class:

  process $exp, $key => [ 'TEXT', sub { s/foo/bar/ } ];

  process $exp, $key => [ 'TEXT', qr/Price: (\d+)/ ];
  process $exp, $key => [ 'TEXT', qr/(?<name>\w+): (?<value>\w+)/ ];

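Note that a substitution callback such as C<sub { s/foo/bar/ }>
modifies C<$_> in place and returns the number of replacements rather
than the new string. The filter code treats this as a special case: if
the callback returns a number and the value of C<$_> has changed, the
updated C<$_> is used as the result.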
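
This special case is easier to see in isolation. The sketch below
imitates it with a hypothetical C<apply_callback> helper; it is an
illustration under the assumptions stated in the comments and does not
reproduce Web::Scraper's internals.

  use strict;
  use warnings;

  # Hypothetical helper, for illustration only.
  sub apply_callback {
      my($callback, $value) = @_;
      local $_ = $value;               # the callback may edit $_ in place
      my $ret = $callback->($value);
      # Assumption: a numeric return value together with a changed $_
      # means the callback was a substitution, so take $_ as the result.
      if (defined $ret && $ret =~ /^\d+$/ && $_ ne $value) {
          return $_;
      }
      return $ret;                     # otherwise trust the return value
  }

  print apply_callback(sub { s/foo/bar/ }, "foo baz"), "\n";  # bar baz
  print apply_callback(sub { uc shift }, "foo baz"), "\n";    # FOO BAZ

A substitution that matches nothing returns an empty string, which
this sketch passes through unchanged; real filter code has to decide
how to treat that case.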

You can, of course, stack filters like:

  process $exp, $key => [ '@href', 'Foo', '+MyApp::Filter::Bar', \&baz ];
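
Stacking can be pictured as a simple pipeline. The sketch below assumes
that each filter receives the output of the one before it, from left to
right; C<apply_filters> is a hypothetical helper, and plain code
references stand in for real filter classes.

  use strict;
  use warnings;

  # Plain code references stand in for real filters here.
  my @filters = (
      sub { my $v = shift; $v =~ s/^\s+|\s+$//g; $v },  # trim whitespace
      sub { lc shift },                                  # lowercase
      sub { my $v = shift; $v =~ tr/ /-/; $v },          # spaces to dashes
  );

  # Hypothetical helper: feed the value through each filter in turn.
  sub apply_filters {
      my($value, @stack) = @_;
      $value = $_->($value) for @stack;
      return $value;
  }

  print apply_filters("  Hello World ", @filters), "\n";  # hello-world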

=head1 AUTHOR

Tatsuhiko Miyagawa

=cut