File: Simple.pm

package info (click to toggle)
liburi-find-simple-perl 1.07-1
  • links: PTS, VCS
  • area: main
  • in suites: bookworm, bullseye, forky, sid, trixie
  • size: 76 kB
  • sloc: perl: 41; makefile: 2
file content (123 lines) | stat: -rw-r--r-- 3,231 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
package URI::Find::Simple;
use warnings;
use strict;
use 5.006;

use URI::Find;
use Carp qw(croak);
use Encode qw( encode );


our @ISA = qw( Exporter );
our @EXPORT_OK = qw( list_uris change_uris );

our $VERSION = 1.07;

our $CHARSET = "utf-8";

sub list_uris {
  my $text = shift;
  croak "expected a text string" unless defined($text);

  my @list;
  my $uri_find = URI::Find->new( sub {
    my ($object, $text) = @_;
    push @list, $object->as_string;
    return $text;
  } );
  
  if ($CHARSET) {
    my $copy = encode($CHARSET, $text);
    $copy =~ s/([^\000-\177])/'%' . sprintf("%x", ord($1))/eg;
    $text = $copy;
  }
  $uri_find->find(\$text);
  return @list;
}

sub change_uris {
  my $text = shift;
  my $sub = shift;
  croak "expected a text string" unless defined($text);
  croak "expected a code ref" unless ref($sub) eq 'CODE';

  my $uri_find = URI::Find->new( sub {
    my ($object, $text) = @_;
    return $sub->($object->as_string);
  } );
  $uri_find->find(\$text);
  return $text;
}

1;

__END__

=head1 NAME

URI::Find::Simple - a simple interface to URI::Find

=head1 SYNOPSIS

  use URI::Find::Simple qw( list_uris );
  my @list = list_uris($text);

  my $html = change_uris($text, sub { "<a href=\"$_[0]\">$_[0]</a>" } );

=head1 DESCRIPTION

L<URI::Find> is all very well, but sometimes you just want a list of the
links in a given piece of text, or you want to change all the urls in
some text somehow, and don't want to mess with callback interfaces.

This module uses URI::Find, but hides the callback interface, providing two
functions - one to list all the uris, and one to change all the uris.

=head2 list_uris( text )

returns a list of all the uris in the passed string, in the form output by
the URI->as_string function, not the form that they exist in the text.

=head2 change_uris( text, sub { code } )

the passed sub is called for every found uri in the text, and it's return
value is substituted into the string. Returns the changed string.

=head1 CAVEATS, BUGS, ETC

The change_uris function is only just nicer than the callback interface. In
some ways it's worse. I's prefer to just pass an s/// operator somehow, but
I don't think that's possible.

The list_uris function returns the stringified versions of the URI objects,
this seemed to be the sensible thing. To present a consistent interface, the
change_uris function operates on these strings as well, which are not the same
as the strings actually present in the original. Therefore this code:

  my $text = change_uris($text, sub { shift } );

may not return the same thing you pass it. URIs such as <URI:http://jerakeen.org>
will be converted to the string 'http://jerakeen.org'.

=head1 SEE ALSO

L<URI::Find>, L<URI::Find::Iterator>, L<URI>, L<HTML::LinkExtor>, L<HTML::LinkExtractor>.

=head1 REPOSITORY

L<https://github.com/neilb/URI-Find-Simple>

=head1 AUTHOR

Tom Insam E<lt>tom@jerakeen.orgE<gt>
inspired by Paul Mison E<lt>paul@husk.orgE<gt>

This module is now maintained by Neil Bowers E<lt>neilb@cpan.orgE<gt>.

=head1 COPYRIGHT AND LICENSE

Copyright (c) 2004 Tom Insam E<lt>tom@jerakeen.orgE<gt>.

This program is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.