File: swish_filter.pl.in

swish-e 2.4.3-7
#!@@perlbinary@@ -w
use strict;

# This is set to where Swish-e's "make install" installed the helper modules.
use lib qw( @@perlmoduledir@@ );


use SWISH::Filter;


=pod

This is an example of how to use the SWISH::Filter module to filter
documents using Swish-e's C<FileFilter> feature.  This will filter any
number of document types, depending on what filter modules are installed.

This program should typically be used only with the -S fs indexing method.
With -S http the F<swishspider> program calls SWISH::Filter directly, and -S
prog programs written in Perl can also call SWISH::Filter directly.

In general, you will not want to filter with this program if you have a lot
of files to filter.  Swish-e runs this program once for each document, so
perl has to start up and compile the program every time, which is slow.  If
you have many documents to convert with the -S fs method of indexing then
consider using -S prog with F<prog-bin/DirTree.pl> and use the SWISH::Filter
module (see F<filters/README>).
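
As a rough illustration of why -S prog is cheaper, here is a minimal sketch
of a program that stays in one perl process and writes one record per
document: a few headers, a blank line, then the content.  The names
C<prog_record> and C<to_text> are hypothetical and are not part of Swish-e.
C<to_text> is only a stand-in for the conversion step, where a real program
would call SWISH::Filter, and the documents are demo data so that the sketch
runs without any files.

```perl
    use strict;
    use warnings;

    # Build one "-S prog" record: headers, a blank line, then the body.
    # Content-Length must be the byte length of the body that follows.
    sub prog_record {
        my ( $path, $content, $mtime ) = @_;
        my $length = length $content;    # assumes $content holds bytes
        return "Path-Name: $path\n"
             . "Content-Length: $length\n"
             . "Last-Mtime: $mtime\n"
             . "\n"
             . $content;
    }

    # Hypothetical stand-in for a converter.  A real program would call
    # SWISH::Filter here, once per document, inside the same process.
    sub to_text {
        my ($raw) = @_;
        ( my $text = $raw ) =~ s/<[^>]*>//g;    # crude tag stripping, demo only
        return $text;
    }

    # Demo data so the sketch runs without touching the file system.
    my %docs = (
        'docs/a.html' => '<p>hello</p>',
        'docs/b.html' => '<b>swish</b>-e',
    );

    # Emit one record per document, in a stable order.
    for my $path ( sort keys %docs ) {
        print prog_record( $path, to_text( $docs{$path} ), 0 );
    }
```

The point of the design is that perl is compiled once for the whole run
instead of once per document.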

Swish-e configuration:

    FileFilter .pdf /path/to/swish_filter.pl
    FileFilter .doc /path/to/swish_filter.pl
    FileFilter .mp3 /path/to/swish_filter.pl
    IndexContents HTML2 .pdf .mp3
    IndexContents TXT2 .doc

Then, when indexing those types of documents, this program will attempt to
filter (convert) them into a text format.

See SWISH-CONFIG documentation on Filtering for more information.

=cut    


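    # Swish-e passes two arguments: the path of the file to read (work path)
    # and the document's real path.
    my ( $work_path, $real_path ) = @ARGV;
    my $filter = SWISH::Filter->new;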

    my $filtered = $filter->filter(
        document => $work_path,
        name     => $real_path,
        content_type => \$real_path, # use the real path to lookup the content type
    );

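    # Report the outcome on STDERR so it does not mix with the document output.
    print STDERR $filtered
        ? " - Filtered: $real_path\n"
        : " - Not filtered: $real_path ($work_path)\n";

    # Print the filtered content, or just the real path if nothing was filtered.
    print $filtered
        ? ${$filter->fetch_doc}
        : $real_path;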