1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
|
# ----- Example 9 - Filtering PDF with "prog" -------
#
# Please see the swish-e documentation for
# information on configuration directives.
# Documentation is included with the swish-e
# distribution, and also can be found on-line
# at http://swish-e.org
#
#
# This example demonstrates how to use swish's
# "prog" document source feature to filter documents.
#
# The "prog" document source feature allows
# an external program to feed documents to
# swish, one after another. This allows you
# to index documents from any source (e.g. web, DBMS)
# and to filter and adjust the content before swish
# indexes the content.
#
# Using the "prog" method to filter documents requires more
# work to set up than using the "filters" described in
# example8.config because you must write a program to retrieve
# the documents and feed them to swish.
#
# On the otherhand, the "prog" method should be faster than the
# filter method in example8.config because swish doesn't need to fork
# itself and run an external program for each document to filter.
# This can be significant if you are using a perl script as a filter since
# the perl script must be compiled each time it is run. This "prog" method
# avoides that overhead.
#
# This example uses the example9.pl program. This program
# is very similar to the included DirTree.pl program found in
# the prog-bin directory. This program simple reads files from the
# file system, and passes their content onto swish if they are the correct
# type. PDF files are converted by the prog-bin/pdf2xml.pm module.
#
# The PDF info fields (e.g. author) are placed in xml tags
# which allows indexing the PDF info as MetaNames.
# By specifying metanemes you can limit searches by this PDF info.
#
# For this example, you will need the xpdf package.
# Type "perldoc pdf2xml" from the prog-bin directory for
# more information.
#
# Run this example as:
#
# swish-e -S prog -c example9.config
#
#---------------------------------------------------
# Include our site-wide configuration settings:
IncludeConfigFile example4.config
# Define the program to run
IndexDir ./example9.pl
# Pass in the top-level directory to index
# (here we specify the current directory)
SwishProgParameters .
# Swish can index a number of different types of documents.
# .config are text, and .pdf are converted (filtered) to xml:
IndexContents TXT .config
IndexContents XML .pdf
# Since the pdf2xml module generates xml for the PDF info fields and
# for the PDF content, let's use MetaNames
# Instead of specifying each metaname, let's let swish do it automatically.
UndefinedMetaTags auto
# Show what's happening
IndexReport 3
# end of example
|