File: StopFilter.pm

Package: libplucene-perl 1.24-1
package Plucene::Analysis::StopFilter;

=head1 NAME 

Plucene::Analysis::StopFilter - token filter that removes stop words

=head1 SYNOPSIS

	# isa Plucene::Analysis::TokenFilter

	my $next = $stop_filter->next;

=head1 DESCRIPTION

This removes stop words from a token stream.

Instances of the StopFilter class are token filters that remove words of your
choice from the indexed text. Typically this is used to filter out common
words ('the', 'a', 'if', etc.) that increase overhead but add no value during
searches.

=head1 METHODS

=cut

use strict;
use warnings;

use base 'Plucene::Analysis::TokenFilter';

=head2 next

	my $next = $stop_filter->next;

This returns the next input token whose term is not a stop word.
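
The skipping behaviour can be tried without Plucene installed. The following is
a minimal sketch, not the module's own test code: C<Sketch::Token>,
C<Sketch::Stream>, C<Sketch::StopFilter> and C<filter_words> are hypothetical
stand-ins invented for this example, and the constructor arguments are an
assumption. Only the body of C<next> mirrors the method documented here.

```perl
	use strict;
	use warnings;

	# Hypothetical stand-in for a token: it only wraps the term text.
	package Sketch::Token;
	sub new  { my ($class, $text) = @_; return bless { text => $text }, $class }
	sub text { return $_[0]->{text} }

	# Hypothetical stand-in for an upstream token stream. It hands out one
	# token per call and returns nothing once the queue is empty.
	package Sketch::Stream;
	sub new {
		my ($class, @words) = @_;
		return bless { queue => [ map { Sketch::Token->new($_) } @words ] }, $class;
	}
	sub next { return shift @{ $_[0]->{queue} } }

	# Same skipping logic as the next method, minus the Plucene base class.
	# The input/stoplist constructor arguments are assumed for this sketch.
	package Sketch::StopFilter;
	sub new {
		my ($class, %args) = @_;
		return bless { input => $args{input}, stoplist => $args{stoplist} }, $class;
	}
	sub input { return $_[0]->{input} }
	sub next {
		my $self = shift;
		# Build the lookup hash once, on the first call.
		$self->{stophash} ||= { map { $_ => 1 } @{ $self->{stoplist} } };
		while (my $t = $self->input->next) {
			next if exists $self->{stophash}->{ $t->text };
			return $t;
		}
		return;
	}

	package main;

	# Run a list of words through the sketch filter and collect what survives.
	sub filter_words {
		my ($stoplist, @words) = @_;
		my $filter = Sketch::StopFilter->new(
			input    => Sketch::Stream->new(@words),
			stoplist => $stoplist,
		);
		my @kept;
		while (my $t = $filter->next) {
			push @kept, $t->text;
		}
		return @kept;
	}

	my @kept = filter_words([qw(the a if)], qw(the quick fox if a dog));
	print "@kept\n";    # prints "quick fox dog"
```

Because the stop list is turned into a hash on the first call, each token costs
one hash lookup regardless of how many stop words are configured.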

=cut

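sub next {
	my $self = shift;

	# Build the lookup hash from the stop list once, on the first call.
	$self->{stophash} ||= { map { $_ => 1 } @{ $self->{stoplist} } };

	# Skip tokens whose text is a stop word; return the first one that is not.
	while (my $t = $self->input->next) {
		next if exists $self->{stophash}->{ $t->text() };
		return $t;
	}

	# Input exhausted: return nothing.
	return;
}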

1;