File: buftok.rb

package info (click to toggle)
ruby-buftok 0.3.0-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 108 kB
  • sloc: ruby: 71; makefile: 2
file content (72 lines) | stat: -rw-r--r-- 2,574 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
# frozen_string_literal: true
#
# BufferedTokenizer takes a delimiter upon instantiation, or acts line-based
# by default.  It allows input to be spoon-fed from some outside source which
# receives arbitrary length datagrams which may-or-may-not contain the token
# by which entities are delimited.  In this respect it's ideally paired with
# something like EventMachine (http://rubyeventmachine.com/).
class BufferedTokenizer
  # New BufferedTokenizers will operate on lines delimited by a delimiter,
  # which is by default the global input delimiter $/ ("\n").
  #
  # The input buffer is stored as an array.  This is by far the most efficient
  # approach given language constraints (in C a linked list would be a more
  # appropriate data structure).  Segments of input data are stored in a list
  # which is only joined when a token is reached, substantially reducing the
  # number of objects required for the operation.
  def initialize(delimiter = $/)
    @delimiter = delimiter
    @input = []
    @tail = String.new
    @trim = @delimiter.length - 1
  end

  # Determine the size of the internal buffer.
  #
  # Size is not cached and is determined every time this method is called
  # in order to optimize throughput for extract.
  def size
    @tail.length + @input.inject(0) { |total, input| total + input.length }
  end

  # Extract takes an arbitrary string of input data and returns an array of
  # tokenized entities, provided there were any available to extract.  This
  # makes for easy processing of datagrams using a pattern like:
  #
  #   tokenizer.extract(data).map { |entity| Decode(entity) }.each do ...
  #
  # Using -1 makes split to return "" if the token is at the end of
  # the string, meaning the last element is the start of the next chunk.
  def extract(data)
    if @trim > 0
      tail_end = @tail.slice!(-@trim, @trim) # returns nil if string is too short
      data = tail_end + data if tail_end
    end

    @input << @tail
    entities = data.split(@delimiter, -1)
    @tail = entities.shift

    unless entities.empty?
      @input << @tail
      entities.unshift @input.join
      @input.clear
      @tail = entities.pop
    end

    entities
  end

  # Flush the contents of the input buffer, i.e. return the input buffer even though
  # a token has not yet been encountered
  def flush
    @input << @tail
    buffer = @input.join
    @input.clear
    @tail = String.new # @tail.clear is slightly faster, but not supported on 1.8.7
    buffer
  end
end

# The expected constant for a gem named buftok
Buftok = BufferedTokenizer