# frozen_string_literal: true

require_relative 'subset'

module TTFunk
  # Subset collection.
  #
  # For many use cases a font subset can be efficiently encoded using MacRoman
  # encoding. However, for full font coverage and characters that are not in
  # MacRoman encoding an additional Unicode subset is used. There can be as many
  # Unicode subsets as needed to fully cover the glyphs provided by the original
  # font. The resulting subsets all use 8-bit encoding, which helps to
  # efficiently encode text in Prawn.
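  #
  # @example Sketch of typical use (the font path is a hypothetical placeholder)
  #   font = TTFunk::File.open('path/to/font.ttf')
  #   subsets = TTFunk::SubsetCollection.new(font)
  #
  #   # Register "H" and a CJK character. A character that no existing subset
  #   # covers is placed in a newly added 8-bit Unicode subset.
  #   subsets.use([0x48, 0x4E2D])
  #
  #   subsets[0] # the MacRoman subset created by the constructor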
  class SubsetCollection
    # @param original [TTFunk::File]
    def initialize(original)
      @original = original
      @subsets = [Subset.for(@original, :mac_roman)]
    end

    # Get subset by index.
    #
    # @param subset [Integer]
    # @return [TTFunk::Subset::Unicode, TTFunk::Subset::Unicode8Bit,
    #   TTFunk::Subset::MacRoman, TTFunk::Subset::Windows1252]
    def [](subset)
      @subsets[subset]
    end

    # Add characters to appropriate subsets.
    #
    # @param characters [Array<Integer>] should be an array of UTF-16 code
    #   points
    # @return [void]
    def use(characters)
      characters.each do |char|
        covered = false
        i = 0
        length = @subsets.length
        while i < length
          subset = @subsets[i]
          if subset.covers?(char)
            subset.use(char)
            covered = true
            break
          end
          i += 1
        end

        # No existing subset covers this character, so start a new 8-bit
        # Unicode subset for it.
        unless covered
          @subsets << Subset.for(@original, :unicode_8bit)
          @subsets.last.use(char)
        end
      end
    end

    # Encode characters into subset-character pairs.
    #
    # @param characters [Array<Integer>] should be an array of UTF-16 code
    #   points
    # @return [Array<Array(Integer, String)>] subset chunks, where each chunk
    #   is another array of two elements. The first element is the subset
    #   number, and the second element is the string of characters to render
    #   with that font subset. The strings will be encoded for their subset
    #   font, and so may not look (in the raw) like what was passed in, but they
    #   will render correctly with the corresponding subset font.
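    #
    # @example Sketch: iterating over the encoded chunks
    #   # Assumes `collection` (a hypothetical name) is an existing
    #   # SubsetCollection.
    #   collection.encode('Hi'.unpack('U*')).each do |subset_index, encoded|
    #     subset = collection[subset_index]
    #     # Render `encoded` with the font built from `subset`.
    #   end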
    def encode(characters)
      return [] if characters.empty?

      # TODO: probably would be more optimal to nix the #use method,
      # and merge it into this one, so it can be done in a single
      # pass instead of two passes.
      use(characters)

      parts = []
      current_subset = 0
      current_char = 0
      char = characters[current_char]

      loop do
        while @subsets[current_subset].includes?(char)
          char = @subsets[current_subset].from_unicode(char)

          if parts.empty? || parts.last[0] != current_subset
            encoded_char = char.chr
            if encoded_char.respond_to?(:force_encoding)
              encoded_char.force_encoding('ASCII-8BIT')
            end
            parts << [current_subset, encoded_char]
          else
            parts.last[1] << char
          end

          current_char += 1
          return parts if current_char >= characters.length

          char = characters[current_char]
        end

        current_subset = (current_subset + 1) % @subsets.length
      end
    end
  end
end