File: 2010-01-14-memory-efficiency-when-using-ruby.md

package info (click to toggle)
ruby-chunky-png 1.4.0-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 2,232 kB
  • sloc: ruby: 4,586; makefile: 4
file content (136 lines) | stat: -rw-r--r-- 5,534 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
---
author: Willem van Bergen
title: Memory efficiency when using Ruby
---

I have been spending some time creating [a pure Ruby PNG library](https://github.com/wvanbergen/chunky_png). For this library, I need to have some representation of the image, which is composed of RGB pixels, supporting an alpha channel. Because images can be composed of a lot of pixels, I want the implementation to be as memory efficient as possible. I also would like decent performance.

A very naive Ruby implementation for an image represents the red, green, blue and alpha channel using a floating point number between 0.0 and 1.0, and might look something like this:

{% highlight ruby %}
class Pixel
  attr_reader :r, :g, :b, :a

  def initialize(r, g, b, a = 1.0)
    @r, @g, @b, @a = r, g, b, a
  end
end

class Image
  attr_reader :width, :height

  def initialize(width, height)
    @width, @height = width, height
    @pixels = Array.new(width * height)
  end

  def [](x,y)
    @pixels[y * width + x]
  end

  def []=(x,y, pixel)
    @pixels[y * width + x] = pixel
  end
end
{% endhighlight %}

For a 10×10 image, this representation requires 4 times 100 floating point numbers, which require 8 bytes each. That’s already over 3kB for such a small image just for the floating point numbers! Ouch.

A simple improvement is to decide that 8-bit color depth is enough in the case, in which case each channel can be represented by an integer between 0 and 255. Storing such a number only costs one byte of memory. Ruby’s Fixnum class typically uses 4-byte integers. If only the 4 channels of one byte each could be combined into a single Fixnum instance… Behold!

{% highlight ruby %}
class Pixel
  attr_reader :value
  alias :to_i :value

  def initialize(value)
    @value = value
  end

  def self.rgba(r, g, b, a = 255)
    self.new(r << 24 | g << 16 | b << 8 | a)
  end

  def r; (@value & 0xff000000) >> 24; end
  def g; (@value & 0x00ff0000) >> 16; end
  def b; (@value & 0x0000ff00) >>  8; end
  def a; (@value & 0x000000ff); end
end
{% endhighlight %}

Notice the bit operations, which are extremely fast. This only requires 100 times 4 bytes = 400 bytes for storing the RGBA values for a 10×10 image, an 8 times improvement!

This implementation wraps every pixel inside an object. This is nice, because I want to access the separate channels of every pixel easily using the r, g, b, and a methods, and every other method that is defined for every pixel. However, a Ruby object instance has an overhead of at least 20 bytes. That’s 20 times 100 is about 2kB for our 10×10 image!

To get rid of the object overhead, it is possible to simply store the Fixnum value for every pixel, and only wrapping it inside a Pixel object when it is accessed. This can be done by modifying the Image class:

{% highlight ruby %}
class Image
  # ...

  def [](x,y)
    Pixel.new(@pixels[y * width + x]) # wrap
  end

  def []=(x,y, pixel)
    @pixels[y * width + x] = pixel.to_i # unwrap
  end
end
{% endhighlight %}

As you can see, some simply changes in the representation can really make a difference in the memory usage. Can this representation be improved further?

## Integer math calculations

Because we are now using integers to represent a pixel, this can cause problems when the math requires you to use floating point numbers. For example, the formula for [alpha composition](https://en.wikipedia.org/wiki/Alpha_compositing) of two pixels is as follows:

\\[ C_o = C_a \alpha_a + C_b \alpha_b (1 - \alpha_a) \\]

in which \\(C_a\\) is the color component of the foreground pixel, \\(\alpha_a\\) the alpha channel of the foreground pixel, \\(C_b\\) and \\(\alpha_b\\) the same values for the background pixel, all of which should be values between 0 and 1.

A naive implementation could convert the integer numbers to their floating point equivalents:

{% highlight ruby %}
def compose(fg, bg)
  return bg if fg.a == 0
  return fg if fg.a == 255

  fg_alpha = fg.a / 255.0
  bg_alpha = fg.a / 255.0
  alpha_complement = (1.0 - fg_alpha) * bg_alpha

  new_r = (fg_alpha * fg.r + alpha_complement * bg.r).round
  new_g = (fg_alpha * fg.g + alpha_complement * bg.g).round
  new_b = (fg_alpha * fg.b + alpha_complement * bg.b).round
  new_a = ((fg_alpha + alpha_complement) * 255).round

  Pixel.rgba(new_r, new_g, new_b, new_a)
end
{% endhighlight %}

This implementation is already a little bit optimized: no unnecessary conversions and calculations are being performed. However, this composition can be done a lot quicker after realizing that 255 is almost a power of two, in which computers excel because it can use bitwise operators and shifting for some calculations.

My new approach uses a quicker implementation of multiplication of 8-bit integers that represent floating numbers between 0 and 1:

{% highlight ruby %}
def compose(fg, bg)
  return bg if fg.a == 0
  return fg if fg.a == 255

  alpha_complement = multiply(255 - fg.a, bg.a)
  new_r = multiply(fg.a, fg.r) + multiply(alpha_complement, bg.r)
  new_g = multiply(fg.a, fg.g) + multiply(alpha_complement, bg.g)
  new_b = multiply(fg.a, fg.b) + multiply(alpha_complement, bg.b)
  new_a = fg.a + alpha_complement

  Pixel.rgba(new_r, new_g, new_b, new_a)
end

# Quicker alternative for (a * b / 255.0).round
def multiply(a, b)
  t = a * b + 0x80
  ((t >> 8) + t) >> 8
end
{% endhighlight %}

Note that the new implementation is less precise in theory, but this precision is lost anyway because we have to convert the values back to 8 bit RGBA values. Your thoughts?