File: index.html

package info (click to toggle)
openrefine-arithcode 1.2-2
  • links: PTS, VCS
  • area: main
  • in suites: bookworm, forky, sid, trixie
  • size: 308 kB
  • sloc: java: 1,194; xml: 29; makefile: 5
file content (145 lines) | stat: -rw-r--r-- 4,466 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Arithmetic Coding and PPM in Java</title>
<meta http-equiv="Content-type"
      content="application/xhtml+xml; charset=iso-8859-1"/>
<meta http-equiv="Content-Language"
      content="en"/>
<style>
* { font-family: arial; }
body {  width: 45em; padding: 0 1em 1em 1em; font-size: 90% }
code { font-family: courier }
h1 {  font-size: 140%; font-style: bold }
h2 {  font-size: 115%; margin: 18px 0 6px 0 }
h3 {  font-size: 100%; margin: 12px 0 6px 0; padding: 0 }
h4 { margin: 12px 0 6px 0; padding 0 }
p {  margin: 6px 0 0 0; padding 0 0 6px 0 }
</style>
</head>
<body>

<h1>Arithmetic Coding and PPM Compression in Java</h1>

<h2>Overview</h2>

<p> The arithmetic coding package is an open-soruce implementation of
a generic arithemtic coder and decoder, along with byte stream models
that are subclasses of Java's I/O streams.  The implementation of
arithmetic coding is based on:</p>

<ul>
<li>Witten, I. H., R. Neal, and J. G. Cleary. 1987. Arithmetic coding for
data compression. <i>Communications of the ACM</i> <b>30</b>(6): 520-540.
</li>
</ul>

<p>Example statistical models
include a uniform distribution, simple unigram model, and a parametric
prediction by partial matching (PPM) model.  The PPM model is based
on:</p>

<ul>
<li>Cleary, J.G. and I.H. Witten. 1984. Data compression using
adaptive coding and partial string matching. <i>IEEE Transactions on
Communications.</i> <b>32</b>(4):396-402.  
</li>
</ul>

<p>Models other than PPM may be integrated for compression.</p>



<h2>Download</h2>

<p><b>License:</b> The arithcode package is licensed under the standard <a
href="license.txt">Apache 2.0 license</a>.  </p>

<p>You only need the jar to run the arithcode package, but you need
the source distribution to build and test it.</p>

<ul>
  <li>Java Archive (Jar): <a href="arithcode-1.2.jar">arithcode-1.2.jar</a></li>
  <li>Tar/Gzipped Source: <a href="arithcode-1.2.tgz">arithcode-1.2.tgz</a>
</ul>

<p>The source distribution includes <a href="http://www.junit.org">JUnit</a>,
which is required for testing, but not for building the jar or using
the jar.</p>

<p>The distribution also includes Witten and Bell's <a
href="http://en.wikipedia.org/wiki/Calgary_Corpus">Calgary corpus</a>,
which includes a range of different kinds of byte data for testing
compression algorithms.</p>



<h2>Getting Started</h2>

<p>The package has an <a href="http://ant.apache.org/">Apache Ant</a>
build file which has targets to build the jar, test, and document the
package. So just download the source, unpack it and run any of the
following ant targets: </p>

<ul>
<li>clean: cleans the distribution</li>
<li>compile: compile the java source</li>
<li>jar: build the jar file in /arithcode-1.2.jar</li>
<li>javadoc: generate the javadoc in /build/doc/api/</li>
<li>test-compile: compiles the tests (requires JUnit)</li>
<li>test: run the unit tests (requires JUnit)</li>
<li>calgary: run various PPM compressors on the Calgary corpus</li>
<li>ppm: compress a file using PPM</li>
<li>unppm: uncompress a file using PPM</li>
<li>web: build the web site</li>
</ul>


<h2>Understanding Arithmetic Coding and PPM</h2>

<p>The original papers cited in the introduction are
quite readable.  I've also provided a tutorial with
further references.</p>

<ul>
<li><a href="doc/tutorial.html">Quick Intro to Arithmetic Coding and PPM</a>
</ul>


<h2>PPM Compression Rates and Speed</h2>

<p>Here are results for the Calgary corpus for speed
and compression for various lengths of PPM.</p>


<ul>
<li><a href="results/results-09-02-2011.html">Version 1.2 Results: 9 February 2011</a></li>
</ul>

<p>And just for laughs, some historical results illustrating the
progress of compilers and processors and memory.</p>

<ul>
<li><a href="results/results-26-01-2003.html">Version 1.1 Results: 26 January 2003</a></li>
<li><a href="results/results-06-10-2002.html">Version 1.0 Results: 6 October 2002</a></li>
</ul>


<h2>Acknowledgements</h2>

<p>Thanks to the original authors for making their source and
corpora available.</p>

<p>
Thanks also to Garrick Toubassi for patching a bug in version 1.1; it
only arose in fairly extreme boundary conditions, making its diagnosis
a very nice piece of debugging indeed.</p>


</body>

</html>