1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178
|
\defmodule {scomp}
This module contains three tests based on the evolution of the
{\em linear complexity\/} of a sequence of bits as it grows,%
\index{complexity!linear}
and a test based on the compressibility of the bit sequence,
as measured by the Lempel-Ziv complexity.%
\index{compression}
% The first two linear complexity tests are taken from
% Erdmann \cite{rERD92a} and the third from \cite {rRUK01a}.
For the compressibility test, we use the Lempel-Ziv compression
algorithm of 1978 (see \cite{iZIV78a}).
A similar test is implemented in \cite{rERD92a,rRUK01a},
but according to the authors, it uses a version of
the Lempel-Ziv algorithm of 1977 instead \cite{iZIV77a}.
\resdef
\bigskip
\hrule
\code\hide
/* scomp.h for ANSI C */
#ifndef SCOMP_H
#define SCOMP_H
\endhide
#include <testu01/unif01.h>
#include <testu01/sres.h>
\endcode
\ifdetailed %%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\guisec{Structure for test results}
\code
typedef struct {
sres_Basic *JumpNum;
sres_Chi2 *JumpSize;
sres_Chi2 *LinComp;
} scomp_Res;
\endcode
\tab
Structure for keeping the results of the tests on linear complexity
in this module. The results for the number of jumps are kept in
{\tt JumpNum}, for the size of the jumps in {\tt JumpSize}, and for the
linear complexity itself in {\tt LinComp}.
\endtab
\code
scomp_Res * scomp_CreateRes (void);
\endcode
\tab
Creates and returns a structure that will hold the results
of a {\tt scomp\_LinearComp} test.
\endtab
\code
void scomp_DeleteRes (scomp_Res *res);
\endcode
\tab
Frees the memory allocated by {\tt scomp\_CreateRes}.
\endtab
\fi %%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\guisec{The tests}
\code
void scomp_LinearComp (unif01_Gen *gen, scomp_Res *res,
long N, long n, int r, int s);
\endcode
\tab \index{Test!LinearComp} %
\index{Test!linear complexity|see{LinearComp}} %
This procedure applies simultaneously the jump complexity test and the jump
size test, two tests based on the linear complexity of a sequence of bits and
described in \cite{rCAR89a,rERD92a}. A sequence of $n$ bits is constructed by
taking $s$ bits from each random number. For each $\ell$, $1\le \ell\le n$,
the linear complexity of the subsequence formed by the first $\ell$ bits is
computed by the Berlekamp-Massey algorithm \cite{mBER84a,mMAS69a}.
\index{Berlekamp-Massey algorithm}
The {\em jump complexity test\/} counts the number of jumps occurring in the
linear complexity of the sequence. A {\em jump\/} occurs whenever adding
the next bit to the sequence increases its linear complexity.
Under $\cH_0$, for $n$ sufficiently large, the number of jumps, say $J$,
is approximately normally distributed with mean and variance
given by \cite{rWAN97a}:
\begin{eqnarray}
E(J) & = & \frac n 4 + \frac{4 + R_n}{12} - \frac 1 {3(2^n)} \\[6pt]
\Var(J) & = & \frac n 8 - \frac{2 - R_n}{9 - R_n} + \frac n {6(2^n)}
+ \frac{6 + R_n}{18(2^n)} - \frac 1 {9(2^{2n})},
\end{eqnarray}
where $R_n$ is the parity of $n$ (= 0 for even $n$, 1 for odd $n$).
The test compares the standardized value of the observed number
of jumps to the standard normal distribution.
\hpierre{Should be called ``jump sizes'', I believe.}
The {\em jump size test\/} looks at the
{\em size\/} of the jumps (there is a jump of size $h$ if the
complexity increases by $h$ when we consider the next bit of the
sequence), and counts how many jumps of each size have occurred.
It then compares these countings with the expected values via a chi-square test.
Carter has shown \cite{rCAR89a} that under $\cH_0$, the jump sizes are i.i.d.
random variables which obey a geometric law with parameter 1/2.
% It finally applies a Kolmogorov-Smirnov test comparing the distribution of
% the $N$ chi-square values to the chi-square distribution.
When $N$ is large enough, the procedure also applies a third test,
taken from \cite{rRUK01a}. It is a chi-square
test based on the $N$ replications of the statistic
\begin{eqnarray}
T_n & = & (-1)^n (L_n - \xi_n) + 2/9, \nonumber
\end{eqnarray}
where $L_n$ is the linear complexity of the sequence, and
$\xi_n = n/2 + (4 + R_n)/18$ is an approximation of $E[L_n]$ under $\cH_0$.
The statistic $T_n$ takes only integer values, with probabilities
given by (see \cite{rRUK01a}):
% Nist Special publication 800-22, May 15,
% 2001, p. 86, at \url{http://csrc.nist.gov/rng/}.
\begin{eqnarray*}
P[T = k] & = & \left\{\begin{array}{ll}
0.5 & \mbox{ for } k=0, \\[4pt]
\displaystyle {2^{-2k}} & \mbox{ for } k = 1, 2, 3,
\ldots \\[4pt]
\displaystyle { 2^{-2|k| - 1}} & \mbox{ for } k = -1, -2,
-3, \ldots
\end{array} \right.
\end{eqnarray*}
% Restriction: $ns/16 \ge$ {\tt gofs\_MinExpected}.
Recommendation: take $N=1$ and $n$ large (however, computing the
linear complexity takes $O(n^2 \log n)$ time).
\endtab
\code
void scomp_LempelZiv (unif01_Gen *gen, sres_Basic *res,
long N, int k, int r, int s);
\endcode
\tab
Given a string of $n= 2^k$ bits, this test \cite{rRUK01a,rERD92a}
counts the number $W$ of distinct patterns in the string in order to
determine%
\index{Test!LempelZiv} \index{Lempel-Ziv compression} %
\index{Test!compression|see{LempelZiv}}
its compressibility by the Lempel-Ziv compression
algorithm of 1978 (see \cite{iZIV78a}).
According to \cite{iKIR94a}, under $\cH_0$,
$$
Z = \frac{W - {n}/{\log_2 n}}{\sqrt{{0.266\,n} /
{(\log_2 n)^3}}},
$$
has approximately the standard normal distribution for large $n$.
However, our tests show that even for $n$ as large as $2^{24}$,
the approximation is not very good.
\hrichard {An interesting problem: To examine the exact distribution
for small or finite n. What is the right correction for finite n?
Why the correction calculated in \cite{iKIR94a} (and in other papers)
does not seem to work?}
% does not seem to obey this distribution, which
% seem to indicate that the asymptotic regime is attained only for much
% larger values of $n$ because of the logarithmic factor.
Our implementation of the test assumes that $W$ has approximately
the normal distribution, but uses estimates of its mean and
standard deviation that have been obtained through simulation
with two differents reliable generators.
Restriction: $k \le 28$ and $N$ not too large.
\endtab
\code
\hide
#endif
\endhide
\endcode
|