File: scomp.tex

package info (click to toggle)
testu01 1.2.3%2Bds1-1
links: PTS, VCS
area: non-free
in suites: bullseye, buster
size: 17,740 kB
sloc: ansic: 52,357; makefile: 241; sh: 53
file content (178 lines) | stat: -rw-r--r-- 6,582 bytes
parent folder | download | duplicates (4)
\defmodule {scomp}

This module contains three tests based on the evolution of the 
{\em linear complexity\/} of a sequence of bits as it grows,%
\index{complexity!linear}
and a test based on the compressibility of the bit sequence,
as measured by the Lempel-Ziv complexity.%
\index{compression} 
% The first two linear complexity tests are taken from 
% Erdmann \cite{rERD92a} and the third from \cite {rRUK01a}. 
For the compressibility test, we use the Lempel-Ziv compression 
algorithm of 1978 (see \cite{iZIV78a}).
A similar test is implemented in \cite{rERD92a,rRUK01a},
but according to the authors, it uses a version of
the Lempel-Ziv algorithm of 1977 instead \cite{iZIV77a}.
\resdef


\bigskip
\hrule
\code\hide
/* scomp.h  for ANSI C */
#ifndef SCOMP_H
#define SCOMP_H
\endhide
#include <testu01/unif01.h>
#include <testu01/sres.h>
\endcode

\ifdetailed  %%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\guisec{Structure for test results}
\code
typedef struct {

   sres_Basic *JumpNum;
   sres_Chi2 *JumpSize;
   sres_Chi2 *LinComp;

} scomp_Res;
\endcode
 \tab
  Structure for keeping the results of the tests on linear complexity
  in this module. The results for the number of jumps are kept in
  {\tt JumpNum}, for the size of the jumps in {\tt JumpSize}, and for the
  linear complexity itself in  {\tt LinComp}.
 \endtab
\code


scomp_Res * scomp_CreateRes (void);
\endcode
 \tab 
  Creates and returns a structure that will hold the results
  of a  {\tt scomp\_LinearComp} test. 
 \endtab
\code


void scomp_DeleteRes (scomp_Res *res);
\endcode
 \tab 
  Frees the memory allocated by {\tt scomp\_CreateRes}.
 \endtab

\fi %%%%%%%%%%%%%%%%%%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\guisec{The tests}

\code
void scomp_LinearComp (unif01_Gen *gen, scomp_Res *res,
                       long N, long n, int r, int s);
\endcode
\tab \index{Test!LinearComp} %
  \index{Test!linear complexity|see{LinearComp}} %
  This procedure applies simultaneously the jump complexity test and the jump 
  size test, two tests based on the linear complexity of a sequence of bits and
  described in \cite{rCAR89a,rERD92a}. A sequence of $n$ bits is constructed by
  taking $s$ bits from each random number. For each $\ell$, $1\le \ell\le n$,
  the linear complexity of the subsequence formed by the first $\ell$ bits is
  computed by the  Berlekamp-Massey algorithm \cite{mBER84a,mMAS69a}. 
  \index{Berlekamp-Massey algorithm} 

  The {\em jump complexity test\/} counts the number of jumps occurring in the 
  linear complexity of the sequence.  A {\em jump\/} occurs whenever adding 
  the next bit to the sequence increases its linear complexity. 
  Under $\cH_0$, for $n$ sufficiently large, the number of jumps, say $J$,
  is approximately normally distributed  with mean and variance
  given by \cite{rWAN97a}:
  \begin{eqnarray}
      E(J) & = & \frac n 4 + \frac{4 + R_n}{12} - \frac 1 {3(2^n)} \\[6pt]
   \Var(J) & = & \frac n 8 - \frac{2 - R_n}{9 - R_n} + \frac n {6(2^n)}
                 + \frac{6 + R_n}{18(2^n)} - \frac 1 {9(2^{2n})},
   \end{eqnarray}
  where $R_n$ is the parity of $n$ (= 0 for even $n$, 1 for odd $n$).
  The test compares the standardized value of the observed number 
  of jumps to the standard normal distribution.

 \hpierre{Should be called ``jump sizes'', I believe.}
  The {\em jump size test\/} looks at the
  {\em size\/} of the jumps (there is a jump of size $h$ if the
  complexity increases by $h$ when we consider the next bit of the
  sequence), and counts how many jumps of each size have occurred.
  It then compares these countings with the expected values via a chi-square test.
   Carter has shown \cite{rCAR89a} that under $\cH_0$, the jump sizes are i.i.d.
   random variables which obey a geometric law with parameter 1/2.
%  It finally applies a Kolmogorov-Smirnov test comparing the distribution of
%  the $N$ chi-square values  to the chi-square distribution.

  When $N$ is large enough, the procedure also applies a third test,
  taken from \cite{rRUK01a}.  It is a chi-square 
  test based on the $N$ replications of the statistic
  \begin{eqnarray}
   T_n  & = & (-1)^n (L_n - \xi_n) + 2/9,  \nonumber
  \end{eqnarray}
  where $L_n$ is the linear complexity of the sequence, and
  $\xi_n = n/2 + (4 + R_n)/18$ is an approximation of $E[L_n]$ under $\cH_0$.
  The statistic $T_n$ takes only integer values, with probabilities 
  given by (see \cite{rRUK01a}):
 % Nist Special publication 800-22, May 15,
 % 2001, p. 86, at \url{http://csrc.nist.gov/rng/}.
   \begin{eqnarray*}
     P[T = k] & = & \left\{\begin{array}{ll} 
          0.5 & \mbox{ for } k=0, \\[4pt]
         \displaystyle {2^{-2k}} & \mbox{ for } k = 1, 2, 3,
    \ldots \\[4pt]
         \displaystyle { 2^{-2|k| - 1}} & \mbox{ for } k = -1, -2,
  -3, \ldots 
       \end{array} \right.
   \end{eqnarray*}
%  Restriction: $ns/16 \ge$ {\tt gofs\_MinExpected}.
  Recommendation: take $N=1$ and $n$ large (however, computing the 
  linear complexity takes $O(n^2 \log n)$ time).
\endtab
\code


void scomp_LempelZiv (unif01_Gen *gen, sres_Basic *res,
                      long N, int k, int r, int s);
\endcode
\tab
   Given a string of $n= 2^k$ bits, this test \cite{rRUK01a,rERD92a} 
   counts the number $W$ of distinct patterns in the string in order to
   determine%
\index{Test!LempelZiv} \index{Lempel-Ziv compression} %
 \index{Test!compression|see{LempelZiv}}
   its compressibility by the Lempel-Ziv compression
   algorithm of 1978 (see \cite{iZIV78a}). 
   According to \cite{iKIR94a}, under $\cH_0$, 
   $$
     Z = \frac{W - {n}/{\log_2 n}}{\sqrt{{0.266\,n} /
        {(\log_2 n)^3}}},
   $$
   has approximately the standard normal distribution for large $n$. 
   However, our tests show that even for $n$ as large as $2^{24}$,
   the approximation is not very good.
\hrichard {An interesting problem: To examine the exact distribution
 for small or finite n. What is the right correction for finite n?
 Why the correction calculated in \cite{iKIR94a} (and in other papers)
 does not seem to work?}
% does not seem to obey this distribution, which
%   seem to indicate that the asymptotic regime is attained only for much
%   larger values of $n$ because of the logarithmic factor.
   Our implementation of the test assumes that $W$ has approximately
   the normal distribution, but uses estimates of its mean and 
   standard deviation that have been obtained through simulation 
   with two differents reliable generators.
   Restriction: $k \le 28$ and $N$ not too large.
\endtab

\code
\hide
#endif
\endhide
\endcode