File: scomp.tex

package info (click to toggle)
testu01 1.2.3%2Bds1-1
  • links: PTS, VCS
  • area: non-free
  • in suites: bullseye, buster
  • size: 17,740 kB
  • sloc: ansic: 52,357; makefile: 241; sh: 53
file content (178 lines) | stat: -rw-r--r-- 6,582 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
\defmodule {scomp}

This module contains three tests based on the evolution of the 
{\em linear complexity\/} of a sequence of bits as it grows,%
\index{complexity!linear}
and a test based on the compressibility of the bit sequence,
as measured by the Lempel-Ziv complexity.%
\index{compression} 
% The first two linear complexity tests are taken from 
% Erdmann \cite{rERD92a} and the third from \cite {rRUK01a}. 
For the compressibility test, we use the Lempel-Ziv compression 
algorithm of 1978 (see \cite{iZIV78a}).
A similar test is implemented in \cite{rERD92a,rRUK01a},
but according to the authors, it uses a version of
the Lempel-Ziv algorithm of 1977 instead \cite{iZIV77a}.
\resdef


\bigskip
\hrule
\code\hide
/* scomp.h  for ANSI C */
#ifndef SCOMP_H
#define SCOMP_H
\endhide
#include <testu01/unif01.h>
#include <testu01/sres.h>
\endcode

\ifdetailed  %%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\guisec{Structure for test results}
\code
typedef struct {

   sres_Basic *JumpNum;
   sres_Chi2 *JumpSize;
   sres_Chi2 *LinComp;

} scomp_Res;
\endcode
 \tab
  Structure for keeping the results of the tests on linear complexity
  in this module. The results for the number of jumps are kept in
  {\tt JumpNum}, for the size of the jumps in {\tt JumpSize}, and for the
  linear complexity itself in  {\tt LinComp}.
 \endtab
\code


scomp_Res * scomp_CreateRes (void);
\endcode
 \tab 
  Creates and returns a structure that will hold the results
  of a  {\tt scomp\_LinearComp} test. 
 \endtab
\code


void scomp_DeleteRes (scomp_Res *res);
\endcode
 \tab 
  Frees the memory allocated by {\tt scomp\_CreateRes}.
 \endtab

\fi %%%%%%%%%%%%%%%%%%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\guisec{The tests}

\code
void scomp_LinearComp (unif01_Gen *gen, scomp_Res *res,
                       long N, long n, int r, int s);
\endcode
\tab \index{Test!LinearComp} %
  \index{Test!linear complexity|see{LinearComp}} %
  This procedure applies simultaneously the jump complexity test and the jump 
  size test, two tests based on the linear complexity of a sequence of bits and
  described in \cite{rCAR89a,rERD92a}. A sequence of $n$ bits is constructed by
  taking $s$ bits from each random number. For each $\ell$, $1\le \ell\le n$,
  the linear complexity of the subsequence formed by the first $\ell$ bits is
  computed by the  Berlekamp-Massey algorithm \cite{mBER84a,mMAS69a}. 
  \index{Berlekamp-Massey algorithm} 

  The {\em jump complexity test\/} counts the number of jumps occurring in the 
  linear complexity of the sequence.  A {\em jump\/} occurs whenever adding 
  the next bit to the sequence increases its linear complexity. 
  Under $\cH_0$, for $n$ sufficiently large, the number of jumps, say $J$,
  is approximately normally distributed  with mean and variance
  given by \cite{rWAN97a}:
  \begin{eqnarray}
      E(J) & = & \frac n 4 + \frac{4 + R_n}{12} - \frac 1 {3(2^n)} \\[6pt]
   \Var(J) & = & \frac n 8 - \frac{2 - R_n}{9 - R_n} + \frac n {6(2^n)}
                 + \frac{6 + R_n}{18(2^n)} - \frac 1 {9(2^{2n})},
   \end{eqnarray}
  where $R_n$ is the parity of $n$ (= 0 for even $n$, 1 for odd $n$).
  The test compares the standardized value of the observed number 
  of jumps to the standard normal distribution.

 \hpierre{Should be called ``jump sizes'', I believe.}
  The {\em jump size test\/} looks at the
  {\em size\/} of the jumps (there is a jump of size $h$ if the
  complexity increases by $h$ when we consider the next bit of the
  sequence), and counts how many jumps of each size have occurred.
  It then compares these countings with the expected values via a chi-square test.
   Carter has shown \cite{rCAR89a} that under $\cH_0$, the jump sizes are i.i.d.
   random variables which obey a geometric law with parameter 1/2.
%  It finally applies a Kolmogorov-Smirnov test comparing the distribution of
%  the $N$ chi-square values  to the chi-square distribution.

  When $N$ is large enough, the procedure also applies a third test,
  taken from \cite{rRUK01a}.  It is a chi-square 
  test based on the $N$ replications of the statistic
  \begin{eqnarray}
   T_n  & = & (-1)^n (L_n - \xi_n) + 2/9,  \nonumber
  \end{eqnarray}
  where $L_n$ is the linear complexity of the sequence, and
  $\xi_n = n/2 + (4 + R_n)/18$ is an approximation of $E[L_n]$ under $\cH_0$.
  The statistic $T_n$ takes only integer values, with probabilities 
  given by (see \cite{rRUK01a}):
 % Nist Special publication 800-22, May 15,
 % 2001, p. 86, at \url{http://csrc.nist.gov/rng/}.
   \begin{eqnarray*}
     P[T = k] & = & \left\{\begin{array}{ll} 
          0.5 & \mbox{ for } k=0, \\[4pt]
         \displaystyle {2^{-2k}} & \mbox{ for } k = 1, 2, 3,
    \ldots \\[4pt]
         \displaystyle { 2^{-2|k| - 1}} & \mbox{ for } k = -1, -2,
  -3, \ldots 
       \end{array} \right.
   \end{eqnarray*}
%  Restriction: $ns/16 \ge$ {\tt gofs\_MinExpected}.
  Recommendation: take $N=1$ and $n$ large (however, computing the 
  linear complexity takes $O(n^2 \log n)$ time).
\endtab
\code


void scomp_LempelZiv (unif01_Gen *gen, sres_Basic *res,
                      long N, int k, int r, int s);
\endcode
\tab
   Given a string of $n= 2^k$ bits, this test \cite{rRUK01a,rERD92a} 
   counts the number $W$ of distinct patterns in the string in order to
   determine%
\index{Test!LempelZiv} \index{Lempel-Ziv compression} %
 \index{Test!compression|see{LempelZiv}}
   its compressibility by the Lempel-Ziv compression
   algorithm of 1978 (see \cite{iZIV78a}). 
   According to \cite{iKIR94a}, under $\cH_0$, 
   $$
     Z = \frac{W - {n}/{\log_2 n}}{\sqrt{{0.266\,n} /
        {(\log_2 n)^3}}},
   $$
   has approximately the standard normal distribution for large $n$. 
   However, our tests show that even for $n$ as large as $2^{24}$,
   the approximation is not very good.
\hrichard {An interesting problem: To examine the exact distribution
 for small or finite n. What is the right correction for finite n?
 Why the correction calculated in \cite{iKIR94a} (and in other papers)
 does not seem to work?}
% does not seem to obey this distribution, which
%   seem to indicate that the asymptotic regime is attained only for much
%   larger values of $n$ because of the logarithmic factor.
   Our implementation of the test assumes that $W$ has approximately
   the normal distribution, but uses estimates of its mean and 
   standard deviation that have been obtained through simulation 
   with two differents reliable generators.
   Restriction: $k \le 28$ and $N$ not too large.
\endtab

\code
\hide
#endif
\endhide
\endcode