Explicit Inversive Pseudorandom Number Generators
Diplomarbeit
zur Erlangung des Magistergrades
an der Naturwissenschaftlichen Fakultät
der Universität Salzburg
eingereicht von
Otmar Lendl
Salzburg, im November 1996
As a rule, random number generators are fragile and need to be treated with respect. It’s diﬃcult to be sure that a particular generator is good without an enormous amount of eﬀort in the various statistical test.
The moral is: do your best to use a good generator, based on the mathematical analysis and the experience of others; just to be sure, examine the numbers to make sure that they “look” random; if anything goes wrong, blame the randomnumber generator !
– Robert Sedgewick, in “Algorithms” (Second Edition, 1988)
I’d like to thank everybody who helped to make this thesis become reality. Peter for his patience, trust and guidance; Charly, Hannes, Karin, and Stefan for the most creative, helpful and entertaining working environment I have ever experienced; and my mother for her unwavering support.
The purpose of this thesis is to discuss the implementation of the explicit inversive congruential generator (EICG) and the properties of the resulting pseudorandom numbers. But before we delve into the details of the implementation or the theoretical and empirical results we will take a closer look at the basic concept of pseudorandom numbers.
What do we mean when we talk about pseudorandom numbers (PRN) ? And for what purpose do we devise such elaborate means to artiﬁcially generate megabytes of digital noise ?
To the uninitiated, all this pseudorandom numbers “business” seems to have no serious applications. Everybody will come up with computer games as a ﬁeld where pseudorandom numbers are used to make the behaviour of the computer less predictable. Steering the movements of some onscreen monster does not require a high standard of randomness, almost any algorithm will suﬃce, provided it is easy to implement and does not cost too much computing resources.
Another domain where we need PRN is wherever we need to model a more or less random phenomenon of the real world. The simulation of a roulette table or other forms of lottery might still be in the area of nonserious application, but here the defects of the generator start to be an issue. Imagine this scenario: you try to develop a winning strategy for blackjack and use a simulation to test your algorithm. Any correlation between statistical defects and the strategy will lead to a skewed result and may even change the sign of the expected outcome. Playing that strategy in a real casino might cost you dearly. Thus it is important to make the choice of the generator an issue even in such nonscientiﬁc applications.
Simulation of random events is far from being limited to gambling, a signiﬁcant percentage of all simulations of natural phenomena contains a random component. Whether that may be quantum eﬀects, rainfall on a certain area, Brownian motion, absorption pattern, bifurcation of tree roots, failure of technical components, solar activity …, in all cases we know at best the statistical properties of an event. An analytical solution of the given problem based on probabilities is often not possible. Thus one has to resort to stochastic simulation (see [3, 48, 75]) where one calculates the result of the overall simulation by choosing possible outcomes of the underlying random events according to their respective probability. Doing this a number of times should provide enough samples of the outcome to estimate the probability of each possible result. Needless to say, the selection of the realisations of the underlying random events is crucial to the correctness of the whole calculation. Since this selection is done with the aid of PRN, their quality plays an important role in the whole process.
Finding the use of PRN in stochastic simulation is not that surprising, but ﬁnding them in algorithms for such mundane tasks like integration might need some more explanation. Numerical integration is a common problem in a great deal of real world problems. A big battery of algorithms (trapezoid method, Simpson’s method, spline quadrature, adaptive quadrature, RungeKutta, …) was developed to minimize the calculation costs while increasing the accuracy of the result. All these methods scale very badly with the dimension of the integral, so a completely diﬀerent approach is more appropriate there. The Monte Carlo method (see [3, 48, 78, 84]) uses randomly selected samples of the function to estimate the integral. Provided we know more about the behaviour of the function (i.e. its total variation) and the distribution of the actual samples used (as measured by their discrepancy), the inequality of KoksmaHlawka (See page §, [51, 70]) will give an error bound for this method. Since it is generally not possible to calculate the exact value of the discrepancy of the random numbers used for the integration, this error bound will only be a probabilistic one. In order to get a deterministic one, the random numbers, which determine which samples of the function will be evaluated, are replaced by numbers for which the order of the discrepancy is known. This turns the Monte Carlo method into the QuasiMonte Carlo method.
One way to get such good point sets is to explicitly construct them with that goal in mind (See [70] for a discussion of $\left(t,m,s\right)$nets and other methods.) or use a PRNG for which an upper bound on the discrepancy is known. This is one of the reasons why we will take a close look at this quantity in Section 3.3. Not all applications of QuasiMonte Carlo integration are labeled as such, as the basic algorithm can be regarded as a simple heuristic. For example, distributed ray tracing [32, p. 788] uses a set of randomly distributed rays to implement spatial and temporal antialiasing which amounts to an integration over both the timeframe of the picture and the pixel’s spatial extension.
Nondeterministic algorithms often use pseudorandom numbers, too. These algorithms are used to tackle problems for which a deterministic solution takes too much time. Although they cannot guarantee success, they promise to ﬁnd the solution (or a suboptimal one) within reasonable time. Examples for this kind of algorithm are Pollard’s rho heuristic [8, p. 844] for integer factorization, the RabinMiller primality test [8, p. 839], Simulated Annealing [31], and Threshold Accepting [11].
Other algorithms use pseudorandom numbers for a diﬀerent purpose. Instead of using them directly for solving the problem they are used to randomize the problem (or the algorithm) in order to avoid running into the same worstcase behaviour again and again. See [8, p. 161] for an explanation of the rationale behind randomized quicksort.
Some cryptographic algorithms and protocols require a good source of random numbers, too. Stream ciphers [77, p. 168f], for example, use the output of a PRNG (termed keystream generator) to encrypt the plaintext. The security of this cipher depends largely on the statistical quality of the keystream. Any regularities of the PRNG can be used by the attacker to predict the next bits and thus crack the code. No other domain of PRNG applications has such a high demand on the “randomness” of the generated PRN. Algorithms which are good enough for stochastic simulations are typically way too predictable to be useful as a keystream generator for a stream cipher. Thus the ﬁeld of cryptographically secure PRNG has amazingly little in common with the study of PRNG for stochastic simulation on which we will focus in this thesis. Information on cryptographically secure pseudorandom numbers can be found in [52], [82], and [77].
Another application of pseudorandom numbers in the ﬁeld of cryptology is providing the “random” numbers needed for a variety of cryptographic protocols. A well known example are the session keys generated for each transaction in hybrid cryptosystems. As the recent debacle involving the Netscape Navigator [33, 69] has shown, one must be very careful not to use a simple PRNG for this task. Since this is more a matter of how to get the entropy needed for nonpredictability than one of analysing the properties of sequences of PRN we will not elaborate on this subject in this thesis.
Now that we know a bit about the various applications of PRN, let’s try to formulate a few criteria for the selection of a good PRN generation algorithm. As we will see later it is crucial for the selection of the right PRNG to keep an eye on the application of the PRN.
This criterion may sound strange at ﬁrst sight, since reproducibility contradicts the intuitive notion of randomness, and indeed, real random number generators are extremely unlikely to ever repeat their output. So what are the advantages of a generator which will produce the same sequence of pseudorandom numbers when fed with the same parameters ? Once again, we have to turn to the application of which the generator is a component. In the case of a stochastic simulation the beneﬁt is twofold:
In some areas, for example stream ciphers, reproducibility is a key requirement for the application. Only very few applications, most of them in the area of cryptography, do actually beneﬁt from the use of nonreproducible PRN.
It is clear that when we want to simulate a random variable with a PRNG, then the output of the generator should model as closely as possible the expected behaviour of instances of the random variable. If a simulation of a dice generates a 7 or strongly favors the 6 we will not accept the generator. Other deviations from the desired behaviour, e.g. correlations, are harder to detect, and methods for systematically testing generators for such deﬁciencies have been the subject of considerable mathematical work [50, 28, 54, 85], including some parts of this thesis.
As we will see later, proving that a generator really has all the statistical properties a real random number generator is supposed to have, is not possible. So all we can do is to establish faith in the generator by testing it for some properties.
Empirical testing usually involves using the PRN for a stochastic simulation with a known result. If the computed results contradict the expected ones, the generator will be dismissed as not suitable for that kind of stochastic simulation. A passed test will increase the faith that this generator will yield correct results in real world problems. We will examine the signiﬁcance of empirical test results later in greater details.
A large battery of such tests was developed over the years, from the well known tests of Knuth [50] and Marsaglia [66] to recent additions like the weighted spectral test [40, 41, 46, 44]. See [54, §3.5.] for further references on testing pseudorandom number generators.
In order to make analytical investigations possible, most modern PRNG are deﬁned in quite simple mathematical terms. It is a tradeoﬀ: The simpler the algorithm, the easier it will be to prove statements concerning the quality of the generated numbers. On the other hand, a convoluted algorithm appeals to the intuition. History has shown [50, 74] that quite a few people could not resist the temptation to build generators based on doing obscure transformations on numbers stored in computers. Empirical analysis has shown that the quality of such generators are often abysmal.
Doing empirical studies on the properties of a PRNG is always possible, but deriving properties of the generator output by pure mathematical study has a lot of advantages. Whereas an empirical test can only cover one speciﬁc set of parameters of a generator, it is sometime possible to make analytically proven statements on the properties of PRN generated by a certain generator regardless of the parameters used. In the same vein, an empirical test on a speciﬁc part of the generator’s output, say the ﬁrst billion numbers, may give us conﬁdence on the behaviour of the next billion numbers, but cannot oﬀer any guarantee that they will be equally good. Analytical results fall in the following categories:
For most generators not all possible parameters will result in a functional generator. A typical question is that of gaining the largest possible period length. For the LCG^{1} this is just a set of simple conditions, for the ICG it involves ﬁnding IMP polynomials [6, 30, 45, 42, 43, 25]. For compound generators to work it is also necessary to obey certain analytically derived constraints.
For some generators it is possible to derive statements on some aspects of the output. The wellknown fact that tuples of LCG generated numbers form lattices (see page §) is one example.
Especially the discrepancy has been the subject of analytical study. There are numerous estimates and bounds for various generators.
With all the mathematical discussions about the merits of PRN generated by a new algorithm one should not forget the fact that we need to actually implement this algorithm on a real computer. There are a few things which should be noted here:
Implementing (i.e. programming) an algorithm is usually a onetime investment of eﬀort. Once the code is there, integrating it into a larger project is more or less trivial. What are the diﬃculties in implementing a typical pseudorandom number generation algorithm ? As we will see later in Chapter 5, the main problem lies in the handling of large integers and performing standard mathematical operations like addition and multiplication on them. For inversive generators ﬁnding the multiplicative inverse in ${\mathbb{Z}}_{p}^{\ast}$ is a required operation, too.
The execution of any algorithm requires both CPU and memory resources. Typical PRNG (EICG, LCG, ICG, …) have only very small memory requirements. The code is very compact and the state information does only require a few bytes.
As far as CPU consumption is concerned, a PC with Intel 486DX266 processor is capable of executing the EICG algorithm about 70000 times in a second. The author’s implementation of the LCG runs at about 400000 PRN per second. The highly optimized system pseudorandom number generator runs at over 700000 calls per second. These numbers are only provided to give a rough feeling for the speed of the algorithms when using a modulus in the range of ${2}^{31}$.
As the generation of the PRN is usually performed on demand on the same computer as the stochastic simulation for which they are used, they compete for the same resources. The total running time for the simulation can be described as the sum of the time used for the PRN generation plus the time used for doing the actual calculations. The latter often dominates the former, thus it does not make sense to try to gain overall speed by sacriﬁcing quality in the PRN algorithm.
After implementing the algorithm one has to ﬁnd good parameters for that generator, too. Fortunately, for some common PRNG tables containing suitable parameters have been published [29, 43, 2, 83], so there is no need to reinvent the wheel there. Other generators like the EICG are known to be rather insensitive to the choice of the parameters.
Another aspect is the possibility to generate independent streams of pseudorandom numbers. Such streams are needed for parallel or vectorized computing. See [55, §8], [1], [12], and [36] for more information on this topic.
There is no shortage on proposed pseudorandom number generation algorithms. Every year new ideas on this topic are published, but only if the resulting PRN have been subject to intensive theoretical and empirical study the generator might have a chance to get used in a real world problem. As it is often the case with competing inventions, an objective technological superiority does not immediately lead to market domination. Whether the generator is included in standard programming libraries seems to be much more important than any published results on the distribution properties of the numbers. A classic example is the now infamous RANDU generator which was included in IBM’s Fortran library and features an extremely poor distribution of triples composed of subsequent numbers.
The following list introduces some of the most commonly used generators as well as the inversive generators on which we will focus in this thesis. More complete surveys on the current menagerie of PRNG can be found in [70, 72, 55].
In the following $M$ denotes a positive integer (termed modulus) and ${\mathbb{Z}}_{M}=\left\{0,1,\dots ,M1\right\}$ represents the system of all residues modulo $M$. With the addition and multiplication modulo $M$ the set ${\mathbb{Z}}_{M}$ acquires the algebraic structure of a ﬁnite ring. If the context makes it clear that we operate in the ring $\left({\mathbb{Z}}_{M},+,\cdot \right)$ we will omit the trailing “mod”.
Deﬁnition 1.1 Let $a,b,{y}_{0}\in {\mathbb{Z}}_{M}$. The linear congruential generator (abbreviated as “LCG”) with parameters $M,a,b,$ and ${y}_{0}$ deﬁnes a sequence ${\left({y}_{n}\right)}_{n\ge 0}$ in ${\mathbb{Z}}_{M}$ by
and a sequence ${\left({x}_{n}\right)}_{n\ge 0}$ of pseudorandom numbers in $[0,1[$ by
As the sequence $lcg\left(p,a,b,{y}_{0}\right)={\left({y}_{n}\right)}_{n\ge 0}$ is deﬁned by a recursion of order one on a ﬁnite set it must be periodic. The longest possible period length is $M$ in the case of $b\ne 0$ and $M1$ in the case of $b=0$. The necessary conditions for achieving these period lengths are well known. [70, p. 169]
The LCG is very popular. Its implementation is quite simple, especially if $M$ is chosen as 2 to the power of bits per native word of the computer (e.g. ${2}^{32}$) which reduces the modulo operations to just ignoring the overﬂow. Due to its simplicity and popularity the LCG has been subjected to intensive analytical and empirical examination. The quality of the resulting PRN depends very much on the choice of the parameters $M,a,$ and $b$. Fortunately, tables containing good parameters have been published, see [29, 27, 53].
The output of a LCG shows a strong intrinsic structure ([65], see also p. §). A number of modiﬁcations were proposed to improve the quality of the generator. One approach is to extend the recursion to higher orders by making ${y}_{n}$ a function of ${y}_{n1},\dots ,{y}_{nr}$. Other proposals modify the function which describes the recursion. As the name says, the LCG uses the linear function $f\left({y}_{n}\right)=a\cdot {y}_{n1}+b\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}M\right)$ to calculate ${y}_{n}$ from ${y}_{n1}$. If we replace $f$ by an arbitrary function, we refer to the resulting PRNG as a general ﬁrstorder congruential generator [70, p. 177]. In order to guarantee maximal period length, the function $f$ must be carefully selected. For example, the quadratic congruential method, as proposed by Knuth in [50, §3.2.2] uses a polynomial of degree 2 as the recursion and a power of 2 as the modulus. See [70, p. 181f] for the conditions on the parameters and analytical investigation on the resulting PRN.
Shiftregister generators diﬀer from standard linear congruential generators in two respects. First, they use a higherorder linear recursion of the form
$${y}_{n+k}\equiv \sum _{h=0}^{k1}{a}_{h}{y}_{n+h}\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}M\right)\phantom{\rule{1em}{0ex}}\left(n\ge 0\right)$$  (1.1) 
where $M\ge 2$ is the modulus, $k\ge 1$ is the order of the recursion and ${a}_{0},\dots ,{a}_{k1}$ are elements of ${\mathbb{Z}}_{M}$. Second, instead of just scaling the ${y}_{n}$ to the unity interval to get the pseudorandom numbers, the ${x}_{n}$ are calculated from a block of consecutive values ${y}_{n},\dots ,{y}_{n+m}$. Thus it is no longer necessary to use a large modulus to get a decent resolution of the resulting PRN. In order to simplify and optimize the implementation of recursion, the common choice of $M$ is the prime 2. On a $L$bit computer this allows the grouping of $L$ steps into one operation.
Two techniques for the transformation of the sequence ${\left({y}_{n}\right)}_{n\ge 0}$ into a sequence of pseudorandom numbers in $[0,1[$ are commonly used: The digital multistep method puts
$${x}_{n}=\sum _{j=1}^{m}{y}_{mn+j1}{p}^{j}\in [0,1[\phantom{\rule{1em}{0ex}}\left(n\ge 0\right).$$  (1.2) 
The Tausworthe generator [81] is a special case of this method.
More popular is the generalized feedback shiftregister method (GSFR) which can take advantage of the above mentioned blocking of $L$ bits if ${h}_{1},\dots ,{h}_{m}\ge 0$ are selected suitably:
$${x}_{n}=\sum _{j=1}^{m}{y}_{n+{h}_{j}}{p}^{j}\in [0,1[\phantom{\rule{1em}{0ex}}\left(n\ge 0\right).$$  (1.3) 
If the parameters are carefully selected the period length will in both cases be $per\left({x}_{n}\right)={p}^{k}1$.
Shift register pseudorandom numbers have the advantage of a fast generation algorithm and a period length independent of the limitations of the integers used for the calculation. See [70, Chapter 9] and [55] for a discussion on the properties of shift register pseudorandom numbers.
A promising modiﬁcation of the LCG was proposed by Eichenauer and Lehn in [14]. We will only consider the case of a prime modulus $p=M$ here. It involves the operation of modular inversion in ${\mathbb{Z}}_{p}$ which we will denote by an overline ($\overline{c}$).
$$\overline{c}=\left\{\begin{array}{cc}{c}^{1}\hfill & \text{for}c\in {\mathbb{Z}}_{p},c\ne 0\hfill \\ 0\hfill & \text{for}c=0\hfill \end{array}\right.$$  (1.4) 
The restriction to prime moduli guarantees the unique existence of an inversive element in ${\mathbb{Z}}_{p}$. This deﬁnition implies $c\overline{c}\equiv 1\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}p\right)$ for $c\ne 0$.
Deﬁnition 1.2 Let $p$ be a (large) prime and $a,b,{y}_{0}\in {\mathbb{Z}}_{p}$. The inversive congruential generator (abbreviated as “ICG”) with parameters $p,a,b,$ and ${y}_{0}$ deﬁnes a sequence ${\left({y}_{n}\right)}_{n\ge 0}$ in ${\mathbb{Z}}_{p}$ by
and a sequence ${\left({x}_{n}\right)}_{n\ge 0}$ of pseudorandom numbers in $[0,1[$ by
Empirical as well as analytical investigations indicate that the output of an ICG is superior to the output of a LCG in several respects: longer usable sample sizes [85, 61], less correlations between consecutive numbers [72].
Analytical calculations have led to the following observation: We can describe the generator as a function mapping $n$ to ${y}_{n}$. This selfmap $n\mapsto {y}_{n}$ in the ﬁnite ﬁeld ${\mathbb{Z}}_{p}$ can be written as a uniquely deﬁned polynomial $g$ with degree $d<p$. If we demand the sequence ${\left({y}_{n}\right)}_{n\ge 0}$ to have the maximal possible period length $p$, the polynomial $g$ maps ${\mathbb{Z}}_{p}$ onto itself and thus must be a permutation polynomial, which is either linear ($d=1$) or satisﬁes $3\le d\le p2$ according to [64, Cor. 7.5]. It turns out that the degree $d$ plays an important role in the analytical examination of the generator in a sense that a higher degree seems to indicate better distribution properties [70, Theorems 8.2, 8.3] (see p. §). The theorem of EulerFermat tells us that evaluating ${c}^{p2}$ corresponds to the calculation of the multiplicative inverse. In this spirit, the deﬁnition^{2} of the EICG seems quite natural:
Deﬁnition 1.3 (EichenauerHerrmann [15]) Let $p$ be a (large) prime and $a,b,{n}_{0}\in {\mathbb{Z}}_{p}$. The explicit inversive congruential generator (abbreviated as “EICG”) with parameters $p,a,b,$ and ${n}_{0}$ deﬁnes a sequence ${\left({y}_{n}\right)}_{n\ge 0}$ in ${\mathbb{Z}}_{p}$ by
and a sequence ${\left({x}_{n}\right)}_{n\ge 0}$ of pseudorandom numbers in $[0,1[$ by
As long as $a\ne 0$ this generator will always have period length $p$. Once again analytical and empirical investigations have shown that the output of this generator is superior to that of an LCG. This will be the generator on which we will focus our attention in this thesis. The other generators mainly serve as a reference against which the EICG must compete.
Two variations of the basic explicit inversive congruential generator have been proposed. Both proposals substitute the prime modulus $p$ with $M={2}^{\omega}\phantom{\rule{1em}{0ex}}\left(\omega \ge 4\right)$. In the set ${\mathbb{Z}}_{M}$ we can deﬁne the modular inversion only for odd integers. This inversion is once again deﬁned by $c\overline{c}=1\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}M\right)$ for all odd $c$.
Deﬁnition 1.4 (EichenauerHerrmann and Ickstadt [22]) Let $M$ be a power of 2, and $a,b,{n}_{0}\in {\mathbb{Z}}_{M}$ with $a\equiv 2mod4$ and $b\equiv 1mod2$. The explicit inversive congruential generator with power of two modulus with parameters $p,a,b,$ and ${n}_{0}$ deﬁnes a sequence ${\left({y}_{n}\right)}_{n\ge 0}$ in ${\mathbb{Z}}_{M}$ by
and a sequence ${\left({x}_{n}\right)}_{n\ge 0}$ of pseudorandom numbers in $[0,1[$ by
The conditions on $a$ and $b$ guarantee that the sequence ${x}_{0},{x}_{1},\dots $ is purely periodic with period $M\u22152$. While powers of 2 as modulus have certain advantages for the implementation of the generator, all theoretical investigations [22, 16] on the quality of the resulting numbers have concluded that this generator is inferior to the original EICG.
In order to achieve a period length of $M$, EichenauerHerrmann [17] proposed the following generator:
Deﬁnition 1.5 Let $M$ be a power of 2, and $a,b,{n}_{0}\in {\mathbb{Z}}_{M}$ with $a\equiv 2mod4$ and $b\equiv 1mod2$. The modiﬁed explicit inversive congruential with parameters $p,a,b,$ and ${n}_{0}$ deﬁnes a sequence ${\left({y}_{n}\right)}_{n\ge 0}$ in ${\mathbb{Z}}_{M}$ by
and a sequence ${\left({x}_{n}\right)}_{n\ge 0}$ of pseudorandom numbers in $[0,1[$ by
Although this modiﬁcation does indeed increase the period length to $M$, the theoretically derived properties of the resulting numbers are still inferior to the original EICG.
An interesting metagenerator is the compound method. This is a very simple and eﬀective way to combine several streams of PRN into one single sequence with (hopefully) superior properties. It works as follows: For $1\le j\le r$ let ${x}_{0}^{\left(j\right)},{x}_{1}^{\left(j\right)},{x}_{2}^{\left(j\right)},\dots $ be a purely periodic sequence of pseudorandom numbers. Then we get the compound sequence ${x}_{0},{x}_{1},\dots $ by
If the subsequences are purely periodic with distinct period $per\left({x}_{n}^{\left(j\right)}\right)={p}_{j}$, then we have $per\left({x}_{n}\right)={\prod}_{j=1}^{r}{p}_{j}$.
This compound method extends the wellknown approach of Wichmann and Hill [88]. The properties of the resulting sequence has been subject to a number of publications; we refer to Niederreiter [72, 4.2] for all the references. Generally speaking, the compound method preserves the basic properties of the underlying generators.
Examining what we mean by random numbers will help us to understand the diﬃculties in generating pseudorandom numbers and interpreting test results. We will look at how we all intuitively deal with supposedly random sequences, and touch upon the mathematical treatment of the subject. Regrettably, we will not be able to comprehensively cover this topic, thus we will focus on the subject of testing (ﬁnite) sequences of PRN.
First of all, we want to take a closer look at the intuitive notion of randomness. For one, we all intuitively assign probabilities to various events we encounter, from such mundane things like which side a dropped slice of bread will land on, everyday events like rainfall, the number of red traﬃc lights encountered, or friends met in the bus, to explicitly random events like the outcome of a dice or the weekly lottery.
But how do we come to the conclusion that one of these events is somehow random ? What are the criteria for that decision ? In some of the example above the decision is easy as we know about the process which leads to the outcome. Watching the dice being cast properly is a sure way to convince oneself that the outcome is indeed truly random. But how do we proceed when we cannot look behind the scenes, when the sequence of outcomes is the only information we have got ?
The human mind has remarkable capabilities to spot regularities in a sequence of events. If it fails to notice anything suspicious it will declare the sequence to be random.
Let’s test this notion on the most widely used source of random numbers, the dice. A dice is supposed to select one of the numbers $1,\dots ,6$ in a fair and independent fashion each time it is cast. In the following list we will argue on the merits of a few possible outcomes.^{1}
Did you see the one big fault in this sequence of wouldbe random sequences ? We did not notice it because we looked only at single sequences. Can you ﬁnd it now ?^{2}
Let us summarize the arguments:
If we argue about the “randomness” of a given sequence we try to ﬁnd reasons for rejecting it as random. There seems to be no way of asserting a sequence to be random, it is only the absence of arguments to the contrary that will lead to conﬁdence in the sequence. The proper formulation in the language of statistics is the following: The null hypothesis is always to assume the sequence was indeed generated by a random process with well known statistical properties. As we will see later, it is not possible to reverse the problem and regard the nonrandomness as the null hypothesis.
Short sequences are likely to contain some sort of perceived regularity, thus it is hard to reject such a sequence based on a suspicious pattern. If the sequence is long enough to check if the pattern continues to appear in it, one can try to determine if the pattern is part of some systematic fault or just coincidence.
Just when we thought we have found a sequence which does not exhibit the patterns we have found in all the previous faulty ones, it turns out that there is a diﬀerent kind of regularity in it. Somehow this is just like the trick question for the ﬁrst natural number without any special properties. If such a number existed, the very fact would make it special, thus there can be no such number. We almost get the same feeling when we examine sequences for their nonconspicuousness. As there are so many ways a sequence can exhibit a pattern, a complete absence of patterns is just as conspicuous as any weak regularity.
Furthermore, it is worth pondering if there are not so many patterns that all sequences will exhibit one. We will take a closer mathematical look at this question later.
If the “random” sequence exhibits exactly the expected distribution this will cause suspicion, too. A random sequence is supposed to deviate from its distribution. The common measure for this is the variation. A sequence with a perfect distribution will fail to have the same variation a random sequence is supposed to have.
As the variation can be viewed as just another test statistic, it, too, should vary in a certain way. From that point of view, a constant and perfect variation is just as suspicious as a constant and perfect distribution. This reasoning leads to the demand that not only the distribution of the numbers should be as wanted, but also that the empirical higher moments should be close to the values predicted by probability theory.
Now that we have examined what we intuitively mean by saying “This sequence looks random.” we can try to formalize this notion and develop a set of properties we want to check if we have to judge a sequence and its generating algorithm. The goal in this formalisation is to be able to delegate the testing to computer programs. As computers are known to be very bad at spotting patterns, it will not be an easy undertaking to ﬁnd an algorithm which does as good as the human mind. We can only hope that all systematic faults in the sequence will eventually cause a suspicious behaviour of the sequence in a generic test.
In the following we abandon the dice as the example, and turn to uniformly distributed numbers in the interval $[0,1[$.
The ﬁrst step in testing a sequence is usually to test its distribution characteristics. That is, are the numbers equally spread over $[0,1[$ ?
In order to test the (empirical) distribution one partitions the interval $[0,1[$ in sets ${A}_{i}$ and compares the number of hits in each set to the size (measure) of that interval.
In the discrete case this can be done by simply counting how often each possible value appears in the sequence. If the counts diﬀer signiﬁcantly, the distribution property of the sequence is inadequate.
In order to keep the problem manageable in the case of a huge number of possible outcomes and in the continuous case, the bins (i.e. the ${A}_{i}$) used for counting will cover more than one outcome.
The layout of the partition is a crucial part of the test: If the ${A}_{i}$ are simple intervals the test will measure the overall distribution of the sequence. But the ${A}_{i}$ could be the union of a set of small intervals, in which case the test targets irregularities in the ﬁne structure of the sequence.
Once we have ﬁnished the counting process we need some mathematically justiﬁed criteria for interpreting the diﬀerence between the number of hits in each ${A}_{i}$ and the expected count. There are a number of possible algorithms for this, the most popular of which are the ${\chi}^{2}$test and the KolmogorovSmirnov test (often abbreviated as KStest). The former uses a test statistic based on the diﬀerence between expected and actual count in each bin, whereas the latter compares the empirical distribution function of the counts to the expected one.
It should be clear that any numbers in a deterministically generated and thus reproducible sequence are trivially correlated. Therefore it makes no sense to look for such intricate dependencies like the generation rule in the sequence. We will restrict our search to much simpler correlations, which makes additional sense because that will be the only kind of correlations we can hope to ﬁnd with the limited capabilities of a computer program. There are two approaches to this:
Tests for special correlations check if the sequence exhibits a given kind of regularity. An often used example is the runtest which measures the frequency of ascending or decreasing parts in the sequences. The distribution of these runs in random sequences is known, making it possible to judge the sequence with respect to this type of correlation.
The serial test is a more general way of examining a sequence. It transforms the problem of testing for correlation to the problem of testing for equidistribution by looking at tuples composed of elements from the sequence. The size of each tuple is called the dimension $s\ge 2$ of the test. Common tests use either overlapping tuples deﬁned as ${x}_{n}:=\left({x}_{n},{x}_{n+1},\dots ,{x}_{n+s1}\right)$, or nonoverlapping tuples deﬁned as ${x}_{n}:=\left({x}_{sn},{x}_{sn+1},\dots ,{x}_{sn+s1}\right)$. If there are no correlations in the original sequence the $s$tuples are equidistributed in the unit cube of dimension $s$, which can be checked using the techniques outlined above.
To illustrate this, let us examine the sequence $\left\{1,4,2,6,2,3,4,1,5,3,2,5\right\}$ with the serial test of dimension 2.
As you can see, the fact that large and small numbers alternate causes a signiﬁcant deviation from the equidistribution of the points.
For a number of generators it is possible to derive analytical bounds for the deviation from the equidistribution of $s$dimensional tuples as measured by the discrepancy.
If one does not restrict oneself to form tuples out of consecutive numbers, the resulting test will be able to ﬁnd more subtle kinds of correlations without resorting to high dimensions $s$. While this modiﬁcation hardly changes the empirical testing, only in the case of the EICG analytical bounds have been derived for this generalized serial test.
Now that we have clariﬁed the intuitive understanding of the concept of testing pseudorandom sequences, we will turn to the mathematical treatment of the subject. Rather than providing a full scale discussion of the mathematical objects and formalisms involved, which would exceed the scope of this thesis, we want to present an introduction targeted at the mathematical layman. Our aim in this section is to introduce as much of relevant concepts as is necessary to be able to explain the problems one faces when testing pseudorandom numbers and comparing PRNG. We refer to [85] for an indepth discussion.
There is more than one mathematical approach to this topic. The following list tries to introduce the diﬀerent viewpoints and gives references for further reading.
In our context, this branch of mathematics focuses on the equidistribution of a sequence of numbers.
Various measures for the quality of the equidistribution were developed over the years, of all these numbers, the discrepancy is the most common.
Theorems on the equidistribution usually deal with inﬁnite sequences, thus they are not particularly useful in conjecture with ﬁnite (or periodic) sequences. For example, equidistribution of a sequence can be deﬁned in terms of the discrepancy in the following way:
This approach targets the complexity and information content of the sequence in question. One of the possible measurements is the minimal size of a computer a program (or a Turing machine) which can reproduce the sequence. In the optimal case, the program code will have to explicitly contain the sequence in order to print it. Any possible shortcuts the program can use (like exploiting dependencies) will be a measure for the lack of randomness of the sequence.
Since all our sequences are generated by short programs, they apriori fail this test. Thus we will not consider this notion in our tests.
A similar approach is to focus on the amount of information contained in the sequence. If the entropy is high enough, we will accept the sequence as a good approximation of random numbers. Another way to express this notion is to state that the sequence is not compressible.
Testing whether a sequence is compressible is not easy since all common implementations cannot achieve the theoretically possible compression. Only really bad PRN can be eliminated with programs like gzip or compress. Extending the capabilities of these programs (for example enlarging the range of the pattern search in gzip) might be a way to get a workable test. As far as we know, nobody has tried this yet.
For larger sequences, the distinction between these ideas start to blur, as the size of the information needed to transform one representation into the other becomes irrelevant.
A sequence of PRN can be used to construct a stream cipher. If “true random” numbers are used, this cipher is called the onetime pad and is provably secure. So it is natural to ask what properties the PRN must have to achieve a good level of security.
According to Rueppel [76] there are several approaches to the construction of a secure stream cipher: The informationtheoretic approach considers the possibility in principle to derive the seed (i.e. the key) from an observation of the PRN, the systemtheoretic approach tries to make breaking the cipher at least as hard as solving known “hard” problems like factoring or the discrete logarithm. The complexitytheoretic approach tries to make sure that the amount of work needed to break the cipher is of nonpolynomial complexity, randomized stream ciphers increase the magnitude of the codebreaker’s problem by utilizing a public pool of random numbers.
The basic idea of statistical testing can be summarized as follows: From a sample of supposedly random numbers a function called test statistic is calculated. As the distribution of this function is known for the case of real random numbers (otherwise the test does not make sense), one can determine which kind of results are extremely unlikely to occur. Typically this is formulated as intervals in the domain of the test statistic. These intervals (usually called critical region) are selected in a way that the probability that real random numbers lead to a test statistic there is smaller than the level of signiﬁcance (usually 0.05, 0.01 or 0.005). If now for a sample of PRNG the test statistic falls into the critical region the common inference is to reject the sample.
All common tests rely on the idea of statistical testing. In the following we will try to elaborate on the motivation behind these tests, their mathematical foundation, their power and limitations, and how to interpret their results.
First of all, let us take a closer look at what we want to simulate. Our target are sequences of random numbers, which are realisations of a sequence of independent, uniformly distributed random variables.
Random variables (RVs) are one of the main building blocks in probability theory. They are used to assign each possible outcome (or, to be more exact, each reasonable set of outcomes) of an experiment a real number which is interpreted as the probability of this outcome.
But strictly speaking, the mathematical concept of RVs does not explicitly reﬂect our intuitive ideas about randomness of events, on the contrary: RVs are just simple, ordinary functions. One is tempted to ascribe mythical powers to RVs, like the ability to randomly select one of a set of possible events. This is not true, they only describe certain aspects of an idealized system which ﬂips the metaphorical coin.
So where is the link between the mathematical world of RVs and the real life world of roulette tables ? Unfortunately there is none for single events. Even if a RV does in fact model a real world event, hardly any conclusions can be made about the outcome of the next single event. Even such unlikely events as winning the jackpot in a lottery do happen every now and then, and most people are not deterred by the extremely bad odds from playing every week. On the other hand some people are scared of travelling by plane because the probability of a safe ﬂight is marginally less than one. In both cases our experience tells us that the probability alone cannot predict the next outcome.
But even such pretty deﬁnite sounding statements like “this event will occur with probability 1” cannot guarantee the outcome of an event. More insight into measure theory will tell us why such strange things can happen. For example, the probability that the next realisation of an $U([0,1\left[\right)$distributed random variable will be a rational number is zero. This does not stop the real world from delivering one of the inﬁnite number of rational numbers, thus rendering the statement “This experiment will only return irrational numbers” incorrect.
We have seen that a RV cannot make concrete statements about a single outcome, so we might ask what statements about outcomes it can make at all. One way to formulate the meaning of probability is the following: [85, p. 10]
The probability assigned to an event expresses the expected average rate of occurrences of the event in an unlinked sequence of experiments.
We need to elaborate on two aspects of this deﬁnition as they are not as strict and unambiguous as commonly demanded from a good deﬁnition.
First, what do we mean by “expected” ? That seems to indicate that probability cannot be an intrinsic property on an event. There is no mathematically satisfying way to assign a probability to an event based on a (ﬁnite^{3}) set of measurements, as it is extremely unlikely that another set of experiments will result in the same value. The common way out is to make assumptions about some parts of the experiment, like the Laplace assumption which assigns the same probability to all underlying events. These assumptions are based on a mental model of that event which includes a theory on how often something should occur. It is the mathematician, the physicist or just some observer who forms a mental model based on experiences or consideration. Such simplifying mental models of the real world are ubiquitous as they provide an essential simpliﬁcation in the way we view the world. Other such simpliﬁcations include the concept of rigid bodies, ﬂuids, or gases which are abstractions of “a bunch of molecules tied together by various forces”. Just as the laws of leverage rely in their formulation on the concept of forces and of rigid bodies the laws of chance depend on the concept of probability assigned to events.
The other critical word in the above deﬁnition is “unlinked”. By unliked we mean that the outcome of one experiment does not inﬂuence the outcome of any subsequent experiment. Common examples for unlinked experiments include drawing balls from an urn (with putting them back in !), casting a dice, or the roulette wheel. Please note that in all these examples there is a connection between two successive experiments as the ﬁrst one does inﬂuences the second. It is a conscious decision by the observer that the reshuﬄing of the balls in the urn caused by the ﬁrst experiment does not aﬀect the probability in the second one. This sounds almost like a paradox, as the reshuﬄing surely does eﬀect the outcome. But remember, just above we noted that the probability of an event does not determine the next outcome at all, so there is not contradiction here.
We have to be careful with sequences of PRN and their relation to independent random variables, too. The concept of independence is based on the concept of distributions. As we cannot ascribe distributions to numbers, we cannot use the term “independence” for sequences of PRN. We will use the word correlations to refer to any unwanted relationship between elements in the sequence.
As described above, the theory of random variables and probability tries to model aspects of the physical world. The fundamental principles of science demand justiﬁcation in form of experiments for all such theories. For typical physical models such experiments are usually easy to set up and follow the same scheme of comparing an expected (calculated) result to the measurements of the actual physical event. If they diﬀer more than inaccuracies in the measurements would allow, the theory is proven to be wrong. Philosophy of Science tells us that it is impossible to positively prove a theory.
Do the same principles hold for conjectures in the ﬁeld of probability, too ? Unfortunately, they do not. Let us illustrate this with an example:
As a theory to test we might take the assumption that a given coin is fair, meaning that the probability it lands with the heads side up is $1\u22152$. How might an experiment designed to test this hypothesis look like ? Surely it will involve throwing the coin a number of times and then comparing the result to the prediction. Calculating the prediction based on the theory is simple, unfortunately the prediction assigns each possible outcome a positive probability. Thus regardless of the behaviour of the coin the result is consistent with the theory, as we cannot rule out the measured result. If we have no way to reject a theory, we have to ﬁnd a diﬀerent set of criteria according to which we can justify theories.
The common way out is statistical testing. It should be clear that statistical testing can never be as strict as testing in other areas. It is a heuristic approach to the problem. As such, it relies on the good judgement of the tester and is not objective. But before we elaborate on the shortcomings of statistical testing let us summarize the basic procedure again, already using the test for randomness as the example.
The alternative hypothesis ${H}_{1}$ states that ${H}_{0}$ is not true.
This is the basic outline of all common empirical tests. We will discuss a few tests and their results later in this thesis. So what are the weak spots in this method of testing pseudorandom numbers ?
First of all, when testing sequences of numbers generated by a PRNG basic premises of statistical testing are violated.
The theory behind statistical testing assumes that we actually deal with random events. It is thus a circular argument to conclude from the result of such a test that the numbers in question are “random”. Only their statistical properties that are subject to the test, not the basic premise that the concept of random variables is a valid model for that experiment.
It is therefore not correct to speak of “statistical testing” with respect to PRN testing. A more appropriate term is “numerical testing” as the test examines only a numerical property of a ﬁxed set of numbers.
The only “statistical” part of the test is the calculation what numerical values for the test statistic are considered to be good and which are considered to be bad.
The test statistic determines which aspect of the numbers we want to test. Dividing $[0,1[$ into the intervals $\left[\frac{i}{n},\frac{i+1}{n}\right[$ for $0\le i<n$, counting the hits in each interval, and then calculating the ${\chi}^{2}$ statistic is a straightforward test statistic which aims the the overall equidistribution of the pseudorandom numbers in $[0,1[$. The choice of the bins (in this case intervals) seems to be a natural one.
But what bins should we use to measure the ﬁner aspects of equidistribution ? We could use just a large value for $n$ and keep the intervals, but that would cause a problem with the validity of the ${\chi}^{2}$ approximation as the number of hits per bins decreases. An other option is to use something like this: Deﬁne bin $i$ as $\{x\in [0,1\left[:\lfloor x\cdot k\rfloor \equiv i\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}n\right)\right\}$ for some suitable values for $k$ and $n$. Then the bins are no longer simple intervals, but sets of intervals that are spread over the unity interval. The value for $k$ determines the width of each component interval.
Is there a natural choice for $k$ ? We do not think so. But the choice can be important as the result of the test depends on it. Consider for example the set of numbers deﬁned by $\left\{i\u2215m\mid 0\le i<m\right\}$ which is perfectly equidistributed in$[0,1[$. For certain relations between $k$, $n$ and $m$, like $k\mid m\cdot n$, the test will result in extremely bad ${\chi}^{2}$ statistics. As an example, consider the case $k=n\cdot m$ where all numbers $i\u2215m$ fall into the same bin. For other values of $k$, this set of PRN will exhibit no weakness in this test.
We have seen that even such simple changes to the test statistic, like modifying the width of stripes, can completely change the result of the test. One can imagine that completely diﬀerent layouts of bins will lead to a great variance in the test results, too. Thus we have to keep in mind that the choice of the test statistic, and thus to some extent the result of the test, is arbitrary.
One consequence of this fact is that we cannot declare one sequence of PRN to be better than a second one just because we found a test where the ﬁrst one rates better, as a slightly modiﬁed test might produce exactly the opposite result.
What we strive for are numbers who behave in most respects like realisations of $U([0,1\left[\right)$distributed RVs, so it is a natural question how such ideal random numbers will perform in our statistical test. The answer to this is quite simple: If we conduct a test at the signiﬁcance level $\alpha $ then a sequence of random numbers will fail the test with probability $\alpha $. If we have set $\alpha =0.01$ then we can expect a failure about once every hundred tries.
As PRN should model all aspects of real random numbers, they should fail statistical tests at about the same rate.^{4}
It is therefore not advisable to outright reject a sequence of PRN based on its failure in single tests.
Classic statistical tests examine if the test statistic does not deviate from its expected value too much. If we are only interested in the expected outcome of a similar simulation problem, such onelevel statistical tests are all we need in order to be conﬁdent about the accuracy of the simulation.
On the other hand we might be interested in the distribution of the simulation’s outcome. For this goal hitting the expected value is not enough, the variance of the result is now important, too. Thus we will demand the same behaviour from the test statistic, too.
Let us illustrate this principle with an example. We want to test the well known strategy of doubling the ante in a game of roulette. It is supposed to guarantee winning the initial ante and works like this: If we do not win in the ﬁrst round (and therefore win twice the ante) the ante is doubled for the next round. If this round is won, we get back four times the initial ante while we invested three times the initial ante resulting in a net win of one ante. In case of bad luck we double the ante again hoping for eight times the ante for an investment of seven. As we hope that we will ﬁnally win before our capital is drained a net win seems to be certain.
In order to simulate this we need random numbers to determine whether we will win the current bet. The probabilities are $18\u221537$ for winning and $19\u221537$ for losing each round, respectively. It seems to be natural to use the lengths of runs as a test statistic to test our source of PRN for its ﬁtness to simulate a real roulette table. The probability that the maximal run length in 500 tries is greater than 15, is smaller than all usual values for $\alpha $, so according to the corresponding statistical test we should reject all sequences where such runs do occur.^{5}
When we now run the simulation with these prescreened sequences we will never ever experience a loss as long as we have enough money for 15 steps of doubling the ante. Thus we should conclude that the strategy works. As we know, this is not true. So what went wrong with our simulation ?
The statistical test considered it equally important whether the sequence in question was “wellbehaved” or not, whereas the simulation assigned completely diﬀerent weights to those cases. Thus the area that the test considered to be insigniﬁcant (smaller than $\alpha $) played a major role in the simulation (more than $1\u22152$).
There are some other cases of simulations where we are not so much interested in the average case, but in the extreme ones. Consider for example all those safety measures in power plants or other machinery where a rare sequence of occurrences might lead to catastrophic results. When simulating these security systems one must not a priori exclude unusual sequences.
Please note that the distinction between level1 and level2 tests (tests which test the distribution of the results of a level1 test) is arbitrary. The test statistic of a level2 test is just another function of the underlying set of PRN, too.
One might be tempted to use a set of statistical test to once and for all declare one generator superior to another generator. Intuitively this makes sense, especially when comparing two generators of the same type. A common use for this heuristic is the selection of optimal parameters for the LCG based on its lattice structure [29, 73].
But is this judgement mathematically justiﬁed ? Leeb explains in [59] that such judgement is not justiﬁed as all possible sequences of PRN of a given ﬁnite length pass exactly the same number of statistical tests.
In order be able to use statistical tests as a criteria for the selection of PRNG the user has to declare which properties he considers important. With this knowledge it is possible to weight the tests and therefore select a suitable generator for this speciﬁc application.
Both statistical tests and simulations use a set of PRN to perform a more or less elaborate calculation.
In a statistical test we draw a conclusion from the right side to left: Based on the diﬀerence between expected and calculated value we judge the quality of the PRN.
A simulation operates the other way round. Based on the (hopefully) known quality of the PRN we hope that the simulation result is correct.
While generic statistical tests can be used for this reasoning, one can increase their value by designing the tests to closely model the simulation. Thus the tests can target exactly those properties in the PRN which will be signiﬁcant in their application.
In order to conclude this chapter on the notion of randomness let us recapitulate what we know about testing a generator, and how we should proceed when we face the task of selecting a generator for a particular simulation problem.
Furthermore, analytical investigations can yield some insight in the overall structure of a generator’s output which can be compared to the properties required in the simulation. The lattice structure of the LCG might be actually useful in QuasiMonte Carlo integration whereas it can be harmful in geometric problems, e.g. the nearestpair test [14, 54].
In this chapter we will discuss analytically derived properties of the explicit inversive congruential generator (EICG). We will use results obtained for the LCG to serve as a reference as the LCG is the most commonly used generator. Let us start by repeating the deﬁnition of the EICG:
$${y}_{n}:=\overline{a\cdot \left({n}_{0}+n\right)+b}\phantom{\rule{2em}{0ex}}\left(n\ge 0\right)$$  (3.1) 
Please remember that we perform all calculations except the ﬁnal scaling in the ﬁnite ﬁeld ${\mathbb{Z}}_{p}=\left\{0,1,2,\dots ,p1\right\}$. ${\mathbb{Z}}_{p}^{\ast}$ will denote the nonzero elements of ${\mathbb{Z}}_{p}$, that is ${\mathbb{Z}}_{p}^{\ast}=\left\{1,2,3,\dots ,p1\right\}$. The overline $\overline{a}$ denotes the multiplicative inversion in ${\mathbb{Z}}_{p}$ for all nonzero elements $a\in {\mathbb{Z}}_{p}^{\ast}$. With the special case $\overline{0}=0$ added, $x\mapsto \overline{x}$ is a bijective function from ${\mathbb{Z}}_{p}$ onto ${\mathbb{Z}}_{p}$. Furthermore, we have $\overline{\overline{x}}=x$ and $\overline{x}={x}^{p2}$ for all $x\in {\mathbb{Z}}_{p}$. The latter identity is due to Fermat’s Little Theorem.
Note that from the explicit deﬁnition of the sequence ${\left({y}_{n}\right)}_{n\ge 0}$ we can easily derive a recursive description:
$$\begin{array}{rcll}{y}_{0}& =& \overline{a\cdot {n}_{0}+b}& \text{}\\ {y}_{n+1}& =& \overline{\overline{{y}_{n}}+a}\phantom{\rule{2em}{0ex}}\phantom{\rule{2em}{0ex}}\left(n\ge 0\right)& \text{}\end{array}$$ $$\begin{array}{rcll}& & & \text{(3.2)}\text{}\text{}\end{array}$$
In order to achieve maximal period length $p$, the parameters $a$,$b$, and ${n}_{0}$ can be freely chosen from ${\mathbb{Z}}_{p}$ as long as $p$ is prime and $a\ne 0$. To see this, consider the function $f\left(n\right):=\overline{a\cdot \left({n}_{0}+n\right)+b}$ which is composed of bijective functions in ${\mathbb{Z}}_{p}$ and thus is bijective, too. As $n+p=n$ in ${\mathbb{Z}}_{p}$ we have $f\left(n+p\right)=f\left(p\right)$ for all $n$, thus the sequence ${\left({y}_{n}\right)}_{n=0}^{\infty}$ is purely periodic with period length $p$.
We will only consider full period generators, that is $a\ne 0$.
We will write $eicg\left(p,a,b,{n}_{0}\right)$ to denote the output of a particular instance of the EICG method. Unlike Leeb [59, p. 89] we mean the whole inﬁnite (but periodic) sequence, and not just the ﬁrst $p$ numbers. This way, no special treatment of wraparounds is needed when considering subsequences.
The choice of parameters is simple for an EICG, but not all choices will lead to completely diﬀerent pseudorandom numbers. In this section we will examine the relations between EICGs with the same modulus, but diﬀerent parameters $a$, $b$, and ${n}_{0}$.
These results are helpful for the implementation, as one can eliminate an addition modulo $p$, as well as to the theoretical investigation as they provide a very elegant way to describe substreams. We will elaborate on this idea which is due to Niederreiter [71, p. 5] later on.
First of all, let us make a rather trivial observation on the role of the parameter ${n}_{0}$.
Observation 3.1 Let ${\left({y}_{n}\right)}_{n\ge 0}=eicg\left(p,a,b,0\right)$. Then we can write the sequence $eicg\left(p,a,b,{n}_{0}\right)$ as ${\left({y}_{n}\right)}_{n\ge {n}_{0}}$. In other words, the second sequence is ﬁrst one shifted by ${n}_{0}$.
Proof: This relation follows from the fact that $n$ and ${n}_{0}$ appear only as their sum $n+{n}_{0}$ in the deﬁnition of the EICG. __
The following observation is taken from Leeb [59, p. 89]; it states that one of the parameters is redundant.
Observation 3.2 Let $p$ be prime and $a\in {\mathbb{Z}}_{p}^{\ast}$ be ﬁxed. Then we have for all $b,{n}_{0}\in {\mathbb{Z}}_{p}$
$$\begin{array}{rcll}eicg\left(p,a,b,0\right)& =& eicg\left(p,a,0,\overline{a}b\right)& \text{(3.3)}\text{}\text{}\\ eicg\left(p,a,0,{n}_{0}\right)& =& eicg\left(p,a,a{n}_{0},0\right)& \text{(3.4)}\text{}\text{}\end{array}$$and
$$eicg\left(p,a,b,{n}_{0}\right)=eicg\left(p,a,0,{n}_{0}+\overline{a}b\right)=eicg\left(p,a,a{n}_{0}+b,0\right).$$  (3.5) 
Proof: We base the proof on the recursive deﬁnition of the EICG. As the recursion does only depend on $a$, which is constant, it is suﬃcient to show that the ${y}_{0}$ of these generators are equal. In the ﬁrst two cases we have
$$\begin{array}{rcll}\overline{a\cdot 0+b}\phantom{\rule{1em}{0ex}}=& {y}_{0}& =\phantom{\rule{1em}{0ex}}\overline{a\cdot \overline{a}b+0}& \text{}\\ \overline{a\cdot {n}_{0}+0}\phantom{\rule{1em}{0ex}}=& {y}_{0}& =\phantom{\rule{1em}{0ex}}\overline{a\cdot 0+a{n}_{0}},& \text{}\end{array}$$and the third equality translates to
_
The third equality can be used to rewrite any EICG as an EICG with $b=0$, but a diﬀerent value for ${n}_{0}$. Thus the generating formula can always be rewritten as
which saves one addition. The addition ${n}_{0}^{\prime}+n$ can be implemented by simply incrementing the previous value modulo $p$, thus we need to perform only one increment, one multiplication, one inversion, and one division to generate the next pseudorandom number.
There is an obvious connection between $eicg\left(p,a,b,k{n}_{0}\right)$ and $eicg\left(p,ka,b,{n}_{0}\right)$, too:^{1}
Observation 3.3 Let $p$
be prime, $a,k\in {\mathbb{Z}}_{p}^{\ast}$,
and $b\in {\mathbb{Z}}_{p}$.
The sequence $eicg\left(p,ka,b,{n}_{0}\right)$
can be obtained by selecting every $k$th
element from the sequence $eicg\left(p,a,b,k{n}_{0}\right)$.
Proof: The sequence generated by taking every $k$th element in the sequence $eicg\left(p,a,b,k{n}_{0}\right)={\left({y}_{n}\right)}_{n\ge 0}$ can be written as ${\left({y}_{kn}\right)}_{n\ge 0}$. We have
and thus
$$\begin{array}{rcll}{y}_{kn}& =& \overline{a\cdot \left(k{n}_{0}+nk\right)+b}& \text{}\\ & =& \overline{ak\cdot \left({n}_{0}+n\right)+b}& \text{}\\ & =& {v}_{n}& \text{}\end{array}$$where ${\left({v}_{n}\right)}_{n\ge 0}=eicg\left(p,ka,b,{n}_{0}\right)$. _
These three observations give us the tools to show that all maximal period EICGs can be derived from the “motherEICG” $eicg\left(p,1,0,0\right)$ in the following way:
Can these insights help us in the theoretical investigation on how samples from an EICG behave under various tests ? Yes, they provide a very convenient and elegant formalism to describe subsequences and various kinds of $s$tuples generated from the stream of pseudorandom numbers. With this formalism, the proofs of discrepancy estimates and nonlinear properties are very concise.
First of all, we do not need to bother with the parameter ${n}_{0}$ in the theoretical investigation as we can always rewrite the EICG to one with ${n}_{0}=0$.
Second, any property of a sequence of EICG numbers, which is valid independently of the parameters used, is immediately valid for subsequences consisting of every $k$th element. One direct consequence of this is, that once we can prove that pairs of consecutive numbers are uncorrelated for all valid parameters, we can rule out the possibility of longrange correlations at critical distances. See [10, 20] for a discussion of such problems inherent to the LCG.
The third gain, due to Niederreiter [71], is to be able to write almost arbitrary $s$tuples formed out of the stream of EICG numbers as parallel streams. Such $s$tuples as usually used to examine the correlation between successive numbers. For example, the overlapping serial test (see page §) tests the equidistribution of the vectors
$${y}_{n}=\left({y}_{n},{y}_{n+1},\dots ,{y}_{n+s1}\right)\in {\mathbb{Z}}_{p}^{s}$$  (3.6) 
for $n=0,1,\dots ,p1$ in order to test the PRN ${\left({y}_{i}\right)}_{i\ge 0}$ for correlations. If we pick the ﬁrst coordinate of each vector we get the original sequence. Picking always the second results in the original sequence shifted by one. According to the above equivalences we can write this shifted sequence as an EICG with the same parameter $a$, ${n}_{0}=0$, and a diﬀerent $b$. Thus we have
$${y}_{n}=\left({y}_{n}^{\left(1\right)},{y}_{n}^{\left(2\right)},\dots ,{y}_{n}^{\left(s\right)}\right)\in {\mathbb{Z}}_{p}^{s}$$  (3.7) 
where ${\left({y}_{n}^{\left(i\right)}\right)}_{n\ge 0}$ is the sequence generated by the EICG $eicg\left(p,a,a\left(i1\right)+b,0\right)$. The obvious generalisation is to allow almost arbitrary EICGs $eicg\left(p,{a}_{i},{b}_{i},0\right)$ for each coordinate.
In the following, we will prove all statements on the behaviour of $s$tuples in terms of these parallel streams. For that, we will need to restrict the possible values for the ${a}_{i}$ and ${b}_{i}$ in order to avoid certain special cases like ${a}_{1}={a}_{2}\phantom{\rule{1em}{0ex}}\wedge \phantom{\rule{1em}{0ex}}{b}_{1}={b}_{2}$. As we will see later in the various proofs, we need the condition $\overline{{a}_{i}}{b}_{i}\ne \overline{{a}_{j}}{b}_{j}$ for all $i\ne j$. Thus we have the following deﬁnition:
Deﬁnition 3.1 (Parallel Streams) Let $p$ be prime, $1\le s<p$, and ${a}_{1},\dots ,{a}_{s}\in {\mathbb{Z}}_{p}^{\ast}$, ${b}_{1},\dots ,{b}_{s}\in {\mathbb{Z}}_{p}$ such that $\overline{{a}_{l}}{b}_{1},\dots ,\overline{{a}_{s}}{b}_{s}\in {\mathbb{Z}}_{p}$ are distinct. Then we put
$${y}_{n}^{\left(i\right)}=\overline{{a}_{i}n+{b}_{i}}\phantom{\rule{2em}{0ex}}\text{for}i=1,2,\dots ,s\text{and}n\ge 0,$$  (3.8) 
and deﬁne a sequence ${\left({y}_{n}\right)}_{n\ge 0}$ in the $s$dimensional aﬃne space over ${\mathbb{Z}}_{p}$ by putting
An interesting special case of parallel streams, which is more general than the overlapping $s$tuples considered above, can be obtained as follows. Choose an integer $m\ge 1$ with $gcd\left(m,p\right)=1$ and integers $0\le {n}_{1}<{n}_{2}<\dots <{n}_{s}<p$ and put
where the ${y}_{n}$ are as in (3.1). This sequence of points in ${\mathbb{Z}}_{p}^{s}$ can be written in terms of parallel streams according to Deﬁnition 3.1 by putting ${a}_{i}=am$ and ${b}_{i}=a{n}_{i}+b$ for $1\le i\le s$. It is easy to show that the $\overline{{a}_{i}}{b}_{i}$ are distinct, thus all results concerning parallel streams are valid for this general method of composing $s$tuples, too.
The nonoverlapping tuples ${y}_{n}:=\left({y}_{sn},{y}_{sn+1},\dots ,{y}_{sn+s1}\right)$ are covered by the concept of parallel streams, too. To see this, set ${a}_{i}=sa$ and ${b}_{i}=a\left(i1\right)+b$ for $i=1,\dots ,s$.
The best known structural property of any pseudorandom number generator is the lattice structure of the LCG. Coveyou/MacPherson [9] and Marsaglia [65] noted ﬁrst that $s$tuples formed from successive LCGnumbers form a lattice in the $s$dimensional cube. Figure 3.1 depicts the lattice formed by plotting overlapping 2tuples for the full period of two “toy” generators. “Production quality” generators exhibit the same structure, you just have to zoom into the image to see the pattern.
The shape of the lattice depends very strongly on the parameters $a$ and $b$ of the LCG. Thus various measurements on the coarseness of the lattice are used to select suitable parameters $a$ and $b$. That way, a weakness of the LCG turns into a strength, as one can guarantee a wellbehaved lattice for low dimensions as long as the parameters are chosen well enough.
Does the EICG exhibit a similar structure ? Figure 3.2 suggests that the EICG does not possess this linear property, although one can see some other regularities. In fact, one can prove a very stringent nonlinearity property for $s$tuples taken from an EICG. The theorem describing this is due to Niederreiter [71].
Theorem 3.1 Let ${y}_{n}=\left({y}_{n}^{\left(1\right)},{y}_{n}^{\left(2\right)},\dots ,{y}_{n}^{\left(s\right)}\right)$ as in Deﬁnition 3.1, then every hyperplane in ${\mathbb{Z}}_{p}^{s}$ contains at most $s$ of the points ${y}_{n}$, $n=0,1,\dots ,p1$, with ${y}_{n}^{\left(1\right)}\cdots {y}_{n}^{\left(s\right)}\ne 0$. If the hyperplane passes through the origin of ${\mathbb{Z}}_{p}^{s}$, then it contains at most $s1$ of these points ${y}_{n}$.
Proof: All calculations in this proof are performed in the ﬁnite ﬁeld ${\mathbb{Z}}_{p}$. Furthermore, remember that according to Deﬁnition 3.1, the $\overline{{a}_{i}}{b}_{i}$ are distinct.
A hyperplane ${H}_{c,{c}_{0}}$ in ${Z}_{p}^{s}$ is uniquely deﬁned by a vector $c=\left({c}_{1},\dots ,{c}_{s}\right)\in {\mathbb{Z}}_{p}^{s}\setminus \left\{0\right\}$ and a scalar ${c}_{0}\in {\mathbb{Z}}_{p}$ as ${H}_{c,{c}_{0}}=\left\{x\in {Z}_{p}^{s}x\cdot c={c}_{0}\right\}$. We restrict our search for points on a hyperplane $n\in W:={\mathbb{Z}}_{p}\setminus \left\{\overline{{a}_{1}}{b}_{1},\dots ,\overline{{a}_{s}}{b}_{s}\right\}$. Thus for $n\in W$ we have according to (3.8) ${y}_{n}^{\left(1\right)},\dots ,{y}_{n}^{\left(s\right)}\ne 0$, therefore we can rewrite the hyperplane equation for ${y}_{n}$ as
By clearing denominators, we see that $n$ is a root of the polynomial
If ${c}_{0}\ne 0$, then $h$ is a nonzero^{2} polynomial of degree $s$ over ${\mathbb{Z}}_{p}$. Since such a polynomial has at most $s$ roots in ${\mathbb{Z}}_{p}$, the hyperplane ${H}_{c,{c}_{0}}$ contains at most $s$ of the ${y}_{n}$ with $n\in \left\{0,1,\dots ,p1\right\}\setminus \left\{\overline{{a}_{1}}{b}_{1},\dots ,\overline{{a}_{s}}{b}_{s}\right\}$.
If ${c}_{0}=0$, that is $0\in {H}_{c,{c}_{0}}$, we get
whose degree is at most $s1$. It remains to show that $h$ is not the zero polynomial. As $c$ is not the zero vector, one of its coordinates is nonzero. For ${c}_{k}\ne 0$ we have
$$\begin{array}{rcll}h\left(\overline{{a}_{k}}{b}_{k}\right)& =& {c}_{k}\prod _{\genfrac{}{}{0ex}{}{i=1}{i\ne k}}^{s}\left({a}_{i}\overline{{a}_{k}}{b}_{k}+{b}_{i}\right)& \text{}\\ & =& {c}_{k}\prod _{\genfrac{}{}{0ex}{}{i=1}{i\ne k}}^{s}{a}_{i}\left(\overline{{a}_{i}}{b}_{i}\overline{{a}_{k}}{b}_{k}\right)& \text{}\\ & \ne & 0,& \text{}\end{array}$$because ${c}_{k}$ is the chosen nonzero coordinate, and the ${a}_{i}$ as well as the $\left(\overline{{a}_{i}}{b}_{i}\overline{{a}_{k}}{b}_{k}\right)$ are nonzero according to the conditions of the theorem. As we have found $h\left(x\right)\ne 0$ for some $x$, $h$ cannot be the zero polynomial. __
This theorem proves that $s$tuples taken from an EICG do not form any linear structure such as a lattice. But that does not mean that no other kind of pattern emerges in plots of pairs of consecutive^{3} numbers. For example, consider Figure 3.3, where one can see a hyperbolalike structure in the upper left and lower right corner. EichenauerHerrmann and Wegenkittl are currently preparing a paper discussing these properties of the EICG.
All the plots so far contained all the overlapping pairs available from the full period of the generator. This way, the underlying structure of the generator is perfectly visible. But usually one does not utilize the full period of any generator; a common rule of thumb is to use not more than the square root of the period. Thus the LCG will never be able to build up the full lattice and the EICG will contain only a few points on hyperbola. Figure 3.4 depicts the lattice of lcg(65536,325,1,1), the ﬁrst image shows the full lattice in a zoomed view, the second one contains only 256 points, which corresponds to the square root of all possible points.
These ﬁgures clearly demonstrate that any regularities a generator develops over the full period are not necessarily present when only a fraction of the available numbers is used. The recommendation never to exhaust the full period can be further justiﬁed by the following argument: The PRNG is supposed to simulates drawing numbers from an urn with putting the numbers back into the urn, but in fact the typical PRNG empties the imaginary urn before it puts all numbers back when the period is exhausted. The diﬀerence between “drawing with replacement” and “drawing without replacement” is small as long as only a fraction of all numbers are drawn from the urn.
The discrepancy is a widely used and well studied measure for the equidistribution of a set of points. In this section we will try to give a motivated deﬁnition, some theoretical background, and summarize all published results concerning the EICG.
There are at least three approaches to the notion of discrepancy, one stems from statistics, one from geometric reasoning, and one from numerical integration. We will use the latter. An extensive introduction to discrepancy can be found in Niederreiter [70, Chapter 2].
We will use the following setting: The closed $s$dimensional unit cube $\u012as={\left[0,1\right]}^{s}$ will be the integration domain in which we will try to integrate the function $f$ by using the quasiMonte Carlo integration
$${\int}_{\u012as}f\left(u\right)du\approx \frac{1}{N}\sum _{n=1}^{N}f\left({x}_{n}\right)$$  (3.9) 
with ${x}_{1},\dots ,{x}_{N}\in \u012as$. Ideally, we hope that the approximation converges to the integral as the number of points increases. If this is the case for a reasonable class of functions, say, for all continuous $f$ on $\u012as$, then we call the (inﬁnite) sequence $\left({x}_{1},{x}_{2},\dots \right)$ uniformly distributed in $\u012as$. One can show that this deﬁnition of “uniformly distributed” is quite independent of the class of functions; using the Riemannintegrable functions yields the same test as using the characteristic functions of a very simple set of intervals.
Whereas the limit of the integration error can be used as a qualitative measure for the distribution properties of an (inﬁnite) sequence of points, one can use the integration error in the ﬁnite case as a quantitative measurement of the equidistribution of the ﬁnite sequence ${\left({x}_{n}\right)}_{n=1}^{N}$.
In order to get a workable measurement, we have to state which family of functions $f$ we consider for the integration, and how we condense all the integrationerrors for each function from the family into one single number.
The general concept of discrepancy uses the set of characteristic functions of axisparallel cubes in $Is:=[0,1{[s}^{}$ as the functions to integrate and the supremum as the condensing function. Formally, we can write this in the following way:
If $\Omega ={\left({x}_{n}\right)}_{n=1}^{N}$ is a ﬁnite sequence in $Is$, and $B$ an arbitrary subset of $Is$, then we can express the quasiMonte Carlo integration of the characteristic function^{4} ${c}_{B}$ in terms of the number of ${x}_{i}$ in $B$,
as
Based on this, the error when integrating ${c}_{B}$ can be written as $\left\frac{A\left(B,\Omega \right)}{N}{\lambda}_{s}\left(B\right)\right$, where ${\lambda}_{s}\left(B\right)$ is the $s$dimensional volume^{5} of $B$. Thus we can write the general notion of the discrepancy of a ﬁnite sequence $\Omega $ of points in $Is$ for an arbitrary^{6} family $\mathcal{\mathcal{B}}$ of sets as
$${D}_{N}\left(\mathcal{\mathcal{B}};\Omega \right)={sup}_{B\in \mathcal{\mathcal{B}}}\left\frac{A\left(B,\Omega \right)}{N}{\lambda}_{s}\left(B\right)\right$$  (3.10) 
From this general deﬁnition we can derive the deﬁnition of the two most important incarnations of discrepancy as follows:
Deﬁnition 3.2 The star discrepancy ${D}_{N}^{\ast}\left(\Omega \right)={D}_{N}^{\ast}\left({x}_{1},\dots {x}_{N}\right)$ of the ﬁnite sequence $\Omega $ is deﬁned by ${D}_{N}^{\ast}\left(\Omega \right):={D}_{N}\left({\mathcal{\mathcal{J}}}^{\ast};\Omega \right)$, where ${\mathcal{\mathcal{J}}}^{\ast}$ is the family of all subintervals of $Is$ of the form ${\prod}_{i=1}^{s}[0,{u}_{i}[$.
Deﬁnition 3.3 The (extreme) discrepancy ${D}_{N}\left(\Omega \right)={D}_{N}\left({x}_{1},\dots {x}_{N}\right)$ of the ﬁnite sequence $\Omega $ is deﬁned by ${D}_{N}\left(\Omega \right):={D}_{N}\left(\mathcal{\mathcal{J}};\Omega \right)$, where $\mathcal{\mathcal{J}}$ is the family of all subintervals of $Is$ of the form ${\prod}_{i=1}^{s}[{u}_{i},{v}_{i}[$.
While ${D}_{N}$ and ${D}_{N}^{\ast}$ are the classical ﬁgures for measuring the equidistribution, they are far from being the only ones. Interesting variations of the basic idea are the isotropic discrepancy, which uses the family of convex sets instead of axisparallel cubes, or the ${L}^{2}$discrepancy, which uses the $2$norm instead of the supremum. Especially the ${L}^{2}$discrepancy has received a lot of attention recently as it is suitable for empirical testing [38] and has a number of interesting theoretical properties [80, 62]. Another measurement worth mentioning is the weighted spectral test [40, 46, 44, 41, 47].
Let us quickly state a few general results concerning discrepancy. They will help us to interpret the main results of this chapter. We once again refer to Niederreiter [70, p. 14ﬀ, p. 166ﬀ] for proofs and further references.
Proposition 3.1 For all ﬁnite sequences $\Omega $ consisting of $N$ points in $\u012as$ we have
and
In dimension one, that is $s=1$, it is possible to express the discrepancy as a relatively simple formula operating on the ordered list of points.
Proposition 3.2 If $0\le {x}_{1}\le {x}_{2}\le \dots \le {x}_{N}\le 1$, then
and
From these formulae, as well as the well known fact that sorting is of complexity $\mathcal{\mathcal{O}}\left(NlogN\right)$, it is easy to see that one can calculate the discrepancy in the onedimensional case in $\mathcal{\mathcal{O}}\left(NlogN\right)+\mathcal{\mathcal{O}}\left(N\right)$ steps. Using a memory versus speed tradeoﬀ [34] it is possible to get the complexity down to $\mathcal{\mathcal{O}}\left(N\right)$. In higher dimensions $s$ calculating the discrepancy is of complexity $\mathcal{\mathcal{O}}\left({N}^{s}\right)$, making any reasonable empirical testing computationally infeasible. Probabilistic algorithms [89] can be employed to calculate tight upper bounds for a given $\Omega $.
What do we know about the behaviour of ${D}_{N}$ with increasing $N$ ? If the sequence of point is indeed uniformly distributed in $\u012as$, then we know that ${lim}_{N\to \infty}{D}_{N}=0$. For a sequence of uniformly distributed random points we know that ${lim}_{N\to \infty}{D}_{N}=0$ with probability one. But in order to use the discrepancy as a ﬁgure of merit for ﬁnite sequences, we need to know exactly how ${D}_{N}$ converges for random sequences. Luckily, the following result (due to Kiefer [49]) provides us with the benchmark according to which we can judge the discrepancy bounds derived for PRN.
Proposition 3.3 (Law of the iterated logarithm) Let ${z}_{1},{z}_{2},{z}_{3},\dots $ be a sequences of uniformly distributed random points in $\u012as$, then we have
where ${\lambda}^{\infty}$ is the Lebesgue measure on the space of all inﬁnite sequences in $\u012as$.
The discrepancy is per deﬁnition an upper bound for the quasiMonte Carlo integration error for a very limited class of functions, namely the characteristic functions of axisparallel cubes. A classic result by Hlawka uses the discrepancy to derive an errorbound for a large class of functions.
Proposition 3.4 (KoksmaHlawka inequality [70]) If $f$ has bounded variation $V\left(f\right)$ on $\u012as$ in the sense of Hardy and Krause, then for any ${x}_{1},\dots ,{x}_{N}\in Is$ we have
Why is this inequality so important ? For the Monte Carlo numerical integration, which is based on “random numbers”, one cannot derive an apriori^{7} error bound on the integration error. It is only possible to state a probabilistic error bound, a shortcoming that is often not acceptable. The inequality of KoksmaHlawka on the other hand, is a hard bound on the integration error. Thus in order to get such a bound for the Monte Carlo method, one has to calculate the discrepancy for the numbers used, which is not feasible in practice. The way out is to use a set of numbers for which bounds on the discrepancy are known in advance, such as $\left(t,m,s\right)$nets or PRNGs for which such bounds are available.
On the other hand, if we want our PRNG to model a $U[0,1[$ distributed random variable as closely as possible, the law of the iterated logarithm provides us with the correct order of magnitude for the discrepancy. One can argue that any results concerning the discrepancy of a particular generator which shows a rate of growth close to $\mathcal{\mathcal{O}}\left({N}^{1\u22152}loglogN\right)$ is a sign for the right amount of “randomness” in the generator. Empirical evidence seems to support this argument. In any case, the discrepancy is certainly the most widely used ﬁgure of merit in theoretical analysis of pseudorandom number generation algorithms.
In order to prove discrepancy bounds, we need a variety of auxiliary results. Unfortunately it is not possible to include the proofs for these lemmata without exceeding the scope of this thesis, as well as alienating the target audience. Thus we will only list the results and give references to the proofs.
The basic approach to derive discrepancy bounds for PRNG is due to Niederreiter [70], who established a link between the discrepancy of a ﬁnite sequence of points with rational coordinates and certain exponential sums. As most common PRNG use integer arithmetic combined with a ﬁnal scaling operation, this approach is perfectly suited for the study of the output of such generators.
Niederreiter’s proof [70, §3.2] is elementary, although rather tricky. Hellekalek [39] gave a proof based on dyadic harmonic analysis. A detailed proof can be found in Weingartner [87].
In correspondence with the literature [71, 70], we will use the following deﬁnitions: For a prime $p\ge 5$ let ${C}_{s}^{\ast}\left(p\right)$ be the set of all nonzero $h=\left({h}_{1},\dots ,{h}_{s}\right)\in {\mathbb{Z}}^{s}$ with $\left{h}_{i}\right<p\u22152$ for $1\le i\le s$. For such $h$, put $r\left(h,p\right)={\prod}_{i=1}^{s}r\left({h}_{i},p\right)$ with $r\left(h,p\right)=psin\left(\pi \lefth\right\u2215p\right)$ for $h$ nonzero and $r\left(0,p\right)=1$. Furthermore, set $\chi \left(n\right)={e}^{2\pi \sqrt{1}\phantom{\rule{0em}{0ex}}n\u2215p}$ for $n\in {Z}_{p}$, and let $u\cdot v$ denote the standard inner product of $u,v\in {\mathbb{R}}^{s}$.
Although some of the Lemmata below do not need this restriction, we will consider only the case of prime moduli here.
Lemma 3.1 For a prime $p\ge 2$ and ${y}_{0},{y}_{1},\dots ,{y}_{N1}\in {\mathbb{Z}}_{p}^{s}$, let $\Omega $ be the ﬁnite sequence ${\left({x}_{n}={y}_{n}\u2215p\right)}_{n=0}^{N1}$. Then
The following Lemma is due to Niederreiter [71, Lemma 2] which improves [70, Corollary 3.11].
Lemma 3.2 Let $p\ge 5$ be a prime and let ${y}_{0},{y}_{1},\dots ,{y}_{N1}\in {\mathbb{Z}}_{p}^{s}$. Suppose the real number $B$ is such that
Then the discrepancy ${D}_{N}$ of the ﬁnite sequence ${\left({x}_{n}={y}_{n}\u2215p\right)}_{n=0}^{N1}$ satisﬁes
In order to derive the bound $B$ for the EICG we need the following two Lemmata, the ﬁrst one is due to Cochrane [7], the second is a variant of the BomberiWeil bound for exponential sums (see Moreno and Moreno [68, Theorem 2]):
Lemma 3.3 For any prime $p\ge 5$ and any integer $N$ we have
Lemma 3.4 Let $Q\u2215R$ be a rational function over ${\mathbb{Z}}_{p}$ which is not of the form ${A}^{p}A$ with $A\in {\overline{\mathbb{Z}}}_{p}\left(x\right)$. Let $s$ be the number of distinct roots of the polynomial $R$ in ${\overline{\mathbb{Z}}}_{p}$. Then we have
where ${s}^{\ast}=s$ and $\delta =1$ if $deg\left(Q\right)\le deg\left(R\right)$, and ${s}^{\ast}=s+1$ and $\delta =0$ otherwise.
The KoksmaHlawka inequality can be used to derive a lower bound on the discrepancy of a ﬁnite sequence of points. All that is needed is a function with a known integral and bounded variation. In light of the previous lemmata it seems natural to use a function for which the Monte Carlo integration can be expressed in terms of exponential sums. The following result is due to Niederreiter [70, Cor. 3.17].
Lemma 3.5 For a prime $p\ge 2$ and ${y}_{0},{y}_{1},\dots ,{y}_{N1}\in {\mathbb{Z}}_{p}^{s}$, let $\Omega $ be the ﬁnite sequence ${\left({x}_{n}={y}_{n}\u2215p\right)}_{n=0}^{N1}$. Then, for any nonzero $h\in {\mathbb{Z}}^{s}$, we have
where $m$ is the number of nonzero coordinates of $h$.
Unfortunately, we cannot prove a lower bound on the generic sum $\sum \chi \left(h\cdot {y}_{n}\right)$ for ﬁnite sequences of points generated by an EICG. But we are able to prove such bounds for a slightly diﬀerent exponential sum ${E}_{N}\left(\chi ;d,e\right)$, which we can link to sequences generated by an EICG, and which is a special case of the sum in Lemma 3.5. This approach is due to EichenauerHerrmann and Niederreiter [23]. All lemmata and theorems concerning lower bounds are taken from this paper.
For $d=\left({d}_{1},\dots ,{d}_{s}\right)\in {\mathbb{Z}}_{p}^{s}$ and $e=\left({e}_{1},\dots ,{e}_{s}\right)\in {\mathbb{Z}}_{p}^{s}$ we deﬁne
$${E}_{N}\left(\chi ;d,e\right):=\sum _{n=0}^{N1}\chi \left(\sum _{j=1}^{s}{d}_{j}\overline{n+{e}_{j}}\right)\phantom{\rule{2em}{0ex}}\text{for}1\le N\le p,$$  (3.11) 
and put $E\left(\chi ;d,e\right):={E}_{p}\left(\chi ;d,e\right)$.
Lemma 3.6 If $d\ne 0$ and ${e}_{1},\dots ,{e}_{s}$ distinct, then
and
See [23] (Theorem 1 and Corollary 1) for the proofs, which are similar to the proofs of Theorems 3.2 and 3.3. Besides these upper bounds, we know the average value of the ${E}_{N}\left(\chi ;d,e\right)$, according to the next Lemma.
Lemma 3.7 Let $1\le N\le p$ and $1\le k\le s$, $e,d\in {\mathbb{Z}}_{p}^{s}$ with ﬁxed ${d}_{k+1},\dots ,{d}_{s}$. Then we have
Proof: We set ${d}_{j,m,n}:={d}_{j}\left(\overline{n+{e}_{j}}\overline{m+{e}_{j}}\right)$. With $e=\left({e}_{1},\dots ,{e}_{s}\right)$ we get
$$\begin{array}{rcll}\sum _{{d}_{1},\dots ,{d}_{k}\in {\mathbb{Z}}_{p}}{E}_{N}\left(\chi ;d,e\right){}^{2}=& & & \text{}\\ & =& \sum _{{d}_{1},\dots ,{d}_{k}\in {\mathbb{Z}}_{p}}{\left\sum _{n=0}^{N1}\chi \left(\sum _{j=1}^{s}{d}_{j}\overline{n+{e}_{j}}\right)\right}^{2}& \text{}\\ & =& \sum _{{d}_{1},\dots ,{d}_{k}\in {\mathbb{Z}}_{p}}\sum _{n,m=0}^{N1}\chi \left(\sum _{j=1}^{s}{d}_{j,m,n}\right)& \text{}\\ & =& \sum _{n,m=0}^{N1}\sum _{{d}_{1},\dots ,{d}_{k}\in {\mathbb{Z}}_{p}}\chi \left(\sum _{j=1}^{s}{d}_{j,m,n}\right)& \text{}\\ & =& \sum _{n,m=0}^{N1}\sum _{{d}_{1},\dots ,{d}_{k}\in {\mathbb{Z}}_{p}}\prod _{j=1}^{s}\chi \left({d}_{j,m,n}\right)& \text{}\\ & =& \sum _{n,m=0}^{N1}\left(\prod _{j=k+1}^{s}\chi \left({d}_{j,m,n}\right)\right)\sum _{{d}_{1},\dots ,{d}_{k}\in {\mathbb{Z}}_{p}}\prod _{j=1}^{k}\chi \left({d}_{j,m,n}\right)& \text{}\\ & =& \sum _{n,m=0}^{N1}\chi \left(\sum _{j=k+1}^{s}\chi \left({d}_{j,m,n}\right)\right)\sum _{{d}_{1},\dots ,{d}_{k1}\in {\mathbb{Z}}_{p}}\sum _{{d}_{k}\in {\mathbb{Z}}_{p}}\prod _{j=1}^{k}\chi \left({d}_{j,m,n}\right)& \text{}\\ & =& \sum _{n,m=0}^{N1}\chi \left(\sum _{j=k+1}^{s}\chi \left({d}_{j,m,n}\right)\right)\sum _{\genfrac{}{}{0ex}{}{{d}_{1},\dots ,{d}_{k1}}{\in {\mathbb{Z}}_{p}}}\prod _{j=1}^{k1}\chi \left({d}_{j,m,n}\right)\sum _{d\in {\mathbb{Z}}_{p}}\chi \left(d\left(\overline{n+{e}_{k}}\overline{m+{e}_{k}}\right)\right)& \text{}\\ & \vdots & & \text{}\\ & =& \sum _{n,m=0}^{N1}\chi \left(\sum _{j=k+1}^{s}\chi \left({d}_{j,m,n}\right)\right)\prod _{j=1}^{k}\left(\sum _{d\in {\mathbb{Z}}_{p}}\chi \left(d\left(\overline{n+{e}_{j}}\overline{m+{e}_{j}}\right)\right)\right)& \text{}\\ & =& \sum _{\genfrac{}{}{0ex}{}{n,m=0}{n=m}}^{N1}\chi \left(0\right)\prod _{j=1}^{k}\left(\sum _{d\in {\mathbb{Z}}_{p}}\chi \left(0\right)\right)& \text{}\\ & =& N{p}^{k},& \text{}\end{array}$$because we have ${\sum}_{d\in {\mathbb{Z}}_{p}}\chi \left(d\cdot k\right)=0$ for $k\in {Z}_{p},k\ne 0$ and $p$ otherwise. __
Applying Lemma 3.5 on a ﬁnite sequence of points generated by parallel streams of EICG according to Deﬁnition 3.1, and $h=\left(1,1,0,\dots ,0\right)\in {\mathbb{Z}}^{s}$ we get for $s\ge 2$
$$\begin{array}{rcll}{D}_{N}^{\left(s\right)}\left({x}_{1},\dots ,{x}_{N}\right)& \ge & \frac{1}{2\left(\pi +2\right)N}\left\sum _{n=0}^{N1}\chi \left({y}_{n}^{\left(1\right)}+{y}_{n}^{\left(2\right)}\right)\right& \text{}\\ & =& \frac{1}{2\left(\pi +2\right)N}\left\sum _{n=0}^{N1}\chi \left(\overline{{a}_{1}n+{b}_{1}}+\overline{{a}_{2}n+{b}_{2}}\right)\right& \text{}\\ & =& \frac{1}{2\left(\pi +2\right)N}\left{E}_{N}\left(\chi ;d,e\right)\right& \text{(3.12)}\text{}\text{}\end{array}$$with $d=\left(\overline{{a}_{1}},\overline{{a}_{2}}\right)\in {\mathbb{Z}}_{p}^{2}$ and $e=\left({b}_{1}\overline{{a}_{1}},{b}_{2}\overline{{a}_{2}}\right)\in {\mathbb{Z}}_{p}^{2}$. Similarly, for $h=\left(1,0,\dots ,0\right)\in {\mathbb{Z}}^{s}$ and $s\ge 1$ we have
$${D}_{N}^{\left(s\right)}\left({x}_{1},\dots ,{x}_{N}\right)\ge \frac{1}{2N}\left{E}_{N}\left(\chi ;\overline{{a}_{1}},{b}_{1}\overline{{a}_{1}}\right)\right.$$  (3.13) 
We now have the tools necessary to derive upper and lower bounds on the discrepancy of ﬁnite sequences of points generated by parallel streams of EICG numbers. But as a reference, let us ﬁrst look at the result available for the LCG, which will once again serve as a reference.
According to [70, Theorem 7.4] we have for the multiplicative linear congruential generator the following statement: For $s\ge 2$ and for an average multiplier $a$, the discrepancy of the ﬁnite sequence of $s$dimensional points consisting of all $M1$ overlapping tuples from $lcg\left(M,a,0,1\right)$ obeys
where the implied constant depends only on $s$.
We ﬁrst turn our attention to upper bounds; we will prove both bounds for the full, as well as or parts of the period. These bounds are relevant in two ways:
Please note that the following theorems do not depend on the choice of any parameters; they are valid for every single full period EICG. This is in sharp contrast to the bounds known for the LCG, which is only deals with the average over all multipliers, and thus tells us nothing about a particular generator. Furthermore, in the case of the LCG no bounds are known for parts of the period in dimensions $s\ge 2$.
As one can guess from the lemmata listed above, the proofs involve exponential sums which need to be rearranged in a way to be able to use Lemma 3.4.
The following two theorems are due to Niederreiter [71].
Theorem 3.2 For $p\ge 5$, prime, and $2\le s<p$ set ${x}_{n}={y}_{n}\u2215p$ based on the ${y}_{n}$ of Deﬁnition 3.1. Then we have for the discrepancy of the ﬁnite sequence ${\left({x}_{n}\right)}_{n=0}^{p1}$ the following upper bound:
Proof: For $h=\left({h}_{1},\dots ,{h}_{s}\right)\in {C}_{s}^{\ast}\left(p\right)$ put
We now restrict the sum to those terms where ${y}_{n}^{\left(i\right)}\ne 0$ by using the same set $W$ as in the proof of Theorem 3.1 as the summation domain. By noting that $card\left({\mathbb{Z}}_{p}\setminus W\right)=s$ and using the triangle inequation we get
where $Q\u2215R$ is the rational function over ${\mathbb{Z}}_{p}$ given by
$$\frac{Q\left(x\right)}{R\left(x\right)}=\sum _{i=1}^{s}\frac{{h}_{i}}{{a}_{i}x+{b}_{i}}\phantom{\rule{1em}{0ex}}\text{with}R\left(x\right)=\prod _{i=1}^{s}\left({a}_{i}x+{b}_{i}\right).$$  (3.14) 
As all ${a}_{i}$ are nonzero, we have $deg\left(R\right)=s<p$. Furthermore, as at least one of the ${h}_{i}$ is nonzero, the uniqueness of the partial fraction decomposition for rational functions implies that $Q\ne 0$ and $deg\left(Q\right)<s=deg\left(R\right)$. In order to apply Lemma 3.4, we have to show that $Q\u2215R$ is not of the form ${A}^{p}A$ with $A\in {\overline{\mathbb{Z}}}_{p}\left(x\right)$, where ${\overline{\mathbb{Z}}}_{p}$ denotes the algebraic closure of the ﬁeld ${\mathbb{Z}}_{p}$, and ${\overline{\mathbb{Z}}}_{p}\left(x\right)$ denotes the ﬁeld of rational functions over ${\overline{\mathbb{Z}}}_{p}$. If this were the case, we would have
with polynomials $K,L$ over ${\overline{\mathbb{Z}}}_{p}$ and $gcd\left(K,L\right)=1$, and thus
$${L}^{p}Q=\left({K}^{p1}{L}^{p1}\right)KR.$$  (3.15) 
Since we have demanded $gcd\left(K,L\right)=1$, $L$ cannot divide $K$ or $\left({K}^{p1}{L}^{p1}\right)$, thus ${L}^{p}$ must divide $R$. As $deg\left(R\right)=s<p$, that can only be the case if $L$ is a nonzero constant polynomial which implies $deg\left({L}^{p}\right)=1$. Comparing the degrees in (3.15) yields $deg\left(Q\right)\ge deg\left(R\right)$, which contradicts the degrees derived from (3.14). Thus we can apply Lemma 3.4, and this leads to
$$S\left(h\right)\le \left(2s2\right){p}^{1\u22152}+s+1\phantom{\rule{1em}{0ex}}\text{forall}h\in {C}_{s}^{\ast}\left(p\right).$$  (3.16) 
The rest follows from Lemma 3.2. __
Theorem 3.3 For $p\ge 5$, prime, and $2\le s<p$ let the ﬁnite sequence ${\left({x}_{n}={y}_{n}\u2215p\right)}_{n=0}^{p1}$ be like in Theorem 3.2. Then we have for the discrepancy of the ﬁrst $N$ points the following upper bound:
$$\begin{array}{rcll}{D}_{N}^{\left(s\right)}<& & 1{\left(11\u2215p\right)}^{s}& \text{}\\ & +& \left(\frac{2s2}{{p}^{1\u22152}}+\frac{s+1}{p}+\frac{s}{N}\left(2{p}^{1\u22152}+1\right)\left(\frac{4}{{\pi}^{2}}logp+0.38+\frac{0.64}{p}\right)\right)\cdot & \text{}\\ & \cdot & {\left(\frac{4}{{\pi}^{2}}logp+1.38+\frac{0.64}{p}\right)}^{s}& \text{}\end{array}$$
Proof: Just as in the proof of Theorem 3.2 we need to derive a bound for an exponential sum to be able to apply Lemma 3.2. This time the summation domain is not ${\mathbb{Z}}_{p}$, thus we need to rewrite the sum in a rather tricky way.
For $h=\left({h}_{1},\dots ,{h}_{s}\right)\in {C}_{s}^{\ast}\left(p\right)$ put
with ${r}_{n}={\sum}_{i=1}^{s}{h}_{i}{y}_{n}^{\left(i\right)}$ for $n\ge 0$. We can now rewrite ${S}_{N}\left(h\right)$ using the fact that ${\sum}_{u\in {\mathbb{Z}}_{p}}\chi \left(uk\right)$ evaluates to $0$ for $k\ne 0$ and to $p$ otherwise.
$$\begin{array}{rcll}{S}_{N}\left(h\right)& =& \sum _{n=0}^{N1}\chi \left({r}_{n}\right)& \text{}\\ & =& \sum _{n=0}^{p1}\chi \left({r}_{n}\right){1}_{\left\{0,\dots ,N1\right\}}\left(n\right)& \text{}\\ & =& \sum _{n=0}^{p1}\chi \left({r}_{n}\right)\sum _{t=0}^{N1}{\delta}_{t,n}& \text{}\\ & =& \sum _{n=0}^{p1}\chi \left({r}_{n}\right)\sum _{t=0}^{N1}\frac{1}{p}\sum _{u=0}^{p1}\chi \left(u\left(nt\right)\right)& \text{}\\ & =& \frac{1}{p}\sum _{u=0}^{p1}\sum _{t=0}^{N1}\sum _{n=0}^{p1}\chi \left({r}_{n}\right)\chi \left(ut\right)\chi \left(un\right)& \text{}\\ & =& \frac{1}{p}\sum _{u=0}^{p1}\left(\sum _{t=0}^{N1}\chi \left(ut\right)\right)\left(\sum _{n\in {\mathbb{Z}}_{p}}\chi \left({r}_{n}+un\right)\right)& \text{}\\ & =& \frac{N}{p}S\left(h\right)+\frac{1}{p}\sum _{u=1}^{p1}\left(\sum _{t=0}^{N1}\chi \left(ut\right)\right)\left(\sum _{n\in {\mathbb{Z}}_{p}}\chi \left({r}_{n}+un\right)\right)& \text{}\end{array}$$In the last line the summand for $u=0$ was pulled out; the term $S\left(h\right)$ is the same as in the proof of Theorem 3.2. As we need an upper bound on $\left{S}_{N}\left(h\right)\right$, we apply the triangle inequation, yielding
$$\left{S}_{N}\left(h\right)\right\le \frac{N}{p}\leftS\left(h\right)\right+\frac{1}{p}\sum _{u=1}^{p1}\left\sum _{t=0}^{N1}\chi \left(ut\right)\right\left\sum _{n\in {\mathbb{Z}}_{p}}\chi \left({r}_{n}+un\right)\right,$$  (3.17) 
and examine each of the terms on the right side.
We want to apply Lemma 3.4 on the rightmost term: For $1\le u\le p1$ we have, by the same argument as following (3.14)
where $Q\u2215R$ is the rational function over ${\mathbb{Z}}_{p}$ given by
Once again, we claim that $Q\u2215R$ is not of the form ${A}^{p}A$ with a rational function $A\in {\overline{\mathbb{Z}}}_{p}\left(x\right)$. From the deﬁnition of $R$ and $Q$ we have $deg\left(R\right)=s$ and $deg\left(Q\right)=s+1$, as all the neither the ${a}_{i}$ nor $u$ can be zero. For if we had
with polynomials $K,L$ over ${\overline{\mathbb{Z}}}_{p}$ and $gcd\left(K,L\right)=1$, then the argument following (3.15) shows that $L$ is a nonzero constant polynomial. Thus we have
for suitable ${e}_{1},{e}_{2}\in {\overline{\mathbb{Z}}}_{p}$ with ${e}_{1},{e}_{2}\ne 0$. Comparing the degrees of the polynomials in this equation we get $deg\left({e}_{1}{K}^{p}+{e}_{2}K\right)=1$, which implies $deg\left(K\right)\ge 1$, hence $deg\left({e}_{1}{K}^{p}+{e}_{2}K\right)=pdeg\left(K\right)>1$. This contradiction proves that we can apply Lemma 3.4, yielding
$$\left\sum _{n\in {\mathbb{Z}}_{p}}\chi \left({r}_{n}+un\right)\right\le s\left(2{p}^{1\u22152}+1\right)\phantom{\rule{2em}{0ex}}\text{for}1\le u\le p1.$$  (3.18) 
Furthermore, by rewriting the sum over $t$ using some elementary equivalences like $\chi \left(x\right)1={e}^{2\pi \sqrt{1}x}1={e}^{\pi \sqrt{1}x}\cdot \left({e}^{\pi \sqrt{1}x}{e}^{\pi \sqrt{1}x}\right)={e}^{\pi \sqrt{1}x}\cdot 2\sqrt{1}sin\pi x$, we get
$$\begin{array}{rcll}\left\sum _{t=0}^{N1}\chi \left(ut\right)\right& =& \left\sum _{t=0}^{N1}{e}^{2\pi \sqrt{1}ut\u2215p}\right& \text{}\\ & =& \left\sum _{t=0}^{N1}{\left({e}^{2\pi \sqrt{1}u\u2215p}\right)}^{t}\right& \text{}\\ & =& \left\frac{{e}^{2\pi \sqrt{1}uN\u2215p}1}{{e}^{2\pi \sqrt{1}u\u2215p}1}\right& \text{}\\ & =& \left\frac{{e}^{\pi \sqrt{1}uN\u2215p}\phantom{\rule{0em}{0ex}}2\sqrt{1}}{{e}^{\pi \sqrt{1}u\u2215p}\phantom{\rule{0em}{0ex}}2\sqrt{1}}\right\cdot \left\frac{sin\left(\pi uN\u2215p\right)}{sin\left(\pi u\u2215p\right)}\right& \text{}\\ & =& \left\frac{sin\left(\pi uN\u2215p\right)}{sin\left(\pi u\u2215p\right)}\right.& \text{(3.19)}\text{}\text{}\end{array}$$We now return to (3.17) to put the pieces together. Combining everything with an application of Lemma 3.3 we get
$$\begin{array}{rcll}\left{S}_{N}\left(h\right)\right\le & & & \text{}\\ & \stackrel{\left(3.17\right)}{\le}& \frac{N}{p}\leftS\left(h\right)\right+\frac{1}{p}\sum _{u=1}^{p1}\left\sum _{t=0}^{N1}\chi \left(ut\right)\right\left\sum _{n\in {\mathbb{Z}}_{p}}\chi \left({r}_{n}+un\right)\right& \text{}\\ & \stackrel{\left(3.18\right)}{\le}& \frac{N}{p}\leftS\left(h\right)\right+s\left(2{p}^{1\u22152}+1\right)\frac{1}{p}\sum _{u=1}^{p1}\left\sum _{t=0}^{N1}\chi \left(ut\right)\right& \text{}\\ & \stackrel{\left(3.19\right)}{=}& \frac{N}{p}\leftS\left(h\right)\right+s\left(2{p}^{1\u22152}+1\right)\frac{1}{p}\sum _{u=1}^{p1}\left\frac{sin\left(\pi uN\u2215p\right)}{sin\left(\pi u\u2215p\right)}\right& \text{}\\ & \stackrel{L.3.3}{<}& \frac{N}{p}\leftS\left(h\right)\right+s\left(2{p}^{1\u22152}+1\right)\left(\frac{4}{{\pi}^{2}}\phantom{\rule{0em}{0ex}}plogp+\left(0.38\right)p+0.64\right)& \text{}\\ & \stackrel{\left(3.16\right)}{\le}& \frac{N}{p}\left(\left(2s2\right){p}^{1\u22152}+s+1\right)+s\left(2{p}^{1\u22152}+1\right)\left(\frac{4}{{\pi}^{2}}\phantom{\rule{0em}{0ex}}plogp+\left(0.38\right)p+0.64\right).& \text{}\end{array}$$for all $h\in {C}_{s}^{\ast}\left(p\right)$, and thus we can apply Lemma 3.2 to obtain the desired upper bound on ${D}_{N}^{\left(s\right)}$. _
The theorems covering lower bounds are formulated in a diﬀerent way. Instead of giving hard bounds for all EICG, these theorems state how many EICGs there must be exceeding a threshold value for the discrepancy. This kind of statement follows from the basic approach to the problem, namely combining upper bounds on exponential sums with their average values.
Lower bounds guarantee that the PRN are not perfectly equidistributed, they contain the irregularities found in “random” numbers, too.
The following three theorems are due to EichenauerHerrmann and Niederreiter [23].
Theorem 3.4 Let ${a}_{2}\in {\mathbb{Z}}_{p}^{\ast}$,${b}_{2}\in {\mathbb{Z}}_{p}$, and $c\in {\mathbb{Z}}_{p}\setminus \left\{{b}_{2}\overline{{a}_{2}}\right\}$ be ﬁxed. Let $0<t\le \sqrt{p\u2215\left(p1\right)}$, and set
Then there exist more than ${A}_{p}\left(t\right)$ values of ${a}_{1}\in {\mathbb{Z}}_{p}^{\ast}$ such that for $s$tuples from the corresponding parallel stream of EICG numbers with ${b}_{1}={a}_{1}c$ and $s\ge 2$ we have
Proof: We rewrite the theorem in terms of ${E}_{N}$ by using (3.12). Thus we have $d=\left({d}_{1},{d}_{2}\right)=\left(\overline{{a}_{1}},\overline{{a}_{2}}\right)$, and $e=\left({e}_{1},{e}_{2}\right)=\left({b}_{1}\overline{{a}_{1}},{b}_{2}\overline{{a}_{2}}\right)\in {\mathbb{Z}}_{p}^{2}$ with ${e}_{1}\ne {e}_{2}$, and we need to show that there are more than ${A}_{p}\left(t\right)$ values for ${d}_{1}\in {\mathbb{Z}}_{p}^{\ast}$ such that $\leftE\left(\chi ;d,e\right)\right\ge t{p}^{1\u22152}$.
Now suppose there exist at most ${A}_{p}\left(t\right)$ values of ${d}_{1}\in {\mathbb{Z}}_{p}^{\ast}$ with $\leftE\left(\chi ;d,e\right)\right\ge t{p}^{1\u22152}$, i.e., there exist at least $\left(p1\right){A}_{p}\left(t\right)$ values of ${d}_{1}$ with $\leftE\left(\chi ;d,e\right)\right<t{p}^{1\u22152}$. From Lemma 3.6 (with $s=2$) we know that $\leftE\left(\chi ;d,e\right)\right\le 2{p}^{1\u22152}+3$ for every ${d}_{1}\in {\mathbb{Z}}_{p}^{\ast}$. Hence, observing that ${d}_{1}=0$ contributes nothing to the sum, we obtain
which contradicts Lemma 3.7 (with $s=2$, $k=1$, and $N=p$). _
Theorem 3.5 Let ${a}_{2}\in {\mathbb{Z}}_{p}^{\ast}$,${b}_{2}\in {\mathbb{Z}}_{p}$, $c\in {\mathbb{Z}}_{p}\setminus \left\{{b}_{2}\overline{{a}_{2}}\right\}$ and an integer $N$ with
be ﬁxed and set
$$\begin{array}{rcll}{\tau}_{N}& :=& \frac{p}{p1}\frac{1}{N\left(p1\right)}{\left(2{p}^{1\u22152}\left(\frac{4}{{\pi}^{2}}logp+0.38+\frac{0.64}{p}\right)+2\right)}^{2}& \text{}\\ {A}_{N}\left(t\right)& :=& \frac{N\left(p1\right)\left({\tau}_{N}{t}^{2}\right)}{{\left(4{p}^{1\u22152}\left(\frac{4}{{\pi}^{2}}logp+0.38+\frac{0.64}{p}\right)+\frac{N}{p}\left(2{p}^{1\u22152}+1\right)+2\right)}^{2}N{t}^{2}}& \text{}\end{array}$$for $0<t\le \sqrt{{\tau}_{N}}$. Then there exist more than ${A}_{N}\left(t\right)$ values of ${a}_{1}\in {\mathbb{Z}}_{p}^{\ast}$ such that for $s$tuples from the corresponding parallel stream of EICG numbers with ${b}_{1}={a}_{1}c$ and $s\ge 2$ we have
Proof: The proof is analogous to the last one. The only real diﬀerence is the handling of ${d}_{1}=0$, where we need to apply Lemma 3.6 with $s=1$. _
Using (3.13) instead of (3.12) we get a slightly diﬀerent result. As the proof contains no new ideas, we omit it, too.
Theorem 3.6 Let $c\in {\mathbb{Z}}_{p}$ and $1\le N<p$ be ﬁxed. For real numbers $t$ fulﬁlling $0<t\le \sqrt{\left(pN\right)\u2215\left(p1\right)}$ set
Then there exist more than ${B}_{N}\left(t\right)$ values of ${a}_{1}\in {\mathbb{Z}}_{p}^{\ast}$ such that for $s$tuples from the corresponding parallel stream of EICG numbers with ${b}_{1}={a}_{1}c$ and $s\ge 2$ we have
Restricting oneself to the ordinary serial test, i.e. is considering only overlapping $s$tuples instead of vectors from parallel streams, it is possible to improve Theorem 3.6. See [23, Corr. 9] for details.
For congruential generators modulo a prime $p$, the following deﬁnition, due to Niederreiter [70], speciﬁes another criteria which can be used to classify PRNGs.
Deﬁnition 3.4 For given $s\ge 1$, a congruential generator producing the sequence ${y}_{0},{y}_{1},\dots \in {\mathbb{Z}}_{p}$ passes the $s$dimensional lattice test if the vectors ${y}_{n}{y}_{0},n=1,2,\dots $, span the $s$dimensional vector space ${\mathbb{Z}}_{p}^{s}$, where
Theorem 3.7 An EICG with modulus $p$ passes the $s$dimensional lattice test exactly for all $s\le p2$. This is the optimal behaviour under this test.
Proof: As explained in 1.3.4, one can visualize any congruential generator modulo $p$ with period length $p$ as a permutation polynomial mapping $n$ to ${y}_{n}$. In the case of the EICG, this polynomial has degree $d=p2$ as we can write the the EICG formula as
according to the theorem of EulerFermat. The rest follows immediately from a theorem by Eichenauer, Grothe, and Lehn [13], which can also be found in Niederreiter [70, Theorem 8.2]. __
As explained is section 2.3.2, testing a pseudorandom number generator is a tricky task. Even if one decides which properties one wants to test for, designing the test itself involves a fair amount of statistic knowledge as to how the test should be parameterized as well as how the results should be judged.
In this chapter, we will try to give a survey of empirical tests concerning the EICG carried out by various authors. Since it will be inappropriate to spend too much time on discussing all design decisions in detail, we will only describe the motivation, the test procedure, and the results. For more information we refer to the original authors.
Another compilation of empirical test results concerning inversive generators can be found in [21].
The Digit Test, due to Leeb^{1} [58, 26, 61], tries to assess the distribution quality of a PRNG by looking at the $g$adic representation of its output. If we only allow $g={2}^{l}$, one digit in base $g$ corresponds to $l$ binary digits, which we can obtain by cutting out $l$ consecutive bits in the computer representation of the numbers. This is a very eﬃcient procedure which can be done by simple bitwise AND and shift operations. Let’s visualize this by an example, using $8={2}^{3}$ as the base and selecting the second digit which is corresponds to the three bits starting at position $k=4$.
Decimal  Binary  Base 8  selected digit 
0.5859375  0.100 101 10  0.454  5 
0.82421875  0.110 100 11  0.646  4 
0.21484375  0.001 101 11  0.156  5 
0.765625  0.110 001 00  0.610  1 
0.16015625  0.001 010 01  0.122  2 
0.48046875  0.011 110 11  0.366  6 
Ideally, we expect the last column to be uniformly distributed over all possible digits. Furthermore, we can assume that if the original numbers are uncorrelated, this will hold for the sequence of digits, too. On the other hand, if we can prove that something is wrong with the digit sequence, this does not shed a good light on the original numbers.
What have we gained by mapping numbers to digits ? Basically this mapping plays the role of the “bins” discussed on page § and §. As discussed there, this mapping is used to make the problem manageable by drastically reducing the number of possible values for each number. Now we can easily count the occurrences of each diﬀerent digit and perform correlation tests on them.
As a correlation test Leeb used the idea of the serial test (see p. §) with nonoverlapping $s$tuples. To measure the distribution of the $s$tuples, a ${\chi}^{2}$ test was used; the resulting ${\chi}^{2}$value was called ${t}_{1}\left(s,k,l\right)$, the level1 test statistic. This procedure was repeated 64 times, and all the ${\chi}^{2}$ values were compared to their expected distribution by a twosided KolmogorovSmirnov test, yielding ${t}_{2}\left(s,k,l\right)$, the level2 test statistic on which we will focus in the graphics.
Let’s summarize all the parameters which must be speciﬁed to turn the abstract idea of the Digit Test into a computer program.
It should be clear by now that the hierarchical design of the complete testing procedure results in huge resource requirements both computationally as well as memorywise to actually run the test. See Table 4.1 for the actual parameters used, and Table 4.2 for the generators tested. The computations were carried out using the pLab [57] PRNG testing framework, which is based on the author’s generator library.


For the graphics, the ${t}_{1}$ value was transformed according to the expected ${\chi}^{2}$ distribution to yield a value which should be asymptotically equidistributed. Thus all points in the graphics on the left hand side should vary freely between 0 and 1. Values close to 0 signify a distribution of the $s$tuples which is much to wellbalanced, whereas values close to 1 indicate gross irregularities. The right hand graphics depict the KSstatistic ${t}_{2}$, for which the critical region at the $1\%$ level of signiﬁcance is $\left[1.63,\infty \right)$. Any generator which features high values there (especially when reaching the cutoﬀ point 2 in the graphics) fails in the test.
As Leeb in [58], we present here only a selection of the results, namely those for dimension $s=3$. For each generator, the values of $k$ and $l$ were varied.
Interpretation: The digit test seems to be sensible to intrinsic properties of the LCG, as even the best one (FISH) fails the test for certain parameters. Leeb conjectures in [58] that the digit test is sensible on grid structures or longrange correlation as these are two features which are present in all LCG but are proven to be absent in inversive generators. Especially the lattice quality parameter $1\u2215{\nu}_{s}$ seems to correlate with the digit test results. The better the lattice is (small values for $1\u2215{\nu}_{s}$), the higher the values for $l$ and $k$ must be to uncover deﬁciencies in the generator.
The overlapping serial test, ﬁrst proposed by Marsaglia [66] as the “Mtuple test”, is in its basic setup quite similar to the digit test described above. The main diﬀerence is to use overlapping tuples. This modiﬁcation is rather small in the implementation, but requires some statistical work to calculate the expected distribution as the tuples are no longer independent.
We will describe here the empirical tests done by Wegenkittl [85, 86, 61]. As the testing procedure was very similar to the digit test setup, we will only list the diﬀerences here. Whereas Leeb in the digit test used only one way to extract bits from the stream of pseudorandom numbers, Wegenkittl used two:
This time these blocks of bits were used to generate variable number of tuples. Whereas in the digit test the sample size $M$ was always tuned to the subsequent ${\chi}^{2}$ test, Wegenkittl generated up to $M={2}^{26}$ tuples. From these tuples, a modiﬁed ${\chi}^{2}$ statistic ${t}_{1}^{\left(o\right)}$ was computed, resulting in the teststatistic ${\chi}_{o}$. Whereas the “normal ${\chi}^{2}$ test statistic” does not converge to a ${\chi}^{2}$ distribution for overlapping tuples due to the correlations, this modiﬁed one does (see [85, p. 57] for details). This procedure was repeated 32 times and the resulting empirical distribution of the ${\chi}_{o}$ values was compared to the theoretical one using a KS test.
The following graphics^{2} depict the results for all the generators used in the digit test, as well as for EICG7 which stands for $eicg\left({2}^{31}1,7,0\right)$.
The left hand diagrams show the values for each of the $32$ calculated ${t}_{1}^{\left(o\right)}$ test statistics as the lightness of each small rectangle. A white square signiﬁes a low value for ${t}_{1}^{\left(o\right)}$, meaning perfect equidistribution of the tuples, whereas a black one signiﬁes extreme deviations. The transformation ${t}_{1}^{\left(o\right)}\mapsto $ lightness was chosen in such way that each grayscale level should be equally likely. Unfortunately, these diagrams are not available for all parameters.
In the right hand diagrams these $32$ values were distilled into one single KS value representing the quality of their distribution. The critical region at the $1\%$ level of signiﬁcance is in this case $\left[1.58,\infty \right)$.
Interpretation: Like the Digit Test, the overlapping serial test does uncover deﬁciencies in the linear generators. Whereas the setup in the Digit Test focused on the number of bits you can take from a number while still getting good distributions, Wegenkittl turned his attention to the number of tuples one can generate before the PRNG fails. The results are basically the same: even the best LCG has a limited loadcapability; if you start to do headyduty computations using many PRN you must be careful not to run into the breaking point of the LCG. The traditionally rule of thumb concerning which percentage of the period length one can safely use (up to $\sqrt{\text{period}}$) seems to be sound for the LCG.
The inversive generators are not perfect either, even they tend to fail the test at some point. But their loadcapability is much better. Even when taking a high number of samples, their distribution quality decreases quite slowly.
A completely diﬀerent kind of empirical test is the run test. The basic idea behind this test is to check if the occurrences of runs conforms to its expected value. There is a variety of diﬀerent ways to implement this idea, but we will only describe the idea and Entacher’s implementation [24].
First of all, what do we mean by “runs” ? In a binary context, that is sequence consisting only of two symbols, a run is deﬁned as a subsequence consisting only of one symbol. For example, consider the sequence
in which the brace indicate the runs.
As the common pseudorandom sequences are far from being binary, one has to transform them ﬁrst. A straight forward way of doing this is
which is binary as ${x}_{n+1}\ne {x}_{n}$ for PRNG we consider. A run of 1s of length $k$ corresponds to a monotonically increasing subsequence of length $k+1$ in the original sequence called a “run up”. The distribution properties of these runs in a sequence of “really random numbers” is well known.
According to Wolfowitz [63] (see also Knuth [50, p. 68]) Entacher constructed an asymptotically ${\chi}^{2}$distributed test statistic ${U}_{r}$ which involves counts for ascending runs up to length $r$. The calculation done by Entacher involved evaluating ${U}_{6}$ 100 times for each generator and testing this empirical distribution against the expected one using a KS test.
We will focus on Entacher’s results concerning the behaviour of LCGs and EICGs. The list of generators tested include the previously deﬁned generators FISH, ANSI, MINSTD, and RANDU, as well as two other LCGs with a bad lattice structure: LCG5$=lcg\left({2}^{31},{2}^{31}3,0,1\right)$ and LCG6$=lcg\left({2}^{31}1,{2}^{21}+1,0,1\right)$. As examples for the EICG Entacher used $eicg\left({2}^{31}1,a,0,0\right)$ with parameter $a\in \left\{{2}^{5},{2}^{10},{2}^{15},{2}^{20},{2}^{25},{2}^{30}\right\}$ which he labeled EICG1 to EICG6.
The following ﬁgures^{3} depict the results; The sample size was varied between ${2}^{12}$ and ${2}^{24}$ (the labels are ${log}_{2}\left(N\right)$), the height of each square shows the resulting KS statistic. If the square is coloured dark gray, then the KS statistic exceeds the critical value for the signiﬁcance level 0.01.
Interpretation: The run test seems to be able to distinguish good LCGs from bad ones. All basically randomly chosen EICG pass the test without any problems, showing again that the EICG is not sensitive to the parameter selection.
Another conclusion from this test is that subsequences taken by a leap frog technique are save when using EICGs, whereas such sequences taken from an LCG may exhibit a bad lattice and can fail the run test. See an upcoming paper from Entacher [24] for details.
As mentioned on page §, the weighted spectral test is a promising new approach to assess the quality of pseudorandom numbers, see Hellekalek [40, 41], Hellekalek and Niederreiter [46], and Hellekalek and Leeb [44]. One numerical realization of the weighted spectral test is the diaphony, see Hellekalek and Niederreiter [46].
Both the diaphony [90] as well as the classic spectral test [9] approach the point set from a Fourier point of view, looking for any disturbances in the spectrum. If the point set has lattice structure (as in the case of LCGs), the ﬁrst wavelength which yields a nonzero Fourier coeﬃcient corresponds to the largest distance between hyperplanes. Where the classic spectral test targets just this wavelength, the diaphony tries to include more information, namely a weighted sum over all possible wavelengths. High wavelengths (which correspond to low frequencies) in the point set indicate a large scale imbalance, whereas high frequencies target ﬁne structures in the set. As these ﬁne structures are unavoidable at the certain point, higher frequencies are considered less important and thus will contribute little to the diaphony.
So basically the $s$dimensional diaphony is a weighted sum over the correlation coeﬃcients of the point set, which are basically the Weyl sums ${S}_{N}$ we encountered in Section 3.3, see [41] for details.
There are close ties between the discrepancy and the diaphony. One can bound one in terms of the other (see Stegbuchner [79]), one can interpret it, too, as integration error (see James, Hoogland and Kleiss [47]), and it is possible to derive Lemmata similar to those in section 3.3.2 (see [41]).
Whereas it is virtually impossible to do any reasonable empirical studies with discrepancy due to its computational complexity of $\mathcal{\mathcal{O}}\left({N}^{s}\right)$, the diaphony only needs $\mathcal{\mathcal{O}}\left(s\cdot {N}^{2}\right)$ steps to calculate it. Thus it is possible to conduct empirical test using this ﬁgure of merit. In the following we will describe the results obtained by Hellekalek [40].
The test procedure was as follows: For a given generator, for a given dimension $s$, and a given sample size $N$, the diaphony ${F}_{N}^{2}$ was evaluated for 20 samples of nonoverlapping $s$tuples. As the expected value for ${F}_{N}^{2}$ equals $1\u2215N$, Hellekalek multiplied the diaphony by $N$ to get the same expected value for all sample sizes, see Leeb [60] for the theoretical distribution.
In the graphics^{4}, the abscissa shows the sample size as ${log}_{2}N$ and the ordinate the average value over the 20 samples. For each generator from the now familiar set (see Table 4.2) the calculation was done in dimensions 2 ($\u25b3$), 3 ($\circ $), 4 ($\star $), 5 ($\u2662$), and dimension 6 ($\mid $).
Frank Härtel [37] implemented an impressive array of PRNG and compared their performance under a variety of statistical tests. Unfortunately he used only one EICG ($eicg\left(1000081,240318,197,0\right)$), whose modulus ($p<{2}^{20}$) is far too small to be able to compete with generators featuring a period length of ${2}^{32}$. We will therefore not elaborate on his results.
Both EichenauerHerrmann and L’Ecuyer presented results of empirical test concerning the minimal distance between vectors of PRN on the MC&QCM’96 conference in Salzburg. Although the results have not been published yet, one can summarize them as follows: due to their lattice structure, LCGs are not able to simulate the correct behaviour of the test statistic whereas EICGs with the same period length pass this test with ﬂying colors. I refer to the upcoming proceedings for details.
This section discusses the implementation of the EICG pseudorandom number generator using a standard procedural programming language. We will use C syntax for the code printed here.
The algorithms presented here were used to write a generic and portable PRNG library which implements not only the EICG, but other congruential generators, too. This library (written in ANSI C) is available on the Internet from the pLab WWW server at http://random.mat.sbg.ac.at/.
If we look at the deﬁnition of the EICG (see p. 1.3.4), we see that generating the numbers is a two step procedure. First we have to compute the ${y}_{n}:=\overline{a\left({n}_{0}+n\right)+b}$, a calculation operating in the ﬁnite ﬁeld ${\mathbb{Z}}_{p}$ which can be done by standard integer calculation modulo $p$. The second step is the scaling operation ${x}_{n}:={y}_{n}\u2215p$, which we will implement as a straight forward ﬂoating point division.
We will thus focus on the ﬁrst step, which includes the following three operations
How they are best implemented depends on how numbers are represented in the computer. We use integer arithmetic based on the native integer format of the computer. Most current workstations use 32bit integers which can hold numbers from ${2}^{31}$ to ${2}^{31}1$. This limits the choice of the modulus to values smaller than ${2}^{31}$, a common choice is the Mersenne prime ${2}^{31}1$. Large values for $p$ give the resulting PRN a ﬁne resolution, but one has to pay for this with an increase in calculation time. If this resolution, as well as the period length it implies, are not adequate, one can use the technique of combining generators (see p. §), to generate even better pseudorandom numbers.
First we turn to the problem of modular inversion (denoted throughout this text by overlining ($\overline{a}$) the operand) which is deﬁned as
where ${a}^{1}$ is the uniquely deﬁned element of ${\mathbb{Z}}_{p}^{\ast}={\mathbb{Z}}_{p}\setminus \left\{0\right\}$ such that $a\overline{a}modp=1$.
The special case $a=0$ is easily handled, what remains is to ﬁnd ${a}^{1}$ for $a\ne 0$. There are two diﬀerent ways to compute the inverse, one of them is to utilize the fact that
by the wellknown theorem of Euler–Fermat, and thus ${a}^{\phi \left(p\right)1}\equiv \overline{a}\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}p\right)$ holds. In our case here the modulus is prime, thus $\phi \left(p\right)$, Euler’s totient function, is equal to $p1$, which gives us
Evaluating ${a}^{b}\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}p\right)$ is a wellknown exercise in computational number theory (e.g. in the RSA cryptosystem). It can be solved in logarithmic time [8, p. 829], but intermediate results exceed the domain $\left[p,p\right]$. This fact renders an implementation diﬃcult because of the limited integers available on a computer.
A diﬀerent approach is to use an extended version of Euclid’s algorithm. This algorithm is usually used to calculate the greatest common divisor of two numbers, but it can also be used to calculate the integers $x$ and $y$ which fulﬁll the linear diophantic equation $ax+by=gcd\left(a,b\right)$. By substituting $p$ for $b$ and observing that $gcd\left(a,p\right)=1$, one gets
which can be rewritten as
This algorithm to calculate $x$ and $y$ is based on the following recursion: The division with remainder $a=qb+r$ is used to calculate $q$ and $r$. If we can ﬁnd ${x}^{\prime}$ and ${y}^{\prime}$ which fulﬁll ${x}^{\prime}b+{y}^{\prime}r=1$, then $x={y}^{\prime}$ and $y={x}^{\prime}q{y}^{\prime}$ will satisfy $xa+yb=1$. Since $b<a$ and $r<b$ the question how to ﬁnd the ${x}^{\prime}$ and ${y}^{\prime}$ will lead to a trivial case.
Figure 5.1 shows a straightforward recursive implementation of the extended Euclid’s algorithm. Figure 5.2 demonstrates how the modular inversion can be based on rec_eeuclid.

This recursive implementation is not very eﬃcient due to the overhead caused by the repeated function calls. Although rec_eeuclid is not endrecursive, it is possible to rewrite it as an iterative function [50]. Figure 5.3 is a C implementation of the modular inversion using this method. A further optimization [77, p. 521] is to unroll the loop twice to avoid unnecessary swapping of the variables in each iteration.

Gordon [35] describes a modiﬁcation which uses shift operations to avoid multiplications and divisions. This does pay on certain computers, for example on SPARC, R4000, or Alpha AXP based systems this approach is faster. On the other hand, the division on the i486 is comparatively fast, thus the original version runs faster there.
The number of recursive calls in Euclid’s algorithm is of the order $O\left(lgb\right)$, see [8, p. 810]. An equivalent statement is, that the arguments of these calls decrease exponentially. Another way to put this is that rec_eeuclid will need about the same number of recursive calls to get from $b\approx 16384$ to $b\approx 128$ as it needs from $b\approx 128$ to the end of the recursion.
This observation leads to another technique to speed up the computation of the multiplicative inverse: For all $a,b$ smaller than some threshold we use a table of precomputed values for $x$ and $y$ instead of continuing the recursion. This way some recursive calls can be avoided.

To test the advantages of this approach, we compared it with the optimized iterative implementation. Table 5.1 lists the timings of my test case, that is calculating the multiplicative inverses of 2147483 uniformly distributed numbers modulo ${2}^{31}1$.
As long as the threshold is not larger than 256, a byte is enough to hold an element of the table. A threshold of 512 forces the program to resort to 16bit integers which doubles the memory requirements. It is hard to make a general statement which threshold is best, too much depends on the cache size, memory access speed, and on memory available. We have set the default to 256, which seems to result in a reasonable tradeoﬀ between memory and speed.
Our experience has shown that there is no single “best” algorithm, too much depends on the relative execution speed of various elementary operations. Thus our implementation includes three diﬀerent algorithms as well as a proﬁling program which can be used to select the one which is running fastest on the user’s computer.
The problem of evaluating $a\cdot n\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}p\right)$ lies in the limited range of the integers available in common programming languages. The intermediate result $an$ of the straightforward implementation is very likely not to ﬁt in machine size integers, so one has to devise an algorithm to calculate $an\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}p\right)$ in which all intermediate results are representable on a $b$bit computer.
One approach is the following algorithm due to Bratley, Fox, and Schrage [4, Sec. 6.5.2] which can compute $an\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}p\right)$ if ${a}^{2}<p$. The idea is to factor the modulus, but since this is not possible with primes, one has to deal with remainders, too. Let
where
Then one can rewrite $an\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}p\right)$ as
$$\begin{array}{rcll}an\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}p\right)& =& anp\phantom{\rule{0em}{0ex}}\left(andivp\right)& \text{}\\ & =& anp\phantom{\rule{0em}{0ex}}\left(ndivq\right)+p\phantom{\rule{0em}{0ex}}\underset{\delta \left(n\right)}{\underbrace{\left(ndivqandivp\right)}}& \text{}\\ & =& anaq\phantom{\rule{0em}{0ex}}\left(ndivq\right)r\phantom{\rule{0em}{0ex}}\left(ndivq\right)+p\phantom{\rule{0em}{0ex}}\delta \left(n\right)& \text{}\\ & =& a\phantom{\rule{0em}{0ex}}\left(nq\phantom{\rule{0em}{0ex}}\left(ndivq\right)\right)r\phantom{\rule{0em}{0ex}}\left(ndivq\right)+p\phantom{\rule{0em}{0ex}}\delta \left(n\right)& \text{}\\ & =& a\phantom{\rule{0em}{0ex}}\left(nmodq\right)r\phantom{\rule{0em}{0ex}}\left(ndivq\right)+p\phantom{\rule{0em}{0ex}}\delta \left(n\right)& \text{}\\ & =& \gamma \left(n\right)+p\phantom{\rule{0em}{0ex}}\delta \left(n\right)& \text{}\end{array}$$If $r<q$, which is a direct consequence of ${a}^{2}<p$, evaluating $\gamma \left(n\right)$ does not pose a numerical problem, for
and
and thus $\left\gamma \left(n\right)\right\in \left\{0,\dots ,p1\right\}$. Since $an\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}p\right)\in \left\{0,\dots ,p1\right\}$ evaluating $\delta \left(n\right)$ is unnecessary, because $\delta \left(n\right)=0$ iﬀ $\gamma \left(n\right)\ge 0$ and $\delta \left(n\right)=1$ iﬀ $\gamma \left(n\right)<0$. This leads to the following algorithm:
$q$ and $r$ can be precomputed, calculating $\left(ndivq\right)$ and $\left(nmodq\right)$ requires on some computers only one instruction, so this is a very eﬃcient algorithm.
For the usual choice of ${2}^{31}1$ for $p$ the above can be applied for all $a<{2}^{15}$. If $p$ is smaller than that, then the limitation that all intermediate results should be between $p$ and $p$ is unnecessarily tight. On a $b$bit computer all integers between ${2}^{b1}$ and ${2}^{b1}$ (exclusive) are representable. If we loosen the restriction on $a$ from ${a}^{2}<p$ to ${a}^{2}<{2}^{b2}$, the term $r\phantom{\rule{0em}{0ex}}\left(ndivq\right)$ is no longer bounded by $p$. But it can be shown that ${2}^{b1}$ is an upper bound:
$$\begin{array}{rcll}r\phantom{\rule{0em}{0ex}}\left(ndivq\right)& <& r\phantom{\rule{0em}{0ex}}\left(pdivq\right)& \text{}\\ & =& r\phantom{\rule{0em}{0ex}}\left(\left(aq+r\right)divq\right)& \text{}\\ & \le & r\phantom{\rule{0em}{0ex}}\left(a+rdivq\right)& \text{}\\ & \le & a\phantom{\rule{0em}{0ex}}\left(a+adivq\right)& \text{}\\ & \le & 2{a}^{2}& \text{}\\ & <& {2}^{b1}& \text{}\end{array}$$We can now conclude that ${2}^{b1}<\gamma \left(n\right)<p$ and $an\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}p\right)=\gamma \left(n\right)\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}p\right)$. Thus one possible algorithm is this:
The while loop can execute at most $\lceil {2}^{b1}\u2215p\rceil $ times. L’Ecuyer and Côté [56] observed that in the average case very few iterations are executed, thus this algorithm is eﬃcient.
If $a$ is not restricted in any way (except of course by the requirement $a<p$) one can use decomposition to reduce the multiplication to cases for which we already have solutions. This can be achieved by writing $a$ in base ${2}^{d}$, where $d=\left(b2\right)\u22152$ (usually 15 on current 32bit computers):
with $0\le {a}_{0},{a}_{1}<{2}^{d}$, and ${a}_{2}\in \left\{0,1\right\}$. Then
$$\begin{array}{rcll}an\left(mod\phantom{\rule{0em}{0ex}}\phantom{\rule{0em}{0ex}}p\right)=& & & \text{}\\ & =& \left({a}_{0}n\right)modp+\left(\left({a}_{1}n\right)modp\right){2}^{d}modp+\left(\left({a}_{2}n\right)modp\right){2}^{2d}modp& \text{}\\ & =& \left(\left(\left(\left({a}_{2}{2}^{d}n\right)modp+\left({a}_{1}n\right)modp\right)modp\right){2}^{d}modp+{a}_{0}nmodp\right)modp.& \text{}\end{array}$$In all four products modulo $p$ one of the factors is bounded by ${2}^{d}$, so the previously discussed algorithm can be applied. Figure 5.4 shows a C implementation of this method. It is used whenever it is not possible to resort to a simpler algorithm.

Modular addition is simple to solve, though it is not trivial. A straightforward implementation might look like this:
This is correct, as long as $a+b$ is still fully representable with the data type used. Assuming that the usual signed integer type is used, an overﬂow would occur if $a+b$ is negative. It cannot happen that $a+b$ is positive in spite of an overﬂow, since $a$ and $b$ are both smaller than ${2}^{b1}1$ (assuming a $b$bit computer) and $\left({2}^{b1}1\right)+1$ wraps to ${2}^{b1}$. If we detect an overﬂow, subtracting $p$ will bring the result back into the interval $\left[0,p1\right]$. Thus this
is a correct implementation.
Let us quickly summarize the main points of this thesis.
Let $p$ be a (large) prime and $a,b,{n}_{0}\in {\mathbb{Z}}_{p}$. The explicit inversive congruential generator (abbreviated as “EICG”) with parameters $p,a,b,$ and ${n}_{0}$ deﬁnes a sequence ${\left({y}_{n}\right)}_{n\ge 0}$ in ${\mathbb{Z}}_{p}$ by
and a sequence $eicg\left(p,a,b,{n}_{0}\right)={\left({x}_{n}\right)}_{n\ge 0}$ of pseudorandom numbers in $[0,1[$ by
where $\overline{c}$ denotes the multiplicative inverse modulo $p$ extended by $\overline{0}:=0$.
For parts of the period we get similar results: as an upper bound we have ${D}_{N}^{\left(s\right)}=\mathcal{\mathcal{O}}\left({N}^{1}{p}^{1\u22152}{\left(logp\right)}^{s+1}\right)$, and we can show the existence of EICGs with ${D}_{N}^{\left(s\right)}\ge c{N}^{1\u22152}$ for some constant $c$.
[1] S.L. Anderson. Random number generators on vector supercomputers and other advanced architectures. SIAM Rev., 32:221–251, 1990.
[2] D.A. André, G.L. Mullen, and H. Niederreiter. Figures of merit for digital multistep pseudorandom numbers. Math. Comp., 54:737–748, 1990.
[3] K. Binder and D.W. Heermann. Monte Carlo Simulation in Statistical Physics. An Introduction. 2nd corr. ed. SpringerVerlag Heidelberg New York, 1992.
[4] P. Bratley, B. L. Fox, and L. E. Schrage. A Guide to Simulation. Springer Verlag, second edition, 1987.
[5] G.J. Chaitin. Randomness and mathematical proof. Sci. Amer., 232:47–52, 1975.
[6] WunSeng Chou. On inversive maximal period polynomials over ﬁnite ﬁelds. Appl. Algebra Engrg. Comm. Comput., 6:245–250, 1995.
[7] T. Cochrane. On a trigonometric inequality of Vinogradov. J. Number Th., 27:9–16, 1987.
[8] Thomas H. Corman, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. The MIT Press, ﬁrst edition, 1989.
[9] R.R. Coveyou and R.D. MacPherson. Fourier analysis of uniform random number generators. J. Assoc. Comput. Mach., 14:100–119, 1967.
[10] A. De Matteis and S. Pagnutti. Parallelization of random number generators and longrange correlations. Numer. Math., 53:595–608, 1988.
[11] G. Dueck and T. Scheuer. Threshold Accepting: A General Purpose Algorithm Appearing Superior to Simulated Annealing. Journal of Computational Physics, pages 161–175, 1990.
[12] W.F. Eddy. Random number generators for parallel processors. J. Comp. Appl. Math., 31:63–71, 1990.
[13] J. Eichenauer, H. Grothe, and J. Lehn. Marsaglia’s lattice test and nonlinear congruential pseudo random number generators. Metrika, 35:241–250, 1988.
[14] J. Eichenauer and J. Lehn. A nonlinear congruential pseudo random number generator. Statist. Papers, 27:315–326, 1986.
[15] J. EichenauerHerrmann. Statistical independence of a new class of inversive congruential pseudorandom numbers. Math. Comp., 60:375–384, 1993.
[16] J. EichenauerHerrmann. Nonoverlapping pairs of explicit inversive congruential pseudorandom numbers. Monatsh. Math., 119:49–61, 1995.
[17] J. EichenauerHerrmann. Modiﬁed explicit inversive congruential pseudorandom numbers with power of 2 modulus. Statistics and Computing, 6:31–36, 1996.
[18] J. EichenauerHerrmann and F. Emmerich. A review of compound methods for pseudorandom number generation. In P. Hellekalek, G. Larcher, and P. Zinterhof, editors, Proceedings of the 1st Salzburg Minisymposium on Pseudorandom Number Generation and QuasiMonte Carlo Methods, Salzburg, Nov 18, 1994, volume ACPC/TR 954 of Technical Report Series, pages 5–14. ACPC – Austrian Center for Parallel Computation, University of Vienna, Austria, 1995.
[19] J. EichenauerHerrmann and F. Emmerich. Compound inversive congruential numbers: an averagecase analysis. Math. Comp., to appear, 65:215–225, 1996.
[20] J. EichenauerHerrmann and H. Grothe. A remark on longrange correlations in multiplicative congruential pseudo random number generators. Numer. Math., 56:609–611, 1989.
[21] J. EichenauerHerrmann and E. Herrmann. A survey of quadratic and inversive congruential pseudorandom numbers. Submitted to Proceedings of the Second International Conference on Monte Carlo and QuasiMonte Carlo Methods in Scientiﬁc Computing, Salzburg, July 9–12, 1996.
[22] J. EichenauerHerrmann and K. Ickstadt. Explicit inversive congruential pseudorandom numbers with power of two modulus. Math. Comp., 62:787–797, 1994.
[23] J. EichenauerHerrmann and H. Niederreiter. Bounds for exponential sums and their applications to pseudorandom numbers. Acta Arith., 67:269–281, 1994.
[24] K. Entacher. Selected random number generators in run tests. Preprint, Institut für Mathematik, Universität Salzburg, Austria, 1996.
[25] K. Entacher and P. Hellekalek. Parallel stochastic simulation: inversive pseudorandom number generators. In G. De Pietro and P. Zinterhof A. Giordano, M. Vajteršic, editors, Proceedings of the International Workshop Parallel Numerics 95, Sorrento, Italy, September 27–29, 1995, pages 1–14. IRSIP Institute of the NRC of Italy, Naples, 1995.
[26] K. Entacher and H. Leeb. Inversive pseudorandom number generators: empirical results. In Proceedings of the Conference Parallel Numerics 95, Sorrento, Italy, September 27–29, 1995, 1995.
[27] G.S. Fishman. Multiplicative congruential random number generators with modulus ${2}^{\beta}$: an exhaustive analysis for $\beta =32$ and a partial analysis for $\beta =48$. Math. Comp., 54:331–344, 1990.
[28] G.S. Fishman and L.R. Moore. A statistical evaluation of multiplicative congruential random number generators with modulus ${2}^{31}1$. J. Amer. Statist. Assoc., 77:129–136, 1982.
[29] G.S. Fishman and L.R. Moore. An exhaustive analysis of multiplicative congruential random number generators with modulus ${2}^{31}1$. SIAM J. Sci. Statist. Comput. (see also the Erratum, ibid. 7(1986), p. 1058), 7:24–45, 1986.
[30] M. Flahive and H. Niederreiter. On inversive congruential generators for pseudorandom numbers. In G.L. Mullen and P.J.S. Shiue, editors, Finite Fields, Coding Theory, and Advances in Communications and Computing, pages 75–80. Dekker, New York, 1992.
[31] Mark Fleischer. Simulated Annealing: Past, Present, and Future. In Proceedings of the 1995 Winter Simulation Conference, pages 155–161, 1995.
[32] J. Foley, A. van Dam, S. Feiner, and J. Hughes. Computer Graphics: Principles and Practice. AddisonWesley, second edition, 1990.
[33] I. Goldberg and D. Wagner. Netscape SSL implementation cracked!
Available on the WWW at
http://tezcat.com/web/security/items/sslnews.txt.
[34] T. Gonzales, S. Sahni, and W.R. Franta. An eﬃcient algorithm for the KolmogorovSmirnov and Lillefors tests. ACM Trans. Mathem. Softw., 3:60–64, 1977.
[35] J. Gordon. Fast Multiplicative Inverse in Moduar Arithmetic. In H. J. Beker and F. S. Piper, editors, Cryptography and Coding. Oxford Clarendon Press, 1989.
[36] J.H. Halton. Pseudorandom trees: multiple independent sequence generators for parallel and branching computations. J. Comp. Physics, 84:1–56, 1989.
[37] F. Härtel. Zufallszahlen für Simulationsmodelle. PhD thesis, Hochschule St. Gallen für Wirtschafts, Rechts und Sozialwissenschaften, St. Gallen, 1994.
[38] S. Heinrich. Eﬃcient algorithms for computing the ${L}_{2}$ discrepancy. Interner Bericht, Fachbereich Informatik, Universität Kaiserslautern, 1995.
[39] P. Hellekalek. Study of algorithms for primitive polynomials. Report D5H1, CEIPACT Project, WP5.1.2.1.2, Research Institute for Software Technology, University of Salzburg, Austria, 1994.
[40] P. Hellekalek. Correlations between pseudorandom numbers: theory and numerical practice. In P. Hellekalek, G. Larcher, and P. Zinterhof, editors, Proceedings of the 1st Salzburg Minisymposium on Pseudorandom Number Generation and QuasiMonte Carlo Methods, Salzburg, Nov 18, 1994, volume ACPC/TR 954 of Technical Report Series, pages 43–73. ACPC – Austrian Center for Parallel Computation, University of Vienna, Austria, 1995.
[41] P. Hellekalek. On correlation analysis of pseudorandom numbers. Submitted to Proceedings of the Second International Conference on Monte Carlo and QuasiMonte Carlo Methods in Scientiﬁc Computing, Salzburg, July 9–12, 1996, 1996.
[42] P. Hellekalek and K. Entacher. Revised implementation and testing of the algorithms for IMPpolynomials. Report D5H3, CEIPACT Project, WP5.1.2.1.2, Research Institute for Software Technology, University of Salzburg, Austria, 1995.
[43] P. Hellekalek and K. Entacher. Tables of IMPpolynomials. Report D5H4, CEIPACT Project, WP5.1.2.1.2, Research Institute for Software Technology, University of Salzburg, Austria, 1995.
[44] P. Hellekalek and H. Leeb. Dyadic diaphony. To appear in Acta Arithmetica, 1996.
[45] P. Hellekalek, M. Mayer, and A. Weingartner. Implementation of algorithms for IMPpolynomials. Report D5H2, CEIPACT Project, WP5.1.2.1.2, Research Institute for Software Technology, University of Salzburg, Austria, 1994.
[46] P. Hellekalek and H. Niederreiter. The weighted spectral test: diaphony. In preparation, 1996.
[47] F. James, J. Hoogland, and R. Kleiss. Multidimensional sampling for simulation and integration: measures, discrepancies, and quasirandom numbers. Preprint submitted to Computer Physics Communications, 1996.
[48] M.H. Kalos and P.A. Whitlock. Monte Carlo Methods, Volume I: Basics. Wiley, New York, 1986.
[49] J. Kiefer. On large deviations of the empiric d.f. of vector chance variables and a law of the iterated logarithm. Paciﬁc J. Math., 11:649–660, 1961.
[50] D. E. Knuth. The Art of Computer Programming, Vol. 2: Seminumerical Algorithms. AddisonWesley, Reading, MA, second edition, 1981.
[51] L. Kuipers and H. Niederreiter. Uniform Distribution of Sequences. Wiley and Sons, New York London Sydney Toronto, 1974.
[52] J. C. Lagarias. Pseudorandom numbers. Statistical Science, 8:31–39, 1993.
[53] P. L’Ecuyer. Random numbers for simulation. Comm. ACM, 33:85–97, 1990.
[54] P. L’Ecuyer. Testing random number generators. In J.J. Swain et al., editor, Proc. 1992 Winter Simulation Conference (Arlington, Va., 1992), pages 305–313. IEEE Press, Piscataway, N.J., 1992.
[55] P. L’Ecuyer. Uniform random number generation. Ann. Oper. Res., 53:77–120, 1994.
[56] P. L’Ecuyer and S. Côté. Implementing a Random Number Package with Splitting Facilities. ACM Transactions on Mathematical Software, 17(1):98–111, March 1991.
[57] H. Leeb. pLab – a system for testing random numbers. In M. Vajteršic and P. Zinterhof, editors, Proceedings of the International Workshop on Parallel Numerics ’94, Smolenice, Sept. 19–21, pages 89–99. Slovak Academy of Sciences, Institute for Informatics, 1994. Available on the internet at http://random.mat.sbg.ac.at.
[58] H. Leeb. On the digit test. In P. Hellekalek, G. Larcher, and P. Zinterhof, editors, Proceedings of the 1st Salzburg Minisymposium on Pseudorandom Number Generation and QuasiMonte Carlo Methods, Salzburg, Nov 18, 1994, volume ACPC/TR 954 of Technical Report Series, pages 109–121. ACPC – Austrian Center for Parallel Computation, University of Vienna, Austria, 1995.
[59] H. Leeb. Random Numbers for Computer Simulation. Master’s thesis, Institut für Mathematik, Universität Salzburg, Austria, 1995.
[60] H. Leeb. A weak law for diaphony. Rist++ 13, Research Institute for Software Technology, University of Salzburg, 1996.
[61] H. Leeb and S. Wegenkittl. Inversive and linear congruential pseudorandom number generators in empirical tests. submitted to ACM Trans. Modeling and Computer Simulation, 1996.
[62] V. F. Lev. On two versions of ${L}^{2}$discrepancy and geometrical interpretation of diaphony. Acta Math. Hungar., 69:281–300, 1995.
[63] H. Levene and J. Wolfowitz. The covariance matrix of runs up and down. Annals Math. Stat., 15 :59–69, 1944.
[64] R. Lidl and H. Niederreiter. Finite Fields. AddisonWesley, Reading, Mass., 1983.
[65] G. Marsaglia. Random numbers fall mainly in the planes. Proc. Nat. Acad. Sci., 61:25–28, 1968.
[66] G. Marsaglia. A current view of random number generators. In L. Brillard, editor, Computer Science and Statistics: The Interface, pages 3–10, Amsterdam, 1985. Elsevier Science Publishers B.V. (North Holland).
[67] L. Ming and P. Vitány. An Introduction To Kolmogorov Complexity And Its Applications. Texts and Monographs in Computer Science. Springer Verlag, New York, 1993.
[68] C.J. Moreno and O. Moreno. Exponential sums and Goppa codes: I. Proc. Amer. Math. Soc., 111:523–531, 1991.
[69] Netscape Communications Corporation. Potential Vulnerability in
Netscape Products. Available on the WWW at
http://www.netscape.com/newsref/std/random_seed_security.html.
[70] H. Niederreiter. Random Number Generation and QuasiMonte Carlo Methods. SIAM, Philadelphia, USA, 1992.
[71] H. Niederreiter. On a new class of pseudorandom numbers for simulation methods. J. Comput. Appl. Math., 56:159–167, 1994.
[72] H. Niederreiter. New developments in uniform pseudorandom number and vector generation. In H. Niederreiter and P.J.S. Shiue, editors, Monte Carlo and QuasiMonte Carlo Methods in Scientiﬁc Computing, volume 106 of Lecture Notes in Statistics. SpringerVerlag, Heidelberg New York, 1995.
[73] S.K. Park and K.W. Miller. Random number generators: good ones are hard to ﬁnd. Comm. ACM, 31:1192–1201, 1988.
[74] C. A. Pickover. Random number generators: pretty good ones are easy to ﬁnd. The Visual Computer, 11:369–377, 1995.
[75] B. D. Ripley. Stochastic Simulation. John Wiley, New York, 1987.
[76] R. A. Rueppel. Stream Ciphers. In Gustavus J. Simmons, editor, Contemporary cryptology: the science of information integrity, chapter 2. IEEE Press, 1992.
[77] Bruce Schneier. Applied Cryptography. John Wiley & Sons, Inc., ﬁrst edition, 1993.
[78] I.M. Sobol. Die MonteCarloMethode. VEB Deutscher Verlag der Wissenschaften, 1983.
[79] H. Stegbuchner. Zur quantitativen Theorie der Gleichverteilung mod 1. Arbeitsberichte, Mathematisches Institut der Universität Salzburg, Salzburg, Austria, 1980.
[80] O. Strauch. ${L}^{2}$ discrepancy. Math. Slovaca, 44:601–632, 1994.
[81] R.C. Tausworthe. Random numbers generated by linear recurrence modulo two. Math. Comp., 19:201–209, 1965.
[82] S. Tezuka. Uniform Random Numbers: Theory and Practice. Kluwer Academic Publ., 1995.
[83] S. Tezuka and M. Fushimi. Calculation of Fibonacci polynomials for GFSR sequences with low discrepancies. Math. Comp., 60:763–770, 1993.
[84] J.F. Traub and H. Woźniakowski. The Monte Carlo algorithm with a pseudorandom generator. Math. Comp., 58:323–339, 1992.
[85] S. Wegenkittl. Empirical Testing of Pseudorandom Number Generators. Master’s thesis, Institut für Mathematik, Universität Salzburg, Austria, 1995.
[86] S. Wegenkittl. On empirical testing of pseudorandom number generators. In G. De Pietro, A. Giordano, M. Vajteršic, and P. Zinterhof, editors, Proceedings of the International Workshop Parallel Numerics 95, Sorrento, Italy, September 27–29, 1995, pages 113–123. IRSIP Institute of the NRC of Italy, Naples, 1995.
[87] A. Weingartner. Nonlinear congruential pseudorandom number generators. Master’s thesis, Universität Salzburg, Austria, 1994.
[88] B.A. Wichmann and I.D. Hill. An eﬃcient and portable pseudorandom number generator. Appl. Statist., 31:188–190, 1982. Corrections, ibid. 33, 123 (1994).
[89] P. Winker and KaiTai Fang. Application of threshold accepting to the evaluation of the discrepancy of a set of points. Research report, Universität Konstanz, Germany, 1995.
[90] P. Zinterhof. Über einige Abschätzungen bei der Approximation von Funktionen mit Gleichverteilungsmethoden. Sitzungsber. Österr. Akad. Wiss. Math.Natur. Kl. II, 185:121–132, 1976.
Name: Otmar Lendl
Date of birth: May 8th, 1970
Place of birth: Salzburg, Austria
Parents: Ingeborg and Wolfgang Lendl
Education:
1976–1980:  Volksschule in Salzburg 
1980–1988:  Gymnasium in Salzburg 
1988–1996:  University studies (M.Sc. in Mathematics) 
at the University of Salzburg 