Random Number Generators: True Randomness

I. Introduction:

Random (adj): a: lacking a definite plan, purpose, or pattern. b: made, done, or chosen at random c: relating to, having, or being elements or events with definite probability of occurrence. d: being or relating to a set or to an element of a set each of whose elements has equal probability of occurrence. [Oxford English Dictionary]

Before commencing deep discussion of the art of “true randomality”, it must first be made clear that true randomness is theoretically impossible by the defining principals of our universe. The definition above clearly defines the paradox that surrounds the concept of randomality when subject to probability. In essence we will claim that “truly random” is the state in which for a given set A, for any i, element in A, i if chosen at random, has a probability of [1/|A|] (where |x| denotes cardinality of the set) of occurrence. This is how we judge the “randomness” of a Random Number Generator (RNG(s)), by its ability to exploit probability; given a set A, a perfect RNG will not repeat an element before the set is exhausted; described as the period of a generator, its point of repetition.

It must be noted that defining a choice as random is a classification that relies on pure ignorance of the causes and events that result in the ultimate choice. With that aside, the philosophical discussion of “true randomness” will be left behind. The remainder of the discussion will judge “true randomness” as stated above; perfect probabilistic distribution over a given finite field. Although such distribution has never been possible with the various algorithms being discussed, (meaning such a distribution could not be perfect on every occurrence of a specific algorithm) relatively good distribution suffices.

I.i Various Uses of Random Number Generators:

Random numbers have a multitude of applications. Of particular interest to this study and intended future studies by the author is Cryptography. Many cryptographic protocols make use of RNGs, particularly, public key cryptography (RSA) and some implementations of symmetric ciphers (DES, Serpent). Besides cryptographic function however, RNGs are used in Simulations, for the realistic recreation of “natural” occurrences; in this case, computer games are qualified as simulations, in which RNGs are heavily used in conjunction with probability weights (Gaussian). They’re also used for integrity testing on various computer applications during development, even hardware tests, such as GPU to memory pipelines on AGP based graphic cards. Among those mentioned are many other uses and purposes for the development and “perfection” of making truly random choices.

I.ii Brief Algorithm Introduction:

Random number generating algorithms come in multiple flavors; these can be separated into two main groups, linear number generators (LNG(s)) and non-linear number generators (NLNG(s)). Each group contains multiple types of RNGs and each of these, have their purpose and uses. It is important to know that although not all generators are made equal, good generators have purposes for which other good generators are not suited to perform.

I.ii.a Linear Number Generators:

Linear Congruential Generators (LCG(s)) deserve first mention purely on the basis of ubiquity. LCGs and their various spawns and modifications are used in various applications. The LNGs in their purest form are nearly as predictable as the Fibonacci Sequence. These are seeded generators that make their “choices” – if that can be said – linearly in the given finite field, based on their seed. The come in more flavors that most other generator types, no doubt, due to the simplicity of modifying the algorithm for specific purposes.

I.ii.b Non-Linear Number Generators:

The Inversive Congruential Generator (ICG(s)) and the Explicit Inversive Congruential Generator (EICG(s)) are the two main focuses in this category. These generators are non-linear (as implied), and therefore are not predictable the way that LNGs are. Also mentioned in the non-linear group is the Linear Feedback Shift Register (LFSR(s)) generator. This generator, although linear, as implied by its name, carries the principles of Non-Linear Generators in its implementation; so much so, that LFSRs closely resemble their non-linear counter parts, Non-Linear Feedback Shift Register generators. The details of the history of Feedback Shift Registers go slightly beyond the scope of this paper however a brief introduction to the principals of feedback functions and shift registers is given in parallel with the LFSR discussion.

II Linear Congruential Generators:

LNGs ultimately generate a sequence of integers between 0 and a given modulus, for the equation Ij+1 = aIj + c (mod m). In this equation, m is used as the modulus, a is a multiplier, and c is an increment. The sequence will repeats within a period of m, where m is usually prime.

The advantage of the LNG is immediately noticed by the equation above; its fast, requiring one multiplication, one addition, and one modulus. This immediately lends itself to explaining its wide uses. This type of generator is used in a multitude of applications for which the nature of the random sequence doesn’t matter, only that it be a different sequence from execution to execution; Monte Carlo simulations for example. The speed and simplicity of the algorithm has also led to the amount of flavors developed for different applications. For instance, the equation above is the same equation that the ANSI C/C++ committee has dubbed for use with these languages, given appropriate values for a, c, and m. Besides the linear equation, there are also:

Quadratic LCG:

Xn = (aXn-12 + bXn-1 + c) mod m

Cubic LCG:

Xn = (aXn-13 + bXn-12 + cXn-1 + d) mod m

There are also polynomial LCGs, Truncated LCGs, etc; each of which work on the same principle and are generally predictable and breakable. For example, the standard LCG was first broken by Jim Reeds, and Quadratic and Cubic Generators broken by Joan Boyar.

It should be noted that most modifications to LCGs simply yield worst generators. Since linear generators can only be judged based on their progression through a linear field, and not on a probabilistic survey of a given set, the period of a LCG is of paramount importance, more so than other Congruential generators, even in non-linear fields. There are multiple methods for extending the period of a given generator, however many attempts, unpredictably yield shorter periods. Computing history, both hardware and software, is filled with the “botched” attempts to improve LCGs. Notably, early IBM mainframes with the RANDU routine, using a = 65539, and m = 231 producing a n – dimensional plot in only 11 plane dimensions. There’s also the problem of seeding the generator; each sequence is only as unique as its seed. The Netscape browser’s security was a one point compromised due to the predictability of the chosen seeds in it LCG for which it created crypto-keys.

It is possible to force an LCG to be probabilistically correct, by creating uniform deviations. The problem with focusing on the deviation of an LCG is that a well deviated LCG will have an extended period, but its complexity increases, due to the deviation, sometimes beyond the useful range of LCGs. It is also possible to use deviations normalized in a given interval, such as Gaussian deviations, where the period is lengthened such that every integer in the given field is selected.

III Inversive Congruential Generators and Linear Feedback Shift Registers:

Defined as our non-linear generators, ICGs and EICGs, developed by Eichenauer and Lehn (1986, 1993) are defined by the following congruence:

yn+1 = ayn + b (mod m)

Where, for practicality, m is also chosen as a prime number in parallel to LCGs. These two generators are most notable due to their speed and period efficiency compromise. They produce relatively long periods, in (5 – 10x average) more time, than LCG.

III.i Linear Feedback Shift Registers:

LFSR generators are of more relevance here, as a suitable alternative to LCGs for cryptographic applications. Shift registers work on the concept of generating on the bit level, instead of in a base 10 finite field. They address the problem of creating uniform distribution of single random bits, with 0 and 1 equally probable.

LFSR generators are made up of two parts, the shift register and the feedback function. The shift register is the bit sequence, in which the size of the shift register is determined by the length of its bits. When bits are needed, the register is shifted 1 bit to the right, and the left bit is computed based on the remaining bits in the register. The simplest method of representing LFSRs feedback function is the XOR of key register bits. The method used by the feedback function is called tap sequence.

It’s obvious, that an n-bit LFSR can be in 2n -1 states; noting that the size of the register denotes its length. Therefore, a 4 bit register (1111 would be called a 4 bit register, based on the cardinality of the register, rather than the value of the high order bit, base 10) would yield 24-1, or 15 unique states. With an output sequence of the least significant bits following the shifts, the size of the register defines a period or 2n – 1 bits before repetition, therefore yielding a base 10 value of 22**n -1 as maximum value, and period. In order to reach this maximum period, a primitive polynomial must be formed by the tap sequence, where the degree of the polynomial yields the length of the shift register.

III.i.a Primitive Polynomials in Linear Feedback Shift Registers

A primitive Polynomial is defined as an irreducible polynomial P of degree d in [Z/p[x]] (denoting finite field of p[x] in Z) is primitive if P divides xd – 1 but does not divide xi – 1 for any integer i with 0 xn = 1 mod P

but,
xi != 1 mod P for 0

It is not necessary to check all smaller exponential values than d, but only possible values from the divisors of degree d.

There is no easy method for generating primitive polynomials modulo 2 for a given degree. Therefore, choosing random polynomials and testing if its primitive is the most used method.

III.i.b Maximizing and Reaching the Period of an LFSR

As stated above, the period of any LFSR is based on the size of the shift register; the size of the shift register is based on the degree of the polynomial, therefore the period of the LFSR is determined by the polynomial used. Primitive polynomials are necessary here because if the polynomial used is not primitive, the period will be shortened, and there may be bad states which may shorten the period further. Therefore choosing a primitive polynomial is of dominant importance to any LFSR.

Given the polynomial, x8+x4+x3+x2+x which is primitive modulo 2, a maximal period can be produced. The first exponent is the length of the shift register, and all exponents except 1 and 0 are used to specify the tap sequence by the feedback function, where low degree terms correspond to the left most register bits. Therefore, in a given 8-bit shift register, a new bit is generated by XORing the 8th, 4th, 3rd and 2nd bits of the register together. The LFSR that results from this polynomial will have a precise period of 28-1.

Linear Feedback Shift Registers are found in many places, such as in Ueli Maurer’s Fast Prime Generation routine, but most commonly in the stream-cipher domain of cryptography. Therefore the propagation of primitive polynomials is important to success of such ciphers. The polynomial used in LFSRs is as important as the seed used in a LCG. Just as LCGs are cracked by deterministic seeds, LFSRs are also cracked by many organizations, based on the use and reuse of deterministic primitive polynomials. It should also be noted that the algorithm itself is completely linear when sequential bits are taken into account, however its implementation in software makes in non-linear. By running simultaneous LFSR applications (32 for a computer with a word size of 32) nonlinear sequences can be produced; rather, seemingly nonlinear, described as a large linear complexity.

IV Discussion:

The Linear Congruential Generators we presented here for an ease into the concept of random number generation, and for study of a variety of methods of generation. However, in the domain of cryptography, LCGs are all but irrelevant. LFSR, and their “cousins” Non-Linear Feedback Shift Registers are of more relevance to the authors interest. Besides the methods described above, another interesting method of generating random numbers are also worth brief mention.

Symmetric block ciphers are also capable of producing randomality. Beginning with block product ciphers like LUCIFER and DES, the use of block ciphers to produce random bits, although many times slower than LCG and even LFSR, yields considerably random bit sequences. The more recent successor to DES, AES (Rijndael algorithm) and even the runner up contender to AES, Serpent, have also been used for the generation of random bits.

Essentially, the philosophical debate of the concept of randomality displays the obvious difficulty in constructing a truly random algorithm. The fact that most random generation algorithms are based on random sequences, be they number sequences or bit sequences, undermines the term random from the very first line of code and comments justifying its feasibility. However, random sequences are frequently used, and make amply use of the ignorance factor. As long as new methods of seeding and new primitive polynomials are found, for which we have an infinite supply, the number generators above will remain in use. The possible existence of a truly random number algorithm is questionable, but with many algorithms and sequences still remaining unbroken and uncharted, that existence cannot be denied.