Visual Basic for Applications/Pseudo Random Repeated Substrings

Summary
'''This page describes some matters that apply to the Rnd function of VBA. In particular, it illustrates that repeated substrings can result when the Randomize statement is wrongly positioned inside the same loop as Rnd, instead of before it.'''

The VBA Rnd Function

 * The Rnd function is pseudo random, as opposed to truly random. True randomness is rarely found, one notable example being the sequence of data that can be derived from white noise. White noise, like the radio noise from the sun or perhaps the unintentional noise in a radio or other electronic device, has a fairly uniform distribution of frequencies, and can be exploited to produce random distributions of data. These are uniform probability distributions, so called because their frequency distributions are straight lines parallel to the horizontal axis.
 * Pseudo randomness can be obtained with a feedback algorithm, where each output value of a function is fed back and used in making the next part of the output stream. Such algorithms are referred to as pseudo random number generators (PRNG). The process, although complex, is nonetheless deterministic, being based entirely on its starting value. Depending on their design, such generators can produce long sequences of values, all unique, before the entire stream eventually repeats itself.
 * A PRNG output stream will always repeat itself if a long enough set of values is generated. The Rnd function in VBA can generate a sequence of up to 16,777,216 (2^24) numbers before any one number is repeated, at which point the entire sequence repeats. This is adequate in most cases. The Rnd function has been described by Microsoft as belonging to the set of PRNGs called Linear Congruential Generators (LCG), though it is unclear whether the algorithm has since been modified.
 * The Rnd function is not suitable for large tables or for cryptographic use, and VBA itself is quite insecure in its own right. For a given starting value the generator will always produce the same sequence. Clearly, if any part of the stream is known, other values in the sequence can be predicted, which makes it insecure for cryptographic use. Perhaps surprisingly, modeling methods that make much use of random values need even longer unique sequences than that produced by Rnd.
 * The exact coding of the Microsoft Rnd function is not available, and their descriptive material for it is quite sketchy. A recent attempt by me to implement the assumed algorithm in VBA code failed because of overflow, so those who intend to study such generators in VBA may need to use another algorithm. Perhaps the Wichmann-Hill (1982) CLCG algorithm, which can be implemented in VBA, would be a better choice for study. A VBA implementation (by others) of the Wichmann-Hill (1982) algorithm is provided in another page of this series, along with some simpler generator examples.
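
The feedback idea behind an LCG can be sketched in a few lines of VBA. The following is a minimal illustration only, not Microsoft's code: the multiplier, increment, modulus and seed are assumptions taken from commonly quoted descriptions of Rnd, and the names NextLcg and DemoLcg are hypothetical. To sidestep the overflow mentioned above, the multiplier is first reduced modulo 2^24 (which leaves the result unchanged), so every intermediate product stays below 2^53 and remains exact in a Double.

```vba
Option Explicit

' Illustrative LCG step: x1 = (x0 * A + C) mod 2^24.
' The constants are assumptions taken from published descriptions of Rnd;
' no claim is made that this reproduces the present-day Rnd exactly.
Function NextLcg(ByVal x As Double) As Double
    Const A As Double = 16598013#    ' 1140671485 reduced modulo 2^24
    Const C As Double = 12820163#
    Const M As Double = 16777216#    ' 2^24
    Dim t As Double
    t = x * A + C                    ' stays below 2^53, so exact in a Double
    NextLcg = t - Int(t / M) * M     ' modulus done by hand; the Mod operator can overflow here
End Function

Sub DemoLcg()
    Dim x As Double, i As Long
    x = 327680#                      ' assumed default seed (&H50000)
    For i = 1 To 5
        x = NextLcg(x)
        Debug.Print x, x / 16777216# ' raw state, then the same state scaled into [0,1)
    Next i
End Sub
```

Running DemoLcg twice prints the same five lines each time, which is the determinism described above. With the seed shown, the first state works out to 11837123, or about 0.7055475 after scaling; readers can compare this with the first value that Rnd returns in a fresh session.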

Worst Case for Rnd Substrings?

 * A well designed PRNG stream consists of unique numbers, but this applies only to the designer's unfiltered set of numbers in the range from zero up to, but not including, unity, [0,1). As soon as we start to take some values from the stream and ignore others, say to make custom outputs, the new streams will take on different characteristics. The combination of cherry-picking elements from the natural sequence and mapping a large set to a very small set takes its toll. In the new set, the count of characters to the cycle repeat point shortens, and the number of repeated substrings increases throughout the set.
 * The code listing below allows checking of a Rnd stream for substrings, using preset filter settings, e.g. capitals, lower case, integers, etc. In addition, it includes a similar generator based on a hash, for those who would like to compare it.
 * The repeat substring procedure is quite slow, and its run time depends on where the repeats are located. The worst case is when no repeats are found, where the number of cycles reaches its maximum of (0.5*n)^2, the square of half the number of characters in the test string. The smallest number of cycles is just one, when a simple string is repeated, e.g. abcabc. Because of the squared term, an increase of string length by a factor of ten could increase the run time by a factor of one hundred. (1000 characters in about 2 seconds, 2000 in 4 seconds, and 10000 in 200 seconds are, so far, the best timings.)
 * Coding layout can affect the length of repeated substrings too. The reader might compare the effect of placing the Randomize statement outside the random number loop, then inside the loop, while making an output of only capitals. (See code in MakeLongRndStr.) In the past, the repeated substrings have worsened considerably when Randomize was placed within the loop. The code as listed here, testing Rnd with 1000 samples of capitals, with no use of DoEvents and with Randomize wrongly placed inside the loop, will return a repeat substring of up to four hundred characters for this author. Adding code lines to the loop, which presumably changes the time taken by each iteration, also affects the length of any substrings.
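
The Randomize comparison described above can be tried in isolation with a small sketch such as the following. It is not the MakeLongRndStr listing: the names MakeCaps, LongestRepeat and CompareRandomizePlacement are hypothetical, and the repeat finder is a plain brute-force search that allows overlapping repeats, so its cycle counts and timings will differ from those quoted for the page's own procedure. One plausible explanation for the effect, offered here as an assumption, is that Randomize with no argument takes its seed from the system timer, which changes slowly compared with the speed of the loop.

```vba
Option Explicit

' Hypothetical helper: build n capital letters from Rnd.
' bRandomizeInside = True puts Randomize (wrongly) inside the loop.
Function MakeCaps(ByVal n As Long, ByVal bRandomizeInside As Boolean) As String
    Dim i As Long, s As String
    s = String$(n, " ")
    If Not bRandomizeInside Then Randomize      ' correct: seed once, before the loop
    For i = 1 To n
        If bRandomizeInside Then Randomize      ' wrong: reseed on every pass
        Mid$(s, i, 1) = Chr$(65 + Int(Rnd * 26)) ' map [0,1) onto A..Z
    Next i
    MakeCaps = s
End Function

' Hypothetical helper: length of the longest substring that occurs
' at least twice in s (overlaps allowed). Brute force, so slow for long strings.
Function LongestRepeat(ByVal s As String) As Long
    Dim n As Long, i As Long, j As Long, k As Long, best As Long
    n = Len(s)
    For i = 1 To n - 1
        For j = i + 1 To n
            k = 0
            Do While j + k <= n
                If Mid$(s, i + k, 1) <> Mid$(s, j + k, 1) Then Exit Do
                k = k + 1
            Loop
            If k > best Then best = k
        Next j
    Next i
    LongestRepeat = best
End Function

Sub CompareRandomizePlacement()
    Debug.Print "Randomize before loop: "; LongestRepeat(MakeCaps(1000, False))
    Debug.Print "Randomize inside loop: "; LongestRepeat(MakeCaps(1000, True))
End Sub
```

The two printed lengths will vary from run to run and from machine to machine, so no particular figures are claimed here. The sketch also shows the mapping of a large set onto a small one: every value from Rnd is folded onto just 26 letters, so short repeated substrings are to be expected even when Randomize is correctly placed.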