Skewness in random numbers

Aries

New Member
I've seen many examples on the forum of using something like "random w0: w0 = w0 // n" to generate a random number in the range 0 to n-1. What I have recently realised is that while this does give a random number, it is not uniformly distributed - there is a greater likelihood of getting a number in the lower part of the range. In the case I was working on, I was looking for numbers in the range 0 to 28799 (the number of seconds in 8 hours - don't ask why). After running the function several thousand times, I was surprised to see that for every 1000 numbers in the range 0-143999, there were only 800 (or so) in the range 14400-28799. Having checked my program, and not seeing anything wrong, I did a bit of mathematical analysis.

To keep things simple, I'm going to use 32 rather than 65536 as the range for w0, and 6 for "n" - the range over which I want the random numbers. The pseudo-random number generator in Picaxe does, I think, produce all the numbers in the given range, but effectively in a random order. So, after 32 executions of the function, you have generated each of the numbers 0-31 once. Using the formula w0 // n, the numbers 0-31 give the numbers 0-5, 0-5, 0-5, 0-5, 0-5, 0-1. So, 0 and 1 occur 6 times but 2-5 occur 5 times, a bias towards 0 and 1 as the result.

Before I saw the use of // to get random numbers, I was using **. In fact, this replicates the more usual form of random numbers, where - for example - the spreadsheet function RAND() gives a number in the range 0-1, which is then multiplied by "n" to give a number in the range 0 to n-1. w0 = w0 ** n is effectively the same as w0 = n * (w0/65536) and gives a much closer approximation to a uniform distribution of random numbers across the range 0 to n-1.
 

hippy

Technical Support
Staff member
Your analysis seems to be correct. You are also correct that the RANDOM function cycles through all numbers so with 65536 RANDOM commands all numbers 0-65535 will have been generated in a sort of random order. It's actually 'nnew = nold * 2 + b' where b is an XOR of some of the old number bits to make it work. It's a standard linear feedback shift register (LFSR) pseudo-random number generator implementation, nothing fancy.

Those 'w1 = w0 // n' results won't be evenly distributed if that 65536 isn't divisible by n. Even distribution only occurs when n is a power of two, and most times it won't be.

The evenness of distribution depends on 'n'. For example w0 // 40000 gives 0-39999 then 0-25535 as w0 goes from 0-65535. Numbers up to 25535 occur twice as often as those above. I am guessing there are worse cases than that but I'm not sure what they would be.

Using 'w1 = w0 ** n' is an interesting solution to the problem and looks like it should work. It would be interesting to compare the distribution.

So congratulations - it looks like you figured out something no one else had. It's certainly something I would never have thought of.
 
Last edited:

Aries

New Member
I used this code - switch between // and ** by commenting/uncommenting the line(s). It generates the random number in the range 0 to 28799 and checks if it is in the lower or upper half. w10 is the upper-half counter, w11 is the lower-half counter.

View attachment LCTestRandom.bas

After 256 printouts (=65536 calculations):

** has w10 =32768, w11 = 32768. Ratio w10/w11 varies from 0.7534 to 1.003. It's better than 0.99 for over half the time.
// has w10=28800, w11= 36736 (as you would expect). Ratio w10/w11 varies from 0.7142 to 0.8417.

A different seed would probably produce a different range of ratios.
 

AllyCat

Senior Member
Hi,

all numbers 0-65535 will have been generated in a sort of random order.
As the Random function emulates a Maximal Length Feedback Shift Register, it never generates 0, because that is a closed loop (however many XOR inputs are used, an all-zeros input will always generate another zero). So the range is 1 - 65535, but of course any further operation (remainder, logic operator, etc.) can produce a zero.

I described the actual algorithm in a Code Snippet , and various ways to use it, a few months ago.

Cheers, Alan.
 
Top