I think we can improve on Rik's solution and lose the loop. Consider

which is uniformly distributed over the open interval (0, 1).  However, this still only allows for

2^24 - 1 unique values in the interval, but it is quite simple.  I also

which ends up with 79,691,776 unique values. (that's 0x04C00000 if

you'd like to figure out the math behind it; I just ran it through a loop).

Hope these help.
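(The two code snippets alluded to above did not survive in this copy of the message.  Purely as an illustration, and my own sketch rather than the original poster's code, here is one way to get an open-interval draw without a rejection loop: use only as many random bits as stay exactly representable through the arithmetic.)

```c
#include <stdint.h>

/* Map 23 random bits to (i + 0.5) / 2^23.  For i in [0, 2^23 - 1] the
   numerator 2i + 1 fits in 24 bits, so i + 0.5f is exact in single
   precision, and the scale by 2^-23 is an exact exponent shift.  The
   result is therefore always strictly inside (0, 1) -- no loop needed.
   (Pushing this to 24 bits fails: the top value rounds up to exactly 1.) */
float map_bits_to_open01 (uint32_t i)
{
  return (i + 0.5f) * (1.0f / 8388608.0f);  /* 8388608 == 2^23 */
}
```

Feeding it, say, randi32 () >> 9 would yield 2^23 equally likely, distinct values in (0, 1).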

>

> On 12/17/19 2:27 PM, Rik wrote:

> > Maintainers,

> >

> > I believe there are serious, but subtle, problems with Octave's random

> > number generation. Resolving them, without introducing further issues,

> > requires careful thought and review beyond just a single coder (myself).

> > Hence, I'm bringing this issue to the Maintainer's list. I've also CC'ed

> > Jordi and Philipp Kutin who were active in initially discovering the bugs,

> > and Michael Leitner who was helpful in removing the bias from Octave's

> > randi function.

> >

> > The tip of the iceberg was bug report #41742

> > (https://savannah.gnu.org/bugs/?41742).  In this report, using rand to

> > generate an integer in the range [1, a] using the straightforward code

> >

> > x = round (rand * a + 0.5)

> >

> > occasionally fails and returns a result of a+1 outside the desired range.

> > This is only possible if rand should return exactly 1. But, according to

> > the documentation, rand produces a floating point number in the range (0,

> > 1), i.e., endpoints are excluded. Unfortunately, it is quite simple to

> > demonstrate that the rand function does include the endpoint. Test code

> > that I found that always reproduces the problem is

> >

> > octave:1> rand ("state", 1);

> > octave:2> x = max (rand ([20e6, 1], "single"))

> > x = 1

> >

> > This result is contrary to the documentation, and contrary to the behavior

> > of the rand function in Matlab. Discussion between Jordi and Philipp

> > ensued at this thread

> >

> > https://octave.1599824.n4.nabble.com/Re-Random-number-generation-quirk-td4662514.html.

> >

> > The Octave code which generates a "single" (32-bit IEEE floating point)

> > random number is found in liboctave/numeric/randmtzig.cc

> >

> > /* generates a random number on (0,1)-real-interval */

> > static float randu32 (void)

> > {

> >   return (static_cast<float> (randi32 ()) + 0.5) * (1.0/4294967296.0);

> >   /* divided by 2^32 */

> > }

> >

> > The function randi32 (which I have no reason to doubt) generates an

> > unsigned 32-bit integer in the range [0, 2^32 - 1]. Unfortunately, I think

> > there was some confusion by the original programmer between Octave and C

> > implicit conversion rules. In Octave, arguments are demoted to lower

> > precision before the operator is applied:

> >

> > single + double = single

> >

> > This is exactly reversed in C where arguments are promoted to the more

> > precise class. In C,

> >

> > single + double = double

> >

> > So this code is actually performed completely with double precision values,

> > and then truncated to single precision float at the end of the function

> > because the return type for the function is float. In Octave pseudo-code,

> >

> > single ( (double (randi32 ()) + 0.5) * ( 1.0 / 4294967296.0) )

> >

> > A useful Octave function for mimicking the C code is

> >

> > fs = @(x) single ((floor (x) + 0.5) * (1.0 / 4294967296.0))

> >

> > Using fs() one can determine the maximum value allowable in order for

> > the result not to become exactly equal to 1.

> >

> > octave:28> fs (2^32-129)

> > ans = 0.9999999

> > octave:29> fs (2^32-128)

> > ans = 1

> >
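(The fs() helper above can be written directly in C to confirm the promotion-then-truncation behavior it models; a minimal stand-alone sketch:)

```c
#include <stdint.h>

/* Model of the original expression: in C the arithmetic is carried out
   entirely in double (randi32 ()'s result is promoted), and only the
   final value is truncated to float by the return type. */
float fs (uint32_t i)
{
  return (float) (((double) i + 0.5) * (1.0 / 4294967296.0));
}
```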

> > I assume Philipp did something similar. He has provided a patch that,

> > after unpacking various macros, does the following

> >

> > static float randu32 (void)

> > {

> >   uint32_t i;

> >

> >   do

> >     {

> >       i = randi32 ();

> >     }

> >   while (i == 0 || i > 4294967167u);  /* 4294967167 == 2^32 - 129 */

> >

> >   return i * (1.0 / 4294967296.0);

> > }

> >

> > My fear, and more mathematical help on this specific point would be nice, is that

> > this will introduce a bias.

>

> In the discussion between Jordi, Mike and Philipp is the link to the code:

>

>

> http://hg.savannah.gnu.org/hgweb/octave/file/c579bd4e12c9/liboctave/numeric/randmtzig.c#l392

> Yes, I don't like the notion of tossing out a range of numbers using the

> while loop.  RNGs have been extensively analyzed for randomness, and

> the conditional statement above has the *potential* to affect that.

> Then again, one analysis might be that good pseudo-RNG should have

> subintervals that look properly random.

>

> Small technicality is that

>

> return ((float)randi32 () + 0.5) * (1.0/4294967296.0);

> /* divided by 2^32 */

>

> is actually multiplying by the inverse, not dividing. Fine for real

> numbers and a CPU-saving technique in a general sense, but there will

> always be a bit of a guess as to exactly what a hardware floating point

> processor is doing. It probably doesn't make a difference in this case,

> though.
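(For what it's worth, when the divisor is a power of two the reciprocal is exactly representable, so the two forms can be shown to agree bit-for-bit; a sketch, with function names of my own:)

```c
#include <stdint.h>

/* Because 1/2^32 is exact in double, multiplying by the reciprocal and
   dividing by 2^32 are both exact exponent shifts of an exact operand,
   so they produce identical results for every input. */
double scale_mul (uint32_t i) { return ((double) i + 0.5) * (1.0 / 4294967296.0); }
double scale_div (uint32_t i) { return ((double) i + 0.5) / 4294967296.0; }
```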

>

> But along that line, is there a slightly smarter way of doing the

> arithmetic closer to integer math? For example, if there is a way of

> avoiding the rounding problems by generating a range such as

>

> delta + [0,M-1] * alpha

>

> it might avoid the rare value going out of range (which is kind of what

> you did in your proposed alternative later).

>

>

> > In Octave, the new code can be modeled as

> >

> > fs2 = @(x) single (floor (x) * (1.0 / 4294967296.0 ))

> >

> > Trying some sample values

> >

> > octave:78> fs2(1)

> > ans = 2.3283064e-10

> > octave:79> fs2(2)

> > ans = 4.6566129e-10

> > octave:80> fs2(3)

> > ans = 6.9849193e-10

> > octave:82> fs2(2^32-128)

> > ans = 1

> > octave:83> fs2(2^32-129)

> > ans = 0.9999999

> > octave:84> fs2(2^32-130)

> > ans = 0.9999999

> > octave:100> fs2(2^32-129) - fs2(2^32-383)

> > ans = 0

> >

> > What one notices is that the small integer values all get mapped to

> > different floating point values, but at the upper end, large ranges of

> > different integer values get mapped to the same floating point value.

> >
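(The collapse at the top of the range is easy to confirm in C with a model of the patched expression; a sketch, with fs2 mirroring the anonymous function above:)

```c
#include <stdint.h>

/* Model of i * (1/2^32) computed in double and truncated to float,
   as in the patched randu32. */
float fs2 (uint32_t i)
{
  return (float) ((double) i * (1.0 / 4294967296.0));
}
```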

> > Given that IEEE single-precision floating point has only 24 bits for the

> > mantissa, I think something like this would work better

> >

> > static float randu32 (void)

> > {

> >   uint32_t i;

> >

> >   do

> >     {

> >       i = randi32 () & static_cast<uint32_t> (0xFFFFFF);

> >     }

> >   while (i == 0 || i == 0xFFFFFF);

> >

> >   return i * (1.0f / 16777216.0f);

> > }

> >

> > In my testing, this does not produce exact values of 1, nor does it map

> > multiple integer values to the same floating point value. Is anyone aware

> > of problems with this algorithm?
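(A quick check of the endpoint claims: with the accepted range [1, 0xFFFFFE] applied directly, the extreme outputs stay strictly inside (0, 1), and distinct inputs stay distinct because every 24-bit integer is exact in single precision.  Sketch, function name mine:)

```c
#include <stdint.h>

/* The scaling step of the proposed 24-bit scheme: i * 2^-24 is an
   exact exponent shift for any 24-bit i, so it is injective and the
   rejected values 0 and 0xFFFFFF are the only ones that could have
   produced an endpoint. */
float map_masked (uint32_t i)
{
  return i * (1.0f / 16777216.0f);
}
```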

>

> Creative, but masking out that whole upper level of pseudo-random

> numbers doesn't feel good. It may be as good as anything though

> because, yeah, we could change the name of the function to randu24() and

> it would have the same meaning because of the imposed range of (0,1).

> In some way, that range must implicitly toss away the exponent and the

> floating point bit. It's effectively now fractional point, not floating

> point.

>

> Are we doing a "horse-is-out-of-the-barn" type of thing by generating

> this (0:1) floating point range and then expecting to scale it? Say I

> take a vector of numbers from the RNG with range [1/2^24 : (2^24-1)/2^24]

> (or think of mantissa [1:0xFFFFFE] if you like) and multiply by a floating

> point value greater than 1?  How about less than 1?  Probably some

> strange dithering effect on the floating point numbers in either case,

> unless the scaling is a power of 2.
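(That intuition can be tested directly: scaling the 2^-24 grid by a power of two is exact, while a non-power-of-two scale makes neighboring grid points collapse onto the same float near the top of the range.  A sketch, with function names of my own:)

```c
#include <stdint.h>

/* A point on the k/2^24 grid; exact for any 24-bit k. */
float grid_point (uint32_t k)
{
  return k * (1.0f / 16777216.0f);
}

/* Count adjacent k whose scaled values collapse to the same float.
   Rounded multiplication is monotone, so any loss of distinct values
   shows up as an adjacent collision. */
int adjacent_collisions (float scale, uint32_t lo, uint32_t hi)
{
  int n = 0;
  for (uint32_t k = lo; k < hi; k++)
    if (grid_point (k) * scale == grid_point (k + 1) * scale)
      n++;
  return n;
}
```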

>

>

> > Resolving this must happen, but is only part of the issue. There is a

> > #define in randmtzig.cc

> >

> > #define RANDU randu32()

> >

> > and then RANDU is used in the algorithms for rand_normal and

> > rnd_exponential. In particular, the use occurs as the argument to the log

> > function. For example,

> >

> > yy = - std::log (RANDU);

> >

> > The old RANDU = randu32() occasionally put out values of 1 which will be

> > mapped by the log function to 0. Whereas any new function will always be

> > slightly less than one which means the log function output will be small,

> > but not exactly 0. Do rand_exponential and rand_normal rely on the old

> > behavior in order to function correctly? I have no idea.

>

> This is probably where the significance of [0:1) versus (0:1) makes a

> difference. Several math functions throw an error or require special

> treatment with an input of 0. It would be a mistake to think of

> floating point in terms of the real interval (0:1).  Instead it means

> [delta:1-delta] where delta is the resolution of the floating point number.

>

> Dan

>