
Part 1 was about the problem with the generation of single-precision random
numbers. I had already identified a separate concern with double-precision
values.
The code in randmtzig.cc for the generation of double-precision random
numbers is
/* generates a random number on (0,1) with 53-bit resolution */
static double randu53 (void)
{
  const uint32_t a = randi32 () >> 5;
  const uint32_t b = randi32 () >> 6;
  return (a*67108864.0+b+0.4) * (1.0/9007199254740992.0);
}
Juan helpfully provided the corresponding code from NumPy, which is nearly
identical.
> static inline double mt19937_next_double(mt19937_state *state) {
>     int32_t a = mt19937_next(state) >> 5, b = mt19937_next(state) >> 6;
>     return (a * 67108864.0 + b) / 9007199254740992.0;
> }
One difference is only performance-related: the use of multiplication by a
constant rather than division. The real difference is the addition of 0.4,
which handles the case where both a and b are zero; without it, the endpoint
0 would be included in the range of possible values. Should we be consistent
with the single-precision case and just exclude 0 by test?
Although we would spread the integer state across one fewer floating point
value, the upside would be a consistent floating point interval size across
the entire range. The equation for the floating point value is
fp = @(n) (n + 0.4) / 2^53
By inspection, the difference between any two adjacent points n, n+1 is
1/2^53 = 1.11e-16. However, this does not hold at the endpoints. At the
lower bound,
fp (0)
ans = 4.440892098500626e-17
At the upper bound, rounding to the nearest available floating point value
enforces the same interval size.
1 - fp (2^53 - 1)
ans = 1.110223024625157e-16
If we exclude 0 by test, then
fp = @(n) n / 2^53
and the first interval is now the same size as all others
fp (1)
ans = 1.110223024625157e-16
That feels appealing to me.
Rik
