# Re: Random number generation quirk

8 messages

## Re: Random number generation quirk

Moving this to the dev list...

On Fri, 2014-02-28 at 12:14 -0500, Jordi Gutiérrez Hermoso wrote:
> Ah, you're using single. I can reproduce now on 3.8.0:
>
>     octave:1> x = rand(3000,"single"); max(round(x*10+0.5)(:))
>     ans =  10
>     octave:2> x = rand(3000,"single"); max(round(x*10+0.5)(:))
>     ans =  11
>     octave:3> max(x(:))
>     ans =  1
>
> This is a bug one way or another, since the documentation clearly
> states that this should not happen. Can you please file a bug
> report?

So I think I see the bug:

    http://hg.savannah.gnu.org/hgweb/octave/file/c579bd4e12c9/liboctave/numeric/randmtzig.c#l392

Here the cast to float is pointless. The constants are doubles, not
floats, so the computation is done in doubles, using all 32 random bits.
However, a float mantissa only has 23 bits, so casting to float loses
precision bits, and I think this cast also performs rounding. It might
be better to do bit manipulations to light the appropriate mantissa
bits in the float representation than to rely on implicit casts,
explicit casts, and the arithmetic between them.

- Jordi G. H.

## Re: Random number generation quirk

I have had the problem occur with double rands, but it is much easier
to reproduce with singles. I think loss of precision could account for
the problem. I was under the impression that the division was taking
place as a single-precision calculation, not a double-precision one. I
would be more confident in the correctness of the result if all
operands were doubles, the division took place as a double-precision
division, and then the final result were cast to single.

That said, it would probably be much faster to do bit manipulations to
place the bits in the appropriate locations. However, I am not sure
whether assumptions about x86 versus other architectures will require
specialized coding. I don't completely understand the comments from
lines 104-108 on the performance impacts of the i686 64-bit
architecture.

- Mike

On Fri, Feb 28, 2014 at 1:39 PM, Jordi Gutiérrez Hermoso wrote:
> So I think I see the bug:
>
>     http://hg.savannah.gnu.org/hgweb/octave/file/c579bd4e12c9/liboctave/numeric/randmtzig.c#l392
>
> Here the cast to float is pointless. [...]

## Re: Random number generation quirk

On Fri, 2014-02-28 at 15:14 -0500, Michael Pender wrote:
> I have had the problem occur with double rands, but it is much
> easier to reproduce with singles.

Alternatively, we might decide this is not a problem at all and we
just update the docs to say that sometimes rand is exactly 1 (or
exactly 0?).

> That said, it would probably be much faster to do bit manipulations
> to place the bits in the appropriate locations. However, I am not
> sure whether assumptions about x86 versus other architectures will
> require specialized coding.

We have dropped the ability to do arithmetic other than IEEE 754 float
arithmetic. We could do bit manipulations instead of casting around
and doing arithmetic. But come to think of it, this might be
complicated. I just remembered that we can't use the same exponent
bits for all mantissas, due to the implicit bit.

- Jordi G. H.

## Re: Random number generation quirk

That's a pretty interesting problem!

On Fri, Feb 28, 2014 at 7:39 PM, Jordi Gutiérrez Hermoso <[hidden email]> wrote:
> So I think I see the bug:
>
>     http://hg.savannah.gnu.org/hgweb/octave/file/c579bd4e12c9/liboctave/numeric/randmtzig.c#l392
>
> Here the cast to float is pointless. The constants are doubles, not
> floats, so the computation is done in doubles, with 32 bits. However,
> a float mantissa only has 23 bits, so casting to float loses precision
> bits, and I think this casting also performs rounding.

Yup, looking at the border case shows this nicely:

    octave:4> q = ((2^32-1) + 0.5) * (1.0/4294967296.0)
    q =  1.00000
    octave:5> q==1
    ans = 0
    octave:6> q = (single(2^32-1) + 0.5) * (1.0/4294967296.0)
    q =  1
    octave:7> q==1
    ans =  1

This makes perfect sense, of course. If memory serves me right, the FP
default rounding is defined so that if the mathematical result of an
operation falls between two finite FP numbers, the closer one is
chosen (and in case of ties, such that there's no bias on average,
IIRC). With epsf := eps('single') being 2^-23, the predecessor of 1 is
1-2^-23.

On Fri, Feb 28, 2014 at 11:00 PM, Jordi Gutiérrez Hermoso <[hidden email]> wrote:
> Alternatively, we might decide this is not a problem at all and we
> just update the docs to say that sometimes rand is exactly 1 (or
> exactly 0?).

I think that besides this being incompatible with Mathworks'
definition of rand(), it would be inappropriate due to Octave using
randu32() for derived calculations (sampling of other probability
distributions, it seems).

> We could do bit manipulations instead of casting around
> and doing arithmetic. But come to think of it, this might be
> complicated. I just remember that we can't use the same exponent bits
> for all mantissas, due to the implicit bit.

Indeed, it's only easy at first sight :-). Here's a function that
samples the numbers epsf * [1 .. 2^23-1] uniformly, assuming randi32()
is uniform on [0 .. 2^31-1]:

    float randu32(void)  // (0, 1)
    {
        const float C = 1.0f / 8388608.0f;  // 2^-23
        uint32_t i;

        do {
            i = randi32() >> 9;  // [0, 2^23-1]
        } while (i == 0);

        // i now in [1, 2^23-1]
        return C * (float)i;
    }

The catch is that while in the interval [0.5, 1), all FP numbers are
sampled this way, in [0.25, 0.5) it's every second one, in
[0.125, 0.25) every fourth one, and so on. So the question is, are you
fine with this "coarseness", or do you want all FP numbers in
[0.25, 0.5) sampled too, which would then have to be with halved
probability for each one (quartered probability for [0.125, 0.25),
and so on).

--Philipp

## Re: Random number generation quirk

On Sat, Mar 1, 2014 at 11:38 AM, Philipp Kutin <[hidden email]> wrote:
> Here's a function that samples the numbers epsf * [1 .. 2^23-1]
> uniformly, assuming randi32() is uniform on [0 .. 2^31-1]:

Oops. That should be "assuming randi32() is uniform on [0 .. 2^32-1]".

--Philipp

## Re: Random number generation quirk

In reply to this post by Philipp Kutin

Thinking about this some more. First, another correction:

On Sat, Mar 1, 2014 at 11:38 AM, Philipp Kutin <[hidden email]> wrote:
> With epsf := eps('single') being 2^-23, the predecessor of 1 is
> 1-2^-23.

That should be "the predecessor of 1 is 1 - 2^-24".

If you want to do away with the coarseness, you may think along these
lines: just do the intermediate calculation in double precision, only
demote to float at the end, and repeat as long as the result equals one
of the boundary values. Here's what you might come up with then:

    float randu32(void)  // (0, 1)
    {
        static const double C = 1.0 / (65536.0 * 65536.0);  // 2^-32
        float f;

        do {
            f = C * (double)randi32();
        } while (f == 0.0 || f == 1.0);

        return f;
    }

As an optimization, since f == 0.0 iff randi32() returns 0, we can do
the f == 0.0 comparison in integer beforehand. How does this function
compare with the one posted earlier? First, it's clear that it samples
more values (without telling which ones these are). However, comparing
the converted result is slower than doing it on the integer early.
These are some timings for a whole 0 .. 2^32-1 cycle:

    first version:                       12.8 s
    above version:                       23.7 s
    above version with early i==0 check: 20.7 s

But besides these efficiency issues, there's another numeric one that
relates to the FP rounding in the case of ties.

    octave:37> for i=0:10, fprintf('%ld %ld\n', 2^24+i, single(2^24+i)); end
    16777216 16777216
    16777217 16777216  % round down
    16777218 16777218
    16777219 16777220  % round up
    16777220 16777220
    16777221 16777220  % round down
    16777222 16777222
    16777223 16777224  % round up
    16777224 16777224
    16777225 16777224  % round down
    16777226 16777226

Because of this rounding definition, we don't sample the FP numbers in
an interval [2^(-n-1), 2^-n) (n >= 0) uniformly if, denoting the
difference between two consecutive numbers in this interval as d(n),
numbers "finer" than d(n) are used as intermediate values.

So much for "fun with floating point" for now. I'm attaching a C
source file with the three variants of randu32(), plus some simple
checking in main().

--Philipp

Attachment: randfloat.c (2K)