random numbers in tests

John W. Eaton
After renaming the DLD-FUNCTIONS directory to dldfcn, two new test
failures popped up.  The failing tests were the last two from
splinefit.m:

  %!shared xb, yb, x
  %! xb = 0:2:10;
  %! yb = randn (size (xb));
  %! x = 0:0.1:10;

  %!test
  %! y = interp1 (xb, yb, x, "linear");
  %! assert (ppval (splinefit (x, y, xb, "order", 1), x), y, 10 * eps ());
  %!test
  %! y = interp1 (xb, yb, x, "spline");
  %! assert (ppval (splinefit (x, y, xb, "order", 3), x), y, 10 * eps ());
  %!test
  %! y = interp1 (xb, yb, x, "spline");
  %! assert (ppval (splinefit (x, y, xb), x), y, 10 * eps ());

The new test failures were just that the computed values did not match
the expected values to the requested tolerance.  To the digits displayed
in the log file, everything looked OK.

Since all I had done was rename some files, I couldn't understand what
could have caused the problem.  After determining that the changeset
that renamed the files was definitely the one that resulted in the
failed tests, and noting that running the tests from the command line
worked, I was really puzzled.  Only after all of that did I finally
notice that the tests use random data.

It seems the reason the change reliably affected "make check" was that
by renaming the DLD-FUNCTIONS directory to dldfcn, the tests were run
in a different order.  Previously, the tests from files in the
DLD-FUNCTIONS directory were executed first.  Now they were run later,
after many other tests, some of which use random values, and some of
which may set the random number generator state.
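
For example, each test file could pin the generator state itself at
the top of its shared block so that the data is the same no matter
what ran before (the seed value here is arbitrary):

  %!shared xb, yb, x
  %! randn ("state", 42);   # arbitrary seed; makes yb reproducible
  %! xb = 0:2:10;
  %! yb = randn (size (xb));
  %! x = 0:0.1:10;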

Is this sort of thing also what caused the recent problem with the
svds test failure?

Should we always set the random number generator state for tests so
that they can be reproducible?  If so, should this be done
automatically by the testing functions, or left to each individual
test?

jwe

Re: random numbers in tests

Daniel Sebald
On 08/01/2012 11:39 AM, John W. Eaton wrote:

> After renaming the DLD-FUNCTIONS directory to dldfcn, two new test
> failures popped up.  The failing tests were the last two from
> splinefit.m:

[snip]

> It seems the reason the change reliably affected "make check" was that
> by renaming the DLD-FUNCTIONS directory to dldfcn, the tests were run
> in a different order.  Previously, the tests from files in the
> DLD-FUNCTIONS directory were executed first.  Now they were run later,
> after many other tests, some of which use random values, and some of
> which may set the random number generator state.
>
> Is this sort of thing also what caused the recent problem with the
> svds test failure?

It sure looks like it.  Some of the examples I gave yesterday showed
that the SVD on sparse data algorithm had results varying by at least
four times eps(), and that was just one or two examples.  If one were
to look at hundreds or thousands of examples, I would think it is very
likely that the error would exceed 10*eps.
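
One could measure the spread empirically with something like the
following rough sketch (the matrix size, density, and number of trials
are arbitrary choices on my part, not taken from the failing test):

  err = zeros (100, 1);
  for i = 1:100
    A = sprandn (50, 50, 0.2);           # random sparse test matrix
    s = sort (svds (A, 5), "descend");   # five largest, sparse code path
    sref = svd (full (A));               # dense reference, descending
    err(i) = max (abs (s - sref(1:5)) ./ sref(1:5));
  endfor
  max (err) / eps                        # worst case, in multiples of eps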

Spline fits and simulations can be less accurate as well, so the
10*eps tolerance is the bigger question.


> Should we always set the random number generator state for tests so
> that they can be reproducible?  If so, should this be done
> automatically by the testing functions, or left to each individual
> test?

I would say that putting in a fixed input that passes is not the thing
to do.  The problem with that approach is that if a library changes its
algorithm slightly, these same issues might pop up again when the
library is updated, and people will wonder what is wrong once again.

Instead, I think the sort of approach that Ed suggested yesterday is
the thing to do, i.e., come up with a reasonable estimate of how
accurate such an algorithm should be and use that.  Octave is testing
functionality here, not the ultimate accuracy of the algorithm,
correct?
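
For the splinefit tests, that might mean using a relative tolerance
(in assert, a negative tolerance is treated as relative).  Something
like the following, where -1e-10 is just my guess at a reasonable
accuracy target, not a derived error bound:

  %!test
  %! y = interp1 (xb, yb, x, "spline");
  %! ## -1e-10 is an assumed accuracy target, not a proven bound
  %! assert (ppval (splinefit (x, y, xb), x), y, -1e-10);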

I tried running some of the examples to see how accurate the spline fit
is, but kept getting errors about some pp structure not having 'n' as a
member.

Dan

Re: random numbers in tests

Daniel Sebald
In reply to this post by John W. Eaton
On 08/01/2012 11:39 AM, John W. Eaton wrote:
> After renaming the DLD-FUNCTIONS directory to dldfcn, two new test
> failures popped up.  The failing tests were the last two from
> splinefit.m:

[snip]

> It seems the reason the change reliably affected "make check" was that
> by renaming the DLD-FUNCTIONS directory to dldfcn, the tests were run
> in a different order.  Previously, the tests from files in the
> DLD-FUNCTIONS directory were executed first.  Now they were run later,
> after many other tests, some of which use random values, and some of
> which may set the random number generator state.

My other comment is that there seem to be a few seg-faults showing up
in the last day.  There is a lot of activity on the source tree, but
don't rule out that rearranging the source tree directories might
uncover some long-standing programming bug that only becomes evident
because of the way in which hunks of code are assembled.

Dan