Re: Update and questions regarding vecorization

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Update and questions regarding vecorization

Oliver Heimlich
Hi Joel,

On 21.06.2017 16:55, Joel Dahne wrote:

> This week I have begun the work on vectorization and I have some
> questions/comments.
>
> Where to test vectorization?:
> At the moment the tests in the unary functions (and some others), for
> example sin.m or cos.m, does not test vectorization in any way. They all
> use the same functions for performing the operations,
> crlibm_functions.cc or mpfr_function_d.cc, so that functionality can
> perhaps be tested there. But a lot of the unary functions also do a lot
> of other computations directly in the octave-script. This is true in for
> example sin.m. Should we add tests for this in their own files? We would
> then test with vectors, matrices and ND-arrays as input. I can also
> mention that all of work done in the octave-scripts uses smart indexing
> so they immediately worked for N-dimensional arrays.

There are some vectorized test cases for mpfr_function_d (see
src/mpfr_function_d.cc) and for crlibm_function (see
inst/__check_crlibm__.m).  They load several test cases in column
vectors and call the tested function only once.  However, it only
affects a few functions and does not involve the interval methods.

Of course, you can add some hand-picked test cases at the end of the
individual functions that you'd like to be tested, e. g., in
inst/@infsup/sin.m you may add the cause of bug #51283.

When you want to check all mapping functions (=functions that are
applied element-wise on non-scalar inputs) on a large scale, I would
suggest that we improve ITF1788:  ITF1788 currently generates a long
list of assert statements to test arithmetic correctness of most
functions in the package with many test cases.  The test cases are
defined in the test/*.itl files.  That could be improved as follows:
ITF1788 generates an Octave script, which could then generate test code
for Octave.  The numeric test data could be stored in *.mat files and
could then be used to (1) loop over the test data for scalar test cases
of interval functions like the assert statements do today, to (2) test
the mapping functions on the vectors of all test values at once, and (3)
reshape the test data into more dimensions to also test vectorization in
higher dimensions, and (4) check broadcasting.

Do you want me to look into this during the weekend?  When you are
unfamiliar with ITF1788, it could be distracting to start this side
project yourself now.

> Broadcasting for ternary functions:
> I added support for broadcasting of N-dimensional arrays for binary
> functions in mpfr_function_d.cc. This in practice required a rewrite of
> the whole function, I will write a bit more about this in my blog post
> tomorrow. When I looked at the ternary functions noticed that it does
> not perform full broadcasting at the moment. However the only ternary
> function used is fma.m and that function performs broadcasting already
> in the octave-script. Should I still add broadcasting for ternary
> functions? It might be used in the future. If we choose not to we should
> probably add a comment about it in the code.

Sounds complicated, I'll answer this during the Weekend after looking at
your code for the binary functions.  I don't see broadcasting in
mpfr_function_d as important, since I consider it more of an internal
function.  Do you know of any ternary Octave functions that allow
broadcasting, e. g., does plot3 allow broadcasting?

> rdivide:
> This is not really connected to N-dimensional arrays but I noticed it
> when reading the file. When the numerator is equal to 1 it short
> circuits and performs a more efficient computation. If the numerator is
> an array it only short circuits if all the elements are 1. We should be
> able to short circuit for all elements that are 1 and for the other ones
> we perform the normal computation. Should I implement this? Since it
> does not have to do with my project it might be a better idea to add it
> directly to the interval repository and not to my repository.

Do it if you like.  I have been too lazy to write an indexing expression
to short-circuit the correct subset of the N-D array.  You may either
send a separate patch (patch tracker) or commit it to your repo, from
where I will eventually merge it anyway.

> sin.m:
> See https://savannah.gnu.org/bugs/index.php?51283. We should probably
> add a work around for this. Or perhaps wait and see what Octave does
> with it?

Thanks for the bug report.  I have added a comment.  Please use the work
around to also support current versions of Octave.

> times.m:
> The choice has been made to compute all possible multiplications at all
> times. Normally I agree with this, but broadcasting makes things
> different. If we have a lot of broadcasting it does make sense to check
> the boundaries first I think, one check can then potentially save lots
> of multiplications. For example in
>
> infsup (rand (1000, 1000)) .* infsup (1, 2);
>
> we could save 6 million (correct?) computations with one check. This is
> of course an extreme example but you get the point. Since multiplication
> with a scalar is quite a common operation it might be worth it to add a
> short circuit for this case. It does however make the code more complex,
> which is never good.

Did you measure execution times?  When I last checked it with
non-broadcasting vectors, it wasn't worth the effort (except for corner
cases).  With broadcasting this should be no different.

However, this might have changed or I might be wrong.  You may try to
find a faster solution.  The times function is not that complicated and
we have lots of unit tests for regression testing.  If you find a faster
algorithm that'd be good.  However, you should measure execution times
for several input constellations.

> It will try post on my blog tomorrow. On Friday we celebrate Midsummer's
> Eve here in Sweden so then I will not be working.

I'm looking forward to your post and have a happy Midsommar.

Best
Oliver

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Update and questions regarding vecorization

Urathai

Hi Oliver,

Oliver Heimlich writes:

> Hi Joel,
>
> On 21.06.2017 16:55, Joel Dahne wrote:
>
> Of course, you can add some hand-picked test cases at the end of the
> individual functions that you'd like to be tested, e. g., in
> inst/@infsup/sin.m you may add the cause of bug #51283.

I added this specific test!

> When you want to check all mapping functions (=functions that are
> applied element-wise on non-scalar inputs) on a large scale, I would
> suggest that we improve ITF1788:  ITF1788 currently generates a long
> list of assert statements to test arithmetic correctness of most
> functions in the package with many test cases.  The test cases are
> defined in the test/*.itl files.  That could be improved as follows:
> ITF1788 generates an Octave script, which could then generate test code
> for Octave.  The numeric test data could be stored in *.mat files and
> could then be used to (1) loop over the test data for scalar test cases
> of interval functions like the assert statements do today, to (2) test
> the mapping functions on the vectors of all test values at once, and (3)
> reshape the test data into more dimensions to also test vectorization in
> higher dimensions, and (4) check broadcasting.
>
> Do you want me to look into this during the weekend?  When you are
> unfamiliar with ITF1788, it could be distracting to start this side
> project yourself now.
I think what you propose is a good idea. Then we can generate lots of
tests for all relevant functions. This would help us to catch bugs like
the one for sinus.

If you can take a look at it that's probably a good idea. From what I
understand it's mainly a matter of adding more ways to parse the test
data?

> Do it if you like.  I have been too lazy to write an indexing expression
> to short-circuit the correct subset of the N-D array.  You may either
> send a separate patch (patch tracker) or commit it to your repo, from
> where I will eventually merge it anyway.
>
I have made a note about it at least, might fix it later on.

>> sin.m:
>> See https://savannah.gnu.org/bugs/index.php?51283. We should probably
>> add a work around for this. Or perhaps wait and see what Octave does
>> with it?
>
> Thanks for the bug report.  I have added a comment.  Please use the work
> around to also support current versions of Octave.

I added the workaround

> Did you measure execution times?  When I last checked it with
> non-broadcasting vectors, it wasn't worth the effort (except for corner
> cases).  With broadcasting this should be no different.
>
> However, this might have changed or I might be wrong.  You may try to
> find a faster solution.  The times function is not that complicated and
> we have lots of unit tests for regression testing.  If you find a faster
> algorithm that'd be good.  However, you should measure execution times
> for several input constellations.

I have not done any measurements. Made a note about it and might look at
it later on.

Best,
Joel

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Update and questions regarding vecorization

Oliver Heimlich
Hi Joel,

On 22.06.2017 16:29, Joel Dahne wrote:
> Oliver Heimlich writes:
>> On 21.06.2017 16:55, Joel Dahne wrote:

>> When you want to check all mapping functions (=functions that are
>> applied element-wise on non-scalar inputs) on a large scale, I would
>> suggest that we improve ITF1788:  ITF1788 currently generates a long
>> list of assert statements to test arithmetic correctness of most
>> functions in the package with many test cases.  The test cases are
>> defined in the test/*.itl files.  That could be improved as follows:
>> ITF1788 generates an Octave script, which could then generate test code
>> for Octave.  The numeric test data could be stored in *.mat files and
>> could then be used to (1) loop over the test data for scalar test cases
>> of interval functions like the assert statements do today, to (2) test
>> the mapping functions on the vectors of all test values at once, and (3)
>> reshape the test data into more dimensions to also test vectorization in
>> higher dimensions, and (4) check broadcasting.
>>
>> Do you want me to look into this during the weekend?  When you are
>> unfamiliar with ITF1788, it could be distracting to start this side
>> project yourself now.

> I think what you propose is a good idea. Then we can generate lots of
> tests for all relevant functions. This would help us to catch bugs like
> the one for sinus.
>
> If you can take a look at it that's probably a good idea. From what I
> understand it's mainly a matter of adding more ways to parse the test
> data?

I have sketched my idea in this Github issue, feel free to add comments.
 I am going to look into this during the weekend.

https://github.com/oheim/ITF1788/issues/8

Just assume that you will receive a large amount of vectorization tests
as soon as I am finished.  ;-)

Oliver

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GSoC interval] Re: Update and questions regarding vecorization

Oliver Heimlich
In reply to this post by Oliver Heimlich
On 21.06.2017 23:56, Oliver Heimlich wrote:

>> Broadcasting for ternary functions:
>> I added support for broadcasting of N-dimensional arrays for binary
>> functions in mpfr_function_d.cc. This in practice required a rewrite of
>> the whole function, I will write a bit more about this in my blog post
>> tomorrow. When I looked at the ternary functions noticed that it does
>> not perform full broadcasting at the moment. However the only ternary
>> function used is fma.m and that function performs broadcasting already
>> in the octave-script. Should I still add broadcasting for ternary
>> functions? It might be used in the future. If we choose not to we should
>> probably add a comment about it in the code.

> Sounds complicated, I'll answer this during the Weekend after looking at
> your code for the binary functions.  I don't see broadcasting in
> mpfr_function_d as important, since I consider it more of an internal
> function.  Do you know of any ternary Octave functions that allow
> broadcasting, e. g., does plot3 allow broadcasting?

Hi Joel,

I have merged your changes locally, but didn't publish them yet in the
official repository.  Some remarks/questions regarding the new
vectorization code in mpfr_function_d.cc:

 - The method NDArray.size has been introduced in Octave 4.2.  If we
rely on that method, the package will have a dependency on the latest
version of Octave.  I'd prefer, if the package can compile under Octave
4.0 (where this method is missing).  Could you use a different method
(or define a helper function if compiled with an old Octave version)?

 - In your blog post—which was a very interesting read, by the way, and
I could learn some Octave pro stuff—you write that you have been
inspired by do_bsxfun_op in Octave core.  I guess I should add copyright
notes then from that file, shouldn't I?

        Copyright (C) 2009-2017 Jaroslav Hajek
        Copyright (C) 2009 VZLU Prague

   By the way, I also add copyright notes to all files modified by you.
So, you don't have to do that yourself.

 - For the ternary vectorization operator: Like I said before, it is not
a priority to have broadcasting handled in mpfr_function_d when it is
already handled in the m-file.  Probably, the resulting C++ code would
be hard to understand.  You should remove the naive broadcast / resize
code from the function entirely, and require that all three input
arguments are of equal size.

Oliver


P. S. I didn't finish unit tests for vectorization yet.

P. P. S. I can remember another ternary function from an early draft of
the interval arithmetic standard (it doesn't make sense to compute that
with mpfr):

        case (c, g, h) = if c < 0 then g, else h.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [GSoC interval] Re: Update and questions regarding vecorization

Urathai

Hi Oliver!

Oliver Heimlich writes:

>  - The method NDArray.size has been introduced in Octave 4.2.  If we
> rely on that method, the package will have a dependency on the latest
> version of Octave.  I'd prefer, if the package can compile under Octave
> 4.0 (where this method is missing).  Could you use a different method
> (or define a helper function if compiled with an old Octave version)?

Good to know! I have changed it to use older functions now.

>  - In your blog post—which was a very interesting read, by the way, and
> I could learn some Octave pro stuff—you write that you have been
> inspired by do_bsxfun_op in Octave core.  I guess I should add copyright
> notes then from that file, shouldn't I?
>
> Copyright (C) 2009-2017 Jaroslav Hajek
> Copyright (C) 2009 VZLU Prague

Since it is very similar it might be a good idea. Though I must say I
don't know much about how to handle copyright notes.

>    By the way, I also add copyright notes to all files modified by you.
> So, you don't have to do that yourself.

Great!

>  - For the ternary vectorization operator: Like I said before, it is not
> a priority to have broadcasting handled in mpfr_function_d when it is
> already handled in the m-file.  Probably, the resulting C++ code would
> be hard to understand.  You should remove the naive broadcast / resize
> code from the function entirely, and require that all three input
> arguments are of equal size.

That makes everything much easier. I now check that the input parameters
have the exact same size and I don't perform any broadcasting on ternary
functions. I have also added a comment about this and made a note about
it in the documentation of the function.

> Oliver
>
>
> P. S. I didn't finish unit tests for vectorization yet.
At the moment my work does not depend on it so there is no hurry. I read
you issue on GitHub and I think looks very good!

Best regards,
Joel

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [GSoC interval] Re: Update and questions regarding vecorization

Oliver Heimlich
Hi Joel,

On 27.06.2017 10:42, Joel Dahne wrote:
> Oliver Heimlich writes:
>> P. S. I didn't finish unit tests for vectorization yet.
> At the moment my work does not depend on it so there is no hurry. I read
> you issue on GitHub and I think looks very good!

I have almost finished my work on the testing framework.  There is some
cleanup required with how the framework handles array constructors in
various supported languages before I can push the change.

In the meantime I have reviewed your latest changes to the code.  It
looks very good so far and I have only minor remarks:

1. I have noticed a failing test in @infsup/disp for ND arrays, which
should probably be fixed.

2. Please describe the bug that you fixed in @infsup/diag in the NEWS file.

3. It's awesome to see how you can simplify and improve most of the code
at the same time.  Please go on.  :-)

Best
Oliver

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [GSoC interval] Re: Update and questions regarding vecorization

Oliver Heimlich
On 04.07.2017 22:06, Oliver Heimlich wrote:
> On 27.06.2017 10:42, Joel Dahne wrote:
>> Oliver Heimlich writes:
>>> P. S. I didn't finish unit tests for vectorization yet.
>> At the moment my work does not depend on it so there is no hurry. I read
>> you issue on GitHub and I think looks very good!
>
> I have almost finished my work on the testing framework.  There is some
> cleanup required with how the framework handles array constructors in
> various supported languages before I can push the change.

I have pushed the new changesets to the ITF1788 and the octave-interval
repositories.  Make sure that you run “make clean” after you have
updated your workspace with these changes to also update your ITF1788
workspace.

The tests can be run with “make test-bist” (it's not yet possible to
properly use “test @infsup/plus” after “make run” or “make
install”—though you can use it after “make dist” + “pkg install …”).

So far, I have only started to integrate the new test format for the
plus function [1].  There is a loop which calls the function on scalar
inputs (for all test cases), which is equivalent to the old generated
tests in test/*.tst.

Also, I have demonstrated how to do a vector evaluation of the function,
where all test cases are given as column vectors in a single function call.

You can probably reshape the testcases easily to test for proper
element-wise ND array evaluation.  Also, you may add this new kind of
tests to other methods, the test data contains test cases for various
functions.  I will not continue to work on that task myself during GSoC
to prevent merge conflicts.

Oliver



[1]
https://sourceforge.net/p/octave/interval/ci/2eaf1a499291865f051cca883bbca5eb017038b5/tree/inst/%40infsup/plus.m?diff=1b8686048dae314217bb1b4aa40c8e9fe4c261af

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [GSoC interval] Re: Update and questions regarding vecorization

Urathai

Oliver Heimlich writes:

>> I have almost finished my work on the testing framework.  There is some
>> cleanup required with how the framework handles array constructors in
>> various supported languages before I can push the change.
>
> I have pushed the new changesets to the ITF1788 and the octave-interval
> repositories.  Make sure that you run “make clean” after you have
> updated your workspace with these changes to also update your ITF1788
> workspace.
>
> The tests can be run with “make test-bist” (it's not yet possible to
> properly use “test @infsup/plus” after “make run” or “make
> install”—though you can use it after “make dist” + “pkg install …”).
>
> So far, I have only started to integrate the new test format for the
> plus function [1].  There is a loop which calls the function on scalar
> inputs (for all test cases), which is equivalent to the old generated
> tests in test/*.tst.
>
> Also, I have demonstrated how to do a vector evaluation of the function,
> where all test cases are given as column vectors in a single function call.
>
> You can probably reshape the testcases easily to test for proper
> element-wise ND array evaluation.  Also, you may add this new kind of
> tests to other methods, the test data contains test cases for various
> functions.  I will not continue to work on that task myself during GSoC
> to prevent merge conflicts.
>
> Oliver
>
>
>
> [1]
> https://sourceforge.net/p/octave/interval/ci/2eaf1a499291865f051cca883bbca5eb017038b5/tree/inst/%40infsup/plus.m?diff=1b8686048dae314217bb1b4aa40c8e9fe4c261af

Great! I will take a look at this next week!

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [GSoC interval] Re: Update and questions regarding vecorization

Urathai
In reply to this post by Oliver Heimlich

Oliver Heimlich writes:

> Hi Joel,
> In the meantime I have reviewed your latest changes to the code.  It
> looks very good so far and I have only minor remarks:
>
> 1. I have noticed a failing test in @infsup/disp for ND arrays, which
> should probably be fixed.
For me all 9 tests passes, what is the error you get and under what
circumstances?
>
> 2. Please describe the bug that you fixed in @infsup/diag in the NEWS
> file.
Done!
>
> 3. It's awesome to see how you can simplify and improve most of the code
> at the same time.  Please go on.  :-)
>
> Best
> Oliver

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [GSoC interval] Re: Update and questions regarding vecorization

Oliver Heimlich
On 05.07.2017 22:11, Joel Dahne wrote:

>
> Oliver Heimlich writes:
>
>> Hi Joel,
>> In the meantime I have reviewed your latest changes to the code.  It
>> looks very good so far and I have only minor remarks:
>>
>> 1. I have noticed a failing test in @infsup/disp for ND arrays, which
>> should probably be fixed.
> For me all 9 tests passes, what is the error you get and under what
> circumstances?

This test is failing for me in Octave 4.0.3:

>> test @infsup/disp
***** test
 i = infsupdec (reshape (1:24, 2, 3, 4));
 i(1, 1, 2) = entire ();
 i(1, 1, 3) = empty ();
 i(1, 1, 4) = nai ();
 assert (disp (i(1,1,:)), "ans(:,:,1) =   [1]_com\nans(:,:,2) =
[Entire]_dac\nans(:,:,3) =   [Empty]_trv\nans(:,:,4) =   [NaI]\n")
!!!!! test failed
ASSERT errors for:  assert (disp (i (1, 1, :)),"ans(:,:,1) =
[1]_com\nans(:,:,2) =   [Entire]_dac\nans(:,:,3) =
[Empty]_trv\nans(:,:,4) =   [NaI]\n")

  Location  |  Observed  |  Expected  |  Reason
     []      ans(:,:,1) =   [1]_com
ans(:,:,1) =   [Entire]_dac
ans(:,:,1) =   [Empty]_trv
ans(:,:,1) =   [NaI]
 ans(:,:,1) =   [1]_com
ans(:,:,2) =   [Entire]_dac
ans(:,:,3) =   [Empty]_trv
ans(:,:,4) =   [NaI]
   Strings don't match


Looking at the code, I guess that ind2sub is not computed correctly, it
should be changed from

  [matrixpartsubscript{:}] = ind2sub (size (x.inf), matrixpart);

into

  [matrixpartsubscript{:}] = ind2sub (size (x.inf)(3:end), matrixpart);

However, it is strange that you are not affected by this.


Best
Oliver


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [GSoC interval] Re: Update and questions regarding vecorization

Urathai

Oliver Heimlich writes:

> On 05.07.2017 22:11, Joel Dahne wrote:
>>
>> Oliver Heimlich writes:
>>
>>> Hi Joel,
>>> In the meantime I have reviewed your latest changes to the code.  It
>>> looks very good so far and I have only minor remarks:
>>>
>>> 1. I have noticed a failing test in @infsup/disp for ND arrays, which
>>> should probably be fixed.
>> For me all 9 tests passes, what is the error you get and under what
>> circumstances?
>
> This test is failing for me in Octave 4.0.3:
>
>>> test @infsup/disp
> ***** test
>  i = infsupdec (reshape (1:24, 2, 3, 4));
>  i(1, 1, 2) = entire ();
>  i(1, 1, 3) = empty ();
>  i(1, 1, 4) = nai ();
>  assert (disp (i(1,1,:)), "ans(:,:,1) =   [1]_com\nans(:,:,2) =
> [Entire]_dac\nans(:,:,3) =   [Empty]_trv\nans(:,:,4) =   [NaI]\n")
> !!!!! test failed
> ASSERT errors for:  assert (disp (i (1, 1, :)),"ans(:,:,1) =
> [1]_com\nans(:,:,2) =   [Entire]_dac\nans(:,:,3) =
> [Empty]_trv\nans(:,:,4) =   [NaI]\n")
>
>   Location  |  Observed  |  Expected  |  Reason
>      []      ans(:,:,1) =   [1]_com
> ans(:,:,1) =   [Entire]_dac
> ans(:,:,1) =   [Empty]_trv
> ans(:,:,1) =   [NaI]
>  ans(:,:,1) =   [1]_com
> ans(:,:,2) =   [Entire]_dac
> ans(:,:,3) =   [Empty]_trv
> ans(:,:,4) =   [NaI]
>    Strings don't match
>
>
> Looking at the code, I guess that ind2sub is not computed correctly, it
> should be changed from
>
>   [matrixpartsubscript{:}] = ind2sub (size (x.inf), matrixpart);
>
> into
>
>   [matrixpartsubscript{:}] = ind2sub (size (x.inf)(3:end), matrixpart);
>
> However, it is strange that you are not affected by this.
>
You are right! I have added a patch for it now. The reason the test
passed for me is that ind2sub have been updated in 4.2.1 and it now
works differently when nargout is less than the number of
dimensions. This meant that disp would always give the correct result
for 3-dimensional arrays (which the test is for). But now it should work
for any input!
>
> Best
> Oliver

Loading...