More extended performance profiling

More extended performance profiling

Rik-4
I was able to get versions 3.2.4 and 4.0.3 to compile so I have a longer baseline of performance measurements.  Using bm_toeplitz.m, there has been more than 100% slowdown since 3.2.4 to current dev.

Version      3.2.4   3.4.3  3.6.4  3.8.2  4.0.3   4.2.1   4.4.1   5.1.0   dev (6.1.0)
runtime (s)  5.8968  -      -      -      10.055  10.544  13.052  13.481  13.291

--Rik

Re: More extended performance profiling

John W. Eaton
Administrator
On 8/11/19 8:48 PM, Rik wrote:

> I was able to get versions 3.2.4 and 4.0.3 to compile so I have a longer
> baseline of performance measurements.  Using bm_toeplitz.m, there has
> been more than 100% slowdown since 3.2.4 to current dev.
>
> Version      3.2.4   3.4.3  3.6.4  3.8.2  4.0.3   4.2.1   4.4.1   5.1.0   dev (6.1.0)
> runtime (s)  5.8968  -      -      -      10.055  10.544  13.052  13.481  13.291

Is there a bug report open where we could track this info?

Please post the changes you made to build 3.2.4 with current tools.  I
tried but ran into numerous problems and if you have already done the
work it would save others time in trying to duplicate the issues.

The bm_toeplitz script includes an indexed assignment, a function call,
and a binary arithmetic operation in the loop.  As a quick check to see
if just one of these might be the greatest contributing factor to the
slowdown, could you try simple loops that have only one of these
features at a time?  For example, try replacing the statement in the
loop with

   b(k,j) = 13;   ## indexed assignment only, with a constant value

   abs(13);       ## function call only with a constant value (Octave is
                  ## not smart enough to eliminate the call or move it
                  ## out of the loop)

   abs(k);        ## function call only with one variable value lookup

   k+1;           ## simple binary arithmetic with one variable value
                  ## lookup (Octave is dumb, so it will do this operation
                  ## every time through the loop).

Also, does it matter whether there is more than one index in the indexed
assignment expression or more than one nested loop?

What happens with an empty loop body?
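
A minimal sketch of how the variants above could be timed side by side. The contents of bm_toeplitz.m are not shown in this thread, so the loop bounds (n = 1000), the helper name time_variant, and the variant labels are all assumptions made for illustration. Each variant gets its own copy of the nested loop so that no dispatch cost is paid inside the timed region.

```octave
## Sketch only: time one loop-body variant at a time.
## time_variant, the variant labels, and n are hypothetical;
## none of them come from bm_toeplitz.m.
1;  # not a function file: lets this script define a function inline

function t = time_variant (name, n)
  b = zeros (n, n);   # preallocate so assignments never resize b
  tic;
  switch (name)
    case "empty"        # loop overhead only
      for k = 1:n
        for j = 1:n
        endfor
      endfor
    case "assign"       # indexed assignment, two indices
      for k = 1:n
        for j = 1:n
          b(k,j) = 13;
        endfor
      endfor
    case "assign_1idx"  # indexed assignment, one index
      for k = 1:n
        for j = 1:n
          b(j) = 13;
        endfor
      endfor
    case "call_const"   # function call, constant argument
      for k = 1:n
        for j = 1:n
          abs (13);
        endfor
      endfor
    case "call_var"     # function call, one variable lookup
      for k = 1:n
        for j = 1:n
          abs (k);
        endfor
      endfor
    case "arith"        # binary arithmetic, one variable lookup
      for k = 1:n
        for j = 1:n
          k + 1;
        endfor
      endfor
    otherwise
      error ("time_variant: unknown variant '%s'", name);
  endswitch
  t = toc;            # elapsed seconds since the tic above
endfunction

variants = {"empty", "assign", "assign_1idx", "call_const", "call_var", "arith"};
for i = 1:numel (variants)
  printf ("%-12s %8.4f s\n", variants{i}, time_variant (variants{i}, 1000));
endfor
```

Subtracting the "empty" time from each of the other rows gives a rough per-feature cost. Running the same script under each Octave build would show which feature accounts for most of the change between versions.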

Using the current sources, is there any significant difference when
running with the GUI vs. octave-cli?

jwe

Re: More extended performance profiling

Rik-4
On 08/12/2019 08:00 AM, John W. Eaton wrote:
> On 8/11/19 8:48 PM, Rik wrote:
>> I was able to get versions 3.2.4 and 4.0.3 to compile so I have a longer baseline of performance measurements.  Using bm_toeplitz.m, there has been more than 100% slowdown since 3.2.4 to current dev.
>>
>> Version      3.2.4   3.4.3  3.6.4  3.8.2  4.0.3   4.2.1   4.4.1   5.1.0   dev (6.1.0)
>> runtime (s)  5.8968  -      -      -      10.055  10.544  13.052  13.481  13.291
>
> Is there a bug report open where we could track this info?

I have opened a new report here https://savannah.gnu.org/bugs/index.php?56752.


> Please post the changes you made to build 3.2.4 with current tools.  I tried but ran into numerous problems and if you have already done the work it would save others time in trying to duplicate the issues.


Going backwards is hard.  I still haven't got 3.4 - 3.8 to build.  I will post my configure scripts and patches for each version (eventually) to bug #56752.

> The bm_toeplitz script includes an indexed assignment, a function call, and a binary arithmetic operation in the loop.  As a quick check to see if just one of these might be the greatest contributing factor to the slowdown, could you try simple loops that have only one of these features at a time?  For example, try replacing the statement in the loop with
>
>   b(k,j) = 13;   ## indexed assignment only, with a constant value
>
>   abs(13);       ## function call only with a constant value (Octave is not smart enough to eliminate the call or move it out of the loop)
>
>   abs(k);        ## function call only with one variable value lookup
>
>   k+1;           ## simple binary arithmetic with one variable value lookup (Octave is dumb, so it will do this operation every time through the loop).
>
> Also, does it matter whether there is more than one index in the indexed assignment expression or more than one nested loop?
>
> What happens with an empty loop body?

This was my first thought.  I re-purposed the original for loop test with an empty body

a = 1; b = 1; tic; for i=1:1000; for j=1:1000;   ; end; end; toc

Results below do show a slowdown of ~100%.

Version              3.2.4     3.4.3  3.6.4  3.8.2  4.0.3      4.2.1      4.4.1     5.1.0   dev (6.1.0)
bm.empty_loop.m (s)  0.053467  -      -      -      0.0782108  0.0723422  0.128019  0.1248  0.096776

However, absolute numbers are very small.  This is 50 milliseconds over 1 million loop iterations.  Applied to the toeplitz benchmark it would only be around 100 milliseconds and I'm seeing 7 seconds of difference between 3.2.4 and dev for that.
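
As a rough worked check of that comparison, the sketch below recomputes the absolute and relative differences from the 3.2.4 and dev figures posted in this thread. The variable names and the slowdown helper are illustrative only; the runtimes are copied from the two tables.

```octave
## Runtimes in seconds, copied from the tables in this thread.
toeplitz_324 = 5.8968;    toeplitz_dev = 13.291;
empty_324    = 0.053467;  empty_dev    = 0.096776;

## Relative slowdown in percent: 100 * (new - old) / old.
## (slowdown is a hypothetical helper, defined here for illustration.)
slowdown = @(old, new) 100 * (new - old) / old;

printf ("toeplitz:   +%.4f s (%.0f%% slower)\n", ...
        toeplitz_dev - toeplitz_324, slowdown (toeplitz_324, toeplitz_dev));
printf ("empty loop: +%.4f s (%.0f%% slower)\n", ...
        empty_dev - empty_324, slowdown (empty_324, empty_dev));
```

With these inputs the toeplitz gap is about 7.4 seconds, while the empty-loop gap is about 43 milliseconds, so loop overhead alone is far too small to account for the regression.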


> Using the current sources, is there any significant difference when running with the GUI vs. octave-cli?

Not significant.  I used toeplitz benchmark and the following options with run-octave

-f --no-gui-libs : 13.472
-f --no-gui        : 13.715
-f --gui             : 13.800

--Rik

Re: More extended performance profiling

Mike Miller-4
In reply to this post by John W. Eaton
On Mon, Aug 12, 2019 at 11:00:12 -0400, John W. Eaton wrote:
> Please post the changes you made to build 3.2.4 with current tools.  I tried
> but ran into numerous problems and if you have already done the work it
> would save others time in trying to duplicate the issues.

Attached are the patches I use to build Octave 3.4.3 through 4.0.3 with
a current toolchain. I haven't tried 3.2.4 yet but would also be
interested in seeing that patch.

I plan to maintain a project to build released versions of Octave, and
incorporate these patches, useful configure options, and any other
knowledge needed to keep old versions going.

--
mike

Re: More extended performance profiling

Mike Miller-4
On Mon, Aug 12, 2019 at 10:36:39 -0700, Mike Miller wrote:
> Attached

Yeah.

--
mike

octave-old-patches.tar.gz (9K) Download Attachment

Re: building ancient Octave versions

Rik-4
In reply to this post by Mike Miller-4
On 08/12/2019 10:36 AM, Mike Miller wrote:

> On Mon, Aug 12, 2019 at 11:00:12 -0400, John W. Eaton wrote:
>> Please post the changes you made to build 3.2.4 with current tools.  I tried
>> but ran into numerous problems and if you have already done the work it
>> would save others time in trying to duplicate the issues.
> Attached are the patches I use to build Octave 3.4.3 through 4.0.3 with
> a current toolchain. I haven't tried 3.2.4 yet but would also be
> interested in seeing that patch.
>
> I plan to maintain a project to build released versions of Octave, and
> incorporate these patches, useful configure options, and any other
> knowledge needed to keep old versions going.
>
I also attached them to bug #56752. 

It is probably good enough to just test 3.2.4 against dev until most of the
discrepancy has been removed.  Ideally, one might then look at the
difference to the 3.4.X series which had the fastest runtimes in my experience.

--Rik

