profiling Octave


profiling Octave

Rik-4
jwe,

I wanted to look empirically at the slowdown that has occurred
over time with Octave.

I tried oprofile first, but even with some tweaking of the sampling
frequency I couldn't get high enough resolution to really see what needs to
be optimized.

Since we're using GNU tools I thought I would just instrument the code by
compiling with -pg and then use gprof to examine things.  But run-octave
quits immediately with "Profiling time alarm".  I suspect it has something
to do with uncaught signals and the way that the octave wrapper forks
immediately.

As a final stab I tried Callgrind with 'run-octave -callgrind' but the
resulting log file is completely empty.  Probably, again, because it is
tracking the wrapper that forks rather than the true executable.

Any ideas?

--Rik



Re: profiling Octave

John W. Eaton
On 8/9/19 12:30 AM, Rik wrote:

> jwe,
>
> I wanted to look empirically at the slowdown that has occurred
> over time with Octave.
>
> I tried oprofile first, but even with some tweaking of the sampling
> frequency I couldn't get high enough resolution to really see what needs to
> be optimized.
>
> Since we're using GNU tools I thought I would just instrument the code by
> compiling with -pg and then use gprof to examine things.  But, run-octave
> quits immediately with "Profiling time alarm".  I suspect it is something
> to do with uncaught signals and the way that the octave wrapper forks
> immediately.
>
> As a final stab I tried Callgrind with 'run-octave -callgrind' but the
> resulting log file is completely empty.  Probably, again, because it is
> tracking the wrapper that forks rather than the true executable.
>
> Any ideas?

If you are just looking at the interpreter speed and not the GUI, can
you try the profiling tools with octave-cli?  It doesn't fork at startup.

jwe



Re: profiling Octave

Andreas Weber-6
In reply to this post by Rik-4
On 09.08.19 at 06:30, Rik wrote:
> Since we're using GNU tools I thought I would just instrument the code by
> compiling with -pg and then use gprof to examine things.  But, run-octave
> quits immediately with "Profiling time alarm".

If I remember correctly, you have to use "--no-pie".
Two years ago I created this to profile various versions in a Docker
container:
https://github.com/octave-de/octave-docker/tree/master/stretch_release_tarball_gperf

-- Andy


Re: profiling Octave

Rik-4
In reply to this post by John W. Eaton
On 08/08/2019 09:50 PM, John W. Eaton wrote:

> On 8/9/19 12:30 AM, Rik wrote:
>> jwe,
>>
>> I wanted to look empirically at the slowdown that has occurred
>> over time with Octave.
>>
>> I tried oprofile first, but even with some tweaking of the sampling
>> frequency I couldn't get high enough resolution to really see what needs to
>> be optimized.
>>
>> Since we're using GNU tools I thought I would just instrument the code by
>> compiling with -pg and then use gprof to examine things.  But, run-octave
>> quits immediately with "Profiling time alarm".  I suspect it is something
>> to do with uncaught signals and the way that the octave wrapper forks
>> immediately.
>>
>> As a final stab I tried Callgrind with 'run-octave -callgrind' but the
>> resulting log file is completely empty.  Probably, again, because it is
>> tracking the wrapper that forks rather than the true executable.
>>
>> Any ideas?
>
> If you are just looking at the interpreter speed and not the GUI, can you
> try the profiling tools with octave-cli?  It doesn't fork at startup.
>
Thanks, that helps.

With Callgrind, I can now get some results, although it is 70X slower than
the code running by itself so I can't do extensive tests.

When I run with gprof, there still seems to be an issue.  The flat profile is

  %   cumulative   self              self     total          
 time   seconds   seconds    calls  Ts/call  Ts/call  name   
  0.00      0.00     0.00        2     0.00     0.00 
__gnu_cxx::__enable_if<std::__is_char<char>::__value, bool>::__type
std::operator==<char>(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&)
  0.00      0.00     0.00        1     0.00     0.00  octave_hg_id[abi:cxx11]()
  0.00      0.00     0.00        1     0.00     0.00  main

for "gprof -p src/.libs/octave-cli".

This does seem to correspond with main-cli.cc which only contains main()
and the function check_hg_versions().  Is there another image file I can
give gprof?

--Rik


Re: profiling Octave

Carnë Draug
In reply to this post by Rik-4
On Fri, 9 Aug 2019 at 05:30, Rik <[hidden email]> wrote:
> I wanted to look empirically at the slowdown that has occurred
> over time with Octave.
>
> [...]
>
> Any ideas?

Have we actually established that Octave has indeed been running
slower over time?  There's been a lot of talk about it, but all the
measurements I have seen were from running the test suite and not a
real Octave program.

I think that before worrying about what tools to use to find where the
performance drop is, maybe we should first get a set of programs that
can be used to confirm this slowdown.

For what it's worth, I too have the feeling that Octave has been getting
slower over time; I just haven't gotten around to confirming this
with proper methods.
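
Just to illustrate the kind of thing I mean, a minimal harness could look
something like the sketch below (the file name, the workloads, and the
repetition count are only placeholders, not a proposed suite).  The same
script could be run unchanged under each Octave version and the medians
compared:

## bench_suite.m -- illustrative sketch only
reps = 5;

## Workload 1: interpreter loop overhead (a scalar accumulation loop)
t1 = zeros (1, reps);
for r = 1:reps
  tic;
  a = 0;
  for i = 1:1e5
    a = a + 1;
  end
  t1(r) = toc;
end

## Workload 2: element-wise indexed assignment into a preallocated matrix
t2 = zeros (1, reps);
for r = 1:reps
  tic;
  b = zeros (300);
  for j = 1:300
    for k = 1:300
      b(k,j) = j + k;
    end
  end
  t2(r) = toc;
end

printf ("loop overhead:      median %.4f s over %d runs\n", median (t1), reps);
printf ("indexed assignment: median %.4f s over %d runs\n", median (t2), reps);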


Re: profiling Octave

Dmitri A. Sergatskov
On Fri, Aug 9, 2019 at 2:16 PM Carnë Draug <[hidden email]> wrote:

>
> On Fri, 9 Aug 2019 at 05:30, Rik <[hidden email]> wrote:
> > I wanted to look empirically at the slowdown that has occurred
> > over time with Octave.
> >
> > [...]
> >
> > Any ideas?
>
> Have we actually established that Octave has indeed been running
> slower over time?  There's been a lot of talk about it, but all the
> measurements I have seen where by running the test suite and not a
> real Octave program.
>
> I think that before worrying about what tools to check where the
> performance drop is, maybe we should get a set of programs that can be
> used to confirm this slowdown.
>
> For what is worth, I too have the feeling that Octave has been getting
> slower over time, I just haven't actually got around to confirm this
> with proper methods.
>

I ran a somewhat modified version of the sciviews benchmark (size of the
test matrices increased) and I do not see much difference over the last
few years:

octave-4.2.0-rc2 (2016-09-21):
   Octave Benchmark 2
   ==================
Number of times each test is run__________________________: 3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 5000x5000 matrix (sec): 1.1713
5000x5000 normal distributed random matrix ^1001____ (sec): 1.9489
Sorting of 9,000,000 random values__________________ (sec): 1.0415
5000x5000 cross-product matrix (b = a' * a)_________ (sec): 2.7143
Linear regression over 5000x5000 matrix (c = a \\ b') (sec): 2.1651
                  ------------------------------------------------------
                Trimmed geom. mean (2 extremes eliminated): 1.7034

   II. Matrix functions
   --------------------
FFT over 9,000,001 random values____________________ (sec): 0.80781
Eigenvalues of a 900x900 random matrix ______________ (sec): 2.4541
Determinant of a 5000x5000 random matrix____________ (sec): 1.2405
Cholesky decomposition of a 5000x5000 matrix________ (sec): 0.5662
Inverse of a 5000x5000 random matrix________________ (sec): 3.9265
                  ------------------------------------------------------
                Trimmed geom. mean (2 extremes eliminated): 1.3498

   III. Programmation
   ------------------
9,500,000 Fibonacci numbers calculation (vector calc)_ (sec): 1.5602
Creation of a 5250x5250 Hilbert matrix (matrix calc) (sec): 0.48573
Grand common divisors of 500,000 pairs (recursion)___ (sec): 0.17606
Creation of a 620x620 Toeplitz matrix (loops)_______ (sec): 5.363
Escoufier's method on a 97x97 matrix (mixed)________ (sec): 5.8139
                  ------------------------------------------------------
                Trimmed geom. mean (2 extremes eliminated): 1.5959


Total time for all 15 tests_________________________ (sec): 31.4351
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.5424
                      --- End of test ---

Today's (2019-08-09) dev (hgid = 2c4759c8239c @)

   Octave Benchmark 2
   ==================
Number of times each test is run__________________________: 3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 5000x5000 matrix (sec): 1.1394
5000x5000 normal distributed random matrix ^1001____ (sec): 0.93096
Sorting of 9,000,000 random values__________________ (sec): 1.0292
5000x5000 cross-product matrix (b = a' * a)_________ (sec): 1.4269
Linear regression over 5000x5000 matrix (c = a \\ b') (sec): 1.2995
                  ------------------------------------------------------
                Trimmed geom. mean (2 extremes eliminated): 1.1508

   II. Matrix functions
   --------------------
FFT over 9,000,001 random values____________________ (sec): 0.81812
Eigenvalues of a 900x900 random matrix ______________ (sec): 0.64627
Determinant of a 5000x5000 random matrix____________ (sec): 1.3008
Cholesky decomposition of a 5000x5000 matrix________ (sec): 0.57645
Inverse of a 5000x5000 random matrix________________ (sec): 3.7264
                  ------------------------------------------------------
                Trimmed geom. mean (2 extremes eliminated): 0.8827

   III. Programmation
   ------------------
9,500,000 Fibonacci numbers calculation (vector calc)_ (sec): 0.86849
Creation of a 5250x5250 Hilbert matrix (matrix calc) (sec): 0.52379
Grand common divisors of 500,000 pairs (recursion)___ (sec): 0.16265
Creation of a 620x620 Toeplitz matrix (loops)_______ (sec): 4.4124
Escoufier's method on a 97x97 matrix (mixed)________ (sec): 5.966
                  ------------------------------------------------------
                Trimmed geom. mean (2 extremes eliminated): 1.2614


Total time for all 15 tests_________________________ (sec): 24.8273
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.0861
                      --- End of test ---

This is on an i7-2600 computer.

Dmitri.
--


Re: profiling Octave

Rik-4
In reply to this post by Carnë Draug
On 08/09/2019 12:16 PM, Carnë Draug wrote:
> On Fri, 9 Aug 2019 at 05:30, Rik <[hidden email]> wrote:
>> I wanted to look empirically at the slowdown that has occurred
>> over time with Octave.
>>
>> [...]
>>
>> Any ideas?
>
> Have we actually established that Octave has indeed been running
> slower over time?  There's been a lot of talk about it, but all the
> measurements I have seen were from running the test suite and not a
> real Octave program.
>
> I think that before worrying about what tools to use to find where the
> performance drop is, maybe we should first get a set of programs that
> can be used to confirm this slowdown.
>
> For what it's worth, I too have the feeling that Octave has been getting
> slower over time; I just haven't gotten around to confirming this
> with proper methods.

Always good to check assumptions, but unfortunately I have been testing regularly with the same code.  I didn't start out intending this to be a performance benchmark so it is rather limited, but I do have historical data captured with it.

The m-file is tst_for_loop.m and is the single line:

a = 1; b = 1; tic; for i=1:1000; for j=1:1000; a = a + b + 1.0; end; end; toc

Because I have changed computers over time I have a segmented profile history.  For the distant past I had

Version   3.2.4     3.4.3     3.6.4     3.8.0     3.8.1
Mean (s)  2.24752   1.71794   2.96738   3.45597   3.16926

Right now I see

Version   4.2.1     4.4.1     5.1.0     dev (6.1.0)
Mean (s)  1.08590   1.38079   1.37580   1.41128

Part of the problem is that we have regression testing for functionality, but no regression tests for performance.  Thus, we can accidentally change Octave's performance drastically without noticing.  This bug report captured one instance in which the performance decline was noted, and because it was noted it got fixed (https://savannah.gnu.org/bugs/?func=detailitem&item_id=52809).
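
As a sketch of what a performance regression test might look like (the
baseline file name and the 20% threshold below are arbitrary choices for
illustration, not an existing facility):

## bm_regression.m -- illustrative sketch only
tic;
a = 1; b = 1;
for i = 1:1000
  for j = 1:1000
    a = a + b + 1.0;
  end
end
elapsed = toc;

baseline_file = "bm_baseline.txt";   ## hypothetical stored baseline
if (exist (baseline_file, "file"))
  baseline = load ("-ascii", baseline_file);
  if (elapsed > 1.2 * baseline)
    warning ("performance regression: %.3f s vs. baseline %.3f s", ...
             elapsed, baseline);
  end
else
  ## first run records the baseline
  save ("-ascii", baseline_file, "elapsed");
end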

--Rik

Re: profiling Octave

Dmitri A. Sergatskov


On Fri, Aug 9, 2019 at 3:27 PM Rik <[hidden email]> wrote:
> Always good to check assumptions, but unfortunately I have been testing
> regularly with the same code.  I didn't start out intending this to be a
> performance benchmark so it is rather limited, but I do have historical
> data captured with it.
>
> The m-file is tst_for_loop.m and is the single line:
>
> a = 1; b = 1; tic; for i=1:1000; for j=1:1000; a = a + b + 1.0; end; end; toc
>
> [...]

The code for Toeplitz matrix is:
cumulate = 0; b = 0;
for i = 1:runs
  b = zeros(620, 620);
  tic;
    for j = 1:620
      for k = 1:620
        b(k,j) = abs(j - k) + 1;
      end
    end
  timing = toc;
  cumulate = cumulate + timing;
end
timing = cumulate/runs;
times(4, 3) = timing;

So it is mostly loops.
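
For comparison (not part of the benchmark, just an aside), the same matrix
can be built with no interpreted loops at all, e.g. with broadcasting, and
the gap between the two timings gives a rough idea of how much of this test
is interpreter/loop overhead rather than arithmetic:

tic;
b2 = abs ((1:620) - (1:620)') + 1;   ## element (k,j) = abs(j - k) + 1, same values as the loops
toc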

Dmitri.
--

 

Re: profiling Octave

Rik-4
On 08/09/2019 02:00 PM, Dmitri A. Sergatskov wrote:


> [...]
>
> The code for Toeplitz matrix is:
>
> cumulate = 0; b = 0;
> for i = 1:runs
>   b = zeros(620, 620);
>   tic;
>     for j = 1:620
>       for k = 1:620
>         b(k,j) = abs(j - k) + 1;
>       end
>     end
>   timing = toc;
>   cumulate = cumulate + timing;
> end
> timing = cumulate/runs;
> times(4, 3) = timing;
>
> So it is mostly loops.
>
> Dmitri.


I changed the code slightly to make this benchmark file, bm_toeplitz.m

runs = 5;

cumulate = 0; b = 0;
for i = 1:runs
  b = zeros(620, 620);
  tic;
    for j = 1:620
      for k = 1:620
        b(k,j) = abs(j - k) + 1;
      end
    end
  timing = toc;
  cumulate = cumulate + timing;
end

## Total time
cumulate

Results are the same as the simpler loop test, about a 25% slowdown from 4.2 to 4.4.

Version       4.2.1    4.4.1    5.1.0
cumulate (s)  10.544   13.052   13.481

--Rik


Re: profiling Octave

Dmitri A. Sergatskov


On Sat, Aug 10, 2019 at 1:31 PM Rik <[hidden email]> wrote:
> I changed the code slightly to make this benchmark file, bm_toeplitz.m
>
> runs = 5;
>
> cumulate = 0; b = 0;
> for i = 1:runs
>   b = zeros(620, 620);
>   tic;
>     for j = 1:620
>       for k = 1:620
>         b(k,j) = abs(j - k) + 1;
>       end
>     end
>   timing = toc;
>   cumulate = cumulate + timing;
> end
>
> ## Total time
> cumulate
>
> Results are the same as the simpler loop test, about a 25% slowdown from 4.2 to 4.4.
>
> Version       4.2.1    4.4.1    5.1.0
> cumulate (s)  10.544   13.052   13.481


I compiled 4.2.2 and can confirm that (on an i7-2600k):

4.2.2: cumulate =  17.162
6.0.0 (dev: 2c4759c8239c @): cumulate =  21.816



Dmitri.
--

Re: profiling Octave

Daniel Sebald
On 8/10/19 5:08 PM, Dmitri A. Sergatskov wrote:

>
>
> On Sat, Aug 10, 2019 at 1:31 PM Rik <[hidden email]> wrote:
>
>     I changed the code slightly to make this benchmark file, bm_toeplitz.m
>
>     runs = 5;
>
>     cumulate = 0; b = 0;
>     for i = 1:runs
>        b = zeros(620, 620);
>        tic;
>          for j = 1:620
>            for k = 1:620
>              b(k,j) = abs(j - k) + 1;
>            end
>          end
>        timing = toc;
>        cumulate = cumulate + timing;
>     end
>
>     ## Total time
>     cumulate
>
>     Results are the same as the simpler loop test, about a 25% slowdown
>     from 4.2 to 4.4.
>
>     4.2.1    4.4.1    5.1.0
>     10.544   13.052   13.481
>
>
>
> I compiled 4.2.2  and can confirm that (on i7-2600k):
>
> 4.2.2: cumulate =  17.162
> 6.0.0 (dev: 2c4759c8239c @): cumulate =  21.816
>
>
>     --Rik
>
>
> Dmitri.
> --

You've picked an instruction format heavily dependent on the efficiency
of the interpreter, correct?

Is there any part of Octave around the interpreter engine that has changed
from one version to another?  (I'm recalling some kind of open-source
parser or compiler as one of the libraries included.)  That is, is it an
Octave issue, or could it be some outside library's issue?

Regarding the interpreting result and how efficient it might be, is
there anywhere in the Octave code where one could dump out, say, the
size of the interpreter code/instruction list/or anything that might
give an indication of efficiency?  That is, say there is a point where
the parser does its job, and it is then a case of executing some type
of primitive "language".  We could then put a printf for the size of the
primitive list in both 4.2.1 and 4.4.1.

Using a timer *inside* the code rather than an external one might help as well.
  That is, say there is such a point I described above.  Printing out
times at various locations along the way

Start
End of interpreter
End of execution phase

might reveal something.

One last question: is just-in-time compiling supposed to improve this by
cutting down on the amount of interpretation needed in the double
looping above?

Dan


Re: profiling Octave

Rik-4
On 08/10/2019 09:48 PM, Daniel J Sebald wrote:
>
> You've picked an instruction format heavily dependent on the efficiency
> of the interpreter, correct?

Correct.  The libraries behind Octave, like BLAS and LAPACK, are changing
very slowly and mostly just for the occasional bug fix.  Thus, I don't
foresee any slowdowns in the code that Octave calls.  Hence, I think any
slowdown over time is in Octave's sequence of parsing code, building an
intermediate representation, and then evaluating that representation.

>
> Is there any part of Octave around the interpreter engine that has gone
> from one version to another?  (I'm recalling some kind of open-source
> parser or compiler as one of the libraries included.)  That is, is it an
> Octave issue, or could it be some outside library's issue?

JWE has changed the parser from a pull parser to a push parser (or maybe
the other way around?) over time.  My hunch is that it is not parsing (done
by lex and bison--old, well-established tools), but evaluation that has slowed.

>
> Regarding the interpreting result and how efficient it might be, is there
> anywhere in the Octave code where one could dump out, say, the size of
> the interpreter code/instruction list/or anything that might give an
> indication of efficiency?  That is, say there is a point where the parser
> does it's job, and it is then a case of executing some type of primitive
> "language".  We could then put a printf for the size of the primitive
> list in both 4.2.1 and 4.4.1.

This is a bit beyond me, but I think so.  I believe Octave constructs a
tree of nodes from the source code and then proceeds to evaluate them.  So
you are correct, if

y = x + 1

generates the same primitive actions

1) create temporary octave_value from scalar 1
2) fetch x
3) perform operator function plus (x, tmp_value)
4) assign octave_value output to y

between versions, but one version takes longer than another, then the
conclusion must be that the individual steps are taking longer.
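
One crude way to check that (just a sketch; the iteration count is
arbitrary) would be to time the identical statement in a tight loop under
each version, since the statement, and therefore its primitive actions,
is the same every time through:

x = 1;
N = 1e6;
tic;
for i = 1:N
  y = x + 1;    ## same primitive actions on every iteration
end
printf ("%.3g seconds per evaluate/assign cycle\n", toc / N);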
 

>
> Using a timer *inside* the code rather than external might help as well.
>  That is, say there is such a point I described above.  Printing out
> times at various locations along the way
>
> Start
> End of interpreter
> End of execution phase
>
> might reveal something.

We might have to go that route.  I wasn't able to get solid results from
oprofile or gprof, and callgrind just left me confused.

>
> One last question, is just-in-time compiling supposed to improve this by
> cutting down on the amount of interpretation needed in the double looping
> above?

I think so.  The idea is to do the parsing to an intermediate form.  Then, instead
of calling evaluate on the IR each time through the loop, continue further
and actually compile the IR to true executable code that can just be run,
not evaluated.

--Rik



Re: profiling Octave

Daniel Sebald
On 8/12/19 5:08 PM, Rik wrote:
> On 08/10/2019 09:48 PM, Daniel J Sebald wrote:
[snip]
>> Is there any part of Octave around the interpreter engine that has gone
>> from one version to another?  (I'm recalling some kind of open-source
>> parser or compiler as one of the libraries included.)  That is, is it an
>> Octave issue, or could it be some outside library's issue?
>
> JWE has changed the parser from a pull parser to a push parser (or maybe
> the other way around?) over time.  My hunch is that it is not parsing (done
> by lex and bison--old, well-established tools), but evaluation that has slowed.

Yeah, lex.  That is what I was trying to think of.


>> Regarding the interpreting result and how efficient it might be, is there
>> anywhere in the Octave code where one could dump out, say, the size of
>> the interpreter code/instruction list/or anything that might give an
>> indication of efficiency?  That is, say there is a point where the parser
>> does it's job, and it is then a case of executing some type of primitive
>> "language".  We could then put a printf for the size of the primitive
>> list in both 4.2.1 and 4.4.1.
>
> This is a bit beyond me, but I think so.  I believe Octave constructs a
> tree of nodes from the source code and then proceeds to evaluate them.  So
> you are correct, if
>
> y = x + 1
>
> generates the same primitive actions
>
> 1) create temporary octave_value from scalar 1
> 2) fetch x
> 3) perform operator function plus (x, tmp_value)
> 4) assign octave_value output to y
>
> between versions, but one version takes longer than another, then the
> conclusion must be that the individual steps are taking longer.

Interesting about the push/pull parser.  Considering steps like above,
could reference counting have been affected in some way?  Say changing
the parser design resulted in the reference count going down to zero
when it shouldn't have so that there is more memory reallocation than in
the past?  Just a theory.  Something like memory allocation should be
tracked in valgrind, I would think.

Dan


Re: profiling Octave

John W. Eaton
In reply to this post by Rik-4
On 8/12/19 5:08 PM, Rik wrote:

> JWE has changed the parser from a pull parser to a push parser (or maybe
> the other way around?) over time.  My hunch is that it is not parsing (done
> by lex and bison--old, well-established tools), but evaluation that has slowed.

We currently ask bison to generate both push and pull parsers but
only use the pull parser (the parser calls lex, which calls a
function to gather input from the user or a file).  We would only use
the push parser and lexer if we change the design of the GUI terminal
widget as we have discussed previously.

Compared to evaluation, parsing is pretty fast and only happens once
when a function or script is first loaded by the interpreter.  You can
time how long it takes to parse a single function or script by doing

   tic, __parse_file__ ("bm_toeplitz.m"), toc

On my system, this reports a time of 0.000945091 seconds.
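
So for a file like bm_toeplitz.m the parse time should be negligible next
to the evaluation time; a quick comparison (assuming bm_toeplitz.m is in
the current directory) is

   tic; __parse_file__ ("bm_toeplitz.m"); t_parse = toc
   tic; bm_toeplitz; t_run = toc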

> This is a bit beyond me, but I think so.  I believe Octave constructs a
> tree of nodes from the source code and then proceeds to evaluate them.  So
> you are correct, if
>
> y = x + 1
>
> generates the same primitive actions
>
> 1) create temporary octave_value from scalar 1

The constant is already stored in the parse tree as an octave_value object.

> 2) fetch x

This should be fairly quick for variables since they are the first thing
we look for when resolving symbols.

> 3) perform operator function plus (x, tmp_value)

Somewhat time consuming because we have to dispatch to the correct
addition function based on the types of the values.

> 4) assign octave_value output to y

Simple assignment to a variable with no index should be the fastest, but
there may still be a lot of room for improvement.
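
As a rough way to see the relative cost of the two assignment forms (the
loop count here is arbitrary and the absolute numbers will vary by machine
and version):

x = 1;
N = 1e6;

tic;
for i = 1:N
  y = x + 1;      ## plain assignment to a variable
end
t_plain = toc

y = 0;
tic;
for i = 1:N
  y(1) = x + 1;   ## indexed assignment, goes through the indexing machinery
end
t_indexed = toc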

> between versions, but one version takes longer than another, then the
> conclusion must be that the individual steps are taking longer.

We may also be doing more work because we have to handle new features
(for example, class dispatch or import declarations when looking for
functions).  Maybe some of the things we are doing now are unnecessary
in many cases but we are doing them always.  It's not always easy to see
what operations can be skipped and still provide the correct behavior.

> I think so.  Idea is to do the parsing to intermediate form.  Then, instead
> of calling evaluate on the IR each time through the loop, continue further
> and actually compile the IR to true executable code that can just be run,
> not evaluated.

Yes, and the general idea is that compiling has to be done "just in
time" because without type info you can't compile the language and type
info is not known until a function is called with actual values.  Then
you cache the compiled code along with the types of the arguments so
that it can be reused if you encounter another call to the same function
with the same argument types again.

A generic compiler that generates code that uses octave_value objects
would be of almost no use because one of the most time consuming parts
of evaluation is dispatching based on types.  Using octave_value objects
means you still do all of that work.

jwe