Stories from static analysis of the Octave code base

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Stories from static analysis of the Octave code base

Rik-4
All,

It's been a humbling experience going through the issues detected by static
analysis of the Octave code base
(https://wiki.octave.org/PVS_static_analyzer_-_5.0_Release).  Frankly, I
thought the code base was better than it turned out to be.  Besides flat
out calling the wrong function (e.g., double_value instead of
xdouble_value), I just found and corrected a surprisingly simple but
pervasive performance hit for any exceptional value (-Inf, Inf, NaN).

The code in question is deep in liboctave at liboctave/util/lo-ieee.cc. 
The representation of an IEEE exceptional value on the particular HW Octave
is running on is found and then stored away so that subsequent uses just
retrieve that value rather than re-deriving it each time.  The
initialization code was

void
octave_ieee_init (void)
{
  bool initialized = false;

  if (! initialized)
    {
       ... code to find and store exceptional values ...
       initialized = true;
    }
}

As you can see, initialized is always false so the longer code to derive
the exceptional values is run every single time.  And the exceptional
values themselves are functions which end up calling octave_ieee_init
twice: once for the "double" value and once for the "single" value.

Also, despite their name, Octave uses exceptional values quite a bit.  For
example, graphics handles are often initialized to NaN until they can be
assigned a true handle.

In this case, the solution was quite simple.  The declaration of
"initialized" just needed the static keyword to make this variable
persistent.  I wish all bugs were that simple to correct.

--Rik

Reply | Threaded
Open this post in threaded view
|

Re: Stories from static analysis of the Octave code base

John W. Eaton
Administrator
On 1/4/19 2:35 PM, Rik wrote:
> All,
>
> It's been a humbling experience going through the issues detected by static
> analysis of the Octave code base
> (https://wiki.octave.org/PVS_static_analyzer_-_5.0_Release).  Frankly, I
> thought the code base was better than it turned out to be.  Besides flat
> out calling the wrong function (e.g., double_value instead of
> xdouble_value), I just found and corrected a surprisingly simple but
> pervasive performance hit for any exceptional value (-Inf, Inf, NaN).

Well, that's embarrassing.  I probably made the original mistake.  Oops.

But I don't think we should be too hard on ourselves.  It is a bit
disappointing to find mistakes like this, but I'm glad that things are
improving.  And I think we have made a lot of progress in the last few
years to make Octave much better than it was.

jwe

Reply | Threaded
Open this post in threaded view
|

Re: Stories from static analysis of the Octave code base

Andreas Weber-6
In reply to this post by Rik-4
Am 04.01.19 um 20:35 schrieb Rik:
> Besides flat
> out calling the wrong function (e.g., double_value instead of
> xdouble_value),

I found(1) one that wasn't detected by PVS, perhaps it was configured
without java and thus not checked?

http://hg.savannah.gnu.org/hgweb/octave/rev/45d5a4ae636b

-- Andy

[1] grep -rPHn '\.[^x][a-zA-Z0-9_]+_value +\("'

Reply | Threaded
Open this post in threaded view
|

Re: Stories from static analysis of the Octave code base

Andreas Weber-6
In reply to this post by Rik-4
Am 04.01.19 um 20:35 schrieb Rik:
> It's been a humbling experience going through the issues detected by static
> analysis of the Octave code base

IMHO the detection of string constants "true" and "false" as bool values
were also a surprising detection:

http://hg.savannah.gnu.org/hgweb/octave/rev/7a6f7a81ccd0

-- Andy

Reply | Threaded
Open this post in threaded view
|

Re: Stories from static analysis of the Octave code base

Rik-4
In reply to this post by John W. Eaton
On 01/04/2019 12:19 PM, John W. Eaton wrote:

> On 1/4/19 2:35 PM, Rik wrote:
>> All,
>>
>> It's been a humbling experience going through the issues detected by static
>> analysis of the Octave code base
>> (https://wiki.octave.org/PVS_static_analyzer_-_5.0_Release).  Frankly, I
>> thought the code base was better than it turned out to be.  Besides flat
>> out calling the wrong function (e.g., double_value instead of
>> xdouble_value), I just found and corrected a surprisingly simple but
>> pervasive performance hit for any exceptional value (-Inf, Inf, NaN).
>
> Well, that's embarrassing.  I probably made the original mistake.  Oops.
>
> But I don't think we should be too hard on ourselves.  It is a bit
> disappointing to find mistakes like this, but I'm glad that things are
> improving.  And I think we have made a lot of progress in the last few
> years to make Octave much better than it was.

Yes, on the optimistic side there has been enormous improvement.  I think
5.0 is going to be a really nice piece of software.

--Rik

Reply | Threaded
Open this post in threaded view
|

Re: Stories from static analysis of the Octave code base

Daniel Sebald
In reply to this post by Rik-4
On 01/04/2019 02:35 PM, Rik wrote:

> All,
>
> It's been a humbling experience going through the issues detected by static
> analysis of the Octave code base
> (https://wiki.octave.org/PVS_static_analyzer_-_5.0_Release).  Frankly, I
> thought the code base was better than it turned out to be.  Besides flat
> out calling the wrong function (e.g., double_value instead of
> xdouble_value), I just found and corrected a surprisingly simple but
> pervasive performance hit for any exceptional value (-Inf, Inf, NaN).
>
> The code in question is deep in liboctave at liboctave/util/lo-ieee.cc.
> The representation of an IEEE exceptional value on the particular HW Octave
> is running on is found and then stored away so that subsequent uses just
> retrieve that value rather than re-deriving it each time.  The
> initialization code was
>
> void
> octave_ieee_init (void)
> {
>    bool initialized = false;
>
>    if (! initialized)
>      {
>         ... code to find and store exceptional values ...
>         initialized = true;
>      }
> }
>
> As you can see, initialized is always false so the longer code to derive
> the exceptional values is run every single time.  And the exceptional
> values themselves are functions which end up calling octave_ieee_init
> twice: once for the "double" value and once for the "single" value.

The compiler should have issued a "test always false" type of warning
for that...

There might be a small non-bug change in order beyond what you modified
in Rev #26425.  If I understand correctly, before the change something like

x = Inf(1000);

would have a performance hit because of the reinitialization for every
instance of IEEE inf.

However, there are multiple locations where there is assurance of
initialization:

libinterp/corefcn/interpreter.cc:    octave_ieee_init ();
liboctave/util/lo-ieee.cc:  octave_ieee_init ();
liboctave/util/lo-ieee.cc:  octave_ieee_init ();
liboctave/util/lo-ieee.cc:  octave_ieee_init ();
liboctave/util/lo-ieee.cc:  octave_ieee_init ();
liboctave/util/lo-ieee.cc:  octave_ieee_init ();
liboctave/util/lo-ieee.cc:  octave_ieee_init ();

The first check is when an instance of the interpreter is created, in
the constructor.  The remaining are when there is a particular call to
actually set +/-Inf or NA.  I would think that only one of those two
groups is needed, but not both.

If the IEEE initialization is checked only once upon creating the
interpreter then Octave would avoid quite a bit of checking in the
x=inf(1000) case, but that would also mean that Octave would immediately
abort if the Inf/NA aren't present when an interpreter is created for
any particular non-Inf/NA script.

Instead, if the IEEE initialization is checked only whenever an Inf/NA
assignment is done, then Octave would have a chance to run properly so
long as it didn't need Inf/NA.


> Also, despite their name, Octave uses exceptional values quite a bit.  For
> example, graphics handles are often initialized to NaN until they can be
> assigned a true handle.

Then that makes more of a case for having the octave_ieee_init() as
mandatory in interpreter.cc.  [One could temporarily force the
occurrences of octave_ieee_init() in lo-ieee.cc to abort and see if that
function is used anywhere outside the interpreter prior to instantiation
of the interpreter.]

Something else I'm wondering about is what would happen in the case
where there is no octave_ieee_init() in interpreter::interpreter() and
the NA/Inf arises via CPU/ALU arithmetic operations (e.g., 1 DIV 0) as
opposed to assignment.  In other words, if there were no instance of
octave_ieee_init() in interpreter::interpreter, could there be
situations where the non-IEEE float fails even when none of the
abort-plus-informative-message checks happen and one simply gets
abort-via-crash?

And what about within lo-ieee.cc itself?  There are currently no
octave_ieee_init() within the routines

__lo_ieee_is_NA (double x)
__lo_ieee_is_old_NA (double x)
__lo_ieee_float_is_NA (float x)

Couldn't the user run a script, say "isinf(5)" or "isnan(20)", before
actually creating a variable with value +/-Inf or NA?

Dan

Reply | Threaded
Open this post in threaded view
|

Re: Stories from static analysis of the Octave code base

Rik-4
On 01/05/2019 08:32 AM, Daniel J Sebald wrote:

> On 01/04/2019 02:35 PM, Rik wrote:
>> All,
>>
>> It's been a humbling experience going through the issues detected by static
>> analysis of the Octave code base
>> (https://wiki.octave.org/PVS_static_analyzer_-_5.0_Release).  Frankly, I
>> thought the code base was better than it turned out to be.  Besides flat
>> out calling the wrong function (e.g., double_value instead of
>> xdouble_value), I just found and corrected a surprisingly simple but
>> pervasive performance hit for any exceptional value (-Inf, Inf, NaN).
>>
>> The code in question is deep in liboctave at liboctave/util/lo-ieee.cc.
>> The representation of an IEEE exceptional value on the particular HW Octave
>> is running on is found and then stored away so that subsequent uses just
>> retrieve that value rather than re-deriving it each time.  The
>> initialization code was
>>
>> void
>> octave_ieee_init (void)
>> {
>>    bool initialized = false;
>>
>>    if (! initialized)
>>      {
>>         ... code to find and store exceptional values ...
>>         initialized = true;
>>      }
>> }
>>
>> As you can see, initialized is always false so the longer code to derive
>> the exceptional values is run every single time.  And the exceptional
>> values themselves are functions which end up calling octave_ieee_init
>> twice: once for the "double" value and once for the "single" value.
>
> The compiler should have issued a "test always false" type of warning for
> that...
>
> There might be a small non-bug change in order beyond what you modified
> in Rev #26425.  If I understand correctly, before the change something like
>
> x = Inf(1000);
>
> would have a performance hit because of the reinitialization for every
> instance of IEEE inf.
>
> However, there are multiple locations where there is assurance of
> initialization:
>
> libinterp/corefcn/interpreter.cc:    octave_ieee_init ();
> liboctave/util/lo-ieee.cc:  octave_ieee_init ();
> liboctave/util/lo-ieee.cc:  octave_ieee_init ();
> liboctave/util/lo-ieee.cc:  octave_ieee_init ();
> liboctave/util/lo-ieee.cc:  octave_ieee_init ();
> liboctave/util/lo-ieee.cc:  octave_ieee_init ();
> liboctave/util/lo-ieee.cc:  octave_ieee_init ();
>
> The first check is when an instance of the interpreter is created, in the
> constructor.  The remaining are when there is a particular call to
> actually set +/-Inf or NA.  I would think that only one of those two
> groups is needed, but not both.
>
> If the IEEE initialization is checked only once upon creating the
> interpreter then Octave would avoid quite a bit of checking in the
> x=inf(1000) case, but that would also mean that Octave would immediately
> abort if the Inf/NA aren't present when an interpreter is created for any
> particular non-Inf/NA script.
>
> Instead, if the IEEE initialization is checked only whenever an Inf/NA
> assignment is done, then Octave would have a chance to run properly so
> long as it didn't need Inf/NA.
>
>
>> Also, despite their name, Octave uses exceptional values quite a bit.  For
>> example, graphics handles are often initialized to NaN until they can be
>> assigned a true handle.
>
> Then that makes more of a case for having the octave_ieee_init() as
> mandatory in interpreter.cc.  [One could temporarily force the
> occurrences of octave_ieee_init() in lo-ieee.cc to abort and see if that
> function is used anywhere outside the interpreter prior to instantiation
> of the interpreter.]
>
> Something else I'm wondering about is what would happen in the case where
> there is no octave_ieee_init() in interpreter::interpreter() and the
> NA/Inf arises via CPU/ALU arithmetic operations (e.g., 1 DIV 0) as
> opposed to assignment.  In other words, if there were no instance of
> octave_ieee_init() in interpreter::interpreter, could there be situations
> where the non-IEEE float fails even when none of the
> abort-plus-informative-message checks happen and one simply gets
> abort-via-crash?
>
> And what about within lo-ieee.cc itself?  There are currently no
> octave_ieee_init() within the routines
>
> __lo_ieee_is_NA (double x)
> __lo_ieee_is_old_NA (double x)
> __lo_ieee_float_is_NA (float x)
>
> Couldn't the user run a script, say "isinf(5)" or "isnan(20)", before
> actually creating a variable with value +/-Inf or NA?
>
> Dan
>

In general, absolutely to your comments.  I saw the same chance for
optimization, but time is a bit short for the 5.0 release so I didn't
pursue it.  I think the best option would be to initialize the values just
once at the very beginning of the interpreter.  At least for the executable
Octave, this would mean keeping the instance in
libinterp/corefcn/interpreter.cc, and removing the others.

The wrinkle is that liboctave is theoretically a standalone library that
others can link in to their C++ code without also linking in the
interpreter in liboctinterp.  Is there a way to guarantee when a library is
loaded that a particular piece of code is executed?  I kind of think not. 
I don't know how many programmers have actually tried to link in liboctave,
but we probably want to preserve the flexibility to do so.

The is_XXX_NA routines are safe because they eventually call routines that
either do octave_ieee_init or do not depend on it at all such as routines
from the std library.

--Rik


Reply | Threaded
Open this post in threaded view
|

Re: Stories from static analysis of the Octave code base

Andreas Weber-6
In reply to this post by Andreas Weber-6
Am 04.01.19 um 21:28 schrieb Andreas Weber:
> I found(1) one that wasn't detected by PVS, perhaps it was configured
> without java and thus not checked?
>
> http://hg.savannah.gnu.org/hgweb/octave/rev/45d5a4ae636b

Answered in https://savannah.gnu.org/bugs/?55347#comment23

Reply | Threaded
Open this post in threaded view
|

Re: Stories from static analysis of the Octave code base

al davis
In reply to this post by Rik-4
On Sat, 5 Jan 2019 10:27:56 -0800
Rik <[hidden email]> wrote:
>  Is there a way to guarantee when a library is
> loaded that a particular piece of code is executed?  I kind of think not. 

Yes ...  A static constructor.

class BOB {

BOB() {
   // code to run when library is loaded
}

~BOB() {
   // code to run when library is unloaded
}

};

static BOB x;