sparse matrices and error handling

sparse matrices and error handling

Paul Kienzle-2
Recent changes in Octave have changed the way critical errors are handled.
It used to be that when code encountered an unrecoverable error, it could
use jump_to_top_level to return to the octave prompt.  That is no longer
the case, so sparse matrix error handling will have to be revisited.  

I'll see to what extent I can turn it into error propagation code.  I may
have to use setjmp/longjmp to catch errors from deep within superlu.  Andy,
do you have any opinions on how this should be done?  I'm thinking of
changing the internal functions so that instead of returning matrices, they
take a matrix reference as a parameter and modify that, and return an
error flag.
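
(Editorial illustration: a minimal sketch of the "matrix reference plus error flag" style described above. MatrixSketch, SketchStatus, and add_into are hypothetical names chosen for this example, and the dense storage is only a stand-in; none of this is the actual octave-forge sparse code.)

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense stand-in for a sparse matrix type (illustration only).
struct MatrixSketch
{
  std::size_t rows, cols;
  std::vector<double> values;

  MatrixSketch (std::size_t r, std::size_t c)
    : rows (r), cols (c), values (r * c, 0.0) { }
};

// Hypothetical status codes returned instead of jumping to the top level.
enum SketchStatus { SKETCH_OK = 0, SKETCH_DIM_MISMATCH = 1 };

// Write the sum into `result` and return a flag; `result` is left
// untouched when the dimensions do not agree.
SketchStatus add_into (const MatrixSketch& a, const MatrixSketch& b,
                       MatrixSketch& result)
{
  if (a.rows != b.rows || a.cols != b.cols)
    return SKETCH_DIM_MISMATCH;

  assert (a.values.size () == b.values.size ());

  MatrixSketch tmp (a.rows, a.cols);
  for (std::size_t i = 0; i < tmp.values.size (); i++)
    tmp.values[i] = a.values[i] + b.values[i];

  result = tmp;
  return SKETCH_OK;
}
```

The caller has to test the returned flag after every call, which is the same bookkeeping burden as checking error_state, only made explicit in the function signature.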

John, does error() now throw an exception so that oct-files no longer
have to check for error_state after every operation?  This is a
rhetorical question, since I see from the code that error_state is
alive and well.  This business of propagating error_state is why I
was looking at C++ exception handling a couple of years ago.  Between
that and better memory recovery, I'm almost convinced that C++ exceptions
are a good idea.  Too bad they aren't integrated with POSIX signal
handling in any meaningful way 8-(


Paul Kienzle
[hidden email]

On Sun, Nov 17, 2002 at 01:26:33AM +0000, Ole J. Hagen wrote:

> Hi, again.
>
> What does the jump_to_top_level function do, and where is it declared? I
> couldn't find it, but I did find OCTAVE_JUMP_TO_TOP_LEVEL in either
> sighandler.h or sighandler.cc. Do you have an explanation for that? Since it
> is in a sighandler file, is it part of the signal handler? That doesn't make
> sense either, since it could be found in util.h or util.cc.
>
>
> By the way; I have checked out the sparse-library and it
> works.....YIPPI-KAY-AY
>
>
> Ole
>
> On Sunday 17 November 2002 00:10, you wrote:
> > I already removed some jump_to_top_level calls.  It didn't occur to me to look
> > in a .h file.  They really have no business being in user functions.  It
> > is going to take some thought to remove them.
> >
> > - Paul
> >
> > On Sat, Nov 16, 2002 at 11:49:03PM +0000, Ole J. Hagen wrote:
> > > Hi again...
> > > > It seems to me that the function "jump_to_top_level" causes the
> > > > problem. I commented it out in make_sparse.h! I believe
> > > > jump_to_top_level has been removed from the CVS version of Octave, or is
> > > > conditionally compiled by the preprocessor in the CVS version of Octave.
> > >
> > >
> > >
> > > Like this:
> > >
> > >
> > > #define SP_FATAL_ERR(str) { error("sparse: %s", str);  \
> > >  /*                           jump_to_top_level ();*/  \
> > >                             panic_impossible (); }
> > >
> > > > When I made this change, it compiled successfully. Where can I find the
> > > > definition of jump_to_top_level()?
> > >
> > > Cheers,
> > >
> > > Ole
> > >
> > >
> > > Here is the error-message from SuperLU package...
> > >
> > > make[2]: Leaving directory `/home/olejh/Octave/octave-forge/main/signal'
> > > make[2]: Entering directory `/home/olejh/Octave/octave-forge/main/sparse'
> > > > mkoctfile -DHAVE_OCTAVE_21 -s -v -c sparse_ops.cc -ISuperLU/SRC/ -ISuperLU/CBLAS  -o sparse_ops.o
> > > > g++ -c -fPIC -I/usr/include/octave-2.1.39 -I/usr/include/octave-2.1.39/octave -I/usr/include -mieee-fp -march=pentium4 -mcpu=pentium4 -O3 -pipe -fomit-frame-pointer -ISuperLU/SRC/ -ISuperLU/CBLAS -DHAVE_OCTAVE_21 sparse_ops.cc -o sparse_ops.o
> > > > cc1plus: warning: changing search order for system directory "/usr/include"
> > > > cc1plus: warning:   as it has already been specified as a non-system directory
> > > > sparse_ops.cc: In function `ColumnVector sparse_index_oneidx(SuperMatrix, idx_vector)':
> > > > sparse_ops.cc:313: `jump_to_top_level' undeclared (first use this function)
> > > > sparse_ops.cc:313: (Each undeclared identifier is reported only once for each function it appears in.)
> > > make[2]: *** [sparse_ops.o] Error 1
> > > make[2]: Leaving directory `/home/olejh/Octave/octave-forge/main/sparse'
> > > make[1]: *** [sparse/] Error 2
> > > make[1]: Leaving directory `/home/olejh/Octave/octave-forge/main'
> > > make: *** [main/] Error 2
> > >



sparse matrices and error handling

John W. Eaton-6
On 16-Nov-2002, Paul Kienzle <[hidden email]> wrote:

| Recent changes in Octave have changed the way critical errors are handled.
| It used to be that when code encountered an unrecoverable error, it could
| use jump_to_top_level to return to the octave prompt.  That is no longer
| the case, so sparse matrix error handling will have to be revisited.  

OK.  I'm still planning to integrate the sparse matrix code with
Octave, but I haven't really looked at it, so I don't know how much
work that will be or when it will happen, but this project is near the
top of my list.

| I'll see to what extent I can turn it into error propagation code.  I may
| have to use setjmp/longjmp to catch errors from deep within superlu.  Andy,
| do you have any opinions on how this should be done?  I'm thinking of
| changing the internal functions so that instead of returning matrices, they
| take a matrix reference as a parameter and modify that, and return an
| error flag.

If you can't avoid longjmp, then you should set it up as a call to
foreign code and use the octave_jump_to_enclosing_context when you
need to bail out.  There are macros to do this in quit.h for
non-fortran code.  I'm currently using them around the call to
readline for user input.

| John, does error() now throw an exception so that oct-files no longer
| have to check for error_state after every operation?  This is a
| rhetorical question, since I see from the code that error_state is
| alive and well.

Right, as you discovered, error still uses the same bad "call and check
error status everywhere" style of exception handling.  This should
change, but I think we need to come up with a reasonable migration
strategy if we can, so that existing code that checks error_state will
continue to work along with code that uses exceptions.

The other thing related to exception handling that still needs to
be fixed is unwind_protect, but I'm not sure that a good migration
strategy is needed for that.  I'd guess that there isn't too much code
that uses it (apart from the Octave distribution, which will be
updated when any change is made).

| This business of propagating error_state is why I
| was looking at C++ exception handling a couple of years ago.  Between
| that and better memory recovery, I'm almost convinced that C++ exceptions
| are a good idea.

What would it take to more than almost convince you?

| Too bad they aren't integrated with POSIX signal
| handling in any meaningful way 8-(

Yes, that would be helpful.

jwe



Re: sparse matrices and error handling

Paul Kienzle-2
On Sun, Nov 17, 2002 at 12:10:57PM -0600, John W. Eaton wrote:

>
> | I'll see to what extent I can turn it into error propagation code.  I may
> | have to use setjmp/longjmp to catch errors from deep within superlu.  Andy,
> | do you have any opinions on how this should be done?  I'm thinking of
> | changing the internal functions so that instead of returning matrices, they
> | take a matrix reference as a parameter and modify that, and return an
> | error flag.
>
> If you can't avoid longjmp, then you should set it up as a call to
> foreign code and use the octave_jump_to_enclosing_context when you
> need to bail out.  There are macros to do this in quit.h for
> non-fortran code.  I'm currently using them around the call to
> readline for user input.

IIRC, SuperLU needs to be compiled with an error function defined which
prints the error and exits.  I don't believe it allows for an error
handler which returns.  It's been a long time since I looked at it though.

BTW, isn't it a misnomer to refer to the setjmp/longjmp setup as only
applying to foreign code?  It seems to me that you would want it around
any code which can take more than 1/2 second or so to complete, even
if it is coded in C++ and is part of liboctave.  If you don't have it, then
how can you break out of such code since Ctrl-C can't directly throw an
exception?

Also, I saw one post suggesting that the gcc java and ada implementations
could convert signals into exceptions, but I didn't see a follow-up
explanation.  Perhaps we should post to the gcc list?

> | This business of propagating error_state is why I
> | was looking at C++ exception handling a couple of years ago.  Between
> | that and better memory recovery, I'm almost convinced that C++ exceptions
> | are a good idea.
>
> What would it take to more than almost convince you?

The usual question of cost and convenience.

In order to properly collect resources after exceptions you have to write
your code just so.  I don't yet have a feel for how inconvenient "just so"
is.  Is it much less hassle than unwind_protect?

I would also like to know how much it costs compared to unwind_protect.

Paul Kienzle
[hidden email]



Re: sparse matrices and error handling

John W. Eaton-6
On 17-Nov-2002, Paul Kienzle <[hidden email]> wrote:

| IIRC, SuperLU needs to be compiled with an error function defined which
| prints the error and exits.  I don't believe it allows for an error
| handler which returns.  It's been a long time since I looked at it though.

Then I think the right thing to do is use setjmp and
octave_jump_to_enclosing_context from the error handler, though more
cleanup may be needed if SuperLU allocates resources.  If it is not
possible to do that without modifying SuperLU, then we may want to
consider distributing a modified version with Octave, in which case we
can probably avoid the need for setjmp/longjmp.
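
(Editorial illustration: a minimal sketch of jumping out of a foreign error handler with setjmp/longjmp and then raising a C++ exception. The "foreign" routine and its handler hook are hypothetical stand-ins, not SuperLU's real interface, and the sketch does not use octave_jump_to_enclosing_context or the quit.h macros.)

```cpp
#include <cassert>
#include <csetjmp>
#include <stdexcept>

extern "C" {
  // Hypothetical stand-in for a C library whose fatal-error hook must not
  // return; this is not SuperLU's actual interface.
  typedef void (*foreign_handler_t) (const char *);
  static foreign_handler_t foreign_error_handler = nullptr;

  static int foreign_solve (int n)
  {
    assert (foreign_error_handler != nullptr);
    if (n < 0)
      foreign_error_handler ("negative size");  // expected not to return
    return 2 * n;
  }
}

static std::jmp_buf foreign_context;             // where to resume on error
static const char *foreign_message = nullptr;    // last message reported

// Handler installed into the foreign code: record the message, jump back.
extern "C" void jump_out_handler (const char *msg)
{
  foreign_message = msg;
  std::longjmp (foreign_context, 1);
}

// Call the foreign routine; turn its fatal error into a C++ exception.
// No objects with destructors live between setjmp and longjmp.
int guarded_solve (int n)
{
  foreign_error_handler = jump_out_handler;

  if (setjmp (foreign_context) != 0)
    throw std::runtime_error (foreign_message);

  return foreign_solve (n);
}
```

Anything the foreign code allocated before calling the handler is not released by the jump, which is the extra cleanup mentioned above.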

| BTW, isn't it a misnomer to refer to the setjmp/longjmp setup as only
| applying to foreign code?  It seems to me that you would want it around
| any code which can take more than 1/2 second or so to complete, even
| if it is coded in C++ and is part of liboctave.  If you don't have it, then
| how can you break out of such code since Ctrl-C can't directly throw an
| exception?

In code that we can modify (all of Octave) we can use the new
OCTAVE_QUIT macro.  It checks to see if an interrupt has occurred, and
if so, throws an exception.  So everything that is not part of Octave
and that doesn't allow us to insert checks for the interrupt state and
maybe throw exceptions is "foreign code" that has to handle interrupts
some other way.  The simplest way seems to be a longjmp out of a
signal handler and back to the location where the foreign code was
called, then throw an exception.
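
(Editorial illustration: one way a "check for an interrupt, then throw" test of this kind could look. The names interrupt_exception, note_interrupt, and throw_if_interrupted are assumptions made for this sketch; it is not the definition of OCTAVE_QUIT.)

```cpp
#include <cassert>
#include <csignal>

// Hypothetical exception type thrown when an interrupt has been noted.
struct interrupt_exception { };

// Flag written by the signal handler and polled by ordinary code.
static volatile std::sig_atomic_t interrupt_pending = 0;

// The handler only records the interrupt; it does no unwinding itself.
extern "C" void note_interrupt (int)
{
  interrupt_pending = 1;
}

// Polled from code we control: clear the flag and throw if it was set.
inline void throw_if_interrupted ()
{
  if (interrupt_pending)
    {
      interrupt_pending = 0;
      throw interrupt_exception ();
    }
}

// A loop that checks for an interrupt once per iteration.
long sum_until_interrupted (long n)
{
  assert (n >= 0);
  long total = 0;
  for (long i = 0; i < n; i++)
    {
      throw_if_interrupted ();
      total += i;
    }
  return total;
}
```

After installing the handler with std::signal (SIGINT, note_interrupt), a Ctrl-C during the loop surfaces as an ordinary exception at the next check, so the stack unwinds normally.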

| The usual question of cost and convenience.
|
| In order to properly collect resources after exceptions you have to write
| your code just so.

I'd say it is much harder to get this right without exception
handling, because you would have to store a pointer to everything you
need to free, even for local objects.  For example, we currently have
a lot of code that does something like

  some_fun (...) {
    ...
    Matrix mx (m, n);
    ...
  }

and as it is now, the memory allocated for mx will not be deleted if
we longjmp while this function is active.  To handle this, we would
need to write something like

  some_fun (...) {
    // Need to mark beginning of frame in unwind_protect list.
    unwind_protect::begin_frame ();
    ...
    Matrix mx (m, n);
    unwind_protect_local_matrix (mx);
    ...
    // Would not delete here, the Matrix destructor handles that, so
    // we need to discard info about locally allocated objects.
    unwind_protect::run_frame_discarding_local_objects ();
  }

for *every* matrix allocated this way.  Seems like a huge cost to me,
even if we added the unwind protect code to the Matrix constructor
(though that seems like a bad mixture to me).

Exception handling saves a lot of work for things like this because it
guarantees that if an exception causes us to unwind the call stack,
destructors for any live local objects will be called.
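
(Editorial illustration: a minimal sketch of that guarantee. BufferSketch is a hypothetical stand-in for Matrix, and the live_buffers counter exists only so the effect can be observed.)

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Counts objects that currently own memory (for observation only).
static int live_buffers = 0;

// Hypothetical stand-in for Matrix: acquires memory in the constructor
// and releases it in the destructor.
class BufferSketch
{
public:
  explicit BufferSketch (std::size_t n) : data (new double[n] ())
  {
    assert (n > 0);
    ++live_buffers;
  }

  ~BufferSketch ()
  {
    delete [] data;
    --live_buffers;
  }

  BufferSketch (const BufferSketch&) = delete;
  BufferSketch& operator = (const BufferSketch&) = delete;

private:
  double *data;
};

// A function with a local object that may exit by throwing.
void some_fun (bool fail)
{
  BufferSketch mx (100);
  if (fail)
    throw std::runtime_error ("bail out");
}

// Returns the number of buffers still alive after a failed call.
int live_after_failure ()
{
  try
    {
      some_fun (true);
    }
  catch (const std::runtime_error&)
    {
      // mx was already destroyed while the stack was unwound.
    }
  return live_buffers;
}
```

No unwind_protect-style registration is needed in some_fun; the destructor call is part of leaving the scope, whether by return or by exception.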

| I don't yet have a feel for how inconvenient "just so"
| is.  Is it much less hassle than unwind_protect?

Typically it just means making sure that resources are allocated in
constructors and released in destructors.  To be safe, you need to be
careful about how you write the constructors and destructors, but I don't
think it is really hard.  Not all of Octave has it right yet, but for
the most part, resources are handled in constructors and destructors.

| I would also like to know how much it costs compared to unwind_protect.

I don't really know, though I suspect that it is not all that much.
There is a huge benefit compared to not using exceptions, since
Octave's unwind_protect is really only handling some resources.
Anything that is allocated as a local object will not go away if we
call longjmp from a signal handler to go back to the top level.  With
exception handling, we will clean up everything allocated that way
when we go back up the call stack.  We would still have to handle
things allocated outside of constructors, but we have to worry about
those things anyway.  Moving allocations like that inside constructors
is probably one of the better ways to solve the problem.  The C++
auto_ptr class can help.
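
(Editorial illustration: a minimal sketch of handing a raw allocation to an owning smart pointer. It uses std::unique_ptr because auto_ptr was later deprecated in C++11 and removed in C++17; Workspace and compute are hypothetical names for this example.)

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Counts workspaces currently alive (for observation only).
static int live_workspaces = 0;

// Hypothetical resource that would otherwise be managed by hand.
struct Workspace
{
  Workspace () { ++live_workspaces; }
  ~Workspace () { --live_workspaces; }
};

// The allocation is owned by a smart pointer from the moment it is made,
// so it is released on every exit path, including an exception.
void compute (bool fail)
{
  std::unique_ptr<Workspace> ws (new Workspace ());
  assert (ws.get () != nullptr);

  if (fail)
    throw std::runtime_error ("bail out");
}
```

The pointer's destructor deletes the Workspace during unwinding, so the function needs no explicit cleanup code.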

jwe



Re: sparse matrices and error handling

Paul Kienzle-2
On Sun, Nov 17, 2002 at 07:42:22PM -0600, John W. Eaton wrote:

> On 17-Nov-2002, Paul Kienzle <[hidden email]> wrote:
> | BTW, isn't it a misnomer to refer to the setjmp/longjmp setup as only
> | applying to foreign code?  It seems to me that you would want it around
> | any code which can take more than 1/2 second or so to complete, even
> | if it is coded in C++ and is part of liboctave.  If you don't have it, then
> | how can you break out of such code since Ctrl-C can't directly throw an
> | exception?
>
> In code that we can modify (all of Octave) we can use the new
> OCTAVE_QUIT macro.  It checks to see if an interrupt has occurred, and
> if so, throws an exception.  So everything that is not part of Octave
> and that doesn't allow us to insert checks for the interrupt state and
> maybe throw exceptions is "foreign code" that has to handle interrupts
> some other way.  The simplest way seems to be a longjmp out of a
> signal handler and back to the location where the foreign code was
> called, then throw an exception.

I was assuming that we wouldn't want to interfere with the optimizer by
putting OCTAVE_QUIT in the middle of a tight loop.  To demonstrate this
I looked for an appropriate loop in filter.cc, but I see that that is
where you put the OCTAVE_QUIT.  

I measure a 4% slowdown in the following unrealistic case:

        octave> x=rand(20000,1);
        octave> b = hanning(1024);

        With OCTAVE_QUIT

        octave> tic; filter(b,1,x); toc
        ans = 2.2529

        OCTAVE_QUIT in outer loop only:

        octave> tic; filter(b,1,x); toc
        ans = 2.2023

        Without OCTAVE_QUIT

        octave> tic; filter(b,1,x); toc
        ans = 2.1686


In the following realistic case there is less than 1% performance
penalty:

        octave:9> [b,a]=butter(8,0.3);

        With OCTAVE_QUIT
        octave> tic; filter(b,a,x); toc
        ans = 0.039206

        Without OCTAVE_QUIT
        octave> tic; filter(b,a,x); toc
        ans = 0.038891

Am I correct in assuming nobody will be bothered by this?  I start to
get concerned when it is more than 10-15%, so it doesn't bother me.
         
Paul Kienzle
[hidden email]



Re: sparse matrices and error handling

John W. Eaton-6
On 18-Nov-2002, Paul Kienzle <[hidden email]> wrote:

| I was assuming that we wouldn't want to interfere with the optimizer by
| putting OCTAVE_QUIT in the middle of a tight loop.  To demonstrate this
| I looked for an appropriate loop in filter.cc, but I see that that is
| where you put the OCTAVE_QUIT.  
|
| I measure a 4% slowdown in the following unrealistic case:
|
| [...]
|
| In the following realistic case there is less than 1% performance
| penalty:
|
| [...]
|
| Am I correct in assuming nobody will be bothered by this?

I don't think it is too bad, and there have been bug reports about
memory leaks due to signal handling behavior.

In cases like filter, where we have loops like this,

  for (int i = 0; i < x_len; i++)
    {
      ...

      for (int j = 0; j < si_len; j++)
        {
          OCTAVE_QUIT;
          ...
        }
    }

I put OCTAVE_QUIT in the inner loop because if we put it at the top of
the outer loop and x_len is 1 and si_len is some large number, then we
might not be very responsive to interrupts.  OTOH, the inner loop is
just

  si(j) = si(j+1) - a(j+1) * y(i) + b(j+1) * x(i);

so how many iterations would we have to have to make this take more
than a few tenths of a second on reasonably modern hardware?

If it becomes a problem, we could always rewrite to break up the loops
so they process blocks of data before checking the interrupt state.
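
(Editorial illustration: a minimal sketch of the block-wise idea, in which the check runs once per block rather than once per element. count_interrupt_check is a placeholder that only counts how often it is called; in real code it would be whatever interrupt test is in use. blocked_sum and the block size are assumptions for this example, not code from filter.cc.)

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder for the real interrupt test; here it only counts calls.
static int checks_done = 0;
inline void count_interrupt_check () { ++checks_done; }

// Sum the elements of x, checking for interrupts once per block.
double blocked_sum (const std::vector<double>& x, std::size_t block)
{
  assert (block > 0);

  double total = 0.0;
  for (std::size_t start = 0; start < x.size (); start += block)
    {
      count_interrupt_check ();   // one check per block

      // The tight inner loop contains no check at all.
      std::size_t stop = std::min (start + block, x.size ());
      for (std::size_t i = start; i < stop; i++)
        total += x[i];
    }
  return total;
}
```

The block size trades responsiveness against overhead: larger blocks mean fewer checks but a longer wait before an interrupt is noticed.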

I suppose I'd rather leave things as simple as possible until we
actually see a performance problem, and there are probably lots of
other places in Octave where some optimization could make a much
bigger difference.

jwe