signal handling

John W. Eaton-6
I'm thinking about how to improve Octave's signal handling and error
recovery.  Currently, when Octave receives an interrupt signal, it
simply tries to use longjmp to get back to the beginning of the main
interpreter loop.  This approach has at least two major problems.
First, there is no guarantee that a longjmp out of a signal handler
will even work.  Second, it skips over the destructors for all the
objects that have been created up to the point of the signal, which
leads to big memory leaks.

The program below provides an outline of how signal handling in Octave
currently works and also how I propose to improve it.  If you compile
without -DUSE_EXCEPTIONS, you will get the current behavior.  Run it
and type Control-C a few times just after the "sleeping" message
appears (so you are sure to have some memory allocated by an
intermediate object) and watch the process grow (with top, ps,
whatever).  It should grow in increments of about 8MB and you should
see more messages from the big_memory_hog constructor than from the
destructor.

My proposal is to modify the signal handler so that it simply sets a
global variable and returns.  Then when Octave hits a QUIT macro, it
will throw a C++ exception which will be caught at the top level.  If
we then make sure that all resources are managed properly in object
constructors/destructors, the exception handling code should take care
of deleting any temporary objects.  I think that in most instances,
this is already true, so most of that work is done.  What remains is
to add calls to QUIT in a number of places in the Octave sources.  To
see how this method works, compile the example below with
-DUSE_EXCEPTIONS and try the same experiment as above.  This time, you
should see the same number of messages from the big_memory_hog
constructor and destructor so the process should not grow (the total
size should never be too much more than 8MB).
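
To make the proposal above concrete, here is a minimal sketch of the
flag-and-poll scheme.  It is not the attached except.cc and not Octave's
actual code: interrupt_pending, quit_check, memory_hog and run_once are
hypothetical names chosen for illustration, and std::raise stands in for
the user typing Control-C.

```cpp
#include <csignal>
#include <vector>

// All names here are hypothetical; this is not Octave's code.
static volatile std::sig_atomic_t interrupt_pending = 0;

struct interrupt_exception { };

// The handler does nothing except record that a signal arrived.
extern "C" void sigint_handler (int) { interrupt_pending = 1; }

void install_interrupt_handler ()
{
  std::signal (SIGINT, sigint_handler);
}

// Stand-in for the proposed QUIT macro: poll the flag, throw if set.
inline void quit_check ()
{
  if (interrupt_pending)
    {
      interrupt_pending = 0;
      throw interrupt_exception ();
    }
}

static int constructed = 0;
static int destroyed = 0;

// Stand-in for big_memory_hog: owns about 8MB and counts its lifetime.
struct memory_hog
{
  std::vector<double> data;
  memory_hog () : data (1 << 20) { ++constructed; }
  ~memory_hog () { ++destroyed; }
};

void long_computation ()
{
  memory_hog tmp;
  for (int i = 0; i < 1000; i++)
    {
      if (i == 10)
        std::raise (SIGINT);   // simulate Control-C mid-loop
      quit_check ();           // throws; tmp is destroyed while unwinding
    }
}

// Plays the role of the top-level loop for one command.
// Returns true if the computation was interrupted.
bool run_once ()
{
  try
    {
      long_computation ();
    }
  catch (const interrupt_exception&)
    {
      return true;
    }
  return false;
}
```

Calling install_interrupt_handler once and then run_once several times
should leave constructed equal to destroyed, which is the property the
experiment above is meant to show.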

This method will take care of exceptions that happen in the C++ code
that Octave uses.  For the Fortran bits and other library code that we
call, I think we will need a slightly different approach.  But we only
have to worry about code that may run for a significant amount of time
(otherwise, we just wait for it to finish, and the exception handling
stuff will take care of getting us back to the top level without too
much delay).

My proposal for Fortran or other long-running foreign code is to set
up a separate signal handling/longjmp scheme around them.  The idea
is that when an interrupt signal is received, we will jump out of the
signal handler, but only back to the point where the foreign code was
called.  Then we throw an exception and make our way back to the top
level, using the exception handler code to clean up as we go.  This
does not solve the problem of resource management for the foreign code
or on systems where jumping out of a signal handler does not work (I
don't know what to do about that, other than ignore interrupts on
those systems).  It is OK for the Fortran code that Octave uses
because that stuff doesn't do any memory allocations and typically
does not open external files.  But for other foreign code, we may leak
memory or other resources.  The only solution I can see for that is to
make sure that the foreign code manages its resources properly in the
presence of interrupt signals.
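
A rough sketch of the jump-then-throw idea for foreign code follows.  It
assumes a POSIX system and uses sigsetjmp/siglongjmp rather than plain
setjmp/longjmp.  Every name in it (foreign_call_context, in_foreign_code,
foreign_code, call_foreign_code) is hypothetical, raise() stands in for
Control-C, and it only behaves sensibly if the foreign code holds no
resources, as discussed above.

```cpp
#include <setjmp.h>
#include <signal.h>

struct interrupt_exception { };

// Hypothetical names; not Octave's actual variables.
static sigjmp_buf foreign_call_context;
static volatile sig_atomic_t in_foreign_code = 0;
static volatile sig_atomic_t interrupt_pending = 0;
static volatile sig_atomic_t foreign_completed = 0;

extern "C" void sigint_handler (int)
{
  if (in_foreign_code)
    {
      in_foreign_code = 0;
      siglongjmp (foreign_call_context, 1);   // abandon the foreign call
    }
  interrupt_pending = 1;                      // otherwise just record it
}

void install_interrupt_handler ()
{
  struct sigaction act = {};
  act.sa_handler = sigint_handler;
  sigemptyset (&act.sa_mask);
  act.sa_flags = 0;
  sigaction (SIGINT, &act, nullptr);
}

// Stand-in for a long-running Fortran routine that allocates nothing.
extern "C" void foreign_code (int simulate_interrupt)
{
  if (simulate_interrupt)
    raise (SIGINT);        // as if Control-C arrived mid-computation
  foreign_completed = 1;   // not reached when interrupted
}

void call_foreign_code (int simulate_interrupt)
{
  // The second argument asks sigsetjmp to save the signal mask so that
  // siglongjmp restores it (SIGINT is blocked inside the handler).
  if (sigsetjmp (foreign_call_context, 1))
    {
      // We got here from the handler and are back in C++ code, so an
      // exception can unwind the rest of the stack normally.
      throw interrupt_exception ();
    }
  else
    {
      in_foreign_code = 1;
      foreign_code (simulate_interrupt);
      in_foreign_code = 0;
    }
}
```

The jump skips only the frames of the foreign routine.  Everything above
call_foreign_code is unwound by the exception, so destructors there
still run.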

Comments?

Thanks,

jwe



Attachment: except.cc (3K)
signal handling

John W. Eaton-6
On  5-Nov-2002, John W. Eaton <[hidden email]> wrote:

| I'm thinking about how to improve Octave's signal handling and error
| recovery.

I've made changes for this in CVS.  If you are interested in seeing
diffs, let me know.  Otherwise, update from CVS, give it a try, and
let me know if you notice any problems.  In particular, I'd like to
know if there are operations that can no longer be interrupted
reasonably quickly.  I've inserted a lot of calls to the OCTAVE_QUIT
macro that actually throws the exception that takes us back to the
top-level command loop, but it's hard to know exactly how many to use
and where they should go.  Mostly I've tried to insert them in loops
that could run for a long time.  I may have used too many in some
places and missed some others where it would be helpful.

jwe


Re: signal handling

Etienne Grossmann-4
In reply to this post by John W. Eaton-6

  Hello,

thanks for working on this important problem. Right now, I am busy
preparing a trip to New England and planning visits to labs there,
so I won't have much time to test your code. However, I will try
to install the CVS version and compare its behavior with that of 2.1.39.
Since most of what I do in Octave at present is debugging, I press C-c
often.

  Cheers,

  Etienne

On Tue, Nov 05, 2002 at 11:59:41AM -0600, John W. Eaton wrote:
# I'm thinking about how to improve Octave's signal handling and error
# recovery.



--
Etienne Grossmann ------ http://www.isr.ist.utl.pt/~etienne


Re: signal handling

Paul Kienzle-2
In reply to this post by John W. Eaton-6
On Tue, Nov 05, 2002 at 11:59:41AM -0600, John W. Eaton wrote:
> I'm thinking about how to improve Octave's signal handling and error
> recovery.  Currently, when Octave receives an interrupt signal, it
> simply tries to use longjmp to get back to the beginning of the main
> interpreter loop.  This approach has at least two major problems.
> First, there is no guarantee that a longjmp out of a signal handler
> will even work.  Second, it skips over the destructors for all the
> objects that have been created up to the point of the signal, which
> leads to big memory leaks.

Here is how the Open Group defines siglongjmp:

        http://www.opennc.org/onlinepubs/7908799/xsh/siglongjmp.html

        As it bypasses the usual function call and return mechanisms,
        siglongjmp() will execute correctly in contexts of interrupts,
        signals and any of their associated functions. However, if
        siglongjmp() is invoked from a nested signal handler (that is, from
        a function invoked as a result of a signal raised during the
        handling of another signal), the behaviour is undefined.

I agree that doesn't address the destructor problem, but it does
suggest that you should be using sigsetjmp/siglongjmp in those places
where you cannot use C++ exception handling.

> My proposal is to modify the signal handler so that it simply sets a
> global variable and returns.  Then when Octave hits a QUIT macro, it
> will throw a C++ exception which will be caught at the top level.  If
> we then make sure that all resources are managed properly in object
> constructors/destructors, the exception handling code should take care
> of deleting any temporary objects.  I think that in most instances,
> this is already true, so most of that work is done.  What remains is
> to add calls to QUIT in a number of places in the Octave sources.  To
> see how this method works, compile the example below with
> -DUSE_EXCEPTIONS and try the same experiment as above.  This time, you
> should see the same number of messages from the big_memory_hog
> constructor and destructor so the process should not grow (the total
> size should never be too much more than 8MB).

Can we have a debug_on_interrupt flag so that octave will stop at the
next statement?  This can be useful if you have a long running process.
Hit Ctrl-C, see where it has got to, then continue.  

One problem I see is that you don't know whether the last line ran to
completion or if it was interrupted within an external function.  In a
patch I sent earlier, the first interrupt would set a flag saying stop at
the next statement, but if the user got bored waiting they could hit ctrl-C
again which would see that the flag had been set and abort to the top
level.
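
A minimal sketch of that two-stage interrupt, with hypothetical names
(break_requested, abort_requested, poll_interrupt) that are not taken
from the earlier patch or from Octave.  The handler only sets flags; the
interpreter is assumed to poll between statements and at polling points
inside long-running operations.

```cpp
#include <csignal>

// Hypothetical flags; not taken from any existing patch.
static volatile std::sig_atomic_t break_requested = 0;  // stop at next statement
static volatile std::sig_atomic_t abort_requested = 0;  // go to the top level

extern "C" void sigint_handler (int)
{
  if (break_requested)
    abort_requested = 1;   // second Ctrl-C before a statement boundary
  else
    break_requested = 1;   // first Ctrl-C
}

void install_interrupt_handler ()
{
  std::signal (SIGINT, sigint_handler);
}

enum class action { keep_running, enter_debugger, abort_to_top_level };

// at_statement_boundary is true when the interpreter polls between
// statements and false when a long-running operation polls.
action poll_interrupt (bool at_statement_boundary)
{
  if (abort_requested)
    {
      abort_requested = 0;
      break_requested = 0;
      return action::abort_to_top_level;
    }
  if (break_requested && at_statement_boundary)
    {
      break_requested = 0;
      return action::enter_debugger;
    }
  return action::keep_running;
}
```

With this arrangement a single Ctrl-C never aborts anything on its own;
it only takes effect at the next statement, and the user has to ask
twice to abandon the statement that is currently running.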

Paul Kienzle
[hidden email]


Re: signal handling

John W. Eaton-6
On 14-Nov-2002, Paul Kienzle <[hidden email]> wrote:

| Here is how the Open Group defines siglongjmp:
|
| http://www.opennc.org/onlinepubs/7908799/xsh/siglongjmp.html
|
| As it bypasses the usual function call and return mechanisms,
| siglongjmp() will execute correctly in contexts of interrupts,
| signals and any of their associated functions. However, if
| siglongjmp() is invoked from a nested signal handler (that is, from
| a function invoked as a result of a signal raised during the
| handling of another signal), the behaviour is undefined.

Can we guarantee that they will never be called from a nested signal
handler?  The only place we would jump from would be the interrupt
handler, and I think that when it is executing, interrupts are
ignored.  What would happen if we catch an interrupt when some other
signal handler is running?  Would ignoring interrupts inside other
signal handlers solve that problem?  What about signal handlers that
might be installed by "foreign" code?

Hmm.  In any case, you're right that we should be using
sig{set,long}jmp instead, if they exist (and I would expect them to be
present on all modern systems).  I can make that change.

I also thought of another problem.  With my most recent changes, we
now only jump out of the signal handler when executing foreign code
(mostly Fortran) and that should be OK except when the foreign code
calls back to Octave.  For example, fsolve calls a user-defined
function written in C++ which then executes some interpreted Octave
code.  When that code is running, we should avoid jumping so that we
don't leak resources.  This means that the body of the user-supplied
function should be surrounded by a try-catch block and we should set
up to return from the signal handler instead of jumping, then throw
an exception, then jump from the try block back to the point where
the foreign code was initially called.

I can make these changes for the code that is in Octave now, but
unfortunately doing this correctly requires making similar
modifications for any foreign code that calls back to Octave.

jwe


Re: signal handling

John W. Eaton-6
In reply to this post by Paul Kienzle-2
On 14-Nov-2002, Paul Kienzle <[hidden email]> wrote:

| Can we have a debug_on_interrupt flag so that octave will stop at the
| next statement?  This can be useful if you have a long running process.
| Hit Ctrl-C, see where it has got to, then continue.  

Sure, this seems like a reasonable feature.

| One problem I see is that you don't know whether the last line ran to
| completion or if it was interrupted within an external function.  In a
| patch I sent earlier, the first interrupt would set a flag saying stop at
| the next statement, but if the user got bored waiting they could hit ctrl-C
| again which would see that the flag had been set and abort to the top
| level.

OK, so this "double interrupt" will only be required when
debug_on_interrupt is true and we are inside foreign code that would
normally jump out of the signal handler, correct?

jwe


Re: signal handling

Paul Kienzle-2
On Thu, Nov 14, 2002 at 03:59:45PM -0600, John W. Eaton wrote:

> On 14-Nov-2002, Paul Kienzle <[hidden email]> wrote:
>
> | Can we have a debug_on_interrupt flag so that octave will stop at the
> | next statement?  This can be useful if you have a long running process.
> | Hit Ctrl-C, see where it has got to, then continue.  
>
> Sure, this seems like a reasonable feature.
>
> | One problem I see is that you don't know whether the last line ran to
> | completion or if it was interrupted within an external function.  In a
> | patch I sent earlier, the first interrupt would set a flag saying stop at
> | the next statement, but if the user got bored waiting they could hit ctrl-C
> | again which would see that the flag had been set and abort to the top
> | level.
>
> OK, so this "double interrupt" will only be required when
> debug_on_interrupt is true and we are inside foreign code that would
> normally jump out of the signal handler, correct?

__long__ running foreign code.  Short running foreign code (e.g., a small
matrix multiply) will quickly get to the next statement and the debugger
will pick it up before the user has a chance to press Ctrl-C twice.  

It might be nice to let the user know which foreign function is active when
Ctrl-C is pressed the first time, but I/O during signal handling sounds
like a bad idea.

>
> jwe
>


Re: signal handling

Paul Kienzle-2
In reply to this post by John W. Eaton-6
On Thu, Nov 14, 2002 at 03:57:22PM -0600, John W. Eaton wrote:

> On 14-Nov-2002, Paul Kienzle <[hidden email]> wrote:
>
> | Here is how the Open Group defines siglongjmp:
> |
> | http://www.opennc.org/onlinepubs/7908799/xsh/siglongjmp.html
> |
> | As it bypasses the usual function call and return mechanisms,
> | siglongjmp() will execute correctly in contexts of interrupts,
> | signals and any of their associated functions. However, if
> | siglongjmp() is invoked from a nested signal handler (that is, from
> | a function invoked as a result of a signal raised during the
> | handling of another signal), the behaviour is undefined.
>
> Can we guarantee that they will never be called from a nested signal
> handler?

We can, using sigaction, which allows you to mask signals while the handler
is active.
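
For example, something along these lines (a sketch only, assuming POSIX;
the function and variable names are made up, and SIGTERM and SIGHUP are
just an arbitrary choice of extra signals to block):

```cpp
#include <signal.h>

static volatile sig_atomic_t interrupt_pending = 0;
static volatile sig_atomic_t sigint_blocked_in_handler = 0;
static volatile sig_atomic_t sigterm_blocked_in_handler = 0;

extern "C" void sigint_handler (int)
{
  // Record which signals are blocked while we run, for demonstration.
  sigset_t current;
  if (sigprocmask (SIG_BLOCK, nullptr, &current) == 0)
    {
      sigint_blocked_in_handler = sigismember (&current, SIGINT);
      sigterm_blocked_in_handler = sigismember (&current, SIGTERM);
    }
  interrupt_pending = 1;
}

// Install the SIGINT handler so that other signals cannot nest inside
// it.  Returns true on success.
bool install_masked_handler ()
{
  struct sigaction act = {};
  act.sa_handler = sigint_handler;
  sigemptyset (&act.sa_mask);
  sigaddset (&act.sa_mask, SIGTERM);   // blocked while the handler runs
  sigaddset (&act.sa_mask, SIGHUP);
  act.sa_flags = 0;   // no SA_NODEFER, so SIGINT itself is blocked too
  return sigaction (SIGINT, &act, nullptr) == 0;
}
```

The signals in sa_mask, plus the signal being handled unless SA_NODEFER
is set, stay blocked until the handler returns, so a handler installed
this way is not interrupted by them.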

> The only place we would jump from would be the interrupt
> handler, and I think that when it is executing, interrupts are
> ignored.

I can't find anything definitive about that for signal().

> What would happen if we catch an interrupt when some other
> signal handler is running?  Would ignoring interrupts inside other
> signal handlers solve that problem?  What about signal handlers that
> might be installed by "foreign" code?

Foreign code should set and restore the signal handlers that it needs.

>
> Hmm.  In any case, you're right that we should be using
> sig{set,long}jmp instead, if they exist (and I would expect them to be
> present on all modern systems).  I can make that change.
>
> I also thought of another problem.  With my most recent changes, we
> now only jump out of the signal handler when executing foreign code
> (mostly Fortran) and that should be OK except when the foreign code
> calls back to Octave.  For example, fsolve calls a user-defined
> function written in C++ which then executes some interpreted Octave
> code.  When that code is running, we should avoid jumping so that we
> don't leak resources.  This means that the body of the user-supplied
> function should be surrounded by a try-catch block and we should set
> up to return from the signal handler instead of jumping, then throw
> an exception, then jump from the try block back to the point where
> the foreign code was initially called.

I played with this a long time ago, and I'm rusty on the details.  The
foreign code has to be compiled with -fexceptions, but from the signal
handler can't you simply throw an exception as usual?  I believe I was
throwing the exception in the callback rather than from the signal
handler, but the idea is the same.

If the foreign code has acquired resources, they won't be released (C++
doesn't know about them), but that is true even if you use siglongjmp
as you are suggesting.  The only way around it that I can see is if
the foreign code can register and invoke unwind_protect code much like
you are doing things now.

Oh, but a lot of the code we are calling (e.g., atlas) may not have
been compiled with -fexceptions, and that might mean we can't throw
an exception from a signal invoked while that code is active.  We need
to try this on various platforms.

The Matlab solution, by the way, is to require resource allocation
via Matlab functions so that the resources can be automatically released
when your function exits, unless you insist that they are persistent.

>
> I can make these changes for the code that is in Octave now, but
> unfortunately doing this correctly requires making similar
> modifications for any foreign code that calls back to Octave.

Which is not that much.  I'm assuming most oct files would not be classed
as foreign.

Paul Kienzle
[hidden email]


Re: signal handling

John W. Eaton-6
In reply to this post by Paul Kienzle-2
On 14-Nov-2002, Paul Kienzle <[hidden email]> wrote:

| On Thu, Nov 14, 2002 at 03:59:45PM -0600, John W. Eaton wrote:
|
| > OK, so this "double interrupt" will only be required when
| > debug_on_interrupt is true and we are inside foreign code that would
| > normally jump out of the signal handler, correct?
|
| __long__ running foreign code.  Short running foreign code (e.g., a small
| matrix multiply) will quickly get to the next statement and the debugger
| will pick it up before the user has a chance to press Ctrl-C twice.  

Right.

This also brings up the question of when to even bother setting up for
siglongjmp.  We have two ways to call Fortran code.  One is to make a
direct call using the F77_FUNC macro (for name mangling) from autoconf
and the other is to make a call using F77_XFCN, defined in
libcruft/misc/f77-fcn.h.  This macro does all the work of setting up
for siglongjmp and the possibility of errors in the Fortran code.
It's relatively safe, but expensive.  It probably makes sense for
calls to LSODE, but not calls to DERFC.  So I think we are only using
F77_XFCN for the potentially expensive calls now (I just changed a few
more simple calls in liboctave/lo-specfun.cc to use F77_FUNC).  OTOH,
there is no code like this (yet):

  if (problem_size > some_threshold)
    F77_XFCN (dgemm, DGEMM, (args, for, dgemm, ...));
  else
    F77_FUNC (dgemm, DGEMM) (args, for, dgemm, ...);

which would avoid all the setup costs if we think the problem is small
enough to be computed quickly.  I don't think there is any way to
guess this for iterative solvers like LSODE, but it would probably be
possible for things like matrix multiply, inversion, etc.

| It might be nice to let the user know which foreign function is active when
| Ctrl-C is pressed the first time, but I/O during signal handling sounds
| like a bad idea.

Right.  I think it's best to do as little as possible in the signal
handlers.  My only excuse for the way the code was originally written
is that I was in a hurry.  I figured that if it was important enough,
it would eventually be fixed.  According to this entry in ChangeLog.1,
it's only taken about 10 years:

  Wed Jun 24 15:09:36 1992  John W. Eaton  ([hidden email])

        * Version 0.10.

        * Cheesy signal handler added to catch interrupts.

jwe


Re: signal handling

John W. Eaton-6
In reply to this post by Paul Kienzle-2
On 14-Nov-2002, Paul Kienzle <[hidden email]> wrote:

| We can, using sigaction, which allows you to mask signals while the handler
| is active.

OK.

| > The only place we would jump from would be the interrupt
| > handler, and I think that when it is executing, interrupts are
| > ignored.
|
| I can't find anything definitive about that for signal().

Maybe I was imagining it.  In any case, it should not be too hard to
block them in the other signal handlers.

| > Hmm.  In any case, you're right that we should be using
| > sig{set,long}jmp instead, if they exist (and I would expect them to be
| > present on all modern systems).  I can make that change.

I've made changes for this, and I'm just about to check them in.

| > I also thought of another problem.  With my most recent changes, we
| > now only jump out of the signal handler when executing foreign code
| > (mostly Fortran) and that should be OK except when the foreign code
| > calls back to Octave.  For example, fsolve calls a user-defined
| > function written in C++ which then executes some interpreted Octave
| > code.  When that code is running, we should avoid jumping so that we
| > don't leak resources.  This means that the body of the user-supplied
| > function should be surrounded by a try-catch block and we should set
| > up to return from the signal handler instead of jumping, then throw
| > an exception, then jump from the try block back to the point where
| > the foreign code was initially called.
|
| I played with this a long time ago, and I'm rusty on the details.  The
| foreign code has to be compiled with -fexceptions, but from the signal
| handler can't you simply throw an exception as usual?  I believe I was
| throwing the exception in the callback rather than from the signal
| handler, but the idea is the same.

Does the foreign code have to be compiled with -fexceptions if we have
the following structure?

   fortran_code
     ...
     call user_defined_function_in_cxx
     ...
   end


   if (sigsetjmp (buf, 1))
     throw interrupt_exception ();
   else {
     interrupt_immediately++;  // i.e., jump from signal handler
     fortran_code ();
   }

   user_defined_function_in_cxx () {
     int save_interrupt_immediately = interrupt_immediately;
     try
       {
         interrupt_immediately = 0;  // i.e., don't jump from SIGINT handler
         some_useful_calc ();
       }
     catch (interrupt_exception)
       {
         interrupt_immediately = save_interrupt_immediately;
         siglongjmp (buf, 1);  // back to the location of the call to fortran_code
       }
     interrupt_immediately = save_interrupt_immediately;
   }

This is what I was trying to describe in my previous message.  So we
just want to skip back over the fortran code because maybe it doesn't
give us a good way to say "bail out now!" from inside the user defined
function.

I've made some changes for this that I'm about to check in.  They
seemed to work fine on the example I tried -- sending SIGINT while the
lsode user function was executing ended up in the catch block and then
the jump back to the location of the sigsetjmp followed by an
exception to go back to the main loop was successful.

| If the foreign code has acquired resources, they won't be released (C++
| doesn't know about them), but that is true even if you use siglongjmp
| as you are suggesting.

Right.  I'm just assuming that the foreign code doesn't acquire any
resources (true for the Fortran code that Octave uses, I think).

| The only way around it that I can see is if
| the foreign code can register and invoke unwind_protect code much like
| you are doing things now.

Would be possible, but maybe not always practical.

| Oh, but a lot of the code we are calling (e.g., atlas) may not have
| been compiled with -fexceptions, and that might mean we can't throw
| an exception from a signal invoked while that code is active.  We need
| to try this on various platforms.

OK.  Just to try to clarify, the exceptions are never thrown from the
signal handler.  The signal handler just sets a flag and returns.
Then we hit an OCTAVE_QUIT macro which throws the exception.  It's
caught before we get back to the fortran routine, then we jump over
that and back to C++, where we rethrow the exception.  These kinds of
blocks can be nested.
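
Here is a compilable sketch of that sequence, assuming POSIX.  It is an
illustration and not the code that was checked in.  The names
(foreign_call_context, quit_check, user_callback, foreign_solver,
run_solver) are hypothetical, raise() stands in for Control-C, and unlike
the outline earlier in this message it jumps after the catch block has
finished instead of from inside it.

```cpp
#include <setjmp.h>
#include <signal.h>

struct interrupt_exception { };

// Hypothetical names; not Octave's actual variables.
static sigjmp_buf foreign_call_context;
static volatile sig_atomic_t interrupt_pending = 0;
static volatile sig_atomic_t foreign_returned = 0;

// While the callback runs, the handler only records the signal.
extern "C" void sigint_handler (int) { interrupt_pending = 1; }

void install_interrupt_handler ()
{
  struct sigaction act = {};
  act.sa_handler = sigint_handler;
  sigemptyset (&act.sa_mask);
  act.sa_flags = 0;
  sigaction (SIGINT, &act, nullptr);
}

// Stand-in for OCTAVE_QUIT.
inline void quit_check ()
{
  if (interrupt_pending)
    {
      interrupt_pending = 0;
      throw interrupt_exception ();
    }
}

// Stand-in for a user-supplied function called back from Fortran.
extern "C" void user_callback (void)
{
  bool interrupted = false;
  try
    {
      raise (SIGINT);   // simulate Control-C during the callback
      quit_check ();    // throws; objects in this block are destroyed
    }
  catch (const interrupt_exception&)
    {
      interrupted = true;
    }
  // The exception must not travel through the foreign frames, so skip
  // over them with a jump instead.
  if (interrupted)
    siglongjmp (foreign_call_context, 1);
}

// Stand-in for a Fortran solver that calls back into C++.
extern "C" void foreign_solver (void (*callback) (void))
{
  callback ();
  foreign_returned = 1;   // not reached when interrupted
}

void run_solver ()
{
  if (sigsetjmp (foreign_call_context, 1))
    throw interrupt_exception ();   // back in C++: rethrow
  foreign_solver (user_callback);
}
```

A caller that wraps run_solver in a try block should see
interrupt_exception arrive while foreign_returned stays 0, meaning the
solver's frame was skipped and never resumed.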

If we happen to catch SIGINT when the foreign code (fortran_code in
the above example) is executing, we will jump back to the location of
the previous sigsetjmp.  I think that's the best we can do unless we
want interrupts for long-running foreign code to be unresponsive
(and annoying).

I don't see anything in the g77 manual that says anything about
exceptions, and g77 -v doesn't show any options enabled for exception
handling.  My scheme above seemed to work OK (not that it will work on
all systems or with all compilers, of course).

jwe