parallel make hangs

Dmitri A. Sergatskov
Every now and then parallel make (make -j8 in my case) hangs at
....  
  GEN      doc/interpreter/octave.html/splinefit3.png
  GEN      doc/interpreter/octave.html/splinefit4.png
  GEN      doc/interpreter/octave.html/splinefit6.png
  GEN      doc/interpreter/octave.html/octave.css
(with the octave process running at 100% CPU load)
Killing it with Ctrl-C results in:

^Cmake[2]: *** Deleting file 'doc/interpreter/convhull.eps'
^C^Cfatal: caught signal Interrupt -- stopping myself...
attempting to save variables to 'octave-workspace'...
octave exited with signal 11
make[1]: *** [Makefile:26595: all-recursive] Interrupt
make: *** [Makefile:10262: all] Interrupt

After that I can just issue a plain make and it finishes with no problem.
It looks to me like there are some race conditions...

Dmitri.
--
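
With make -j8, each "GEN" line is printed when make starts that recipe, not when it finishes, so the last line on screen (octave.css here) is not necessarily the job that is stuck; the interrupt message above names convhull.eps instead. A minimal sketch for seeing which Octave jobs are actually still alive, assuming procps-ng's pgrep (whose -a option prints the full command line) and that the doc-building processes have "octave" in their names; the function name is hypothetical:

```shell
# Hypothetical helper: list every live process whose name contains "octave",
# with its PID and full command line, so the stuck doc target can be read off.
running_octave_jobs() {
  pgrep -a octave || echo "no octave processes running"
}
```

Run running_octave_jobs from a second terminal while the build appears to hang.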

Re: parallel make hangs

Daniel Sebald
On 11/21/2017 07:01 PM, Dmitri A. Sergatskov wrote:

> Every now and then parallel make (make -j8 in my case) hangs at
> [...]
> It looks to me there are some race conditions...

Those hangs happen sometimes.  On my system the make process hangs on
voronoi.eps (regardless of parallel build or not), but I think that is
a long-standing bug with OSMesa and nVidia drivers.  The convhull and
voronoi figures are challenging for

      print (outfile, d_typ);

for some reason.  Didn't simply running "make -j8" a second time succeed?

Dan
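
One way to tell a race from a plain rendering hang is to run the troublesome print outside of make. A minimal sketch, assuming it is run from the top of an Octave build tree that provides the run-octave script and that GNU coreutils timeout is available; the figure, the output file /tmp/voronoi-test.eps, the -depsc device (standing in for whatever d_typ the doc script uses) and the function name are all assumptions, not what the documentation rules actually execute:

```shell
# Hypothetical repro: draw a voronoi diagram in an invisible figure and
# print it to EPS, killing Octave if it has not finished within $1 seconds.
try_voronoi_print() (
  limit=${1:-120}
  # -k 10: send SIGKILL if Octave is still alive 10 s after the SIGTERM.
  timeout -k 10 "$limit" ./run-octave --norc --silent --no-history --no-gui --eval \
    "figure ('visible', 'off'); voronoi (rand (10, 1), rand (10, 1)); print ('/tmp/voronoi-test.eps', '-depsc');"
  status=$?
  # GNU timeout exits with 124 on timeout, or 137 (128 + 9) if SIGKILL was needed.
  if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
    echo "print did not finish within ${limit}s" >&2
  fi
  exit "$status"
)
```

Depending on which graphics toolkit Octave selects by default, a graphics_toolkit call may be needed at the start of the --eval string so that the same off-screen rendering path as the doc build is exercised.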

Re: parallel make hangs

Dmitri A. Sergatskov


On Tue, Nov 21, 2017 at 9:06 PM, Daniel J Sebald <[hidden email]> wrote:
> Those happen sometimes.  Here the make process hangs on voronoi.eps
> (regardless of parallel build or not), but I think that is a
> long-standing bug with osmesa and nVidia drivers.  [...]
> Didn't simply running "make -j8" a second time succeed?

This all sounds plausible.

Dmitri.
--

Re: parallel make hangs

John W. Eaton
In reply to this post by Daniel Sebald
On 11/21/2017 10:06 PM, Daniel J Sebald wrote:

> Those happen sometimes.  Here the make process hangs on voronoi.eps
> (regardless of parallel build or not), but I think that is a
> long-standing bug with osmesa and nVidia drivers.

I'm surprised that you have any success mixing the nvidia libGL with
OSMesa.  I don't think the layout of the GL context used by the nvidia
library is compatible with the one used by Mesa.  As far as I know, the
only way to make that work properly is to use the Mesa libGL and force
software rendering.  What happens if you do that, by setting
LD_PRELOAD=/path/to/your/mesa/libGL.so and LIBGL_ALWAYS_SOFTWARE=1 in
the build environment?  Does it hang then?  Are both actually needed for
building?  Since the build only does off-screen rendering,
LIBGL_ALWAYS_SOFTWARE might not be needed.

jwe
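
A minimal sketch of the experiment suggested above, as a wrapper around make. The library path passed in is whatever Mesa libGL.so exists on your system (/path/to/your/mesa/libGL.so is only the placeholder from the message); the function name and the FORCE_SOFTWARE switch are hypothetical, added so that the build can be tried both with and without LIBGL_ALWAYS_SOFTWARE:

```shell
# Hypothetical wrapper: run make with a Mesa libGL preloaded.
# Usage: build_with_mesa /path/to/mesa/libGL.so [make arguments...]
build_with_mesa() (
  libgl=$1
  if [ ! -f "$libgl" ]; then
    echo "no such library: $libgl" >&2
    exit 1
  fi
  shift
  # LD_PRELOAD is inherited by every process make starts, so the Octave
  # processes that render the doc figures also load this libGL first.
  if [ "${FORCE_SOFTWARE:-1}" = 1 ]; then
    LD_PRELOAD=$libgl LIBGL_ALWAYS_SOFTWARE=1 make "$@"
  else
    LD_PRELOAD=$libgl make "$@"
  fi
)
```

For example, build_with_mesa /path/to/your/mesa/libGL.so -j8, then FORCE_SOFTWARE=0 build_with_mesa /path/to/your/mesa/libGL.so -j8 to check whether the software-rendering switch matters for off-screen rendering.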

Re: parallel make hangs

John W. Eaton
In reply to this post by Dmitri A. Sergatskov
On 11/21/2017 08:01 PM, Dmitri A. Sergatskov wrote:

> Every now and then parallel make (make -j8 in my case) hangs at
> [...]
> (with octave process running at 100% cpu load)

When this happens, could you use gdb to connect to the process and see
where it is hanging?

jwe
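
A minimal sketch of that gdb session, done non-interactively so the backtraces can be pasted into a reply. It assumes gdb is installed, that the stuck process is the newest one whose name contains "octave" (a PID can be passed instead), and that you are allowed to ptrace it (on some systems that requires the same user, or root); the function name is hypothetical:

```shell
# Hypothetical helper: attach to a (hung) Octave process, print a backtrace
# of every thread, and detach again when gdb exits.
octave_backtrace() (
  pid=${1:-$(pgrep -n octave 2>/dev/null)}
  if [ -z "$pid" ]; then
    echo "no octave process found" >&2
    exit 1
  fi
  # -batch runs the -ex commands and exits; gdb detaches from an attached
  # process when it exits, so the build is left as it was.
  gdb -p "$pid" -batch -ex "thread apply all bt"
)
```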

Re: parallel make hangs

John W. Eaton
In reply to this post by Dmitri A. Sergatskov
On 11/22/2017 09:03 AM, Dmitri A. Sergatskov wrote:

> On Tue, Nov 21, 2017 at 9:06 PM, Daniel J Sebald <[hidden email]> wrote:
>
>     Those happen sometimes.  Here the make process hangs on voronoi.eps
>     (regardless of parallel build or not), but I think that is a
>     long-standing bug with osmesa and nVidia drivers.  [...]
>
> This all sounds plausible.

Are you using a Mesa libGL on this system, or do you have the nvidia
libGL?  If it is not Mesa by default, are you preloading the Mesa library?

jwe
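
For a process that is already running (for instance a hung doc build), one way to answer this is to look at which libGL is actually mapped into it, rather than what ldd or the preload setting suggests. A minimal sketch for Linux that reads /proc/PID/maps; the function name is hypothetical, and reading another user's process may need extra permissions:

```shell
# Hypothetical helper: print the unique file paths matching the regular
# expression PATTERN that are mapped into process PID (Linux /proc only).
mapped_libs() (
  pid=$1
  pattern=$2
  # Field 6 of /proc/PID/maps is the backing file, when there is one.
  awk -v pat="$pattern" '$6 ~ pat { print $6 }' "/proc/$pid/maps" | sort -u
)
```

For example, mapped_libs "$(pgrep -n octave)" libGL should list the Mesa library if the preload took effect.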


Re: parallel make hangs

Dmitri A. Sergatskov
On Wed, Nov 22, 2017 at 9:38 AM, John W. Eaton <[hidden email]> wrote:
> Are you using a Mesa libGL on this system, or do you have the nvidia
> libGL?  If it is not Mesa by default, are you preloading the Mesa library?



I am preloading Mesa. I do not use the nvidia libGL. The computer used
to hang intermittently when running dump plot demos. I guess I was
misled by what appeared to be happening during the building of
octave.css.

It does not happen too often (fortunately/unfortunately).

Dmitri.
--

Re: parallel make hangs

John W. Eaton
On 11/22/2017 10:45 AM, Dmitri A. Sergatskov wrote:

> I am preloading mesa. I do not use nvidia libGL. The computer used to
> hang intermittently when running dump plot demos. I guess I got
> misled by what appears to be happening during building of octave.css.
>
> It does not happen too often (fortunately/unfortunately).

OK, I thought so, but wanted to confirm.

If you can attach with gdb the next time this happens, maybe it will
give some clues.

jwe

Re: parallel make hangs

Dmitri A. Sergatskov


On Wed, Nov 22, 2017 at 9:48 AM, John W. Eaton <[hidden email]> wrote:
> If you can attach with gdb the next time this happens, maybe it will
> give some clues.

I have had some other sporadic hangs during the build process. E.g.,
here the buildbot hung during make test. I rebuilt the same snapshot
myself and the tests passed just fine. I do not see any messages in the
logs indicating faulty hardware.

Dmitri.
--

Re: parallel make hangs

Dmitri A. Sergatskov
Another (and perhaps related) issue with Fedora is that different
libraries are built against different BLAS implementations, and as a
result octave gets linked to all of them:

ldd src/.libs/octave-cli | grep blas
    libopenblas.so.0 => /lib64/libopenblas.so.0 (0x00007f47d28f8000)
    libopenblasp.so.0 => /lib64/libopenblasp.so.0 (0x00007f47c97fd000)
    libblas.so.3 => /lib64/libblas.so.3 (0x00007f47c95a7000)

ldd src/.libs/octave-cli  | grep atlas
    libsatlas.so.3 => /usr/lib64/atlas/libsatlas.so.3 (0x00007f2ce47a1000)

I am not 100% sure how it all works in the end, but I assume the last
linked library overrides all previous symbols, so this *should* not be
a problem as long as all those libraries are ABI compatible.

Dmitri.
--
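
Rather than assuming which of these libraries wins, the glibc dynamic linker can report it. With default ELF symbol lookup, the loaded objects are searched in load order and the first definition found is used, so in the common case it is the earliest-loaded BLAS that supplies a symbol, not the last one. A minimal sketch, assuming glibc (other C libraries ignore LD_DEBUG) and using dgemm_ only as an example BLAS symbol; the function name is hypothetical:

```shell
# Hypothetical helper: run a program with glibc's binding trace enabled and
# show which shared object each reference to SYMBOL resolves to.
# Usage: show_binding SYMBOL PROGRAM [ARGS...]
show_binding() (
  if [ "$#" -lt 2 ]; then
    echo "usage: show_binding SYMBOL PROGRAM [ARGS...]" >&2
    exit 2
  fi
  symbol=$1
  shift
  # LD_BIND_NOW=1 resolves all references at startup, so the binding is
  # reported even if the symbol is never called; the trace goes to stderr,
  # which is sent into the pipe while the program's stdout is discarded.
  LD_DEBUG=bindings LD_BIND_NOW=1 "$@" 2>&1 >/dev/null |
    grep "symbol \`$symbol'"
)
```

For example, show_binding dgemm_ src/.libs/octave-cli --eval 'exit'; running the uninstalled binary directly may need the same environment that run-octave sets up.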


Re: parallel make hangs

Daniel Sebald
In reply to this post by John W. Eaton
On 11/22/2017 09:34 AM, John W. Eaton wrote:

> On 11/21/2017 10:06 PM, Daniel J Sebald wrote:
>
>> Those happen sometimes.  Here the make process hangs on voronoi.eps
>> (regardless of parallel build or not), but I think that is a
>> long-standing bug with osmesa and nVidia drivers.
>
> I'm surprised that you have any success mixing the nvidia libGL with
> OSMesa.  I don't think the layout of the GL context used the the nvidia
> library is compatible with the one used by Mesa.  As far as I know, the
> only way to make that work properly is to use the Mesa libGL and force
> software rendering.  What happens if you do that, by setting
> LD_PRELOAD=/path/to/your/mesa/libGL.so and LIBGL_ALWAYS_SOFTWARE=1 in
> the build environment?  Does it hang then?  Are both actually needed for
> building?  Since that is only doing off-screen rendering, then the
> LIBGL_ALWAYS_SOFTWARE might not be needed.

I would guess that you are correct on the LIBGL_ALWAYS_SOFTWARE=1
switch.  Rather than rebuild for that, let me go back to the default
configure and try preloading libGL.so.

I don't think I am having even partial success.  Running in -j4 mode
makes it appear that some targets have succeeded when in fact all four
jobs are still active, e.g.:

   GEN      doc/interpreter/voronoi.txt
   GEN      doc/interpreter/triplot.txt
   GEN      doc/interpreter/griddata.txt
   GEN      doc/interpreter/convhull.txt
   GEN      doc/interpreter/delaunay.txt
   GEN      doc/interpreter/inpolygon.txt
   GEN      doc/interpreter/voronoi.eps
   GEN      doc/interpreter/triplot.eps
   GEN      doc/interpreter/griddata.eps
   GEN      doc/interpreter/convhull.eps
^C^C^Cfatal: caught signal fatal: caught signal fatal: caught signal
Interruptfatal: caught signal Interrupt -- stopping myself...Interrupt
-- stopping myself...Interrupt
  -- stopping myself... -- stopping myself...

voronoi.eps (the very first EPS figure generated) is failing.

OK, I didn't follow your description precisely on locating the
OSMesa libGL.so and instead chose

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGL.so make

which I assume is the generic Linux driver (as opposed to the
graphics-card-specific one, which I think is
/usr/lib/nvidia-340/libGL.so), but that appears to work.

Well, that at least gets me a little further in a convenient way
(i.e., I don't have to build with

configure --without-osmesa --no-recursion

).

And actually that may be a partial solution, at least for people
building with nVidia drivers: if the build process can recognize that
there is a libGL.so driver in the conventional location, i.e., directory

/usr/lib/x86_64-linux-gnu

or some standard make-tools LDPATH variable, it could force the
documentation steps to preload *that* driver instead of any
graphics-card-specific driver it would otherwise use.  What does it
matter which driver is used if there is going to be no on-screen
graphics?

Another alternative may be to explore some of the settings in the
following list of OSMesa build options (is OSMesa built specifically
for Octave's make?  I'm not sure, really.  I looked at this some time
ago... oh wait, I now remember having debugged swrast at some point, so
the following options would require an extensive rebuild):

https://www.paraview.org/Wiki/ParaView/ParaView_And_Mesa_3D#Mesa_for_your_CPU_.28i.e._without_graphics_hardware.2C_and_OSMesa.29

Anyway, how about searching for the standard libGL.so and preloading
it when building the documentation?

Dan
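
The proposal to search for a standard libGL.so could start from something like the following. This is only a sketch: the candidate directories (the Debian/Ubuntu multiarch directory mentioned above plus /usr/lib64 and /usr/lib) and the function name are assumptions, and on some systems the libGL.so in the standard directory may itself be a dispatch library that hands off to a vendor driver, so it is worth confirming which driver actually gets loaded:

```shell
# Hypothetical helper: print the first libGL found in the given directories
# (or in a default list), and fail if there is none.
find_generic_libgl() (
  if [ "$#" -eq 0 ]; then
    set -- /usr/lib/x86_64-linux-gnu /usr/lib64 /usr/lib
  fi
  for dir in "$@"; do
    for lib in "$dir/libGL.so" "$dir/libGL.so.1"; do
      if [ -f "$lib" ]; then
        printf '%s\n' "$lib"
        exit 0
      fi
    done
  done
  exit 1
)
```

Used as libgl=$(find_generic_libgl) && LD_PRELOAD=$libgl make -j4, the make only runs when a library was found.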